Concordia University

HeroSvm 2.1


Support Vector Machine (SVM)  represents the state-of-the-art classification technique. However, training SVM on a large training set becomes a bottleneck. HeroSvm is a high-performance library for training SVM for classification to solve this problem. It  has been implemented based on our proposed method [1][5][22].  In order to facilitate the software portability and maintenance, an object-oriented method has been applied to design the package. Error handling is supported and HeroSvm is exception-safe. HeroSvm is written in C++. In the current version, a dynamic link library in windows or a shared library in linux is provided to train SVM on a large-scale learning problem efficiently for research purpose in PC platform. We expect that HeroSVM can facilitate the training of support vector machine and solve some real-world problems in various engineering fields.

[What is new for HeroSvm 2.1]
  •  40 times faster than HeroSvm 1.0
  •  Train SVM with thousands of classes
  •  Support large training set with several million samples
  •  Provide a complete demo program, which includes SVM training and testing routines
[What are limitations for HeroSvm 2.1
  • Only support Pentium 4 and above. HeroSvm 1.0 does not have this limitation.
  • Only support RBF, Polynomial and linear kernels. HeroSvm 1.0 supports a user-specified kernel.

  • Much faster than the previous training method, especially on a large training set with many classes
  • Simple function interface
  • Detailed user guide
  • Error handling is supported and exception-safe.
  • No limitation on the size of  the training set and dimension of feature vectors.
  • Efficient algorithm to speed up the evaluation of SVM in the testing phase
  • Support windows and Linux
  • Compared to HeroSvm2.0, the version 2.1 provides a very efficient algorithm to speed up the evaluation of SVM for multi-class classification in the testing phase [29]. 
  • Support Linux
  • PC,  RAM (at least 256 Megabytes)
[Thread limitation]
  • Only single thread is supported.
[Generalization Performance]

Feature-based suppport vector machine is applied to three well-known handwritten digit databases and has obtained the-state-of-the-art generalization performance on all four public handwritten character databases, given as follows


Database Training set Testing set
CENPARMI (digit) 4000 2000
USPS (digit) 7291 2007
MNIST(digit) 60,000 10,000
NIST database  (lowercase characters) 24,092 10,688
Error rates of different methods on CENPARMI databases
Methods Raw Error rate without rejection (%)
Cho's [2] 3.95
S. W. Lee [3]  2.20
Local Learning Framework [4] 1.90
Virtual SVM [5] 1.30
SVC-rbf [6]  1.10
Error rates of different methods on USPS databases
Classifiers Training set Raw error rate without rejection (%)
Nearest-neighbor [7] USPS+ 5.9
LeNet1[8] USPS+ 5.0
Optimal margin Classifier [9] USPS 4.6
SVM [10] USPS 4.0
Local Learning [11] USPS+ 3.3
Virtual SVM, local kernel [12] USPS 3.0
Boosted Neural Nets [13] USPS+ 2.6
Tangent Distance [7] USPS+ 2.6
Feature-based virtual SVM [5]  USPS 2.34
Combination of tangent vector and local representation[28] USPS 2.0
Virtual SVM [14] USPS  3.2
Human error rate [15] --- 2.5
Human error rate [16]  --- 1.51
*Note that training set on USPS+ contains some machine-printed patterns.
Error rates of different methods on MNIST database
Methods Raw Error rate without rejection (%)
3-NN [17]  2.4
LeNet1 [17]  1.7
400-300-10 network [17]  1.6
Polynomial SVM [18]  1.4
Tangent Distance [17]  1.1
LeNet4 [17]  1.1
LeNet5 [17]  0.9
Virtual polynomial SVM [19] 0.8
Boosted LeNet4 [17] 0.7
Virtual SVM with 2pix VSVx [20] 0.56
SVM on vision-based feature [23] 0.59
Feature-based SVM [5] 0.60
PCA-based SVM [5]  0.60
VSVMa [5] 0.44
SVC-rbf [6] 0.42
VSVMb  [5][21] 0.38
Error rates of different methods on NIST lowercase database
Methods Raw Error rate without rejection (%)
GLVQ [4] 10.17
MQDF [4] 10.37
160-100-27 MLP [4] 8.40
Local Learning Framework [4] 7.66
Feature-based SVM [5] 7.44
VSVM  6.70
Misclassified Patterns on test set of CENPARMI
Misclassified patterns on test set of USPS

Fig. 1 The 47 errors ( 2.34% error rate) for the USPS test set. The number in the upper-left corner of each image indicates the test sample number. The first digit on the bottom specifies the true label. The second digit on the bottom is the recognized digit label. From the above figure, it can be seen that there are obviously four errors in the original labelling (234, 971, 994, 1978).

Misclassified patterns on the test set of MNIST

Fig. 2 The 38 errors ( 0.38% error rate) for MNIST test set. The number in the upper-left corner of each image indicates the test example number. The first digit on the bottom specifies the true label. The second digit on the bottom is the recognized digit label.

Some misclassified patterns on NIST lowercase character database
[Rejection Performance]

In some real world application such as check recognition, the reliability rate is more important than the raw error rate. So it is necessary to evaluate SVM's rejection performance. The reliability is defined by

                        Reliability = Recognition rate /(100% - Rejection rate )

Patterns are rejected when the difference between the outputs of the top two classes is smaller than a predefined threshold beta . Experiments aimed at evaluating rejection performance were conducted on MNIST database. The results is shown in the following Table:

                         Rejection Performances for VSVMa under different beta 

beta Recognition rate Substitution rate Rejection rate Reliability
1.1 96.69% 0.04% 3.27% 99.96%
1.4 94.41% 0.01% 5.58% 99.99%
1.7 91.51% 0 8.49% 100%
--- 99.56% 0.44% 0 99.56%
[Training speed for HeroSvm 2.0]

Two large handwritten digit database and ETL9B and Hanwang (handwritten Chinese character database) are used to test the performance of HeroSvm2.0. Some important performance measures are listed in the following Table.

Testing platform:

P4 1.7Ghz with 256K L2 (second-level ) cache, 1.5 Gegabytes SDRAM (Single Data Rate Memory). 
Operating system: Windows 2000 Professional 

Database Classes Training set Testing set Feature dimension Kernel Total training time (Proposed) Training time for Keerthi et al.'s SMO  Training time for modified Svmlight-A[24](v5.00) 
Training time for modified svm-light -B(v5.00)
Training time for Modified libsvm-A[25](v2.4) 
Training time for Modified libsvm-B(v2.4) 
Testing error rate Speed-up factor for Keerthi et al's SMO Speed-up factor for modified Svmlight -A Speed-up factor for modified svmlight -B Speed-up factor for Modified Libsvm-A Speed-up factor for Modified libsvm-B
Hanwang(handwritten digit) 10 1321718 300000 120 RBF 0.75 hours ---- ------- 7.63 hours --------- 16.21 hours 0.5% -------- --------- 10.17 --------- 21.61
MNIST (A) 10 60000 10000 576 RBF 0.064 hours 10.4 hours 1.11 hours 0.37 hours 3.04hours 0.39 hours  0.6% 161.38 17.22 5.74 47.17 6.09
MNIST(B) 10 60000 10000 784 Polynomial 0.075 hours 21.44 hours 2.03  hours 0.54 hours 4.52 hours 0.54 hours 1.1% 285.87 27.07 7.2  60.27 7.20
ETL9B 3036 485760 121440 392 RBF   ------ -------   ---------   1.1% -------- ---------   ---------

Hanwang (handwritten Chinese) 3755 2144489 542122 392 RBF 19.28 hours ------- ------- 644 hours ---------   3.03% --------- --------- 33.40 ---------

RingNorm 2 100,000,000 3,000,000 20 RBF 15.89 hours           1.24%          

Note: Speed-up factor = training time for one algorithm /training time for the proposed algorithm. 
Note that Keerthi et al's improved SMO was implemented by myself by following the pseudo code in the technical report [26].

Note: Modified libsvm-B and svmlight-B use one-against-one method for multi-classes; other methods use one-against-others.

Pre-conditions of performance comparion:

  • Dense feature vector which is stored in a contiguous memory address.
  • Almost the same size of kernel cache.
  • The fixed stopping tolerance.
  • The fixed experimental platform.
  • The fixed kernel parameter and C.

Post condition:

  • After training, the error rate on the testing set must be very close.

Comparisons of IO time and memory usage 

Database HeroSvm2 IO (read/write file) time HeroSvm2 Peak (Megabytes) of memory usage modified SVMlight  (v5.00) -A Memory Usage (Megabytes) Modified Svmlight(v5.00)-B Memory Usage (Megabytes) Modified LIBSVM-A (v2.4) Memory Usage (Megabytes) Modified Libsvm-B (v2.4) Memory Usage(Megabytes)
MNIST (A)  RBF 5.02 seconds 384  425  430 460M 180M
MNIST (B)  Poly 7.98 seconds 427 455 460 485 210M
Hanwang handwritten digit database 14.57 seconds 579  ---------------- 840 -------------------- 810 M
**Note that training time of Svmlight and LIBSVM does not include IO time.
 [Testing speed for HeroSvm 2.1]

The detailed algorithms, which solve the bottle-neck problem of low classification speed in the testing phase, are referred to [21][29].  The proposed algorithm dramatically boosts the classification speed of SVM without sacrificing the accuracy of the original SVM in the testing phase, especially for multi-class classification. The experiments were conducted on the same platform as above.

Database Kernel Dimension of feature vector Total number of merged Support vectors Classification speed of the original SVM Classification speed of the approximated SVM Speed-up factor
MNIST RBF 576 9739 149patterns/second 16393 patterns/second 110
Hanwang digit database RBF 120 41417 62patterns/second 10895 patterns/second 175
[Software request]

If you like to use this software in research or commercial applications, please contact Nicola Nobile (, technical manager at CENPARMI.

[Package for HeroSvm 2.1]

[Package for HeroSvm 1.0]
[Benchmark dataset for training SVM]
  • MNIST dataset for RBF kernel. Training vector consists of discriminative features ( (about 100M)
  • MNIST dataset for Polynomial kernel. Training vector consists of pixel values. ( (about 62M).
  • ETL9B dataset for RBF kernel. Training vector consists of discriminative features. (soon)

Note: MNIST handwritten digit database is originally built from NIST dataset by LeCun research group at AT&T lab.  If you use this dataset in your research, please cite LeCun et al.'s paper [27] to acknowledge their contribution.

Back to top

© Concordia University