SVM is a popular statistical learning method for either classification or regression. For classification, a linear classifier, i.e. a hyperplane $f(x) = w^{T}x + b$ with $w$ as the weight vector and $b$ as the bias, labels data into two categories. The geometric margin is defined as $\gamma = \frac{2}{\lVert w\rVert}$. For SVM, the maximum margin approach is equivalent to minimizing $\frac{1}{2}\lVert w\rVert^{2}$. However, with the introduction of slack variables $\xi_i$ and regularization to inhibit complexity, the optimization becomes minimizing $\frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_i$, where $C$ is the regularization parameter. Eventually the solution for SVM turns out to be a quadratic optimization problem over $w$ and $\xi$ with the constraints $y_i(w^{T}x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$.
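Written out in full, this is the standard soft-margin primal problem, stated here for reference:

$$
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\left(w^{T}x_i + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
\end{aligned}
$$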
Since the sashelp.class dataset is extremely simple, I attempt to use its variables age and weight to predict sex, which is just for demonstration purposes. According to the plot, the data points are linearly non-separable. Kernel methods have to be applied to map the input data to a high-dimensional space so that they become linearly separable. To harness SVM in SAS, three procedures are commonly used under a SAS Enterprise Miner license: PROC DMDB is used to recode the categorical data and set up the working catalog, PROC SVM is used to build the model, and PROC SVMSCORE is applied to score new data with the fitted model.

proc sgplot data=sashelp.class;
scatter x = weight y = age / group = sex;
run;
proc dmdb batch data=sashelp.class dmdbcat=_cat out=_class;
var weight age;
class sex;
run;
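As an optional sanity check, printing a few rows of the DMDB output dataset confirms what the later steps consume; this is just a quick look, nothing model-specific:

/* peek at the recoded training data produced by PROC DMDB */
proc print data=_class (obs=5);
title 'first rows of the DMDB output dataset';
run;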
Hard margin
If we let C be infinitely large, any constraint violation is penalized without bound, so the slack variables are forced to zero and every constraint must be satisfied exactly. Therefore, the margin is narrowed down.
proc svm data=_class dmdbcat=_cat c=1e11 kernel=linear out=_1;
title 'hard margin';
ods output restab = restab1;
var weight age;
target sex;
run;
The accuracy is 63.16%, since 7 of the 19 observations are misclassified. Overall, the result is below.
| Name | Value |
|---|---|
| Regularization Parameter C | 100000000000 |
| Classification Error (Training) | 7.000000 |
| Geometric Margin | 1.624447E-10 |
| Number of Support Vectors | 17 |
| Estimated VC Dim of Classifier | 3.4494098E24 |
| Number of Kernel Calls | 74 |
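To see where those 7 training errors fall, the observed class can be cross-tabulated against the predicted class in the scored dataset _1. A minimal sketch follows; the name of the predicted-class variable in the OUT= dataset (_P_ below) is an assumption and may differ by release, so verify it with PROC CONTENTS first.

/* confusion matrix for the hard-margin model; the predicted-class */
/* variable name _P_ is assumed - check the OUT= dataset contents  */
proc freq data=_1;
tables sex * _P_ / norow nocol nopercent;
run;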
Soft margin
On the contrary, a small C makes constraint violations cheap to ignore, since the slack variables are barely penalized, which leads to the desired large margin.
proc svm data=_class dmdbcat=_cat kernel=linear out=_2;
title 'soft margin';
ods output restab = restab2;
var weight age;
target sex;
run;
The accuracy, or the misclassification rate, stays the same, since the dataset is so small. In PROC SVM, without an explicit specification, the C value is solved to be nearly zero, and the margin is huge.

| Name | Value |
|---|---|
| Regularization Parameter C | 0.000098161 |
| Classification Error (Training) | 7.000000 |
| Geometric Margin | 158.553426 |
| Number of Support Vectors | 18 |
| Estimated VC Dim of Classifier | 3.850370 |
| Number of Kernel Calls | 76 |
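Both runs above use kernel=linear even though the scatter plot suggests the classes are not linearly separable. A nonlinear kernel is the natural next step; the sketch below assumes that KERNEL=RBF and its K_PAR parameter are supported by the installed PROC SVM release, which is worth verifying in the Enterprise Miner documentation.

/* c=1 and k_par=0.001 are arbitrary illustrative values */
proc svm data=_class dmdbcat=_cat c=1 kernel=RBF k_par=0.001 out=_3;
title 'rbf kernel';
ods output restab = restab3;
var weight age;
target sex;
run;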
Conclusion
- For the SVM procedure, besides the training data, adding a validation dataset through the testdata option on the PROC statement could effectively increase the C parameter and decrease the possibility of overfitting; see the sketch after this list.
- There are a few advantages of SVM over other data mining methods. First, SVM is suitable for high-dimensional data; more importantly, its complexity can be easily controlled by adjusting the regularization parameter C.
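A minimal sketch of that validation setup, assuming a holdout dataset named _valid (a hypothetical name here) that has been prepared through the same PROC DMDB catalog as the training data:

/* _valid is a hypothetical holdout set registered in the same DMDB catalog */
proc svm data=_class dmdbcat=_cat testdata=_valid kernel=linear out=_4;
title 'with validation data';
var weight age;
target sex;
run;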