CSC43042EP - TD7: Support Vector Machines
Before you begin
Download and unzip the archive CSC43042EP-td7-1-handin.zip.
Overview and Instructions
TD Objectives:
- This TD corresponds to Lecture 7: Linear Models for Classification.
 - We will focus on Support Vector Machines (SVM).
 - You will implement Soft Margin Kernel SVMs by solving the dual
problem using quadratic programming (via the 
qpsolverspackage). 
Datasets: The CSV files (e.g.,
hilly-train.csv,ebike-train.csv, etc.) are located in thecsv/directory. These datasets include:- Scooter data
 - Ebike data
 - Hilly data
 
Introduction and Objectives
This TD corresponds to Lecture 7: Linear Models for Classification.
We will implement a Soft Margin Kernel SVM by
solving its dual problem using quadratic
programming (via the qpsolvers package).
- You will code kernels, an SVM trainer and predictor.
 - You will test the SVM on multiple datasets.
 - You will classify spam emails using your own SVM.
 
Environment Setup
You will need Python with the following packages:
pip install --user numpy scipy scikit-learn qpsolvers[cvxopt]Please not that we are using “cvxopt” as the quadratic solver, and it will influence in the results.
1. Quiz
Complete Questions 1-5 in the
quiz on Moodle. You should use the provided Python script
svm_with_sklearn.py in the quiz/ folder which
can read from standard CSV files. It uses scikit-learn (with SVM based).
Typing python svm_with_sklearn.py should give you an idea
of how to use this script:
python svm_with_sklearn.py <dataset_train> <dataset_test> [<id_label> <rest_of_cols> ['<char_sep>']]
Supervised Classification
In supervised learning (classification): - Input: \(n\) samples \((x_i, y_i) \in \mathcal{X} \times \mathcal{Y}\). - Goal: Find a function \(f: \mathcal{X} \to \mathcal{Y}\) minimizing the prediction error. - Loss: Use the 0-1 loss \(L(y, \hat{y}) = \mathbb{1}_{y \neq \hat{y}}\). - Risk minimization: Minimize expected risk \(\mathbb{E}[L(Y, f(X))]\).
For SVMs, we seek a predictor separating the data with a maximum margin hyperplane.
Theory: Linear Models and SVMs
Linear Classifiers
- Decision boundary is linear: a hyperplane \(\boldsymbol{x}^T \beta - \beta_0 = 0\).
 - For binary classification: labels \(y_i \in \{+1, -1\}\).
 
Support Vector Machines
- Linearly separable case: maximize the margin while classifying all points correctly.
 - Non-separable case: allow some misclassification with slack variables \(\xi_i\).
 
Soft-margin SVM optimization:
\[ \min_{\beta, \beta_0} \quad \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^n \xi_i \] subject to:
\[ y_i (\boldsymbol{x}_i^T \beta - \beta_0) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]
Dual problem (which we will solve):
\[ \max_{\alpha} \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(\boldsymbol{x}_i, \boldsymbol{x}_j) \]
subject to:
\[ 0 \leq \alpha_i \leq C, \quad \sum_{i=1}^n \alpha_i y_i = 0 \]
2. Exercises TD
2.1 Implementing Kernels
Theory reminder:
Kernels implicitly map data to a higher-dimensional space.
Common kernels:
- Linear: \(K(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}^T \boldsymbol{y}\)
 - Polynomial: \(K(\boldsymbol{x}, \boldsymbol{y}) = (\gamma \boldsymbol{x}^T \boldsymbol{y} + \text{coef0})^{\text{degree}}\)
 - RBF: \(K(\boldsymbol{x}, \boldsymbol{y}) = \exp(-\gamma \|\boldsymbol{x} - \boldsymbol{y}\|^2)\)
 - Sigmoid: \(K(\boldsymbol{x}, \boldsymbol{y}) = \tanh(\gamma \boldsymbol{x}^T \boldsymbol{y} + \text{coef0})\)
 - Rational Quadratic: \(K(\boldsymbol{x}, \boldsymbol{y}) = \frac{\text{coef0}}{\|\boldsymbol{x} - \boldsymbol{y}\|^2 + \text{coef0}}\)
 
Task: Implement the different kernels in
kernel.py.
Test with:
python grader.py 1Upload kernel.py.
2.2 SVM Constructor
Theory reminder:
- Extract features and labels.
 - Labels must be transformed to \(\pm1\) (: use +1 for label 1, and -1 otherwise.).
 - Precompute the kernel matrix \(K_{ij} = K(\boldsymbol{x}_i, \boldsymbol{x}_j)\).
 
Task: Complete the constructor of SVM
class in svm.py.
Test with:
python grader.py 2Upload svm.py.
2.3 Training the SVM
Theory reminder:
- The bias term is computed using the support vectors.
 - Formula:
 
\[ \beta_0 = \text{average}_{i \in S} \left( y_i - \sum_{j=1}^n \alpha_j y_j K(\boldsymbol{x}_j, \boldsymbol{x}_i) \right) \]
Numerical Stability Reminder
In practice, when building the \(P\)
matrix defined by \(P_{ij} = y_i y_j K(x_i,
x_j)\), numerical issues can arise if \(P\) is not perfectly symmetric or not
strictly positive definite.
To ensure stability for the quadratic solver, it is recommended to:
Symmetrize \(P\) by doing:
\[ P \leftarrow \frac{1}{2}(P + P^T) \]
Slightly regularize \(P\) by adding a small value to the diagonal:
\[ P \leftarrow P + \epsilon I \]
where \(\epsilon\) is a small positive value like \(10^{-8}\).
Thus, before solving the QP, you can do in your code:
P = (P + P.T) / 2
P += np.eye(P.shape[0]) * 1e-8This helps avoid solver errors or inaccurate solutions.
Theory reminder: - Solve the dual quadratic
programming (QP) problem. - Use qpsolvers to solve it.
Task:
- Implement 
compute_beta_0insvm.py. - Implement the 
trainmethod insvm.py. 
Test with:
python grader.py 3Upload svm.py.
2.4 Predicting with the SVM
Theory reminder: - Prediction for a new point:
\[ \hat{f}(\boldsymbol{x}) = \text{sign}\left( \sum_{i=1}^n \alpha_i y_i K(\boldsymbol{x}_i, \boldsymbol{x}) + \beta_0 \right) \]
Task: Implement predict in
svm.py.
Test with:
python grader.py 4Upload svm.py.
2.5 Testing the SVM
Theory reminder:
- Evaluate model on a test set.
 - Build a confusion matrix.
 
Task: Implement test method in
svm.py using ConfusionMatrix.
Test with:
python grader.py 5Upload svm.py.
2.6 Mail Classification
Theory reminder:
Evaluation Metrics:
- Recall = \(\frac{TP}{TP+FN}\)
 - False Alarm Rate = \(\frac{FP}{FP+TN}\)
 - Precision = \(\frac{TP}{TP+FP}\)
 - F-Score = \(\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)
 
Task: Use your SVM to classify emails into spam and ham. (Remember from TD5 the col_class is the first one). Evaluate the metrics.
Use the datasets:
csv/mail_train_server.csvcsv/mail_test_server.csv
A real dataset is given and you can try for fun:
csv/mail_train.csvcsv/mail_test.csv
Complete mail_selector.py.
Test with:
python grader.py 6Upload mail_selector.py.