CSC43042EP / TD7 (2024)

This assignment has been closed on May 08, 2025.

You must be authenticated to submit your files

CSC43042EP - TD7: Support Vector Machines

CSC43042EP - TD7: Support Vector Machines

Before you begin

Download and unzip the archive CSC43042EP-td7-1-handin.zip.

Overview and Instructions

TD Objectives:
- This TD corresponds to Lecture 7: Linear Models for Classification.
- We will focus on Support Vector Machines (SVM).
- You will implement Soft Margin Kernel SVMs by solving the dual problem using quadratic programming (via the qpsolvers package).
Datasets: The CSV files (e.g., hilly-train.csv, ebike-train.csv, etc.) are located in the csv/ directory. These datasets include:
- Scooter data
- Ebike data
- Hilly data

Introduction and Objectives

This TD corresponds to Lecture 7: Linear Models for Classification.

We will implement a Soft Margin Kernel SVM by solving its dual problem using quadratic programming (via the qpsolvers package).

You will code kernels, an SVM trainer and predictor.
You will test the SVM on multiple datasets.
You will classify spam emails using your own SVM.

Environment Setup

You will need Python with the following packages:

pip install --user numpy scipy scikit-learn qpsolvers[cvxopt]

Please not that we are using “cvxopt” as the quadratic solver, and it will influence in the results.

1. Quiz

Complete Questions 1-5 in the quiz on Moodle. You should use the provided Python script svm_with_sklearn.py in the quiz/ folder which can read from standard CSV files. It uses scikit-learn (with SVM based). Typing python svm_with_sklearn.py should give you an idea of how to use this script:

python svm_with_sklearn.py <dataset_train> <dataset_test> [<id_label> <rest_of_cols> ['<char_sep>']]

Supervised Classification

In supervised learning (classification): - Input: \(n\) samples \((x_i, y_i) \in \mathcal{X} \times \mathcal{Y}\). - Goal: Find a function \(f: \mathcal{X} \to \mathcal{Y}\) minimizing the prediction error. - Loss: Use the 0-1 loss \(L(y, \hat{y}) = \mathbb{1}_{y \neq \hat{y}}\). - Risk minimization: Minimize expected risk \(\mathbb{E}[L(Y, f(X))]\).

For SVMs, we seek a predictor separating the data with a maximum margin hyperplane.

Theory: Linear Models and SVMs

Linear Classifiers

Decision boundary is linear: a hyperplane \(\boldsymbol{x}^T \beta - \beta_0 = 0\).
For binary classification: labels \(y_i \in \{+1, -1\}\).

Support Vector Machines

Linearly separable case: maximize the margin while classifying all points correctly.
Non-separable case: allow some misclassification with slack variables \(\xi_i\).

Soft-margin SVM optimization:

\[ \min_{\beta, \beta_0} \quad \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^n \xi_i \] subject to:

\[ y_i (\boldsymbol{x}_i^T \beta - \beta_0) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]

Dual problem (which we will solve):

\[ \max_{\alpha} \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(\boldsymbol{x}_i, \boldsymbol{x}_j) \]

subject to:

\[ 0 \leq \alpha_i \leq C, \quad \sum_{i=1}^n \alpha_i y_i = 0 \]

2. Exercises TD

2.1 Implementing Kernels

Theory reminder:

Kernels implicitly map data to a higher-dimensional space.
Common kernels:
- Linear: \(K(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}^T \boldsymbol{y}\)
- Polynomial: \(K(\boldsymbol{x}, \boldsymbol{y}) = (\gamma \boldsymbol{x}^T \boldsymbol{y} + \text{coef0})^{\text{degree}}\)
- RBF: \(K(\boldsymbol{x}, \boldsymbol{y}) = \exp(-\gamma \|\boldsymbol{x} - \boldsymbol{y}\|^2)\)
- Sigmoid: \(K(\boldsymbol{x}, \boldsymbol{y}) = \tanh(\gamma \boldsymbol{x}^T \boldsymbol{y} + \text{coef0})\)
- Rational Quadratic: \(K(\boldsymbol{x}, \boldsymbol{y}) = \frac{\text{coef0}}{\|\boldsymbol{x} - \boldsymbol{y}\|^2 + \text{coef0}}\)

Task: Implement the different kernels in kernel.py.

Test with:

python grader.py 1

Upload kernel.py.

Upload form is only available when connected

2.2 SVM Constructor

Theory reminder:

Extract features and labels.
Labels must be transformed to \(\pm1\) (: use +1 for label 1, and -1 otherwise.).
Precompute the kernel matrix \(K_{ij} = K(\boldsymbol{x}_i, \boldsymbol{x}_j)\).

Task: Complete the constructor of SVM class in svm.py.

Test with:

python grader.py 2

Upload svm.py.

Upload form is only available when connected

2.3 Training the SVM

Theory reminder:

The bias term is computed using the support vectors.
Formula:

\[ \beta_0 = \text{average}_{i \in S} \left( y_i - \sum_{j=1}^n \alpha_j y_j K(\boldsymbol{x}_j, \boldsymbol{x}_i) \right) \]

Numerical Stability Reminder

In practice, when building the \(P\) matrix defined by \(P_{ij} = y_i y_j K(x_i, x_j)\), numerical issues can arise if \(P\) is not perfectly symmetric or not strictly positive definite.
To ensure stability for the quadratic solver, it is recommended to:

Symmetrize \(P\) by doing:

\[ P \leftarrow \frac{1}{2}(P + P^T) \]
Slightly regularize \(P\) by adding a small value to the diagonal:

\[ P \leftarrow P + \epsilon I \]

where \(\epsilon\) is a small positive value like \(10^{-8}\).

Thus, before solving the QP, you can do in your code:

P = (P + P.T) / 2
P += np.eye(P.shape[0]) * 1e-8

This helps avoid solver errors or inaccurate solutions.

Theory reminder: - Solve the dual quadratic programming (QP) problem. - Use qpsolvers to solve it.

Task:

Implement compute_beta_0 in svm.py.
Implement the train method in svm.py.

Test with:

python grader.py 3

Upload svm.py.

Upload form is only available when connected

2.4 Predicting with the SVM

Theory reminder: - Prediction for a new point:

\[ \hat{f}(\boldsymbol{x}) = \text{sign}\left( \sum_{i=1}^n \alpha_i y_i K(\boldsymbol{x}_i, \boldsymbol{x}) + \beta_0 \right) \]

Task: Implement predict in svm.py.

Test with:

python grader.py 4

Upload svm.py.

Upload form is only available when connected

2.5 Testing the SVM

Theory reminder:

Evaluate model on a test set.
Build a confusion matrix.

Task: Implement test method in svm.py using ConfusionMatrix.

Test with:

python grader.py 5

Upload svm.py.

Upload form is only available when connected

2.6 Mail Classification

Theory reminder:

Evaluation Metrics:
- Recall = \(\frac{TP}{TP+FN}\)
- False Alarm Rate = \(\frac{FP}{FP+TN}\)
- Precision = \(\frac{TP}{TP+FP}\)
- F-Score = \(\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)

Task: Use your SVM to classify emails into spam and ham. (Remember from TD5 the col_class is the first one). Evaluate the metrics.

Use the datasets:

csv/mail_train_server.csv
csv/mail_test_server.csv

A real dataset is given and you can try for fun:

csv/mail_train.csv
csv/mail_test.csv

Complete mail_selector.py.

Test with:

python grader.py 6

Upload mail_selector.py.

Upload form is only available when connected

References

A Practical Guide to Support Vector Classification (Chih-Wei Hsu et al.)