This assignment has been closed on May 08, 2025.
You must be authenticated to submit your files

CSC43042EP - TD7: Support Vector Machines

Before you begin

Download and unzip the archive CSC43042EP-td7-1-handin.zip.

Overview and Instructions

Introduction and Objectives

This TD corresponds to Lecture 7: Linear Models for Classification.

We will implement a Soft Margin Kernel SVM by solving its dual problem using quadratic programming (via the qpsolvers package).

Environment Setup

You will need Python with the following packages:

pip install --user numpy scipy scikit-learn qpsolvers[cvxopt]

Please not that we are using “cvxopt” as the quadratic solver, and it will influence in the results.

1. Quiz

Complete Questions 1-5 in the quiz on Moodle. You should use the provided Python script svm_with_sklearn.py in the quiz/ folder which can read from standard CSV files. It uses scikit-learn (with SVM based). Typing python svm_with_sklearn.py should give you an idea of how to use this script:

python svm_with_sklearn.py <dataset_train> <dataset_test> [<id_label> <rest_of_cols> ['<char_sep>']]
Supervised Classification

In supervised learning (classification): - Input: \(n\) samples \((x_i, y_i) \in \mathcal{X} \times \mathcal{Y}\). - Goal: Find a function \(f: \mathcal{X} \to \mathcal{Y}\) minimizing the prediction error. - Loss: Use the 0-1 loss \(L(y, \hat{y}) = \mathbb{1}_{y \neq \hat{y}}\). - Risk minimization: Minimize expected risk \(\mathbb{E}[L(Y, f(X))]\).

For SVMs, we seek a predictor separating the data with a maximum margin hyperplane.

Theory: Linear Models and SVMs

Linear Classifiers
Support Vector Machines

Soft-margin SVM optimization:

\[ \min_{\beta, \beta_0} \quad \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^n \xi_i \] subject to:

\[ y_i (\boldsymbol{x}_i^T \beta - \beta_0) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]

Dual problem (which we will solve):

\[ \max_{\alpha} \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(\boldsymbol{x}_i, \boldsymbol{x}_j) \]

subject to:

\[ 0 \leq \alpha_i \leq C, \quad \sum_{i=1}^n \alpha_i y_i = 0 \]

2. Exercises TD

2.1 Implementing Kernels

Theory reminder:

Task: Implement the different kernels in kernel.py.

Test with:

python grader.py 1

Upload kernel.py.

Upload form is only available when connected
2.2 SVM Constructor

Theory reminder:

Task: Complete the constructor of SVM class in svm.py.

Test with:

python grader.py 2

Upload svm.py.

Upload form is only available when connected
2.3 Training the SVM

Theory reminder:

\[ \beta_0 = \text{average}_{i \in S} \left( y_i - \sum_{j=1}^n \alpha_j y_j K(\boldsymbol{x}_j, \boldsymbol{x}_i) \right) \]

Numerical Stability Reminder

In practice, when building the \(P\) matrix defined by \(P_{ij} = y_i y_j K(x_i, x_j)\), numerical issues can arise if \(P\) is not perfectly symmetric or not strictly positive definite.
To ensure stability for the quadratic solver, it is recommended to:

Thus, before solving the QP, you can do in your code:

P = (P + P.T) / 2
P += np.eye(P.shape[0]) * 1e-8

This helps avoid solver errors or inaccurate solutions.

Theory reminder: - Solve the dual quadratic programming (QP) problem. - Use qpsolvers to solve it.

Task:

Test with:

python grader.py 3

Upload svm.py.

Upload form is only available when connected
2.4 Predicting with the SVM

Theory reminder: - Prediction for a new point:

\[ \hat{f}(\boldsymbol{x}) = \text{sign}\left( \sum_{i=1}^n \alpha_i y_i K(\boldsymbol{x}_i, \boldsymbol{x}) + \beta_0 \right) \]

Task: Implement predict in svm.py.

Test with:

python grader.py 4

Upload svm.py.

Upload form is only available when connected
2.5 Testing the SVM

Theory reminder:

Task: Implement test method in svm.py using ConfusionMatrix.

Test with:

python grader.py 5

Upload svm.py.

Upload form is only available when connected
2.6 Mail Classification

Theory reminder:

Task: Use your SVM to classify emails into spam and ham. (Remember from TD5 the col_class is the first one). Evaluate the metrics.

Use the datasets:

A real dataset is given and you can try for fun:

Complete mail_selector.py.

Test with:

python grader.py 6

Upload mail_selector.py.

Upload form is only available when connected

References