INF442 - Regression with a One-Layer Perceptron
- Download and unzip the archive INF442-td9-1-handin.zip. It contains several source files to get you started, a Makefile to compile the various tests, a quiz folder with a Python starter-kit script for the quiz, as well as data sets (.csv files in the csv folder).
Structure
We will implement a 1-layer perceptron. The perceptron will have one hidden layer of neurons and one additional neuron generating the output.
Section 0 is about the quiz;
Section 1 is about a single neuron;
Section 2 consists of arranging neurons so as to obtain a perceptron;
Section 3 consists of training the perceptron (updating the neurons' weights).
In Exercises 1 and 2, you will complete the partially implemented class Neuron. Then, in Exercises 3 and 4, you will assemble neurons into a perceptron by connecting their dendrites and axons accordingly. You can train and observe the behavior of this perceptron on the provided data. Bonus exercises (Exercises 5 and 6) give you an opportunity to optimize the performance a bit and to generalize the previous exercises to a multi-layer perceptron.
To get the full grade, you have to complete Exercises 1–4.
0. The quiz
In the quiz, you will be asked to train simple perceptrons for artificial 2D data sets and to explore the impact of the architecture and the activation function on the result. As usual, you are given a Python script, which trains a perceptron, plots its decision boundary, and evaluates it on the test data.
Important remark: The performance of the trained model frequently depends heavily on the initialization of the parameters, so different runs of the script may easily produce dramatically different results. Therefore, for the quiz, we recommend running each model about 5–6 times.
1. A model of a neuron
Biological and mathematical models of neurons
Below, we reproduce two figures from yesterday’s lecture slides showing two illustrations of a neuron: a biological and a mathematical one used in machine learning.
From these two figures, it appears pretty intuitive that a neuron (the mathematical model) can be viewed as consisting of a body responsible for the computations, holding, in particular, the activation function. A neuron communicates with other neurons through a set of buffer nodes called dendrites on the input side and axons on the output side. In this TD, we consider neurons that have precisely one axon.
In the files node.[ch]pp and neuron.[ch]pp, you will find, respectively, the classes Node and Neuron, modeling the buffer nodes (dendrites and axons) and complete neurons.
In addition to the above-mentioned information, a neuron contains the weights associated with each dendrite and with the bias. These will be used to train the neuron. Contrary to what is shown in Figure 2, we represent the bias as an extra dendrite.
Since most of the dendrites (except the input ones) are connected to axons, we chose to represent dendrites only by pointers: The node pointed to by a dendrite is either an input node or an axon of another neuron.
Finally, nodes and neurons also provide mechanisms to implement the back propagation of an error, necessary for training.
Take a few minutes to analyze the two classes. Observe, in particular, that:
- All the dendrites and the axon of a neuron are modeled by pointers to Node objects. When assembling a perceptron in Section 2, below, the pointers modeling the dendrites will be pointing to the same objects as the corresponding axons. Thus, each Node object will be pointed to by 1 axon and, potentially, many dendrites.
- Furthermore, the dendrites of a Neuron are stored in an array of pointers: std::vector<Node *> dendrites. Dendrites that are not connected are initialized to nullptr.
- We treat the bias of a neuron as an additional dendrite. More precisely, for a neuron with nb_dendrites dendrites, the size of the array dendrites is nb_dendrites + 1, with the first cell dendrites[0] pointing to a dedicated Node object modeling the bias.
- The signal in the node modeling the bias is a constant -1 (compare to \(-\beta_0\) in the figure above).
- Last, the private field collected_input in the class Neuron stores the last computed value of \(x^{\top} \beta - \beta_0\). It is computed during the forward step (Exercise 1) and used again in the backward pass (Exercise 2). (A simplified sketch of both classes is given right after this list.)
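To fix ideas, here is a simplified, hypothetical sketch of how the two classes might be laid out, based solely on the description above; the actual declarations in node.hpp and neuron.hpp may differ (in particular the accessor names, which are assumptions here):

#include <vector>

// Hypothetical, simplified view of the two classes; check node.hpp and
// neuron.hpp for the actual declarations.
class Node {
public:
    double get_signal() const { return signal; }
    void set_signal(double s) { signal = s; }
    double get_back_value() const { return back_value; }
    void set_back_value(double v) { back_value = v; }
private:
    double signal = 0.0;      // value flowing forward (main computation)
    double back_value = 0.0;  // error flowing backward (training)
};

class Neuron {
public:
    void step();       // forward pass (Exercise 1)
    void step_back();  // backward pass (Exercise 2)
private:
    // dendrites[0] points to the dedicated bias node (constant signal -1);
    // dendrites[1..nb_dendrites] point to input nodes or to other neurons' axons.
    std::vector<Node *> dendrites;    // unconnected entries are nullptr
    std::vector<double> weights;      // weights[0] is the bias weight beta_0
    Node *axon;                       // exactly one axon per neuron
    double (*activation)(double);     // the activation function f
    double (*activation_der)(double); // its derivative
    double collected_input;           // last computed x^T beta - beta_0
    double rate;                      // learning rate rho
};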
Exercise 1.
In the file neuron.cpp, implement the function void Neuron::step(), which reads the input signals from all the dendrites of the neuron and computes the output signal by applying the activation function stored in the private variable activation: \(y = f(x^{\top} \beta - \beta_0)\), where \(f\) is the activation function. (Recall that the activation function is a parameter of the neuron. In the proposed implementation, the default is the step function, but we will use the identity and the logistic sigmoid functions.) Do not forget to store \(x^{\top} \beta - \beta_0\) in the member attribute collected_input!
Note: We expect that all the dendrites of the neuron are connected at this stage. Use an assert to ensure that this is, indeed, the case. (If it is not, there is a bug in the construction of the perceptron, which would be immediately visible since one of the asserts would fail.)
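For illustration, a minimal sketch of what step() could look like, under the naming assumptions of the class sketch in Section 1 (get_signal, set_signal, weights, etc. are assumptions; adapt them to the actual interfaces):

#include <cassert>
#include <cstddef>

// A sketch only, assuming the hypothetical members from the class sketch
// above (dendrites, weights, axon, activation, collected_input).
void Neuron::step() {
    double sum = 0.0;
    for (std::size_t i = 0; i < dendrites.size(); i++) {
        // All dendrites, including the bias at index 0, must be connected.
        assert(dendrites[i] != nullptr);
        // The bias node carries the constant signal -1, so the sum below
        // is exactly x^T beta - beta_0.
        sum += weights[i] * dendrites[i]->get_signal();
    }
    collected_input = sum;              // reused in step_back() (Exercise 2)
    axon->set_signal(activation(sum));  // y = f(x^T beta - beta_0)
}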
You can test your code by running:
make grader
./grader 1
Upload your file neuron.cpp here:
Exercise 2.
Implement the function void Neuron::step_back(), which carries out the training of the neuron: it reads the propagated error from the axon and computes the adjustments to all the weights of the neuron (those associated with the dendrites and with the bias) as described below.
The weights are updated according to the following formulas (see page 208 of the lecture slides):
\[\begin{align} \beta_0 &= \beta_0 - \rho (\nabla_\beta R)_0, \\ \beta_j &= \beta_j - \rho (\nabla_\beta R)_j, \end{align} \]
where \(j \in [1 .. r]\) are the indices of the dendrites (also keep in mind that \(\nabla_{\beta_0} R = (\nabla_\beta R)_0\)), \(\rho\) is the learning rate (see the private variable rate), \(\nabla_\beta R = \mathrm{err} \cdot z\), and, finally,
\[\mathrm{err} = \left(\sum_{j=1}^s \gamma_j \cdot \mathrm{err}_j\right) \sigma'\bigl(z^{\top}\beta - \beta_0\bigr),\]
with \(\sigma'\) the derivative of the activation function. (Note: Here, \(\sigma\) need not be the sigmoid but could denote any activation function!)
Notice that \(z^{\top}\beta - \beta_0\) is the collected_input stored in the neuron, as discussed in Exercise 1.
The value of the error defined by this formula cannot be computed from the information inherent to the neuron: The values \(\gamma_j\) and \(\mathrm{err}_j\) come either from the next layer, i.e., the output neuron, if the current one is hidden, or from the feedback, if the current neuron is the output one (you can already have a quick look at the code provided for the function OneLayerPerceptron::compute_output_step in the file perceptron.cpp, which you will complete in Exercise 4). This value must be taken from the back_value of the axon, where it must be placed during the backward propagation phase of the neuron downstream (in the direction of the main computation of the perceptron, i.e., the one on the right in the figure). In turn, the current neuron must propagate the computed err to the back_value of each dendrite by spreading err among them with the weights \(\{\beta_j\}_{j \in [0 .. r]}\).
To summarize:
- The above discussion covers two parallel processes: The propagation of the error and the adjustment of the weights.
- In this exercise, all the neurons (hidden or output) have their axons connected to at most one neuron of the next layer, so \(s = 1\). (In Exercise 6, you will need to modify this to allow for \(s > 1\).)
- The first parenthesis in the formula defining \(\mathrm{err}\) above is the back_value found in the axon of the current neuron. It is defined in terms of the corresponding weights (\(\gamma_j\)) and errors (\(\mathrm{err}_j\)) at the neurons downstream. The value to propagate back to a neuron upstream (on the left in the figure) is to be computed based on the error at the current neuron (\(\mathrm{err}\)) and the weight given to the corresponding dendrite (\(\beta_j\)) before the update. (A sketch of the backward step is given right after this list.)
Again, you can test your code by running:
make grader
./grader 2
Upload your file neuron.cpp here:
2. Assembling the perceptron
Exercise 3.
In the file perceptron.cpp, complete the constructor and destructor of the OneLayerPerceptron class. In the constructor,
- create all the necessary nodes (for the inputs) and neurons (the hidden layer and the output);
- connections should be automatically set up by the corresponding constructors (have a look at the constructors of the classes Node and Neuron);
- for the Neuron constructors, you will have to provide activation functions and their corresponding derivatives: you can use the functions provided in the files activation.cpp and activation.hpp, or define the functions directly as lambda-expressions.
The fields dim and size of the OneLayerPerceptron class represent, respectively, the dimension of the input (already excluding the regression column) and the number of neurons in the hidden layer of the perceptron. A constructor sketch is given below.
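As a rough illustration of the intended structure, the constructor could follow the pattern below. The member names (inputs, hidden, output), the accessor get_axon(), the names sigmoid and sigmoid_der, and the constructor signatures are all assumptions; read the actual ones off node.hpp, neuron.hpp, and activation.hpp:

// Purely illustrative; all names and signatures below are assumptions.
OneLayerPerceptron::OneLayerPerceptron(int dim, int size, double rate)
    : dim(dim), size(size) {
    // One input node per input dimension (the regression column excluded).
    for (int i = 0; i < dim; i++)
        inputs.push_back(new Node());

    // Hidden layer: 'size' neurons over the input nodes, here with the
    // logistic sigmoid (e.g., taken from activation.[ch]pp).
    std::vector<Node *> hidden_axons;
    for (int j = 0; j < size; j++) {
        Neuron *n = new Neuron(inputs, sigmoid, sigmoid_der, rate);
        hidden.push_back(n);
        hidden_axons.push_back(n->get_axon());
    }

    // Output neuron: identity activation for regression, as lambda-expressions.
    output = new Neuron(hidden_axons,
                        [](double x) { return x; },   // f(x) = x
                        [](double)   { return 1.0; }, // f'(x) = 1
                        rate);
}

The destructor should then delete everything allocated with new in the constructor (the input nodes and all the neurons).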
You can test your code by running
make grader
./grader 3
Upload your file perceptron.cpp here:
Exercise 4.
A high-level specification of one execution cycle of the perceptron is provided by the function double OneLayerPerceptron::run(Dataset *data, int row, int regr, bool print). Using the functions implemented in Exercises 1 and 2, implement the four functions involved:
- prepare_inputs(Dataset *data, int row, int regr, bool print): initialize the input signals with the data from row row of the data set pointed to by data, skipping the regression column. Each row of data in the data set has dimension dim + 1, including the column used for regression (index regr). Do not forget to normalize the data, using the function normalize just above in perceptron.cpp (a sketch of this function is given after this list).
- compute_hidden_step(bool print): compute one step of the hidden layer.
- compute_output_step(Dataset *data, int row, int regr, bool print): compute one step of the output neuron. This should comprise both forward and backward propagation.
- propagate_back_hidden(bool print): propagate the error back through the hidden layer by calling the corresponding functions of each neuron.
You can ignore the bool print parameter, which is only used for debugging.
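For instance, prepare_inputs() could look roughly like the sketch below; the Dataset accessor and the exact signature of normalize are assumptions to be checked against the provided code:

// Illustrative sketch: data->get_instance(row)[j], the signature of
// normalize, and the member 'inputs' are assumptions.
void OneLayerPerceptron::prepare_inputs(Dataset *data, int row, int regr, bool print) {
    int col = 0;
    for (int j = 0; j < dim + 1; j++) {
        if (j == regr)
            continue;  // skip the regression column
        double x = normalize(data->get_instance(row)[j], data, j);
        inputs[col]->set_signal(x);  // 'inputs' as in the constructor sketch
        col++;
    }
}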
Again, you can test your code by running
make grader
./grader 4
Upload your file perceptron.cpp here:
3. Training and regression
You can train your perceptron and do some regression analysis using the main program in the file test_perceptron.cpp. Compile and run it as follows:
make test_perceptron
./test_perceptron help
This will print the usage of the program.
Running the program with the following parameters
./test_perceptron csv/train_boston_housing.csv csv/regr_boston_housing.csv 13 5 100
should produce an output resembling this:
Read training data from csv/train_boston_housing.csv
405 rows of dimension 14
Read testing data from csv/regr_boston_housing.csv
405 rows of dimension 14
If you see an assert failure now that probably means that
some of the neurons are not properly initialized or not properly connected.
Initialized a 1-layer perceptron
Size of the hidden layer: 5
Learning rate: 0.1
Decay: 0.001
Training the perceptron over 100 epochs for regression over column 13
Mean RSS: 3768.69
Total time elapsed = 0.449439s
Mean time per epoch/round = 0.00449439s
Switching off learning... done. Learning rate = 0
Testing the perceptron on the training data (100 times)
Mean RSS: 3454.66
Total time elapsed = 0.453808s
Mean time per epoch/round = 0.00453808s
Testing the perceptron on the testing data (100 times)
Mean RSS: 2895.68
Total time elapsed = 0.109262s
Mean time per epoch/round = 0.00109262s
Deleting the perceptron... done.
In this test, the perceptron parameters (i.e., the weights associated with the dendrites and the bias of each neuron) are initialized randomly exactly once (can you explain precisely at which moment?). Hence, subsequent executions with the same input data and parameters produce different results. However, the program does not exploit this to optimize the perceptron.
Exercise 5. (optional) (10 % extra points)
Modify the function run() in test_perceptron.cpp so as to repeat perceptron training several times with different random initializations of the weights and keep the best-performing one.
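One possible structure for the inside of run() is sketched below; train(), test(), and the constructor arguments are placeholders for whatever test_perceptron.cpp actually provides:

#include <limits>
#include <utility>

// Sketch of a multi-restart loop; all names below are placeholders.
OneLayerPerceptron *best = nullptr;
double best_rss = std::numeric_limits<double>::infinity();
const int restarts = 10;

for (int r = 0; r < restarts; r++) {
    // Each fresh perceptron gets a fresh random initialization of the weights.
    OneLayerPerceptron *p = new OneLayerPerceptron(dim, size, rate);
    train(p, train_data, regr, epochs);
    double rss = test(p, train_data, regr);
    if (rss < best_rss) {
        best_rss = rss;
        std::swap(best, p);  // keep the better perceptron in 'best'
    }
    delete p;  // deletes the worse of the two (or nullptr on the first round)
}
// 'best' now points to the best-performing trained perceptron.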
Upload your file test_perceptron.cpp here:
Multi-layer perceptron
Exercise 6. (optional) (10 % extra points)
Modify the code in the files neuron.[ch]pp and perceptron.[ch]pp as necessary so as to be able to construct multi-layer perceptrons.
Hint: The main difficulty is to ensure correct backward propagation. Indeed, in a multi-layer perceptron, the outputs of some neurons will be connected to multiple neurons (all the neurons in the next layer). Therefore, during the backward propagation phase, the error information from all these (next layer) neurons must be compounded.
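Concretely, the key change in step_back() is to accumulate rather than overwrite back_value, e.g., under the same naming assumptions as in Exercise 2 and with a hypothetical add_back_value accessor on Node:

// Inside Neuron::step_back(), adapted for the multi-layer case:
double err = axon->get_back_value() * activation_der(collected_input);
for (std::size_t i = 0; i < dendrites.size(); i++) {
    // Accumulate: each downstream neuron deposits its share beta_i * err
    // into the same upstream node.
    dendrites[i]->add_back_value(weights[i] * err);  // hypothetical accessor
    weights[i] -= rate * err * dendrites[i]->get_signal();
}
// The accumulated back_values must also be reset to 0 between passes,
// e.g., right after they have been consumed:
axon->set_back_value(0.0);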
Upload a ZIP archive named mlp.zip containing the files you have modified:
To create the archive, you can use the following command:
zip mlp.zip neuron.[ch]pp perceptron.[ch]pp