INF442 - Regression with a One-Layer Perceptron
- Download and unzip the archive INF442-td9-1-handin.zip. It contains several source files to get you started, a Makefile to compile the various tests, a quiz folder with a Python starter-kit script for the quiz, as well as data sets (.csv files in the csv folder).
Structure
We will implement a 1-layer perceptron. The perceptron will have one hidden layer of neurons and one additional neuron generating the output.
Section 0 is about the quiz;
Section 1 is about a single neuron;
Section 2 consists of arranging neurons so as to obtain a perceptron;
Section 3 consists of training the perceptron (updating the neurons' weights).
In Exercises 1 and 2, you will complete the partially implemented class Neuron. Then, in Exercises 3 and 4, you will assemble neurons into a perceptron by connecting their dendrites and axons accordingly. You can train and observe the behavior of this perceptron on the provided data. Bonus exercises (Exercises 5 and 6) give you an opportunity to optimize the performance a bit and to generalize the previous exercises to a multi-layer perceptron.
To get the full grade, you have to complete Exercises 1–4.
0. The quiz
In the quiz, you will be asked to train simple perceptrons for artificial 2D data sets and to explore the impact of the architecture and the activation function on the result. As usual, you are given a Python script, which trains a perceptron, plots its decision boundary, and evaluates it on the test data.
Important remark: The performance of the trained model frequently depends heavily on the initialization of the parameters, so different runs of the script may easily produce dramatically different results. Therefore, for the quiz, we recommend running each model about 5–6 times.
1. A model of a neuron
Biological and mathematical models of neurons
Below, we reproduce two figures from yesterday’s lecture slides showing two illustrations of a neuron: a biological and a mathematical one used in machine learning.
From these two figures, it appears pretty intuitive that a neuron (the mathematical model) can be viewed as consisting of a body responsible for the computations, holding, in particular, the activation function. A neuron communicates with other neurons through a set of buffer nodes called dendrites on the input side and axons on the output side. In this TD, we consider neurons that have precisely one axon.
In the files node.[ch]pp and neuron.[ch]pp, you will find, respectively, the classes Node and Neuron, modeling the buffer nodes (dendrites and axons) and complete neurons.
In addition to the above-mentioned information, a neuron contains the weights associated with each dendrite and with the bias. These will be used to train the neuron. Contrary to what is shown in Figure 2, we represent the bias as an extra dendrite.
Since most of the dendrites (except the input ones) are connected to axons, we chose to represent dendrites only by pointers: The node pointed to by a dendrite is either an input node or an axon of another neuron.
Finally, nodes and neurons also provide mechanisms to implement the back propagation of an error, necessary for training.
Take a few minutes to analyze the two classes. Observe, in particular, that:
- All the dendrites and the axon of a neuron are modeled by pointers to Node objects. When assembling a perceptron in Section 2, below, the pointers modeling the dendrites will be pointing to the same objects as the corresponding axons. Thus, each Node object will be pointed to by 1 axon and, potentially, many dendrites.
- Furthermore, the dendrites of a Neuron are stored in an array of pointers: std::vector<Node *> dendrites. Dendrites that are not connected are initialized to nullptr.
- We treat the bias of a neuron as an additional dendrite. More precisely, for a neuron with nb_dendrites dendrites, the size of the array dendrites is nb_dendrites + 1, with the first cell dendrites[0] pointing to a dedicated Node object modeling the bias.
- The signal in the node modeling the bias is a constant -1 (compare to \(-\beta_0\) in the figure above).
- Last, the private field collected_input in the class Neuron stores the last computed value of \(x^{\top} \beta - \beta_0\). It is computed during the forward step (Exercise 1) and used again in the backward pass (Exercise 2). (A simplified sketch of both classes is given right after this list.)
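To fix ideas, here is a simplified, hypothetical sketch of how the two classes might be laid out, based solely on the description above; the actual declarations in node.hpp and neuron.hpp may differ (in particular the accessor names, which are assumptions here):

#include <vector>

// Hypothetical, simplified view of the two classes; check node.hpp and
// neuron.hpp for the actual declarations.
class Node {
public:
    double get_signal() const { return signal; }
    void set_signal(double s) { signal = s; }
    double get_back_value() const { return back_value; }
    void set_back_value(double v) { back_value = v; }
private:
    double signal = 0.0;      // value flowing forward (main computation)
    double back_value = 0.0;  // error flowing backward (training)
};

class Neuron {
public:
    void step();       // forward pass (Exercise 1)
    void step_back();  // backward pass (Exercise 2)
private:
    // dendrites[0] points to the dedicated bias node (constant signal -1);
    // dendrites[1..nb_dendrites] point to input nodes or to other neurons' axons.
    std::vector<Node *> dendrites;    // unconnected entries are nullptr
    std::vector<double> weights;      // weights[0] is the bias weight beta_0
    Node *axon;                       // exactly one axon per neuron
    double (*activation)(double);     // the activation function f
    double (*activation_der)(double); // its derivative
    double collected_input;           // last computed x^T beta - beta_0
    double rate;                      // learning rate rho
};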
Exercise 1.
In the file neuron.cpp, implement the function void Neuron::step(), which reads the input signals from all the dendrites of the neuron and computes the output signal by applying the activation function stored in the private variable activation: \(y = f(x^{\top} \beta - \beta_0)\), where \(f\) is the activation function. (Recall that the activation function is a parameter of the neuron. In the proposed implementation, the default is the step function, but we will use the identity and the logistic sigmoid functions.) Do not forget to store \(x^{\top} \beta - \beta_0\) in the member attribute collected_input!
Note: We expect that all the dendrites of the neuron are connected at this stage. Use an assert to ensure that this is, indeed, the case. (If it is not, there is a bug in the construction of the perceptron, which would be immediately visible since one of the asserts would fail.)
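For illustration, a minimal sketch of what step() could look like, under the naming assumptions of the class sketch in Section 1 (get_signal, set_signal, weights, etc. are assumptions; adapt them to the actual interfaces):

#include <cassert>
#include <cstddef>

// A sketch only, assuming the hypothetical members from the class sketch
// above (dendrites, weights, axon, activation, collected_input).
void Neuron::step() {
    double sum = 0.0;
    for (std::size_t i = 0; i < dendrites.size(); i++) {
        // All dendrites, including the bias at index 0, must be connected.
        assert(dendrites[i] != nullptr);
        // The bias node carries the constant signal -1, so the sum below
        // is exactly x^T beta - beta_0.
        sum += weights[i] * dendrites[i]->get_signal();
    }
    collected_input = sum;              // reused in step_back() (Exercise 2)
    axon->set_signal(activation(sum));  // y = f(x^T beta - beta_0)
}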
You can test your code by running:
make grader
./grader 1
Upload your file neuron.cpp here:
Exercise 2.
Implement the function void Neuron::step_back(), which carries out the training of the neuron: it reads the propagated error from the axon and computes the adjustments to all the weights of the neuron (those associated with the dendrites and with the bias) as described below.
The weights are updated according to the following formulas (see page 208 of the lecture slides):
\[\begin{align} \beta_0 &= \beta_0 - \rho (\nabla_\beta R)_0, \\ \beta_j &= \beta_j - \rho (\nabla_\beta R)_j, \end{align} \]
where \(j \in [1 .. r]\) are the indices of the dendrites (also keep in mind that \(\nabla_{\beta_0} R = (\nabla_\beta R)_0\)), \(\rho\) is the learning rate (see the private variable rate), \(\nabla_\beta R = \mathrm{err} \cdot z\), and, finally,
\[\mathrm{err} = \left(\sum_{j=1}^s \gamma_j \cdot \mathrm{err}_j\right) \sigma'\bigl(z^{\top}\beta - \beta_0\bigr),\]
with \(\sigma'\) the derivative of the activation function. (Note: Here, \(\sigma\) need not be the sigmoid but could denote any activation function!)
Notice that \(z^{\top}\beta - \beta_0\) is the collected_input stored in the neuron, as discussed in Exercise 1.
The value of the error defined by this formula cannot be computed from the information inherent to the neuron: The values \(\gamma_j\) and \(\mathrm{err}_j\) come either from the next layer, i.e., the output neuron, if the current one is hidden, or from the feedback, if the current neuron is the output one (you can already have a quick look at the code provided for the function OneLayerPerceptron::compute_output_step in the file perceptron.cpp, which you will complete in Exercise 4). This value must be taken from the back_value of the axon, where it must be placed during the backward propagation phase of the neuron downstream (in the direction of the main computation of the perceptron, i.e., the one on the right in the figure). In turn, the current neuron must propagate the computed err to the back_value of each dendrite by spreading err among them with the weights \(\{\beta_j\}_{j \in [0 .. r]}\).
To summarize:
- The above discussion covers two parallel processes: The propagation of the error and the adjustment of the weights.
- In this exercise, all the neurons (hidden or output) have their axons connected to at most one neuron of the next layer, so \(s = 1\). (In Exercise 6, you will need to modify this to allow for \(s > 1\).)
- The first parenthesis in the formula defining \(\mathrm{err}\) above is the back_value found in the axon of the current neuron. It is defined in terms of the corresponding weights (\(\gamma_j\)) and errors (\(\mathrm{err}_j\)) at the neurons downstream. The value to propagate back to a neuron upstream (on the left in the figure) is to be computed based on the error at the current neuron (\(\mathrm{err}\)) and the weight given to the corresponding dendrite (\(\beta_j\)) before the update. (A sketch of the backward step is given right after this list.)
Again, you can test your code by running:
make grader
./grader 2
Upload your file neuron.cpp here:
2. Assembling the perceptron
Exercise 3.
In the file perceptron.cpp, complete the constructor and destructor of the OneLayerPerceptron class. In the constructor,
- create all the necessary nodes (for the inputs) and neurons (the hidden layer and the output);
- connections should be automatically set up by the corresponding constructors (have a look at the constructors of the classes Node and Neuron);
- for the Neuron constructors, you will have to provide activation functions and their corresponding derivatives: you can use the functions provided in the files activation.cpp and activation.hpp, or define the functions directly as lambda-expressions.
The fields dim and size of the OneLayerPerceptron class represent, respectively, the dimension of the input (already excluding the regression column) and the number of neurons in the hidden layer of the perceptron. A constructor sketch is given below.
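As a rough illustration of the intended structure, the constructor could follow the pattern below. The member names (inputs, hidden, output), the accessor get_axon(), the names sigmoid and sigmoid_der, and the constructor signatures are all assumptions; read the actual ones off node.hpp, neuron.hpp, and activation.hpp:

// Purely illustrative; all names and signatures below are assumptions.
OneLayerPerceptron::OneLayerPerceptron(int dim, int size, double rate)
    : dim(dim), size(size) {
    // One input node per input dimension (the regression column excluded).
    for (int i = 0; i < dim; i++)
        inputs.push_back(new Node());

    // Hidden layer: 'size' neurons over the input nodes, here with the
    // logistic sigmoid (e.g., taken from activation.[ch]pp).
    std::vector<Node *> hidden_axons;
    for (int j = 0; j < size; j++) {
        Neuron *n = new Neuron(inputs, sigmoid, sigmoid_der, rate);
        hidden.push_back(n);
        hidden_axons.push_back(n->get_axon());
    }

    // Output neuron: identity activation for regression, as lambda-expressions.
    output = new Neuron(hidden_axons,
                        [](double x) { return x; },   // f(x) = x
                        [](double)   { return 1.0; }, // f'(x) = 1
                        rate);
}

The destructor should then delete everything allocated with new in the constructor (the input nodes and all the neurons).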
You can test your code by running
make grader
./grader 3
Upload your file perceptron.cpp here:
Exercise 4.
A high-level specification of one execution cycle of the perceptron is provided by the function double OneLayerPerceptron::run(Dataset *data, int row, int regr, bool print). Using the functions implemented in Exercises 1 and 2, implement the four functions involved:
- prepare_inputs(Dataset *data, int row, int regr, bool print): initialize the input signals with the data from row row of the data set pointed to by data, skipping the regression column. Each row of data in the data set has dimension dim + 1, including the column used for regression (index regr). Do not forget to normalize the data, using the function normalize just above in perceptron.cpp (a sketch of this function is given after this list).
- compute_hidden_step(bool print): compute one step of the hidden layer.
- compute_output_step(Dataset *data, int row, int regr, bool print): compute one step of the output neuron. This should comprise both forward and backward propagation.
- propagate_back_hidden(bool print): propagate the error back through the hidden layer by calling the corresponding functions of each neuron.
You can ignore the bool print parameter, which is only used for debugging.
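For instance, prepare_inputs() could look roughly like the sketch below; the Dataset accessor and the exact signature of normalize are assumptions to be checked against the provided code:

// Illustrative sketch: data->get_instance(row)[j], the signature of
// normalize, and the member 'inputs' are assumptions.
void OneLayerPerceptron::prepare_inputs(Dataset *data, int row, int regr, bool print) {
    int col = 0;
    for (int j = 0; j < dim + 1; j++) {
        if (j == regr)
            continue;  // skip the regression column
        double x = normalize(data->get_instance(row)[j], data, j);
        inputs[col]->set_signal(x);  // 'inputs' as in the constructor sketch
        col++;
    }
}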
Again, you can test your code by running
make grader
./grader 4
Upload your file perceptron.cpp here:
3. Training and regression
You can train your perceptron and do some regression analysis using the main program in the file test_perceptron.cpp. Compile and run it as follows:
make test_perceptron
./test_perceptron help
This will print the usage of the program.
Running the program with the following parameters
./test_perceptron csv/train_boston_housing.csv csv/regr_boston_housing.csv 13 5 100
should produce an output resembling this:
Read training data from csv/train_boston_housing.csv
405 rows of dimension 14
Read testing data from csv/regr_boston_housing.csv
405 rows of dimension 14
If you see an assert failure now that probably means that
some of the neurons are not properly initialized or not properly connected.
Initialized a 1-layer perceptron
Size of the hidden layer: 5
Learning rate: 0.1
Decay: 0.001
Training the perceptron over 100 epochs for regression over column 13
Mean RSS: 3768.69
Total time elapsed = 0.449439s
Mean time per epoch/round = 0.00449439s
Switching off learning... done. Learning rate = 0
Testing the perceptron on the training data (100 times)
Mean RSS: 3454.66
Total time elapsed = 0.453808s
Mean time per epoch/round = 0.00453808s
Testing the perceptron on the testing data (100 times)
Mean RSS: 2895.68
Total time elapsed = 0.109262s
Mean time per epoch/round = 0.00109262s
Deleting the perceptron... done.
In this test, the perceptron parameters (i.e., the weights associated with the dendrites and the bias of each neuron) are initialized randomly exactly once (can you explain precisely at which moment?). Hence, subsequent executions with the same input data and parameters produce different results. However, the program does not exploit this to optimize the perceptron.
Exercise 5. (optional) (10 % extra points)
Modify the function run() in test_perceptron.cpp so as to repeat perceptron training several times with different random initializations of the weights and keep the best-performing one.
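One possible structure for the inside of run() is sketched below; train(), test(), and the constructor arguments are placeholders for whatever test_perceptron.cpp actually provides:

#include <limits>
#include <utility>

// Sketch of a multi-restart loop; all names below are placeholders.
OneLayerPerceptron *best = nullptr;
double best_rss = std::numeric_limits<double>::infinity();
const int restarts = 10;

for (int r = 0; r < restarts; r++) {
    // Each fresh perceptron gets a fresh random initialization of the weights.
    OneLayerPerceptron *p = new OneLayerPerceptron(dim, size, rate);
    train(p, train_data, regr, epochs);
    double rss = test(p, train_data, regr);
    if (rss < best_rss) {
        best_rss = rss;
        std::swap(best, p);  // keep the better perceptron in 'best'
    }
    delete p;  // deletes the worse of the two (or nullptr on the first round)
}
// 'best' now points to the best-performing trained perceptron.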
Upload your file test_perceptron.cpp here:
Multi-layer perceptron
Exercise 6. (optional) (10 % extra points)
Modify the code in the files neuron.[ch]pp and perceptron.[ch]pp as necessary so as to be able to construct multi-layer perceptrons.
Hint: The main difficulty is to ensure correct backward propagation. Indeed, in a multi-layer perceptron, the outputs of some neurons will be connected to multiple neurons (all the neurons in the next layer). Therefore, during the backward propagation phase, the error information from all these (next layer) neurons must be compounded.
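Concretely, the key change in step_back() is to accumulate rather than overwrite back_value, e.g., under the same naming assumptions as in Exercise 2 and with a hypothetical add_back_value accessor on Node:

// Inside Neuron::step_back(), adapted for the multi-layer case:
double err = axon->get_back_value() * activation_der(collected_input);
for (std::size_t i = 0; i < dendrites.size(); i++) {
    // Accumulate: each downstream neuron deposits its share beta_i * err
    // into the same upstream node.
    dendrites[i]->add_back_value(weights[i] * err);  // hypothetical accessor
    weights[i] -= rate * err * dendrites[i]->get_signal();
}
// The accumulated back_values must also be reset to 0 between passes,
// e.g., right after they have been consumed:
axon->set_back_value(0.0);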
Upload a ZIP archive named mlp.zip containing the files you have modified:
To create the archive, you can use the following command:
zip mlp.zip neuron.[ch]pp perceptron.[ch]pp