CSC_43042_EP - Density Estimation
- Download and unzip the archive CSC43042EP-td4-1-handin.zip. It contains in particular:
  - the file grader.py, which is used for the various test scripts,
  - the scripts galaxy_plot.py, which uses your estimators to plot a nice galaxy from star data, and estimate_TV.py, which you will use for the last section,
  - different datasets.
- Log in to be able to upload your files.
Context
This TD corresponds to lecture 4.
We will implement three different kernel density estimators and integrate them into a common class structure based on inheritance.
Then, we will introduce the technique of mean shifts, which can help us improve the quality of the estimations.
We will experiment to see the crucial effect of bandwidth selection.
Structure
The TD is structured as follows:
- Section 1 is the main section of the TD: we will implement the flat, Gaussian, and \(k\)-NN kernels;
- in Section 2, we implement the mean shift algorithm;
- Section 3 puts our estimators to the test.
First, if not already done, install the necessary libraries:
$ pip3 install --user --only-binary=:all: numpy pandas scikit-learn matplotlib
As usual, the goal is to implement basic versions of the algorithms by ourselves, so we will not rely on existing libraries such as scikit-learn in our implementations. There will be one exception to this: we will rely on scikit-learn to compute \(k\)-nearest neighbors, because we have already studied this question in a previous TD.
1. Kernel Density Estimators
We will represent our samples \((p_i)_1^n\) by a list data of numpy arrays (each element of the list is a numpy array representing one given point).
Kernel estimators are non-parametric density estimators. Recall that, given a “training” dataset of \(n\) samples \((p_i)_1^n\) drawn from the unknown distribution \(f\), the estimated density has the form:
\[\hat{f}_n(x) = \dfrac{c_{k, d}}{n \sigma^d} \sum\limits_{i=1}^n k \left( \dfrac{||x - p_i||_2^2}{\sigma^2} \right).\]
We will define a base class Kernel with a method density(x: np.ndarray) -> float which should return the estimated density \(\hat{f}_n(x)\) at point x.
We will implement three different kernels:
- flat
- Gaussian
- \(k\)-nearest neighbors
The first two (flat and Gaussian) are instances of radial kernels, i.e., they only depend on the distance to a single data point at a time. The third (\(k\)-nearest neighbors) is not radial as it depends on several (i.e., \(k\)) nearest data points.
Hence, we will group the flat and Gaussian kernels as derived classes of an intermediate class Radial. The class diagram is as follows:

                 Flat
                /
         Radial
        /       \
Kernel           Gaussian
        \
         Knn
To keep things simple, we will use a single Python file for all these classes. Open the file kernels.py in the directory TD and copy the following code into it:
import numpy as np
from typing import Self
from abc import abstractmethod

class Kernel:
    """A class for kernel density estimation, which also stores the cloud of points

    Attributes:
        d: int                 -- dimension of the ambient space
        data: list[np.ndarray] -- list of coordinates of the points (each of dimension self.d)
    """
    def __init__(self: Self, d: int, data: list[np.ndarray]):
        self.data = data
        self.d = d

    @abstractmethod
    def density(self: Self, x: np.ndarray) -> float:
        pass
The class Kernel is abstract: we will not implement its density method. The @abstractmethod decorator means that each class derived from Kernel will be forced to give a concrete implementation of density (unless it is also abstract).
1.1 The Radial class
We will now define the Radial class, inherited from Kernel. Objects of the Radial class have an additional attribute self.bandwidth: float, and they need an initializer taking three arguments: d, data and bandwidth. Implement a first version of the Radial class containing only this initializer (__init__). We recommend that you call Kernel's initializer inside Radial's initializer, to limit code duplication.
Then, in the Radial class, add two abstract methods:
- a method volume(self: Self) -> float that returns the volume of the unit ball associated with the kernel. (This volume is equal to \(\dfrac{1}{c_{k, d}}\).)
  - For the flat kernel, this will be \(\dfrac{\pi^{d/2}}{\Gamma(d/2 + 1)}\).
  - For the Gaussian kernel, this will be \((2\pi)^{d/2}\).
- a method profile(self: Self, t: float) -> float that returns the kernel profile function \(k\) evaluated at the given real number \(t\).
  - For the flat kernel, this will be \(1\) if \(t \leq 1\) and \(0\) otherwise.
  - For the Gaussian kernel, this will be \(e^{-t/2}\).
As in Kernel, the use of abstract member methods makes Radial an abstract class. In order to be able to instantiate an actual object, we will need derived classes which "implement" (override) these methods: in this case, Flat and Gaussian. We will implement these classes, and in particular their methods volume and profile, in Sections 1.1.1 and 1.1.2 respectively.
Assuming that volume and profile are already implemented, implement density in Radial. Recall that the density for radial kernels is defined as \[\hat{f}_n(x) = \dfrac{c_{k,d}}{n\sigma^d} \sum\limits_{i=1}^n k\left(\dfrac{\lVert x - p_i\rVert_2^2}{\sigma^2} \right)\] where the \(p_i\) are the data points, \(k\) is the kernel profile function, \(c_{k,d}\) is the inverse of the volume, and \(\sigma\) is the chosen bandwidth.
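As an illustration, one possible shape for the Radial class is sketched below (a sketch, not necessarily the expected implementation; it assumes the header of kernels.py shown above, and that volume and profile will be provided by the derived classes):

class Radial(Kernel):
    """A radial kernel with a bandwidth; volume and profile are left abstract."""
    def __init__(self: Self, d: int, data: list[np.ndarray], bandwidth: float):
        super().__init__(d, data)          # reuse Kernel's initializer
        self.bandwidth = bandwidth

    @abstractmethod
    def volume(self: Self) -> float:
        pass

    @abstractmethod
    def profile(self: Self, t: float) -> float:
        pass

    def density(self: Self, x: np.ndarray) -> float:
        n = len(self.data)
        c = 1.0 / self.volume()            # c_{k,d} is the inverse of the unit-ball volume
        s = sum(self.profile(np.linalg.norm(x - p) ** 2 / self.bandwidth ** 2)
                for p in self.data)
        return c / (n * self.bandwidth ** self.d) * s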
Test your implementation by running python grader.py 1 (or an equivalent command depending on your operating system and Python installation).
Upload your file kernels.py here:
1.1.1 Flat Kernel
Implement the class Flat, derived from Radial, by implementing the member methods volume and profile. Recall that volume should return \(\dfrac{\pi^{d/2}}{\Gamma(d/2 + 1)}\), and profile should return \(1\) if \(t \leq 1\) and \(0\) otherwise. To compute the \(\Gamma\) function, you can use a few tricks, or you can just import math and use math.gamma.
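A possible sketch, mirroring the two formulas above (it assumes the Radial class sketched earlier and uses math.gamma as suggested):

import math

class Flat(Radial):
    """Flat (uniform) kernel."""
    def volume(self: Self) -> float:
        # volume of the unit ball in dimension d
        return math.pi ** (self.d / 2) / math.gamma(self.d / 2 + 1)

    def profile(self: Self, t: float) -> float:
        return 1.0 if t <= 1 else 0.0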
Test your implementation by running python grader.py 2.
Upload your file kernels.py here:
1.1.2 Gaussian Kernel
Implement the class Gaussian, derived from Radial, by implementing the member methods volume and profile. Recall that volume should return \((2\pi)^{d/2}\), and profile should return \(e^{-t/2}\) (you can use from math import exp).
Implement a member function guess_bandwidth(self: Self) -> None which sets the bandwidth according to the so-called Silverman rule of thumb, \[\mathrm{guessed~bandwidth} = \left(\frac{n (d+2)}{4}\right)^{-1/(d+4)} \cdot \hat{\sigma},\] where \(\hat{\sigma}\) is the sample standard deviation of the dataset, namely \(\hat\sigma = \sqrt{\dfrac{1}{n-1}\sum\limits_{i=1}^{n}\|p_i - \mathrm{mean}\|^2 }\).
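A possible sketch of Gaussian, including the rule of thumb (the standard deviation computation below is one reading of the formula above, using the Euclidean norm of the centered points; it assumes the Radial class sketched earlier):

from math import exp, pi

class Gaussian(Radial):
    """Gaussian kernel."""
    def volume(self: Self) -> float:
        return (2 * pi) ** (self.d / 2)

    def profile(self: Self, t: float) -> float:
        return exp(-t / 2)

    def guess_bandwidth(self: Self) -> None:
        # Silverman's rule of thumb
        n = len(self.data)
        pts = np.array(self.data)
        mean = pts.mean(axis=0)
        sigma_hat = np.sqrt((np.linalg.norm(pts - mean, axis=1) ** 2).sum() / (n - 1))
        self.bandwidth = (n * (self.d + 2) / 4) ** (-1 / (self.d + 4)) * sigma_hat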
Test your implementation by running python grader.py 3.
Upload your file kernels.py here:
1.2 \(k\)-Nearest Neighbor Kernel
The return value of the density function of the Knn kernel should be \[\hat{f}_n(x) = \frac{k}{2nVd_k(x)}\] where \(d_k(x)\) is the distance of the given point \(x\) to its \(k\)-th nearest neighbor, and \(k\) and \(V\) are two parameters set in Knn's initializer.
Because \(k\)-NN is not our main focus today, we will rely on scikit-learn. We have prepared a class Knn for this purpose:
from sklearn.neighbors import NearestNeighbors

class Knn(Kernel):
    """A class for kernel density estimation with k-Nearest Neighbors
    derived from Kernel

    Attributes not already in Kernel:
        k: int   -- parameter for k-NN
        V: float -- "volume" constant appearing in density
        neigh: sklearn.neighbors.NearestNeighbors -- data structure and methods for efficient k-NN computations
    """
    def __init__(self: Self, d: int, data: list[np.ndarray], k: int, V: float):
        super().__init__(d, data)
        self.k, self.V = k, V
        self.neigh = NearestNeighbors(n_neighbors=self.k)
        #....

    def fit_knn(self):
        """Computes the inner data structure according to the data points."""
        self.neigh.fit(np.array(self.data))

    def knn(self, x: np.ndarray, vk: int):
        """The vk nearest-neighbors (vk can be different from self.k)."""
        return [np.array(self.data[i]) for i in self.neigh.kneighbors([x], n_neighbors=vk)[1][0]]

    def k_dist_knn(self, x: np.ndarray, vk: int) -> float:
        """The distance to the vk-th nearest-neighbor."""
        return self.neigh.kneighbors([x], n_neighbors=vk)[0][0][vk - 1]
Complete the class Knn by filling in the missing line(s) in its initializer, and by implementing the member function density(self: Self, x: np.ndarray) -> float for the \(k\)-nearest neighbor kernel.
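A possible sketch of density, to be added inside the Knn class, directly mirroring the formula above (it assumes the nearest-neighbor structure has already been fitted, e.g. by calling fit_knn in the initializer):

    def density(self: Self, x: np.ndarray) -> float:
        n = len(self.data)
        d_k = self.k_dist_knn(x, self.k)   # distance to the k-th nearest neighbor
        return self.k / (2 * n * self.V * d_k)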
Test your implementation by running python grader.py 4.
Upload your file kernels.py here:
Note: we call the parameter \(V\) the total volume, which we set manually for simplicity. For a given \(V\) (for example \(V=1\)), the kernel's density is not strictly speaking a (probability) density because its integral is not necessarily equal to \(1\). Depending on the application, it may or may not be necessary to rescale \(V\) in order to work with a valid probability density.
1.3 Visualization
We recommend you try to visualize the outputs of your estimators on real data. The following code loads the double_spiral dataset from the archive you unpacked, and plots the estimated density. You can modify this code to test other datasets, kernels, and parameters.
mydata = np.loadtxt("csv/double_spiral.csv", delimiter=" ", dtype=float)
vk = 80
myknn = Knn(2, mydata, vk, 1.0)
grid = 100
print_2D_density(myknn.density, -np.ceil(-mydata.min(axis=0)).astype(int), np.ceil(mydata.max(axis=0)).astype(int), grid, f"Knn k={vk} grid={grid}")
The code uses the plotting function print_2D_density below, which relies on matplotlib. The parameter grid: int indicates the number of rows/columns of the underlying plotting grid.
import matplotlib.pyplot as plt

def print_2D_density(density, min_xy, max_xy, grid: int, title: str):
    x_range, y_range = np.linspace(min_xy[0], max_xy[0], grid), np.linspace(min_xy[1], max_xy[1], grid)
    X, Y = np.meshgrid(x_range, y_range)
    Z = np.array([[density(np.array([x, y])) for x in x_range] for y in y_range])
    plt.figure(figsize=(10, 5))
    plt.contourf(X, Y, Z, cmap="viridis")
    plt.title(title)
    plt.show()  ### Use plt.savefig(f'{title}.png') to save to a file instead of displaying
You can run the script galaxy_plot.py and you should obtain a density plot like this one (feel free to try different datasets and parameters!):

[figure: density plot produced by galaxy_plot.py]
2. Mean Shift
It is often necessary or desirable to remove noise from a data set, and multiple techniques exist for this. We will implement one of them: a mean shift algorithm based on the \(k\)-nearest neighbor kernel. Beyond noise removal, mean shift is strongly related to density estimation, since the displacement of points in the mean shift algorithm is a way to estimate the gradient of the density function. In particular, mean shift can be useful to detect the modes of a distribution.
One iteration of the algorithm updates the position of every data point to the mean of its \(k\) nearest neighbors.
Add a method meanshift(self: Self, k: int) -> None to the class Knn which implements one iteration of this algorithm and updates the coordinates of the points.
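A possible sketch, under one interpretation of the update rule: all new positions are computed from the old ones before the data is replaced, each point counts among its own nearest neighbors (as returned by knn), and the neighbor structure is refitted afterwards.

    def meanshift(self: Self, k: int) -> None:
        # one mean-shift iteration: move every point to the mean of its k nearest neighbors
        new_data = [np.mean(self.knn(p, k), axis=0) for p in self.data]
        self.data = new_data
        self.fit_knn()   # refit the neighbor structure on the shifted points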
Test your implementation by running python grader.py 5.
For reference, the first 10 data points of normal.csv after running one iteration with \(k = 3\) should be equal (or close) to:
-0.742035 1.09605
0.025417 0.1229
0.845654 -0.97052
0.04139 0.921662
0.00173367 -0.848124
0.095338 1.4491
2.13119 -0.484897
-0.302607 0.184375
1.26503 -0.312559
-0.467785 0.744896
Upload your file kernels.py here:
You can use the estimators you implemented on the datasets in conjunction with the mean shift algorithm with different parameters and compare the results.
The mean shift algorithm can be particularly useful for higher-dimensional datasets. To experiment with this, you can use the 3D dataset galaxies_3D.csv. For visualization of 3D data, use the program MeshLab. Note that on some versions of MeshLab you should rename the file from .csv to .xyz so that MeshLab opens it. After opening the dataset galaxies_3D.xyz you will see something like this:
For shorter computation time, we recommend that you use the dataset galaxies_3D-short.xyz instead, which contains far fewer points. Try using the mean shift algorithm on this dataset and compare the resulting set of data points with the original. Apply four iterations of the algorithm with the parameter \(k=10\) and save the results to a new .xyz file. Then open both files simultaneously in MeshLab and display the points with two different colors. You should get noticeable results. How do the points distribute in space after performing this operation?
- the points are well distributed in 3D space
- the points seem to be distributed along some 2-dimensional structure resembling a "wave"
- the points seem to be distributed along some 1-dimensional structures resembling "filaments"
- the points seem to concentrate around a small number of centers
→ Please answer this question on Moodle (in the quiz named "TD4").
Note: in order to load/save the data to/from .xyz files, you can use code such as this:

mydata = np.loadtxt("csv/galaxies_3D-short.xyz", delimiter=" ", dtype=float)
myknn = Knn(3, mydata, 10, 1.0)
....
np.savetxt("csv/galaxies_3D-short-afterMeanshift.xyz", np.array(myknn.data), delimiter=" ")
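For instance, the elided part could run the four iterations requested above (a sketch, assuming the meanshift method from this section):

for _ in range(4):
    myknn.meanshift(10)   # one mean-shift iteration with k = 10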
3. Putting it to a test: total variation distance
The total variation distance \(d_{TV}(p,q)\) between two probability densities \(p\) and \(q\) is defined as half of their \(L^1\) distance, namely \[ d_{TV}(p,q) = \frac{1}{2} \int |p(x)-q(x)| \, dx\] where the integral is taken over the whole (\(d\)-dimensional) space. It is a number between zero and one, which is zero if and only if \(p\equiv q\) as probability measures.
The attached file estimate_TV.py generates sample points from a certain 2-dimensional density (known to the program; in fact it is a multinormal distribution), then computes the total variation distance between this known density and the estimated density. Here is the output of the program if you run it with the parameter job=1 (you can change this parameter somewhere inside the code, look for it).

[figure: output of estimate_TV.py with job=1]
For the central plot, we used the built-in density estimator from the sklearn library, as you can see from the code lines:

kde = KernelDensity(bandwidth='silverman', kernel='gaussian')
kde.fit(data)
estimated_density_sk = lambda x, y: np.exp(kde.score_samples(np.array([[x, y]])))[0]
For the rightmost plot, we use our naive implementation of the Gaussian kernel, which gives a much worse total variation distance (around 0.4). The main reason is that our “rule of thumb” for bandwidth estimation is not performing well on this distribution.
Add the method guess_bandwidth_challenge() to the class Gaussian in such a way that it performs better on the multi-normal examples provided by estimate_TV.py than the previous guess_bandwidth(). You are free to use your own heuristics. You can then run estimate_TV.py with each value of job=1,2,3,4 to see how your heuristic performs. You should not modify estimate_TV.py except for the job number and for the lines:

#### Comment the following line and uncomment the next one:
kg.guess_bandwidth()
#kg.guess_bandwidth_challenge()
You will be graded according to the ratio between the TV distance you reach and the one reached by scikit-learn, with the following scale (this scale gives you an idea of your grade for this question only):
- 4 < ratio : D
- 1.8 < ratio <= 4 : C
- 1.2 < ratio <= 1.8 : B
- 0.9 < ratio <= 1.2 : A
- ratio < 0.9 : A+
To get a grade you need to obtain it on at least three of the four jobs, except for A and A+, for which you need to get it on all of them. Test your implementation by running python grader.py 6 to test the four jobs at once. Note: the grader on the server has a time limit of 45 seconds, so the test has to pass within that time.
Hint: try different bandwidths, estimate the TV for each of them, and try to find a better rule for bandwidth estimation. Once this is done, implement this heuristic in guess_bandwidth_challenge(). To get A+, you might need to do more, and modify the Gaussian class to handle (for example) some preprocessing.
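One possible heuristic (an assumption, not necessarily the intended one) is a grid search around the Silverman value, scoring each candidate bandwidth by a leave-one-out log-likelihood. The brute-force sketch below, written against the Gaussian class sketched earlier, is O(n²) per candidate and may need subsampling to fit the grader's time limit; it assumes data behaves like a list of points as in the class docstring.

    def guess_bandwidth_challenge(self: Self) -> None:
        # Hypothetical heuristic: grid search around Silverman's rule,
        # scored by leave-one-out log-likelihood.
        self.guess_bandwidth()            # start from the rule of thumb
        base = self.bandwidth
        best_bw, best_score = base, -np.inf
        for factor in (0.25, 0.5, 0.75, 1.0, 1.5, 2.0):
            bw = base * factor
            score = 0.0
            for i, p in enumerate(self.data):
                others = list(self.data[:i]) + list(self.data[i + 1:])
                loo = Gaussian(self.d, others, bw)   # density estimate without p
                score += np.log(max(loo.density(p), 1e-300))
            if score > best_score:
                best_bw, best_score = bw, score
        self.bandwidth = best_bw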
Upload your file kernels.py here: