This assignment closed on June 21, 2022.

CSE 102 – Project – Lossy image compression

This tutorial is part of the project.

Collaborating in graded assignments without prior permission will be considered to be plagiarism, as the purpose is not just to learn but to be assessed. In this vein, do not post solutions to any of the graded assignments to a public repository.

You are not allowed to pair for this project.

You are not allowed to use numpy for this project.

We expect you to write your solution to this problem in a file named myjpeg.py and to upload it using the form below. You can resubmit your solution as many times as you like.


The goal of this project is to develop a lossy image compression format close to JPEG. Lossy compression algorithms greatly reduce the size of image files at the price of losing some data from the original image. In general, lossy image compression works by pruning the information the human eye is not sensitive to. This is the case, for instance, in the JPEG file format.

The JPEG compression algorithm can be decomposed into several steps: conversion to the YCbCr color space and channel separation, chrominance subsampling, splitting into 8x8 blocks, the 2D DCT transform, quantization, and finally a zig-zag walk followed by run-length and entropy encoding.

The project

The goal of this project is to write a JPEG-like image encoder/decoder. This is a 3-week project.

Representing images

In this project, we will only consider color images with an 8-bit precision. A \(w \times h\) image (\(w\) is the width, \(h\) is the height) is represented by a row-major \(w \times h\) matrix \(M\) of square pixels, where the value \(M_{i,j}\) contains the color information of the pixel at line \(i\) and column \(j\) (the first line being the top-most line and the first column the left-most column). The color information is expressed in the RGB (Red-Green-Blue) color-space, i.e. \(M_{i, j}\) is a 3-element tuple \((r, g, b)\) where \(r\) (resp. \(g\) & \(b\)) contains the information for the red component (resp. green component, blue component) of the pixel’s color. Each component is coded using an 8-bit unsigned integer, i.e. each component ranges in the interval \([0 \ldots 255]\).

The PPM input format

Our file compressor will take as input an image in the ASCII-variant of the PPM (portable pixmap file) format. This file format is used for color images where each pixel is coded by three values in the RGB (red, green and blue) color-space.

Here is an example of an image encoded using the PPM ASCII-format:

P3           # "P3" means this is a RGB color image in ASCII
3 2          # "3 2" is the width and height of the image in pixels
255          # "255" is the maximum value for each color
# The part above is the header
# The part below is the image data: RGB triplets
255   0   0  # red
  0 255   0  # green
  0   0 255  # blue
255 255   0  # yellow
255 255 255  # white
  0   0   0  # black

The ASCII-PPM image format is encoded in human-readable text. A PPM file consists of a sequence of tokens (a token is here defined as a non-empty sequence of non-whitespace characters) separated by white spaces and/or line breaks. In addition, a comment can be placed anywhere with a “#” character; the comment extends to the end of the line.

In the ASCII-PPM format, the first token is a magic PPM identifier that should always be equal to the string P3. The next two tokens consist of the width and height of the image as ASCII numbers. Finally, the fourth token gives the maximum value (depth) of the color components of the pixels. In our case, we will only accept 8-bit images, i.e. PPM files with a maximum value of 255 for the color components.

The image data follows. The pixels’ values are given line by line, starting from the top, each line being coded from left to right. For each pixel, the Red-Green-Blue components are given, in this order, as ASCII numbers between 0 and 255.
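As an illustration of the tokenization just described (comments stripped, the remaining text split on whitespace), a possible sketch is:

```python
def ppm_tokenize(stream):
    """Yield the whitespace-separated tokens of `stream`, ignoring comments."""
    for line in stream:
        line = line.split('#', 1)[0]  # drop everything from '#' to end of line
        yield from line.split()
```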

  1. Write a function ppm_tokenize(stream) that takes an input stream and that returns an iterator for all the tokens of stream, ignoring the comments. For example, if file.ppm is a file whose contents is the PPM example given above, then the following program:

    with open('file.ppm') as stream:
    	for token in ppm_tokenize(stream):
    		print(token)

    should print:

    P3
    3
    2
    255
    255
    0
    0
    0
    255
    0
    0
    0
    255
    255
    255
    0
    255
    255
    255
    0
    0
    0
  2. Write a function ppm_load(stream) that takes an input stream and that loads the PPM image, returning the resulting image as a \(3\)-element tuple (w, h, img) where w is the image’s width, h is the image’s height and img is a 2D-array that contains the image’s pixels’ color information, as described in the previous section. (You can use the function from the previous question)

    For example, executing the following code

    with open('file.ppm') as stream:
        w, h, img = ppm_load(stream)
    print(w)
    print(h)
    print(img)

    should give

    3
    2
    [[(255, 0, 0), (0, 255, 0), (0, 0, 255)], [(255, 255, 0), (255, 255, 255), (0, 0, 0)]]
  3. Write a function ppm_save(w, h, img, output) that takes an output stream output and that saves the PPM image img, whose size is w x h, to that stream.
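A minimal sketch of the saving direction, assuming img is stored as h rows of w (r, g, b) tuples, as produced by ppm_load:

```python
def ppm_save(w, h, img, output):
    """Write `img` (h rows of w (r, g, b) tuples) to `output` in ASCII PPM format."""
    output.write(f"P3\n{w} {h}\n255\n")
    for row in img:
        for (r, g, b) in row:
            output.write(f"{r} {g} {b}\n")
```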

RGB to YCbCr conversion & channel separation

As said in the introduction, our compressor will work in a color-space where the information of luminance and colors are separated. More precisely, we will use the YCbCr color-space, where the light-intensity or luma information (Y) is separated from the color information (Cb for blue-difference & Cr for red-difference).

Theoretically, since all the colors play a role in the light intensity, we can define the luminance Y as the sum of the color components. The Cb and Cr color information is then defined relative to Y: Cb (resp. Cr) is defined as the difference between the luminance Y and the blue component value (resp. the red component value):

\[\left\{ \begin{align} Y &= R + G + B \\ C_b &= Y - B \\ C_r &= Y - R \end{align} \right.\]

However, in practice, each color component has a different impact on the perceived brightness changes in an image, and Y is then defined as a weighted sum of the color components. Moreover, as with the RGB values, we want an 8-bit precision color-space, i.e. we want the values for Y, Cb & Cr to range over the interval \([0 \ldots 255]\).

In this project, we will follow the JPEG standard that uses a modified version of the ITU-R BT.601 standard, as defined below:

\[ \left\{ \begin{array}{llllllll} Y &= 0 & +\ 0.299 & R & +\ 0.587 & G & +\ 0.114 & B \\ C_b &= 128 & -\ 0.168736 & R & -\ 0.331264 & G & +\ 0.5 & B \\ C_r &= 128 & +\ 0.5 & R & -\ 0.418688 & G & -\ 0.081312 & B \end{array} \right.\]

and whose inverse transform is:

\[ \left\{ \begin{array}{llllll} R &= Y && & +\ 1.402 & (Cr - 128)\\ G &= Y & -\ 0.344136 & (Cb - 128) & -\ 0.714136 & (Cr - 128)\\ B &= Y & +\ 1.772 & (Cb - 128) \end{array} \right.\]

  1. Write a function RGB2YCbCr(r, g, b) that takes a pixel’s color in the RGB color space and that converts it to the YCbCr color space, returning the 3-element tuple (Y, Cb, Cr).

  2. Write a function YCbCr2RGB(Y, Cb, Cr) that takes a pixel’s color in the YCbCr color space and that converts it to the RGB color space, returning the 3-element tuple (R, G, B).

Note that, due to rounding errors, you might get some values outside the range \([0 \ldots 255]\). In that case, you have to clamp the values to the range \([0 \ldots 255]\), i.e. any value below \(0\) (resp. above \(255\)) should be normalized to \(0\) (resp. to \(255\)).
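Putting the two sets of formulas and the clamping rule together, the conversions can be sketched as follows (the helper clamp is our own addition; note that Python's round rounds halves to the nearest even integer):

```python
def clamp(x):
    # Round, then clamp to the 8-bit range [0, 255].
    return max(0, min(255, round(x)))

def RGB2YCbCr(r, g, b):
    Y  = clamp(0.299 * r + 0.587 * g + 0.114 * b)
    Cb = clamp(128 - 0.168736 * r - 0.331264 * g + 0.5 * b)
    Cr = clamp(128 + 0.5 * r - 0.418688 * g - 0.081312 * b)
    return (Y, Cb, Cr)

def YCbCr2RGB(Y, Cb, Cr):
    R = clamp(Y + 1.402 * (Cr - 128))
    G = clamp(Y - 0.344136 * (Cb - 128) - 0.714136 * (Cr - 128))
    B = clamp(Y + 1.772 * (Cb - 128))
    return (R, G, B)
```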

It remains to lift these conversion functions to images. Given that all the subsequent phases will work on the Y / Cb / Cr channels separately, our RGB to YCbCr conversion function will return 3 matrices (or channels) that resp. contain the values for the Y, Cb and Cr components.

  1. Write a function img_RGB2YCbCr(img) that takes an image in the RGB color space and that returns a 3-element tuple (Y, Cb, Cr) where Y (resp. Cb, Cr) is a matrix s.t. Y[i][j] (resp. Cb[i][j], Cr[i][j]) denotes the Y (resp. Cb, Cr) component of img[i][j].

  2. Write a function img_YCbCr2RGB(channels) that performs the inverse transformation of img_RGB2YCbCr(img). In that case, channels is a 3-element tuple (Y, Cb, Cr) where Y (resp. Cb, Cr) is a matrix s.t. Y[i][j] (resp. Cb[i][j], Cr[i][j]) denotes the Y (resp. Cb, Cr) component at pixel (i, j). (Formally, the function img_YCbCr2RGB is only an “approximate” inverse to img_RGB2YCbCr, in the sense that there may not be an exact equality img_YCbCr2RGB(img_RGB2YCbCr(img)) == img due to the presence of rounding errors.)

Subsampling

As said in the introduction, our eyes are much more sensitive to black and white (luminance) than to color (chrominance). To exploit this low sensitivity of the human eye to chrominance, we can subsample the chrominance channels, encoding the information of multiple pixels with a single sample.

In this project, we will have a family of subsampling modes a:b, parameterized by two positive integers \(a\) & \(b\). In the a:b mode, subsampling splits the image into blocks of size a x b (i.e. a rows and b columns), each of which is encoded by a single sample whose value is the average (rounded to the nearest integer) of the block’s samples. (Note that some blocks might be smaller than \(a \times b\), as nothing enforces that a (resp. b) divides the image’s height (resp. the image’s width).)

  1. Write a function subsampling(w, h, C, a, b) that performs & returns the subsampling of the channel C (of width w and height h) in the a:b subsampling mode. (In practice, we will only use subsampling on the Cb or Cr channels, as returned in the previous question)

  2. Write a function extrapolate(w, h, C, a, b) that does the inverse operation, where w & h (resp. width & height) denote the size of the channel before the subsampling was applied. Note that subsampling is a lossy operation: it is not possible to recover the original information. However, if a sub-block B of the original image has been encoded by the sample/value i, we will restore B by creating a block of size a x b all of whose elements are equal to i.
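The averaging step can be sketched as follows (a sketch under the conventions above; note that Python's round rounds halves to the nearest even integer):

```python
def subsampling(w, h, C, a, b):
    """Average each a x b block of channel C (h rows, w columns)."""
    out = []
    for i in range(0, h, a):
        row = []
        for j in range(0, w, b):
            # Blocks at the right/bottom edges may be smaller than a x b.
            block = [C[y][x]
                     for y in range(i, min(i + a, h))
                     for x in range(j, min(j + b, w))]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```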

Block splitting

After subsampling, each channel is split into 8x8 blocks. If the channel data does not represent an integer number of blocks then we must fill the remaining area of the incomplete blocks with some dummy pixels. In this project, we are going to extend the incomplete blocks by repeating the edge pixels.

  1. Write a function block_splitting(w, h, C) that takes a channel C and that yields all the 8x8 subblocks of the channel, line by line, from left to right. If the channel data does not represent an integer number of blocks, then you must implement the aforementioned padding technique.

    For example, if C is equal to the following Python matrix:

    [
        [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
        [ 2,  3,  4,  5,  6,  7,  8,  9, 10,  1],
        [ 3,  4,  5,  6,  7,  8,  9, 10,  1,  2],
        [ 4,  5,  6,  7,  8,  9, 10,  1,  2,  3],
        [ 5,  6,  7,  8,  9, 10,  1,  2,  3,  4],
        [ 6,  7,  8,  9, 10,  1,  2,  3,  4,  5],
        [ 7,  8,  9, 10,  1,  2,  3,  4,  5,  6],
        [ 8,  9, 10,  1,  2,  3,  4,  5,  6,  7],
        [ 9, 10,  1,  2,  3,  4,  5,  6,  7,  8],
    ]

    then block_splitting(10, 9, C) must yield the following 4 matrices:

    [
      [ 1,  2,  3,  4,  5,  6,  7,  8],
      [ 2,  3,  4,  5,  6,  7,  8,  9],
      [ 3,  4,  5,  6,  7,  8,  9, 10],
      [ 4,  5,  6,  7,  8,  9, 10,  1],
      [ 5,  6,  7,  8,  9, 10,  1,  2],
      [ 6,  7,  8,  9, 10,  1,  2,  3],
      [ 7,  8,  9, 10,  1,  2,  3,  4],
      [ 8,  9, 10,  1,  2,  3,  4,  5],
    ]
    
    [
      [ 9, 10, 10, 10, 10, 10, 10, 10],
      [10,  1,  1,  1,  1,  1,  1,  1],
      [ 1,  2,  2,  2,  2,  2,  2,  2],
      [ 2,  3,  3,  3,  3,  3,  3,  3],
      [ 3,  4,  4,  4,  4,  4,  4,  4],
      [ 4,  5,  5,  5,  5,  5,  5,  5],
      [ 5,  6,  6,  6,  6,  6,  6,  6],
      [ 6,  7,  7,  7,  7,  7,  7,  7],
    ]
    
    [
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
      [ 9, 10,  1,  2,  3,  4,  5,  6],
    ]
    
    [
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
      [ 7,  8,  8,  8,  8,  8,  8,  8],
    ]
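One compact way to obtain exactly the padding illustrated above is to clamp the row and column indices to the channel's bounds, which repeats the last row/column of each incomplete block; a possible sketch:

```python
def block_splitting(w, h, C):
    """Yield the 8x8 blocks of channel C (h rows, w cols), padding with edge values."""
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            # min(...) clamps out-of-range indices, repeating the edge pixels.
            yield [[C[min(i + y, h - 1)][min(j + x, w - 1)] for x in range(8)]
                   for y in range(8)]
```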

Discrete Cosine Transform (DCT)

The Fourier transform is a way to map a signal (like an image) from the time or spatial domain to its spectrum in the frequency domain. The time and frequency domains are just alternative ways of representing signals, and the Fourier transform is the mathematical relationship between the two representations. The Discrete Cosine Transform (DCT) is a transformation close to the Discrete Fourier Transform (DFT).

The DCT is widely used in signal and image processing, especially in compression. Indeed, the DCT has an excellent energy-compaction property for natural images: the information is essentially carried by the low-frequency coefficients, which represent the major data present in an image, while the high frequencies characterize the high-contrast areas. These high frequencies will then be discarded in the quantization step.

The general case

In this project, we will use the DCT-II variant. Assume given a discrete variable \(v = [ v_i ]_{i < n} \in \mathbb{R}^n\). The DCT-II \([ \hat{v}_i ]_{i < n} \in \mathbb{R}^n\) of \(v\) is defined as:

\[ \hat{v}_i = \delta_i \sqrt{\frac{2}{n}} \sum_{j < n} v_j \cos \left[ \frac{\pi}{n} \left( j + \frac{1}{2} \right) i \right] \]

where

\[ \delta_i = \begin{cases} 1 / \sqrt{2} & \text{if $i = 0$} \\ 1 & \text{otherwise} \end{cases} \]

By defining \(C^{[n]} \in \mathcal{M}_n(\mathbb{R})\) as follows:

\[ C^{[n]}_{i, j} = \delta_i \sqrt{\frac{2}{n}} \cos \left[ \frac{\pi}{n} \left( j + \frac{1}{2} \right) i \right], \]

we obtain the following matrix-based expression of the DCT-II:

\[ \hat{v} = v \cdot {C^{[n]}}^\intercal . \]

The DCT can be lifted to two-dimensional discrete variables by applying it along all the dimensions (e.g. over the rows first, and then over the columns). I.e., using the matrix-based expression of the 1D DCT-II, assuming a 2D discrete variable \(A \in \mathcal{M}_{m, n}(\mathbb{R})\), the 2D DCT-II \(\hat{A} \in \mathcal{M}_{m, n}(\mathbb{R})\) is defined by:

\[ \hat{A} = C^{[m]} \cdot A \cdot {C^{[n]}}^\intercal . \]

where one can check that \(A \cdot {C^{[n]}}^\intercal\) (resp. \(C^{[m]} \cdot A\)) corresponds to the application of the 1D DCT-II over the rows of \(A\) (resp. over the columns of \(A\)). By associativity of the matrix multiplication, it is immediate to see that the result does not depend on whether the 1D DCT-II is first applied over the rows or over the columns.

One can check that, for any \(n\), \(C^{[n]}\) is an orthonormal matrix, i.e. it is invertible and its inverse is given by \({C^{[n]}}^{-1} = {C^{[n]}}^\intercal\). Hence, the DCT-II is an invertible operation whose inverse is (in the 1D variant) the function that maps \(\hat{v}\) to \(\hat{v} \cdot C^{[n]}\). It is easy to lift this to the 2D case.
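As an illustration, the 1D transform can be sketched directly from the literal formula above (no attempt at efficiency):

```python
import math

def DCT(v):
    """1D DCT-II of v, computed naively from the literal formula."""
    n = len(v)
    return [(1 / math.sqrt(2) if i == 0 else 1.0) * math.sqrt(2 / n)
            * sum(v[j] * math.cos(math.pi / n * (j + 0.5) * i) for j in range(n))
            for i in range(n)]
```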

  1. Write a function DCT(v) that computes and returns the DCT-II of the vector v. The input vector v is given as a non-empty Python list of numbers (integers and/or floating point numbers) and your function must return a Python list of floating point numbers that contains the DCT-II coefficients, as given in the formula above. For this question, it is fine to compute the DCT-II using the literal expression given above or using the matrix-based definition of the DCT-II. For example,

    DCT([8, 16, 24, 32, 40, 48, 56, 64])

    should return the following vector:

    [101.82, -51.54, 0.00, -5.39, 0.00, -1.61, 0.00, -0.41]

    (We have chosen to format the output using a precision of two decimals, but your output may differ.)

  2. Write a function IDCT(v) that computes the inverse DCT-II of the vector v. The input vector v is given as a non-empty Python list of numbers (integers and/or floating point numbers) and your function must return a Python list of floating point numbers that contains the inverse DCT-II coefficients. For this question, you can implement your function using the matrix-based definition of the inverse DCT-II or, you can derive a literal expression from the matrix-based expression and implement it.

    As a sanity check, you can check that IDCT is the inverse of DCT:

    import math
    import random
    
    v = [
        float(random.randrange(-10**5, 10**5))
        for _ in range(random.randrange(1, 128))
    ]
    v2 = IDCT(DCT(v))
    assert (all(math.isclose(v[i], v2[i]) for i in range(len(v))))
  3. Write a function DCT2(m, n, A) that computes the 2D DCT-II of the matrix A. The input matrix A (represented as a row-major array) contains numbers only (integers and/or floating point numbers) and is made of m rows and n columns. Your function must return a matrix of floating point numbers that contains the 2D DCT-II coefficients. For this question, you can implement your function using the matrix-based definition of the 2D DCT-II or, you can derive a literal expression from the matrix-based expression and implement it.

    You can check that the following 8x8 matrix:

    [
      [ 140,  144,  147,  140,  140,  155,  179,  175],
      [ 144,  152,  140,  147,  140,  148,  167,  179],
      [ 152,  155,  136,  167,  163,  162,  152,  172],
      [ 168,  145,  156,  160,  152,  155,  136,  160],
      [ 162,  148,  156,  148,  140,  136,  147,  162],
      [ 147,  167,  140,  155,  155,  140,  136,  162],
      [ 136,  156,  123,  167,  162,  144,  140,  147],
      [ 148,  155,  136,  155,  152,  147,  147,  136],
    ]

    has a DCT-II transform equal to:

    [
      [1210.000,  -17.997,   14.779,   -8.980,   23.250,   -9.233,  -13.969,  -18.937],
      [  20.538,  -34.093,   26.330,   -9.039,  -10.933,   10.731,   13.772,    6.955],
      [ -10.384,  -23.514,   -1.854,    6.040,  -18.075,    3.197,  -20.417,   -0.826],
      [  -8.105,   -5.041,   14.332,  -14.613,   -8.218,   -2.732,   -3.085,    8.429],
      [  -3.250,    9.501,    7.885,    1.317,  -11.000,   17.904,   18.382,   15.241],
      [   3.856,   -2.215,  -18.167,    8.500,    8.269,   -3.608,    0.869,   -6.863],
      [   8.901,    0.633,   -2.917,    3.641,   -1.172,   -7.422,   -1.146,   -1.925],
      [   0.049,   -7.813,   -2.425,    1.590,    1.199,    4.247,   -6.417,    0.315],
    ]

    (Again, we have chosen to format the output to three decimals, but your output may differ.)

  4. Write a function IDCT2(m, n, A) that computes the 2D inverse DCT-II of the matrix A. The input matrix A (represented as a row-major array) contains numbers only (integers and/or floating point numbers) and is made of m rows and n columns. Your function must return a matrix of floating point numbers that contains the 2D inverse DCT-II coefficients. For this question, you can implement your function using the matrix-based definition of the 2D inverse DCT-II or, you can derive a literal expression from the matrix-based expression and implement it.

    Again, as a sanity check, you can test that IDCT2 is the inverse of DCT2 on some examples:

    import math
    import random
    
    m = random.randrange(1, 128)
    n = random.randrange(1, 128)
    A = [
        [
            float(random.randrange(-10**5, 10**5))
            for _ in range(n)
        ]
        for _ in range(m)
    ]
    A2 = IDCT2(m, n, DCT2(m, n, A))
    assert (all(
        math.isclose(A[i][j], A2[i][j])
        for i in range(m) for j in range(n)
    ))
The 8x8 DCT-II Transform & Chen’s Algorithm

Our compression algorithm, as in the JPEG standard, is not going to apply the DCT-II transform on the whole image’s channels. Instead, each channel is going to be cut into 8x8 blocks and the DCT-II algorithm is applied on each of these blocks. As a consequence, in this project, we are only interested in computing the 2D DCT-II transform for 8x8 matrices.

In a nutshell, doing a 2D DCT-II transform on an 8x8 channel amounts to decomposing the channel over a basis of 64 base patterns.

More formally, for \(i \in \mathbb{N}\), let \(\alpha_i = \cos \left( \frac{i}{16} \pi \right)\). Then, the matrix \(C^{[8]}\) can be expressed as follows:

\[ C^{[8]} = \frac{1}{2} \cdot \underbrace{ \left( \begin{matrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \cdots & \frac{1}{\sqrt{2}} \\ \alpha_1 & \alpha_3 & \alpha_{5} & \cdots & \alpha_{15} \\ \alpha_2 & \alpha_6 & \alpha_{10} & \cdots & \alpha_{30} \\ \alpha_3 & \alpha_9 & \alpha_{15} & \cdots & \alpha_{45} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \alpha_7 & \alpha_{21} & \alpha_{35} & \cdots & \alpha_{105} \end{matrix} \right)}_{\overline{C}} = \frac{1}{2} \cdot \left[ \delta_i \cdot \alpha_{i(2j+1)} \right]_{i, j} \]

Let \(i \in \mathbb{N}\). Using the parity of the \(\cos\) function and the identities

\[ \left\{ \begin{aligned} \cos(x + \pi) &= - \cos(x) \\ \cos(x + 2\pi) &= + \cos(x), \end{aligned} \right. \]

it is possible to reduce \(\alpha_i\) to \(\sigma_i \cdot \alpha_{\tau_i}\) with \(\sigma_i \in \{-1, +1\}\) and \(\tau_i \in \{0..8\}\). E.g.

\[ \alpha_{55} = \cos \left( \frac{55}{16} \pi \right) = \cos \left( \frac{32 + 16 + 7}{16} \pi \right) = \cos \left( 2\pi + \pi + \frac{7}{16} \pi \right) = - \cos \left( \frac{7}{16} \pi \right) = - \alpha_7 . \]
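Those identities translate directly into a loop-less reduction; a possible sketch:

```python
def redalpha(i):
    """Return (s, k) with alpha_i == s * alpha_k, s in {-1, 1}, 0 <= k <= 8."""
    k = i % 32               # cos has period 2*pi, i.e. 32 in units of pi/16
    k = min(k, 32 - k)       # cos(2*pi - x) == cos(x)
    return (1, k) if k <= 8 else (-1, 16 - k)   # cos(pi - x) == -cos(x)
```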

  1. Write a function redalpha(i) that takes a non-negative integer i and that returns a pair (s, k) s.t.

    • s an integer in the set \(\{-1, 1\}\),
    • k an integer in the range \(\{0..8\}\), and
    • \(\alpha_i = s \cdot \alpha_k\).

    Your function must be loop-less.

It remains to remark that \(\frac{1}{\sqrt{2}}\) is equal to \(\alpha_4 = \cos(\frac{\pi}{4})\). Hence, we now know that any coefficient \(\overline{C}_{i, j}\) of \(\overline{C}\) is of the form \(\sigma_{i,j} \cdot \alpha_{\tau_{i, j}}\) with \(\sigma_{i, j} \in \{-1, +1\}\) and \(\tau_{i, j} \in \{0..8\}\).

  1. Write a function ncoeff8(i, j) that takes two integers i & j in the range \(\{0..7\}\) and that returns a pair (s, k) s.t.

    • s an integer in the set \(\{-1, 1\}\),
    • k an integer in the range \(\{0..8\}\), and
    • \(\overline{C}_{i, j} = s \cdot \alpha_k\).

Once done, define the following matrix:

M8 = [
    [ncoeff8(i, j) for j in range(8)]
    for i in range(8)
]

and execute the following code:

def M8_to_str(M8):
    def for1(s, i):
        return f"{'+' if s >= 0 else '-'}{i:d}"

    return "\n".join(
        " ".join(for1(s, i) for (s, i) in row)
        for row in M8
    )
    
print(M8_to_str(M8))

You should notice that the printed matrix has some structure:

\[ \forall i, j .\, \overline{C}_{i, 7-j} = \omega_i \cdot \overline{C}_{i, j} . \]

where

\[ \omega_i = \begin{cases} +1 & \text{if $i$ is even,}\\ -1 & \text{otherwise.} \end{cases} \]

This means that we should be able to trade some multiplications for some additions (in general, such trades lead to better performance, as multiplications take longer than additions on CPUs) when computing the 1D DCT-II transform \(\hat{v}\) of a vector \(v\) of dimension \(8\). Indeed, we have seen that \(\hat{v} = v \cdot {C^{[8]}}^\intercal = \frac{1}{2} \cdot (v \cdot \overline{C}^\intercal)\), i.e.:

\[ \begin{aligned} \hat{v}_i & = \frac{1}{2} \cdot \sum_{j < 8} v_j \cdot {\overline{C}}^\intercal_{j,i} = \frac{1}{2} \cdot \sum_{j < 8} v_j \cdot \overline{C}_{i,j} \\ & = \frac{1}{2} \cdot \sum_{j < 4} (v_j \cdot \overline{C}_{i,j} + v_{7-j} \cdot \overline{C}_{i, 7-j}) \\ & = \frac{1}{2} \cdot \sum_{j < 4} (v_j + \omega_i \cdot v_{7-j}) \cdot \overline{C}_{i,j} . \end{aligned} \]

This way of factorizing is close to what is done in the FFT algorithm and/or the fast polynomial multiplication algorithm (as seen during the course) and forms the basis of the Chen algorithm. At first glance, it reduces the number of needed multiplications from \(64\) to \(32\) for the computation of \(\hat{v}\). (Note that we do not count \(\omega_i \cdot v_{7-j}\) as a multiplication since \(\omega_i = \pm 1\). Likewise, the multiplication by \(\frac{1}{2}\) could be precomputed by dividing, a priori, all the \(\overline{C}_{i,j}\)’s by \(2\).) However, it is still possible to lower this number. Consider the computation of \(\hat{v}_0\):

\[ \hat{v}_0 = \frac{1}{2} \cdot \sum_{j < 4} (v_j + v_{7-j}) \cdot \overline{C}_{0,j} . \]

We know that, for any integer \(k \in \{0..7\}\), we have \(\overline{C}_{0, k} = \alpha_4\). Hence:

\[ \hat{v}_0 = \frac{\alpha_4}{2} \cdot \sum_{j < 8} v_j . \]

By regrouping all the common sub-expressions in the computation of the \(\hat{v}_i\)’s, it is possible to lower the number of multiplications to \(22\). (Note that the theoretical lower bound for the number of multiplications in the computation of the 1D DCT-II is \(11\), achieved by the Loeffler algorithm.) As a consequence, for an 8x8 matrix \(A\), without any further optimization, it is possible to compute the 2D DCT-II transform of \(A\) using only \(352\) multiplications (where the naive algorithm uses \(1024\) multiplications). (Using the Loeffler algorithm, it would be possible to reduce this number to \(176\), a number that could be further reduced by combining the two 1D DCT-II passes into a single one.)
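To make the even/odd factorization concrete, here is a sketch of the 1D 8-point transform using the butterfly derived above. The helper cbar (our own name) recomputes the reduced entries of \(\overline{C}\) on the fly; a real implementation would precompute them, and would regroup further sub-expressions to reach the multiplication counts discussed above:

```python
import math

def cbar(i, j):
    """Entry (i, j) of the matrix C-bar, reduced via the cosine identities."""
    k = (i * (2 * j + 1)) % 32        # alpha index, modulo the period
    k = min(k, 32 - k)                # cos(2*pi - x) == cos(x)
    s, k = ((1, k) if k <= 8 else (-1, 16 - k))   # cos(pi - x) == -cos(x)
    c = s * math.cos(k * math.pi / 16)
    return c / math.sqrt(2) if i == 0 else c      # delta_0 = 1/sqrt(2)

def dct8(v):
    """1D 8-point DCT-II via the even/odd split: 4 multiplications per output."""
    s = [v[j] + v[7 - j] for j in range(4)]   # feeds the even-index outputs
    d = [v[j] - v[7 - j] for j in range(4)]   # feeds the odd-index outputs
    return [0.5 * sum((s if i % 2 == 0 else d)[j] * cbar(i, j) for j in range(4))
            for i in range(8)]
```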

  1. Implement a function DCT_Chen(A) that takes an 8x8 matrix A of numbers (integers and/or floating point numbers) and that returns the 2D DCT-II transform of A. Your function cannot do more than \(352\) scalar multiplications. You are allowed to do some precomputation when your Python file is loaded - e.g., it is correct to precompute the \(\frac{\alpha_i}{2}\)’s. You are allowed to do the transform in place. In that case, your function should return the input matrix A.
The inverse 8x8 Transform

For the inverse transform, we can do a similar reasoning. In that case, we obtain the following equation:

\[ [ v_0, v_1, v_2, v_3, v_7, v_6, v_5, v_4 ] = \frac{1}{2} \cdot [ \hat{v}_0, \hat{v}_2, \hat{v}_4, \hat{v}_6, \hat{v}_1, \hat{v}_3, \hat{v}_5, \hat{v}_7 ] \cdot \left[ \begin{matrix} \Omega & \Omega \\ \Theta & -\Theta \end{matrix} \right] \]

where

\[ \Omega = \left(\begin{matrix} \alpha_4 & \alpha_4 & \alpha_4 & \alpha_4 \\ \alpha_2 & \alpha_6 & - \alpha_6 & - \alpha_2 \\ \alpha_4 & - \alpha_4 & - \alpha_4 & \alpha_4 \\ \alpha_6 & - \alpha_2 & \alpha_2 & - \alpha_6 \end{matrix}\right) \]

and

\[ \Theta = \left(\begin{matrix} \alpha_1 & \alpha_3 & \alpha_5 & \alpha_7 \\ \alpha_3 & - \alpha_7 & - \alpha_1 & - \alpha_5 \\ \alpha_5 & - \alpha_1 & \alpha_7 & \alpha_3 \\ \alpha_7 & - \alpha_5 & \alpha_3 & - \alpha_1 \end{matrix}\right) \]

From this formula, by regrouping all the common sub-expressions, it is again possible to lower the number of multiplications:

  1. Implement a function IDCT_Chen(A) that takes an 8x8 matrix A of numbers (integers and/or floating point numbers) and that returns the 2D DCT-II inverse transform of A. You should try to lower the number of needed multiplications as much as possible. You are allowed to do some precomputation when your Python file is loaded - e.g., it is correct to precompute the \(\frac{\alpha_i}{2}\)’s. You are allowed to do the inverse transform in place. In that case, your function should return the input matrix A.

Quantization

Quantizing a signal consists in reducing its precision. Quantization is the step of the JPEG-like compression algorithm where most of the loss of information (and thus of visual quality) occurs. Quantization occurs right after the DCT transform. It consists in dividing the DCT output matrix, entry by entry, by another 8x8 matrix, called the quantization matrix, whose coefficients are called the quantization coefficients.

These coefficients depend on several parameters: the encoder, the desired quality, the channel being quantized, and the coefficient’s position in the quantization matrix. In all cases, the higher a quantization coefficient is, the more information is lost. Since the human eye is less sensitive to high frequencies, the quantization matrix will have higher values for the high frequencies. Likewise, since the human eye is more sensitive to the luminance information, the luminance quantization matrix will contain lower values than the chrominance quantization matrix.

Let \(A \in \mathcal{M}_8(\mathbb{R})\) and \(Q \in \mathcal{M}_8(\mathbb{Z})\). The quantization of \(A\) by \(Q\), denoted by \(A \star Q\), is the member of \(\mathcal{M}_8(\mathbb{Z})\) defined by:

\[ (A \star Q)_{i, j} = \left\lfloor \frac{A_{i,j}}{Q_{i,j}} \right\rceil , \]

where \(\lfloor x \rceil\) denotes \(x\) rounded to the nearest integer. (You can use Python’s round function.)

The inverse quantization of \(A\) by \(Q\), denoted by \(A \star^{-1} Q\), is the member of \(\mathcal{M}_8(\mathbb{R})\) defined by:

\[ (A \star^{-1} Q)_{i, j} = A_{i, j} \cdot Q_{i, j} . \]
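Both operations are simple entrywise maps; a sketch:

```python
def quantization(A, Q):
    """Entrywise division of A by Q, rounded to the nearest integer."""
    return [[round(A[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def quantizationI(A, Q):
    """Inverse quantization: entrywise multiplication of A by Q."""
    return [[A[i][j] * Q[i][j] for j in range(8)] for i in range(8)]
```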

  1. Write a function quantization(A, Q) that takes two 8x8 matrices of numbers (integers and/or floating point numbers) and that returns the quantization of A by Q. The returned matrix must be an 8x8 matrix of integers.

  2. Write a function quantizationI(A, Q) that takes two 8x8 matrices of numbers (integers and/or floating point numbers) and that returns the inverse quantization of A by Q. The returned matrix must be an 8x8 matrix of numbers (integers and/or floating point numbers).

It remains to define the quantization matrices used by our encoder / decoder. The JPEG standard defines the two following quantization matrices. The first one is used for the luminance channel and is defined as follows:

\[ \left(\begin{matrix} 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61\\ 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55\\ 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56\\ 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62\\ 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77\\ 24 & 35 & 55 & 64 & 81 & 104 & 113 & 92\\ 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101\\ 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99 \end{matrix}\right) \]

The second one is used for the chrominance channels and is defined as follows:

\[ \left(\begin{matrix} 17 & 18 & 24 & 47 & 99 & 99 & 99 & 99\\ 18 & 21 & 26 & 66 & 99 & 99 & 99 & 99\\ 24 & 26 & 56 & 99 & 99 & 99 & 99 & 99\\ 47 & 66 & 99 & 99 & 99 & 99 & 99 & 99\\ 99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\ 99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\ 99 & 99 & 99 & 99 & 99 & 99 & 99 & 99\\ 99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \end{matrix}\right) \]

You can download their Python definition here.

Given a quantization matrix \(Q\) and a quality factor \(\phi \in \{ 1 .. 100 \}\), we define the lift of \(Q\) by \(\phi\) as the quantization matrix \(Q_\phi\) defined as follows:

\[ (Q_\phi)_{i, j} = \left\lceil \frac{50 + S_\phi \cdot Q_{i, j}}{100} \right\rceil \]

where

\[ S_\phi = \begin{cases} 200 - 2\phi & \text{if $\phi \geq 50$} \\ \left\lfloor \frac{5000}{\phi} \right\rceil & \text{otherwise.} \end{cases} \]
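Assuming the two standard matrices above are bound to LQM (luminance) and CQM (chrominance), names of our own choosing, the lifting can be sketched as:

```python
import math

LQM = [  # standard JPEG luminance quantization matrix, as given above
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

CQM = [  # standard JPEG chrominance quantization matrix, as given above
    [17, 18, 24, 47, 99, 99, 99, 99],
    [18, 21, 26, 66, 99, 99, 99, 99],
    [24, 26, 56, 99, 99, 99, 99, 99],
    [47, 66, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
]

def Qmatrix(isY, phi):
    """Standard quantization matrix (luminance if isY) lifted by quality factor phi."""
    Q = LQM if isY else CQM
    S = 200 - 2 * phi if phi >= 50 else round(5000 / phi)
    return [[math.ceil((50 + S * Q[i][j]) / 100) for j in range(8)] for i in range(8)]
```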

  1. Write a function Qmatrix(isY, phi) that takes a boolean isY and a quality factor phi. If isY is True, it returns the standard JPEG quantization matrix for the luminance channel, lifted by the quality factor phi. If isY is False, it returns the standard JPEG quantization matrix for the chrominance channel, lifted by the quality factor phi.

Zig-Zag walk & RLE Encoding

Finally, each 8x8 block is streamed and compressed using a variant of run-length encoding (RLE).

Zig-Zag walk

First, we want to stream the 8x8 blocks obtained from the previous passes in such a way that there is a high probability of obtaining long chains of 0’s. On natural images, we expect the DCT to give very small values for the high frequencies, i.e. for the values that are in the bottom-right quadrant of the 8x8 block. Moreover, after the quantization phase, most (if not all) of these high-frequency coefficients should be null. This means that the bottom-right quadrant of the 8x8 block should only be made of 0’s.

To increase the probability of streaming long chains of 0’s, it is better to walk the 8x8 block in zig-zag from the upper-left position to the bottom-right one, following the anti-diagonals and alternating their direction.

  1. Write a function zigzag(A) that takes an 8x8 row-major matrix and that returns a generator that yields all the values of A, following the zig-zag ordering.
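One compact way to implement the walk is to enumerate the anti-diagonals i + j = d, reversing the traversal direction on every other diagonal; a sketch:

```python
def zigzag(A):
    """Yield the 64 entries of the 8x8 matrix A in zig-zag order."""
    for d in range(15):                              # anti-diagonals i + j == d
        rows = range(max(0, d - 7), min(d, 7) + 1)   # valid row indices on diagonal d
        for i in (rows if d % 2 == 1 else reversed(rows)):
            yield A[i][d - i]
```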

It remains to encode these long chains of 0’s in a more compact way. For this purpose, we will use a variant of the run-length encoding, named RLE0, based on the value 0. In RLE0, we are going to ignore the 0’s from the stream. However, each time we encounter a non-zero value v, instead of simply streaming v, we stream the pair (i, v) where i is equal to the number of 0’s preceding v in the stream. For example, the sequence [0, 0, 4, 0, 0, 0, 7, 1, 0, 2, 0, 0] is encoded as [(2, 4), (3, 7), (0, 1), (1, 2)] when using the RLE0 encoding.
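The encoding described above can be sketched as a small generator (note that, as in the example, trailing 0's produce no output pair):

```python
def rle0(g):
    """Yield (number of preceding 0's, value) for each non-zero value of g."""
    zeros = 0
    for v in g:
        if v == 0:
            zeros += 1
        else:
            yield (zeros, v)
            zeros = 0
```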

  1. Write a function rle0(g) that takes a generator that yields integers and that returns a generator that yields the pairs obtained from the RLE0 encoding of g.

Tying up everything

To finalize the JPEG compressor, it remains to do an entropy encoding and to save the result in a file. This phase is not part of the project.