You must be authenticated to submit your files

CSE 102 – Tutorial 4 – Randomized algorithms

General instructions

Sections tagged with a green marginal line are mandatory.

Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.

When submitting a solution, the system will do some basic validation of your files and run some automated tests on your solutions. A green tick tells you that your file compiles and has passed all of the automated tests. A red tick tells you either that your file is badly broken (e.g., a syntax error) or that your solutions fail one or more of the automated tests. You can see the result of the tests by clicking on “See my uploads” in the upper-left menu, and then clicking on the right arrow button next to each file you submitted.

Although passing all of our tests provides evidence that your solutions are correct, it might not guarantee it, as typically we would need to run infinitely many tests! You are encouraged to write your own tests, and more generally to think about why your code works and not just rely on the automated tests.

Discussion of the tutorials with friends and classmates is allowed, with acknowledgment, but please read the Moodle page on “Collaboration vs plagiarism/cheating”.

A simple experiment

Throughout this tutorial, you will be asked to write experiments to estimate the probabilities of different kinds of events. Here we give you a quick reminder of how to perform such simulations.

Say that we want to compute the probability of having 2 heads when tossing 3 coins. We begin by writing code to perform a single “trial” of flipping a coin three times:

import random

def trial():
    heads = 0
    for i in range(3):
        # I do 3 coin tosses, and count the number of
        # times I got an Head.
        # Here, we use "True" for Head and "False" for Tail
        if random.choice([True, False]):
            heads += 1
    return (heads == 2)         # I win if I got 2 Heads (exactly)

We can then run a large number of trials (say 100,000) to get an estimation of the probability of the event:

def experiment():
    C = 100_000                          # Number of trials
    N = sum(trial() for _ in range(C))   # Number of successes
    return N / C                         # Probability of success

You can check that experiment() returns a result close to \(3/8\), the theoretical probability of making three coin tosses and seeing exactly two heads.

In [1]: experiment()
Out[1]: 0.37562

Likelihood of winning streaks

We expect you to write your solution to this problem in a file named streak.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

You probably find it intuitive that if successive flips of an unbiased coin are independent, then the probability that the coin will come up Heads again after \(k\) successive Heads is \(1/2\). Suppose you flip a coin \(N\) successive times. What is the probability that there was a streak of \(k\) heads? (Assume \(N>2k\))

  1. After you have made an initial guess, set up the problem above in Python. Write an auxiliary function streak(N, k) that makes a sequence of \(N\) flips from a uniform distribution and that returns True just in case there was a streak of \(k\) heads. Then write a function experiment(N, k) which repeats trial sufficiently many (e.g., 100,000) times to get an estimate of the probability of having such a streak. Now vary \(N\) and \(k\) and see how the estimates returned by experiment(N, k) change.

Write your observations in a comment.

Penney’s paradox

We expect you to write your solution to this problem in a file named penney.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

Assume two players that pick an unbiased coin and toss it ad infinitum. The first player wins if the sequence Heads-Heads-Tails appears first, while the second one wins if the sequence Tails-Heads-Heads shows up first. What is the probability that the first player wins?

  1. Again, after you have made an initial guess, set up the problem above in Python. Flip from a uniform distribution iteratively until the sequence “Heads-Heads-Tails” or “Tails-Heads-Heads” shows up and store which sequence appeared. Then write a function named experiment() that repeats this experiment over a number of trials to return an estimate of the probability that the first player wins.

Write your observations down in a comment.

Simulating a biased die

We expect you to write your solution to this problem in a file named dice.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

A die is said to be biased if the probability of landing on a face is different depending on the face. In general, a \(k\)-sided die is fully characterized by a list of \(k\) numbers \([p_1, \dots, p_k]\), where each \(p_i\) gives the probability of the die landing on the \(i\)th face. Moreover, being probabilities of mutually exclusive events these numbers should satisfy:

  • \(0 \le p_i \le 1\) for each \(i\), and
  • \(p_1 + \dots + p_k = 1\)

We refer to such a list of probabilities as a die representation. For example, \([1/6,1/6,1/6,1/6,1/6,1/6]\) represents a fair six-sided die, while \([1/2,1/10,1/10,1/10,1/10,1/10]\) represents a biased die that lands on 1 half the time and on other faces with equal probability.

  1. Write a function roll(D) that takes as input a list D representing a \(k\)-sided die, and simulates rolling the die to return a value. You are only allowed to use the function random from the random module. The function should return a number in the set \(\{1, \dots, k\}\), where \(k\) = len(D), with \(i\) being generated with probability D[i - 1].

  2. Write a function rolls(D, N) that takes a die representation D and an integer N and that simulates rolling the die N times. The function should return a list of \(k\) numbers \([n_1, \cdots, n_k]\) where \(n_i\) is the number of times the face \(i\) has been obtained. (The \(n_i\)’s should sum up to N)

  3. Use the following function to plot the result of your rolls(). Check that the obtained probability for each die face is compatible with the die representation.

    import matplotlib.pyplot as plt
    
    def plot(ns):
        N  = sum(ns)
        ns = [float(x) / N for x in ns]
        plt.bar(range(len(ns)), height=ns)
        plt.xticks(range(len(ns)), [str(i+1) for i in range(len(ns))])
        plt.ylabel('Probability')
        plt.title('Biased die sampling')
        plt.show()

Size estimation: volume of a solid torus

We expect you to write your solution to this problem in a file named torus.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

A torus is the famous shape of a tube that loops on itself, which can be parameterized as a circle of radius \(r\) (called the inner radius) that is extruded along another circle of radius \(R\) (called the outer radius). It looks as follows, except that the loop is complete.

The volume of a solid torus is easily determined analytically by integrating the area of the inner circle (which is \(\pi r^2\)) along the outer circle, giving \((2 \pi R) (\pi r^2) = 2 \pi^2 R r^2\).

Enclosing cuboid

  1. Imagine that the outer circle of the torus is centered at the origin with the outer circle on the \(xy\)-plane. Further imagine that the torus is inscribed in a tight fitting cuboid whose sides are parallel to one of the three planes, \(xy\), \(xz\), or \(yz\). The opposite corners of this cuboid are \((-(R + r), -(R + r), -r)\) and \((R + r, R + r, r)\).

Write a function torus_volume_cuboid(R, r, N=100_000) that estimates the volume of the solid torus by uniformly sampling points \((x, y, z)\) in this enclosing cuboid. Such a point lies within the torus if: \(\left(\sqrt{x^2 + y^2} - R\right)^2 + z^2 \le r^2\). Use the ratio of successful samples within the torus to the total number of samples N to estimate the volume of the solid torus.

Enclosing cylinder – (optional)

  1. Repeat the exercise, but with an enclosing cylinder of radius \((R + r)\) centered at the origin and with total height \(2r\), half of which is below the \(xy\)-plane. Call this function torus_volume_cylinder(R, r, N=100_000). Hint: pay close attention to how you are sampling points in the cylinder! You can either use cartesian coordinates and rejection sampling, or sample more efficiently using polar coordinates, but in that case there is a trick involved to make the selection uniform. See this article on uniform random sampling on a disc for more discussion.

Enclosing pipe – (optional)

  1. There is no reason to generate samples that lie inside the cylinder that passes through the hole in the torus. Use this insight to optimize the previous function into torus_volume_pipe(R, r, N=100_000) that only generates points in the hollow cylinder—i.e., a pipe—of outer radius \((R + r)\) and thickness of \(2r\). The torus should fit snugly in the solid portion of this pipe.

    Hint: begin by considering how to uniformly sample a point in a circular ring (aka, an annulus) that lies between an inner radius \(r_1\) and an outer radius \(r_2\). Try to do it without resorting to rejection sampling.

Happy \(\pi\) day! – (optional)

We expect you to write your solution to this problem in a file named pi.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

In this section, we are going to estimate \(\pi\) using random numbers in a few different ways. For each sub-section, we ask you to write a function that returns an approximation of \(\pi\) using the described method - the name of the function is given for each method.

Coprimality

(Put the experiment in the function pi_coprimality).

Given two random natural numbers drawn uniformly in \(\mathbb{N}^*\), what is the probability that they are coprime? (Recall that two numbers \(a\) and \(b\) are coprime if \(gcd(a,b) = 1\).) An elementary analytic derivation of the answer was given by two first-year college students to the surprise of the mathematician Henry L. Adler, who believed that this required considerable prior knowledge of number theory.

  1. Generate a large number of pairs of natural numbers and see what fraction of them are coprime. Compare this empirical estimate to the exact analytical value \(6/\pi^2\). (It is not possible to sample uniformly from the infinite set of natural numbers, but you can approximate the notion of “random pair of natural numbers” by uniform sampling over a finite but large range, such as numbers up to sys.maxsize.)

Evenness

(Put the experiment in the function pi_evenness).

  1. Uniformly sample two floating point numbers x and y in \((0, 1)\) and check if round(x / y) is even, where round(f) is the nearest integer to f. Try it out for a large number of x/y pairs to see what fraction of pairs has this property. Analytically, the probability of this occurring can be shown to be \((5 - \pi)/4\) using high school level math.

Chord lengths

(Put the experiment in the function pi_chords).

  1. Uniformly sample two points on the circumference of the unit circle (a circle of radius \(1\)) and compute the length of the chord formed by these points. Analytically, the expected length is \(4/\pi\). Approximate this by generating a large number of chords and computing their average length. Hint: in polar coordinates, picking a point on the unit circle amounts to uniformly sampling an angle (in radians) from the interval \([0, 2\pi)\).

Observe that here we are estimating the expected value by taking the average of a large number of measurements. This is similar to estimating the probability of an event by running a large number of success/fail experiments.

Line segments in the unit circle

(Put the experiment in the function pi_circle_segments).

  1. A variant of the previous exercise, sample two points uniformly anywhere within the unit circle and compute the length of the line segment joining them. The expected length is \(\frac{128}{45\pi}\). Hint: see the hint above for torus_volume_cylinder.

Uniform sampling from a triangle – (optional)

We expect you to write your solution to this problem in a file named utriangle.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

In this exercise you are going to figure out how to generate uniform samples from triangles that are drawn on the \(xy\)-plane.

Rejection sampling

  1. We are going to represent points as 2-tuples of the form \((x, y)\). A triangle is made up of three such points. Write a function area3(p1, p2, p3) that returns the area of such a triangle, where each of the points p1, p2, and p3 is a 2-tuple of \(x\) and \(y\) coordinates.

  2. A first attempt at sampling points in the triangle is to find an enclosing rectangle. To simplify matters, we are going to make the rectangle axis aligned, meaning that its edges are parallel to either the \(x\) or the \(y\) axes.

    • Write the function bounding_rect(p1, p2, p3) that returns a 2-tuple of points representing the opposite corners of an axis-aligned rectangle for the triangle made up of the points p1, p2, and p3.

    • A point p is in a triangle formed by the vertices p1, p2, and p3 if and only if:

      area3(p, p1, p2) + area3(p, p2, p3) + area3(p, p1, p3) == area3(p1, p2, p3)

      However, to take into account rounding errors, it is better to compare these two values using math.isclose.

    • Use rejection sampling to uniformly select points in the triangle by repeatedly sampling points in the bounding rectangle and discarding them until you get one that is inside the triangle.

    Implement this as the function rejection_sample3(p1, p2, p3).

To test your function, use the display() function from the provided module triangle_test.py. For example:

import triangle_test
triangle_test.display(rejection_sample3)

Hit the escape key to quit the test, and the return key to restart the test with a different random triangle. Pay attention to the distribution of the points: if they appear to clump together, there is probably something wrong with your generator.

If you want to use the triangle_test.py module on your own laptop, you will need to install the pygame library if you don’t have it already.

An improvement

We can do better than rejection sampling. Say that our triangle has points \((x_1, y_1)\), \((x_2, y_2)\). and \((x_3, y_3)\). Here is the algorithm we are going to follow.

  1. First, move the origin to the first point \((x_1, y_1)\), so that the triangle is now determined by the origin and the points \(\mathbf{u} = (x_2 - x_1, y_2 - y_1)\) and \(\mathbf{v} = (x_3 - x_1, y_3 - y_1)\).

  2. Consider the parallellogram formed by joining up \(\mathbf{u}\) and \(\mathbf{v}\) seen as vectors. The point opposite the origin would be the vector \(\mathbf{u} + \mathbf{v}\). Moreover, any point \(\mathbf{p}\) inside the parallelogram is determined by two independent constants \(\alpha, \beta \in [0, 1]\) such that \(\mathbf{p} = \alpha \mathbf{u} + \beta \mathbf{v}\). These two constants can be selected uniformly.

  3. Half the points sampled in this way will not be in the target triangle and will have to be mapped into it. Figure out a bijection between points in the triangle formed by \((\mathbf{0}, \mathbf{u}, \mathbf{v})\) and that formed by \((\mathbf{u}, \mathbf{v}, \mathbf{u} + \mathbf{v})\). Use this to map all generated points in the latter triangle to the former triangle.

  4. Finally, undo the change of origin performed in the first step to map the point \(\mathbf{p}\) back to the original coordinate system.

  1. Implement the above procedure as a function quick_sample3(p1, p2, p3) where p1, p2, and p3 are respectively \((x_1, y_1)\), \((x_2, y_2)\), and \((x_3, y_3)\). Note that this function should make exactly two calls to random.random().

Gambling queens – (optional)

We expect you to write your solution to this problem in a file named rqueens.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

In class in Lecture 3, we saw how to solve the \(n\)-queens problem using backtracking. It is also possible to take a randomized approach:

  1. Start by placing the first queen in the first column, choosing its position (= row) uniformly at random.

  2. Then place the second queen in the second column, choosing its position uniformly at random among all positions that do not create a conflict with the first queen.

  3. Continue in the same manner until either you’ve placed all \(n\) queens on the board, or you’ve placed \(k < n\) and there are no positions to place the \((k+1)\)st queen. In that case, rather than backtracking to the \(k\)th queen, restart the whole process from step (1) and try again until you manage to build a complete, random solution.

This is a “Las Vegas algorithm”, in the sense that it always returns a correct solution, although it may get unlucky and require many iterations to find one.

  1. Implement the above algorithm as a function vegas_queens(n), taking as input a natural number n and returning a solution to the \(n\)-queens problem (represented as an appropriate permutation of list(range(n))). Compare the performance of vegas_queens(n) to the generator queens([], n) defined using backtracking that we saw in Lecture 3, for different values of \(n\), and write down your observations as a comment.

One nice feature of backtracking is that it allows us to exhaustively enumerate all solutions. On the other hand, depending on our available computing power the full backtracking tree might be too large to exhaust. The Las Vegas algorithm described above can be adapted to quickly obtain a Monte Carlo estimate of the size of the full backtracking tree, which we can use to decide whether it is worth performing a full enumeration.

The idea is to keep track of the number of choices \(c_1,\dots,c_k\) we had at each iteration, from the initial choice (in this case, a choice of an arbitrary row for the first queen, so that \(c_1 = n\)) up until the point where either the search failed (with \(c_k = 0\)) or we found a complete solution (with \(k = n\) and \(c_n > 0\)). From each such trial, we infer an estimate of

\[ 1 + c_1 + c_1c_2 + \dots + c_1c_2\cdots c_n \]

for the number of nodes in the backtracking tree, and by performing repeated trials we can get a pretty good estimate for the actual number of nodes.

  1. Implement the Monte Carlo algorithm sketched above as a function qbt_estimate(n, trials=100), taking as input a natural number n and an optional number of trials, and returning an estimate for the total size of the backtracking tree for the \(n\)-queens problem. Compare this estimate to the exact answer computed using the function queen_stats in Lecture 3, and write down your observations as a comment. (You can also try varying the number of trials.)

Note that there are no automated tests for these optional exercises, it is up to you to ensure that your code is correct!