You must be authenticated to submit your files

CSE 102 – Tutorial 2 – Random sampling and randomized algorithms

General instructions
Introduction: running a simple experiment
Games of chance
Probabilistic primality testing
More fun with sampling (optional)

General instructions

Sections tagged with a green marginal line are mandatory.

Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.

After you are finished with the tutorial, please upload your submission to Moodle by completing “Tutorial 2 assignment” before the indicated deadline.

When submitting a solution, the system will do some basic validation of your files and run some automated tests on your solutions. A green tick tells you that your file compiles and has passed all of the automated tests. A red tick tells you either that your file is badly broken (e.g., a syntax error) or that your solutions fail one or more of the automated tests. You can see the result of the tests by clicking on “See my uploads” in the upper-left menu, and then clicking on the right arrow button next to each file you submitted.

Although passing all of our tests provides evidence that your solutions are correct, it might not guarantee it, as typically we would need to run infinitely many tests! You are encouraged to write your own tests, and more generally to think about why your code works and not just rely on the automated tests.

Discussion of the tutorials with friends and classmates is allowed, with acknowledgment, but please read the Moodle page on “Collaboration, plagiarism, and AI”.

Introduction: running a simple experiment

In many parts of this tutorial, you will be asked to write experiments to estimate the probabilities of different kinds of events. Here we give you a quick reminder of how to perform such simulations.

Say that we want to compute the probability of having 2 heads when tossing 3 coins. We begin by writing code to perform a single “trial” of flipping a coin three times:

import random

def trial():
    heads = 0
    for i in range(3):
        # Do three coin tosses, and count the number of
        # times it lands "Heads".
        # Here, we use True for "Heads" and False for "Tails"
        if random.choice([True, False]):
            heads += 1
    return (heads == 2)         # We win if we got exactly two Heads

We can then run a large number of trials (say 100,000) to get an estimation of the probability of the event:

def experiment():
    N = 100_000                          # Number of trials
    S = sum(trial() for _ in range(N))   # Number of successes
    return S / N                         # Probability of success

You can check that experiment() returns a result close to \(3/8\), the theoretical probability of making three coin tosses and seeing exactly two heads.

In [1]: experiment()
Out[1]: 0.37562

Games of chance

We expect you to write your solution to this problem in a file named chance.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

You probably find it intuitive that if successive flips of an unbiased coin are independent, then the probability that the coin will come up Heads again after \(k\) successive Heads is \(1/2\). Suppose you flip a coin \(N\) successive times. What is the probability that there was a streak of \(k\) heads? (Assume \(N>2k\))

After you have made an initial guess, set up the problem above in Python. Write an auxiliary function streak(N, k) that makes a sequence of \(N\) flips from a uniform distribution and that returns True just in case there was a streak of \(k\) heads. Then write a function experiment(N, k) which repeats trial sufficiently many (e.g., 100,000) times to get an estimate of the probability of having such a streak. Now vary \(N\) and \(k\) and see how the estimates returned by experiment(N, k) change.

Write your observations in a comment.

Assume two players that pick an unbiased coin and toss it ad infinitum. The first player wins if the sequence Heads-Heads-Tails appears first, while the second one wins if the sequence Tails-Heads-Heads shows up first. What is the probability that the first player wins?

Again, after you have made an initial guess, set up the problem above in Python. Flip from a uniform distribution iteratively until the sequence “Heads-Heads-Tails” or “Tails-Heads-Heads” shows up and store which sequence appeared. Then write a function named penney() that repeats this experiment over a number of trials to return an estimate of the probability that the first player wins.

Write your observations down in a comment.

A die is said to be biased if the probability of landing on a face is different depending on the face. In general, a \(k\)-sided die is fully characterized by a list of \(k\) numbers \([p_1, \dots, p_k]\), where each \(p_i\) gives the probability of the die landing on the \(i\)th face. Moreover, being probabilities of mutually exclusive events these numbers should satisfy:

\(0 \le p_i \le 1\) for each \(i\), and
\(p_1 + \dots + p_k = 1\)

We refer to such a list of probabilities as a die representation. For example, \([1/6,1/6,1/6,1/6,1/6,1/6]\) represents a fair six-sided die, while \([1/2,1/10,1/10,1/10,1/10,1/10]\) represents a biased die that lands on 1 half the time and on other faces with equal probability.

Write a function roll(D) that takes as input a list D representing a \(k\)-sided die, and simulates rolling the die to return a value. You are only allowed to use the function random from the random module. The function should return a number in the set \(\{1, \dots, k\}\), where \(k\) = len(D), with \(i\) being generated with probability D[i - 1].
Write a function rolls(D, N) that takes a die representation D and an integer N and that simulates rolling the die N times. The function should return a list of \(k\) numbers \([n_1, \cdots, n_k]\) where \(n_i\) is the number of times the face \(i\) has been obtained. (The \(n_i\)’s should sum up to N)

Use the following function to plot the result of your rolls(). Check that the obtained probability for each die face is compatible with the die representation.

import matplotlib.pyplot as plt

def plot(ns):
    N  = sum(ns)
    ns = [float(x) / N for x in ns]
    plt.bar(range(len(ns)), height=ns)
    plt.xticks(range(len(ns)), [str(i+1) for i in range(len(ns))])
    plt.ylabel('Probability')
    plt.title('Biased die sampling')
    plt.show()

Probabilistic primality testing

We expect you to write your solution to this problem in a file named chance.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

A primality test is a procedure that takes a number and determines whether that the number is prime or not. We are going to implement the Miller-Rabin primality test, discussed in lecture, which is a probabilistic primality check with good accuracy. The algorithm is parameterized by a integer \(k\) that will control the accuracy of the test: the higher \(k\) is, the more accurate the primality test. Here is a description of the algorithm:

let \(n \in \mathbb{N}\) and let \(k\) be the accuracy parameter. If \(n \le 3\), then answer directly. Otherwise, find \(r\) and \(d\) such that \(n - 1 = 2^r \cdot d\) and \(d\) is odd – you can iteratively divide \(n - 1\) by \(2\) up to the point where the result is odd.
repeat the following procedure \(k\) times:
1. pick a random number \(a\) in \([2 \ldots n - 2]\) and let \(x = a^d \mod n\). If \(x = 1\) or \(x = n - 1\), then move to the next loop iteration. Otherwise,
2. repeat the following procedure \(r - 1\) times:
  - \(x\) is updated to the value of \(x^2 \mod n\).
  - if \(x = 1\), the number is composite (not prime)
  - if \(x = n - 1\), move to the next outer loop iteration
3. if you complete all \(r-1\) iterations of the inner loop, the number is composite. (Note: if \(r \le 1\), this condition is automatically satisfied.)
if you complete all \(k\) iterations of the outer loop above, the number is probably prime.

Write a function is_prime(n, k = 32) that implements the Miller-Rabin algorithm. Important note: python comes with a built-in function pow(x, p, n) that computes modular exponentiation but in a much more efficient way than the naive formula pow(x, p, n) = (x ** p) % n. Hint: the main difficulty of this exercise is programming the loop logic correctly. Recall that the continue statement jumps to the next iteration of a for-loop, while break breaks out of the current loop.
Using the function that you implemented above, write a function random_prime(n) that generates a random probable prime number, by repeatedly sampling a number \(x \in [n,2n)\) until you find one for which the primality test succeeds. Since no even number bigger than 2 is prime, as a small optimization you should only sample odd numbers.

Using your is_prime function, write a function prime_pi_estimate(n, N=100_000) that returns an estimate of the prime-counting function \(\pi(n) = \# \{ p \mid p \le n,\; p\ prime\}\), by sampling \(N\) numbers between 1 and \(n\) and testing if they are prime. Compare the running time and accuracy of this estimate with the exact version of this function prime_pi(n) that you may have already implemented in Tutorial 1, and also compare it with the asymptotic estimate provided by the Prime Number Theorem, \(\pi(n) \sim \frac{n}{\log n}\) as \(n\to \infty\).

More fun with sampling (optional)

We expect you to write your solution to this problem in a file named sampling.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

A torus is the famous shape of a tube that loops on itself, which can be parameterized as a circle of radius \(r\) (called the inner radius) that is extruded along another circle of radius \(R\) (called the outer radius). It looks as follows, except that the loop is complete.

The volume of a solid torus may be determined analytically by integrating the area of the inner circle (which is \(\pi r^2\)) along the outer circle, giving \((2 \pi R) (\pi r^2) = 2 \pi^2 R r^2\). Here we will explore a few ways of computing this volume without doing any symbolic integration, through sampling.

Enclosing cuboid method.

Imagine that the outer circle of the torus is centered at the origin with the outer circle on the \(xy\)-plane. Further imagine that the torus is inscribed in a tight fitting cuboid whose sides are parallel to one of the three planes, \(xy\), \(xz\), or \(yz\). The opposite corners of this cuboid are \((-(R + r), -(R + r), -r)\) and \((R + r, R + r, r)\).

Write a function torus_volume_cuboid(R, r, N=100_000) that estimates the volume of the solid torus by uniformly sampling points \((x, y, z)\) in this enclosing cuboid. Such a point lies within the torus if: \(\left(\sqrt{x^2 + y^2} - R\right)^2 + z^2 \le r^2\). Use the ratio of successful samples within the torus to the total number of samples N to estimate the volume of the solid torus.
Enclosing cylinder method.

Repeat the exercise, but with an enclosing cylinder of radius \((R + r)\) centered at the origin and with total height \(2r\), half of which is below the \(xy\)-plane. Call this function torus_volume_cylinder(R, r, N=100_000). Hint: pay close attention to how you are sampling points in the cylinder! You can either use cartesian coordinates and rejection sampling, or sample more efficiently using polar coordinates, but in that case there is a trick involved to make the selection uniform. See this article on uniform random sampling on a disc for more discussion.
Enclosing pipe method.

There is no reason to generate samples that lie inside the cylinder that passes through the hole in the torus. Use this insight to optimize the previous function into torus_volume_pipe(R, r, N=100_000) that only generates points in the hollow cylinder—i.e., a pipe—of outer radius \((R + r)\) and thickness of \(2r\). The torus should fit snugly in the solid portion of this pipe.

Hint: begin by considering how to uniformly sample a point in a circular ring (aka, an annulus) that lies between an inner radius \(r_1\) and an outer radius \(r_2\). Try to do it without resorting to rejection sampling.

In the next series of exercises, we are going to estimate \(\pi\) using random numbers in a few different ways. For each problem, we ask you to write a function that returns an approximation of \(\pi\) using the described method - the name of the function is given for each method.

Coprimality method. Given two random natural numbers drawn uniformly in \(\mathbb{N}\), what is the probability that they are coprime? (Recall that two numbers \(a\) and \(b\) are coprime if \(gcd(a,b) = 1\).) An elementary analytic derivation of the answer, \(6/\pi^2\), was given by two first-year college students to the surprise of the mathematician Henry L. Adler, who believed that this required considerable prior knowledge of number theory.

Write a function pi_coprimality(N=100_000) that uses this fact to estimate the value of \(\pi\), by generating a large number (N) of pairs of natural numbers and seeing what fraction of them are coprime. (Although it is impossible to sample uniformly from the infinite set of natural numbers, you can approximate it by uniform sampling over a finite but large range, such as numbers up to sys.maxsize.)
Evenness method. Suppose you uniformly sample two numbers \(x\) and \(y\) between 0 and 1, and consider the nearest integer to the ratio \(x/y\). One can prove that the probability that this integer is even is exactly \((5 - \pi)/4\). Use this fact to write a function pi_evenness(N=100_000) estimating the value of \(\pi\) by repeatedly sampling random pairs of numbers between 0 and 1. (You can use the function round(f) to round a floating point number f to the nearest integer.)
Chord length method. Suppose you uniformly sample two points on the circumference of the unit circle (a circle of radius \(1\)) and compute the length of the chord formed by these points. Analytically, the expected length is \(4/\pi\). Use this fact to write a function pi_chords(N=100_000) estimating the value of \(\pi\) by by generating a large number of chords and computing their average length. Hint: in polar coordinates, picking a point on the unit circle amounts to uniformly sampling an angle (in radians) from the interval \([0, 2\pi)\).

In this last series of exercises, you are going to figure out how to generate uniform samples from triangles that are drawn on the plane. We are going to represent points as pairs (x,y), and triangles as triples of such pairs (p1, p2, p3).

Write a function area3(p1, p2, p3) that returns the area of a triangle, by working out an explicit formula.
A first attempt at sampling points in the triangle is to find an enclosing rectangle. To simplify matters, we are going to make the rectangle axis aligned, meaning that its edges are parallel to either the \(x\) or the \(y\) axes.
- Write the function bounding_rect(p1, p2, p3) that returns a 2-tuple of points representing the opposite corners of an axis-aligned rectangle for the triangle made up of the points p1, p2, and p3.
- A point p is in a triangle formed by the vertices p1, p2, and p3 if and only if:
```
area3(p, p1, p2) + area3(p, p2, p3) + area3(p, p1, p3) == area3(p1, p2, p3)
```
  However, to take into account rounding errors, it is better to compare these two values using math.isclose.
- Use rejection sampling to uniformly select points in the triangle by repeatedly sampling points in the bounding rectangle and discarding them until you get one that is inside the triangle.
Implement this as the function rejection_sample3(p1, p2, p3).

To test your function, use the display() function from the provided module triangle_test.py. For example:

import triangle_test
triangle_test.display(rejection_sample3)

Hit the escape key to quit the test, and the return key to restart the test with a different random triangle. Pay attention to the distribution of the points: if they appear to clump together, there is probably something wrong with your generator.

If you want to use the triangle_test.py module on your own laptop, you will need to install the pygame library if you don’t have it already.

We can do better than rejection sampling. Say that our triangle has points \((x_1, y_1)\), \((x_2, y_2)\). and \((x_3, y_3)\). Here is the algorithm we are going to follow.

First, move the origin to the first point \((x_1, y_1)\), so that the triangle is now determined by the origin and the points \(\mathbf{u} = (x_2 - x_1, y_2 - y_1)\) and \(\mathbf{v} = (x_3 - x_1, y_3 - y_1)\).
Consider the parallellogram formed by joining up \(\mathbf{u}\) and \(\mathbf{v}\) seen as vectors. The point opposite the origin would be the vector \(\mathbf{u} + \mathbf{v}\). Moreover, any point \(\mathbf{p}\) inside the parallelogram is determined by two independent constants \(\alpha, \beta \in [0, 1]\) such that \(\mathbf{p} = \alpha \mathbf{u} + \beta \mathbf{v}\). These two constants can be selected uniformly.
Half the points sampled in this way will not be in the target triangle and will have to be mapped into it. Figure out a bijection between points in the triangle formed by \((\mathbf{0}, \mathbf{u}, \mathbf{v})\) and that formed by \((\mathbf{u}, \mathbf{v}, \mathbf{u} + \mathbf{v})\). Use this to map all generated points in the latter triangle to the former triangle.
Finally, undo the change of origin performed in the first step to map the point \(\mathbf{p}\) back to the original coordinate system.

Implement the above procedure as a function quick_sample3(p1, p2, p3) where p1, p2, and p3 are respectively \((x_1, y_1)\), \((x_2, y_2)\), and \((x_3, y_3)\). Note that this function should make exactly two calls to random.random().