This assignment has been closed on May 14, 2026.

You can still upload files, but please note that your submission will be marked as late.

You must be authenticated to submit your files

CSE 102 – Tutorial 9 – Specification, testing, and proof

General instructions
Rigorous specification and proof
Testing through combinatorial generation
Verifying an implemention of binary heaps (optional)

General instructions

Sections tagged with a green marginal line are mandatory.

Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.

Warning: this tutorial is a little different from the previous ones! The goal of this tutorial is to develop better skills in debugging and reasoning about your code on your own, and so there are no automated tests on the submission server. You should nevertheless upload your solutions to the server after verifying the correctness of your code on your own (as usual, you can upload as many times as you like).

We expect you to write your solutions to all of problems in this tutorial in a file named spec.py and to upload it using the form below.

Upload form is only available when connected

Rigorous specification and proof

In-place list reversal is the process of reversing the order of the elements of a list without creating a new list. In Python, this can be achieved by calling the built-in reverse() method:

In [1]: xs = [0, 1, 2, 3, 4, 5]
In [2]: xs.reverse()
In [3]: xs
Out[3]: [5, 4, 3, 2, 1, 0]

We will consider here two different high-level algorithms for performing in-place list reversal, as well as the corresponding low-level implementations that initially turn out to be incorrect. But before describing the algorithms, we will try to be more precise about what it means for an implementation of in-place list reversal to be correct. That is to say, we will give a rigorous specification of in-place list reversal.

Let us suppose that in-place list reversal is implemented as a function rev taking the input list as argument. For example, rev(xs) could be defined using a call to the built-in reverse() method:

def rev(xs): xs.reverse()

In general, for rev to be a correct implementation of in-place list reversal, then after calling rev(xs) it should be the case that

xs_output[i] == xs_input[n-1-i] for all i in range(n)

where xs_input is the original state of the list xs before the call, xs_output is the state of the list after the call, and n is its original length. Here is a Python function that checks this correctness criterion for any given list reversal function revfn on any given input list xs (while leaving the input list unchanged):

def check_rev(revfn, xs):
    n = len(xs)
    xs_copy = xs[:]
    revfn(xs_copy)
    return all (xs_copy[i] == xs[n-1-i] for i in range(n))

An implementation revfn of in-place list reversal is correct just in case check_rev(revfn, xs) evaluates to True for all possible input lists xs. Of course it is not possible to test this automatically, since there are infinitely many lists! So if we want to convince ourselves that an implementation revfn is correct, we have to reason carefully about how it achieves its task. On the other hand, to show that an implementation is incorrect it suffices to find a single counterexample, in the form of a list xs such that check_rev(revfn, xs) evaluates to False.

Now, the first algorithm for in-place list reversal that we consider works by successively swapping elements at opposite ends of the list while moving towards the middle, as illustrated by the following diagram:

Here is a plausible-looking, but buggy, implementation of the algorithm:

def rev1_buggy(xs):
    n = len(xs)
    for k in range(n):
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t

Try running rev1_buggy on some examples, and then give an example of a list xs such that check_rev(rev1_buggy, xs) evaluates to False. You should do so by adding the lines below to your Python file while replacing ... with an appropriate value:
```
xs1 = ...
assert check_rev(rev1_buggy, xs1) == False
```
Then run the file. The assert command causes Python to check that the boolean condition evaluates to True, which in this case will verify that xs1 is indeed a counterexample to the correctness of rev1_buggy. If the condition evaluates to False (i.e., xs1 is not an actual counterexample), then it will raise an AssertionError.

By now you may have noticed the bug that is the reason for the incorrectness of rev1_buggy. (Have you? If not, try to find it before clicking below to reveal a corrected implementation.)

(click to reveal)

def rev1(xs):
    n = len(xs)
    for k in range(n//2):  # corrected number of loop iterations
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t

You can verify that the corrected in-place list reversal function rev1 deals with your previous counterexample by adding the following line to your Python file and reloading it.

assert check_rev(rev1, xs1)

Of course, this is only a single example, not a proof of correctness on all possible inputs. To convince ourselves that rev1 is a correct implementation of in-place list reversal, we need to try to understand how it goes about reversing a list. For this kind of iterative algorithm, the key is to identify an appropriate “loop invariant”, that is, a condition that is true at the end of each iteration of the for loop, and which implies that the correctness criterion for in-place list reversal holds once all iterations have been completed.

You may have run into this idea before if you are taking CSE103. Here we are not asking you to construct a formal proof in the style of Floyd-Hoare logic, but instead to formulate the loop invariant as a checkable Python condition, and to argue informally that it is always valid and implies the correctness criterion. You should start with the following instrumented version of the corrected implementation:

def rev1_checked(xs):
    xs_input = xs[:]  # we copy the original input list to be able to refer to it in the invariant
    n = len(xs)
    for k in range(n//2):
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t
        assert True   # a formula that should hold at the end of the kth iteration of the loop

The loop invariant that we asserted above is trivial. Try replacing True by another formula Inv(k), and run rev1_checked on some examples to convince yourself that the assertion does not raise an error, i.e., that the formula is really true at the end of each loop iteration. The formula Inv(k) should at least involve the loop variable k, but can also mention the variables n, xs, and xs_input, as well as any other variables that you introduce. As an example of a valid but not so interesting invariant, you can take Inv(k) = k < n//2.
Find a loop invariant Inv(k) that is strong enough to imply the correctness of rev1. You should state the invariant as an assertion, and you should also include a comment arguing why it implies that rev1 is a correct implementation of in-place list reversal.

Hint: You need to show that when k = n//2-1 (i.e., at the end of the last iteration of the loop), then Inv(k) implies that xs[i] == xs_input[n-1-i] for all i in range(n) (i.e., that the list is reversed relative to its original state). If the invariant that you chose above does not imply this, you may need to go back and find a stronger invariant.

An alternative algorithm for reversing the elements of a list works by successively swapping elements to “bubble” up values towards the end of the list, as illustrated by the following diagram:

(This algorithm involves a lot more swapping than the first one, though it only uses swaps of adjacent elements.)

And here is a slightly buggy implementation:

def rev2_buggy(xs):
    n = len(xs)
    for m in range(n,1,-1):
        for i in range(m):
            t = xs[i]
            xs[i] = xs[i+1]
            xs[i+1] = t

Try running rev2_buggy on some examples, and then give an example of a list xs such that check_rev(rev2_buggy, xs) does not evaluate to True.
```
xs2 = ...  # define your counterexample here
```

Figure out how to fix the bug in rev2_buggy, before clicking below to reveal a corrected implementation.

(click to reveal)

def rev2(xs):
    n = len(xs)
    for m in range(n,1,-1):
        for i in range(m-1):  # corrected upper bound
            t = xs[i]
            xs[i] = xs[i+1]
            xs[i+1] = t

You will have noticed that the original bug led to check_rev(rev2_buggy, xs2) raising an IndexError exception, rather than returning False. It was implicit in our specification of in-place list reversal that the reversal function should successfully terminate on any input list, without raising an exception or going into an infinite loop. We can improve the check to test explicitly that the function does not raise an exception, as below:

def improved_check_rev(revfn, xs):
    n = len(xs)
    xs_copy = xs[:]
    try:
        revfn(xs_copy)
    except Exception as e:
        return False
    return all (xs_copy[i] == xs[n-1-i] for i in range(n))

(We could also potentially add a timeout to ensure that the call to revfn returns within a reasonable amount of time, although this requires some library support in Python.) Now we can include assertions verifying explicitly that rev2_buggy fails the improved check on the counterexample, and that rev2 passes it.

assert improved_check_rev(rev2_buggy, xs2) == False
assert improved_check_rev(rev2, xs2)

On the other hand, all bets are off if the reversal function is called with something that is not a list. For example, although calling rev2(3.14) raises an exception (TypeError: 'float' object is not subscriptable), this is not considered a specification violation, since the specification states that the input argument should be a list.

The goal of the next optional series of exercises is to explain (informally but rigorously) why rev2 is a correct implementation of in-place list reversal. As we did above for rev1, we’ll start from an instrumented version with placeholders for loop invariants:

def rev2_checked(xs):
    xs_input = xs[:]
    n = len(xs)
    for m in range(n,1,-1):
        for k in range(m-1):
            t = xs[k]
            xs[k] = xs[k+1]
            xs[k+1] = t
            assert True  # inner loop invariant
        assert True  # outer loop invariant

Replace the outer loop invariant by a non-trivial formula Outer(m), which should hold at the end of each outer loop iteration. (The variable m counts down, so this corresponds to the end of the n-mth outer loop iteration.) Run rev2_checked on some example lists to make sure that the invariant is really true.
Find an outer loop invariant that is strong enough to imply the correctness of rev2. (You do not need to prove that the invariant is correct, although you should check this on examples.)

Hint: Examine the “braid diagram” above and try to write a formula that describes the state of the diagram every time a strand reaches its final position, i.e., at the end of a series of swaps. You may want to decompose Outer(m) as the conjunction of a pair of formulas Outer1(m) and Outer2(m), which you can assert individually.
Replace the inner loop invariant by a non-trivial formula Inner(m,k), and progressively refine it until you have a valid inner loop invariant that is strong enough to imply the outer loop invariant.

Hint: You need to show that Inner(m,k) implies Outer(m) when k = m-2. You may want to use the outer loop invariant as a guide for constructing the inner loop invariant, and again you may want to decompose Inner(m,k) as a conjunction of invariants, and it may be helpful to look back at the diagram.

Testing through combinatorial generation

In Tutorial 5 you implemented various operations on binary search trees, including the tricky deletion operation bst_remove which required some careful coding to avoid bugs. Fortunately (or unfortunately!) for you, the submission server’s automated testing was able to frequently identify counterexamples on which your code gave incorrect results. We will now have a retrospective look to see how you could have found these counterexamples on your own.

Let’s begin by recalling the Node class for binary trees, which we’ve augmented with __repr__ and __eq__ methods for the purposes of debugging.

class Node:
    def __init__(self, value, left = None, right = None):
        self.value = value
        self.left = left
        self.right = right
    def __repr__(self):
        return f'Node{(self.value, self.left, self.right)}' if self.left or self.right \
            else f'Node({self.value})'
    def __eq__(self, other):
        return other is not None and \
            self.value == other.value and \
            self.left == other.left and self.right == other.right

Also for the purpose of writing checks, we add a copy method, which returns a fresh copy of a tree.

    def copy(self):
        left  = self.left.copy() if self.left else None
        right = self.right.copy() if self.right else None
        return Node(self.value, left, right)

Your first programming task will be to write a function check_bst_remove that checks whether a given implementation of BST deletion behaves correctly on a given input. But before that, we need to make more precise what exactly defines “correct” behavior.

This requires some thought because there are minor ways that algorithms for BST deletion may differ. For example, the algorithm explained in TD05 makes precise that in the case where the node \(N\) with the value to be deleted has two children you should replace its value by that of its inorder successor — but it would work just as well to take the inorder predecessor. Here are implementations of both approaches:

def bst_remove1(node, x):
    if node is None:
        return None

    if x < node.value:
        node.left = bst_remove1(node.left, x)
    elif x > node.value:
        node.right = bst_remove1(node.right, x)
    else: 
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
            
        parent, current = node, node.right
        while current.left is not None:
            parent, current = current, current.left
        if parent is node:
            parent.right = current.right
        else:
            parent.left  = current.right
        node.value = current.value
  
    return node

def bst_remove2(node, x):
    if node is None:
        return None

    if x < node.value:
        node.left = bst_remove2(node.left, x)
    elif x > node.value:
        node.right = bst_remove2(node.right, x)
    else: 
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
            
        parent, current = node, node.left
        while current.right is not None:
            parent, current = current, current.right
        if parent is node:
            parent.left = current.left
        else:
            parent.right  = current.left
        node.value = current.value
  
    return node

Both of these implementations may be considered as correct, even though they sometimes return different outputs.

Propose a precise specification of the BST deletion operation that is sufficiently general to be satisfied by both of the implementations bst_remove1 and bst_remove2.
Write a function check_bst_remove(delfn, bin, x) that determines whether delfn behaves according to this specification of BST deletion, on the given input values bin and x. The check should return True if delfn returns a correct output on the given input, and False if it returns an incorrect output or raises an exception. (You do not have to check for non-termination.)

An implementation delfn of BST deletion is correct just in case it satisfies the specification for all inputs. Again, since there are infinitely many possible inputs, no amount of testing will guarantee correctness, and the only way to know this for sure is to carefully examine the code and come up with a rigorous explanation for why it works. However, testing can often be used to quickly come up with counterexamples to correctness, which can be a guide in debugging. Two important techniques are exhaustive testing and random testing.

Under exhaustive testing, one generates all possible inputs up to a given size and uses those as test cases. Since the number of possible inputs often grows exponentially as a function of the size, this approach can only be used for testing with relatively small inputs, although it is still useful because often small inputs suffice to find shallow bugs. Under random testing, larger objects are sampled probabilistically and used as test cases. Repeated sampling is then used to try to come up with a counterexample— although again the failure to find one is no guarantee of correctness.

To test implementations of BST deletion, we need to generate pairs of a binary search tree and a value to delete. For simplicity, we will restrict to BSTs with \(n\) nodes labelled by distinct values \(0, \dots, n-1\), and assume that the value to be deleted is an integer between \(0\) and \(n\), and thus either has exactly one occurrence in the tree, or none. (These are pretty safe assumptions, since most reasonable implementations of BST deletion will not depend on the specific values stored in a tree, only making use of relative comparisons. It is possible, though, that some implementations may have bugs in situations where the same value is stored at different nodes of the tree, which would not be caught if we limit ourselves to the space of BSTs with nodes labelled by distinct values.)

Using the procedure for exhaustive generation of binary search trees seen in the lecture on combinatorial generation as well as your function check_bst_remove above, write a testing function testall_bst_remove(delfn, maxn) that checks that delfn is correct on all inputs consisting of a BST bin of size n <= maxn and a value x between 0 and n. In the case that delfn fails on any such input, it should return the input pair (bin, x). Otherwise — that is, if no counterexample to correctness was found — it should return None.

For your convenience, here is the recursive exhaustive BST generator seen in class:
```
def bsts_rec(n, x=0):
    '''generate all BSTs with n nodes and smallest label x'''
    if n == 0:
        yield None
    else:
        for k in range(n):
            for tL in bsts_rec(k, x):
                for tR in bsts_rec(n-1-k, x+k+1):
                    yield Node(x+k, tL, tR)
```
Using the procedure for random generation of binary search trees seen in the lecture on combinatorial generation as well as your function check_bst_remove above, write a testing function testrnd_bst_remove(delfn, n, k) that checks that delfn is correct on random inputs consisting of a BST bin of size exactly n and a value x between 0 and n. In the case that delfn fails on any such input over k trials, it should return the input pair (bin, x). Otherwise — that is, if no counterexample to correctness was found — it should return None.

Note that uniform random generation is not important for this exercise, since we are just looking to generate counterexamples and not estimate some statistical parameters. You can therefore use the simple function randbst_naive seen in class, which we recall below for convenience:
```
def randbst_naive(n, x=0):
    if n == 0:
        return None
    else:
        k = random.randrange(n)
        tL = randbst_naive(k, x)
        tR = randbst_naive(n-1-k, x+k+1)
        return Node(x+k, tL, tR)
```

Now it’s time to test your testers!

The file buggy.py contains several buggy implementations of BST deletion. By calling testall_bst_remove or testrnd_bst_remove with sufficiently large values, you should be able to find correctness counterexamples for all of them. For example, it is particulary easy to find a counterexample for buggy.del0:

def del0(node, x):
    if node is None:
        return None

In [4]: testall_bst_remove(buggy.del0, 1)
Out[4]: (Node(0), 1)

Once you have found these counterexamples, record them as assertions in your Python file:

import buggy

assert not check_bst_remove(buggy.del0, Node(0), 1)
# assert not check_bst_remove(buggy.del1, ...)
# assert not check_bst_remove(buggy.del2, ...)
# assert not check_bst_remove(buggy.del3, ...)
# assert not check_bst_remove(buggy.del4, ...)
# assert not check_bst_remove(buggy.del5, ...)
# assert not check_bst_remove(buggy.del6, ...)
# assert not check_bst_remove(buggy.del7, ...)

Verifying an implemention of binary heaps (optional)

Go back and review the definition of binary heaps, as seen in lecture. Here again is the indexing scheme for nodes that can be used to represent binary heaps efficiently as arrays:

def parent(k : int) -> int: return (k-1) >> 1
def left(k : int) -> int:   return (1 + (k << 1))
def right(k : int)-> int:   return (2 + (k << 1))

We limit ourselves to heaps storing string values with associated integer weights, which we represent as lists of string-integer pairs, i.e., values of type list[tuple[str,int]]. The function below checks that such a list satisfies the heap property that the weight of each node is bounded from above by the weights of each of its children, or equivalently, bounded from below by the weight of its parent.

def is_heap(heap : list[tuple[str,int]]) -> bool:
    for i in range(1,len(heap)):
        if heap[i][1] < heap[parent(i)][1]:
            return False
    return True

Our overall goal is to correctly implement the insertion (heappush) and removal (heappop) operations,

def heappush(H : list[tuple[str,int]], a : str, w : int) -> None:
    pass
    
def heappop(H : list[tuple[str,int]]) -> tuple[str,int]:
    pass

as explained during the lecture and illustrated below.

heappush:

heappop:

Using the testing routines described during class to help you with debugging, implement heappush and heappop. For convenience we copy all of the testing code below, which you are free to modify.

class InvalidHeap(Exception):
    def __init__(self, H):
        self.H = H

def heap_sort(xs : list[int]) -> list[int]:
    """sorts a list of integers using a binary heap"""
    H : list[tuple[str,int]] = []
    for i in range(len(xs)):
        heappush(H, str(i), xs[i])
        if not is_heap(H): raise InvalidHeap(H)
    output = []
    while H:
        output.append(heappop(H)[1])
        if not is_heap(H): raise InvalidHeap(H)
    return output

def is_sorted(l):
    """returns True if the input list is sorted, and False otherwise."""
    for i in range(len(l)-1):
        if l[i+1] < l[i]:
            return False
    return True

def randperm(n):
    """returns a permutation of 1..n chosen uniformly at random"""
    if n == 0: return []
    else:
        p = randperm(n-1)
        i = random.randrange(n)
        return p[:i] + [n] + p[i:]

def test_heapsort(n, k=10000):
    """tests correctness of heappush and heappop by calling heap_sort on random permutations"""
    for _ in range(k):
        xs = randperm(n)
        assert is_sorted(heap_sort(xs))

(Hard!) Once you are sufficiently convinced that your implementation of heappush and heappop is correct, prove it by writing down a rigorous specification of both functions and identifying the necessary loop invariants. (You can also start directly from the implementations shown in the last two slides of Lecture 10, but writing and debugging your own code is a good exercise, and there is definitely room for more efficient implementations.)