You must be authenticated to submit your files

CSE 102 – Tutorial 8 – Debugging through rigorous specification, informal proof, and testing

General instructions
Rigorous specification and informal proof
Testing through exhaustive enumeration

General instructions

Sections tagged with a green marginal line are mandatory.

Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.

Warning: this tutorial is a little different from the previous ones! The goal of this tutorial is to develop better skills in debugging and reasoning about your code on your own, and so there are no automated tests on the submission server. Still, you should upload your solutions to the submission server using the form below, so that it verifies that your Python file does not contain syntax errors, and so that the instructors have easy access to your code.

We expect you to write your solutions to all of problems in this tutorial in a file named spec.py and to upload it using the form below. You can resubmit as many times as you like on the submission server, and when you are finished you should make a Moodle submission before the deadline.

Upload form is only available when connected

Rigorous specification and informal proof

In-place list reversal is the process of reversing the order of the elements of a list without creating a new list. In Python, this can be achieved by calling the built-in reverse() method:

In [1]: xs = [0, 1, 2, 3, 4, 5]
In [2]: xs.reverse()
In [3]: xs
Out[3]: [5, 4, 3, 2, 1, 0]

We will consider here two different high-level algorithms for performing in-place list reversal, as well as the corresponding low-level implementations that initially turn out to be incorrect. But before describing the algorithms, we will try to be more precise about what it means for an implementation of in-place list reversal to be correct. That is to say, we will give a rigorous specification of in-place list reversal.

Let us suppose that in-place list reversal is implemented as a function rev taking the input list as argument. For example, rev(xs) could be defined using a call to the built-in reverse() method:

def rev(xs): xs.reverse()

In general, for rev to be a correct implementation of in-place list reversal, then after calling rev(xs) it should be the case that

xs_output[i] == xs_input[n-1-i] for all i in range(n)

where xs_input is the original state of the list xs before the call, xs_output is the state of the list after the call, and n is its original length. Here is a Python function that checks this correctness criterion for any given list reversal function revfn on any given input list xs (while leaving the input list unchanged):

def check_rev(revfn, xs):
    n = len(xs)
    xs_copy = xs[:]
    revfn(xs_copy)
    return all (xs_copy[i] == xs[n-1-i] for i in range(n))

An implementation revfn of in-place list reversal is correct just in case check_rev(revfn, xs) evaluates to True for all possible input lists xs. Of course it is not possible to test this automatically, since there are infinitely many lists! So if we want to convince ourselves that an implementation revfn is correct, we have to reason carefully about how it achieves its task. On the other hand, to show that an implementation is incorrect it suffices to find a single counterexample, in the form of a list xs such that check_rev(revfn, xs) evaluates to False.

Now, the first algorithm for in-place list reversal that we consider works by successively swapping elements at opposite ends of the list while moving towards the middle, as illustrated by the following diagram:

Here is a plausible-looking, but buggy, implementation of the algorithm:

def rev1_buggy(xs):
    n = len(xs)
    for k in range(n):
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t

Try running rev1_buggy on some examples, and then give an example of a list xs such that check_rev(rev1_buggy, xs) evaluates to False. You should do so by adding the lines below to your Python file while replacing ... with an appropriate value:
```
xs1 = ...
assert check_rev(rev1_buggy, xs1) == False
```
Then run the file. The assert command causes Python to check that the boolean condition evaluates to True, which in this case will verify that xs1 is indeed a counterexample to the correctness of rev1_buggy. If the condition evaluates to False (i.e., xs1 is not an actual counterexample), then it will raise an AssertionError.

By now you may have noticed the bug that is the reason for the incorrectness of rev1_buggy. (Have you? If not, try to find it before clicking below to reveal a corrected implementation.)

(click to reveal)

def rev1(xs):
    n = len(xs)
    for k in range(n//2):  # corrected number of loop iterations
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t

You can verify that the corrected in-place list reversal function rev1 deals with your previous counterexample by adding the following line to your Python file and reloading it.

assert check_rev(rev1, xs1)

Of course, this is only a single example, not a proof of correctness on all possible inputs. To convince ourselves that rev1 is a correct implementation of in-place list reversal, we need to try to understand how it goes about reversing a list. For this kind of iterative algorithm, the key is to identify an appropriate “loop invariant”, that is, a condition that is true at the end of each iteration of the for loop, and which implies that the correctness criterion for in-place list reversal holds once all iterations have been completed.

You may have run into this idea before if you are taking CSE103. Here we are not asking you to construct a formal proof in the style of Floyd-Hoare logic, but instead to formulate the loop invariant as a checkable Python condition, and to argue informally that it is always valid and implies the correctness criterion. You should start with the following instrumented version of the corrected implementation:

def rev1_checked(xs):
    xs_input = xs[:]  # we copy the original input list to be able to refer to it in the invariant
    n = len(xs)
    for k in range(n//2):
        t = xs[k]
        xs[k] = xs[n-1-k]
        xs[n-1-k] = t
        assert True   # a formula that should hold at the end of the kth iteration of the loop

The loop invariant that we asserted above is trivial. Try replacing True by another formula Inv(k), and run rev1_checked on some examples to convince yourself that the assertion does not raise an error, i.e., that the formula is really true at the end of each loop iteration. The formula Inv(k) should at least involve the loop variable k, but can also mention the variables n, xs, and xs_input, as well as any other variables that you introduce. As an example of a valid but not so interesting invariant, you can take Inv(k) = k < n//2.
Find a loop invariant Inv(k) that is strong enough to imply the correctness of rev1. You should state the invariant as an assertion, and you should also include a comment arguing why it implies that rev1 is a correct implementation of in-place list reversal.

Hint: You need to show that when k = n//2-1 (i.e., at the end of the last iteration of the loop), then Inv(k) implies that xs[i] == xs_input[n-1-i] for all i in range(n) (i.e., that the list is reversed relative to its original state). If the invariant that you chose above does not imply this, you may need to go back and find a stronger invariant.

An alternative algorithm for reversing the elements of a list works by successively swapping elements to “bubble” up values towards the end of the list, as illustrated by the following diagram:

(This algorithm involves a lot more swapping than the first one, though it only uses swaps of adjacent elements.)

And here is a slightly buggy implementation:

def rev2_buggy(xs):
    n = len(xs)
    for m in range(n,1,-1):
        for i in range(m):
            t = xs[i]
            xs[i] = xs[i+1]
            xs[i+1] = t

Try running rev2_buggy on some examples, and then give an example of a list xs such that check_rev(rev2_buggy, xs) does not evaluate to True.
```
xs2 = ...  # define your counterexample here
```

Figure out how to fix the bug in rev2_buggy, before clicking below to reveal a corrected implementation.

(click to reveal)

def rev2(xs):
    n = len(xs)
    for m in range(n,1,-1):
        for i in range(m-1):  # corrected upper bound
            t = xs[i]
            xs[i] = xs[i+1]
            xs[i+1] = t

You will have noticed that the original bug led to check_rev(rev2_buggy, xs2) raising an IndexError exception, rather than returning False. It was implicit in our specification of in-place list reversal that the reversal function should successfully terminate on any input list, without raising an exception or going into an infinite loop. We can improve the check to test explicitly that the function does not raise an exception, as below:

def improved_check_rev(revfn, xs):
    n = len(xs)
    xs_copy = xs[:]
    try:
        revfn(xs_copy)
    except Exception as e:
        return False
    return all (xs_copy[i] == xs[n-1-i] for i in range(n))

(We could also potentially add a timeout to ensure that the call to revfn returns within a reasonable amount of time, although this requires some library support in Python.) Now we can include assertions verifying explicitly that rev2_buggy fails the improved check on the counterexample, and that rev2 passes it.

assert improved_check_rev(rev2_buggy, xs2) == False
assert improved_check_rev(rev2, xs2)

On the other hand, all bets are off if the reversal function is called with something that is not a list. For example, although calling rev2(3.14) raises an exception (TypeError: 'float' object is not subscriptable), this is not considered a specification violation, since the specification states that the input argument should be a list.

The goal of the next optional series of exercises is to explain (informally but rigorously) why rev2 is a correct implementation of in-place list reversal. As we did above for rev1, we’ll start from an instrumented version with placeholders for loop invariants:

def rev2_checked(xs):
    xs_input = xs[:]
    n = len(xs)
    for m in range(n,1,-1):
        for k in range(m-1):
            t = xs[k]
            xs[k] = xs[k+1]
            xs[k+1] = t
            assert True  # inner loop invariant
        assert True  # outer loop invariant

Replace the outer loop invariant by a non-trivial formula Outer(m), which should hold at the end of each outer loop iteration. (The variable m counts down, so this corresponds to the end of the n-mth outer loop iteration.) Run rev2_checked on some example lists to make sure that the invariant is really true.
Find an outer loop invariant that is strong enough to imply the correctness of rev2. (You do not need to prove that the invariant is correct, although you should check this on examples.)

Hint: Examine the “braid diagram” above and try to write a formula that describes the state of the diagram every time a strand reaches its final position, i.e., at the end of a series of swaps. You may want to decompose Outer(m) as the conjunction of a pair of formulas Outer1(m) and Outer2(m), which you can assert individually.
Replace the inner loop invariant by a non-trivial formula Inner(m,k), and progressively refine it until you have a valid inner loop invariant that is strong enough to imply the outer loop invariant.

Hint: You need to show that Inner(m,k) implies Outer(m) when k = m-2. You may want to use the outer loop invariant as a guide for constructing the inner loop invariant, and again you may want to decompose Inner(m,k) as a conjunction of invariants, and it may be helpful to look back at the diagram.

Testing through exhaustive enumeration

In Tutorial 5 you implemented various operations on binary search trees, including the tricky deletion operation bst_remove which required some careful coding to avoid bugs. Fortunately (or unfortunately!) for you, the submission server’s automated testing was able to frequently identify examples on which your code gave incorrect results. We will now have a look retrospectively to see how you could have done the job of the server on your own.

Let’s begin by recalling the Node class for binary trees, which we’ve augmented with __repr__ and __eq__ methods for the purposes of debugging.

class Node:
    def __init__(self, value, left = None, right = None):
        self.value = value
        self.left = left
        self.right = right
    def __repr__(self):
        return f'Node{(self.value, self.left, self.right)}' if self.left or self.right \
            else f'Node({self.value})'
    def __eq__(self, other):
        return other is not None and \
            self.value == other.value and \
            self.left == other.left and self.right == other.right

Also for the purpose of writing checks, we add a copy method, which returns a fresh copy of a tree.

    def copy(self):
        left  = self.left.copy() if self.left else None
        right = self.right.copy() if self.right else None
        return Node(self.value, left, right)

Your first programming task will be to write a function check_bst_remove that checks whether a given implementation of BST deletion behaves correctly on a given input. But before that, we need to make more precise what exactly defines “correct” behavior. This requires some thought and discussion, because there are minor ways that algorithms for BST deletion may differ. For example, the algorithm described in Tutorial 5 says that in the case where the node \(N\) with the value to be deleted has two children you should replace its value by that of its inorder successor — but it would work just as well to take the inorder predecessor. Here are implementations of both approaches:

def bst_remove1(node, x):
    if node is None:
        return None

    if x < node.value:
        node.left = bst_remove1(node.left, x)
    elif x > node.value:
        node.right = bst_remove1(node.right, x)
    else: 
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
            
        parent, current = node, node.right
        while current.left is not None:
            parent, current = current, current.left
        if parent is node:
            parent.right = current.right
        else:
            parent.left  = current.right
        node.value = current.value
  
    return node

def bst_remove2(node, x):
    if node is None:
        return None

    if x < node.value:
        node.left = bst_remove2(node.left, x)
    elif x > node.value:
        node.right = bst_remove2(node.right, x)
    else: 
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
            
        parent, current = node, node.left
        while current.right is not None:
            parent, current = current, current.right
        if parent is node:
            parent.left = current.left
        else:
            parent.right  = current.left
        node.value = current.value
  
    return node

Both of these implementations may be considered as “correct”, even though they sometimes return different outputs.

Propose a precise specification of the BST deletion operation that is sufficiently general to be satisfied by both of the implementations bst_remove1 and bst_remove2.
Write a function check_bst_remove(delfn, node, x) that determines whether delfn behaves according to this specification of BST deletion, on the given input values node and x. The check should return True if delfn returns a correct output on the given input, and False if it returns an incorrect output or raises an exception. (You do not have to check for non-termination.)

An implementation delfn of BST deletion is correct just in case it satisfies the specification for all inputs. Again, since there are infinitely many possible inputs, no amount of testing will guarantee correctness, and the only way to know this for sure is to carefully examine the code and come up with a rigorous explanation for why it works. However, testing can often be used to quickly come up with counterexamples to correctness, which can be a guide in debugging.

Two important testing techniques are exhaustive enumeration and random generation. Under exhaustive enumeration, one generates all possible inputs up to a given size and uses those as test cases. Since the number of possible inputs often grows exponentially as a function of the size, this approach can only be used for testing with relatively small inputs, although it is still useful because many times small inputs suffice to find shallow bugs. Under random generation, larger objects are sampled probabilistically and used as test cases. Repeated sampling is then used to either come up with a counterexample or provide some evidence for correctness — although again it is no guarantee.

We will only consider exhaustive enumeration in this tutorial — though the submission server uses both exhaustive enumeration and random generation, as well as testing on specific examples.

To test implementations of BST deletion, we need to generate pairs of a binary search tree and a value to delete. For simplicity, we will restrict to BSTs with \(n\) nodes labelled by distinct values \(0, \dots, n-1\), and assume that the value to be deleted is an integer between \(0\) and \(n\) (thus either has exactly one occurrence in the tree, or none). These are pretty safe assumptions, since most reasonable implementations of BST deletion will not depend on the specific values stored in a tree, only making use of relative comparisons. (It is possible, though, that some implementations may have bugs in situations where the same value is stored at different nodes of the tree, which would not be caught if we limit ourselves to the space of BSTs with nodes labelled by distinct values.)

Write a function all_bst_list(n) that takes as input a non-negative integer n, and returns a list of all possible BSTs with n nodes labelled by distinct values between 0 and n-1. For example, all_bst_list(3) should return the list
```
[Node(0, None, Node(1, None, Node(2))), Node(0, None, Node(2, Node(1), None)), Node(1, Node(0), Node(2)), Node(2, Node(0, None, Node(1)), None), Node(2, Node(1, Node(0), None), None)]
```
(or some permutation thereof), corresponding to the five BSTs with three nodes:

Hint: using recursion, define a more general function all_bst_list(n, x=0) which returns a list of all BSTs with n nodes labelled between x and x+n-1. To set up the recursion, use the fact that any binary tree of size n>0 has a left child of size k and a right child of size n-1-k, for some k between 0 and n-1. The nodes of the left child should be labelled x , ... , x+k-1, and the nodes of the right child x+k+1 , ... , x+n-1.

In Python, this kind of exhaustive enumeration is better performed using generators rather than lists, since it allows for objects to be used as soon as they are generated, and doesn’t require the complete list of objects to be stored in memory. Rewrite all_bst_list as a generator all_bst_gen, which successively yields all BSTs of size n.

Using your solutions to the previous questions, write a testing function test_bst_remove(delfn, maxn) that checks that delfn is correct on all inputs consisting of a BST node of size n <= maxn and a value x between 0 and n. In the case that delfn fails on any such input, it should return the input pair (node, x). Otherwise — that is, if no counterexample to correctness was found — it should return None.

Now it’s time to test your testers!

The file buggy.py contains several buggy implementations of BST deletion. By calling test_bst_remove with a sufficiently large maxn, you should be able to find correctness counterexamples for all of them. For example, it is particulary easy to find a counterexample for buggy.del0:

In [4]: test_bst_remove(buggy.del0, 3)
Out[4]: (Node(0), 0)

Once you have found these counterexamples, record them as assertions in your Python file:

import buggy

assert check_bst_remove(buggy.del0, Node(0), 0) == False
# assert check_bst_remove(buggy.del1, ...) == False
# assert check_bst_remove(buggy.del2, ...) == False
# assert check_bst_remove(buggy.del3, ...) == False
# assert check_bst_remove(buggy.del4, ...) == False
# assert check_bst_remove(buggy.del5, ...) == False