CSE 102 – Tutorial 9 – Specification, testing, and proof
General instructions
Sections tagged with a green marginal line are mandatory.
Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.
Warning: this tutorial is a little different from the previous ones! The goal of this tutorial is to develop better skills in debugging and reasoning about your code on your own, and so there are no automated tests on the submission server. You should nevertheless upload your solutions to the server after verifying the correctness of your code on your own (as usual, you can upload as many times as you like).
We expect you to write your solutions to all of problems in this
tutorial in a file named spec.py and to upload it using the
form below.
Rigorous specification and proof
In-place list reversal is the process of reversing the order
of the elements of a list without creating a new list. In Python, this
can be achieved by calling the built-in reverse()
method:
In [1]: xs = [0, 1, 2, 3, 4, 5]
In [2]: xs.reverse()
In [3]: xs
Out[3]: [5, 4, 3, 2, 1, 0]We will consider here two different high-level algorithms for performing in-place list reversal, as well as the corresponding low-level implementations that initially turn out to be incorrect. But before describing the algorithms, we will try to be more precise about what it means for an implementation of in-place list reversal to be correct. That is to say, we will give a rigorous specification of in-place list reversal.
Let us suppose that in-place list reversal is implemented as a
function rev taking the input list as argument. For
example, rev(xs) could be defined using a call to the
built-in reverse() method:
def rev(xs): xs.reverse()In general, for rev to be a correct implementation of
in-place list reversal, then after calling rev(xs) it
should be the case that
xs_output[i] == xs_input[n-1-i] for all i in
range(n)
where xs_input is the original state of the list
xs before the call, xs_output is the state of
the list after the call, and n is its original length. Here
is a Python function that checks this correctness criterion for any
given list reversal function revfn on any given input list
xs (while leaving the input list unchanged):
def check_rev(revfn, xs):
n = len(xs)
xs_copy = xs[:]
revfn(xs_copy)
return all (xs_copy[i] == xs[n-1-i] for i in range(n))An implementation revfn of in-place list reversal is
correct just in case check_rev(revfn, xs) evaluates to
True for all possible input lists xs.
Of course it is not possible to test this automatically, since there are
infinitely many lists! So if we want to convince ourselves that an
implementation revfn is correct, we have to reason
carefully about how it achieves its task. On the other hand, to show
that an implementation is incorrect it suffices to find a
single counterexample, in the form of a list xs such that
check_rev(revfn, xs) evaluates to False.
Now, the first algorithm for in-place list reversal that we consider works by successively swapping elements at opposite ends of the list while moving towards the middle, as illustrated by the following diagram:
Here is a plausible-looking, but buggy, implementation of the algorithm:
def rev1_buggy(xs):
n = len(xs)
for k in range(n):
t = xs[k]
xs[k] = xs[n-1-k]
xs[n-1-k] = tTry running
rev1_buggyon some examples, and then give an example of a listxssuch thatcheck_rev(rev1_buggy, xs)evaluates toFalse. You should do so by adding the lines below to your Python file while replacing...with an appropriate value:xs1 = ... assert check_rev(rev1_buggy, xs1) == FalseThen run the file. The
assertcommand causes Python to check that the boolean condition evaluates toTrue, which in this case will verify thatxs1is indeed a counterexample to the correctness ofrev1_buggy. If the condition evaluates toFalse(i.e.,xs1is not an actual counterexample), then it will raise anAssertionError.
By now you may have noticed the bug that is the reason for the
incorrectness of rev1_buggy. (Have you? If not, try to find
it before clicking below to reveal a corrected implementation.)
(click to reveal)
def rev1(xs):
n = len(xs)
for k in range(n//2): # corrected number of loop iterations
t = xs[k]
xs[k] = xs[n-1-k]
xs[n-1-k] = tYou can verify that the corrected in-place list reversal function
rev1 deals with your previous counterexample by adding the
following line to your Python file and reloading it.
assert check_rev(rev1, xs1)Of course, this is only a single example, not a proof of correctness
on all possible inputs. To convince ourselves that rev1 is
a correct implementation of in-place list reversal, we need to try to
understand how it goes about reversing a list. For this kind of
iterative algorithm, the key is to identify an appropriate “loop
invariant”, that is, a condition that is true at the end of each
iteration of the for loop, and which implies that the
correctness criterion for in-place list reversal holds once all
iterations have been completed.
You may have run into this idea before if you are taking CSE103. Here we are not asking you to construct a formal proof in the style of Floyd-Hoare logic, but instead to formulate the loop invariant as a checkable Python condition, and to argue informally that it is always valid and implies the correctness criterion. You should start with the following instrumented version of the corrected implementation:
def rev1_checked(xs):
xs_input = xs[:] # we copy the original input list to be able to refer to it in the invariant
n = len(xs)
for k in range(n//2):
t = xs[k]
xs[k] = xs[n-1-k]
xs[n-1-k] = t
assert True # a formula that should hold at the end of the kth iteration of the loopThe loop invariant that we asserted above is trivial. Try replacing
Trueby another formulaInv(k), and runrev1_checkedon some examples to convince yourself that the assertion does not raise an error, i.e., that the formula is really true at the end of each loop iteration. The formulaInv(k)should at least involve the loop variablek, but can also mention the variablesn,xs, andxs_input, as well as any other variables that you introduce. As an example of a valid but not so interesting invariant, you can takeInv(k) = k < n//2.Find a loop invariant
Inv(k)that is strong enough to imply the correctness ofrev1. You should state the invariant as an assertion, and you should also include a comment arguing why it implies thatrev1is a correct implementation of in-place list reversal.Hint: You need to show that when
k = n//2-1(i.e., at the end of the last iteration of the loop), thenInv(k)implies thatxs[i] == xs_input[n-1-i]for alliinrange(n)(i.e., that the list is reversed relative to its original state). If the invariant that you chose above does not imply this, you may need to go back and find a stronger invariant.
An alternative algorithm for reversing the elements of a list works by successively swapping elements to “bubble” up values towards the end of the list, as illustrated by the following diagram:
(This algorithm involves a lot more swapping than the first one, though it only uses swaps of adjacent elements.)
And here is a slightly buggy implementation:
def rev2_buggy(xs):
n = len(xs)
for m in range(n,1,-1):
for i in range(m):
t = xs[i]
xs[i] = xs[i+1]
xs[i+1] = tTry running
rev2_buggyon some examples, and then give an example of a listxssuch thatcheck_rev(rev2_buggy, xs)does not evaluate toTrue.xs2 = ... # define your counterexample here
Figure out how to fix the bug in rev2_buggy, before
clicking below to reveal a corrected implementation.
(click to reveal)
def rev2(xs):
n = len(xs)
for m in range(n,1,-1):
for i in range(m-1): # corrected upper bound
t = xs[i]
xs[i] = xs[i+1]
xs[i+1] = tYou will have noticed that the original bug led to
check_rev(rev2_buggy, xs2) raising an
IndexError exception, rather than returning
False. It was implicit in our specification of in-place
list reversal that the reversal function should successfully terminate
on any input list, without raising an exception or going into an
infinite loop. We can improve the check to test explicitly that the
function does not raise an exception, as below:
def improved_check_rev(revfn, xs):
n = len(xs)
xs_copy = xs[:]
try:
revfn(xs_copy)
except Exception as e:
return False
return all (xs_copy[i] == xs[n-1-i] for i in range(n))(We could also potentially add a timeout to ensure that the call to
revfn returns within a reasonable amount of time, although
this requires some library support in Python.) Now we can include
assertions verifying explicitly that rev2_buggy fails the
improved check on the counterexample, and that rev2 passes
it.
assert improved_check_rev(rev2_buggy, xs2) == False
assert improved_check_rev(rev2, xs2)On the other hand, all bets are off if the reversal function is
called with something that is not a list. For example, although calling
rev2(3.14) raises an exception
(TypeError: 'float' object is not subscriptable), this is
not considered a specification violation, since the specification states
that the input argument should be a list.
The goal of the next optional series of exercises is to explain
(informally but rigorously) why rev2 is a correct
implementation of in-place list reversal. As we did above for
rev1, we’ll start from an instrumented version with
placeholders for loop invariants:
def rev2_checked(xs):
xs_input = xs[:]
n = len(xs)
for m in range(n,1,-1):
for k in range(m-1):
t = xs[k]
xs[k] = xs[k+1]
xs[k+1] = t
assert True # inner loop invariant
assert True # outer loop invariantReplace the outer loop invariant by a non-trivial formula
Outer(m), which should hold at the end of each outer loop iteration. (The variablemcounts down, so this corresponds to the end of then-mth outer loop iteration.) Runrev2_checkedon some example lists to make sure that the invariant is really true.Find an outer loop invariant that is strong enough to imply the correctness of
rev2. (You do not need to prove that the invariant is correct, although you should check this on examples.)Hint: Examine the “braid diagram” above and try to write a formula that describes the state of the diagram every time a strand reaches its final position, i.e., at the end of a series of swaps. You may want to decompose
Outer(m)as the conjunction of a pair of formulasOuter1(m)andOuter2(m), which you can assert individually.Replace the inner loop invariant by a non-trivial formula
Inner(m,k), and progressively refine it until you have a valid inner loop invariant that is strong enough to imply the outer loop invariant.Hint: You need to show that
Inner(m,k)impliesOuter(m)whenk = m-2. You may want to use the outer loop invariant as a guide for constructing the inner loop invariant, and again you may want to decomposeInner(m,k)as a conjunction of invariants, and it may be helpful to look back at the diagram.
Testing through combinatorial generation
In Tutorial 5 you implemented various operations on binary search
trees, including the tricky deletion operation bst_remove
which required some careful coding to avoid bugs. Fortunately (or
unfortunately!) for you, the submission server’s automated testing was
able to frequently identify counterexamples on which your code gave
incorrect results. We will now have a retrospective look to see how you
could have found these counterexamples on your own.
Let’s begin by recalling the Node class for binary
trees, which we’ve augmented with __repr__ and
__eq__ methods for the purposes of debugging.
class Node:
def __init__(self, value, left = None, right = None):
self.value = value
self.left = left
self.right = right
def __repr__(self):
return f'Node{(self.value, self.left, self.right)}' if self.left or self.right \
else f'Node({self.value})'
def __eq__(self, other):
return other is not None and \
self.value == other.value and \
self.left == other.left and self.right == other.rightAlso for the purpose of writing checks, we add a copy
method, which returns a fresh copy of a tree.
def copy(self):
left = self.left.copy() if self.left else None
right = self.right.copy() if self.right else None
return Node(self.value, left, right)Your first programming task will be to write a function
check_bst_remove that checks whether a given implementation
of BST deletion behaves correctly on a given input. But before that, we
need to make more precise what exactly defines “correct” behavior.
This requires some thought because there are minor ways that algorithms for BST deletion may differ. For example, the algorithm explained in TD05 makes precise that in the case where the node \(N\) with the value to be deleted has two children you should replace its value by that of its inorder successor — but it would work just as well to take the inorder predecessor. Here are implementations of both approaches:
def bst_remove1(node, x):
if node is None:
return None
if x < node.value:
node.left = bst_remove1(node.left, x)
elif x > node.value:
node.right = bst_remove1(node.right, x)
else:
if node.left is None:
return node.right
if node.right is None:
return node.left
parent, current = node, node.right
while current.left is not None:
parent, current = current, current.left
if parent is node:
parent.right = current.right
else:
parent.left = current.right
node.value = current.value
return node def bst_remove2(node, x):
if node is None:
return None
if x < node.value:
node.left = bst_remove2(node.left, x)
elif x > node.value:
node.right = bst_remove2(node.right, x)
else:
if node.left is None:
return node.right
if node.right is None:
return node.left
parent, current = node, node.left
while current.right is not None:
parent, current = current, current.right
if parent is node:
parent.left = current.left
else:
parent.right = current.left
node.value = current.value
return nodeBoth of these implementations may be considered as correct, even though they sometimes return different outputs.
Propose a precise specification of the BST deletion operation that is sufficiently general to be satisfied by both of the implementations
bst_remove1andbst_remove2.Write a function
check_bst_remove(delfn, bin, x)that determines whetherdelfnbehaves according to this specification of BST deletion, on the given input valuesbinandx. The check should returnTrueifdelfnreturns a correct output on the given input, andFalseif it returns an incorrect output or raises an exception. (You do not have to check for non-termination.)
An implementation delfn of BST deletion is correct just
in case it satisfies the specification for all inputs. Again,
since there are infinitely many possible inputs, no amount of testing
will guarantee correctness, and the only way to know this for sure is to
carefully examine the code and come up with a rigorous explanation for
why it works. However, testing can often be used to quickly come up with
counterexamples to correctness, which can be a guide in debugging. Two
important techniques are exhaustive testing and
random testing.
Under exhaustive testing, one generates all possible inputs up to a given size and uses those as test cases. Since the number of possible inputs often grows exponentially as a function of the size, this approach can only be used for testing with relatively small inputs, although it is still useful because often small inputs suffice to find shallow bugs. Under random testing, larger objects are sampled probabilistically and used as test cases. Repeated sampling is then used to try to come up with a counterexample— although again the failure to find one is no guarantee of correctness.
To test implementations of BST deletion, we need to generate pairs of a binary search tree and a value to delete. For simplicity, we will restrict to BSTs with \(n\) nodes labelled by distinct values \(0, \dots, n-1\), and assume that the value to be deleted is an integer between \(0\) and \(n\), and thus either has exactly one occurrence in the tree, or none. (These are pretty safe assumptions, since most reasonable implementations of BST deletion will not depend on the specific values stored in a tree, only making use of relative comparisons. It is possible, though, that some implementations may have bugs in situations where the same value is stored at different nodes of the tree, which would not be caught if we limit ourselves to the space of BSTs with nodes labelled by distinct values.)
Using the procedure for exhaustive generation of binary search trees seen in the lecture on combinatorial generation as well as your function
check_bst_removeabove, write a testing functiontestall_bst_remove(delfn, maxn)that checks thatdelfnis correct on all inputs consisting of a BSTbinof sizen <= maxnand a valuexbetween0andn. In the case thatdelfnfails on any such input, it should return the input pair(bin, x). Otherwise — that is, if no counterexample to correctness was found — it should returnNone.For your convenience, here is the recursive exhaustive BST generator seen in class:
def bsts_rec(n, x=0): '''generate all BSTs with n nodes and smallest label x''' if n == 0: yield None else: for k in range(n): for tL in bsts_rec(k, x): for tR in bsts_rec(n-1-k, x+k+1): yield Node(x+k, tL, tR)Using the procedure for random generation of binary search trees seen in the lecture on combinatorial generation as well as your function
check_bst_removeabove, write a testing functiontestrnd_bst_remove(delfn, n, k)that checks thatdelfnis correct on random inputs consisting of a BSTbinof size exactlynand a valuexbetween0andn. In the case thatdelfnfails on any such input overktrials, it should return the input pair(bin, x). Otherwise — that is, if no counterexample to correctness was found — it should returnNone.Note that uniform random generation is not important for this exercise, since we are just looking to generate counterexamples and not estimate some statistical parameters. You can therefore use the simple function
randbst_naiveseen in class, which we recall below for convenience:def randbst_naive(n, x=0): if n == 0: return None else: k = random.randrange(n) tL = randbst_naive(k, x) tR = randbst_naive(n-1-k, x+k+1) return Node(x+k, tL, tR)
Now it’s time to test your testers!
The file buggy.py contains several
buggy implementations of BST deletion. By calling
testall_bst_remove or testrnd_bst_remove with
sufficiently large values, you should be able to find correctness
counterexamples for all of them. For example, it is particulary easy to
find a counterexample for buggy.del0:
def del0(node, x):
if node is None:
return NoneIn [4]: testall_bst_remove(buggy.del0, 1)
Out[4]: (Node(0), 1)Once you have found these counterexamples, record them as assertions in your Python file:
import buggy assert not check_bst_remove(buggy.del0, Node(0), 1) # assert not check_bst_remove(buggy.del1, ...) # assert not check_bst_remove(buggy.del2, ...) # assert not check_bst_remove(buggy.del3, ...) # assert not check_bst_remove(buggy.del4, ...) # assert not check_bst_remove(buggy.del5, ...) # assert not check_bst_remove(buggy.del6, ...) # assert not check_bst_remove(buggy.del7, ...)
Verifying an implemention of binary heaps (optional)
Go back and review the definition of binary heaps, as seen in lecture. Here again is the indexing scheme for nodes that can be used to represent binary heaps efficiently as arrays:
def parent(k : int) -> int: return (k-1) >> 1
def left(k : int) -> int: return (1 + (k << 1))
def right(k : int)-> int: return (2 + (k << 1))We limit ourselves to heaps storing string values with associated
integer weights, which we represent as lists of string-integer pairs,
i.e., values of type list[tuple[str,int]]. The function
below checks that such a list satisfies the heap property that
the weight of each node is bounded from above by the weights of each of
its children, or equivalently, bounded from below by the weight of its
parent.
def is_heap(heap : list[tuple[str,int]]) -> bool:
for i in range(1,len(heap)):
if heap[i][1] < heap[parent(i)][1]:
return False
return TrueOur overall goal is to correctly implement the insertion (heappush) and removal (heappop) operations,
def heappush(H : list[tuple[str,int]], a : str, w : int) -> None:
pass
def heappop(H : list[tuple[str,int]]) -> tuple[str,int]:
passas explained during the lecture and illustrated below.
heappush:
Using the testing routines described during class to help you with debugging, implement
heappushandheappop. For convenience we copy all of the testing code below, which you are free to modify.class InvalidHeap(Exception): def __init__(self, H): self.H = H def heap_sort(xs : list[int]) -> list[int]: """sorts a list of integers using a binary heap""" H : list[tuple[str,int]] = [] for i in range(len(xs)): heappush(H, str(i), xs[i]) if not is_heap(H): raise InvalidHeap(H) output = [] while H: output.append(heappop(H)[1]) if not is_heap(H): raise InvalidHeap(H) return output def is_sorted(l): """returns True if the input list is sorted, and False otherwise.""" for i in range(len(l)-1): if l[i+1] < l[i]: return False return True def randperm(n): """returns a permutation of 1..n chosen uniformly at random""" if n == 0: return [] else: p = randperm(n-1) i = random.randrange(n) return p[:i] + [n] + p[i:] def test_heapsort(n, k=10000): """tests correctness of heappush and heappop by calling heap_sort on random permutations""" for _ in range(k): xs = randperm(n) assert is_sorted(heap_sort(xs))(Hard!) Once you are sufficiently convinced that your implementation of
heappushandheappopis correct, prove it by writing down a rigorous specification of both functions and identifying the necessary loop invariants. (You can also start directly from the implementations shown in the last two slides of Lecture 10, but writing and debugging your own code is a good exercise, and there is definitely room for more efficient implementations.)