CSE 102 – Tutorial 8 – Debugging through rigorous specification, informal proof, and testing
General instructions
Sections tagged with a green marginal line are mandatory.
Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.
Warning: this tutorial is a little different from the previous ones! The goal of this tutorial is to develop better skills in debugging and reasoning about your code on your own, and so there are no automated tests on the submission server. Still, you should upload your solutions to the submission server using the form below, so that it verifies that your Python file does not contain syntax errors, and so that the instructors have easy access to your code.
We expect you to write your solutions to all of problems in this
tutorial in a file named spec.py
and to upload it using the
form below. You can resubmit as many times as you like on the submission
server, and when you are finished you should make a Moodle submission
before the deadline.
Rigorous specification and informal proof
In-place list reversal is the process of reversing the order
of the elements of a list without creating a new list. In Python, this
can be achieved by calling the built-in reverse()
method:
1]: xs = [0, 1, 2, 3, 4, 5]
In [2]: xs.reverse()
In [3]: xs
In [3]: [5, 4, 3, 2, 1, 0] Out[
We will consider here two different high-level algorithms for performing in-place list reversal, as well as the corresponding low-level implementations that initially turn out to be incorrect. But before describing the algorithms, we will try to be more precise about what it means for an implementation of in-place list reversal to be correct. That is to say, we will give a rigorous specification of in-place list reversal.
Let us suppose that in-place list reversal is implemented as a
function rev
taking the input list as argument. For
example, rev(xs)
could be defined using a call to the
built-in reverse()
method:
def rev(xs): xs.reverse()
In general, for rev
to be a correct implementation of
in-place list reversal, then after calling rev(xs)
it
should be the case that
xs_output[i] == xs_input[n-1-i]
for all i
in
range(n)
where xs_input
is the original state of the list
xs
before the call, xs_output
is the state of
the list after the call, and n
is its original length. Here
is a Python function that checks this correctness criterion for any
given list reversal function revfn
on any given input list
xs
(while leaving the input list unchanged):
def check_rev(revfn, xs):
= len(xs)
n = xs[:]
xs_copy
revfn(xs_copy)return all (xs_copy[i] == xs[n-1-i] for i in range(n))
An implementation revfn
of in-place list reversal is
correct just in case check_rev(revfn, xs)
evaluates to
True
for all possible input lists xs
.
Of course it is not possible to test this automatically, since there are
infinitely many lists! So if we want to convince ourselves that an
implementation revfn
is correct, we have to reason
carefully about how it achieves its task. On the other hand, to show
that an implementation is incorrect it suffices to find a
single counterexample, in the form of a list xs
such that
check_rev(revfn, xs)
evaluates to False
.
Now, the first algorithm for in-place list reversal that we consider works by successively swapping elements at opposite ends of the list while moving towards the middle, as illustrated by the following diagram:

Here is a plausible-looking, but buggy, implementation of the algorithm:
def rev1_buggy(xs):
= len(xs)
n for k in range(n):
= xs[k]
t = xs[n-1-k]
xs[k] -1-k] = t xs[n
Try running
rev1_buggy
on some examples, and then give an example of a listxs
such thatcheck_rev(rev1_buggy, xs)
evaluates toFalse
. You should do so by adding the lines below to your Python file while replacing...
with an appropriate value:= ... xs1 assert check_rev(rev1_buggy, xs1) == False
Then run the file. The
assert
command causes Python to check that the boolean condition evaluates toTrue
, which in this case will verify thatxs1
is indeed a counterexample to the correctness ofrev1_buggy
. If the condition evaluates toFalse
(i.e.,xs1
is not an actual counterexample), then it will raise anAssertionError
.
By now you may have noticed the bug that is the reason for the
incorrectness of rev1_buggy
. (Have you? If not, try to find
it before clicking below to reveal a corrected implementation.)
(click to reveal)
def rev1(xs):
= len(xs)
n for k in range(n//2): # corrected number of loop iterations
= xs[k]
t = xs[n-1-k]
xs[k] -1-k] = t xs[n
You can verify that the corrected in-place list reversal function
rev1
deals with your previous counterexample by adding the
following line to your Python file and reloading it.
assert check_rev(rev1, xs1)
Of course, this is only a single example, not a proof of correctness
on all possible inputs. To convince ourselves that rev1
is
a correct implementation of in-place list reversal, we need to try to
understand how it goes about reversing a list. For this kind of
iterative algorithm, the key is to identify an appropriate “loop
invariant”, that is, a condition that is true at the end of each
iteration of the for
loop, and which implies that the
correctness criterion for in-place list reversal holds once all
iterations have been completed.
You may have run into this idea before if you are taking CSE103. Here we are not asking you to construct a formal proof in the style of Floyd-Hoare logic, but instead to formulate the loop invariant as a checkable Python condition, and to argue informally that it is always valid and implies the correctness criterion. You should start with the following instrumented version of the corrected implementation:
def rev1_checked(xs):
= xs[:] # we copy the original input list to be able to refer to it in the invariant
xs_input = len(xs)
n for k in range(n//2):
= xs[k]
t = xs[n-1-k]
xs[k] -1-k] = t
xs[nassert True # a formula that should hold at the end of the kth iteration of the loop
The loop invariant that we asserted above is trivial. Try replacing
True
by another formulaInv(k)
, and runrev1_checked
on some examples to convince yourself that the assertion does not raise an error, i.e., that the formula is really true at the end of each loop iteration. The formulaInv(k)
should at least involve the loop variablek
, but can also mention the variablesn
,xs
, andxs_input
, as well as any other variables that you introduce. As an example of a valid but not so interesting invariant, you can takeInv(k) = k < n//2
.Find a loop invariant
Inv(k)
that is strong enough to imply the correctness ofrev1
. You should state the invariant as an assertion, and you should also include a comment arguing why it implies thatrev1
is a correct implementation of in-place list reversal.Hint: You need to show that when
k = n//2-1
(i.e., at the end of the last iteration of the loop), thenInv(k)
implies thatxs[i] == xs_input[n-1-i]
for alli
inrange(n)
(i.e., that the list is reversed relative to its original state). If the invariant that you chose above does not imply this, you may need to go back and find a stronger invariant.
An alternative algorithm for reversing the elements of a list works by successively swapping elements to “bubble” up values towards the end of the list, as illustrated by the following diagram:

(This algorithm involves a lot more swapping than the first one, though it only uses swaps of adjacent elements.)
And here is a slightly buggy implementation:
def rev2_buggy(xs):
= len(xs)
n for m in range(n,1,-1):
for i in range(m):
= xs[i]
t = xs[i+1]
xs[i] +1] = t xs[i
Try running
rev2_buggy
on some examples, and then give an example of a listxs
such thatcheck_rev(rev2_buggy, xs)
does not evaluate toTrue
.= ... # define your counterexample here xs2
Figure out how to fix the bug in rev2_buggy
, before
clicking below to reveal a corrected implementation.
(click to reveal)
def rev2(xs):
= len(xs)
n for m in range(n,1,-1):
for i in range(m-1): # corrected upper bound
= xs[i]
t = xs[i+1]
xs[i] +1] = t xs[i
You will have noticed that the original bug led to
check_rev(rev2_buggy, xs2)
raising an
IndexError
exception, rather than returning
False
. It was implicit in our specification of in-place
list reversal that the reversal function should successfully terminate
on any input list, without raising an exception or going into an
infinite loop. We can improve the check to test explicitly that the
function does not raise an exception, as below:
def improved_check_rev(revfn, xs):
= len(xs)
n = xs[:]
xs_copy try:
revfn(xs_copy)except Exception as e:
return False
return all (xs_copy[i] == xs[n-1-i] for i in range(n))
(We could also potentially add a timeout to ensure that the call to
revfn
returns within a reasonable amount of time, although
this requires some library support in Python.) Now we can include
assertions verifying explicitly that rev2_buggy
fails the
improved check on the counterexample, and that rev2
passes
it.
assert improved_check_rev(rev2_buggy, xs2) == False
assert improved_check_rev(rev2, xs2)
On the other hand, all bets are off if the reversal function is
called with something that is not a list. For example, although calling
rev2(3.14)
raises an exception
(TypeError: 'float' object is not subscriptable
), this is
not considered a specification violation, since the specification states
that the input argument should be a list.
The goal of the next optional series of exercises is to explain
(informally but rigorously) why rev2
is a correct
implementation of in-place list reversal. As we did above for
rev1
, we’ll start from an instrumented version with
placeholders for loop invariants:
def rev2_checked(xs):
= xs[:]
xs_input = len(xs)
n for m in range(n,1,-1):
for k in range(m-1):
= xs[k]
t = xs[k+1]
xs[k] +1] = t
xs[kassert True # inner loop invariant
assert True # outer loop invariant
Replace the outer loop invariant by a non-trivial formula
Outer(m)
, which should hold at the end of each outer loop iteration. (The variablem
counts down, so this corresponds to the end of then-m
th outer loop iteration.) Runrev2_checked
on some example lists to make sure that the invariant is really true.Find an outer loop invariant that is strong enough to imply the correctness of
rev2
. (You do not need to prove that the invariant is correct, although you should check this on examples.)Hint: Examine the “braid diagram” above and try to write a formula that describes the state of the diagram every time a strand reaches its final position, i.e., at the end of a series of swaps. You may want to decompose
Outer(m)
as the conjunction of a pair of formulasOuter1(m)
andOuter2(m)
, which you can assert individually.Replace the inner loop invariant by a non-trivial formula
Inner(m,k)
, and progressively refine it until you have a valid inner loop invariant that is strong enough to imply the outer loop invariant.Hint: You need to show that
Inner(m,k)
impliesOuter(m)
whenk = m-2
. You may want to use the outer loop invariant as a guide for constructing the inner loop invariant, and again you may want to decomposeInner(m,k)
as a conjunction of invariants, and it may be helpful to look back at the diagram.
Testing through exhaustive enumeration
In Tutorial 5 you implemented various operations on binary search
trees, including the tricky deletion operation bst_remove
which required some careful coding to avoid bugs. Fortunately (or
unfortunately!) for you, the submission server’s automated testing was
able to frequently identify examples on which your code gave incorrect
results. We will now have a look retrospectively to see how you could
have done the job of the server on your own.
Let’s begin by recalling the Node
class for binary
trees, which we’ve augmented with __repr__
and
__eq__
methods for the purposes of debugging.
class Node:
def __init__(self, value, left = None, right = None):
self.value = value
self.left = left
self.right = right
def __repr__(self):
return f'Node{(self.value, self.left, self.right)}' if self.left or self.right \
else f'Node({self.value})'
def __eq__(self, other):
return other is not None and \
self.value == other.value and \
self.left == other.left and self.right == other.right
Also for the purpose of writing checks, we add a copy
method, which returns a fresh copy of a tree.
def copy(self):
= self.left.copy() if self.left else None
left = self.right.copy() if self.right else None
right return Node(self.value, left, right)
Your first programming task will be to write a function
check_bst_remove
that checks whether a given implementation
of BST deletion behaves correctly on a given input. But before that, we
need to make more precise what exactly defines “correct” behavior. This
requires some thought and discussion, because there are minor ways that
algorithms for BST deletion may differ. For example, the algorithm
described in Tutorial 5 says that in the case where the node \(N\) with the value to be deleted has two
children you should replace its value by that of its inorder successor —
but it would work just as well to take the inorder predecessor. Here are
implementations of both approaches:
def bst_remove1(node, x):
if node is None:
return None
if x < node.value:
= bst_remove1(node.left, x)
node.left elif x > node.value:
= bst_remove1(node.right, x)
node.right else:
if node.left is None:
return node.right
if node.right is None:
return node.left
= node, node.right
parent, current while current.left is not None:
= current, current.left
parent, current if parent is node:
= current.right
parent.right else:
= current.right
parent.left = current.value
node.value
return node
def bst_remove2(node, x):
if node is None:
return None
if x < node.value:
= bst_remove2(node.left, x)
node.left elif x > node.value:
= bst_remove2(node.right, x)
node.right else:
if node.left is None:
return node.right
if node.right is None:
return node.left
= node, node.left
parent, current while current.right is not None:
= current, current.right
parent, current if parent is node:
= current.left
parent.left else:
= current.left
parent.right = current.value
node.value
return node
Both of these implementations may be considered as “correct”, even though they sometimes return different outputs.
Propose a precise specification of the BST deletion operation that is sufficiently general to be satisfied by both of the implementations
bst_remove1
andbst_remove2
.Write a function
check_bst_remove(delfn, node, x)
that determines whetherdelfn
behaves according to this specification of BST deletion, on the given input valuesnode
andx
. The check should returnTrue
ifdelfn
returns a correct output on the given input, andFalse
if it returns an incorrect output or raises an exception. (You do not have to check for non-termination.)
An implementation delfn
of BST deletion is correct just
in case it satisfies the specification for all inputs. Again,
since there are infinitely many possible inputs, no amount of testing
will guarantee correctness, and the only way to know this for sure is to
carefully examine the code and come up with a rigorous explanation for
why it works. However, testing can often be used to quickly come up with
counterexamples to correctness, which can be a guide in debugging.
Two important testing techniques are exhaustive enumeration and random generation. Under exhaustive enumeration, one generates all possible inputs up to a given size and uses those as test cases. Since the number of possible inputs often grows exponentially as a function of the size, this approach can only be used for testing with relatively small inputs, although it is still useful because many times small inputs suffice to find shallow bugs. Under random generation, larger objects are sampled probabilistically and used as test cases. Repeated sampling is then used to either come up with a counterexample or provide some evidence for correctness — although again it is no guarantee.
We will only consider exhaustive enumeration in this tutorial — though the submission server uses both exhaustive enumeration and random generation, as well as testing on specific examples.
To test implementations of BST deletion, we need to generate pairs of a binary search tree and a value to delete. For simplicity, we will restrict to BSTs with \(n\) nodes labelled by distinct values \(0, \dots, n-1\), and assume that the value to be deleted is an integer between \(0\) and \(n\) (thus either has exactly one occurrence in the tree, or none). These are pretty safe assumptions, since most reasonable implementations of BST deletion will not depend on the specific values stored in a tree, only making use of relative comparisons. (It is possible, though, that some implementations may have bugs in situations where the same value is stored at different nodes of the tree, which would not be caught if we limit ourselves to the space of BSTs with nodes labelled by distinct values.)
Write a function
all_bst_list(n)
that takes as input a non-negative integern
, and returns a list of all possible BSTs withn
nodes labelled by distinct values between0
andn-1
. For example,all_bst_list(3)
should return the list[Node(0, None, Node(1, None, Node(2))), Node(0, None, Node(2, Node(1), None)), Node(1, Node(0), Node(2)), Node(2, Node(0, None, Node(1)), None), Node(2, Node(1, Node(0), None), None)]
(or some permutation thereof), corresponding to the five BSTs with three nodes:
Hint: using recursion, define a more general function
all_bst_list(n, x=0)
which returns a list of all BSTs withn
nodes labelled betweenx
andx+n-1
. To set up the recursion, use the fact that any binary tree of sizen>0
has a left child of sizek
and a right child of sizen-1-k
, for somek
between0
andn-1
. The nodes of the left child should be labelledx , ... , x+k-1
, and the nodes of the right childx+k+1 , ... , x+n-1
.
- In Python, this kind of exhaustive enumeration is better performed
using generators rather than lists, since it allows for objects to be
used as soon as they are generated, and doesn’t require the complete
list of objects to be stored in memory. Rewrite
all_bst_list
as a generatorall_bst_gen
, which successively yields all BSTs of sizen
.
- Using your solutions to the previous questions, write a testing
function
test_bst_remove(delfn, maxn)
that checks thatdelfn
is correct on all inputs consisting of a BSTnode
of sizen <= maxn
and a valuex
between0
andn
. In the case thatdelfn
fails on any such input, it should return the input pair(node, x)
. Otherwise — that is, if no counterexample to correctness was found — it should returnNone
.
Now it’s time to test your testers!
The file buggy.py contains several
buggy implementations of BST deletion. By calling
test_bst_remove
with a sufficiently large
maxn
, you should be able to find correctness
counterexamples for all of them. For example, it is particulary easy to
find a counterexample for buggy.del0
:
4]: test_bst_remove(buggy.del0, 3)
In [4]: (Node(0), 0) Out[
Once you have found these counterexamples, record them as assertions in your Python file:
import buggy assert check_bst_remove(buggy.del0, Node(0), 0) == False # assert check_bst_remove(buggy.del1, ...) == False # assert check_bst_remove(buggy.del2, ...) == False # assert check_bst_remove(buggy.del3, ...) == False # assert check_bst_remove(buggy.del4, ...) == False # assert check_bst_remove(buggy.del5, ...) == False