Tutorial 13: Recursion II
Log in to enable code uploads
You need to log in to be able to upload your work as you go. Use your Polytechnique LDAP login here.
Setup: before you start
Create a new empty project in Spyder, and call it “Tutorial_13”.
Now create a new program file, parsing.py
. All the
functions you write have to be added to this file.
Files to download
You will need to download the following files, right-clicking on the links and selecting “Save link as…” to save each in your Tutorial_13 project directory.
Context and Objectives
In this tutorial, we will dive deeper into recursive functions namely writing a parser for arithmetic expressions. The aim is to practice recursion, but the exercises could also be solved iteratively.
Skills we will practice today: recursion, arithmetic expressions, sudoku.
Exercise 1: Reading numbers
In this exercise, we will want to read and evaluate arithmetic expressions given as a string. To do so, we will first write a recursive function to read substrings representing non-negative integer values.
Your function should start as follows:
def read_positive_integer(text, position):
"""Return the longest substring of text representing a natural integer
starting from the given position, together with the first position after
that substring. If there is no positive integer value starting at the
given position, then return ('', position).
"""
Hint: You can use the isdigit()
method
for strings that returns True
if all characters are digits
and False
otherwise.
Test your function as follows:
In [0]: read_positive_integer('1234', 0)
Out[0]: ('1234', 4)
In [1]: read_positive_integer('1234', 4)
Out[1]: ('', 4)
In [2]: read_positive_integer('1234q', 0)
Out[2]: ('1234', 4)
In [3]: read_positive_integer('q1234q', 1)
Out[3]: ('1234', 5)
In [4]: read_positive_integer('q1234q', 0)
Out[5]: ('', 0)
Upload your file parsing.py
:
Exercise 2: Balanced parentheses
Writing arithmetic expressions (and also programming code) requires
well-paired parentheses. Write a recursive function
is_balanced
in parsing.py that will get a string as an
input and that will tell if the string contains well paired parentheses.
The input contains additionally the position in the expression where to
tell the algorithm where to start. The output is then a tuple with
either True or False at the first position and the position of the last
position that has been checked. If the expression is well-formed, the
output will be (True, X)
. Otherwise, the output will be
(False,X)
where X
is the position where
parsing stopped, thus at the end of the string for a succesful parse or
at the position of the first wrong parenthesis that has been found.
Your function should start as follows:
def is_balanced(expression, position):
"""
Read the expression starting from the given position,
recursively check if the parentheses in the expression are well balanced.
"""
Test your function on the console as follows:
In [6]: is_balanced("a",0)
Out[6]: (True, 1)
In [7]: is_balanced("()",0)
Out[7]: (True, 2)
In [8]: is_balanced("(abc)",0)
Out[8]: (True, 5)
In [9]: is_balanced("(abc",0)
Out[9]: (False, 0)
In [10]: is_balanced("abc)",0)
Out[10]: (False, 3)
In [11]: is_balanced("((abc)(eee))(123))",0)
Out[11]: (False, 17)
In [12]: is_balanced("((abc)(eee))((123))",1)
Out[12]: (False, 11)
In [13]: is_balanced("((abc)(e(e)e))()((123))", 0)
Out[13]: (True, 23)
In [14]: is_balanced("((abc)(e(e)e))()((123))", 14)
Out[14]: (True, 23)
Upload your file parsing.py
:
Exercise 3: Balanced parentheses extended
Now that we are able to check if a string has balanced parentheses
‘(’ and ‘)’, we will now write an extended version
is_totally_balanced
that checks for all different kinds of
parentheses (or pairs of symbols) if they are balanced in the input
string.
Crossing parentheses is not allowed in well-formed expressions: for
instance '([)]'
is not allowed even if this expression
restricted to ‘(’ and ‘)’ (resp. ‘[’ and ’]’) is balanced.
You can copy the code from is_balanced
, change the name
(also in the recursive call!) and add additional lines for the
extension.
Hint: You can use two lists that contain the symbols to be matched in the same order.
Your function should start as follows:
def is_totally_balanced(expression, position):
"""
Read the expression starting from the given position,
recursively check if all the parentheses in the expression are well balanced.
"""
= ['(', '[', '{', '<' ]
opening = [')', ']', '}', '>' ] closing
Test your function on the console as follows:
In [15]: is_totally_balanced("a", 0)
Out[15]: (True, 1)
In [16]: is_totally_balanced("(a)", 0)
Out[16]: (True, 3)
In [17]: is_totally_balanced("(a]", 0)
Out[17]: (False, 0)
In [18]: is_totally_balanced("([a{}])", 0)
Out[18]: (True, 7)
In [19]: is_totally_balanced("([a]){", 0)
Out[19]: (False, 5)
In [20]: is_totally_balanced("((abc)(e{e}e))<>[(123)])", 0)
Out[20]: (False, 23)
In [21]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 0)
Out[21]: (True, 23)
In [22]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 1)
Out[22]: (False, 13)
In [23]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 14)
Out[23]: (True, 23)
In [24]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 13)
Out[24]: (False, 13)
Upload your file parsing.py
:
Exercise 4: Arithmetic expressions
Now that we can read numbers, let us define what an expression is. We define them as strings having the following properties:
- A number as read in the last exercise is an arithmetic expression.
- If
a
andb
are arithmetic expressions, then so are(a+b)
,(a-b)
and(a*b)
. - No other string is an arithmetic expression, i.e. we don’t allow spaces (” “) and only parentheses ‘(’ and ‘)’.
- Each arithmetic expression is framed by parentheses
e.g.
(a)
or(a+b)
, however, single numbers can be read without parentheses (see examples).
For example '10'
is an arithmetic expression as well as
'(10)'
, '(1+1)'
, '((2+3)*10)'
and
'((1+1)*(1+1))'
. The above definition is what is generally
called an inductive definition. This kind of definition is
pervasive throughout computer science, discrete mathematics, logic and
many other fields (so now is a good time to get used to them). Note that
to avoid divisions by zero we do not allow division in arithmetic
expressions.
We now define the semantics of arithmetic expressions in the obvious
way by giving an evaluation function evaluate()
that maps
every arithmetic expression to a number:
- If
a
is a number, thenevaluate(a)
is that number. - If
a
andb
are arithmetic expressions, thenevaluate((a+b)) = evaluate(a)+evaluate(b)
,evaluate((a-b)) = evaluate(a)-evaluate(b)
, andevaluate((a*b)) = evaluate(a)*evaluate(b)
.
Write the function evaluate()
. To simplify the
implementation, let evaluate return a second value that gives the first
position after the currently read arithmetic sub-expression.
def evaluate(expression, position):
"""Evaluate the expression starting from the given position.
Return the value and the first position after the read
sub-expression. If the string starting at the given expression
is not an arithmetic expression, return (-1,None) as the second value of the
output tuple.
"""
Hint: you can define an additional function that evaluates the basic
expressions, e.g. eval_expression("+", "2", "3")
will
return 5
.
Test your function on the console as follows:
In [25]: evaluate("10", 0)
Out[25]: (10, 2)
In [26]: evaluate("(1+1)", 0)
Out[26]: (2, 5)
In [27]: evaluate("(1*1)", 0)
Out[27]: (1, 5)
In [28]: evaluate("(1-1)", 0)
Out[28]: (0, 5)
In [29]: evaluate("((1-1)+(2-1))", 0)
Out[29]: (1, 13)
In [30]: evaluate("q((1-1)+(2-1))", 1)
Out[30]: (1, 14)
In [31]: evaluate("((1-1)+(2*1))", 0)
Out[31]: (2, 13)
In [32]: evaluate("q((1-1)+(2*1))", 0)
Out[32]: (-1, None)
Upload your file parsing.py
:
Exercise 5: Collecting the parts
The function evaluate that we have written above does not necessarily find expressions with unbalanced parentheses, e.g.
In [33]: evaluate("1)", 0)
Out[33]: (1, 1)
Thus write another function check_and_evaluate
that
receives an arithmetic expression as input and first checks if its
parentheses are balanced using the function is_balanced
. If
the parentheses are not balanced, the output should be
None
. Otherwise, it calls evaluate
as a second
step whereas the output is either None
or the value
received after evaluation of the expression.
Test your function on the console as follows:
In [34]: check_and_evaluate("1")
Out[34]: 1
In [35]: check_and_evaluate("(1)")
Out[35]: 1
In [36]: check_and_evaluate("(1+1)")
Out[36]: 2
In [37]: check_and_evaluate("(1+1")
In [37]: check_and_evaluate("(1")
In [37]: check_and_evaluate("1)")
In [37]: check_and_evaluate("((1-1)+(2*1))")
Out[37]: 2
In [38]: check_and_evaluate("((10-1)+(2*1))")
Out[38]: 11
In [39]: check_and_evaluate("((10-1)+(2*1)")
In [39]: check_and_evaluate("q((10-1)+(2*1))")
Note that the output for some expressions (the invalid ones) is
None
which is not printed on the console.
Upload your file parsing.py
:
Exercise 6: Sudoku
We will now see how a simple recursive search algorithm can be used to solve Sudokus. We assume that you know Sudoku; if this is not the case, please read the rules before continuing.
We will write a very simple program to solve Sudokus: we choose a cell on the board that is not yet filled, then we fill in a number that does not appear in the row, column or box of that cell and solve recursively (the box of the cell, also called region or block is the relevant 3x3-subgrid that contains the cell). This algorithms is guaranteed to find all solutions if for every empty cell we try all possible values one after the other.
To concentrate on the recursive aspect of the algorithm, we give you
a class Sudoku
in soduku.py
that already
implements many methods for you to use:
- an
__init__()
method that initializes a board from a file, - an
__str__()
method to give a string representation where empty cells are encoded by ‘.’, - methods
put()
anddelete()
to set or delete values in a cell, - a method
check()
that checks if a value appears already in a row, column or box, and - a method
find_first_empty()
that finds the first empty cell on the board.
The only thing that remains to do is to write the method
solve()
that begins as follows:
def solve(self):
"""Solve the given Sudoku. When a solution is found, print 'Found solution!'
followed by the solution in the next line.
"""
Our strategy will be to use find_first_empty()
to find
an empty cell, put()
a value into that cell, and run
solve()
recursively. If solve
does not find a
solution, then we delete the value we have put before before trying the
next value.
Test your solver!
In [39]: s = Sudoku('sudoku1.txt')
In [40]: print(s)
┌───┬───┬───┐
│6..│...│..7│
│...│.9.│.2.│
│3.1│..2│59.│
├───┼───┼───┤
│8..│..7│.13│
│...│.8.│...│
│76.│3..│..8│
├───┼───┼───┤
│.78│2..│1.6│
│.5.│.3.│...│
│2..│...│..9│
└───┴───┴───┘
In [41]: s.solve()
Found solution!
┌───┬───┬───┐
│629│415│387│
│547│893│621│
│381│672│594│
├───┼───┼───┤
│894│567│213│
│132│984│765│
│765│321│948│
├───┼───┼───┤
│478│259│136│
│956│138│472│
│213│746│859│
└───┴───┴───┘
You can also create your own Sudokus by editing more files.
Upload your file sudoku.py
:
Exercise 7 (optional): More efficient Sudoku
If you try your Sudoku solver on the file sudoku-hard.txt, you should notice that this instance is, as the name suggests, in fact hard for your solver. Can you explain why that is?
Come up with ways to speed up the search for a solution in Sudoku puzzles. There are several ways of approaching this, for example:
- using a better variable choice heuristic
- using smarter reasoning techniques when before filling in values, for example some of those found here. Note that for some of the techniques it might be helpful to represent the board differently.
Which approach is more efficient?
Sudoku is a special case of the Constraint Satisfaction Problem (CSP) which is a very active research field in artificial intelligence. A general finding of that research is that there is a trade-off between the efficiency of search (i.e. plugging in values and recursing) and reasoning (i.e. thinking about which values can lead to solutions before plugging them in).
The two extreme cases are:
- just exhaustively searching through all combinations of values which for non-trivial problems leads to a combinatorial explosion and thus bad runtime, and
- making very complex reasoning to filter the values that are tried, which leads to less search but often slows down the program because the reasoning is slow.
Generally, good algorithms lie between these two extremes. The reasoning is limited to techniques that shrink the number of combinations that have to be studied effectively while making sure that this reasoning is not too costly. Of course, what the best trade-off is depends heavily on the problem. Can you find a good trade-off for Sudoku?
Upload your file sudoku.py
: