Tutorial 13: Recursion II
Log in to enable code uploads
You need to log in to be able to upload your work as you go. Use your Polytechnique LDAP login here.
Setup: before you start
Create a new empty project in Spyder, and call it “Tutorial_13”.
Now create a new program file, parsing.py. All the
functions you write have to be added to this file.
Files to download
You will need to download the following files, right-clicking on the links and selecting “Save link as…” to save each in your Tutorial_13 project directory.
Context and Objectives
In this tutorial, we will dive deeper into recursive functions namely writing a parser for arithmetic expressions. The aim is to practice recursion, but the exercises could also be solved iteratively.
Skills we will practice today: recursion, arithmetic expressions, sudoku.
Exercise 1: Finding the first digit
In this exercise, we want to locate the position of the first digit in a string, starting from a given position.
Write a recursive function first_digit that scans the string character by character until it finds a digit. Note that your function should return the position of the first digit and not the digit itself (see examples below).
Your function should start as follows:
def first_digit(text, position):
"""Return the position of the first digit in text starting from the given
position. If there is no digit starting from this position, return None.
"""
pass # remove this line and complete the functionHint: You can use the isdigit() method
for strings that returns True if all characters are digits
and False otherwise.
Test your function as follows:
In [0]: first_digit('abc123', 0)
Out[0]: 3
In [1]: first_digit('abc123', 2)
Out[1]: 3
In [2]: first_digit('abc123', 3)
Out[2]: 3
In [3]: first_digit('abc123', 4)
Out[3]: 4
In [4]: first_digit('abcdef', 0) is None
Out[4]: True
In [5]: first_digit('', 0) is None
Out[5]: True
In [6]: first_digit('9abc', 0)
Out[6]: 0
In [7]: first_digit('abc9', 4) is None
Out[7]: True
Upload your file parsing.py:
Exercise 2: Reading numbers
In this exercise, we will want to read and evaluate arithmetic
expressions given as a string. To do so, we will first write a recursive
function read_positive_integer to read substrings
representing non-negative integer values.
You do not need to use the first_digit function from the
previous exercise, but you can use it as inspiration.
Your function should start as follows:
def read_positive_integer(text, position):
"""Return the longest substring of text representing a natural integer
starting from the given position, together with the first position after
that substring. If there is no positive integer value starting at the
given position, then return ('', position).
"""
pass # remove this line and complete the functionTest your function as follows:
In [8]: read_positive_integer('1234', 0)
Out[8]: ('1234', 4)
In [9]: read_positive_integer('1234', 1)
Out[9]: ('234', 4)
In [10]: read_positive_integer('1234', 4)
Out[10]: ('', 4)
In [11]: read_positive_integer('1234q', 0)
Out[11]: ('1234', 4)
In [12]: read_positive_integer('q1234q', 1)
Out[12]: ('1234', 5)
In [13]: read_positive_integer('q1234q', 0)
Out[14]: ('', 0)
Upload your file parsing.py:
Exercise 3: Balanced parentheses
Writing arithmetic expressions (and also programming code) requires
well-paired parentheses. Write a recursive function
is_balanced in parsing.py that will get a string as an
input and that will tell if the string contains well paired parentheses.
The input contains additionally the position in the expression to tell
the algorithm where to start. The output is then a tuple with either
True or False at the first position and the position of the last
position that has been checked. If the expression is well-formed, the
output will be (True, X). Otherwise, the output will be
(False,X) where X is the position where
parsing stopped, thus at the end of the string for a succesfull parse or
at the position of the first wrong parenthesis that has been found.
Here is a high-level idea of how the function
is_balanced works:
If
positionexceeds the length ofexpression, everything went well when pairing the parenthesis, so the function returns(True,position).When the function sees
(, it:- recursively checks what comes inside,
- expects to eventually find a matching
), - then continues parsing after that closing parenthesis.
When it sees
)unexpectedly, it immediately fails.All other characters (
a,b,c, digits, operators, etc.) are ignored and skipped.
It is important to notice that recursion naturally follows the
nesting structure of parentheses: each opening parenthesis
( delegates the search for its matching closing one.
Reaching the end of the string too early reveals a missing
).
We can also develop step-by-step the behavior of the function, for
example is_balanced("((a+2)*12", 0). There are two
opening parentheses (( but only one
closing parenthesis ).
Step-by-step execution
Step 1 — Initial call
is_balanced("((a+2)*12", 0)
expression[0] == '('- We enter the opening parenthesis case
- Recursive call:
(res, pos) = is_balanced(expression, 1)
Step 2 — Second opening parenthesis
is_balanced("((a+2)*12", 1)
expression[1] == '('- Recursive call:
(res, pos) = is_balanced(expression, 2)
Step 3 — Skipping non-parenthesis characters
is_balanced("((a+2)*12", 2) # 'a'
is_balanced("((a+2)*12", 3) # '+'
is_balanced("((a+2)*12", 4) # '2'
None of these characters are parentheses, so the function keeps advancing.
Step 4 — Encountering a closing parenthesis
is_balanced("((a+2)*12", 5)
expression[5] == ')'- According to the code:
return (False, 5)
This signals a closing parenthesis that must be matched by a previous
'('.
Step 5 — Returning to the call at position 1
We resume the call:
is_balanced("((a+2)*12", 1)
with:
res = False
pos = 5
So the inner parentheses (a+2) are considered
correctly matched.
The function continues after this closing parenthesis:
return is_balanced(expression, 6)
Step 6 — After the inner parentheses
is_balanced("((a+2)*12", 6) # '*'
is_balanced("((a+2)*12", 7) # '1'
is_balanced("((a+2)*12", 8) # '2'
Again, no parentheses are found, so the function keeps advancing.
Step 7 — End of the string
is_balanced("((a+2)*12", 10)
Since the function is called with
position >= len(expression), the function returns:
(True, 10)
Meaning: “Everything from here onward is balanced.”
Step 8 — Returning to the initial call
Back to:
is_balanced("((a+2)*12", 0)
We now have:
res = True
pos = 10
The function finally returns:
(False, 0)
We see on the detailed execution of the function that the inner
parentheses (a+2) are correctly matched,
the outer opening parenthesis at position 0 is
never closed. The expression is therefore not
well-balanced and the error is reported at position
0, corresponding to the unmatched '('.
Your function should start as follows:
def is_balanced(expression, position):
"""
Read the expression starting from the given position,
recursively check if the parentheses in the expression are well balanced.
"""Test your function on the console as follows:
In [15]: is_balanced("a",0)
Out[15]: (True, 1)
In [16]: is_balanced("()",0)
Out[16]: (True, 2)
In [17]: is_balanced("(abc)",0)
Out[17]: (True, 5)
In [18]: is_balanced("(abc",0)
Out[18]: (False, 0)
In [19]: is_balanced("((a+2)*12", 0)
Out[19]: (False, 0)
In [20]: is_balanced("abc)",0)
Out[20]: (False, 3)
In [21]: is_balanced("((abc)(eee))(123))",0)
Out[21]: (False, 17)
In [22]: is_balanced("((abc)(eee))((123))",1)
Out[22]: (False, 11)
In [23]: is_balanced("((abc)(e(e)e))()((123))", 0)
Out[23]: (True, 23)
In [24]: is_balanced("((abc)(e(e)e))()((123))", 14)
Out[24]: (True, 23)
Upload your file parsing.py:
Exercise 4: Balanced parentheses extended
Now that we are able to check if a string has balanced parentheses
‘(’ and ‘)’, we will now write an extended version
is_totally_balanced that checks for all different kinds of
parentheses (or pairs of symbols) if they are balanced in the input
string.
Crossing parentheses is not allowed in well-formed expressions: for
instance '([)]' is not allowed even if this expression
restricted to ‘(’ and ‘)’ (resp. ‘[’ and ’]’) is balanced.
You can copy the code from is_balanced, change the name
(also in the recursive call!) and add additional lines for the
extension.
Hint: You can use two lists that contain the symbols to be matched in the same order.
Your function should start as follows:
def is_totally_balanced(expression, position):
"""
Read the expression starting from the given position,
recursively check if all the parentheses in the expression are well balanced.
"""
opening = ['(', '[', '{', '<' ]
closing = [')', ']', '}', '>' ]Test your function on the console as follows:
In [25]: is_totally_balanced("a", 0)
Out[25]: (True, 1)
In [26]: is_totally_balanced("(a)", 0)
Out[26]: (True, 3)
In [27]: is_totally_balanced("(a]", 0)
Out[27]: (False, 0)
In [28]: is_totally_balanced("([a{}])", 0)
Out[28]: (True, 7)
In [29]: is_totally_balanced("([a]){", 0)
Out[29]: (False, 5)
In [30]: is_totally_balanced("((abc)(e{e}e))<>[(123)])", 0)
Out[30]: (False, 23)
In [31]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 0)
Out[31]: (True, 23)
In [32]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 1)
Out[32]: (False, 13)
In [33]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 14)
Out[33]: (True, 23)
In [34]: is_totally_balanced("((abc)(e{e}e))<>[(123)]", 13)
Out[34]: (False, 13)
Upload your file parsing.py:
Exercise 5: Arithmetic expressions
Now that we can read numbers, let us define what an expression is. We define them as strings having the following properties:
- A number as read in the last exercise is an arithmetic expression.
- If
aandbare arithmetic expressions, then so are(a+b),(a-b)and(a*b). - No other string is an arithmetic expression, i.e. we don’t allow spaces (” “) and only parentheses ‘(’ and ‘)’.
- Each arithmetic expression is framed by parentheses
e.g.
(a)or(a+b), however, single numbers can be read without parentheses (see examples).
For example '10' is an arithmetic expression as well as
'(10)', '(1+1)', '((2+3)*10)' and
'((1+1)*(1+1))'. The above definition is what is generally
called an inductive definition. This kind of definition is
pervasive throughout computer science, discrete mathematics, logic and
many other fields (so now is a good time to get used to them). Note that
to avoid divisions by zero we do not allow division in arithmetic
expressions.
We now define the semantics of arithmetic expressions in the obvious
way by giving an evaluation function evaluate() that maps
every arithmetic expression to a number:
- If
ais a number, thenevaluate(a)is that number. - If
aandbare arithmetic expressions, thenevaluate((a+b)) = evaluate(a)+evaluate(b),evaluate((a-b)) = evaluate(a)-evaluate(b), andevaluate((a*b)) = evaluate(a)*evaluate(b).
Write the function evaluate(). To simplify the
implementation, let evaluate return a second value that gives the first
position after the currently read arithmetic sub-expression.
def evaluate(expression, position):
"""Evaluate the expression starting from the given position.
Return the value and the first position after the read
sub-expression. If the string starting at the given expression
is not an arithmetic expression, return (-1,None) as the second value of the
output tuple.
"""Hint: you can define an additional function that evaluates the basic
expressions, e.g. eval_expression("+", "2", "3") will
return 5.
Test your function on the console as follows:
In [35]: evaluate("10", 0)
Out[35]: (10, 2)
In [36]: evaluate("(1+1)", 0)
Out[36]: (2, 5)
In [37]: evaluate("(1*1)", 0)
Out[37]: (1, 5)
In [38]: evaluate("(1-1)", 0)
Out[38]: (0, 5)
In [39]: evaluate("((1-1)+(2-1))", 0)
Out[39]: (1, 13)
In [40]: evaluate("q((1-1)+(2-1))", 1)
Out[40]: (1, 14)
In [41]: evaluate("((1-1)+(2*1))", 0)
Out[41]: (2, 13)
In [42]: evaluate("q((1-1)+(2*1))", 0)
Out[42]: (-1, None)
Upload your file parsing.py:
Exercise 6: Collecting the parts
The function evaluate that we have written above does not necessarily find expressions with unbalanced parentheses, e.g.
In [43]: evaluate("1)", 0)
Out[43]: (1, 1)
Thus write another function check_and_evaluate that
receives an arithmetic expression as input and first checks if its
parentheses are balanced using the function is_balanced. If
the parentheses are not balanced, the output should be
None. Otherwise, it calls evaluate as a second
step whereas the output is either None or the value
received after evaluation of the expression.
Test your function on the console as follows:
In [44]: check_and_evaluate("1")
Out[44]: 1
In [45]: check_and_evaluate("(1)")
Out[45]: 1
In [46]: check_and_evaluate("(1+1)")
Out[46]: 2
In [47]: check_and_evaluate("(1+1")
In [47]: check_and_evaluate("(1")
In [47]: check_and_evaluate("1)")
In [47]: check_and_evaluate("((1-1)+(2*1))")
Out[47]: 2
In [48]: check_and_evaluate("((10-1)+(2*1))")
Out[48]: 11
In [49]: check_and_evaluate("((10-1)+(2*1)")
In [49]: check_and_evaluate("q((10-1)+(2*1))")
Note that the output for some expressions (the invalid ones) is
None which is not printed on the console.
Upload your file parsing.py:
Exercise 7: Sudoku
We will now see how a simple recursive search algorithm can be used to solve Sudokus. We assume that you know Sudoku; if this is not the case, please read the rules before continuing.
We will write a very simple program to solve Sudokus: we choose a cell on the board that is not yet filled, then we fill in a number that does not appear in the row, column or box of that cell and solve recursively (the box of the cell, also called region or block is the relevant 3x3-subgrid that contains the cell). This algorithms is guaranteed to find all solutions if for every empty cell we try all possible values one after the other.
To concentrate on the recursive aspect of the algorithm, we give you
a class Sudoku in soduku.py that already
implements many methods for you to use:
- an
__init__()method that initializes a board from a file, - an
__str__()method to give a string representation where empty cells are encoded by ‘.’, - methods
put()anddelete()to set or delete values in a cell, - a method
check()that checks if a value appears already in a row, column or box, and - a method
find_first_empty()that finds the first empty cell on the board.
The only thing that remains to do is to write the method
solve() that begins as follows:
def solve(self):
"""Solve the given Sudoku. When a solution is found, print 'Found solution!'
followed by the solution in the next line.
"""Our strategy will be to use find_first_empty() to find
an empty cell, put() a value into that cell, and run
solve() recursively. If solve does not find a
solution, then we delete the value we have put before before trying the
next value.
Test your solver!
In [49]: s = Sudoku('sudoku1.txt')
In [50]: print(s)
┌───┬───┬───┐
│6..│...│..7│
│...│.9.│.2.│
│3.1│..2│59.│
├───┼───┼───┤
│8..│..7│.13│
│...│.8.│...│
│76.│3..│..8│
├───┼───┼───┤
│.78│2..│1.6│
│.5.│.3.│...│
│2..│...│..9│
└───┴───┴───┘
In [51]: s.solve()
Found solution!
┌───┬───┬───┐
│629│415│387│
│547│893│621│
│381│672│594│
├───┼───┼───┤
│894│567│213│
│132│984│765│
│765│321│948│
├───┼───┼───┤
│478│259│136│
│956│138│472│
│213│746│859│
└───┴───┴───┘
You can also create your own Sudokus by editing more files.
Upload your file sudoku.py:
Exercise 8 (optional): More efficient Sudoku
If you try your Sudoku solver on the file sudoku-hard.txt, you should notice that this instance is, as the name suggests, in fact hard for your solver. Can you explain why that is?
Come up with ways to speed up the search for a solution in Sudoku puzzles. There are several ways of approaching this, for example:
- using a better variable choice heuristic
- using smarter reasoning techniques when before filling in values, for example some of those found here. Note that for some of the techniques it might be helpful to represent the board differently.
Which approach is more efficient?
Sudoku is a special case of the Constraint Satisfaction Problem (CSP) which is a very active research field in artificial intelligence. A general finding of that research is that there is a trade-off between the efficiency of search (i.e. plugging in values and recursing) and reasoning (i.e. thinking about which values can lead to solutions before plugging them in).
The two extreme cases are:
- just exhaustively searching through all combinations of values which for non-trivial problems leads to a combinatorial explosion and thus bad runtime, and
- making very complex reasoning to filter the values that are tried, which leads to less search but often slows down the program because the reasoning is slow.
Generally, good algorithms lie between these two extremes. The reasoning is limited to techniques that shrink the number of combinations that have to be studied effectively while making sure that this reasoning is not too costly. Of course, what the best trade-off is depends heavily on the problem. Can you find a good trade-off for Sudoku?
Upload your file sudoku.py: