You must be authenticated to submit your files

CSE 102 – Tutorial 6 – Programming with graphs

General instructions
Warmup: graph representations
University life: combining depth and breadth
Fish coloring (optional)

General instructions

Sections tagged with a green marginal line are mandatory.

Sections tagged with an orange marginal line are either more advanced or supplementary, and are therefore considered optional. They will help you deepen your understanding of the material in the course. You may skip these sections on a first pass through the tutorial, and come back to them after finishing the mandatory exercises.

When submitting a solution, the system will do some basic validation of your files and run some automated tests on your solutions. A green tick tells you that your file compiles and has passed all of the automated tests. A red tick tells you either that your file is badly broken (e.g., a syntax error) or that your solutions fail one or more of the automated tests. You can see the result of the tests by clicking on “See my uploads” in the upper-left menu, and then clicking on the right arrow button next to each file you submitted.

Although passing all of our tests provides evidence that your solutions are correct, it might not guarantee it, as typically we would need to run infinitely many tests! You are encouraged to write your own tests, and more generally to think about why your code works and not just rely on the automated tests.

Discussion of the tutorials with friends and classmates is allowed, with acknowledgment, but please read the Moodle page on “Collaboration vs plagiarism/cheating”.

Warmup: graph representations

We expect you to write your solution to this problem in a file named graphs.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

In this tutorial we will be mainly using the adjacency list representation of directed graphs, but for the first problem we also consider adjacency matrix form. Recall that a graph \(G\) with \(n\) nodes can be represented by a \(\{0,1\}\)-valued \(n\times n\) matrix \(M\), where \(M_{i,j} = 1\) just in case there is an edge from the \(i\)th node to the \(j\)th node of \(G\).

For example, consider the following graph \(G\) with four nodes and five edges:

If we order the nodes a < b < c < d then we get the adjacency matrix

\[ M_G = \left(\begin{array}{cccc}0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0\end{array}\right) \]

which can be represented as the following Python value:

MG = [[0, 1, 0, 0],
      [0, 0, 1, 0],
      [1, 0, 0, 1],
      [0, 1, 0, 0]]

In the adjacency list representation of a directed graph, we just list the outgoing neighbors of each node. For this example, those are

\[ A_G = \left\{\begin{align*}a &: [b] \\ b &: [c] \\ c &: [a,d] \\ d &: [b] \end{align*}\right. \]

which we can represent by a Python dictionary:

AG = {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': ['b']}

Write a function matrix_to_adjlist(nodes, MG) that takes a list of node labels of length \(n\) and the \(n \times n\) adjacency matrix of a directed graph and returns the adjacency list representation of the same graph.

For example, running matrix_to_adjlist(['a','b','c','d'], MG) for MG as defined above should return {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': ['b']}.

Hint: you may find it helpful to use the built-in function enumerate(xs), which takes a list xs as input and returns a list of pairs (i,x) consisting of each element of xs paired with its corresponding index (x = xs[i]).

A graph \(G\) is said to be symmetric if for every edge \(x \to y\) there is an opposite edge \(y \to x\). This is easy to check using the adjacency matrix representation by checking that \(M_{i,j} = M_{j,i}\) for all indices \(i,j\). Likewise, it’s easy to compute the opposite of a graph, with an edge \(y \to x\) in \(G^{op}\) just in case there is an edge \(x \to y\) in \(G\): the adjacency matrix for \(G^{op}\) is just the matrix transpose \(M^\intercal\)! Of course we can also perform both of these computations directly with adjacency lists, but it is just a little more tricky.

Write a function is_symmetric(AG) that takes a directed graph represented in adjacency list form and checks whether it is symmetric.
Write a function revert_edges(AG) that takes a directed graph in adjacency list form and returns the opposite graph.

For example, running revert_edges(AG) on the example graph above should return {'a': ['c'], 'b': ['a', 'd'], 'c': ['b'], 'd': ['c']}.

University life: combining depth and breadth

We expect you to write your solution to this problem in a file named dfsbfs.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

A university’s course curriculum may be considered as a graph whose nodes are courses and where there is an edge \(x \to y\) whenever to take course \(x\) you must have first passed course \(y\) (that is, when \(y\) is a prerequisite for \(x\)). For example, the graph below shows a partial rendering of the computer science curriculum at a certain university:

In order to take CSE301, you must have first passed CSE203 and CSE201, which requires having passed CSE102, which requires having passed CSE101. And so on.

Let’s encode this graph in adjacency list form:

CourseG = {101 : [], 102 : [101], 103 : [101],
           201 : [102], 202 : [103], 203 : [], 204 : [102, 201], 205 : [201], 206 : [201, 203],
           301 : [201, 203], 304 : [201], 305 : [201, 202], 308 : [203, 206]}

The following exercices can be solved using DFS traversals!

Write a function required(G, c) that takes a course graph G in adjacency list form and the name of a course c, and returns the total number of courses that have to be passed in order to pass c, including c itself.

For example, required(CourseG, 301) should evaluate to 5.

Hint: you can use a DFS traversal from c and count the number of visited nodes during the traversal.
Write a function required_list(G, c) that takes a course c and returns a list of courses that have to be passed in order to pass c (including c itself).

For example, required_list(CourseG, 301) should evaluate to [101, 102, 201, 203, 301] (or some permutation thereof).

Hint: what can you say about the set of visited nodes when doing a DFS traversal from c?
Write a function needed_for(G, c) that takes a course c and returns the number of courses that cannot be passed without first passing c (including c itself).

For example, needed_for(CourseG, 102) should evaluate to 9.

Hint: try to express needed_for using required.

When designing a university curriculum, it is important to ensure that there are no cyclic dependencies, i.e., a chain of course prerequisite relationships of the form \(c_1 \to c_2 \to \dots \to c_k \to c_1\) (making it impossible to enroll in any of the courses along the chain since you would have had to have first completed all of the courses in the chain).

Write a function cyclic_dependence(G) that checks whether there is a cyclic dependence in the course graph. If there is such a dependence \(c_1 \to c_2 \to \dots \to c_k \to c_1\), the function should return the list of courses along the chain [c1, ..., ck, c1] (where the first and last elements of the list coincide). If there is no dependence, the function should return an empty list.

Notice that, in general, there might be multiple cyclic dependencies. You should return any one of them.

Choosing the program of study that’s right for you can at times feel like navigating a maze. You already saw how to escape from mazes using backtracking in Tutorial 3, but that approach runs into trouble when there are exponentially many paths, and a better approach is to view maze navigation as a graph traversal problem. Indeed, by using the right kind of graph traversal we can always find the shortest escape from a maze.

In the following problems, a maze is defined by a rectangular 2D array where each value is either 0 or 1. Cells marked by 1 are empty space, and cells marked by 0 are walls. For example, the maze

is represented by the following 2D array:

example_maze = [ [1, 1, 1, 1, 1, 1, 1],
                 [1, 0, 0, 0, 0, 0, 1],
                 [1, 0, 1, 1, 1, 0, 1],
                 [1, 1, 1, 0, 1, 1, 1]
                ]

You are allowed to go from an empty cell to any other empty cell that shares a side with it. For example, here are two different paths through the maze:

The following exercices can be solved using BFS traversals!

Write a function shortest_escape_len(maze, start, finish) that takes as arguments a maze and the start and finish cells (given as pairs of numbers). The function should return the length of the shortest path from start to finish, or math.inf if no such path exist. The length of a path is the number of cells along the path minus one. It is guaranteed that the start and finish cells are empty.

For example, shortest_escape_len(example_maze, (0, 0), (3, 6)) should evaluate to 9, corresponding to the length of the red path in the example.
Write a function shortest_escape(maze, start, finish) with the same input data as in the previous problem that returns the shortest path from start to finish represented as a list of cell coordinates starting with start and ending with finish. If there is no path, return None.

Fish coloring (optional)

We expect you to write your solution to this problem in a file named fish.py and to upload it using the form below. You can resubmit your solution as many times as you like.

Upload form is only available when connected

Imagine you are responsible for setting up a series of fish tanks in a marine biology laboratory. Each tank can accommodate only fish which are compatible.

There are \(N\) fish, labeled from \(0\) to \(N-1\). For each fish, you are given a list of fish which are incompatible with it. Your job is to determine a distribution of the fish into the fish tanks, ensuring that only compatible fish can share the same tank, while trying to minimize the number of fish tanks.

For instance, you are given the following Python dictionary, where each key represents a fish, and each value is its list of incompatibility:

fish = { 0 : [1,2],
         1 : [0,2,3],
         2 : [0,1,4],
         3 : [1,4,5],
         4 : [2,3,5],
         5 : [3,4] }

In that case, you can for example use four fish tanks with a distribution represented by the array [1, 2, 3, 1, 2, 4], meaning that Fish 0 is assigned to Tank 1, Fish 1 assigned to Tank 2, and so on. But, you could also take the distribution [1, 2, 3, 3, 2, 1] using only three fish tanks, which is better.

Write first a “greedy algorithm” greedy_tank(fish,n), where fish is a dictionary describing the incompatibilities between fish, and n is the number of fish, that returns an array giving a valid assignment of fish. Your algorithm should proceed as follows:
- Start with an empty list or array to store the tanks assigned to each fish.
- Begin with the first fish. Assign Tank 1 to it.
- Move to the next fish that hasn’t been assigned yet.
- Look up the fish tanks of its incompatible fish, and from the remaining tanks assign the first one that is available.
- Repeat steps 4 and 5 until all fish are assigned.
Try your algorithm on the example given above, and on the following bigger example:
```
fish_big = { 0 : [1,2,3,7],
             1 : [0,2,3,7],
             2 : [0,1,3,8],
             3 : [0,1,2,8],
             4 : [6,8],
             5 : [6,7,8],
             6 : [4,5,7,8],
             7 : [0,1,5,6,8],
             8 : [2,3,4,5,6,7] }
```
You should get the following behavior:
```
In [1]: greedy_tank(fish,len(fish))
Out[1]: [1, 2, 3, 1, 2, 3]

In [2]: greedy_tank(fish_big,len(fish_big))
Out[2]: [1, 2, 3, 4, 1, 1, 2, 3, 5]
```
Theoretical question: Can you give a bound (in terms of the length of the longest list of incompatibilities) on the number of fish tanks used? Can you give some examples where this algorithm uses a lot of fish tanks, whereas only two fish tanks would be enough?
Theoretical question: The fish tank problem can be translated to a graph coloring problem, where we are given a graph and we want to color its nodes in a way that no two nodes sharing an edge have the same color. Can you elaborate how this translation works? What are the vertices and the edges of the graph?
To try to get an assignment of the fish using fewer fish tanks, consider the following refined algorithm: Sort the list of fish by the length of their incompatibility list, starting with the one with the longest list.
- Begin with the first fish. Assign Tank 1 to it.
- Traverse the (sorted) list of fish, and assign each fish to Tank 1 if it does not create some incompatibilities. You might want to create a “conflicted” set in which you store the union of the incompatibilities of the fish assigned to Tank 1.
- Repeat this procedure with Tank 2, etc.
Implement this algorithm in a function wp_tank(fish,n), where as before fish is a dictionary giving the incompatibilities, and n is the number of fish.

Try your algorithm on the examples given above. Check that you get a tank assignment which satisfies the contraints. You can use the function:
```
def check_tanks(fish,tanks):
    n = len(fish)
    if min(tanks) <= 0 :
        return False
    for i in range(n):
        col = tanks[i]
        for nf in  fish.get(i):
            if tanks[nf] == col:
                return False
    return True
```
The number of tanks should be 3 on fish (as for the first algorithm), but should be 4 on big_fish (which is an improvement compared to the first algorithm !).
Theoretical question: This algorithm is called the Welsh-Powell algorithm. Can you give some examples where this algorithm uses a lot of fish tanks, whereas only two fish tanks would be enough?