CSE 305 - Tutorial 5

In this tutorial, you will be asked to implement classes/functions described below. The exact definitions and automated tests are given to you in the following archive.

Your task is to fill the implementations of the functions in td5.cpp and submit it on this page by 23:59 May 22. The solution will be graded first by autograder (40% of the grade) and then manually (remaining 60% of the grade). You are allowed (and encouraged) to revise your code based on the output of autograder and resubmit again before the deadline. The demo code is available here.

In the first two exercises we are using only good old locks, and in the next two we proceed to condition variables.

Based on the thread-safe blocking bounded queue from the lecture, implement its unbounded version by using std::queue instead of a vector in a class SafeUnboundedQueueLock. The class should have a method push and pop_nonblocking. The latter should throw an exception if the queue is empty. Detailed specification is given in td5.cpp.
For the unbounded queue from the previous exercise, implement a method pop_blocking which does not throw an exception but uses busy waiting: if queue is empty, it will release the lock and try again until it succeeds. Detailed specification is given in td5.cpp
Write a function DivideOnceEven(cv, m, n) which takes as input references to a condition variable cv, mutex m, and integer n. Then the function should lock the mutex and use this lock to wait on the condition variable until n becomes even. Once n becomes even, the function should divide it by 2 and terminate.
Now implement a safe undounded queue in a new class SafeUnboundedQueueCV with blocking pop but without busy waiting. Use condition variables as explained in the last lecture. Detailed specification is given in td5.cpp.

Upload your file td5.cpp:

Upload form is only available when connected

In the case you would like to practice a bit more GPU computing, you are invited to solve the following optional exercises.

In these two exercises we will be interested in solving a PDE \(u_t = C u_x\) using a finite difference scheme with given initial condition \(u(0, x)\) and periodic boundary condition \(u(t, 0) = u(t, 2\pi)\). The idea is to introduce timestep \(\Delta t\) and space step \(\Delta x\) and consider a grid of points of the form \((i \Delta t, j \Delta x)\). Then the finite difference approximation \(u((i + 1)\Delta t, j \Delta x) \approx u(i\Delta t, j\Delta x) + \frac{C\Delta t}{\Delta x} (u(i\Delta t, (j + 1)\Delta x) - u(i\Delta t, j \Delta x))\) can be used to compute the value on the grid starting from \(t = 0\) level by level. You are given a sequential code which does this. Your task is to write a GPU code for this. Note that you need to transfer data between the device and host only at the very beginning and at the very end, and on the intermediate iterations you can just run the kernel again.

For the parameters specified in the input file (difference_scheme.cu), you should get 6-7 fold speedup, use profiler to check how much time is spent on actual computation.
Implement another version of the function from the problem above (SolvePDEGPU2) which uses shared memory. Using profiler, you should see a minor speedup.