This assignment will be closed on October 20, 2024 (23:59:59).
You must be authenticated to submit your files.

Tutorial 3: CSE101 Statistics

N. Aubrun, based on material by U. Fahrenberg

Objectives

What we will practice today: Reading and writing files, operations on lists.

Setup: Before You Start

Launch Spyder (from the Applications/Programming menu). If you see a “Spyder update” message, just click OK and ignore it.

Before you start: Create a new project, called Tutorial_3 (using “New Project” from the “Projects” menu). This will ensure that files from last week (and future weeks) do not get mixed up together.

Exercises

We will write a program to help evaluate the CSE101 course, using the grading scheme from 2021-22 and 2022-23 (this year is a little different!).

The inputs to our program are the scores for all tutorials, collected in a file CSE101data.txt which contains lines of the format

login_name s1 s2 ...

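where login_name is a student’s login name and s1, s2, ... are that student’s scores for the successive tutorials.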

We want to compute final raw grades and general statistics on the (fictional) CSE101 class.

Data files

Before you start, you need to download the following data files:

  • CSE101data.txt
  • CSE101test.txt

Right-click on each link and select “Save link as…”, saving the file in your Tutorial_3 project directory.

Check and make sure that both files have appeared in your Tutorial_3 project in Spyder.

We will mostly be working with the test data in CSE101test.txt below, but your code must also work on the data in CSE101data.txt. (Make sure to check this during each exercise.)

Exercise 1: Computing Averages

Create a new program file called stats.py. Make sure you get the filename right, or else the testing code won’t work.

Our first task is to define a function to compute the average of a list of numbers. Write a function average which takes as input a non-empty list of numbers and returns their average as a float.

We want floats to show (at most) 2 digits after the decimal point. We will use rounding for this; calling round(x, 2) on a float x returns x rounded to two decimal places.

Here is the start of your function:

def average(numlist):
    """Return average of list of numbers"""
    pass # remove this line and replace with your own code

Warning: Adding two numbers rounded to two digits does not necessarily yield a number rounded to two digits! Try this, for instance:

In [1]: 0.99+1-0.90
Out[1]: 1.0899999999999999
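One way to read the warning above is that rounding is best applied once, after the arithmetic is finished, rather than to intermediate values. The short sketch below illustrates this on the same numbers; the variable names total and rounded are made up for the illustration.

```python
# Illustrative only: 'total' and 'rounded' are hypothetical names.
total = 0.99 + 1 - 0.90      # floating-point arithmetic is inexact
rounded = round(total, 2)    # round once, after the arithmetic is done

print(total)     # the unrounded value, with floating-point noise
print(rounded)   # the value rounded to two decimal places
```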

Testing

Test your code in the console, after running stats.py. You should be able to replicate this example:

In [2]: average([7.5, 5.5, 7])
Out[2]: 6.67

Upload your solution via the form below for further automatic testing.

Exercise 2: Convert a string of floats to a list

We now define a function to convert a string of floats into a list of numbers. Write a new function string_floats_to_list which takes as input a string representing a series of numerical values, separated by single spaces, and returns the list of these numerical values as floats.

Here is the start of your function:

def string_floats_to_list(string_floats):
    """Return a list from the string of floats string_floats."""
    pass # remove this line and replace with your own code

Hints:

  • Use float(s) to turn a string s into a float.
  • If s is a string, then s.split() returns the space-separated pieces of s (it does not modify s).
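The two hints above can be tried out on a small example. The sample text below is invented for the illustration, and the snippet only demonstrates split and float; it is not a complete solution.

```python
# Hypothetical sample text, chosen only to illustrate the two hints.
text = '1.5 20 -0.25'

pieces = text.split()      # a list of strings; text itself is unchanged
first = float(pieces[0])   # convert one piece to a float

print(pieces)   # ['1.5', '20', '-0.25']
print(first)    # 1.5
```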

Testing

Now test your code in the console (after re-running stats.py). Here’s what your output should look like:

In [3]: string_floats_to_list('4.3 2.1 -3.2')
Out[3]: [4.3, 2.1, -3.2]

Once this test is passing correctly you can upload your solution to the form below for further automatic testing.

Exercise 3: Student Data

Next we want a function which can read a line login_name s1 s2 ..., like the ones in our file, and return the data in a usable format. Write a function student_data which takes as input a string as above and returns a tuple (login_name, results), where results is the list of scores s1, s2, … converted to floats.

Here is the start of your function:

def student_data(data_string):
    """Compute (name, results) tuple from the string data_string."""
    pass # remove this line and replace with your own code

Testing

Now test your code in the console, after running stats.py. Your output should look like this:

In [4]: student_data('gleb.pogudin 7.5 5.5 8')
Out[4]: ('gleb.pogudin', [7.5, 5.5, 8.0])

Upload your solution to the form below for further automatic testing.

Exercise 4: Convert a tuple to a string

Now we want the inverse of the previous exercise: that is, we want a function which converts the components of a (name, results) tuple into a string.

Write a function student_data_to_string which takes as input a string and a list of floats, and returns a string obtained by concatenation of the string and elements of the list, separated by single spaces.

Here is the start of your function:

def student_data_to_string(name, results):
    """Return string from (name, results) tuple"""
    pass # remove this line and replace with your own code

Hint:

Use ' '.join(l) to construct a space-separated string out of a list l of strings. Example: ' '.join(['a', 'b']) becomes 'a b'.
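Note that join only accepts strings, so numbers have to be converted with str first. A minimal sketch, using made-up values:

```python
# Hypothetical values, for illustration only.
numbers = [1.5, 2, 3.25]

# ' '.join(numbers) would raise a TypeError, because join only accepts
# strings, so each number is converted with str() first.
as_strings = [str(x) for x in numbers]
line = ' '.join(as_strings)

print(line)   # 1.5 2 3.25
```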

Testing

Test your code in the console, after running stats.py. You should be able to replicate the following example:

In [5]: student_data_to_string('gleb.pogudin', [9.2,8.1,10])
Out[5]: 'gleb.pogudin 9.2 8.1 10'

Upload your solution to the form below; it will be tested automatically.

Exercise 5: Reading Student Data

Now write a function read_student_data which reads a file with student data like in CSE101data.txt and returns a list containing student data tuples.

This is the start of your function:

def read_student_data(filename):
    """Return list of student data from file"""
    pass # remove this line and replace with your own code

Hint:

Use with open(filename,'r') as file: to open a file for reading. To read one line of text, use file.readline() and to read all the (remaining) lines, use file.readlines().

For instance, with

with open('sample.txt') as infile:
    list_lines = infile.readlines()

then list_lines is a list containing each line in sample.txt as a list item.
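Each item returned by readlines keeps its trailing newline character, which usually needs to be removed before further processing. The sketch below shows this on a throwaway file created just for the example; the file name and its contents are invented.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained;
# the file name and contents are hypothetical.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as outfile:
    outfile.write('first line\nsecond line\n')

with open(path, 'r') as infile:
    list_lines = infile.readlines()

print(list_lines)   # ['first line\n', 'second line\n']

# strip() removes the newline (and any other surrounding whitespace).
cleaned = [line.strip() for line in list_lines]
print(cleaned)      # ['first line', 'second line']
```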

Testing

Test your code in the console (after re-running stats.py). You should be able to reproduce the following example:

In [6]: read_student_data('CSE101test.txt')
Out[6]: 
[('gleb.pogudin', [7.5, 5.5, 8.0, 3.6, 5.5, 8.0]),
 ('uli.fahrenberg', [2.0, 6.0, 2.5, 6.5, 3.0, 7.0]),
 ('stefan.mengel', [6.0, 6.0, 9.0, 5.0, 5.0, 8.0])]

Upload your solution to the form below; it will be tested automatically.

Exercise 6: Extracting Averages

We will do more complicated statistics on our student data below, but let’s start off with something simple. Write a function extract_averages which reads a student data file (like CSE101data.txt) and returns a list of tuples, each containing a student name and the average of that student’s scores. As before, round floats to (at most) two decimal places, using round(x, 2).

This is the start of your function:

def extract_averages(filename):
    """Return list of name and average for each line in file"""
    pass # remove this line and replace with your own code

Hint: Use the function read_student_data to read in the file.

Testing

Test your code in the console, after running stats.py. You should be able to replicate the following example exactly:

In [7]: extract_averages('CSE101test.txt')
Out[7]: [('gleb.pogudin', 6.35), ('uli.fahrenberg', 4.5), ('stefan.mengel', 6.5)]

Now, upload your solution using the form below for further automatic testing.

Exercise 7: Filtering Lists of Scores

To prepare for the next exercise, we need a function to filter lists of scores. Write a function discard_scores which takes a list numlist of (at least 5) numbers as input, and constructs a new list such that if numlist has \(n\) entries, then the result contains the top \(n-4\) of the last \(n-2\) entries in numlist.

In other words, moving from numlist to the new result list, we

  • first, discard the first two numbers from numlist;
  • then, discard the lowest two numbers from what remains;
  • and finally, return the list of remaining numbers.

Warning:

  • your function should not modify numlist!
  • the order of remaining items should be the same: do not use sort()!

As an example, applied to the list [2.0, 6.0, 6.5, 9.0, 6.5, 2.5], discard_scores will:

  • ignore the first two scores, 2.0 and 6.0,
  • and ignore the worst two of the remaining scores: that is, from [6.5, 9.0, 6.5, 2.5], it will keep only 9.0 and (one of the) 6.5.
  • the result is [9.0, 6.5].

In the second step, if the choice of the two lowest numbers is ambiguous, then we discard the first two of the lowest numbers.
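Two list behaviours are relevant to the warnings and the tie-breaking rule above: slicing builds a new list, and remove deletes only the first matching item. The sketch below illustrates both on made-up values; it is not a complete solution, and other approaches are possible.

```python
# Hypothetical values, for illustration only.
original = [4.0, 1.0, 7.0, 1.0]

copy = original[:]        # slicing builds a new list
copy.remove(min(copy))    # removes only the FIRST occurrence of the minimum

print(copy)       # [4.0, 7.0, 1.0]
print(original)   # [4.0, 1.0, 7.0, 1.0], unchanged
```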

Your function should start as follows:

def discard_scores(numlist):
    """Filter numlist: construct a new list from numlist with
    the first two, and then the lowest two, scores discarded.
    """
    pass # remove this line and replace with your own code

Testing

Now test your code in the console, after running stats.py. You should be able to reproduce the following example:

In [8]: discard_scores([2.0, 6.0, 6.5, 9.0, 6.5, 2.5])
Out[8]: [9.0, 6.5]

Can you think of some more interesting tests?

Once you have tried some different tests, you can upload your solution to the form below for automatic testing.

Exercise 8: Summaries per Student

We want to use the student data in CSE101data.txt to write a file with student summaries. For each student, we

  • use discard_scores to discard the first two and lowest two scores;
  • take the sum of the remaining scores;
  • write a line to an output file containing
    • the student’s name, followed by a single space;
    • all remaining scores, separated by single spaces;
    • the string ' sum: ' followed by the computed sum.

Finally, the file should finish with a line containing the string 'total average: ' followed by the average of all the summed scores. The file must end with a newline character \n. Also remember to round floats to (at most) two decimal places.

Write a function summary_per_student which accomplishes this. Your function should start as follows:

def summary_per_student(infilename, outfilename):
    """Create summaries per student from the input file 
    and write the summaries to the output file.
    """
    data = read_student_data(infilename)
    pass # remove this line and replace with your own code

Hints:

  • Use open(outfilename, 'w') to open a file for writing.
  • Use file.write('\n') to finish a line when writing to a file.
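As a small illustration of these hints, the sketch below writes two lines to a throwaway file and reads them back. The file name and contents are invented; note that write only accepts strings, so numbers are converted with str.

```python
import os
import tempfile

# Hypothetical file name and contents, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'demo_out.txt')

with open(path, 'w') as outfile:    # 'w' creates or overwrites the file
    outfile.write('alpha 1.5')
    outfile.write('\n')             # finish the first line
    outfile.write('beta ' + str(round(2.0 / 3, 2)) + '\n')

with open(path, 'r') as infile:
    content = infile.read()

print(repr(content))   # 'alpha 1.5\nbeta 0.67\n'
```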

Testing

Test your code in the console, after re-running stats.py:

In [9]: summary_per_student('CSE101test.txt', 'out1.txt')

Now open the output file out1.txt (look in the list of files on the left of the main Spyder window). The generated file out1.txt should look like this:

gleb.pogudin 8.0 8.0 sum: 16.0
uli.fahrenberg 6.5 7.0 sum: 13.5
stefan.mengel 9.0 8.0 sum: 17.0
total average: 15.5

Now, a second test:

In [10]: summary_per_student('CSE101data.txt', 'out2.txt')

Have a look at out2.txt to see whether your code works on this more realistic data.

Finally, upload your solution using the form below; it will be tested automatically.

Exercise 9: Summaries per Tutorial

Now we want to create summaries for each tutorial. For each tutorial, we write to an output file a line containing the name of the tutorial together with its average, minimum, and maximum scores.

Since this is a French school, the tutorials are named TD1, TD2, TD3, etc. (“TD” stands for travaux dirigés, the French term for tutorials).

Note that no scores are to be removed from the information; we want the complete picture here. As usual, we use round on our floats before displaying them, to show (at most) 2 digits after the decimal point. The file must end with a newline character \n.
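The statistics here run across students rather than across one student's scores, so the scores for a given tutorial have to be collected from every tuple. One possible way to do this is sketched below on made-up data in the same (name, results) shape as the output of read_student_data; the names and values are hypothetical.

```python
# Hypothetical data, in the same (name, results) shape as the
# tuples returned by read_student_data.
data = [('a.b', [1.0, 4.0]), ('c.d', [3.0, 2.0])]

# Scores of every student for the first tutorial (index 0).
first_column = [results[0] for (name, results) in data]

print(first_column)                           # [1.0, 3.0]
print(min(first_column), max(first_column))   # 1.0 3.0
```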

Write a function summary_per_tutorial which accomplishes this. Here’s a start:

def summary_per_tutorial(infilename, outfilename):
    """Create summaries per tutorial from the input file and write to the output file."""
    data = read_student_data(infilename)
    pass # remove this line and replace with your own code

Testing

Test your code in the console, after running stats.py, as follows:

In [11]: summary_per_tutorial('CSE101test.txt', 'out3.txt')

Here’s what the generated file out3.txt should look like:

TD1: average: 5.17 min: 2.0 max: 7.5
TD2: average: 5.83 min: 5.5 max: 6.0
TD3: average: 6.5 min: 2.5 max: 9.0
TD4: average: 5.03 min: 3.6 max: 6.5
TD5: average: 4.5 min: 3.0 max: 5.5
TD6: average: 7.67 min: 7.0 max: 8.0

Now try running your code on some more data:

In [12]: summary_per_tutorial('CSE101data.txt', 'out4.txt')

Have a look at out4.txt to see whether your code works on this data too.

Once this is done, you can upload your solution using the form below for automatic testing.

Optional Exercise 10: Sending Results to Students

Copy your stats.py into a new file, stats_plus.py; this is where you will program the optional function generate_emails.

We also want to send students emails with their results. For each login_name in our database CSE101data.txt, we want to create an email with the following contents:

To: login_name@polytechnique.edu

This is to notify you of your final results for the CSE101 course, see
table below.  (Note that the two first and two lowest scores are
excluded from the result.)

TD1  TD2  ...  Result
---------------------
a.b  c.d  ...  xy.z

Best regards,
and please get back to me if you have any questions,
Your Teacher

Here, login_name is replaced with the student’s login name, and a.b etc. are replaced with the student’s scores. The Result column contains the sum of the scores filtered by discard_scores.

The columns in the table are aligned to the left and separated by two spaces; you may assume that there are at most 9 TDs. The email is to be written to a file login_name.txt.
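Left-aligned columns can be produced with the string method ljust, which pads a string with spaces on the right up to a given width. The sketch below is only an illustration: the headers, the values and the column width of 3 are assumptions made for the example, not part of the specification.

```python
# Hypothetical headers and values; the column width of 3 is an
# assumption made for this illustration only.
headers = ['TD1', 'TD2', 'Result']
values = ['2.0', '6.0', '8.0']

# ljust pads on the right; strings longer than the width are left as-is.
header_line = '  '.join(h.ljust(3) for h in headers)
value_line = '  '.join(v.ljust(3) for v in values)

print(header_line)   # TD1  TD2  Result
print(value_line)    # 2.0  6.0  8.0
```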

Write a function generate_emails which accomplishes this. Here’s a start:

def generate_emails(filename):
    """Generate emails to students with their results"""
    data = read_student_data(filename)
    pass # remove this line and replace with your own code

Testing

Test your code in the console, after running stats_plus.py, as follows:

In [13]: generate_emails('CSE101test.txt')

This should generate three files; here’s what uli.fahrenberg.txt should look like:

To: uli.fahrenberg@polytechnique.edu

This is to notify you of your final results for the CSE101 course, see
table below.  (Note that the two first and two lowest scores are
excluded from the result.)

TD1  TD2  TD3  TD4  TD5  TD6  Result
------------------------------------
2.0  6.0  2.5  6.5  3.0  7.0  13.5

Best regards,
and please get back to me if you have any questions,
Your Teacher

Does your code also work for CSE101data.txt?

Upload your solution to the form below.
