Tutorial 3: CSE101 Statistics
Objectives
What we will practice today: Reading and writing files, operations on lists.
Setup: Before You Start
Launch Spyder (from the Applications/Programming menu). If you see a “Spyder update” message, just click OK and ignore it.
Before you start: Create a new project, called Tutorial_3 (using “New Project” from the “Projects” menu). This will ensure that files from last week (and future weeks) do not get mixed up together.
Exercises
We will write a program to help evaluate the CSE101 course, using the grading scheme from 2021-22 and 2022-23 (this year is a little different!).
The inputs to our program are the scores for all tutorials, collected in a file CSE101data.txt, which contains lines of the format
login_name s1 s2 ...
where login_name looks like an X login name (e.g. mickey.mouse) and s1, s2, etc. are numbers between 0 and 10, representing point scores for weeks in CSE101.
We want to compute final raw grades and general statistics on the (fictional) CSE101 class.
Data files
Before you start, you need to download the following data files: CSE101test.txt and CSE101data.txt.
Right-click on each link and select “Save link as…”, saving the file in your Tutorial_3 project directory.
Check and make sure that both files have appeared in your Tutorial_3 project in Spyder.
In the exercises below, we will mostly be working with the test data in CSE101test.txt, but your code must also work on the data in CSE101data.txt. (Make sure to check this during each exercise.)
Exercise 1: Computing Averages
Create a new program file called stats.py. Make sure you get the filename right, or else the testing code won’t work.
Our first task is to define a function to compute the average of a list of numbers. Write a function average which takes as input a non-empty list of numbers and returns their average as a float.
We want floats to show (at most) 2 digits after the decimal point. We will use rounding for this: calling round(x, 2) on a float x returns x rounded to two decimal places.
Here is the start of your function:
def average(numlist):
    """Return average of list of numbers"""
    pass  # remove this line and replace with your own code
Warning: Adding two numbers rounded to 2 digits does not necessarily yield a number rounded to two digits! For instance, try this:
In [1]: 0.99+1-0.90
Out[1]: 1.0899999999999999
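The practical consequence is to round once, on the final result, rather than rounding the inputs and hoping the sum stays short. The small check below uses only the built-in round and the numbers from the warning; it is an illustration, not part of the required solution.

```python
# The expression from the warning: a tiny floating-point error appears.
raw = 0.99 + 1 - 0.90
print(raw)

# Rounding the final result (rather than the inputs) gives two decimals.
rounded = round(raw, 2)
print(rounded)         # 1.09

# Values that already have at most two decimals are left as they are.
print(round(6.5, 2))   # 6.5
```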
Testing
Test your code in the console, after running stats.py. You should be able to replicate this example:
In [2]: average([7.5, 5.5, 7])
Out[2]: 6.67
Upload your solution via the form below for further automatic testing.
Exercise 2: Convert a string of floats to a list
We now define a function to convert a string of floats into a list of numbers. Write a new function string_floats_to_list which takes as input a string representing a series of numerical values, separated by single spaces, and returns the list of these numerical values as floats.
Here is the start of your function:
def string_floats_to_list(string_floats):
    """Return a list from the string of floats string_floats."""
    pass  # remove this line and replace with your own code
Hints:
- Use float(s) to turn a string s into a float.
- If s is a string, then s.split() returns the space-separated pieces of s (it does not modify s).
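To see the two hints in action before combining them, you can try them separately in the console. The string below is a made-up example; the code only shows what split and float return, and leaves combining them into string_floats_to_list to you.

```python
s = '1.5 2 -0.25'   # hypothetical input string

# split() with no argument splits on whitespace and returns a NEW list of strings.
pieces = s.split()
print(pieces)            # ['1.5', '2', '-0.25']

# The original string is left unchanged.
print(s)                 # 1.5 2 -0.25

# float() converts one piece at a time.
print(float(pieces[1]))  # 2.0
```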
Testing
Now test your code in the console (after re-running stats.py). Here’s what your output should look like:
In [3]: string_floats_to_list('4.3 2.1 -3.2')
Out[3]: [4.3, 2.1, -3.2]
Once this test is passing correctly you can upload your solution to the form below for further automatic testing.
Exercise 3: Student Data
Next we want a function which can read a line login_name s1 s2 ..., like the ones in our file, and return the data in a usable format. Write a function student_data which takes as input a string as above and returns a tuple (login_name, results), where results is the list of scores s1, s2, … converted to floats.
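One convenient way to separate the login name from the scores is Python 3's star-unpacking, shown here on a hypothetical line (ada.lovelace is an invented login). This is just one option; indexing and slicing the result of split() works equally well. Note that the scores are still strings at this point.

```python
line = 'ada.lovelace 9 7.5'   # hypothetical input line

# Star-unpacking puts the first piece in `name` and all the others in a list.
name, *pieces = line.split()
print(name)     # ada.lovelace
print(pieces)   # ['9', '7.5']  (still strings, not floats)

# A tuple is built with commas; the parentheses just make it easier to read.
pair = (name, pieces)
print(type(pair).__name__)   # tuple
```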
Here is the start of your function:
def student_data(data_string):
    """Compute (name, results) tuple from the string data_string."""
    pass  # remove this line and replace with your own code
Testing
Now test your code in the console, after running stats.py. Your output should look like this:
In [4]: student_data('gleb.pogudin 7.5 5.5 8')
Out[4]: ('gleb.pogudin', [7.5, 5.5, 8.0])
Upload your solution to the form below for further automatic testing.
Exercise 4: Convert a tuple to a string
Now we want the inverse of the previous exercise: that is, we want a function which converts the components of a (name, results) tuple into a string.
Write a function student_data_to_string which takes as input a string and a list of floats, and returns the string obtained by concatenating the string and the elements of the list, separated by single spaces.
Here is the start of your function:
def student_data_to_string(name, results):
    """Return string from (name, results) tuple"""
    pass  # remove this line and replace with your own code
Hint:
Use ' '.join(l) to construct a space-separated string out of a list l of strings. Example: ' '.join(['a', 'b']) returns 'a b'.
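The hint says join works on a list of strings, and this matters: it refuses numbers. The snippet below shows the error you get with floats and how str() produces the printed form of a number; the values are arbitrary examples.

```python
# join() only accepts strings: passing numbers raises a TypeError.
try:
    ' '.join([9.2, 10])
    join_failed = False
except TypeError as err:
    join_failed = True
    print('TypeError:', err)

# str() gives the usual printed form of a number.
print(str(9.2), str(10), str(10.0))   # 9.2 10 10.0

print(' '.join(['x', 'y', 'z']))      # x y z
```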
Testing
Test your code in the console, after running stats.py. You should be able to replicate the following example:
In [5]: student_data_to_string('gleb.pogudin', [9.2,8.1,10])
Out[5]: 'gleb.pogudin 9.2 8.1 10'
Upload your solution to the form below; it will be tested automatically.
Exercise 5: Reading Student Data
Now write a function read_student_data which reads a file containing student data (such as CSE101data.txt) and returns a list of student data tuples, one for each line of the file.
This is the start of your function:
def read_student_data(filename):
    """Return list of student data from file"""
    pass  # remove this line and replace with your own code
Hint:
Use with open(filename, 'r') as file: to open a file for reading. To read one line of text, use file.readline(), and to read all the (remaining) lines, use file.readlines().
For instance, after
with open('sample.txt') as infile:
    list_lines = infile.readlines()
list_lines is a list containing each line in sample.txt as a list item.
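One detail the hint does not mention: each line returned by readline and readlines keeps its trailing newline character. The self-contained sketch below first writes a small throwaway file (sample_demo.txt, a hypothetical name in your system's temporary directory) and then reads it back, showing that split() also gets rid of that newline.

```python
import os
import tempfile

# Write a small hypothetical file so the example is self-contained.
path = os.path.join(tempfile.gettempdir(), 'sample_demo.txt')
with open(path, 'w') as outfile:
    outfile.write('first line\n')
    outfile.write('second line\n')

with open(path, 'r') as infile:
    first = infile.readline()   # one line: 'first line\n'
    rest = infile.readlines()   # the remaining lines: ['second line\n']

print(repr(first))
print(rest)

# Each line keeps its trailing newline; split() drops it along with the spaces.
print(first.split())   # ['first', 'line']

os.remove(path)        # clean up the temporary file
```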
Testing
Test your code in the console (after re-running stats.py). You should be able to reproduce the following example:
In [6]: read_student_data('CSE101test.txt')
Out[6]:
[('gleb.pogudin', [7.5, 5.5, 8.0, 3.6, 5.5, 8.0]),
('uli.fahrenberg', [2.0, 6.0, 2.5, 6.5, 3.0, 7.0]),
('stefan.mengel', [6.0, 6.0, 9.0, 5.0, 5.0, 8.0])]
Upload your solution to the form below; it will be tested automatically.
Exercise 6: Extracting Averages
We will do more complicated statistics on our student data below, but let’s start off with something simple. Write a function extract_averages which reads a student data file (like CSE101data.txt) and returns a list of tuples, each containing a student name and the average of that student’s scores.
As before, round floats to (at most) two decimal places, using round(x, 2).
This is the start of your function:
def extract_averages(filename):
    """Return list of name and average for each line in file"""
    pass  # remove this line and replace with your own code
Hint: Use the function read_student_data to read in the file.
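Since read_student_data gives you a list of (name, results) tuples, a for loop can unpack each tuple directly. The sketch below uses invented data (a.b and c.d are placeholder logins) and computes each student's best score rather than the average, so that it illustrates the looping pattern without being the exercise solution.

```python
# Hypothetical data in the shape returned by read_student_data.
data = [('a.b', [7.0, 4.0]), ('c.d', [5.0, 6.5])]

best = []
for name, scores in data:             # unpack each (name, results) tuple
    best.append((name, max(scores)))  # build a new (name, value) tuple

print(best)   # [('a.b', 7.0), ('c.d', 6.5)]
```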
Testing
Test your code in the console, after running stats.py. You should be able to replicate the following example exactly:
In [7]: extract_averages('CSE101test.txt')
Out[7]: [('gleb.pogudin', 6.35), ('uli.fahrenberg', 4.5), ('stefan.mengel', 6.5)]
Now, upload your solution using the form below for further automatic testing.
Exercise 7: Filtering Lists of Scores
To prepare for the next exercise, we need a function to filter lists of scores. Write a function discard_scores which takes a list numlist of (at least 5) numbers as input, and constructs a new list such that if numlist has \(n\) entries, then the result contains the top \(n-4\) of the last \(n-2\) entries in numlist.
In other words, moving from numlist to the new result list, we
- first, discard the first two numbers from numlist;
- then, discard the lowest two numbers from what remains;
- and finally, return the list of remaining numbers.
Warning:
- your function should not modify numlist!
- the order of the remaining items should stay the same: do not use sort()!
As an example, applied to the list [2.0, 6.0, 6.5, 9.0, 6.5, 2.5], discard_scores will:
- ignore the first two scores, 2.0 and 6.0;
- ignore the worst two of the remaining scores: that is, from [6.5, 9.0, 6.5, 2.5], it will keep only 9.0 and (one of the) 6.5;
- return the result [9.0, 6.5].
In the second step, if the choice of the two lowest numbers is ambiguous (because some values are equal), we discard the lowest numbers that appear first in the list.
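Two built-in list features fit this rule and the warnings above: slicing builds a new list (so the original is untouched), and list.remove(x) deletes only the first occurrence of x without reordering anything. The sketch below demonstrates both on a hypothetical list with a repeated minimum; it is one possible approach, not the required one.

```python
values = [4.0, 1.0, 7.0, 1.0]   # hypothetical scores with a repeated minimum

# Slicing builds a new list, so `values` itself is not modified.
copy = values[:]

# list.remove(x) deletes only the FIRST occurrence of x; order is preserved.
copy.remove(min(copy))
print(copy)     # [4.0, 7.0, 1.0]
print(values)   # [4.0, 1.0, 7.0, 1.0]  (unchanged)
```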
Your function should start as follows:
def discard_scores(numlist):
    """Filter numlist: construct a new list from numlist with
    the first two, and then the lowest two, scores discarded.
    """
    pass  # remove this line and replace with your own code
Testing
Now test your code in the console, after running stats.py. You should be able to reproduce the following example:
In [8]: discard_scores([2.0, 6.0, 6.5, 9.0, 6.5, 2.5])
Out[8]: [9.0, 6.5]
Can you think of some more interesting tests?
Once you have tried some different tests, you can upload your solution to the form below for automatic testing.
Exercise 8: Summaries per Student
We want to use the student data in CSE101data.txt to write a file with student summaries. For each student, we
- use discard_scores to discard the first two and lowest two scores;
- take the sum of the remaining scores;
- write a line to an output file containing
  - the student’s name, followed by a single space;
  - all remaining scores, separated by single spaces;
  - the string ' sum: ' followed by the computed sum.
Finally, the file should finish with a line containing the string 'total average: ' followed by the average of all the summed scores. The file must end with a newline character \n. Also remember to round floats to (at most) two decimals.
Write a function summary_per_student which accomplishes this. Your function should start as follows:
def summary_per_student(infilename, outfilename):
    """Create summaries per student from the input file
    and write the summaries to the output file.
    """
    data = read_student_data(infilename)
    pass  # remove this line and replace with your own code
Hints:
- Use open(outfilename, 'w') to open a file for writing.
- Use file.write('\n') to finish a line when writing to a file.
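Note that write does not add a newline on its own, unlike print. The sketch below builds one summary line in the format described above for an invented student (ada.lovelace, with remaining scores 7.0 and 9.5), writes it to a throwaway file (summary_demo.txt, a hypothetical name), and reads it back to show exactly what ended up in the file.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'summary_demo.txt')

name, kept = 'ada.lovelace', [7.0, 9.5]   # hypothetical student data
line = name + ' ' + ' '.join(str(x) for x in kept) + ' sum: ' + str(round(sum(kept), 2))

with open(path, 'w') as outfile:   # 'w' creates the file, or overwrites it
    outfile.write(line)
    outfile.write('\n')            # write() adds no newline by itself

with open(path) as infile:
    content = infile.read()
print(repr(content))   # 'ada.lovelace 7.0 9.5 sum: 16.5\n'

os.remove(path)        # clean up the temporary file
```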
Testing
Test your code in the console, after re-running stats.py:
In [9]: summary_per_student('CSE101test.txt', 'out1.txt')
Now open the output file out1.txt (look in the list of files on the left of the main Spyder window). The generated file out1.txt should look like this:
gleb.pogudin 8.0 8.0 sum: 16.0
uli.fahrenberg 6.5 7.0 sum: 13.5
stefan.mengel 9.0 8.0 sum: 17.0
total average: 15.5
Now, a second test:
In [10]: summary_per_student('CSE101data.txt', 'out2.txt')
Have a look at out2.txt to see whether your code works on this more realistic data.
Finally, upload your solution using the form below; it will be tested automatically.
Exercise 9: Summaries per Tutorial
Now we want to create summaries for each tutorial. For each tutorial, we write to an output file a line containing the name of the tutorial together with its average, minimum, and maximum scores.
Since this is a French school, the tutorials are named TD1, TD2, TD3, etc. (“TD” meaning travaux dirigés).
Note that no scores are to be discarded here; we want the complete picture. As usual, we use round on our floats before displaying them, to show (at most) 2 digits after the decimal point. The file must end with a newline character \n.
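The data is stored student by student, but this exercise needs it tutorial by tutorial. One standard way to regroup a list of equal-length lists by column is zip(*rows), sketched below on invented data (a.b and c.d are placeholder logins). This assumes every student has the same number of scores; zip silently stops at the shortest list otherwise.

```python
# Hypothetical data in the same shape as read_student_data's output.
data = [('a.b', [7.0, 4.0, 9.0]),
        ('c.d', [5.0, 6.0, 8.0])]

# zip(*rows) regroups the score lists column by column: one tuple per tutorial.
columns = list(zip(*[scores for _, scores in data]))
print(columns)   # [(7.0, 5.0), (4.0, 6.0), (9.0, 8.0)]

# enumerate(..., start=1) numbers the tutorials TD1, TD2, ...
for i, col in enumerate(columns, start=1):
    print('TD' + str(i), min(col), max(col))
```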
Write a function summary_per_tutorial which accomplishes this. Here’s a start:
def summary_per_tutorial(infilename, outfilename):
    """Create summaries per tutorial from infile and write to outfile."""
    data = read_student_data(infilename)
    pass  # remove this line and replace with your own code
Testing
Test your code in the console, after running stats.py, as follows:
In [11]: summary_per_tutorial('CSE101test.txt', 'out3.txt')
Here’s what the generated file out3.txt should look like:
TD1: average: 5.17 min: 2.0 max: 7.5
TD2: average: 5.83 min: 5.5 max: 6.0
TD3: average: 6.5 min: 2.5 max: 9.0
TD4: average: 5.03 min: 3.6 max: 6.5
TD5: average: 4.5 min: 3.0 max: 5.5
TD6: average: 7.67 min: 7.0 max: 8.0
Now try running your code on some more data:
In [12]: summary_per_tutorial('CSE101data.txt', 'out4.txt')
Have a look at out4.txt to see whether your code works on this data too.
Once this is done, you can upload your solution using the form below for automatic testing.
Optional Exercise 10: Sending Results to Students
Copy your stats.py into a new file, stats_plus.py; this is where you will program the optional function generate_emails.
We also want to send students emails with their results. For each login_name in our database CSE101data.txt, we want to create an email with the following contents:
To: login_name@polytechnique.edu
This is to notify you of your final results for the CSE101 course, see
table below. (Note that the two first and two lowest scores are
excluded from the result.)
TD1  TD2  ...  Result
---------------------
a.b  c.d  ...  xy.z
Best regards,
and please get back to me if you have any questions,
Your Teacher
Here, login_name is replaced with the student’s login name, and a.b etc. are replaced with the student’s scores. The Result column contains the sum of the scores filtered by discard_scores.
The columns in the table are aligned to the left and separated by two spaces; you may assume that there are at most 9 TDs. The email is to be written to a file login_name.txt.
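A few string tools are handy for the table and the file name. The sketch below joins headers with two spaces, sizes the dashed line from the header's length, shows str.ljust for left alignment, and builds a file name with an f-string. The login ada.lovelace is invented, and whether padding with ljust (which can leave trailing spaces) matches the graded output exactly is an assumption you should check against the uli.fahrenberg.txt example.

```python
headers = ['TD1', 'TD2', 'TD3', 'Result']

# Two spaces between columns, as required above.
header_line = '  '.join(headers)
dashes = '-' * len(header_line)   # dashed line as wide as the header
print(header_line)   # TD1  TD2  TD3  Result
print(dashes)

# str.ljust(width) pads a string on the right, i.e. aligns it to the left.
print(repr('9.0'.ljust(len('Result'))))   # '9.0   '

# An f-string builds the per-student file name.
login_name = 'ada.lovelace'   # hypothetical login
print(f'{login_name}.txt')    # ada.lovelace.txt
```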
Write a function generate_emails which accomplishes this. Here’s a start:
def generate_emails(filename):
    """Generate emails to students with their results"""
    data = read_student_data(filename)
    pass  # remove this line and replace with your own code
Testing
Test your code in the console, after running stats_plus.py, as follows:
In [13]: generate_emails('CSE101test.txt')
This should generate three files; here’s what uli.fahrenberg.txt should look like:
To: uli.fahrenberg@polytechnique.edu
This is to notify you of your final results for the CSE101 course, see
table below. (Note that the two first and two lowest scores are
excluded from the result.)
TD1  TD2  TD3  TD4  TD5  TD6  Result
------------------------------------
2.0  6.0  2.5  6.5  3.0  7.0  13.5
Best regards,
and please get back to me if you have any questions,
Your Teacher
Does your code also work for CSE101data.txt?
Upload your solution to the form below.