Quiz

To answer the quiz questions, you need to run the Python script in the quiz/ directory.

The quiz covers two datasets: one on salaries and one on house sales.

Salary

The Salary dataset consists of observations on six variables for 52 tenure-track professors in a small college. Rank is encoded as two indicator columns (associate and full), so the files contain seven columns. The files salary_train.csv and salary_test.csv are the training and testing datasets, respectively.

The variables (features) are:
- salary = Academic year salary, in dollars.
- ry = Number of years in current rank.
- yd = Number of years since highest degree was earned.
- sex = 1 for male, 0 for female.
- associate = 1 for associate professor, 0 otherwise (assistant professor is the reference value for the rank).
- full = 1 for full professor, 0 otherwise.
- phd = 1 if the professor has a PhD, 0 if the professor has a master's degree.
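
The rank encoding above uses two indicator columns with assistant professor as the reference level. A minimal sketch of that mapping follows; the names RANK_DUMMIES, encode_rank, and decode_rank are hypothetical and are not taken from td7.py.

```python
# Hypothetical helpers (not part of td7.py): map a rank label to the
# (associate, full) indicator pair described in the variable list.
RANK_DUMMIES = {
    "assistant": (0, 0),  # reference level: both indicators are 0
    "associate": (1, 0),
    "full": (0, 1),
}


def encode_rank(rank):
    """Return the (associate, full) pair for a rank label."""
    return RANK_DUMMIES[rank.strip().lower()]


def decode_rank(associate, full):
    """Return the rank label for an (associate, full) pair."""
    for name, pair in RANK_DUMMIES.items():
        if pair == (associate, full):
            return name
    raise ValueError("associate and full cannot both be 1")
```

With this encoding, a regression coefficient on associate or full is read as a difference relative to assistant professors.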

Hint for question 1.1: gender discrimination

Test the regression of salary against gender (the sex variable, where 1 means male). For this, use the salary_* CSV files, with column 3 ("sex") as X; the salary (y) is in column 0. You can use the td7.py script in the quiz/ directory, which will need to be customized.
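
As a rough illustration of what this regression computes, here is a minimal standard-library sketch. It is not the td7.py implementation. The function names load_xy, fit_simple_ols, and mean_squared_error are hypothetical, and the code assumes comma-separated files with one header row (set has_header=False if the files have none).

```python
import csv


def load_xy(path, x_col=3, y_col=0, has_header=True):
    """Read one feature column and the target column from a CSV file.

    Assumption: comma-separated values, optionally with one header row.
    """
    xs, ys = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared residual of the fitted line on (xs, ys)."""
    n = len(xs)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n
```

A possible use would be to call load_xy("salary_train.csv"), fit on the result, and evaluate mean_squared_error on the columns loaded from salary_test.csv. Because sex is a 0/1 indicator, the fitted intercept is the mean salary of the group coded 0 and the slope is the difference between the two group means.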

Maisons

The Maisons dataset consists of a random sample of records of home resales from Feb 15 to Apr 30, 1993, taken from the files maintained by the Albuquerque Board of Realtors. The files maisons_train.csv and maisons_test.csv are the training and testing datasets, respectively.

There are 117 cases.

The variable names are as follows:

The td7.py script contains a normalize function that can be called from get_data with appropriate arguments.
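
The signature and behaviour of the normalize function in td7.py are not shown here, so the following is only a sketch of one common form of normalization (z-score standardization), under the assumption that the statistics are computed on the training rows and reused for the test rows. The names column_stats and apply_standardization are hypothetical.

```python
def column_stats(rows):
    """Return per-column means and standard deviations of a list of rows."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n
        std = variance ** 0.5
        stds.append(std if std > 0 else 1.0)  # leave constant columns unscaled
    return means, stds


def apply_standardization(rows, means, stds):
    """Subtract the given means and divide by the given standard deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

Computing the means and standard deviations on the training set only, then applying them to both sets, keeps information from the test set out of the fitted model.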