To answer the quiz questions, you need to run the Python script in the
quiz/ directory.
The quiz covers two datasets: one on salaries and one on house sales.
The Salary dataset consists of observations on six variables for 52
tenure-track professors at a small college. The files salary_train.csv
and salary_test.csv are the training and testing datasets, respectively.
The variables (features) are:
- salary = Academic year salary, in dollars.
- ry = Number of years in current rank.
- yd = Number of years since highest degree was earned.
- sex = 1 for male, 0 for female.
- associate = 1 for associate professor, 0 otherwise (assistant
professor is the reference value for the rank).
- full = 1 for full professor, 0 otherwise.
- phd = 1 if (s)he has a PhD, 0 if (s)he has a master's degree.
Test the regression of salary against gender (i.e., the sex variable,
where 1 = male). For this, use the salary_* CSV files, with column 3,
"sex", as X for the regression; the salary (i.e., y) is in column 0
(columns are counted from 0). You can use the td7.py script in the quiz/
directory, which will need to be customized.
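As a rough illustration of that regression (not the td7.py code itself), the sketch below fits salary against the single sex column by ordinary least squares. The helper names load_columns, fit_simple_ols and mse are hypothetical, the has_header flag assumes the CSV files start with a header row, and the four sample rows are made up so that the example runs without the real files.

```python
import csv
import io


def load_columns(handle, x_col=3, y_col=0, has_header=True):
    """Read one feature column (X) and the target column (y) from a CSV handle."""
    reader = csv.reader(handle)
    if has_header:
        next(reader)  # assumption: the first row holds column names
    xs, ys = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    return xs, ys


def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up rows for illustration only; they are NOT taken from salary_train.csv.
SAMPLE = """salary,ry,yd,sex,associate,full,phd
30000,5,10,1,0,1,1
20000,2,4,0,0,0,1
32000,7,12,1,1,0,0
22000,3,6,0,0,0,1
"""

xs, ys = load_columns(io.StringIO(SAMPLE))
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope, mse(xs, ys, intercept, slope))
```

With a 0/1 feature such as sex, the intercept is the mean salary of the group coded 0 and the slope is the difference between the two group means. To use the real data, replace io.StringIO(SAMPLE) with open("salary_train.csv", newline=""), fit on the training file, and compute the error on salary_test.csv.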
The Maisons dataset consists of a random sample of records of resales
of homes from Feb 15 to Apr 30, 1993 from the files maintained by the
Albuquerque Board of Realtors. The files maisons_train.csv and
maisons_test.csv are the training and testing datasets, respectively.
There are 117 cases.
The variable names are as follows:
The td7.py script contains a normalize function that can be called from
get_data with appropriate arguments.
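The exact behaviour and arguments of normalize are defined in td7.py and are not reproduced here. As a minimal sketch of one common choice, the code below standardizes each feature column to zero mean and unit variance, using statistics computed on the training set only and reusing them for the test set. The function names column_stats and standardize and the toy numbers are hypothetical; the real normalize may use a different scheme (for example min-max scaling).

```python
def column_stats(rows):
    """Return per-column means and (population) standard deviations."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
        stds.append(var ** 0.5 or 1.0)
    return means, stds


def standardize(rows, means, stds):
    """Apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Toy feature matrices (made-up values, not from the Maisons files).
train = [[1000.0, 2.0], [2000.0, 4.0], [3000.0, 6.0]]
test = [[2500.0, 5.0]]

means, stds = column_stats(train)            # statistics from the training set only
train_n = standardize(train, means, stds)
test_n = standardize(test, means, stds)      # test set reuses the training statistics
print(train_n)
print(test_n)
```

Reusing the training means and standard deviations on the test set keeps both sets on the same scale and avoids leaking test information into the fitted model.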