Stat209/HRP239/Ed260A Feb 10 2018
Take Home Problems #1
Usual Honor Code procedures:
You may use any inanimate resources--no collaboration. This work
is done under Stanford's Honor Code.
Questions regarding wording, interpretation etc are encouraged.
I will, however, stop short of answering queries on how to do the question.
Write to Rogosa (only). Any useful info, if needed, will be posted on
the assignments page below the TH1 link. I have checked and double-checked
(this AM) that all data sets are accessible as described in each of the
problems. But tell me if you experience any issues.
Solutions for these problems are to be submitted in hard-copy
form, in class 2/16/18. Given that these problems are untimed, some care
should be taken in presentation, clarity, format. Especially
important is to give full and clear answers to questions, not
just to submit unannotated computer output, although relevant
output should be included. Production values are at your discretion.
Make sure any computing is integrated into the problem answer (not
a pile attached at the end).
Note, there are two problems, instead of the usual three. You are welcome.
If you are away, make arrangements for submission with Rogosa;
the easiest (for me) option is the
Statistics Department Fax (address to Rogosa):
Department of Statistics -- Sequoia Hall
390 Serra Mall
Stanford University
Stanford, CA 94305-4065
Phone: (650) 723-2620
Fax: (650) 725-8977
This exam covers the content of weeks 1 through 4, also Computer Labs 1 and 2.
------------------------------
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Problem 1. Nip/Tuck for profs?
The publication is:
Hamermesh, D.S., and Parker, A. (2005).
Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity.
Economics of Education Review, 24, 369-376.
Probably what this proves is that in social science you can publish most anything.
But we can have some fun with these data anyways.
The data exist in the AER package (which will be used in Lab3 to obtain the
Stata-euphonic "ivreg" function, so you might as well install
the AER package) as TeachingRatings.
For convenience I also placed the data at
http://statweb.stanford.edu/~rag/stat209/evals
Description
Data on course evaluations, course characteristics, and professor
characteristics for 463 courses for the academic years 2000-2002
at the University of Texas at Austin.
Details
A sample of student instructional ratings for a group of
university teachers along with beauty rating (average from six
independent judges) and a number of other characteristics.
TeachingRatings
Format
A data frame containing 463 observations on 12 variables.
minority factor. Does the instructor belong to a minority (non-Caucasian)?
age the professor's age.
gender factor indicating instructor's gender.
credits factor. Is the course a single-credit elective (e.g., yoga, aerobics, dance)?
beauty rating of the instructor's physical appearance by a panel of six students, averaged across
the six panelists, shifted to have a mean of zero.
eval course overall teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent).
division factor. Is the course an upper or lower division course? (Lower division courses are mainly
large freshman and sophomore courses.)
native factor. Is the instructor a native English speaker?
tenure factor. Is the instructor on tenure track?
students number of students that participated in the evaluation.
allstudents number of students enrolled in the course.
prof factor indicating instructor identifier.
The publication used a prediction model of the sort
lm(eval ~ beauty + gender + minority + native + tenure + division + credits,
weights = students, data = TeachingRatings)
to see whether beauty was important for teaching evaluations
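For readers who have not used the weights argument of lm before, here is a minimal sketch of what it does, on simulated stand-in data (the variables x, w, y below are hypothetical and are not taken from TeachingRatings). It shows that the weighted fit matches ordinary least squares after scaling the response and the design matrix by sqrt(w).

```r
# Simulated stand-in data; nothing here comes from TeachingRatings.
set.seed(1)
n <- 50
x <- rnorm(n)
w <- sample(5:100, n, replace = TRUE)            # e.g. number of raters per course
y <- 4 + 0.2 * x + rnorm(n, sd = 1 / sqrt(w))    # noisier when fewer raters

# Weighted least squares via the weights argument
fit_w <- lm(y ~ x, weights = w)

# Same fit by hand: scale y and the design matrix by sqrt(w), then plain OLS
X <- model.matrix(~ x)
fit_hand <- lm.fit(sqrt(w) * X, sqrt(w) * y)

cbind(weights_arg = coef(fit_w), by_hand = fit_hand$coefficients)
```

The two columns should agree up to numerical rounding.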
A couple of features of the data:
The data set consists of 12 measures on each of 463 courses, i.e.,
> dim(TeachingRatings)
[1] 463 12
These 463 courses have 94 distinct instructors (some with as many as 13 courses in
the data set). We will follow the publication by doing these analyses at the course level.
That is, we ignore (wisely or not) the non-independence that comes from repeated
instructors, in part because each of the courses (should) have a different
composition of students, who are the ones making the evals.
> head(TeachingRatings)
  minority age gender credits     beauty eval division native tenure students allstudents prof
1      yes  36 female    more  0.2899157  4.3    upper    yes    yes       24          43    1
2       no  59   male    more -0.7377322  4.5    upper    yes    yes       17          20    2
3       no  51   male    more -0.5719836  3.7    upper    yes    yes       55          55    3
4       no  40 female    more -0.6779634  4.3    upper    yes    yes       40          46    4
5       no  31 female    more  1.5097940  4.4    upper    yes    yes       42          48    5
6       no  62   male    more  0.5885687  4.2    upper    yes    yes      182         282    6
> fivenum(tapply(eval, prof, length))
22 46 81  9 50
 1  3  4  7 13
> length(unique(prof))
[1] 94
######################## part 1
In part 1 of this exercise we will focus on differential outcome by
instructor gender-- i.e., is there gender bias in student ratings of instruction?
That is, we are starting off fresh with this data set; what the authors did is
not relevant yet (until part 2).
a. Give a 95% confidence interval for gender effect on evals. Confirm that
there exist gender differences on both age and beauty. How would those complicate
the interpretation of gender differences in eval?
b. Give a point and interval estimate for the correlation between beauty and eval.
Evaluate whether this correlation is spurious with instructor gender as the supposed
common cause/influence.
c. Perhaps instead, beauty is seen as a mediating variable in the influence of instructor
gender on eval. Evaluate beauty as a mediating variable.
d. Consider the subset of these data: courses taught by instructors who are tenure track.
How does this subset appear to alter the types of courses included?
Repeat parts (a) and (b) using this subset of 361 courses.
Comment on results.
> table(tenure)
tenure
 no yes
102 361
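As background for the "spurious correlation" wording in part b, here is a sketch of one common check, the partial correlation computed from residuals. This is an assumption about method, not a statement of the required approach. The data are simulated (g, x, y are hypothetical), built so that x and y are related only through the grouping factor g.

```r
# Simulated stand-in data; g plays the role of a supposed common cause.
set.seed(2)
n <- 400
g <- factor(sample(c("female", "male"), n, replace = TRUE))
# x and y each depend on g but not on each other
x <- 2 * (g == "male") + rnorm(n)
y <- 2 * (g == "male") + rnorm(n)

cor.test(x, y)$conf.int          # marginal correlation, 95% interval

# Partial correlation given g: correlate what is left after removing g
rx <- resid(lm(x ~ g))
ry <- resid(lm(y ~ g))
c(marginal = cor(x, y), partial = cor(rx, ry))
```

In this constructed example the marginal correlation is sizable while the partial correlation is near zero; real data need not behave so cleanly.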
######################part 2
e. Carry out the publication's estimation equation for both the full set of
463 courses and the subset of 361 courses taught by tenure track instructors.
Do these multiple regressions agree with your indications in parts a and d?
What interpretation would you give for the coefficient of beauty in these fits?
note: The 'weights' statement gives more weight to the courses with larger numbers of students
completing evaluations (done in an attempt to improve precision). (You likely saw that
in your intro courses).
f. Regression Diagnostics. Refit the authors' prediction equation without
doing the weighting (no weights statement) for all 463 courses.
Construct the adjusted variables plot for the coefficient of beauty and
demonstrate that the slope for a fit to that plot matches the obtained coefficient
for beauty.
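For reference, the general property behind an adjusted variables (added-variable) plot can be sketched on simulated data. The names x, z1, z2, y are hypothetical stand-ins, not the course data, and the fit is unweighted.

```r
# Simulated stand-in data: x is the predictor of interest, z1 and z2 the others.
set.seed(3)
n  <- 200
z1 <- rnorm(n)
z2 <- factor(sample(c("a", "b"), n, replace = TRUE))
x  <- 0.5 * z1 + rnorm(n)
y  <- 1 + 0.3 * x - 0.4 * z1 + 0.2 * (z2 == "b") + rnorm(n)

full <- lm(y ~ x + z1 + z2)

# Adjusted variables: residuals of y and of x, each regressed on the other predictors
e_y <- resid(lm(y ~ z1 + z2))
e_x <- resid(lm(x ~ z1 + z2))

plot(e_x, e_y, xlab = "x | others", ylab = "y | others")
av <- lm(e_y ~ e_x)
abline(av)

c(full_model = unname(coef(full)["x"]), av_plot = unname(coef(av)["e_x"]))
```

The two slopes agree up to rounding, which is the property part f asks you to demonstrate on the actual data.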
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Problem 2
Familiar Examples
NELS data; Math Achievement, SES, Homework etc
Math test performance in 23 schools
The data set for this problem contains information on students' performance on a math test, as well as several
explanatory variables. These data are a subset of the NELS-88 data (National Education Longitudinal
Study of 1988). Both a selected number of variables and a selected number of observations are given
here.
Our dataset exists at http://rogosateaching.com/stat209/sch23
This dataset was adapted from (see below) the school23 dataframe in the influence.ME package
Format
A data frame with 519 observations on the following 8 variables:
school.ID SES mean.SES ratio math sex urban hw
school.ID a factor with 23 levels, representing the 23 schools within which students are nested.
> length(unique(school.ID))
[1] 23
SES a numeric vector, representing the socio-economic status
mean.SES a numeric vector, representing the mean socio-economic status per school
ratio a numeric vector, representing the student-teacher ratio
math a numeric vector, representing the number of correct answers on a mathematics test
sex a factor with levels Male and Female
urban a factor with levels Urban, Suburban, and Rural
In the original dataset
homework a factor representing the time spent on math homework each week, with levels None,
Less than 1 hour, 1 hour, 2 hours, 3 hours, 4-6 hours, 7-9 hours, and 10 or more
> table(homework)
homework
None Less than 1 hour 1 hour 2 hours 3 hours 4-6 hours 7-9 hours 10 or more
  42              225    111      47      47        38         6          3
For better or worse I made the measure "hw" in the provided dataset by recoding those categories
into something you can use (or are encouraged to use) as an interval measure.
> #recode of homework to numeric hw
> library(car)
> school23$hw = recode(school23$homework, "'None' = 0 ; 'Less than 1 hour' = .5 ; '1 hour' = 1 ; '2 hours' = 2 ; '3 hours' = 3 ; '4-6 hours' = 5 ; '7-9 hours' = 8 ; '10 or more' = 10 ")
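The same mapping can be sketched in base R with a named lookup vector, which guarantees a numeric result. The small homework factor below is a hypothetical stand-in, not the course data. With car's recode it is prudent to check the class of the result, since recode may return a factor when its input is a factor.

```r
# Hypothetical stand-in for the original homework factor
levs <- c("None", "Less than 1 hour", "1 hour", "2 hours",
          "3 hours", "4-6 hours", "7-9 hours", "10 or more")
homework <- factor(c("None", "1 hour", "4-6 hours", "10 or more", "Less than 1 hour"),
                   levels = levs)

# Named lookup vector: category label -> hours (same values as the recode above)
hours <- c("None" = 0, "Less than 1 hour" = 0.5, "1 hour" = 1, "2 hours" = 2,
           "3 hours" = 3, "4-6 hours" = 5, "7-9 hours" = 8, "10 or more" = 10)

# Index by label (not by the factor's integer codes) and drop the names
hw <- unname(hours[as.character(homework)])
hw
is.numeric(hw)
```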
--------------------------------------------
Compute and interpret the traditional multilevel data quantities (point estimates) for math predicted by SES
part a. aggregation bias, for using school means in place of individual data
part b. contextual effect-- (supposed) estimate of the effect on math for a student being in a school
with one unit higher mean score on SES, student "held constant". Also provide an interval estimate
for the contextual effect.
part c. Is there a relationship between student gender and the estimated contextual effect? I.e. does the contextual effect depend on gender?
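As a reminder of how these quantities fit together, here is a sketch on simulated, balanced data. All names are hypothetical stand-ins: g for school, x for SES, y for math. It assumes one common set of definitions: aggregation bias as the between-school slope minus the pooled individual-level slope, and the contextual effect as the coefficient of the school mean when the individual score is also in the model. Check these against the class notes.

```r
# Simulated stand-in data: 23 groups of 20, not the sch23 data.
set.seed(4)
G <- 23; m <- 20
g    <- factor(rep(seq_len(G), each = m))
mu   <- rnorm(G)                                   # school-level part of x
x    <- mu[as.integer(g)] + rnorm(G * m)
y    <- 50 + 2 * x + 3 * mu[as.integer(g)] + rnorm(G * m, sd = 4)
xbar <- ave(x, g)                                  # school mean of x, one per student

# Slopes at the three levels
ybar_g <- as.vector(tapply(y, g, mean))
xbar_g <- as.vector(tapply(x, g, mean))
b_between <- unname(coef(lm(ybar_g ~ xbar_g))[2])  # school means (groups are equal-sized here)
b_total   <- unname(coef(lm(y ~ x))["x"])          # pooled individual-level slope
b_within  <- unname(coef(lm(y ~ x + g))["x"])      # pooled within-school slope

# Contextual model: individual score plus school mean
ctx <- lm(y ~ x + xbar)

c(aggregation_bias = b_between - b_total,
  contextual       = unname(coef(ctx)["xbar"]),
  between_minus_within = b_between - b_within)
confint(ctx)["xbar", ]    # OLS interval; ignores clustering within schools
```

With unequal school sizes, the between-school regression would need to be weighted by school size for the last two quantities to match exactly.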
## homework and math achievement
part d. Obtain the 23 separate OLS fits for hw predicting math. Present a five number summary of the gradients.
How many schools have a negative gradient (i.e. more hw, lower score indication)? Offer an explanation for any large negative slopes.
part e. Do schools with different levels of student SES have a different relation of math with hw? Use a mixed model to investigate
the relationship between math and hours of homework. Explain the results. Then formulate a mixed model that allows school mean SES
to influence the relation between math and hw. Interpret the results. Does a model incorporating school mean SES improve the model fit?
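The mechanics of parts d and e can be sketched on simulated data. Everything below is a hypothetical stand-in (g for school, x for hw, zbar for school mean SES, y for math). It uses lme from the nlme package as one possible choice of mixed-model software; the class labs may use a different package.

```r
library(nlme)   # a recommended package shipped with standard R installations

# Simulated stand-in data: 23 groups of 20, not the sch23 data.
set.seed(5)
G <- 23; m <- 20
g    <- factor(rep(seq_len(G), each = m))
zbar <- rnorm(G)                                  # school-level covariate
a    <- 50 + 3 * zbar + rnorm(G, sd = 3)          # school intercepts
b    <- 1 + 1.5 * zbar + rnorm(G, sd = 1)         # school slopes depend on zbar
x    <- sample(c(0, 0.5, 1, 2, 3, 5), G * m, replace = TRUE)
d    <- data.frame(g, x, zbar = zbar[as.integer(g)])
d$y  <- a[as.integer(g)] + b[as.integer(g)] * d$x + rnorm(G * m, sd = 4)

# Separate OLS fit per group, keeping only the slope
slopes <- sapply(split(d, d$g), function(s) coef(lm(y ~ x, data = s))[["x"]])
fivenum(slopes)
sum(slopes < 0)                                   # number of negative gradients

# Random intercept and random slope for x; ML so the two fits are comparable
m1 <- lme(y ~ x, random = ~ x | g, data = d, method = "ML")
# Let the group-level covariate shift both the level and the slope
m2 <- lme(y ~ x * zbar, random = ~ x | g, data = d, method = "ML")

summary(m2)$tTable                                # fixed effects, including x:zbar
anova(m1, m2)                                     # likelihood ratio comparison
```

The x:zbar row is the cross-level interaction, i.e. how the slope changes with the group-level covariate.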
================================================
END TH1 2018