Stat209/HRP239/Ed260A March 12 2018
Take Home Problems 2
Usual Honor Code procedures:
You may use any inanimate resources--no collaboration. This work
is done under Stanford's Honor Code.
Questions regarding wording, interpretation, etc., are encouraged.
I will, however, stop short of answering queries on how to do the question.
Write to Rogosa (only). Any useful info, if needed, will be posted on
the assignments page below the TH2 link.
I have checked (this AM) that the new data sets (and prior ones) are accessible
as described in each of the problems.
Our revised schedule has solutions for these problems submitted in hard-copy
form during the exam schedule time Monday 3/19/18 3:30 PM. I will be in the
Classroom Sequoia 200 or in nearby office Sequoia 224, until 5PM,
eager to receive papers.
Given that these problems are untimed, some care
should be taken in presentation, clarity, format. Especially
important is to give full and clear answers to questions, not
just to submit unannotated computer output, although relevant
computing should be included. Production values are at your discretion.
Make sure any computing is integrated into the problem answer (not
a pile attached at the end).
Or, if you are away, the easiest (for me) option is the
Statistics Department Fax (address to Rogosa):
Department of Statistics -- Sequoia Hall
390 Serra Mall
Stanford University
Stanford, CA 94305-4065
Phone: (650) 723-2620
Fax: (650) 725-8977
This exam covers the content of weeks 5 through 8, also Computer Labs 3 and 4.
------------------------------
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Problem 1. Revisit Nip/Tuck for profs?
Trying to keep things simplified, we reuse this data set in Problem 1 and Problem 3.
The publication is:
Hamermesh, D.S., and Parker, A. (2005).
Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity.
Economics of Education Review, 24, 369-376.
Probably what this proves is that in social science you can publish most anything.
But we can have some fun with these data anyway.
The data exist in the AER package (which was used in Lab3 to obtain the
Stata-euphonic "ivreg" function equivalent to tsls) as TeachingRatings
I also for convenience placed the data at
http://www-stat.stanford.edu/~rag/stat209/evals
Description
Data on course evaluations, course characteristics, and professor
characteristics for 463 courses for the academic years 2000-2002
at the University of Texas at Austin.
Details
A sample of student instructional ratings for a group of
university teachers along with beauty rating (average from six
independent judges) and a number of other characteristics.
TeachingRatings
Format
A data frame containing 463 observations on 12 variables.
minority factor. Does the instructor belong to a minority (non-Caucasian)?
age the professor's age.
gender factor indicating instructor's gender.
credits factor. Is the course a single-credit elective (e.g., yoga, aerobics, dance)?
beauty rating of the instructor's physical appearance by a panel of six students, averaged across
the six panelists, shifted to have a mean of zero.
eval course overall teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent).
division factor. Is the course an upper- or lower-division course? (Lower-division courses are mainly
large freshman and sophomore courses.)
native factor. Is the instructor a native English speaker?
tenure factor. Is the instructor on tenure track?
students number of students that participated in the evaluation.
allstudents number of students enrolled in the course.
prof factor indicating instructor identifier.
The publication used a prediction model of the sort
lm(eval ~ beauty + gender + minority + native + tenure + division + credits,
weights = students, data = TeachingRatings)
to see whether beauty was important for teaching evaluations.
We won't examine or revisit that analysis, nor, for our use of these data,
will we bother with weighting; it's not so clear that's appropriate.
A couple of features of the data:
The data set consists of 12 measures on each of 463 courses, i.e.,
> dim(TeachingRatings)
[1] 463 12
These 463 courses have 94 distinct instructors (some with as many as 13 courses in
the data set). We won't worry here about non-independence from that source, in part
because each of the courses should have a different composition of students, and the
students are the ones making the evals.
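For orientation, a minimal R sketch of loading the data (assumes the AER package is installed):

```r
## Load the TeachingRatings data from the AER package
library(AER)
data("TeachingRatings")
dim(TeachingRatings)                   # 463 courses, 12 variables
length(unique(TeachingRatings$prof))   # 94 distinct instructors
```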
For the purposes of this exercise we will focus again on differential outcome by
instructor gender-- i.e., is there gender bias in student ratings of instruction?
a. Give a 95% confidence interval for the gender effect on evals. This time please pay
attention to the assumptions of your two-sample inference procedure.
Alternatively, try regression adjustment via analysis of covariance. Does
using age or beauty or both as covariates alter the conclusion?
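A starting sketch in R for part (a) — the assumption-checking and interpretation the question asks for are left to you:

```r
library(AER)
data("TeachingRatings")

## Two-sample comparison of evals by gender (Welch by default;
## the pooled-variance choice is part of the assumptions question)
t.test(eval ~ gender, data = TeachingRatings)

## Regression adjustment (analysis of covariance) with age and/or beauty
confint(lm(eval ~ gender + age + beauty, data = TeachingRatings))
```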
b. Consider again the subset of these data for instructors who are on the tenure track.
How does this restriction appear to alter the mix of courses included?
Repeat part (a) using this subset of 361 courses. Comment on results.
c. Carry part (b) a bit further with a comparing-regressions analysis (class
handout and HW5) using age. What are the gender differences in eval at the
quartiles and median of the age distribution? Is there a region of the
age distribution for which the gender differences in evals are not significant?
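The comparing-regressions setup in part (c) amounts to an interaction model; a sketch (assuming tenure is coded "yes"/"no" in the data):

```r
library(AER)
data("TeachingRatings")
tt <- subset(TeachingRatings, tenure == "yes")   # tenure-track subset, as in part (b)

## Separate regressions of eval on age by gender, fit as one interaction model
fit <- lm(eval ~ gender * age, data = tt)
summary(fit)

## The gender difference at a given age combines the gender main effect and
## the gender:age interaction; evaluate at quantile(tt$age, c(.25, .5, .75))
quantile(tt$age, c(0.25, 0.5, 0.75))
```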
==================================================================
Problem 2
Instrumental Variables Methods: IV with another PSID
Lab3 used PSID data from 1975. Here we jump forward to 1982.
PSID1982 in the AER package:
PSID Earnings Data 1982
Description
Cross-section data originating from the Panel Study of Income Dynamics, 1982.
Usage data("PSID1982")
Excerpt from help file
Format
A data frame containing 595 observations on 12 variables.
experience
Years of full-time work experience.
weeks
Weeks worked.
occupation
factor. Is the individual a white-collar ("white") or blue-collar ("blue") worker?
industry
factor. Does the individual work in a manufacturing industry?
south
factor. Does the individual reside in the South?
smsa
factor. Does the individual reside in a SMSA (standard metropolitan statistical area)?
married
factor. Is the individual married?
gender
factor indicating gender.
union
factor. Is the individual's wage set by a union contract?
education
Years of education.
ethnicity
factor indicating ethnicity. Is the individual African-American ("afam") or not ("other")?
wage
Wage.
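As with Problem 1, the data load directly from the AER package:

```r
## Load the PSID1982 data from the AER package
library(AER)
data("PSID1982")
dim(PSID1982)   # 595 individuals, 12 variables
```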
a. Carry out the simple-minded returns-to-education estimation via OLS regression
(wage is the outcome, education the predictor). Give a point and
an interval estimate for the (presumed) increase in wage for a year increase in education.
Briefly comment on the problems with making an "as if by experiment" conclusion from this regression.
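A sketch of the OLS setup for part (a); the commentary on "as if by experiment" is for you to supply:

```r
library(AER)
data("PSID1982")

## Simple-minded returns-to-education regression
ols <- lm(wage ~ education, data = PSID1982)
coef(ols)["education"]      # point estimate for a one-year increase in education
confint(ols, "education")   # 95% interval estimate
```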
b. Now consider an instrumental variables (IV) approach to this returns-to-education estimation.
Consider the use of "industry" as an instrument. (You may find it convenient to do
something like ivind = as.numeric(industry) - 1 with the factor in the data set.)
i. Demonstrate that industry has a non-zero (even significant) correlation with education and a near
zero correlation with wage. Are those the required properties for a good instrument?
ii. Compare the instrumental variables estimate for returns-to-education with the OLS estimate in part a.
Give a point and interval estimate for the IV estimate.
Also compare standard error (and width of the confidence interval) for the IV estimate with that for
OLS slope from part a.
iii. Check the IV result by doing the estimation in two steps (i.e. the two stage least
squares equivalence shown in Week 6 class handout).
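A sketch of the IV setup for parts (i)-(iii), using ivreg from AER as in Lab3:

```r
library(AER)
data("PSID1982")

## Numeric version of the industry factor, as suggested in part b
PSID1982$ivind <- as.numeric(PSID1982$industry) - 1

## (i) instrument diagnostics
cor(PSID1982$ivind, PSID1982$education)
cor(PSID1982$ivind, PSID1982$wage)

## (ii) IV estimate via ivreg; compare coefficient and SE with the OLS fit
iv <- ivreg(wage ~ education | ivind, data = PSID1982)
summary(iv)
confint(iv, "education")

## (iii) two-stage check: education on instrument, then wage on fitted values;
## the second-stage slope should reproduce the IV point estimate
stage1 <- lm(education ~ ivind, data = PSID1982)
stage2 <- lm(wage ~ fitted(stage1), data = PSID1982)
coef(stage2)
```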
c. These data contain 67 females and 528 males. Repeat parts a and b(i,ii) for the men only--
how do the results and conclusions change (if at all) for this subset of these data?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Problem 3
More with the Problem 1 dataset, from "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity".
Use the subset of these data for instructors who are on the tenure track (as in Problem 1, parts b and c).
a. Consider the matching-by-subclassification strategy that worked for Bill Cochran
on the effects of smoking on mortality, from lecture and handouts, described in the Don Rubin article
"Estimation from nonrandomized treatment comparisons using subclassification on propensity scores"
(or "Estimating Causal Effects from Large Data Sets Using Propensity Scores" in the week 8 readings).
Try subclassification on age with 2 age categories (i.e., split at median(age)) and also with 3.
Do these adjustments for age change the gender differences in evals seen in Problem 1 part a?
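One way to set up the age subclasses in R (a sketch; combining the within-subclass differences across subclasses, as in Cochran/Rubin, is left to you):

```r
library(AER)
data("TeachingRatings")
tt <- subset(TeachingRatings, tenure == "yes")   # tenure-track subset

## Two age subclasses: split at the median
tt$age2 <- cut(tt$age, breaks = c(-Inf, median(tt$age), Inf))

## Three age subclasses, e.g. at the terciles
tt$age3 <- cut(tt$age, breaks = quantile(tt$age, c(0, 1/3, 2/3, 1)),
               include.lowest = TRUE)

## Within-subclass gender means of eval; combine across subclasses,
## weighting by subclass size
tapply(tt$eval, list(tt$age2, tt$gender), mean)
```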
b. Consider a matching alternative to the regression adjustment strategies in Problem 1; those analyses seek to
estimate gender effects on evals. Instead, try Ben Hansen's full matching, matching the
male and female course instructors on the variables: beauty, minority, native, age.
Be careful about the form of the gender variable used in the matching statement.
Less critical (but still worth deciding) is whether to use minority and native as factors
or as numerical indicators.
Does the matching method produce better balance on age and beauty than in the unmatched data?
Show results. How many subclasses does the full matching method produce? Which subclass
contains the largest number of course instructors? Which subclass has the greatest preponderance
of female course instructors?
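A sketch of the full-matching setup, assuming Hansen's optmatch package; note the conversion of gender to a binary treatment indicator, per the caution above:

```r
library(AER)
library(optmatch)   # Ben Hansen's full matching
data("TeachingRatings")
tt <- subset(TeachingRatings, tenure == "yes")

## fullmatch wants a binary treatment indicator; be careful which
## gender level plays the "treatment" role
tt$female <- as.numeric(tt$gender == "female")

## Full matching on the four variables named in part (b)
fm <- fullmatch(female ~ beauty + minority + native + age, data = tt)
summary(fm)   # structure: number and sizes of subclasses
table(fm)     # course instructors per subclass
```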
c. Use the full matching results in part b to assess the effects of gender on the evals. Discuss limitations,
shortcomings, or advantages of this analysis, especially as compared to problem 1 activities.
================================================
End TH2 2018