Problem 2
Note: part d of this question draws on Week 9 content.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PART I
Revisit Exam 1 problem 2 right-heart catheterization (rhc)
part a. Sensitivity analysis (week5,6 lecture, week6 computing corner)
In part b of exam 1 problem 2 you carried out 1:1 pair matching and used that to estimate the effect of rhc on outcome (death).
i. Carry out a sensitivity analysis for the effect of rhc. What value of Gamma is required to "alter the conclusion", i.e. to render the effect nonsignificant?
Does the method chosen for the sensitivity-analysis function make a difference?
ii. Look back at the linked press reports and then comment on Paul Rosenbaum's preference for one-sided (vs two-sided) inferences in these sensitivity calculations.
iii. Interpret the value of Gamma you select.
iv. What are the sensitivity bounds for the point estimate and confidence interval for the effect at that level of Gamma?
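For orientation only, here is a minimal base-R sketch of how a Gamma sensitivity bound works for matched pairs with a binary outcome (the McNemar setting). The pair counts and the helper name mcnemar_upper_p are hypothetical, not the rhc results, and the functions used in the computing corner may rely on other test statistics. Under no treatment effect and hidden bias of at most Gamma, the chance that the treated member of a discordant pair is the one who died is at most Gamma/(1+Gamma), which gives an upper bound on the one-sided p-value.

```r
# Hypothetical discordant-pair counts (NOT the rhc results)
n_treated_died <- 300  # pairs in which only the treated patient died
n_control_died <- 230  # pairs in which only the matched control died

# Upper bound on the one-sided p-value at a given Gamma (hypothetical helper)
mcnemar_upper_p <- function(t_plus, t_minus, gamma) {
  d <- t_plus + t_minus              # number of discordant pairs
  p_plus <- gamma / (1 + gamma)      # worst-case probability under bias Gamma
  pbinom(t_plus - 1, size = d, prob = p_plus, lower.tail = FALSE)
}

gammas <- seq(1, 1.5, by = 0.05)
upper_p <- sapply(gammas, mcnemar_upper_p,
                  t_plus = n_treated_died, t_minus = n_control_died)

# Table of bounds, and the first grid value of Gamma at which the bound passes 0.05
print(data.frame(Gamma = gammas, upper_p = round(upper_p, 4)))
gamma_star <- gammas[which(upper_p > 0.05)[1]]
gamma_star
```

At Gamma = 1 the bound is just the usual one-sided exact McNemar (binomial) p-value, so the first row of the table is the unadjusted test.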
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PART II
Dose (education), Response (wage)
Is your education worth it? Worth anything? Only the economists (claim to) know.
But we do have some (old) data that are still used a lot at present. They really do pertain to your grandparents (almost).
Keep in mind for the salary data that one dollar in 1976 equates to about $4.35 in 2018.
-------------------------------------
card.data {ivmodel} R Documentation
Card (1995) Data
Description: Data from the National Longitudinal Survey of Young Men (NLSYM) that was used by Card (1995).
Usage data(card.data)
Format A data frame with 3010 observations on 35 variables.
For the full description see ?card.data or the ivmodel manual page.
-------------------------------------
The ivmodel package is from a friend of this course, Hyunseung Kang (co-author and maintainer). Manual: https://cran.r-project.org/web/packages/ivmodel/ivmodel.pdf
Vignette (to appear in the Journal of Statistical Software): http://www.stat.wisc.edu/~hyunseung/ivmodel.pdf
ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable
Section 7 of the ivmodel vignette uses the Card data; these data are available in the ivmodel package (and in many other R locations).
-------------------------------------------------------------------------------------------------------------------------------
The objective of all the various analyses of these data is a basic dose-response conclusion in the form of returns to schooling.
Note: dose-response was the topic of the Week 7 Computing Corner.
Dose is amount of education. In the data set
educ subject's years of education (an integer from 1 to 18)
Response (outcome) is wages in cents per hour in 1976; the outcome variable used is lwage (log wage).
The dataset contains many interesting and useful measures, but the most useful ones (e.g. fatheduc, motheduc, IQ) have a lot of missing data
and are therefore not employed in this problem. (In real life I'd probably do some mice or equivalent imputation at certain points, but here
I try to keep things simpler.)
part b. Basic dose-response.
i. A stat60 student might investigate the question: what is the change in "response" log-wage (lwage) for a unit change in "dose" (educ).
Does a straight-line fit (constant effect of a unit change in educ) appear adequate in these data? What is the prima facie dose-response
curve and the effect of increasing the dose one unit? Give a point-estimate and confidence interval.
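One way to check a straight-line fit, sketched here on synthetic stand-in data (the data frame sim and its coefficients are invented for illustration and are not the Card data): fit the line, fit a model with a separate mean for each value of educ, and compare the two with an F test. The slope and its confidence interval come from the straight-line fit.

```r
set.seed(1)
# Synthetic stand-in data (an assumption, for illustration only)
n <- 500
sim <- data.frame(educ = sample(8:18, n, replace = TRUE))
sim$lwage <- 5.5 + 0.05 * sim$educ + rnorm(n, sd = 0.4)

fit_line <- lm(lwage ~ educ, data = sim)          # constant effect per unit of dose
fit_free <- lm(lwage ~ factor(educ), data = sim)  # a separate mean for each dose level

slope    <- coef(fit_line)["educ"]                # point estimate for a one-unit change
slope_ci <- confint(fit_line, "educ", level = 0.95)

# Lack-of-fit F test: does the free-means model improve on the straight line?
lack_of_fit <- anova(fit_line, fit_free)
print(slope)
print(slope_ci)
print(lack_of_fit)
```

With the real data the same calls would take cd (or a subsample) in place of sim; a plot of the level means from fit_free against educ is a useful visual companion to the F test.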
Although the age range (in 1976) is from 24 to 34, almost 10% of the subjects are indicated to be enrolled in college in 1976
(that seems to be never noted in the many many analyses of these data).
-------------------------------------
enroll   indicator for whether subject is enrolled in college in 1976
> table(cd$enroll, cd$age)   # I abbreviate card.data as cd
     24  25  26  27  28  29  30  31  32  33  34
  0 355 329 346 298 278 208 183 155 206 179 195
  1  40  43  40  41  34  25  14  11   7  13  10
> table(cd$enroll)
   0    1
2732  278
------------------------------------
I would submit that a student enrolled (part or full time) in college is in a different career domain and salary
trajectory than non-students, say, in full-time jobs (i.e. those enrolled may be in temporary low-paying jobs but have high educ).
So I want to set those enrolled students aside and proceed with the 2732 (out of 3010) not enrolled in college in 1976.
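As a quick check of the counts quoted above, this sketch re-enters the printed enroll-by-age table by hand and computes the enrolled share overall and by age. The object names are my own; the commented last line shows one common way the cd_1 subsample could be formed once card.data is loaded as cd.

```r
# Counts copied from the printed table(cd$enroll, cd$age) output
age          <- 24:34
not_enrolled <- c(355, 329, 346, 298, 278, 208, 183, 155, 206, 179, 195)
enrolled     <- c(40, 43, 40, 41, 34, 25, 14, 11, 7, 13, 10)

tab <- rbind("0" = not_enrolled, "1" = enrolled)
colnames(tab) <- age

share_enrolled <- sum(enrolled) / sum(tab)         # overall share enrolled
share_by_age   <- enrolled / colSums(tab)          # share enrolled at each age
round(100 * share_enrolled, 1)
round(100 * share_by_age, 1)

# With the data loaded as cd, the non-enrolled subsample could be formed as:
# cd_1 <- subset(cd, enroll == 0)
```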
ii. Repeat part b(i) for this subsample (I call it cd_1) of non-enrolled adults. Do the results change?
iii. What problems do you see with forming conclusions from the OLS dose-response estimation in parts i or ii ?
What condition(s) would have to hold for the part b analyses to be taken seriously?
part c. Regression Adjustments (a la ancova)
The most common attempt (seen repeatedly in current publications) to heal any problems you may have identified with the part b analysis
is to toss in additional predictors (in analogy to analysis of covariance for a binary dose or treatment variable).
With the subsample cd_1 from part b(ii), try the set of additional predictors (some used by Kang): exper, expersq, black, south, momdad14.
Compare your results for returns-to-education with those in part b.
Two notes.
First, one would like to use fatheduc and motheduc, but there is too much missing data (732 cases) to make that worthwhile
without extra imputation; also, neither is a significant predictor, if that matters.
Second, age is confounded with exper to the extent you will get a singularity result if you try to add age to the prediction equation.
If you use age instead of the experience measures, does your result for returns-to-education change?
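The singularity mentioned in the second note can be reproduced on synthetic data. This sketch assumes that experience is defined as potential experience, exper = age - educ - 6 (my assumption about how the variable was constructed; check ?card.data), so that age is an exact linear combination of educ and exper. All numbers below are invented.

```r
set.seed(2)
# Synthetic stand-in data (an assumption, for illustration only)
n <- 400
sim <- data.frame(age  = sample(24:34, n, replace = TRUE),
                  educ = sample(8:18,  n, replace = TRUE))
sim$exper   <- sim$age - sim$educ - 6   # assumed definition of potential experience
sim$expersq <- sim$exper^2
sim$lwage   <- 4.7 + 0.07 * sim$educ + 0.08 * sim$exper -
               0.002 * sim$expersq + rnorm(n, sd = 0.4)

fit_exper <- lm(lwage ~ educ + exper + expersq, data = sim)
fit_both  <- lm(lwage ~ educ + exper + expersq + age, data = sim)

coef(fit_exper)
coef(fit_both)["age"]   # NA: age is aliased with the intercept, educ and exper
```

R reports the aliased coefficient as NA rather than stopping with an error; summary(fit_both) notes that a coefficient is not defined because of singularities.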
part d. Instrumental Variables Estimates (week8,9)
Economists take a different approach to improving the part b analysis, and these Instrumental Variables methods were introduced in Weeks 8 and 9.
Kang uses (as others have done) the instrument nearc4. Carry out that IV estimation (with no covariates) and give a point and interval estimate
for the dose-response (educ-lwage) relationship (use cd_1). Compare the results with parts b and c.
What properties or assumptions must the instrument nearc4 satisfy for this analysis?
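With a single binary instrument and no covariates, the IV point estimate reduces to the Wald ratio: the difference in mean outcome between instrument groups divided by the difference in mean dose. The sketch below shows this on synthetic data with an unobserved confounder (all variable names and numbers are invented; z merely plays the role of nearc4), and confirms that the ratio matches the two-stage least squares point estimate.

```r
set.seed(3)
# Synthetic data (an assumption, for illustration only); true dose effect is 0.1
n <- 5000
z <- rbinom(n, 1, 0.5)                           # binary instrument
u <- rnorm(n)                                    # unobserved confounder
d <- 12 + 1 * z + u + rnorm(n)                   # dose depends on z and u
y <- 1 + 0.1 * d + 0.5 * u + rnorm(n, sd = 0.3)  # outcome depends on d and u

# OLS ignores the confounder
ols <- unname(coef(lm(y ~ d))["d"])

# Wald ratio: effect of z on y divided by effect of z on d
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(d[z == 1]) - mean(d[z == 0]))

# Two-stage least squares by hand (point estimate only)
d_hat <- fitted(lm(d ~ z))
tsls  <- unname(coef(lm(y ~ d_hat))["d_hat"])

round(c(ols = ols, wald = wald, tsls = tsls), 3)
```

The standard errors printed by the second-stage lm are not valid for IV inference, so for the interval estimate use a dedicated routine such as the ivmodel() function in the ivmodel package (see its help page for the arguments and the available confidence intervals).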
--------------------------------------------------------------------------------------------------------
Extra Credit Dose-response functions for Observational Data
If you were disappointed by your exam 1 assessment and have done well on the standard parts of these questions, here's
an opportunity to augment your exam 1 score.
Do not spend time on this if you have work to do on other parts of the exam. Don't just write something for the sake of writing something.
If you can do the problem well, then it's worth a little effort.
Or regard this as just a reminder that we displayed these advanced dose-response methods in Week 7.
Week 7 CC (beyond binary treatments) illustrated developing methods for dose-response estimation using generalized propensity scores.
i. Try out one of those estimation methods (e.g. HI) for the lwage,educ dose-response function using cd_1 data.
Use the measures from part c as background characteristics. Compare results with the prior analyses.
ii. Compare the formal ADRF methods with this informal two-part method (adapted from binary treatment). Also see the ADRF summary slide.
Step 1. Estimate the expected dose using the background measures
[analog to logistic regression predicting binary treatment; the fit is an estimate of the expected value of the 0,1 treatment variable].
Step 2. Estimate the dose-response using that measure as an additional predictor (analogous to using the propensity score as a covariate for binary treatments in an earlier CC).
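The informal two-part method above can be sketched as follows on synthetic data with one background measure x (all names and numbers are invented for illustration; with cd_1 the background measures would replace x). This is the informal approach only, not one of the formal ADRF estimators.

```r
set.seed(4)
# Synthetic data (an assumption, for illustration only); true dose effect is 0.05
n <- 2000
x     <- rnorm(n)                                   # background measure
educ  <- 12 + 2 * x + rnorm(n, sd = 1.5)            # dose depends on x
lwage <- 5 + 0.05 * educ + 0.3 * x + rnorm(n, sd = 0.3)

# Unadjusted dose-response slope
naive <- unname(coef(lm(lwage ~ educ))["educ"])

# Part 1: expected dose given background measures
dose_model <- lm(educ ~ x)
e_hat <- fitted(dose_model)

# Part 2: dose-response with expected dose as an additional predictor
adjusted <- unname(coef(lm(lwage ~ educ + e_hat))["educ"])

round(c(naive = naive, adjusted = adjusted), 3)
```

In this simple linear setup the expected dose is a linear function of x, so adding it as a predictor is equivalent to adjusting for x directly; with several background measures it acts as a one-dimensional summary of them.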
-------------------------------------------------------------------------------------------------------
END Problem 2