Stat209/Ed260 D Rogosa 1/21/18
Assignment 2.
Week 2 Experiments and Observational Studies
Formulation for causal inference
Spurious Correlation, Mediating Variables
-------------------------------
Don't forget Lab1
-------------------------------
1. Spurious correlation
Consider the spurious correlation (common cause type) discussed in class, week 2.
Additional examples are in the class page links: Simon (1954) or
Aldrich (1995, sec 7, "illusory").
In the data below, is the association between X and Y a consequence of
the common cause Z? Give a point estimate, corresponding scatterplot,
and 95% confidence interval for the appropriate partial correlation.
Does the partial correlation coefficient settle the causal question?
One real-life, multi-million-dollar invocation of this argument:
Spurious correlations have been used by tobacco
companies to argue that the association between
smoking and lung cancer may actually be a result
of some other factor such as a genetic factor
that predisposes people both to nicotine
addiction and lung cancer.
– If this is true, then smoking cannot be blamed for
causing cancer?
----------------------------------
-----------------------------------
Data for problem #1
Row X Y Z
1 9.0212 84.6627 8.6171
2 5.2538 85.7456 7.3477
3 6.5207 85.1347 7.3531
4 7.2431 86.1643 9.1862
5 7.5461 89.3024 9.5189
6 9.0210 88.7147 12.0136
7 8.3726 86.4755 10.0230
8 7.1911 88.0138 9.5797
9 10.0137 89.1816 12.6738
10 5.3736 84.9516 6.8683
11 11.0974 89.6058 12.3068
12 7.9195 86.1434 10.1863
13 7.8415 86.0734 9.8743
14 11.4073 91.8612 13.6939
15 12.0733 88.3301 12.9291
16 9.9289 88.1292 11.9946
17 8.1698 88.4382 10.9764
18 7.1619 87.6612 10.5144
19 6.8630 86.4479 9.6304
20 8.5959 88.0485 10.7310
21 6.4488 84.5818 8.1024
22 7.1632 87.5248 10.3337
23 3.0688 85.1612 7.3740
24 7.2027 86.1984 8.6081
25 6.5500 85.6579 8.8006
26 7.5967 86.5576 9.6634
27 7.5950 86.1207 8.3186
28 11.5465 89.3447 13.0802
29 6.6970 85.6474 6.3297
30 6.6525 84.6865 8.6335
31 6.2913 84.2773 7.0406
32 5.1816 84.3297 6.2774
33 6.5507 86.9395 10.1587
34 5.6970 85.8757 8.4053
35 8.6243 86.5366 9.9835
36 9.7497 88.7193 11.6969
37 9.5877 90.5493 13.2135
38 5.8931 85.9084 7.8613
39 12.7624 89.8823 14.0196
40 10.0226 89.0170 12.4061
41 8.9119 87.4125 11.0488
42 11.1444 88.6123 12.2295
43 7.3012 85.7531 9.3443
44 5.9883 86.7767 10.2305
45 7.6093 87.7501 8.9490
46 8.7703 87.7138 10.7560
47 7.8032 87.8862 9.9443
48 10.7217 90.1479 14.1238
49 5.2494 84.5584 8.3379
50 8.9764 87.2774 12.0931
51 8.3849 86.2232 10.5416
52 8.6081 87.4158 11.7202
53 8.8871 86.9935 10.3284
54 8.7269 88.8683 11.2838
55 8.8410 85.7579 9.6485
56 6.6962 88.0010 8.7865
57 10.1554 88.0220 10.9067
58 11.1203 86.6664 13.2074
59 12.9841 90.2776 15.3170
60 7.5932 86.7685 9.1095
61 9.0288 89.8605 12.0329
62 10.4784 89.5776 14.5848
63 11.4204 90.0289 13.6982
64 7.1885 88.0499 10.0004
65 6.2022 85.8224 8.0043
66 7.9289 87.9744 9.9819
67 9.8591 89.2717 11.7178
68 9.9081 87.7020 11.7360
69 6.6107 84.1002 7.8847
70 9.8365 87.7949 11.4140
71 8.7290 87.9023 11.6513
72 7.4914 87.1108 9.7582
73 7.6773 86.2681 9.8398
74 6.1195 84.5499 6.7605
75 6.8257 86.0817 8.9163
76 9.6898 86.6979 10.8215
77 9.1025 88.1930 11.8133
78 6.0022 84.5501 8.9034
79 6.8042 85.7458 8.6374
80 5.3521 87.2153 9.6195
81 7.7924 87.9349 9.4538
82 6.6512 87.3632 10.8286
83 7.3747 86.1124 9.4795
84 7.6636 83.1105 9.2179
85 9.3188 89.4198 11.7035
86 5.1568 85.1249 6.3330
87 8.7670 86.2721 9.8406
88 5.4504 86.6124 7.2781
89 9.3781 87.9577 11.9744
90 7.1295 88.0978 10.6290
91 3.9848 87.0559 7.6392
92 8.9921 86.7763 10.9617
93 8.0228 89.2983 10.5134
94 8.2127 86.4980 9.9726
95 7.2126 87.6937 9.2254
96 7.1183 85.8024 8.6688
97 8.6755 88.0529 9.6548
98 7.6840 86.9696 9.9541
99 10.2886 86.1218 11.1020
100 9.7024 87.3195 11.5858
## Also, if you like, try out the ppcor package; I did in the solutions, but it doesn't give you much
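The partial-correlation recipe can be sketched with lm residuals and a Fisher-z interval. The data below are a made-up common-cause stand-in, not the problem #1 data; read in the X, Y, Z columns above and substitute them.

```r
## Sketch only: simulated common-cause data stand in for the problem #1 table.
set.seed(1)
n  <- 100
Z  <- rnorm(n, 10, 2)
X  <- 0.8 * Z + rnorm(n)               # X and Y both driven by Z
Y  <- 82 + 0.5 * Z + rnorm(n)
ex <- resid(lm(X ~ Z))                 # X with Z partialled out
ey <- resid(lm(Y ~ Z))                 # Y with Z partialled out
r  <- cor(ex, ey)                      # point estimate of r_XY.Z
plot(ex, ey, xlab = "X given Z", ylab = "Y given Z")  # the corresponding scatterplot
zr <- atanh(r)                         # Fisher z transform
se <- 1 / sqrt(n - 1 - 3)              # n minus (variables partialled out) minus 3
ci <- tanh(zr + c(-1.96, 1.96) * se)   # 95% CI for the partial correlation
c(estimate = r, lower = ci[1], upper = ci[2])
```

With a true common-cause structure the partial correlation should be near zero; the CI, not the point estimate alone, is what speaks to that.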
-------------------------------------
Problem 2
Mediating Variable Computations: Class example continued
The data set from the class example (linked week 2), ss423, is
in the class directory
http://web.stanford.edu/~rag/stat209/ss423
with predictor (IV) 'belong', outcome 'depress', and (potential) mediating variable 'master'.
The class example showed you the Baron-Kenny analysis using functions from
the multilevel and MBESS packages.
Here, just use basic 'lm' regression and the recipes from the class handout
to recreate the point estimates, asymptotic standard errors, and significance tests
for the mediating variable effect.
Compare your result with the class handout.
Extra: also try out the more sophisticated functions in
the mediate package.
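The 'lm' recipe can be sketched as below. The data are simulated stand-ins (the a, b, and c' values are made up); read in ss423 and swap in belong, master, and depress to do the actual problem.

```r
## Sketch of the by-hand Baron-Kenny / Sobel recipe with made-up data.
set.seed(4)
n       <- 400
belong  <- rnorm(n)
master  <- 0.5 * belong + rnorm(n)                   # assumed a-path = 0.5
depress <- -0.4 * master - 0.2 * belong + rnorm(n)   # assumed b = -0.4, c' = -0.2
fit.a <- lm(master ~ belong)            # a: IV -> mediator
fit.b <- lm(depress ~ master + belong)  # b: mediator -> outcome, IV partialled out
a  <- coef(fit.a)["belong"]; sa <- coef(summary(fit.a))["belong", "Std. Error"]
b  <- coef(fit.b)["master"]; sb <- coef(summary(fit.b))["master", "Std. Error"]
ab    <- a * b                              # mediated (indirect) effect estimate
se.ab <- sqrt(a^2 * sb^2 + b^2 * sa^2)      # Sobel asymptotic standard error
c(estimate = unname(ab), se = se.ab, z = unname(ab / se.ab))
```

The z-statistic here is the Sobel test; it should match what sobel() in the multilevel package reports for the same data.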
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Problem 3
Stigler Salary Discrimination Example (Week 1 readings, Week 2 lecture and handouts)
Measurement error, multiple regression
Yes, I can't seem to let go ....
See the Maindonald-Braun-labeled posting on the Stigler example, week 1.
The task here is to check the results on the measurement error handout against the
simulation results shown in the week 1 links (using the function in the DAAG package),
link labelled "Maindonald-Braun sec6.7 results and R-functions, Stigler example".
I don't expect you to use this function, but you could follow my session.
The solution walks you through the demonstration
that the analytic results for the perfectly measured variable 2 (here group membership)
do match the (small, n = 200) simulation values quite well. Substituting into the class handout
is the other part of the exercise. An instance of "Measure worse, get a bigger result ..."
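The qualitative effect can be sketched directly, without the DAAG function. Everything below is made up for illustration (a larger n than the posted n = 200 run, to cut simulation noise): salary depends only on a true score, the group indicator is measured perfectly, and the score is degraded to several reliabilities.

```r
## Sketch: the group coefficient inflates as the covariate is measured worse.
set.seed(5)
n      <- 2000                            # bigger than the posted n = 200, for stability
group  <- rbinom(n, 1, 0.5)               # variable 2: perfectly measured
score  <- rnorm(n, 100 + 5 * group, 10)   # true score; group means differ
salary <- 50 + 0.5 * score + rnorm(n)     # salary depends on the score only
coefs  <- sapply(c(1, 0.8, 0.5), function(rel) {
  W <- score + rnorm(n, 0, 10 * sqrt(1 / rel - 1))  # observed score, reliability rel
  coef(lm(salary ~ W + group))["group"]
})
names(coefs) <- c("rel=1", "rel=0.8", "rel=0.5")
round(coefs, 3)   # apparent "discrimination" grows as reliability falls
```

With reliability 1 the group coefficient is near its true value of 0; as the score is measured worse, the score's explanatory share leaks into the (perfectly measured) group indicator.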
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Problem 4
### potential outcome details
#### week 2 material, to be covered in lecture Wed; solution posted after lecture
4. Neyman-Holland-Rubin formulation
Making potential outcomes real.....
Create some counterfactual data (following the Holland 1988 appendix)
from the lecture material (handout labelled week 2).
Consider a collection of 100 (experimental) units whose
unit-level causal effect (T_tc(u), or rho(u))
is distributed over units as N(2,1) (mean 2, variance 1).
# note: this is a violation/extension of the base formulation, allowing
individual differences in the effect of the intervention (i.e. relaxing unit homogeneity)
# the solution also does, separately, a 10000-unit version with rho mean 2, variance .01
Y(u,c) is distributed over units N(10,1).
Potential outcomes:
If you had {Y(u,c), Y(u,t)} for all 100 units,
make a statistical inference for the effect of treatment: the average
causal (treatment) effect. (Inference is to the population from
which the u are sampled.) Compare the result with the specification of the
unit causal effect used in generating the artificial data.
Now, back to reality.
Generate 100 values of an indicator variable G (group assignment) by
coin flip (a Bernoulli trial with prob(G = 1) = 1/2).
If G = 1 observe Y(u,t); if G = 0 observe Y(u,c) from above.
Repeat the statistical inference for the effect of treatment (estimate
of the average causal effect), designating the G = 1 observations as the
treatment group and the G = 0 observations as control.
Compare the result with the unit causal effect built into the artificial data.
Finally, create a different dichotomous group indicator variable,
call it, say, Gneq (representing non-random selection).
Gneq is a Bernoulli trial outcome with probability Gneq = 1
set to PHI(Y(u,c) - 10), where PHI is the normal(0,1) cdf.
If Gneq = 1 observe Y(u,t); if Gneq = 0 observe Y(u,c) from above.
Repeat the statistical inference for the effect of treatment (estimate
of the average causal effect), designating the Gneq = 1 observations as the
treatment group and the Gneq = 0 observations as control.
Compare the result with the specification of the unit causal effect.
Advanced item: try to compare the bias under Gneq assignment with the result for bias
given in the class handout. A numerical approach here may be easier than the math.
--------------
The solution works through all this step-by-step; this problem may be an
instance where referring to the solution immediately is best.
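The three inferences can be sketched in a few lines of R (the t-based inferences below are one reasonable choice, not the only one):

```r
## Sketch: full potential outcomes, randomized assignment, non-random selection.
set.seed(3)
n    <- 100
Yc   <- rnorm(n, 10, 1)      # Y(u,c) ~ N(10,1)
rho  <- rnorm(n, 2, 1)       # unit causal effect ~ N(2,1)
Yt   <- Yc + rho             # Y(u,t)
## (a) the unattainable ideal: both potential outcomes seen, paired inference
est.full <- mean(Yt - Yc)
t.test(Yt - Yc)              # cf. the built-in mean effect of 2
## (b) randomized assignment: observe one potential outcome per unit
G        <- rbinom(n, 1, 0.5)
obs      <- ifelse(G == 1, Yt, Yc)
est.rand <- mean(obs[G == 1]) - mean(obs[G == 0])
t.test(obs[G == 1], obs[G == 0])   # unbiased for the average causal effect
## (c) non-random selection: treatment probability PHI(Y(u,c) - 10)
Gneq    <- rbinom(n, 1, pnorm(Yc - 10))
obsq    <- ifelse(Gneq == 1, Yt, Yc)
est.neq <- mean(obsq[Gneq == 1]) - mean(obsq[Gneq == 0])
t.test(obsq[Gneq == 1], obsq[Gneq == 0])  # biased upward for the average effect
c(full = est.full, randomized = est.rand, nonrandom = est.neq)
```

Units with high Y(u,c) are more likely to land in the Gneq treatment group, so the (c) comparison mixes the treatment effect with a selection difference; that is the bias the advanced item asks you to quantify.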
----------------------------------------------------------------
Problem 5 (added 1/22)
Standardization Example.
On the class handout, in addition to the extended Hooke's Law example (see also Freedman's version),
there is an informal (I made it up on the fly) "Comparability example". The example
illustrates that a "universal" law would show large gender differences if standardized
coefficients for each gender were computed. Without standardization, both genders
have the same happiness-on-money gradient = 2.
Recreate the results for the two within-gender coefficients.
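A sketch of the phenomenon (the group variances below are made up, not the handout's numbers): the raw slope is 2 in both groups, yet the standardized slopes disagree because standardization rescales by each group's own standard deviations.

```r
## Sketch: same raw gradient, different standardized coefficients.
set.seed(6)
n  <- 1000
m1 <- rnorm(n, 50, 5)            # group 1: small money variance (invented)
m2 <- rnorm(n, 50, 15)           # group 2: large money variance (invented)
h1 <- 2 * m1 + rnorm(n, 0, 10)   # happiness-on-money gradient = 2 in both groups
h2 <- 2 * m2 + rnorm(n, 0, 10)
raw <- c(coef(lm(h1 ~ m1))["m1"], coef(lm(h2 ~ m2))["m2"])
std <- c(coef(lm(scale(h1) ~ scale(m1)))[2],
         coef(lm(scale(h2) ~ scale(m2)))[2])   # = raw * sd(money)/sd(happiness)
rbind(raw, std)   # raw slopes agree; standardized slopes do not
```

The standardized coefficient is the raw slope times sd(money)/sd(happiness), so any "universal" law gets group-specific standardized versions whenever the groups' spreads differ.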
----------------------------------------------------------------
Problem 6 Adapted from class demonstrations
Potential Outcomes, Encouragement Design Estimation and
(Causal) Mediation
Task 1. Create a potential outcomes dataset following the
first ALICE specification in the posted slides (week 3)
## ALICE example: beta = 3, rho = 3, tau = 1, delta = 3
(I did n = 400; larger would be better, so I redid with n = 6400)
Task 2. Use the artificial data to show the results
for the mediation (indirect) effect:
by hand, doing the 3 regressions;
using the multilevel package (sobel);
using the MBESS package;
using the causal mediation estimate ACME from the mediation package;
and compare with rho*beta.
Task 3.
Estimate beta by the Wald estimator (assuming tau = 0)
and estimate the mediation effect.
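A sketch of Tasks 1 and 3 under one reading of the spec; check against the posted slides, since the role of delta (taken here as the mediator intercept) and the error structure are assumptions, not given above.

```r
## Sketch of the ALICE encouragement design (delta as mediator intercept: assumption).
set.seed(7)
n    <- 6400
beta <- 3; rho <- 3; tau <- 1; delta <- 3
Z <- rbinom(n, 1, 0.5)                 # randomized encouragement
M <- delta + rho * Z + rnorm(n)        # mediator (study time); rho = effect of Z on M
Y <- tau * Z + beta * M + rnorm(n)     # outcome; tau = direct effect of Z
a  <- coef(lm(M ~ Z))["Z"]             # a-path, estimates rho
b  <- coef(lm(Y ~ M + Z))["M"]         # b-path, estimates beta
ab <- a * b                            # indirect effect; compare with rho*beta = 9
wald <- (mean(Y[Z == 1]) - mean(Y[Z == 0])) /
        (mean(M[Z == 1]) - mean(M[Z == 0]))   # Wald estimator of beta
c(indirect = unname(ab), wald = wald)
## with tau = 1 (not 0) built in, the Wald estimate overshoots beta by about tau/rho
```

The same artificial data can then be fed to sobel() in multilevel, the MBESS mediation functions, and mediate() in the mediation package for Task 2.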
==================================================
end HW2 2018