week7 RQ6 Cochran subclassificating for confounding varaible Week 7-8 class example, age as a confounder on effects of (cig) smoking. Use lalonde data (Lab4) as play data to show a simple implementation of subclassification adjustment with re78 as outcome, treatment group comparsion, and just consider age as the confounder. > library(MatchIt) > data(lalonde) > ?cut starting httpd help server ... done > # make bins based on treat distrib of age; use 5 subclass as group diffs seem to be in shape of ditributionbs (same age medians) > k = 1:4 > lt = lalonde[treat > 0,] # pull of the 185 job training subjects > dim(lt) [1] 185 10 > qt = quantile(lt$age, k/5) # 5 subclass on gae > qt 20% 40% 60% 80% 19 23 26 30 > tbin = cut(lt$age, c(0, 19, 23, 26, 30, 100)) # form bins for interger age values > table(tbin) # integer age makes this a bit lumpy but doing the right thing as can be checked from age table below tbin (0,19] (19,23] (23,26] (26,30] (30,100] 38 41 36 34 36 > table(lt$age) 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 37 38 40 41 42 43 44 45 46 48 13 13 12 9 9 11 12 9 18 9 16 6 9 3 7 1 5 3 2 3 2 2 4 1 1 1 3 1 > lt$bin = tbin # put these bins in the data frame > head(lt) treat age educ black hispan married nodegree re74 re75 re78 bin NSW1 1 37 11 1 0 1 1 0 0 9930.0460 (30,100] NSW2 1 22 9 0 1 0 1 0 0 3595.8940 (19,23] NSW3 1 30 12 1 0 0 0 0 0 24909.4500 (26,30] NSW4 1 27 11 1 0 0 1 0 0 7506.1460 (26,30] NSW5 1 33 8 1 0 0 1 0 0 289.7899 (30,100] NSW6 1 22 9 1 0 0 1 0 0 4056.4940 (19,23] > tapply(lt$age, lt$treat, mean) 1 25.81622 > tapply(lt$age, lt$bin, mean) (0,19] (19,23] (23,26] (26,30] (30,100] 17.97368 21.63415 25.00000 27.97059 37.63889 > mean(tapply(lt$age, lt$bin, mean)) # lumpiness of integer age moves this mean a little bit (from 25.82) [1] 26.04346 > (17.97368 + 21.63415 + 25.00000 + 27.97059 + 37.63889 )/5 [1] 26.04346 # pull off the control group > l0 = lalonde[lalonde$treat < 1,] > dim(l0) [1] 429 10 > t0bin = cut(l0$age, c(0, 19, 23, 26, 30, 100)) # apply treatment goup bins to control group age distrib > table(l0$age) 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 29 26 30 30 28 22 16 17 14 19 11 17 10 12 7 8 8 4 7 7 9 6 4 11 11 3 7 4 1 5 5 7 5 3 7 4 3 4 2 6 > table(t0bin) # interesting pattern, tries to downweight oldest controls t0bin (0,19] (19,23] (23,26] (26,30] (30,100] 115 83 44 46 141 > l0$bin = t0bin # "mean" age for the two groups closer now > mean(tapply(lt$age, lt$bin, mean)) [1] 26.04346 > mean(tapply(l0$age, l0$bin, mean)) [1] 26.68247 > mean(tapply(lt$re78, lt$bin, mean)) [1] 6409.315 > mean(tapply(l0$re78, l0$bin, mean)) [1] 7002.655 # but diffs in age distributions not enough to much alter the outcome comparison; # purpose here was just to do a walk-through of subclasifications # procedure laid out in class handout ===============================