week7 RQ6

Cochran subclassificating for confounding varaible

Week 7-8 class example, age as a confounder on effects of (cig) smoking.

Use lalonde data (Lab4) as play data to show a simple implementation
of subclassification adjustment with re78 as outcome, treatment group comparsion,
and just consider age as the confounder.

> library(MatchIt)
> data(lalonde)

> ?cut
starting httpd help server ... done
> # make bins based on treat distrib of age; use 5 subclass as group diffs seem to be in shape of ditributionbs (same age medians)
> k = 1:4
> lt = lalonde[treat > 0,] # pull of the 185 job training subjects
> dim(lt)
[1] 185  10

> qt = quantile(lt$age, k/5) # 5 subclass on gae
> qt
20% 40% 60% 80% 
 19  23  26  30 

> tbin = cut(lt$age, c(0, 19, 23, 26, 30, 100)) # form bins for interger age values
> table(tbin) # integer age makes this a bit lumpy but doing the right thing as can be checked from age table below
tbin
  (0,19]  (19,23]  (23,26]  (26,30] (30,100] 
      38       41       36       34       36 
> table(lt$age)

17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 37 38 40 41 42 43 44 45 46 48 
13 13 12  9  9 11 12  9 18  9 16  6  9  3  7  1  5  3  2  3  2  2  4  1  1  1  3  1 


> lt$bin = tbin # put these bins in the data frame
> head(lt)
     treat age educ black hispan married nodegree re74 re75       re78      bin
NSW1     1  37   11     1      0       1        1    0    0  9930.0460 (30,100]
NSW2     1  22    9     0      1       0        1    0    0  3595.8940  (19,23]
NSW3     1  30   12     1      0       0        0    0    0 24909.4500  (26,30]
NSW4     1  27   11     1      0       0        1    0    0  7506.1460  (26,30]
NSW5     1  33    8     1      0       0        1    0    0   289.7899 (30,100]
NSW6     1  22    9     1      0       0        1    0    0  4056.4940  (19,23]

> tapply(lt$age, lt$treat, mean)
       1 
25.81622 

> tapply(lt$age, lt$bin, mean)
  (0,19]  (19,23]  (23,26]  (26,30] (30,100] 
17.97368 21.63415 25.00000 27.97059 37.63889 
> mean(tapply(lt$age, lt$bin, mean)) # lumpiness of integer age moves this mean a little bit (from 25.82)
[1] 26.04346

> (17.97368 + 21.63415 + 25.00000 + 27.97059 + 37.63889 )/5
[1] 26.04346

# pull off the control group
> l0 = lalonde[lalonde$treat < 1,]
> dim(l0)
[1] 429  10
> t0bin = cut(l0$age, c(0, 19, 23, 26, 30, 100)) # apply treatment goup bins to control group age distrib
> table(l0$age)

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 
29 26 30 30 28 22 16 17 14 19 11 17 10 12  7  8  8  4  7  7  9  6  4 11 11  3  7  4  1  5  5  7  5  3  7  4  3  4  2  6 

> table(t0bin) # interesting pattern, tries to downweight oldest controls
t0bin
  (0,19]  (19,23]  (23,26]  (26,30] (30,100] 
     115       83       44       46      141 
> l0$bin = t0bin

# "mean" age for the two groups closer now 
> mean(tapply(lt$age, lt$bin, mean))
[1] 26.04346
> mean(tapply(l0$age, l0$bin, mean))
[1] 26.68247

> mean(tapply(lt$re78, lt$bin, mean))
[1] 6409.315
> mean(tapply(l0$re78, l0$bin, mean))
[1] 7002.655
# but diffs in age distributions not enough to much alter the outcome comparison; 
# purpose here was just to do a walk-through of subclasifications
# procedure laid out in class handout

===============================