Lab 4 Stat209 2/27/16

            Rogosa Lab 4 R-session 
           Additional Matching Exercises (extra, secs 4-6 plus)
              nearest neighbor, optimal, full matching options
########################### ######################
As shown in the base Lab4
The most current versions of R install and load optmatch in addition to MatchIt in order to
 use method = "full" or "optimal" in MatchIt (the Ben Hansen routines).
 We showed that in the script and base session
 In prior versions of R, MatchIt would fetch optmatch capabilities.

################# ##################################



> # Lab script PART 4, use MatchIt in various ways just for practice
> #lab text left out "married" and "no degree" so I did here also
> # reason is that the Matchit docs omit these also 
      http://gking.harvard.edu/node/4355/rbuild_documentation/
# compare results for some of the matching options
> m1.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age,
+ data = lalonde, method = "nearest", ratio = 1) # the simplest old-fashioned matching function

Nearest neighbor matching selects the r (default=1) best control matches for each individual
in the treatment group (excluding those discarded using the discard option). Matching is
done using a distance measure specified by the distance option (default=logit). Matches
are chosen for each treated unit one at a time, with the order specified by the m.order
command (default=largest to smallest). At each matching step we choose the control unit
that is not yet matched but is closest to the treated unit on the distance measure.

> m1.out

Call: 
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "nearest", ratio = 1)

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

> summary(m1.out) 

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "nearest", ratio = 1)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         0.566         0.187      0.231     0.379    0.517    0.379    0.595
re74          2095.574      5619.237   6788.751 -3523.663 2425.572 3620.924 9216.500
re75          1532.055      2466.484   3291.996  -934.429  981.097 1060.658 6795.010
educ            10.346        10.235      2.855     0.111    1.000    0.703    4.000
black            0.843         0.203      0.403     0.640    1.000    0.643    1.000
hispan           0.059         0.142      0.350    -0.083    0.000    0.081    1.000
age             25.816        28.030     10.787    -2.214    1.000    3.265   10.000


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean  eQQ Max
distance         0.566         0.365      0.260     0.201   0.106    0.201     0.49
re74          2095.574      2466.304   4245.694  -370.730 289.971  692.064 13121.75
re75          1532.055      1960.355   2948.255  -428.300 451.161  627.898 11365.71
educ            10.346        10.470      3.207    -0.124   1.000    0.924     4.00
black            0.843         0.470      0.500     0.373   0.000    0.373     1.00
hispan           0.059         0.276      0.448    -0.216   0.000    0.216     1.00
age             25.816        26.054     10.191    -0.238   2.000    2.735     8.00

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      46.94   79.44    46.90   17.62
re74          89.48   88.05    80.89  -42.37
re75          54.16   54.02    40.80  -67.27
educ         -12.50    0.00   -31.54    0.00
black         41.76  100.00    42.02    0.00
hispan      -161.35    0.00  -166.67    0.00
age           89.26 -100.00    16.23   20.00

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

# so this is not so great, only use less than half of control group, reduce  T/C
  diffs in most variables, slightly increase in educ, and increase hispan, i.e.
  matches in control group double the proportion of hispanics

> m2.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age,
+ data = lalonde, method = "nearest", ratio = 2)
> m2.out # now try 2 control matches for each treatment still using nearest neighbor

Call: 
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "nearest", ratio = 2)

Sample sizes:
          Control Treated
All           429     185
Matched       370     185
Unmatched      59       0
Discarded       0       0

> summary(m2.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "nearest", ratio = 2)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         0.566         0.187      0.231     0.379    0.517    0.379    0.595
re74          2095.574      5619.237   6788.751 -3523.663 2425.572 3620.924 9216.500
re75          1532.055      2466.484   3291.996  -934.429  981.097 1060.658 6795.010
educ            10.346        10.235      2.855     0.111    1.000    0.703    4.000
black            0.843         0.203      0.403     0.640    1.000    0.643    1.000
hispan           0.059         0.142      0.350    -0.083    0.000    0.081    1.000
age             25.816        28.030     10.787    -2.214    1.000    3.265   10.000


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean   eQQ Max
distance         0.566         0.213      0.238     0.352    0.488    0.354     0.586
re74          2095.574      3689.453   4736.770 -1593.879 1030.574 1814.473 12434.050
re75          1532.055      2121.527   2915.592  -589.472  666.000  798.521 11365.710
educ            10.346        10.235      2.833     0.111    1.000    0.638     4.000
black            0.843         0.235      0.425     0.608    1.000    0.611     1.000
hispan           0.059         0.165      0.372    -0.105    0.000    0.103     1.000
age             25.816        26.281      9.961    -0.465    2.000    2.411     9.000

Percent Balance Improvement:
         Mean Diff.  eQQ Med eQQ Mean eQQ Max
distance      6.908    5.707    6.642   1.391
re74         54.766   57.512   49.889 -34.911
re75         36.916   32.117   24.715 -67.266
educ         -0.268    0.000    9.231   0.000
black         5.049    0.000    5.042   0.000
hispan      -27.406    0.000  -26.667   0.000
age          79.004 -100.000   26.159  10.000

Sample sizes:
          Control Treated
All           429     185
Matched       370     185
Unmatched      59       0
Discarded       0       0

# In most respects the matching is worse here; mean diffs on many vars are
  more discrepant. We asked for a little harder task: give two controls 

##task 4.2
> m2opt.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age,
+ data = lalonde, method = "optimal", ratio = 2) # Ben Hansen's method with 2:1 matching

> m2opt.out  #here, does the same as the nearest neighbor

Call: 
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "optimal", ratio = 2)

Sample sizes:
          Control Treated
All           429     185
Matched       370     185
Unmatched      59       0
Discarded       0       0

> summary(m2opt.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "optimal", ratio = 2)

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         0.566         0.187      0.231     0.379    0.517    0.379    0.595
re74          2095.574      5619.237   6788.751 -3523.663 2425.572 3620.924 9216.500
re75          1532.055      2466.484   3291.996  -934.429  981.097 1060.658 6795.010
educ            10.346        10.235      2.855     0.111    1.000    0.703    4.000
black            0.843         0.203      0.403     0.640    1.000    0.643    1.000
hispan           0.059         0.142      0.350    -0.083    0.000    0.081    1.000
age             25.816        28.030     10.787    -2.214    1.000    3.265   10.000


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean   eQQ Max
distance         0.566         0.213      0.238     0.352    0.488    0.354     0.586
re74          2095.574      3689.453   4736.770 -1593.879 1030.574 1814.473 12434.050
re75          1532.055      2121.527   2915.592  -589.472  666.000  798.521 11365.710
educ            10.346        10.235      2.833     0.111    1.000    0.638     4.000
black            0.843         0.235      0.425     0.608    1.000    0.611     1.000
hispan           0.059         0.165      0.372    -0.105    0.000    0.103     1.000
age             25.816        26.281      9.961    -0.465    2.000    2.411     9.000

Percent Balance Improvement:
         Mean Diff.  eQQ Med eQQ Mean eQQ Max
distance      6.908    5.707    6.642   1.391
re74         54.766   57.512   49.889 -34.911
re75         36.916   32.117   24.715 -67.266
educ         -0.268    0.000    9.231   0.000
black         5.049    0.000    5.042   0.000
hispan      -27.406    0.000  -26.667   0.000
age          79.004 -100.000   26.159  10.000

Sample sizes:
          Control Treated
All           429     185
Matched       370     185
Unmatched      59       0
Discarded       0       0

# optimal 2:1 didn't do any better than nearest for these data, some things can't be done

## perhaps the basic quintile subclassification matching is the best use of these data


## Task 4.3
> m2full.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age, data = lalonde, method = "full")
> summary(m2full.out) #let's try Ben's full matching, as he did with the SAT coaching

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age, data = lalonde, method = "full")

Summary of balance for all data:
         Means Treated Means Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         0.566         0.187     0.379    0.517    0.379    0.595
re74          2095.574      5619.237 -3523.663 2425.572 3620.924 9216.500
re75          1532.055      2466.484  -934.429  981.097 1060.658 6795.010
educ            10.346        10.235     0.111    1.000    0.703    4.000
black            0.843         0.203     0.640    1.000    0.643    1.000
hispan           0.059         0.142    -0.083    0.000    0.081    1.000
age             25.816        28.030    -2.214    1.000    3.265   10.000


Summary of balance for matched data:
         Means Treated Means Control Mean Diff eQQ Med eQQ Mean   eQQ Max
distance         0.566         0.566     0.000   0.001    0.005     0.144
re74          2095.574      2024.744    70.830   0.000  493.428 13121.750
re75          1532.055      1313.744   218.312   0.000  328.307 12746.050
educ            10.346        10.417    -0.071   0.000    0.348     3.000
black            0.843         0.838     0.005   0.000    0.010     1.000
hispan           0.059         0.069    -0.009   0.000    0.003     1.000
age             25.816        25.182     0.634   3.000    2.571     9.000

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      99.97   99.83    98.67   75.81
re74          97.99  100.00    86.37  -42.37
re75          76.64  100.00    69.05  -87.58
educ          35.31  100.00    50.53   25.00
black         99.16  100.00    98.44    0.00
hispan        88.79    0.00    96.05    0.00
age           71.35 -200.00    21.25   10.00

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

# seems to have done much better on all mean comparisons; check the distance imbalance

#######################
# Lots of fun plots to do, try type=jitter to get propensity scores, and you can get ident for indiv cases by clicking
> plot(m2full.out)
Waiting to confirm page change...
Waiting to confirm page change...

> help.matchit("plot")
> plot(m2full.out, type = "jitter")
Waiting to confirm page change...
[1] "To identify the units, use first mouse button; to stop, use second."
warning: no point with 0.25 inches
warning: no point with 0.25 inches

############Some more activities

> m2full.dat = match.data(m2full.out) # let's get the matched data set from full matching
> dim(m2full.dat)
[1] 614  13
> names(m2full.dat)
 [1] "treat"    "age"      "educ"     "black"    "hispan"   "married"  "nodegree"
 [8] "re74"     "re75"     "re78"     "distance" "weights"  "subclass"
> m2full.dat[1:20,]  # just a look at the data set
      treat age educ black hispan married nodegree re74 re75    re78 distance weights subclass
NSW1      1  37   11     1      0       1        1    0    0  9930.0  0.69208       1        1
NSW2      1  22    9     0      1       0        1    0    0  3595.9  0.18249       1      108
NSW3      1  30   12     1      0       0        0    0    0 24909.5  0.70891       1      107
NSW4      1  27   11     1      0       0        1    0    0  7506.1  0.69294       1       91
NSW5      1  33    8     1      0       0        1    0    0   289.8  0.64088       1       95
NSW6      1  22    9     1      0       0        1    0    0  4056.5  0.65949       1       20
NSW7      1  23   12     1      0       0        0    0    0     0.0  0.70950       1        8
NSW8      1  32   11     1      0       0        1    0    0  8472.2  0.69251       1       10
NSW9      1  22   16     1      0       0        0    0    0  2164.0  0.76907       1       32
NSW10     1  33   12     0      0       1        0    0    0 12418.1  0.08978       1       37
NSW11     1  19    9     1      0       0        1    0    0  8173.9  0.65976       1       16
NSW12     1  21   13     1      0       0        0    0    0 17094.6  0.72535       1       41
NSW13     1  18    8     1      0       0        1    0    0     0.0  0.64226       1       50
NSW14     1  27   10     1      0       1        1    0    0 18739.9  0.67622       1       39
NSW15     1  17    7     1      0       0        1    0    0  3023.9  0.62438       1       71
NSW16     1  19   10     1      0       0        1    0    0  3228.5  0.67692       1       79
NSW17     1  27   13     1      0       0        0    0    0 14581.9  0.72487       1       90
NSW18     1  23   10     1      0       0        1    0    0  7693.4  0.67657       1      102
NSW19     1  40   12     1      0       0        0    0    0 10804.3  0.70808       1      107
NSW20     1  26   12     1      0       0        0    0    0 10747.4  0.70925       1      109

> # so you can see match.data appends 3 colums "distance" "weights"  "subclass"
> # to the original data set

Instead of doing the Zelig in Lab Section 5, which has alot of overhead
we take a closer look at the matches and correspondence to propensity matching;
e.g. do the subclasses match well on propensity

> table(m2full.dat$subclass) #the 109 subclasses have various sizes

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
  2   2  17   4  10   2   2  12   3   8  20   3   3   4   5   2   2   3   3   3 
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
  3   4  37   2   2   2  32   2   2   3   3   2   3   2   9   2   2   2   4   2 
 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
  2   6   3   4   3 125   2   6  13   2   2   2   2   2   2   2   2  10   2   2 
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80 
  2   4   2   2   3   9   2   2   2   2   2   2   2   4   2   2   2   2   2   2 
 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
  2   2   3   2   2   2   5  11   2   4  18   6   2   7   6   8   2   2   4  18 
101 102 103 104 105 106 107 108 109 
  2   9   2   2   2   5   3   3   2 

> # so in full matching multiple treatment cases can be grouped in a subclass
> table(treat,m2full.dat$subclass)
     
treat   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
    0   1   1  16   3   9   1   1   1   1   1  19   1   2   1   1   1   1   1   1
    1   1   1   1   1   1   1   1  11   2   7   1   2   1   3   4   1   1   2   2
     
treat  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38
    0   1   2   3  36   1   1   1  31   1   1   2   2   1   1   1   8   1   1   1
    1   2   1   1   1   1   1   1   1   1   1   1   1   1   2   1   1   1   1   1
     
treat  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
    0   1   1   1   1   1   1   1 124   1   5  12   1   1   1   1   1   1   1   1
    1   3   1   1   5   2   3   2   1   1   1   1   1   1   1   1   1   1   1   1
     
treat  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
    0   9   1   1   1   1   1   1   1   8   1   1   1   1   1   1   1   3   1   1
    1   1   1   1   1   3   1   1   2   1   1   1   1   1   1   1   1   1   1   1
     
treat  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
    0   1   1   1   1   1   1   1   1   1   1   4  10   1   1   1   5   1   6   1
    1   1   1   1   1   1   1   2   1   1   1   1   1   1   3  17   1   1   1   5
     
treat  96  97  98  99 100 101 102 103 104 105 106 107 108 109
    0   7   1   1   1  17   1   1   1   1   1   1   1   2   1
    1   1   1   1   3   1   1   8   1   1   1   4   2   1   1

# you can see that full matching produces very different matching than nearest
  or optimal etc

#### the analysis for outcome comparison is the same lmer approach that was done in the Lab4 base
#### there with the 5 (quintile) subclassification; here the same t-test comparison over the 109
#### subclasses from the full matching
#### I do the lmer below from full matching with the full set of covariates

> sum(m2full.dat$subclass == 8) # 12 total cases in subclass 8
[1] 12
> m2full.dat$subclass == 8  # crude way to ident the cases in subclass 8 (propen was appended from below)
> m2full.dat[subclass ==8,]  # here's a way to get the listing
        treat age educ black hispan married nodegree re74  re75  re78 distance weights subclass propen
NSW7        1  23   12     1      0       0        0    0  0.00     0   0.7095    1.00        8 0.7095
NSW34       1  20   12     1      0       0        0    0  0.00     0   0.7097    1.00        8 0.7097
NSW38       1  23   12     1      0       1        0    0  0.00  5912   0.7095    1.00        8 0.7095
NSW41       1  21   12     1      0       0        0    0  0.00  1255   0.7097    1.00        8 0.7097
NSW50       1  23   12     1      0       0        0    0  0.00  4843   0.7095    1.00        8 0.7095
NSW54       1  25   12     1      0       0        0    0  0.00  2349   0.7093    1.00        8 0.7093
NSW74       1  25   12     1      0       0        0    0  0.00 11966   0.7093    1.00        8 0.7093
NSW77       1  22   12     1      0       0        0    0  0.00 18678   0.7096    1.00        8 0.7096
NSW81       1  25   12     1      0       0        0    0  0.00     0   0.7093    1.00        8 0.7093
NSW82       1  18   12     1      0       0        0    0  0.00  2321   0.7099    1.00        8 0.7099
NSW94       1  21   12     1      0       0        0    0  0.00  9984   0.7097    1.00        8 0.7097
PSID293     0  31   12     1      0       1        0    0 42.97 11024   0.7091   25.51        8 0.7091



# As we are somewhat awkwardly not using two of the vars in this part of the exercise
# recompute propensity score for the subset of vars and append to the m2full.dat
>  glm.lalondesub = glm( treat ~ age + educ + black + hispan  + re74 + re75, data = lalonde, family = binomial)
> propensub = fitted(glm.lalondesub)
> m2full.dat$propen = propensub
> m2full.dat$propen[subclass == 8] # for the 12 cases in subclass 8 about equal propen scores
                                     
 [1] 0.7095 0.7097 0.7095 0.7097 0.7095 0.7093 0.7093 0.7096 0.7093 0.7099 0.7097 0.7091

# in subclass 8 we found one control case with high propensity score

> table(m2full.dat$propen[subclass == 8], treat[subclass == 8])
                   
                    0 1
  0.709099144142846 1 0
  0.709329320325358 0 3
  0.709495198369488 0 3
  0.709578116424806 0 1
  0.709661020498057 0 2
  0.70974391058607  0 1
  0.709909648793713 0 1

# subclass 46 has 124 controls and 1 treatment and a much wider range on propensity
> table(treat[subclass == 46], signif(m2full.dat$propen[subclass == 46],4))
   
    0.008977 0.01128 0.01176 0.01315 0.01317 0.01386 0.01388 0.01389 0.0151 0.01524 0.01613 0.01641 0.01673
  0        1       1       1       1       1       1       1       1      1       1       1       1       1
  1        0       0       0       0       0       0       0       0      0       0       0       0       0
   
    0.017 0.01731 0.01808 0.01844 0.0185 0.01853 0.01935 0.01974 0.01979 0.02016 0.02068 0.02085 0.02237
  0     1       1       1       1      1       1       1       1       1       1       1       1       1
  1     0       0       0       0      0       0       0       0       0       0       0       0       0
   
    0.02255 0.0239 0.02398 0.02434 0.02435 0.02445 0.02467 0.0251 0.02569 0.0257 0.02605 0.02651 0.02679
  0       1      1       1       1       1       1       1      1       1      1       1       1       1
  1       0      0       0       0       0       0       0      0       0      0       0       0       0
   
    0.02734 0.02785 0.02856 0.02876 0.02919 0.0294 0.03022 0.03032 0.03087 0.03095 0.03115 0.03144 0.03156
  0       1       1       1       1       1      1       1       1       1       1       1       1       1
  1       0       0       0       0       0      0       0       0       0       0       0       0       0
   
    0.03195 0.03207 0.0321 0.0323 0.0324 0.03242 0.03301 0.03338 0.03516 0.03546 0.03566 0.03568 0.03579
  0       1       1      1      1      1       1       1       1       1       1       1       1       1
  1       0       0      0      0      0       0       0       0       0       0       0       0       0
   
    0.03588 0.03675 0.0371 0.03825 0.03844 0.03856 0.03863 0.03873 0.03895 0.03936 0.04041 0.04086 0.04148
  0       1       1      1       1       1       1       1       1       1       2       1       1       1
  1       0       0      0       0       0       0       0       0       0       0       0       0       0
   
    0.04187 0.04218 0.04295 0.04306 0.04342 0.04352 0.04364 0.0438 0.04471 0.0449 0.04498 0.04499 0.04508
  0       1       1       1       1       1       1       1      1       1      1       1       1       1
  1       0       0       0       0       0       0       0      0       0      0       0       0       0
   
    0.04517 0.04593 0.04598 0.04622 0.04638 0.04709 0.04719 0.04735 0.04762 0.04778 0.04781 0.04784 0.04785
  0       1       1       1       1       1       1       1       1       1       1       1       1       0
  1       0       0       0       0       0       0       0       0       0       0       0       0       1
   
    0.04805 0.0482 0.04827 0.04842 0.04916 0.04918 0.04972 0.04994 0.04997 0.05013 0.05085 0.05209 0.05223
  0       1      1       1       1       1       1       1       1       1       1       1       2       1
  1       0      0       0       0       0       0       0       0       0       0       0       0       0
   
    0.05246 0.05314 0.05374 0.05401 0.05428 0.05458
  0       1       1       1       1       1       1
  1       0       0       0       0       0       0

> tapply(m2full.dat$propen[subclass == 46], treat[subclass == 46], mean) #compare means
      0       1 
0.03447 0.04785 

look at a couple more subclasses or can loop through them
> tapply(m2full.dat$propen[subclass == 109], treat[subclass == 109], mean)
     0      1 
0.7097 0.7092 
> tapply(m2full.dat$propen[subclass == 108], treat[subclass == 108], mean)
     0      1 
0.1832 0.1825 

> tapply(m2full.dat$propen, list(treat,subclass ), mean) # looks like a pretty good balance on propen by fullmatch
       1      2      3       4      5      6      7      8      9     10      11     12     13     14     15
0 0.6913 0.7240 0.1057 0.07214 0.2001 0.6619 0.5648 0.7091 0.7086 0.6918 0.07444 0.6424 0.1705 0.6425 0.6591
1 0.6921 0.7249 0.1030 0.07238 0.1940 0.6599 0.5655 0.7095 0.7086 0.6925 0.07290 0.6419 0.1714 0.6421 0.6596
      16     17     18     19     20      21    22      23     24     25     26     27     28      29     30
0 0.6600 0.6588 0.6368 0.5665 0.6597 0.08438 0.534 0.05877 0.7182 0.6074 0.1754 0.0812 0.6913 0.08928 0.1683
1 0.6598 0.6581 0.6372 0.5668 0.6592 0.08399 0.539 0.06300 0.7233 0.6060 0.1777 0.0839 0.6918 0.08961 0.1712
       31     32     33     34      35     36      37      38     39     40     41     42     43     44
0 0.09135 0.7894 0.6802 0.6906 0.09055 0.6455 0.08997 0.06821 0.6762 0.7154 0.7291 0.6090 0.7460 0.6820
1 0.09017 0.7691 0.6796 0.6915 0.09004 0.6448 0.08978 0.06872 0.6751 0.7121 0.7254 0.6118 0.7492 0.6823
      45      46     47      48     49     50     51     52     53     54     55     56     57      58
0 0.6656 0.03447 0.1828 0.06473 0.1246 0.6424 0.6342 0.6996 0.7012 0.6770 0.3044 0.6887 0.6675 0.07071
1 0.6653 0.04785 0.1819 0.06503 0.1177 0.6423 0.6344 0.7010 0.7019 0.6765 0.3787 0.6893 0.6701 0.07107
      59     60      61     62     63     64     65      66     67     68     69     70     71     72     73
0 0.6501 0.5151 0.09011 0.4907 0.4390 0.6771 0.5918 0.09613 0.6666 0.6620 0.6670 0.7097 0.6233 0.6065 0.6937
1 0.6493 0.5102 0.08918 0.4870 0.4417 0.6763 0.5935 0.09650 0.6692 0.6604 0.6706 0.7104 0.6244 0.6004 0.6931
      74     75     76     77     78     79     80     81     82     83     84     85     86      87     88
0 0.4550 0.6200 0.6772 0.5159 0.6807 0.6771 0.6526 0.6936 0.6434 0.4084 0.4757 0.7070 0.5652 0.09184 0.1557
1 0.4586 0.6139 0.6782 0.5088 0.6815 0.6769 0.6487 0.6937 0.6444 0.4035 0.4720 0.7044 0.5566 0.09073 0.1639
      89     90     91     92     93      94     95     96     97     98     99     100    101    102    103
0 0.6864 0.7254 0.6937 0.5762 0.6913 0.08754 0.6405 0.2192 0.7024 0.2763 0.7400 0.06780 0.6579 0.6770 0.2349
1 0.6868 0.7248 0.6933 0.5749 0.6917 0.08782 0.6408 0.2291 0.7028 0.2507 0.7395 0.06796 0.6576 0.6769 0.2391
     104    105    106    107    108    109
0 0.1848 0.5017 0.5860 0.7086 0.1832 0.7097
1 0.1799 0.4950 0.5863 0.7085 0.1825 0.7092


> tapply(m2full.dat$re78, list(treat,subclass ), mean) #mean differences in outcome by subclass
     1       2    3    4     5    6    7     8    9   10    11    12   13    14    15   16   17     18    19
0    0   582.2 7233 3085  5274 3984    0 11024    0    0  4277 12760 9352 12719 14344 2159    0  422.6 14999
1 9930 34099.3 5150 6409 11143 1953 6552  5210 2824 5900 13189  4031 2788     0  1121 8174 1068 2722.6  4024
    20   21   22   23    24   25   26    27    28   29    30    31    32   33 34   35 36    37    38    39
0 6084 9943 2727 4638   0.0 7310 1306  6854  3057 1491 18701 17156 18209    0  0 4147  0  1534     0  7544
1 7375 1048 8882    0 647.2    0 6211 18783 23006 4942  3881  8049  2164 4527  0 5588  0 12418 12558 15190
    40      41      42      43   44     45    46    47   48     49   50   51    52   53   54   55   56    57
0 5496   116.7   94.57   461.1 3684  701.9 10386  6821 4753 5509.1 4520 1274  2285 9067 5307  959 7934  2449
1 1653 17094.6 3847.76 13706.8 8660 9729.3     0 26818 4322  559.4    0 1085 60308 1460    0 6943 4033 10363
    58      59 60    61    62    63   64    65   66   67      68    69    70   71   72     73   74   75   76
0 5272   803.9  0  3903 14421 19330 1054   0.0 6314 8900   54.68 20243     0 6146    0 7146.3    0 1614 1730
1 4232 11141.4  0 13386  9071     0 9266 712.5 2485 4147 9970.68     0 26372 3024 5615  485.2 6168 8484 1294
    77 78   79   80    81   82    83    84 85    86    87   88    89    90   91     92   93   94     95
0 9088  0    0    0 16486 1224  7315 17833  0     0 11354 7267  8154 17941    0 2285.7 2821 4639  648.7
1    0  0 3229 4280  4815 3463 11668     0  0 10977 13830 6788 13228  7354 3578  743.7 5523 1359 1061.6
      96    97    98    99   100   101  102  103   104  105    106   107  108   109
0 5288.4 33.99 19438 10122  6170 17765 2282 1893  9453    0  187.7 18717 3702 11594
1  672.9  0.00 10093 13318 12591     0 4384 5112 36647 3787 7774.5 17857 3596 10747

> m2full.dat[subclass ==17,] #closer look at subclass 17 which shows 0 for control mean (only 1 each T/C)
        treat age educ black hispan married nodegree re74  re75 re78 distance weights subclass propen
NSW57       1  37    9     1      0       0        1    0   0.0 1068   0.6581   1.000       17 0.6581
PSID277     0  19   10     1      0       0        1 1056 205.9    0   0.6588   2.319       17 0.6588
> 

############################
#let's try Ben's full matching with all the vars; sould also compare with propensity in part 3
> m2fullvars.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree,
+ data = lalonde, method = "full")
> m2fullvars.out  

Call: 
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age + married + nodegree, data = lalonde, method = "full")

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

> summary(m2fullvars.out)

Call:
matchit(formula = treat ~ re74 + re75 + educ + black + hispan + 
    age + married + nodegree, data = lalonde, method = "full")

Summary of balance for all data:
         Means Treated Means Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         0.577         0.182     0.395    0.518    0.396    0.597
re74          2095.574      5619.237 -3523.663 2425.572 3620.924 9216.500
re75          1532.055      2466.484  -934.429  981.097 1060.658 6795.010
educ            10.346        10.235     0.111    1.000    0.703    4.000
black            0.843         0.203     0.640    1.000    0.643    1.000
hispan           0.059         0.142    -0.083    0.000    0.081    1.000
age             25.816        28.030    -2.214    1.000    3.265   10.000
married          0.189         0.513    -0.324    0.000    0.324    1.000
nodegree         0.708         0.597     0.111    0.000    0.114    1.000


Summary of balance for matched data:
         Means Treated Means Control Mean Diff eQQ Med eQQ Mean   eQQ Max
distance         0.577         0.576     0.001   0.003    0.006     0.087
re74          2095.574      2434.869  -339.295 311.523  659.367 13121.750
re75          1532.055      1577.728   -45.672 205.887  468.549 12746.050
educ            10.346        10.442    -0.096   0.000    0.392     4.000
black            0.843         0.835     0.009   0.000    0.000     1.000
hispan           0.059         0.061    -0.001   0.000    0.002     1.000
age             25.816        24.707     1.110   3.000    3.141     9.000
married          0.189         0.131     0.058   0.000    0.044     1.000
nodegree         0.708         0.695     0.013   0.000    0.011     1.000

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      99.64   99.50   98.493   85.49
re74          90.37   87.16   81.790  -42.37
re75          95.11   79.02   55.825  -87.58
educ          13.08  100.00   44.158    0.00
black         98.66  100.00   99.938    0.00
hispan        98.26    0.00   98.027    0.00
age           49.88 -200.00    3.788   10.00
married       82.06    0.00   86.557    0.00
nodegree      88.53    0.00   90.133    0.00

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

#### do the outcome re78 analysis
> m2fullvars.out = matchit(treat ~ re74 + re75 + educ + black + hispan + age + married + nodegree, data = lalonde, method = "full")
Loading required package: optmatch
Loading required package: survival
The optmatch package has an academic license. Enter relaxinfo() for more information.
> m2full.dat = match.data(m2fullvars.out)
> head(m2full.dat)
     treat age educ black hispan married nodegree re74 re75       re78  distance weights subclass
NSW1     1  37   11     1      0       1        1    0    0  9930.0460 0.6387699       1        1
NSW2     1  22    9     0      1       0        1    0    0  3595.8940 0.2246342       1      103
NSW3     1  30   12     1      0       0        0    0    0 24909.4500 0.6782439       1        8
NSW4     1  27   11     1      0       0        1    0    0  7506.1460 0.7763241       1       15
NSW5     1  33    8     1      0       0        1    0    0   289.7899 0.7016387       1       24
NSW6     1  22    9     1      0       0        1    0    0  4056.4940 0.6990699       1       32

> library(lme4)
Loading required package: Matrix

> mfull.lmer = lmer(re78 ~ treat + (1 + treat|subclass), data = m2full.dat) # like for the quintiles in base section
> summary(mfull.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: re78 ~ treat + (1 + treat | subclass)
   Data: m2full.dat

REML criterion at convergence: 12633.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5267 -0.7497 -0.2851  0.5165  7.4616 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 subclass (Intercept)  4027897 2007          
          treat        3810081 1952     -0.89
 Residual             51103906 7149          
Number of obs: 614, groups:  subclass, 104

Fixed effects:
            Estimate Std. Error t value
(Intercept)   5862.9      507.8  11.546
treat          504.5      736.2   0.685    ## about the same as seen in base section 384 (952)

Correlation of Fixed Effects:
      (Intr)
treat -0.679
> confint(mfull.lmer)              # this took a while
Computing profile confidence intervals ...
                2.5 %   97.5 %
.sig01      1216.8647 3011.968
.sig02        -1.0000    1.000
.sig03         0.0000      Inf
.sigma      6740.8624 7581.414
(Intercept) 4807.1941 6873.722
treat       -985.7685 1977.973
There were 50 or more warnings (use warnings() to see the first 50)

===========================================
Lab Section 6

#Try Mahalanobis
# Lab text asks you to do "full" match, done below. For comparison with
# propensity distance a "nearest" ratio=1 match may be easiest

> matchit.mahal2 = matchit( treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, method = "nearest",
+ distance = "mahalanobis", data = lalonde )
> summary(matchit.mahal2)

Call:
matchit(formula = treat ~ age + educ + black + hispan + married + 
    nodegree + re74 + re75, data = lalonde, method = "nearest", 
    distance = "mahalanobis")

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean eQQ Max
distance         6.571         8.598      5.202    -2.027    1.701    2.399   25.18
age             25.816        28.030     10.787    -2.214    1.000    3.265   10.00
educ            10.346        10.235      2.855     0.111    1.000    0.703    4.00
black            0.843         0.203      0.403     0.640    1.000    0.643    1.00
hispan           0.059         0.142      0.350    -0.083    0.000    0.081    1.00
married          0.189         0.513      0.500    -0.324    0.000    0.324    1.00
nodegree         0.708         0.597      0.491     0.111    0.000    0.114    1.00
re74          2095.574      5619.237   6788.751 -3523.663 2425.572 3620.924 9216.50
re75          1532.055      2466.484   3291.996  -934.429  981.097 1060.658 6795.01


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         6.571         6.613      4.657    -0.042    0.079    0.377    25.18
age             25.816        25.346      9.597     0.470    3.000    2.665     7.00
educ            10.346        10.335      2.499     0.011    0.000    0.411     4.00
black            0.843         0.308      0.463     0.535    1.000    0.535     1.00
hispan           0.059         0.065      0.247    -0.005    0.000    0.005     1.00
married          0.189         0.395      0.490    -0.205    0.000    0.205     1.00
nodegree         0.708         0.665      0.473     0.043    0.000    0.043     1.00
re74          2095.574      4243.581   5931.698 -2148.007 1412.631 2278.350  9504.95
re75          1532.055      1817.187   2634.623  -285.132  300.774  520.283 10876.95

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      97.91   95.37    84.30    0.00
age           78.76 -200.00    18.38   30.00
educ          90.22  100.00    41.54    0.00
black         16.44    0.00    16.81    0.00
hispan        93.47    0.00    93.33    0.00
married       36.53    0.00    36.67    0.00
nodegree      61.17    0.00    61.91    0.00
re74          39.04   41.76    37.08   -3.13
re75          69.49   69.34    50.95  -60.07

Sample sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
Discarded       0       0

#Now do the lab text version using full rather than nearest method (use all the controls rather than 185 above)

> matchit.mahal = matchit( treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, method = "full",
+ distance = "mahalanobis", data = lalonde )
> summary(matchit.mahal)

Call:
matchit(formula = treat ~ age + educ + black + hispan + married + 
    nodegree + re74 + re75, data = lalonde, method = "full", 
    distance = "mahalanobis")

Summary of balance for all data:
         Means Treated Means Control Mean Diff  eQQ Med eQQ Mean eQQ Max
distance         6.571         8.598    -2.027    1.701    2.399   25.18
age             25.816        28.030    -2.214    1.000    3.265   10.00
educ            10.346        10.235     0.111    1.000    0.703    4.00
black            0.843         0.203     0.640    1.000    0.643    1.00
hispan           0.059         0.142    -0.083    0.000    0.081    1.00
married          0.189         0.513    -0.324    0.000    0.324    1.00
nodegree         0.708         0.597     0.111    0.000    0.114    1.00
re74          2095.574      5619.237 -3523.663 2425.572 3620.924 9216.50
re75          1532.055      2466.484  -934.429  981.097 1060.658 6795.01


Summary of balance for matched data:
         Means Treated Means Control Mean Diff  eQQ Med eQQ Mean  eQQ Max
distance         6.571         6.412     0.159    0.051    0.296    25.18
age             25.816        25.517     0.299    3.000    2.732     8.00
educ            10.346        10.300     0.045    0.000    0.339     4.00
black            0.843         0.444     0.399    0.000    0.407     1.00
hispan           0.059         0.065    -0.006    0.000    0.025     1.00
married          0.189         0.390    -0.200    0.000    0.198     1.00
nodegree         0.708         0.671     0.037    0.000    0.034     1.00
re74          2095.574      4088.903 -1993.329 1030.574 2172.700 12841.58
re75          1532.055      1620.410   -88.354  207.194  372.867 12516.89

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance      92.16   97.03    87.64    0.00
age           86.50 -200.00    16.31   20.00
educ          58.84  100.00    51.79    0.00
black         37.70  100.00    36.76    0.00
hispan        92.70    0.00    69.41    0.00
married       38.09    0.00    39.07    0.00
nodegree      66.76    0.00    69.69    0.00
re74          43.43   57.51    40.00  -39.33
re75          90.55   78.88    64.85  -84.21

Sample sizes:
          Control Treated
All           429     185
Matched       429     185
Unmatched       0       0
Discarded       0       0

Warning message:
In sample(names(treat)[treat == 0], numdraws/2, replace = TRUE,  :
  Walker's alias method used: results are different from R < 2.2.0
> mahabdat = match.data(matchit.mahal)
> mahabdat[1:20,]
      treat age educ black hispan married nodegree re74 re75    re78 distance weights subclass
NSW1      1  37   11     1      0       1        1    0    0  9930.0    9.237       1        1
NSW2      1  22    9     0      1       0        1    0    0  3595.9    9.083       1      136
NSW3      1  30   12     1      0       0        0    0    0 24909.5    4.768       1        9
NSW4      1  27   11     1      0       0        1    0    0  7506.1    3.558       1       96
NSW5      1  33    8     1      0       0        1    0    0   289.8    3.472       1      109
NSW6      1  22    9     1      0       0        1    0    0  4056.5    2.355       1       82
NSW7      1  23   12     1      0       0        0    0    0     0.0    4.553       1       38
NSW8      1  32   11     1      0       0        1    0    0  8472.2    4.552       1       26
NSW9      1  22   16     1      0       0        0    0    0  2164.0    7.048       1       58
NSW10     1  33   12     0      0       1        0    0    0 12418.1    6.548       1       63
NSW11     1  19    9     1      0       0        1    0    0  8173.9    2.661       1       25
NSW12     1  21   13     1      0       0        0    0    0 17094.6    4.354       1       67
NSW13     1  18    8     1      0       0        1    0    0     0.0    3.451       1       78
NSW14     1  27   10     1      0       1        1    0    0 18739.9    6.861       1       88
NSW15     1  17    7     1      0       0        1    0    0  3023.9    4.963       1       75
NSW16     1  19   10     1      0       0        1    0    0  3228.5    2.692       1       70
NSW17     1  27   13     1      0       0        0    0    0 14581.9    4.318       1        2
NSW18     1  23   10     1      0       0        1    0    0  7693.4    2.458       1       82
NSW19     1  40   12     1      0       0        0    0    0 10804.3    7.347       1      135
NSW20     1  26   12     1      0       0        0    0    0 10747.4    4.485       1      137

> attach(mahabdat)

> table(treat, subclass)
     subclass
treat  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    0  1  1  6  6  1  5  3  1  2  1  1 18  1  3  3  1  1  5  1  7  1  6  1  1  1  1  1  9  1  1
    1  1  3  1  1  1  1  1  1  1  5  1  1  3  1  1  1  2  1  1  1  1  1  1  2  2  2  1  1  1  1
     subclass
treat 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
    0  7  2  1  1  2  1 19  1  2 12  1  1  1  3  1  3  2  1  2  5  2  1  2  1  1  5  1  2  1  4
    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
     subclass
treat 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
    0  1  1  1  1  9  3  5  1 12  1  3  3  1 10  1  1  1  1  1  1  1  1  3  2  7  1  1  1  9  1
    1  1  1  1  1  1  1  1  1  1  3  1  1  2  1  2  1  2  2  1  2  2  7  1  1  1  1  1  1  1  1
     subclass
treat 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
    0  1  1  1  4  7  1  1  1  1   1   5   1   1   1   7   1   7   2   1   2   2   2   5   3   4
    1  1  3  2  1  1  4  1  2  1   6   1   2   1   1   1   1   1   1   7   1   1   1   1   1   1
     subclass
treat 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
    0   4   4   2   2   1  16   6   8   1  11   1   6   2   6   6   1  12   5   1   1   5   1
    1   1   1   1   1   1   1   1   1   4   1   1   1   1   1   1   1   1   1   1   1   1   1

              

> tapply(re78, list(treat,subclass), mean) # look at outcomes
      1     2    3    4     5    6     7        8     9   10    11   12     13    14   15    16
0 12618  1745 4802 7542  2282 3834  7175    54.68  7819    0 14344 5815 9473.7 12370 8821  2862
1  9930 17782 5150 6409 11163 9643 11143 16218.04 24909 2463     0 6552  721.7     0 3192 20506
    17   18   19   20    21    22   23    24   25   26   27   28    29   30    31   32   33
0 6399 5974    0 4584 12490  5756    0 12760 7544 7310 6495 1926 14053 2821  3644 2112 8154
1 9185 5912 3094    0  1255 13189 2788     0 8149 6658 2349 1068     0 7285 13168 1048 1924
       34   35   36   37   38   39     40    41    42   43   44    45    46    47     48    49
0 22095.0 4123 2233 8894 2814 7589 9106.6 12108  5552 1067 8437 12419  7343 11111  422.6 10650
1   762.9 8882    0 8547    0 7480  647.2     0 11966 9599 6211 18783 18678 23006 6456.7     0
    50   51   52   53 54    55   56   57   58   59    60   61   62    63    64    65   66    67
0 2237 5284 6848 5813  0 10746 4224    0 7215 1422 13256 5307 5345  1534 19516  5479 5611  8086
1 2321 4942    0    0  0     0 3881 8049 2164 9984     0 4483    0 12418 12558 22163 1653 17095
      68    69   70   71    72    73   74    75    76   77   78   79    80      81   82   83
0 3057.4  4319    0 9062 15485 10122 7128 14065 14999 7236 7146 1054 14476   31.03 1224 9469
1  671.3 17815 4322    0 26818  4955 1773  1512 11233  959 8615 5445 30154 1442.65 3206 6943
     84   85    86 87    88    89   90   91   92    93   94   95   96    97   98      99  100
0 17811 9706  8900  0     0  6787 4788 3372 6084 803.9 7482 8937    0     0 2159   33.99 7934
1  4033 4232 11141  0 18740 13386 4850    0 8984 830.3    0 2485 3051 26372 2808 3196.57 2890
   101  102  103   104  105   106  107  108  109  110   111  112  113   114   115  116  117
0 5062 4520 7699 18133 1211 20243 3789 3838    0 2857 12957 6962 3893 11521  5735 7867 8336
1 6168 7799 8484  1294    0  5010 9371 4280 2441 3463  7383    0    0 10977 13830 6788 9559
    118   119   120  121  122     123  124   125  126   127  128  129   130   131   132   133
0 13860 10085   0.0 6163 6411 16704.8 7987  6870    0  7245 8606 7859  8245     0  8815 14163
1 13228  7458 743.7 5523    0   672.9 6263 10093 6281 12591    0 5112 15953 36647 12804  3787
   134   135   136     137
0 9923  7216 10509   648.7
1 4182 10804  3596 10747.4




** that's (more than) enough here**********************