1. Now they tell us. With daily low-dose aspirin use, risks may outweigh benefits for older adults. Daily Aspirin No Longer Recommended To Prevent Heart Attacks In Older Adults. Publication: Effect of Aspirin on Disability-free Survival in the Healthy Elderly, NEJM.

2. Even drinking Diet Coke every day 'increases your risk of dying young from heart disease and cancer'. Sugar is even worse.

1. Course outline and logistics

2. A matched observational study (DOS, Chap 7)

3. Study design versus inference

4. Basic tools of multivariate matching (DOS, Secs 8.1-8.4)

Rosenbaum DOS: Chapters 7 and 8 (8.1-8.4)

Observational Studies according to Donald B. Rubin

For objective causal inference, design trumps analysis. Annals of Applied Statistics, Volume 2, Number 3 (2008), 808-840. Rubin talk. Another Rubin overview of matching: Matching Methods for Causal Inference. Stuart, E.A. and Rubin, D.B. (2007). Best Practices in Quasi-Experimental Designs: Matching methods for causal inference. Chapter 11 (pp. 155-176) in Best Practices in Quantitative Social Science. J. Osborne (Ed.). Thousand Oaks, CA: Sage Publications.

Lalonde NSW data (DOS sec 2.1). Subclassification/Stratification and Full matching.
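Subclassification on the propensity score takes only a few lines. The block below is a minimal Python sketch using only the standard library, not the R code used in class. It assumes the propensity scores have already been estimated. The function name, the quintile cut points, and the stratum-size weights are illustrative choices.

```python
import bisect
import statistics

def subclass_estimate(ps, z, y, n_strata=5):
    """Stratify on estimated propensity scores and pool within-stratum mean differences.

    Cut points are sample quantiles of ps, giving roughly equal-count strata.
    Each stratum's treated-minus-control mean difference is weighted by the
    stratum's total size (ATE-style). Strata missing either group are skipped
    and returned, so any loss of overlap is visible."""
    cuts = statistics.quantiles(ps, n=n_strata)
    strata = {}
    for p, zi, yi in zip(ps, z, y):
        s = bisect.bisect_right(cuts, p)              # stratum label 0 .. n_strata - 1
        treated, control = strata.setdefault(s, ([], []))
        (treated if zi == 1 else control).append(yi)
    weighted_sum, total, skipped = 0.0, 0, []
    for s, (treated, control) in sorted(strata.items()):
        if not treated or not control:
            skipped.append(s)
            continue
        size = len(treated) + len(control)
        weighted_sum += size * (statistics.mean(treated) - statistics.mean(control))
        total += size
    return weighted_sum / total, skipped

# Hypothetical scores: treatment alternates; the outcome also rises with the score.
ps = [i / 20 for i in range(1, 20)]
z = [i % 2 for i in range(19)]
y = [10 * p + 3 * zi for p, zi in zip(ps, z)]
estimate, skipped = subclass_estimate(ps, z, y)
print(estimate, skipped)
```

Weighting each stratum by its size targets the ATE. Weighting by the number of treated units in each stratum would target the ATT instead.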

Week 1 handout pdf slides shown in class

Rogosa R-session (using R 3.3.3) 4/1/18 redo in R 3.4.4 (sparse)

2019 lalonde MatchIt: full matching, balance with cobalt love.plot and bal.tab

2019 lalonde optmatch: fullmatch with outcome analysis

optmatch:fullmatch vignette optmatch another version another good tutorial optmatch Functions for Optimal Matching
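Optimal matching, as in optmatch, minimizes the total distance over all matched sets at once. Greedy matching instead serves one treated unit at a time. The block below is a toy Python sketch of the optimal objective for 1:k matching without replacement, as in the 1:2 nuclear plants exercise. It assumes a scalar score and absolute-difference distance, and it solves the problem by brute force. optmatch itself solves a minimum-cost network flow problem; enumeration is feasible only for tiny examples, not for 7 treated and 19 controls. The function name is hypothetical.

```python
import itertools

def optimal_match(treated, controls, k=1):
    """Optimal 1:k matching without replacement on a scalar score, by exhaustive search.

    Each treated unit gets k slots. Over every way of filling the slots with
    distinct controls, keep the one with the smallest total absolute distance."""
    slots = [i for i in range(len(treated)) for _ in range(k)]
    best_cost, best = float("inf"), None
    for chosen in itertools.permutations(range(len(controls)), len(slots)):
        cost = sum(abs(treated[i] - controls[j]) for i, j in zip(slots, chosen))
        if cost < best_cost:
            best_cost, best = cost, chosen
    matches = {i: sorted(j for s, j in zip(slots, best) if s == i)
               for i in range(len(treated))}
    return best_cost, matches

# Hypothetical scores: two treated units, five candidate controls, 1:2 matching.
cost, matches = optimal_match([0.0, 10.0], [0.5, -0.5, 9.0, 11.0, 5.0], k=2)
print(cost, matches)   # prints 3.0 {0: [0, 1], 1: [2, 3]}
```

Consider treated scores 0.5 and 0.6 and controls 0.55 and 0.9. A greedy pass that serves 0.6 first takes 0.55 and leaves 0.9 for 0.5, for a total distance of 0.45. `optimal_match([0.5, 0.6], [0.55, 0.9])` instead pairs 0.5 with 0.55, for a total of 0.35.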

Cobalt: Using cobalt with Other Preprocessing Packages; Covariate Balance Tables and Plots: A Guide to the cobalt Package

1. In the Week 1 Computing Corner with the Lalonde data (effect of job training on earnings), we started out (see R-session) by showing the analysis of observational data that is ubiquitous from epidemiology to economics: an analysis of covariance, a.k.a. tossing the treatment variable and all the confounders into a regression equation predicting the outcome and hoping for the best (cf. 2016 Week 1

3. We will be working a lot with matching-based techniques. One of the best thinkers and writers on matching is Elizabeth Stuart of Johns Hopkins. For this problem, take a look at her paper "Matching Methods for Causal Inference: A Review and a Look Forward." In lecture 01 you were introduced to "balance tables" (a.k.a. "Table 1"), which summarize the covariate distributions of the observations. A handful of questions: (a) as concisely as possible, state why we focus on balance assessments as part of our argument when attempting causal inference; (b) in addition to a balance table, name other tools used to report balance; (c) why do we use standardized mean differences instead of p-values to assess balance when judging the quality of a matched design? and (d) why is it kinda weird to use p-values for the covariates in a randomized trial to assess balance?

4. In lecture 1 we quickly outlined some of the big challenges to causal inference with observational data (see slide 41, "There should be strong effort to show the two groups are similar..."). These challenges include inclusion/exclusion of observations, observational units that may be completely missing (censoring, survival bias), missing data, imbalances in observed data, and imbalances in unobserved data. We'll address each of these at different points in the course. For now, let's focus on the decision to include or exclude observations. What we're doing when matching, namely removing observations that have no adequate counterparts in the contrast group, may seem a bit subversive. The intuition is: why "throw away" data? I think there are two reasons people worry about "throwing away data." First, by limiting the kinds of observations in our study, we may lose the ability to generalize our conclusions to a wider swath of the population.
The counter to that is: yes, we are trading off the ability to generalize (i.e., external validity) for the ability to make stronger claims about a candidate causal effect (i.e., internal validity). The second concern is that it seems like more data is better. Formulate a response to this concern. (Note: OMG, this question seems so nebulous. Yup. That's how this works; you're playing Big Kid academics now. We made sure to mention this argument during lecture 01, so you know it. It's a common statistical argument nowadays. If you want to read your way out of this one... here's a good paper.)

5. Exercise in pair matching. In DOS Sec 2.1, Rosenbaum works with the randomized experiment data from NSW. In the Weeks 1 and 2 Computing Corner we used the constructed observational study version of these data. Use the observational study data to do a version of the 1:1 matching in DOS Sec 2.1. Compare the balance improvement achieved by nearest neighbor matching with the full matching results from the Weeks 1 and 2 Computing Corner.

6. For the fullmatch analysis done in the Lalonde class presentation (Weeks 1 and 2), the outcome comparison was carried out using lmer to average the treatment effects over the 104 subclasses. A hand-wavy analogy to the paired t-test here would be to use the mean difference within each subclass. Show that, because some of the subclasses are large, this simplified analysis does not replicate the lmer results well.

7. optmatch package, fullmatch, lalonde.

MatchIt uses the optmatch package's fullmatch command for its "full" option, as in the class example. Using optmatch directly (without the MatchIt wrapper) allows additional specifications and controls for full or optimal matching.

For the Lalonde data, try out optmatch full matching and compare the resulting subclasses and balance with the class example, which used optmatch through MatchIt.
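For items 3(c) and 5 above, the block below sketches greedy 1:1 nearest-neighbor matching on a propensity score and checks balance with standardized mean differences. It is written in Python with made-up scores and ages. Several choices are assumptions in the spirit of nearest-neighbor matching, not a reproduction of MatchIt: the processing order (highest score first), the optional caliper, and the pooled-SD convention.

```python
import math
import statistics

def smd(treated, control):
    """Standardized mean difference, pooled SD = sqrt((var_t + var_c) / 2).

    Population variances are used, so duplicating every observation leaves the
    SMD exactly unchanged."""
    pooled = math.sqrt((statistics.pvariance(treated) + statistics.pvariance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled

def greedy_nn_match(treated_ps, control_ps, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement on a scalar score.

    Treated units are taken in decreasing order of score; each grabs the
    closest unused control. Returns {treated index: control index}. A treated
    unit whose nearest available control is farther than the caliper stays unmatched."""
    available = set(range(len(control_ps)))
    pairs = {}
    for i in sorted(range(len(treated_ps)), key=lambda t: treated_ps[t], reverse=True):
        if not available:
            break
        j = min(available, key=lambda k: abs(treated_ps[i] - control_ps[k]))
        if caliper is None or abs(treated_ps[i] - control_ps[j]) <= caliper:
            pairs[i] = j
            available.remove(j)
    return pairs

# Hypothetical propensity scores and ages: 4 treated units, 8 controls.
ps_t, age_t = [0.80, 0.65, 0.42, 0.30], [24, 27, 33, 35]
ps_c = [0.10, 0.15, 0.20, 0.28, 0.35, 0.45, 0.60, 0.75]
age_c = [45, 44, 41, 37, 34, 31, 28, 25]

pairs = greedy_nn_match(ps_t, ps_c)
matched_t = [age_t[i] for i in pairs]
matched_c = [age_c[j] for j in pairs.values()]
print("pairs:", pairs)
print("SMD(age) before matching:", round(smd(age_t, age_c), 3))
print("SMD(age) after matching: ", round(smd(matched_t, matched_c), 3))
print("SMD on 10 copies of the data:", round(smd(age_t * 10, age_c * 10), 3))
```

The last line also bears on 3(c). The SMD computed on ten copies of the data equals the original SMD. A t statistic on the copies would be roughly sqrt(10) times larger. A p-value can therefore move with sample size even when balance is unchanged.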

1. New York State bail (Mike has a study for later on). Speaker Carl Heastie: SFY 19-20 Budget Includes Critical Criminal Justice Reform Legislation and Funding. NYS Budget Will Expedite Trials, End Cash Bail for Low-Level Offenses and Reform Antiquated Discovery Laws.

NY Post has a different view: Suspects in nonviolent crimes may walk free under state budget deal. New York 'reformers' want to spring a lot of bad guys from jail.

and even the NY Daily News: Bail reform's killer mistake: Criminal justice overhaul may have taken away important tool courts use to keep dangerous people off the streets

2. Somewhat related to the Lindner data (Week 2 Computing Corner): Mick Jagger to have a stent placed in his heart.

1. Basic tools of multivariate matching (DOS, Secs 8.1-8.4)

2. Potential outcomes framework (DOS 2.2)

3. Fisher's sharp null; permutation test (DOS 2.3)

4. Various practical issues in matching (DOS, Chap 9)

Rosenbaum DOS: Chapter 2 (plus week1 items)

From Donald B. Rubin

First section of Basic Concepts of Statistical Inference for Causal Effects in Experiments and Observational Studies. Similar material: Chaps 1 and 2 of Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Guido Imbens and Don Rubin (linked on main page).

Percutaneous coronary intervention (PCI), commonly known as coronary angioplasty or simply angioplasty, is a non-surgical procedure used to treat the stenotic (narrowed) coronary arteries of the heart found in coronary heart disease.

Lindner data in package

Use of the Lindner data in the JSS vignette: PSAgraphics: An R Package to Support Propensity Score Analysis

Week 2 handout Rogosa R-session cc2 pdf slides Lindner example

Lindner fullmatch in optmatch and cobalt

1. The JSS vignette for PSAgraphics (linked in the Week 2 Computing Corner) does subclassification matching for the Lindner data. Repeat their subclassification analyses and try out their balance displays and tests. They have some specialized functions. Compare with our basic approach.

2. The Week 2 presentation showed an alternative propensity score analysis: analysis of covariance with the propensity score as a covariate. A rough analogy is ancova vs blocking (where blocking is our subclassification, say quintiles). Try out the basic ancova approach (here logistic regression) for the dichotomous outcome lifepres.

3. Modify Fisher's Sharp Null to reflect the null hypothesis that the treatment adds five units to the outcome under control. Build a small simulation (e.g., 10 observations) and construct a table that summarizes the potential outcomes. Randomize using a fair coin flip to assign treatment or control for each observational unit. Use the permutation test to assess your data set using (i) Fisher's Sharp Null and (ii) the null hypothesis that the treatment adds five units to the outcome under control.

4. Building off of RQ#3 above, sort your observations in ascending order of the outcome under control. Randomize two at a time: one fair coin flip assigns either the first or second observation to treatment (and the other to control). A second fair coin flip assigns either the third or fourth observation to treatment (and the other to control). This continues for the remaining pairs. Use the appropriate permutation test to assess your data set using (i) Fisher's Sharp Null and (ii) the null hypothesis that the treatment adds five units to the outcome under control. Contrast the results here with the results from RQ#3.

5. Pair matching: nuclear plants data. See also Week 8, Stat209. Another small, canonical matching example for optmatch expositions is the nuclear plants data from the Cox and Snell text.

Data cleaning gives 7 "treatment" plants and a reservoir of 19 controls. Try out 1:2 optimal matching using MatchIt (see also the Stat209 examples) and compare with pairmatch in optmatch, plus balance diagnostics.
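RQ#3 and RQ#4 above can be prototyped with exact randomization tests. The Python sketch below uses only the standard library and hypothetical potential outcomes with a constant effect of 5. In the coin-flip design it conditions on the observed number treated, a common simplification. The test statistic is the difference in means. Setting tau0 = 0 gives Fisher's sharp null, and tau0 = 5 gives the shifted null. The function names are illustrative.

```python
import itertools
import random
import statistics

def diff_in_means(y, z):
    """Treated mean minus control mean."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return statistics.mean(treated) - statistics.mean(control)

def crd_pvalue(y_obs, z_obs, tau0=0.0):
    """RQ#3: two-sided p-value for H0: y1 = y0 + tau0 for every unit (tau0 = 0 is Fisher's sharp null).

    Conditions on the observed number treated and enumerates every assignment
    with that many treated units. Under H0 the adjusted outcomes y - tau0 * z
    are the control outcomes, so they stay fixed across assignments."""
    y_adj = [yi - tau0 * zi for yi, zi in zip(y_obs, z_obs)]
    n, m = len(y_adj), sum(z_obs)
    t_obs = diff_in_means(y_adj, z_obs)
    stats = [diff_in_means(y_adj, [1 if i in idx else 0 for i in range(n)])
             for idx in itertools.combinations(range(n), m)]
    return sum(abs(t) >= abs(t_obs) - 1e-12 for t in stats) / len(stats)

def paired_pvalue(pairs, tau0=0.0):
    """RQ#4: same null, but each pair's coin flip decides which member is treated.

    pairs holds (treated outcome, control outcome). Under H0, reversing a
    pair's flip only changes the sign of its adjusted difference."""
    d = [(yt - tau0) - yc for yt, yc in pairs]
    t_obs = sum(d) / len(d)
    stats = [sum(s * di for s, di in zip(signs, d)) / len(d)
             for signs in itertools.product((1, -1), repeat=len(d))]
    return sum(abs(t) >= abs(t_obs) - 1e-12 for t in stats) / len(stats)

# Hypothetical potential-outcomes table: 10 units, constant true effect of 5.
random.seed(209)
y0 = sorted(round(random.gauss(20, 4), 1) for _ in range(10))   # sorted for RQ#4
y1 = [v + 5 for v in y0]
print("unit    y0    y1")
for i, (a, b) in enumerate(zip(y0, y1)):
    print(f"{i:4d} {a:5.1f} {b:5.1f}")

# RQ#3 design: one fair coin per unit, re-flipped if a group comes up empty.
while True:
    z = [random.randint(0, 1) for _ in range(10)]
    if 0 < sum(z) < 10:
        break
y_obs = [b if zi else a for a, b, zi in zip(y0, y1, z)]
print("coin-flip design: p(tau0=0) =", crd_pvalue(y_obs, z), " p(tau0=5) =", crd_pvalue(y_obs, z, 5))

# RQ#4 design: units (0,1), (2,3), ... are pairs; one coin flip per pair.
pairs = []
for k in range(0, 10, 2):
    t, c = (k, k + 1) if random.random() < 0.5 else (k + 1, k)
    pairs.append((y1[t], y0[c]))
print("paired design:    p(tau0=0) =", paired_pvalue(pairs), " p(tau0=5) =", paired_pvalue(pairs, 5))
```

With only 5 pairs there are 2^5 = 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625. The contrast with RQ#3 therefore reflects two things: less noise from pairing, and a coarser randomization distribution.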

NPR: Bad Diets Are Responsible For More Deaths Than Smoking, Global Study Finds. CNN: What we aren't eating is killing us, global study finds. BBC: The diets cutting one in five lives short every year.

Publication: Health effects of dietary risks in 195 countries, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet, Published: April 03, 2019. DOI: https://doi.org/10.1016/S0140-6736(19)30041-8

Caution, consider the source, from the fine folks that brought you Wakefield (vaccines and autism). Remember "In 1998, Andrew Wakefield and 12 of his colleagues published a case series in the Lancet, which suggested that the measles, mumps, and rubella (MMR) vaccine may predispose to behavioral regression and pervasive developmental disorder in children. Despite the small sample size (n=12), the uncontrolled design, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3136032/

Last year, week 2, Lancet told us Drinking is as harmful as smoking and consuming more than five drinks a week lowers life expectancy

optmatch example from lecture

1. Finish up: Basic tools of multivariate matching (DOS, Secs 8.1-8.4)

2. Various practical issues in matching (DOS, Chap 9)

3. Inverse probability weighting (Robins & Hernan, Chap 2.4)

Rosenbaum DOS: Chapters 8 and 9

Smoking study (Prochaska et al 2016)

Dealing with limited overlap in estimation of average treatment effects (Crump et al 2009) (or see http://public.econ.duke.edu/~vjh3/working_papers/overlap.pdf )

Defining the Study Population for an Observational Study to Ensure Sufficient Overlap: A Tree Approach (Traskin & Small 2011)

CONSORT Statement (randomized trials)

STROBE Statement (observational studies)

1. In this class we've shown you a couple of tools to assess the adequacy of a matched set - for example: Love plots, balance tables, standardized mean differences, and histogram plots of fitted propensity scores (or covariates). Why haven't we shown you a statistical test? That's weird, right? A ton of researchers fall for this, failing to see why assessing balance using a hypothesis test in an observational study is problematic. There are a couple of valid critiques; try articulating at least one such critique. (Hint: think about how we calculate the SMD vs a standard error.) Once you've given it a go, check out Section 6.6 of this paper (great paper!) for a couple of solutions to this question.

2. In Section 6.7 of that same paper, the authors say their preferred tool for assessing balance is an empirical QQ plot. What's a QQ plot? Compare and contrast QQ plots and balance tables. Neither tool dominates the other, so what are the benefits and drawbacks of each?
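An empirical QQ plot pairs up matching quantiles of the treated and control distributions of a covariate. The Python sketch below computes those pairs and summarizes the gaps. It uses linear-interpolation quantiles on a fixed grid of probabilities and hypothetical data. This is an assumed convention; packages such as MatchIt compute their eQQ summaries in their own way.

```python
import statistics

def quantile(sorted_x, p):
    """Sample quantile of already-sorted data by linear interpolation (0 <= p <= 1)."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def eqq_summary(treated, control, n_points=101):
    """Points for an empirical QQ plot of one covariate, plus mean and max absolute gaps.

    Each point is (control quantile, treated quantile) at the same probability;
    points on the 45-degree line mean the two distributions agree there."""
    ts, cs = sorted(treated), sorted(control)
    probs = [k / (n_points - 1) for k in range(n_points)]
    points = [(quantile(cs, p), quantile(ts, p)) for p in probs]
    gaps = [abs(t - c) for c, t in points]
    return points, statistics.mean(gaps), max(gaps)

# Hypothetical covariate values after matching.
age_treated = [24, 27, 33, 35, 41, 52]
age_control = [25, 28, 31, 37, 39, 60]
points, mean_gap, max_gap = eqq_summary(age_treated, age_control)
print(f"eQQ mean gap {mean_gap:.2f}, max gap {max_gap:.2f}")
```

The mean and max gaps summarize the whole distribution in one number each, which a table of means and SMDs cannot do. The plot itself shows where the groups differ, in the tails or in the center. A balance table, in turn, covers many covariates compactly.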

note: I'd be unhappy if this were not a terrible observational study, but you will see thousands just like this.

Diet Rich in Red Meat Linked to Earlier Death Publication: Dietary proteins and protein sources and risk of death: the Kuopio Ischaemic Heart Disease Risk Factor Study. The American Journal of Clinical Nutrition, 2019; DOI: 10.1093/ajcn/nqz025

1. First model for observational studies (DOS, Sections 3.1-3.3) 2.

Alternative propensity score analyses. Propensity score weighting: Inverse Probability of Treatment Weighting (IPTW). Treatment effect estimation without matching.
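The IPTW estimators named here follow directly from their weights. The block below is a minimal Python sketch with no library dependencies. It assumes the propensity scores e have already been estimated, for example by logistic regression, and it uses normalized (Hajek-style) weighted means. The function name and toy data are hypothetical.

```python
def iptw_estimates(z, y, e):
    """Normalized (Hajek-style) IPTW estimates of the ATE and the ATT.

    z: 0/1 treatment indicators; y: outcomes; e: estimated propensity scores.
    ATE weights: 1/e for treated units, 1/(1 - e) for controls.
    ATT weights: 1 for treated units, e/(1 - e) for controls."""
    def weighted_mean(values, weights):
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    y_t = [yi for yi, zi in zip(y, z) if zi == 1]
    y_c = [yi for yi, zi in zip(y, z) if zi == 0]
    e_t = [ei for ei, zi in zip(e, z) if zi == 1]
    e_c = [ei for ei, zi in zip(e, z) if zi == 0]
    ate = (weighted_mean(y_t, [1 / p for p in e_t])
           - weighted_mean(y_c, [1 / (1 - p) for p in e_c]))
    att = (weighted_mean(y_t, [1.0] * len(y_t))
           - weighted_mean(y_c, [p / (1 - p) for p in e_c]))
    return ate, att

# Hypothetical data: controls with high propensity scores get upweighted for the ATT.
z = [1, 1, 1, 0, 0, 0, 0]
y = [12.0, 9.5, 11.0, 8.0, 7.5, 10.0, 6.0]
e = [0.7, 0.5, 0.6, 0.6, 0.3, 0.5, 0.2]
print(iptw_estimates(z, y, e))
```

The ATT weights leave the treated group as is and reweight controls to resemble it. This is why the ATT and ATE can differ when effects vary with the propensity score.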

Review paper: Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Peter C. Austin and Elizabeth A. Stuart, Statistics in Medicine 2015, 34: 3661-3679.

A thorough R exposition using the Lalonde data: A Practical Guide for Using Propensity Score Weighting in R. Practical Assessment, Research & Evaluation, v20 n13, Jun 2015.

Also Cox Regression, comparison with full matching (Elizabeth Stuart)

Rogosa R-session pdf slides shown in class

Additional Resource:

1. Try out the ATE IPTW analysis (done in the Week 4 Computing Corner) for the dichotomous outcome lifepres in the Lindner data. Compare with the full matching results shown in class.

2. Try an ATT IPTW analysis for log(cardbill) outcome in the Lindner data.

3. The Wilcoxon signed rank test takes as its input a fixed number, designate this number

1. The naive model (DOS, Secs 3.4-3.8)

2. Design sensitivity (DOS, Chap 14)

3. Prognostic scores: Hansen (2008), Leacy & Stuart (2014)

4. Design devices (multiple controls, coherence, and known effects): DOS 5.2.2 through 5.2.4

References:

Hansen BB. The prognostic analogue of the propensity score. Biometrika. 2008 Jun 1;95(2):481-8.

Leacy FP, Stuart EA. On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: a simulation study. Statistics in Medicine. 2014 Sep 10;33(20):3488-508.
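Hansen's prognostic score summarizes covariates by their predicted outcome under control, where the propensity score uses the probability of treatment. The block below is a minimal sketch under two simplifying assumptions: a single covariate, and a straight-line outcome model fit on the control group only. Hansen's construction allows any outcome model fit to controls. The names and data are illustrative.

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for one covariate."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def prognostic_scores(x, z, y):
    """Predicted outcome under control for every unit.

    The outcome model is fit on controls only (z == 0), so treated outcomes are
    never used. The scores can then be matched or subclassified on, alone or
    together with the propensity score."""
    x_c = [xi for xi, zi in zip(x, z) if zi == 0]
    y_c = [yi for yi, zi in zip(y, z) if zi == 0]
    intercept, slope = fit_line(x_c, y_c)
    return [intercept + slope * xi for xi in x]

# Hypothetical covariate (e.g., baseline earnings, in $1000s) and outcomes.
x = [2.0, 5.0, 1.0, 8.0, 4.0, 6.0, 3.0]
z = [1, 1, 0, 0, 0, 0, 1]
y = [6.1, 9.0, 3.2, 10.5, 6.8, 8.9, 7.0]
print([round(s, 2) for s in prognostic_scores(x, z, y)])
```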

Alternative computation of propensity scores (trees, boosting). Teamed with IPTW in

Toolkit for Weighting and Analysis of Nonequivalent Groups: A tutorial for the twang package. Lalonde data, yet again.

Rogosa twang and ATT session with Lalonde data Week 5 pdf

Additional Resources:

Package

Package

To come, sensitivity analysis computations: package

1. Try out, using the Lalonde data (Week 1), the boosted regression approach to computing propensity scores using Ridgeway's (via Friedman)

2. Using the Lindner data shown in the PSAgraphics vignette (JSS, linked in Week 2), try out the regression tree classification approach (use rpart) for propensity score estimation. Examine the resulting propensity scores, balance for matching in six subclassifications, and the outcome analysis for the cardbill measure.

1. Prognostic scores: Hansen (2008), Leacy & Stuart (2014)

2. Using multiple outcomes: coherence and known null effects (DOS 5.2.3 and 5.2.4)

3. Using a second control group: mitigating bias (DOS 5.2.2)

4. Thick description

Reading: The causal impact of bail on case outcomes for indigent defendants in New York City

Sensitivity analysis computations:

package

Rosenbaum packages

vignette: Two R Packages for Sensitivity Analysis (examples from sections 2 and 3) in

Rogosa sensitivity session CC_6 slides
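The simplest case of what the sensitivity packages above compute is the sign test for matched pairs under Rosenbaum's sensitivity model. Suppose hidden bias is at most Gamma. Under the null, the chance that the treated unit has the higher response in a pair is then at most Gamma/(1+Gamma), so the worst-case p-value is a binomial tail. This Python sketch covers only that sign-test case. sensitivitymw and sensitivitymv use M-statistics and handle multiple controls. The function name and numbers are hypothetical.

```python
from math import comb

def sign_test_sensitivity(n_pairs, n_treated_higher, gamma):
    """Worst-case one-sided sign-test p-value under hidden bias of at most gamma.

    n_pairs counts matched pairs with untied responses; n_treated_higher counts
    pairs where the treated unit's response is higher. With gamma = 1 this is
    the usual sign test; larger gamma allows more hidden bias."""
    p_plus = gamma / (1 + gamma)   # largest possible P(treated higher) under H0
    return sum(comb(n_pairs, k) * p_plus ** k * (1 - p_plus) ** (n_pairs - k)
               for k in range(n_treated_higher, n_pairs + 1))

# Hypothetical result: the treated unit is higher in 9 of 10 untied pairs.
for gamma in (1.0, 1.5, 2.0, 3.0):
    print(f"Gamma = {gamma}: upper bound on p = {sign_test_sensitivity(10, 9, gamma):.4f}")
```

The Gamma at which the upper bound crosses 0.05 measures how much hidden bias it would take to explain away the result.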

1. Mercury example (2 controls) from sections 3 and 6 of the Rosenbaum vignette (linked in CC_6)

Fish often contains mercury. Does eating large quantities of fish increase levels of mercury in the blood? Data set mercury in the sensitivitymw package is from the 2009-2010 National Health and Nutrition Examination Survey (NHANES) and is the example in Rosenbaum (2014). There are 397 rows or matched triples and three columns, one treated with two controls. The values are methylmercury levels in blood. Column 1, Treated, describes an individual who had at least 15 servings of fish or shellfish in the previous month. Column 2, Zero, describes an individual who had 0 servings of fish or shellfish in the previous month. Column 3, One, describes an individual who had 1 serving of fish or shellfish in the previous month. In the comparison here, Zero and One are not distinguished; both are controls. Sets were matched for gender, age, education, household income, black race, Hispanic, and cigarette consumption.

2. Demonstration (see solution): mechanics of setting up a matched data set for the sensitivity functions. It is easiest to create the data set for the most common situation, 1:1 matching (merge works without needing thought); steps for the 1:1 matching setting below

Dose-response. How Much Coffee Is Too Much? Drinking 6 Cups In A Day Is Bad For The Heart Publication: Long-term coffee consumption, caffeine metabolism genetics, and risk of cardiovascular disease: a prospective analysis of up to 347,077 individuals and 8368 cases The American Journal of Clinical Nutrition, Volume 109, Issue 3, March 2019, Pages 509–516,

Regression discontinuity (Lee and Lemieux 2011); case-noncase study (Breslow 1998); isolation in the construction of natural experiments (Zubizarreta, Small, and Rosenbaum 2014)

package

Rogosa session, causaldrf examples

also for continuous treatments, Covariate Balancing Propensity Score, package CBPS plus

categorical treatments (multi), package

Multinomial Propensity Scores (mnps) in

The Propensity Score with Continuous Treatments

Causal Inference With General Treatment Regimes: Generalizing the Propensity Score, Journal of the American Statistical Association, Vol. 99, No. 467 (September), pp. 854-866.

In the Week 7 Computing Corner we showed results for ADRF (average dose-response function) estimates using Imbens's very clever artificial data example from the linked causaldrf vignette (see also CC_7 slides).

IPW results (see the Weeks 3 and 4 Computing Corner for examples with binary treatments) were notable for their apparently very poor performance (all other estimates did pretty well). Keep in mind that this artificial data test is not even a "phase 2" hurdle. We are given the selection variables (X_1, X_2) that, apart from randomness, determine the dose each individual selects (here denoted by T).

As IPW is dominant in applications like long-term occupational exposures (to bad stuff), the dose-response setting is quite relevant. The artificial data ADRF has an important feature, a non-monotonic dip, reminiscent of alcohol or even salt for health outcomes (a bit above zero is better than zero). So for another look at IPW, I tried to make a much easier example: a basically straight-line ADRF (just a little wiggle), obtained by limiting dose (T) to values above 0.5.

So try out the comparison of the hi_estimate (shown in class) and the iptw_estimate, both from the causaldrf package, with the true ADRF from the artificial data construction, using values T > .5 (about half the data).

Are we any happier with the value of IPW (importance sampling)? The solution indicates to me: "no"; YMMV. Lecture 7 addendum:

Case-control overview (shown in class) from Encyclopedia of Public Health

Breslow NE. Statistics in epidemiology: the case-control study. J Am Stat Assoc. 1996 Mar;91(433):14-28

Carbonated Soft Drink Consumption and Risk of Esophageal Adenocarcinoma JNCI: Journal of the National Cancer Institute, Volume 98, Issue 1, 4 January 2006, Pages 72-75,

Smoking and lung cancer in Chap 18 of HSAUR3 (A Handbook of Statistical Analyses Using R). Also the driving and back pain data in Chap 7 of HSAUR2.

Some R packages and resources: SensitivityCaseControl: Sensitivity Analysis for Case-Control Studies; multipleNCC: Inverse Probability Weighting of Nested Case-Control Data; Two-phase designs in epidemiology (Thomas Lumley); Exact McNemar's Test and Matching Confidence Intervals
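The exact McNemar test for 1:1 matched case-control pairs rests on three facts. Only discordant pairs carry information. The conditional odds ratio is b/c. Under the null, the number of discordant pairs with an exposed case is Binomial(b + c, 1/2). The block below is a minimal Python sketch with hypothetical counts; the function name is illustrative.

```python
from math import comb

def matched_pair_or(b, c):
    """Conditional odds ratio and exact two-sided McNemar p-value for 1:1 matched pairs.

    b: discordant pairs in which only the case is exposed.
    c: discordant pairs in which only the control is exposed.
    Concordant pairs drop out of both the estimate and the test."""
    n = b + c
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    odds_ratio = b / c if c > 0 else float("inf")
    return odds_ratio, p_value

# Hypothetical counts: 8 pairs with only the case exposed, 2 with only the control exposed.
print(matched_pair_or(8, 2))   # prints (4.0, 0.109375)
```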

From epiDisplay v3.5.0.1 by Virasakdi Chongsuvivatwong: Datasets on a matched case-control study of esophageal cancer

See also matched case-control study in the epiDisplay package manual.

Data from a matched case-control study testing whether smoking, drinking alcohol and working in the rubber industry (all dichotomous) are risk factors for oesophageal cancer. Each case was matched with his/her neighbours of the same sex and age group. The matching ratio in

Publication: Chongsuvivatwong, V. 1990 A case-control study of esophageal cancer in Southern Thailand. J Gastro Hep 5:391-394.

Encouragement design (Holland 1988). Instrumental variable methods for causal inference (Baiocchi, Cheng and Small 2014)

Example from rdd manual (Stat209 handout) ascii version

Angrist-Lavy Maimonides (class size) data: sections 1.3, 3.2, 5.2.3, 5.3 of the DOS text

read data

R package rdd: Regression Discontinuity Estimation, author Drew Dimmery

Also Package

R Journal article for rdrobust: rdrobust: An R Package for Robust Nonparametric Inference in Regression-Discontinuity Designs

Stat209, Regression Discontinuity intro handout

William Trochim's Knowledge Base

Trochim W.M. & Cappelleri J.C. (1992). "Cutoff assignment strategies for enhancing randomized clinical trials." Controlled Clinical Trials, 13, 190-212. pubmed link

Journal of Econometrics (special issue) Volume 142, Issue 2, February 2008, The regression discontinuity design: Theory and applications Regression discontinuity designs: A guide to practice, Guido W. Imbens, Thomas Lemieux

Also from Journal of Econometrics (special issue) Volume 142, Issue 2, February 2008, Waiting for Life to Arrive: A history of the regression-discontinuity design in Psychology, Statistics and Economics, Thomas D Cook

the original paper: Thistlethwaite, D., and D. Campbell (1960): "Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment," Journal of Educational Psychology, 51, 309-317.

Capitalizing on Nonrandom Assignment to Treatments: A Regression-Discontinuity Evaluation of a Crime-Control Program Richard A. Berk; David Rauma

Berk, R.A. & de Leeuw, J. (1999). "An evaluation of California's inmate classification system using a generalized regression discontinuity design."

To come: Instrumental Variable Methods: packages

Extra: try out also the

i. Create artificial data with the following specification: 10,000 observations; a premeasure (Y_uc in my session) that is Gaussian with mean 10 and variance 1. The effect of the intervention (rho) for those in the treatment group is 2 (or close to 2) and uncorrelated with Y_uc. The probability of being in the treatment group depends on Y_uc but is not a deterministic step function ("sharp design"):

ii. Try out analysis of covariance with Y_uc as covariate. Obtain a confidence interval for the effect of the treatment.

iii. Try out the fancy econometric estimators (using finite support) as in the rdd package. See whether you find that they work poorly in this very basic fuzzy design example.
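Steps i and ii can be prototyped as below. This is a Python sketch, not the session code. Only the qualitative specification is given above, so two pieces are hypothetical choices: the logistic assignment rule (slope 3 around Y_uc = 10) and the outcome model Y = Y_uc + rho*Z + N(0, 1) noise. ANCOVA is fit via the Frisch-Waugh-Lovell theorem, and the interval uses a normal approximation.

```python
import math
import random
import statistics

random.seed(209)
n, rho = 10_000, 2.0
# Premeasure Y_uc: Gaussian, mean 10, variance 1 (as specified in step i).
y_uc = [random.gauss(10, 1) for _ in range(n)]
# HYPOTHETICAL fuzzy assignment: P(Z = 1) is logistic in (Y_uc - 10), not a step function.
z = [1 if random.random() < 1 / (1 + math.exp(-3 * (x - 10))) else 0 for x in y_uc]
# HYPOTHETICAL outcome model: Y = Y_uc + rho * Z + N(0, 1) noise (constant effect).
y = [x + rho * zi + random.gauss(0, 1) for x, zi in zip(y_uc, z)]

def residualize(v, x):
    """Residuals of v after least-squares regression on x (with intercept)."""
    mx, mv = statistics.mean(x), statistics.mean(v)
    slope = (sum((a - mx) * (b - mv) for a, b in zip(x, v))
             / sum((a - mx) ** 2 for a in x))
    return [b - mv - slope * (a - mx) for a, b in zip(x, v)]

def ancova_effect(y, z, x):
    """Coefficient on z in y ~ 1 + z + x via Frisch-Waugh-Lovell, with a normal-approximation 95% CI."""
    ry, rz = residualize(y, x), residualize(z, x)
    szz = sum(r * r for r in rz)
    beta = sum(a * b for a, b in zip(rz, ry)) / szz
    resid = [a - beta * b for a, b in zip(ry, rz)]      # equal the full-model residuals
    s2 = sum(r * r for r in resid) / (len(y) - 3)       # 3 fitted coefficients
    se = math.sqrt(s2 / szz)
    return beta, (beta - 1.96 * se, beta + 1.96 * se)

est, ci = ancova_effect(y, z, y_uc)
print(f"ANCOVA estimate of rho: {est:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Assignment here depends only on the observed Y_uc, and the assumed outcome is linear in it, so ANCOVA should recover rho. Step iii asks how the local rdd estimators fare on the same kind of data.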

Extra: try out also the

It is Mom's fault. Having 'Cold,' Unsupportive Mother Linked To Premature Aging, Increased Disease Risk Publication: Cold parenting is associated with cellular aging in offspring: A retrospective study Biological Psychology Volume 145, July 2019, Pages 142-149

Instrumental variable methods for causal inference (Baiocchi, Cheng and Small 2014). Near-far matching: package nearfar, vignette "Near-Far Matching in R: The nearfar Package," Journal of Statistical Software (2018).
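For a binary instrument such as encouragement, the simplest IV estimator is the Wald ratio: the instrument's effect on the outcome divided by its effect on treatment received. The block below is a minimal Python sketch with hypothetical data. It implements the textbook estimator, not nearfar or ivmodel output.

```python
import statistics

def wald_iv_estimate(z, d, y):
    """Wald (ratio) IV estimate with a binary instrument z (e.g., encouragement).

    Numerator: effect of z on the outcome y (intention to treat).
    Denominator: effect of z on the treatment actually received d.
    Under the usual IV assumptions (including monotonicity) the ratio estimates
    the average effect among compliers."""
    def mean_at(v, level):
        return statistics.mean([vi for vi, zi in zip(v, z) if zi == level])
    itt_y = mean_at(y, 1) - mean_at(y, 0)
    itt_d = mean_at(d, 1) - mean_at(d, 0)
    return itt_y / itt_d

# Hypothetical encouragement data: z = encouraged, d = took treatment, y = outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
y = [5, 5, 5, 1, 1, 1, 1, 5]
print(wald_iv_estimate(z, d, y))   # prints 4.0
```

A weak instrument makes the denominator small, so the ratio becomes unstable.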

IV handout CC_9 slides Rogosa IV sessions, examples

Additional resources:

2. Observational study: Use the Card data, described in the ivmodel vignette, to carry out some basic IV analyses. Compare