Education 401C  Autumn 2015
    Data analysis examples using R


David Rogosa, Sequoia 224, rag{AT}stat{DOT}stanford{DOT}edu
Course web page: http://rogosateaching.com/ed401/


         For 2014 course materials, go here
New Location: Mon 3:30 PM - 5:20 PM at 50-51P with Rogosa, D.
Anthropology Building 50, first floor
 From the Registrar  
 Data analysis examples using R.      Ed401C Aut 2015 (1 unit)
Description
We will do basic and intermediate level data analysis examples, like those that students will have seen in their courses, in R.
Examples include: descriptive statistics and plots, analysis of variance, correlation and regression, categorical variables, multilevel data. 
See http://rogosateaching.com/ed401/
Terms: Aut | Units: 1 | Grading: Satisfactory/No Credit
Instructors: Rogosa, D. (PI) 

EDUC 401C: Data Analysis Examples Using R	2015-2016 Autumn
    EDUC 401C | 1 unit | Class # 17377 | Section 01 | Grading: Satisfactory/No Credit | WKS
    09/28/2015 - 11/02/2015 Mon 3:30 PM - 5:20 PM at Lathrop 282 with Rogosa, D. (PI) (note: this is the old GSB building)
    Axess Enrollment will open for students on August 1st.
    Instructors: Rogosa, D. (PI)
    Notes: Class meets on Sept 28, Oct 5, Oct 12, Oct 19, Nov 2.


Course Schedule: five 2-hour meetings, Mondays 3:30-5:20, Lathrop 282
Sept 28  1. Descriptive stats; analysis of means (up through anova, factorial designs)
Oct 5    2. Correlation and regression (up through multiple regression, variable selection, etc.)
Oct 12   3. Categorical variables (tables, logistic regression)
Oct 19   4. Overflow and additional regression topics: missing data (mice); generalized linear models for counts; smoothers (loess); multilevel data (descriptives, plots, and intro to mixed-effects models)
Nov 2    5. Student analyses (students present a small analysis of their own)

Getting started (download and install R)
1/7/09.  NY Times endorses R: Data Analysts Captivated by R's Power
Current version of R is version 3.2.2 (Fire Safety) 2015-08-14.
    For references and software: The R Project for Statistical Computing.   Closest download mirror is Berkeley.
Many students employ RStudio to enhance their R-enjoyment. I won't use it, but it serves very well, especially on a single-screen (e.g., portable) machine. "RStudio IDE is a powerful and productive user interface for R. It's free and open source, and works great on Windows, Mac, and Linux."     A short R-intro that includes RStudio (and much more)

Resources
The greatest challenge here is not being overwhelmed by all the options.
0. Reference Cards and other short documents section of the CRAN page
1. When I taught the introductory course Stat141, the text for computing was Using R for Introductory Statistics, J. Verzani, Chapman & Hall, 2005.
     An online version is available from John Verzani's page.   alternate version, single pdf    UsingR R-package
2. In Stat209 a primary resource for R and data analysis is   Data Analysis and Graphics Using R, J. Maindonald and J. Braun, Cambridge, 2nd edition 2007; 3rd edition 2010.    short draft version in CRAN      Text resource page
3. A Handbook of Statistical Analyses Using R (second edition), Brian Everitt and Torsten Hothorn, CRC Press.   Index of book chapters   Stanford access      Chaps 2-7 are relevant to our materials. Data sets etc.: Package 'HSAUR2'
4. From CRAN central: An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics, Version 3.0.1 (2013-05-16), W. N. Venables, D. M. Smith and the R Core Team


WEEK 1 (9/28) Descriptive Statistics, Group Comparisons (including anova, factorial designs)
Core Examples
a. Harvey Goldstein's Exam data, single school excerpt. School14 data (ascii)  data analysis session    Sch14 graphics    data documentation     extra: UsingR function example
b. Andrew Gelman's Sesame Street data (read in a Stata file using the foreign package)     data import and analysis session   write out augmented and subsetted data sets
    manual for package foreign    advanced data import guide: R Data Import/Export
c. One-way anova with Tukey multiple comparisons, Harrington data    data analysis session (oneway anova and mult comp)   
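A minimal sketch of the one-way anova and Tukey workflow, assuming only base R. The Harrington data are not reproduced here, so the built-in PlantGrowth data (dried plant weights for a control group and two treatment groups) serve as a stand-in; aov() and TukeyHSD() are the standard base R functions, not necessarily the exact calls in the session linked above.

```r
# One-way anova with Tukey multiple comparisons (base R only).
# Stand-in data: PlantGrowth, 30 plants in groups ctrl, trt1, trt2.
data(PlantGrowth)

# Descriptive statistics by group
with(PlantGrowth, tapply(weight, group, mean))
with(PlantGrowth, tapply(weight, group, sd))
boxplot(weight ~ group, data = PlantGrowth)

# Fit the one-way model and show the anova table
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Tukey honest significant differences for all pairs of groups
tk <- TukeyHSD(fit, "group", conf.level = 0.95)
tk
plot(tk)  # intervals that exclude 0 mark pairs that differ
```

Each row of the TukeyHSD table is one pairwise difference of group means, with a family-wise 95% interval and an adjusted p-value.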
Extra Items
d. Factorial Designs; Two-way anova (stat141 ex) Soybeans. Soybean data (ascii)   Textbook description   Stat141 analysis    SW text exs in RforBiologists, soybean p.30
e. Functions and Loops in R. Verzani pdf text p.6, std function;   Verzani pdf text p.47, Central Limit Theorem simulation.    Stat141 handout   Verzani text in a single pdf     see also Chaps. 9 and 10 of An Introduction to R (#4 above).
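For the Functions and Loops item, a small sketch under our own assumptions: std() below is a hypothetical hand-written standard deviation (Verzani's version may differ), checked against the built-in sd(), and the for loop simulates the Central Limit Theorem with exponential samples. The seed, sample size, and number of replications are arbitrary choices.

```r
# std(): a hand-written standard deviation (hypothetical helper; it uses
# the n - 1 divisor, so it should agree with R's sd()).
std <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n - 1))
}
std(c(2, 4, 4, 4, 5, 5, 7, 9)); sd(c(2, 4, 4, 4, 5, 5, 7, 9))

# Central Limit Theorem by simulation: means of samples of size n drawn
# from a skewed population (exponential with rate 1, so mean 1 and sd 1).
set.seed(401)            # arbitrary seed, for reproducibility
n    <- 30               # sample size
reps <- 2000             # number of simulated samples
xbar <- numeric(reps)    # preallocate storage for the sample means
for (i in seq_len(reps)) {
  xbar[i] <- mean(rexp(n, rate = 1))
}

mean(xbar)                   # close to the population mean, 1
c(std(xbar), 1 / sqrt(n))    # simulated vs theoretical standard error
hist(xbar, breaks = 30, main = "Means of exponential samples, n = 30")
qqnorm(xbar); qqline(xbar)   # near-straight line: approximately normal
```

The same simulation can be written without an explicit loop as xbar <- replicate(reps, mean(rexp(n))); the loop with a preallocated vector is shown because loops are the topic here.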


Instructor note/musing. This is just a placeholder for some discussion of the best ways to approach this material. I am not convinced that using RStudio from the start is the best path for everyone: for some it seemed to get in the way of our goal, basic use of R's statistical analysis capabilities (in a simple session window). Also, since this week we will be in a real classroom, it's good to think about how you want to use your machine during class time. The best use, it seems to me, is to switch between the web pages, which hold the examples I'm presenting, and an R session where you try out some items after seeing the presented material. Each of you will find your own optimal mix of attention.

WEEK 2 (10/05)   Correlation and Regression
Core Examples
a. Bivariate data
    i. correlation and scatterplots   platelet session      platelet plots     platelet data     extensions, fixes
    extra stat141 example Brain and Body Weights for 62 Species of Land Mammals
    ii. Straight-line regression     single subject Sleepstudy example    R session     plots and handout version
b. Multiple regression and interpretation of coefficients. MT woes of regression coefficients slides   R-session. Coleman data: adjusted-variables multiple regression   data file, 20 schools     Adjusted variable plot
More Coleman:    using pairs command      generating a larger version of Coleman data and more plots
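The adjusted-variable idea behind item b can be checked numerically. This sketch uses the built-in mtcars data as a stand-in for the Coleman data (the Coleman file is not reproduced here). It illustrates the Frisch-Waugh result: the multiple regression coefficient on one predictor equals the slope from regressing the outcome, adjusted for the other predictors, on that predictor, adjusted for the same predictors.

```r
# Adjusted-variables view of a multiple regression coefficient.
# Stand-in data: mtcars (mpg, weight wt, horsepower hp).
data(mtcars)

full <- lm(mpg ~ wt + hp, data = mtcars)
coef(full)["wt"]

# Residualize the outcome and wt on the other predictor (hp)
ry <- resid(lm(mpg ~ hp, data = mtcars))
rx <- resid(lm(wt  ~ hp, data = mtcars))

# Residuals-on-residuals slope; no intercept is needed because both
# residual vectors have mean zero
adj <- lm(ry ~ rx - 1)
coef(adj)

# Adjusted variable plot: its slope is the multiple regression coefficient
plot(rx, ry, xlab = "wt adjusted for hp", ylab = "mpg adjusted for hp")
abline(adj)
```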
Extra Items
c. Instrumental variables regression.    Estimating the Return to Education for Married Women (Wooldridge text Ex 15.1).
     Mroz87 data      Mroz87 data description      IV data analysis session    Wooldridge Stata ivreg
10/15 Background exposition for IV and returns to schooling:  Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Joshua D. Angrist; Alan B. Krueger, The Journal of Economic Perspectives Vol. 15, No. 4 (Autumn, 2001), pp. 69-85
d. Missing data and multiple imputation methods.
i. Single variable. Stef van Buuren example
ii. Traditional bivariate and multivariate data methods: correlation and regression example
          ssdat missing data example
iii. Multiple Imputation.
nhanes data in package mice     R-session using mice package
  Background materials, Multiple Imputation in R. van Buuren S and Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. See also multiple imputation online.    Flexible Imputation of Missing Data. Stef van Buuren, Chapman and Hall/CRC 2012. Book contents online        book extras    He is the originator of mice.   R resources: Multivariate Analysis Task View, Missing data section, esp. package mice
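To see what the traditional methods do before turning to multiple imputation, here is a sketch using the built-in airquality data (a stand-in, chosen only because it has missing values). It shows listwise deletion inside lm(), pairwise deletion in cor(), and why single mean imputation understates variability. The mice step at the end follows the nhanes session only in outline and runs only if the mice package is installed; the model chl ~ age + bmi is an illustrative choice, not necessarily the one in the linked session.

```r
# Missing data: what the default "traditional" methods do.
# Stand-in data: airquality, with NAs in Ozone and Solar.R.
data(airquality)
colSums(is.na(airquality))           # missing values per variable

# Listwise deletion: lm() drops any row with an NA in a model variable
fit_cc <- lm(Ozone ~ Solar.R + Wind, data = airquality)
nobs(fit_cc)                          # rows actually used
sum(complete.cases(airquality[, c("Ozone", "Solar.R", "Wind")]))

# Pairwise deletion: each correlation uses its own complete pairs
cor(airquality[, 1:4], use = "pairwise.complete.obs")

# Mean imputation fills the gaps but shrinks the standard deviation
oz_imp <- airquality$Ozone
oz_imp[is.na(oz_imp)] <- mean(oz_imp, na.rm = TRUE)
c(observed = sd(airquality$Ozone, na.rm = TRUE), mean_imputed = sd(oz_imp))

# Multiple imputation with mice on nhanes (illustrative model; skipped
# if mice is not installed)
if (requireNamespace("mice", quietly = TRUE)) {
  imp  <- mice::mice(mice::nhanes, m = 5, seed = 1, printFlag = FALSE)
  fits <- with(imp, lm(chl ~ age + bmi))  # one fit per completed data set
  print(summary(mice::pool(fits)))        # combined with Rubin's rules
}
```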


WEEK 3 (10/12) Categorical Data, Generalized Linear Models    Halloween edition
Note: It would be good to take a few minutes at the beginning of class to take stock of our progress and its efficacy, now that we've had two of our four presentation sessions.
Core Examples
a. Proportions (1x2) and (1xK) tables.    proportions R session    For Titanic data, music Pete Seeger - The Titanic
b. 2x2 and rxc tables; independence and odds ratios.   R session, Titanic and nightmares
c. 2x2x2 tables; Simpson's paradox.   Death penalty example (Agresti)
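A short sketch of items b and c using only built-in tables. odds_ratio() is a hypothetical helper written here for a 2x2 table, and UCBAdmissions (Berkeley admissions by gender and department) stands in for the Agresti death penalty data, which are not reproduced here. Collapsed over departments, the odds ratio favors male applicants, while several department-specific odds ratios, including department A's, fall below 1: the Simpson's paradox pattern.

```r
# Odds ratio for a 2x2 table: (a * d) / (b * c)  (hypothetical helper)
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Titanic is a 4-way table: Class x Sex x Age x Survived
sex_surv <- margin.table(Titanic, c(2, 4))   # collapse to Sex x Survived
sex_surv
prop.table(sex_surv, 1)                      # survival proportions by sex
odds_ratio(sex_surv)                         # odds of death, male vs female
chisq.test(sex_surv)                         # test of independence

# Simpson's paradox: UCBAdmissions is Admit x Gender x Dept
marg <- margin.table(UCBAdmissions, c(1, 2)) # collapsed over departments
odds_ratio(marg)                             # admission odds, male vs female
apply(UCBAdmissions, 3, odds_ratio)          # one odds ratio per department
mantelhaen.test(UCBAdmissions)               # common odds ratio given Dept
```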
Extra Items
d. Dichotomous outcomes, logistic regression (glm logit link)       Donner party data         Donner analysis handout        Donner Rsession
e. Counts; more generalized linear models (log link)       AIDS in Belgium R-session        Source:   AIDS in Belgium example (from Simon Wood): single trajectory, count data using glm.
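Both extra items use the same glm() call with a different family. In this sketch the built-in Titanic table (one row per cell, with cell counts as weights) stands in for the Donner party data, and the built-in InsectSprays counts stand in for the AIDS in Belgium series; neither substitution is the analysis in the linked sessions.

```r
# Logistic regression (binomial family, logit link) on grouped data.
# Stand-in data: Titanic, one row per Class x Sex x Age x Survived cell.
tit <- as.data.frame(Titanic)                # Freq = count in each cell
logit_fit <- glm(Survived ~ Class + Sex + Age, family = binomial,
                 data = tit, weights = Freq) # models P(Survived = "Yes")
summary(logit_fit)
exp(coef(logit_fit))                         # coefficients as odds ratios

# Poisson regression (log link) for counts.
# Stand-in data: InsectSprays, insect counts under six sprays.
pois_fit <- glm(count ~ spray, family = poisson, data = InsectSprays)
summary(pois_fit)
exp(coef(pois_fit))                          # rate ratios relative to spray A
```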


Addendum on Publishing. Last year, in response to a student question, I mentioned Sweave and alternatives. Students use these, even for problem sets, but they do have a learning curve, especially if you are unfamiliar with TeX etc.
So I gathered together some quick resources, especially for use within RStudio, where using Sweave is facilitated.
RStudio help.    Using Sweave and knitr    also      Using Sweave and LaTeX with R 3.0.2    RStudio support queries:  1     2
Some additional intro docs. San Diego State   UW    Montana    Wharton,UPenn    Germany   Minnesota
Also the latex command from the Hmisc package

Addendum on scripts. Introduction to the R Project for Statistical Computing for use at ITC  Appendix B ;    A (very) short introduction to R  scripts section;     Kickstarting R - Writing R scripts
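A script is simply a text file of R commands. This sketch writes a toy script to a temporary file (in practice it would be a file you edit yourself, here given the hypothetical name myanalysis.R) and runs it with source(); objects the script creates are then available in the session.

```r
# Write a toy script to a temporary file (stand-in for a file such as the
# hypothetical myanalysis.R that you would edit and save yourself)
script <- tempfile(fileext = ".R")
writeLines(c(
  "# myanalysis.R: toy example script",
  "fit <- lm(weight ~ group, data = PlantGrowth)",
  "print(anova(fit))"
), script)

# Run every command in the file; echo = TRUE prints each one as it runs
source(script, echo = TRUE)

# Objects created by the script are now in the workspace
coef(fit)
```

Outside R, the same file can be run from a command shell with Rscript myanalysis.R.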


WEEK 4 (10/19)   Overflow and Extensions
a. One-way anova with Tukey multiple comparisons, Harrington data    data analysis session (oneway anova and mult comp)   
b. Factorial Designs; Two-way anova (stat141 ex) Soybeans. Soybean data (ascii)   Textbook description   Stat141 analysis    SW text exs in RforBiologists, soybean p.30
c. Missing data and multiple imputation methods.
Multiple Imputation example.   nhanes data in package mice     R-session using mice package
  Background materials, Multiple Imputation in R. van Buuren S and Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. See also multiple imputation online.    Flexible Imputation of Missing Data. Stef van Buuren, Chapman and Hall/CRC 2012. Book contents online        book extras    He is the originator of mice.   R resources: Multivariate Analysis Task View, Missing data section, esp. package mice
d. Functions and Loops in R. Verzani pdf text p.6, std function;   Verzani pdf text p.47, Central Limit Theorem simulation.    Stat141 handout   Verzani text in a single pdf     see also Chap. 9,10 of An Introduction to R (#4 resource).
e.   Introduction to multilevel data; see Stat196A/Ed401D spring qtr.
High School and Beyond data.       complete Bryk dataset        Data construction from files in the MEMSS    
First pass, Bryk data:   session    plots    Additional plots for Multilevel data.   R session    xyplots
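As a first pass at multilevel data, this sketch assumes the MathAchieve data frame in the nlme package (a recommended package distributed with R) as a convenient version of the High School and Beyond data; it may not match the Bryk files linked above in every detail. It computes school-level descriptives and fits an unconditional random-intercept model with lme(), from which the intraclass correlation (between-school variance over total variance) follows.

```r
# Students nested in schools: assumed data, nlme's MathAchieve
library(nlme)
data(MathAchieve)

# Level-2 descriptives: students per school and school mean achievement
n_j    <- table(MathAchieve$School)
mean_j <- tapply(MathAchieve$MathAch, MathAchieve$School, mean)
length(n_j)                 # number of schools
summary(as.vector(n_j))     # school sizes
summary(mean_j)             # spread of school means

# Unconditional random-intercept model: MathAch_ij = g00 + u0j + r_ij
m0 <- lme(MathAch ~ 1, random = ~ 1 | School, data = MathAchieve)
vc <- as.numeric(VarCorr(m0)[, "Variance"])  # (between school, residual)
icc <- vc[1] / sum(vc)      # share of variance lying between schools
icc
```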
Extra Item
Publication and Display continued (student question).
Resources:  Package xtable manual     xtable gallery     2015 useR talk
also,  Package prettyR



WEEK 5 (11/2)   Student Data Analysis Presentations