Computing for Research I
Spring 2012
Exploratory Data Analysis and Hypothesis Testing
February 21
Primary Instructor: Elizabeth Garrett-Mayer
Exploratory Data Analysis
We’ve already discussed some basic stuff
– sum and sum, detail
– tab
What other sorts of exploration might we do?
Confidence intervals
– for continuous variables
– for categorical variables
Immediate command for CIs
Continuous:
    cii N xbar s
Binary:
    cii N phat
  or
    cii N x
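A minimal sketch of the immediate form, using made-up summary numbers rather than course data. The `version 12:` prefix is an assumption added here so that the syntax shown on the slide also runs on newer Stata releases, where the same commands are spelled `cii means` and `cii proportions`.

```stata
* Hypothetical summary statistics, not taken from the course data

* Continuous: n = 25, sample mean = 10.2, sample SD = 3.1
version 12: cii 25 10.2 3.1

* Binary: n = 40 trials, x = 12 successes
version 12: cii 40 12

* Equivalent calls in Stata 14 or later:
* cii means 25 10.2 3.1
* cii proportions 40 12
```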
Confidence intervals
For a continuous variable:
    mean varlist
Example:
    * estimate means of ceramide variables
    mean c18ceramide
    mean totalc - s1pc1
Additional options
    tab initialre initial
    mean c18ceramide, over(initialre)
    mean c18ceramide, vce(bootstr)
    mean c18ceramide, vce(bootstr) over(initialre)
    mean c18ceramide, level(90)
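The options above can be tried without the ceramide data. The sketch below substitutes `auto.dta`, a dataset that ships with Stata; the variables `price`, `mpg`, and `foreign` belong to that stand-in dataset, not to the course files, and the seed value is arbitrary.

```stata
* Stand-in data: auto.dta ships with Stata (not the ceramide data)
sysuse auto, clear

* Mean, standard error, and 95% CI for several variables at once
mean price mpg

* Separate estimates for each level of a grouping variable
mean mpg, over(foreign)

* 90% CI instead of the default 95%
mean mpg, level(90)

* Bootstrap standard errors; set a seed so the result is reproducible
set seed 12345
mean mpg, vce(bootstrap)
```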
Confidence intervals for proportion
    proportion varlist
Examples
    proportion failure
    proportion failure death initialre
    proportion failure, vce(bootstr)
    proportion failure, cluster(patient)
    proportion failure, level(90)
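A short sketch of `proportion` on stand-in data, again assuming `auto.dta` (shipped with Stata) in place of the course dataset. Here `foreign` is a 0/1 variable and `rep78` is a five-level categorical variable with a few missing values.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

* Proportion of cars in each category of foreign, with 95% CIs
proportion foreign

* Several categorical variables in one call
proportion foreign rep78

* 90% CIs instead of the default 95%
proportion foreign, level(90)
```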
Hypothesis Testing
A number of different approaches
Options
– nonparametric vs. parametric
– continuous vs. categorical (vs. other?)
– one vs. two vs. more than two groups
One sample t-tests
    ttesti N mean sd null
    ttest varname == null
    ttest var1 == var2 *paired
Examples:
    ttesti
    ttest c18c == 10
    ttest frombaselines1p==100
    ttest frombaselinec18==100
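The three forms above, sketched on data that ships with Stata. The `ttesti` numbers are hypothetical. `auto.dta` and `bpwide.dta` (with its variables `bp_before` and `bp_after`) stand in for the course data.

```stata
* Immediate form with hypothetical numbers:
* n = 24, sample mean = 62.6, sample SD = 15.8, null mean = 75
ttesti 24 62.6 15.8 75

* One-sample test of a variable against a null value
sysuse auto, clear
ttest mpg == 20

* Paired test: two measurements on the same patients
sysuse bpwide, clear
ttest bp_before == bp_after
```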
Two sample t-tests
    ttesti N1 mean1 sd1 N2 mean2 sd2
    ttest varname1 == varname2, unpaired
    ttest varname, by(groupvar)
Examples:
    ttest c18, by(sex)
    ttest c18, by(sex) unequal
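A sketch of the two-group forms, assuming `auto.dta` as a stand-in with `foreign` playing the role of the grouping variable. The `ttesti` inputs are hypothetical.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

* Two-sample t-test assuming equal variances in the two groups
ttest mpg, by(foreign)

* Same comparison without the equal-variance assumption
ttest mpg, by(foreign) unequal

* Immediate form with hypothetical numbers: n1 mean1 sd1 n2 mean2 sd2
ttesti 20 20 5 32 15 4
```

Comparing the output of the second and third commands shows how much the `unequal` option changes the degrees of freedom and the p-value.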
Nonparametric?
    ranksum  : two group comparison
    kwallis  : >= two group comparison
    signrank : matched pairs signed ranks test
    signtest : sign test of matched pairs
Nonparametric?
    *nonparametric tests
    ranksum c18, by(sex)
    kwallis c18, by(sex)
    use ceramide.alldata, clear
    keep if cycle==3
    gen c18dif = frombaselinec
    signrank c18dif=0
    signrank frombaselinec18=100
    signtest c18dif=0
    signtest frombaselinec18=100
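The same four commands, sketched on data that ships with Stata rather than `ceramide.alldata`. The difference variable `bp_diff` is a name introduced here for illustration.

```stata
* Two independent groups: Wilcoxon rank-sum (Mann-Whitney) test
sysuse auto, clear
ranksum mpg, by(foreign)

* Two or more groups: Kruskal-Wallis test across repair-record categories
kwallis mpg, by(rep78)

* Matched pairs: before/after blood pressure on the same patients
sysuse bpwide, clear
gen bp_diff = bp_after - bp_before

* Wilcoxon signed-rank test that the differences are centered at 0
signrank bp_diff = 0

* Sign test that the median difference is 0
signtest bp_diff = 0
```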
Anova
    anova y x
(note that x is assumed to be categorical)
    anova y x1 x2
Examples:
    anova c18c initialre
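A sketch on stand-in data (`auto.dta`, shipped with Stata). The last line assumes Stata 11 or later, where the `c.` prefix marks a predictor as continuous instead of the categorical default noted above.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

* One-way ANOVA: mpg across the categories of rep78
anova mpg rep78

* Two categorical factors (main effects only)
anova mpg rep78 foreign

* Treat weight as continuous rather than categorical
anova mpg rep78 c.weight
```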
One sample binomial tests
prtest and bitest
Difference?
– prtest uses large sample approximations
– bitest uses exact test
    bitest varname==p0
    bitesti N x p0
One sample binomial tests
    use "SCBC2004.v9.dta", clear
    replace ercat=. if ercat==9
    gen ercatn=cond(ercat==2,0,1)
    replace ercatn=. if ercat==.
    tab ercat ercatn
    bitest ercatn=0.50
    bitest ercatn=0.65
    prtest ercatn=0.65
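For readers without the SCBC file, the exact and large-sample tests can be compared on `auto.dta`, which ships with Stata. Its 0/1 variable `foreign` stands in for `ercatn`, and the null value 0.5 is arbitrary.

```stata
* Stand-in data: auto.dta has 74 cars, 22 of them foreign
sysuse auto, clear

* Exact binomial test of H0: P(foreign = 1) = 0.5
bitest foreign == 0.5

* Large-sample (normal approximation) test of the same hypothesis
prtest foreign == 0.5

* Immediate form of the exact test: N = 74, x = 22 successes, p0 = 0.5
bitesti 74 22 0.5
```

The p-values from `bitest` and `prtest` should be close at this sample size. They can diverge when N is small or p0 is near 0 or 1.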
Two (or more) sample binomial tests
    tab y x, exact
    tab y x, chi
    tab ercatn grade
    tab ercatn stage
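A sketch of both options on stand-in data, plus the immediate form `tabi` for when only the cell counts are available. The counts in the `tabi` line are hypothetical.

```stata
* Stand-in data: auto.dta ships with Stata
sysuse auto, clear

* Fisher's exact test for a binary outcome across several groups
tab foreign rep78, exact

* Pearson chi-squared test for the same table
tab foreign rep78, chi2

* Immediate form: a 2x2 table entered row by row (hypothetical counts)
tabi 10 5 \ 3 12, exact chi2
```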