> An Introduction to R
[1] Tyler K. Perrachione
[4] Gabrieli Lab, MIT
[7] 28 June 2010

2 What is R?
– A suite of operators for calculations on arrays, in particular matrices
– A large, coherent, integrated collection of intermediate tools for data analysis
– Graphical facilities for data analysis and display, either on-screen or on hardcopy
– A well-developed, simple, and effective programming language, which includes conditionals, loops, user-defined recursive functions, and input and output facilities
– Free (as in beer and speech), open-source software
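The language features in that last point are easiest to see in code. A minimal sketch (not from the original slides) showing a conditional, a loop, and a user-defined recursive function:
> # A conditional expression
> x <- 4
> if (x %% 2 == 0) "even" else "odd"
[1] "even"
> # A for loop
> for (i in 1:3) print(i^2)
[1] 1
[1] 4
[1] 9
> # A user-defined recursive function (factorial)
> fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)
> fact(5)
[1] 120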

3 Installing, Running, and Interacting with R
How to get R:
– http://www.r-project.org/ (or just Google: "R")
– Available for Windows, Linux, and Mac OS X, or build from source
– On mindhive: R [terminal only] or R -g Tk & [application window]
Files for this tutorial:
– R_Tutorial_Inputs.txt
– R_Tutorial_Data.txt

4 Installing, Running, and Interacting with R

5 All examples are in file "R_Tutorial_Inputs.txt"
– Entering data: math, variables, arrays, math on arrays, functions
– Getting help
– Reading data from files
– Selecting subsets of data

6 Installing, Running, and Interacting with R
Math:
> 1 + 1
[1] 2
> 1 + 1 * 7
[1] 8
> (1 + 1) * 7
[1] 14
Variables:
> x <- 1
> x
[1] 1
> y = 2
> y
[1] 2
> 3 -> z
> z
[1] 3
> (x + y) * z
[1] 9

7 Installing, Running, and Interacting with R
Arrays:
> x <- c(0,1,2,3,4)
> x
[1] 0 1 2 3 4
> y <- 1:5
> y
[1] 1 2 3 4 5
> z <- 1:50
> z
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
[16] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
[46] 46 47 48 49 50

8 Installing, Running, and Interacting with R
Math on arrays:
> x <- c(0,1,2,3,4)
> y <- 1:5
> z <- 1:50
> x + y
[1] 1 3 5 7 9
> x * y
[1]  0  2  6 12 20
> x * z
 [1]   0   2   6  12  20   0   7  16  27  40   0
[12]  12  26  42  60   0  17  36  57  80   0  22
[23]  46  72 100   0  27  56  87 120   0  32  66
[34] 102 140   0  37  76 117 160   0  42  86 132
[45] 180   0  47  96 147 200
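A detail the slide leaves implicit: x has length 5 and z has length 50, so R silently "recycles" the shorter vector to match the longer one. A quick check (not from the original slides) makes the rule explicit:
> # x * z reuses x cyclically, as if x were repeated to length 50
> all(x * z == rep(x, length.out = length(z)) * z)
[1] TRUE
Recycling only proceeds without a warning when the longer length is a multiple of the shorter one, as it is here.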

9 Installing, Running, and Interacting with R
Functions:
> arc <- function(x) 2*asin(sqrt(x))
> arc(0.5)
[1] 1.570796
> x <- c(0,1,2,3,4)
> x <- x / 10
> arc(x)
[1] 0.0000000 0.6435011 0.9272952
[4] 1.1592795 1.3694384
> Percents <- x
> plot(arc(Percents)~Percents,
+ pch=21,cex=2,xlim=c(0,1),ylim=c(0,pi),
+ main="The Arcsine Transformation")
> lines(c(0,1),c(0,pi),col="red",lwd=2)

10 Installing, Running, and Interacting with R
Getting help:
> help(t.test)
> help.search("standard deviation")
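Two equivalent shorthands, not shown on the slide, are worth knowing:
> ?t.test                  # same as help(t.test)
> ??"standard deviation"   # same as help.search("standard deviation")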

11 Installing, Running, and Interacting with R
Example experiment: subjects learning to perform a new task
– Two groups of subjects ("A" and "B"; high and low aptitude learners)
– Two types of training paradigm ("High variability" and "Low variability")
– Four pre-training assessment tests
Example data in "R_Tutorial_Data.txt"
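If you don't have R_Tutorial_Data.txt, a hypothetical stand-in with the same column layout lets you follow along; the column names come from the slides, but every value below is invented:
> # Invented data in the shape of the tutorial file -- not the real dataset
> set.seed(1)
> n <- 8
> myData <- data.frame(
+ Condition = rep(c("Low","High"), each = n/2),
+ Group     = rep(c("A","B"), times = n/2),
+ Gender    = sample(c("F","M"), n, replace = TRUE),
+ Pre1 = runif(n), Pre2 = runif(n), Pre3 = runif(n), Pre4 = runif(n),
+ Learning  = runif(n))
Running the read.table() call on the next slide will simply overwrite this toy version with the real file.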

12 Installing, Running, and Interacting with R
Reading data from files:
> myData <- read.table("R_Tutorial_Data.txt",
+ header=TRUE, sep="\t")
> myData
   Condition Group Pre1 Pre2 Pre3 Pre4 Learning
1        Low     A    …    …    …    …        …
2        Low     A    …    …    …    …        …
3        Low     A    …    …    …    …        …
⋮
        High     B    …    …    …    …        …
        High     B    …    …    …    …        …
        High     B    …    …    …    …        …

13 Installing, Running, and Interacting with R
Examining datasets:
> plot(myData)

14 Installing, Running, and Interacting with R
Selecting subsets of data:
> myData$Learning
 [1] …
> myData$Learning[myData$Group=="A"]
 [1] …

15 Installing, Running, and Interacting with R
Selecting subsets of data:
> myData$Learning
 [1] …
> attach(myData)
> Learning
 [1] …
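A caution the slide doesn't mention: attach() puts myData's column names on the search path, where they can mask other variables with the same names. Detach when finished, or use with() for a scoped alternative (neither line is from the original slides):
> detach(myData)                 # remove myData's columns from the search path
> with(myData, mean(Learning))   # evaluate an expression inside myData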

16 Installing, Running, and Interacting with R
Selecting subsets of data:
> Learning[Group=="A"]
 [1] …
> Learning[Group!="A"]
 [1] …
> Condition[Group=="B"&Learning<0.5]
 [1] Low  Low  High High High High High High High
[10] High High High High High
Levels: High Low
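The same selection can also be written with subset(), which some find more readable; a sketch (not from the slides):
> # Returns a one-column data frame rather than a bare vector
> subset(myData, Group == "B" & Learning < 0.5, select = Condition)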

17 Statistics and Data Analysis
Parametric tests
– Independent-sample t-tests
– Paired-sample t-tests
– One-sample t-tests
– Correlation
Nonparametric tests
– Shapiro-Wilk test for normality
– Wilcoxon rank-sum test (Mann-Whitney U)
– Chi-square test
Linear models and ANOVA

18 Basic parametric inferential statistics
Independent sample t-tests:
> t.test(Pre2[Group=="A"],
+ Pre2[Group=="B"],
+ paired=FALSE)

        Welch Two Sample t-test

data:  Pre2[Group == "A"] and Pre2[Group == "B"]
t = …, df = …, p-value = …
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
mean of x mean of y
        …         …

19 Basic parametric inferential statistics
Independent sample t-tests:
> t.test(Pre2[Group=="A"],
+ Pre2[Group=="B"],
+ paired=FALSE,
+ var.equal=TRUE)

        Two Sample t-test

data:  Pre2[Group == "A"] and Pre2[Group == "B"]
t = 1.601, df = 61, p-value = …
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
mean of x mean of y
        …         …

20 Basic parametric inferential statistics
Independent sample t-tests:
> t.test(Pre2[Group=="A"],
+ Pre2[Group=="B"],
+ paired=FALSE,
+ var.equal=TRUE,
+ alternative="greater")

        Two Sample t-test

data:  Pre2[Group == "A"] and Pre2[Group == "B"]
t = 1.601, df = 61, p-value = …
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 …  Inf
sample estimates:
mean of x mean of y
        …         …

21 Basic parametric inferential statistics
Paired sample t-test:
> t.test(Pre4[Group=="A"],
+ Pre3[Group=="A"],
+ paired=TRUE)

        Paired t-test

data:  Pre4[Group == "A"] and Pre3[Group == "A"]
t = …, df = 30, p-value = …
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
mean of the differences
                      …

> boxplot(Pre4[Group=="A"],
+ Pre3[Group=="A"],
+ col=c("#ffdddd","#ddddff"),
+ names=c("Pre4","Pre3"),main="Group A")

22 Basic parametric inferential statistics
One sample t-test:
> t.test(Learning[Group=="B"], mu=0.5, alternative="greater")

        One Sample t-test

data:  Learning[Group == "B"]
t = …, df = 31, p-value = …
alternative hypothesis: true mean is greater than 0.5
95 percent confidence interval:
 …  Inf
sample estimates:
mean of x
        …

> boxplot(Learning[Group=="B"],
+ names="Group B", ylab="Learning")
> lines(c(0,2), c(0.5, 0.5), col="red")
> points(c(rep(1,length(Learning[Group=="B"]))),
+ Learning[Group=="B"], pch=21, col="blue")

23 Basic parametric inferential statistics
Correlation:
> cor.test(Pre1,Learning,method="pearson")

        Pearson's product-moment correlation

data:  Pre1 and Learning
t = …, df = 61, p-value = 3.275e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
cor
  …

> plot(Pre1,Learning)
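If the variables are not plausibly bivariate normal, a rank-based coefficient is the usual fallback; this one-liner (not on the slide) computes Spearman's rho with the same interface:
> cor.test(Pre1, Learning, method="spearman")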

24 Basic parametric inferential statistics
Correlation (fancier plot example):
> cor.test(Pre1,Learning,method="pearson")

        Pearson's product-moment correlation

data:  Pre1 and Learning
t = …, df = 61, p-value = 3.275e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
cor
  …

> plot(Learning~Pre1, ylim=c(0,1), xlim=c(0,1), ylab="Learning", xlab="Pre1", type="n")
> abline(lm(Learning~Pre1),col="black",lty=2, lwd=2)
> points(Learning[Group=="A"&Condition=="High"]~Pre1[Group=="A"&Condition=="High"],
+ pch=65, col="red", cex=0.9)
> points(Learning[Group=="A"&Condition=="Low"]~Pre1[Group=="A"&Condition=="Low"],
+ pch=65, col="blue", cex=0.9)
> points(Learning[Group=="B"&Condition=="High"]~Pre1[Group=="B"&Condition=="High"],
+ pch=66, col="red", cex=0.9)
> points(Learning[Group=="B"&Condition=="Low"]~Pre1[Group=="B"&Condition=="Low"],
+ pch=66, col="blue", cex=0.9)
> legend(2.5,1.0, c("LV Training", "HV Training"), pch=c(19), col=c("blue","red"), bty="y")
> myCor <- cor.test(Pre1, Learning, method="pearson")
> text(0.3,0.8, paste("r = ", format(myCor$estimate,digits=3), ", p < ", format(myCor$p.value,digits=3)), cex=0.8)

25 Statistics and Data Analysis
Are my data normally distributed?
> t.test(Learning[Condition=="High"&Group=="A"],
+ Learning[Condition=="Low"&Group=="A"])

        Welch Two Sample t-test

data:  Learning[Condition == "High" & Group == "A"] and Learning[Condition == "Low" & Group == "A"]
t = 1.457, df = …, p-value = …
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 … …
sample estimates:
mean of x mean of y
        …         …

26 Statistics and Data Analysis
Are my data normally distributed?
> plot(dnorm,-3,3,col="blue",lwd=3,main="The Normal Distribution")
> par(mfrow=c(1,2))
> hist(Learning[Condition=="High"&Group=="A"])
> hist(Learning[Condition=="Low"&Group=="A"])

27 Statistics and Data Analysis
Are my data normally distributed?
> shapiro.test(Learning[Condition=="High"&Group=="A"])

        Shapiro-Wilk normality test

data:  Learning[Condition == "High" & Group == "A"]
W = …, p-value = …

> shapiro.test(Learning[Condition=="Low"&Group=="A"])

        Shapiro-Wilk normality test

data:  Learning[Condition == "Low" & Group == "A"]
W = …, p-value = …
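A quantile-quantile plot (not shown on the slide) is a useful visual companion to the Shapiro-Wilk test; points far from the line indicate departures from normality:
> qqnorm(Learning[Condition=="High"&Group=="A"])
> qqline(Learning[Condition=="High"&Group=="A"], col="red")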

28 Basic nonparametric inferential statistics
Wilcoxon rank-sum (Mann-Whitney U) test:
> wilcox.test(Learning[Condition=="High"&Group=="A"],
+ Learning[Condition=="Low"&Group=="A"],
+ exact=FALSE,
+ paired=FALSE)

        Wilcoxon rank sum test with continuity correction

data:  Learning[Condition == "High" & Group == "A"] and Learning[Condition == "Low" & Group == "A"]
W = 173.5, p-value = …
alternative hypothesis: true location shift is not equal to 0

29 Basic nonparametric inferential statistics
Chi square test:
> x <- matrix(c(
+ length(Learning[Group=="A"&Condition=="High"&Gender=="F"]),
+ length(Learning[Group=="A"&Condition=="Low"&Gender=="F"]),
+ length(Learning[Group=="B"&Condition=="High"&Gender=="F"]),
+ length(Learning[Group=="B"&Condition=="Low"&Gender=="F"])),
+ ncol=2)
> x
     [,1] [,2]
[1,]    4   12
[2,]   10    7

> chisq.test(x)

        Pearson's Chi-squared test with Yates' continuity correction

data:  x
X-squared = …, df = 1, p-value = …
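With a cell count as small as 4, the chi-square approximation can be unreliable; Fisher's exact test (not on the slide) is a common alternative for small 2x2 tables:
> fisher.test(x)   # exact test of independence on the same matrix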

30 Linear models and ANOVA
Linear models:
> myModel <- lm(Learning ~ Pre1 + Pre2 + Pre3 + Pre4)
> par(mfrow=c(2,2))
> plot(myModel)

31 Linear models and ANOVA
Linear models:
> summary(myModel)

Call:
lm(formula = Learning ~ Pre1 + Pre2 + Pre3 + Pre4)

Residuals:
    Min      1Q  Median      3Q     Max
      …       …       …       …       …

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …        …
Pre1               …          …       …   …e-11 ***
Pre2               …          …       …        … ***
Pre3               …          …       …        …
Pre4               …          …       …        …
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: … on 58 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: … on 4 and 58 DF,  p-value: 2.710e-13

32 Linear models and ANOVA
Linear models:
> step(myModel, direction="backward")
Start:  AIC=…
Learning ~ Pre1 + Pre2 + Pre3 + Pre4

        Df Sum of Sq RSS AIC
- Pre3   …         …   …   …
- Pre4   …         …   …   …
<none>             …   …
- Pre2   …         …   …   …
- Pre1   …         …   …   …

Step:  AIC=…
Learning ~ Pre1 + Pre2 + Pre4

        Df Sum of Sq RSS AIC
- Pre4   …         …   …   …
<none>             …   …
- Pre2   …         …   …   …
- Pre1   …         …   …   …

Step:  AIC=…
Learning ~ Pre1 + Pre2

        Df Sum of Sq RSS AIC
<none>             …   …
- Pre2   …         …   …   …
- Pre1   …         …   …   …

Call:
lm(formula = Learning ~ Pre1 + Pre2)

Coefficients:
(Intercept)         Pre1         Pre2
          …            …            …
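To confirm that dropping Pre3 and Pre4 costs little, you can also compare the reduced and full models directly with an F test; a sketch (not from the slides; myModel2 is a name introduced here):
> myModel2 <- lm(Learning ~ Pre1 + Pre2)
> anova(myModel2, myModel)   # F test of the jointly dropped terms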

33 Linear models and ANOVA
ANOVA:
> myANOVA <- aov(Learning~Group*Condition)
> summary(myANOVA)
                Df Sum Sq Mean Sq F value Pr(>F)
Group            1      …       …       …  …e-13 ***
Condition        1      …       …       …      …  *
Group:Condition  1      …       …       …      …  ***
Residuals       59      …       …
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> boxplot(Learning~Group*Condition,col=c("#ffdddd","#ddddff"))

34 Linear models and ANOVA
ANOVA:
> myANOVA2 <- aov(Learning~Group*Condition+Gender)
> summary(myANOVA2)
                Df Sum Sq Mean Sq F value Pr(>F)
Group            1      …       …       …  …e-12 ***
Condition        1      …       …       …      …  *
Gender           1      …       …       …      …
Group:Condition  1      …       …       …      …  **
Residuals       58      …       …
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> boxplot(Learning~Group*Condition+Gender,
+ col=c(rep("pink",4),rep("light blue",4)))
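Given the significant Group:Condition interaction, a natural follow-up (not in the slides) is Tukey's HSD for pairwise comparisons among the interaction cells:
> TukeyHSD(myANOVA, "Group:Condition")   # adjusted pairwise differences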

35 How to find R help and resources on the internet
– R wiki:
– R graph gallery:
– Kickstarting R:
– ISBN: