LISA Short Course Series: R Statistical Analysis
Ning Wang, Summer 2013

Associate Collaborator for LISA Department of Statistics, VT


Laboratory for Interdisciplinary Statistical Analysis (LISA)
LISA helps VT researchers benefit from the use of statistics.
Collaboration: Visit our website to request personalized statistical advice and assistance with:
- Experimental design
- Data analysis
- Interpreting results
- Grant proposals
- Software (R, SAS, JMP, SPSS...)
LISA statistical collaborators aim to explain concepts in ways useful for your research. Great advice right now: meet with LISA before collecting your data. All services are FREE for VT researchers. We assist with research, not class projects or homework.
LISA also offers:
- Educational Short Courses: designed to help graduate students apply statistics in their research
- Walk-In Consulting: M-F 1-3 PM, GLC Video Conference Room, for questions requiring <30 mins

Outline
1. Review on plots
2. T-test
   2.1 One sample t-test
   2.2 Two sample t-test
   2.3 Paired T-test
   2.4 Normality Assumption & Nonparametric test
3. ANOVA
   3.1 One-way ANOVA
   3.2 Two-way ANOVA
4. Regression

Review on plots
What do we actually do with a data set when it's handed to us? Using visual tools is a critical first step when analyzing data, and it can often be sufficient in its own right. By observing visual summaries of the data, we can:
- Determine the general pattern of the data
- Identify outliers
- Check whether the data follow some theoretical distribution
- Make quick comparisons between groups of data

Review on plots
- plot(x, y) (or, equivalently, plot(y ~ x)): scatter plot of variables x and y
- pairs(cbind(x, y, z)): scatter plot matrix of variables x, y and z
- hist(y): histogram
- boxplot(y): boxplot
- lm(y ~ x): fit a straight line relating y to x
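As a minimal sketch of the plotting functions listed above, the following uses the built-in mtcars data. The choice of columns (wt, mpg, hp) for x, y and z is an assumption made for illustration and is not taken from the slides.

```r
# Illustrative variables from the built-in mtcars data (assumed, not from the slides)
x <- mtcars$wt    # weight (1000 lbs)
y <- mtcars$mpg   # miles per US gallon
z <- mtcars$hp    # gross horsepower

plot(y ~ x)              # scatter plot of y against x
fit <- lm(y ~ x)         # straight-line fit of y on x
abline(fit)              # overlay the fitted line on the scatter plot

pairs(cbind(x, y, z))    # scatter plot matrix of the three variables
hist(y)                  # histogram of y
boxplot(y)               # boxplot of y

coef(fit)                # intercept and slope of the fitted line
```

Calling abline(fit) right after plot(y ~ x) keeps the fitted line on the scatter plot it belongs to, since each later plotting call starts a new figure.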

T-test
2.1 One sample t-test
Research question: Is the mean of a population different from a hypothesized (nominal) value?
Example: Test whether the average mpg (miles per US gallon) of cars is different from 23 mpg.
Hypotheses:
- Null hypothesis: the average mpg of cars is 23 mpg
- Alternative hypothesis: the average mpg of cars is not equal to (or greater/less than) 23 mpg
In R: t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)
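A short sketch of the one-sample test described above, assuming the mpg column of the built-in mtcars data stands in for the sample of cars:

```r
# One-sample t-test on mtcars$mpg (data set assumed for illustration)
# H0: mean mpg = 23, two-sided alternative
res <- t.test(mtcars$mpg, mu = 23)
print(res)

# One-sided alternative: is the mean mpg less than 23?
res_less <- t.test(mtcars$mpg, mu = 23, alternative = "less")
res_less$p.value

# Pieces of the result that are commonly reported
res$estimate    # sample mean
res$conf.int    # 95% confidence interval for the mean
```

The mu argument carries the nominal value from the null hypothesis, and alternative selects between the two-sided and one-sided versions of the alternative hypothesis.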

T-test
2.2 Two sample t-test
Research question: Are the means of two populations different?
Example: Is the average mpg of automatic cars different from that of manual cars?
Hypotheses:
- Null hypothesis: the average mpg of automatic cars equals the average mpg of manual cars
- Alternative hypothesis: the average mpg of automatic cars is not equal to (or greater/less than) the average mpg of manual cars
In R:
- t.test(mpg ~ am): Welch test, which does not assume equal variances (the default)
- t.test(mpg ~ am, var.equal = TRUE): pooled-variance test, which assumes equal variances
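The two calls above can be sketched on the built-in mtcars data, where am is 0 for automatic and 1 for manual. Passing data = mtcars is an assumption about how the variables are supplied; the slides presumably had the data attached.

```r
# Welch two-sample t-test (default: variances not assumed equal)
welch <- t.test(mpg ~ am, data = mtcars)

# Pooled-variance two-sample t-test (variances assumed equal)
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

print(welch)
print(pooled)

# The degrees of freedom differ between the two versions
welch$parameter
pooled$parameter
```

The pooled test uses n1 + n2 - 2 degrees of freedom, while the Welch test uses an approximate, usually non-integer value.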

T-test
Sample size calculation
Research question: How many observations are needed for a given power, or what is the power of the test for a given sample size?
Power = probability of rejecting the null hypothesis when it is false
In R: power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "one.sided"), strict = FALSE)
Exactly one of n, delta, sd, sig.level and power is left as NULL, and the function solves for it.
- Calculate the sample size given a power: power.t.test(delta = 2, sd = 2, power = 0.8)
- Calculate the power given a sample size: power.t.test(n = 20, delta = 2, sd = 2)
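A brief sketch of both directions of the calculation, using the delta and sd values from the slide. The defaults (two-sample, two-sided, sig.level = 0.05) are left unchanged.

```r
# Solve for the sample size per group needed to reach 80% power
ss <- power.t.test(delta = 2, sd = 2, power = 0.8)
ss$n                    # fractional sample size per group
n_needed <- ceiling(ss$n)  # round up to a whole number of observations
n_needed

# Solve for the power achieved with 20 observations per group
pw <- power.t.test(n = 20, delta = 2, sd = 2)
pw$power
```

For the two-sample type, n is the number of observations in each group, so the total sample size is twice the reported value.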

T-test
Paired T-test
Research question: Given the paired structure of the data, are the means of two sets of observations significantly different?
Example: A study was conducted to generate electricity from wave power at sea. Two different mooring methods were tested on a variety of wave types, with both methods tested on every wave type. The question of interest is whether bending stress differs for the two mooring methods.
In R: t.test(method1, method2, paired = TRUE), or equivalently t.test(diff), where diff = method1 - method2
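The wave-power measurements are not included in the transcript, so this sketch swaps in the built-in sleep data (extra hours of sleep for the same 10 subjects under two drugs) purely as a stand-in. The names method1, method2 and d are illustrative.

```r
# Stand-in paired data: the same 10 subjects measured under two conditions
method1 <- sleep$extra[sleep$group == "1"]
method2 <- sleep$extra[sleep$group == "2"]

# Paired t-test on the two sets of measurements
paired <- t.test(method1, method2, paired = TRUE)

# Equivalent one-sample t-test on the within-pair differences
d <- method1 - method2
onesample <- t.test(d)

paired$statistic
onesample$statistic
```

Both calls give the same t statistic and p-value, which is the equivalence the slide points out. The differences are stored in d rather than diff to avoid masking the base R function diff().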

T-test
2.5 Checking assumptions & Nonparametric test
When using the t-test, we assume the data follow a normal distribution. This assumption can be checked by visualization and by a statistical test.
Visualization:
- Histogram: a normal distribution is symmetric and bell-shaped, with rapidly dying tails.
- QQ-plot: plots the quantiles of the data against the theoretical quantiles of the normal distribution; points falling close to a straight line indicate that the assumption holds.
Statistical test: Shapiro-Wilk normality test
In R: shapiro.test(data)
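The checks above can be sketched as follows, assuming mtcars$mpg as the data to be examined. qqnorm() and qqline() are the base R functions for a normal QQ-plot; the slide names the plot but not the functions.

```r
# Data to check (assumed for illustration)
y <- mtcars$mpg

# Visual checks
hist(y)      # look for a symmetric, bell-shaped histogram
qqnorm(y)    # sample quantiles against theoretical normal quantiles
qqline(y)    # reference line; points near it support normality

# Formal check: Shapiro-Wilk normality test
sw <- shapiro.test(y)
print(sw)
sw$p.value   # a small p-value is evidence against normality
```

The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution, so a large p-value means only that no departure from normality was detected.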

T-test
2.5 Checking assumptions & Nonparametric test
When the normality assumption does not hold, we use a nonparametric alternative.
Wilcoxon Signed Rank Test
- Null hypothesis: the differences between the pairs are symmetrically distributed about zero (the median difference is zero)
- Alternative hypothesis: the median difference is not zero
In R: wilcox.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...)
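A minimal sketch of the signed rank test on paired data, again using the built-in sleep data as a stand-in because the slides do not include a data set for this step. The names x and y are illustrative.

```r
# Stand-in paired data: the same 10 subjects under two conditions
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]

# Wilcoxon signed rank test on the paired differences.
# exact = FALSE requests the normal approximation, because these
# differences contain ties and a zero, which rule out an exact p-value.
wt <- wilcox.test(x, y, paired = TRUE, exact = FALSE)
print(wt)

wt$statistic   # V: sum of the ranks of the positive differences
wt$p.value
```

With paired = FALSE (the default) and two samples, the same function performs the Wilcoxon rank sum (Mann-Whitney) test instead, which is the nonparametric counterpart of the two-sample t-test.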

ANOVA: Analysis of Variance
T-test: compare the mean of a population to a nominal value, or test whether the means of two populations are equal.
What about comparing the means of more than two populations? We use ANOVA!
- One-way ANOVA: compare the means of populations where the variation is attributed to the different levels of one factor.
- Two-way ANOVA: compare the means of populations where the variation is attributed to the different levels of two factors.

ANOVA: Analysis of Variance
1. One-way ANOVA
Example: Compare the mpg for 3 cyl levels
mtcars data:
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- am: Transmission (0 = automatic, 1 = manual)
Hypotheses:
- Null hypothesis: the three cyl levels have equal mean mpg
- Alternative hypothesis: at least two levels do not have equal mean mpg
In R: a.1 = aov(mpg ~ factor(cyl)) and summary(a.1)
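The one-way analysis above can be sketched as below. Passing data = mtcars is an assumption about how the variables are supplied, and the TukeyHSD() follow-up is an addition that the slides do not mention.

```r
# One-way ANOVA: mean mpg across the three cylinder levels
a.1 <- aov(mpg ~ factor(cyl), data = mtcars)
summary(a.1)

# ANOVA table as a data frame, for extracting individual entries
tab1 <- summary(a.1)[[1]]
tab1$Df            # degrees of freedom: factor, residuals
tab1[["Pr(>F)"]][1]  # p-value for the cyl effect

# Optional follow-up (not in the slides): which pairs of levels differ?
TukeyHSD(a.1)
```

Wrapping cyl in factor() matters here: left as a number, cyl would be fitted as a single linear slope rather than as three group means.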

ANOVA: Analysis of Variance
2. Two-way ANOVA
Example: Compare the mpg for 3 cyl levels and 2 types of transmission
Three effects to be considered: cyl levels, types of transmission, and their interaction
In R: a.2 = aov(mpg ~ factor(am) * factor(cyl)) and summary(a.2)
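A sketch of the two-way model with interaction, assuming the variables come from the built-in mtcars data via the data argument:

```r
# Two-way ANOVA with interaction: transmission type, cylinder level,
# and the am-by-cyl interaction
a.2 <- aov(mpg ~ factor(am) * factor(cyl), data = mtcars)
summary(a.2)

# ANOVA table rows: factor(am), factor(cyl), interaction, residuals
tab2 <- summary(a.2)[[1]]
tab2$Df

# Visual check of the interaction: roughly parallel lines suggest
# little interaction between the two factors
interaction.plot(factor(mtcars$cyl), factor(mtcars$am), mtcars$mpg)
```

The formula factor(am) * factor(cyl) expands to both main effects plus their interaction. Because the mtcars cells have unequal counts, the sequential sums of squares reported by aov() depend on the order in which the terms are written.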

Regression
Research question: What is the relationship between two variables, or between one variable and several other variables?
Example: Brownlee's Stack Loss Plant Data
- Air.Flow: Flow of cooling air
- Water.Temp: Cooling Water Inlet Temperature
- Acid.Conc.: Concentration of acid [per 1000, minus 500]
- stack.loss: Stack loss
What is the relationship between Air.Flow and stack.loss? Or, how are the variables Air.Flow, Water.Temp and Acid.Conc. related to stack.loss?
In R: lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
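Both questions above can be sketched with the built-in stackloss data frame. The object names fit1 and fit2 are illustrative.

```r
# Simple linear regression: stack.loss on Air.Flow
fit1 <- lm(stack.loss ~ Air.Flow, data = stackloss)
summary(fit1)

# Multiple linear regression: stack.loss on all three predictors
fit2 <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)
summary(fit2)

coef(fit2)       # intercept and one slope per predictor
confint(fit2)    # 95% confidence intervals for the coefficients

# Residuals against fitted values, as a quick check of the model
plot(fitted(fit2), resid(fit2))
abline(h = 0)
```

Note the trailing period in the column name Acid.Conc., which is part of the name in the stackloss data frame and must be typed in the formula.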

Please don't forget to fill in the sign-in sheet and to complete the survey that will be sent to you. Thank you!