Shuyu Chu, Department of Statistics, February 17, 2014
LISA Short Course Series: R Statistical Analysis
Laboratory for Interdisciplinary Statistical Analysis

LISA helps VT researchers benefit from the use of statistics.
Short Courses: designed to help graduate students apply statistics in their research.
Walk-In Consulting: M-F 1-3 PM, GLC Video Conference Room; M 3-5 PM, 312 Sandy; T 11 AM-1 PM, Port; W 11 AM-1 PM, Old Security Building. For questions requiring less than 30 minutes.
All services are FREE for VT researchers. We assist with research, not class projects or homework.
Collaboration: Visit our website to request personalized statistical advice and assistance with experimental design, data analysis, interpreting results, grant proposals, and software (R, SAS, JMP, SPSS, ...).
LISA statistical collaborators aim to explain concepts in ways useful for your research.
Great advice right now: Meet with LISA before collecting your data.

Outline
1. Review of plots
2. T-test
   2.1 One-sample t-test
   2.2 Two-sample t-test
   2.3 Sample size calculation
   2.4 Paired t-test
   2.5 Normality assumption & nonparametric test
3. ANOVA
   3.1 One-way ANOVA
   3.2 Two-way ANOVA
4. Logistic Regression

Review of plots
Using visual tools is a critical first step when analyzing data, and it can often be sufficient in its own right! By observing visual summaries of the data, we can:
- Determine the general pattern of the data
- Identify outliers
- Check whether the data follow some theoretical distribution
- Make quick comparisons between groups of data

Review of plots
- plot(x, y) (or the equivalent formula form plot(y ~ x)): scatter plot of variables x and y
- pairs(cbind(x, y, z)): scatter plot matrix of variables x, y and z
- hist(y): histogram
- boxplot(y): box plot
- lm(y ~ x): fit a least-squares straight line of y on x (lm() draws nothing; add the line to an existing scatter plot with abline(lm(y ~ x)))
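A minimal sketch of the plotting functions listed above, run on simulated data. The variables x, y and z are hypothetical stand-ins, not columns of lowbwt.csv. It also shows the point that lm() only fits the line, while abline() is what draws it. Plots are sent to a null device with pdf(NULL) (an assumption made so the sketch runs non-interactively); drop that line and dev.off() to see the plots on screen.

```r
# Simulated (hypothetical) data: y depends linearly on x; z is unrelated noise
set.seed(1)
x <- rnorm(50)
z <- runif(50)
y <- 2 + 3 * x + rnorm(50)

pdf(NULL)                  # discard graphics output; remove to plot on screen

plot(y ~ x)                # scatter plot (same as plot(x, y))
fit <- lm(y ~ x)           # least-squares fit of y on x; draws nothing by itself
abline(fit, col = "red")   # overlay the fitted line on the scatter plot
pairs(cbind(x, y, z))      # scatter plot matrix of x, y and z
hist(y)                    # histogram
boxplot(y)                 # box plot

invisible(dev.off())       # close the null device
coef(fit)                  # intercept and slope of the fitted line
```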

Review of plots
Low Birth Weight Data Description (lowbwt.csv; 189 observations, 11 variables)
- ID: identification code
- LOW: low birth weight (0 = birth weight >= 2500 g, 1 = birth weight < 2500 g)
- AGE: mother's age in years
- LWT: mother's weight in lbs
- RACE: mother's race (1 = white, 2 = black, 3 = other)
- SMOKE: smoking status during pregnancy
- PTL: number of previous premature labors
- HT: history of hypertension
- UI: presence of uterine irritability
- FTV: number of physician visits during the first trimester
- BWT: birth weight in grams

T-Test 2.1 One-sample t-test
Research question: Is the mean of a population different from the value specified by the null hypothesis (a nominal or otherwise hypothesized value)?
Example: Testing whether babies' average birth weight is different from 2500 g.
Hypotheses:
Null hypothesis: the babies' average birth weight is 2500 g.
Alternative hypothesis: the babies' average birth weight is not equal to (or greater/less than) 2500 g.
In R: t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)
For this example: t.test(BWT, mu = 2500)
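A minimal sketch of the one-sample test, assuming a vector of birth weights named BWT. The values here are simulated (the mean of 2950 g and SD of 730 g are arbitrary choices, not estimates from lowbwt.csv), so only the mechanics carry over. The last line recomputes the t statistic by hand to show what t.test() reports.

```r
# Hypothetical birth weights in grams (simulated; mean/SD chosen arbitrarily)
set.seed(2014)
BWT <- rnorm(189, mean = 2950, sd = 730)

# Two-sided test of H0: mu = 2500 g versus H1: mu != 2500 g
res <- t.test(BWT, mu = 2500)
res

# One-sided alternative H1: mu > 2500 g
res_greater <- t.test(BWT, mu = 2500, alternative = "greater")

# The t statistic by hand: (sample mean - mu0) / (s / sqrt(n)), with n - 1 df
t_manual <- (mean(BWT) - 2500) / (sd(BWT) / sqrt(length(BWT)))
```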

T-Test 2.2 Two-sample t-test
Research question: Are the means of two populations different?
Example: Is the birth weight of babies whose mothers smoke different from that of babies whose mothers don't smoke?
Hypotheses:
Null hypothesis: the average birth weight of babies whose mothers smoke equals the average birth weight of babies whose mothers don't smoke.
Alternative hypothesis: the average birth weight of babies of smoking mothers is not equal to (or greater/less than) that of babies of non-smoking mothers.
In R:
t.test(BWT ~ SMOKE) (Welch test; does not assume equal variances)
t.test(BWT ~ SMOKE, var.equal = TRUE) (pooled-variance test; assumes equal variances)
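A minimal sketch contrasting the two calls above, on a simulated data frame with hypothetical columns BWT (grams) and SMOKE (0 = no, 1 = yes); the numbers are invented for illustration only. The formula interface splits BWT by the two values of SMOKE. The pooled test always uses n1 + n2 - 2 degrees of freedom, while the Welch test estimates its degrees of freedom from the two sample variances.

```r
# Hypothetical data: SMOKE coded 0 = no, 1 = yes; BWT in grams (simulated)
set.seed(42)
n <- 100
SMOKE <- rbinom(n, 1, 0.4)
BWT <- rnorm(n, mean = 3000 - 250 * SMOKE, sd = 700)
dat <- data.frame(BWT, SMOKE)

# Welch two-sample t-test (default var.equal = FALSE): variances may differ
welch <- t.test(BWT ~ SMOKE, data = dat)
# Classic pooled-variance t-test: assumes equal variances, df = n1 + n2 - 2
pooled <- t.test(BWT ~ SMOKE, data = dat, var.equal = TRUE)

welch$estimate   # sample means of the SMOKE = 0 and SMOKE = 1 groups
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))  # degrees of freedom
```

When the two groups have clearly different spreads, the Welch version (R's default) is usually the safer choice.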

T-Test 2.3 Sample size calculation
Research question: How many observations are needed to achieve a given power, or what is the power of the test for a given sample size?
Power = probability of rejecting the null hypothesis when the null hypothesis is false.
In R: power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "one.sided"), strict = FALSE)
Exactly one of n, delta, sd, sig.level and power is passed as NULL, and power.t.test solves for it; for type = "two.sample", n is the number of observations per group.
Calculate a sample size given a power: power.t.test(delta = 2, sd = 2, power = 0.8)
Calculate a power given a sample size: power.t.test(n = 20, delta = 2, sd = 2)
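A short sketch running the two calls above and then recomputing the power directly from the noncentral t distribution, to make the definition "probability of rejecting the null when it is false" concrete. The by-hand formula assumes the default two-sample, two-sided setting with strict = FALSE, which counts only the rejection tail in the direction of the true difference.

```r
# Sample size per group: two-sample, two-sided, delta = 2, sd = 2, alpha = 0.05, power = 0.8
ss <- power.t.test(delta = 2, sd = 2, power = 0.8)
ceiling(ss$n)   # round up: observations needed in EACH group

# Power for n = 20 per group with the same delta and sd
pw <- power.t.test(n = 20, delta = 2, sd = 2)
pw$power

# Same power by hand: reject when T > t_{0.975, df}, where under the alternative
# T follows a noncentral t with df = 2(n - 1) and ncp = sqrt(n / 2) * delta / sd
n <- 20
df <- 2 * (n - 1)
ncp <- sqrt(n / 2) * 2 / 2
power_manual <- pt(qt(0.975, df), df, ncp = ncp, lower.tail = FALSE)
```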

T-Test 2.4 Paired t-test
Research question: Given the paired structure of the data, are the means of two sets of observations significantly different?
Example: In a warehouse, the employees have asked management to play music to relieve the boredom of the job. The manager wants to know whether efficiency is affected by the change. The table below gives the efficiency ratings of 15 employees recorded before and after the music system was installed. (Link to the dataset: )
In R: t.test(efficiency_after, efficiency_before, paired = TRUE), or equivalently t.test(diff), where diff = efficiency_after - efficiency_before
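A minimal sketch checking that the two forms above give the same answer. The efficiency ratings are simulated stand-ins for the (missing) warehouse table, so the numbers themselves mean nothing. The difference vector is named d rather than diff, an assumption made only so it does not mask R's base diff() function.

```r
# Hypothetical efficiency ratings for 15 employees (simulated, not the slide's table)
set.seed(7)
efficiency_before <- round(rnorm(15, mean = 70, sd = 8))
efficiency_after  <- efficiency_before + round(rnorm(15, mean = 3, sd = 4))

# Paired t-test on the two columns
paired <- t.test(efficiency_after, efficiency_before, paired = TRUE)

# Equivalent one-sample t-test on the within-employee differences
d <- efficiency_after - efficiency_before
one_sample <- t.test(d)

c(paired = paired$p.value, one_sample = one_sample$p.value)
```

Both tests use n - 1 = 14 degrees of freedom, because the unit of analysis is the employee, not the individual rating.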

T-Test 2.5 Checking assumptions & nonparametric test
The t-test assumes the data follow a normal distribution (for the paired t-test, the paired differences). We can check this assumption in two ways: visualization and a statistical test.
Visualization:
- Histogram: a normal distribution is symmetric and bell-shaped, with rapidly dying tails.
- QQ-plot: plots the quantiles of the data against the theoretical quantiles of the normal distribution; points lying close to a straight line indicate that the assumption holds. In R: qqnorm(data); qqline(data)
Statistical test: Shapiro-Wilk normality test (null hypothesis: the data come from a normal distribution)
In R: shapiro.test(data)
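A minimal sketch of both checks on two simulated samples (hypothetical data): one drawn from a normal distribution and one from a right-skewed exponential distribution. Plots go to a null device so the sketch runs non-interactively; remove pdf(NULL) and dev.off() to view them. A small Shapiro-Wilk p-value is evidence against normality; a large one does not prove normality.

```r
set.seed(11)
normal_data <- rnorm(100, mean = 3000, sd = 700)  # hypothetical, normally distributed
skewed_data <- rexp(100, rate = 1 / 700)          # hypothetical, right-skewed

pdf(NULL)                                  # discard plots; remove to view them
par(mfrow = c(2, 2))
hist(normal_data); hist(skewed_data)
qqnorm(normal_data); qqline(normal_data)   # points should hug the line
qqnorm(skewed_data); qqline(skewed_data)   # curvature signals non-normality
invisible(dev.off())

# Shapiro-Wilk: H0 = data come from a normal distribution (R requires 3 to 5000 values)
sw_normal <- shapiro.test(normal_data)
sw_skewed <- shapiro.test(skewed_data)
c(normal = sw_normal$p.value, skewed = sw_skewed$p.value)
```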

T-Test 2.5 Checking assumptions & nonparametric test
When the normality assumption does not hold, we use an alternative nonparametric test.
Wilcoxon signed rank test (paired data)
Null hypothesis: the median difference between the pairs is zero
Alternative hypothesis: the median difference is not zero
In R: wilcox.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95, ...)
Use paired = TRUE for the signed rank test on paired data; with two independent samples and paired = FALSE, wilcox.test performs the Wilcoxon rank sum (Mann-Whitney) test instead.
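A minimal sketch of the signed rank test on simulated paired data (the before/after values are hypothetical). It shows that the paired test is the one-sample signed rank test on the differences, and that the reported statistic V is the sum of the ranks of the absolute differences over the pairs whose difference is positive.

```r
# Hypothetical paired measurements (simulated), e.g. before/after scores
set.seed(5)
before <- rnorm(12, mean = 70, sd = 8)
after  <- before + rexp(12, rate = 1 / 3) - 1   # skewed, mostly positive changes

# Wilcoxon signed rank test for paired data
sr <- wilcox.test(after, before, paired = TRUE)

# Same test on the differences; V = sum of ranks of |d| where d > 0
d <- after - before
V_manual <- sum(rank(abs(d))[d > 0])
sr_diff <- wilcox.test(d)

c(V = unname(sr$statistic), p = sr$p.value)
```

With continuous data there are no ties or zero differences, so for small samples like this R computes an exact p-value.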

ANOVA: Analysis of Variance
T-test: compares the mean of a population to a nominal value, or tests whether the means of two populations are equal.
What if you want to compare the means of more than two populations? We use ANOVA!
One-way ANOVA: compares the means of populations where the variation is attributed to the different levels of one factor.
Two-way ANOVA: compares the means of populations where the variation is attributed to the different levels of two factors.

ANOVA: Analysis of Variance 3.1 One-way ANOVA
Example: Compare BWT (birth weight in grams) across the 3 races.
bwt data:
- BWT: birth weight in grams
- RACE: mother's race (1 = White, 2 = Black, 3 = Other)
- SMOKE: mother's smoking status during pregnancy (1 = Yes, 0 = No)
Hypotheses:
Null hypothesis: the three groups have equal average birth weight
Alternative hypothesis: at least two groups differ in average birth weight
In R: a.1 = aov(BWT ~ factor(RACE)) and summary(a.1)
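A minimal sketch of the one-way ANOVA call on simulated data (BWT and RACE are hypothetical stand-ins with the slide's coding; the group means are invented). It also illustrates why factor() matters: RACE is stored as the numbers 1, 2, 3, and without factor() aov() would treat it as a single numeric slope (1 degree of freedom) instead of three separate group means (2 degrees of freedom).

```r
# Hypothetical data: RACE coded 1/2/3 as on the slide, BWT in grams (simulated)
set.seed(3)
RACE <- sample(1:3, 189, replace = TRUE, prob = c(0.5, 0.15, 0.35))
BWT  <- rnorm(189, mean = c(3100, 2700, 2800)[RACE], sd = 700)

# factor() makes aov() estimate a separate mean per race (2 df for 3 groups)
a.1 <- aov(BWT ~ factor(RACE))
summary(a.1)

# Without factor(), the codes 1/2/3 act as one linear predictor (1 df),
# which is not the one-way ANOVA described above
a.wrong <- aov(BWT ~ RACE)
summary(a.wrong)
```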

ANOVA: Analysis of Variance 3.2 Two-way ANOVA
Example: Compare BWT across the 3 races and the 2 smoking statuses.
Three effects to be considered: RACE, SMOKE and their interaction.
In R: a.2 = aov(BWT ~ factor(SMOKE) * factor(RACE)) and summary(a.2)
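A minimal sketch of the two-way model on simulated data (hypothetical BWT, RACE and SMOKE with the slide's coding; effect sizes are invented). The * operator expands to both main effects plus their interaction, so the summary table has rows for SMOKE (1 df), RACE (2 df), SMOKE:RACE (2 df) and the residuals.

```r
# Hypothetical data (simulated): RACE coded 1/2/3, SMOKE coded 0/1, BWT in grams
set.seed(4)
n <- 189
RACE  <- sample(1:3, n, replace = TRUE, prob = c(0.5, 0.15, 0.35))
SMOKE <- rbinom(n, 1, 0.4)
BWT   <- rnorm(n, mean = c(3100, 2700, 2800)[RACE] - 250 * SMOKE, sd = 700)

# factor(SMOKE) * factor(RACE) expands to SMOKE + RACE + SMOKE:RACE
a.2 <- aov(BWT ~ factor(SMOKE) * factor(RACE))
summary(a.2)
```

One caveat worth checking with real data: when the cell sizes are unbalanced, as they are here, summary() of an aov fit reports sequential (type I) sums of squares, so the SMOKE and RACE rows can change if the order of the two factors in the formula is swapped.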

Logistic Regression

Example: Low birth weight data
We are interested in understanding which variables predict the likelihood of a mother giving birth to a baby with low birth weight (defined as a baby weighing less than 2500 grams).
Response variable:
- low: 0, 1 (indicator of birth weight less than 2.5 kg)
Predictor variables:
- age: mother's age in years
- lwt: mother's weight in lbs
- race: mother's race (1 = white, 2 = black, 3 = other)
- smoke: smoking status during pregnancy
- ptl: number of previous premature labors
- ht: history of hypertension
- ui: presence of uterine irritability
- ftv: number of physician visits during the first trimester
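A minimal sketch of fitting a logistic regression with glm(family = binomial), assuming a data frame with the slide's lowercase variable names. The data are simulated, and only age, lwt, race and smoke are included to keep the example short; the coefficients used to generate low are invented, so the fitted values illustrate the workflow, not findings about low birth weight. Exponentiated coefficients are odds ratios.

```r
# Hypothetical data with the slide's variable names (simulated, not lowbwt.csv)
set.seed(189)
n <- 189
age   <- round(runif(n, 14, 45))
lwt   <- round(rnorm(n, mean = 130, sd = 30))
race  <- factor(sample(1:3, n, replace = TRUE), labels = c("white", "black", "other"))
smoke <- rbinom(n, 1, 0.4)
eta   <- -1 - 0.03 * (age - 23) - 0.01 * (lwt - 130) + 0.7 * smoke + 0.5 * (race != "white")
low   <- rbinom(n, 1, plogis(eta))   # 1 = birth weight < 2500 g
dat   <- data.frame(low, age, lwt, race, smoke)

# Logistic regression: log-odds of low birth weight as a linear function of predictors
fit <- glm(low ~ age + lwt + race + smoke, family = binomial, data = dat)
summary(fit)

# Odds ratios with Wald 95% confidence intervals
exp(cbind(OR = coef(fit), confint.default(fit)))

# Fitted probabilities of low birth weight for each mother
p_hat <- predict(fit, type = "response")
```

Because race is a factor, it contributes two coefficients (black and other, each compared with white), which is why the model has six coefficients including the intercept.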

Thank you! Please don't forget to fill in the sign-in sheet and to complete the survey that will be sent to you by the Laboratory for Interdisciplinary Statistical Analysis.