Laboratory for Interdisciplinary Statistical Analysis Anne Ryan Virginia Tech.

1948: The Statistical Laboratory was founded as a division of the Virginia Agricultural Experiment Station to help agronomists design experiments and calculate sums of squares.

1949: Based on the success of the Statistical Laboratory, the Department of Statistics at Virginia Polytechnic Institute (VPI) was founded, the 3rd oldest statistics department in the United States.

1973: The Statistical Laboratory was re-formed as the Statistical Consulting Center to assist with statistical analyses in every college of Virginia Polytechnic Institute & State University (VPI&SU).

2007: The Graduate Student Assembly led a movement to save statistical consulting and collaboration from death by budget cuts, ensuring that graduate students could receive help with their research. The College of Science, Provost, Vice President of Research, Graduate School, and six additional colleges agreed that researchers should be able to receive free statistical consulting and collaboration.

2008: The Statistical Consulting Center was re-organized as the Laboratory for Interdisciplinary Statistical Analysis (LISA) to collaborate with researchers across the Virginia Tech (VT) campuses.

Established in 2008
[Table: Year, Clients, Hours]

LISA helps VT researchers benefit from the use of statistics:
• Experimental Design
• Data Analysis
• Interpreting Results
• Grant Proposals
• Software (R, SAS, JMP, SPSS...)
Our goal is to improve the quality of research and the use of statistics at Virginia Tech.

Collaboration: LISA statisticians meet with faculty, staff, and graduate students to understand their research and think of ways to help them using statistics.

Walk-In Consulting: Every day from 1-3 PM, clients get answers to their (quick) questions about using statistics in their research.

Short Courses: Short courses are designed to teach graduate students how to apply statistics in their research.

All services are FREE for VT researchers. We assist with research, not class projects or homework.

How can LISA help?
• Formulate the research question.
• Screen data for integrity and unusual observations.
• Implement graphical techniques to showcase the data: what is the story?
• Develop and implement an analysis plan to address the research question.
• Help interpret results.
• Communicate! Help with writing the report or giving the talk.
• Identify future research directions.

To request a collaboration meeting:
1. Sign in to the website using your VT PID and password.
2. Enter your information (address, college, etc.).
3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.).
4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.

Introduction to R
R is a free software environment for statistical computing and graphics.
Topics covered:
• Data objects in R, loops, import/export of datasets, data manipulation
• Graphing
• Basic analyses: t-tests, regression, ANOVA

Linear Regression & Structural Equation Modeling
Linear regression is used to model the relationship between a continuous response and a continuous predictor. Structural equation modeling (SEM) is a modeling technique that investigates causal relationships among variables. Topics: time-related latent variables, modification indices and critical ratio in exploratory analyses, and computation of implied moments, factor score weights, total effects, and indirect effects.

Generalized Linear Models
• Modeling technique for situations where the errors are not necessarily normal.
• Can handle situations where you have binary responses, counts, etc.
• Uses a link function to relate the mean of the response to the linear predictor.
Covers: basic statistical concepts of GLM and how it relates to regression using normal errors.

Mixed Models and Random Effects
• Mixed Model: A statistical model that has both random effects and fixed effects.
• Fixed Effect: Levels of the factor are predetermined.
• Random Effect: Levels of the factor were chosen at random.
The primary focus of the course will be to identify scenarios where a mixed model approach will be appropriate. The concepts will be explained almost wholly through examples in SAS or in R.

Anne Ryan

• Defense: Represents the accused (defendant).
• Prosecution: Holds the “Burden of Proof”, the obligation to shift the assumed conclusion from an oppositional opinion to one’s own position through evidence.
• What’s the assumed conclusion? ANSWER: The accused is innocent until proven guilty. The prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt.

Burden of Proof: obligation to shift the conclusion using evidence
• Trial: Innocent until proven guilty.
• Hypothesis test: Accept the status quo (what is believed before) until the data suggest otherwise.

Decision Criteria
• Trial: Evidence has to be convincing beyond a reasonable doubt.
• Hypothesis test: The observed result would occur by chance less than 100α% of the time (e.g., 5%).

1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion

• State the name of the testing method to be used.
• It is important not to be off track at the very beginning.
• Hypothesis tests we will perform:
  ◦ One sample t test for μ
  ◦ Two sample t test for μ
  ◦ Paired t test
  ◦ ANOVA

• List all the assumptions required for your test to be valid.
• All tests have assumptions.
• Even if the assumptions are not met, you should still comment on how this affects your results.

• State the hypotheses of interest.
• There are two hypotheses:
  ◦ Null Hypothesis: Denoted H0
  ◦ Alternative Hypothesis: Denoted Ha

• For hypothesis testing there are three popular versions of testing:
  ◦ Left Tailed Hypothesis Test
  ◦ Right Tailed Hypothesis Test
  ◦ Two Tailed or Two Sided Hypothesis Test

3. Two Tailed or Two Sided Hypothesis Test: The researcher is interested in looking above and below the hypothesized value.

• Computational part of the test
• What is part of the Mechanics step?
  ◦ Stating the significance level
  ◦ Finding the rejection rule
  ◦ Computing the test statistic
  ◦ Computing the p-value

• Significance Level: Here we choose a value to use as the significance level, which is the level at which we are willing to start rejecting the null hypothesis.
• Denoted by α
• Default value is α = .05; use α = .05 unless otherwise noted!

• Rejection Rule: State our criteria for rejecting the null hypothesis.
  ◦ “Reject the null hypothesis if p-value < .05”.
• p-value: The probability, assuming the null hypothesis is true, of obtaining a point estimate at least as “extreme” as the observed value, where the definition of “extreme” is taken from the alternative hypothesis.

• Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate.
• It translates your point estimate, a statistic, to follow a known distribution so that it can be used for a test.

• p-value: After computing the test statistic, you can compute the p-value.
• Use software to compute p-values.
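
To make the link between the tail of the test and the p-value concrete, here is a minimal Python sketch, not the course's JMP workflow. It assumes the test statistic follows a t distribution, and the function names (`t_pdf`, `t_cdf`, `p_value`) are hypothetical helpers written for this illustration. The CDF is approximated with Simpson's rule, which is a stand-in for what statistical software does internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, intervals=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / intervals  # intervals must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t_stat, df, tail="two-sided"):
    """p-value for a t statistic; tail is 'left', 'right', or 'two-sided'."""
    if tail == "left":       # Ha: parameter < hypothesized value
        return t_cdf(t_stat, df)
    if tail == "right":      # Ha: parameter > hypothesized value
        return 1.0 - t_cdf(t_stat, df)
    if tail == "two-sided":  # Ha: parameter != hypothesized value
        return 2.0 * (1.0 - t_cdf(abs(t_stat), df))
    raise ValueError("tail must be 'left', 'right', or 'two-sided'")
```

The same test statistic gives different p-values depending on the alternative hypothesis, which is why the tail must be chosen before looking at the data.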

• Conclusion: The last step of the hypothesis test, just as it is the last step when computing confidence intervals.
• Conclusions should always include:
  ◦ Decision: reject or fail to reject
  ◦ Linkage: why you made the decision (interpret the p-value)
  ◦ Context: what your decision means in the context of the problem

• Note: Your decision can only be one of two choices:
  1. Reject H0: the data give a strong indication that Ha is more likely.
  2. Fail to reject H0: the data give no strong indication that Ha is more likely.
• When conducting hypothesis tests, we assume that H0 is true; therefore the decision CANNOT be to accept the null hypothesis.
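
The rejection rule and the two allowed decisions can be sketched in a few lines of Python. The function name `conclude` is hypothetical, and the sketch covers only the decision and linkage; the context sentence still has to be written by the analyst.

```python
def conclude(p_value, alpha=0.05):
    """Return the decision and linkage for a test at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return f"Reject H0: p-value {p_value:.4f} < {alpha}"
    # Never "accept H0": the data simply give no strong indication for Ha.
    return f"Fail to reject H0: p-value {p_value:.4f} >= {alpha}"
```

For example, `conclude(0.01)` returns a "Reject H0" message and `conclude(0.20)` returns a "Fail to reject H0" message.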

• The one-sample t-test is used to test whether the population mean is different from a specified value.
• Example: Is the mean height of 12-year-old girls greater than 60 inches?

• The population mean is not equal to a specified value.
  Null hypothesis, H0: μ = μ0
  Alternative hypothesis, Ha: μ ≠ μ0
• The population mean is greater than a specified value.
  H0: μ = μ0
  Ha: μ > μ0
• The population mean is less than a specified value.
  H0: μ = μ0
  Ha: μ < μ0

• The sample is random.
• The population from which the sample is drawn is either normal or the sample size is large.

• Step 3: Calculate the test statistic:
  $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
  where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size.
• Step 4: Calculate the p-value based on the appropriate alternative hypothesis.
• Step 5: Write a conclusion.
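
As a rough illustration of Step 3, the following Python sketch computes the usual one-sample t statistic, the sample mean minus the hypothesized mean, divided by the standard error s/√n. The function name `one_sample_t` and the height values are hypothetical; they are not data from the course.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical heights in inches, for illustration only.
heights = [61.0, 59.5, 62.0, 60.5, 63.0, 58.5, 61.5, 62.5]
t_stat, df = one_sample_t(heights, mu0=60.0)
```

The statistic is then compared with a t distribution with n − 1 degrees of freedom to obtain the p-value in Step 4.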

• Steps 2-4: JMP demonstration
  Analyze → Distribution
  Y, Columns: Sepal Width
  Normal Quantile Plot
  Test Mean
  Specify Hypothesized Mean:

• Two-sample t-tests are used to determine whether the population mean of one group is equal to, larger than, or smaller than the population mean of another group.
• Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?

• The population means of the two groups are not equal.
  H0: μ1 = μ2
  Ha: μ1 ≠ μ2
• The population mean of group 1 is greater than the population mean of group 2.
  H0: μ1 = μ2
  Ha: μ1 > μ2
• The population mean of group 1 is less than the population mean of group 2.
  H0: μ1 = μ2
  Ha: μ1 < μ2

• The two samples are random and independent.
• The populations from which the samples are drawn are either normal or the sample sizes are large.
• The populations have the same standard deviation.

• Step 3: Calculate the test statistic:
  $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
  where $s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$ is the pooled standard deviation, and $\bar{x}_i$, $s_i$, and $n_i$ are the sample mean, sample standard deviation, and sample size of group $i$.
• Step 4: Calculate the appropriate p-value.
• Step 5: Write a conclusion.
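
A minimal Python sketch of Step 3 follows, assuming the pooled (equal standard deviation) form of the two-sample t statistic, which matches the equal-standard-deviation assumption listed above. The function name `pooled_two_sample_t` is hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for H0: mu1 = mu2, pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    var1 = statistics.variance(sample1)  # sample variances (n - 1 divisor)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t_stat = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t_stat, df
```

If the equal-standard-deviation assumption is doubtful, an unpooled (Welch) test is the usual alternative; that variant is not shown here.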

• A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.
• The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.
• Step 1 Hypotheses:
  H0: μ_setosa = μ_versicolor
  Ha: μ_setosa ≠ μ_versicolor
wiki/Iris_flower_data_set wiki/Iris_versicolor

• Steps 2-4: JMP demonstration:
  Analyze → Fit Y By X
  Y, Response: Sepal Width
  X, Factor: Species
  Means/ANOVA/Pooled t
  Normal Quantile Plot → Plot Actual by Quantile

Step 5 Conclusion: There is strong evidence (p-value < ) that the mean sepal widths for the two varieties are different.

• The paired t-test is used to compare the population means of two groups when the samples are dependent.
• Example: A researcher would like to determine if background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one with complete silence and one with background noise, and records the time each subject takes to complete each test.

• The population mean difference is not equal to zero.
  H0: μ_difference = 0
  Ha: μ_difference ≠ 0
• The population mean difference is greater than zero.
  H0: μ_difference = 0
  Ha: μ_difference > 0
• The population mean difference is less than zero.
  H0: μ_difference = 0
  Ha: μ_difference < 0

• The sample is random.
• The data are matched pairs.
• The differences have a normal distribution or the sample size is large.

• Step 3: Calculate the test statistic:
  $t = \frac{\bar{d}}{s_d / \sqrt{n}}$
  where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences, and $n$ is the number of pairs.
• Step 4: Calculate the p-value.
• Step 5: Write a conclusion.
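
The paired test reduces to a one-sample test on the differences, with hypothesized mean zero. The Python sketch below illustrates this; the function name `paired_t` is hypothetical, and the differences are taken as after minus before.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per subject: after minus before.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1
```

Because only the differences enter the calculation, subject-to-subject variation that affects both measurements equally cancels out.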

• A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.
• Step 1: Formulate a Hypothesis
  H0: μ_(After - Before) = 0
  Ha: μ_(After - Before) >

• Steps 2-4: JMP Analysis:
  Create a new column of After - Before
  Analyze → Distribution
  Y, Columns: After - Before
  Normal Quantile Plot
  Test Mean
  Specify Hypothesized Mean: 0

Step 5 Conclusion: There is not enough evidence that the fitness program increases flexibility.

• ANOVA is used to determine whether three or more populations have different means.
[Figure: groups A, B, C; axis label: Medical Treatment]

• The first step is to use the ANOVA F test to determine if there are any significant differences among the population means.
• If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.

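The one-way ANOVA model is $y_{ij} = \mu_i + \varepsilon_{ij}$, where $y_{ij}$ is observation $j$ in group $i$, $\mu_i$ is the mean of group $i$, and $\varepsilon_{ij}$ is random error. In other words, for each group the observed value is the group mean plus some random variation.

• Step 1: We test whether there is a difference in the population means.
  H0: all of the population means are equal
  Ha: the population means are not all equal

• The samples are random and independent of each other.
• The populations are normally distributed.
• The populations all have the same standard deviation.
• The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.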

Compare the variation within the samples to the variation between the samples.
[Figure: groups A, B, C; axis label: Medical Treatment]

Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F

The mean square for groups, MSG, measures the variability of the sample averages: $MSG = \frac{SSG}{k - 1}$, where SSG stands for sum of squares for groups and $k$ is the number of groups.

Mean square error, MSE, measures the variability within the groups: $MSE = \frac{SSE}{N - k}$, where SSE stands for sum of squares for error, $N$ is the total number of observations, and $k$ is the number of groups. The F statistic is the ratio $F = MSG / MSE$.

• Step 4: Calculate the p-value.
• Step 5: Write a conclusion.
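
The between-groups versus within-groups comparison can be sketched in Python as follows. The sketch uses the standard one-way ANOVA definitions: MSG is SSG divided by (number of groups − 1), MSE is SSE divided by (total observations − number of groups), and F is MSG/MSE. The function name `one_way_anova_f` is hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F statistic, numerator df, denominator df) for one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # SSG: variation of the group means around the grand mean.
    ssg = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variation of the observations around their own group mean.
    sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return msg / mse, k - 1, n_total - k
```

For the made-up groups `[[1, 2, 3], [2, 3, 4], [6, 7, 8]]`, SSG is 42 and SSE is 6, so MSG = 21, MSE = 1, and F = 21 with 2 and 6 degrees of freedom. The p-value then comes from the F distribution, which is best left to software.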

• A researcher would like to determine if three drugs provide the same relief from pain.
• 60 patients are randomly assigned to a treatment (20 people in each treatment).
• Step 1: Formulate the Hypotheses
  H0: μ_Drug A = μ_Drug B = μ_Drug C
  Ha: The μ_i are not all equal.

• JMP demonstration
  Analyze → Fit Y By X
  Y, Response: Pain
  X, Factor: Drug
  Normal Quantile Plot → Plot Actual by Quantile
  Means/ANOVA

Step 5 Conclusion: There is strong evidence that the drugs are not all the same.

• The p-value of the overall F test indicates that the mean level of pain is not the same for patients taking drugs A, B and C.
• We would like to know which pairs of treatments are different.
• One method is to use Tukey’s HSD (honestly significant differences).

• Tukey’s test simultaneously tests H0: μ_i = μ_j versus Ha: μ_i ≠ μ_j for all pairs of factor levels. Tukey’s HSD controls the overall type I error.
• JMP demonstration
  Oneway Analysis of Pain By Drug → Compare Means → All Pairs, Tukey HSD

The JMP output shows that drugs A and C are significantly different.

• In two-way ANOVA, we are interested in the effect of two categorical factors on the response.
• We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
  ◦ An interaction effect means that the effect on the response of one factor depends on the level of the other factor.
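
To illustrate what "depends on the level of the other factor" means, the Python sketch below compares the effect of one factor at each level of the other, using cell means. The function name `simple_effects` and the strength values are hypothetical and are not the course data; this is a descriptive check, not the interaction F test that JMP reports.

```python
def simple_effects(cell_means, a_low, a_high):
    """For each level of factor B, return mean(A = a_high) - mean(A = a_low)."""
    effects = {}
    for (a, b), mean in cell_means.items():
        if a == a_high:
            effects[b] = mean - cell_means[(a_low, b)]
    return effects

# Hypothetical mean wire strengths keyed by (alloy, temperature).
cell_means = {
    ("low", "low"): 10.0, ("low", "medium"): 12.0, ("low", "high"): 14.0,
    ("high", "low"): 11.0, ("high", "medium"): 15.0, ("high", "high"): 20.0,
}
effects = simple_effects(cell_means, "low", "high")
# effects == {"low": 1.0, "medium": 3.0, "high": 6.0}
```

Here the benefit of the high alloy grows with temperature (1, 3, then 6 units), which is the pattern an interaction describes. With no interaction, the three differences would be roughly equal.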

• We would like to determine the effect of two alloys (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.
• JMP demonstration
  Analyze → Fit Model
  Y: Strength
  Highlight Alloy and Temp and click Macros → Factorial to Degree
  Run Model

Conclusion: There is strong evidence of an interaction between alloy and temperature.

The one-sample t-test allows us to test whether the population mean of a group is equal to a specified value. The two-sample t-test and paired t-test allow us to determine if the population means of two groups are different. ANOVA allows us to determine whether the population means of several groups are different.

• For information about using SAS, SPSS and R to do ANOVA: a.htm htm

• Fisher’s Irises Data (used in one sample and two sample t-test examples).
• Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004:

• Special thanks to Jennifer Kensler for course materials and help with JMP!