

Stat 112: Lecture 21 Notes
Model Building (Brief Discussion)
Chapter 9.1: One-Way Analysis of Variance
Homework 6 is due Friday, Dec. 1st. I will be e-mailing you tonight or tomorrow with some comments on your project ideas. I will have the quizzes graded by tomorrow's office hours (Wed. 1:30-2:30); otherwise, I will return them to you next Tuesday.

Model Building
1. Among the potential explanatory variables, think about which explanatory variables address the question of interest.
2. For each explanatory variable, investigate whether a transformation is needed, either because of curvature or crunching.
3. Consider adding polynomial terms for each variable if there is remaining curvature for the variable (keep adding higher-order terms as long as the highest-order term has p-value < 0.05).
4. Consider interactions between the explanatory variables, adding an interaction if the p-value on the interaction term is < 0.05.

Analysis of Variance
The goal of analysis of variance is to compare the means of several (often many) groups. Analysis of variance is regression with only categorical explanatory variables.
One-way analysis of variance: groups are defined by one categorical variable.
Two-way analysis of variance: groups are defined by two categorical variables.

Milgram's Obedience Experiments
Subjects were recruited to take part in an experiment on "memory and learning." The subject was the teacher and conducted a paired-associate learning task with the student. The subject was instructed by the experimenter to administer a shock to the student each time he gave a wrong response. Moreover, the subject was instructed to "move one level higher on the shock generator each time the learner gives a wrong answer." The subject was also instructed to announce the voltage level before administering a shock.

Four Experimental Conditions
1. Remote-Feedback condition: The student is placed in a room where he can be neither seen nor heard by the subject; his answers flash silently on a signal box. However, at 300 volts the laboratory walls resound as he pounds in protest. After 315 volts, no further answers appear, and the pounding ceases.
2. Voice-Feedback condition: Same as the remote-feedback condition except that vocal protests were introduced that could be heard clearly through the walls of the laboratory.

3. Proximity: Same as the voice-feedback condition except that the student was placed in the same room as the subject, a few feet away. Thus, he was visible as well as audible.
4. Touch-Proximity: Same as the proximity condition except that the student received a shock only when his hand rested on a shock plate. At the 150-volt level, the student demanded to be let free and refused to place his hand on the shock plate. The experimenter ordered the subject to force the victim's hand onto the plate.

Two Key Questions
1. Is there any difference among the mean voltage levels of the four conditions?
2. If there are differences, which conditions specifically are different?

Multiple Regression Model for Analysis of Variance
To answer these questions, we can fit a multiple regression model with voltage level as the response and one categorical explanatory variable (condition). We obtain a sample from each level of the categorical variable (group) and are interested in estimating the population means of the groups based on these samples.
Assumptions of the multiple regression model for one-way analysis of variance:
– Linearity: automatically satisfied.
– Constant variance: check whether the spread within each group is the same.
– Normality: check whether the distribution within each group is normal.
– Independence: the sample consists of independent observations.

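Comparing the Groups
The coefficient on Condition[Proximity] = … means that the proximity condition is estimated to have a mean that is … less than the mean of the means of all the conditions. In other words, each condition's coefficient is that condition's mean minus the average of the four condition means, so adding the coefficient to that average gives the sample mean of the proximity group.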
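The slide's reading of the coefficient (a group's mean minus the mean of all the group means) corresponds to sum-to-zero "effect" coding. The sketch below computes those coefficients directly from group means. It is a minimal illustration under stated assumptions: the helper name `effect_coded_coefficients` is hypothetical, and the voltage numbers are made up for illustration, not Milgram's data.

```python
from statistics import mean


def effect_coded_coefficients(groups):
    """Sum-to-zero (effect) coding for a one-way model.

    Returns (intercept, coefficients): the intercept is the unweighted
    average of the group means, and each coefficient is that group's mean
    minus the intercept, so the coefficients sum to zero.
    """
    group_means = {name: mean(values) for name, values in groups.items()}
    intercept = mean(group_means.values())
    coefficients = {name: m - intercept for name, m in group_means.items()}
    return intercept, coefficients


# Hypothetical voltage levels for illustration only (NOT Milgram's data).
data = {
    "Remote": [450, 450, 300, 450],
    "Voice-Feedback": [450, 300, 375, 315],
    "Proximity": [300, 150, 450, 180],
    "Touch-Proximity": [150, 150, 300, 180],
}

intercept, coefficients = effect_coded_coefficients(data)
for name, coef in coefficients.items():
    # Group sample mean = intercept + coefficient.
    print(f"{name:16s} coef = {coef:8.3f}   mean = {intercept + coef:7.2f}")
```

A negative coefficient (here for Proximity and Touch-Proximity) means that group's sample mean is below the average of the group means, which is exactly the interpretation given on the slide.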

Effect Test
The Effect Test tests the null hypothesis that the means of all four conditions are the same versus the alternative hypothesis that at least two of the conditions have different means. The p-value of the Effect Test is < …, so there is strong evidence that the population means are not the same for all four conditions.
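The Effect Test in a one-way model is the usual ANOVA F-test: the between-group mean square divided by the within-group mean square. The sketch below computes the F statistic and its degrees of freedom from raw data; the function name `one_way_anova_f` is hypothetical. The p-value is the upper-tail area of an F distribution with (k - 1, n - k) degrees of freedom. Python's standard library has no F distribution, so in practice one would get it from a statistics package (for example, SciPy's `scipy.stats.f.sf(F, df1, df2)`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = [SS_between / (k - 1)] / [SS_within / (n - k)], where k is the
    number of groups and n is the total number of observations.
    """
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of observations around their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Small hypothetical example: two groups of three observations.
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
print(f_stat, df1, df2)  # prints 13.5 1 4
```

A large F means the group means differ by much more than the within-group noise would suggest, which is what a small Effect Test p-value reflects.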

JMP for One-Way ANOVA
One-way ANOVA can be carried out in JMP either by using Fit Model with a categorical explanatory variable or by using Fit Y by X with the categorical variable as the explanatory variable. After using the Fit Y by X command, click the red triangle next to Oneway Analysis, then click Display Options, Boxplots to see side-by-side boxplots, and click Mean/ANOVA to see the means of the different groups and the test of whether all groups have the same mean. The p-value for this test is Prob>F in the ANOVA table.

Prob>F = p-value for the test that all groups have the same mean. This is the same as the p-value for the Effect Test in the Fit Model output.

Two Key Questions
1. Is there any difference among the mean voltage levels of the four conditions? Yes, there is strong evidence of a difference (p-value of Effect Test < …).
2. If there are differences, which conditions specifically are different?

Testing Whether Each of the Groups Is Different
Naïve approach to deciding which groups have a mean that is different from the average of the means of all groups: do a t-test for each group and look for groups that have p-value < 0.05. Problem: multiple comparisons.

Errors in Hypothesis Testing

                           State of World
Decision Based on Data     Null Hypothesis True     Alternative Hypothesis True
Accept Null Hypothesis     Correct Decision         Type II error
Reject Null Hypothesis     Type I error             Correct Decision

When we do one hypothesis test and reject the null hypothesis if the p-value < 0.05, the probability of making a Type I error when the null hypothesis is true is 0.05. We protect against falsely rejecting a null hypothesis by making the probability of a Type I error small.

Multiple Comparisons Problem
Compound uncertainty: when doing more than one test, there is an increased chance of making a mistake. If we do multiple hypothesis tests and use the rule of rejecting the null hypothesis in each test if the p-value is < 0.05, then the probability of making at least one Type I error when all the null hypotheses are true is greater than 0.05.
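How fast the chance of at least one false rejection grows can be seen in a simplified setting. The sketch below assumes the k tests are independent, which pairwise comparisons among groups are not (they share groups), so treat the numbers as a rough guide rather than exact rates for ANOVA. The function name `prob_at_least_one_type1` is hypothetical.

```python
def prob_at_least_one_type1(alpha, k):
    """P(at least one Type I error) in k INDEPENDENT tests when every null is true.

    Each test avoids a false rejection with probability 1 - alpha, so all k
    avoid one with probability (1 - alpha) ** k.
    """
    return 1 - (1 - alpha) ** k


# 190 = number of pairs among 20 groups (20 * 19 / 2).
for k in (1, 5, 20, 190):
    print(k, round(prob_at_least_one_type1(0.05, k), 3))
```

With 20 tests at level 0.05 the chance of at least one false rejection is already about 0.64, far above the 0.05 we intended.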

Multiple Comparisons Simulation
In multiplecomp.JMP, 20 groups are compared, with a sample size of ten for each group. The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 20 population means are equal, so every null hypothesis of equal means is true.
[Table: Iteration vs. # of pairs found to have significantly different means using a t-test at level 0.05]
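The simulation described above can be approximated outside JMP. This sketch draws 20 groups of 10 standard normal observations, runs a pooled two-sample t-test on each of the 190 pairs, and counts rejections at level 0.05. Assumptions: the function name `count_significant_pairs` and the seeds are arbitrary, and the critical value 2.100922 is the two-sided 0.05 cutoff for a t distribution with 18 degrees of freedom (taken from a t table), which is only correct because every group has exactly 10 observations. It is not a copy of what multiplecomp.JMP does internally.

```python
import itertools
import math
import random
from statistics import mean, variance

N_GROUPS = 20
N_PER_GROUP = 10
# Two-sided 0.05 critical value of the t distribution with
# 2 * N_PER_GROUP - 2 = 18 degrees of freedom (from a t table).
T_CRIT_DF18 = 2.100922


def count_significant_pairs(rng):
    """Simulate N_GROUPS groups of standard normal data (all true means equal)
    and count pairs whose pooled two-sample t-test rejects at level 0.05."""
    groups = [[rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)] for _ in range(N_GROUPS)]
    summaries = [(mean(g), variance(g)) for g in groups]
    count = 0
    for (m1, v1), (m2, v2) in itertools.combinations(summaries, 2):
        pooled_var = (v1 + v2) / 2  # equal group sizes, so a simple average
        se = math.sqrt(pooled_var * (2 / N_PER_GROUP))
        if abs(m1 - m2) / se > T_CRIT_DF18:
            count += 1
    return count


rng = random.Random(112)  # arbitrary seed
print([count_significant_pairs(rng) for _ in range(5)])
```

Even though no pair truly differs, each iteration typically flags several pairs: on average about 190 × 0.05 = 9.5.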

Multiple Comparisons Simulation
In multiplecomp.JMP, 20 groups are compared, with a sample size of ten for each group. The observations for each group are simulated from a standard normal distribution. Thus, in fact, all 20 population means are equal, so no group's mean truly differs from the average.
[Table: Iteration (1-5) vs. # of groups found to have means different from the average using a t-test and rejecting if p-value < 0.05]

Individual vs. Familywise Error Rate
When several tests are considered simultaneously, they constitute a family of tests.
Individual Type I error rate: the probability, for a single test, that the null hypothesis will be rejected assuming that the null hypothesis is true.
Familywise Type I error rate: the probability, for a family of tests, that at least one null hypothesis will be rejected assuming that all of the null hypotheses are true.
When we consider a family of tests, we want to make the familywise error rate small, say 0.05, to protect against falsely rejecting a null hypothesis.

Bonferroni Method
General method for doing multiple comparisons for any family of k tests. Denote the familywise Type I error rate we want by p*, say p* = 0.05. Compute the p-value p_i for each individual test, and reject the null hypothesis for the ith test if p_i ≤ p*/k. This guarantees that the familywise Type I error rate is at most p*.
Why Bonferroni works: if we do k tests and all null hypotheses are true, then using Bonferroni with p* = 0.05, each test has probability 0.05/k of a Type I error, so we expect to make k*(0.05/k) = 0.05 errors in total, and the probability of making at least one error is at most 0.05.
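The Bonferroni rule is simple enough to state in a few lines. This is a minimal sketch; the function name `bonferroni_reject` and the p-values are hypothetical, chosen only to contrast the Bonferroni cutoff with the naive per-test 0.05 rule.

```python
def bonferroni_reject(p_values, familywise_alpha=0.05):
    """Bonferroni rule: reject test i if p_i <= familywise_alpha / k."""
    k = len(p_values)
    threshold = familywise_alpha / k
    return [p <= threshold for p in p_values]


# Hypothetical p-values from k = 4 tests.
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_reject(p_values))   # threshold is 0.05 / 4 = 0.0125
print([p < 0.05 for p in p_values])  # naive per-test rule, for contrast
```

The naive rule rejects three of the four nulls; Bonferroni rejects only the first, trading some power for familywise protection.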

Tukey's HSD
Tukey's HSD is a method that is specifically designed to control the familywise Type I error rate (at 0.05) for analysis of variance. After Fit Model, click the red triangle next to the X variable and click LSMeans Tukey HSD.

Comparisons between groups that are shown in red are those for which the null hypothesis that the group means are the same is rejected using the Tukey HSD procedure, which controls the familywise Type I error rate at 0.05. A confidence interval for the difference in group means that adjusts for multiple comparisons is shown in the third and fourth lines.

Assumptions in One-Way ANOVA
Assumptions needed for validity of one-way analysis of variance p-values and CIs:
– Linearity: automatically satisfied.
– Constant variance: the spread within each group is the same.
– Normality: the distribution within each group is normal.
– Independence: the sample consists of independent observations.

Rule of Thumb for Checking Constant Variance
Constant variance: look at the standard deviations of the different groups by using Fit Y by X and clicking Means and Std Dev.
Rule of thumb: check whether (highest group standard deviation / lowest group standard deviation) is greater than 2. If it is greater than 2, constant variance is not reasonable and a transformation should be considered. If it is less than 2, constant variance is reasonable.
For Milgram's data, (highest group standard deviation / lowest group standard deviation) = ( … / 63.640) = 2.07 > 2. Thus, constant variance is not reasonable for Milgram's data.
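The rule of thumb is a ratio of the largest to the smallest group standard deviation. A minimal sketch, with hypothetical function names `sd_ratio` and `constant_variance_reasonable` and made-up data; a ratio of exactly 2 is treated as acceptable here, since the notes only flag ratios greater than 2.

```python
from statistics import stdev


def sd_ratio(groups):
    """Highest group standard deviation divided by lowest group standard deviation."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


def constant_variance_reasonable(groups, cutoff=2.0):
    """Rule of thumb from the notes: a ratio greater than 2 signals nonconstant variance."""
    return sd_ratio(groups) <= cutoff


# Hypothetical data for illustration (group SDs are 1, 2 and 5).
groups = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [0, 5, 10]}
print(sd_ratio(groups), constant_variance_reasonable(groups))
```

For Milgram's data the same ratio is 2.07, just over the cutoff.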

Transformations to Correct for Nonconstant Variance
If the standard deviation is highest for groups with high means, try transforming Y to log Y or √Y. If the standard deviation is highest for groups with low means, try transforming Y to Y². For Milgram's data, the SD is particularly low for the group with the highest mean, so try transforming to Y².
To make the transformation, right-click in a new column, click New Column, then right-click again in the created column, click Formula, and enter the appropriate formula for the transformation.

Transformation of Milgram's Data to Squared Voltage Level
Check of constant variance for the transformed data: (highest group standard deviation / lowest group standard deviation) = … < 2, so the constant variance assumption is reasonable for voltage squared. The analysis of variance tests are approximately valid for the voltage squared data, so we reanalyze the data using voltage squared.
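To see why squaring helps when the high-mean group has the smallest spread, the sketch below applies Y² to made-up data with that pattern and recomputes the standard-deviation ratio. The numbers and the helper names `sd_ratio` and `transform` are hypothetical; this does not reproduce the Milgram results.

```python
from statistics import stdev


def sd_ratio(groups):
    """Highest group standard deviation divided by lowest group standard deviation."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


def transform(groups, f):
    """Apply f to every observation, keeping the group structure."""
    return {name: [f(y) for y in values] for name, values in groups.items()}


# Hypothetical data: the high-mean group has the SMALLEST spread,
# the pattern that suggests trying Y squared.
groups = {"High": [400, 450, 450, 450], "Low": [150, 300, 150, 300]}

before = sd_ratio(groups)
after = sd_ratio(transform(groups, lambda y: y ** 2))
print(round(before, 2), round(after, 2))
```

Squaring stretches large values more than small ones, so it inflates the spread of the high-mean group relative to the low-mean group, bringing the ratio from above 2 to below 2 in this example.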

Analysis Using Voltage Squared
There is strong evidence that the group mean voltage squared levels are not all the same. Taking into account the multiple comparisons, there is strong evidence that remote has a higher mean voltage squared level than proximity and touch-proximity, and that voice-feedback has a higher mean voltage squared level than touch-proximity.

Rule of Thumb for Checking Normality in ANOVA
The normality assumption for ANOVA is that the distribution in each group is normal. This can be checked by looking at the boxplot, histogram and normal quantile plot for each group.
If there are more than 30 observations in each group, then the normality assumption is not important: ANOVA p-values and CIs will still be approximately valid even for nonnormal data.
If there are fewer than 30 observations per group, then we can check normality by clicking Analyze, Distribution and then putting the Y variable in the Y, Columns box and the categorical variable denoting the group in the By box. We can then create normal quantile plots for each group and check that, for each group, the points in the normal quantile plot are within the confidence bands.
If there is nonnormality, we can try a transformation such as log Y and see whether the transformed data are approximately normally distributed in each group.
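A normal quantile plot pairs each sorted observation with the standard normal quantile it would have if the data were normal; roughly straight points suggest normality. The sketch below computes those coordinates per group using plotting positions i/(n + 1), one common choice that may differ from JMP's. As a rough numeric summary it also reports the correlation of the plotted points; this is a swapped-in stand-in, not JMP's confidence bands, which are not reproduced here. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean


def normal_quantile_points(values):
    """(theoretical standard normal quantile, observed value) pairs for a
    normal quantile plot, using plotting positions i / (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    z = NormalDist()  # standard normal
    return [(z.inv_cdf(i / (n + 1)), y) for i, y in enumerate(ordered, start=1)]


def straightness(points):
    """Correlation of the plotted points: close to 1 when they lie near a line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical groups: one roughly symmetric, one with a single large value.
groups = {
    "G1": [3.1, 4.0, 4.4, 5.0, 5.2, 5.9, 6.3, 7.1],
    "G2": [1, 1, 1, 1, 1, 1, 1, 50],
}
for name, values in groups.items():
    print(name, round(straightness(normal_quantile_points(values)), 3))
```

Plotting the pairs from `normal_quantile_points` for each group gives the same kind of picture as the per-group normal quantile plots in JMP's Distribution platform.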

One-Way Analysis of Variance: Steps in Analysis
1. Check assumptions (constant variance, normality, independence). If constant variance is violated, try transformations.
2. Use the Effect Test (commonly called the F-test) to test whether all group means are the same.
3. If the Effect Test finds that at least two group means differ, use Tukey's HSD procedure to investigate which groups are different, taking into account the fact that multiple comparisons are being done.