Anova and contingency tables

Presentation transcript:

Week 12: Anova and contingency tables

Two categorical variables
Joint probabilities: $p_{x,y} = P(X = x \text{ and } Y = y)$, the proportion of popn with values (x, y)
Example: school performance and wt of children

Conditional probabilities Proportion within row

Conditional probabilities School performance and wt of children Weight & performance are independent

Independence from sample? 214 child skiers classified by skiing ability and whether they got injured Are ability and injury independent in underlying population?

Independence from sample? Conditional sample proportions Is there independence in underlying population?
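A minimal Python sketch of the conditional sample proportions for the skier data. The counts are the observed counts given in these slides for the 214 child skiers; the variable names are illustrative assumptions, not part of the original presentation.

```python
# Observed counts (injured, uninjured) by skiing ability, as given in the slides.
observed = {
    "Beginner": (20, 60),
    "Intermediate": (9, 84),
    "Advanced": (2, 39),
}

# Conditional sample proportion injured within each row (ability group).
prop_injured = {
    ability: injured / (injured + uninjured)
    for ability, (injured, uninjured) in observed.items()
}

# Overall proportion injured, ignoring ability.
total_injured = sum(injured for injured, _ in observed.values())
total_children = sum(inj + uninj for inj, uninj in observed.values())
overall_prop = total_injured / total_children

for ability, p in prop_injured.items():
    print(f"{ability:<12} {p:.3f}")
print(f"{'Overall':<12} {overall_prop:.3f}")
```

If ability and injury were independent in the population, the three row proportions would all be expected to lie close to the overall proportion.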

Testing for independence Can a relationship observed in the sample data be inferred to hold in the population represented by the data? Could observed sample relationship have occurred by chance?

Expected counts under independence
31 out of 214 injured overall
Expect 31/214 of the 80 beginners to be injured
i.e. expect 80 × 31/214 = 11.59 injured beginners

Expected counts under independence
General formula: expected count = (row total × column total) / overall total
               Injured   Uninjured   Total
Beginner                                80
Intermediate                            93
Advanced                                41
Total               31         183     214

Observed and estimated counts (expected counts under independence in brackets)
               Injured      Uninjured     Total
Beginner       20 (11.59)   60 (68.41)       80
Intermediate    9 (13.47)   84 (79.53)       93
Advanced        2  (5.94)   39 (35.06)       41
Total          31          183              214
Are the differences more than would be expected by chance?
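The expected counts can be reproduced from the margins alone. The sketch below applies the rule (row total × column total) / overall total to the skier table; the dictionary layout and names are assumptions made for illustration.

```python
# Marginal totals from the skier table in the slides.
row_totals = {"Beginner": 80, "Intermediate": 93, "Advanced": 41}
col_totals = {"Injured": 31, "Uninjured": 183}
grand_total = sum(row_totals.values())  # 214 children in all

# Expected count in each cell if ability and injury are independent.
expected = {
    (row, col): r * c / grand_total
    for row, r in row_totals.items()
    for col, c in col_totals.items()
}

for (row, col), value in expected.items():
    print(f"{row:<12} {col:<9} {value:6.2f}")
```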

Chi-squared test of independence
H0: independence of injury & experience
HA: association between injury & experience
or equivalently
H0: P(injury | beginner) = P(injury | intermediate) = ...
HA: P(injury | experience) depends on experience
Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells of the table

Chi-squared test of independence
Small values are consistent with independence.
Big values arise when the observed counts are very different from what would be expected under independence.
p-value = Prob($\chi^2$ as big as obtained) if indep
Tail area of chi-squared distribution
d.f. of chi-squared = (rows–1)(cols–1)

Chi-squared distributions Skewed to the right distributions. Minimum value is 0. Indexed by the degrees of freedom.

Skiing injury and experience
Chi-Square Test: Injured, Uninjured
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
(rows 1 to 3: Beginner, Intermediate, Advanced)
         Injured   Uninjured   Total
    1         20          60      80
           11.59       68.41
           6.105       1.034
    2          9          84      93
           13.47       79.53
           1.484       0.251
    3          2          39      41
            5.94       35.06
           2.613       0.443
Total         31         183     214
Chi-Sq = 11.930, DF = 2, P-Value = 0.003
Strong evidence that the chance of injury is related to experience.
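As a rough check on the printed output, the statistic can be recomputed by hand in Python. This is a sketch, not the software used in the slides. It sums (observed − expected)² / expected over the six cells, and it relies on the fact that for exactly 2 degrees of freedom the chi-squared upper tail area is exp(−x/2); for other degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be needed instead.

```python
import math

# Observed counts: rows = Beginner, Intermediate, Advanced;
# columns = Injured, Uninjured (from the slides).
observed = [[20, 60], [9, 84], [2, 39]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum the chi-squared contributions over every cell.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected_count = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - expected_count) ** 2 / expected_count

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: the closed form below is valid only when df == 2.
assert df == 2
p_value = math.exp(-chi_sq / 2)

print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```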

Ear Infections and Xylitol Experiment: n = 533 children randomized to 3 groups Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge Response = Did child have an ear infection?

Ear Infections and Xylitol Moderately strong evidence of differences between probs of infection

Making friends
With whom do you find it easiest to make friends: opposite sex, same sex or no difference?

Making friends
H0: No difference in distribution of responses of men and women (no relationship between gender & response)
HA: Difference in distribution of responses of men and women (association between gender & response)
Chi-Square Test: Opposite sex, Same sex, No difference
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
         Opposite sex   Same sex   No difference   Total
    1              58         16              63     137
                48.79      19.38           68.83
                1.740      0.590           0.494
    2              15         13              40      68
                24.21       9.62           34.17
                3.507      1.188           0.996
Total              73         29             103     205
Chi-Sq = 8.515, DF = 2, P-Value = 0.014
Fairly strong evidence of a difference between Females (1) & Males (2)
Females more likely to choose opposite sex

Comparing means of 3+ groups Do best students sit in the front of a classroom? Seat location and GPA for n = 384 students Students sitting in the front generally have slightly higher GPAs than others. Chance?

Seat location and GPA
H0: $\mu_1 = \mu_2 = \mu_3$
HA: The means are not all equal.
p-value = 0.0001. Such big differences between sample means would be unlikely if popn means were the same.
Extremely strong evidence that the means are not all the same.

Seat location and GPA 95% CIs for separate means: Main difference seems to be between front and others

Assumptions for F-test
Independent random samples.
Normal distribution within each population.
Perhaps different population means.
Same standard deviation, $\sigma$, in each group.
Can still proceed if n is big or assumptions approx hold

F ratio
More evidence of a real difference when:
Group means are far apart
Variability within groups is small
How do you measure:
Variation between means?
Variation within groups?

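Variation between means
Between-groups sum of squares: $SS_{Groups} = \sum_i n_i (\bar{x}_i - \bar{x})^2$, where $n_i$ and $\bar{x}_i$ are the size and mean of group $i$ and $\bar{x}$ is the overall mean
Mean sum of squares for groups (k groups): $SS_{Groups} / (k - 1)$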

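Variation within groups
Within-groups sum of squares (also called residual sum of squares): $SS_{Error} = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$
Mean residual sum of squares: $SS_{Error} / (N - k)$, where $N$ is the total sample size and $k$ the number of groups
Best estimate of error st devn, $\sigma$: the square root of the mean residual sum of squares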

Total variation Total sum of squares = SSTotal SSTotal = SSGroups + SSError

Analysis of variance table (Anova table)
F test is based on F ratio = (mean sum of squares for groups) / (mean residual sum of squares)
p-value = Prob of such a high F ratio if all means same
(p-value found from an ‘F distribution’)
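The sums of squares and the F ratio can be sketched in a few lines of Python. The GPA values below are hypothetical, invented purely for illustration (the slides do not list the raw data for the 384 students), and the names `ss_groups`, `ss_error` and `f_ratio` are assumptions. The p-value is not computed here, since it needs an F distribution tail area (for example from `scipy.stats.f.sf`).

```python
# Hypothetical GPAs by seat location (illustrative only, not the slides' data).
groups = {
    "front": [3.4, 3.1, 3.6, 3.3],
    "middle": [3.0, 2.9, 3.2, 3.1],
    "back": [2.8, 3.0, 2.7, 3.1],
}

k = len(groups)
all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)
grand_mean = sum(all_values) / n_total

ss_groups = 0.0  # variation between group means
ss_error = 0.0   # variation within groups (residual)
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_groups += len(values) * (group_mean - grand_mean) ** 2
    ss_error += sum((x - group_mean) ** 2 for x in values)

# Total variation about the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

ms_groups = ss_groups / (k - 1)       # mean sum of squares for groups
ms_error = ss_error / (n_total - k)   # mean residual sum of squares
f_ratio = ms_groups / ms_error

print(f"SSGroups = {ss_groups:.3f}, SSError = {ss_error:.3f}, SSTotal = {ss_total:.3f}")
print(f"F ratio = {f_ratio:.2f}")
```

The printed sums illustrate the decomposition SSTotal = SSGroups + SSError: a large F ratio arises when the group means are far apart relative to the variability within groups.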

Seat location and GPA (again)
H0: $\mu_1 = \mu_2 = \mu_3$
HA: The means are not all equal.
p-value = P(F ≥ 6.69) under H0 = 0.0001. Such a big F ratio would be unlikely if popn means were the same.
Extremely strong evidence that the means are not all the same.