Math 10, Spring 2019 Introductory Statistics

Lecture 22: The Chi-square test

Is the die loaded?

Categorical Random Variable: The Box Model. The random variable is not numerical; it is “categorical”. There are 6 mutually exclusive categories, and each observation falls in one and only one category. We don’t calculate the average number of spots of the 60 observations; instead, we tabulate the frequency with which each category occurs.

Observed frequencies: roll the die n = 60 times. The sample average is 3.75, but we are interested in the whole distribution, not just the average.

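Incorrect: z-test. Box model for 3 spots, the category with the highest frequency: one ticket marked 1 (3 spots) and five tickets marked 0, with n = 60 draws. Count = sum of the 1s = 17. Expected value = 60 × 1/6 = 10. Average of box = 1/6; SD of box = $\sqrt{(1/6)(5/6)} \approx 0.37$. SE = $\sqrt{60} \times 0.37 \approx 2.9$. $z = (17 - 10)/2.9 \approx 2.4$, so p ≈ 1%.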
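
The arithmetic of this (incorrect) z-test can be reproduced in a few lines. This is a sketch only: it assumes the 0-1 box described above and a one-sided, upper-tail normal area, which comes out near 0.8% and is rounded to about 1% on the slide. The point of the slide is that this test is the wrong one to run after looking at the data.

```python
import math

# Box model for one category (3 spots): one ticket "1" and five tickets "0".
n = 60            # number of rolls (draws from the box)
observed = 17     # number of rolls showing 3 spots
p = 1 / 6         # chance of 3 spots if the die is fair

expected = n * p                          # expected count: 10
sd_box = math.sqrt(p * (1 - p))           # SD of a 0-1 box, about 0.37
se = math.sqrt(n) * sd_box                # SE of the count, about 2.9
z = (observed - expected) / se            # about 2.4
# Upper-tail normal area: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SE = {se:.1f}, z = {z:.1f}, P = {p_one_sided:.1%}")
```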

“Data Snooping”: the null hypothesis must be formulated in advance. Incorrect null hypothesis: “The chance of getting 3 spots is 1/6.” This null hypothesis is formulated after the fact (!): with multiple categories, one of them is likely to have a large z-value just by chance. The null hypothesis must be formulated before you run the trial. Correct null hypothesis: “The die is fair.”

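Pearson’s chi-squared statistic: $\chi^2 = \sum_{\text{categories}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$. For the 60 rolls of the die, $\chi^2 = 14.2$.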
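
Pearson’s statistic is a direct sum over categories. The sketch below uses a hypothetical set of 60 rolls, not the lecture’s data (whose table is not reproduced here), so its value is not 14.2; the function name is illustrative.

```python
def pearson_chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequencies of 1, 2, ..., 6 spots in 60 rolls (illustration only).
observed = [12, 7, 13, 9, 11, 8]
expected = [60 / 6] * 6          # fair die: 10 of each face
stat = pearson_chi_squared(observed, expected)
print(stat)
```

Here the deviations are 2, −3, 3, −1, 1, −2, so the statistic is (4 + 9 + 9 + 1 + 1 + 4) / 10 = 2.8.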

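Chi-squared distribution: P-value. Number of categories: k = 6. Degrees of freedom: k − 1 = 5. P = 1 − CHISQ.DIST(14.2, 5, TRUE) = 1.4%. Equivalently, P = CHISQ.TEST(observed range, expected range), which returns the P-value directly from the observed and expected frequencies.
[Figure: chi-squared curves for df = 5 (solid) and df = 10 (dashed)]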
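
Outside a spreadsheet, the same P-value can be computed from the chi-squared distribution function. The sketch below is an assumption-labeled stand-in for CHISQ.DIST(x, df, TRUE): it evaluates the regularized lower incomplete gamma function by its power series, using only the Python standard library (a statistics package would normally be used instead).

```python
import math

def chi2_cdf(x, df):
    """P(chi-squared with df degrees of freedom <= x), like CHISQ.DIST(x, df, TRUE)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_p_value(stat, df):
    """Right-tail area beyond stat: 1 - CHISQ.DIST(stat, df, TRUE)."""
    return 1.0 - chi2_cdf(stat, df)

k = 6                              # categories (faces of the die)
p = chi2_p_value(14.2, k - 1)      # df = k - 1 = 5
print(f"P = {p:.1%}")
```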

Chi-squared versus z-test. z-test: compares the average of a sample with an expected average; the hypothesis is only about the average of the numbers in the box. Chi-squared test: compares the entire distribution of the sample with an expected distribution; the hypothesis is about the distribution of categories in the box.

Input for the chi-squared test: the observed frequencies for all categories, and the expected frequencies for all categories under the null hypothesis. P = CHISQ.TEST(observed range, expected range). The number of draws from the box, n, is not a separate input to the calculation. Degrees of freedom = k − 1 (with k = number of categories).

The chi-squared curve is an approximation for large n. The actual distribution of the chi-squared statistic for 60 rolls of a die is more jagged than the theoretical chi-squared curve. Practical requirement: each expected frequency in the table should be at least 5.
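
A quick simulation shows why the real distribution is jagged. With 60 rolls and 10 expected per face, the deviations from 10 sum to zero, so the sum of their squares is even and the statistic can only take values that are multiples of 0.2; its average under the null hypothesis is still k − 1 = 5. This is a sketch with an arbitrary seed and number of repetitions.

```python
import random

def pearson_chi_squared(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_fair_die_stats(rolls=60, reps=20000, seed=1):
    """Chi-squared statistics from `reps` simulated experiments of `rolls` fair-die rolls."""
    rng = random.Random(seed)
    expected = [rolls / 6] * 6
    stats = []
    for _ in range(reps):
        counts = [0] * 6
        for _ in range(rolls):
            counts[rng.randrange(6)] += 1
        stats.append(pearson_chi_squared(counts, expected))
    return stats

stats = simulate_fair_die_stats()
mean = sum(stats) / len(stats)
distinct = sorted(set(round(s, 6) for s in stats))
print(f"mean = {mean:.2f}; {len(distinct)} distinct values, smallest: {distinct[:5]}")
```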

Chi-squared test for independence: one “box”, two categorical random variables. Use the chi-squared test to see whether the two variables are independent. Independence of random variables: see Ch. 5. Homework problem: Ch. 13 #5 (p. 235).

Handedness and gender: HANES data, people aged 25–34 in the U.S.

Calculating expected values. Null hypothesis: gender and handedness are independent, so P(man and left-handed) = P(man) × P(left-handed). P(man) = 1,067 / 2,237 = 47.7%. P(left-handed) = 205 / 2,237 = 9.16%. Expected P(left-handed man) = 47.7% × 9.16% = 4.37%. Expected number of left-handed men = 0.0437 × 2,237 ≈ 97.8.
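
The same expected count follows from the row and column totals as row total × column total / grand total, which avoids rounding the intermediate percentages. A minimal sketch using the totals quoted above (the function name is illustrative):

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence: N * P(row) * P(column)."""
    return grand_total * (row_total / grand_total) * (col_total / grand_total)

men, left_handed, total = 1067, 205, 2237
left_handed_men = expected_count(men, left_handed, total)
print(round(left_handed_men, 1))
```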

Observed and expected frequencies, and their differences

Chi-squared = 12

Degrees of freedom = 2. For an m × n table, degrees of freedom = (m − 1) × (n − 1); for the 3 × 2 table, (3 − 1) × (2 − 1) = 2.
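
Putting the pieces together, a test for independence computes every expected count from the margins, sums Pearson’s terms over all cells, and uses (m − 1) × (n − 1) degrees of freedom. The sketch below is checked on a small hypothetical 2 × 2 table, not on the HANES data.

```python
def independence_test(table):
    """Pearson chi-squared statistic and degrees of freedom for an m x n table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand   # independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical table: all margins are 30, so every expected count is 15.
stat, df = independence_test([[10, 20], [20, 10]])
print(stat, df)
```

Each cell deviates from 15 by 5, so the statistic is 4 × 25 / 15 ≈ 6.67 with 1 degree of freedom.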

Conclusion. Null hypothesis: “Handedness and gender are independent.” Chi-squared = 12, degrees of freedom = 2. P = 1 − CHISQ.DIST(12, 2, TRUE) ≈ 0.25%. Small P-value: the null hypothesis is rejected. Based on the HANES data, handedness and gender are very likely not independent.
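
For 2 degrees of freedom the chi-squared curve is an exponential density with mean 2, so the right-tail area has a closed form that makes it easy to check this P-value by hand:

```latex
\begin{aligned}
f_2(x) &= \tfrac{1}{2} e^{-x/2}, \qquad x > 0,\\
P(\chi^2_2 > 12) &= \int_{12}^{\infty} \tfrac{1}{2} e^{-x/2}\,dx = e^{-6} \approx 0.0025 = 0.25\%.
\end{aligned}
```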

Chi-squared statistics can be pooled. For two or more independent experiments, you can add the separate chi-squared statistics and add up the degrees of freedom (you do not need to know the sample sizes). Total chi-squared = 5.8 + 3.1 = 8.9. Total degrees of freedom = 5 + 2 = 7. P = 1 − CHISQ.DIST(8.9, 7, TRUE) = 26%.
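
Pooling is just two sums, followed by an ordinary P-value lookup. This sketch computes the right-tail area from the standard closed forms for integer degrees of freedom (a Poisson-type sum for even df, a normal tail plus correction terms for odd df); both function names are illustrative, not a library API.

```python
import math

def chi2_p_value(stat, df):
    """Right-tail chi-squared area for a positive integer df (closed forms)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    x = float(stat)
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        half = x / 2
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: 2 * (1 - Phi(sqrt(x))) + sqrt(2/pi) * exp(-x/2) * (sqrt(x) + x^1.5/3 + x^2.5/15 + ...)
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    term, extra = root, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        extra += term
        term *= x / (2 * i + 1)
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * extra

def pool(results):
    """Add chi-squared statistics and degrees of freedom from independent experiments."""
    return sum(s for s, _ in results), sum(d for _, d in results)

total_stat, total_df = pool([(5.8, 5), (3.1, 2)])
p = chi2_p_value(total_stat, total_df)
print(f"chi-squared = {total_stat:.1f}, df = {total_df}, P = {p:.0%}")
```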

When P is very large: chi-squared = 0.51, degrees of freedom = 3. CHISQ.DIST(0.51, 3, TRUE) = 8.3%, so P = 1 − 8.3% = 91.7% (CHISQ.TEST(…, …) gives the same 91.7%). Example: Fisher’s use of the chi-squared test on Mendel’s data.
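
The two tails answer different questions: the right tail (the usual P-value) measures how surprising a fit this bad would be, while the left tail measures how surprising a fit this good would be. For 3 degrees of freedom the left tail has a closed form; this sketch assumes that form and an illustrative function name.

```python
import math

def chi2_cdf_df3(x):
    """Left-tail area P(chi-squared with 3 df <= x), like CHISQ.DIST(x, 3, TRUE)."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

left = chi2_cdf_df3(0.51)   # chance of a fit at least this close to expected
p_value = 1 - left          # usual right-tail P-value
print(f"left tail = {left:.1%}, P = {p_value:.1%}")
```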

Fisher and Mendel. Fisher pooled the chi-squared statistics for many of Mendel’s genetic experiments. Null hypothesis: “Mendel’s genetic model is correct.” The pooled P-value was very large, so we cannot reject the null hypothesis. But the probability of getting experimental data this close to the expected outcome is extremely small. Most likely, Mendel’s experimental data were manipulated.