The Chi-square goodness of fit test


Chi-square goodness of fit
Core issue in statistics: when are you seeing just random noise, and when is there a real trend?
Example: to see whether the squash shape and color genes are linked, do a test cross, GgLl × ggll. Do the offspring come out 1 : 1 : 1 : 1?

When to use a chi-square test
Use a chi-square test when your response variable is count data, the response variable has more than one category, you have a hypothesis for the responses you expect, and you want to know whether the difference between the responses you observe and the responses you expect is significant. There are many different statistical tests; to find the appropriate one, take into account what kind of data you have and what hypothesis you are testing.

Turn a hypothesis into a number
Your hypothesis tells you what you expect any given response (observation) to be. Turn that expectation into a fraction or percentage.
Example hypothesis: "The MSU football team will win every single game this season." According to this hypothesis, MSU's chance of winning any game is ___%. (Answer: 100%.) Does this mean MSU will actually win 100% of its games?
Example hypothesis: "The MSU football team's wins and losses will be random." According to this new hypothesis, the team's chance of winning any game is ___%. (Answer: 50%.)

Turn a hypothesis into a number
Hypothesis: "People over the age of 60 are 50% more likely to attend a baseball game than younger people." According to this hypothesis, if I go to a baseball game and find out the ages of all the fans in the audience, what is the probability that any one fan is over 60? Reading "50% more likely" as a difference of 50 percentage points, let x be the percentage of fans over 60, so younger fans make up x - 50 percent: x + (x - 50) = 100, so 2x = 150 and x = 75%, or 3 out of 4. What is the probability that a fan is under 60? (Answer: 25%.)

Turn a hypothesis into a number
"Pre-hypothesis": given the choice, people prefer red and blue m&m's over the other 4 colors, but we don't know how strong their preference might be. So we test the "null hypothesis": people choose m&m colors at random, i.e. they show no preference (as opposed to the "alternative" or "experimental" hypothesis). According to this null hypothesis, if I hand around a bowl of m&m's, the chance of each color being chosen is 1/6 (16.67%), i.e. there is an equal chance of someone choosing any one of the 6 colors. Use a chi-square test to see whether what you actually observe is significantly different from 1/6 per color. The null hypothesis is the "no difference" hypothesis: there is no difference between groups (here, no difference among the colors).
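
A minimal Python sketch of this step. The six color names and the total of 60 picks are illustrative assumptions, not data from the slides; each expected count is simply the total number of picks times the hypothesized proportion.

```python
# Hypothetical sketch: expected counts under the m&m null hypothesis.
# ASSUMPTION: the color names and the 60 total picks are illustrative only.
COLORS = ["red", "orange", "yellow", "green", "blue", "brown"]


def expected_counts(total_picks, proportions):
    """Expected count per category = total picks x hypothesized proportion."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [total_picks * p for p in proportions]


# Null hypothesis: no color preference, so each color has probability 1/6.
null_proportions = [1 / len(COLORS)] * len(COLORS)
expected = expected_counts(60, null_proportions)

for color, count in zip(COLORS, expected):
    print(f"{color:>6}: {count:.2f}")
```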

The chi-square test
The chi-square test determines whether the difference between the responses you observe and the responses you expect is significant. Significant = not due to random chance alone. You calculate the "strength of the difference" and from it get a probability (the p-value): the probability that random chance (noise) alone would produce a difference at least this large if the hypothesis were true. If this probability is small (< 5%), we conclude there is a significant difference between the observed and expected values (the difference is not simply due to chance).

Percentage of fans over 60 years old at each game:

Game   Observed   Expected
1      69         75
2      80         75
3      20         75
4      55         75
5      67         75
6      76         75
7      47         75
8      81         75
9      70         75
10     68         75

(This table illustrates the idea only; an actual chi-square test must be run on counts of fans, not on percentages.)
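
The idea that the test measures how likely chance alone is to produce the observed difference can be made concrete with a simulation. This Python sketch is a Monte Carlo approximation, not the table-based method the slides use; the function names, trial counts, and seed are assumptions. It repeatedly draws random samples of the same size under the hypothesized proportions and reports the fraction whose chi-square statistic is at least as large as the observed one. Note that it works on counts, not percentages.

```python
import random


def chi_square_stat(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, proportions, trials=10_000, seed=1):
    """Estimate how often chance alone gives a difference at least as large as observed.

    Hypothetical helper: simulates `trials` samples of the same size under the
    hypothesized proportions and counts those with a chi-square statistic
    >= the observed statistic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = sum(observed)
    num_categories = len(proportions)
    expected = [n * p for p in proportions]
    observed_stat = chi_square_stat(observed, expected)
    at_least_as_extreme = 0
    for _ in range(trials):
        counts = [0] * num_categories
        for category in rng.choices(range(num_categories), weights=proportions, k=n):
            counts[category] += 1
        # small tolerance so ties are not lost to floating-point rounding
        if chi_square_stat(counts, expected) >= observed_stat - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Two extreme illustrative inputs (not real data): a perfect fit and an all-one-color sample.
even = simulated_p_value([10] * 6, [1 / 6] * 6, trials=2000)
skewed = simulated_p_value([60, 0, 0, 0, 0, 0], [1 / 6] * 6, trials=2000)
print(even, skewed)
```

A perfect fit gives p = 1 (chance always produces at least that much difference), while 60 picks of a single color essentially never happen by chance, so p is about 0.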

Interpreting the chi-square test
Hypothesis: "People over the age of 60 are 50% more likely to attend a baseball game than younger people."
If the test tells you your data are not significantly different from what you expect (observed ≈ expected; your data have a "good fit" to the expected values), you support the hypothesis; strictly speaking, you fail to reject it. Note: no statistical test ever proves a hypothesis!
If the test tells you your data are significantly different from what you expect (observed ≠ expected), you reject the hypothesis.

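What is chi-square?
The "chi-square" symbol is χ² (Greek letter chi). The statistic is

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where Σ means "sum of" over all categories, and the expected values are based on your hypothesis.

Set up a table with one row per category:

Category     Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Category 1
Category 2
…
                                                              χ² total

Degrees of freedom = number of categories minus 1 = N - 1 (here N is the number of categories, not the number of observations).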
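
The formula and the degrees-of-freedom rule translate directly into code. A minimal Python sketch; the function name chi_square is a hypothetical choice, and the toy counts in the usage line are made up purely to show the call.

```python
def chi_square(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit table.

    observed, expected: counts per category, listed in the same order.
    Degrees of freedom = number of categories - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same categories")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Toy usage (made-up counts): 2 categories, so 1 degree of freedom.
stat, df = chi_square([12, 8], [10, 10])
print(f"chi-square = {stat:.2f}, df = {df}")
```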

Example problem #1
A university biology department would like to hire a new professor. They advertised the opening and received 220 applications, 25% of which came from women. The department came up with a "short list" of their 25 favorite candidates, 5 women and 20 men. You want to know if there is evidence that the search committee is biased against women.
Note: if the committee is unbiased, the proportion of women on the short list should match the proportion of women among all the applicants.
Define your hypothesis. Set up the table. Expected counts: Women: 25 × 0.25 = 6.25; Men: 25 × 0.75 = 18.75.

         Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Women    5          6.25       -1.25       1.5625         0.25
Men      20         18.75      1.25        1.5625         0.08
Total    25         25                                    χ² = 0.33

Degrees of freedom = 2 - 1 = 1
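
The same table can be rebuilt in Python. A sketch only; the dictionary layout and variable names are illustrative choices, while the counts and the 25% share come from the problem.

```python
# Example problem #1: is the 25-person short list consistent with the 25% female applicant pool?
applicant_share = {"Women": 0.25, "Men": 0.75}  # shares among the 220 applications
short_list = {"Women": 5, "Men": 20}            # observed counts on the short list

n = sum(short_list.values())  # 25 candidates
total = 0.0
print(f"{'':6} {'Obs':>4} {'Exp':>6} {'Obs-Exp':>8} {'(Obs-Exp)^2':>12} {'(Obs-Exp)^2/Exp':>16}")
for group, share in applicant_share.items():
    obs = short_list[group]
    exp = n * share  # expected count if the committee is unbiased
    diff = obs - exp
    contribution = diff ** 2 / exp
    total += contribution
    print(f"{group:6} {obs:>4} {exp:>6.2f} {diff:>8.2f} {diff ** 2:>12.4f} {contribution:>16.2f}")

df = len(applicant_share) - 1
print(f"chi-square total = {total:.2f}, degrees of freedom = {df}")
```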

Chi-square probability table
Reading the table: if p > 0.05, the observed values are not significantly different from the expected values (the differences can be attributed to random chance), so support the hypothesis. If p < 0.05, the observed values are significantly different from the expected values (the differences are not just due to random chance), so reject the hypothesis.
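
Instead of reading a printed probability table, the p-value can be computed directly. A minimal Python sketch, assuming integer degrees of freedom; chi2_sf and decide are hypothetical helper names, and alpha = 0.05 is the 5% cutoff used on these slides. In practice a library routine such as SciPy's scipy.stats.chi2.sf does the same job.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with `df` degrees of freedom >= x), for integer df >= 1.

    Uses the closed forms for df = 1 (erfc) and df = 2 (exp) and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # work in logs to avoid overflow for large x
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def decide(statistic, df, alpha=0.05):
    """Slide rule: p < alpha -> reject; otherwise support (fail to reject) the hypothesis."""
    p = chi2_sf(statistic, df)
    verdict = "reject hypothesis" if p < alpha else "support hypothesis (fail to reject)"
    return p, verdict


# Example problem #1: chi-square = 0.33 with 1 degree of freedom.
p, verdict = decide(1 / 3, 1)
print(f"p = {p:.3f}: {verdict}")
```

As a sanity check, the familiar table critical values 3.841 (df = 1), 5.991 (df = 2) and 7.815 (df = 3) all give p ≈ 0.05 with this function.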

Chi-square probability table
For Example problem #1 (χ² = 0.33 with 1 degree of freedom), the probability range is 0.5 < p < 0.6. This means that if the hypothesis were true, random chance alone would produce a difference between observed and expected values at least this large 50 to 60% of the time. If p > 0.05, the observed values are not significantly different from expected (support the hypothesis); if p < 0.05, they are significantly different (reject the hypothesis).
So, is the department biased against women applicants?

Example problem #2
Work in groups.

Example problem #2
Hypothesis: body color and wing size are unlinked genes.
Expected ratio? 9:3:3:1 (the phenotype ratio from a cross between two double heterozygotes when the genes are unlinked).
Expected values (102 flies in total):
Gray, normal wings (G_W_): 9/16 × 102 = 57.375
Gray, vestigial wings (G_ww): 3/16 × 102 = 19.125
Ebony, normal wings (ggW_): 3/16 × 102 = 19.125
Ebony, vestigial wings (ggww): 1/16 × 102 = 6.375

              Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Gray norm.    53         57.375     -4.375      19.141         0.333
Gray vest.    16         19.125     -3.125      9.766          0.511
Ebony norm.   25         19.125     5.875       34.516         1.805
Ebony vest.   8          6.375      1.625       2.641          0.414
Total         102        102                                   χ² = 3.063

Degrees of freedom = 4 - 1 = 3
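
A Python sketch that reproduces the Example problem #2 calculation and computes the p-value directly instead of reading the table. chi2_sf is a hypothetical helper (the closed-form chi-square upper tail for integer degrees of freedom); the counts and the 9:3:3:1 ratio come from the slide.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    q, k = (math.erfc(math.sqrt(half)), 1) if df % 2 else (math.exp(-half), 2)
    while k < df:
        # recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Example problem #2: expected 9:3:3:1 if body color and wing size are unlinked.
ratio = {"Gray normal": 9, "Gray vestigial": 3, "Ebony normal": 3, "Ebony vestigial": 1}
observed = {"Gray normal": 53, "Gray vestigial": 16, "Ebony normal": 25, "Ebony vestigial": 8}

n = sum(observed.values())  # 102 flies
parts = sum(ratio.values())  # 16
expected = {k: n * r / parts for k, r in ratio.items()}
statistic = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in ratio)
df = len(ratio) - 1
p = chi2_sf(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, p = {p:.2f}")
```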

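Chi-square probability table
For Example problem #2 (χ² = 3.063 with 3 degrees of freedom), the probability range is 0.3 < p < 0.4. This means that if the hypothesis were true, random chance alone would produce a difference between observed and expected values at least this large 30 to 40% of the time. Since p > 0.05, support the hypothesis.
Biological meaning?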

Example problem #3
Using chi-square to test for linked genes

Example problem #3
1. Hypothesis: squash color and shape are not linked genes. OR: squash color and shape are linked genes.
2. Describe the phenotypes and circle the recombinants among the test-cross offspring: LlGg, Llgg, llGg, llgg.
3. If the two genes are not linked, the expected phenotype ratio is 1:1:1:1.
4. If the two genes are linked (completely, with no recombinant offspring), the expected phenotype ratio is 1:0:0:1.

Example problem #3
If you test the hypothesis that squash shape and color are NOT LINKED (1:1:1:1):
5. Calculate the expected number of offspring for each phenotype:
Wild Wild (LlGg): 509/4 = 127.25
Wild Orange (Llgg): 127.25
Round Wild (llGg): 127.25
Round Orange (llgg): 127.25

                Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Wild Wild       228        127.25     100.75      10150.56       79.8
Wild Orange     17         127.25     -110.25     12155.06       95.5
Round Wild      21         127.25     -106.25     11289.06       88.7
Round Orange    243        127.25     115.75      13398.06       105.3
Total           509        509                                   χ² = 369.3

Degrees of freedom = 3
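
A Python sketch of the 1:1:1:1 calculation. Rather than computing an exact p-value, it compares the statistic with standard chi-square table critical values for 3 degrees of freedom (7.815 at p = 0.05 and 11.345 at p = 0.01), which mirrors how the slides read the probability table. Variable names are illustrative.

```python
# Example problem #3, hypothesis "not linked": expected ratio 1:1:1:1.
observed = {
    "Wild Wild (LlGg)": 228,
    "Wild Orange (Llgg)": 17,
    "Round Wild (llGg)": 21,
    "Round Orange (llgg)": 243,
}
n = sum(observed.values())  # 509 offspring
expected = n / 4            # 127.25 per phenotype class under 1:1:1:1

statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

# Critical values from a standard chi-square table for df = 3.
CRITICAL_DF3 = {0.05: 7.815, 0.01: 11.345}
print(f"chi-square = {statistic:.1f}, df = {df}")
print("p < 0.01" if statistic > CRITICAL_DF3[0.01] else "p >= 0.01")
```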

Chi-square probability table
For χ² = 369.3 with 3 degrees of freedom, the probability range is p < 0.01.
Statistical meaning: if the hypothesis were true, random chance alone would produce a difference between observed and expected values this large less than 1% of the time. The observed and expected values are significantly different. Reject the hypothesis.
Biological meaning?

Example problem #3
If you test the hypothesis that squash shape and color ARE LINKED (1:0:0:1):
5. Calculate the expected number of offspring for each phenotype:
Wild Wild (LlGg): 509/2 = 254.5
Wild Orange (Llgg): 0
Round Wild (llGg): 0
Round Orange (llgg): 509/2 = 254.5

                Observed   Expected   Obs - Exp   (Obs - Exp)²   (Obs - Exp)²/Exp
Wild Wild       228        254.5      -26.5       702.25         2.76
Wild Orange     17         0          17          289            undefined (counted as 0)
Round Wild      21         0          21          441            undefined (counted as 0)
Round Orange    243        254.5      -11.5       132.25         0.52
Total           509        509                                   χ² = 3.28

Degrees of freedom = 3
Caution: (Obs - Exp)²/Exp is undefined when the expected count is 0, so counting those terms as 0 is a classroom shortcut rather than a valid chi-square test; strictly, any recombinant offspring contradict the hypothesis of complete linkage.
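
A short Python sketch makes the problem with the zero expected counts explicit. Under complete linkage the Wild Orange and Round Wild classes have an expected count of 0, so their (Obs - Exp)²/Exp terms are not finite; the helper below (a hypothetical name) returns infinity in that case, and dropping those terms reproduces the slide's 3.28.

```python
import math

observed = [228, 17, 21, 243]      # LlGg, Llgg, llGg, llgg
n = sum(observed)                  # 509 offspring
expected = [n / 2, 0, 0, n / 2]    # 1:0:0:1 under complete linkage


def contribution(o, e):
    """(Obs - Exp)^2 / Exp; infinite when Exp = 0 but Obs > 0 (the hypothesis forbids what was seen)."""
    if e == 0:
        return 0.0 if o == 0 else math.inf
    return (o - e) ** 2 / e


terms = [contribution(o, e) for o, e in zip(observed, expected)]
strict_total = sum(terms)  # inf: recombinant offspring contradict complete linkage
shortcut_total = sum(t for t in terms if math.isfinite(t))  # slide's shortcut: drop undefined terms
print(f"strict chi-square = {strict_total}, slide shortcut = {shortcut_total:.2f}")
```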

Chi-square probability table
For χ² = 3.28 with 3 degrees of freedom, the probability range is 0.3 < p < 0.4.
Statistical meaning: if the hypothesis were true, random chance alone would produce a difference between observed and expected values at least this large 30 to 40% of the time. The observed and expected values are not significantly different. Support the hypothesis.
Biological meaning?

Example problem #3
Hypothesis "not linked" → p < 0.01 → reject the hypothesis.
Hypothesis "linked" → 0.3 < p < 0.4, in other words p > 0.05 → support the hypothesis.
Are these test results in agreement? So do these data show that the genes are linked or not? If you weren't very confident in your test results, what could you do next to improve your confidence?