Chi-Squared Test The chi-squared (χ²) test, or Pearson’s chi-squared test, evaluates the likelihood that variation in your results was due to chance.

Presentation transcript:

Chi-Squared Test The chi-squared (χ²) test, or Pearson’s chi-squared test, evaluates the likelihood that variation in your results was due to chance. It can’t tell you whether your independent variable caused the variation, but it can be used as evidence to reject a null hypothesis.

AP Biology Curricula that Incorporate χ²: Genetic Problems, Hardy-Weinberg Equilibrium Problems, Behavior Lab, Mitosis Lab.

Chi-Squared Test χ² = Σ (O − E)² / E, where Σ (sigma) means “the sum of,” E (“Expected”) is the data expected based on the hypothesis, and O (“Observed”) is the data you actually collected.
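The three pieces named above combine into one sum: for each category, take the observed count minus the expected count, square it, divide by the expected count, and add the results. A minimal Python sketch of that sum follows. The function name chi_square and the counts in the usage line are made up for illustration and do not come from the slides.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only:
# two categories with 20 expected in each.
example = chi_square([18, 22], [20, 20])
print(round(example, 2))
```

Here each category contributes (2)² / 20 = 0.2, so the statistic for these made-up counts is 0.4.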

HOW DO YOU USE THIS TABLE PROPERLY? The calculated test statistic is compared to a theoretical probability distribution. The probability (p) values are listed on the chi-square distribution table. First, determine the degrees of freedom: the number of groups (categories) in your data minus one (1). If the probability read from the table is less than .05 (5%), the null hypothesis is rejected: the difference between observed and expected data is unlikely to be due to randomness alone.
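The two steps described here (find the degrees of freedom, then compare against the table at the 5% level) could be sketched as follows. The critical values are the p = 0.05 column of the chi-square distribution table for 1 to 10 degrees of freedom. The names CRITICAL_05, degrees_of_freedom and rejects_null, and the statistic 4.2 in the usage lines, are assumptions made for this sketch.

```python
# Critical values at p = 0.05 for 1 to 10 degrees of freedom
# (the 0.05 column of the chi-square distribution table).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
               6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31}


def degrees_of_freedom(num_categories):
    """Degrees of freedom = number of groups (categories) minus one."""
    return num_categories - 1


def rejects_null(chi2, num_categories):
    """True if chi2 exceeds the 5% critical value for this many categories."""
    df = degrees_of_freedom(num_categories)
    return chi2 > CRITICAL_05[df]


# Hypothetical statistic of 4.2, for illustration only.
two_groups = rejects_null(4.2, 2)    # df = 1, compared against 3.84
three_groups = rejects_null(4.2, 3)  # df = 2, compared against 5.99
print(two_groups, three_groups)
```

The same statistic can lead to different decisions depending on the number of categories, which is why the degrees of freedom must be settled before reading the table.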

Statistics, Analysis, and AP Biology On the AP exam, both multiple choice and free response style questions visit this connection using graphical, analytical, and application-based stems. Let’s take a look at some recent free response questions that illustrate this concept:

CHI-SQUARE DISTRIBUTION TABLE
Columns p = 0.95 to 0.10: Accept Null Hypothesis (chance ONLY). Columns p = 0.05 to 0.001: Reject Null Hypothesis (NOT chance ONLY).

Degrees of        Probability (p)
Freedom     0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
 1          0.004  0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
 2          0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
 3          0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82   11.34  16.27
 4          0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
 5          1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
 6          1.63   2.20   3.07   3.83   5.35   7.23   8.56   10.64  12.59  16.81  22.46
 7          2.17   2.83   3.82   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
 8          2.73   3.49   4.59   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.12
 9          3.32   4.17   5.38   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10          3.94   4.86   6.18   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

In biological applications, a probability of 5% (p = 0.05) is usually adopted as the standard. The conventional criterion for statistical significance is a probability between 0.001 and 0.05.
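Reading across a row of the table brackets the probability: the further right a chi-square value reaches, the smaller p is. A rough sketch of that lookup is below, restricted to the p = 0.10, 0.05, 0.01 and 0.001 columns for 1 to 3 degrees of freedom. The names P_COLUMNS, ROWS and bracket_p, and the statistics in the usage lines, are hypothetical.

```python
# Table excerpt: columns p = 0.10, 0.05, 0.01, 0.001 for df = 1, 2, 3.
P_COLUMNS = [0.10, 0.05, 0.01, 0.001]
ROWS = {
    1: [2.71, 3.84, 6.64, 10.83],
    2: [4.60, 5.99, 9.21, 13.82],
    3: [6.25, 7.82, 11.34, 16.27],
}


def bracket_p(chi2, df):
    """Return the smallest tabulated p whose critical value chi2 exceeds.

    Returns None when chi2 does not exceed even the p = 0.10 column,
    i.e. the probability is greater than 0.10.
    """
    smallest = None
    for p, critical in zip(P_COLUMNS, ROWS[df]):
        if chi2 > critical:
            smallest = p
    return smallest


# Hypothetical statistics, for illustration only.
print(bracket_p(4.0, 1))   # exceeds 3.84 but not 6.64
print(bracket_p(12.0, 2))  # exceeds 9.21 but not 13.82
print(bracket_p(1.0, 1))   # below 2.71
```

A returned value of 0.05 reads as "p < 0.05", and None reads as "p > 0.10", which falls on the accept side of the table.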

The following formula is used: χ² = Σ (O − E)² / E
You need 2 different hypotheses:
1. NULL Hypothesis: Data are occurring by chance and it is all RANDOM! There is NO preference between the groups of data.
2. Alternative Hypothesis: Data are occurring by some outside force. It is NOT by chance and it is NOT RANDOM! There is a preference between the groups of data.

Student Pitfalls in Using χ²
1. Students lack an understanding of the “null hypothesis”.
2. Students have difficulties in determining what is expected.
3. Students have difficulties in working with fractions of different denominators.

The Null Hypothesis The null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, OR, in statistics, a hypothesis proposing that no statistically significant difference exists in a set of given observations. The null hypothesis assumes that no real difference exists between variables. If the data do not support the null hypothesis and it is rejected, then one can conclude there is a statistically significant difference between variables.

Two Types of Hypotheses: 1. NULL HYPOTHESIS states that there is no substantial statistical deviation between observed and expected data. A hypothesis of no difference (or no effect) is called a null hypothesis, symbolized H0. In other words, the results are totally random and occurred by chance alone. There is NO preference. In a test of independence, the null hypothesis states that the two variables are independent, that is, that they have NO relationship to one another.

Hypothesis Example A scientist is studying bees and butterflies. Her hypothesis is that a single bee visiting a flower will pollinate with a higher efficiency than a single butterfly, which will help produce a greater number of seeds in the flower's bean pod. We will call this hypothesis H1, or the alternative hypothesis, because it is an alternative to the null hypothesis. What is the null hypothesis? H0: There is no significant difference between bees and butterflies in the number of seeds produced by the flowers they pollinate.

Two Types of Hypotheses: 2. ALTERNATIVE HYPOTHESIS states that there IS a substantial/significant statistical deviation between observed and expected data. A hypothesis of difference (or effect) is called an alternative hypothesis, symbolized H1. In other words, the results are affected by an outside force and are NOT random and did NOT occur by chance alone. There is a preference.

2 Types of Chi Square Problems

Non-genetic
Null Hypothesis: Data is due to chance and is completely random. There is no preference between the groups/categories.
Alternative Hypothesis: Data is NOT due to chance and there IS a preference between the groups/categories. Data is not random.

Genetic
Null Hypothesis: Data is due to chance and is random due to independent assortment being random. Punnett square ratios are expected. If there are 2+ genes involved in the experiment, there is no gene linkage affecting independent assortment & segregation.
Alternative Hypothesis: Data is NOT due to chance and is NOT random. Punnett square ratios are NOT expected. If there are 2+ genes, there IS gene linkage affecting independent assortment & segregation.

What level of probability does one choose to decide whether two groups differ as a result of NON-CHANCE events or simply because of CHANCE? Most scientists have decided: if the difference between 2+ groups is so great that it would happen by chance fewer than 1 out of 20 times (P < 0.05), then the groups differ significantly. That is, the null hypothesis (due to chance/no difference in data) is rejected. If greater confidence in the results is desired, scientists will choose probability levels of less than 1 in 100 (P < 0.01) or 1 in 1000 (P < 0.001).

Chi-Squared Test Example: let’s test the hypothesis that a coin is weighted towards heads. Null hypothesis: Coin flips are purely chance. If I flip the coin 100 times, and the null hypothesis is correct, it should come up heads about 50 times and tails about 50 times. I do the test, and it comes up heads 68 times and tails 32 times. Chi-squared analysis can help me determine whether that variation is due to chance, i.e. whether my null hypothesis holds any water. You must have at least two possible outcomes in your experiment (heads and tails, here) for the test to work. Chi-square doesn’t work if you don’t have enough data points/trials. An oft-cited magic number is 30, but run as many trials as is reasonable and let the mathematical chips fall where they may.

Chi-Squared Test
Observed: 68 heads, 32 tails
Expected: 50 heads, 50 tails
Heads: (68 − 50)² = 324; 324/50 = 6.48
PLUS
Tails: (32 − 50)² = 324; 324/50 = 6.48
Chi-square = 6.48 + 6.48 = 12.96
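The coin arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation on this slide, using the 68/32 observed and 50/50 expected counts. The variable names are arbitrary.

```python
# Counts from the coin example: 100 flips, a fair coin expected.
observed = {"heads": 68, "tails": 32}
expected = {"heads": 50, "tails": 50}

# One (O - E)^2 / E term per outcome.
contributions = {
    outcome: (observed[outcome] - expected[outcome]) ** 2 / expected[outcome]
    for outcome in observed
}
chi2 = sum(contributions.values())

for outcome, value in contributions.items():
    print(f"{outcome}: {value:.2f}")
print(f"chi-square = {chi2:.2f}")
```

Each outcome contributes 6.48, and the total matches the 12.96 worked out by hand.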

Chi-Squared Test Degrees of freedom: The number of outcomes minus 1. In our coin example, we have two outcomes being tested, heads and tails. That gives us one degree of freedom (2 − 1 = 1). Significance level (the p-value threshold): Basically, how unlikely a result must be under the null hypothesis before you call it non-random. The industry standard is .05: if your chi-square exceeds the critical value for .05, a deviation this large would happen by chance less than 5% of the time if the null hypothesis were true. This is often loosely phrased as “I am 95% positive that this result is non-random.” Likewise, .01 is loosely phrased as 99% and .001 as 99.9% certainty. Use .05 in AP Bio.

Chi-Squared Test Now that you have your chi-square, degrees of freedom, and significance level, you’re nearly done. You just need a chart of critical values. Find your degrees of freedom and your p-value in the row and column headers. Read down and across to find your cell. If your chi-squared value is GREATER than that number, your null hypothesis is REJECTED. You’ve supported your results as non-random. If your chi-squared value is LESS than or EQUAL to that number, your null hypothesis is NOT rejected. The variation is consistent with chance.

Chi-Squared Test Our coin test gave us a chi-square of 12.96. Does that support or reject the null hypothesis? Does this mean that the coin is definitely rigged or definitely fair??

Chi-Squared Test Try this problem: You’re testing to see if fruit flies prefer different fruits: apples, oranges, grapefruits. Null hypothesis: there is no preference. Actual data: Of 147 fly visits that landed on fruit for at least 20 seconds, 48 flies spent at least 20 seconds on an apple, 87 flies spent at least 20 seconds on an orange, and 12 flies spent at least 20 seconds on a grapefruit. Is this variation due to chance?
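One possible way to work this problem is sketched below. It assumes that "no preference" means the 147 visits are expected to split evenly across the three fruits, and it uses 5.99 as the critical value for 2 degrees of freedom at p = 0.05. The variable names are arbitrary.

```python
# Observed fly visits from the problem statement.
observed = {"apple": 48, "orange": 87, "grapefruit": 12}

total = sum(observed.values())
# Assumption: under "no preference", visits split evenly across fruits.
expected_each = total / len(observed)

chi2 = sum(
    (count - expected_each) ** 2 / expected_each
    for count in observed.values()
)
df = len(observed) - 1           # three categories minus one
CRITICAL_05_DF2 = 5.99           # critical value for df = 2 at p = 0.05
reject_null = chi2 > CRITICAL_05_DF2

print(f"expected per fruit = {expected_each:.0f}")
print(f"chi-square = {chi2:.2f}, df = {df}, reject null: {reject_null}")
```

With 49 visits expected per fruit, the statistic comes to about 57.43, far above 5.99, so under these assumptions the variation is unlikely to be due to chance alone.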

Chi-Squared Test You can use means instead of counts, with caution: the test is strictly defined for counts, so a result computed from percentages is only a rough guide. Try this problem: You’re testing to see if fruit flies prefer different fruits: apples, oranges, grapefruits. Null hypothesis: there is no preference. Actual data: You release 30 flies into a container with three fruits and clock how much time they spend on fruit. Some of your flies spend more time on apples or oranges or grapefruits, others less. Altogether, they spend an average of 45% of their time on apples, 28% of their time on oranges, and 27% of their time on grapefruits. Is this variation due to chance?

Chi-Squared Test If you reject the null hypothesis, your results can be reported as “significant” or “statistically significant.” When writing them up, you need to include all of the following: degrees of freedom, number of subjects (N), chi-squared value, and significance level (written as p “less than” the threshold). Round to two decimal places. For instance, I would write of our coin test: Coin flips were found to be non-random in a chi-squared test, χ²(1, N=100) = 12.96, p<.05. From this, we can conclude that coin flips were significantly weighted towards heads.
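A small helper can assemble the write-up string so that no piece is forgotten. This is only a sketch: the function name format_report and its parameters are hypothetical. The usage line relies on the coin test having one degree of freedom (two outcomes minus one).

```python
def format_report(chi2, df, n, alpha):
    """Build a report such as 'χ²(1, N=100) = 12.96, p<.05'."""
    # Drop the leading zero so 0.05 is written as .05.
    alpha_text = f"{alpha:g}".lstrip("0")
    return f"χ²({df}, N={n}) = {chi2:.2f}, p<{alpha_text}"


# Coin test: chi-square 12.96, 1 degree of freedom, 100 flips.
coin_report = format_report(12.96, 1, 100, 0.05)
print(coin_report)
```

The chi-squared value is rounded to two decimal places, as the slide asks.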

Statistics Again: statistics like these don’t answer your question for you. I’m more than 95% confident that the coin flips are non-random, but it doesn’t mean the coin was rigged! Maybe it was the way I flipped it, or air currents, or the table shape, or something else. The stats are like another data point, another piece of evidence. You have to engage your brain and interpret your statistics, no differently than how you must interpret raw data. And a crummy study design can give you great-looking statistics (or terrible ones).