HARDY-WEINBERG EQUILIBRIUM


Population Genetics Lab 2: Binomial Probability & Hardy-Weinberg Equilibrium

Last Week: Sample Point Method
Example: Use the sample point method to find the probability of getting exactly two heads in three tosses of a balanced coin.
1. The sample space of this experiment is:

Outcome   Toss 1   Toss 2   Toss 3   Shorthand   Probability
1         Head     Head     Head     HHH         1/8
2         Head     Head     Tail     HHT         1/8
3         Head     Tail     Head     HTH         1/8
4         Tail     Head     Head     THH         1/8
5         Tail     Tail     Head     TTH         1/8
6         Tail     Head     Tail     THT         1/8
7         Head     Tail     Tail     HTT         1/8
8         Tail     Tail     Tail     TTT         1/8

2. Assuming that the coin is fair, each of these 8 outcomes has a probability of 1/8.
3. The probability of getting two heads is the sum of the probabilities of outcomes 2, 3, and 4 (HHT, HTH, and THH), or 1/8 + 1/8 + 1/8 = 3/8 = 0.375.
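The sample point method can also be checked by brute force. The sketch below is one possible way to do it in Python, not part of the original lab; the variable names are illustrative. It lists every sample point for three tosses and adds up the probabilities of the points with exactly two heads.

```python
from fractions import Fraction
from itertools import product

# Enumerate every sample point for three tosses of a fair coin.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Each of the 8 sample points is equally likely.
point_probability = Fraction(1, len(sample_space))

# Keep the sample points with exactly two heads and add up their probabilities.
two_heads = [point for point in sample_space if point.count("H") == 2]
p_two_heads = point_probability * len(two_heads)

print(sorted(two_heads))
print(p_two_heads, float(p_two_heads))
```

The same enumeration becomes impractical very quickly as the number of tosses grows, which is the motivation for a closed-form approach.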

Sample point method:
Example: Find the probability of getting exactly 10 heads in 30 tosses of a balanced coin.
Total # of sample points = $2^{30}$ = 1,073,741,824

Need a way of accounting for all the possibilities
Example: In drawing 3 M&Ms from an unlimited M&M bowl that is always 60% red and 40% green, what is P(2 green)?
If one green M&M is just as good as another…

Binomial Probability Distribution
$P(Y = y) = \binom{n}{y} s^{y} f^{n-y}$
Where,
n = Total # of trials.
y = Total # of successes.
s = Probability of getting a success in a single trial.
f = Probability of getting a failure in a single trial (f = 1 - s).
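A minimal Python sketch of the binomial probability, assuming the standard form P(Y = y) = C(n, y) s^y f^(n-y) with the symbols defined above. The function name binomial_probability is hypothetical. It is applied here to the M&M question (3 draws, green counted as a success with probability 0.4).

```python
from math import comb

def binomial_probability(n: int, y: int, s: float) -> float:
    """Probability of exactly y successes in n independent trials."""
    f = 1.0 - s  # probability of failure in a single trial
    return comb(n, y) * s**y * f**(n - y)

# M&M example: 3 draws, "success" = green with probability 0.4.
p_two_green = binomial_probability(n=3, y=2, s=0.4)
print(round(p_two_green, 3))

# The probabilities over all possible numbers of successes should sum to 1.
total = sum(binomial_probability(3, y, 0.4) for y in range(4))
print(round(total, 6))
```

The comb(n, y) term counts the orderings that give the same number of greens, which is what "one green M&M is just as good as another" amounts to.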

Assumptions of Binomial Distribution
Trials are independent, finite in number, and conducted under the same conditions.
There are only two types of outcome (e.g., success and failure).
Outcomes are mutually exclusive and independent.
Probability of getting a success in a single trial remains constant throughout all the trials.
Probability of getting a failure in a single trial remains constant throughout all the trials.
Number of successes is a non-negative integer between 0 and n.

Properties of Binomial Distribution
Mean or expected # of successes in n trials: E(y) = ns
Variance of y: V(y) = nsf
Standard deviation of y: SD(y) = $\sqrt{V(y)} = \sqrt{nsf}$

Example: Find the probability of getting exactly 10 heads in 30 tosses of a balanced coin.
Solution: We know n = 30, y = 10, s = 0.5, f = 0.5, so
$P(Y = 10) = \binom{30}{10} (0.5)^{10} (0.5)^{20} = 30{,}045{,}015 / 2^{30} \approx 0.028$

Example: Find the expected # of heads in 30 tosses of a balanced coin. Also calculate the variance.
Solution:
E(Y) = ns = 30*0.5 = 15
V(Y) = nsf = 30*0.5*0.5 = 7.5
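As a cross-check on the closed-form mean and variance, the following Python sketch (an illustration, not part of the original lab) recomputes both by summing over the whole binomial distribution for n = 30 and s = 0.5. The variable names are illustrative.

```python
from math import comb, sqrt

n, s = 30, 0.5
f = 1 - s

# Closed-form properties of the binomial distribution.
mean = n * s
variance = n * s * f
std_dev = sqrt(variance)

# Cross-check by summing over the full binomial distribution.
pmf = [comb(n, y) * s**y * f**(n - y) for y in range(n + 1)]
mean_check = sum(y * p for y, p in enumerate(pmf))
variance_check = sum((y - mean_check) ** 2 * p for y, p in enumerate(pmf))

print(mean, variance, round(std_dev, 3))
print(round(mean_check, 6), round(variance_check, 6))
```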

Problem 1 (10 minutes)
A nuclear allozyme locus has three alleles, A1, A2, and A3, with frequencies 0.847, 0.133, and 0.020, respectively. If we sample 30 diploid individuals, what is the probability of:
a) Not finding any copies of A2?
b) Finding at least one copy of A2?
c) GRADUATE STUDENTS ONLY: Finding fewer than 2 copies of A2?

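Example: How many diploid individuals should be sampled to detect at least one copy of allele A2 from Problem 1 with probability of at least 0.95?
Solution: Let n be the number of alleles sampled. The probability of finding no copies of A2 is $(1 - 0.133)^n = 0.867^n$, so we need
$1 - 0.867^n \geq 0.95$
$0.867^n \leq 0.05$
$n \ln(0.867) \leq \ln(0.05)$
$n \geq \ln(0.05) / \ln(0.867) \approx 20.99$
Note that the inequality flips in the last step because we divide by $\ln(0.867)$, which is a negative number.
Thus, to detect at least one copy of allele A2 with probability of at least 0.95, one would need to sample at least 21 alleles (i.e., at least 11 diploid individuals).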
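The sample-size calculation can be sketched in Python as follows. This is an illustration under the assumption that alleles are sampled independently; the function name min_alleles_to_detect is hypothetical. It solves 1 - (1 - freq)^n >= target for the smallest whole n.

```python
from math import ceil, log

def min_alleles_to_detect(allele_freq: float, target_prob: float) -> int:
    """Smallest n with 1 - (1 - allele_freq)**n >= target_prob."""
    # log(1 - allele_freq) is negative, so dividing by it flips the inequality.
    return ceil(log(1 - target_prob) / log(1 - allele_freq))

# Frequency of allele A2 from Problem 1 and the required detection probability.
n_alleles = min_alleles_to_detect(allele_freq=0.133, target_prob=0.95)
n_diploid = ceil(n_alleles / 2)  # each diploid individual carries two alleles
print(n_alleles, n_diploid)
```

Rounding up twice (once for alleles, once for individuals) keeps the detection probability at or above the target.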

Problem 2 (15 minutes)
The frequency of red-green color-blindness is 0.07 for men and 0.005 for women. You are designing a survey to determine the effect of color-blindness on educational success. How many males and females would you have to sample to ensure that the probability of including at least one color-blind individual of each sex would be 0.90 or greater?

Estimation of allele frequency for co-dominant locus
$p = \frac{2N_{11} + N_{12}}{2N}$
$q = \frac{2N_{22} + N_{12}}{2N}$
Where,
p = Frequency of allele A1
q = Frequency of allele A2
N11 = # of individuals with genotype A1A1
N12 = # of individuals with genotype A1A2
N22 = # of individuals with genotype A2A2
N = Total # of diploid individuals = N11 + N12 + N22

Estimation of Standard Error
$SE_p = SE_q = \sqrt{\frac{pq}{2N}}$
Where,
p = Frequency of allele A1
q = Frequency of allele A2
SEp = Standard error for frequency of allele A1
SEq = Standard error for frequency of allele A2
N = Total # of diploid individuals = N11 + N12 + N22

Standard Deviation v. Standard Error
Standard deviation: variability of individuals around the sample mean
Standard error: variability of sample means around the population mean
For approximately normally distributed data, we expect ~68% of the data to fall within 1 standard deviation of the mean.

            Genotype 1   Genotype 2   …   Genotype j   SUM
A1          A11          A12          …   A1j          $\sum_{j} A_{1j}$
A2          A21          A22          …   A2j          $\sum_{j} A_{2j}$
…
Ai          Ai1          Ai2          …   Aij          $\sum_{j} A_{ij}$

$A_{ij} \geq 0$, i, j = 1, 2, 3, …

Example: What are the allele frequencies of alleles A1 and A2, if the following genotypes have been observed in a sample of 50 diploid individuals?

Genotype   Count
A1A1       17
A1A2       23
A2A2       10

Solution: N11 = 17, N12 = 23, and N22 = 10, so
$p = \frac{2(17) + 23}{2(50)} = \frac{57}{100} = 0.57$
If p = 0.57, what is q?

q = 1 - p = 0.43
What do you notice about the standard error of q?
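The allele-counting estimate for this example can be sketched in Python as below. The standard error line is an assumption: it treats the 2N sampled alleles as binomial draws, giving sqrt(pq / 2N). Variable names are illustrative.

```python
from math import sqrt

# Genotype counts from the example: A1A1, A1A2, A2A2.
n11, n12, n22 = 17, 23, 10
n = n11 + n12 + n22  # total diploid individuals

# Each homozygote carries two copies of its allele, each heterozygote one.
p = (2 * n11 + n12) / (2 * n)
q = (2 * n22 + n12) / (2 * n)

# Assumed standard error: binomial sampling of 2N alleles.
se_p = sqrt(p * (1 - p) / (2 * n))
se_q = sqrt(q * (1 - q) / (2 * n))

print(round(p, 2), round(q, 2))
print(round(se_p, 4), round(se_q, 4))
```

With only two alleles, p(1 - p) and q(1 - q) are the same product, so the two standard errors come out identical.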

Problem 3 (10 minutes)
Estimate the allele frequencies (include their respective standard errors) for alleles A1, A2, and A3 if the following genotypes have been observed in a sample of 200 individuals:

Genotype   Count
A1A1       19
A2A2       17
A3A3       14
A1A2       52
A1A3       57
A2A3       41

Estimation of allele frequency for dominant locus
For dominant loci, we only know the genotype of homozygous recessive individuals (in absence of sequence data).
$q = \sqrt{\frac{N_{22}}{N}}$
(Note: this is an estimate)
Where,
q = Frequency of allele A2
N22 = # of individuals with genotype A2A2
N = Total # of diploid individuals = N11 + N12 + N22
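A small Python sketch of the dominant-locus estimate, assuming Hardy-Weinberg proportions so that the frequency of homozygous recessive individuals equals q squared. The function name recessive_allele_freq and the counts used are hypothetical and are not taken from the lab.

```python
from math import sqrt

def recessive_allele_freq(n_recessive: int, n_total: int) -> float:
    """Estimate q from the fraction of homozygous recessive individuals.

    Assumes Hardy-Weinberg proportions, so that freq(A2A2) = q**2.
    """
    return sqrt(n_recessive / n_total)

# Hypothetical counts, not taken from the lab: 16 recessive phenotypes in 100.
q = recessive_allele_freq(n_recessive=16, n_total=100)
p = 1 - q
print(round(q, 3), round(p, 3))
```

Unlike the co-dominant case, this estimate depends on the HWE assumption itself, so it cannot be used to test for departure from HWE.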

Problem 4 (15 minutes)
Go to the “Genetics Home Reference” website (http://ghr.nlm.nih.gov) and use the search feature to find a condition caused by a dominant allele in humans. On the main description page, find the frequency of the condition in human populations. Assuming HWE and Mendelian inheritance of the disease, what is the frequency of the recessive allele in this population? What is the standard error of this estimate? How many affected children would you expect in the next generation (global, U.S., etc.)? What are the assumptions of these estimates?
E.g., http://ghr.nlm.nih.gov/condition/cornelia-de-lange-syndrome

Hypothesis Testing
Hypothesis: Tentative statement for a scientific problem that can be tested by further investigation.
Null Hypothesis (H0): There is no significant difference between observed and expected values.
Alternate Hypothesis (H1): There is a significant difference between observed and expected values.
Example:
H0 = Fertilized and unfertilized crops have equal yields
H1 = Fertilized and unfertilized crops do not have equal yields

Remember: In the final conclusion after the experiment, we either "Reject H0 in favor of H1" or "Fail to reject H0".

Type I error: Error due to rejection of a null hypothesis when it is actually true (false positive).
Level of significance (LOS) (α): Maximum probability allowed for committing a Type I error. At 5% LOS (α = 0.05), we accept that if we were to repeat the experiment many times when H0 is true, we would falsely reject the null hypothesis 5% of the time.

                     H0 is TRUE            H0 is FALSE
Fail to reject H0    1-α                   β (Type II error)
Reject H0            α (Type I error)      1-β

P-value: Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.
If the P-value is smaller than a particular value of α, then the result is significant at that level of significance.

Testing departure from HWE
In a randomly mating population, allele and genotype frequencies remain constant from generation to generation.
H0 = There is no significant difference between observed and expected genotype frequencies (i.e., the population is in HWE)
H1 = There is a significant difference between observed and expected genotype frequencies (i.e., the population is not in HWE)

HWE Assumptions
Random mating
No selection
Equal numbers of offspring per parent
All progeny equally fit
No mutation
Single, very large population
No migration

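χ2 test
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Where,
O = Observed number in each genotype class
E = Expected number in each genotype class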

Is this population in Hardy-Weinberg equilibrium?
Example: A population of Mountain Laurel at Cooper’s Rock State Forest has the following observed genotype counts:

Genotype   Observed number
A1A1       5000
A1A2       3000
A2A2       2000

Allele frequencies: p = (2(5000) + 3000) / (2(10000)) = 0.65, q = 1 - p = 0.35

Genotype   Expected frequency under HWE    Expected number
A1A1       p^2 = 0.65^2 = 0.4225           0.4225 × 10000 = 4225
A1A2       2pq = 0.455                     0.455 × 10000 = 4550
A2A2       q^2 = 0.1225                    0.1225 × 10000 = 1225

Genotype   Obs. # (O)   Exp. # (E)   (O-E)    (O-E)^2    (O-E)^2/E
A1A1       5000         4225         775      600625     142.1598
A1A2       3000         4550         -1550    2402500    528.0220
A2A2       2000         1225         775      600625     490.3061
χ2                                                       1160.488
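The table can be reproduced with a few lines of Python. This is a sketch for illustration only, with illustrative variable names: it estimates p from the observed counts, builds the expected counts under HWE, and sums (O - E)^2 / E over the three genotype classes.

```python
# Observed genotype counts: A1A1, A1A2, A2A2.
observed = [5000, 3000, 2000]
n = sum(observed)

# Allele frequencies estimated from the genotype counts.
p = (2 * observed[0] + observed[1]) / (2 * n)
q = 1 - p

# Expected counts under Hardy-Weinberg proportions p^2, 2pq, q^2.
expected = [p * p * n, 2 * p * q * n, q * q * n]

# Chi-square statistic: sum of (O - E)^2 / E over genotype classes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected])
print(round(chi_square, 3))
```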

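The critical value (table value) of χ2 at 1 df and at α = 0.05 is approx. 3.84.
The degrees of freedom are the number of genotype classes, minus 1, minus the number of allele frequencies estimated independently from the data: df = 3 - 1 - 1 = 1.
Conclusion: Because the calculated value of χ2 (1160.49) is greater than the critical value (3.84), we reject the null hypothesis in favor of the alternative (the population is not in HWE).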
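The comparison against the critical value can also be expressed as a P-value. The Python sketch below is an illustration that only covers the 1 df case, where the χ2 statistic is a squared standard normal and the upper-tail probability reduces to the complementary error function. The function name chi_square_p_value_1df is hypothetical.

```python
from math import erfc, sqrt

def chi_square_p_value_1df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 df."""
    # With 1 df the statistic is a squared standard normal, so the tail
    # probability can be written with the complementary error function.
    return erfc(sqrt(chi_square / 2))

critical_value = 3.8415   # table value for 1 df at alpha = 0.05
alpha = 0.05
chi_square = 1160.488     # statistic from the Mountain Laurel example

reject_h0 = chi_square > critical_value
p_value = chi_square_p_value_1df(chi_square)
print(reject_h0, p_value < alpha)
print(round(chi_square_p_value_1df(critical_value), 3))
```

Both decision rules agree by construction: the statistic exceeds the critical value exactly when the P-value falls below α. For more than 1 df this shortcut does not apply and a χ2 table or a statistics library would be needed.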

Problem 5 (10 minutes)
Based on the observed genotype counts in Problem 3, test whether the population that had been sampled is in HWE. Critical chi-square values are given in the table below. Think carefully about which one you should use (Hint: How many parameters are estimated from the data when the allele frequencies of 3 alleles are estimated?). What are some possible explanations for the observed results?

d.f.   Critical value of χ2
1      3.8415
2      5.9915
3      7.8147
4      9.4877
5      11.0705
6      12.5916