Published by Ariel Lloyd. Modified over 9 years ago.
1
Lecture 3: Allele Frequencies and Hardy-Weinberg Equilibrium
August 27, 2012
2
Last Time
- Review of genetic variation and Mendelian genetics
- Methods for detecting variation
  - Morphology
  - Allozymes
  - DNA markers
    - Anonymous
    - Sequence-tagged
3
Today
- Sequence probability calculation
- Molecular markers: DNA sequencing
- Introduction to statistical distributions
- Estimating allele frequencies
- Introduction to Hardy-Weinberg Equilibrium
- Using Hardy-Weinberg: estimating allele frequencies for dominant loci
4
If nucleotides occur randomly in a genome, which sequence should occur more frequently?
  AGTTCAGAGT
  AGTTCAGAGTAACTGATGCT
What is the expected probability of each sequence occurring once?
How many times would each sequence be expected to occur by chance in a 100 Mb genome?
5
  AGTTCAGAGT
  AGTTCAGAGTAACTGATGCT
What is the expected probability of each sequence occurring once?
- What is the sample space for the first position? A, T, G, C
- Probability of “A” at that position?
- Probability of “A” at position 1, “G” at position 2, “T” at position 3, etc.?
6
  AGTTCAGAGT
  AGTTCAGAGTAACTGATGCT
How many times would each sequence be expected to occur in a 100 Mb genome?
Why is this calculation wrong?
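The naive calculation the slides ask for can be sketched as follows. This is a minimal illustration, assuming the four nucleotides are equally likely and independent at every position, and counting start positions on a single strand only. The function names are hypothetical, and the simplifications are exactly what the slide's last question asks you to examine.

```python
def sequence_probability(seq: str, base_prob: float = 0.25) -> float:
    """Probability of observing `seq` at one fixed position,
    assuming independent bases that each occur with probability `base_prob`."""
    return base_prob ** len(seq)


def expected_occurrences(seq: str, genome_length: int) -> float:
    """Expected number of occurrences of `seq` in a random genome,
    counting every possible start position on one strand."""
    start_positions = genome_length - len(seq) + 1
    return start_positions * sequence_probability(seq)


GENOME_LENGTH = 100_000_000  # 100 Mb, as in the slide

for seq in ("AGTTCAGAGT", "AGTTCAGAGTAACTGATGCT"):
    print(seq, sequence_probability(seq), expected_occurrences(seq, GENOME_LENGTH))
```

Under these assumptions the 10-mer is expected roughly 95 times, while the 20-mer is expected far less than once.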
7
  AGTTCAGAGTAACTGATGCT
A: UCA AGU CUC AUU GAC UAC GA
   Ser Ser Leu Ile Asp Tyr
B: UGA AGU CUC AUU GAC UAG GA
   Stop Ser Leu Ile Asp Stop
8
DNA Sequencing
- Direct determination of the sequence of bases at a location in the genome
  - Shotgun versus PCR sequencing
- Dye terminators (Sanger) and capillaries revolutionized DNA sequencing
- Modern sequencing methods (sequencing by synthesis, pyrosequencing) have catapulted sequencing into the realm of population genetics
- The human genome originally took 10 years and hundreds of millions of dollars to sequence
- Now we can do it in a week for <$2,000
9
SNPs
- A Single Nucleotide Polymorphism (SNP) is a single base mutation in DNA.
- The most common source of genetic polymorphism (e.g., 90% of all human DNA polymorphisms).
- Identify SNPs by screening a sample of individuals from the study population: usually 16 to 48
- Once identified, SNPs are assayed in populations using high-throughput methods
10
Genotyping by Sequencing
- New sequencing methods generate tens of millions of short sequences per run
- Combine restriction digests with sequencing and pooling to genotype thousands of markers covering the genome at very high density
  http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf
Generate tens of thousands of markers for <$100 per sample
[Figure labels: Presence-Absence Polymorphism, SNP]
11
Genotyping by Sequencing Cost Example
http://www.maizegenetics.net/gbs-overview
12
Statistical Distributions: Normal Distribution
- Many types of estimates follow a normal distribution
  - Can be visualized as a frequency distribution (histogram)
  - Can be interpreted as a probability density function
Expected Value (Mean): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the number of samples
Variance ($V_x$): a measure of the dispersion around the mean: $V_x = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation (sd): a measure of dispersion around the mean that is on the same scale as the mean: $sd = \sqrt{V_x}$
[Figure: normal curve with 1 sd and 2 sd marked]
13
Standard Error of Mean
- Standard Deviation is a measure of how individual points differ from the mean estimate in a single sample
- Standard Error is a measure of how much the estimate differs from the true parameter value (in the case of means, μ)
  - If you repeated the experiment, how close would you expect the mean estimate to be to your previous estimate?
Standard Error of the Mean (se): $se = \frac{sd}{\sqrt{n}}$
95% Confidence Interval: $\bar{x} \pm 1.96\,se$
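The quantities on this slide can be computed as in the sketch below. It is an illustration only: the sample values are made up, the function name is hypothetical, the standard deviation uses the n - 1 denominator, and the interval uses the normal approximation (mean ± 1.96 se).

```python
import math
import statistics


def mean_se_ci(values, z=1.96):
    """Return (mean, sd, se, (ci_low, ci_high)) for a list of measurements.

    Uses the sample standard deviation (n - 1 denominator) and a
    normal-approximation confidence interval of mean +/- z * se.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(n)
    return mean, sd, se, (mean - z * se, mean + z * se)


# Made-up measurements, for illustration only.
sample = [4.1, 5.0, 4.6, 5.3, 4.8, 5.2, 4.4, 4.9]
mean, sd, se, ci = mean_se_ci(sample)
print(f"mean={mean:.3f} sd={sd:.3f} se={se:.3f} 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The standard deviation describes the spread of individual points, whereas the standard error shrinks as n grows because it describes the precision of the mean itself.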
14
Estimating Allele Frequencies, Codominant Loci
- Measured allele frequency is the maximum likelihood estimator of the true frequency of the allele in the population (see Hedrick, pp. 82-83 for derivation)
- Expected number of observations of allele A1: E(Y) = np
  - where n is the number of samples
  - For diploid organisms, n = 2N, where N is the number of individuals sampled
- Expected number of observations of allele A1 is analogous to the mean of a sample from a normal distribution
- Allele frequency can also be interpreted as an estimate of the mean
15
Allele Frequency Example
- Assume a population of Mountain Laurel (Kalmia latifolia) at Cooper’s Rock, WV
  Red buds (A1A1): 5000
  Pink buds (A1A2): 3000
  White buds (A2A2): 2000
- Phenotype is determined by a single, codominant locus: Anthocyanin
- What is the frequency of “red” alleles (A1) and “white” alleles (A2)?
Frequency of A1 = p
Frequency of A2 = q
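For a codominant locus the allele frequencies follow directly from counting alleles: each homozygote carries two copies of its allele and each heterozygote carries one copy of each. A minimal sketch using the bud counts above (the function name is hypothetical):

```python
def allele_frequencies(n_a1a1: int, n_a1a2: int, n_a2a2: int):
    """Return (p, q), the frequencies of A1 and A2, from genotype counts
    at a two-allele codominant locus in a diploid organism."""
    n_individuals = n_a1a1 + n_a1a2 + n_a2a2
    n_alleles = 2 * n_individuals            # diploid: two alleles per individual
    p = (2 * n_a1a1 + n_a1a2) / n_alleles    # copies of A1
    q = (2 * n_a2a2 + n_a1a2) / n_alleles    # copies of A2
    return p, q


# Red (A1A1) = 5000, pink (A1A2) = 3000, white (A2A2) = 2000
p, q = allele_frequencies(5000, 3000, 2000)
print(p, q)  # 0.65 0.35
```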
16
Allele Frequencies are Distributed as Binomials
- Binomials are variables that can be interpreted as the number of successes and failures in a series of trials
- Based on samples from a population
  - For a two-allele system, each sample is like a “trial”
  - Does the individual contain allele A1?
  - Remember, q = 1 - p, so only one parameter is estimated
$P(Y = y) = \binom{n}{y} s^{y} f^{n-y}$
  - $\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$: number of ways of observing y positive results in n trials
  - $s^{y} f^{n-y}$: probability of observing y positive results in n trials once
where s is the probability of a success, and f is the probability of a failure
17
Given the allele frequencies that you calculated earlier for Cooper’s Rock Kalmia latifolia, what is the probability of observing two “white” alleles in a sample of two plants?
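One reading of this question treats the two plants as four independently sampled allele copies (n = 4) and asks for exactly two “white” alleles (y = 2). The sketch below applies the binomial formula under that assumption, with q = 0.35 taken from the Cooper’s Rock counts. The function name is hypothetical, and other readings of the question would change n and y.

```python
from math import comb


def binomial_probability(y: int, n: int, s: float) -> float:
    """Probability of exactly y successes in n independent trials,
    where s is the probability of success on each trial."""
    f = 1.0 - s  # probability of failure
    return comb(n, y) * s**y * f ** (n - y)


q = 0.35                     # frequency of the "white" allele A2
n_alleles = 2 * 2            # two diploid plants -> four allele copies
prob_two_white = binomial_probability(2, n_alleles, q)
print(round(prob_two_white, 4))
```

Here `comb(4, 2)` counts the six orderings in which two of the four sampled alleles can be white.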
18
Variation in Allele Frequencies, Codominant Loci
- Binomial variance is pq or p(1-p)
- Variance in number of observations of A1: V(Y) = np(1-p)
- Variance in allele frequency estimates (codominant, diploid): $V_p = \frac{p(1-p)}{2N}$
- Standard Error of allele frequency estimates: $SE_p = \sqrt{\frac{p(1-p)}{2N}}$
- Notice that estimates get better as sample size increases
- Notice also that variance is greatest at intermediate allele frequencies
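Both observations on this slide can be checked numerically. The sketch below assumes the usual binomial sampling variance for a diploid codominant locus, p(1-p)/(2N), where N is the number of individuals sampled. The function names and sample sizes are illustrative.

```python
import math


def allele_freq_variance(p: float, n_individuals: int) -> float:
    """Sampling variance of an allele frequency estimate: p(1-p) / (2N)."""
    return p * (1.0 - p) / (2 * n_individuals)


def allele_freq_se(p: float, n_individuals: int) -> float:
    """Standard error of an allele frequency estimate."""
    return math.sqrt(allele_freq_variance(p, n_individuals))


# Standard error shrinks as the number of sampled individuals grows.
for n in (20, 200, 2000):
    print(n, round(allele_freq_se(0.65, n), 4))

# For a fixed sample size, variance peaks at p = 0.5.
for p in (0.05, 0.125, 0.5, 0.875, 0.95):
    print(p, allele_freq_variance(p, 20))
```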
19
Maximum variance as a function of allele frequency for a codominant locus
20
Why is variance highest at intermediate allele frequencies?
[Figure: two targets, one with p = 0.5 and one with p = 0.125]
If this were a target, how variable would your outcome be in each case (red versus white hits)?
Variance is constrained when the value approaches its limits (0 or 1)
21
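What if there are more than 2 alleles?
- General formula for calculating allele frequencies in a multiallelic system with codominant alleles: $p_i = \frac{2N_{ii} + \sum_{j \neq i} N_{ij}}{2N}$, where $N_{ii}$ is the number of homozygotes for allele i, $N_{ij}$ is the number of heterozygotes carrying alleles i and j, and N is the number of individuals sampled
- Variance and Standard Error of allele frequency estimates remain: $V_{p_i} = \frac{p_i(1-p_i)}{2N}$ and $SE_{p_i} = \sqrt{\frac{p_i(1-p_i)}{2N}}$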
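Allele counting extends to any number of codominant alleles: every genotype contributes one count to each of its two alleles, so homozygotes contribute two counts to the same allele. A minimal sketch with made-up genotype counts for a three-allele locus (the function name and data are hypothetical):

```python
from collections import Counter


def multiallelic_frequencies(genotype_counts):
    """Estimate allele frequencies from codominant genotype counts.

    `genotype_counts` maps (allele, allele) tuples to the number of
    individuals observed with that genotype.
    """
    allele_counts = Counter()
    for (first, second), count in genotype_counts.items():
        allele_counts[first] += count   # one copy from each position;
        allele_counts[second] += count  # homozygotes add two to the same allele
    total_alleles = sum(allele_counts.values())  # equals 2N
    return {allele: n / total_alleles for allele, n in allele_counts.items()}


# Made-up counts for 100 individuals, for illustration only.
counts = {
    ("A1", "A1"): 30, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 15, ("A2", "A3"): 15, ("A3", "A3"): 10,
}
print(multiallelic_frequencies(counts))
```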
22
How do we estimate allele frequencies for dominant loci?
[Figure: genotypes A1A1, A1A2, and A2A2 compared at a codominant locus and a dominant locus, scored + / -]
23
Hardy-Weinberg Law
- After one generation of random mating, single-locus genotype frequencies can be represented by a binomial (with 2 alleles) or a multinomial function of allele frequencies
Frequency of A1A1 (P): $P = p^2$
Frequency of A1A2 (H): $H = 2pq$
Frequency of A2A2 (Q): $Q = q^2$
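At a dominant locus, A1A1 and A1A2 individuals look alike, so alleles cannot be counted directly. If the population is assumed to be in Hardy-Weinberg proportions, the recessive homozygote frequency equals q squared, so q can be estimated as the square root of the recessive phenotype frequency. The sketch below illustrates this with made-up counts; the function name is hypothetical, and the estimate is only as good as the Hardy-Weinberg assumption.

```python
import math


def dominant_locus_frequencies(n_recessive: int, n_total: int):
    """Estimate (p, q) at a dominant locus, assuming Hardy-Weinberg proportions.

    n_recessive: individuals showing the recessive phenotype (A2A2)
    n_total:     all individuals scored
    """
    recessive_freq = n_recessive / n_total   # estimate of Q = q^2
    q = math.sqrt(recessive_freq)
    p = 1.0 - q
    return p, q


# Made-up example: 16 of 100 individuals show the recessive phenotype.
p, q = dominant_locus_frequencies(16, 100)
print(round(p, 3), round(q, 3))
```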
24
How does Hardy-Weinberg Work?
- Reproduction is a sampling process
- Example: Mountain Laurel at Cooper’s Rock
  Red Flowers (A1A1): 5000
  Pink Flowers (A1A2): 3000
  White Flowers (A2A2): 2000
  Frequency of A1 = p = 0.65
  Frequency of A2 = q = 0.35
What are expected numbers of phenotypes and genotypes in a sample of 20 trees?
  Phenotypes: 10 red : 6 pink : 4 white
  Genotypes: 10 A1A1 : 6 A1A2 : 4 A2A2
  Alleles: A1 = 26 : A2 = 14
What are expected frequencies of alleles in pollen and ovules?
25
What will be the genotype and phenotype frequencies in the next generation? What assumptions must we make?
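If pollen and ovules each carry A1 with probability p = 0.65 and unite at random, the next generation's genotype frequencies are the products of the gamete frequencies. The sketch below works through that arithmetic for the Cooper’s Rock example. It assumes random mating and unchanged allele frequencies, and the function name is hypothetical.

```python
def hardy_weinberg_genotypes(p: float) -> dict:
    """Expected genotype frequencies after one generation of random mating."""
    q = 1.0 - p
    return {
        "A1A1": p * p,      # A1 pollen x A1 ovule
        "A1A2": 2 * p * q,  # A1 x A2 plus A2 x A1
        "A2A2": q * q,      # A2 pollen x A2 ovule
    }


p = 0.65  # frequency of A1 in both pollen and ovules
expected = hardy_weinberg_genotypes(p)
sample_size = 20
for genotype, freq in expected.items():
    print(genotype, round(freq, 4), round(freq * sample_size, 2))
```

Under these assumptions the expected frequencies are about 0.42 red, 0.46 pink, and 0.12 white, which differs from the 0.50 : 0.30 : 0.20 split in the parental counts.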
26
Hardy-Weinberg Equilibrium
- After one generation of random mating, genotype frequencies remain constant as long as allele frequencies remain constant
- Provides a convenient neutral model to test for departures from assumptions
- Allows genotype frequencies to be represented by allele frequencies, which simplifies calculations
27
Hardy-Weinberg Assumptions
- Diploid
- Large population
- Random mating: equal probability of mating among genotypes
- No mutation
- No gene flow
- Equal allele frequencies between sexes
- Nonoverlapping generations
28
Graphical Representation of Hardy-Weinberg Law
$(p + q)^2 = p^2 + 2pq + q^2 = 1$
29
Relationship Between Allele Frequencies and Genotype Frequencies under Hardy-Weinberg
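The relationship named in this slide title can be tabulated from the Hardy-Weinberg proportions p², 2pq, and q². The sketch below prints the three genotype frequencies across a grid of allele frequencies; the grid spacing and function name are arbitrary choices for illustration.

```python
def genotype_frequency_table(steps: int = 10):
    """Return rows of (p, freq A1A1, freq A1A2, freq A2A2) under
    Hardy-Weinberg proportions for p = 0, 1/steps, ..., 1."""
    rows = []
    for i in range(steps + 1):
        p = i / steps
        q = 1.0 - p
        rows.append((p, p * p, 2 * p * q, q * q))
    return rows


table = genotype_frequency_table()
for p, hom_a1, het, hom_a2 in table:
    print(f"p={p:.1f}  A1A1={hom_a1:.2f}  A1A2={het:.2f}  A2A2={hom_a2:.2f}")

# The heterozygote frequency is largest where p = q = 0.5.
peak = max(table, key=lambda row: row[2])
print("max heterozygosity at p =", peak[0])
```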