Lab 11: Tests of Neutrality and Evidence for Selection
Goals:
1. Calculate the expected number of different alleles in a population for different markers.
2. Detect departures from neutrality using:
   1. the Ewens-Watterson test,
   2. Tajima’s D test,
   3. the HKA test, and
   4. the synonymous vs. nonsynonymous nucleotide substitution test.
Infinite Alleles Model (IAM)
Each mutation produces a new allele. At equilibrium, the number of alleles and the shape of the allele frequency distribution remain constant, because alleles lost to drift are replaced by new mutations.
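To see this equilibrium in action, here is a minimal Wright-Fisher simulation under the IAM. It is a sketch under stated assumptions, not course software: a diploid population of constant size N (so 2N gene copies), parents drawn at random each generation, and each copy mutating to a never-before-seen allele with probability μ per generation. The function name `simulate_iam` and its parameters are hypothetical.

```python
import random


def simulate_iam(n_diploid, mu, generations, seed=0):
    """Wright-Fisher population under the infinite alleles model (IAM).

    Returns the number of distinct alleles in the population in each generation.
    """
    rng = random.Random(seed)
    n_genes = 2 * n_diploid          # diploid: 2N gene copies
    pop = list(range(n_genes))       # start with every gene copy a distinct allele
    next_label = n_genes             # next unused allele label ("infinitely many" alleles)
    history = []
    for _ in range(generations):
        # Drift: each gene copy in the next generation copies a randomly chosen parent copy.
        pop = [rng.choice(pop) for _ in range(n_genes)]
        # Mutation: each copy becomes a brand-new allele with probability mu.
        for i in range(n_genes):
            if rng.random() < mu:
                pop[i] = next_label
                next_label += 1
        history.append(len(set(pop)))
    return history


# Hypothetical example: N = 50, mu = 0.005, so 4N*mu = 1.
k = simulate_iam(n_diploid=50, mu=0.005, generations=2000)
print("mean number of alleles after burn-in:", sum(k[1000:]) / 1000)
```

After a burn-in, the allele count fluctuates around a stable value: drift removes alleles while mutation adds new ones. Increasing N or μ raises that equilibrium level.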
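Ewens-Watterson test
Expected homozygosity under mutation-drift equilibrium, assuming the IAM: $F = \frac{1}{4N_e\mu + 1}$
Expected homozygosity under HWE, from the observed allele frequencies $p_i$: $F = \sum_i p_i^2$
P-value < 0.025: allele frequencies too even -> balancing selection or recent bottleneck
P-value > 0.975: allele frequencies too uneven -> directional selection or population growth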
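A minimal Monte Carlo sketch of the Ewens-Watterson test in Python, assuming allele counts from a single sample of n gene copies. It compares the observed homozygosity F = Σp_i² with its distribution under neutrality for the same n and number of alleles k, simulated from the Ewens sampling formula and keeping only draws with exactly k alleles (rejection sampling). This is one possible approach, not necessarily the algorithm Arlequin uses; all function names are hypothetical. The returned P is the proportion of simulated F values below the observed F, so values near 0 indicate a too-even distribution and values near 1 a too-uneven one.

```python
import math
import random


def homozygosity(counts):
    """F = sum of squared allele frequencies (expected homozygosity under HWE)."""
    n = sum(counts)
    # Sorted so that identical allele configurations give bit-identical F values.
    return sum((c / n) ** 2 for c in sorted(counts))


def iam_homozygosity(ne, mu):
    """Expected homozygosity at mutation-drift equilibrium under the IAM."""
    return 1.0 / (4.0 * ne * mu + 1.0)


def expected_k(theta, n):
    """Expected number of alleles among n gene copies (Ewens sampling formula)."""
    return sum(theta / (theta + i) for i in range(n))


def theta_for_k(k, n):
    """Log-scale bisection for the theta whose expected allele number equals k."""
    lo, hi = 1e-8, 1e8
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if expected_k(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return mid


def ewens_sample(n, theta, rng):
    """Allele counts for n gene copies drawn sequentially from the Ewens sampling formula."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            counts.append(1)  # copy i+1 carries a new allele
        else:
            r = rng.random() * i  # otherwise copy one of the i earlier genes at random
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return counts


def ewens_watterson(counts, n_sim=2000, seed=1):
    """Return (observed F, mean simulated F, P(simulated F < observed F))."""
    n, k = sum(counts), len(counts)
    if not 1 < k < n:
        raise ValueError("the test needs 1 < k < n")
    rng = random.Random(seed)
    # Given n and k, the allele configuration does not depend on theta, so any theta
    # is valid; matching E(k) to k only raises the acceptance rate below.
    theta = theta_for_k(k, n)
    f_obs = homozygosity(counts)
    sims = []
    while len(sims) < n_sim:
        sim = ewens_sample(n, theta, rng)
        if len(sim) == k:  # keep only samples with the observed number of alleles
            sims.append(homozygosity(sim))
    p = sum(f < f_obs for f in sims) / n_sim
    return f_obs, sum(sims) / n_sim, p


# Hypothetical allele counts for one locus (24 gene copies, 6 alleles).
print(ewens_watterson([12, 5, 3, 2, 1, 1]))
```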
Problem 1. Estimates of the long-term effective population size of human populations vary widely, from as low as ~3,000 to as high as ~100,000. To estimate allele frequencies for a forensic identification study, you are genotyping individuals selected at random from a population with an estimated Ne = 7,500. You are using one allozyme marker and one microsatellite marker, with estimated mutation rates of 0.8 and 9.2 × 10^-2, respectively. How many different alleles do you expect to find for each marker in a sample of 7 people? Of 12 people? What assumptions were made for these calculations to be valid?
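The expected number of different alleles in a sample of n gene copies under the IAM at mutation-drift equilibrium is given by Ewens (1972): $E(k) = \sum_{i=0}^{n-1} \frac{\theta}{\theta + i}$. The Python sketch below evaluates it, assuming a diploid population (θ = 4Neμ, and a sample of m people contains n = 2m gene copies). The function names and the example inputs are hypothetical and deliberately not the values from Problem 1.

```python
def expected_alleles(theta, n):
    """E(k) = sum_{i=0}^{n-1} theta / (theta + i): expected number of distinct alleles
    in a sample of n gene copies under the IAM at mutation-drift equilibrium."""
    return sum(theta / (theta + i) for i in range(n))


def expected_alleles_diploid(ne, mu, n_individuals):
    """Assumes a diploid, randomly mating population: theta = 4*Ne*mu, 2 gene copies per person."""
    theta = 4.0 * ne * mu
    return expected_alleles(theta, 2 * n_individuals)


# Hypothetical inputs, not the values from Problem 1:
for n_people in (5, 10):
    print(n_people, round(expected_alleles_diploid(ne=1000, mu=1e-4, n_individuals=n_people), 3))
```

Because each new gene copy adds θ/(θ + i) to the sum, E(k) keeps rising with sample size, but ever more slowly.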
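Tajima’s D test
A test of the coalescent model; it assumes neutral alleles and a constant population size. Under neutrality, two estimators of $\theta = 4N_e\mu$ are expected to agree:
$E(\pi) = E(S/a_1) = \theta$, where $a_1 = \sum_{i=1}^{n-1} 1/i$
Here π is the average number of pairwise nucleotide differences between sequences (per sequence, not per site), S is the number of segregating sites, and n is the number of sequences sampled.
Test statistic: $d = \pi - S/a_1$ and $D = d/\sqrt{\hat{V}(d)}$
Under neutrality, $E(d) = 0$, so D is expected to be close to 0.
(Hamilton 270)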
plantsciences.ucdavis.edu
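A minimal Python sketch of the statistic, using Tajima's (1989) variance estimate for d, $\hat{V}(d) = e_1 S + e_2 S(S-1)$, with the standard constants $a_1, a_2, b_1, b_2, c_1, c_2, e_1, e_2$. The function name `tajimas_d` is hypothetical; `k_hat` is π in the slide's notation (average pairwise differences per sequence), and S and k_hat would come from Arlequin or your own counts.

```python
import math


def tajimas_d(n, S, k_hat):
    """Tajima's D from n sequences, S segregating sites, and k_hat = average
    number of pairwise differences (pi per sequence, not per site)."""
    if S == 0:
        raise ValueError("D is undefined when S = 0")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = k_hat - S / a1  # difference between the two estimators of theta
    return d / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical values: 24 sequences, 15 segregating sites, 3.2 mean pairwise differences.
print(tajimas_d(n=24, S=15, k_hat=3.2))
```

D is negative when π falls below S/a₁ (an excess of rare variants) and positive when π exceeds it.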
Problem 2. File aspen_phy.arp (already in Arlequin format) contains sequence data from exon 1 of the phytochrome B2 (phyB2) gene of 24 aspen (Populus tremula) trees sampled along a wide latitudinal gradient in Europe. Use Arlequin to:
a. Determine the number of polymorphic sites (S) and calculate the nucleotide diversity (π) based on these sequences.
b. Perform the tests of neutrality developed by Ewens-Watterson and Tajima, and interpret the results.
c. Provide a statistical and a biological interpretation of the results from the two neutrality tests.
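As a cross-check on part a, S and π can be computed directly from an alignment. This Python sketch assumes the sequences are already loaded as aligned strings of equal length with no gaps or missing data (Arlequin has its own options for handling those); reading aspen_phy.arp is not shown, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns containing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(pairs)


def nucleotide_diversity(seqs):
    """pi per site: mean pairwise differences divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]  # hypothetical alignment
print(segregating_sites(toy), nucleotide_diversity(toy))
```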
Hudson-Kreitman-Aguadé (HKA) test (Hamilton 266)
Example: Adh vs. a control locus. For each locus, the table compares polymorphism within species (S/m), divergence between species (D/m), and their ratio (within/between); χ² test p-value = 0.016.
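The core HKA idea (under neutrality, the ratio of polymorphism to divergence should be similar across loci) can be illustrated with a simplified Python sketch: a Pearson χ² test (1 df) on a 2×2 table of raw site counts, with loci as rows and "polymorphic within species" vs. "divergent between species" as columns. This is a stand-in, not a reimplementation of Hudson, Kreitman and Aguadé's (1987) test, which estimates θ for each locus and the divergence time and accounts for coalescent variance, so its χ² values will differ. The function name and example counts are hypothetical.

```python
import math


def hka_style_contingency(poly_focal, div_focal, poly_control, div_control):
    """Simplified HKA-style comparison on raw site counts (not per-site rates).

    2x2 table: rows = loci (focal, control); columns = sites polymorphic within
    species, sites divergent between species. Returns the within/between ratio
    for each locus, the Pearson chi-square (1 df), and its p-value.
    """
    a, b, c, d = poly_focal, div_focal, poly_control, div_control
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return {"ratio_focal": a / b, "ratio_control": c / d, "chi2": chi2, "p": p}


# Hypothetical counts: focal locus 10 polymorphic / 20 divergent; control 30 / 40.
print(hka_style_contingency(10, 20, 30, 40))
```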
Problem 3. Files utr_mays.arp, utr_par.arp, exon_mays.arp, and exon_par.arp contain sequence data from the 5’ untranslated region and from an exon of the teosinte branched1 (tb1) gene of maize (Zea mays ssp. mays) and its most likely wild progenitor, Zea mays ssp. parviglumis:
File | Region of tb1 | Subspecies
utr_mays.arp | 5’ untranslated region | mays
utr_par.arp | 5’ untranslated region | parviglumis
exon_mays.arp | exon | mays
exon_par.arp | exon | parviglumis
For each of these regions of tb1 and for each subspecies:
a. Use Arlequin to determine the number of segregating sites (S) and calculate the nucleotide diversity (π). What can you infer by comparing nucleotide diversity between the two subspecies for each region?
b. Use Arlequin to perform the tests of neutrality developed by Ewens-Watterson and Tajima. Interpret and discuss the results.
c. Interpret and discuss the results from the following 2 HKA tests, each comparing polymorphism within subspecies and divergence between subspecies for a region of tb1 against the average of control loci:
Test A (tb1 5’ untranslated region vs. average of control loci): χ² p-value = 0.001
Test B (tb1 translated region vs. average of control loci): χ² p-value = 0.26
d. GRADUATE STUDENTS ONLY: Download and read the paper describing this study (Wang et al. 1999), which is uploaded on the lab page of the class website, and provide an extended biological interpretation of the results of a)–c).
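Synonymous and Nonsynonymous Nucleotide Substitution Test
dN = observed number of nonsynonymous substitutions / number of nonsynonymous sites
dS = observed number of synonymous substitutions / number of synonymous sites
Example:
5’-ATT GTT CAT CGT ACC CAT CGA-3’
5’-ATT GTT CAT CGC ACC CAA CGA-3’
The CGT -> CGC change in the fourth codon is a synonymous mutation (both codons encode Arg), whereas the CAT -> CAA change in the sixth codon is a nonsynonymous mutation (His -> Gln). Each codon position counts as a fraction of a synonymous site equal to the proportion of its three possible point mutations that are synonymous; the remainder counts toward nonsynonymous sites.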
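A minimal Python sketch of this counting, in the style of Nei and Gojobori (1986): potential synonymous and nonsynonymous sites are counted per codon and averaged over the two sequences, then observed differences are classified. Assumptions not stated on the slide: the standard genetic code; mutations to stop codons count as nonsynonymous (other conventions exclude them, which changes the site counts for codons such as CGA); codon pairs differ at no more than one position (multiple differences would require averaging over mutational pathways, which this sketch does not do); and dN and dS are the uncorrected proportions defined above. Function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def codon_sites(codon):
    """(synonymous, nonsynonymous) potential sites for one codon.
    Assumption: changes to stop codons are counted as nonsynonymous."""
    syn_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn_changes += 1
    s = syn_changes / 3.0  # each of the 3 alternative bases is 1/3 of a site
    return s, 3.0 - s


def dn_ds(seq1, seq2):
    """Uncorrected dN, dS, and omega = dN/dS for two aligned coding sequences."""
    seq1, seq2 = seq1.replace(" ", "").upper(), seq2.replace(" ", "").upper()
    assert len(seq1) == len(seq2) and len(seq1) % 3 == 0
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        x, y = seq1[i:i + 3], seq2[i:i + 3]
        sx, nx = codon_sites(x)
        sy, ny = codon_sites(y)
        S += (sx + sy) / 2.0  # sites averaged over the two sequences
        N += (nx + ny) / 2.0
        diff = sum(a != b for a, b in zip(x, y))
        if diff > 1:
            raise NotImplementedError("codons with >1 difference need pathway averaging")
        if diff == 1:
            if CODE[x] == CODE[y]:
                Sd += 1
            else:
                Nd += 1
    dN, dS = Nd / N, Sd / S
    omega = dN / dS if dS > 0 else float("inf")
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "dN": dN, "dS": dS, "omega": omega}


print(dn_ds("ATT GTT CAT CGT ACC CAT CGA", "ATT GTT CAT CGC ACC CAA CGA"))
```

For the slide's example sequences, this finds 1 synonymous and 1 nonsynonymous difference. ω < 1 suggests purifying selection, ω ≈ 1 neutrality, and ω > 1 positive selection.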
Problem 4. Calculate the ω = dN/dS ratio based on the following 2 DNA sequences:
5’-ATG GTT CAT TTT ACC GGA CGA AGT CGA TTA-3’
5’-ATG GTT CAC TTG ACC GCA CGA AGT AGA TTA-3’
s_j = no. potential synonymous sites; n_j = no. potential nonsynonymous sites
Seq 1 codon | s_j | n_j | Seq 2 codon | s_j | n_j
ATG | 0 | 3 | ATG | 0 | 3
GTT | | | GTT | |
Total | | | | |