Linkage and Linkage Disequilibrium
Free recombination (linkage equilibrium): AB = 25%, Ab = 25%, aB = 25%, ab = 25%, so f(Ab) = f(A) x f(b)
No recombination: AB = 50%, Ab = 0%, aB = 0%, ab = 50%, so f(Ab) ≠ f(A) x f(b)
In the second case the A locus and B locus are in linkage disequilibrium
D = f(AB) x f(ab) - f(Ab) x f(aB)
D is at its maximum with no recombination
D = 0 with free recombination (linkage equilibrium)
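Linkage equilibrium: are alleles at separate loci paired at random?
D = x11 − p1q1
D = x22 − p2q2
where x11 and x22 are the frequencies of the A1B1 and A2B2 haplotypes, p1 and p2 are the allele frequencies at the A locus, and q1 and q2 are the allele frequencies at the B locus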
A locus and B locus are in linkage disequilibrium
D = f(AB) x f(ab) - f(Ab) x f(aB)
Maximum with no recombination; D -> 0 with free recombination (linkage equilibrium)
When allele frequencies are intermediate, f(A) = f(a) = f(B) = f(b) = 0.5, and LD is maximal so that no recombinants are present: f(AB) = f(ab) = 0.5, so D = 0.5 x 0.5 - 0.0 x 0.0 = 0.25
When allele frequencies are skewed, f(A) = 0.9, f(a) = 0.1; f(B) = 0.9, f(b) = 0.1, and LD is maximal so that no recombinants are present, D is less than 0.25: f(AB) = 0.9 and f(ab) = 0.1, so D = 0.9 x 0.1 - 0.0 x 0.0 = 0.09
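The two worked examples can be checked numerically. The sketch below is a minimal illustration, not part of the original slides; the function names `ld_coefficient` and `ld_from_marginals` are hypothetical. It computes D both from the four haplotype frequencies and as the deviation of f(AB) from the product of the allele frequencies.

```python
def ld_coefficient(f_AB, f_Ab, f_aB, f_ab):
    """D = f(AB) x f(ab) - f(Ab) x f(aB), from the four haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return f_AB * f_ab - f_Ab * f_aB


def ld_from_marginals(f_AB, p_A, p_B):
    """Equivalent form: D = f(AB) - f(A) x f(B)."""
    return f_AB - p_A * p_B


# Free recombination (all four haplotypes at 25%)
print(ld_coefficient(0.25, 0.25, 0.25, 0.25))
# Intermediate allele frequencies, no recombinants
print(round(ld_coefficient(0.5, 0.0, 0.0, 0.5), 4))
# Skewed allele frequencies, no recombinants
print(round(ld_coefficient(0.9, 0.0, 0.0, 0.1), 4))
print(round(ld_from_marginals(0.9, 0.9, 0.9), 4))
```

Both forms give the same value for the skewed case, which shows why the maximum possible D depends on the allele frequencies.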
LD as a two-locus Hardy Weinberg problem
Linkage disequilibrium (LD) decays with distance and time
Gametes produced by an AB/ab double heterozygote: AB = (1-r)/2, Ab = r/2, aB = r/2, ab = (1-r)/2
r = rate of recombination
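A minimal sketch of the decay, under two assumptions that the slide does not state explicitly: the gamete frequencies are those produced by an AB/ab double heterozygote, and the population mates at random so that D shrinks by a factor of (1 - r) each generation, giving D_t = D_0 x (1 - r)^t. The function names are hypothetical.

```python
def gamete_frequencies(r):
    """Gametes from an AB/ab double heterozygote with recombination rate r."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    return {"AB": (1 - r) / 2, "Ab": r / 2, "aB": r / 2, "ab": (1 - r) / 2}


def ld_after(d0, r, generations):
    """Assumed recursion under random mating: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations


# Tightly linked loci keep their LD much longer than unlinked loci.
for r in (0.01, 0.1, 0.5):
    trajectory = [round(ld_after(0.25, r, t), 4) for t in (0, 1, 5, 10, 50)]
    print(r, trajectory)
```

With r = 0.5 the four gametes are each 25%, which matches the free recombination case; with r = 0 only the parental gametes AB and ab are produced.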
Empirical demonstration of the Decay of LD over time
Epistasis
QTL for flower traits in Mimulus (monkey flowers)
M. lewisii and M. cardinalis have different pollinators
Cross: M. lewisii x M. cardinalis -> F1 -> F2s
Genetic map of monkey flower http://www.genetics.org/cgi/content/full/159/4/1701/F1
Quantitative trait locus (QTL) mapping: screen for marker-trait associations in F2s or RILs
Crossing scheme: Parentals (M, Q and m, q) -> F1 -> F2 -> inbreed to make recombinant inbred lines (RILs)
Scan genome for association between molecular marker and phenotype (small vs. large)
Association between molecular marker (M) and QTL (Q)
QTL mapping: detecting an association between a genetic marker (M) and a gene affecting a quantitative trait (Q)
Figure: positions of the marker and the QTL (http://isotope.bti.cornell.edu/img/intro/qtl_fig_2.gif)
QTL mapping works because there is linkage disequilibrium (LD) between the marker (M) and the QTL (Q):
mm marker genotypes are correlated with small size
MM marker genotypes are correlated with large size
Most traits in organisms show continuous variation
How do we find the genes that affect these "quantitative" traits?
Scan the genome for nucleotide sites that co-vary with the phenotype
Genome-wide association studies (GWAS)
Mutation "causing" variation in height: tall individuals carry A, short individuals carry G
Adjacent SNPs are linked
Distant sites show no genotype-phenotype association
Problem: how do we find the causal SNPs? Needle in a haystack
What is better: more recombination, more markers?
Figure: Parentals -> F1 -> F2 -> inbreed to make recombinant inbred lines (RILs); scan genome for association between molecular marker (M) and QTL (Q)
How does DNA evolve?
Human vs. chimp sequence alignment (| marks identical sites)
Human   1 ATGCCCCAACTAAATACTACCGTATGGCCCACCATAATTACCCCCATACT 50
          |||||||||||||||||  ||||||| |||||||||||||||||||||||
Chimp   1 atgccccaactaaataccgccgtatgacccaccataattacccccatact 50
Human  51 CCTTACACTATTCCTCATCACCCAACTAAAAATATTAAACACAAACTACC 100
          ||| |||||||| ||| ||||||||||||||||||||||  |||| ||||
Chimp  51 cctgacactatttctcgtcacccaactaaaaatattaaattcaaattacc 100
Human 101 ACCTACCTCCCTCACCAAAGCCCATAAAAATAAAAAATTATAACAAACCC 150
          | ||||| ||||||||||| ||||||||||||||||| || || ||||||
Chimp 101 atctacccccctcaccaaaacccataaaaataaaaaactacaataaaccc 150
Human 151 TGAGAACCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACA 200
          ||||||||||||||||||||||||| ||||||||||||  ||||||||||
Chimp 151 tgagaaccaaaatgaacgaaaatctattcgcttcattcgctgcccccaca 200
Human 201 ATCC 204
          ||||
Chimp 201 atcc 204
Measuring DNA Evolution
Align sequences between species
Determine length of sequences, L
Count number of differences
Divergence = proportion of differences: D = p-distance = (number of differences) / (length of sequence)
Rate of divergence = (sequence divergence) / (age of common ancestor) = D / time
Rate of substitution = D / (2 x time), because differences accumulate along both lineages
Example: 5 differences in 100 sites, so D = 0.05; t = 6 million years
Rate of divergence = 0.05 / (6 x 10^6) = 8.3 x 10^-9 per site per year
Rate of substitution = 0.05 / (2 x 6 x 10^6) = 4.2 x 10^-9 per site per year
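The steps on this slide can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and it assumes two aligned sequences of equal length with no gaps, compared without regard to case.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (case-insensitive, no gaps assumed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return differences / len(seq1)


def divergence_rate(d, years):
    """Rate of divergence = D / time."""
    return d / years


def substitution_rate(d, years):
    """Rate of substitution = D / (2 x time); both lineages accumulate changes."""
    return d / (2 * years)


# Slide example: 5 differences in 100 sites, common ancestor 6 million years ago
d = 5 / 100
print(f"{divergence_rate(d, 6e6):.1e}")
print(f"{substitution_rate(d, 6e6):.1e}")
```

The same `p_distance` function can be applied directly to a pair of aligned sequences such as the human and chimp fragments.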
Jukes-Cantor
One-parameter model: α = rate of substitution
PA(t) = 1/4 + 3/4 e^(-4αt) = probability that A remains A at time t
PNN = 1/4 + 3/4 e^(-8αt) = probability that two sequences have the same nucleotide at site N
D = proportion of different nucleotides = 1 - PNN
Dhat = 3/4 (1 - e^(-8αt))
K = -3/4 ln(1 - (4/3)p), where p = proportion of nucleotide differences (# diffs / total bp)
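A minimal sketch of the Jukes-Cantor correction, using the formulas on this slide. The function names are hypothetical, and `alpha` stands for the single substitution rate of the model. The second function gives the expected proportion of differences after time t, so the two can be checked against each other: correcting the expected proportion should return K = 6 x alpha x t (3 x alpha x t substitutions per site along each of the two lineages).

```python
import math


def jc_distance(p):
    """K = -3/4 ln(1 - 4p/3); p is the observed proportion of differences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_expected_p(alpha, t):
    """Expected proportion of differences: 3/4 (1 - e^(-8 alpha t))."""
    return 0.75 * (1.0 - math.exp(-8.0 * alpha * t))


# The correction matters little for small p and a lot as p approaches saturation.
for p in (0.05, 0.30, 0.60):
    print(p, round(jc_distance(p), 4))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.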
Kimura two-parameter model
α = rate of transition substitution
β = rate of transversion substitution
PAA(t) = 1/4 + 1/4 e^(-4βt) + 1/2 e^(-2(α+β)t) = probability that A remains A at time t
K = 1/2 ln(1/[1 - 2P - Q]) + 1/4 ln(1/[1 - 2Q])
where P = proportion of transitional differences and Q = proportion of transversional differences
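A minimal sketch of the Kimura two-parameter distance. The function names are hypothetical, and the counting step assumes aligned, ungapped sequences containing only A, C, G and T. Transitions are changes within purines (A, G) or within pyrimidines (C, T); all other changes are transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_proportions(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def k2p_distance(P, Q):
    """K = 1/2 ln(1/[1 - 2P - Q]) + 1/4 ln(1/[1 - 2Q])."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    return 0.5 * math.log(1.0 / a) + 0.25 * math.log(1.0 / b)


P, Q = ts_tv_proportions("AACC", "GACA")
print(P, Q, round(k2p_distance(P, Q), 4))
```

When transversions are twice as frequent as transitions (P = p/3, Q = 2p/3), the formula reduces to the Jukes-Cantor distance for the same p.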
Comparison of models
P-distance
Jukes-Cantor
Kimura 2-parameter
Tamura-Nei
Etc.
Molecular clocks
Approximately constant rate of divergence of proteins
K = μ x f0: rate of substitution = mutation rate x proportion of neutral mutations
"Saturation" due to multiple hits in DNA evolution
Anatomy of a phylogenetic tree
Terminal (external) nodes: taxa = OTUs = operational taxonomic units (Taxon1 to Taxon6)
Polytomy: non-dichotomous splitting
External branch
Internal branch
Internal nodes
Root
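Relative rate test
KAC = KBC under a constant rate (KOC is shared)
Tajima test: (m1 - m2)^2 / (m1 + m2), chi-square, df = 1
Figure: Species A and Species B descend from common ancestor O, with m1 changes on the branch to A and m2 changes on the branch to B; Species C is the outgroup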
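A minimal sketch of the Tajima test statistic. It assumes that Species C is the outgroup, that m1 counts sites where only A differs (B and C agree), and that m2 counts sites where only B differs (A and C agree). The function names are hypothetical. For one degree of freedom the chi-square p-value can be written with the complementary error function, so no external library is needed.

```python
import math


def tajima_counts(seq_a, seq_b, seq_c):
    """Count m1 (only A differs) and m2 (only B differs); C is the outgroup."""
    m1 = m2 = 0
    for a, b, c in zip(seq_a.upper(), seq_b.upper(), seq_c.upper()):
        if b == c and a != c:
            m1 += 1
        elif a == c and b != c:
            m2 += 1
    return m1, m2


def tajima_chi_square(m1, m2):
    """Test statistic (m1 - m2)^2 / (m1 + m2), chi-square with df = 1."""
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    return (m1 - m2) ** 2 / (m1 + m2)


def chi_square_p_value_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


stat = tajima_chi_square(20, 5)
print(stat, chi_square_p_value_df1(stat) < 0.05)
```

A significant result means that the lineages leading to A and B have accumulated changes at different rates, which argues against a shared molecular clock for those two lineages.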
DNA test of neutrality
Neutral prediction: amino acid (nonsynonymous) substitution rate (dN) should be lower than silent (synonymous) substitution rate (dS)
True for most genes
Follows from functional constraint argument
Different for Major Histocompatibility Complex (MHC) loci
Antigen recognition sequence shows dN > dS
Rest of molecule shows dN < dS, as expected
Amino acid mutations are favored in antigen recognition region
Promotes diversity, better recognition of foreign peptides
Figure (http://depts.washington.edu/rhwlab/dq/3structure.html): antigen binding sites, dN/dS > 1, "positive" selection; rest of molecule, dN/dS < 1, negative (purifying) selection
Maximum likelihood
Likelihood of observing the data set, assuming a given tree and a given model of DNA evolution
L = P(data|tree)
Consider 4-taxon cases within a tree
For each site, identify nucleotides at each of the four taxa
Assume all 16 pairs of nucleotides at internal nodes
Likelihood of observed 4 terminal nucleotides = sum of 16 independent probabilities
Repeat likelihoods for each position in alignment
Likelihood of tree = product of individual likelihoods
L = Π Li for i = 1 to n positions in alignment (or sum of log likelihoods)
Calculate likelihood for other trees; choose tree with maximum likelihood
Figure: 4-taxon tree with terminal nodes (OTUs) joined by two internal nodes (HTUs)
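The 4-taxon calculation can be sketched as follows. This is an illustration under stated assumptions, not a full maximum likelihood search: it uses the Jukes-Cantor model, equal base frequencies of 1/4, and fixed branch lengths (the values 0.1 are hypothetical), whereas a real analysis would also optimize branch lengths for each tree. All function names and the toy alignment are made up for the example.

```python
import math
from itertools import product

NUCS = "ACGT"
# The three possible unrooted topologies for taxa 0..3
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def jc_prob(x, y, at):
    """Jukes-Cantor probability of going from x to y along a branch with alpha*t = at."""
    e = math.exp(-4.0 * at)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def site_likelihood(tips, at_ext, at_int):
    """Sum over all 16 nucleotide pairs (x, y) at the two internal nodes."""
    n1, n2, n3, n4 = tips
    total = 0.0
    for x, y in product(NUCS, repeat=2):
        total += (0.25                                   # base frequency at node x
                  * jc_prob(x, n1, at_ext) * jc_prob(x, n2, at_ext)
                  * jc_prob(x, y, at_int)                # internal branch
                  * jc_prob(y, n3, at_ext) * jc_prob(y, n4, at_ext))
    return total


def log_likelihood(alignment, topology, at_ext=0.1, at_int=0.1):
    """Sum of per-site log likelihoods for one topology ((i, j), (k, l))."""
    (i, j), (k, l) = topology
    return sum(
        math.log(site_likelihood((col[i], col[j], col[k], col[l]), at_ext, at_int))
        for col in zip(*alignment)
    )


# Toy alignment (hypothetical data): taxa 0 and 1 are identical, as are taxa 2 and 3
seqs = ["ACGTAACC", "ACGTAACC", "ACGTGGTT", "ACGTGGTT"]
for topo in TOPOLOGIES:
    print(topo, round(log_likelihood(seqs, topo), 3))
best = max(TOPOLOGIES, key=lambda topo: log_likelihood(seqs, topo))
print("best:", best)
```

Working with the sum of log likelihoods rather than the product avoids numerical underflow when the alignment has many positions.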