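Ortholog vs. paralog?
1. Collect Sequence Data
Two gene copies (A and B) are present in each of four species:
  species 1: A1, B1
  species 2: A2, B2
  species 3: A3, B3
  species 4: A4, B4
Good dataset: [A1, A2, A3, A4] (orthologs only)
Bad dataset: [A1, B2, A3, A4] (a paralog mixed in with the orthologs)
Draw the resulting trees.
OEB 192 – 11.09.14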
Alignment
2. Sequence Alignment
Unaligned sequences:
  taxa1  CGGATAAAC
  taxa2  CGGATAGAC
  taxa3  CGCTGATAAAC
  taxa4  CGGATAC
Aligned sequences (gaps shown as "-"):
  taxa1  CG--GATAAAC
  taxa2  CG--GATAGAC
  taxa3  CGCTGATAAAC
  taxa4  CG--GAT--AC
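As a rough illustration of how gaps get placed, here is a minimal sketch of global pairwise alignment (Needleman-Wunsch) applied to taxa1 and taxa3 from the slide. The scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for the example, not values from the lecture, and real multiple-alignment programs are considerably more elaborate.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned1, aligned3, best = needleman_wunsch("CGGATAAAC", "CGCTGATAAAC")
print(aligned1)
print(aligned3)
```

With these scores the two-base gap in taxa1 falls where the slide shows it, because that is the only placement that lets all nine bases of taxa1 match.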
Choose methods: distance-based
Example: Neighbor Joining (NJ)

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Pairwise distances d (number of differing sites):
             A      B      C      D      E
Species A   ----    4     10      9      8
Species B          ----    8     11     10
Species C                 ----    3      8
Species D                        ----    5
Species E                               ----

Rate-corrected values: M(AB) = d(AB) - [(r(A) + r(B)) / (N - 2)], where r(X) is the sum of the distances from X to all other taxa and N is the number of taxa.

Combined matrix (upper triangle: d; lower triangle: M):
             A      B      C      D      E
Species A   ----    4     10      9      8
Species B  -17.3   ----    8     11     10
Species C  -10    -12.7   ----    3      8
Species D  -10.7   -9.3  -16     ----    5
Species E  -12.7  -11.3  -12    -14.7   ----

Insert a node (U1) connecting the pair with the smallest M (here A and B), then repeat for U1, C, D, E.
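The first NJ step can be checked directly from the five sequences. The sketch below counts pairwise differences, computes r for each taxon, applies the M formula from the slide, and picks the pair to join. It covers only this first step, not the full algorithm (no branch lengths, no matrix update for U1), and the variable names are my own.

```python
from itertools import combinations

seqs = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def hamming(s, t):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s, t))


names = sorted(seqs)
n = len(names)

# Symmetric distance table d[(i, j)], with d[(i, i)] = 0
d = {(i, j): hamming(seqs[i], seqs[j]) for i in names for j in names}

# r(X): sum of distances from X to every other taxon
r = {i: sum(d[i, j] for j in names) for i in names}

# M(ij) = d(ij) - (r(i) + r(j)) / (N - 2)
M = {(i, j): d[i, j] - (r[i] + r[j]) / (n - 2)
     for i, j in combinations(names, 2)}

# The pair with the smallest M is joined first
first_pair = min(M, key=M.get)
print(first_pair, round(M[first_pair], 1))
```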
Discrete character methods
4. Choose Methods
Maximum Parsimony (MP):
  Model: evolution proceeds through the fewest changes.
  Adjust the internal nodes to minimize the sum of changes over all characters.
Maximum Likelihood (ML): L(data | model)
  Scores each tree by the likelihood of the data given the tree and the model parameters; the preferred tree is the one with the highest likelihood.
Bayesian Inference
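To make the parsimony criterion concrete, here is a minimal sketch of Fitch's algorithm, one standard way to count the minimum number of changes a fixed rooted binary tree requires. The lecture does not name a specific algorithm, so this choice is an assumption, and the four-taxon alignment and taxon names (t1 to t4) are invented toy data.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree: a taxon name (leaf) or a 2-tuple of subtrees (internal node).
    states: dict mapping taxon name -> character state at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here
        return shared, left_changes + right_changes
    # Children disagree: take the union and count one change
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch counts over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[k] for taxon, seq in alignment.items()})[1]
        for k in range(n_sites)
    )


# Toy alignment (hypothetical data, not from the lecture)
toy = {"t1": "AAT", "t2": "AAT", "t3": "GGT", "t4": "GGC"}
tree_1 = (("t1", "t2"), ("t3", "t4"))
tree_2 = (("t1", "t3"), ("t2", "t4"))
print(parsimony_score(tree_1, toy), parsimony_score(tree_2, toy))
```

Under parsimony the tree with the lower score is preferred; on this toy data that is the tree grouping t1 with t2.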
Choose “model”
3. Choose Models
Diagram: Ancestral Sequences (?) -> Model (?) -> Observed Sequences
Substitution models (arrows between all nucleotides): 12 rates, 4 equilibrium frequencies, indels
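The "12 rates" are the off-diagonal entries of a 4 x 4 instantaneous rate matrix Q, one for each ordered pair of different nucleotides. The sketch below builds Q from 12 rates and, as an illustration only, instantiates the Jukes-Cantor model, in which all 12 rates are equal. Jukes-Cantor is not named on the slide; it is used here as the simplest special case, and the function names are my own. Indels are not modelled.

```python
import math

NUCS = "ACGT"


def rate_matrix(rates):
    """Build Q from 12 off-diagonal rates given as {(from, to): rate}.

    Each diagonal entry is minus the sum of the other entries in its row,
    so every row of Q sums to zero.
    """
    q = {x: {} for x in NUCS}
    for x in NUCS:
        total = 0.0
        for y in NUCS:
            if x != y:
                q[x][y] = rates[x, y]
                total += rates[x, y]
        q[x][x] = -total
    return q


# Jukes-Cantor special case: all 12 rates equal (here 1.0, an arbitrary unit)
jc = rate_matrix({(x, y): 1.0 for x in NUCS for y in NUCS if x != y})


def jc_same_prob(t, alpha=1.0):
    """Jukes-Cantor probability that a site shows the same base after time t,
    with alpha the rate for each of the 12 substitution types."""
    return 0.25 + 0.75 * math.exp(-4.0 * alpha * t)


print(jc["A"])
print(jc_same_prob(0.1))
```

As t grows, the probability approaches 0.25, the equal equilibrium frequency that Jukes-Cantor assumes for each base; richer models let the 4 frequencies and 12 rates differ.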
Assess reliability
5. Assess Reliability
Re-sampling to produce pseudo-datasets (random weighting)
  site   123456789
  taxa1  CGATCGTTA
  taxa2  CAATGATAG
  taxa3  CGCTGATAA
  taxa4  CGCTGATCG
  (tree figure; node labels 100 and 73)
I. Bootstrap: sampling with replacement
II. Jackknife: random deletion of a sub-dataset
III. Permutation test: randomize the dataset to build a null likelihood distribution
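The resampling step for the bootstrap and jackknife can be sketched on the four-taxon alignment from the slide. The functions below only build pseudo-datasets; in practice a tree would be inferred from each one and the fraction of trees containing a given clade reported as its support. The function names and the 50% jackknife retention fraction are assumptions made for the example.

```python
import random

alignment = {
    "taxa1": "CGATCGTTA",
    "taxa2": "CAATGATAG",
    "taxa3": "CGCTGATAA",
    "taxa4": "CGCTGATCG",
}


def bootstrap_alignment(aln, rng):
    """Draw as many columns as the original has, sampling WITH replacement."""
    n_sites = len(next(iter(aln.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    pseudo = {taxon: "".join(seq[c] for c in cols) for taxon, seq in aln.items()}
    return pseudo, cols


def jackknife_alignment(aln, rng, keep_fraction=0.5):
    """Keep a random subset of columns WITHOUT replacement, deleting the rest."""
    n_sites = len(next(iter(aln.values())))
    k = max(1, int(n_sites * keep_fraction))
    cols = sorted(rng.sample(range(n_sites), k))
    pseudo = {taxon: "".join(seq[c] for c in cols) for taxon, seq in aln.items()}
    return pseudo, cols


rng = random.Random(0)  # fixed seed so the example is repeatable
boot, boot_cols = bootstrap_alignment(alignment, rng)
jack, jack_cols = jackknife_alignment(alignment, rng)
print(boot_cols, boot)
print(jack_cols, jack)
```

Columns are resampled whole, so every taxon keeps the same sites in each pseudo-dataset.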
Utility of phylogeny: Molecular clock
5. Assess Reliability
How to assess the timeline?
  Made a tree (ML) and regressed branch length against year.
  Ancestor of the M clade: 1931 (95% c.i. made via bootstraps: 1915-1941)
  Do you believe it? For a recovered Zaire sequence, the regression predicts 1957 (actually 1959).
(Korber et al., 2000)
Utility of phylogeny: Molecular clock (Hillis, 2000)
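The regression idea can be sketched with ordinary least squares: fit branch length (distance from the root) against sampling year, then read off the year at which the fitted line reaches zero distance. The numbers below are invented toy values, not the HIV data, and the sketch omits the bootstrap confidence interval described on the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (hypothetical): sampling year and root-to-tip branch length
years = [1985, 1990, 1995, 2000]
branch_lengths = [0.070, 0.080, 0.090, 0.100]

slope, intercept = fit_line(years, branch_lengths)

# Year at which the fitted branch length is zero: the estimated ancestor date
ancestor_year = -intercept / slope


def predicted_year(branch_length):
    """Invert the fitted line to date a sequence from its branch length."""
    return (branch_length - intercept) / slope


print(round(ancestor_year, 1), round(predicted_year(0.050), 1))
```

Dating a sequence of known age with `predicted_year` and comparing against its true sampling year is the same kind of sanity check as the Zaire sequence on the slide.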
Utility of phylogeny: infer past environment?
Was the ancestor of bacteria a thermophile?
  Reconstructed EF-Tu at key nodes
  All ancestral types have high Topt
(Gaucher et al., 2003)
Molecular signs of selection
Several measures; dN/dS is the most common
  dN/dS = 1: neutrality
  dN/dS < 1: purifying selection
  dN/dS > 1: positive selection (particularly sustained)
Ex: extreme values show up for genes interacting with the immune system (Sawyer & Malik, 2006)
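The reading of dN/dS can be written as a small helper. This sketch assumes dN and dS have already been estimated elsewhere; the tolerance used to call a value "about 1" is an arbitrary assumption for illustration, and a real analysis would use a statistical test rather than a fixed cutoff.

```python
def dn_ds_ratio(dn, ds):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def interpret(ratio, tol=0.05):
    """Label a dN/dS ratio; tol is an illustrative cutoff, not a standard."""
    if abs(ratio - 1.0) <= tol:
        return "neutrality"
    if ratio < 1.0:
        return "purifying selection"
    return "positive selection"


print(interpret(dn_ds_ratio(0.01, 0.10)))
print(interpret(dn_ds_ratio(0.30, 0.10)))
```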
Genetic exchange in bacteria/archaea
Transformation
  Used by Griffith, and later Avery, to show that DNA is the genetic material
  Some bacteria are naturally competent
Transduction
  Generalized & specialized
Conjugation
  High-frequency recombination (Hfr) strains
Rare, partial and unidirectional
Can be novel genes or new alleles (homologous rec.)
Two “types” of HGT…
Can be novel genes or new alleles (homologous rec.)
Comparison of HGT to “normal” sex
HGT: rare, partial and unidirectional
Monday (9/19): Horizontal gene transfer