Phylogenetic Inference: Data, Optimality Criteria, Algorithms, Results, Practicalities. BIO520 Bioinformatics, Jim Lund. Reading: Ch8.

Our Goals Infer Phylogeny –Optimality criteria –Algorithm Determine the sequence of branching events that reflects the history of a group of organisms.

Phylogenetic Model Assumptions No transfer of genetic information by hybridization All sequences are homologous (orthologous, really) Each position in alignment homologous Observed variation is valid sample from included group Positions evolve independently

Steps in Analysis 1.Data Model (Alignment) –alignment method –“trimming” to a phylogenetic set 2.DNA base substitution model 3.Build Trees –Algorithm based vs Criterion based –Distance based vs Character-based 4.Assess tree quality.

Choice of Input Data Data Type –Aligned sequences, RFLP, morphological data… Molecule of interest –rRNA (general purpose) –Mitochondrial DNA –Selected genes Number/type of taxa –ingroup and outgroup

rRNA Genes Conserved across kingdoms Varies within species Widely sequenced, easy Long, lots of characters

Multiple Alignment Method Phylogenetic Assumptions Alignment parameters –(substitution matrix, gap cost) Aligned features –primary sequence, structure Optimization –statistical, non-statistical

Typical Alignment Method CLUSTAL, then manual editing –Manual editing for phylogeny –phylogenetic assumption in guide tree –parameters a priori and dynamic –Optimization Non-statistical Remove poorly aligned regions Test several gap penalties

Substitution Models G to A, C to T versus N to N Amino acid substitution Forwards and backwards weights identical? Site-to-site variation
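
The contrast between treating all substitutions alike ("N to N") and weighting transitions (G to A, C to T) separately can be illustrated with two classic distance corrections. This is a minimal sketch, assuming two aligned, gap-free DNA strings of equal length; the function names are illustrative and do not come from the slides. It computes the Jukes-Cantor distance (one rate for every change) and the Kimura two-parameter distance (separate transition and transversion rates).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_differences(seq1, seq2):
    """Return (transitions, transversions, length) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, len(seq1)

def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: every substitution has the same rate."""
    ts, tv, n = count_differences(seq1, seq2)
    p = (ts + tv) / n
    # Undefined (math domain error) once p reaches 0.75, i.e. saturation.
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance: transitions and transversions differ."""
    ts, tv, n = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

if __name__ == "__main__":
    print(jukes_cantor("CCCAGG", "CCCAAG"))
    print(kimura_2p("CCCAGG", "CCCAAG"))
```

Both corrections grow faster than the raw proportion of differing sites, which is how they account for multiple substitutions at the same position.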

Tree-Building Methods Distance-based methods –NJ, FM, ME, UPGMA Character-based methods –Maximum Parsimony (PAUP) –Maximum Likelihood (PHYLIP) Algorithm choice is a contested, active research field.

Molecular phylogenetic tree building methods are mathematical and/or statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them. There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified by COMPUTATIONAL METHOD (clustering algorithm or optimality criterion) and by DATA TYPE (characters or distances): Characters (bp, aa) with an optimality criterion: PARSIMONY, MAXIMUM LIKELIHOOD. Distances with a clustering algorithm: UPGMA, NEIGHBOR-JOINING. Distances with an optimality criterion: MINIMUM EVOLUTION, LEAST SQUARES.

Distance Methods Measure distance (dissimilarity) Accurate if distances are all additive (UPGMA further requires ultrametric distances) –NEVER true over large distance Methods –NJ (Neighbor joining) –FM (Fitch-Margoliash) –ME (Minimum Evolution) –UPGMA (Unweighted pair group method with Arithmetic Mean)

Which Distance Method? UPGMA (Unweighted pair group method with Arithmetic Mean) –Least accurate, still commonly used NJ (Neighbor joining) –EXTREMELY RAPID –GIVES ONLY 1 TREE ME (Minimum Evolution) and FM (Fitch-Margoliash) seem best –Minimize tree path lengths
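
To show what a clustering-style distance method does, here is a minimal UPGMA sketch. It assumes a complete table of pairwise distances; the function name, the input layout, and the example distances are invented for illustration and are not taken from the slides. It returns only the topology, as a Newick-like string without branch lengths.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns the topology as a Newick-like string without branch lengths.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = sorted(min(d, key=d.get))
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average keeps every original taxon equally weighted.
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / new_size
        # Drop every distance that involves the two clusters just merged.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))

if __name__ == "__main__":
    # Hypothetical, perfectly ultrametric distances.
    example = {
        ("A", "B"): 2, ("C", "D"): 4,
        ("A", "C"): 8, ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8,
    }
    print(upgma(["A", "B", "C", "D"], example))
```

Like NJ, this procedure yields a single tree; it never compares alternative topologies against a criterion.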

Inferring Trees and Ancestors CCCAGG CCCAAG-> CCCAAG CCCAAA-> CCCAAA CCCAAA-> CCCAAC

Different Criteria 1: CCCAGG 2: CCCAAG 3: CCCAAA 4: CCCAAC 1,2 can be sister taxa AND 3,4 can be sister taxa Infer ancestor of 1,2 and 3,4 Distance from 1/2, 3/4 equal

Character Methods Maximum Parsimony –minimal changes to produce data –can use different substitution models Maximum Likelihood –turns problem “inside out”, single most likely tree that explains data coin flip analogy –increasingly popular Bayesian –Searches for Best Set of trees that explains data AND fits evolutionary model

Parsimony CCCAGG CCCAAG-> CCCAAG CCCAAA-> CCCAAA CCCAAA-> CCCAAC 4 TAXA, 3 changes minimum Search for shortest tree, the one with the fewest changes.
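
Counting the changes a given tree requires can be sketched with Fitch's small-parsimony algorithm. This is an illustrative sketch, assuming a rooted binary tree written as nested tuples of taxon names and equal weights for all substitutions; the function names and the taxon labels "1" to "4" are assumptions made here. Applied to the four sequences on the slide with taxa 1,2 and 3,4 as sister pairs, it gives the count of 3 changes.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes) for one alignment column.

    tree:   a taxon name (leaf) or a pair (left, right) of subtrees.
    states: dict mapping taxon name -> character at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total

if __name__ == "__main__":
    alignment = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
    print(parsimony_length((("1", "2"), ("3", "4")), alignment))
```

A parsimony search evaluates this score over many candidate topologies and keeps the shortest.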

Likelihood Models Hypothesis 1: All 3 teams are equally good. Hypothesis 2: The Yankees are the best team. Hypothesis 3: The Tigers are the worst team
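
Comparing hypotheses by likelihood means asking how probable the observed data are under each one. The sketch below uses the coin-flip analogy rather than the teams example; the hypothesis names and the flip counts are invented for illustration. It scores each hypothesis with the binomial likelihood and picks the one under which the data are most probable.

```python
from math import comb

def binomial_likelihood(p_heads, heads, flips):
    """Probability of seeing `heads` heads in `flips` flips if P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)

def best_hypothesis(hypotheses, heads, flips):
    """Return the name of the hypothesis with the highest likelihood.

    hypotheses: dict mapping a hypothesis name -> its assumed P(heads).
    """
    return max(
        hypotheses,
        key=lambda name: binomial_likelihood(hypotheses[name], heads, flips),
    )

if __name__ == "__main__":
    # Hypothetical data: 8 heads in 10 flips.
    hypotheses = {"fair coin": 0.5, "biased coin": 0.8}
    for name, p in hypotheses.items():
        print(name, binomial_likelihood(p, 8, 10))
    print("preferred:", best_hypothesis(hypotheses, 8, 10))
```

In maximum likelihood phylogenetics the hypotheses are trees (with branch lengths and a substitution model) and the data are the alignment columns.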

Searching for Trees

Tree Search Algorithms Exhaustive –VERY INTENSIVE Branch and Bound –Compromise Heuristic –FAST (usually start with NJ) Run times by # of taxa (NJ / Parsimony / ML / Bayes): 10 taxa: 0.2s / 0.05s / 4.1s / 0.5 hr; 50 taxa: 0.2s / 0.7s / 7 hr / 4 hr
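
Why exhaustive search is so intensive follows from how fast the number of possible trees grows. As a rough sketch (the function names are illustrative), the number of distinct binary topologies for n labeled taxa is (2n-5)!! for unrooted trees and (2n-3)!! for rooted trees, where !! is the double factorial over odd numbers.

```python
def odd_double_factorial(k):
    """Return k * (k - 2) * (k - 4) * ... * 1 for an odd k >= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def num_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies for n_taxa labeled tips (n >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)

def num_rooted_trees(n_taxa):
    """Number of rooted binary topologies for n_taxa labeled tips (n >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)

if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Ten taxa already allow about two million unrooted trees, which is why branch and bound or heuristic searches are used for larger data sets.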

Evaluating Trees Consensus Tree Randomized Trees –Skewness tests Randomized Character Data –Permutation tests (permuted by column) Bootstrap, Jackknife –resampling techniques –Counts how often each clade appears in test data. –>70% probably correct; 50% overestimates accuracy
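
The bootstrap resampling step can be sketched as follows. This is an illustrative outline only: it assumes an alignment stored as a dict of equal-length strings, and it leaves the tree-building step to whichever method is being tested, so clades are represented simply as frozensets of taxon names. All names here are assumptions, not part of any package mentioned in the slides.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence.
    rng:       a random.Random instance (pass a seeded one for repeatability).
    The same sampled columns are used for every taxon, so columns stay intact.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }

def clade_support(replicate_clades, clade):
    """Fraction of bootstrap replicates whose tree contains `clade`.

    replicate_clades: list with one set of clades (frozensets) per replicate.
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)

if __name__ == "__main__":
    alignment = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
    print(bootstrap_alignment(alignment, random.Random(0)))
```

In practice a tree is built from each pseudo-replicate, and `clade_support` then reports how often each clade of the original tree reappears.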

Tree Congruence Tree-to-Tree Comparison –2 different characters/same groups –Important for evaluating biological hypotheses Examples: Did lentiviruses diverge only within their current hosts? Has plant pathogenicity arisen many times in fungi?

Inferring evolutionary relationships between the taxa requires rooting the tree: To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root. [Figure: unrooted tree of taxa A, B, C and D with the root marked, and the resulting rooted tree.] Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Now, try it again with the root at another position. [Figure: the same unrooted tree of taxa A, B, C and D with the root moved, and the resulting rooted tree.] Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

Rooting Trees Molecular Clock –Root = midpoint of longest span –Unreliable, often wrong. Extrinsic Evidence –e.g., select a fungus as root for plants –long branch attraction can be a problem Paralog rooting –long branch problems

Phylogenetic Software PHYLIP – – PAUP: Pileup, Lineup, Paupsearch, Paupdisplay – MrBayes –Bayesian trees – Treeview –Several programs going by this name have been written. –Draw/format phylogenetic trees –Java TreeView:

Phylogenetic Stories HIV –complete genome accessible –evolution rapid selection, neutralism? Primate evolution –Which primate is the closest relative to modern humans?

HIV Genome Diversity Error prone (RT) replication High rate of replication –10^10 virions/day In vivo selection pressure And in vivo recombination!

HIV tree Recombinants? ENV GAG AIDS 1996, 10:S13

Subtype E ENV=A “Bootscanning” AIDS 1996, 10:S13

Which species are the closest living relatives of modern humans? Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are related more closely to humans than either is to gorillas. The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least MYA. [Figure: two trees of humans, bonobos, chimpanzees, gorillas and orangutans, each with a time scale in MYA.]

Phylogenetic Resources NCBI Taxonomy Browser – RDP database (Ribosomal Database Project) – “Tree of Life” –

Practicalities Quality of input alignment critical Examine data from all possible angles –distance, parsimony, likelihood, Bayes Outgroup taxon critical –problem if outgroup shares a selective property with a subset of ingroup Order of input can be problematic –Jumble them!