Molecular Phylogeny Analysis, Part II. Mehrshid Riahi, Ph.D. Iranian Biological Research Center (IBRC), July 14-15, 2012.

Topics
- A few examples of what can be inferred from phylogenetic trees
- Tree-building methods
- Introduction to distance- and character-based phylogeny
- Maximum Parsimony (MP) and Neighbour-joining (NJ) analysis using PAUP* 4.08b

What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: (1) phylogeny inference, or “tree building”, and (2) character and rate analysis.

A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
- Which species are the closest living relatives of modern humans?
- Did the infamous Florida dentist infect his patients with HIV?
- Mapping evolutionary transitions
- Geographic origins
- Plus countless others

Which species are the closest living relatives of modern humans? [Figure: two trees of humans, chimpanzees, bonobos, gorillas and orangutans on a time axis in MYA: the pre-molecular view, and the tree supported by mitochondrial DNA and most nuclear DNA-encoded genes, which places chimpanzees and bonobos as the closest living relatives of humans.]

Phylogenetic analysis of HIV: Lafayette, Louisiana, 1994. A woman claimed her physician had injected her with HIV+ blood. Records showed the physician had drawn blood from an HIV+ patient that day. But how could one prove that blood from that HIV+ patient ended up in the woman?

HIV transmission: HIV has a high mutation rate, which can be used to trace paths of transmission. Two people who got the virus from two different people will have very different HIV sequences. Three different tree reconstruction methods (including parsimony) were used to track changes in two HIV genes (gp120 and RT, reverse transcriptase).

HIV transmission: Multiple samples were taken from the patient, the woman, and controls (unrelated HIV+ people). In every reconstruction, the woman’s sequences were found to have evolved from the patient’s sequences, indicating a close relationship between the two. The nesting of the victim’s sequences within the patient’s sequences indicated that the direction of transmission was from the patient to the victim. This was the first time phylogenetic analysis was used as evidence in a court case (Metzker et al., 2002).

Did the Florida dentist infect his patients with HIV? [Figure: phylogenetic tree of HIV sequences from the dentist, his patients (A to G) and local HIV-infected people (local controls); from Ou et al. (1992) and Page & Holmes (1998).] Yes: the HIV sequences from some of these patients fall within the clade of HIV sequences found in the dentist. No: the sequences from the remaining patients fall outside it.

How many times has evolution invented wings? Whiting et al. (2003) looked at winged and wingless stick insects.

Reinventing wings: Previous studies had shown winged → wingless transitions. The wingless → winged transition is much more complicated (many new biochemical pathways would need to develop). Multiple tree reconstruction techniques were used, all of which required the re-evolution of wings.

Most parsimonious evolutionary tree of winged and wingless insects: The evolutionary tree is based on both DNA sequences and the presence/absence of wings. The most parsimonious reconstruction gave a wingless ancestor.

Mapping evolutionary transitions, testing evolutionary hypotheses: Some horned lizards squirt blood from their eyes when attacked by canids. How many times has blood squirting evolved? [Figure: horned lizard phylogeny with each species scored for blood squirting (yes/no).] This phylogeny suggests a single evolutionary gain and a single loss of blood squirting.

Geographic origins, testing evolutionary hypotheses: Where did domestic corn (Zea mays, maize) originate? [Figure: panels A and B, Matsuoka et al. (2002).] Populations from highland Mexico are at the base of each maize clade.

There are three possible unrooted trees for four taxa (A, B, C, D). [Figure: Trees 1 to 3, which pair A with B, with C, or with D: AB|CD, AC|BD and AD|BC.] Which one is correct?

Trees can be unrooted or rooted. Rooting a single unrooted four-taxon tree on each of its five branches gives five rooted trees, and these show five different evolutionary relationships among the taxa!

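[Figure: taxa added one at a time, from a three-taxon tree of A, B and C to trees including D, E and F.] (2N - 5)!! = number of unrooted trees for N taxa, and (2N - 3)!! = number of rooted trees; for N = 4 this gives 3 unrooted and 15 rooted trees. Each unrooted tree can theoretically be rooted anywhere along any of its 2N - 3 branches.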
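The standard counts are (2N - 5)!! unrooted and (2N - 3)!! rooted binary trees. A short Python sketch makes the growth concrete and shows why exhaustive search stops being practical at around a dozen taxa; the function names are illustrative only, not taken from any phylogenetics package.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n):
    """Number of distinct unrooted binary trees for n >= 3 labelled taxa: (2n - 5)!!."""
    return double_factorial(2 * n - 5)


def n_rooted_trees(n):
    """Number of distinct rooted binary trees for n >= 2 labelled taxa: (2n - 3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 5, 10, 11, 25):
    print(f"{n:>2} taxa: {n_unrooted_trees(n):,} unrooted, {n_rooted_trees(n):,} rooted")
```

For four taxa this gives the 3 unrooted and 15 rooted trees mentioned above; for 11 taxa there are already 34,459,425 unrooted trees.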

How to root? Use an “outgroup”:
- the outgroup should be a taxon known to be less closely related to the rest of the taxa (the ingroup)
- it should ideally be as closely related as possible to the ingroup while still satisfying the above condition

Types of phylogenetic analysis methods:
- Phenetic (distance methods): trees are constructed based on observed characteristics, not on evolutionary history.
- Cladistic (parsimony and maximum likelihood methods): trees are constructed relying on assumptions about ancestral relationships as well as on current data.

Types of data used in phylogenetic inference:
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa       Characters
Species A  ATGGCTATTCTTATAGTACG
Species B  ATCGCTAGTCTTATATTACA
Species C  TTCACTAGACCTGTGGTCCA
Species D  TTGACCAGACCTGTGGTCCG
Species E  TTGACCAGTTCTCTAGTTCG
Distance-based methods: transform the sequence data into pairwise distances, and use the resulting matrix during tree building. [Figure: pairwise distance matrix for Species A to E.]
Example 1: uncorrected “p” distance (= observed percent sequence difference).
Example 2: Kimura 2-parameter distance (an estimate of the true number of substitutions between taxa).
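The two example distances can be computed for the five 20-site sequences above with a minimal Python sketch. It assumes ungapped DNA over A, C, G and T, and uses the usual Kimura 2-parameter formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transition and transversion differences.

```python
import math
from itertools import combinations

# Aligned sequences from the slide (assumed ungapped, A/C/G/T only).
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}
PURINES = {"A", "G"}


def p_distance(s1, s2):
    """Uncorrected p distance: proportion of aligned sites that differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def k2p_distance(s1, s2):
    """Kimura 2-parameter distance (undefined if 1 - 2P - Q <= 0 or 1 - 2Q <= 0)."""
    n = len(s1)
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine change.
    transitions = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in zip(s1, s2))
    # Transversion: purine <-> pyrimidine change.
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in zip(s1, s2))
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


for x, y in combinations(SEQS, 2):
    print(f"{x}-{y}: p = {p_distance(SEQS[x], SEQS[y]):.2f}, "
          f"K2P = {k2p_distance(SEQS[x], SEQS[y]):.3f}")
```

Because -ln(1 - x) >= x, the K2P value is never smaller than p: the correction adds back substitutions that are hidden by multiple hits at the same site.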

Types of computational methods:
- Clustering algorithms: use pairwise distances; they are purely algorithmic methods.
- Optimality approaches: use either character or distance data and pick the best tree under an optimality criterion, such as minimum branch lengths, the fewest events, or the highest likelihood.

Molecular phylogenetic tree-building methods:
                 COMPUTATIONAL METHOD
DATA TYPE        Clustering algorithm        Optimality criterion
Characters       -                           Parsimony, Maximum likelihood
Distances        UPGMA, Neighbor-joining     Minimum evolution, Least squares

Tree-building methods: Distance: NJ. Character: Maximum Parsimony.

Character methods: Maximum parsimony seeks the minimal number of changes needed to produce the data.

Parsimony methods: Optimality criterion: the ‘most parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.

Parsimony methods: Parsimony methods are based on the idea that the most probable evolutionary pathway is the one that requires the smallest number of changes from some ancestral state. For sequences, this implies treating each position separately and finding the minimal number of substitutions at each position.

Example of parsimonious tree building: The tree on the left requires only one change, and the tree on the right requires two, so the left tree is the most parsimonious.

Parsimony methods assign a cost to each tree available for the dataset, then screen these trees and select the most parsimonious one. Screening all the trees available for even a smallish dataset would take too much time; the branch and bound method builds trees with increasing numbers of leaves, but abandons a topology whenever the current (partial) tree already has a bigger cost than the best complete tree found so far.

Molecular characters: 1. Extract, 2. Sequence, 3. Align.
Outgroup   AAGCTTCATAGGAGCAACCATTCTAATAATAAGCCTCATAAAGCC
Species A  AAGCTTCACCGGCGCAGTTATCCTCATAATATGCCTCATAATGCC
Species B  GTGCTTCACCGACGCAGTTGTCCTCATAATGTGCCTCACTATGCC
Species C  GTGCTTCACCGACGCAGTTGCCCTCATGATGAGCCTCACTATGCA

Molecular characters (10 aligned sites):
Outgroup   AAGCTTCATA
Species A  GAGCTTCACA
Species B  GTGCTTCACG
Species C  GTGCCTCACG
[Figure: tree (Out, (A, (B, C))) with the character changes mapped onto its branches.]
- Invariable sites: these are not useful phylogenetic characters.
- Synapomorphies supporting A+B+C (A→G at site 1, T→C at site 9): any mutations at this time would affect A, B and C because they have not yet diverged.
- Synapomorphies supporting B+C (A→T at site 2, A→G at site 10): any mutations at this time would affect B and C.
- Apomorphy for C (T→C at site 5): any mutations at this time would only affect C.
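Counting the minimal number of changes site by site, as in this example, is what Fitch’s (1971) small-parsimony algorithm does. The Python sketch below scores the three possible unrooted topologies for the 10-site alignment. Trees are written as nested tuples, an encoding chosen here purely for illustration; where the tree is rooted does not change the count.

```python
# 10-site example alignment from the slides.
ALIGNMENT = {
    "Out": "AAGCTTCATA",
    "A": "GAGCTTCACA",
    "B": "GTGCTTCACG",
    "C": "GTGCCTCACG",
}


def fitch_site(node, column):
    """Return (candidate state set, number of changes) for one site below `node`."""
    if isinstance(node, str):  # leaf: its observed state, no changes yet
        return {column[node]}, 0
    left_set, left_cost = fitch_site(node[0], column)
    right_set, right_cost = fitch_site(node[1], column)
    common = left_set & right_set
    if common:  # children can share a state: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


TREES = {
    "(Out,A),(B,C)": (("Out", "A"), ("B", "C")),
    "(Out,B),(A,C)": (("Out", "B"), ("A", "C")),
    "(Out,C),(A,B)": (("Out", "C"), ("A", "B")),
}
for name, tree in TREES.items():
    print(name, parsimony_length(tree, ALIGNMENT))
```

The topology grouping B with C needs 5 changes, against 7 for either alternative, so it is the most parsimonious tree, in line with the two B+C synapomorphies. Invariable sites add nothing to any tree, and the single-taxon changes cost one step on every tree, which is why neither is phylogenetically informative.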

Algorithms used for tree searching:
I. Exhaustive search: all possible trees are examined → best tree → requires lots of time and computer resources.
II. Branch and bound: trees are built step by step → each partial tree is compared with the best complete tree found so far → a partial tree that is already worse is abandoned → the next tree is tried… → best possible tree.
III. Heuristic search: only the most likely options are examined → saves time and resources, but does not always find the best tree.

Exhaustive search: with 11 or fewer OTUs, an exhaustive search is possible:
- this guarantees that the shortest tree(s) will be found (an exact solution)
- every possible tree for n taxa is examined
- slowest and most rigorous method
- provides a frequency histogram of tree scores

Exhaustive Search

Tree searching: with an intermediate number of OTUs (too many for an exhaustive search, but not so many that heuristics are required), one can do a branch and bound search:
- this also guarantees that the shortest tree(s) will be found, but not all trees are examined (also an exact solution)
- families of trees that cannot lead to shorter trees are discarded and not examined
- saves time: faster than an exhaustive search
- no histogram of tree scores
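A compact branch and bound search can be sketched in Python on the 10-site example alignment from the parsimony slides. Under the assumptions of this sketch, trees grow by stepwise addition (the first taxon is held fixed as an anchor so that each unrooted topology is produced once), scores come from Fitch parsimony, and a partial tree is discarded as soon as it is longer than the best complete tree found so far. This works because adding a taxon can never shorten a tree. No initial heuristic bound is used here, whereas a real program would normally start from one. Removing the `length > best_length` check turns the same code into an exhaustive search.

```python
import math

# 10-site example alignment from the parsimony slides.
ALIGNMENT = {
    "Out": "AAGCTTCATA",
    "A": "GAGCTTCACA",
    "B": "GTGCTTCACG",
    "C": "GTGCCTCACG",
}


def fitch_length(tree, alignment):
    """Fitch parsimony length of a binary tree written as nested tuples of taxon names."""
    def site(node, column):
        if isinstance(node, str):
            return {column[node]}, 0
        (ls, lc), (rs, rc) = site(node[0], column), site(node[1], column)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

    n_sites = len(next(iter(alignment.values())))
    return sum(site(tree, {t: s[i] for t, s in alignment.items()})[1] for i in range(n_sites))


def attach(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)  # on the branch above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach(left, leaf):
            yield (new_left, right)
        for new_right in attach(right, leaf):
            yield (left, new_right)


def branch_and_bound(alignment):
    """Return (shortest length, all shortest trees); expects at least 3 taxa."""
    taxa = list(alignment)
    anchor, rest = taxa[0], taxa[1:]  # anchor taxon stands in for the root of an unrooted tree
    best_length, best_trees = math.inf, []

    def search(subtree, k):
        nonlocal best_length, best_trees
        tree = (anchor, subtree)
        placed = {t: alignment[t] for t in [anchor] + rest[:k]}  # taxa added so far
        length = fitch_length(tree, placed)
        if length > best_length:
            return  # bound: adding more taxa can never shorten this partial tree
        if k == len(rest):  # complete tree
            if length < best_length:
                best_length, best_trees = length, []
            best_trees.append(tree)
            return
        for bigger in attach(subtree, rest[k]):
            search(bigger, k + 1)

    search(rest[0], 1)
    return best_length, best_trees


print(branch_and_bound(ALIGNMENT))
```

With only four taxa nothing is pruned, but the same code finds the single shortest tree, of length 5, which groups B with C.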

Tree searching: for more than 25 OTUs (most datasets) one must use other methods, i.e. heuristic searching:
- approximate methods
- do not guarantee that the shortest tree will be found

Heuristic tree searching: stepwise addition builds the starting tree (PAUP options):
- Asis: taxa are added in the order of the data matrix (a poor start unless you’ve sorted the OTUs in some phylogenetic order)
- Closest: starts with the shortest 3-taxon tree and adds taxa in the order that produces the least increase in tree length (a greedy heuristic, like NJ); it produces a ‘good’ starting tree, but the same starting tree each time it is used (unless there are ‘ties’, which are broken randomly)

Heuristic tree searching (continued):
- Simple: the first taxon in the matrix is taken as a reference, and taxa are added to it in order of decreasing similarity to the reference
- Random: taxa are added in a random sequence; typically one performs many replicates, each starting with a random addition of taxa (the most rigorous option)

Branch swapping: PAUP allows three types of branch swapping, listed in order of increasing rigor:
- Nearest neighbor interchange (NNI)
- Subtree pruning and regrafting (SPR)
- Tree bisection-reconnection (TBR)

Branch swapping, tree bisection-reconnection (TBR): the most thorough branch-swapping procedure. The tree is broken at an internal branch, and all possible reconnections are made between the two subtrees.
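For contrast with TBR, the least rigorous swap, NNI, fits in a few lines of Python. In this sketch an unrooted tree is assumed to be encoded as (anchor_taxon, rooted_subtree) using nested tuples; this encoding is for illustration only and is not PAUP’s internal representation. Each internal branch has two NNI rearrangements, so a binary tree with n taxa has 2(n - 3) NNI neighbours.

```python
def _swaps(node):
    """Yield the subtree `node` with one NNI applied, for every internal branch below it."""
    if isinstance(node, str):
        return
    left, right = node
    for inner, sister in ((left, right), (right, left)):
        if isinstance(inner, tuple):
            a, b = inner
            # Exchange `sister` with one of the two subtrees on the far side of the branch.
            yield ((sister, b), a)
            yield ((a, sister), b)
    for new_left in _swaps(left):
        yield (new_left, right)
    for new_right in _swaps(right):
        yield (left, new_right)


def nni_neighbors(tree):
    """All NNI neighbours of an unrooted tree encoded as (anchor_taxon, rooted_subtree)."""
    anchor, subtree = tree
    return [(anchor, swapped) for swapped in _swaps(subtree)]


for neighbor in nni_neighbors(("Out", ("A", ("B", "C")))):
    print(neighbor)
```

For four taxa the two neighbours are exactly the other two unrooted topologies (Out with B, and Out with C); SPR and TBR reach many more trees per step, which is why they are more thorough but slower.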

Statistical methods to evaluate trees, 1. Bootstrapping: bootstrapping is a statistical technique that uses random resampling of the data to estimate the sampling error of tree topologies.
- n trees are built (n = 100, 1000 or 5000) → how many times is a certain branch reproduced?
- Agreement among the resulting trees is summarized with a majority-rule consensus tree.
- Each branch of the tree is labelled with the percentage of bootstrap trees in which it occurred: 80% is good, less than 50% is bad.

Bootstrapping: each replicate is a new multiple alignment of the same size, built by sampling columns at random, with replacement, from the real alignment. Note that the same column can be sampled more than once, and consequently some columns are not sampled.
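One bootstrap replicate can be sketched in Python by reusing the 10-site example alignment from the parsimony slides. The seed and function name are illustrative assumptions. In a full analysis each replicate would be passed to the same tree-building method, and a branch’s support would be the percentage of replicate trees that contain it.

```python
import random

# 10-site example alignment from the parsimony slides.
ALIGNMENT = {
    "Out": "AAGCTTCATA",
    "A": "GAGCTTCACA",
    "B": "GTGCTTCACG",
    "C": "GTGCCTCACG",
}


def bootstrap_replicate(alignment, rng):
    """Return (pseudo-alignment, sampled column indices), same length as `alignment`."""
    n_sites = len(next(iter(alignment.values())))
    # Sampling with replacement: a column may be drawn several times or not at all.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    replicate = {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}
    return replicate, columns


rng = random.Random(42)  # fixed seed so the run is reproducible
replicate, columns = bootstrap_replicate(ALIGNMENT, rng)
print("sampled columns:", columns)
print("never sampled:", sorted(set(range(len(ALIGNMENT["Out"]))) - set(columns)))
for taxon, seq in replicate.items():
    print(f"{taxon:>3} {seq}")
```

For an alignment of n columns, each column is left out of a replicate with probability (1 - 1/n)^n, which approaches 1/e (about 37%) for long alignments.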

Statistical methods to evaluate trees, 2. Consensus trees: If you get multiple trees, look for regions that are similar. Those are the regions that you can be more confident are correct.
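Finding the regions that several trees share can be sketched in Python as counting clades. This sketch assumes the trees are rooted on the outgroup and written as nested tuples, and it keeps every clade found in more than half of them, which is the majority-rule consensus idea. The three input trees are hypothetical, standing in for, say, three bootstrap replicates.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of taxon names) in a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def majority_rule(trees, threshold=0.5):
    """Map each clade present in more than `threshold` of the trees to its percentage support."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    n = len(trees)
    return {clade: 100 * k / n for clade, k in counts.items() if k / n > threshold}


# Hypothetical trees, e.g. from three bootstrap replicates rooted on the outgroup.
trees = [
    ("Out", ("A", ("B", "C"))),
    ("Out", ("A", ("B", "C"))),
    ("Out", (("A", "B"), "C")),
]
for clade, support in sorted(majority_rule(trees).items(), key=lambda item: -item[1]):
    print(sorted(clade), f"{support:.0f}%")
```

Here B+C is kept with 67% support while A+B (33%) is dropped. The clade containing all taxa is trivially present in every tree and would normally not be reported.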

In-class exercise I: Use the data set and program, and choose maximum parsimony. Use a heuristic search as the tree-building method. Inspect your tree.

Distance methods: measure distance (dissimilarity). Methods:
- UPGMA (Unweighted Pair Group Method with Arithmetic mean)
- NJ (Neighbor joining)
- FM (Fitch-Margoliash)
- ME (Minimum evolution)
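UPGMA, the simplest method in this list, can be sketched in Python. It repeatedly merges the two closest clusters and sets the distance from the new cluster to every other cluster to the arithmetic mean of the underlying taxon-to-taxon distances. Each new node is placed at half the distance between the merged clusters, which is the molecular-clock assumption built into UPGMA. The five-taxon matrix is hypothetical and is not data from these slides.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Return the UPGMA merge sequence as (new cluster, node height) pairs."""
    size = {x: 1 for x in labels}  # current clusters and how many taxa each holds
    dist = {}
    for (i, x), (j, y) in combinations(enumerate(labels), 2):
        dist[(x, y)] = dist[(y, x)] = float(matrix[i][j])
    merges = []
    while len(size) > 1:
        # Merge the two closest clusters.
        x, y = min(combinations(size, 2), key=lambda pair: dist[pair])
        new = (x, y)
        for k in size:
            if k not in (x, y):
                # Mean of all taxon-to-taxon distances, i.e. weighted by cluster size.
                dk = (size[x] * dist[(x, k)] + size[y] * dist[(y, k)]) / (size[x] + size[y])
                dist[(new, k)] = dist[(k, new)] = dk
        size[new] = size.pop(x) + size.pop(y)
        # Molecular clock: the new node sits at half the distance between the merged clusters.
        merges.append((new, dist[(x, y)] / 2))
    return merges


# Hypothetical distance matrix for five taxa (not the slide data).
LABELS = ["a", "b", "c", "d", "e"]
MATRIX = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]
for cluster, height in upgma(LABELS, MATRIX):
    print(height, cluster)
```

The last merge printed is the root, and its cluster is the whole tree as nested tuples. NJ, FM and ME drop the clock assumption, so their trees need not be ultrametric.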

NJ example, Step 1: alignment → distance. Example distance: observed percent sequence difference. [Figure: distance matrix for taxa A to G; surviving entries: d(A,B) = 63, d(A,C) = 94, d(B,C) = 79.]

Step 2: distance → clade.

Steps 3 and 4: merge D and G into DG. [Figure: reduced distance matrix for A, B, C, E, F and DG.]

Steps 5 and 6: A and F have been merged into AF: d(AF,B) = 61, d(AF,C) = 92, d(B,C) = 79. [Figure: reduced distance matrix for AF, B, C, E and DG.]

Steps 7 and 8: B and E have been merged into BE: d(AF,BE) = 63, d(AF,C) = 92, d(BE,C) = 71. [Figure: reduced distance matrix for AF, BE, C and DG.]

Step 9: C and DG have been merged into CDG: d(AF,BE) = 63, d(AF,CDG) = 102, d(BE,CDG) = 88.

Step 10: AF and BE are merged into AFBE.

Root: AFBE and CDG (d = 94) are joined at the root. NJ: distance → phylogeny.
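The textbook neighbor-joining algorithm (Saitou & Nei, 1987) can be sketched in Python. Rather than simply taking the smallest distance, at each step it joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other current nodes. The first branch length is d(i, j)/2 + (r(i) - r(j)) / (2(n - 2)), and the new node u gets d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2. The five-taxon matrix below is hypothetical and additive, not the slide’s A to G data, and the node names U1, U2, … are illustrative.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Return the edges (node, node, branch length) of an unrooted NJ tree."""
    d = {x: {y: float(matrix[i][j]) for j, y in enumerate(labels)} for i, x in enumerate(labels)}
    nodes = list(labels)
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[x]: total distance from x to all other current nodes.
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        new_id += 1
        u = f"U{new_id}"  # new internal node (hypothetical naming)
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes  # connect the last two nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distance matrix for five taxa (not the slide data).
LABELS = ["a", "b", "c", "d", "e"]
MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(LABELS, MATRIX):
    print(edge)
```

Because this matrix is additive, NJ recovers the generating tree exactly: a and b are joined first, with branch lengths 2 and 3, and the seven branches add up to 17. The result is unrooted; as on the slides, a root still has to be placed, for example with an outgroup.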

In-class exercise II: Use the same data set and program as in exercise I, but choose distance, with NJ as the tree-building method. Inspect your tree and compare it to the parsimony tree.