
TREES

[Figure: the same tree of Chimp, Human and Gorilla drawn with the leaves in different orders; all the drawings are equal.] Trees

Same thing… [Figure: two drawings of one tree on s1 to s5, with the leaves in different positions.]
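A minimal sketch of what "same thing" means: store a rooted tree as nested tuples and compare canonical forms in which the children of every node are sorted. Rotating a node (swapping its children) then has no effect. The nested-tuple representation and the names `canonical` and `same_tree` are illustrative choices, not something from the slides.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    A leaf is a string; an internal node is a tuple of subtrees.
    Children are canonicalized recursively and then sorted (by repr,
    so strings and tuples can be compared), so rotating a node has no effect.
    """
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def same_tree(t1, t2):
    return canonical(t1) == canonical(t2)


a = (("Human", "Chimp"), "Gorilla")   # ((Human, Chimp), Gorilla)
b = ("Gorilla", ("Chimp", "Human"))   # the same tree, drawn rotated
c = (("Human", "Gorilla"), "Chimp")   # a different grouping

print(same_tree(a, b))  # True
print(same_tree(a, c))  # False
```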

Terminology
A branch = an edge
External node = leaf
Internal nodes
The root
[Figure: a rooted tree of Human, Chimp, Gorilla and Chicken with these parts labeled.]

Which of the following statements is correct with respect to the tree above? a. Human and gorilla are closer to each other than chimpanzee and gorilla. b. Human is equally close to chicken and to duck. c. Chicken is closer to gorilla than to human. d. a+b. e. a+c. f. b+c. g. a+b+c. h. None of the answers is correct. Exercise

The maximum parsimony principle. Tree building

Genes: 0 = absence, 1 = presence [Table: presence/absence of genes g1 to g6 (columns) in species s1 to s5 (rows).] Tree building

Evaluate this tree… [Figure: a tree with leaves s1, s4, s3, s2, s5.] Tree building

Gene number 1: option number 1, then option number 2. Number of changes for gene 1 (character 1) = 1. Tree building

Gene number 2: the options are tried in turn. Number of changes for gene 2 (character 2) = 2. Tree building

Gene number 3. Number of changes for gene 3 (character 3) = 1. Tree building

Gene number 4. Number of changes for gene 4 (character 4) = 2. Tree building

Gene number 5 is the same as gene number 4. Number of changes for gene 5 (character 5) = 2. Tree building

Gene number 6, only one option: Number of changes for gene 6 (character 6) = 1. Tree building

Sum of changes
Number of changes for gene 1 (character 1) = 1
Number of changes for gene 2 (character 2) = 2
Number of changes for gene 3 (character 3) = 1
Number of changes for gene 4 (character 4) = 2
Number of changes for gene 5 (character 5) = 2
Number of changes for gene 6 (character 6) = 1
Sum of changes for this tree topology = 9
Can we do better???
Tree building
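The slides evaluate each gene by trying its possible "options". If we read an option as one assignment of 0/1 states to the internal nodes, the same brute-force idea can be sketched in Python. The tree ((s1, s4), (s3, (s2, s5))) and the 0/1 table below are hypothetical and made up for illustration. The score of a topology is the sum of the per-gene minima.

```python
from itertools import product

# Hypothetical rooted tree ((s1, s4), (s3, (s2, s5))), as (parent, child) edges
EDGES = [("root", "x"), ("root", "y"), ("x", "s1"), ("x", "s4"),
         ("y", "s3"), ("y", "z"), ("z", "s2"), ("z", "s5")]
INTERNAL = ["root", "x", "y", "z"]

# Hypothetical presence (1) / absence (0) table: species -> [g1, g2, g3]
GENES = {"s1": [1, 0, 1], "s4": [1, 0, 0], "s3": [0, 1, 0],
         "s2": [0, 1, 1], "s5": [0, 0, 1]}


def changes_for_gene(g):
    """Minimum number of changes for gene index g, trying every option,
    i.e. every 0/1 assignment of states to the internal nodes."""
    best = None
    for option in product([0, 1], repeat=len(INTERNAL)):
        state = dict(zip(INTERNAL, option))
        state.update({leaf: row[g] for leaf, row in GENES.items()})
        cost = sum(state[p] != state[c] for p, c in EDGES)  # changed branches
        best = cost if best is None else min(best, cost)
    return best


def tree_score():
    """Score of the topology: sum of the per-gene minimum changes."""
    n_genes = len(GENES["s1"])
    return sum(changes_for_gene(g) for g in range(n_genes))


print([changes_for_gene(g) for g in range(3)], tree_score())  # [1, 2, 2] 5
```

Trying every option costs 2^(number of internal nodes) per gene. This is why an efficient way to compute the MP score is needed.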

The MP (most parsimonious) tree: [Figure: a different tree on s1 to s5.] Sum of changes for this tree topology = 8. Tree building

How to efficiently compute the MP score of a tree

The Fitch algorithm (1971): [Figure: a tree of Human, Chimp, Gorilla, Chicken and Duck with one state (A, G, C, C or A) at each leaf and the sets {A,G}, {A,C,G} and {A,C} at the internal nodes.] Scan the tree in postorder. At each internal node, intersect the sets of its two children; if the intersection is empty, apply a union operator instead.

Number of changes: Total number of changes = number of union operations, which is 3 in this case. [Figure: the same tree with the sets {A,G}, {A,C,G}, {A,C} and the union operations marked.]
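The postorder scan can be sketched in Python. The tree shape (((Human, Chimp), Gorilla), (Chicken, Duck)) and the leaf states Human=A, Chimp=G, Gorilla=C, Chicken=C, Duck=A are our reading of the figure, chosen to match the sets {A,G}, {A,C,G} and {A,C} it shows. `fitch` and `parsimony_score` are illustrative names. Only the bottom-up pass, which counts unions, is implemented.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass for one character.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict leaf name -> character state.
    Returns (candidate state set at this node, number of unions below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # non-empty intersection: no change needed here
        return common, left_cost + right_cost
    # empty intersection: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the Fitch score over all aligned positions."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        total += fitch(tree, column)[1]
    return total


# Assumed reading of the figure
TREE = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
LEAVES = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

root_set, changes = fitch(TREE, LEAVES)
print(sorted(root_set), changes)  # ['A', 'C'] 3
```

`parsimony_score` runs the same pass independently at each position and adds the counts. Once each sequence is assigned to a leaf, it can therefore score multi-position data such as the exercise sequences.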

Find the minimum number of changes: the sequences GACA, GGGA, CAAG, GCGA and GAAA are at the leaves of a tree of Human, Chimp, Chicken, Gorilla and Duck. Exercise

[Figure: a tree of Chimpanzee, Human and Gorilla.]

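Chimp AAAAT, Human ACAAC, Gorilla ACTAG. [Figure: a tree of Chimp, Human and Gorilla, scored position by position.] Position 1: A A A; Position 2: A C C; Position 3: A A T; Position 4: A A A; Position 5: T C G (states listed for Chimp, Human, Gorilla).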


The same sequences on another tree (leaves Gorilla, Human, Chimp). Position 1: A A A; Position 2: C A C; Position 3: A A T; Position 4: A A A; Position 5: C T G (states listed for Human, Chimp, Gorilla).

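[Figure: the three rooted trees of Chimp, Human and Gorilla, with the sequences AAAAT, ACTAG and ACAAC.] These 3 trees will ALWAYS get the same score. With only three taxa, the number of changes at each position depends only on how many different states that position has, not on which two taxa are grouped together.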

The unrooted tree represents a set of rooted trees

A general observation: the position of the root does not affect the MP score. [Figure: an unrooted tree of A, B, C, D, E and rooted versions of it.]

Intuition as to why rooting does not change the score: [Figure: a tree on s1 to s5 with a single change marked.] The change will always be on the same branch, no matter where the root is placed…
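A small check of this observation: run a Fitch pass with the root placed on every branch of one unrooted tree and confirm that the score never changes. The tree (internal nodes x, y, z) and the leaf states are hypothetical.

```python
# Hypothetical unrooted tree on s1..s5 with internal nodes x, y, z
EDGES = [("s1", "x"), ("s2", "x"), ("x", "y"), ("s3", "y"),
         ("y", "z"), ("s4", "z"), ("s5", "z")]
STATES = {"s1": "A", "s2": "G", "s3": "C", "s4": "C", "s5": "A"}

ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)


def fitch_from(node, parent):
    """Fitch pass over the part of the tree hanging from `node`, away from
    `parent`. Returns (candidate state set, number of unions)."""
    children = [n for n in ADJ[node] if n != parent]
    if not children:  # a leaf
        return {STATES[node]}, 0
    (a_set, a_cost), (b_set, b_cost) = (fitch_from(c, node) for c in children)
    common = a_set & b_set
    if common:
        return common, a_cost + b_cost
    return a_set | b_set, a_cost + b_cost + 1


def score_rooted_on(edge):
    """Put the root in the middle of `edge` and return the Fitch score."""
    u, v = edge
    u_set, u_cost = fitch_from(u, v)
    v_set, v_cost = fitch_from(v, u)
    return u_cost + v_cost + (0 if u_set & v_set else 1)


scores = {edge: score_rooted_on(edge) for edge in EDGES}
print(set(scores.values()))  # {3}: the same score for every root position
```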

Which is not a rooted version of this tree? [Figure: an unrooted tree of A, B, C, D, E and three rooted trees T1, T2, T3.] Exercise

Gorilla gorilla (gorilla), Homo sapiens (human), Pan troglodytes (chimpanzee), Gallus gallus (chicken)

Evaluate all 3 possible UNROOTED trees: Human + Chimp | Chicken + Gorilla; Human + Gorilla | Chimp + Chicken; Human + Chicken | Chimp + Gorilla. [Figure: the three trees, one of them marked as the MP tree.]

Rooting based on a priori knowledge: [Figure: the unrooted tree of Human, Chimp, Gorilla and Chicken, and the same tree rooted on the branch leading to Chicken.]

Ingroup / Outgroup: [Figure: Human, Chimp and Gorilla form the INGROUP; Chicken is the OUTGROUP.]

Subtrees [Figure: a tree of Human, Chimp, Gorilla, Chicken and Duck with one subtree highlighted.]

Monophyletic groups [Figure: a tree of Human, Chimp, Gorilla and Chicken.] Gorilla + Human + Chimp are monophyletic: they consist of a common ancestor and all of its descendants. A clade is a monophyletic group.

Paraphyletic = Non-monophyletic groups [Figure: a tree of Whale, Chimp, Zebrafish and Drosophila.] Zebrafish + Whale are paraphyletic.

[Figure: an unrooted tree of Human, Chimp, Gorilla, Chicken and Rat.] Chicken + Rat seems to be monophyletic, but it is not, since the root of the tree lies between Chicken and the rest. Human + Gorilla is not monophyletic no matter where the root is… When only an unrooted tree is given, you cannot know which groups are monophyletic. You can only say which are not.
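A sketch of the monophyly test on a rooted tree: a group is monophyletic exactly when it equals the leaf set of some subtree. The two rootings below come from one unrooted tree in which Chicken and Rat are adjacent and Human is next to Chimp. That shape is our reading of the slide, and the function names are illustrative.

```python
def collect_clades(tree, clades):
    """Add the leaf set of every subtree of a rooted tree to `clades`;
    return the leaf set of `tree` itself."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the leaf set of some subtree."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(group) in clades


# Two rootings of the same (assumed) unrooted tree
ROOTED_1 = ("Chicken", ("Rat", (("Human", "Chimp"), "Gorilla")))  # root on Chicken's branch
ROOTED_2 = (("Chicken", "Rat"), (("Human", "Chimp"), "Gorilla"))  # root between the pairs

print(is_monophyletic(ROOTED_1, {"Human", "Chimp", "Gorilla"}))  # True
print(is_monophyletic(ROOTED_1, {"Chicken", "Rat"}))             # False
print(is_monophyletic(ROOTED_2, {"Chicken", "Rat"}))             # True
print(is_monophyletic(ROOTED_1, {"Human", "Gorilla"}))           # False
```

Chicken + Rat changes status with the root position, but Human + Gorilla fails under any rooting. This is why an unrooted tree can only tell us which groups are not monophyletic.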

HOW MANY TREES

How many rooted trees (TR = “TREE ROOTED”)
N=2, TR(2) = 1
N=3, TR(3) = 3: ((a,b),c), ((a,c),b), ((b,c),a)
N=4, TR(4) = 15
[Figure: the rooted trees for a, b and for a, b, c, d.]

How many rooted trees (TR = “TREE ROOTED”): The tree of a and b has 2 branches, so there are 3 possible places to add “c” (on either branch, or above the root). The resulting trees have 4 branches and 5 possible places to add “d”; then 6 branches and 7 possible places to add “e”. The number of branches increases by 2 each time, so it is an arithmetic series 0, 2, 4, 6, 8, …: A(n) = A(1) + (n-1)d with A(1) = 0 and d = 2, so A(n) = (n-1)*2 = 2n-2.

How many rooted trees (TR = “TREE ROOTED”): Each time, the new leaf can be added in Br(n)+1 places [Br(n) = number of branches = 2n-2]. Therefore TR(n+1) = TR(n)*(Br(n)+1) = TR(n)*(2n-1). For example, TR(5) = TR(4)*7 = TR(3)*5*7 = TR(2)*3*5*7 = 1*3*5*7 = 105. In general, TR(n) = 1*3*5*7*…*(2n-3) [TR(n) = number of rooted trees with n sequences].
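A sketch that builds the trees the same way they are counted here: start from (a, b) and attach each new leaf on every branch, including a new branch above the root. The counts agree with the recurrence. Trees are nested tuples, and the function names are illustrative.

```python
def insert_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`,
    including a new branch above the current root (Br(n) + 1 places)."""
    yield (tree, leaf)  # attach just above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def rooted_trees(names):
    """All rooted binary trees on `names` (at least two names)."""
    trees = [(names[0], names[1])]
    for leaf in names[2:]:
        trees = [t for old in trees for t in insert_everywhere(old, leaf)]
    return trees


def tr(n):
    """TR(n) from the recurrence TR(n+1) = TR(n) * (2n - 1), with TR(2) = 1."""
    count = 1
    for k in range(2, n):
        count *= 2 * k - 1
    return count


for n in range(2, 7):
    print(n, len(rooted_trees("abcdef"[:n])), tr(n))  # e.g. "4 15 15"
```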

How many rooted trees (TR = “TREE ROOTED”)
n! = 1*2*3*4*…*n (n factorial).
$$TR(n) = 1\cdot 3\cdot 5\cdots(2n-3) = \frac{1\cdot 2\cdot 3\cdots(2n-3)}{2\cdot 4\cdot 6\cdots(2n-4)} = \frac{1\cdot 2\cdot 3\cdots(2n-3)}{(2\cdot 1)(2\cdot 2)(2\cdot 3)\cdots(2(n-2))} = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

How many rooted trees (TR = “TREE ROOTED”)
$$TR(n) = 1\cdot 3\cdot 5\cdots(2n-3) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} = (2n-3)!!$$
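A quick numerical check that the product of odd numbers and the factorial form agree. It also shows how fast the count grows. The function names are illustrative.

```python
from math import factorial


def tr_product(n):
    """TR(n) = 1 * 3 * 5 * ... * (2n - 3), i.e. the double factorial (2n-3)!!."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):  # 1, 3, ..., 2n-3
        result *= odd
    return result


def tr_closed_form(n):
    """TR(n) = (2n-3)! / (2^(n-2) * (n-2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(2, 11):
    assert tr_product(n) == tr_closed_form(n)

print(tr_product(10))  # 34459425 rooted trees for only 10 sequences
```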

HEURISTIC SEARCH

There are so many trees that we cannot go over all of them. Instead, we will try to find the best tree heuristically. These are approximate solutions…

Finding the maximum is the same thing as finding the minimum. Say we have a computer procedure that, given a function, finds its minimum, and we want to find the maximum of a function f(x). We can just find the minimum of -f(x): this minimum is minus the maximum of f(x), and it is attained at the same x. Example: f(0) = 3; f(1) = 7; f(2) = -5; f(3) = 0; max f(x) = 7, argmax f(x) = 1. -f(0) = -3; -f(1) = -7; -f(2) = 5; -f(3) = 0; min(-f(x)) = -7, argmin(-f(x)) = 1.
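The same example in Python. `minimize` is a hypothetical stand-in for "a computer procedure that finds the minimum"; `maximize` only calls it on -f.

```python
F = {0: 3, 1: 7, 2: -5, 3: 0}  # the example values f(0)..f(3)


def f(x):
    return F[x]


def minimize(func, domain):
    """Stand-in minimizer: returns (argmin, min) over a finite domain."""
    best_x = min(domain, key=func)
    return best_x, func(best_x)


def maximize(func, domain):
    """Maximize func by minimizing -func, then flipping the sign back."""
    best_x, neg_min = minimize(lambda x: -func(x), domain)
    return best_x, -neg_min


print(minimize(lambda x: -f(x), F))  # (1, -7)
print(maximize(f, F))                # (1, 7)
```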

[Figure: greedy search in tree space. Start: a tree with score 1700. Its neighbors: scores 1825, 1710, 1410 and 1695. Next step, from the tree with score 1825: neighbors with scores 1828, 1910 and 1800.]

Problem number 1: local maximum. [Figure: the search stops at a local max with score 2900, while the global max has score 3100; another tree shown has score 2100.]

This algorithm is “greedy”: it seizes the first improvement it encounters. One way to avoid local maxima is to start from many random starting points.
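A toy sketch of greedy search with random restarts. Each index of a hypothetical `SCORES` list stands for a tree, its neighbors are the adjacent indices, and higher is better, as on the slides. The values loosely echo the figure but are made up. Real tree search would use NNI or similar moves.

```python
import random

# Hypothetical toy landscape: local max 2900 (index 4), global max 3100 (index 6)
SCORES = [1410, 1700, 1825, 1910, 2900, 2100, 3100, 1695]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


def hill_climb(start):
    """Greedy search: move to the first better neighbor until none exists."""
    current = start
    improved = True
    while improved:
        improved = False
        for j in neighbors(current):
            if SCORES[j] > SCORES[current]:
                current = j
                improved = True
                break
    return current


def best_of_restarts(starts):
    """Run hill climbing from each start and keep the best local maximum."""
    return max((hill_climb(s) for s in starts), key=lambda i: SCORES[i])


def random_restarts(k, seed=0):
    rng = random.Random(seed)
    return best_of_restarts([rng.randrange(len(SCORES)) for _ in range(k)])


print(SCORES[hill_climb(1)])             # 2900: stuck at a local maximum
print(SCORES[best_of_restarts([1, 7])])  # 3100: a second start finds the global max
```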

Several options to define a neighbor. [Figure: Option 1 and Option 2.]

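Nearest-neighbor interchange (NNI) [Figure: an internal branch separating subtrees A, B from C, D, and the two trees obtained by swapping subtrees across it: A, C | B, D and A, D | B, C.] Each internal branch defines two neighbors.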

How many neighbors do we check each time? An unrooted tree of n taxa has 2n-3 branches, n of which are external (leaf) branches. NNI is possible only on internal branches, and there are n-3 of them. Each defines two neighbors, so the total number of neighbors in each NNI cycle is 2(n-3) = 2n-6. [Figure: an unrooted tree of A, B, C, D, E with its internal and external branches marked.]
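A sketch of generating NNI neighbors from an edge list. For each internal branch u-v, one subtree on u's side is swapped with each of the two subtrees on v's side. External branches are skipped. The 5-taxon tree and the function name are hypothetical. The counts match 2n-3 branches and 2n-6 neighbors for n = 5.

```python
def nni_neighbors(edges):
    """Return the NNI neighbors of an unrooted binary tree.

    edges: list of (u, v) pairs; leaves have degree 1, internal nodes degree 3.
    Each neighbor is returned as a set of frozenset edges.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = []
    for u, v in edges:
        if len(adj[u]) == 1 or len(adj[v]) == 1:
            continue  # external branch: NNI is not possible
        b = sorted(adj[u] - {v})[1]  # the subtree on u's side that moves
        for c in sorted(adj[v] - {u}):  # swap it with each subtree on v's side
            new_edges = {frozenset(e) for e in edges}
            new_edges -= {frozenset((u, b)), frozenset((v, c))}
            new_edges |= {frozenset((v, b)), frozenset((u, c))}
            result.append(new_edges)
    return result


# Hypothetical unrooted tree on A..E with internal nodes x, y, z
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

print(len(EDGES), len(nni_neighbors(EDGES)))  # 7 branches, 4 neighbors
```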

I am greedy

Greedy variants
(1) Most greedy: Start searching your neighbors. If you find something better, move there and start the search again.
(2) Just greedy: Check ALL your neighbors. Move to the one with the highest score.
(3) Smart greedy: Try all NNIs of the trees that are tied for the best score.
There are many other variants of greedy search that will not be discussed in this course.
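A toy comparison of variants (1) and (2) on a made-up landscape where they disagree. Index i stands for a tree, and its neighbors are i-1 and i+1. Variant (3) is not implemented here.

```python
SCORES = [1410, 1700, 1825, 1910, 2900, 2100, 3100, 1695]  # hypothetical landscape


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


def most_greedy_step(i):
    """(1) Most greedy: move to the FIRST neighbor that is better."""
    for j in neighbors(i):
        if SCORES[j] > SCORES[i]:
            return j
    return None


def just_greedy_step(i):
    """(2) Just greedy: check ALL neighbors, move to the best one if it is better."""
    best = max(neighbors(i), key=lambda j: SCORES[j])
    return best if SCORES[best] > SCORES[i] else None


def climb(start, step):
    current = start
    while (nxt := step(current)) is not None:
        current = nxt
    return current


print(SCORES[climb(5, most_greedy_step)])  # 2900
print(SCORES[climb(5, just_greedy_step)])  # 3100
```

Starting from the tree with score 2100, the first better neighbor (2900) is a dead end. Checking all neighbors finds 3100. The price is that every neighbor is scored at every step.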

Parsimony has many shortcomings. To name a few:
(1) All changes are counted the same, which is not true for biological systems (Leu->Ile is much more likely than Leu->His).
(2) It cannot take biological context into account (secondary structures, dependencies among sites, evolutionary distances between the analyzed organisms, etc.).
(3) Its statistical basis is questionable.

Alternative: MAXIMUM-LIKELIHOOD METHOD.

Maximum likelihood uses a probabilistic model of evolution. Each amino acid has a certain probability of changing, and this probability depends on the evolutionary distance. Evolutionary distances are inferred from the entire set of sequences.

Evolutionary distances: A position can be conserved for two reasons: functional constraints, or short evolutionary time. 5 replacements in 10 positions between 2 chimps is considered very variable. 5 replacements between human and cucumber is not considered that variable… Maximum likelihood takes this information into account.
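To make "the probability of change depends on the evolutionary distance" concrete, here is a sketch using the Jukes-Cantor (JC69) model. JC69 is a standard textbook model for DNA that the slides do not name, since they speak of amino acids, so treat it as a swapped-in illustration. Under JC69, with distance d (expected substitutions per site), a site differs between two sequences with probability p(d) = 3/4 (1 - e^(-4d/3)). Maximizing the likelihood of k differences in n sites gives d = -3/4 ln(1 - 4k/(3n)). The function names are illustrative.

```python
import math


def p_diff(d):
    """JC69: probability that two sequences differ at a site at distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def log_likelihood(d, k, n):
    """Log-likelihood of k differing sites out of n (constant factors dropped)."""
    p = p_diff(d)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def ml_distance(k, n):
    """Closed-form maximizer of log_likelihood: d = -3/4 ln(1 - 4p/3), p = k/n."""
    p = k / n
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# 5 replacements in 10 positions (the numbers used above)
d_hat = ml_distance(5, 10)  # about 0.824 substitutions per site
grid = [i / 1000 for i in range(1, 5000)]
d_grid = max(grid, key=lambda d: log_likelihood(d, 5, 10))  # brute-force check
print(round(d_hat, 3), round(d_grid, 3))
```

The estimated distance (about 0.82) is larger than the raw fraction of differences (0.5), because some sites may have changed more than once.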

Maximum Parsimony | Maximum Likelihood
All changes counted the same | Different probabilities for the different types of substitutions
Statistically questionable | Statistically robust
Ignores biological context | Accounts for biological context

The likelihood computations [Figure: a tree with branch lengths t1 to t6, internal nodes X, Y, Z and amino acids at the leaves.] We can infer the phylogenetic tree using maximum likelihood. This is generally more accurate than maximum parsimony.

Maximum likelihood tree reconstruction: Finding the maximum likelihood tree is extremely difficult (and challenging) from the computational point of view, but efficient algorithms that find approximate solutions have been developed.

HIV evolution – an example of using phylogeny tools

The virus = HIV The disease = AIDS (Acquired Immunodeficiency Syndrome) First recognized clinically in 1981 By 1992, it had become the major cause of death in individuals years of age in the States. Human Immunodeficiency Virus (HIV)

Until Dec 2007: 25 million people had died of AIDS (20 million as of 2002). [Figure: people living with HIV/AIDS, in millions.] Africa has 12 million AIDS orphans (2007). In some areas, 1 out of 3 children has lost at least one parent. HIV Statistics

HIV is a lentivirus. Species = HIV; Genus = Lentivirus; Family = Retroviridae. Lentiviruses have a long incubation time and are thus called “slow viruses”.

In 1986, a distinct type of HIV prevalent in certain regions of West Africa was discovered and was termed HIV type 2. Individuals infected with type 2 also had AIDS, but had a longer incubation time and lower morbidity (number of cases / population size). HIV-1 and HIV-2

HIV subtypes

published by the International AIDS Vaccine Initiative

Five lines of evidence have been used to substantiate zoonotic transmission of primate lentivirus: 1. Similarities in viral genome organization; 2. Phylogenetic relatedness; 3. Prevalence in the natural host; 4. Geographic coincidence; 5. Plausible routes of transmission.

For HIV-2, a virus (SIVsm) that is genomically indistinguishable from HIV-2 and closely related to it phylogenetically was found in substantial numbers of wild-living sooty mangabeys, whose natural habitat coincides with the epicenter of the HIV-2 epidemic.

Mangabey: a long-tailed monkey of the genus Cercocebus, found in the forest regions of Africa.

Close contact between sooty mangabeys and humans is common because these monkeys are hunted for food and kept as pets. No fewer than six independent transmissions of SIVsm to humans have been proposed. The origin of HIV-1 is much less certain.

HIV and SIV tree based on maximum parsimony (1990)

This tree can be explained by co-evolution of virus and host. [Figure: a tree of Virus A, B, C beside a tree of Primate A, B, C.] Host-pathogen co-evolution in other SIVs

Phylogenetic tree (1999): there are at least two different HIV-1 clades and two different SIVcpz clades.

2006. Nature

“We tested 378 chimpanzees and 213 gorilla fecal samples from remote forest regions in Cameroon for HIV-1 cross-reactive antibodies” “Surprisingly, 6 of 213 fecal samples from wild-living gorillas also gave a positive HIV-1 signal” The origin of HIV-O

Bayesian analysis: HIV-1 O is a sister clade of SIV from gorilla!

It seems that either chimpanzees transmitted SIV to gorillas and gorillas to humans (type O), or chimpanzees transmitted it to both gorillas and humans (type O). Note: gorillas and chimps rarely interact, and gorillas are herbivores. The origin of HIV-O

Thank you…