Phylogenetic Trees. Lecture 11. Sections 6.1, 6.2 in Setubal et al.; 7.1, 7.1 in Durbin et al. © Shlomo Moran, based on Nir Friedman, Danny Geiger, Ilan Gronau.


Phylogenetic Trees. Lecture 11. Sections 6.1, 6.2 in Setubal et al.; 7.1, 7.1 in Durbin et al. © Shlomo Moran, based on Nir Friedman, Danny Geiger, Ilan Gronau.

2 Evolution
Evolution of new organisms is driven by:
• Diversity: different individuals carry different variants of the same basic blueprint.
• Mutations: the DNA sequence can change through single-base changes, deletions or insertions of DNA segments, etc.
• Selection bias.

3 Theory of Evolution
• Basic idea: speciation events lead to the creation of different species (speciation: physical separation into groups in which different genetic variants become dominant).
• Any two species share a (possibly distant) common ancestor.
• This is described by a rooted tree: the tree of life.

4 Tree of Life
• Any two species share a (possibly distant) common ancestor.
• The process of evolution consists of:
  ◦ speciation events;
  ◦ mutations along evolutionary branches.
[Figure: the tree of life. Source: Alberts et al.]

5 Often only a subtree is studied
Definition: A phylogeny (also called a phylogenetic tree) is a tree that describes the sequence of speciation events that led to the formation of a set of current-day species.

6 Components of Phylogenetic Trees
• Leaves: current-day species (or taxa, the plural of taxon).
• Internal vertices: hypothetical common ancestors.
• Edge lengths: “time” from one speciation to the next.
• The tree topology: the tree structure, ignoring edge lengths. Usually the goal is to find the topology.
[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

7 Historical Note
• Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).
• Since then, the focus has been on objective criteria for constructing phylogenetic trees.
  ◦ Thousands of articles in the last decades.
• Important for many aspects of biology:
  ◦ Classification
  ◦ Understanding biological mechanisms

Outline
A. Introduction (this lecture)
1. The phylogenetic reconstruction problem: from sequences to trees
2. Morphological vs. molecular sequences
3. Possible pitfalls
4. Directed and undirected trees
5. The “big” problem, the “small” problem

Outline (cont.)
B. Character-based methods (this and the next lectures)
1. Perfect Phylogeny
2. Maximum Parsimony
3. Maximum Likelihood (not studied in this course)
These methods consider the evolution of each character separately and try to find the tree that gives the “best” evolutionary explanation: the smallest number of observed mutations (1 and 2), or the most probable tree (3).
These optimization problems are typically NP-hard. We’ll discuss ways of solving simplified versions of the problems.

Outline (cont.)
C. Distance-based methods (last 1-2 lectures)
• Run in polynomial time.
• Compute distances between all taxon pairs.
• Find an edge-weighted tree that best describes the distances.

Outline (end)
Distance methods (cont.)
1. Efficient reconstruction (O(n^2) time) from accurate distances.
2. Reconstruction from noisy distances: can we reconstruct accurate trees from approximate distances?
  ◦ Worst-case noise model.
  ◦ More realistic noise models: inter-species distances derived from probabilistic models of mutations.

12 Evolution as a Tree
[Figure: an evolutionary tree whose vertices are labeled by the sequences AATCCTG, ATAGCTG, AATGGGC, GAACGTA, AAACCGA, ACGGTCA, ACGGATA, ACGGGTA, ACCCGTG, ACCGTTG, TCTGGTA, TCTGGGA, TCCGGAA, AGCCGTG, GGGGATT, AAAGTCA, AAAGGCG, AAACACA, AAAGCTG]

13 Phylogenetic Reconstruction
[Figure: only the current-day sequences are observed: AATCCTG, ATAGCTG, AATGGGC, GAACGTA, AAACCGA, ACCGTTG, TCTGGGA, TCCGGAA, AGCCGTG, GGGGATT]

14 Phylogenetic Reconstruction
A : AATGGGC
B : AATCCTG
C : ATAGCTG
D : GAACGTA
E : AAACCGA
F : GGGGATT
G : TCTGGGA
H : TCCGGAA
I : AGCCGTG
J : ACCGTTG
Goal: reconstruct the ‘true’ tree as accurately as possible.
[Figure: the reconstructed tree, with leaves labeled A to J]

15 What are the sequences?
• “Significant” (e.g. morphological) characters, which distinguish between species.
• Molecular characters: DNA (4 letters), proteins (20 letters).
Construct the tree by comparing “homologous” sequences.

16 What are the sequences? Morphological vs. Molecular
• Classical methods use morphological features: number of legs, lengths of legs, etc.
• Modern methods use molecular features: gene (DNA) sequences, protein sequences.
• The analysis is based on homologous sequences (e.g., globins) in different species.

17 Possible pitfall in reconstruction: Misleading selection of sequences
Gene/protein sequences can be homologous for several different reasons:
• Orthologs: sequences diverged after a speciation event.
• Paralogs: sequences diverged after a duplication event (next slides).
• Xenologs: sequences diverged after a horizontal transfer (e.g., by a virus).

18 Misleading selection of sequences: Using paralogs instead of orthologs
Consider the evolutionary tree of three taxa, and assume that at some point in the past a gene duplication event occurred.
[Figure: a tree of three taxa with a gene duplication marked on an ancestral branch]

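19 Paralogs instead of Orthologs
The gene evolution is described by this tree (1, 2, 3 are species; A and B are the two copies of the same gene).
[Figure: gene tree in which a gene duplication is followed by speciation events; the Copy A subtree has leaves 1A, 2A, 3A and the Copy B subtree has leaves 1B, 2B, 3B]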

20 Paralogs instead of Orthologs
If we happen to consider genes 1A, 2B, and 3A of species 1, 2, 3, we get a wrong tree.
In the sequel we assume that all given sequences are orthologs, i.e. created from a common ancestor by speciation events.
[Figure: the gene tree from the previous slide (leaves 1A, 2A, 3A, 3B, 2B, 1B), with the speciation events marked S]

21 Rooted vs. Undirected Trees
A natural representation of a phylogeny is a rooted tree.
[Figure: a rooted tree whose root is labeled “Common Ancestor”]

22 Types of trees
An unrooted tree represents the same phylogeny without the root node.
Most known tree-reconstruction techniques do not distinguish between different placements of the root.

23 Rooted versus unrooted trees
[Figure: an unrooted tree with three candidate root positions a, b, c; it represents the three rooted trees Tree a, Tree b and Tree c]

24 Positioning Roots in Unrooted Trees
• We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.
[Figure: the tree on Aardvark, Bison, Chimp, Dog, Elephant with the outgroup Falcon attached; the proposed root lies on the edge leading to Falcon]

25 Two phylogenetic trees of the same species: do these trees represent the same evolutionary history?
[Figure: two differently drawn trees, each with leaves Aardvark, Bison, Chimp, Dog, Elephant]

26 When are two unrooted phylogenetic trees considered different?
Trees T_1 and T_2 on the same set of species are considered identical if they represent the same evolutionary history, i.e. they have the same topology.
Formally, this is equivalent to: there is a tree isomorphism h: T_1 → T_2 such that h(x) = x for each species x.
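The identity test on this slide can be sketched in code. The sketch below does not search for the isomorphism h directly; it relies on a standard fact that is not stated on the slides: for trees in which every internal vertex has degree at least three, a leaf-preserving isomorphism exists exactly when both trees induce the same set of leaf bipartitions (splits). The function names, the edge-list representation and the three example trees are hypothetical; the actual topologies drawn on the slides are not recoverable from the transcript.

```python
def leaf_splits(edges, leaves):
    """Return the set of leaf bipartitions induced by the edges of an unrooted tree."""
    leaves = frozenset(leaves)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def leaves_behind(start, blocked):
        # Collect the leaves reachable from `start` without crossing the edge to `blocked`.
        seen, stack, found = {start, blocked}, [start], set()
        while stack:
            node = stack.pop()
            if node in leaves:
                found.add(node)
            fresh = adj[node] - seen
            seen |= fresh
            stack.extend(fresh)
        return frozenset(found)

    result = set()
    for u, v in edges:
        part = leaves_behind(u, v)
        result.add(frozenset({part, leaves - part}))
    return result


def same_topology(edges1, edges2, leaves):
    """True if both trees induce exactly the same splits of the leaf set."""
    return leaf_splits(edges1, leaves) == leaf_splits(edges2, leaves)


LEAVES = ["Aardvark", "Bison", "Chimp", "Dog", "Elephant"]
# Internal vertex names (u, v, w, x, y, z) are arbitrary labels, made up for this example.
T1 = [("Aardvark", "u"), ("Bison", "u"), ("u", "v"), ("Chimp", "v"),
      ("v", "w"), ("Dog", "w"), ("Elephant", "w")]
# Same topology as T1, written in a different order with different internal names.
T2 = [("Elephant", "x"), ("Dog", "x"), ("x", "y"), ("Chimp", "y"),
      ("y", "z"), ("Bison", "z"), ("Aardvark", "z")]
# Bison and Chimp swapped: a different topology.
T3 = [("Aardvark", "u"), ("Chimp", "u"), ("u", "v"), ("Bison", "v"),
      ("v", "w"), ("Dog", "w"), ("Elephant", "w")]

print(same_topology(T1, T2, LEAVES))  # True
print(same_topology(T1, T3, LEAVES))  # False
```

Comparing split sets avoids having to guess how internal vertices correspond, which is why internal names never matter in the comparison.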

27 The two trees represent the same evolution
[Figure: the two trees on Aardvark, Bison, Chimp, Dog, Elephant from the previous slides; the internal vertices u, v, w of the first tree are mapped to h(u), h(v), h(w) in the second]

28 The “Big” reconstruction problem, the “Small” problem
The “big” problem: compute the whole phylogenetic tree from the n input sequences.
The “small” problem: assume the tree topology and the identities of the leaf species are known. Reconstruct the sequences at the internal vertices, and give a score to the resulting phylogeny.
Connection between the problems: in order to solve the big problem, solve the small problem on all possible trees with n leaves, and output the tree(s) with the highest “score”. This is impossible in practice for more than a few taxa.
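To see why exhaustive search breaks down, it helps to count the candidate trees. The slide does not give the count; a standard combinatorial fact is that there are (2n-5)!! = 3 · 5 · … · (2n-5) unrooted binary trees on n ≥ 3 labeled leaves. The function name below is hypothetical.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_binary_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Already for 10 taxa there are about two million topologies, and the count grows faster than exponentially in n.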

29 Input for the “big” problem
A : CAGGTA
B : CAGACA
C : CGGGTA
D : TGCACT
E : TGCGTA
Our task: find an evolutionary tree, with leaves corresponding to the 5 sequences, that best explains the evolution of the strings.

30 Input for the “small” problem
A : CAGGTA
B : CAGACA
C : CGGGTA
D : TGCACT
E : TGCGTA
[Figure: a given tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
The tree and the assignment of strings to the leaves are given, and we only need to assign strings to the internal vertices.

31 Character-based methods for constructing phylogenies
In this approach, trees are constructed by comparing the characters of the corresponding sequences. Characters may be morphological (teeth structures) or molecular (nucleotides in homologous DNA sequences).
We will present two methods: “Perfect Phylogeny” and “Maximum Parsimony”.
Basic assumption in these methods: the best tree is the one with the minimal number of observed mutations (character changes along the edges, also known as substitutions).

32 Character-based methods: Input data
species  C1 C2 C3 C4 … Cm
dog      AACAGGTCTTCGAGGCCC
horse    AACAGGCCTATGAGACCC
frog     AACAGGTCTTTGAGTCCC
human    AACAGGTCTTTGATGACC
pig      AACAGTTCTTCGATGGCC
         *****  **  **   **
(* marks the 11 columns that are identical in all five species.)
Each character (column) is processed independently.
The green character (column 14, G vs. T) separates human and pig from frog, horse and dog.
The red character (column 11, C vs. T) separates dog and pig from frog, horse and human.
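Processing one character at a time amounts to reading a column of the alignment and grouping the species by the state they carry. A minimal sketch using the five sequences from the slide (the function name and the dictionary layout are hypothetical, and columns are 0-based in the code):

```python
ALIGNMENT = {
    "dog":   "AACAGGTCTTCGAGGCCC",
    "horse": "AACAGGCCTATGAGACCC",
    "frog":  "AACAGGTCTTTGAGTCCC",
    "human": "AACAGGTCTTTGATGACC",
    "pig":   "AACAGTTCTTCGATGGCC",
}


def partition_by_character(alignment, column):
    """Group species by the state they carry at the given (0-based) column."""
    groups = {}
    for species, sequence in alignment.items():
        groups.setdefault(sequence[column], set()).add(species)
    return groups


# 1-based column 14 separates human and pig from the rest.
print(partition_by_character(ALIGNMENT, 13))
# 1-based column 11 separates dog and pig from the rest.
print(partition_by_character(ALIGNMENT, 10))
```

Each column therefore yields one partition of the species, and the two partitions shown here disagree with each other, which is what makes tree reconstruction non-trivial.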

33 The perfect phylogeny problem
• A character is assumed to be a significant property that distinguishes between species (e.g. dental structure, number of legs/limbs).
• A character state is a value of the character (e.g. human dental structure).
• Assumption: it is unlikely that a given state will be created twice in the evolutionary tree. Such characters are called “homoplasy free”, and are detailed next.

34 Homoplasy-free characters 1
Homoplasy-free characters should avoid reversal transitions:
• A species regains a state its direct ancestor has lost.
• Well-known exceptions: teeth in birds, legs in snakes.

35 Homoplasy-free characters 2
…and also avoid convergence transitions:
• Two species possess the same state while their least common ancestor possesses a different state.
• Well-known exceptions: the marsupials.

36 The Perfect Phylogeny Problem
Input:
1. A set of species
2. A set of characters
3. For each character, an assignment of states to the species
Problem: is there a phylogenetic tree T=(V,E) such that the evolution of all characters is “homoplasy free” (no reversal, no convergence)?
First, we define the problem using graph-theoretic terms.

37 Characters = Colorings
A coloring of a tree T=(V,E) is a mapping C: V → [set of colors].
A partial coloring of T is a coloring of a subset of the vertices U ⊆ V, i.e. a mapping C: U → [set of colors].
[Figure: an example tree with the colored subset U marked]

38 Characters as Colorings
Each character defines a (partial) coloring of the corresponding phylogenetic tree:
Species ≡ Vertices
States ≡ Colors

39 Convex Colorings (and Characters)
Let T=(V,E) be a partially colored tree, and let d be a color. The d-carrier is the minimal subtree of T containing all vertices colored d.
Definition: a (partial or total) coloring of a tree is convex iff the d-carriers of distinct colors are pairwise disjoint.
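The definition translates almost directly into a check. The sketch below computes each d-carrier by repeatedly pruning uncolored-by-d leaves from a copy of the tree, then tests that no vertex belongs to two carriers. The adjacency-dict representation, the function names and the small example tree are hypothetical, chosen only for illustration.

```python
def carrier(adj, colored):
    """Vertices of the minimal subtree of the tree `adj` containing all of `colored`."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    removable = [v for v in adj if len(adj[v]) <= 1 and v not in colored]
    while removable:
        v = removable.pop()
        if v not in adj:
            continue  # already pruned
        for n in adj.pop(v):
            adj[n].discard(v)
            # A neighbor that became a leaf and does not carry the color can go too.
            if len(adj[n]) <= 1 and n not in colored:
                removable.append(n)
    return set(adj)


def is_convex(adj, coloring):
    """True iff the carriers of distinct colors are pairwise disjoint."""
    by_color = {}
    for vertex, color in coloring.items():
        by_color.setdefault(color, set()).add(vertex)
    used = set()
    for vertices in by_color.values():
        c = carrier(adj, vertices)
        if used & c:
            return False
        used |= c
    return True


# Example tree: leaves a, b hang off x; leaves c, d hang off y; x and y are adjacent.
TREE = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
        "x": {"a", "b", "y"}, "y": {"c", "d", "x"}}

print(is_convex(TREE, {"a": "red", "b": "red", "c": "blue", "d": "blue"}))  # True
print(is_convex(TREE, {"a": "red", "c": "red", "b": "blue", "d": "blue"}))  # False
```

In the second coloring both carriers contain the internal vertices x and y, so some state would have to arise twice along the tree, which is exactly the homoplasy the definition rules out.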

40 Convexity ⇔ Homoplasy Freedom
A character is homoplasy free (avoids reversal and convergence transitions)
⇕
The corresponding (partial) coloring is convex.

41 The Perfect Phylogeny Problem (pure graph-theoretic setting)
Input: partial colorings (C_1, …, C_k) of a set of vertices U (in the example: 3 total colorings, left, center and right, each using two colors).
Problem: is there a tree T=(V,E) such that U ⊆ V and, for i = 1, …, k, C_i is a convex (partial) coloring of T?
[Figure: the example vertex set shown under its three colorings]
PP is NP-hard in general. In the tutorial you will see a special case solvable in polynomial time.

42 Maximum Parsimony
A perfect phylogeny is not only hard to compute; in many cases it does not exist. Next we discuss a more common approach, called “Maximum Parsimony”, which looks for a tree that minimizes the number of mutations.

43 Maximum Parsimony: a character-based method
Input:
• h sequences (one per species), all of length k.
Goal:
• Find a tree whose leaves are labeled by the input sequences, and an assignment of sequences to internal nodes, such that the total number of substitutions is minimized.

44 Example
Input: four nucleotide sequences, AAG, AAA, GGA, AGA, taken from four species.
By the parsimony principle, we seek a tree whose leaves are labeled by the input sequences, and an assignment of sequences to internal vertices, with the minimum total number of mutations (i.e., letter changes) along the tree edges.
Here is one possible tree and sequence assignment.
[Figure: a tree on the four input sequences with sequences assigned to the internal vertices; total #substitutions = 4]

45 Example Continued
Here are two other trees and sequence assignments:
[Figure: left tree, total #substitutions = 3; right tree, total #substitutions = 4]
The left solution is preferred over the right one.
A solution has two parts: first, select a tree and label its leaves by the input sequences; then, assign sequences to the internal vertices.

46 Example With One-Letter Sequences
• Suppose we have five species, such that three have ‘C’ and two have ‘T’ at a specified position.
• A minimal tree has only one evolutionary change.
[Figure: a tree with three leaves labeled C and two labeled T, with a single T → C change on one edge]

47 Parsimony score
The parsimony score of a leaf-labeled tree T is the minimum possible number of mutations over all assignments of sequences to the internal vertices of T.
[Figure: the two trees from the previous example; left tree, parsimony score = 3; right tree, parsimony score = 4]
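The parsimony score of a given leaf-labeled tree can be computed without trying all internal assignments. The slides do not name an algorithm at this point; the sketch below uses Fitch's algorithm, a standard choice for unweighted parsimony on binary trees, and that choice is an assumption of this example. Trees are written as nested pairs of leaf sequences (a hypothetical representation), rooted on an arbitrary edge, which does not affect the score.

```python
def fitch_site(tree, i):
    """Return (candidate states, minimum #changes) for column i of a rooted binary tree."""
    if isinstance(tree, str):          # a leaf: its state is fixed, no changes below it
        return {tree[i]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, i)
    right_states, right_cost = fitch_site(right, i)
    common = left_states & right_states
    if common:                         # children can agree: no extra change needed
        return common, left_cost + right_cost
    # Children disagree: one change is unavoidable on an edge below this vertex.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, length):
    """Sum the per-column minimum number of changes (columns are independent)."""
    return sum(fitch_site(tree, i)[1] for i in range(length))


# The three possible topologies on the four sequences of the running example.
print(parsimony_score((("AAG", "AAA"), ("GGA", "AGA")), 3))  # 3
print(parsimony_score((("AAG", "GGA"), ("AAA", "AGA")), 3))  # 4
print(parsimony_score((("AAG", "AGA"), ("AAA", "GGA")), 3))  # 4
```

The scores 3 and 4 match the values shown on the slide; only the grouping of AAG with AAA explains the middle column with a single change.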

48 Parsimony-Based Reconstruction
We have here both the small and the big problems:
1. The small problem: find the parsimony score of a given leaf-labeled tree.
2. The big problem: find a tree whose leaves are labeled by the input sequences, with the minimum possible parsimony score.
We will see efficient algorithms for (1); (2) is hard.
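For a handful of taxa the big problem can still be solved by brute force: enumerate every topology and score each with a small-problem solver. The sketch below is only an illustration of that idea under stated assumptions: it scores trees with Fitch's algorithm (not named on the slide), represents trees as nested pairs of leaf sequences, and builds all topologies by attaching each new leaf to every edge of every smaller tree. All function names are hypothetical.

```python
def fitch_score(tree, length):
    """Parsimony score of a rooted binary tree given as nested pairs of leaf strings."""
    def site(node, i):
        if isinstance(node, str):
            return {node[i]}, 0
        (ls, lc), (rs, rc) = site(node[0], i), site(node[1], i)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return sum(site(tree, i)[1] for i in range(length))


def insertions(tree, leaf):
    """Yield every rooted tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)                 # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def all_topologies(seqs):
    """Every unrooted binary topology on `seqs`, rooted on the edge leading to seqs[0]."""
    rooted = [seqs[1]]
    for s in seqs[2:]:
        rooted = [t for tree in rooted for t in insertions(tree, s)]
    return [(seqs[0], t) for t in rooted]


def most_parsimonious(seqs):
    """Return (best score, list of trees achieving it) by exhaustive search."""
    length = len(seqs[0])
    scored = [(fitch_score(t, length), t) for t in all_topologies(seqs)]
    best = min(score for score, _ in scored)
    return best, [t for score, t in scored if score == best]


best, trees = most_parsimonious(["AAG", "AAA", "GGA", "AGA"])
print(best, trees)  # 3 [('AAG', ('AAA', ('GGA', 'AGA')))]
```

The number of topologies enumerated grows as (2n-5)!!, so this approach stops being usable after roughly ten taxa, which is why the big problem calls for heuristics or restricted cases.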

49 Example of Input for a Given Tree
A : CAGGTA
B : CAGACA
C : CGGGTA
D : TGCACT
E : TGCGTA
[Figure: a given tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]
Given a tree whose leaves are labeled by sequences, we only need to assign strings to the internal vertices.