Introduction to Bioinformatics Molecular Phylogeny Lesson 5.


2 Theory of Evolution: Life is monophyletic All organisms on Earth had a common ancestor. Any two organisms share a common ancestor in their past. Ancestor Descendant 1 Descendant 2

3 Theory of Evolution: Speciation events lead to the creation of different species (two species). Speciation is caused by physical separation into groups in which different genetic variants become dominant. Ancestor Descendant 1 Descendant 2

4 Ancestor

5

6

7 extinct extant 1 extant 2 The genetic distance between any two extant organisms is computable.

8 The differences between 1 and 2 are the result of changes on the lineage leading to descendant 1 + those on the lineage leading to descendant 2. descendant 1 descendant 2 ancestor

9 Thus, any set of species is related: the relation is called phylogeny. The relationships can be represented by a phylogenetic tree (or dendrogram).

10 5 MYA 120 MYA 1,500 MYA MYA = Million Years Ago

11 Phylogenetic Tree Terminology Graph composed of nodes & branches Each branch connects two adjacent nodes A B C D E F R

12 Phylogenetic Tree Terminology Nodes represent the taxonomic units. Taxonomic units = species/genes/individuals. Branch = relations among the taxonomic units (descent & ancestry). Branching pattern = topology. Branch lengths correspond to the number of substitutions: a longer branch means more substitutions.

13 Phylogenetic Tree Terminology A B CDE internal node - hypothetical most recent common ancestors leaf (terminal node) - current day species or gene “ taxa ” Branches Root

14 OTUs & HTUs OTUs = Operational Taxonomic Units –leaves of the tree HTUs = Hypothetical Taxonomic Units –internal nodes of the tree

15 Chimp, Human, Gorilla: the same tree can be drawn with its leaves in different orders. (Figure: equal trees drawn with the leaves Chimp, Human and Gorilla permuted.)

16 Same thing: (Figure: two equal drawings of the tree with leaves s1, s2, s3, s4, s5.)

17 Newick format A B C D E ((A,B),(C,(D,E)));
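A minimal sketch of how the Newick string on this slide maps to a tree structure. It assumes a simplified Newick dialect (leaf names and parentheses only, no branch lengths, no quoted labels); the function names `parse_newick` and `leaves` are illustrative and not part of the lesson.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested tuples of leaf names.

    Assumption: no branch lengths, no internal-node labels, no quoting.
    """
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while pos < len(s) and s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # otherwise read a leaf name
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos].strip()

    return node()


def leaves(tree):
    """Return the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((A,B),(C,(D,E)));")
print(tree)          # (('A', 'B'), ('C', ('D', 'E')))
print(leaves(tree))  # ['A', 'B', 'C', 'D', 'E']
```

Nested tuples are used only because they mirror the parentheses of the format directly; a real parser would also need to handle branch lengths such as `A:0.1`.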

18 Rooted vs. unrooted trees

19 Gorilla gorilla (Gorilla) Homo sapiens (human) Pan troglodytes (Chimpanzee) Gallus gallus (chicken)

20 3 possible UNROOTED trees: Human Chimp Chicken Gorilla Human Gorilla Chimp Chicken Human Chicken Chimp Gorilla the best tree

21 Rooting based on a priori knowledge: Human Chimp Chicken Gorilla HumanChimpChickenGorilla

22 Ingroup / Outgroup: HumanChimp Chicken Gorilla INGROUP OUTGROUP

23 Monophyletic groups (clades): A group is monophyletic (a clade) if it has a common ancestor and all the descendants of this ancestor are in the group.

24 Monophyletic groups HumanChimp Chicken Gorilla The Gorilla+Human+Chimp are monophyletic
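The definition of a clade can be checked mechanically on a rooted tree: a group is monophyletic exactly when it equals the full leaf set below some node. The sketch below assumes the rooted tree is given as nested tuples and assumes the topology (Chicken, (Gorilla, (Human, Chimp))); the helper names are hypothetical.

```python
def leaf_set(tree):
    """All leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node (each one is a clade)."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of descendants of some node."""
    return frozenset(group) in set(clades(tree))


# Assumed rooted topology, with Chicken as the outgroup.
rooted = ("Chicken", ("Gorilla", ("Human", "Chimp")))

print(is_monophyletic(rooted, {"Gorilla", "Human", "Chimp"}))  # True
print(is_monophyletic(rooted, {"Human", "Gorilla"}))           # False
```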

25 Non-monophyletic groups WhaleChimp Drosophila Zebra-fish The Zebra-fish+Whale are not monophyletic: Adaptation to water occurred more than once during evolution, independently… (or was lost in the lineage leading to chimp).

26 Monophyletic groups: Human Chimp Chicken Gorilla Rat When an unrooted tree is given, you cannot know which groups are monophyletic; you can only say which are not. For example, Chicken + Rat might be monophyletic if the root were between Chicken + Rat and the rest. In fact, the real root of the tree is between Chicken and the rest, hence Chicken and Rat are not monophyletic. But Human and Gorilla are not monophyletic no matter where the root is.

27 What data can be used? (1) Molecular data (DNA, RNA, proteins) (2) Morphological data (living or fossilized organisms)

28 Advantages of molecular data: Heritable entities Characters’ description is unambiguous Molecular data are amenable to quantitative treatment Can assess evolutionary relationship among distantly related organisms (ribosomal RNA) More abundant data (bacteria, algae)

29 What can we learn from a phylogenetic tree? Determining the closest relatives of the organism that you are interested in.

30 Example 1: Which species are closest to Human? Human Chimpanzee Gorilla Orangutan Pre-molecular analysis: the great apes (chimpanzee, gorilla & orangutan) are separate from the human. Molecular analysis: the chimpanzee is more closely related to the human than the gorilla is.

31 Example 2 : Guilty Sequence - scientists map a murder weapon “In 1998, a Louisiana doctor was convicted of attempting to murder his ex-girlfriend, a nurse. The murder weapon was a syringe of HIV-infected blood drawn from a patient under the doctor's care.”

32 History of the virus: ©2002 National Academy of Sciences, U.S.A. Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99, Phylogenetic analysis of the RT region. The smaller set of boxed sequences represents the sequences from the victim, and the larger set of boxed sequences represents the patient plus victim sequences. LA denote viral sequences from control HIV-1 infected individuals.

33 Species trees and Gene trees Species trees - representing the evolutionary relationships among species (the speciation process). Gene trees – Different genes may have different evolutionary history.

34 What is Homology? Before Darwin, homology was defined morphologically: similarity between properties in various species. Example: Bats and butterflies fly, but the structures are different. Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike. Conclusions: 1. Bat wings and butterfly wings are not homologous. 2. Bat wings and whale flippers are homologous.

35 Homology Interpretation: from Darwin to 21st Century Darwin (1859): Homology is a result of descent with modifications from a common ancestor. Modern genetics: Homology is determined by genes. Two sequences are homologous if they are similar and share a common ancestor (similarity by itself is not enough). Large enough similarities typically imply homology.

36 Homolog A gene related to a second gene by descent from a common ancestral DNA sequence.

37 Orthologs Homologous sequences are orthologous if they were separated by a speciation event: If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.

38 Orthologs Orthologs typically retain the same or a similar function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

39   Orthologs speciation ancestor descendant 2

40 Paralogs Homologous sequences are paralogous if they were separated by a gene duplication event: If a gene in an organism is duplicated, then the two copies are paralogous.

41 Paralogs Orthologs will typically have the same or similar function. This is not always true for paralogs: because the original selective pressure no longer acts on one copy of the duplicated gene, that copy is free to mutate and acquire new functions.

42 Paralogs    Duplication

43 Orthologs & Paralogs    Duplication Speciation Species a Species b ParalogsOrthologs
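The ortholog/paralog distinction can be sketched as a lookup of the event at the most recent common ancestor of two genes in a gene tree: speciation gives orthologs, duplication gives paralogs. The tree encoding, the gene names (`a1`, `b1`, `a2`, `b2`) and the function names below are assumptions made for illustration.

```python
# A gene-tree node is either a leaf name (str) or a tuple
# (event, child, child) with event in {"duplication", "speciation"}.

def gene_set(node):
    """All gene names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(gene_set(child) for child in node[1:]))


def relationship(tree, g1, g2):
    """Classify two genes by the event at their most recent common ancestor.

    Assumes both genes occur in `tree`.
    """
    event, *children = tree
    for child in children:
        if not isinstance(child, str) and {g1, g2} <= gene_set(child):
            return relationship(child, g1, g2)  # MRCA lies deeper
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical history: one duplication, then a speciation into species a and b.
gene_tree = ("duplication",
             ("speciation", "a1", "b1"),
             ("speciation", "a2", "b2"))

print(relationship(gene_tree, "a1", "b1"))  # orthologs
print(relationship(gene_tree, "a1", "a2"))  # paralogs
print(relationship(gene_tree, "a1", "b2"))  # paralogs
```

Note that `a1` and `b2` sit in different species yet are paralogs, because the event that separated them was the duplication.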

44 How many rooted trees? (TR = "TREE ROOTED") N=2, TR(2) = 1; N=3, TR(3) = 3; N=4, TR(4) = 15. (Figure: every rooted topology drawn for the leaves a, b, c, d.)

45 Number of possible trees:
Number of taxa   Number of rooted trees   Number of unrooted trees
3                3                        1
4                15                       3
5                105                      15
6                945                      105
7                10,395                   945
8                135,135                  10,395
9                2,027,025                135,135
10               34,459,425               2,027,025
11               654,729,075              34,459,425
12               13,749,310,575           654,729,075

46 Number of possible trees: $N_{\text{rooted}} = \dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$, $N_{\text{unrooted}} = \dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
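The two counting formulas can be evaluated directly with exact integer arithmetic. This is a small sketch; the function names are illustrative, and it assumes n >= 2 for rooted trees and n >= 3 for unrooted trees.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted binary trees on n labeled leaves (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted binary trees on n labeled leaves (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 12):
    print(n, rooted_trees(n), unrooted_trees(n))
# 3 3 1
# 4 15 3
# 12 13749310575 654729075

print(f"{rooted_trees(20):,}")  # 8,200,794,532,637,891,559,375
```

The unrooted count for n taxa equals the rooted count for n - 1 taxa, because rooting an unrooted tree amounts to attaching one extra leaf.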

47 Evolution is an historical process. Only one historical narrative is true. Of the 8,200,794,532,637,891,559,375 possible rooted trees for 20 taxa, 1 is true and 8,200,794,532,637,891,559,374 are false. Truth is one, falsehoods are many.

48 How do we know which of the 8,200,794,532,637,891,559,375 trees is true? We don’t, we infer by using decision criteria.

49 Methods

50 Approach 1 - Distance methods Two steps: –Compute the distance between every pair of sequences in the MSA. –Find the tree that agrees most with the distance table. Approach 2 - Character state methods Input: multiple sequence alignment Algorithms: –Maximum parsimony (MP) –Maximum likelihood (ML)

51 Step 1: Distance estimation There are different methods to compute the distance between any two sequences. For example, one can take into account different probabilities for transitions and transversions… (Figure: pairwise distance table for the OTUs A, B, C, D.)
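As one possible illustration of this step, the sketch below computes two common distances between aligned DNA sequences: the plain proportion of differing sites (p-distance) and the Kimura two-parameter (K2P) distance, which treats transitions and transversions separately. The lesson does not say which model it uses, so K2P is an assumption here, as are the function names and the toy sequences.

```python
from math import log, sqrt

PURINES = {"A", "G"}  # the pyrimidines are C and T


def transition_transversion(seq1, seq2):
    """Return (P, Q): proportions of transition and transversion differences.

    Only alignment columns where both sequences have A, C, G or T are used.
    """
    sites = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    diffs = [(a, b) for a, b in sites if a != b]
    # A transition stays within purines or within pyrimidines.
    transitions = sum((a in PURINES) == (b in PURINES) for a, b in diffs)
    transversions = len(diffs) - transitions
    return transitions / len(sites), transversions / len(sites)


def p_distance(seq1, seq2):
    """Proportion of sites that differ."""
    P, Q = transition_transversion(seq1, seq2)
    return P + Q


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]."""
    P, Q = transition_transversion(seq1, seq2)
    return -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))


s1 = "AAAAAAAAAA"
s2 = "GAAAAAAAAC"  # one transition (A>G) and one transversion (A>C)
print(p_distance(s1, s2))              # 0.2
print(round(k2p_distance(s1, s2), 4))  # 0.2341
```

The corrected distance is larger than the observed proportion because it estimates substitutions hidden by multiple changes at the same site.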

52 Step 2: From a distance table to a tree Algorithms: –UPGMA –Neighbor Joining (NJ)

53 Neighbor Joining (NJ) Reconstructs unrooted tree Calculates branch lengths Based on Star decomposition In each stage, the two nearest nodes of the tree are chosen and defined as neighbors in our tree. This is done recursively until all of the nodes are paired together.

54 What are neighbours? Neighbours are defined as a pair of OTUs that have one internal node connecting them. (Figure: tree with leaves A, B, C, D.) A and B are neighbours, C and D are neighbours, but A and C are not neighbours.

55 Which pair is closest? $r_i = \sum_k d_{ik} / (N-2)$: average distance of node $i$ from all other nodes. $M_{ij} = d_{ij} - (r_i + r_j)$: distance between $i$ and $j$ relative to the rest.
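The selection rule above can be turned into a compact neighbor-joining sketch: compute r_i, pick the pair with the smallest M_ij, join it, and repeat. The branch-length and distance-update formulas in the code are the standard NJ ones and are not given on the slides; the input format (a dict of dicts without diagonal entries), the function name and the four-taxon distances are all assumptions for illustration.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Return the edges (node, other_node, length) of the unrooted NJ tree.

    `dist` is a symmetric dict of dicts without diagonal entries.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    edges = []
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        # r_i = sum_k d_ik / (N - 2)
        r = {i: sum(d[i][k] for k in nodes if k != i) / (n - 2) for i in nodes}
        # Pick the pair minimising M_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"({i},{j})"                          # new internal node
        len_i = 0.5 * d[i][j] + 0.5 * (r[i] - r[j])
        len_j = d[i][j] - len_i
        edges += [(i, u, len_i), (j, u, len_j)]
        # Distance from the new node to every remaining node.
        new_row = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
                   for k in nodes if k not in (i, j)}
        for k, value in new_row.items():
            del d[k][i], d[k][j]
            d[k][u] = value
        del d[i], d[j]
        d[u] = new_row
    a, b = d                                      # join the last two nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances; B sits on a long branch.
example = {
    "A": {"B": 5, "C": 4, "D": 5},
    "B": {"A": 5, "C": 7, "D": 8},
    "C": {"A": 4, "B": 7, "D": 5},
    "D": {"A": 5, "B": 8, "C": 5},
}
edges = neighbor_joining(example)
for edge in edges:
    print(edge)
```

In this example the smallest raw distance is between A and C, yet the procedure joins A with B first, because the M values correct for B being far from everything.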

56 (Figure: worked NJ example. A distance table over the OTUs A, B, C, D, E; B and D are joined into the node (B,D), giving a reduced distance table over A, (B,D), C, E.)

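57 (Figure: worked NJ example, continued. From the distance table over A, (B,D), C, E, the nodes C and E are joined into (C,E), giving the final tree with leaves A, B, D, C, E.)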

58 Advantages and disadvantages of NJ Advantages –is fast and thus suited for large datasets and for bootstrap analysis –permits lineages with largely different branch lengths –permits correction for multiple substitutions Disadvantages –sequence information is reduced –gives only one possible tree –strongly dependent on the model of evolution used.