From basic Concepts to Advanced applications Molecular Evolution & Phylogeny By Ofir Cohen The Bioinformatics Unit G.S. Wise Faculty of Life Science Tel.

Presentation transcript:

From basic Concepts to Advanced applications Molecular Evolution & Phylogeny By Ofir Cohen The Bioinformatics Unit G.S. Wise Faculty of Life Science Tel Aviv University, Israel May

2 of 50 The Human Genome Project ("behind the scenes") Venter et al., Science 291: (2001) International Human Genome Sequencing Consortium, Nature 409: (2001) The club resident J.D. Watson: Back2back with DJ. Venter -

3 of 50 Genome Sequencing – Ongoing Revolution The race is (still) on…The promise is huge…

4 of 50 SRA (Sequence Read Archive) = raw sequences from next-generation machines; Trace = raw sequences from 90s machines

5 of 50 Darwin’s teachings– Tree-like evolution Introduction – The tree concept

6 of 50 Darwin’s teachings– common descent Introduction – The tree concept

7 of 50 Common Descent – Modern evidence Introduction – The tree concept "The unity of life is no less remarkable than its diversity" THEODOSIUS DOBZHANSKY

8 of 50 Mathematicians developed tools to analyze trees (adapted from Huson et al.). A connected graph without cycles is a tree. Figure panels: Not a tree! (cycle); Rooted binary tree; Tree. Part of the wider field of graph theory (Bridges of Königsberg).
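The graph-theory definition on this slide (a tree is a connected graph without cycles) can be checked mechanically. The sketch below is only an illustration: the function name `is_tree` and the node labels are made up for this example. It relies on the standard fact that a graph with n nodes is a tree exactly when it is connected and has n - 1 edges.

```python
def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is a tree."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False  # a tree on n nodes has exactly n - 1 edges

    # Build an adjacency map for the undirected graph.
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first search from an arbitrary node to test connectivity.
    seen = set()
    stack = [nodes[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency[node] - seen)
    return len(seen) == len(nodes)


# Hypothetical examples: a three-node path is a tree, a triangle is not.
path = is_tree(["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = is_tree(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(path, triangle)
```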

9 of 50 What is a Phylogenetic Tree? Phylogenetic tree: (hypothetical) historical pattern of evolutionary relationships among organisms (Greek: phylon = race and genetic = birth). Horizontal branch length is proportional to evolutionary distance (unit = substitutions per site). Introduction – The tree concept

10 of 50 Molecular evidence of HIV transmission in a criminal case Introduction - Anecdotes Metzker, Michael L. et al. (2002) Proc. Natl. Acad. Sci. USA 99,

11 of 50 Criminal investigation In August 1994, a nurse tests negative for HIV. She breaks off a messy 10-year affair with a doctor. Three weeks later, the doctor gives his ex-mistress a vitamin B-12 shot. In January 1995, the nurse tests positive for both HIV and hepatitis C. The doctor's office records from that day are missing (but are eventually found). The doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot. The nurse had never had contact with either patient. Introduction - Anecdotes Circumstantial evidence suggests that the doctor injected blood from one of his patients into his ex-girlfriend. How can this be proved using a phylogenetic approach?

12 of 50 HIV – short background Extreme heterogeneity Within each patient there are many different viral strains ("quasi-species") Introduction - Anecdotes

13 of 50 History of the virus: gp120 (Gene tree) PATIENT VICTIM CONTROLS ©2002 National Academy of Sciences, U.S.A. Introduction - Anecdotes

14 of 50 History of the virus: RT (Gene tree) VICTIM PATIENT Introduction - Anecdotes Source sequences that are paraphyletic (other sequences are nested within them) with respect to the recipient sequences provide evidence for the direction of transmission.

15 of 50 Ernst Haeckel's Monophyletic tree of organisms, 1866 Reconstructing the tree of life

16 of 50 Organisms classified into 2 domains: Eukaryotes, including {plants, animals, protists, fungi}; Prokaryotes = Bacteria. Whittaker, 1969

17 of 50 Reconstructing the Tree Of Life Carl Woese, phylogenetic taxonomy of 16S ribosomal RNA. Critiques: Woese un-balanced the tree of life… (too much representation for microbial species)

18 of 50 Phylogenetic analysis: Not only among organisms - Cancer phylogeny A phylogeny of acute myeloid leukemia (AML) subtypes Riester et al. 2010; Liu et al. 2009

19 of 50 Phylogenetic analysis: Not only in biology – Language evolution Russell and Atkinson. Researchers study the evolution of languages by treating them like genomes: instead of COGs (gene families), they analyze COGNATES (word families).

20 of 50 Reading Trees: Which tree is more accurate? Reading Trees Haeckel’s pedigree of man Human "on top" – wrong!

21 of 50 Rooted vs. Un-rooted trees human mouse fugu Drosophila root edge internal node leaf human mouse fugu Drosophila root edge internal node (ancestor) leaf time Reading Trees

22 of 50 Gorilla gorilla (Gorilla) Homo sapiens (human) Pan troglodytes (Chimpanzee) Gallus gallus (chicken) How do we root a tree? Reading Trees

23 of 50 Rooting based on a priori knowledge: Using Outgroup Human Chimp Chicken Gorilla HumanChimpChickenGorilla Reading Trees
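Outgroup rooting can be sketched in a few lines. The example below is an illustration only: it assumes the unrooted four-taxon tree from the slide is stored as an adjacency map, with hypothetical internal node labels `X` and `Y`, and places the root on the branch leading to the outgroup (Chicken). The function name `root_on_outgroup` is made up for this sketch; the output is a Newick string without branch lengths.

```python
def root_on_outgroup(adjacency, outgroup):
    """Return a Newick string rooted on the branch leading to `outgroup`."""
    (neighbor,) = adjacency[outgroup]  # a leaf has exactly one neighbor

    def subtree(node, parent):
        # Children are all neighbors except the node we arrived from.
        children = sorted(n for n in adjacency[node] if n != parent)
        if not children:
            return node  # leaf
        return "(" + ",".join(subtree(child, node) for child in children) + ")"

    return "(" + outgroup + "," + subtree(neighbor, outgroup) + ");"


# Unrooted tree: (Human, Chimp) join at X; (Gorilla, Chicken) join at Y; X-Y.
unrooted = {
    "Human": {"X"},
    "Chimp": {"X"},
    "Gorilla": {"Y"},
    "Chicken": {"Y"},
    "X": {"Human", "Chimp", "Y"},
    "Y": {"Gorilla", "Chicken", "X"},
}
rooted = root_on_outgroup(unrooted, "Chicken")
print(rooted)
```

Rooting on a different leaf of the same unrooted tree gives a different rooted tree, which is why the choice of outgroup has to come from prior knowledge.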

24 of 50 Comparative Genomics – "All life is one" Compare homologous sequences – Multiple Sequence Alignments

25 of 50 Orthologs speciation ancestor descendant 2 (e.g., dog) descendant 1 (e.g., human) Orthologs will typically have the same or similar function in the course of evolution.

26 of 50 Paralogs Duplication Evolutionary innovation: once the original selective pressure no longer acts on one copy, that copy is free to mutate and acquire new functions.

27 of 50 Alignment and phylogeny are mutually dependent Inaccurate tree building MSA Sequence alignment Phylogeny reconstruction Unaligned sequences

28 of 50 Part II: Tools

29 of 50 Multiple sequence alignment (MSA) Several advanced MSA programs are available. Today we will use two: MAFFT – fast and relatively accurate PRANK – distinct from all other MSA programs because of its correct treatment of insertions/deletions Tools - Alignments

30 of 50 MAFFT Web server (& download option): Efficiency-tuned variants: quick & dirty or slow but accurate. Nucleic Acids Research, 2002, Vol. 30, No © 2002 Oxford University Press MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Kazutaka Katoh, Kazuharu Misawa, Kei-ichi Kuma and Takashi Miyata Tools - Alignments

31 of 50 Choosing a MAFFT strategy quick & dirty slow but accurate Tools - Alignments

32 of 50 MAFFT output Saving the output Choose a format: Clustal, Fasta, or click "Reformat" to convert to a selection of other formats Save page as a text file A colored view of the alignment Tools - Alignments

33 of 50 PRANK Tools - Alignments

34 of 50 Classical alignment errors for HIV env Tools - Alignments CLUSTALW PRANK

35 of 50 PRANK Web server: Tools - Alignments

36 of 50 PRANK output If you need a different format – copy the results to the READSEQ sequence converter: Tools - Alignments

37 of 50 Downloadable PRANK  PRANK: A command-line program interface  PRANKSTER: A program with graphical user interface Tools - Alignments

38 1. Download the sequence files from the website. Open "fahA.fas" in Notepad or a browser – these are 65 protein sequences in FASTA format. 2. Run the PRANK web server. (1)
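Before submitting a FASTA file to an alignment server, it can help to confirm that it contains the expected number of sequences. The sketch below is a minimal FASTA reader written for illustration; the function name `parse_fasta` and the two toy records are assumptions, not part of the course material. To check the exercise file, the same function could be applied to the text read from "fahA.fas".

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record name is the first word after '>'.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# Toy input with made-up names and sequences.
example_text = """>seq1 first toy protein
MKTAYIAK
QRQISFVK
>seq2
MKTAHIAK
"""
sequences = parse_fasta(example_text)
print(len(sequences), "sequences")
```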

39 of 50 Trees Reconstruction Methods

40 of 50 Phylogeny reconstruction Different approaches (algorithms / programs): Distance-based methods (e.g. neighbor-joining, as in ClustalW): fast but generally less accurate. Maximum parsimony (e.g. MEGA). Maximum likelihood methods (e.g. PhyML, RAxML): accurate but slower. Bayesian methods (e.g. MrBayes): most accurate but very slow. Figure: MSA, pairwise distance table and guide tree for taxa A to E. Tools - Trees
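Distance-based methods start from the pairwise distance table shown on this slide. As a rough illustration, the sketch below computes the simplest such distance, the p-distance (the fraction of differing sites), for every pair of aligned sequences. The function names and the toy alignment are assumptions for this example, and skipping gapped columns is only one of several possible conventions. Real programs usually apply a substitution-model correction on top of this.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_table(alignment):
    """Return {(name_x, name_y): p-distance} for every pair of sequences."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Toy alignment with made-up sequences.
toy_alignment = {"A": "ATCTG", "B": "ATCTA", "C": "AC-TA"}
table = distance_table(toy_alignment)
for pair, distance in table.items():
    print(pair, distance)
```

A method such as neighbor-joining then builds the tree from this table alone, which is why these methods are fast: the sequences themselves are never consulted again.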

41 of 50 PhyML The most widely used maximum likelihood (ML) program Web server (& download) : Tools - Trees

42 of 50 Downloadable PhyML Less user-friendly, but allows using local computer power Run "phyml.bat" Drag the file from Windows Explorer to the blue window Enter "d" to switch from DNA to AA Enter "y" to run

43 1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the PhyML web server (don't forget to choose "Amino-acids" and enter your e-mail and name). (2)

44 of 50 RAxML Web server: Similar maximum likelihood (ML) methodology to PhyML, but much faster, so results with bootstrap arrive sooner. Tools - Trees

45 of 50 Bootstrapping Now we have a tree, but what is the reliability of this tree?

46 of 50 Bootstrap A. Generate pseudo-data sets by sampling N positions (alignment columns) with replacement, where N is the alignment length. Do not change the number of sequences. Resample repeatedly. Figure: an original alignment of four sequences and three resampled pseudo-data sets, each labeled with the sampled column indices (e.g. 1 1 2 4 4 … x).
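The resampling step can be sketched as follows. This is an illustration under stated assumptions: the alignment is held as a dictionary of equal-length strings, the function name `bootstrap_alignment` is made up, and the toy sequences are invented. Columns are drawn with replacement, so some columns appear several times in a replicate and others not at all, while the number of sequences and the alignment length stay the same.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-data set built by sampling columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw N column indices, each uniformly from the N original columns.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(sequence[i] for i in columns)
        for name, sequence in alignment.items()
    }


# Toy alignment with made-up sequences.
alignment = {
    "Sp1": "ATCTGA",
    "Sp2": "ATCTGC",
    "Sp3": "ACTTAC",
    "Sp4": "ACCTAT",
}
rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(3)]
for replicate in replicates:
    print(replicate)
```

Whole columns are sampled, not individual characters, so each site keeps its pattern across species.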

47 of 50 Bootstrap B. Reconstruct a tree from each data set. Figure: the three pseudo-data sets, each shown with its reconstructed tree over Sp1, Sp2, Sp3 and Sp4.

48 of 50 Bootstrap C. Compute the majority-rule consensus. Figure: the three replicate trees and the consensus tree over Sp1, Sp2, Sp3 and Sp4, with branch support values of 67% and 100%. In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
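The support values on this slide are simply the fraction of replicate trees that contain each split. A minimal sketch, assuming each replicate tree has already been reduced to the set of its non-trivial splits, with every split written as the group of taxa on the side that excludes Sp4 (a convention chosen here so that identical splits compare equal). The function names and the three toy replicates are hypothetical.

```python
from collections import Counter


def split_support(replicate_trees):
    """Return {split: fraction of replicate trees containing that split}."""
    counts = Counter(split for tree in replicate_trees for split in tree)
    total = len(replicate_trees)
    return {split: count / total for split, count in counts.items()}


def majority_rule(support, threshold=0.5):
    """Keep the splits found in more than `threshold` of the replicates."""
    return {split for split, value in support.items() if value > threshold}


# Three toy replicate trees on Sp1..Sp4; two of them group Sp1 with Sp2.
replicate_trees = [
    {frozenset({"Sp1", "Sp2"})},
    {frozenset({"Sp1", "Sp2"})},
    {frozenset({"Sp1", "Sp3"})},
]
support = split_support(replicate_trees)
consensus = majority_rule(support)
for split, value in support.items():
    print(sorted(split), f"{value:.0%}")
```

With these toy replicates the Sp1+Sp2 split appears in two of three trees, which corresponds to the 67% figure in the slide's example.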

49 1. Give "fahA.prank.phylip" or "fahA.mafft.phylip" as input to the RAxML web server (don't forget to tick "Protein sequences" and "Maximum likelihood search" and enter your e-mail). (3)

50 of 50 FigTree: tree visualization and figure creation Manipulate a node Manipulate a clade Manipulate a taxon

51 1. In case the trees are not ready yet, download a tree from the website. 2. Open "fahA.prank.phylip_phyml_tree.txt" in FigTree. Play around with the different options and make a pretty figure! 3. Find out how to color specific clades, as below. 4. Try each of the three options under "Layout". 5. Export a figure in PDF format (File → Export Graphic…). (4)

52 of 50 Final Questions…