Introduction to Bioinformatics Tutorial 4 Multiple Alignment and Phylogeny.


2 ClustalW Input [Screenshot of the ClustalW submission form. Labeled fields: fast alignment (on/off), fast alignment options, scoring matrix, gap scoring, alignment format, phylogenetic trees, input sequences.]

3 ClustalW Output (1) [Screenshot of the ClustalW output. Labeled parts: input sequences, pairwise alignment scores, building the alignment, final score.]

4 ClustalW Output (2) [Screenshot of the alignment. Labeled parts: sequence names, sequence positions.] Match strength in decreasing order: "*", ":", "."

5 Phylogenetic Trees Represent closeness between many entities –In our case, genomic or protein sequences [Figure: tree with leaves human, chimp and monkey. Labels: observed entity, unobserved commonality, distance representation.]

6 Rooting Trees A tree can be hung from a root –Adds directional information –Requires the addition of an 'outgroup' [Figure: tree with leaves human, chimp, monkey and pig.] We know pig is the furthest, so we hang the tree from where it joins.

7 Phylogeny and Evolution [Figure labels: evolutionary time, speciation, number of mutations, common ancestor.]

8 Tree Reconstruction Build a tree based on the organisms' sequences. Distance-based methods: –Use pairwise alignment scores to build the tree –Ignore the sequences after the initial alignments. Character-based methods: –Learn a tree with intermediate sequences that minimizes the total number of mutations –Slower, but generally give better results
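A distance-based method needs one number per pair of sequences. The slide mentions pairwise alignment scores; as a simple stand-in, the sketch below uses the p-distance (the fraction of aligned, ungapped columns that differ). This choice, the function names and the toy sequences are assumptions for illustration and are not what ClustalW computes internally.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

def distance_matrix(aligned):
    """Symmetric pairwise distances, keyed by (name, name) tuples."""
    dist = {}
    for s, t in combinations(sorted(aligned), 2):
        dist[(s, t)] = dist[(t, s)] = p_distance(aligned[s], aligned[t])
    return dist

# Toy alignment: made-up sequences, not real data.
toy = {"human": "ACGTACGT", "chimp": "ACGTACGA", "monkey": "ACGAAC-A"}
matrix = distance_matrix(toy)
print(matrix[("human", "chimp")])  # 1 mismatch in 8 columns
```

Once the matrix is built, the sequences themselves are no longer consulted, which is the sense in which distance-based methods ignore the sequences after the initial alignments.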

9 Distance-based Example (1)

10 Distance-based Example (2)

11 Distance-based Example (3)

12 Newick Tree Format (CFTR_SHEEP: , (CFTR_HUMAN: , (CFTR_MOUSE: , (CFTR_RABIT: , (CFTR_SQUAC: , CFTR_XENLA: ) : ) : ) : ) : , CFTR_BOVIN: );
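To make the nesting of the Newick string above easier to see, here is a minimal parser sketch. It assumes unquoted labels, no comments and optional branch lengths; `parse_newick` and `leaf_names` are hypothetical helper names, not part of any library. Branch lengths that are absent (as in the transcript of this slide) are returned as `None`.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, length, children) tuples."""
    text = "".join(text.split())          # unquoted labels contain no spaces
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ')'
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos] or None
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            raw = text[start:pos]
            length = float(raw) if raw else None
        return (label, length, children)

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return root

def leaf_names(node):
    """Leaf labels in left-to-right order."""
    label, _, children = node
    if not children:
        return [label]
    return [name for child in children for name in leaf_names(child)]

# The tree from the slide, with branch lengths missing as in the transcript.
slide_tree = ("(CFTR_SHEEP: , (CFTR_HUMAN: , (CFTR_MOUSE: , (CFTR_RABIT: , "
              "(CFTR_SQUAC: , CFTR_XENLA: ) : ) : ) : ) : , CFTR_BOVIN: );")
tree = parse_newick(slide_tree)
print(leaf_names(tree))
```

The root of the slide's tree has three children (sheep, a nested group, and bovine), which is the usual way an unrooted tree is written in Newick.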

13 Phylodendron Input [Screenshot of the Phylodendron form. Labeled fields: Newick tree description, graphical style, tree size, orientation.]

14 Calculation of HIV/SIV Neighbor-joining tree
Why phylogenetic analyses? Mutations accumulate in the genomes of pathogens, especially viruses, during the spread of an infection. This can be used to document the history of transmission events. Phylogenetic analysis of these mutations can be used not only to reconstruct the history of a pathogen's spread through host populations but also to make predictions about its future progress.
The unsolved HIV/SIV relationship: One interesting case where phylogenetic tree building is useful is the unsolved relationship between HIV-1, HIV-2 and SIV. AIDS (acquired immunodeficiency syndrome) is caused by two different human viruses:
–HIV-1, groups M and O
–HIV-2, subtypes A to E
There are many related viruses in a variety of non-human primates. These related viruses are called SIV (simian immunodeficiency viruses).

15 Calculation of HIV/SIV Neighbor-joining tree
Phylogenetic studies have shown that primate lentiviruses are all in the same clade. Within this clade there are five major lineages (the subscript denotes the host):
–HIV-1 and SIVcpz (chimpanzee)
–HIV-2, SIVsm (sooty mangabey) and SIVmac (captive macaque)
–SIVagm (African green monkey)
–SIVmnd (mandrill)
–SIVsyk (Sykes' monkey)
The NJ tree in our example is based on the polyprotein sequences from HIV-1, HIV-2 and SIV, with HTLV-1 as an outgroup. HTLV-1 (human T-lymphotropic virus type 1) is another human retroviral pathogen that originated from related simian viruses.

16 Calculation of HIV/SIV Neighbor-joining tree
Step by step summary:
1. Define all taxa and calculate all pairwise distances.
2. Pick two nodes in the star (i and j) for which the distance is minimal.
3. Define a new node (x) and calculate $r_i$ and $r_j$.
4. Calculate $d_{ix}$ and $d_{jx}$, thereby joining x to i and j respectively.
5. Remove i and j from the star and insert x instead.
6. Calculate $d_{xm}$ for all m in the star.
Continue until the star has been resolved and root the tree in a final step.
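The steps above can be sketched in a few lines of Python. Two points are assumptions rather than something the slides state: the pair (i, j) is chosen by the rate-corrected distance $d_{ij} - r_i - r_j$ used in the standard neighbor-joining algorithm (the slide only says "the distance is minimal"), and the distances to the new node are updated with $d_{xm} = (d_{im} + d_{jm} - d_{ij})/2$. The internal node names (`x0`, `x1`) and the toy matrix are made up for illustration, and the final rooting step is not included.

```python
from itertools import combinations

def neighbor_joining(dist):
    """dist: dict node -> dict node -> distance (symmetric, no self entries).
    Returns a list of edges (node, node, branch_length) of the unrooted tree."""
    d = {a: dict(row) for a, row in dist.items()}   # work on a copy
    edges = []
    next_id = 0
    while len(d) > 2:
        L = len(d)
        # Net divergence of each node, scaled by (L - 2) as on the slides.
        r = {i: sum(d[i][k] for k in d if k != i) / (L - 2) for i in d}
        # Assumed selection rule: minimal rate-corrected distance.
        i, j = min(combinations(d, 2),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        x = f"x{next_id}"
        next_id += 1
        d_ix = (d[i][j] + r[i] - r[j]) / 2
        d_jx = d[i][j] - d_ix
        edges.append((i, x, d_ix))
        edges.append((j, x, d_jx))
        # Distances from the new node x to every remaining node m.
        new_row = {m: (d[i][m] + d[j][m] - d[i][j]) / 2
                   for m in d if m not in (i, j)}
        del d[i]
        del d[j]
        for m in d:
            del d[m][i]
            del d[m][j]
            d[m][x] = new_row[m]
        d[x] = new_row
    a, b = d                                        # two nodes remain
    edges.append((a, b, d[a][b]))
    return edges

# Toy additive distances (made up): tree ((A:1,B:2):3,(C:4,D:5)).
toy = {
    "A": {"B": 3, "C": 8, "D": 9},
    "B": {"A": 3, "C": 9, "D": 10},
    "C": {"A": 8, "B": 9, "D": 9},
    "D": {"A": 9, "B": 10, "C": 9},
}
for edge in neighbor_joining(toy):
    print(edge)
```

Because the toy distances are additive, the recovered branch lengths match the tree they were generated from.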

17 Step 1 [Figure: distance matrix with the minimum marked.]

18 Step 1 (cont.) The calculation starts with the star [figure]. The branch lengths between nodes 5 and 10 and between nodes 6 and 10 are calculated with these formulas, where in this case $L = 9$ and the new node is $x = 10$:
$r_i = r_5 = \sum_k d_{5k} / (L-2) = \sum_k d_{5k} / (9-2)$
$r_j = r_6 = \sum_k d_{6k} / (L-2) = \sum_k d_{6k} / (9-2)$
$d_{ix} = d_{5,10} = (d_{5,6} + r_5 - r_6) / 2$
$d_{jx} = d_{6,10} = d_{5,6} - d_{5,10}$

19 Step 1 (cont.)

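20 Step 2 [Figure: distance matrix with the minimum marked.]

21 Step 2 (cont.) Calculation of the new branches, where in this case $L = 8$ and the new node is $x = 11$:
$r_i = r_3 = \sum_k d_{3k} / (L-2) = \sum_k d_{3k} / (8-2)$
$r_j = r_4 = \sum_k d_{4k} / (L-2) = \sum_k d_{4k} / (8-2)$
$d_{ix} = d_{3,11} = (d_{3,4} + r_3 - r_4) / 2$
$d_{jx} = d_{4,11} = d_{3,4} - d_{3,11}$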

22 Step 2 (cont.)

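23 Step 3 [Figure: distance matrix with the minimum marked.]

24 Step 3 (cont.) Calculation of the new branches, where in this case $L = 7$ and the new node is $x = 12$:
$r_i = r_2 = \sum_k d_{2k} / (L-2) = \sum_k d_{2k} / (7-2)$
$r_j = r_{11} = \sum_k d_{11,k} / (L-2) = \sum_k d_{11,k} / (7-2)$
$d_{ix} = d_{2,12} = (d_{2,11} + r_2 - r_{11}) / 2$
$d_{jx} = d_{11,12} = d_{2,11} - d_{2,12}$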

25 Step 3 (cont.)

26 Step 7 In this case $L = 3$ and the new node is $x = 16$. The quantities calculated are $r_{13}$, $r_{15}$, $d_{13,16}$ and $d_{15,16}$.

27 Step 7 (cont.) Because node 9 is the outgroup, the root will be placed between node 9 and the other nodes. The distance between node 9 and the first internal node is

28 Conclusions HIV-2 (H2) is more closely related to SIV (S) from sooty mangabey than to HIV-1 (H1). HIV-1 seems to be more closely related to SIV from chimpanzee. This means that HIV-1 and HIV-2 originated independently from two different SIV strains. There must have been a cross-species transmission from chimpanzee (SIV) to human (HIV-1). There also seems to have been a cross-species transmission from human to MAN/MAC.

29 Conclusions As one can see, the branch between H2-ROD A and the two SIV taxa has low support: only 56% of the trees have this topology. Therefore the transmission events from human to non-human primates are very uncertain.
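A support value such as the 56% above is typically obtained by bootstrapping: alignment columns are resampled with replacement, a tree is built from each resampled alignment, and the support of a grouping is the fraction of those trees that contain it. The sketch below shows only the resampling and the counting; the tree-building step is left out, and the function names and toy inputs are assumptions for illustration.

```python
import random

def bootstrap_columns(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def clade_support(clade, replicate_clades):
    """Fraction of replicate trees that contain `clade`.
    Each replicate tree is given as a set of frozensets of taxon names."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)

# Toy inputs (made up): a two-sequence alignment and two replicate trees.
rng = random.Random(0)
replicate = bootstrap_columns({"s1": "ACGT", "s2": "AGGT"}, rng)
support = clade_support({"a", "b"},
                        [{frozenset({"a", "b"})}, {frozenset({"a", "c"})}])
print(replicate, support)
```

A low value means that many resampled alignments do not reproduce the grouping, so conclusions that depend on that branch should be treated with caution.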

30 Exercise In this exercise you will perform a phylogenetic analysis of the human globin sequences. You will compare your results to current knowledge of the globin family, according to the following summary: Myoglobin and the hemoglobins diverged from one another before the emergence of worms, about 800 million years ago. The hemoglobins diverged into two families (the α-family and the β-family) following a gene duplication about 450 million years ago, which is before the emergence of mammals. The α-family diverged into the zeta, theta and alpha genes, and the β-family diverged into the beta, gamma_G, gamma_A, delta and epsilon genes, all following a series of gene duplications. The most recent duplication was that of gamma_G from gamma_A, which occurred around the separation of the simians (humans, chimp, gorilla, etc.) from the prosimians (such as lemurs and lorises), about 55 million years ago. (adapted from Graur and Li, 1999)

31 Exercise (cont.)
1. Reconstruct the phylogenetic tree of the human globins using neighbor joining. Make sure the tree is properly rooted (by defining an outgroup) according to the information in the above summary. Point out where the hemoglobins and myoglobin diverged, and where the α-family and β-family diverged.
2. Which of the following groups are monophyletic according to the tree you obtained: (i) alpha, beta, delta, (ii) alpha, theta, zeta, (iii) epsilon, beta, delta?
3. Bootstrap the tree you built with 1000 bootstrap iterations. Display the tree with the bootstrap values shown. On which branch was the lowest bootstrap value obtained? Explain what this means.
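For question 2, a group is monophyletic in a rooted tree when some subtree contains exactly the members of the group and nothing else. A minimal sketch of that check follows, assuming the rooted tree is written as nested tuples of leaf names. The representation, the function names and the placeholder tree are assumptions for illustration; the placeholder labels are deliberately not globin names.

```python
def leaves(tree):
    """Set of leaf names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree has exactly `group` as leaves."""
    group = frozenset(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Placeholder rooted tree: ((a,b),(c,(d,e)))
toy_tree = (("a", "b"), ("c", ("d", "e")))
print(is_monophyletic(toy_tree, {"d", "e"}))   # d and e form a subtree
print(is_monophyletic(toy_tree, {"b", "c"}))   # b and c do not
```

The check depends on where the tree is rooted, which is why the exercise asks for the outgroup to be defined first.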