Lab 4.1: Multiple Sequence Alignment & Phylogenetic Trees


Lab 4.1: Multiple Sequence Alignment & Phylogenetic Trees
February 2006
Jennifer Gardy
Centre for Microbial Diseases & Immunity Research, University of British Columbia
jennifer@cmdr.ubc.ca
Lab 4.1 (c) 2006 CGDN

http://creativecommons.org/licenses/by-sa/2.0/

Goals
- Learn how to create, view, and interpret multiple sequence alignments and phylogenetic trees
- Understand the difference between orthologs and paralogs
- Use an alignment and a tree to deduce information about biological sequences
- Complete the phylogeny assignment
- Complete Module 3 of the Integrated Assignment

Outline
- Quick review of MSA
- How Clustal works
- Research Question
- Worked example 1: Creating an MSA
- Neighbour-joining trees
- Bootstrapping
- Worked example 2: Creating/viewing a tree
- Free time to work on phylogeny assignment

MSAs: A Quick Review
Why perform an MSA?
- Visualize trends between homologous sequences:
  - Shared regions of homology
  - Regions unique to a sequence within a family
  - Structural/functional motifs
- As the first step in a phylogenetic analysis
- Improve accuracy of structure predictions
How does one perform an MSA?
- Automated alignment with manual editing
- Considerations:
  - Select sequences carefully: homologous over their length, no unrelated sequences
  - Computational intensity

Clustal – A Common MSA Tool
Progressive alignment strategy:
- Sequences used to make guide tree
- Most similar two sequences aligned = consensus
- Next closest sequence aligned to the consensus

Example:
  Input sequences: 1 MITTENS, 2 KITTIES, 3 SMITTEN, 4 KITTENS
  Guide tree order: 1, 3, 4, 2
  Align 1 and 3:
    1    -MITTENS
    3    SMITTEN-
    13   MITTEN   (consensus)
  Align 4 to 13:
    13   MITTEN
    4    KITTENS
    134  ITTEN    (consensus)
  Align 2 to 134:
    134  ITT-E
    2    KITTIES

Manual editing:
- Fine adjustment of particular columns
- Incorporate specific knowledge
- Removal of gappy bits
  - Important for phylogenetic analysis
- Removal of parts of/whole sequences
  - Non-homologous regions
  - Sequences included by error
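The first progressive step in the example (aligning sequences 1 and 3) can be sketched with a basic Needleman-Wunsch global alignment. This is only an illustration: the function name and the scoring values (match +1, mismatch -1, one linear gap penalty of -1) are assumptions, whereas Clustal itself uses substitution matrices, separate gap opening and extension penalties, and profile alignment for the later steps.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_1, aligned_3, best = needleman_wunsch("MITTENS", "SMITTEN")
print(aligned_1)  # -MITTENS
print(aligned_3)  # SMITTEN-
print(best)       # 4  (6 matches, 2 gaps)
```

With these toy scores the result matches the slide's first alignment: one leading gap in sequence 1 and one trailing gap in sequence 3.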

Research Question: Background
- Olfaction: our sense of smell
- Small chemical compound (odorant) binds olfactory receptor (OR) on nasal epithelium
- OR changes shape -> signaling cascade -> perception of odor
- >1000 OR genes in mammals (~350 active in humans), each binds 1 or more odorants
- But we smell many more scents than this! How?
  - Combinatorial effects of ORs
- Some odorants exist as enantiomers:
  - Non-superimposable mirror images
  - Each enantiomer has a unique scent

Research Question: Background
- ORs are G-protein coupled receptors (GPCRs)
- All are very similar in structure
  - 7 transmembrane helices
  - 3rd transmembrane helix thought to be important for odorant specificity
- Odorant:OR relationship not known for most of the OR genes
- Starting to employ bioinformatics techniques

Research Question: Outline
Computational analysis of OR sequences:
- 4 human sequences: OR6N1, OR6N2, OR6K2, OR6K6
  - OR6N1 binds R(-) carvone (spearmint)
  - OR6K2 binds R(+) limonene (citrus fruit)
- 3 mouse homologs: olfr420, olfr425, olfr429
- Can we figure out the odorant specificity of the other odorant receptors?
- Which residues might be important for odorant binding specificity?

Starting up ClustalX
- Day 4 wiki > OR_proteins.txt
- Open nano or another text editor, paste the sequences in, and save
- Start ClustalX: $ clustalx
Menus:
- File: Load sequences
- Edit: Remove all gaps
- Alignment: Do complete alignment, Alignment parameters
- Trees: Bootstrapped NJ, Output format options
Screen areas: name window, sequence window

OR Proteins in ClustalX
File > Load sequences > OR_proteins.txt
How are unaligned sequences displayed?
- Left-aligned, in order of input
- Default colouring (identity); see help file for details
- Conservation score graph

Alignment Parameters
Alignment > Alignment Parameters > Multiple Alignment Parameters
What effect might increasing the gap penalties have on alignment? Decreasing the gap penalties?
Increased gap penalties:
- Fewer gaps
- May cause program to miss important insertion/deletion events
Decreased gap penalties:
- More gaps
- Makes it easier to align sequences that may not be closely related to each other
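To make the trade-off concrete, the sketch below scores two candidate alignments of the same toy sequences under a mild and a harsh gap penalty. The function name and the scoring values are assumptions chosen for illustration, and a single linear gap penalty stands in for Clustal's separate gap opening and gap extension penalties.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Score two equal-length aligned rows column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # a column containing a gap
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


gapped = ("-MITTENS", "SMITTEN-")    # 6 matches, 2 gaps
ungapped = ("MITTENS", "SMITTEN")    # 1 match, 6 mismatches

for gap in (-1, -6):
    g = score_alignment(*gapped, gap=gap)
    u = score_alignment(*ungapped, gap=gap)
    winner = "gapped" if g > u else "ungapped"
    print(f"gap penalty {gap}: gapped={g}, ungapped={u}, preferred={winner}")
```

With the mild penalty the gapped alignment wins (4 versus -5). With the harsh penalty it loses (-6 versus -5), so the one-residue shift between the two sequences is missed.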

Do an Alignment
Alignment > Do complete alignment
Generates an .aln file

Building an NJ Tree - An Example
Cbw protein from cat, rat, bat, mat and Matt
1. Compare all sequences to each other
2. Assign divergence values to each pair
3. Assemble the values in a distance matrix

        Cat   Rat   Bat   Mat
Cat     -
Rat     0.7
Bat     0.8   0.2
Mat     1.0   ?     ?
Matt    0.6   0.4   0.5   0.9

(? = value missing from the transcript)

Building an NJ Tree
4. Arrange the subjects in a “star” phylogeny

Building an NJ Tree
5. Fuse the two branches with the least divergence (in the distance matrix, Rat and Bat at 0.2)

Building an NJ Tree
6. Create a new distance matrix using the fusion consensus sequence

         Cat    RatBat  Mat
Cat      -
RatBat   0.75
Mat      1.0    0.8
Matt     0.6    0.45    0.9

7. Fuse the next two closest sequences
8. Repeat until tree completed
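Steps 5 to 8 can be sketched as a loop that repeatedly fuses the closest pair and averages distances, starting from the four-taxon matrix on this slide. Note the assumptions: this follows the slides' simplified description (closest-pair fusion with simple averaging, which is closer to UPGMA/WPGMA clustering), not the full neighbour-joining criterion that ClustalX applies. The function name and the label format are hypothetical.

```python
def fuse_closest(distances):
    """Repeatedly fuse the closest pair; returns a list of (a, b, distance) merges.

    distances maps frozenset({label_a, label_b}) -> divergence value.
    """
    dist = dict(distances)              # work on a copy
    labels = set().union(*dist)         # all taxon labels in the matrix
    merges = []
    while len(labels) > 1:
        pair = min(dist, key=dist.get)  # least divergent pair
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        fused = f"({a},{b})"
        labels -= {a, b}
        for other in labels:
            # Assumption: distance to the fused node is the simple average.
            d = (dist.pop(frozenset({a, other})) + dist.pop(frozenset({b, other}))) / 2
            dist[frozenset({fused, other})] = d
        labels.add(fused)
        merges.append((a, b, d_ab))
    return merges


# Matrix after Rat and Bat have been fused (values from the slide).
matrix = {
    frozenset({"Cat", "RatBat"}): 0.75,
    frozenset({"Cat", "Mat"}): 1.0,
    frozenset({"RatBat", "Mat"}): 0.8,
    frozenset({"Cat", "Matt"}): 0.6,
    frozenset({"RatBat", "Matt"}): 0.45,
    frozenset({"Mat", "Matt"}): 0.9,
}

merges = fuse_closest(matrix)
for a, b, d in merges:
    print(f"fuse {a} + {b} at {d:.3f}")
```

Under these assumptions RatBat and Matt fuse next (0.45), then Cat joins them, and Mat is attached last.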

A Completed Tree
Alternative displays are possible.
[Figure: the completed tree shown in alternative layouts]

A Word about Bootstrapping
1001 definitions, none of which have to do with boots. Or straps. http://en.wikipedia.org/wiki/Bootstrapping
In phylogenetic analysis, bootstrapping is a simple test of phylogenetic accuracy:
- Does my whole dataset strongly support my tree?
- Or was this tree just marginally better than the other alternatives?

Bootstrapping – The Wordy Version
- Original dataset is “randomly sampled with replacement”
- Multiple (N = 100, 1000, etc.) “pseudo-datasets” of the same size as the original are created
- Each of the N pseudo-datasets is used to create a tree
- If a specific branching order is found in X of the N trees, that node is given the bootstrap support value X
- Support of 70% of N or more is generally taken to indicate a reliable grouping

Bootstrapping – The Picture Version
1. Slice original MSA of Y residues into Y columns, put the columns into a hat
2. Pull out a random column, place it in column #1 of your new test set (“random sampling”)
3. Put the column back in the hat (“with replacement”)
4. Pull another column from the hat, place it in column #2 in the test set, put it back
5. Repeat until a pseudo-dataset of Y columns has been made
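The hat procedure amounts to drawing Y column indices at random, with replacement, and rebuilding every sequence from those columns. A minimal sketch follows; the function name and the three-sequence toy alignment are assumptions for illustration only.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one pseudo-dataset by sampling alignment columns with replacement.

    alignment is a list of equal-length aligned strings (one per sequence).
    """
    n_cols = len(alignment[0])
    # Draw Y column indices; the same column may be drawn more than once.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same picks to every row so that columns stay intact.
    return ["".join(row[c] for c in picks) for row in alignment]


msa = ["-MITTENS", "SMITTEN-", "-KITTENS"]   # toy alignment, Y = 8 columns
pseudo = bootstrap_columns(msa, random.Random(0))
for row in pseudo:
    print(row)
```

Every column of the pseudo-dataset is a whole column of the original alignment, and the pseudo-dataset has the same number of columns as the original.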

Bootstrapping
- Repeat N times to generate N pseudo-datasets
- For each pseudo-dataset, draw a tree (yields N trees)
- Compare your tree to all N trees. How often do the branching orders in your tree appear in the N pseudo-trees?
- On branches of your tree, write # of times that branch appeared in your pseudo-dataset trees
[Figure: example tree with branch counts 3, 2, 1, 2]
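Counting support can be sketched by describing each tree as the set of groupings (clades) it contains and tallying how many pseudo-trees contain each grouping of the original tree. The function name, the tree representation, and the four pseudo-trees below are invented for illustration; they are not results from the lab data.

```python
def clade_support(reference_clades, pseudo_trees):
    """Return {clade: number of pseudo-trees that contain that clade}."""
    return {
        clade: sum(clade in tree for tree in pseudo_trees)
        for clade in reference_clades
    }


# Groupings in "your" tree (hypothetical).
rat_bat = frozenset({"Rat", "Bat"})
rat_bat_matt = frozenset({"Rat", "Bat", "Matt"})
reference = [rat_bat, rat_bat_matt]

# N = 4 hypothetical pseudo-trees, each given as a set of clades.
pseudo_trees = [
    {rat_bat, rat_bat_matt},
    {rat_bat, frozenset({"Rat", "Bat", "Cat"})},
    {rat_bat, rat_bat_matt},
    {rat_bat, frozenset({"Rat", "Bat", "Cat"})},
]

support = clade_support(reference, pseudo_trees)
for clade, count in support.items():
    percent = 100 * count / len(pseudo_trees)
    print(sorted(clade), f"{count}/{len(pseudo_trees)} = {percent:.0f}%")
```

In this toy case the Rat+Bat grouping appears in all four pseudo-trees, while Rat+Bat+Matt appears in only two.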

Our Bootstrapped Neighbour-Joining Tree
Trees > “Exclude positions w/ gaps”
- Excludes any column with 1+ gaps
- Deletes uninformative regions
- Not good for gappy MSAs
Trees > “Correct for multiple subs.”
- An observed M vs. V difference may hide M -> V -> L -> V: 3, not 1, mutations
- Correction formula makes distances proportional to time since divergence
Trees > Output Format Options
- Bootstrap labels on “node”, not “branch”
- Makes for easier visualization
Trees > Bootstrap N-J Tree
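The slide does not name the correction formula. As a hedged illustration, the ClustalW documentation describes Kimura's empirical formula for protein distances, K = -ln(1 - D - D^2/5), where D is the observed fraction of differing residues. The sketch below assumes that formula; the function name is hypothetical.

```python
import math


def kimura_protein_distance(observed):
    """Kimura's empirical correction: K = -ln(1 - D - D^2 / 5).

    observed is D, the fraction of aligned positions that differ (0 <= D < ~0.85).
    """
    inner = 1.0 - observed - observed ** 2 / 5.0
    if inner <= 0:
        raise ValueError("observed distance too large for this correction")
    return -math.log(inner)


for d in (0.0, 0.1, 0.3, 0.5):
    print(f"observed {d:.2f} -> corrected {kimura_protein_distance(d):.4f}")
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge, which reflects the growing number of hidden multiple substitutions.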

Draw a Bootstrapped NJ Tree
Trees > Bootstrap NJ Tree
What does this file look like?

(
(
(
OR6K2:0.09089,
olfr420:0.06487)
:0.11633,
(
(
OR6N1:0.06154,
olfr429:0.05385)
:0.10172,
OR6N2:0.17713)
:0.12270)
:0.10391,
OR6K6:0.10215,
olfr425:0.08024);

Not very tree-like
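The parenthesised text is Newick format, and a short recursive parser can turn it into an indented outline that is easier to read. This is a minimal sketch, not a full Newick implementation: the function names are hypothetical, and the string below assumes that the opening parentheses lost from the slide transcript are restored as a standard bifurcating nesting with three branches at the top level.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())   # drop spaces and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                       # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                       # consume ")"
        start = pos
        while text[pos] not in ":,();":    # leaf name or internal node label
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaves(tree):
    """List leaf names in the order they appear in the file."""
    name, _, children = tree
    return [name] if not children else [x for c in children for x in leaves(c)]


def show(tree, depth=0):
    """Print the tree as an indented outline; '+' marks an unnamed internal node."""
    name, length, children = tree
    label = name or "+"
    if length is not None:
        label += f" ({length})"
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)


newick = ("(((OR6K2:0.09089,olfr420:0.06487):0.11633,"
          "((OR6N1:0.06154,olfr429:0.05385):0.10172,OR6N2:0.17713):0.12270):0.10391,"
          "OR6K6:0.10215,olfr425:0.08024);")
tree = parse_newick(newick)
show(tree)
```

If bootstrap values are written as node labels, this parser picks them up as the names of internal nodes.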

View the Tree with NJPlot
- NJPlot: very basic; type njplot at the prompt
- Turn on bootstrap value display
- Can swap nodes
- Many other tree viewing programs available, e.g. Treeview, with 3 different views

Remainder of lab time/Evening open lab
Complete the phylogeny assignment!
- View tree on-screen
- Use tree and facts on pg. 3,4 of Lab 4.1 notes to answer Q1-Q5
- Look at alignment of single transmembrane segment
  - TM helix 3, thought to be important for odorant specificity
- Answer remaining assignment questions using this data
Attention biologists! Karma alert! Help your teammates to understand evolution today, and they’ll help you understand programming tomorrow!
Module 3 of the Integrated Assignment
- MSA/tree analysis of sequences found in Module 2
- Last IA component for Week 1
Phylogeny assignment due 9am, IA due 11am Saturday
Half-day open lab tomorrow afternoon, plus evening times