Lab 4.1: Multiple Sequence Alignment & Phylogenetic Trees


1 Lab 4.1: Multiple Sequence Alignment & Phylogenetic Trees
February, 2006
Jennifer Gardy
Centre for Microbial Diseases & Immunity Research, University of British Columbia
Lab 4.1 (c) 2006 CGDN

2 Lab 4.1

3 Goals
Learn how to create, view, and interpret multiple sequence alignments and phylogenetic trees
Understand the difference between orthologs and paralogs
Use an alignment and a tree to deduce information about biological sequences
Complete the phylogeny assignment
Complete Module 3 of the Integrated Assignment

4 Outline
Quick review of MSA
How Clustal works
Research Question
Worked example 1: Creating an MSA
Neighbour-joining trees
Bootstrapping
Worked example 2: Creating/viewing a tree
Free time to work on phylogeny assignment

5 MSAs: A Quick Review
Why perform an MSA?
  Visualize trends between homologous sequences:
    Shared regions of homology
    Regions unique to a sequence within a family
    Structural/functional motifs
  As the first step in a phylogenetic analysis
  Improve accuracy of structure predictions
How does one perform an MSA?
  Automated alignment with manual editing
  Considerations:
    Select sequences carefully:
      Homologous over their length
      No unrelated sequences
    Computational intensity

6 Clustal – A Common MSA Tool
Progressive alignment strategy:
  Sequences used to make guide tree
  Most similar two sequences aligned = consensus
  Next closest sequence aligned to the consensus
[Figure: progressive alignment of four toy sequences]
  Input: 1 MITTENS, 2 KITTIES, 3 SMITTEN, 4 KITTENS
  Guide tree order: 1, 3, 4, 2
  1 -MITTENS aligned with 3 SMITTEN- gives consensus 13 MITTEN
  13 MITTEN aligned with 4 KITTENS gives consensus 134 ITTEN
  134 ITT-E aligned with 2 KITTIES
Manual editing:
  Fine adjustment of particular columns
    Incorporate specific knowledge
  Removal of gappy bits
    Important for phylogenetic analysis
  Removal of parts of/whole sequences
    Non-homologous regions
    Sequences included by error
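The "consensus" idea in the toy MITTENS figure can be sketched in a few lines of Python. This is only an illustration of the slide's simplified picture: the pairwise alignments are typed in by hand rather than computed, and the `consensus` helper is a hypothetical name. Clustal itself aligns profiles rather than literal consensus strings.

```python
def consensus(aligned):
    """Keep the residue of every column in which all sequences agree.

    Columns containing a gap ("-") or any disagreement are dropped,
    which mirrors the toy consensus shown in the slide figure.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    kept = []
    for column in zip(*aligned):
        if len(set(column)) == 1 and column[0] != "-":
            kept.append(column[0])
    return "".join(kept)


# Step 1: sequences 1 and 3, already aligned by hand as in the slide.
step1 = consensus(["-MITTENS", "SMITTEN-"])

# Step 2: that consensus (padded by hand with one gap) against sequence 4.
step2 = consensus([step1 + "-", "KITTENS"])

print(step1, step2)
```

The two results correspond to the "13" and "134" consensus strings in the figure.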

7 Research Question: Background
Olfaction: our sense of smell
Small chemical compound (odorant) binds olfactory receptor (OR) on nasal epithelium
OR changes shape -> signaling cascade -> perception of odor
>1000 OR genes in mammals (~350 active in humans), each binds 1 or more odorants
But we smell many more scents than this! How?
  Combinatorial effects of ORs
  Some odorants exist as enantiomers:
    Non-superimposable mirror images
    Each enantiomer has a unique scent

8 Research Question: Background
ORs are G-protein coupled receptors (GPCRs)
  All are very similar in structure
  7 transmembrane helices
  3rd transmembrane helix thought to be important for odorant specificity
Odorant:OR relationship not known for most of the OR genes
  Starting to employ bioinformatics techniques

9 Research Question: Outline
Computational analysis of OR sequences:
  4 human sequences: OR6N1, OR6N2, OR6K2, OR6K6
    OR6N1 binds R(-) carvone (spearmint)
    OR6K2 binds R(+) limonene (citrus fruit)
  3 mouse homologs: olfr420, olfr425, olfr429
Can we figure out the odorant specificity of the other odorant receptors?
Which residues might be important for odorant binding specificity?

10 Starting up ClustalX
Day 4 wiki > OR_proteins.txt
Open nano or another text editor, paste the sequences in and save
Start ClustalX: $ clustalx
Menus used in this lab:
  File: Load sequences
  Edit: Remove all gaps
  Alignment: Do complete alignment; Alignment parameters
  Trees: Bootstrapped NJ; Output format options
[Figure labels: Name, Window, Sequence]

11 OR Proteins in ClustalX
File > Load sequences > OR_proteins.txt
How are unaligned sequences displayed?
  Left-aligned, in order of input
  Default colouring (identity) – see help file for details
  Conservation score graph

12 Alignment Parameters
Alignment > Alignment Parameters > Multiple Alignment Parameters
What effect might increasing the gap penalties have on alignment? Decreasing the gap penalties?
Increased gap penalties:
  Fewer gaps
  May cause program to miss important insertion/deletion events
Decreased gap penalties:
  More gaps
  Makes it easier to align sequences that may not be closely related to each other
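To see the gap-penalty trade-off on a small case, here is a minimal pairwise global alignment (Needleman-Wunsch) sketch. It is not how ClustalX scores alignments: ClustalX uses substitution matrices and separate gap-opening and gap-extension penalties, whereas this sketch assumes a single linear gap penalty and simple match/mismatch scores. The function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_gaps(alignment):
    """Total number of gap characters across both aligned strings."""
    return sum(row.count("-") for row in alignment)


low_penalty = needleman_wunsch("MITTENS", "SMITTEN", gap=-1)
high_penalty = needleman_wunsch("MITTENS", "SMITTEN", gap=-10)
print(low_penalty, count_gaps(low_penalty))
print(high_penalty, count_gaps(high_penalty))
```

With the mild penalty the shifted alignment with one gap at each end wins; with the harsh penalty the program prefers six mismatches to any gap, which is the "missed insertion/deletion event" risk named on the slide.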

13 Do an Alignment
Alignment > Do complete alignment
Generates an .aln file
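If you want to read an .aln file in your own scripts, a minimal parser might look like the sketch below. It assumes the usual Clustal text layout (a header line starting with CLUSTAL, then blocks of "name sequence" lines, each block followed by a conservation line that begins with whitespace). The example text and sequence names are made up for illustration and are not output from this lab.

```python
def parse_clustal(text):
    """Collect sequences from Clustal-format text into {name: aligned_sequence}."""
    sequences = {}
    lines = text.splitlines()
    for line in lines[1:]:                 # the first line is the CLUSTAL header
        if not line.strip() or line[0].isspace():
            continue                       # blank line or conservation-mark line
        name, chunk = line.split()[:2]     # ignore any trailing residue count
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Hypothetical two-block alignment (real blocks are much wider).
example = """CLUSTAL X (1.83) multiple sequence alignment


seq1            -MITTENS
seq3            SMITTEN-
                 ******

seq1            MITT
seq3            MITT
                ****
"""

alignment = parse_clustal(example)
print(alignment)
```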

14 Building an NJ Tree - An Example
Cbw protein from cat, rat, bat, mat and Matt
1. Compare all sequences to each other
2. Assign divergence values to each pair
3. Assemble the values in a distance matrix

        Cat    Rat    Bat    Mat
Cat     -
Rat     0.7
Bat     0.8    0.2
Mat     1.0    ...    ...
Matt    0.6    0.4    0.5    0.9
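The slides do not say how the divergence values are calculated. One common choice, used here purely as an assumption, is the p-distance: the fraction of aligned positions at which two sequences differ. The sketch below builds a small distance matrix from a hypothetical toy alignment; the function names are illustrative.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared columns that differ.

    Columns with a gap in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of names to its p-distance."""
    return {
        frozenset((m, n)): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not the lab's OR proteins).
toy = {"seq1": "-MITTENS", "seq2": "-KITTIES", "seq4": "-KITTENS"}
matrix = distance_matrix(toy)
for pair, d in matrix.items():
    print(sorted(pair), round(d, 3))
```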

15 Building an NJ Tree
4. Arrange the subjects in a “star” phylogeny

16 Building an NJ Tree
5. Fuse the two branches with the least divergence

        Cat    Rat    Bat    Mat
Cat     -
Rat     0.7
Bat     0.8    0.2
Mat     1.0    ...    ...
Matt    0.6    0.4    0.5    0.9

17 Building an NJ Tree
6. Create a new distance matrix using the fusion consensus sequence

          Cat     RatBat   Mat
Cat       -
RatBat    0.75
Mat       1.0     0.8
Matt      0.6     0.45     0.9

7. Fuse the next two closest sequences
8. Repeat until tree completed
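Steps 5 to 8 can be sketched as a loop: find the closest pair, fuse it, average its distances to everything else, repeat. Note that this is the simplified averaging procedure shown on the slides (0.75 is the mean of 0.7 and 0.8), which behaves like an average-linkage clustering. Full neighbour-joining as published by Saitou and Nei picks pairs with a rate-corrected criterion rather than the raw minimum. The function names are hypothetical; the starting values are those of the matrix after the Rat/Bat fusion.

```python
def fuse(dist, pair):
    """Merge the two clusters in `pair`, averaging their distances to the rest."""
    a, b = sorted(pair)
    merged = f"({a},{b})"
    others = {name for key in dist for name in key} - pair
    new = {key: d for key, d in dist.items() if not (key & pair)}
    for other in others:
        new[frozenset({merged, other})] = (
            dist[frozenset({a, other})] + dist[frozenset({b, other})]
        ) / 2
    return merged, new


def build_tree(dist):
    """Repeatedly fuse the closest pair; return the clusters in fusion order."""
    order = []
    while dist:
        pair = min(dist, key=dist.get)   # key of the smallest distance
        merged, dist = fuse(dist, pair)
        order.append(merged)
    return order


# Distances from the slide, after Rat and Bat have been fused.
start = {
    frozenset({"Cat", "RatBat"}): 0.75,
    frozenset({"Cat", "Mat"}): 1.0,
    frozenset({"RatBat", "Mat"}): 0.8,
    frozenset({"Matt", "Cat"}): 0.6,
    frozenset({"Matt", "RatBat"}): 0.45,
    frozenset({"Matt", "Mat"}): 0.9,
}

first_merged, after_first = fuse(start, frozenset({"Matt", "RatBat"}))
fusion_order = build_tree(start)
print(fusion_order)
```

The last entry of `fusion_order` is a nested-parenthesis string for the whole tree, without branch lengths.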

18 A Completed Tree
Alternative displays are possible:

19 A Word about Bootstrapping
1001 definitions, none of which have to do with boots. Or straps.
In phylogenetic analysis, bootstrapping is a simple test of phylogenetic accuracy:
  Does my whole dataset strongly support my tree?
  Or was this tree just marginally better than the other alternatives?

20 Bootstrapping – The Wordy Version
Original dataset is “randomly sampled with replacement”
Multiple (N=100, 1000, etc…) “pseudo-datasets” of the same size as the original are created
Each of the N pseudo-datasets is used to create a tree
If a specific branching order is found in X of the N trees, that node is given the bootstrap support value X
Support of 70% or more (X/N >= 0.7) is generally taken to indicate a reliable grouping

21 Bootstrapping – The Picture Version
1. Slice original MSA of Y residues into Y columns, put the columns into a hat
2. Pull out a random column, place it in column #1 of your new test set (“random sampling”)
3. Put the column back in the hat (“with replacement”)
4. Pull another column from the hat, place it in column #2 in the test set, put it back
5. Repeat until a pseudo-dataset of Y columns has been made
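The hat procedure translates almost directly into code. The sketch below is a minimal illustration using Python's standard `random` module; the toy alignment, the seed and the function name are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-dataset of Y columns.

    Column indices are drawn with replacement, and the same indices are
    used for every sequence so that each column stays intact.
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment with Y = 8 columns.
toy = {
    "seq1": "-MITTENS",
    "seq2": "-KITTIES",
    "seq3": "SMITTEN-",
    "seq4": "-KITTENS",
}

rng = random.Random(2006)   # fixed seed so the example is repeatable
pseudo = bootstrap_alignment(toy, rng)
for name, seq in pseudo.items():
    print(name, seq)
```

Because sampling is with replacement, some original columns usually appear more than once in a pseudo-dataset and others not at all.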

22 Bootstrapping
Repeat N number of times to generate N pseudo-datasets
For each pseudo-dataset, draw a tree (yields N trees)
Compare your tree to all N trees. How often do the branching orders in your tree appear in the N pseudo-trees?
On branches of your tree, write # of times that branch appeared in your pseudo-dataset trees
[Figure: example tree with branch counts 3, 2, 1, 2]
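Counting how often each grouping reappears is simple once every tree is reduced to its set of groupings. In the sketch below each tree is represented, as an assumption for illustration, by the set of leaf-name groups found under its internal nodes; the leaf names A to D and the three pseudo-trees are invented.

```python
def bootstrap_support(reference_clades, pseudo_trees):
    """For each grouping in the reference tree, count the pseudo-trees containing it."""
    return {
        clade: sum(clade in tree for tree in pseudo_trees)
        for clade in reference_clades
    }


# Reference tree: (A,B) grouped together, then joined by C.
reference = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}

# Three hypothetical pseudo-dataset trees (N = 3).
pseudo_trees = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
]

support = bootstrap_support(reference, pseudo_trees)
for clade, count in support.items():
    print(sorted(clade), f"{count}/{len(pseudo_trees)}")
```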

23 Our Bootstrapped Neighbour-Joining Tree
Trees > “Exclude positions w/ gaps”
  Any column with 1+ gaps
  Deletes uninformative regions
  Not good for gappy MSAs
“Correct for multiple subs.”
  M -> V -> L -> V
  3, not 1, mutations
  Correction formula makes distances proportional to time since divergence
Trees > Output Format Options
  Bootstrap labels on “node”, not “branch”
  Makes for easier visualization
Bootstrap N-J Tree
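Both options can be illustrated in a few lines. The column filter follows the slide's description directly. For the correction, the slides do not give the formula; the sketch assumes Kimura's empirical protein correction, d = -ln(1 - p - p^2/5), which is the formula I understand the Clustal documentation to describe, so treat that choice as an assumption. The function names are hypothetical.

```python
import math


def drop_gapped_columns(aligned):
    """Remove every column that contains one or more gaps."""
    columns = [col for col in zip(*aligned) if "-" not in col]
    return ["".join(row) for row in zip(*columns)]


def kimura_protein_distance(p):
    """Correct an observed fraction of differences p for multiple substitutions.

    Assumed formula (Kimura's empirical correction for proteins):
    d = -ln(1 - p - p**2 / 5)
    """
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("observed difference too large to correct")
    return -math.log(inner)


trimmed = drop_gapped_columns(["-MITTENS", "SMITTEN-"])
print(trimmed)
print(round(kimura_protein_distance(0.5), 3))
```

The corrected distance is always at least as large as the observed one, reflecting substitutions hidden by later changes at the same site (the M -> V -> L -> V case).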

24 Draw a Bootstrapped NJ Tree
Trees > Bootstrap NJ Tree
What does this file look like?
( OR6K2: , olfr420: ) : , OR6N1: , olfr429: ) : , OR6N2: ) : ) : , OR6K6: , olfr425: );
Not very tree-like
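The tree file is plain text in nested-parenthesis (Newick-style) notation, so simple facts can be pulled out without a tree viewer. The sketch below extracts each leaf name and its branch length with a regular expression. The example string is entirely hypothetical (made-up names, lengths and node label), and the placement of the bootstrap label after the closing parenthesis is an assumption about the layout.

```python
import re

# A leaf is a name that directly follows "(" or "," and is followed by ":length".
# Labels that follow ")" belong to internal nodes and are not matched.
LEAF = re.compile(r"(?<=[(,])\s*([^():,;\s]+)\s*:\s*([-+0-9.eE]+)")


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for every leaf in the string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


# Hypothetical tree: seqA and seqB grouped (node label 90), joined by seqC.
toy_tree = "((seqA:0.10,seqB:0.20)90:0.05,seqC:0.30);"
lengths = leaf_branch_lengths(toy_tree)
print(lengths)
```

For anything beyond quick inspection, a dedicated tree viewer or parsing library is the safer route.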

25 View the Tree with NJPlot
NJPlot – very basic
  njplot at prompt
  Turn on bootstrap value display
  Can swap nodes
Many other tree viewing programs available
  Treeview, 3 different views:

26 Remainder of lab time/Evening open lab
Complete the phylogeny assignment!
  View tree on-screen
  Use tree and facts on pg. 3,4 of Lab 4.1 notes to answer Q1-Q5
  Look at alignment of single transmembrane segment
    TM helix 3, thought to be important for odorant specificity
  Answer remaining assignment questions using this data
Attention biologists! Karma alert! Help your teammates to understand evolution today, and they’ll help you understand programming tomorrow!
Module 3 of the Integrated Assignment
  MSA/tree analysis of sequences found in Module 2
  Last IA component for Week 1
Phylogeny assignment due 9am, IA due 11am Saturday
Half-day open lab tomorrow afternoon, plus evening times

