Alignments: Why Do Alignments?

Detecting Selection: Evolution of Drug Resistance in HIV

Selection on Amino Acid Properties: TreeSAAP (2003); Wu method (Sainudiin et al. 2005)

TreeSAAP Properties:
Alpha-helical tendencies
Average number of surrounding residues
Beta-structure tendencies
Bulkiness
Buriedness
Chromatographic index
Coil tendencies
Composition
Compressibility
Equilibrium constant (ionization of COOH)
Helical contact area
Hydropathy
Isoelectric point
Long-range non-bonded energy
Mean r.m.s. fluctuation displacement
Molecular volume
Molecular weight
Normalized consensus hydrophobicity
Partial specific volume
Polar requirement
Polarity
Power to be at the C-terminal
Power to be at the middle of alpha-helix
Power to be at the N-terminal
Refractive index
Short and medium range non-bonded energy
Solvent accessible reduction ratio
Surrounding hydrophobicity
Thermodynamic transfer hydrophobicity
Total non-bonded energy
Turn tendencies
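TreeSAAP asks whether substitutions produce unusually large changes in properties like those listed above. As a rough illustration only, the sketch below measures the change in one property for a single amino acid substitution and bins it into one of eight equal-width magnitude classes. The choice of the Kyte-Doolittle hydropathy scale, the equal-width binning, and the function names are assumptions for illustration; this is not TreeSAAP's actual procedure or property table.

```python
# Kyte-Doolittle hydropathy values, used here as one example property scale
# (an assumption; TreeSAAP's own property tables may differ).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def property_change(aa_from, aa_to, scale=KYTE_DOOLITTLE):
    """Absolute change in the property value caused by one substitution."""
    return abs(scale[aa_to] - scale[aa_from])


def magnitude_category(delta, scale=KYTE_DOOLITTLE, n_categories=8):
    """Bin a property change into classes 1..n_categories of equal width.

    The widest possible change on the scale (max - min) is split evenly;
    class 1 is the most conservative change, the top class the most radical.
    """
    max_change = max(scale.values()) - min(scale.values())
    width = max_change / n_categories
    return min(int(delta / width) + 1, n_categories)


for substitution in [("I", "V"), ("A", "D"), ("I", "R")]:
    delta = property_change(*substitution)
    print(substitution, round(delta, 2), magnitude_category(delta))
```

A real analysis would evaluate such changes along the branches of a phylogeny and compare them with an expectation (the z-scores shown later in the slides); that step is not sketched here.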

TreeSAAP

Rhinoviruses

Selected Sites

3D Mapping

OPSIN: Model System for Molecular Evolution [Figure linking phenotype, genotype, and environment; axis: wavelength (nm), UV to IR; opsin amino acid sequence excerpt: CRLAKIAMTTVALWFIAWT PYLLINWVGMFARSYLSPV YTIWGYVFAKANAVYNPIV YAISHPKYRAAMEKKLPCL SCKTESDDVSESASTTTSS]

Is λmax Correlated with Ecological Differences? A microscopic, thin beam of spectral light (input) is passed through the sample, and the light not absorbed by the photopigment (output) is detected. Pigment absorbance = input minus output. The scan runs from 400 to 700 nm at 1 nm intervals.
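The measurement described above reduces to computing an absorbance value at each wavelength and reporting the wavelength where it peaks (λmax). The sketch below assumes the standard definition A = log10(I_in / I_out) rather than the slide's shorthand "input minus output", and uses a synthetic, hypothetical pigment spectrum peaking at 500 nm; the function names are illustrative.

```python
import math


def absorbance_spectrum(incident, transmitted):
    """Absorbance A = log10(I_in / I_out) at each wavelength."""
    return [math.log10(i_in / i_out) for i_in, i_out in zip(incident, transmitted)]


def lambda_max(wavelengths, absorbance):
    """Wavelength (nm) at which the absorbance spectrum peaks."""
    peak = max(range(len(absorbance)), key=lambda k: absorbance[k])
    return wavelengths[peak]


# Synthetic scan, 400-700 nm at 1 nm intervals (hypothetical pigment peaking at 500 nm).
wavelengths = list(range(400, 701))
incident = [1000.0] * len(wavelengths)
true_absorbance = [0.5 * math.exp(-((w - 500) / 40.0) ** 2) for w in wavelengths]
transmitted = [i * 10 ** (-a) for i, a in zip(incident, true_absorbance)]

A = absorbance_spectrum(incident, transmitted)
print("lambda_max =", lambda_max(wavelengths, A), "nm")
```

Real spectra are noisy, so in practice one would usually fit a pigment template curve rather than take the raw maximum.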

Coil Tendencies, Compressibility, Alpha-Helix

[Figure (TreeSAAP): z-scores plotted against amino acid alignment number for coil tendencies, compressibility, power to be at the middle of an alpha-helix, and refractive index, with transmembrane helices TM I to TM VI marked.]

Homology

Homology definitions: Homology is an evolutionary relationship that either exists or does not; it cannot be partial. An ortholog is a homolog that arose through a speciation event. A paralog is a homolog that arose through a gene duplication event; paralogs often have divergent functions. Similarity is a measure of the quality of the alignment between two sequences. High similarity is evidence for homology. Similar sequences may be orthologs or paralogs.
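Because similarity is a measurable quantity while homology is an inference, it helps to see one simple way similarity is computed. The sketch below reports percent identity over an existing pairwise alignment. Definitions of percent identity differ in their denominator; this version assumes only columns where neither sequence has a gap are counted, and assumes "-" is the gap symbol.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


print(round(percent_identity("GAT-ACA", "GATTAGA"), 1))
```

A high value supports, but does not by itself prove, homology; it also says nothing about whether the sequences are orthologs or paralogs.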

One More Homology Type: Xenology, homology arising from horizontal gene transfer (HGT). How do you discover this?

Alignment Problem: Optimal pairwise alignment consists of considering all possible alignments of two sequences (implicitly, via dynamic programming) and choosing the best-scoring one. Heuristic alignment algorithms, which are not guaranteed to find the optimal alignment, are also very important, e.g., BLAST.
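To see why "considering all possible alignments" cannot mean listing them one by one, the sketch below counts the distinct global alignments of two sequences. It assumes every column is either a residue pair or a residue against a gap (no gap-gap columns), and that an insertion followed by a deletion counts as different from the reverse order; under those assumptions the count is the Delannoy number D(m, n).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Number of distinct global alignments of sequences of lengths m and n.

    Each column either pairs one residue from each sequence or places one
    residue against a gap; this gives the Delannoy number D(m, n).
    """
    if m == 0 or n == 0:
        return 1  # only one way left: remaining residues against gaps
    return (count_alignments(m - 1, n - 1)   # residue pair (match or mismatch)
            + count_alignments(m - 1, n)     # residue of sequence 1 against a gap
            + count_alignments(m, n - 1))    # residue of sequence 2 against a gap


for length in (3, 10, 100):
    print(length, count_alignments(length, length))
```

Two sequences of only 10 residues already admit millions of alignments, which is why dynamic programming and heuristics are used instead of enumeration.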

Key Issues: types of alignments (local vs. global); the scoring system; the alignment algorithm; measuring alignment significance.

Types of Alignment. Global: sequences are aligned from end to end. Local: alignments may start and end in the middle of either sequence. Ungapped: no insertions or deletions are allowed. Other types: overlap alignments, repeated-match alignments.
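A global alignment can be computed with the Needleman-Wunsch dynamic programming algorithm. The sketch below assumes a simple scoring scheme with hypothetical values (match +1, mismatch -1, linear gap penalty -2) and returns the optimal score together with one optimal alignment; ties during traceback are broken in favor of a residue pair.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# The example sequences from the Scoring Matrices slide
best, top, bottom = needleman_wunsch("TTACGGAGCTTC", "CTGAGATCC")
print(best)
print(top)
print(bottom)
```

Every residue of both sequences appears in the result, which is what makes the alignment global.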

Local vs. Global Pairwise Alignments: A global alignment includes all elements of both sequences and may include gaps. A global alignment may or may not include "end gap" penalties. Global alignments are better indicators of homology but, compared with ungapped or heuristic local searches, take longer to compute. A local alignment includes only subsequences, and is sometimes computed without gaps. Local alignments can find shared domains in divergent proteins and, especially when ungapped, are fast to compute.
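A gapped local alignment can be computed with the Smith-Waterman algorithm, which differs from the global recurrence in two ways: cell scores are floored at zero, and traceback starts from the best cell anywhere in the matrix and stops at a zero. The sketch below assumes hypothetical scores (match +2, mismatch -1, linear gap -2).

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Floor at 0: a local alignment may start fresh at any cell
            H[i][j] = max(0, H[i - 1][j - 1] + pair, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score drops to 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

Only the shared core of the two sequences is reported, which is how local alignment can pick out a common domain in otherwise unrelated sequences.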

How do you compare alignments? With a scoring scheme. What events do we score? Matches, mismatches, and gaps. What scores will you give these events? What assumptions are you making? Then score your alignment.
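Once scores are chosen for matches, mismatches, and gaps, scoring a given alignment is a column-by-column sum. The sketch below makes its assumptions explicit: each column is scored independently, every gap position costs the same (a linear gap penalty), and the values (+1, -1, -2) are hypothetical.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Score an existing pairwise alignment column by column.

    Returns the total score and the number of each scored event.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        elif x == "-" or y == "-":
            counts["gap"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = counts["match"] * match + counts["mismatch"] * mismatch + counts["gap"] * gap
    return total, counts


print(score_alignment("GA-TTC", "GACT-A"))
```

Changing any of these assumptions, for example charging more to open a gap than to extend it, changes which alignment scores best.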

Scoring Matrices: How do you determine scores? What is already available for your use? DNA versus amino acids? Example sequences: TTACGGAGCTTC and CTGAGATCC.
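A scoring matrix is a lookup table from residue pairs to scores. For amino acids one would normally use an established matrix such as BLOSUM62 or PAM250; for DNA a simple matrix can already encode a biological assumption, for example that transitions (A/G, C/T) are more common than transversions. The sketch below builds such a DNA matrix; the score values and function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_matrix(match=2, transition=-1, transversion=-2):
    """Symmetric DNA scoring matrix that penalizes transversions more than transitions."""
    matrix = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                matrix[(x, y)] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                matrix[(x, y)] = transition     # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[(x, y)] = transversion   # purine<->pyrimidine
    return matrix


def ungapped_score(a, b, matrix):
    """Sum of matrix scores over an ungapped, equal-length pairing of a and b."""
    if len(a) != len(b):
        raise ValueError("ungapped scoring needs equal-length sequences")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


M = dna_matrix()
for row in "ACGT":
    print(row, [M[(row, col)] for col in "ACGT"])
print(ungapped_score("ACGT", "GCAT", M))
```

Sequences of different lengths, like the slide's example pair, additionally need gaps and therefore an alignment algorithm on top of the matrix.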

Multiple Sequence Alignment: global versus local alignments. Progressive alignment: estimate a guide tree, then do pairwise alignments on subtrees, following the tree (e.g., ClustalX).
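The guide tree determines the order in which sequences and subalignments are merged. ClustalW builds its guide tree with neighbor-joining; for brevity this sketch swaps in UPGMA (average linkage), a simpler method also used by some aligners such as MUSCLE. The pairwise distances are hypothetical; in practice they would come from pairwise alignments.

```python
import itertools


def upgma(names, dist):
    """Build a UPGMA tree; return the clusters in the order they are merged.

    names: leaf labels, e.g. sequence IDs.
    dist:  dict mapping (name1, name2) -> distance for every pair (either order).
    Each merged cluster is a nested tuple, so the last entry is the whole tree.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    sizes = {name: 1 for name in names}  # current cluster -> number of leaves in it
    merges = []
    while len(sizes) > 1:
        # Join the closest pair of current clusters
        x, y = min(itertools.combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        new = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        # Average-linkage distance from the new cluster to every remaining cluster
        for z in sizes:
            d[frozenset((new, z))] = (nx * d[frozenset((x, z))]
                                      + ny * d[frozenset((y, z))]) / (nx + ny)
        sizes[new] = nx + ny
        merges.append(new)
    return merges


distances = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
             ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
for step, cluster in enumerate(upgma(["A", "B", "C", "D"], distances), start=1):
    print(step, cluster)
```

Read as a progressive-alignment schedule: align A with B first, then add C to that subalignment, then add D.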

Improvements: Consistency-Based Algorithms. T-Coffee uses a consistency-based objective function to minimize potential errors. It generates pairwise global (Clustal) and local (Lalign) alignments, then combines and reweights them, and finally performs a progressive alignment.
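The reweighting step can be illustrated with a simplified library extension: residue pairs taken from the pairwise alignments form a weighted "primary library", and a pair (x, y) gains extra weight for every third-sequence residue z aligned to both, adding min(weight(x, z), weight(z, y)). The sketch below follows that idea with hypothetical weights and residue labels; it is not T-Coffee's implementation.

```python
import itertools
from collections import defaultdict


def extend_library(primary):
    """Simplified consistency-based library extension.

    primary: dict mapping (residue_x, residue_y) -> weight, each pair listed once,
             where a residue is (sequence_name, position) and x, y are in
             different sequences.
    Returns: dict mapping sorted (residue_x, residue_y) -> extended weight, i.e.
             the direct weight plus min(weight(x, z), weight(z, y)) for every
             intermediate residue z aligned to both x and y.
    """
    neighbors = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbors[x][y] = w
        neighbors[y][x] = w

    extended = defaultdict(float)
    for (x, y), w in primary.items():
        extended[tuple(sorted((x, y)))] += w  # direct support from a pairwise alignment

    for z, partners in neighbors.items():  # z is the intermediate residue
        for (x, w_xz), (y, w_zy) in itertools.combinations(partners.items(), 2):
            if x[0] == y[0]:
                continue  # x and y must come from different sequences
            extended[tuple(sorted((x, y)))] += min(w_xz, w_zy)  # support via z
    return dict(extended)


primary = {
    (("A", 0), ("B", 0)): 80, (("B", 0), ("C", 0)): 60, (("A", 0), ("C", 0)): 50,
    (("A", 1), ("B", 1)): 70, (("B", 1), ("C", 1)): 90,
}
for pair, weight in sorted(extend_library(primary).items()):
    print(pair, weight)
```

Note that the pair (A1, C1), absent from every pairwise alignment, still receives support through B1; the progressive alignment then uses these extended weights instead of a fixed substitution matrix.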

Iterative Algorithms: (1) estimate a draft progressive alignment (uncorrected distances); (2) build an improved progressive alignment (re-estimate the guide tree using Kimura 2-parameter distances); (3) refine: divide the tree into two subtrees, estimate a profile for each, then re-align the two profiles; (4) continue refinement until convergence.
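The Kimura 2-parameter correction named above turns the observed proportions of transitions (P) and transversions (Q) between two aligned nucleotide sequences into an estimated evolutionary distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below assumes aligned DNA strings and skips columns containing gaps or ambiguous bases; the function name is illustrative.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites
    Q = transversions / sites
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


print(kimura_2p("AAAAAAAAAA", "GCAAAAAAAA"))  # one transition, one transversion in 10 sites
```

Unlike an uncorrected p-distance, this correction accounts for multiple substitutions at the same site, which is why it is used when re-estimating the guide tree.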

Software: Clustal; T-Coffee; MUSCLE (limited models); MAFFT (wide variety of models).

Comparisons. Speed: MUSCLE > MAFFT > ClustalW > T-Coffee. Accuracy: MAFFT > MUSCLE > T-Coffee > ClustalW. Lots more work to do here!