Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.

Multiple Sequence Alignment: "One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud." Multiple alignments are therefore very informative.

Definition: A global alignment of a set of sequences is obtained by inserting gap characters into each sequence so that
– the resulting sequences all have the same length, and
– no column consists only of gap characters.
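The definition can be checked mechanically. Below is a minimal Python sketch, assuming '-' as the gap character and alignment rows given as strings; `is_global_msa` and `ungap` are hypothetical helpers for illustration, not part of any library. The optional `originals` argument checks that each row is its original sequence with only gaps inserted.

```python
GAP = "-"  # assumption: '-' is the gap character


def ungap(row):
    """Recover the original sequence by deleting the inserted gap characters."""
    return row.replace(GAP, "")


def is_global_msa(rows, originals=None):
    """Check the conditions of the definition of a global multiple alignment."""
    if not rows:
        return False
    # Only gap characters may have been inserted into each original sequence.
    if originals is not None and [ungap(r) for r in rows] != list(originals):
        return False
    # All gapped sequences must have the same length.
    if any(len(r) != len(rows[0]) for r in rows):
        return False
    # No column may consist only of gap characters.
    return not any(all(c == GAP for c in col) for col in zip(*rows))


alignment = ["MK-LV", "MKALV", "-KAL-"]
print(is_global_msa(alignment))        # True
print([ungap(r) for r in alignment])   # ['MKLV', 'MKALV', 'KAL']
print(is_global_msa(["MK-", "MA-"]))   # False: the last column is all gaps
```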

Example: Chromo domains aligned

Use of alignments: High sequence similarity usually means significant structural and/or functional similarity, but the converse need not hold. Homologous proteins (proteins with a common ancestor) can differ substantially over large parts of their sequences and still retain common 2D patterns, common 3D patterns, or a common active site or binding site. Comparing several sequences in a family can reveal what is common to the family: a feature shared by several sequences can be significant when all of them are considered, even though the same feature need not be significant when only two sequences are compared. Multiple alignments can also be used to infer evolutionary history.
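Inferring evolutionary history from a multiple alignment often starts from pairwise distances between the aligned sequences. The sketch below computes the simple p-distance (proportion of differing residues, ignoring columns where either row has a gap); this is one common choice assumed here for illustration, not necessarily the method used later in the course, and `p_distance` and `distance_matrix` are hypothetical names.

```python
def p_distance(a, b):
    """Proportion of differing residues between two aligned rows.

    Only columns where neither row has a gap ('-') are compared.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("the two rows share no gap-free columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(rows):
    """Symmetric matrix of p-distances between all aligned rows."""
    return [[p_distance(a, b) for b in rows] for a in rows]


rows = ["MKALV", "MKGLV", "MRGLI"]
for line in distance_matrix(rows):
    print(line)
# [0.0, 0.2, 0.6]
# [0.2, 0.0, 0.4]
# [0.6, 0.4, 0.0]
```

A matrix like this is the usual input to distance-based tree-building methods.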

Use of alignments: predict features of the aligned objects
– conserved positions indicate structurally/functionally important residues (figure: conserved positions)
– patterns of hydrophobicity/hydrophilicity indicate secondary structure elements (figure: helix pattern)
– "gappy" regions indicate loops/variable regions (figure: a possible loop)
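The three kinds of prediction above all start from simple per-column statistics. A minimal sketch, assuming a plain hydrophobic residue set (AVLIMFWC) and hypothetical cutoffs of 0.8 for "conserved" and 0.5 for "gappy"; real tools use more refined conservation measures, and `column_stats` is an illustrative name.

```python
from collections import Counter

# Assumption: a simple, commonly used set of hydrophobic residues.
HYDROPHOBIC = set("AVLIMFWC")


def column_stats(rows):
    """Per-column (conservation, gap fraction, hydrophobic fraction).

    conservation = share of rows carrying the column's most common residue
    (gaps count against it); hydrophobic fraction is taken over residues only.
    """
    stats = []
    for col in zip(*rows):
        residues = [c for c in col if c != "-"]
        gap_fraction = 1 - len(residues) / len(col)
        if residues:
            most_common = Counter(residues).most_common(1)[0][1]
            conservation = most_common / len(col)
            hydrophobic = sum(r in HYDROPHOBIC for r in residues) / len(residues)
        else:
            conservation = hydrophobic = 0.0
        stats.append((conservation, gap_fraction, hydrophobic))
    return stats


rows = ["GKLVAW", "GKIV-W", "GRLAG-", "GKLV--"]
stats = column_stats(rows)
conserved = [i for i, (c, _, _) in enumerate(stats) if c >= 0.8]  # hypothetical cutoff
gappy = [i for i, (_, g, _) in enumerate(stats) if g >= 0.5]      # hypothetical cutoff
print(conserved, gappy)  # [0] [4, 5]
```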

Use of alignments: making patterns/profiles. From an alignment one can make a profile or a pattern that is matched against a sequence database to identify new family members; profiles/patterns can thus be used to predict the family membership of new sequences. Databases of profiles/patterns include PROSITE, Pfam, PRINTS, ...
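As an illustration of how a profile can be derived from an alignment and matched against a new sequence, here is a minimal log-odds profile sketch. It assumes a gap-free alignment block, a uniform background frequency of 1/20 per amino acid, and a pseudocount of 1; `build_profile` and `best_hit` are hypothetical helpers, and this is much simpler than the profiles or HMMs stored in databases such as Pfam.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(block, pseudocount=1.0):
    """Log-odds profile (one dict per column) from a gap-free alignment block.

    Assumes a uniform background frequency of 1/20 for every amino acid and
    adds `pseudocount` to every count so unseen residues are not impossible.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in zip(*block):
        counts = Counter(col)
        total = len(col) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def best_hit(profile, sequence):
    """Slide the profile along `sequence` (standard amino acids only).

    Returns (best log-odds score, start position of that window).
    """
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[aa] for column, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


profile = build_profile(["WKLV", "WRLV", "WKIV"])
score, start = best_hit(profile, "AAWKLVAA")
print(start)  # 2
```

A positive score means the window looks more like the family columns than like random background; a database search would apply a score threshold to decide family membership.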

PROSITE: motifs for classification. A protein sequence is compared with PROSITE pattern 1, pattern 2, ..., pattern n, which characterize Family 1, Family 2, ..., Family n. A motif can be described by a pattern (a regular expression) or by a profile.

Pattern derived from an alignment (PROSITE syntax): [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
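PROSITE patterns map directly onto regular expressions: x is any residue, [..] lists the allowed residues, {..} lists forbidden ones, and x(5,6) means five or six arbitrary residues. A minimal Python sketch of the translation, covering only those elements plus the < and > terminus anchors; `prosite_to_regex` is a hypothetical helper, rarer syntax (such as '>' inside brackets) is not supported, and the test sequence is made up.

```python
import re

# One pattern element: x, [ABC], {ABC} or a single residue, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                    # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


pattern = "[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]"
regex = prosite_to_regex(pattern)
print(regex)  # [FYL].[LIVMC][KR]W.[GDNR][FYWLE].{5,6}[ST]W[ES][PSTDN].{3}[LIVMC]
match = re.search(regex, "MMFALKWAGFAAAAASWEPAAALGG")
print(match.start())  # 2
```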

Alignment problem: given a set of sequences, produce a multiple alignment that corresponds as closely as possible to the biological relationships between the corresponding biomolecules.

For homologous proteins, two residues should be aligned (placed on top of each other):
– if they are homologous (evolved from the same residue in a common ancestral protein)
– if they are structurally equivalent

Automatic approach: we need a way of scoring alignments, i.e. a fitness function that quantifies the "goodness" of an alignment, and we need an algorithm for finding alignments with good scores. Note that not all methods provide a scoring function for the final alignment!
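One widely used fitness function is the sum-of-pairs (SP) score: for every column, add up a pairwise score over all pairs of sequences. A minimal sketch, assuming a hypothetical scoring scheme of +1 for a match, -1 for a mismatch, -2 for a residue against a gap, and 0 for a gap against a gap; real applications would use a substitution matrix and affine gap costs.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pairwise scoring scheme for two characters in one column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally scored 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(rows):
    """Sum-of-pairs fitness: add pair scores over all sequence pairs in every column."""
    return sum(pair_score(a, b)
               for col in zip(*rows)
               for a, b in combinations(col, 2))


print(sum_of_pairs(["AC-T", "AGCT", "A-CT"]))  # -2
```

Columns 1 and 4 contribute +3 each, while the two gapped middle columns contribute -5 and -3.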

Analysis of the fitness function: one can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences, for example when the structures of (some of) the proteins are known.
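Such a test needs a way to compare a computed alignment with a trusted reference alignment (for instance, one derived from known structures). A minimal sketch, assuming both alignments contain the same sequences in the same order: it measures the fraction of residue pairs aligned in the reference that the computed alignment also aligns. `aligned_pairs` and `pair_accuracy` are hypothetical helpers.

```python
from itertools import combinations


def aligned_pairs(rows):
    """Set of residue pairs ((seq i, residue p), (seq j, residue q)) sharing a column."""
    positions = [0] * len(rows)  # index of the next residue in each ungapped sequence
    pairs = set()
    for col in zip(*rows):
        residues = []
        for i, c in enumerate(col):
            if c != "-":
                residues.append((i, positions[i]))
                positions[i] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def pair_accuracy(test_rows, reference_rows):
    """Fraction of the reference's aligned residue pairs reproduced by the test alignment."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        raise ValueError("the reference alignment aligns no residue pairs")
    return len(reference & aligned_pairs(test_rows)) / len(reference)


# Same two sequences (ACT and AGT), aligned in two different ways.
print(pair_accuracy(["ACT-", "A-GT"], ["AC-T", "A-GT"]))  # 0.5
```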

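Align by use of dynamic programming: Dynamic programming finds the best alignment of k sequences under a given scoring scheme. For two sequences there are three different column types; for three sequences there are seven (in general 2^k - 1, since a column may hold any combination of residues and gaps except all gaps). With x denoting an amino acid and - a gap, the seven column types for three sequences are:

Sequence1  x - x x - - x
Sequence2  x x - x - x -
Sequence3  x x x - x - x

The time complexity is O(n^k) for k sequences of length n (more precisely O(2^k n^k), because each of the roughly n^k table cells is computed from 2^k - 1 predecessor cells).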
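The column-type counts and the table size can be checked directly. This sketch enumerates column types as combinations of residue ('x') and gap ('-') entries, excluding the all-gap column, and computes the number of cells of the dynamic programming table for one example length; `column_types` is a hypothetical helper, and the (n + 1)^k cell count assumes one cell per combination of prefix lengths.

```python
from itertools import product


def column_types(k):
    """All column types for k sequences: 'x' = residue, '-' = gap, all-gap excluded."""
    return ["".join(t) for t in product("x-", repeat=k) if "x" in t]


for k in (2, 3, 4):
    print(k, len(column_types(k)))  # 3, 7 and 15 types, i.e. 2**k - 1

# Table size for exact DP with n = 100 and k = 3: (n + 1)**k cells
print((100 + 1) ** 3)  # 1030301
```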


Algorithm for dynamic programming
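A minimal sketch of such a dynamic programming algorithm for k = 3, assuming a sum-of-pairs score with hypothetical values (+1 match, -1 mismatch, -2 residue against gap, 0 gap against gap). It is one possible formulation, not necessarily the one on the original slide; `align3` and `pair_score` are illustrative names. Each cell D[i][j][k] holds the best score for aligning the prefixes of lengths i, j and k, computed from the seven predecessor cells that correspond to the seven column types.

```python
from itertools import combinations, product


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pairwise scoring scheme (gap-gap pairs score 0)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def align3(s1, s2, s3):
    """Exact sum-of-pairs alignment of three sequences by dynamic programming."""
    seqs = (s1, s2, s3)
    n1, n2, n3 = map(len, seqs)
    # A move says which sequences contribute a residue to the new column (7 column types).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    D = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    back = {}
    D[0][0][0] = 0
    # Lexicographic order guarantees every predecessor cell is filled first.
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for m in moves:
                    prev = (i - m[0], j - m[1], k - m[2])
                    if min(prev) < 0:
                        continue
                    col = [s[p] if step else "-" for s, p, step in zip(seqs, prev, m)]
                    score = D[prev[0]][prev[1]][prev[2]] + sum(
                        pair_score(a, b) for a, b in combinations(col, 2))
                    if score > D[i][j][k]:
                        D[i][j][k] = score
                        back[(i, j, k)] = m
    # Trace back from the full-length cell, prepending one column per step.
    rows = ["", "", ""]
    cell = (n1, n2, n3)
    while cell != (0, 0, 0):
        m = back[cell]
        cell = (cell[0] - m[0], cell[1] - m[1], cell[2] - m[2])
        for r, (s, p, step) in enumerate(zip(seqs, cell, m)):
            rows[r] = (s[p] if step else "-") + rows[r]
    return D[n1][n2][n3], rows


print(align3("ACT", "AT", "ACT"))  # (3, ['ACT', 'A-T', 'ACT'])
```

The triple loop over cells times the seven moves makes the O(2^k n^k) cost visible, which is why exact multiple alignment is only practical for a few short sequences.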