Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC.

Slides:



Advertisements
Similar presentations
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Advertisements

Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Sequence allignement 1 Chitta Baral. Sequences and Sequence allignment Two main kind of sequences –Sequence of base pairs in DNA molecules (A+T+C+G)*
Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.
Profiles for Sequences
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
Structural bioinformatics
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Similarity Searching Class 4 March 2010.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT. 2 HMM Architecture Markov Chains What is a Hidden Markov Model(HMM)? Components of HMM Problems of HMMs.
1-month Practical Course Genome Analysis (Integrative Bioinformatics & Genomics) Lecture 3: Pair-wise alignment Centre for Integrative Bioinformatics VU.
Bioinformatics and Phylogenetic Analysis
What you should know by now Concepts: Pairwise alignment Global, semi-global and local alignment Dynamic programming Sequence similarity (Sum-of-Pairs)
. Protein Structure Prediction [Based on Structural Bioinformatics, section VII]
Tutorial 2: Some problems in bioinformatics 1. Alignment pairs of sequences Database searching for sequences Multiple sequence alignment Protein classification.
Multiple sequence alignment
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Protein Structure and Function Prediction. Predicting 3D Structure –Comparative modeling (homology) –Fold recognition (threading) Outstanding difficult.
Project No. 7 Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource Dahai Gai Samuel Kalet Hongbo.
Multiple Sequence Alignments
Signaling Pathways and Summary June 30, 2005 Signaling lecture Course summary Tomorrow Next Week Friday, 7/8/05 Morning presentation of writing assignments.
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Protein Structures.
Chapter 5 Multiple Sequence Alignment.
Developing Pairwise Sequence Alignment Algorithms
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Pairwise & Multiple sequence alignments
Cédric Notredame (30/08/2015) Chemoinformatics And Bioinformatics Cédric Notredame Molecular Biology Bioinformatics Chemoinformatics Chemistry.
Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due.
Tertiary Structure Prediction Methods Any given protein sequence Structure selection Compare sequence with proteins have solved structure Homology Modeling.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department.
COMPARATIVE or HOMOLOGY MODELING
Protein Sequence Alignment and Database Searching.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Good solutions are advantageous Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Sequence Based Analysis Tutorial NIH Proteomics Workshop Lai-Su Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center.
Multiple sequence alignments Introduction to Bioinformatics Jacques van Helden Aix-Marseille Université (AMU), France Lab.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Sequence Based Analysis Tutorial
A B C D E F G H I J K FigS1. Supplemental Figure S1. Evolutionary relationships of Arabidopsis and tomato Aux/IAA proteins. The evolutionary history was.
Exercises Pairwise alignment Homology search (BLAST) Multiple alignment (CLUSTAL W) Iterative Profile Search: Profile Search –Pfam –Prosite –PSI-BLAST.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Burkhard Morgenstern Institut für Mikrobiologie und Genetik Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen WS 2006/2007.
Motif Search and RNA Structure Prediction Lesson 9.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Protein Sequence Alignment Multiple Sequence Alignment
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Genome Revolution: COMPSCI 004G 8.1 BLAST l What is BLAST? What is it good for?  Basic.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Predicting Active Site Residue Annotations in the Pfam Database
Sequence Based Analysis Tutorial
Dr Tan Tin Wee Director Bioinformatics Centre
Sequence Based Analysis Tutorial
Protein Structures.
Homology Modeling.
Protein structure prediction.
Presentation transcript:

Bioinformatics – NSF Summer School 2003 Z. Luthey-Schulten, UIUC

Sequence-Sequence Alignment Smith-Watermann Needleman-Wunsch Sequence-Structure Alignment Threading Hidden Markov

Sequence Alignment & Dynamic Programming Seq. 1: a 1 a 2 a a 4 a 5 …a n Seq. 2: c 1 - c 2 c 3 c 4 c 5 - …c m number of possible alignments: Smith-Waterman alignment algorithm Score Matrix H: Traceback AWGHE AW--HE

AWGHE AW--HE Smith-Waterman Local Alignment Score Matrix

Blosum 40 Substitution Matrix

Protein Structural Relationships Can protein structural relationships help us to understand evolutionary dynamics? Is there a connection between evolutionary events and changes in protein structure? What is the effect of gene duplication, horizontal gene transfer, and other evolutionary mechanisms on protein shape? SubstitutionIndelDomain Insertion O’Donoghue and Luthey-Schulten, UIUC 2003

Sequence Alignment & Dynamic Programming Seq. 1: a 1 a 2 a a 4 a 5 …a n Seq. 2: c 1 - c 2 c 3 c 4 c 5 - …c m number of possible alignments: Needleman-Wunsch alignment algorithm Score Matrix H: Traceback ??? Tutorial: W=d

Needleman-Wunsch Global Alignment Similarity Values Initialization of Gap Penalties

Filling out the Score Matrix H

Traceback and Alignment The Alignment Traceback (blue) from optimal score

Energy Landscape Theory of Structure Prediction

Protein Structure Prediction SISSIRVKSKRIQLG…. 1-D protein sequence3-D protein structure Sequence Alignment Ab Initio protein folding SISSRVKSKRIQLGLNQAELAQKV------GTTQ… QFANEFKVRRIKLGYTQTNVGEALAAVHGS… Target protein of unknown structure Homologous/analogous protein of known structure Sequence Alignment: the Energy Function ? E= E match + E gap E gap = E match =

A1A1 A3A3 A2A2 A4A4 A 5 … “Scaffold” structure Target sequence threading alignment between target and scaffold Threading: Sequence-Structure Alignment Threading Energy Function R. Goldstein, Z. Luthey-Schulten, P. Wolynes (1992, PNAS)

Gap Penalties Sequence-Structure Gap Energy Distribution of Gaps InsertionDeletion target scaffold Bulge R. Goldstein, Z. Luthey-Schulten, P. Wolynes (1994) Proc 27 th Annu Hawaii Int Conf Sys Sci:306.

Similarity Measures Sequence Identity fraction of identically matched residues Q “Structural Identity” fraction of native contacts

EsEs 22

Homology Modeling - Threading

Results from CASP5 CM/FR The prediction is never better than the scaffold. Threading Energy function requires improvement.

You are now entering the twilight zone of sequence identity. We need profiles! Watch for Bioinformants!!!

Profiles – Evolution Revisited “What molecular sequences taught us in the 1960’s was that the genealogical history of an organism is written to one extent or another into the sequences of each of its genes, an insight that became the central tenet of a new discipline, molecular evolution” Woese (PNAS, 2000) Pauling (1965)

Universal Tree The Universal Phylogenetic Tree inferred from comparative analyses of rRNA sequences: Woese(PNAS, 1990)

Horizontal Gene Transfer O’Donoghue and Luthey-Schulten, UIUC 2003

Multiple Sequence Alignments “The aminoacyl-tRNA synthetases, perhaps better than any other molecules in the cell, eptiomize the current situation and help to under standard (the effects) of HGT” Woese (PNAS, 2000; MMBR 2000)

Standard Dogma Molecular Biology DNA RNA Proteins Role of AARS? Charging of t-RNA

NCBI 3D

LeuRS Canonical Tree Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March

D,N Sequence Phylogenetic Trees Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March

Fold Motifs of AARSs O’Donoghue and Luthey-Schulten, UIUC 2003

Structure Conserved More than Sequence Structural Overlap of Class II AARS Conserved helicesConserved sheets

Subset of Class II Structural Tree O’Donoghue and Luthey-Schulten, UIUC 2003

Novel Evolutionary Connections from Sequence and Structure Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March Canonical Pattern D E F L W Y BAE Canonical Pattern + A B I H P M Basal Canonical V T A R B A Gemini K 1 K 2 C S G N Q No canonical pattern Horizontal transfer after B-AE split. O’Donoghue, Luthey-Schulten, UIUC 2003

l, gap length (residues)r ij, spatial gap distance (Å) Gap Distribution Functions Spatial Gap Distribution FuncitonLength Gap Distribution Function B. Qian & R. Goldstein. (2001) Proteins 45:102.

Structural Alignment Methods PDB - Structural Neighbors – CE (Bourne) Stamp - Russell

Multiple Structural Alignments STAMP 1.Initial Alignment Multiple Sequence alignment Ridged Body “Scan” 2.Refine Initial Alignment & Produce Multiple Structural Alignment R. Russell, G. Barton (1992) Proteins 14: 309. Dynamic Programming (Smith-Waterman) through P matrix gives optimal set of equivalent residues. This set is used to re-superpose the two chains. Then iterate until alignment score is unchanged. This procedure is performed for all pairs.

Multiple Structural Alignments STAMP – cont’d 2.Refine Initial Alignment & Produce Multiple Structural Alignment Alignment score: Multiple Alignment: Create a dendrogram using the alignment score. Successively align groups of proteins (from branch tips to root). When 2 or more sequences are in a group, then average coordinates are used.

Stamp Output/Secondary Structure O’Donoghue and Luthey-Schulten, UIUC 2003

Stamp Output/Clustal Format O’Donoghue and Luthey-Schulten, UIUC 2003

Examples of Useful Web Tools Genomes – Sequence and Gene Information Domain Architecture Multiple Sequence Alignments Phylogenetic Trees Structural Databases Hidden Markov Methods

NCBI: Genomes

Charging the tRNA Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll (Yale) Micro. Mol. Biol. Rev. March

NCBI 3D

Report from SWISS-PROT

PFAM Report

Sequence Dendrogram from Clustal Luthey-Schulten, UIUC 2003

Phylogenetic Tree in Tutorial Pogorelov and Luthey-Schulten, UIUC 2003

Alignment in MOE

Transmembrane Proteins - HMM Example Bacteriorhodpsin – Anurag Sethi UIUC

Stamp Profile Sethi and Luthey-Schulten, UIUC 2003

HMMer Profile-Profile Alignment Sethi and Luthey-Schulten, UIUC 2003

Clustal Profile-Profile Alignment Sethi and Luthey-Schulten, 2003

Structure Prediction Modeller 6.2/Hmmer Sethi and Luthey-Schulten, UIUC 2003 Modeller 6.2 A. Sali, et al.

Acknowledgements Felix Autenrieth Barry Isralewitz Patrick O’Donoghue Taras Pogorelov Anurag Sethi