Structural alignment

Protein structure
Every protein is defined by a unique sequence (primary structure) that folds into a unique shape (tertiary, or three-dimensional, structure). However, proteins with similar sequences adopt very similar structures.
[Figure: cyclophilin from B. malayi and cyclophilin A from H. sapiens]

Why structural alignment?
We already have sequence alignment (e.g. Clustal):
KTHLCV
KSHA-V
It gives us an idea of the correspondence between the amino acids of two (or more) proteins. That enables us to infer information about the function and evolution of the protein, but only if the sequences are similar enough.
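How similar the two aligned sequences above are can be put into a number. The sketch below is one simple way to do it, assuming identity is counted only over columns where neither sequence has a gap; other conventions (for example dividing by the full alignment length) give different values. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Assumption: gap columns are excluded from the denominator.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only the columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# The aligned pair from the slide: K, H and V match in 5 gap-free columns.
print(percent_identity("KTHLCV", "KSHA-V"))  # 60.0
```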

What is the twilight zone?
Sequence alignment unambiguously distinguishes between protein pairs of similar and non-similar structure only when the pairwise sequence identity is high. High sequence identity roughly means over 40 %. The signal gets blurred in the twilight zone of [range missing] % sequence identity.

More of the twilight zone
- More than 90 % of sequence pairs with a sequence identity lower than 25 % have different structures.
- The significance of a sequence alignment is length dependent: the longer the alignment, the lower the identity required for it to be called significant. Nevertheless, the threshold converges to 25 % for alignments longer than 80 amino acids.
- The 'more similar than identical' rule can reduce the number of false positives.
- Using intermediate sequences to find links between more distant families can also reduce the number of false positives.

How far can the sequence identity drop?
- Average sequence identity of random alignments: [value missing] %
- Average sequence identity of remote homologues: 8.5 %

How does it work?
[Figure; source attribution truncated]

Numbers
- Given an average protein length of 300 amino acids, there are 20^300 possible ways (20 amino acids at each of 300 positions) of building the average protein, more than there are atoms in the universe.
- In reality, just a few hundred thousand sequences are known.
- It is believed that the number of basic protein folds is between [range missing].
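The size of sequence space can be checked with a few lines of arithmetic. This is a rough sketch under two assumptions that are not stated on the slide: 20 standard amino acids per position, and the commonly quoted order of magnitude of 10^80 atoms in the observable universe.

```python
AMINO_ACIDS = 20             # assumption: the 20 standard amino acids
LENGTH = 300                 # average protein length used on the slide
ATOMS_IN_UNIVERSE = 10**80   # assumption: commonly quoted order of magnitude

# Python integers have arbitrary precision, so the count is exact.
sequences = AMINO_ACIDS ** LENGTH
digits = len(str(sequences))

print(digits)                          # 391, i.e. roughly 2 x 10^390
print(sequences > ATOMS_IN_UNIVERSE)   # True
```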

Structural alignment because:
- Structures are better conserved than sequences.
- Structural alignment can imply a functional similarity that is not detectable from a sequence alignment.
- It might help to improve a sequence alignment when structures are available (phylogenetic studies, homology modeling).
- It will improve sequence alignment methods (substitution matrices and gap penalties derived from structural alignments).
- It will improve sequence prediction methods.

Sequence versus structural alignment
[Figure: the same two sequences aligned by sequence and by structure]
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS

Material

Is it difficult to make a structural alignment?
Yes, it is. Structural alignment is an NP-hard problem, that is, at least as hard as any problem solvable in nondeterministic polynomial time. In other words, no efficient exact algorithm is known for it. Even if one existed, the result would be correct from a technical point of view, but not necessarily from a biological point of view.

General solution
Use a heuristic approach:
1. Represent the proteins A and B in some coordinate-independent space.
2. Compare A and B.
3. Optimize the alignment between A and B (e.g. minimize the r.m.s.d.).
4. Measure the statistical significance of the alignment against some random set of structure comparisons.
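Steps 3 and 4 rest on two simple quantities, sketched below. This is not the method of any particular server: `rmsd` assumes the residue equivalences are already known and the two coordinate sets are already superposed (no optimal rotation is computed here), and `z_score` assumes the background scores come from a set of random structure comparisons. Both function names are hypothetical.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between equivalent, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def z_score(score, background):
    """How many sample standard deviations a score lies above the background mean."""
    mean = statistics.mean(background)
    spread = statistics.stdev(background)
    return (score - mean) / spread


# Two toy C-alpha traces that differ by a 1 A shift along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))                 # 1.0

# A similarity score of 10 against made-up background scores 2, 4, 6.
print(z_score(10, [2, 4, 6]))     # 3.0
```

A lower r.m.s.d. means a closer fit, whereas the z-score is meant for a similarity score where higher is better.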

“...in some coordinate-independent space...”
Make the problem easier by:
- comparing only distance matrices of atoms
- comparing secondary structure elements (SSEs)
- comparing cartoons
- comparing vectors of SSEs
- combining the methods above
- ...
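Why a distance matrix counts as a coordinate-independent representation can be shown on a toy example: the matrix of pairwise atom distances does not change when the whole structure is rotated and translated. The coordinates and helper names below are made up for illustration.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between the given atoms."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def rotate_z_and_shift(coords, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


def matrix_difference(m1, m2):
    """Largest absolute difference between corresponding entries."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Four made-up C-alpha positions and a rigidly moved copy of them.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
moved = rotate_z_and_shift(ca, math.radians(40), (10.0, -5.0, 2.5))

diff = matrix_difference(distance_matrix(ca), distance_matrix(moved))
print(diff < 1e-9)  # True: the representation ignores the coordinate frame
```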

None of the methods is guaranteed to find the closest structure, and two methods can disagree at every amino acid position. Nevertheless, they can still provide valuable insight into the history of a protein and give hints about its function.

Methods for fold comparison
Server    Location                            Method                                         Ref.
CE        -                                   Extension of optimal path                      1
DALI      -                                   Distance-matrix alignment                      2
DEJAVU    -                                   SSE alignment with Cα atom optimisation        3
LOCK      -                                   Absolute orientation of corresponding points   4
MATRAS    -                                   Markov transition model of evolution           5
PRIDE     -                                   Cα-Cα atom distances                           6
SSM       -                                   Graph matching algorithm                       -
TOP       -                                   SSE alignment                                  7
TOPS      tops.ebi.ac.uk/tops/compare1.html   TOPS-diagram alignment                         8
TOPSCAN   pscan                               Secondary topology-string alignment            9
VAST      tsearch.html                        Vector alignment                               10

Protein structure classification
If you want to know which structures are similar to a known structure, these systems might help:
A) Manual - SCOP
B) Semi-automatic - CATH
C) Automatic - FSSP

CATH
C (class) - secondary structure composition
A (architecture) - overall shape, orientation of secondary structure elements
T (topology) - overall shape, orientation of secondary structure elements + connectivity
H (homologous superfamily) - any one of:
  - sequence identity >= 35 %, 60 % of the larger structure equivalent to the smaller
  - SSAP score >= 80.0 and sequence identity >= 20 %, 60 % of the larger structure equivalent to the smaller
  - SSAP score >= 80.0, 60 % of the larger structure equivalent to the smaller, and domains which have related functions
S (sequence families) - clustering based on the sequence identity level
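The three alternative criteria for the H level can be written as a single decision function. This is only a sketch of the thresholds as listed on the slide, not CATH's actual implementation; the function and parameter names are hypothetical, and `overlap` stands for the percentage of the larger structure that is equivalent to the smaller.

```python
def same_homologous_superfamily(seq_identity, overlap, ssap_score=None,
                                related_function=False):
    """Apply the slide's H-level thresholds to a pair of domains.

    seq_identity and overlap are percentages; ssap_score may be None
    when no structural comparison score is available.
    """
    # Every criterion requires 60 % of the larger structure to be equivalent.
    if overlap < 60.0:
        return False
    # Criterion 1: high sequence identity alone is enough.
    if seq_identity >= 35.0:
        return True
    # Criteria 2 and 3: strong structural similarity plus either
    # moderate sequence identity or related functions.
    if ssap_score is not None and ssap_score >= 80.0:
        return seq_identity >= 20.0 or related_function
    return False


print(same_homologous_superfamily(40.0, 70.0))                               # True
print(same_homologous_superfamily(25.0, 70.0, ssap_score=85.0))              # True
print(same_homologous_superfamily(10.0, 70.0, ssap_score=85.0))              # False
print(same_homologous_superfamily(10.0, 70.0, 85.0, related_function=True))  # True
```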

Summary
- Structural alignment can help with protein annotation even when the sequence similarity is not significant.
- The sequence identity of two proteins with similar structures can be lower than 10 %; the number of folds is limited.
- Recent progress in protein structure determination increases the usefulness of structural alignment.
- Structural alignment is a difficult problem that is solved by heuristic methods. These methods simplify the problem by moving from 3D space to 2D space, sacrificing the optimal result for speed.

Summary II
- Different methods can provide completely different alignments.
- In our results, CE, Dali, Matras and Vast were the best servers for finding structural relatives.
- A few structural classification systems have been developed (CATH, FSSP, SCOP). They provide a hierarchical classification of protein structures and make it possible to infer functional and evolutionary relationships between proteins.