Lecture 3.3: Superposition & Threading. Gary Van Domselaar, University of Alberta. Slides adapted from David Wishart.


Outline. Vectors, matrices and other geometry issues. General superposition concepts. Threading and threading methods.

Vectors Define Bonds and Atomic Positions. [Figure: x, y, z axes with the origin marked and a CO bond drawn as a vector.]

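Review - Vectors. The vector u runs from the origin (0,0,0) to the point (1,2,1): $\mathbf{u} = 1\hat{\imath} + 2\hat{\jmath} + 1\hat{k}$. Its length is $|\mathbf{u}| = \sqrt{(1-0)^2 + (2-0)^2 + (1-0)^2} = \sqrt{6}$. Vectors have a length and a direction.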
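
The length calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `vector_between` and `length` are illustrative and not part of the slides.

```python
import math

def vector_between(start, end):
    """Return the vector pointing from start to end (component-wise difference)."""
    return tuple(e - s for s, e in zip(start, end))

def length(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

# The slide's example: from the origin to the point (1, 2, 1).
u = vector_between((0, 0, 0), (1, 2, 1))
print(u, length(u))  # length is sqrt(6), roughly 2.449
```

The same two helpers apply to a bond: pass the coordinates of the two bonded atoms and the result is the bond vector and the bond length.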

Review - Vectors. Vectors can be added together. Vectors can be subtracted. Vectors can be multiplied (dot or cross or by a matrix). Vectors can be transformed (resized). Vectors can be translated. Vectors can be rotated.

Matrices. A matrix is a table or “array” of characters. A matrix is also called a tensor of “rank 2”. [Figure: a 5 x 6 matrix with one row and one column highlighted and the row and column counts labelled.]

Different Types of Matrices. [Figure: a square matrix, a symmetric matrix, and a column matrix (a vector).]

Different Types of Matrices. [Figure: a rectangular matrix with entries A through X, a rotation matrix with cosine and sine entries, and a row matrix (a vector).]

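Review - Matrix Multiplication. Each entry of the product is a row of the first matrix multiplied term by term with a column of the second:
$$\begin{pmatrix} 2 & 4 & 0 \\ 1 & 3 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2{\cdot}1 + 4{\cdot}2 + 0{\cdot}0 & 2{\cdot}0 + 4{\cdot}1 + 0{\cdot}1 & 2{\cdot}2 + 4{\cdot}3 + 0{\cdot}0 \\ 1{\cdot}1 + 3{\cdot}2 + 1{\cdot}0 & 1{\cdot}0 + 3{\cdot}1 + 1{\cdot}1 & 1{\cdot}2 + 3{\cdot}3 + 1{\cdot}0 \\ 1{\cdot}1 + 0{\cdot}2 + 0{\cdot}0 & 1{\cdot}0 + 0{\cdot}1 + 0{\cdot}1 & 1{\cdot}2 + 0{\cdot}3 + 0{\cdot}0 \end{pmatrix} = \begin{pmatrix} 10 & 4 & 16 \\ 7 & 4 & 11 \\ 1 & 0 & 2 \end{pmatrix}$$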
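
The row-times-column rule can be written out directly. The sketch below assumes the two 3 x 3 matrices implied by the expanded terms on the slide, rows (2 4 0), (1 3 1), (1 0 0) and rows (1 0 2), (2 1 3), (0 1 0); the function name `matmul` is illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    # Entry (i, j) is row i of a multiplied term by term with column j of b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 4, 0], [1, 3, 1], [1, 0, 0]]
B = [[1, 0, 2], [2, 1, 3], [0, 1, 0]]
C = matmul(A, B)
for row in C:
    print(row)
# [10, 4, 16]
# [7, 4, 11]
# [1, 0, 2]
```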

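Rotation. Rotate about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$. Rotate about z: $\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$. [Figure: x, y, z axes with the two rotation angles marked.]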

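Rotation. Clockwise about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$. Clockwise about z: $\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$. Counterclockwise about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}$. Counterclockwise about z: $\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$.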

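Rotation. A point is rotated by multiplying its coordinate vector by the rotation matrix. About x: $\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$. About z: $\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.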

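Rotation (Detail). Multiplying out the clockwise rotation about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y\cos\theta + z\sin\theta \\ -y\sin\theta + z\cos\theta \end{pmatrix}$. Multiplying out the clockwise rotation about z: $\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x\cos\theta + y\sin\theta \\ -x\sin\theta + y\cos\theta \\ z \end{pmatrix}$.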
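
The rotation matrices can be tried out numerically. This is a small sketch, assuming column vectors multiplied on the left by the matrix and the clockwise sign convention shown on the slides (positive sine in the upper-right position); the function names are illustrative. The counterclockwise matrix is obtained as the transpose of the clockwise one, which is also its inverse.

```python
import math

def rot_x_clockwise(theta):
    """Clockwise rotation about the x axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_z_clockwise(theta):
    """Clockwise rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def transpose(m):
    """Transpose; for a rotation matrix this gives the opposite rotation."""
    return [list(row) for row in zip(*m)]

def apply(m, v):
    """Multiply a 3 x 3 matrix by a column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

quarter_turn = math.pi / 2
p = apply(rot_z_clockwise(quarter_turn), [1.0, 0.0, 0.0])
print([round(c, 6) for c in p])   # the x unit vector ends up along -y

# Undo the rotation with the counterclockwise (transposed) matrix.
back = apply(transpose(rot_z_clockwise(quarter_turn)), p)
print([round(c, 6) for c in back])
```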

Superposition. The objective is to match or overlay 2 or more similar objects. This requires the use of translation and rotation operators (matrices/vectors). Recall that every three-dimensional object can be represented by a plane defined by 3 points.

Superposition. Identify 3 “equivalence” points in the objects to be aligned: a, b, c in one object and a’, b’, c’ in the other. [Figure: the two objects in their own x, y, z frames.]

Superposition. Translate points a,b,c and a’,b’,c’ to the origin.

Superposition. Rotate the a,b,c plane clockwise about the x axis (first rotation angle).

Superposition. Rotate the a,b,c plane clockwise about the z axis (second rotation angle).

Superposition. Rotate the a,b,c plane clockwise about the x axis (third rotation angle).

Superposition. Rotate the a’,b’,c’ plane anticlockwise about the x axis (first primed rotation angle).

Superposition. Rotate the a’,b’,c’ plane anticlockwise about the z axis (second primed rotation angle).

Superposition. Rotate the a’,b’,c’ plane clockwise about the x axis (third primed rotation angle).

Superposition. Apply all rotations and translations to the remaining points.

Superposition. [Figure: the two objects before and after superposition.]

Returning to the “red” frame. [Figure: the superimposed objects before and after being returned to the red object's original frame.]

Returning to the “red” frame. Begin with the superimposed structures on the x-y plane. Apply the three counterclockwise rotations, one for each rotation angle. Apply the red translation to the red origin. Just do things in reverse order!
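
The whole procedure (move each object into a common frame, then return to the target's original frame by undoing its transform) can be sketched in code. Note the swapped-in technique: instead of three successive axis rotations, this sketch builds each rotation in one step from an orthonormal frame defined by the three equivalence points, which places a at the origin, b on the x axis and c in the x-y plane. All function names are hypothetical, and the three points are assumed not to be collinear.

```python
import math

def sub(u, v):
    return [u[i] - v[i] for i in range(3)]

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def unit(u):
    n = math.sqrt(dot(u, u))  # zero for degenerate (collinear) input
    return [x / n for x in u]

def frame(a, b, c):
    """Rows are the axes of a frame with a at the origin, b on +x, c in the x-y plane."""
    ex = unit(sub(b, a))
    ez = unit(cross(ex, sub(c, a)))
    ey = cross(ez, ex)
    return [ex, ey, ez]

def to_frame(p, origin, rot):
    """Translate to the origin, then rotate into the common frame."""
    d = sub(p, origin)
    return [dot(row, d) for row in rot]

def from_frame(q, origin, rot):
    """Reverse order: undo the rotation (transpose), then undo the translation."""
    return [sum(rot[k][i] * q[k] for k in range(3)) + origin[i] for i in range(3)]

def superpose(points, abc, abc_prime):
    """Move all points of the a,b,c object onto the frame of the a',b',c' object."""
    a, b, c = abc
    a2, b2, c2 = abc_prime
    r1, r2 = frame(a, b, c), frame(a2, b2, c2)
    return [from_frame(to_frame(p, a, r1), a2, r2) for p in points]
```

A usage example: if the second object is an exact rigid copy of the first, `superpose` maps every point of the first onto its counterpart, including points that were not among the three equivalence points.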

Superposition - Applications. Ideal for comparing or overlaying two or more protein structures. Allows identification of structural homologues (CATH and SCOP). Allows loops to be inserted or replaced from loop libraries (comparative modelling). Allows side chains to be replaced or inserted with relative ease.

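Side Chain Placement. SCWRL.

Amino Acid Side Chains. [Figure: an amino acid with its amino group (H2N), carboxyl group (COOH) and hydrogen attached to the central carbon, plus a side chain carrying an NH3+ group.]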

Adding a Side Chain. [Figure sequence over five slides: a side chain being placed, drawn on x, y, z axes.]

Superposition. The concept of superposition is key to many aspects of protein structure generation and comparison. Superposition may be used to insert side chains and loops (for homology models). Side chains require more consideration, as side chain packing ultimately determines the 3D structure of proteins.

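Superposition - RMSD. The degree of similarity between two or more structures is described by their average root mean square deviation (RMSD). For N pairs of equivalent points $x_i$ and $y_i$, $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$. [Figure: two superimposed chains with paired points x1 to x5 and y1 to y5.]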
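
RMSD over paired points is short to compute. This is a minimal sketch assuming the two structures are already superimposed and that point i in one list is equivalent to point i in the other; the function name `rmsd` is illustrative.

```python
import math

def rmsd(xs, ys):
    """Root mean square deviation between two equal-length lists of 3D points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty point lists of equal length")
    # Sum of squared distances between each pair of equivalent points.
    total = sum((a - b) ** 2 for p, q in zip(xs, ys) for a, b in zip(p, q))
    return math.sqrt(total / len(xs))

chain_x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
chain_y = [(x + 3.0, y + 4.0, z) for x, y, z in chain_x]  # every point shifted by 5 units
print(rmsd(chain_x, chain_x), rmsd(chain_x, chain_y))
```

Identical structures give an RMSD of zero, and a uniform shift of 5 units gives an RMSD of 5, which is why structures must be superimposed before the value is meaningful.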

Superposition Software. Swiss PDB Viewer – aligns 2 homologous structures.

Superposition Software. CE: Structure Comparison by Combinatorial Extension. Superposition for 2 chains and for multiple chains (new).

Superposition Software. SuperPose. Superposition for 2 chains and for multiple chains. Subdomain superposition. Superposition of structures with low sequence identity.

Definition. Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.

Why Threading? Secondary structure is more conserved than primary structure. Tertiary structure is more conserved than secondary structure. Therefore very remote relationships can be better detected through 2° or 3° structural homology instead of sequence homology.

Visualizing Threading. [Animation over five slides: the letters T H R E A D are fed into the string THREADINGSEQNCEECNQESGNIERHTHREADINGSEQNCETHREADGSEQNCEQCQESGIDAERTHR..., a few fewer free letters on each slide, ending with THREAD..SEQNCEECN..]

Threading. Database of 3D structures and sequences – Protein Data Bank (or non-redundant subset). Query sequence – sequence with < 25% identity to known structures. Alignment protocol – dynamic programming. Evaluation protocol – distance-based potential or secondary structure. Ranking protocol.

Kinds of Threading. 2D Threading or Prediction Based Methods (PBM) – predict secondary structure (SS) or ASA of query; evaluate on the basis of SS and/or ASA matches. 3D Threading or Distance Based Methods (DBM) – create a 3D model of the structure; evaluate using a distance-based “hydrophobicity” or pseudo-thermodynamic potential.

2D Threading Algorithm. Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.

Database Conversion. Each entry pairs rows of sequence with rows of secondary structure (H = helix, E = strand, C = coil):
>Protein1
THREADINGSEQNCEECNQESGNI
HHHHHHCCCCEEEEECCCHHHHHH
ERHTHREADINGSEQNCETHREAD
HHCCEEEEECCCCCHHHHHHHHHH
>Protein2
QWETRYEWQEDFSHAECNQESGNI
EEEEECCCCHHHHHHHHHHHHHHH
YTREWQHGFDSASQWETRA
CCCCEEEEECCCEEEEECC
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
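
A converted database in this layout is easy to read back into memory. The sketch below assumes one particular reading of the slide: a `>name` header followed by alternating sequence and secondary-structure rows. The function name `parse_threading_db` and the `EXAMPLE` constant are hypothetical.

```python
EXAMPLE = """\
>Protein1
THREADINGSEQNCEECNQESGNI
HHHHHHCCCCEEEEECCCHHHHHH
ERHTHREADINGSEQNCETHREAD
HHCCEEEEECCCCCHHHHHHHHHH
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
"""

def parse_threading_db(text):
    """Return {name: (sequence, secondary_structure)} from the interleaved layout."""
    blocks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            blocks[current] = []
        elif current is None:
            raise ValueError("data found before the first '>' header")
        else:
            blocks[current].append(line)
    db = {}
    for name, rows in blocks.items():
        if len(rows) % 2:
            raise ValueError(f"{name}: sequence and structure rows must come in pairs")
        seq = "".join(rows[0::2])   # rows 1, 3, ... hold residues
        ss = "".join(rows[1::2])    # rows 2, 4, ... hold H/E/C states
        if len(seq) != len(ss):
            raise ValueError(f"{name}: sequence and structure lengths differ")
        db[name] = (seq, ss)
    return db

db = parse_threading_db(EXAMPLE)
print({name: len(seq) for name, (seq, _) in db.items()})
```

Checking that each sequence and its structure string have the same length catches the most common conversion error, a dropped or duplicated row.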

Secondary Structure Table.

2° Structure Identification. DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp). VADAR - Volume Area Dihedral Angle Reporter (redpoll.pharmacy.ualberta.ca). PDB - Protein Data Bank. Example output, sequence above and secondary structure below:
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA
HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC

Accessible Surface Area. [Figure: a solvent probe on a molecule, with the accessible surface, the van der Waals surface and the reentrant surface labelled.]

ASA Calculation. DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp). VADAR - Volume Area Dihedral Angle Reporter. GetArea. Example output, sequence above and ASA state below:
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD
BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE

Other ASA sites. Connolly Molecular Surface Home Page. Naccess Home Page. ASA Parallelization. Protein Structure Database.

2D Threading Algorithm. Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.

ASA Prediction. PredictProtein-PHDacc (58%). PredAcc (70%?) – condor.urbb.jussieu.fr/PredAccCfg.html. Example, query sequence above and predicted ASA state below:
QHTAWCLTSEQHTAAVIW
BBPPBEEEEEPBPBPBPB

2D Threading Algorithm. Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.

[Figure: two comparison grids with the sequence GENETICS along one axis and GENESIS along the other.]

S_ij (Identity Matrix). [Table: a 20 x 20 scoring matrix with rows and columns labelled A C D E F G H I K L M N P Q R S T V W Y; identical residues score 1 and all other pairs score 0.]

[Figure sequence: the score matrix for AATVD against AVVD being filled in step by step using the identity scores.]

A Simple Example... [Figure: three copies of the completed score matrix for AATVD against AVVD, each with a different path traced.] The alignments shown are AATVD over A-VVD, AATVD over AVVD, and AATVD over AV-VD.
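
The matrix filling and traceback can be sketched as a small dynamic programming routine. The assumptions here are mine, not the slides': identity scoring (1 for identical residues, 0 otherwise) and no gap penalty, so the score is simply the number of identical pairs on the best path. The function name `align` and its parameters are illustrative.

```python
def align(a, b, match=1, mismatch=0, gap=0):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell takes the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring the diagonal move.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

best, row_a, row_b = align("AATVD", "AVVD")
print(best)
print(row_a)
print(row_b)
```

Several alignments share the same best score for this pair, which is why the slide shows more than one; the traceback above reports only one of them, chosen by its tie-breaking order.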

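Let’s Include 2° info & ASA. [Tables: a secondary structure scoring matrix $S_{ij}^{strc}$ over the states H, E, C and an ASA scoring matrix $S_{ij}^{asa}$ over the states E, P, B.] The total score is the weighted sum $S_{ij}^{total} = k_1 S_{ij}^{seq} + k_2 S_{ij}^{strc} + k_3 S_{ij}^{asa}$.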

[Figure sequence: the score matrix for AATVD (secondary structure EEECC) against AVVD (secondary structure EECC) being filled in step by step; an identical residue in an identical secondary structure state now scores 2.]

A Simple Example... [Figure: three copies of the completed score matrix for AATVD (EEECC) against AVVD (EECC), each with a different path traced.] The alignments shown are AATVD over A-VVD, AATVD over AVVD, and AATVD over AV-VD.
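
Putting the pieces together, a query can be scored against every database entry with the combined score and the entries ranked. This is a sketch under stated assumptions: only the sequence and secondary-structure terms are included (an ASA term would be added to `pair_score` in the same way), both weights default to 1, gaps are free, and every name below is hypothetical.

```python
def pair_score(q_res, q_ss, t_res, t_ss, k_seq=1, k_ss=1):
    """Weighted sum of a sequence identity term and a secondary structure identity term."""
    return k_seq * int(q_res == t_res) + k_ss * int(q_ss == t_ss)

def thread_score(query, template, k_seq=1, k_ss=1):
    """Best dynamic programming path score of query against template (no gap penalty)."""
    (q_seq, q_ss), (t_seq, t_ss) = query, template
    prev = [0] * (len(t_seq) + 1)
    for i in range(1, len(q_seq) + 1):
        cur = [0]
        for j in range(1, len(t_seq) + 1):
            s = pair_score(q_seq[i - 1], q_ss[i - 1], t_seq[j - 1], t_ss[j - 1],
                           k_seq, k_ss)
            cur.append(max(prev[j - 1] + s, prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rank_templates(query, database):
    """Return (score, name) pairs, best-scoring template first."""
    scored = [(thread_score(query, entry), name) for name, entry in database.items()]
    return sorted(scored, reverse=True)

query = ("AVVD", "EECC")              # residues and predicted secondary structure
database = {
    "T1": ("AATVD", "EEECC"),         # the template from the simple example
    "T2": ("GGGGG", "HHHHH"),         # an unrelated, all-helix template
}
print(rank_templates(query, database))
```

With these inputs the first pairing of A with A in a strand scores 2, matching the value that appears in the slide's matrix once secondary structure is included.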

2D Threading Performance. In test sets, 2D threading methods can identify 30-40% of proteins having very remote homologues (i.e. not detected by BLAST) using “minimal” non-redundant databases (<700 proteins). If the database is expanded ~4x, the performance jumps to 70-75%. Performs best on true homologues as opposed to postulated analogues.

2D Threading Advantages. Algorithm is easy to implement. Algorithm is very fast (10x faster than 3D threading approaches). The 2D database is small ( 1.5 Gbytes). Appears to be just as accurate as DBM or other 3D threading approaches. Very amenable to web servers.

Servers - PredictProtein

Servers - 123D

Servers - GenThreader

More Servers

2D Threading Disadvantages. Reliability is not 100%, making most threading predictions suspect unless experimental evidence can be used to support the conclusion. Does not produce a 3D model at the end of the process. Doesn’t include all aspects of 2° and 3° structure features in the prediction process. PSI-BLAST may be just as good (faster too!).

Making it Better. Include 3D threading analysis as part of the 2D threading process, which offers another layer of information. Include more information about the “coil” state (3-state prediction isn’t good enough). Include other biochemical (ligands, function, binding partners, motifs) or phylogenetic (origin, species) information.

3D Threading Servers. Generate 3D models or coordinates of possible models based on the input sequence. Loopp (version 2). 3D-PSSM. All require e-mail addresses since the process may take hours to complete.
