Lecture 3.31 Superposition & Threading † Gary Van Domselaar University of Alberta † Slides adapted from David Wishart
Lecture 3.32 Outline Vectors, matrices and other geometry issues General Superposition concepts Threading and threading methods
Lecture 3.33 Vectors Define Bonds and Atomic Positions x y z Origin CO bond
Lecture 3.34 Review - Vectors Vector u runs from (0,0,0) to (1,2,1): $\mathbf{u} = 1\hat{i} + 2\hat{j} + 1\hat{k}$, $|\mathbf{u}| = \sqrt{(1-0)^2 + (2-0)^2 + (1-0)^2} = \sqrt{6}$ Vectors have a length & a direction
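A minimal Python sketch of the length calculation on this slide. The function name `vector_length` is illustrative and does not come from the slides.

```python
import math

def vector_length(tail, head):
    """Euclidean length of the vector running from tail to head."""
    return math.sqrt(sum((h - t) ** 2 for t, h in zip(tail, head)))

# The slide's vector u runs from the origin to (1, 2, 1).
u_len = vector_length((0, 0, 0), (1, 2, 1))  # equals sqrt(6)
```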
Lecture 3.35 Review - Vectors Vectors can be added together Vectors can be subtracted Vectors can be multiplied (dot or cross or by a matrix) Vectors can be transformed (resized) Vectors can be translated Vectors can be rotated
Lecture 3.36 Matrices A matrix is a table or “array” of characters A matrix is also called a tensor of “rank 2” row column A 5 x 6 Matrix # columns # rows
Lecture 3.37 Different Types of Matrices A square Matrix A symmetric Matrix A column Matrix (A vector)
Lecture 3.38 Different Types of Matrices A rectangular Matrix (the letters A through X arranged in rows and columns) A rotation Matrix (entries are 0 and the sine and cosine of the rotation angle) A row Matrix (A vector)
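Lecture 3.39 Review - Matrix Multiplication
$$
\begin{pmatrix} 2 & 4 & 0 \\ 1 & 3 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix}
2\cdot1 + 4\cdot2 + 0\cdot0 & 2\cdot0 + 4\cdot1 + 0\cdot1 & 2\cdot2 + 4\cdot3 + 0\cdot0 \\
1\cdot1 + 3\cdot2 + 1\cdot0 & 1\cdot0 + 3\cdot1 + 1\cdot1 & 1\cdot2 + 3\cdot3 + 1\cdot0 \\
1\cdot1 + 0\cdot2 + 0\cdot0 & 1\cdot0 + 0\cdot1 + 0\cdot1 & 1\cdot2 + 0\cdot3 + 0\cdot0
\end{pmatrix}
=
\begin{pmatrix} 10 & 4 & 16 \\ 7 & 4 & 11 \\ 1 & 0 & 2 \end{pmatrix}
$$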
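The row-by-column rule on this slide can be sketched in plain Python. The matrices `A` and `B` are read off the products listed on the slide (rows 2 4 0, 1 3 1, 1 0 0 times columns 1 2 0, 0 1 1, 2 3 0); the helper name `matmul` is an assumption for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (row-by-column rule)."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 4, 0], [1, 3, 1], [1, 0, 0]]
B = [[1, 0, 2], [2, 1, 3], [0, 1, 0]]
C = matmul(A, B)  # each entry is one of the sums written out on the slide
```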
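Lecture Rotation Rotate about x: $R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$ Rotate about z: $R_z(\theta) = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$
Lecture Rotation Clockwise about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}$ Clockwise about z: $\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$ Counterclockwise about x: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}$ Counterclockwise about z: $\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$
Lecture Rotation About x: $\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ About z: $\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$
Lecture Rotation (Detail) $\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y\cos\theta + z\sin\theta \\ -y\sin\theta + z\cos\theta \end{pmatrix}$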
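A small Python sketch of applying these rotation matrices to a point. It assumes the slide convention in which the matrix with rows (cos, sin) and (-sin, cos) is the clockwise rotation, and that the matrix multiplies a column vector; the function names are illustrative.

```python
import math

def rotation_x(theta):
    """Clockwise rotation matrix about the x axis (slide convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rotation_z(theta):
    """Clockwise rotation matrix about the z axis (slide convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]

def apply(matrix, point):
    """Multiply a 3x3 matrix by a point treated as a column vector."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)

# A quarter turn about x carries the +y unit vector onto -z.
rotated = apply(rotation_x(math.pi / 2), (0.0, 1.0, 0.0))
```

Passing a negative angle gives the counterclockwise rotation, since that only flips the sign of the sine terms.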
Lecture Superposition Objective is to match or overlay 2 or more similar objects Requires use of translation and rotation operators (matrices/vectors) Recall that every three dimensional object can be represented by a plane defined by 3 points
Lecture Superposition Identify 3 “equivalence” points (a, b, c and a’, b’, c’) in objects to be aligned
Lecture Superposition Translate points a,b,c and a’,b’,c’ to origin
Lecture Superposition Rotate the a,b,c plane clockwise about the x axis
Lecture Superposition Rotate the a,b,c plane clockwise about the z axis
Lecture Superposition Rotate the a,b,c plane clockwise about the x axis
Lecture Superposition Rotate the a’,b’,c’ plane anticlockwise about the x axis
Lecture Superposition Rotate the a’,b’,c’ plane anticlockwise about the z axis
Lecture Superposition Rotate the a’,b’,c’ plane clockwise about the x axis
Lecture Superposition Apply all rotations and translations to remaining points
Lecture Superposition Before and After (figure: points a, b, c and a’, b’, c’ in the x, y, z frame)
Lecture Returning to the “red” frame Before and After (figure: superimposed points in the x, y, z frame)
Lecture Returning to the “red” frame Begin with the superimposed structures on the x-y plane. Apply the three counterclockwise rotations, each undoing one of the earlier clockwise rotations, then apply the red translation back to the red origin. Just do things in reverse order!
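A hedged Python sketch of the "reverse order" idea: a point is translated to the origin and rotated about x, z, x, then returned by applying the opposite rotations in reverse order followed by the opposite translation. The origin, angles and function names are hypothetical values chosen for illustration, and the clockwise convention is the one used on the rotation slides.

```python
import math

def rot_x(p, t):
    """Clockwise rotation about x by t radians; negative t is counterclockwise."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (x, c * y + s * z, -s * y + c * z)

def rot_z(p, t):
    """Clockwise rotation about z by t radians; negative t is counterclockwise."""
    x, y, z = p
    c, s = math.cos(t), math.sin(t)
    return (c * x + s * y, -s * x + c * y, z)

def translate(p, d):
    return tuple(pi + di for pi, di in zip(p, d))

def to_xy_frame(p, origin, angles):
    """Translate to the origin, then rotate about x, z, x in that order."""
    a1, a2, a3 = angles
    p = translate(p, tuple(-o for o in origin))
    return rot_x(rot_z(rot_x(p, a1), a2), a3)

def to_red_frame(p, origin, angles):
    """Undo to_xy_frame: opposite rotations in reverse order, then translate back."""
    a1, a2, a3 = angles
    p = rot_x(rot_z(rot_x(p, -a3), -a2), -a1)
    return translate(p, origin)

origin = (0.5, -1.0, 2.0)   # hypothetical original position of point a
angles = (0.3, 1.1, -0.7)   # hypothetical rotation angles, in radians
start = (1.0, 2.0, 3.0)
restored = to_red_frame(to_xy_frame(start, origin, angles), origin, angles)
```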
Lecture Superposition - Applications Ideal for comparing or overlaying two or more protein structures Allows identification of structural homologues (CATH and SCOP) Allows loops to be inserted or replaced from loop libraries (comparative modelling) Allows side chains to be replaced or inserted with relative ease
Lecture Side Chain Placement SCWRL
Lecture Amino Acid Side Chains (figure: chemical structures showing the H2N and COOH groups, the central C with its H, and an NH3+ side chain group)
Lecture Adding a Side Chain (figure sequence in the x, y, z frame)
Lecture Superposition The concept of superposition is key to many aspects of protein structure generation and comparison Superposition may be used to insert side chains and loops (for homology models) Side chains require more consideration as side chain packing ultimately determines the 3D structure of proteins
Lecture Superposition - RMSD The degree of similarity between two or more structures is described by their average root mean square deviation (RMSD), measured between equivalent points (x1 ... x5 in one structure, y1 ... y5 in the other)
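A minimal Python sketch of an RMSD calculation. It uses the standard definition, the square root of the mean squared distance between paired points, and assumes the two structures are already superposed and that points are listed in equivalent order; the function name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a - b) ** 2
                for pa, pb in zip(coords_a, coords_b)
                for a, b in zip(pa, pb))
    return math.sqrt(total / len(coords_a))

# Two points per structure; only the second pair differs, by 2 units along z.
value = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```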
Lecture Superposition Software Swiss PDB Viewer –Aligns 2 homologous structures
Lecture Superposition Software CE: Structure Comparison by Combinatorial Extension Superposition for 2 chains and for multiple chains (new)
Lecture Superposition Software SuperPose Superposition for 2 chains and for multiple chains Subdomain superposition Superposition of structures with low sequence identity
Lecture Definition Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.
Lecture Why Threading? Secondary structure is more conserved than primary structure Tertiary structure is more conserved than secondary structure Therefore very remote relationships can be better detected through 2° or 3° structural homology instead of sequence homology
Lecture Visualizing Threading T H R E A D THREADINGSEQNCEECNQESGNI ERHTHREADINGSEQNCETHREAD GSEQNCEQCQESGIDAERTHR...
Lecture Visualizing Threading T H R E THREADINGSEQNCEECNQESGNI ERHTHREADINGSEQNCETHREAD GSEQNCEQCQESGIDAERTHR...
Lecture Visualizing Threading T H THREADINGSEQNCEECNQESGNI ERHTHREADINGSEQNCETHREAD GSEQNCEQCQESGIDAERTHR...
Lecture Visualizing Threading THREADINGSEQNCEECNQESGNI ERHTHREADINGSEQNCETHREAD GSEQNCEQCQESGIDAERTHR...
Lecture Visualizing Threading THREAD..SEQNCEECN..THREAD..SEQNCEECN..
Lecture Threading Database of 3D structures and sequences –Protein Data Bank (or non-redundant subset) Query sequence –Sequence < 25% identity to known structures Alignment protocol –Dynamic programming Evaluation protocol –Distance-based potential or secondary structure Ranking protocol
Lecture Kinds of Threading 2D Threading or Prediction Based Methods (PBM) –Predict secondary structure (SS) or ASA of query –Evaluate on basis of SS and/or ASA matches 3D Threading or Distance Based Methods (DBM) –Create a 3D model of the structure –Evaluate using a distance-based “hydrophobicity” or pseudo-thermodynamic potential
Lecture 2D Threading Algorithm Convert PDB to a database containing sequence, SS and ASA information Predict the SS and ASA for the query sequence using a “high-end” algorithm Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA) Rank the alignments and select the most probable fold
Lecture Database Conversion >Protein1 THREADINGSEQNCEECNQESGNI HHHHHHCCCCEEEEECCCHHHHHH ERHTHREADINGSEQNCETHREAD HHCCEEEEECCCCCHHHHHHHHHH >Protein2 QWETRYEWQEDFSHAECNQESGNI EEEEECCCCHHHHHHHHHHHHHHH YTREWQHGFDSASQWETRA CCCCEEEEECCCEEEEECC >Protein3 LKHGMNSNWEDFSHAECNQESG EEECCEEEECCCEEECCCCCCC
Lecture Secondary Structure Table 10 --
Lecture 2° Structure Identification DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp) VADAR - Volume Area Dihedral Angle Reporter (redpoll.pharmacy.ualberta.ca) PDB - Protein Data Bank Example: QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC
Lecture Accessible Surface Area Solvent Probe Accessible Surface Van der Waals Surface Reentrant Surface
Lecture ASA Calculation DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp) VADAR - Volume Area Dihedral Angle Reporter GetArea Example: QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE
Lecture Other ASA sites Connolly Molecular Surface Home Page; Naccess Home Page; ASA Parallelization; Protein Structure Database
Lecture ASA Prediction PredictProtein-PHDacc (58%) – PredAcc (70%?) –condor.urbb.jussieu.fr/PredAccCfg.html QHTAW... QHTAWCLTSEQHTAAVIW BBPPBEEEEEPBPBPBPB
Lecture GENETICS vs GENESIS (alignment matrix figure)
Lecture $S_{ij}$ (Identity Matrix) Rows and columns are the 20 amino acids A C D E F G H I K L M N P Q R S T V W Y; $S_{ij} = 1$ when residues i and j are identical and 0 otherwise
Lecture (figure: step-by-step filling of the dynamic programming matrix for AATVD (columns) against AVVD (rows), scored with the identity matrix)
Lecture A Simple Example... Candidate alignments of AATVD with AVVD: AATVD over A-VVD; AATVD over AVVD; AATVD over AV-VD
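A hedged Python sketch of a dynamic programming alignment score for this example. It assumes a global alignment scored with the identity matrix (1 for a match, 0 otherwise) and a gap penalty that defaults to 0; the slides do not state the gap penalty, and the function names are illustrative.

```python
def identity_score(a, b):
    """Identity matrix score: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0

def align_score(query, template, score=identity_score, gap=0):
    """Best global alignment score, filled in row by row."""
    rows, cols = len(query) + 1, len(template) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),  # align pair
                dp[i - 1][j] + gap,   # gap in template
                dp[i][j - 1] + gap,   # gap in query
            )
    return dp[-1][-1]

best = align_score("AVVD", "AATVD")  # three residues (A, V, D) can be matched
```

Several different alignments can reach the same best score, which is why the slide lists more than one candidate.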
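Lecture Let’s Include 2° info & ASA $S_{ij}^{total} = k_1 S_{ij}^{seq} + k_2 S_{ij}^{strc} + k_3 S_{ij}^{asa}$ where $S_{ij}^{strc}$ is a 3 x 3 matrix over the secondary structure states H, E, C and $S_{ij}^{asa}$ is a 3 x 3 matrix over the accessibility states E, P, B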
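A minimal Python sketch of the weighted total score, a sum of sequence, secondary structure and ASA terms with weights k1, k2, k3. It assumes each of the three matrices is an identity matrix (1 for identical states, 0 otherwise) and that all weights default to 1; the actual matrix values and weights are not recoverable from the slide, and the function name is illustrative.

```python
def combined_score(query_pos, template_pos, k=(1, 1, 1)):
    """Total score for one aligned pair of positions.

    Each position is a (residue, secondary_structure, asa_state) triple,
    e.g. ("A", "E", "B"). Identity scoring for all three terms is an assumption.
    """
    k1, k2, k3 = k
    s_seq = 1 if query_pos[0] == template_pos[0] else 0
    s_strc = 1 if query_pos[1] == template_pos[1] else 0
    s_asa = 1 if query_pos[2] == template_pos[2] else 0
    return k1 * s_seq + k2 * s_strc + k3 * s_asa

# Same residue and same secondary structure, different accessibility state.
pair_score = combined_score(("A", "E", "B"), ("A", "E", "P"))
```

A function like this can replace the plain identity score inside a dynamic programming alignment, so that structure and accessibility agreement contribute alongside sequence identity.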
Lecture (figure: step-by-step filling of the dynamic programming matrix for AATVD (secondary structure EEECC) against AVVD (secondary structure EECC), with cells combining sequence and secondary structure scores)
Lecture A Simple Example... Candidate alignments of AATVD (EEECC) with AVVD (EECC): AATVD over A-VVD; AATVD over AVVD; AATVD over AV-VD
Lecture 2D Threading Performance In test sets 2D threading methods can identify 30-40% of proteins having very remote homologues (i.e. not detected by BLAST) using “minimal” non-redundant databases (<700 proteins) If the database is expanded ~4x the performance jumps to 70-75% Performs best on true homologues as opposed to postulated analogues
Lecture 2D Threading Advantages Algorithm is easy to implement Algorithm is very fast (10x faster than 3D threading approaches) The 2D database is small (1.5 Gbytes) Appears to be just as accurate as DBM or other 3D threading approaches Very amenable to web servers
Lecture Servers - PredictProtein
Lecture Servers - 123D
Lecture Servers - GenThreader
Lecture More Servers
Lecture 2D Threading Disadvantages Reliability is not 100%, making most threading predictions suspect unless experimental evidence can be used to support the conclusion Does not produce a 3D model at the end of the process Doesn’t include all aspects of 2° and 3° structure features in prediction process PSI-BLAST may be just as good (faster too!)
Lecture Making it Better Include 3D threading analysis as part of the 2D threading process -- offers another layer of information Include more information about the “coil” state (3-state prediction isn’t good enough) Include other biochemical (ligands, function, binding partners, motifs) or phylogenetic (origin, species) information
Lecture 3D Threading Servers Generate 3D models or coordinates of possible models based on input sequence Loopp (version 2) 3D-PSSM All require email addresses since the process may take hours to complete