Protein Structure Modeling (2). Prediction

Protein Structure Modeling (2)

Prediction http://www.bmm.icnet.uk/people/rob/CCP11BBS/

Template-Based Prediction
- Structure is better conserved than sequence.
- A structure can tolerate a wide range of mutations.
- Physical forces favor certain structures.
- The number of folds is limited: ~700 currently known, with estimates of the total ranging from 1,000 to ~10,000.
- Example fold: TIM barrel

Evolutionary Comparison
- Sequence-sequence comparison: homology modeling (similar sequences adopt similar structures)
- Sequence-structure comparison: threading / fold recognition (sequences fold into a limited number of folds)

Scope of the Problem
- ~90% of new globular proteins share similar folds with known structures, implying the general applicability of template-based (comparative) modeling methods for structure prediction (currently applicable to 60-70% of new proteins, and this number is growing as more structures are solved).
- The NIH Structural Genomics Initiative plans to experimentally solve ~10,000 “unique” structures and predict the rest using computational methods.

Why do we need structural models?
1. Only 20% of all proteins have a homologue in the PDB.
2. For ~70% of proteins, a suitable structure from which to build a 3D model is available.
3. To predict the functions of proteins that share low degrees of sequence similarity.
4. To identify proteins that may have new folds.

How many structures are there?
Protein Data Bank (PDB) status as of March 12, 2002. Source: http://www.rcsb.org/pdb/holdings.html

How many folds are there?
Structural Classification of Proteins (SCOP) status as of 1 Mar 2002, based on 13220 PDB entries. Source: http://scop.berkeley.edu/count.html

Identification of new folds Source: http://www.rcsb.org/pdb/holdings.html

Old fold vs. new fold
A chain's fold is considered old if it is similar to one of the selected chains according to the following criteria:
- RMSD < 3.0 Å
- number of aligned positions >= 70% of the length of the chain
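The two criteria above can be written as a small check. This is only a sketch: the function name and arguments are made up for illustration, and it assumes that both criteria must hold at the same time and that the RMSD and the number of aligned positions come from an external structure-comparison program.

```python
def is_old_fold(rmsd_angstrom, aligned_positions, chain_length,
                max_rmsd=3.0, min_aligned_fraction=0.70):
    """Return True if a chain matches a selected chain under both slide criteria.

    Assumption: "old" requires RMSD < 3.0 A AND aligned positions >= 70%
    of the chain length; the slide lists the criteria without saying and/or.
    """
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    aligned_fraction = aligned_positions / chain_length
    return rmsd_angstrom < max_rmsd and aligned_fraction >= min_aligned_fraction


# Hypothetical comparison results for a 200-residue chain.
print(is_old_fold(2.1, 150, 200))  # 75% of positions aligned
print(is_old_fold(2.1, 120, 200))  # only 60% of positions aligned
```

A chain that fails the check against every selected chain would be counted as a new fold.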

How many more folds are there?
Estimate: the number of possible folds is ~4,000; a database of 930 folds covers 90% of protein families.
Source: Govindarajan S., Recabarren R., & Goldstein R.A. (1999) Proteins: Structure, Function, and Genetics 35:408-414

Homology Modeling
- Also called “comparative protein modeling”, “modeling by homology”, or “knowledge-based modeling”
- The most successful tool for predicting protein structure from sequence

Homology Modeling
- The target sequence is aligned with the sequence of a known structure (the template); the two usually share a sequence identity of 30% or more.
- The target sequence is then superimposed onto the template, replacing equivalent side-chain atoms where necessary.
- The structure is refined to bring the model closer to the actual structure than the template is.

Homology Modeling
Given a sequence, what is the best way of mounting it onto a known structure?
Example sequence: GHIKLSYTVNEQNLKPERFFYTSAVAIL

What is the basis for homology modeling?
- The RMSD of the α-carbon coordinates is ~1 Å if the protein cores share 50% identity.
- Protein sequences with > 70% similarity allow construction of models with < 3 Å RMSD.
The problem then reduces to:
- Loop structure modeling (connections between α-helices and β-strands)
- Side-chain modeling (energy refinement)
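For reference, the RMSD figures quoted here are root-mean-square deviations over equivalent atoms, RMSD = sqrt((1/N) Σ |r_i - r'_i|²). The sketch below computes that quantity for two lists of Cα coordinates. It assumes the two structures are already optimally superimposed and that atom i in one list corresponds to atom i in the other; the superposition step itself is not shown, and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumption: the structures are already superimposed and the atoms are
    listed in corresponding order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha traces: the model is the template shifted 1 A along y.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(template_ca, model_ca))
```

Because every atom in the example is displaced by exactly 1 Å, the RMSD is 1 Å, the value the slide associates with cores sharing 50% identity.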

Input requirements for Homology Modeling
1. TARGET SEQUENCE (primary protein sequence with unknown structure)
2. TEMPLATE (protein whose 3D structure has already been determined)
3. SEQUENCE ALIGNMENT (using Clustal W) between template and target sequence

Find the appropriate template
SWISS-MODEL Blast: Find the Appropriate Modelling Template(s). Enter your sequence in FASTA format.
Source: http://www.expasy.org/swissmod/SM_Blast.html
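For readers unfamiliar with FASTA, a record is a header line starting with ">" followed by the sequence on one or more lines. The sketch below formats a sequence that way. The identifier `target_1` and the 60-character line width are illustrative assumptions, not requirements stated by the server.

```python
def to_fasta(identifier, sequence, width=60):
    """Format a protein sequence as a single FASTA record.

    Whitespace inside the sequence is removed and letters are upper-cased.
    The line width of 60 is a common convention, assumed here.
    """
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + identifier + "\n" + "\n".join(lines) + "\n"


# The example sequence from the homology modeling slide, with a made-up name.
print(to_fasta("target_1", "GHIKLSYTVNEQNLKPERFFYTSAVAIL"), end="")
```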

Choose a template

Template search results
4CD2A
LIGAND INDUCED CONFORMATIONAL CHANGES IN THE CRYSTAL STRUCTURES OF PNEUMOCYSTIS CARINII DIHYDROFOLATE REDUCTAS COMPLEXES WITH FOLATE AND NADP+
MOL_ID: 1; MOLECULE: DIHYDROFOLATE REDUCTASE; CHAIN: A; SYNONYM: PCDHFR; EC: 1.5.1.3; ENGINEERED: YES
MOL_ID: 1; ORGANISM_SCIENTIFIC: PNEUMOCYSTIS CARINII; ORGANISM_COMMON: BACTERIA; EXPRESSION_SYSTEM: ESCHERICHIA COLI; EXPRESSION_SYSTEM_COMMON: BACTERIA; EXPRESSION_SYSTEM_PLASMID: PT7-7; EXPRESSION_SYSTEM_GENE: C-DNA P.CARINII DHFR
V.CODY,N.GALITSKY,D.RAK,J.R.LUFT,W.PANGBORN,S.F.QUEENER
Length = 202
Score = 157 bits (393), Expect = 9e-39
Identities = 82/220 (37%), Positives = 138/220 (62%), Gaps = 22/220 (10%)
Query: 232 RDLTMIVAVSSPNLGIGKKNSMPWHIKQEMAYFANVTSSTESSGQLEEGKSKIMNVVIMG 291
             LT IVA      GIG  NS PW  K E  YF  VTS        E      MNVV MG
Sbjct: 1   KSLTLIVALTT-SYGIGRSNSLPWKLKKEISYFKRVTSFVPTFDSFES-----MNVVLMG 54
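The "Identities" and "Gaps" figures in such a report are column counts over the alignment. A minimal sketch of how they can be counted from two gapped strings follows. It only covers the single 60-column block shown on the slide, so its counts differ from the full-alignment totals of 82/220 and 22/220. "Positives" are not computed because they depend on a substitution matrix, which is not reproduced here.

```python
def alignment_stats(query_aln, subject_aln, gap_char="-"):
    """Count columns, identical columns, and gapped columns of a pairwise alignment."""
    if len(query_aln) != len(subject_aln) or not query_aln:
        raise ValueError("aligned strings must be non-empty and of equal length")
    pairs = list(zip(query_aln, subject_aln))
    identities = sum(1 for q, s in pairs if q == s and q != gap_char)
    gaps = sum(1 for q, s in pairs if q == gap_char or s == gap_char)
    columns = len(pairs)
    return {
        "columns": columns,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / columns,
    }


# The one alignment block reproduced on the slide (query 232-291, subject 1-54).
query_block = "RDLTMIVAVSSPNLGIGKKNSMPWHIKQEMAYFANVTSSTESSGQLEEGKSKIMNVVIMG"
subject_block = "KSLTLIVALTT-SYGIGRSNSLPWKLKKEISYFKRVTSFVPTFDSFES-----MNVVLMG"
print(alignment_stats(query_block, subject_block))
```

As in the report, the percentage uses the number of alignment columns, gaps included, as the denominator.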

Mounting the sequence onto the structure template Target

Mounted sequence
Yellow = adrenergic receptor sequence; Blue = template structure (PDB 1F88, bovine rhodopsin)

Modeled structure Gaps

Corrected Model

Refinement
- Bond angle energy
- Dihedral angle energy
- van der Waals energy
- Electrostatic interactions
- Hydrogen bonds
- Geometrical constraints
- Packing density

Evaluating your model
The model is considered inaccurate if its atomic coordinates are not within 0.5 Å RMSD of the template control.

Threading-Based Protein Structure Prediction

Threading, Fold Recognition, Protein Fold Assignment
Given:
- a database of protein structures / folds summarizing the designs found in nature
- an individual protein sequence
Goal: find the structural backbone that best fits the protein sequence. This is the opposite of the protein folding problem.

Concept of Threading
Structure prediction through recognizing a native-like fold:
- Thread (align or place) a query protein sequence onto a template structure in an “optimal” way.
- A good alignment gives an approximate backbone structure.
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Template set
Prediction accuracy: fold recognition / alignment

Why is it called threading?
- A specific sequence is threaded through all known folds.
- For each fold, the probability that the sequence can adopt that fold is estimated.
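The "thread through every fold, then rank" idea can be sketched as a loop over a fold library. Everything in the example is a toy stand-in: each fold is reduced to a string of buried (B) / exposed (E) positions, the score is a simple mismatch count rather than a probability, no gaps are allowed, and the fold names are invented. Real threading programs use far richer templates and scoring.

```python
HYDROPHOBIC = set("AVLIMFWC")  # crude hydrophobic set, an assumption for this toy


def toy_energy(sequence, environments):
    """Ungapped toy score: one penalty point per residue in the 'wrong' environment.

    A hydrophobic residue is expected at a buried (B) position and any other
    residue at an exposed (E) position. Lower is better.
    """
    penalty = abs(len(sequence) - len(environments))  # unaligned overhang
    for residue, env in zip(sequence, environments):
        if (env == "B") != (residue in HYDROPHOBIC):
            penalty += 1
    return penalty


def rank_folds(sequence, fold_library, score_fn=toy_energy):
    """Score the sequence against every fold; return (name, score) pairs, best first."""
    scored = [(name, score_fn(sequence, template))
              for name, template in fold_library.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical two-fold library and a five-residue query.
library = {"fold_a": "BBEEB", "fold_b": "EEBBE"}
print(rank_folds("LVKEL", library))
```

The top-ranked entry is the fold the query is assigned to; in practice the raw score would also be converted into a confidence estimate before a prediction is trusted.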

Fold Recognition and Threading
- Limited number of folds: 800-1000
- Known number of folds: ~700
- Sequence-fold agreement?

Application of Threading
- Predict structure
- Identify distant homologues of protein families
- Predict the function of a protein with a low degree of sequence similarity to other proteins

Structure Families
- SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/ (domains, good annotation)
- CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
- CE: http://cl.sdsc.edu/ce.html
- Dali Domain Dictionary: http://columba.ebi.ac.uk:8765/holm/ddd2.cgi
- FSSP: http://www2.ebi.ac.uk/dali/fssp/ (chains, updated weekly)
- HOMSTRAD: http://www-cryst.bioc.cam.ac.uk/~homstrad/
- HSSP: http://swift.embl-heidelberg.de/hssp/

Hierarchy of Templates
- Homologous family: evolutionarily related, with a significant sequence identity (1827 in SCOP)
- Superfamily: different families whose structural and functional features suggest a common evolutionary origin (1073 in SCOP; a good tradeoff between accuracy and computing)
- Fold: different superfamilies having the same major secondary structures in the same arrangement and with the same topological connections, because energetics favor certain packing arrangements (686 out of 39,893 in SCOP)
- Class: secondary structure composition

Template and Fold
- Secondary structures and their arrangement
- Non-redundant representatives obtained through structure-structure comparison

Core of a Template
Core secondary structures: α-helices and β-strands

Representation of folds: Definition of Template
- Residue type / profile
- Secondary structure type
- Solvent accessibility
- Coordinates for Cα / Cβ
- (Pairwise preferences between two residues)

Threading is alignment squared.
Environmental preferences of amino acids (3D-PSSM):
- Environment classes (α-helix, β-sheet), solvent accessibility
- Pair potentials: physical interactions
- Substitution matrices
Possible alignments to the template are evaluated. The evaluation of each position depends on the rest of the alignment.

Scoring Function
Describes how well a sequence fits a template.
Example sequence: …YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW…
- E_m (mutation term): how well a sequence residue aligns to a residue on the structure
- E_s (singleton term): how well a residue fits a structural environment
- E_p (pairwise term): how preferable it is to put two particular residues nearby
- E_g: alignment gap penalty
Total energy: E_m + E_p + E_s + E_g
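The total energy is a plain sum of the four terms, and candidate alignments are compared by that sum. The sketch below shows only this bookkeeping. The class and function names and all numeric values are invented, no term weights are applied, and it assumes that a lower total energy means a better fit.

```python
from dataclasses import dataclass


@dataclass
class ThreadingEnergy:
    """The four terms of the slide's scoring function for one alignment."""
    mutation: float   # E_m: sequence residue vs. template residue
    singleton: float  # E_s: residue vs. structural environment
    pairwise: float   # E_p: contact preference of residue pairs
    gap: float        # E_g: alignment gap penalty

    def total(self):
        # Total energy as on the slide: E_m + E_p + E_s + E_g (unweighted).
        return self.mutation + self.pairwise + self.singleton + self.gap


def best_alignment(candidates):
    """Return the name of the candidate with the lowest total energy (assumed best)."""
    return min(candidates, key=lambda name: candidates[name].total())


# Hypothetical term values for two alternative alignments to one template.
candidates = {
    "alignment_1": ThreadingEnergy(mutation=-12.0, singleton=-5.5, pairwise=-3.0, gap=4.0),
    "alignment_2": ThreadingEnergy(mutation=-10.0, singleton=-6.0, pairwise=-1.0, gap=2.0),
}
for name, energy in candidates.items():
    print(name, energy.total())
print("best:", best_alignment(candidates))
```

Keeping the terms separate until the final sum makes it easy to see which term drives a given ranking.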

What We Learned…
- Why threading?
- Evolutionary foundation of threading
- Template library and its generation
- Concept of scoring function

CASP (Critical Assessment of Structure Prediction)
The biennial competition in protein structure prediction.
http://predictioncenter.llnl.gov/casp5/Casp5.html

CASP (Critical Assessment of Structure Prediction)
Targets for:
- comparative modelling (15)
- fold recognition (22)
- ab initio modelling (15)
http://predictioncenter.llnl.gov/casp5/Casp5.html

CASP Experiment
- Experimentalists are asked to provide information about structures they expect to solve soon.
- Predictors retrieve the sequences from the prediction center (predictioncenter.llnl.gov).
- Predictions are deposited throughout the season.
- A meeting is held to assess the results.

Prediction Categories
- Comparative Modeling: modeling by homology
- Fold Recognition
  - Advanced Sequence Comparison Methods
  - Threading
- New Fold Methods / “ab initio”
Categories are separated by distance from any known structure.

Expected Performance
Figure: predicted model and X-ray structure for target t0100.
PROSPECT (threading) prediction in CASP4: 12 out of 19 folds recognized.

Conclusions
- When a suitable template structure exists in the PDB, homology modeling of the target sequence is the best way to predict the structure.
- Fold recognition servers can help find a template when conventional sequence analysis methods fail.
- Combining elements from several sources may allow you to construct reasonably accurate models.

