CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK, DTU
Protein Fold recognition
Morten Nielsen, CBS, BioCentrum, DTU

Presentation transcript:

Outline
Why model protein structure
Classification of protein structures
  – Fold, Superfamily, Family
Protein homology modeling
  – Template (fold) recognition
  – Alignment
  – Side chain modeling
  – Loop modeling
Reliability measures
  – %id bad, P-value good
Historical overview
  – BLAST (simple alignment)
  – PSI-BLAST (profiles)
  – Profile-profile alignment
  – Structural features
  – Recombinant or democratic homology modeling
Best methods

Why protein modeling?
Experimental determination of protein structure is laborious and costly
The gap between the amount of protein sequence data and protein structure data is large and increasing
Close to 50% of all new sequences can be homology modeled

Swiss-Prot database

PDB New Fold Growth
The number of unique folds in nature is fairly small (possibly a few thousand)
90% of new structures submitted to PDB in the past three years have folds similar to structures already in PDB
[Figure: new PDB structures, split into new folds and old folds]

Protein classification
The number of protein sequences grows exponentially
The number of solved structures grows exponentially
The number of new folds identified is very small (and close to constant)
Protein classification can
  – Generate an overview of structure types
  – Detect similarities (evolutionary relationships) between protein sequences

Protein structure classification
[Figure labels: Protein world, Protein fold, Protein superfamily, Protein family, New Fold]

Classification schemes
SCOP
  – Manual classification (A. Murzin)
CATH
  – Semi-manual classification (C. Orengo)
FSSP
  – Automatic classification (L. Holm)

Levels in SCOP
Class | # Folds | # Superfamilies | # Families
All alpha proteins
All beta proteins
Alpha and beta proteins (a/b)
Alpha and beta proteins (a+b)
Multi-domain proteins
Membrane and cell surface proteins
Small proteins
Total

Major classes in SCOP
Classes
  – All alpha proteins
  – All beta proteins
  – Alpha and beta proteins (a/b)
  – Alpha and beta proteins (a+b)
  – Multi-domain proteins
  – Membrane and cell surface proteins
  – Small proteins

All α: Hemoglobin (1bab)

All β: Immunoglobulin (8fab)

α/β: Triosephosphate isomerase (1hti)

α+β: Lysozyme (1jsf)

Families
Proteins whose evolutionary relationship is readily recognizable from the sequence (>~25% sequence identity)
Families are further subdivided into Proteins
Proteins are divided into Species
  – The same protein may be found in several species

Superfamilies
Proteins which are (remotely) evolutionarily related
  – Sequence similarity low
  – Share function
  – Share special structural features
Relationships between members of a superfamily may not be readily recognizable from the sequence alone

Folds*
Proteins which have >~50% of their secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold
No evolutionary relationship between the proteins is assumed
*confusingly also called fold classes

Links
PDB (protein structure database)
SCOP (protein classification database)
  – scop.berkeley.edu
CATH (protein classification database)
FSSP (protein classification database)

Superfamilies
[Figure labels: Fold, Superfamily, Family, Proteins]

Model accuracy
Swiss-Model models sharing 25-95% sequence identity with the submitted sequences

Identification of fold
If sequence similarity is high, proteins share structure (safe zone)
If sequence similarity is low, proteins may share structure (twilight zone)
Most proteins do not have a partner with high sequence similarity
Rajesh Nair & Burkhard Rost, Protein Science, 2002, 11

Identification of correct fold
% ID is a poor measure
  – Many evolutionarily related proteins share low sequence similarity
Alignment score is even worse
  – Many sequences will score high against everything (hydrophobic stretches)
P-value or E-value is more reliable

P and E values
E-value
  – Number of expected hits in database with score higher than match
  – Depends on database size
P-value
  – Probability that a random hit will have score higher than match
  – Database size independent
[Figure: distribution P(Score) plotted against Score]
Example: 10 hits with higher score (E=10) out of 10000 hits in database => P = 10/10000 = 0.001
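The arithmetic on this slide can be written out as a small sketch. It uses only the simple counting relation shown above (P = E / database size); it is not the extreme-value statistics that BLAST itself uses, and the function names are made up for illustration.

```python
def p_value_from_e_value(e_value, database_size):
    """Slide relation: P = (expected hits scoring higher) / (hits in database)."""
    return e_value / database_size


def e_value_from_p_value(p_value, database_size):
    """Inverse relation: the E-value grows with database size, the P-value does not."""
    return p_value * database_size


# Slide example: 10 higher-scoring hits in a database of 10000
print(p_value_from_e_value(10, 10000))      # 0.001
# Same P-value in a database ten times larger gives a ten times larger E-value
print(e_value_from_p_value(0.001, 100000))
```

The second call illustrates why an E-value is only meaningful together with the database it was computed against.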

Protein homology modeling
Identify fold (template) for modeling
  – Find the structure in the PDB database that resembles the unknown structure the most
  – Can be used to predict function
Align protein sequence to template
  – Simple alignment methods
  – Sequence profiles
  – Threading methods
  – Pseudo force fields
Model side chains and loops

Template identification
Simple sequence based methods
  – Align (BLAST) sequence against sequences of proteins with known structure (PDB database)
Sequence profile based methods
  – Align sequence profile (PSI-BLAST) against sequences of proteins with known structure (PDB)
  – Align sequence profile against profiles of proteins with known structure (FFAS)
Sequence and structure based methods
  – Align profile and predicted secondary structure against proteins with known structure (3D-PSSM)

Template identification
Threading methods
  – Align sequence against structural environment of proteins with known structure
Use biological information
  – Functional annotation in databases
  – Active sites

Sequence profiles
In conventional alignment, a scoring matrix (BLOSUM62) gives the score for matching two amino acids
  – In reality not all positions in a protein are equally likely to mutate
  – Some amino acids (active sites) are highly conserved, and the penalty for a mismatch must be very high
  – Other amino acids can mutate almost for free, and the penalty for a mismatch is smaller than the BLOSUM penalty
Sequence profiles can capture this

Sequence profiles
ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN
TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I
-TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---I
IE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD---
-TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---V
ASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE----
TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI
TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDPMERNTAGVP
Conserved position: matching anything but G => large negative score
Non-conserved position: anything can match

Sequence profiles
1. Align (BLAST) sequence against large sequence database (Swiss-Prot)
2. Select significant alignments and make profile (weight matrix) using techniques for sequence weighting and pseudo counts (see lecture on HMMs)
3. Use weight matrix to align against sequence database to find new significant hits
4. Repeat steps 2 and 3 until a stop criterion is met
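Step 2 of the procedure above can be illustrated with a minimal sketch that turns a small multiple alignment into a log-odds weight matrix. Several simplifications are assumptions made here and are not taken from the slides: a uniform background frequency, a flat pseudo count per amino acid, no sequence weighting, and ungapped scoring. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds weight matrix (one dict per column) from aligned sequences.

    Assumptions: uniform background, flat pseudo counts, no sequence weighting.
    Gap characters are ignored when counting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile


def score_sequence(profile, sequence):
    """Ungapped score of a sequence segment against the profile."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Toy alignment: column 0 is fully conserved (G), column 1 is variable
toy_alignment = ["GAK", "GCR", "GDK", "GEL"]
profile = build_profile(toy_alignment)
print(profile[0]["G"], profile[0]["W"])  # conserved G scores high, W is penalised
```

In this toy case the conserved column rewards G and penalises every other residue, while the variable column gives similar scores to many residues, which is the position-specific behaviour a single BLOSUM matrix cannot express.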

PDB-BLAST
Procedure
1. Build sequence profile by iterative PSI-BLAST search against a sequence database
2. Use profile to search database of proteins with known structure (PDB)

Transitive BLAST
Procedure
1. Find homologues to query (your) sequence
2. Find homologues to these homologues
3. Etc.
  – Can be implemented with e.g. BLAST or PSI-BLAST
Also known as Intermediate Sequence Search (ISS)
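The transitive procedure is essentially a breadth-first search over homology hits. A minimal sketch follows; `find_homologues` is a hypothetical callable standing in for a BLAST or PSI-BLAST run that returns significant hits, and the `max_rounds` cut-off is an assumption added here to keep the search bounded.

```python
from collections import deque


def intermediate_sequence_search(query, find_homologues, max_rounds=3):
    """Breadth-first transitive search.

    find_homologues(seq_id) is a placeholder for a real search (e.g. BLAST)
    and must return an iterable of significant hit identifiers.
    Returns {sequence id: number of search rounds needed to reach it}.
    """
    found = {query: 0}
    queue = deque([query])
    while queue:
        current = queue.popleft()
        depth = found[current]
        if depth >= max_rounds:
            continue  # do not expand hits beyond the allowed number of rounds
        for hit in find_homologues(current):
            if hit not in found:
                found[hit] = depth + 1
                queue.append(hit)
    return found


# Toy hit lists in place of real searches
toy_hits = {"Q": ["A"], "A": ["B"], "B": ["C"], "C": []}
print(intermediate_sequence_search("Q", lambda s: toy_hits.get(s, []), max_rounds=2))
```

Here "B" is never a direct hit of the query but is reached through the intermediate sequence "A". In practice each extra round also raises the risk of drifting to unrelated sequences, so a strict significance threshold inside `find_homologues` would matter.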

Example
Sequence profiles
Alignment of protein sequences 1PLC._ and 1GYC.A
  – E-value > 1000
Profile alignment
  – Align 1PLC._ against Swiss-Prot
  – Make position specific weight matrix from alignment
  – Use this matrix to align 1PLC._ against 1GYC.A
  – E-value < [value lost in transcript]; Rmsd = 3.3 Å

Sequence profiles
Score = 97.1 bits (241), Expect = 9e-22
Identities = 13/107 (12%), Positives = 27/107 (25%), Gaps = 17/107 (15%)

1PLC._:  3 ADDGSLAFVPSEFSISPGEKI------VFKNNAGFPHNIVFDEDSIPSGVDASKIS 56
           F + G++ N+ + +G + +
1GYC.A:    VFPSPLITGKKGDRFQLNVVDTLTNHTMLKSTSIHWHGFFQAGTNWADGP 79

1PLC._: 57 MSEEDLLNAKGETFEVAL---SNKGEYSFYCSP--HQGAGMVGKVTV 98
           A G +F G + ++ G+ G V
1GYC.A: 80 AFVNQCPIASGHSFLYDFHVPDQAGTFWYHSHLSTQYCDGLRGPFVV 126

[Figure: superposition with Rmsd = 3.3 Å; model in red, template in blue]

Including structure
Sequences within a protein superfamily share only remote sequence homology, but high structural similarity
Structure is known for template
Predict structural properties for query
  – Secondary structure
  – Surface exposure

Using structure
Sequence & structure profile-profile based alignments
  – Template profiles
      Multiple structure alignments
      Sequence based profiles
  – Query profile
      Sequence based profile
      Predicted secondary structure
  – Position specific gap penalties derived from secondary structure

Structure biased alignment (3D-PSSM)

Threading
[Figure: sequence A T N L Y K E T L.. threaded onto a template structure, with deletions and an insertion marked]
Alignment score from structural fitness (pair potential)
How well does K fit the environment at P6? If P8 is acidic then fine; if P8 is basic then poor
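The K-at-P6 question can be made concrete with a deliberately crude sketch. The pair term below is a toy charge product invented for illustration only; real threading potentials are derived statistically from known structures. Positions, contacts and function names are all hypothetical.

```python
# Toy charges: acidic residues -1, basic residues +1, everything else 0 (assumption)
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def pair_energy(aa1, aa2):
    """Toy pair term: opposite charges attract (-1), like charges repel (+1)."""
    return CHARGE.get(aa1, 0) * CHARGE.get(aa2, 0)


def threading_energy(threaded, contacts):
    """Sum the pair term over all template contacts.

    threaded: {template position: residue placed there by the alignment}
    contacts: list of (position, position) pairs that touch in the template
    Lower energy means a better structural fit.
    """
    return sum(pair_energy(threaded[i], threaded[j]) for i, j in contacts)


contacts = [(6, 8)]                                   # P6 and P8 are in contact
print(threading_energy({6: "K", 8: "D"}, contacts))   # -1: acidic partner, good fit
print(threading_energy({6: "K", 8: "R"}, contacts))   # 1: basic partner, poor fit
```

A threading method would evaluate such a sum for many candidate alignments of the query onto the template and keep the alignment with the best structural fit.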

Threading
Threading alone does not work
  – The average protein does not exist
Threading can be used in combination with sequence profiles and local structural features to improve alignment

CASP
  – Critical Assessment of protein Structure Prediction
  – Every second year
  – Sequences from about-to-be-solved structures are given to groups who submit their predictions before the structure is published
  – Modelers make predictions
  – Meeting in December where correct answers are revealed

CASP5 overview

Successful fold recognition groups at CASP5
3D-Jury (Leszek Rychlewski)
3D-CAM (Krzysztof Ginalski)
Template recombination (Paul Bates)
HMAP (Barry Honig)
PROSPECT (Ying Xu)
ATOME (Gilles Labesse)

Democratic homology modeling
Let the silent majority rule
  – The highest-scoring hit will often be wrong
  – Many prediction methods will have the correct fold among the top hits
  – If many different prediction methods all have the same fold among the top hits, this fold is probably correct
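The majority rule above can be sketched as a simple vote over the top hits of several methods. This is an illustration of the idea only, not the scoring used by any particular consensus server; the method names, fold labels and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def consensus_fold(top_hits_per_method, top_k=5):
    """Count how many methods list each fold among their top_k hits.

    top_hits_per_method: {method name: ranked list of fold labels}
    Returns (fold, votes) pairs, most supported fold first.
    """
    votes = Counter()
    for hits in top_hits_per_method.values():
        votes.update(set(hits[:top_k]))  # each method votes once per fold
    return votes.most_common()


predictions = {
    "method1": ["foldA", "foldB", "foldC"],
    "method2": ["foldD", "foldB"],
    "method3": ["foldB", "foldA"],
}
print(consensus_fold(predictions, top_k=2)[0])  # ('foldB', 3)
```

In the toy data foldB is the top hit of only one method, yet it is the only fold that every method places in its top two, so it wins the vote.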

3D-Jury (Rychlewski)
Inspired by ab initio modeling methods
  – Average of frequently obtained low energy structures is often closer to the native structure than the lowest energy structure
Find most abundant high scoring model in a list of predictions from several predictors
1. Use output from a set of servers
2. Superimpose all pairs of structures
3. Similarity score $S_{ij}$ = number of C$\alpha$ pairs within 3.5 Å (if the number is > 40; else $S_{ij} = 0$)
4. 3D-Jury score = $\sum_{ij} S_{ij} / (N+1)$
Similar methods developed by A. Elofsson (Pcons) and D. Fischer (3D-SHOTGUN)
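Steps 3 and 4 can be sketched as follows, assuming the pairwise superpositions of step 2 have already been done elsewhere and that we are handed, for every pair of models, the number of Cα pairs within 3.5 Å. The slide does not define N; this sketch assumes N is the number of other models each model is compared against, and that assumption is marked in the code. The function name and toy numbers are hypothetical.

```python
def jury_scores(pair_counts, min_pairs=40):
    """Per-model consensus score from a matrix of superposition results.

    pair_counts[i][j]: number of C-alpha pairs within 3.5 A after superimposing
    model i on model j (assumed to be precomputed).
    S_ij is that count if it exceeds min_pairs, otherwise 0.
    Score of model i = sum over j != i of S_ij / (N + 1),
    with N taken as the number of other models (an assumption of this sketch).
    """
    n_models = len(pair_counts)
    n_others = n_models - 1
    scores = []
    for i in range(n_models):
        total = 0
        for j in range(n_models):
            if i == j:
                continue
            count = pair_counts[i][j]
            total += count if count > min_pairs else 0
        scores.append(total / (n_others + 1))
    return scores


# Toy matrix for three models
counts = [
    [0, 80, 30],
    [80, 0, 60],
    [30, 60, 0],
]
scores = jury_scores(counts)
print(scores.index(max(scores)))  # 1: the model most similar to the others
```

The selected model is the one that agrees best with the rest of the pool, regardless of the score its own server assigned to it.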

LiveBench
The LiveBench project is a continuous benchmarking program. Every week, sequences of newly released PDB proteins are submitted to participating fold recognition servers. The results are collected and continuously evaluated using automated model assessment programs. A summary of the results is produced after several months of data collection. To participate, the servers must delay the updating of their structural template libraries by one week.

Meta prediction server
Web interface to a list of public protein structure prediction servers
Submit query sequence to all selected servers in one go

Meta Server

3D Jury

188 targets in total
Threshold for 5 false positives: 50 for 3D Jury

Links to fold recognition servers
Databases of links
Meta server
3DPSSM (good graphical output)
GenTHREADER
FUGUE2
SAM
FOLD
FFAS/PDBBLAST

From fold to structure
Flying to the moon has not made man conquer space
Finding the right fold does not by itself allow you to make accurate protein models
  – It can allow prediction of protein function
Alignment is still a very hard problem
  – Most protein interactions are determined by the loops, and they are the least conserved parts of a protein structure

Ab initio protein modeling
Modeling of new-fold proteins
Only when everything else fails
Challenge
  – Close to impossible to model Nature's folding potential

Challenge
New folds are in general constructed from a set of subunits, where each subunit is a part of a known fold.
The subunits are small compared to the overall fold of the protein.
No objective function exists to guide the global packing of the subunits.
[Figure labels: Folding potential, d_ij = 6 Å; Objective function, s_ij = 120 aa]

A way to a solution
Glue the structure together piecewise from fragments with correct local structure.
Guide the process by an empirical potential (potential of mean force).
[Figure labels: Nature's potential, Empirical potential]

Examples (Rosetta web server)
[Figure panels: Rosetta prediction, Homology modeling]

Take home message
Identifying the correct fold is only a small step towards successful homology modeling
Do not trust % ID or alignment score to identify the fold. Use P-values
Use sequence profiles and local protein structure to align sequences
Do not trust one single prediction method; use consensus methods (3D Jury)
Only if everything else fails, use ab initio methods