Published by Heather Gibson. Modified over 9 years ago.
1
Lecture 10 – protein structure prediction
2
A protein sequence
3
>gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana]
MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL
DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW
SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY
SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA
QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
RDHVHGKATNHEDLTCATEEARIISWENLQKAKAEAAIRKLEKYFPQMKLEKKRSSSMEKIMRKVKSAEKRAEEMRRSVL
DNRVSTASHGKASSFKRSGKKKIPSLSGCFTCHVF
4
Protein Structure
Heparin docking. Red: heparin; blue: central domain; yellow: C-terminal domain
5
A Protein Structure
Figure labels: alpha-helix, beta-sheet, loop, core
6
Domains and Folds
A domain is a discrete portion of a protein that is assumed to fold independently of the rest of the protein and to possess its own function. Most proteins have multiple domains. The core 3D structure of a domain is called a fold. There are only a few thousand possible folds.
7
Protein Similarity Levels
–Family: the proteins in the same family are homologous at the sequence level.
–Superfamily: all members of a superfamily should have the same overall domain architecture, i.e., the same domains in the same order.
–Fold: the folds of two domains are similar.
8
Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions.
Lysozyme sequence:
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
9
Relevance of Protein Structure in the Post-Genome Era
sequence → structure → function → medicine
10
Structure-Function Relationship
A certain level of function can be determined without a structure, but a structure is key to understanding the detailed mechanism. A predicted structure is a powerful tool for function inference.
Example: Trp repressor as a function switch
11
Structure-Based Drug Design
Example: HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
12
Protein Structure Prediction
Traditional experimental methods: X-ray crystallography or NMR to solve structures. They generate only a few structures per day worldwide and cannot keep pace with new protein sequences.
Strong demand for structure prediction: more than 30,000 human genes; 10,000 genomes will be sequenced in the next 10 years.
Still an unsolved problem after two decades of effort.
13
Ab initio Structure Prediction
An energy function to describe the protein:
–bond energy
–bond angle energy
–dihedral angle energy
–van der Waals energy
–electrostatic energy
Minimize the function and obtain the structure.
Not practical in general:
–Computationally too expensive
–Accuracy is poor
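The five energy terms listed on this slide can be illustrated with the functional forms commonly used in molecular mechanics force fields. The slide does not say which forms the lecture had in mind, so everything below is an assumption for illustration: the harmonic bond and angle terms, the cosine dihedral term, the Lennard-Jones form for van der Waals, the Coulomb constant in kcal·Å/(mol·e²), and the hypothetical helper `minimize_1d`, which uses plain gradient descent on a single distance only to show what "minimize the function" means.

```python
import math

# Assumed functional forms (typical of molecular mechanics force fields).
# None of these forms or parameter names come from the slide itself.

def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic bond-angle bending around the equilibrium angle theta0 (radians)."""
    return k * (theta - theta0) ** 2

def dihedral_energy(phi, k, n, delta):
    """Periodic torsion term with multiplicity n and phase delta (radians)."""
    return k * (1.0 + math.cos(n * phi - delta))

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones form for the van der Waals term."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def coulomb(r, q1, q2, coulomb_const=332.06):
    """Electrostatic term; the constant assumes charges in e and r in angstroms."""
    return coulomb_const * q1 * q2 / r

def minimize_1d(f, x, step=0.01, iters=2000, h=1e-6):
    """Hypothetical helper: gradient descent on one variable with a numeric derivative."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x

if __name__ == "__main__":
    # Two atoms interacting only through Lennard-Jones: find the best distance.
    r_best = minimize_1d(lambda r: lennard_jones(r, 1.0, 1.0), 1.5)
    print("optimal distance:", r_best)
```

A real protein has thousands of coupled coordinates and a rugged energy surface, which is why the slide calls this approach computationally too expensive in general.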
14
Template-Based Prediction
Structure is better conserved than sequence.
A structure can accommodate a wide range of mutations.
Physical forces favor certain structures.
The number of folds is limited. Currently known: ~700. Estimated total: 1,000 to 10,000.
Example fold: TIM barrel
15
Scope of the Problem
~90% of new globular proteins share similar folds with known structures, implying the general applicability of comparative (template-based) modeling methods for structure prediction (currently 60-70% of new proteins, and this number is growing as more structures are solved).
The NIH Structural Genomics Initiative plans to experimentally solve ~10,000 “unique” structures and predict the rest using computational methods.
16
Homology Modeling
–The query sequence is aligned with the sequence of a known structure (the template), usually sharing sequence identity of 30% or more.
–Superimpose the query sequence onto the template, replacing equivalent side-chain atoms where necessary.
–Refine the model by minimizing an energy function.
Applicable to ~20% of all proteins.
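The 30% identity threshold can be checked once a pairwise alignment is available. The sketch below assumes the two sequences are already aligned to equal length with `-` as the gap character, and it counts identity over columns where neither sequence has a gap. Other definitions of the denominator exist, so the function `percent_identity` and the example strings are illustrative rather than taken from the lecture.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        aligned_columns += 1
        if a == b:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns

if __name__ == "__main__":
    # Hypothetical aligned pair, made up for illustration only.
    query = "MTYKLILNGK-TKGE"
    template = "MTFKLVINGKDTRGE"
    identity = percent_identity(query, template)
    print("identity: %.1f%%" % identity)
    print("homology modeling candidate:", identity >= 30.0)
```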
17
Concept of Threading
–Thread (align or place) a query protein sequence onto a template structure in an “optimal” way
–A good alignment gives an approximate backbone structure
Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Template set
Prediction accuracy: fold recognition / alignment
18
4 Components of Threading
–Template library
–Scoring function
–Alignment
–Confidence assessment
19
Core of a Template
Core secondary structures: α-helices and β-strands
20
Definition of Template
–Residue type / profile
–Secondary structure type
–Solvent accessibility
–Coordinates for Cα / Cβ
RES 1 G 156 S 23 10.528 -13.223 9.932 11.977 -12.741 10.115
RES 5 P 157 H 110 12.622 -17.353 10.577 12.981 -16.146 11.485
RES 5 G 158 H 61 17.186 -15.086 9.205 16.601 -15.457 10.578
RES 5 Y 159 H 91 16.174 -10.939 12.208 16.612 -12.343 12.727
RES 5 C 160 H 8 12.670 -12.752 15.349 14.163 -13.137 15.545
RES 1 G 161 S 14 15.263 -17.741 14.529 15.022 -16.815 15.733
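A record like the `RES` lines above could be read into a small data structure as sketched below. The field layout is an assumption inferred from the slide's bullet list: residue type, residue number, secondary structure type, solvent accessibility, then two coordinate triplets. The slide does not explain the second field, nor which triplet belongs to which atom, so the code keeps neutral names (`code`, `first_xyz`, `second_xyz`) instead of guessing. `TemplateResidue` and `parse_res_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float, float]

@dataclass
class TemplateResidue:
    code: int                   # second field; its meaning is not stated on the slide
    residue_type: str           # one-letter amino acid code
    residue_number: int
    secondary_structure: str    # e.g. "H" or "S" as they appear in the records
    solvent_accessibility: int
    first_xyz: Coord            # first coordinate triplet (atom assignment assumed unknown)
    second_xyz: Coord           # second coordinate triplet

def parse_res_line(line: str) -> TemplateResidue:
    """Parse one whitespace-separated RES record with 12 fields (assumed layout)."""
    fields = line.split()
    if len(fields) != 12 or fields[0] != "RES":
        raise ValueError("expected a 12-field RES record: %r" % line)
    xyz = [float(value) for value in fields[6:12]]
    return TemplateResidue(
        code=int(fields[1]),
        residue_type=fields[2],
        residue_number=int(fields[3]),
        secondary_structure=fields[4],
        solvent_accessibility=int(fields[5]),
        first_xyz=(xyz[0], xyz[1], xyz[2]),
        second_xyz=(xyz[3], xyz[4], xyz[5]),
    )

if __name__ == "__main__":
    record = parse_res_line(
        "RES 5 P 157 H 110 12.622 -17.353 10.577 12.981 -16.146 11.485"
    )
    print(record)
```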
21
Energy (Score) Function
…YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW…
–Singleton energy E_s: how well a residue fits a template position (sequence and structural environment)
–Pairwise energy E_p: how favorable it is to place two particular residues near each other
–Alignment gap penalty: E_g
Total energy: E_p + E_s + E_g
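To make the sum E_p + E_s + E_g concrete, the sketch below evaluates it for one fixed alignment. All scoring rules here are toy assumptions, not the lecture's actual potentials: a residue scores -1 when its hydrophobicity matches the burial state of its template position and +1 otherwise, a contact between two hydrophobic residues scores -1, and each unaligned template position costs a flat `gap_cost`. The alignment is represented as a list that maps each template position to a query index or `None`.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed toy classification

def singleton_energy(residue, buried):
    """Toy E_s term: reward hydrophobic residues in buried positions and vice versa."""
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0

def pairwise_energy(res_a, res_b):
    """Toy E_p term: reward contacts between two hydrophobic residues."""
    return -1.0 if res_a in HYDROPHOBIC and res_b in HYDROPHOBIC else 0.0

def total_energy(query, buried, contacts, alignment, gap_cost=2.0):
    """Return E_p + E_s + E_g for one alignment.

    buried[t]    -- True if template position t is buried
    contacts     -- pairs (i, j) of template positions that are close in 3D
    alignment[t] -- index into query placed at template position t, or None
    """
    e_s = sum(
        singleton_energy(query[q], buried[t])
        for t, q in enumerate(alignment)
        if q is not None
    )
    e_p = sum(
        pairwise_energy(query[alignment[i]], query[alignment[j]])
        for i, j in contacts
        if alignment[i] is not None and alignment[j] is not None
    )
    e_g = gap_cost * sum(1 for q in alignment if q is None)
    return e_p + e_s + e_g

if __name__ == "__main__":
    buried = [True, False, True, False]
    contacts = [(0, 2)]
    print(total_energy("VKLD", buried, contacts, [0, 1, 2, 3]))
    print(total_energy("VKLD", buried, contacts, [0, None, 2, 3]))
```

Lower totals indicate a better fit under this toy scheme; the gapped alignment is penalized relative to the full one.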
22
Threading Problem
Threading: given a sequence and a fold (template), compute the optimal alignment score between the sequence and the fold.
If we can solve the above problem, then
–Given a sequence, we can try each known fold and find the fold that best fits this sequence.
–Because there are only a few thousand folds, we can find the correct fold for the given sequence.
Threading is NP-hard.
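The "try each known fold and keep the best" idea can be sketched as a loop over a template library. The scorer used here is a deliberately simplified stand-in and not the threading problem stated on the slide: it slides the template along the query without gaps and uses only a toy singleton term, which makes it easy to compute but ignores the pairwise and gap terms of the full energy function. Scores are divided by template length so that templates of different sizes are comparable. The names `best_gapless_score` and `recognize_fold` and the tiny template library are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed toy classification

def best_gapless_score(query, buried):
    """Best (lowest) toy singleton score over all gapless placements, or None."""
    n, m = len(query), len(buried)
    if n < m:
        return None  # template longer than query: no gapless placement
    best = None
    for offset in range(n - m + 1):
        score = sum(
            -1.0 if (query[offset + t] in HYDROPHOBIC) == buried[t] else 1.0
            for t in range(m)
        )
        if best is None or score < best:
            best = score
    return best

def recognize_fold(query, templates):
    """Score the query against every template and return (name, score per position)."""
    scored = {}
    for name, buried in templates.items():
        score = best_gapless_score(query, buried)
        if score is not None:
            scored[name] = score / len(buried)
    if not scored:
        return None, None
    best_name = min(scored, key=scored.get)
    return best_name, scored[best_name]

if __name__ == "__main__":
    # Hypothetical two-template library described only by burial patterns.
    templates = {
        "alternating": [True, False, True, False],
        "all_buried": [True, True, True, True],
    }
    print(recognize_fold("VKLD", templates))
```

Because the library is small (a few thousand folds), this outer loop is cheap; the cost lies in computing each optimal alignment score.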
23
Computational Methods
Branch and bound.
Integer programming.
–Use linear programming plus branch and bound.
24
Figure labels: ab initio, threading, homology
25
Blue Gene
On December 6, 1999, IBM announced a $100 million research initiative to build the world's fastest supercomputer, "Blue Gene", to tackle fundamental problems in computational biology.
More than one petaflop/s (1,000,000,000,000,000 floating point operations per second)