Lecture 10 – protein structure prediction. A protein sequence.


1 Lecture 10 – protein structure prediction

2 A protein sequence

3 >gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana] MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM RDHVHGKATNHEDLTCATEEARIISWENLQKAKAEAAIRKLEKYFPQMKLEKKRSSSMEKIMRKVKSAEKRAEEMRRSVL DNRVSTASHGKASSFKRSGKKKIPSLSGCFTCHVF

4 Protein Structure Heparin docking – Red: heparin; blue: central domain Yellow: C-terminal domain

5 A Protein Structure alpha-helix beta-sheet loop core

6 Domains and Folds A domain is a discrete portion of a protein assumed to fold independently of the rest of the protein and to possess its own function. Most proteins have multiple domains. The core 3D structure of a domain is called a fold. There are only a few thousand possible folds.

7 Protein Similarity Levels Family: proteins in the same family are homologous at the sequence level. Superfamily: all members of a superfamily should have the same overall domain architecture, i.e., the same domains in the same order. Fold: the folds of two domains are similar.

8 Protein Folding Problem A protein folds into a unique 3D structure under physiological conditions. Lysozyme sequence: KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL

9 Relevance of Protein Structure in the Post-Genome Era sequence structure function medicine

10 Structure-Function Relationship A certain level of function can be inferred without a structure, but a structure is key to understanding the detailed mechanism. A predicted structure is a powerful tool for function inference. Example: the Trp repressor as a functional switch.

11 Structure-Based Drug Design HIV protease inhibitor Structure-based rational drug design is still a major method for drug discovery.

12 Protein Structure Prediction Traditional experimental methods (X-ray crystallography or NMR) solve only a few structures per day worldwide and cannot keep pace with new protein sequences. Strong demand for structure prediction: more than 30,000 human genes; 10,000 genomes will be sequenced in the next 10 years. Still an unsolved problem after two decades of effort.

13 Ab initio Structure Prediction Define an energy function to describe the protein: bond energy, bond angle energy, dihedral angle energy, van der Waals energy, electrostatic energy. Minimize the function to obtain the structure. Not practical in general: computationally too expensive, and accuracy is poor.
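The "define an energy, then minimize it" idea can be illustrated with a toy model. The sketch below is not the lecture's method: it treats the protein as a chain of beads, keeps only two of the listed terms (a harmonic bond term and a Lennard-Jones van der Waals term), and uses made-up parameters (`k`, `r0`, `eps`, `sigma`) and a naive random-perturbation search. All function names are hypothetical.

```python
import math
import random

def bond_energy(coords, k=100.0, r0=3.8):
    """Harmonic penalty when consecutive beads deviate from length r0 (assumed value)."""
    return sum(k * (math.dist(a, b) - r0) ** 2 for a, b in zip(coords, coords[1:]))

def vdw_energy(coords, eps=1.0, sigma=4.0):
    """Lennard-Jones 12-6 term over non-bonded bead pairs (|i - j| >= 2)."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4 * eps * (sr6 * sr6 - sr6)
    return energy

def total_energy(coords):
    """Toy energy: only the bond and van der Waals terms from the slide's list."""
    return bond_energy(coords) + vdw_energy(coords)

def minimize(coords, steps=2000, step_size=0.2, seed=0):
    """Greedy random search: perturb one bead, keep the move only if energy drops."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    best = total_energy(coords)
    for _ in range(steps):
        i = rng.randrange(len(coords))
        trial = list(coords)
        trial[i] = tuple(x + rng.uniform(-step_size, step_size) for x in coords[i])
        e = total_energy(trial)
        if e < best:
            coords, best = trial, e
    return coords, best

start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 1.0, 0.0), (9.0, 0.0, 0.0)]
relaxed, energy = minimize(start)
print(total_energy(start), energy)
```

Even in this four-bead toy the pairwise term costs O(n^2) per energy evaluation, and the search only finds a nearby local minimum, which hints at why the slide calls the approach expensive and inaccurate for real proteins.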

14 Template-Based Prediction Structure is better conserved than sequence. A structure can accommodate a wide range of mutations. Physical forces favor certain structures. The number of folds is limited: currently ~700 known, estimated total 1,000 to 10,000. Example fold: TIM barrel.

15 Scope of the Problem ~90% of new globular proteins share similar folds with known structures, implying the general applicability of comparative modeling methods for structure prediction. Template-based modeling methods are generally applicable (currently to 60-70% of new proteins, a number that grows as more structures are solved). The NIH Structural Genomics Initiative plans to experimentally solve ~10,000 “unique” structures and predict the rest using computational methods.

16 Homology Modeling The query sequence is aligned with the sequence of a known structure (the template), usually sharing sequence identity of 30% or more. Superimpose the query sequence onto the template, replacing equivalent sidechain atoms where necessary. Refine the model by minimizing an energy function. Applicable to ~20% of all proteins.
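The 30% identity threshold can be checked directly from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with `-` as the gap character. The function name is hypothetical, and identity is computed over columns where both sequences have a residue, which is only one of several conventions in use.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip gapped columns
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

# Toy alignment (made-up sequences): 3 identical residues out of 4 aligned columns.
identity = percent_identity("ACDE-", "ACDFG")
print(identity)                      # 75.0
print(identity >= 30.0)              # candidate for homology modeling per the slide
```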

17 Concept of Threading Thread (align or place) a query protein sequence onto a template structure in an “optimal” way. A good alignment gives an approximate backbone structure. Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE Template set Prediction accuracy: fold recognition / alignment

18 Four Components of Threading: template library, scoring function, alignment, confidence assessment

19 Core of a Template Core secondary structures: α-helices and β-strands

20 Definition of Template Residue type / profile; secondary structure type; solvent accessibility; coordinates for Cα / Cβ. Example records:
RES 1 G 156 S 23 10.528 -13.223 9.932 11.977 -12.741 10.115
RES 5 P 157 H 110 12.622 -17.353 10.577 12.981 -16.146 11.485
RES 5 G 158 H 61 17.186 -15.086 9.205 16.601 -15.457 10.578
RES 5 Y 159 H 91 16.174 -10.939 12.208 16.612 -12.343 12.727
RES 5 C 160 H 8 12.670 -12.752 15.349 14.163 -13.137 15.545
RES 1 G 161 S 14 15.263 -17.741 14.529 15.022 -16.815 15.733

21 Energy (Score) Function …YKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEW… Singleton energy E_s: how well a residue fits a template position (sequence and structural environment). Pairwise energy E_p: how favorable it is to place two particular residues near each other. Alignment gap penalty: E_g. Total energy: E_p + E_s + E_g
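The total energy E_p + E_s + E_g can be evaluated for a given alignment as in the sketch below. The data layout is an assumption made for illustration: the alignment is a dict from template position to query index, `contacts` lists template position pairs that are close in space, and `singleton` and `pairwise` are caller-supplied scoring functions. None of these names come from the lecture, and a flat per-gap penalty is assumed.

```python
def threading_energy(query, alignment, singleton, pairwise, contacts,
                     gap_penalty, n_gaps):
    """Return E_p + E_s + E_g for one query-to-template alignment.

    query     : query sequence (string of residue letters)
    alignment : dict mapping template position -> query index
    singleton : function (residue, template_pos) -> float
    pairwise  : function (residue_a, residue_b) -> float
    contacts  : iterable of (pos_i, pos_j) template positions near each other
    """
    # Singleton term: fit of each aligned residue to its template position.
    e_s = sum(singleton(query[q], t) for t, q in alignment.items())
    # Pairwise term: only contacts whose two positions are both aligned.
    e_p = sum(pairwise(query[alignment[i]], query[alignment[j]])
              for i, j in contacts
              if i in alignment and j in alignment)
    # Gap term: flat penalty per gap (an assumed, simplified gap model).
    e_g = gap_penalty * n_gaps
    return e_p + e_s + e_g

# Toy scoring functions with made-up values.
toy_singleton = lambda res, pos: -1.0 if res == "A" else 0.5
toy_pairwise = lambda a, b: -2.0 if {a, b} == {"A", "L"} else 0.0

energy = threading_energy("AKL", {0: 0, 1: 1, 2: 2}, toy_singleton,
                          toy_pairwise, contacts=[(0, 2)],
                          gap_penalty=1.5, n_gaps=2)
print(energy)  # 1.0: E_s = 0.0, E_p = -2.0, E_g = 3.0
```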

22 Threading Problem Threading: given a sequence and a fold (template), compute the optimal alignment score between the sequence and the fold. If we can solve this problem, then, given a sequence, we can try each known fold and find the fold that best fits the sequence. Because there are only a few thousand folds, we can find the correct fold for the given sequence. Threading is NP-hard.
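The "try each known fold and keep the best" loop can be sketched under a strong simplification: if the pairwise term is dropped, the remaining singleton-plus-gap score can be minimized by ordinary dynamic programming. This is not the full threading problem from the slide, only a tractable stand-in. The template encoding (a string of `H`/`E`/`C` environment codes), the scoring function, and all names are hypothetical.

```python
def align_score(query, template, singleton, gap=1.0):
    """Minimum E_s + E_g over global alignments (pairwise term ignored)."""
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + singleton(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,   # query residue left unaligned
                dp[i][j - 1] + gap,   # template position left unaligned
            )
    return dp[n][m]

def best_fold(query, library, singleton, gap=1.0):
    """Score the query against every template and return the lowest-energy one."""
    scores = {name: align_score(query, tpl, singleton, gap)
              for name, tpl in library.items()}
    return min(scores, key=scores.get), scores

def toy_singleton(residue, env):
    """Made-up preferences: some residues favor helix (H), others strand (E)."""
    if env == "H" and residue in "AELM":
        return -1.0
    if env == "E" and residue in "VIY":
        return -1.0
    return 0.0

library = {"helix_then_strand": "HHHEEE", "all_strand": "EEEEEE"}
name, scores = best_fold("AELVIY", library, toy_singleton)
print(name, scores)
```

With pairwise energies included, the score of one aligned position depends on where other residues land, which breaks the independence this recurrence relies on.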

23 Computational Methods Branch and bound. Integer programming: use linear programming plus branch and bound.

24 ab initio threading homology

25 Blue Gene On December 6, 1999, IBM announced a $100 million research initiative to build the world's fastest supercomputer, "Blue Gene", to tackle fundamental problems in computational biology. More than one petaflop/s (1,000,000,000,000,000 floating point operations per second)
