Presentation is loading. Please wait.

Presentation is loading. Please wait.

Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson.

Similar presentations


Presentation on theme: "Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson."— Presentation transcript:

1 Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

2 Protein Folding 1.Introduction 2.Protein Structure 3.Interactions 4.Protein Folding Models 5.Biomolecular Modelling 6.Bioinformatics

3

4 Homology Modelling Based on the observation that “Similar sequences exhibit similar structures” Known structure is used as a template to model an unknown (but likely similar) structure with known sequence First applied in late 1970’s using early computer imaging methods (Tom Blundell)

5 Homology Modelling Offers a method to “Predict” the 3D structure of proteins for which it is not possible to obtain X-ray or NMR data Can be used in understanding function, activity, specificity, etc. Of interest to drug companies wishing to do structure-aided drug design

6 Terminology target is sequence of protein with unknown 3D structure template is sequence of protein already with known 3D structure (modelled experimentally usually) alignment of sequence is rearrangement of subsets (of a sequence) to find maximum similarities rigid bodies conserved  -helices &  -sheets sidechains: decorative stuff for backbone (v. imp. for interactions), actually residues connected to rigid bodies. loops connecting to the core-regions (  -helices &  -sheets) spatial restraints (bond length, bond angles etc.) PDB Protein Data Bank

7 Homology Modelling Identify homologous sequences in PDB Align query sequence with homologues Find Structurally Conserved Regions (SCRs) Identify Structurally Variable Regions (SVRs) Generate coordinates for core region Generate coordinates for loops Add side chains (Check rotamer library) Refine structure using energy minimization Validate structure

8 Step 1: ID Homologues in PDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK Query SequencePDB

9 Step 1: ID Homologues in PDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK Query SequencePDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGPRTEINSEQENCE PRTEINSEQUENCEPRTEINSEQNCE QWERYTRASDFHGTREWQIYPASDFG TREWQIYPASDFGPRTEINSEQENCE PRTEINSEQUENCEPRTEINSEQNCE QWERYTRASDFHGTREWQ PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQQWEWEWQWEWEQWEWE WQRYEYEWQWNCEQWERYTRASDFHG TREWQIYPASDWERWEREWRFDSFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGPRTEINSEQENC PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQQWEWEWQWEWEQWEWE WQRYEYEWQWNCEQWERYTRASDFHG TR Hit #1 Hit #2

10 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G60403020 10 0 E405030 2010 0 N30 4020 10 0 E203020302010 0 S20 10 I 20100 S0000000 Dynamic Programming Identity MatrixSum Matrix

11 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming Identity MatrixSum Matrix

12 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix

13 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix

14 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix

15 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 20 Identity MatrixSum Matrix

16 20 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix 10

17 20 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix 10

18 Step 2: Align Sequences ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA Query Hit #1 Hit #2 Hit #1Hit #2

19 Alignment Key step in Homology Modelling Global (Needleman-Wunsch) alignment is absolutely required Small error in alignment can lead to big error in structural model Multiple alignments are usually better than pairwise alignments

20 Alignment Thresholds

21 Hit #1Hit #2 Step 3: Find SCR’s ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB Query Hit #1 Hit #2 SCR #1 SCR #2

22 Structurally Conserved Regions (SCR’s) Correspond to the most stable regions (usually interior) of protein Correspond to sequence regions with lowest level of gapping, highest level of sequence conservation Usually correspond to secondary structures

23 Hit #1Hit #2 Step 4: Find SVR’s ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB Query Hit #1 Hit #2 SVR (loop)

24 Structurally Variable Regions (SVR’s) Correspond to the least stable or most flexible regions (usually exterior) of protein Correspond to sequence regions with highest level of gapping, lowest level of sequence conservation Usually correspond to loops and turns

25 ATOM 1 N ALA A 1 21.389 25.406 -4.628 1.00 23.22 2TRX 152 ATOM 2 CA ALA A 1 21.628 26.691 -3.983 1.00 24.42 2TRX 153 ATOM 3 C ALA A 1 20.937 26.944 -2.679 1.00 24.21 2TRX 154 ATOM 4 O ALA A 1 21.072 28.079 -2.093 1.00 24.97 2TRX 155 ATOM 5 CB ALA A 1 21.117 27.770 -5.002 1.00 28.27 2TRX 156 ATOM 6 OG SER A 1 22.276 27.925 -5.861 1.00 32.61 2TRX 157 ATOM 7 N GLU A 2 20.173 26.028 -2.163 1.00 21.39 2TRX 158 ATOM 8 CA GLU A 2 19.395 26.125 -0.949 1.00 21.57 2TRX 159 ATOM 9 C GLU A 2 20.264 26.214 0.297 1.00 20.89 2TRX 160 ATOM 10 O GLU A 2 19.760 26.575 1.371 1.00 21.49 2TRX 161 Step 5: Generate Coordinates ATOM 1 N SER A 1 21.389 25.406 -4.628 1.00 23.22 2TRX 152 ATOM 2 CA SER A 1 21.628 26.691 -3.983 1.00 24.42 2TRX 153 ATOM 3 C SER A 1 20.937 26.944 -2.679 1.00 24.21 2TRX 154 ATOM 4 O SER A 1 21.072 28.079 -2.093 1.00 24.97 2TRX 155 ATOM 5 CB SER A 1 21.117 27.770 -5.002 1.00 28.27 2TRX 156 ATOM 6 OG SER A 1 22.276 27.925 -5.861 1.00 32.61 2TRX 157 ATOM 7 N ASP A 2 20.173 26.028 -2.163 1.00 21.39 2TRX 158 ATOM 8 CA ASP A 2 19.395 26.125 -0.949 1.00 21.57 2TRX 159 ATOM 9 C ASP A 2 20.264 26.214 0.297 1.00 20.89 2TRX 160 ATOM 10 O ASP A 2 19.760 26.575 1.371 1.00 21.49 2TRX 161 ALA

26 Step 5: Generate Core Coordinates For identical amino acids, transfer all atom coordinates (XYZ) to query protein For similar amino acids, transfer backbone coordinates & replace side chain atoms while respecting  angles For different amino acids, transfer only the backbone coordinates (XYZ) to query sequence

27 Step 6: Replace SVRs (loops) FGHQWERTFGHQWERT YAYE--KS Query Hit #1

28 Loop Library Loops extracted from PDB using high resolution (<2 Å) X-ray structures Typically thousands of loops in DB Includes loop coordinates, sequence, # residues in loop, C  -C  distance, preceding 2 o structure and following 2 o structure (or their C  coordinates)

29 Step 6: Replace SVRs (loops) Must match desired # residues Must match C  -C  distance (<0.5 Å) Must not bump into other parts of protein (no C  -C  distance <3.0 Å) Preceding and following C  ’s (3 residues) from loop should match well with corresponding C  coordinates in template structure

30 Step 6: Replace SVRs (loops) Loop placement and positioning is done using superposition algorithm Loop fits are evaluated using RMSD calculations and standard “bump checking” If no “good” loop is found, some algorithms create loops using randomly generated  angles

31 Step 7: Add Side Chains

32 C COOHH2NH2N H NH 3 + Amino Acid Side Chains

33 Step 7: Add Side Chains Done primarily for SVRs (not SCRs) Rotamer placement and positioning is done via a superposition algorithm using rotamers taken from a standardized library (Trial & Error) Rotamer fits are evaluated using simple “bump checking” methods

34 Step 8: Energy Minimization

35 Energy Minimization Removes atomic overlaps and unnatural strains in the structure Stabilizes or reinforces strong hydrogen bonds, breaks weak ones Brings protein to lowest energy in about 1-2 minutes CPU time

36 Actual Modelled The Final Result

37 Summary Identify homologous sequences in PDB Align query sequence with homologues Find Structurally Conserved Regions (SCRs) Identify Structurally Variable Regions (SVRs) Generate coordinates for core region Generate coordinates for loops Add side chains (Check rotamer library) Refine structure using energy minimization Validate structure

38 (a)Sketch the chemical structure corresponding to the atoms listed in the excerpt from the Protein Data Bank (PDB) file below. Add the missing side chain atoms for Ala. Indicate the position of the peptide bond on your sketch. ATOM 4 NSER A 1 37.713 48.836 48.018 ATOM 2 CA SER A 1 38.297 49.795 48.973 ATOM 3 C SER A 1 37.542 49.799 50.268 ATOM 4 O SER A 1 38.112 49.890 51.357 ATOM 5 CB SER A 1 38.285 51.207 48.322 ATOM 6 OG SER A 1 39.140 51.101 47.211 ATOM 7 N ALA A 2 36.185 49.706 50.220 ATOM 8 CA ALA A 2 35.421 49.702 51.450 ATOM 9 C ALA A 2 35.725 48.497 52.368 ATOM 10 O ALA A 2 35.884 48.638 53.572 ATOM 11 CB ALA A 2 33.910 49.733 51.233 (b)Indicate on your sketch in part (a) the N-terminus and C-terminus, and the main- chain dihedral angles, , , and . (c)Place the following main-chain bond lengths in order of decreasing length (longest first): C α -C, C-O, C-N and N-C α. Explain the physical reason for this ordering. (d)Sketch a graph showing the variation of energy versus separation of two particles for a van der Waals interaction.


Download ppt "Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson."

Similar presentations


Ads by Google