Download presentation
Presentation is loading. Please wait.
Published byVirginia Smith Modified over 9 years ago
1
Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson
2
Protein Folding 1.Introduction 2.Protein Structure 3.Interactions 4.Protein Folding Models 5.Biomolecular Modelling 6.Bioinformatics
4
Homology Modelling Based on the observation that “Similar sequences exhibit similar structures” Known structure is used as a template to model an unknown (but likely similar) structure with known sequence First applied in late 1970’s using early computer imaging methods (Tom Blundell)
5
Homology Modelling Offers a method to “Predict” the 3D structure of proteins for which it is not possible to obtain X-ray or NMR data Can be used in understanding function, activity, specificity, etc. Of interest to drug companies wishing to do structure-aided drug design
6
Terminology target is sequence of protein with unknown 3D structure template is sequence of protein already with known 3D structure (modelled experimentally usually) alignment of sequence is rearrangement of subsets (of a sequence) to find maximum similarities rigid bodies conserved -helices & -sheets sidechains: decorative stuff for backbone (v. imp. for interactions), actually residues connected to rigid bodies. loops connecting to the core-regions ( -helices & -sheets) spatial restraints (bond length, bond angles etc.) PDB Protein Data Bank
7
Homology Modelling Identify homologous sequences in PDB Align query sequence with homologues Find Structurally Conserved Regions (SCRs) Identify Structurally Variable Regions (SVRs) Generate coordinates for core region Generate coordinates for loops Add side chains (Check rotamer library) Refine structure using energy minimization Validate structure
8
Step 1: ID Homologues in PDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK Query SequencePDB
9
Step 1: ID Homologues in PDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK Query SequencePDB PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGPRTEINSEQENCE PRTEINSEQUENCEPRTEINSEQNCE QWERYTRASDFHGTREWQIYPASDFG TREWQIYPASDFGPRTEINSEQENCE PRTEINSEQUENCEPRTEINSEQNCE QWERYTRASDFHGTREWQ PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQQWEWEWQWEWEQWEWE WQRYEYEWQWNCEQWERYTRASDFHG TREWQIYPASDWERWEREWRFDSFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGHKLMCNASQERWW PRETWQLKHGFDSADAMNCVCNQWER GFDHSDASFWERQWK PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFG PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQNCEQWERYTRASDFHG TREWQIYPASDFGPRTEINSEQENC PRTEINSEQENCEPRTEINSEQUENC EPRTEINSEQQWEWEWQWEWEQWEWE WQRYEYEWQWNCEQWERYTRASDFHG TR Hit #1 Hit #2
10
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G60403020 10 0 E405030 2010 0 N30 4020 10 0 E203020302010 0 S20 10 I 20100 S0000000 Dynamic Programming Identity MatrixSum Matrix
11
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming Identity MatrixSum Matrix
12
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix
13
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix
14
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix
15
Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 20 Identity MatrixSum Matrix
16
20 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix 10
17
20 Step 2: Align Sequences GENETICS G100000000 E0 0 0000 N00 00000 E0 0 0000 S0000000 I00000 00 S0000000 GENETICS G0 E0 N0 E0 S I0 S0000000 Dynamic Programming 10 Identity MatrixSum Matrix 10
18
Step 2: Align Sequences ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA Query Hit #1 Hit #2 Hit #1Hit #2
19
Alignment Key step in Homology Modelling Global (Needleman-Wunsch) alignment is absolutely required Small error in alignment can lead to big error in structural model Multiple alignments are usually better than pairwise alignments
20
Alignment Thresholds
21
Hit #1Hit #2 Step 3: Find SCR’s ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB Query Hit #1 Hit #2 SCR #1 SCR #2
22
Structurally Conserved Regions (SCR’s) Correspond to the most stable regions (usually interior) of protein Correspond to sequence regions with lowest level of gapping, highest level of sequence conservation Usually correspond to secondary structures
23
Hit #1Hit #2 Step 4: Find SVR’s ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB Query Hit #1 Hit #2 SVR (loop)
24
Structurally Variable Regions (SVR’s) Correspond to the least stable or most flexible regions (usually exterior) of protein Correspond to sequence regions with highest level of gapping, lowest level of sequence conservation Usually correspond to loops and turns
25
ATOM 1 N ALA A 1 21.389 25.406 -4.628 1.00 23.22 2TRX 152 ATOM 2 CA ALA A 1 21.628 26.691 -3.983 1.00 24.42 2TRX 153 ATOM 3 C ALA A 1 20.937 26.944 -2.679 1.00 24.21 2TRX 154 ATOM 4 O ALA A 1 21.072 28.079 -2.093 1.00 24.97 2TRX 155 ATOM 5 CB ALA A 1 21.117 27.770 -5.002 1.00 28.27 2TRX 156 ATOM 6 OG SER A 1 22.276 27.925 -5.861 1.00 32.61 2TRX 157 ATOM 7 N GLU A 2 20.173 26.028 -2.163 1.00 21.39 2TRX 158 ATOM 8 CA GLU A 2 19.395 26.125 -0.949 1.00 21.57 2TRX 159 ATOM 9 C GLU A 2 20.264 26.214 0.297 1.00 20.89 2TRX 160 ATOM 10 O GLU A 2 19.760 26.575 1.371 1.00 21.49 2TRX 161 Step 5: Generate Coordinates ATOM 1 N SER A 1 21.389 25.406 -4.628 1.00 23.22 2TRX 152 ATOM 2 CA SER A 1 21.628 26.691 -3.983 1.00 24.42 2TRX 153 ATOM 3 C SER A 1 20.937 26.944 -2.679 1.00 24.21 2TRX 154 ATOM 4 O SER A 1 21.072 28.079 -2.093 1.00 24.97 2TRX 155 ATOM 5 CB SER A 1 21.117 27.770 -5.002 1.00 28.27 2TRX 156 ATOM 6 OG SER A 1 22.276 27.925 -5.861 1.00 32.61 2TRX 157 ATOM 7 N ASP A 2 20.173 26.028 -2.163 1.00 21.39 2TRX 158 ATOM 8 CA ASP A 2 19.395 26.125 -0.949 1.00 21.57 2TRX 159 ATOM 9 C ASP A 2 20.264 26.214 0.297 1.00 20.89 2TRX 160 ATOM 10 O ASP A 2 19.760 26.575 1.371 1.00 21.49 2TRX 161 ALA
26
Step 5: Generate Core Coordinates For identical amino acids, transfer all atom coordinates (XYZ) to query protein For similar amino acids, transfer backbone coordinates & replace side chain atoms while respecting angles For different amino acids, transfer only the backbone coordinates (XYZ) to query sequence
27
Step 6: Replace SVRs (loops) FGHQWERTFGHQWERT YAYE--KS Query Hit #1
28
Loop Library Loops extracted from PDB using high resolution (<2 Å) X-ray structures Typically thousands of loops in DB Includes loop coordinates, sequence, # residues in loop, C -C distance, preceding 2 o structure and following 2 o structure (or their C coordinates)
29
Step 6: Replace SVRs (loops) Must match desired # residues Must match C -C distance (<0.5 Å) Must not bump into other parts of protein (no C -C distance <3.0 Å) Preceding and following C ’s (3 residues) from loop should match well with corresponding C coordinates in template structure
30
Step 6: Replace SVRs (loops) Loop placement and positioning is done using superposition algorithm Loop fits are evaluated using RMSD calculations and standard “bump checking” If no “good” loop is found, some algorithms create loops using randomly generated angles
31
Step 7: Add Side Chains
32
C COOHH2NH2N H NH 3 + Amino Acid Side Chains
33
Step 7: Add Side Chains Done primarily for SVRs (not SCRs) Rotamer placement and positioning is done via a superposition algorithm using rotamers taken from a standardized library (Trial & Error) Rotamer fits are evaluated using simple “bump checking” methods
34
Step 8: Energy Minimization
35
Energy Minimization Removes atomic overlaps and unnatural strains in the structure Stabilizes or reinforces strong hydrogen bonds, breaks weak ones Brings protein to lowest energy in about 1-2 minutes CPU time
36
Actual Modelled The Final Result
37
Summary Identify homologous sequences in PDB Align query sequence with homologues Find Structurally Conserved Regions (SCRs) Identify Structurally Variable Regions (SVRs) Generate coordinates for core region Generate coordinates for loops Add side chains (Check rotamer library) Refine structure using energy minimization Validate structure
38
(a)Sketch the chemical structure corresponding to the atoms listed in the excerpt from the Protein Data Bank (PDB) file below. Add the missing side chain atoms for Ala. Indicate the position of the peptide bond on your sketch. ATOM 4 NSER A 1 37.713 48.836 48.018 ATOM 2 CA SER A 1 38.297 49.795 48.973 ATOM 3 C SER A 1 37.542 49.799 50.268 ATOM 4 O SER A 1 38.112 49.890 51.357 ATOM 5 CB SER A 1 38.285 51.207 48.322 ATOM 6 OG SER A 1 39.140 51.101 47.211 ATOM 7 N ALA A 2 36.185 49.706 50.220 ATOM 8 CA ALA A 2 35.421 49.702 51.450 ATOM 9 C ALA A 2 35.725 48.497 52.368 ATOM 10 O ALA A 2 35.884 48.638 53.572 ATOM 11 CB ALA A 2 33.910 49.733 51.233 (b)Indicate on your sketch in part (a) the N-terminus and C-terminus, and the main- chain dihedral angles, , , and . (c)Place the following main-chain bond lengths in order of decreasing length (longest first): C α -C, C-O, C-N and N-C α. Explain the physical reason for this ordering. (d)Sketch a graph showing the variation of energy versus separation of two particles for a van der Waals interaction.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.