1
Protein Structure Space
Patrice Koehl
Computer Science and Genome Center
http://www.cs.ucdavis.edu/~koehl/
2
From Sequence to Function
Sequence: KKAVINGEQIRSISDLHQTLKK WELALPEYYGENLDALWDCLTG VEYPLVLEWRQFEQSKQLTENG AESVLQVFREAKAEGCDITI
Sequence → Structure → Function (ligand)
3
Protein Structure Space
1CTF: 68 AA
1TIM: 247 AA
1K3R: 268 AA
1A1O: 384 AA
1NIK: 4504 AA
1AON: 8337 AA
4
Outline Protein Shape Descriptors Differential Geometry Tools Classifying Proteins The Shapes of Protein Structures Protein Structure Space Dimension? Complexity of Protein Structures Are Proteins 3D, or 1D objects?
6
Classification of Protein Structure: CATH
C (Class): Alpha, Mixed Alpha Beta
A (Architecture): Sandwich, Barrel, Super Roll
T (Topology): TIM Barrel, Other Barrel
7
Protein Structure Space
Test set: 2,930 proteins out of 23,000 proteins in PDB
No sequence similarity (Fasta E-value < e-4)
Reference structural similarity defined from CATH: 769 folds
104,000 pairs of similar structures out of 4,600,000 pairs
Performance measure: ROC curve (Receiver Operating Characteristic)
8
Projecting Protein Structure Space
Distance Matrix → Metric Matrix → Points in Space (coordinates X)
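The chain from distance matrix to metric matrix to points in space reads like classical multidimensional scaling. The sketch below shows that technique under this assumption; the slides do not say which variant was used, and `classical_mds` is a hypothetical name introduced here for illustration.

```python
import numpy as np

def classical_mds(dist, k):
    """Embed n objects in k dimensions from an n x n distance matrix.

    Hypothetical helper: a textbook classical MDS, not the author's code.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering turns squared distances into a metric (Gram) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # eigh returns eigenvalues in ascending order; keep the k largest.
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:k]
    evals, evecs = evals[order], evecs[:, order]
    # Negative eigenvalues signal non-Euclidean distances; clip them to zero.
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return coords, evals
```

The returned eigenvalues indicate how many coordinates are needed: a sharp drop after the first few suggests a low-dimensional picture is adequate.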
9
Projecting Protein Structure Space
[Figure: projections onto k coordinates at the Class and Fold levels]
10
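Protein Structure Similarity
Root mean square distance (cRMS):
$\mathrm{cRMS}(A, B) = \min_{R, T} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert a_i - (R\, b_i + T) \rVert^2}$
a_i, b_i: positions of the i-th pair of equivalent atoms in A and B
N: number of equivalent atoms between A and B
R, T: rigid transformation that minimizes cRMS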
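A minimal sketch of how cRMS can be computed once the equivalent atoms are known, assuming the standard Kabsch (SVD-based) superposition. The slides do not name the algorithm, so this is an illustration rather than the author's implementation; `crms` is a hypothetical function name.

```python
import numpy as np

def crms(a, b):
    """cRMS between two N x 3 arrays of equivalent atoms after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # The optimal translation superimposes the two centroids.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Kabsch: the SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    u, s, vt = np.linalg.svd(b0.T @ a0)
    # If the best orthogonal map is a reflection, flip the smallest singular value.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    # Minimal sum of squared deviations, obtained without forming the rotation.
    e0 = (a0 ** 2).sum() + (b0 ** 2).sum()
    msd = max(e0 - 2.0 * s.sum(), 0.0) / len(a)
    return np.sqrt(msd)
```

The function assumes the correspondence between atoms is already fixed; finding that correspondence is the alignment problem that a tool such as Structal addresses.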
11
Eigenvalues of the Metric Matrix: Protein Structure Classes Measure of Structure Similarity: cRMS after Optimal Superposition (Structal)
12
A Picture of the Protein Structure Space
α Proteins; β Proteins; α and β Proteins
13
A Picture of the Protein Structure Space
α Proteins; β Proteins; α and β Proteins
Labeled domains: 1a81G2, 1sfcK0, 1repC2, 1bdo00, 2bi6H0
14
Outline Protein Shape Descriptors Differential Geometry Tools Classifying Proteins The Shapes of Protein Structures Protein Structure Space Dimension? Complexity of Protein Structures Are Proteins 3D, or 1D objects?
15
Protein Fold Space ROC Analysis (Receiver Operating Characteristic)
[Plot: rate of true positives (%) and rate of true negatives (%), each from 0 to 100]
Random measure: Area = 0.5
“Perfect” measure: Area = 1.0
16
Protein Fold Space ROC Analysis (Receiver Operating Characteristic)
True positives: pairs of proteins that belong to the same T class of CATH
True negatives: pairs of proteins that belong to the same C class, but not the same T class
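With positives and negatives defined this way, the area under the ROC curve for any similarity score can be sketched as below. The rank-based formulation and the name `roc_auc` are assumptions made for illustration; the slides do not describe how the areas were computed.

```python
import numpy as np

def roc_auc(scores, is_positive):
    """Area under the ROC curve; higher scores should mean 'more similar'."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = scores[is_positive]
    neg = scores[~is_positive]
    # AUC equals the probability that a random positive pair outranks
    # a random negative pair, counting ties as one half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For a distance such as cRMS, where smaller means more similar, pass the negated distance as the score. The all-pairs comparison is quadratic in memory, so a sort-based version would be needed for millions of pairs.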
17
Protein Fold Space
[ROC plot: rate of true positives (%) and rate of true negatives (%)]
Area under the curve: Fasta: 0.54; CATH Class: 0.51; CATH Fold20: 0.98
Fold20: first 20 coordinates derived from the CATH fold matrix
CATH class: first 3 coordinates derived from the CATH class matrix
18
Protein Fold Space
[ROC plot: rate of true positives (%) and rate of true negatives (%)]
Area under the curve: Fasta: 0.54; Structal: 0.88
19
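Protein Structure Features
R(x, y, z): radius of the circle through three points x, y, z of the curve
Global radius of curvature: $\rho_G(x) = \min_{y, z} R(x, y, z)$
Thickness: $\Delta = \min_{x} \rho_G(x)$
(Gonzalez & Maddocks, PNAS, 1999, 96:4769)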
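A discrete sketch of these two quantities for a backbone given as a list of points (for example Cα positions), assuming the Gonzalez and Maddocks definitions: the global radius of curvature at a point is the smallest circumradius over all triples containing it, and the thickness is the smallest such value over the curve. The function names are hypothetical and the brute-force loop is for clarity only.

```python
import numpy as np
from itertools import combinations

def circumradius(x, y, z):
    """Radius of the circle through three points (infinite if collinear)."""
    a = np.linalg.norm(y - z)
    b = np.linalg.norm(x - z)
    c = np.linalg.norm(x - y)
    cross = np.linalg.norm(np.cross(y - x, z - x))  # twice the triangle area
    return np.inf if cross == 0.0 else a * b * c / (2.0 * cross)

def global_radius_of_curvature(points):
    """Smallest circumradius over all triples that contain each point."""
    points = np.asarray(points, dtype=float)
    rho = np.full(len(points), np.inf)
    for i, j, k in combinations(range(len(points)), 3):
        r = circumradius(points[i], points[j], points[k])
        for idx in (i, j, k):
            rho[idx] = min(rho[idx], r)
    return rho

def thickness(points):
    """Minimum of the global radius of curvature along the curve."""
    return global_radius_of_curvature(points).min()
```

The loop visits every triple, so the cost grows as the cube of the number of points; that is acceptable for a single chain but would need pruning for large assemblies.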
20
Thickness of a protein structure = 2.60 Å
21
Curvature Feature Vector
22
Performance of the Curvature Feature Vector
Curvature vector performs better than Fasta.
Needs more features to match Structal.
[ROC plot: rate of true positives (%) and rate of true negatives (%)]
Area under the curve: Fasta: 0.54; C5: 0.65; Structal: 0.88
23
Protein Structure Features: Writhing
Fain and Røgen, PNAS, 100: 119 (2003)
Writhing Number
[Figure: sign of crossing (+ or -) between the curve points at parameters t1 and t2]
Writhe Feature Vector for Each Protein
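For reference, the writhing number of a space curve is usually written as the Gauss double integral below, which averages the signed crossings over all viewing directions. The symbol $\gamma$ for the curve is an assumption introduced here; the slide only shows the parameters t1 and t2.

```latex
\mathrm{Wr}(\gamma) = \frac{1}{4\pi} \int\!\!\int
  \frac{\bigl(\gamma'(t_1) \times \gamma'(t_2)\bigr) \cdot \bigl(\gamma(t_1) - \gamma(t_2)\bigr)}
       {\lvert \gamma(t_1) - \gamma(t_2) \rvert^{3}} \, dt_1 \, dt_2
```

For an open curve such as a protein backbone the integral is still well defined, although it is no longer a topological invariant.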
24
Protein Structure Features: Writhing
Writhe (W10) performs better than Curvature (C5)
[ROC plot: rate of true positives (%) and rate of true negatives (%)]
Area under the curve: Fasta: 0.54; C5: 0.65; W10: 0.77; Structal: 0.88
25
Outline Protein Shape Descriptors Differential Geometry Tools Classifying Proteins The Shapes of Protein Structures Protein Structure Space Dimension? Complexity of Protein Structures Are Proteins 3D, or 1D objects?
26
Clustering Protein Fragments to Extract a Small Set of Representatives (a Library)
data → clustered data → library (simulated annealing K-means)
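The idea of reducing many fragments to a few representatives can be sketched with plain Lloyd-style k-means. This is a deliberate simplification: it swaps in ordinary k-means for the simulated annealing variant named on the slide, and it uses Euclidean distance on flattened coordinates in place of cRMS, which assumes the fragments are already expressed in a common frame. `kmeans_library` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_library(fragments, n_rep, n_iter=50, seed=0):
    """Cluster an M x L x 3 array of fragments; return n_rep representatives."""
    x = np.asarray(fragments, dtype=float)
    m = x.shape[0]
    flat = x.reshape(m, -1)
    rng = np.random.default_rng(seed)
    # Start from n_rep distinct fragments chosen at random.
    centers = flat[rng.choice(m, size=n_rep, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(n_rep):
            members = flat[labels == c]
            if len(members):  # leave an empty cluster's centre unchanged
                centers[c] = members.mean(axis=0)
    # Final assignment against the converged centres.
    d = np.linalg.norm(flat[:, None, :] - centers[None, :, :], axis=-1)
    labels = d.argmin(axis=1)
    # Use the real fragment closest to each centre as the library entry.
    library = x[d.argmin(axis=0)]
    return library, labels
```

Returning an observed fragment for each cluster, instead of the averaged centre, keeps every library entry a physically realistic piece of backbone.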
27
Generating an approximate structure Fragment library A B C D
31
Fragment library Generating an approximate structure A B C D Structural Sequence: AC
32
Fitting Protein Structures
100 fragments of length 5: 0.91 Å cRMS (better)
50 fragments of length 7: 2.78 Å cRMS
33
Longer fragments give better fit at same complexity
[Plot: average cRMS distance against complexity (states/residue); fragment sizes 7, 6, 5 and 4 residues]
Complexity = $N^{1/(L-3)}$
N: number of fragments
L: size of each fragment
(Kolodny, Koehl, Guibas, Levitt, J. Mol. Biol., 323, 297, 2002)
34
Choosing the “right” library
Size L    N such that Complexity = 20
7         160000
6         8000
5         400
4         20
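The library sizes listed above (N = 20, 400, 8000 and 160000 for L = 4 to 7) are consistent with complexity = N^(1/(L-3)), that is, each fragment overlapping its predecessor by three residues. The sketch below works under that inferred assumption; the function names and the `overlap` parameter are hypothetical.

```python
def complexity(n_fragments, frag_len, overlap=3):
    """States per residue for a library of n_fragments of length frag_len.

    Assumes each fragment adds (frag_len - overlap) new residues.
    """
    return n_fragments ** (1.0 / (frag_len - overlap))

def library_size(target_complexity, frag_len, overlap=3):
    """Number of fragments needed to reach a target complexity."""
    return round(target_complexity ** (frag_len - overlap))

for frag_len in (7, 6, 5, 4):
    print(frag_len, library_size(20, frag_len))
```

The exponential growth of N with L is what makes long fragments expensive: matching 20 states per residue with 7-residue fragments already needs 160000 library entries.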
35
A Structural Alphabet for Protein Backbone
Fragment size: 4
Number of fragments: 20
[Plots: number of structures against cRMS between model and experimental structure (axis ticks 0.2, 0.6, 1.0); cRMS against protein size]
36
Structural Alphabet: Application to Structure Comparison cRMS = 1Å
37
cRMS is not a Metric
cRMS = 2.8 Å
cRMS = 2.85 Å
38
Collaborators
Michael Levitt (Computational Biology), Stanford University
Marc Delarue (Biophysics), Institut Pasteur, Paris
Rachel Kolodny (Computer Science), Columbia University
Herbert Edelsbrunner (Math/Computer Science), Duke University
Peter Roegen (Math), DTU, Denmark
Joel Hass (Math), UC Davis
39
Thank You