Published by Oswin McCoy. Modified over 8 years ago.
1
Conformational Space
2
Conformation of a molecule: specification of the relative positions of all atoms in 3D space
Typical parameterizations:
- List of coordinates of atom centers
- List of torsional angles (e.g., the backbone dihedral angles of a protein)
Conformational space: space of all conformations
3
Conformational Space
[Figure: conformational space with axes labeled $q_1, q_2, \ldots, q_i, q_j, \ldots, q_{N-1}, q_N$]
4
Conformational Space
[Figure labels: $q_0, q_1, q_3, q_4, \ldots, q_n$]
5
Relation to Robotics/Graphics
[Figure labels: $q_0, q_1, q_2(t), q_3, q_4, \ldots, q_n$]
Configuration space
6
Need for a Metric
- Simulation and sampling techniques can produce millions of conformations
- Which conformations are similar?
- Which ones are close to the folded one?
- Do some conformations form small clusters (e.g., key intermediates while folding)?
7
Metric in Conformational Space
A metric over conformational space C is a function $d: (c, c') \in C \times C \mapsto d(c, c') \in \mathbb{R}^{+} \cup \{0\}$ such that:
- $d(c, c') = 0 \iff c = c'$ (non-degeneracy)
- $d(c, c') = d(c', c)$ (symmetry)
- $d(c, c') + d(c', c'') \geq d(c, c'')$ (triangle inequality)
8
But not all metrics are “good”
Euclidean metric: $d(c, c') = \sum_{i=1}^{n} \left( |\phi_i - \phi_i'|^2 + |\psi_i - \psi_i'|^2 \right)$
11
Metric in Conformational Space
- A “good” metric should measure how well the atoms in two conformations can be aligned
- Usual metrics: cRMSD, dRMSD
12
RMSD
Given two sets of n points in $\mathbb{R}^3$, $A = \{a_1, \ldots, a_n\}$ and $B = \{b_1, \ldots, b_n\}$, the RMSD between A and B is:
$\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i - b_i\|^2 \right]^{1/2}$
where $\|a_i - b_i\|$ denotes the Euclidean distance between $a_i$ and $b_i$ in $\mathbb{R}^3$.
$\mathrm{RMSD}(A, B) = 0$ iff $a_i = b_i$ for all i.
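The RMSD definition above translates directly into a few lines of code. This is a minimal sketch; the function name `rmsd` and the representation of points as 3-tuples are assumptions made for illustration.

```python
import math

def rmsd(A, B):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal length")
    # Sum of squared Euclidean distances between corresponding points.
    total = sum(math.dist(a, b) ** 2 for a, b in zip(A, B))
    return math.sqrt(total / len(A))

A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(A, A))  # identical sets give 0.0
print(rmsd(A, B))  # squared distances are 0 and 4, so the result is sqrt(2)
```

Note that no alignment is performed here: the two point sets are compared in the coordinate frame they are given in.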
13
cRMSD
- Molecule M with n atoms $a_1, \ldots, a_n$
- Two conformations c and c' of M
- $a_i(c)$ is the position of $a_i$ when M is at c
- cRMSD(c, c') is the minimized RMSD between the two sets of atom centers:
$\min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i(c) - T(a_i(c'))\|^2 \right]^{1/2}$
where the minimization is over all possible rigid-body transforms T
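The slide does not say how the minimization over T is carried out. One standard choice is the Kabsch algorithm (centroid superposition followed by an SVD-based optimal rotation), sketched below with NumPy. The function name `crmsd` and the n × 3 array layout are assumptions for illustration.

```python
import numpy as np

def crmsd(P, Q):
    """Minimum RMSD between point sets P and Q (n x 3 arrays) over all
    rigid-body transforms (rotation + translation) applied to Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # The optimal translation superimposes the two centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Kabsch: the optimal rotation comes from the SVD of the 3x3 covariance matrix.
    H = Qc.T @ Pc
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), excluding reflections.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc - Qc @ R.T
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Usage: a rotated and translated copy has cRMSD close to 0.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
t = 0.7
Rz = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(crmsd(P, Q))
```

Because reflections are excluded, a conformation and its mirror image generally have a nonzero cRMSD.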
16
cRMSD
- cRMSD satisfies the triangle inequality
- cRMSD takes linear time to compute
- Often, cRMSD is restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone
17
Representation Restricted to Cα Atoms
Protein 1tph
- The positions of AA residue centers (Cα atoms) mainly determine the structure of a protein.
- In structural comparison, people usually work only on the backbone of Cα atoms, and neglect the other atoms.
18
Possible project: Design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric.
19
dRMSD
- Molecule M with n atoms $a_1, \ldots, a_n$
- Two conformations c and c' of M
- $\{d_{ij}(c)\}$: n × n symmetric intra-molecular distance matrix in M at c
- dRMSD(c, c') is: $\left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone
20
Intra-Molecular Distance Matrix
Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.
24
dRMSD
- Molecule M with n atoms $a_1, \ldots, a_n$
- Two conformations c and c' of M
- $\{d_{ij}(c)\}$: n × n symmetric intra-molecular distance matrix in M at c
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone
- Advantage: no aligning transform
- Drawback: takes quadratic time to compute
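The dRMSD formula can be sketched as a double loop over atom pairs, which also makes the quadratic cost visible. The function name `drmsd` and the tuple-based point representation are assumptions for illustration.

```python
import math
from itertools import combinations

def drmsd(c1, c2):
    """Distance RMSD between two conformations given as lists of 3D points."""
    n = len(c1)
    if n != len(c2) or n < 2:
        raise ValueError("conformations must have the same number (>= 2) of atoms")
    total = 0.0
    # Loop over the n(n-1)/2 unordered atom pairs: quadratic in n.
    for i, j in combinations(range(n), 2):
        total += (math.dist(c1[i], c1[j]) - math.dist(c2[i], c2[j])) ** 2
    return math.sqrt(2.0 * total / (n * (n - 1)))

# Usage: a pure translation leaves all internal distances unchanged,
# so no alignment step is needed.
c = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shifted = [(x + 5.0, y - 1.0, z + 2.0) for x, y, z in c]
print(drmsd(c, shifted))
```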
25
Is dRMSD a metric?
$\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
is a metric in the n(n-1)/2-dimensional space, where a conformation c is represented by $\{d_{ij}(c)\}$
But, in this representation, the same point represents both a conformation and its mirror image
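A small sketch of both statements: dRMSD is a scaled Euclidean distance between vectors of pairwise distances, and a conformation and its mirror image map to the same vector. The helper names `distance_vector` and `drmsd_from_vectors` are hypothetical.

```python
import math
from itertools import combinations

def distance_vector(conf):
    """Represent a conformation by its n(n-1)/2 pairwise distances."""
    return [math.dist(conf[i], conf[j])
            for i, j in combinations(range(len(conf)), 2)]

def drmsd_from_vectors(u, v):
    """Euclidean distance between distance vectors, divided by sqrt(#pairs).
    Since #pairs = n(n-1)/2, this equals the dRMSD formula."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

conf = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
mirror = [(x, y, -z) for x, y, z in conf]  # reflection through the xy-plane

# Both conformations are represented by the same point, so their dRMSD is 0
# even though no rigid-body motion superimposes one onto the other.
print(distance_vector(conf) == distance_vector(mirror))
print(drmsd_from_vectors(distance_vector(conf), distance_vector(mirror)))
```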
26
k-Nearest-Neighbors Problem
Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, or another metric)
Can be done in time $O(N(\log k + L))$ where:
- N = size of S
- L = time to compare two conformations
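One way to reach the $O(N(\log k + L))$ bound is a single scan of S that keeps the k best candidates in a bounded max-heap. This is a sketch, not necessarily the authors' implementation; the function name `k_nearest` and the `dist` callback are assumptions.

```python
import heapq

def k_nearest(S, query, k, dist):
    """Return the k items of S closest to query as sorted (distance, index) pairs."""
    heap = []  # max-heap of at most k entries, simulated with negated distances
    for idx, conf in enumerate(S):
        d = dist(query, conf)                    # one comparison: cost L
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))      # O(log k)
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))   # O(log k)
    return sorted((-neg_d, idx) for neg_d, idx in heap)

# Usage with a toy 1D "conformation space"; any metric can be passed as dist.
S = [5, 1, 9, 3, 7]
print(k_nearest(S, 4, 2, lambda a, b: abs(a - b)))  # [(1, 0), (1, 3)]
```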
27
k-Nearest-Neighbors Problem
The total time needed to compute the k nearest neighbors of every conformation in S is $O(N^2(\log k + L))$
Much too long for large datasets, where N ranges from tens of thousands to millions!
Can be improved by:
1. Reducing L
2. More efficient algorithm (e.g., kd-tree)
28
kd-Tree
In a d-dimensional space, where d > 2, range searching for a point takes $O(d\,n^{1-1/d})$ time (here n is the number of points stored in the tree)
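A minimal kd-tree sketch with k-nearest-neighbor search under the Euclidean distance. It assumes each conformation has already been mapped to a low-dimensional feature vector (a tuple of floats); the names `build_kdtree` and `knn_kdtree` are hypothetical, and a production system would more likely use an existing library.

```python
import heapq
import math

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree; each node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def knn_kdtree(node, query, k, heap=None):
    """Collect the k points nearest to query; heap holds (-distance, point)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d = math.dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    # Descend first into the side of the splitting plane that contains the query.
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_kdtree(near, query, k, heap)
    # The far side matters only if the plane is closer than the current
    # k-th best distance, or if fewer than k points have been found so far.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn_kdtree(far, query, k, heap)
    return heap

pts = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.5), (0.6, 0.1), (0.3, 0.9)]
tree = build_kdtree(pts)
print(sorted((-nd, p) for nd, p in knn_kdtree(tree, (0.5, 0.5), 2)))
```

The pruning test is what saves work over a linear scan; its benefit shrinks as the dimension grows, which is why reducing the dimension of the representation first is attractive.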
29
k -Nearest-Neighbors Problem Idea: simplify protein’s description
30
- cRMSD: $O(n)$ time
- dRMSD: $O(n^2)$ time
Assume that each conformation is described by the coordinates of the n Cα atoms
31
This representation is highly redundant
- Proximity along the chain entails spatial proximity
- Atoms can’t bunch up, hence atoms that are far apart along the chain are on average spatially distant
[Figure labels: $c_i$, $c_j$]
32
m-Averaged Approximation
- Cut the backbone into fragments of m Cα atoms
- Replace each fragment by the centroid of the m Cα atoms
- Simplified cRMSD and dRMSD
- 3n coordinates → 3n/m coordinates
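The averaging step can be sketched as follows. The function name `m_average` is hypothetical, and averaging a shorter final fragment when n is not a multiple of m is an assumption of this sketch, not something the slide specifies.

```python
def m_average(coords, m):
    """Replace each run of m consecutive backbone points by its centroid."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    averaged = []
    for start in range(0, len(coords), m):
        frag = coords[start:start + m]  # assumption: the last fragment may be shorter
        averaged.append(tuple(sum(axis) / len(frag) for axis in zip(*frag)))
    return averaged

# Usage: 5 points with m = 2 become 3 centroids.
backbone = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0), (9, 3, 0)]
print(m_average(backbone, 2))
```

The reduced point lists can then be passed to any cRMSD or dRMSD routine unchanged, since those only see a shorter list of 3D points.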
33
Evaluation: Test Sets [Lotan and Schwarzer, 2003]
- 8 diverse proteins (54-76 residues)
- Decoy sets of N = 10,000 conformations from the Park-Levitt set [Park et al., 1997]
Correlation:
m     cRMSD       dRMSD
3     0.99        0.96-0.98
4     0.98-0.99   0.94-0.97
6     0.92-0.99   0.78-0.93
9     0.81-0.98   0.65-0.96
12    0.54-0.92   0.52-0.69
Higher correlation for random sets (hence greater savings)
34
Running Times
35
Further Reduction for dRMSD 1) Stack m-averaged distance matrices as vectors of a matrix A
36
[Figure: the r × N matrix A; column $a_i$ is the vector of elements of the distance matrix of the i-th conformation (i = 1 to N)]
37
Further Reduction for dRMSD
1) Stack m-averaged distance matrices as vectors of a matrix A
2) Compute the SVD $A = U D V^T$
38
SVD Decomposition
$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$
- A: column $a_j$ is the vector of elements of the distance matrix of the j-th conformation (j = 1 to N)
- U: orthonormal (rotation) matrix
- D: diagonal matrix of singular values, $s_1 \geq s_2 \geq \ldots \geq s_r \geq 0$
- $V^T$: matrix with orthonormal rows $v_j^T$, $v_k^T$; $v_i$ and $v_j$ are orthogonal unit N×1 vectors
41
SVD Decomposition
$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$
[Figure: r-dimensional space shown with two coordinate systems, (x, y) and (X, Y)]
Representation of A in space (X, Y) does not depend on the coordinate system!
42
SVD Decomposition
$A_{(r \times N)} = U_{(r \times r)} \, D_{(r \times r)} \, V^T_{(r \times N)}$
- The rows $v_1^T, v_2^T, \ldots$ of $V^T$ are scaled by the singular values $s_1, s_2, s_3, \ldots, s_r$, so that $\|s_1 v_1\| \geq \|s_2 v_2\| \geq \ldots$
- p principal components: keep $v_1^T, v_2^T, \ldots, v_p^T$ with singular values $s_1, s_2, \ldots, s_p$ and set the remaining singular values to 0
45
Further Reduction for dRMSD
1) Stack m-averaged distance matrices as vectors of a matrix A
2) Compute the SVD $A = U D V^T$
3) Project onto p principal components
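The three steps can be sketched with NumPy as below. The function name `pca_reduce` is hypothetical, and the sketch follows the slides in not centering the columns of A, which is an assumption about the intended procedure.

```python
import numpy as np

def pca_reduce(A, p):
    """Project the columns of the r x N matrix A onto its first p principal directions.

    Returns the p x N matrix of reduced coordinates and the r x p basis used.
    """
    # Thin SVD: U is r x r, s has r entries, Vt is r x N (when N > r).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    basis = U[:, :p]          # directions with the p largest singular values
    return basis.T @ A, basis

# Usage: columns of A stand in for stacked distance vectors (synthetic data here,
# built to have rank 2 so that p = 2 loses nothing).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 10))
Y, basis = pca_reduce(A, 2)
print(Y.shape)  # (2, 10)
print(np.linalg.norm(A[:, 0] - A[:, 1]), np.linalg.norm(Y[:, 0] - Y[:, 1]))
```

Distances between reduced columns never exceed the original distances, because the projection is orthogonal; they are close to the originals when the discarded singular values are small.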
46
Correlation between dRMSD and its PC-reduced approximation
The approximation is reduced to summing up 12 to 20 terms (instead of ~80 to 200, since the proteins have 54 to 76 amino acids)
47
Complexity of SVD
- SVD of an r × N matrix, where N > r, takes $O(r^2 N)$ time
- Here $r \sim (n/m)^2$
- So, time complexity is $O(n^4 N)$
- Would be too costly without m-averaging
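A worked instance of the estimate, assuming r counts the distinct pairs of averaged points (the slide's $r \sim (n/m)^2$ is the same quantity up to a constant factor) and taking n = 76 and m = 4 from the evaluation setting as an example:

```latex
\[
r = \frac{1}{2}\,\frac{n}{m}\left(\frac{n}{m} - 1\right)
\]
\[
n = 76,\; m = 4:\quad \frac{n}{m} = 19,\quad r = \frac{19 \cdot 18}{2} = 171
\]
\[
n = 76,\; m = 1:\quad r = \frac{76 \cdot 75}{2} = 2850,\qquad
\left(\frac{2850}{171}\right)^{2} = \left(\frac{50}{3}\right)^{2} \approx 278
\]
```

Under these assumptions, 4-averaging cuts the $r^2 N$ cost of the SVD by a factor of roughly 278 for the largest protein in the test set.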
48
Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003]
- N = 100,000, k = 100, 4-averaging, 16 PCs
- 70% correct, with furthest NN off by 20%
- Brute-force: 84 h
- Brute-force + m-averaging: 4.8 h
- Brute-force + m-averaging + PC: 41 min
- kd-tree + m-averaging + PC: 19 min
- Speedup greater than ×200
- 6k approximate NNs contain all true k NNs
- Use m-averaging and PC reduction as fast filters
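The "fast filter" idea can be sketched as a two-stage search: shortlist candidates with the cheap approximate distance, then re-rank only the shortlist with the exact one. The function name `filtered_knn` and its parameters are hypothetical; the default factor of 6 mirrors the 6k figure reported above for this data set and is not a general guarantee.

```python
import heapq

def filtered_knn(S, query, k, cheap_dist, exact_dist, factor=6):
    """Approximate k-NN: shortlist factor*k items with cheap_dist,
    then return the indices of the k best under exact_dist."""
    shortlist = heapq.nsmallest(
        factor * k, range(len(S)), key=lambda i: cheap_dist(query, S[i]))
    return heapq.nsmallest(
        k, shortlist, key=lambda i: exact_dist(query, S[i]))

# Toy usage: the cheap distance looks only at the x coordinate,
# the exact distance is the full Euclidean distance in the plane.
pts = [(0, 0), (1, 5), (2, 0), (3, 1), (10, 0)]
cheap = lambda a, b: abs(a[0] - b[0])
exact = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
print(filtered_knn(pts, (0, 0), 2, cheap, exact, factor=2))  # [0, 2]
```

The exact distance is evaluated only factor*k times per query instead of N times, which is where the savings come from when the exact comparison dominates the cost.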