1
Structure and Motion Jean-Claude Latombe Computer Science Department Stanford University NSF-ITR Meeting on November 14, 2002
2
Stanford’s Participants
PIs: L. Guibas, J.C. Latombe, M. Levitt
Research Associate: P. Koehl
Postdocs: F. Schwarzer, A. Zomorodian
Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
Undergraduate students: J. Greenberg (CS), E. Berger (CS)
Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)
3
Problems Addressed
Biological functions derive from the structures (shapes) that molecules achieve through motion.
- Determination, classification, and prediction of 3D protein structures
- Modeling of molecular energy and simulation of folding and binding motion
4
What’s New for Computer Science?
- Massive amount of experimental data
- Importance of similarities
- Multiple representations of structure
- Continuous energy functions
- Many objects forming deformable chains
- Many degrees of freedom
- Ensemble properties of pathways
5
Massive amount of experimental data
Abstract/simplify data sets into compact data structures
E.g.: medial axis of an electron density map
6
Importance of similarities
Segmentation/matching/scoring techniques
Data set -> clustered data -> small library
E.g.: Libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
7
1tim Approximations
- Complexity 10 (100 fragments of length 5): 0.9146 Å cRMS
- Complexity 2.26 (50 fragments of length 7): 2.7805 Å cRMS
[Figure: the two approximations shown next to the real protein]
8
Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
Problem: Determine if two structures share common motifs:
- Two (labelled) structures in $\mathbb{R}^3$: $A = \{a_1, a_2, \ldots, a_n\}$, $B = \{b_1, b_2, \ldots, b_m\}$
- Find subsequences $s_a$ and $s_b$ such that the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(l)}\}$ are similar
Twofold problem: alignment and correspondence
Issues: score, approximation, complexity
9
Iterative Closest Point (Besl-McKay) for alignment: [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] Score: RMSD distance
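A minimal sketch of the alignment loop named on this slide, assuming NumPy, brute-force nearest-neighbor search, and the Kabsch (SVD) solution for the rigid fit at each iteration. The function names (`kabsch`, `nearest`, `icp`) and the fixed iteration count are illustrative choices, not taken from the Singh and Saha paper, and the labels on the points are ignored here.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing the RMSD between P @ R.T + t and Q.

    P and Q are (n, 3) arrays whose rows are already paired up.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def rmsd(X, Y):
    """Root-mean-square deviation between paired rows of X and Y."""
    return float(np.sqrt(((X - Y) ** 2).sum(axis=1).mean()))

def nearest(X, B):
    """Index of the closest point of B for every row of X (brute force)."""
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def icp(A, B, iterations=20):
    """Alternate correspondence (nearest point) and alignment (Kabsch)."""
    X = np.asarray(A, dtype=float).copy()
    B = np.asarray(B, dtype=float)
    for _ in range(iterations):
        idx = nearest(X, B)            # correspondence step
        R, t = kabsch(X, B[idx])       # alignment step
        X = X @ R.T + t
    idx = nearest(X, B)
    return X, idx, rmsd(X, B[idx])     # aligned A, matches in B, RMSD score
```

Like any ICP variant, this converges to a local optimum, so the result depends on the initial placement of the two structures.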
10
[R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] Trypsin Trypsin active site
11
[R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan. 2003.] Trypsin active site against 42 trypsin-like proteins
12
Multiple representations of structure
ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]
13
Decoys generated using “physical” potentials Select best decoys using distance information Statistical potentials for proteins based on alpha complex [Guibas, Koehl, Zomorodian]
14
Continuous energy functions; many objects in deformable chains
Many pairs of objects, but relatively few are close enough to interact
Data structures that capture proximity, but undergo small or rare changes
During motion simulation:
- detect steric clashes (self-collisions)
- find pairs of atoms closer than cutoff
15
Other application domains: Modular reconfigurable robots Reconstructive surgery
16
Fixed Bounding-Volume hierarchies don’t work
17
Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
18
Wrapped bounding sphere hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002)
WBSH undergoes small number of changes
Self-collision: $O(n \log n)$ in $\mathbb{R}^2$, $O(n^{2-2/d})$ in $\mathbb{R}^d$, $d \ge 3$
19
ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02) Assumption: Few degrees of freedom change at each motion step (e.g., Monte Carlo simulation) Find all pairs of atoms closer than a given cutoff Find which energy terms can be reused
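For reference, the cutoff-pair query itself can be sketched with a uniform grid, the baseline the ChainTree slides compare against. This is not the ChainTree algorithm: it rebuilds everything from scratch and does not exploit the assumption that few degrees of freedom change per step. The function name `pairs_within_cutoff` and the choice of cell size equal to the cutoff are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math

def pairs_within_cutoff(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, with distance below cutoff.

    Atoms are binned into cubic cells of side `cutoff`, so any interacting
    pair lies in the same cell or in two adjacent cells.
    """
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(i)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

When atoms are spread roughly uniformly, each cell holds a bounded number of atoms and the query costs time linear in the number of atoms, but the full cost is paid at every simulation step even if only one torsion angle moved.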
20
ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02) Updating: Finding interacting pairs : (in practice, sublinear)
21
ChainTrees: Application to MC simulation (comparison to grid method)
[Figure: two plots, one for m = 1 and one for m = 5, each with data points labeled (68), (144), (374), (755)]
22
Future work: ChainTrees
- Open problem: How to find good moves to make when the conformation is compact and random moves are rejected with high probability?
- Run new series of experiments with more complex energy field: EEF1 [Lazaridis & Karplus] (with Pande)
- Use library of fragments (with Koehl)
23
Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]
Capture proximity information with a sparse spanner
[Figure: 3HVT]
24
Many degrees of freedom
Tools to explore large dimensional conformation space:
- Sampling strategies
- Nearest neighbors
25
Sampling structures by combining fragments [Kolodny, Levitt]
Library of protein fragments: a, b, c, d
Discrete set of candidate structures, e.g., cab, bbc
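A small sketch of why the candidate set is discrete: if each structure is written as a string of fragment labels, a library of size s and a string length k give exactly s^k candidates. The generator name `candidate_strings` is hypothetical, and the geometric step of actually concatenating fragments into 3D coordinates is left out.

```python
from itertools import product

def candidate_strings(library, length):
    """Yield every string of `length` fragment labels drawn from `library`."""
    for combo in product(library, repeat=length):
        yield "".join(combo)

# With the four labels shown on the slide and strings of three fragments:
candidates = list(candidate_strings("abcd", 3))
print(len(candidates))          # 4 ** 3 candidates
print("cab" in candidates, "bbc" in candidates)
```

The count grows exponentially with the string length, which is why sampling strategies matter for realistic chain lengths.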
26
Nearest neighbors in high-dimensional space [Lotan and Schwarzer]
Find k nearest neighbors of a given protein conformation in a set of n conformations (cRMS, dRMS)
Idea: Cut backbone into m equal subsequences
[Figure: backbone cut at points $a_0, a_1, \ldots, a_m$]
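A rough sketch of the idea, assuming NumPy, that each of the m subsequences is replaced by the centroid of its atoms (an averaged representation), and that dRMS is the root-mean-square difference between the two intra-molecular distance matrices. The names `averaged_rep`, `drms`, and `k_nearest` are illustrative, and the search is brute force rather than a kd-tree.

```python
import numpy as np

def averaged_rep(coords, m):
    """Cut the backbone into m contiguous pieces and keep each piece's centroid."""
    pieces = np.array_split(np.asarray(coords, dtype=float), m)
    return np.array([piece.mean(axis=0) for piece in pieces])

def drms(X, Y):
    """Distance RMS: compares all pairwise distances inside X with those inside Y."""
    iu = np.triu_indices(len(X), k=1)                      # each pair once
    dX = np.linalg.norm(X[:, None] - X[None, :], axis=2)[iu]
    dY = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)[iu]
    return float(np.sqrt(np.mean((dX - dY) ** 2)))

def k_nearest(query, conformations, k, m):
    """Indices of the k conformations closest to `query` in reduced dRMS."""
    q = averaged_rep(query, m)
    dists = [drms(q, averaged_rep(c, m)) for c in conformations]
    return np.argsort(dists)[:k].tolist()
```

Because dRMS uses only internal distances, it needs no superposition step; the reduced representation shrinks the number of distances from O(n^2) in the atoms to O(m^2) in the pieces, at the cost of returning approximate neighbors.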
27
100,000 decoys of 1CTF (Park-Levitt set)
Computation of 100 NN of each conformation
- Full rep., dRMS (brute force): ~84 h
- Ave. rep., dRMS (brute force): ~4.8 h
- SVD red. rep., dRMS (brute force): 41 min
- SVD red. rep., dRMS (kd-tree): 19 min
~80% of computed NNs are true NNs
kd-tree software from ANN library (U. Maryland)
28
Ensemble properties of pathways
Stochastic nature of molecular motion requires characterizing average properties of many pathways
29
Example #1: Probability of Folding $p_{fold}$
[Figure: from a given conformation, pathways reach the folded set with probability $p_{fold}$ and the unfolded set with probability $1 - p_{fold}$; HIV integrase [Du et al. ‘98]]
“We stress that we do not suggest using $p_{fold}$ as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
30
Example #2: Ligand-Protein Interaction [Sept, Elcock and McCammon `99] 10K to 30K independent simulations
31
Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02)
Idea: Capture the stochastic nature of molecular motion by a network of randomly selected conformations and by assigning probabilities to edges
[Figure: nodes $v_i$ and $v_j$ joined by an edge with transition probability $P_{ij}$]
32
Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB’02, ECCB’02)
F: Folded set, U: Unfolded set
[Figure: node i with self-transition $P_{ii}$ and transitions $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$ to its neighbors j, k, l, m]
Let $f_i = p_{fold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
with $f = 1$ for nodes in F and $f = 0$ for nodes in U
- One linear equation per node
- Solution gives $p_{fold}$ for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
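The one-step equations can be solved directly once the boundary values are fixed. The sketch below assumes NumPy, a dense row-stochastic transition matrix `P`, and the usual convention that the folding probability is 1 on folded nodes and 0 on unfolded nodes; the function name `p_fold` and the toy five-node chain are illustrative. A real roadmap with thousands of nodes would use a sparse solver instead of `np.linalg.solve`.

```python
import numpy as np

def p_fold(P, folded, unfolded):
    """Solve f_i = sum_j P[i, j] * f_j with f = 1 on `folded`, f = 0 on `unfolded`."""
    n = P.shape[0]
    folded, unfolded = list(folded), list(unfolded)
    boundary = set(folded) | set(unfolded)
    free = [i for i in range(n) if i not in boundary]

    f = np.zeros(n)
    f[folded] = 1.0
    # Move unknowns to the left: (I - P_free,free) f_free = P_free,folded * 1
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, folded)].sum(axis=1)
    f[free] = np.linalg.solve(A, b)
    return f

# Toy roadmap: a chain 0-1-2-3-4 with node 0 unfolded and node 4 folded.
# Interior nodes stay put with probability 0.5 and step to either side with 0.25.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in (1, 2, 3):
    P[i, i - 1], P[i, i], P[i, i + 1] = 0.25, 0.5, 0.25

print(p_fold(P, folded=[4], unfolded=[0]))
```

For this symmetric chain the folding probability rises linearly from the unfolded end to the folded end, which gives a quick sanity check on the solver.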
33
Probabilistic Roadmap: Correlation with MC Approach
1ROP (repressor of primer): 2 helices, 6 DOF
34
Probabilistic Roadmap: Computation Times (1ROP)
Monte Carlo:
- 49 conformations
- Over 11 days of computer time
- Over $10^6$ energy computations
Roadmap:
- 5000 conformations
- 1 to 1.5 hours of computer time
- ~15,000 energy computations
~4 orders of magnitude speedup!
35
Future work: Probabilistic Roadmap
- Non-uniform sampling strategies
- Encoding molecular dynamics into probabilistic roadmaps (with V. Pande)
- Quantitative experiments with ligand-protein binding (with V. Pande)
36
Bio-X – Clark Center
37
The following slides relate to non-research issues. I do not plan to present them. Jack and Leo may want to use the contents of some of them for their own presentations.
38
Education
- Tutorial on Delaunay, Alpha-Shape and Pockets (Koehl)
- A biocomputing Notebook (Koehl)
- Biocomputation lectures in pre-existing classes:
  - CS326 (motion planning): molecular motion, probabilistic roadmaps, self-collision detection (Latombe)
  - CS468 (intro to computational topology): finding pockets and tunnels in molecules, computing surface areas and volumes and their derivatives (Zomorodian)
- New class on Algorithmic Biology (Batzoglou, Guibas, Latombe)
- Graduate Curriculum Committee, Bio-Engineering Dept., Stanford (Latombe)
39
Trained Students (1/2)
PhD students:
- Serkan Apaydin, EE
- An Nguyen, Scientific Computing
- Carlos Guestrin, CS (Daphne Koller’s group)
- Itay Lotan, CS
- Rachel Kolodny, CS
- Daniel Russel, CS
- Samuel Ieong, CS
Most graduate students have a principal advisor in CS and a secondary one in a bio-related department (Levitt, Brutlag, Pande)
40
Trained Students (2/2)
Graduated Master students:
- Rohit Singh, finding motifs in proteins, best Stanford CS master’s thesis, June ’02 [current position: bioinformatics company in San Diego]
- Chris Varma, study of ligand-protein interaction with probabilistic roadmaps, June ’02 [current position: PhD student, Harvard/MIT Biomedical program]
Current Master student:
- Ben Wong, modeling T cell activity
Undergraduates:
- Eric Berger, CS, Stanford, summer internship
- Julie Greenberg, CS, Harvard, summer internship
41
Visitors
- Prof. Alberto Munoz, Math Dept., University of Yucatan, Mexico: 3 months, Summer ’02; haptic interaction and probabilistic roadmaps
- Prof. Ileana Streinu, Smith College: 6 months, from Sept. ’02; protein folding
42
Interactions Within Stanford
- Guibas and Levitt, with J. Milgram (Math): topology of configuration spaces of chains
- Guibas, with V. Pande (Chemistry) and D. Donoho (Statistics): non-linear multi-resolution analysis of molecular motions
- Latombe and Apaydin, with D. Brutlag (Biochemistry) and V. Pande: probabilistic roadmaps
- Latombe and Lotan, with V. Pande: efficient MC simulation
43
Interactions Outside Stanford
- Collision Detection for Deforming Necklaces, P. Agarwal, L. Guibas, A. Nguyen, D. Russel, and L. Zhang. Invited to special issue of Comp. Geom., Theory and Applications, following presentation at SoCG’02.
- Kinetic Medians and kd-Trees, P. Agarwal, J. Gao, and L. Guibas. Proc. 10th European Symp. Algorithms, LNCS 2461, Springer-Verlag, pp. 5-16, 2002.
- Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Proc. RECOMB’02, Washington D.C., pp. 12-21, 2002.
- Efficient Maintenance and Self-Collision Testing for Kinematic Chains, I. Lotan, F. Schwarzer, D. Halperin, and J.C. Latombe. SoCG’02, pp. 43-52, June 2002.
- Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion, M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, and J.C. Latombe. Workshop on Algorithmic Foundations of Robotics (WAFR), Nice, Dec. 2002.
44
Conference Attendance
- BCATS ’01 and ’02 [Bio-Computation At Stanford]
- RECOMB ’02 [Int. Conf. on Research in Computational Molecular Biology]
- ISMB ’02 [Int. Conf. on Intelligent Syst. for Molecular Biology]
- ECCB 2002 [European Conf. on Computational Biology]
- Biophysical Society Symp. on Molecular Simulations in Structural Biology, 2002
- SoCG 2002 [ACM Symp. on Computational Geometry]
45
Outreach
- Latombe and Levitt serve as members of the Scientific Leadership Council of Stanford’s Bio-X program
- Presentations: Stanford’s Bio-X Symposium (3/02), Stanford’s Computer Forum (3/02), Berkeley’s Broad Area Seminar (4/02)
- Conference committees:
  - Guibas, program committee, WAFR’02 and SoCG’03
  - Latombe, program committee, 1st IEEE Bioinformatics Conf. ’03
  - Apaydin, organization committee of BCATS’02
46
The following slides are extra slides that I removed from my presentation for lack of time
47
General Goals
- Larger proteins considered: computational efficiency
- Diversity of molecules and interactions: computational abstractions
- Extension of in-silico experiments: computational correctness
Enable biological studies that were not possible before, more systematically
48
Approach
- Select hard problems
- Close interaction between computer scientists (Guibas, Koehl, Latombe) and biologists (Koehl, Levitt, Brutlag, Pande, Brunger)
- Most graduate students are CS students with secondary advisor in biology
- Perform extensive tests
49
Electron density map: medial axis [Guibas, Brunger, Russel]
- Medial axis of iso-surfaces to estimate backbone
- Cleaning and simplification of axis to filter noise out
- Persistence of features across multiple iso-surfaces
50
Continuous energy function
Essential for protein structure prediction and molecular motion simulation:
- Statistical potentials based on alpha complex
- Maintenance of energy values during simulation
51
Instead, exploit what doesn’t change: chain topology
Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang] [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)
- Balanced binary trees of constant topology
- Efficient repair of position/size of BVs
52
Future Work: Spanner for deformable chain [Agarwal, Gao, Duke; Nguyen, Zhang, Stanford]
53
Probabilistic Roadmap
- 1ROP (repressor of primer): 2 helices, 6 DOF
- 1HDD (Engrailed homeodomain): 3 helices, 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]