Algorithms Exploiting the Chain Structure of Proteins Itay Lotan Computer Science.


1 Algorithms Exploiting the Chain Structure of Proteins Itay Lotan Computer Science

2 Proteins 101: involved in all functions of our body (metabolism, motion, defense, etc.) [figure: Michael Levitt]

3 Protein representation • Torsion angle model • Cα model

4 Structure determination Bernhard Rupp X-ray crystallography

5 Outline 1. Fast energy computation during Monte Carlo simulation 2. Model completion for protein X-ray crystallography 3. Large-scale computation of similarity. Exploit specific properties of proteins to perform the computation efficiently

6 Outline 1. Fast energy computation during Monte Carlo simulation 2. Model completion for protein X-ray crystallography 3. Large-scale computation of similarity. Lotan, Schwarzer, Halperin* and Latombe, J. Comput. Biol. 2004 (to appear). * CS Department, Tel-Aviv University

7 Monte Carlo simulation (MCS): a popular method for sampling the conformation space of proteins • Estimate thermodynamic quantities • Search for low-energy conformations and the folded structure

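8 MCS: how it works 1. Propose a random change in conformation 2. Compute the energy E of the new conformation 3. Accept with probability $P = \min\{1, e^{-\Delta E / k_B T}\}$ (Metropolis criterion), where ΔE is the change in energy. Requires ≫ 10^6 steps to sample adequately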
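Here is a minimal sketch of the three-step loop, assuming the standard Metropolis acceptance rule. `energy` and `propose` are hypothetical caller-supplied functions. The quadratic toy energy in the usage lines stands in for a real protein force field.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng):
    """Metropolis rule: always accept downhill moves; accept uphill moves
    with probability exp(-(e_new - e_old) / kT)."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / kT)


def monte_carlo(x0, energy, propose, kT, n_steps, seed=0):
    """Run n_steps of MCS; return the final state, its energy, and #accepted."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    accepted = 0
    for _ in range(n_steps):
        x_new = propose(x, rng)                   # 1. propose a random change
        e_new = energy(x_new)                     # 2. compute its energy
        if metropolis_accept(e, e_new, kT, rng):  # 3. accept or reject
            x, e = x_new, e_new
            accepted += 1
    return x, e, accepted


# Toy usage: a 1-D harmonic "energy" instead of a protein force field.
x, e, acc = monte_carlo(3.0, lambda q: q * q,
                        lambda q, rng: q + rng.uniform(-0.5, 0.5),
                        kT=0.1, n_steps=2000)
```

Step 2 runs once per proposed move, and the loop needs millions of moves. That is why the energy computation is the part worth accelerating.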

9 Energy function • Bonded terms: bond lengths, bond angles, dihedral angles • Non-bonded terms: van der Waals, electrostatic • Heuristic: Go models, HP models, etc.

10 Pair-wise interactions • Cutoff distance (6–12 Å) • Only a linear number of interactions contribute to the energy (Halperin & Overmars ’98) • Challenge: find all interacting pairs without enumerating all pairs
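To make the challenge concrete, here is the baseline that enumerates all n(n−1)/2 pairs and discards those beyond the cutoff. It is a sketch under assumptions: a plain 12-6 Lennard-Jones term with placeholder parameters stands in for the van der Waals term, and electrostatics and bonded terms are omitted. The function names are hypothetical.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 van der Waals term; epsilon and sigma are placeholder values."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def naive_cutoff_energy(coords, cutoff, pair_energy=lennard_jones):
    """Reference O(n^2) sum: visit every atom pair, keep those within the cutoff."""
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        r = math.dist(a, b)
        if r <= cutoff:
            total += pair_energy(r)
    return total
```

The cutoff leaves only a linear number of contributing pairs, but this loop still visits all of them quadratically. Faster methods aim to produce the same sum without visiting every pair.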

11 Related work. Computer Science • Bounding volume hierarchies for collision detection: Gottschalk et al. ’96, Larsen et al. ’00, Guibas et al. ’02 • Space partition methods for collision detection: Faverjon ’84, Halperin & Overmars ’98 • Collision detection for chains: Halperin et al. ’97, Guibas et al. ’02. Biology • Neighbor lists: Verlet ’67, Brooks et al. ’83 • Grid: Quentrec & Brot ’73, Hockney et al. ’74, Van Gunsteren et al. ’84 • Neighbor lists + grid: Yip & Elber ’89, Petrella ’02

12 Grid method (d: cutoff distance) • Linear complexity • Optimal in the worst case
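The grid (cell-list) idea can be sketched as follows, assuming atoms are given as 3-D coordinate tuples. Atoms are hashed into cubic cells of side d. Each atom is then compared only with atoms in its own cell and the 26 adjacent cells. Because atoms cannot overlap, each cell holds O(1) atoms, which gives the linear complexity. `grid_pairs` is a hypothetical name.

```python
import itertools
import math
from collections import defaultdict


def grid_pairs(coords, d):
    """All index pairs (i, j), i < j, whose distance is <= d."""
    cells = defaultdict(list)
    for i, p in enumerate(coords):
        cells[tuple(math.floor(c / d) for c in p)].append(i)
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # Any partner within distance d lies in this cell or one of its 26 neighbors.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbors:
                    # i < j ensures each pair is reported exactly once.
                    if i < j and math.dist(coords[i], coords[j]) <= d:
                        pairs.append((i, j))
    return sorted(pairs)
```

The grid is rebuilt (or incrementally updated) after every move. It does not exploit the chain structure, which is the gap the ChainTree targets.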

13 Contributions • Efficient maintenance and self-collision detection for kinematic chains • Efficient computation of pair-wise interactions in MCS of proteins • Scheme for caching and reusing partial energy sums during MCS • MCS software*, much faster than the existing algorithm (grid method). *Download at: http://robotics.stanford.edu/~itayl/mcs

14–18 Properties of kinematic chains • Small changes → large effects • Local changes → global effects • Few DoF changes → long rigid sub-chains
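A toy planar illustration of the three properties follows. It is not from the talk: it assumes unit bond lengths and relative joint angles, and `chain_points` is a hypothetical name. Turning one joint of a 50-joint chain by 5° leaves everything before it fixed. It swings the 40 links after it as one rigid piece, moving the free end by about 3.5 bond lengths.

```python
import math


def chain_points(angles, bond=1.0):
    """Planar chain from the origin; joint i turns the chain by angles[i]."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a in angles:
        heading += a
        x, y = pts[-1]
        pts.append((x + bond * math.cos(heading), y + bond * math.sin(heading)))
    return pts


n = 50
before = chain_points([0.0] * n)
angles = [0.0] * n
angles[10] = math.radians(5.0)        # change a single DoF by 5 degrees
after = chain_points(angles)

# Distance each point moved: zero up to joint 10, non-zero everywhere after it.
moved = [math.dist(p, q) for p, q in zip(before, after)]
```

The part before joint 10 and the part after it are each rigid sub-chains. Only their relative placement changes.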

19 ChainTree: a tale of two hierarchies • Transform hierarchy: approximates the kinematics of the protein backbone at successive resolutions • Bounding volume hierarchy: approximates the geometry of the protein at successive resolutions

20 Hierarchy of transforms

21 [Figure: hierarchy of transforms over frames A–I. Leaves: T_AB, T_BC, T_CD, T_DE, T_EF, T_FG, T_GH, T_HI; next level: T_AC, T_CE, T_EG, T_GI; then: T_AE, T_EI; root: T_AI.]

22 Hierarchy of bounding volumes [Figure: leaves: B_A, B_B, B_C, B_D, B_E, B_F, B_G, B_H; next level: B_AB, B_CD, B_EF, B_GH; then: B_AD, B_EH; root: B_AH.]

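23 The ChainTree [Figure: each node pairs a transform with a bounding volume. Leaves: (T_AB, B_A), (T_BC, B_B), (T_CD, B_C), (T_DE, B_D), (T_EF, B_E), (T_FG, B_F), (T_GH, B_G), (T_HI, B_H); next level: (T_AC, B_AB), (T_CE, B_CD), (T_EG, B_EF), (T_GI, B_GH); then: (T_AE, B_AD), (T_EI, B_EH); root: (T_AI, B_AH).]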

24 Updating the ChainTree [Figure: the ChainTree of slide 23.]
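Here is a minimal 2-D sketch of the transform hierarchy and its update. It assumes planar rigid transforms stored as (angle, tx, ty) and a power-of-two segment-tree layout; the real ChainTree uses 3-D transforms and also keeps a bounding volume at each node. Each internal node holds the product of its children, as in T_AC = T_AB · T_BC. Changing one joint therefore replaces one leaf and recomposes only its O(log n) ancestors. All names are hypothetical.

```python
import math

IDENTITY = (0.0, 0.0, 0.0)


def apply(t, p):
    """Apply planar rigid transform t = (theta, tx, ty) to point p."""
    th, tx, ty = t
    c, s = math.cos(th), math.sin(th)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def compose(a, b):
    """Transform p -> a(b(p)); mirrors T_AC = T_AB * T_BC."""
    tx, ty = apply(a, (b[1], b[2]))
    return (a[0] + b[0], tx, ty)


def link(theta, length=1.0):
    """Frame i+1 in frame i: turn by the joint angle, then step along the bond."""
    return (theta, length * math.cos(theta), length * math.sin(theta))


class TransformTree:
    """Leaf i holds T_{i,i+1}; the root (node 1) maps the last frame to the first."""

    def __init__(self, leaves):
        self.size = 1
        while self.size < len(leaves):
            self.size *= 2
        self.node = [IDENTITY] * (2 * self.size)     # padding leaves stay identity
        self.node[self.size:self.size + len(leaves)] = leaves
        for k in range(self.size - 1, 0, -1):
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])

    def update(self, i, t):
        """Replace leaf i, then recompose its ancestors (O(log n) work)."""
        k = self.size + i
        self.node[k] = t
        k //= 2
        while k >= 1:
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])
            k //= 2

    def end_position(self):
        return apply(self.node[1], (0.0, 0.0))


tree = TransformTree([link(0.0) for _ in range(5)])   # straight 5-link chain
tree.update(0, link(math.pi / 2))                      # turn the first joint 90 degrees
```

After the update, the end of the chain moves from (5, 0) to (0, 5), while only the leaf and its ancestors were recomputed.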

25 Computing the energy: recursively search the ChainTree for interactions [figure: chain segments labeled A–H and J–P]. Pruning rules: 1. Prune the search when the distance between bounding volumes is more than the cutoff distance 2. Do not search inside rigid sub-chains
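Here is a simplified sketch of the recursive search with both pruning rules. It assumes bounding spheres over contiguous atom ranges, recomputed in global coordinates (the ChainTree instead maintains its bounding volumes through the transform hierarchy). It also assumes a hypothetical boolean `rigid` flag that marks sub-chains whose internal interactions are unchanged and can be reused. Class and function names are hypothetical.

```python
import math


class Node:
    """Bounding sphere over the contiguous atom range [lo, hi)."""

    def __init__(self, coords, lo, hi):
        self.lo, self.hi = lo, hi
        pts = coords[lo:hi]
        self.center = tuple(sum(c) / len(pts) for c in zip(*pts))
        self.radius = max(math.dist(self.center, p) for p in pts)
        self.rigid = False            # hypothetical flag used by pruning rule 2
        if hi - lo == 1:
            self.left = self.right = None
        else:
            mid = (lo + hi) // 2
            self.left = Node(coords, lo, mid)
            self.right = Node(coords, mid, hi)


def cross_pairs(a, b, coords, cutoff, out):
    """Interacting pairs with one atom under node a and the other under node b."""
    # Rule 1: spheres farther apart than the cutoff cannot contain a pair.
    if math.dist(a.center, b.center) - a.radius - b.radius > cutoff:
        return
    if a.left is None and b.left is None:
        if math.dist(coords[a.lo], coords[b.lo]) <= cutoff:
            out.append((a.lo, b.lo))
        return
    # Descend into the larger node (or the only one that is not a leaf).
    if b.left is None or (a.left is not None and a.radius >= b.radius):
        cross_pairs(a.left, b, coords, cutoff, out)
        cross_pairs(a.right, b, coords, cutoff, out)
    else:
        cross_pairs(a, b.left, coords, cutoff, out)
        cross_pairs(a, b.right, coords, cutoff, out)


def self_pairs(node, coords, cutoff, out):
    """All interacting pairs inside node's sub-chain."""
    # Rule 2: internal pairs of a rigid sub-chain did not change; skip them.
    if node.left is None or node.rigid:
        return
    self_pairs(node.left, coords, cutoff, out)
    self_pairs(node.right, coords, cutoff, out)
    cross_pairs(node.left, node.right, coords, cutoff, out)
```

Run `self_pairs` on the root with no node marked rigid, and it returns the same pairs as an all-pairs scan. It only descends where bounding spheres come within the cutoff.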

26–32 Computing the energy [Figure, animated over slides 26–32: recursive search of the ChainTree over segments A–H and J–P. Partial energy sums are collected for single segments ([A], …, [H], [J], …, [P]) and for segment pairs (e.g., [A-B], [C-D], [J-K], [N-O]); slide 32 adds the label E(O).]

33 Computing the energy • Only changed interactions are found • Unaffected partial sums are reused • Better performance for longer proteins and for fewer simultaneous changes

34 Computational complexity: worst-case bounds for updating and for searching; much faster in practice

35 Test [Figure, two panels (1-DoF change, 5-DoF change), each for proteins of 68, 144, 374 and 755 residues.]

36 Simulation of α-Synuclein • 140-residue protein implicated in Parkinson’s disease • Multi-canonical replica-exchange MC regime • Over 1000 CPU days of simulation • Study conformations at room temperature • Joint work with Vijay Pande

37 Outline 1. Fast energy computation during Monte Carlo simulation 2. Model completion for protein X-ray crystallography 3. Large-scale computation of similarity. Lotan, van den Bedem*, Deacon* and Latombe, WAFR 2004; van den Bedem*, Lotan, Latombe and Deacon*, submitted to Acta Cryst. D. * Joint Center for Structural Genomics (JCSG) at SSRL

38 Protein Structure Initiative: 152K sequenced genes (30K/year); 25K determined structures (3.6K/year) • Reduce the cost and time to determine protein structure • Develop software to automatically interpret the electron density map (EDM)

39 EDM: a 3-D “image” of the atomic structure • High value (electron density) at atom centers • Density falls off exponentially away from the center

40 Automated model building • ~90% built at high resolution (2 Å) • ~66% built at medium to low resolution (2.5–2.8 Å) • Gaps left at noisy areas in the EDM (blurred density) • Gaps need to be resolved manually

41 The fragment completion problem • Input: EDM, partially resolved structure, 2 anchor residues, length of the missing fragment • Output: a small number of candidate structures for the missing fragment. A robotics inverse kinematics (IK) problem

42 Related work. Computer Science • Exact IK solvers: Manocha & Canny ’94, Manocha et al. ’95 • Optimization IK solvers: Wang & Chen ’91 • Redundant manipulators: Khatib ’87, Burdick ’89 • Motion planning for closed loops: Han & Amato ’00, Yakey et al. ’01, Cortes et al. ’02, ’04. Biology/Crystallography • Exact IK solvers: Wedemeyer & Scheraga ’99, Coutsias et al. ’04 • Optimization IK solvers: Fine et al. ’86, Canutescu & Dunbrack Jr. ’03 • Ab-initio loop closure: Fiser et al. ’00, Kolodny et al. ’03 • Database search loop closure: Jones & Thirup ’86, van Vlijmen & Karplus ’97 • Semi-automatic tools: Jones & Kjeldgaard ’97, Oldfield ’01

43 Contributions • Sampling of gap-closing fragments biased by the EDM • Refinement of fit to density without breaking closure • Fully automatic fragment completion software for X-ray crystallography. Novel application of a combination of inverse kinematics techniques

44 Two-stage IK method 1. Candidate generation: optimize density fit while closing the gap 2. Refinement: optimize closed fragments without breaking closure

45–49 Stage 1: candidate generation • Generate a random conformation • Close it using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack Jr. ’03) • CCD moves biased toward high density
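Here is a 2-D toy version of CCD, assuming a planar chain with relative joint angles and a single target point. In the protein setting the degrees of freedom are backbone torsions, and closure is measured against the anchor residue. Per slide 49, the moves are also biased toward high density. None of these are modeled here, and the function names are hypothetical.

```python
import math


def forward(base, angles, lengths):
    """Joint positions of a planar chain; angles are relative joint angles."""
    pts = [base]
    x, y = base
    heading = 0.0
    for a, length in zip(angles, lengths):
        heading += a
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        pts.append((x, y))
    return pts


def ccd(base, angles, lengths, target, iters=100, tol=1e-6):
    """Cyclic Coordinate Descent: sweep joints from the free end toward the base;
    each joint takes the single rotation that points the end effector at the target."""
    angles = list(angles)
    for _ in range(iters):
        for j in reversed(range(len(angles))):
            pts = forward(base, angles, lengths)
            jx, jy = pts[j]                    # pivot of joint j
            ex, ey = pts[-1]                   # current end effector
            angles[j] += (math.atan2(target[1] - jy, target[0] - jx)
                          - math.atan2(ey - jy, ex - jx))
        if math.dist(forward(base, angles, lengths)[-1], target) < tol:
            break
    return angles


angles = ccd((0.0, 0.0), [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], (1.5, 1.0), iters=500)
```

Each single-joint move has a closed-form optimum, so CCD is cheap per step. It also never needs a Jacobian, which makes it a convenient first closure stage.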

50 Stage 2: refinement [figure: 1-D manifold] • Target function T (goodness of fit to the EDM) • Minimize T while retaining closure • Closed conformations lie on a self-motion manifold of lower dimension

51 Stage 2: null-space minimization. Jacobian J: the linear relation $\dot{x} = J\dot{q}$ between joint velocities $\dot{q}$ and the end-effector linear and angular velocity $\dot{x}$. Compute the minimizing move using N, an orthonormal basis of the null space of J

52 Stage 2: minimization with closure 1. Choose a sub-fragment with n > 6 DOFs 2. Compute N using SVD 3. Project −∇T onto N 4. Move until a minimum is reached or closure is broken. Escape from local minima using Monte Carlo with simulated annealing
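The SVD and projection steps can be sketched with NumPy. The sketch assumes J is the 6 × n Jacobian of the fragment's end frame and `grad` is ∇T with respect to the n joint angles. The update Δq = −N Nᵀ ∇T is the standard projected-gradient step, and the function name is hypothetical. A move along Δq leaves the end frame fixed only to first order, which is why step 4 keeps checking closure.

```python
import numpy as np


def null_space_step(J, grad, rtol=1e-10):
    """Steepest-descent direction for T restricted to the null space of J."""
    _, s, Vt = np.linalg.svd(J)               # full_matrices=True: Vt is n x n
    rank = int(np.sum(s > rtol * s.max()))
    N = Vt[rank:].T                            # n x (n - rank) orthonormal basis
    return -N @ (N.T @ grad)                   # project -grad onto null(J)


rng = np.random.default_rng(0)
J = rng.standard_normal((6, 9))    # toy sub-fragment with n = 9 > 6 DoFs
grad = rng.standard_normal(9)
dq = null_space_step(J, grad)
```

The 6 rows of J correspond to the end frame's 3 linear and 3 angular velocity components. A sub-fragment with 9 DoFs therefore has a 3-dimensional space of closure-preserving motions to search in.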

53 Test: artificial gaps • Completed structure (gold standard) • Good density (1.6 Å res.) • Remove fragment and rebuild
Length (res.) | High (2.0 Å) | Medium (2.5 Å) | Low (2.8 Å)
4 | 100% (0.14 Å) | 100% (0.19 Å) | 100% (0.32 Å)
8 | 100% (0.18 Å) | 100% (0.23 Å) | 100% (0.36 Å)
12 | 91% (0.51 Å) | 96% (0.41 Å) | 91% (0.52 Å)
15 | 91% (0.53 Å) | 88% (0.63 Å) | 83% (0.76 Å)
Produced by H. van den Bedem

54 Test: true gaps • Completed structure (gold standard) • O.K. density (2.4 Å res.) • 6 gaps left by the model builder (RESOLVE)
Length (res.) | Top scorer | Lowest error
4 | 0.44 Å | 0.40 Å
4 | 0.22 Å |
5 | 0.78 Å |
5 | 0.36 Å |
7 | 0.72 Å | 0.66 Å
10 | 0.43 Å |
Produced by H. van den Bedem

55 Example: TM0423. PDB: 1KQ3, 376 res., 2.0 Å resolution, 12-residue gap. Best: 0.3 Å aaRMSD

56–58 Example: TM0813 (gap between GLU-83 and GLY-96). PDB: 1J5X, 342 res., 2.8 Å resolution, 12-residue gap. Best: 0.6 Å aaRMSD

59 Outline 1. Fast energy computation during Monte Carlo simulation 2. Model completion for protein X-ray crystallography 3. Large-scale computation of similarity. Lotan and Schwarzer, J. Comput. Biol. 11(2–3): 299–317, 2004

60 Large-scale similarity • Analysis of simulation trajectories: molecular dynamics simulation, Monte Carlo simulation • Clustering of decoy sets (e.g., Shortle et al. ’98) • Stochastic Roadmap Simulation (Apaydin et al. ’03). Fast similarity measures are needed for analyzing large sets of conformations

61 Contributions • Uniform simplification of protein structure for similarity computation • Speed-up of existing similarity measures • Method offers a trade-off between speed and precision • Efficient computation of nearest neighbors

62 m-Averaged approximation • Cut the chain into pieces of length m • Replace each sequence of m Cα atoms by its centroid: 3n coordinates → 3n/m coordinates
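The two steps translate directly into code, assuming Cα coordinates given as (x, y, z) tuples. The slide's 3n/m count presumes that n is a multiple of m. How a shorter final piece is handled (here it is averaged on its own) is therefore an assumption. `m_average` is a hypothetical name.

```python
def m_average(ca_coords, m):
    """Replace each run of m consecutive C-alpha atoms by its centroid.
    A trailing piece shorter than m (if any) is averaged on its own."""
    out = []
    for start in range(0, len(ca_coords), m):
        piece = ca_coords[start:start + m]
        out.append(tuple(sum(p[k] for p in piece) / len(piece) for k in range(3)))
    return out
```

For example, `m_average([(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)], 2)` gives `[(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]`. cRMS costs O(n) and dRMS O(n²), so the reduced chains make them roughly m and m² times cheaper.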

63 Chains and distances • Proximity along the chain entails spatial proximity • Links far apart along the chain are spatially distant (on average) [figure labels: c_i, c_j]

64 Similarity measures
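The slide's formulas did not survive extraction. This sketch uses the usual definitions of the two measures, which is an assumption about what the slide showed. cRMS is the coordinate RMS deviation after optimal rigid superposition, computed here with the Kabsch SVD method. dRMS is the RMS difference over all intra-chain distances and needs no superposition. Either applies to full Cα chains or to m-averaged ones, and the function names are hypothetical.

```python
import numpy as np


def crms(X, Y):
    """RMS deviation of (n, 3) arrays X, Y after optimal rotation and translation."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    s[-1] *= np.sign(np.linalg.det(U @ Vt))   # forbid reflections (proper rotations only)
    sq = (Xc ** 2).sum() + (Yc ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sq, 0.0) / len(X)))


def drms(X, Y):
    """RMS difference of all pairwise distances d_ij, i < j."""
    def dists(Z):
        diff = Z[:, None, :] - Z[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)
    return float(np.sqrt(np.mean((dists(X)[iu] - dists(Y)[iu]) ** 2)))
```

dRMS cannot tell a chain from its mirror image, while cRMS with proper rotations can. The two measures also scale differently with chain length, linearly for cRMS and quadratically for dRMS.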

65 Evaluation: test sets. 8 structurally diverse proteins (54–76 residues) 1. Decoy sets: conformations from the Park–Levitt set (Park et al. ’97), N = 10,000 2. Random sets: conformations generated by the program FOLDTRAJ (Feldman & Hogue ’00), N = 5000

66 Evaluation results: decoy sets (correlation ranges)
m | cRMS | dRMS
3 | 0.99 | 0.96–0.98
4 | 0.98–0.99 | 0.94–0.97
6 | 0.92–0.99 | 0.78–0.93
9 | 0.81–0.98 | 0.65–0.96
12 | 0.54–0.92 | 0.52–0.69
Speed-up: 9x for cRMS (m = 9); 36x for dRMS (m = 6). Higher correlation for random sets!

67 Nearest-neighbors problem: given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (N: size of S; L: time to compute similarity). Brute-force complexity: O(N·L) per query, hence O(N²·L) to find the k nearest neighbors of all conformations in S

68 Efficient nearest-neighbor search: kd-tree. Limitations: 1. Requires a Minkowski (L_p) metric 2. Less efficient when d > 20. cRMS is not a Minkowski metric, and dRMS has dimensionality n(n−1)/2 (one coordinate per atom pair). Reduce dRMS dimensionality using SVD

69 Reduction using SVD 1. Stack the m-averaged distance matrices as vectors 2. Compute the SVD of the entire set 3. Project onto the principal components. dRMS is reduced to ≤ 20 dimensions. Complexity of SVD: O(N·D·min(N, D)) for N vectors of dimension D
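Here is a sketch of the three steps, followed by a brute-force k-NN scan in the reduced space. It assumes NumPy and conformations that are already m-averaged into an (N, n, 3) array; the names are hypothetical. dRMS between two conformations is proportional to the Euclidean distance between their distance vectors. Nearest neighbors in that vector space are therefore dRMS nearest neighbors, and the leading principal components approximate them. A kd-tree (for example SciPy's `scipy.spatial.cKDTree`) could replace the linear scan in `knn`.

```python
import numpy as np


def distance_vectors(confs):
    """(N, n, 3) conformations -> (N, n(n-1)/2) stacked upper-triangle distances."""
    n = confs.shape[1]
    iu, ju = np.triu_indices(n, k=1)
    diff = confs[:, :, None, :] - confs[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (N, n, n) distance matrices
    return dist[:, iu, ju]


def project_pcs(V, n_pcs):
    """Center the vectors and project them onto the top n_pcs principal components."""
    centered = V - V.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_pcs].T


def knn(P, q, k):
    """Indices of the k rows of P closest to row q, excluding q itself."""
    d = np.linalg.norm(P - P[q], axis=1)
    d[q] = np.inf
    return np.argsort(d)[:k]


rng = np.random.default_rng(0)
confs = rng.standard_normal((50, 12, 3))      # toy stand-in for m-averaged chains
V = distance_vectors(confs)                    # (50, 66)
reduced = project_pcs(V, 16)                   # 16 PCs, as used for the decoy sets
k = 10
candidates = knn(reduced, 0, 4 * k)            # fast filter: ~4k approximate NNs
exact_d = np.linalg.norm(V[candidates] - V[0], axis=1)
neighbors = candidates[np.argsort(exact_d)[:k]]   # re-rank candidates exactly
```

The last two lines show the "fast filter" use: the reduced space proposes about 4k candidates, and exact distances on the full vectors pick the final k.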

70 Testing the method • Use decoy sets (N = 10,000) and random sets (N = 5,000) • m-averaging (m = 4) • Project onto 16 PCs for decoys, 12 PCs for random sets • Find k = 10, 25, 100 NNs for 250 conformations in each set

71 Results • Decoy sets: ~77% correct; the furthest NN is off by 10–15% (0.7–1.5 Å); ~4k approximate NNs (4 × k) contain all k true NNs • Random sets: slightly better results. Use the reduction as a fast filter

72 Running time (N = 100,000, m = 4, 16 PCs; find the k = 100 NNs of each conformation) • Brute force: ~84 hours • Brute force + m-averaging: ~4.8 hours • Brute force + m-averaging + SVD: 41 minutes • kd-tree + m-averaging + SVD: 19 minutes. The kd-tree has more impact for larger sets

73 Contributions • Energy computation in MCS: efficient maintenance and self-collision detection for kinematic chains; efficient computation of pair-wise interactions in MCS of proteins; caching scheme for partial energy sums during MCS; MCS software • Model completion in X-ray crystallography: sampling of gap-closing fragments biased toward the EDM; refinement of fit to density without breaking closure; fully automatic fragment completion software • Similarity computation for large conformation sets: uniform simplification of protein structure for similarity computation; speed-up of existing similarity measures; trade-off between speed and precision; efficient computation of nearest neighbors

74 Take-home message: taking into account physical properties of proteins can lead to efficient algorithms for a wide variety of applications in structural biology

75 Outlook • Models that simplify the physics and chemistry of proteins • Algorithms that exploit properties of protein models [figure labels: computer scientist; biophysicist/biochemist] • Develop simplified protein models that lend themselves to efficient computations

76 Acknowledgements • Jean-Claude Latombe • Vijay Pande • Michael Levitt • Leo Guibas • Axel Brunger, Balaji Prabhakar, Serafim Batzoglou • Fabian Schwarzer, Henry van den Bedem, Dan Halperin • Carlo Tomasi • Daniel Russakoff, Rachel Kolodny • Latombe group: Serkan Apaydin, Tim Bretl, Joel Brown, Phil Fong, Mitul Saha, Pekka Isto, Kris Hauser • Pande group: Bojan Zagrovic, Stefan Larson, Lillian Chong, Young Min Rhee, Sidney Elmer, Chris Snow, Guha Jayachandran, Eric Sorin, Sung-Joo Lee, Jim Cladwell, Michael Shirts, Nina Singhal, Relly Brandman, Vishal Vaidyanathan, Nick Kelley, Mark Engelhardt • Levitt group: Patrice Koehl, Tanya Raschke, Erik Lindahl

77 Thank you!

