P2P-based Simulator for Protein Folding Shun-Yun Hu 2005/06/03
Introduction

A Look at Simulations
- Simulations are important tools in scientific research
- Larger scale and higher resolution are constantly sought
- However, computational resources can be limited

An Untapped Potential
- 300 million PCs on the Internet (2000 est.)
- Up to 80% to 90% of CPU time is wasted
- Large supply of computing resources, growing rapidly
Examples

SETI@home (UC Berkeley, space radio analysis)
- 5.3 M participants worldwide
- 2.2 M years of single-processor CPU time
- Equivalent to a 54-teraflop machine (current top 3: 70.72, 51.87, 35.86 teraflops)

Folding@home (Stanford, protein 3D structure)
- 30,000 volunteers
- 1 M days of single-processor CPU time
- 23 papers published in Science, Nature, Nature Structural Biology, PNAS, JMB, etc.
The Grand Question
Can we build the ultimate simulator for large-scale simulation, utilizing millions of computers worldwide?
Potential applications:
- Nuclear reactions
- Star clusters
- Atomic-scale modeling in materials science
- Weather, earthquakes
- Biology (proteins, ecosystems, the brain, ...)
Promise & Challenge of P2P
Promises
- Growing, decentralized resources (scalable)
- Commodity hardware (affordable)
Challenges
- Topology maintenance (dynamic join/leave)
- Efficient content retrieval (no global knowledge)
A Simulation Scenario
How can we utilize P2P for simulation purposes?
Answer: it depends on what you want to simulate.
We observe that many simulations:
- are spatially oriented (i.e., based on coordinate systems)
- run in discrete time-steps
- exhibit localized interaction (i.e., short-range interaction)
Example: molecular dynamics (MD) simulation. Protein folding?
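
The three properties above can be illustrated with a minimal sketch of one discrete time-step for particles on a 2D plane. This is not an MD force field: the linear repulsion, the `cutoff`, `dt`, and `strength` parameters, and the function name `step` are all assumptions made for illustration. The point is only that pairs farther apart than the cutoff never interact, so each particle needs to know about nearby particles only.

```python
import math

def step(positions, velocities, cutoff=1.5, dt=0.01, strength=1.0):
    """Advance all particles by one discrete time-step (toy model).

    Only pairs closer than `cutoff` interact (localized interaction).
    The linear repulsion is a placeholder, not a real MD potential.
    """
    n = len(positions)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0 or dist >= cutoff:
                continue  # outside the interaction range: no coupling
            push = strength * (cutoff - dist) / dist
            # Equal and opposite forces push the pair apart.
            forces[i][0] -= push * dx
            forces[i][1] -= push * dy
            forces[j][0] += push * dx
            forces[j][1] += push * dy
    new_vel = [(vx + fx * dt, vy + fy * dt)
               for (vx, vy), (fx, fy) in zip(velocities, forces)]
    new_pos = [(x + vx * dt, y + vy * dt)
               for (x, y), (vx, vy) in zip(positions, new_vel)]
    return new_pos, new_vel

if __name__ == "__main__":
    pos = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
    vel = [(0.0, 0.0)] * 3
    pos, vel = step(pos, vel)
    print(pos)  # the two close particles move apart; the distant one stays put
```
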
Protein Folding Problem
Thermodynamic hypothesis: the native structure has the lowest free energy.
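
Read as a search problem, the hypothesis says folding looks for the global minimum of an energy function. The sketch below uses a made-up 1D list of energies (an assumption, not data from any protein) to show why a purely downhill search may not find it: the walk stops at the first state whose neighbors are all higher.

```python
def greedy_descent(energies, start):
    """Walk downhill to the lower neighbor until no neighbor is lower."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        best = min(neighbors, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i  # local minimum: every neighbor is at least as high
        i = best

def global_minimum(energies):
    """Exhaustive search, feasible only because the toy landscape is tiny."""
    return min(range(len(energies)), key=lambda j: energies[j])

# Hypothetical rugged landscape: overall funnel toward index 5, with bumps.
ENERGIES = [5, 3, 4, 2, 3, 0, 3, 1, 4, 2, 6]

if __name__ == "__main__":
    print(greedy_descent(ENERGIES, 0))   # 1 (trapped in a local minimum)
    print(global_minimum(ENERGIES))      # 5 (the lowest-energy state)
```
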
Simulation Difficulties
Timescale limitation of classical MD methods
- A small protein folds in tens of μs (10^-6 s)
- A full-atom simulation of 1 ns (10^-9 s) takes one CPU-day
- A gap of 1,000 to 10,000 times (a single CPU might take decades)
Rough energy landscape
- Funnel-like (quick initial descent)
- Local minimum traps
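
The size of the timescale gap follows directly from the two figures on this slide. The short calculation below uses only the stated rate of 1 ns of simulated time per CPU-day; the function name `cpu_days_needed` is introduced here for illustration.

```python
NS_PER_CPU_DAY = 1.0   # from the slide: 1 ns of simulated time per CPU-day
NS_PER_US = 1_000.0    # 1 microsecond = 1,000 nanoseconds

def cpu_days_needed(folding_time_us, ns_per_cpu_day=NS_PER_CPU_DAY):
    """CPU-days a single processor needs to simulate `folding_time_us`."""
    return folding_time_us * NS_PER_US / ns_per_cpu_day

if __name__ == "__main__":
    for t_us in (1, 10):
        days = cpu_days_needed(t_us)
        print(f"{t_us} us -> {days:,.0f} CPU-days (~{days / 365.25:.1f} years)")
    # 1 us -> 1,000 CPU-days (~2.7 years)
    # 10 us -> 10,000 CPU-days (~27.4 years)
```
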
Parallelization
Timescale limitation
- Folding time is statistically distributed
- Running many trajectories yields a folding event in much shorter time
Free energy barriers
- Most time is spent "waiting" in free energy minima
- Re-initialize configurations after crossing a barrier
Limitations
- Can simulate only small proteins
- Simulation within a time-step is not decomposable
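
The many-trajectories argument can be checked numerically under one explicit assumption: that the folding time of a single trajectory is exponentially distributed. The slide only says the time is "statistically distributed", so the exponential form, the mean of 10,000 time units, and the function names are illustrative choices. Under that assumption, the first of M independent trajectories to fold does so after about 1/M of the single-trajectory mean.

```python
import random

def first_folding_time(mean_fold_time, n_trajectories, rng):
    """Time until the earliest of `n_trajectories` independent runs folds."""
    return min(rng.expovariate(1.0 / mean_fold_time)
               for _ in range(n_trajectories))

def mean_first_folding_time(mean_fold_time, n_trajectories,
                            trials=2000, seed=0):
    """Average the first-folding time over many repeated experiments."""
    rng = random.Random(seed)
    total = sum(first_folding_time(mean_fold_time, n_trajectories, rng)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    for m in (1, 10, 100):
        estimate = mean_first_folding_time(10_000.0, m)
        print(f"{m:>3} trajectories: first fold after ~{estimate:,.0f} time units")
```

The wall-clock wait shrinks roughly in proportion to the number of trajectories, but the total CPU time spent across all machines does not, which is why a large pool of idle volunteer machines suits this approach.
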
Molecular Dynamics in P2P
- Many atoms (nodes) on a 2D plane (> 1,000)
- Positions (coordinates) may change at each time-step
- How can each node synchronize positions with the nodes in its Area of Interest (AOI)?
[Figure: Area of Interest]
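
To make the synchronization question concrete, the sketch below counts how many position updates are sent per time-step if every node notifies only the nodes inside its AOI, compared with notifying everyone. It assumes a circular AOI of fixed radius and uses a global position list, which a real P2P node would not have; the function names are hypothetical.

```python
import math

def aoi_neighbors(positions, node, radius):
    """Indices of all other nodes within `radius` of `node`."""
    x, y = positions[node]
    return [other for other, (ox, oy) in enumerate(positions)
            if other != node and math.hypot(ox - x, oy - y) <= radius]

def messages_per_step(positions, radius):
    """Updates sent per time-step when each node notifies only its AOI."""
    return sum(len(aoi_neighbors(positions, node, radius))
               for node in range(len(positions)))

if __name__ == "__main__":
    positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
    n = len(positions)
    print("AOI-only updates:", messages_per_step(positions, 2.0))  # 4
    print("broadcast updates:", n * (n - 1))                       # 12
```
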
Proposed Approach
Voronoi-based Overlay Network (VON)
- Supports spatially oriented simulations
- Scalable, efficient, fully distributed P2P
VON Design Concepts
- Identify enclosing and boundary neighbors (EN & BN)
- Each node constructs a Voronoi diagram of all its AOI neighbors
- ENs are minimally maintained
- Mutual collaboration in neighbor discovery by BNs
Figure legend:
- Circle: Area of Interest (AOI)
- White: self
- Yellow: enclosing neighbor (EN)
- Light blue: boundary neighbor (BN)
- Pink: EN & BN
- Green: AOI neighbor
- Dark blue: unknown neighbor
Use Voronoi diagrams to solve the neighbor discovery problem
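
As a rough illustration of what "enclosing neighbor" means, the sketch below finds the nodes whose Voronoi regions border a given node's region. It relies on the standard fact that Voronoi adjacency is the same as Delaunay adjacency, and it finds Delaunay triangles by brute force (a triangle qualifies when no other point lies inside its circumcircle). This O(n^4) search is an assumption chosen for brevity, not how VON itself maintains its diagram, and boundary neighbors are not computed here.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def enclosing_neighbors(points, self_index):
    """Indices of nodes whose Voronoi regions touch that of `self_index`."""
    neighbors = set()
    for i, j, k in combinations(range(len(points)), 3):
        if self_index not in (i, j, k):
            continue
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        # Delaunay test: no other point may lie strictly inside the circle.
        empty = all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
                    for m, (px, py) in enumerate(points)
                    if m not in (i, j, k))
        if empty:
            neighbors.update({i, j, k} - {self_index})
    return sorted(neighbors)

if __name__ == "__main__":
    # Node 0 is surrounded by nodes 1-4; node 5 sits behind node 1.
    points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (5, 0)]
    print(enclosing_neighbors(points, 0))  # [1, 2, 3, 4]
```

Node 5 is excluded even though it may fall inside a large AOI, because node 1's Voronoi region lies between it and node 0.
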
Summary
- Idle CPUs and networks are an untapped resource for large-scale simulation
- Protein folding is a global minimum search problem in a complex energy landscape
- Parallelization using P2P computing is an interesting yet unexplored possibility