Distributed Parallel Protein Folding
Chris Garlock
Protein Folding: Why Is It Important?
Proteins are biological nano-machines that play a part in all of our bodies' functions. Protein folding is the process every protein undergoes to assemble into its native structure, striving for states of low free energy. Sometimes proteins misfold, and misfolded proteins can clump together (aggregate), which can cause serious health problems: Alzheimer's, cystic fibrosis, several types of cancer, and more.
How Is Protein Folding Simulated on a Computer?
Simulations work at the atomic level using Newtonian mechanics, which requires a great deal of numerical integration. Proteins can fold in many ways, so a statistical model is needed to represent all of the possibilities accurately. Markov State Models fill this role: each model consists of a set of states (shapes) and the transition rates between those states.
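To make the idea of a Markov State Model concrete, here is a minimal sketch. It assumes a discrete-time model with per-step transition probabilities rather than continuous rates, and the three states and all of the numbers are invented for illustration; they do not come from any real simulation.

```python
# Hypothetical three-state model; states and probabilities are made up.
STATES = ["unfolded", "intermediate", "folded"]

# T[i][j] = probability of moving from state i to state j in one time step.
# Each row sums to 1.
T = [
    [0.90, 0.10, 0.00],
    [0.05, 0.80, 0.15],
    [0.00, 0.02, 0.98],
]


def step(population, matrix):
    """Advance the state populations by one time step."""
    n = len(matrix)
    return [sum(population[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(population, matrix, n_steps):
    """Apply the transition matrix n_steps times."""
    for _ in range(n_steps):
        population = step(population, matrix)
    return population


start = [1.0, 0.0, 0.0]           # every protein starts unfolded
final = propagate(start, T, 1000)  # long-time populations

for name, fraction in zip(STATES, final):
    print(f"{name}: {fraction:.3f}")
```

Once the transition probabilities are known, long-timescale behavior follows from cheap matrix arithmetic instead of one very long trajectory, which is why many short simulations can stand in for a single long one.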
Problems with Serial Protein Folding
Modern computers can simulate ~50 nanoseconds of protein assembly in 24 hours. Many proteins fold on the millisecond timescale (1 millisecond = 1,000,000 nanoseconds). At that rate it would take 20,000 days, or ~55 years, to simulate folding for just one protein!
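The estimate can be checked with a few lines of arithmetic. The variable names are only illustrative; the two input figures are the ones quoted on this slide.

```python
NS_SIMULATED_PER_DAY = 50        # ~50 ns of folding simulated per 24 hours
NS_PER_MILLISECOND = 1_000_000   # 1 ms expressed in nanoseconds

days_needed = NS_PER_MILLISECOND / NS_SIMULATED_PER_DAY
years_needed = days_needed / 365

print(f"{days_needed:,.0f} days, about {years_needed:.0f} years")
# prints "20,000 days, about 55 years"
```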
Folding@home uses donated computational power from otherwise idle processors across the globe. Anyone who wants to contribute just needs to download the client program, and the server will give it work units to complete. Once a work unit is finished, the client returns the results to the server and gets a new work unit. Its 500,000 processor cores output 19,900 TFLOPS to help simulate protein folding. This is faster than the Titan supercomputer!
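The request, compute, return cycle described above can be sketched as a simple loop. This is an in-memory toy, not the actual Folding@home protocol: `WorkServer`, `run_client`, `simulate`, and the shape of a work unit are all hypothetical names chosen for illustration.

```python
from collections import deque


class WorkServer:
    """Hypothetical in-memory stand-in for a work server."""

    def __init__(self, work_units):
        self.pending = deque(work_units)
        self.results = {}

    def request_work(self):
        """Hand out the next work unit, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def submit_result(self, unit_id, result):
        """Store the result returned by a client."""
        self.results[unit_id] = result


def simulate(unit):
    """Placeholder for the real physics: here we just sum the payload."""
    return sum(unit["payload"])


def run_client(server):
    """Request a unit, compute it, return the result, and repeat."""
    completed = 0
    while True:
        unit = server.request_work()
        if unit is None:
            break
        server.submit_result(unit["id"], simulate(unit))
        completed += 1
    return completed


server = WorkServer([{"id": i, "payload": [i, i + 1]} for i in range(3)])
done = run_client(server)
print(done, server.results)
```

In a real deployment the two method calls would be network requests, and many clients would pull from the same queue at once.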
Challenges with Distributed Computing
Heterogeneous processors: clients can run on Windows, Linux, OS X, Android, and PS3, and a special client can be downloaded to run on a rack of computers or a cluster of GPUs. Extremely slow communication between processors: clients talk to the server over WAN connections. Unreliable processors: what happens if a host powers down their machine in the middle of a work unit?
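One common way to cope with hosts that disappear mid-task is to attach a deadline to every assigned work unit and hand the unit to another client if no result arrives in time. The slides do not say how Folding@home handles this, so the sketch below is only an assumption about one possible approach; `DeadlineTracker` and its methods are hypothetical.

```python
class DeadlineTracker:
    """Hypothetical bookkeeping for work units assigned to unreliable hosts."""

    def __init__(self, unit_ids, timeout):
        self.timeout = timeout
        self.unassigned = list(unit_ids)
        self.in_flight = {}    # unit_id -> deadline
        self.finished = set()

    def assign(self, now):
        """Give out the next unassigned unit and record its deadline."""
        if not self.unassigned:
            return None
        unit_id = self.unassigned.pop(0)
        self.in_flight[unit_id] = now + self.timeout
        return unit_id

    def complete(self, unit_id):
        """Accept a result; results for units no longer in flight are ignored."""
        if unit_id in self.in_flight:
            del self.in_flight[unit_id]
            self.finished.add(unit_id)

    def expire(self, now):
        """Put units whose deadline has passed back in the unassigned list."""
        expired = [u for u, deadline in self.in_flight.items() if deadline <= now]
        for unit_id in expired:
            del self.in_flight[unit_id]
            self.unassigned.append(unit_id)
        return expired


tracker = DeadlineTracker(["wu-0", "wu-1"], timeout=10)
a = tracker.assign(now=0)           # first host takes wu-0
b = tracker.assign(now=0)           # second host takes wu-1, then powers down
tracker.complete(a)                 # first host reports back in time
requeued = tracker.expire(now=10)   # wu-1 missed its deadline
c = tracker.assign(now=10)          # wu-1 goes to another host
```

The trade-off is wasted work: a slow but honest host may finish after its unit has already been reissued.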
How to Parallelize Folding Simulations
1. To start a simulation project, first choose some initial conformations (protein shapes).
2. Each conformation becomes the starting point for a set of simulations that together are called a run.
3. Within each run, we launch many different trajectories, each called a clone. All clones in a run start with the same conformation but with different initial velocities for the atoms involved.
4. Because each clone takes a large amount of time to execute, clones are further divided into generations. The generations of a clone have to be run serially.
5. Some clones may find additional conformations (states of equilibrium), in which case new runs are started from those conformations.
6. Repeat steps 2-5 until the Markov State Model is complete.
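The run, clone, and generation hierarchy in the steps above can be sketched as a small scheduling structure. All names here (`WorkUnit`, `first_generation`, `next_generation`, `initial_velocities`) and the seeding scheme are hypothetical, and the unit-variance Gaussian velocities are a placeholder rather than a physically correct distribution.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    run: int         # which starting conformation
    clone: int       # which trajectory within the run
    generation: int  # which serial segment of that trajectory


def initial_velocities(run, clone, n_atoms):
    """Clones share a conformation but get a different seed, so different velocities."""
    rng = random.Random(run * 1_000_003 + clone)  # hypothetical seeding scheme
    return [rng.gauss(0.0, 1.0) for _ in range(n_atoms)]


def first_generation(n_runs, n_clones):
    """Generation 0 of every clone in every run can be handed out in parallel."""
    return [WorkUnit(r, c, 0) for r in range(n_runs) for c in range(n_clones)]


def next_generation(finished, n_generations):
    """Generation g+1 of a clone becomes available only after generation g returns."""
    if finished.generation + 1 < n_generations:
        return WorkUnit(finished.run, finished.clone, finished.generation + 1)
    return None


queue = first_generation(n_runs=2, n_clones=5)
follow_up = next_generation(queue[0], n_generations=3)
print(len(queue), follow_up)
```

Parallelism comes from the number of runs and clones; the generations within a single clone remain a serial chain.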
Example (diagram): Generate initial conformation A and start a run for A (composed of 5 clones). That run discovers additional conformations B and C, so start runs for B and C. Those runs discover additional conformations D and E along with new pathways, so start runs for D and E. A misfold condition is then discovered.
Results
The results of simulations have been verified experimentally several times. The drug design industry has begun to use information from simulations to narrow down the number of molecules it tests experimentally. This allows drug designers to be more thorough when evaluating each molecule and improves their throughput. The Folding@home project has been active for 15 years, and over 100 papers have been published about its findings.
Questions?