Massively Parallel Ensemble Methods Using Work Queue Badi’ Abdul-Wahid Department of Computer Science University of Notre Dame CCL Workshop 2012

Overview Background Challenges and approaches Our Work Queue software – Folding@work – Accelerated Weighted Ensemble (AWE) Discussion and usage of Folding@work and AWE

Background: Molecular Dynamics Proteins are chains of amino acids that fold into 3D conformations. Many diseases are believed to be caused by protein misfolding – Alzheimer’s: amyloid-beta aggregation – BSE (Mad Cow): prion misfolding. Understanding protein folding allows us to better understand disease and to develop and test treatments.

Challenges for MD (Figure courtesy of Greg Bowman; Bowman, G. R., Voelz, V. A., & Pande, V. S. (2010). Current Opinion in Structural Biology, 21(1), 4–11.)

One Approach: Specialized Hardware Anton – Developed by D. E. Shaw Research – ~1000x speedup – Expensive, with limited access. GPUs – Require a programming paradigm shift – OpenMM allows MD to run on GPUs – More accessible, but still limited in capability, e.g., certain features present in CPU MD software are not yet provided on GPUs.

Another Approach: Massive Sampling Folding@home (Vijay Pande at Stanford) – Crowd-sources the computation – Donors allow simulations to be run on their computers – 300,000+ active users – CPUs & GPUs – 6,800 TFLOPS (x86)

Goal: Sample Rare Events – Pathways through transition regions – Difficult to capture experimentally – High computational complexity – High potential for parallelism

Adaptive Sampling Traditional methods give poor sampling of unstable regions; adapt the sampling procedure to enrich these areas. (Figure: UC Davis ChemWiki)

Products Folding@work Accelerated Weighted Ensemble (AWE) – Written with Haoyun Feng Both are built on top of Work Queue and use the Python bindings to Work Queue.

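For readers unfamiliar with the Python bindings, a minimal Work Queue master in the style both applications build on might look like the sketch below; the command and file names are placeholders, not the actual application code.

from work_queue import WorkQueue, Task

q = WorkQueue(port=9123)                  # workers connect to this port

for i in range(100):
    t = Task("./simulate input.%d > output.%d" % (i, i))
    t.specify_input_file("simulate")      # executable shipped to the worker
    t.specify_input_file("input.%d" % i)
    t.specify_output_file("output.%d" % i)
    q.submit(t)

while not q.empty():
    t = q.wait(5)                         # a finished task, or None on timeout
    if t:
        print("task %d exited with status %d" % (t.id, t.return_status))
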
(Figures: data dependencies/organization; simulation lifetime)

Accelerated Weighted Ensemble

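The AWE slide above is a figure. As a rough illustration of the resampling idea behind weighted-ensemble methods, each cell of conformation space is kept at a fixed number of walkers by splitting high-weight walkers and merging low-weight ones while conserving total probability weight. The sketch below uses simple systematic resampling as a hypothetical stand-in for AWE's actual split/merge rules:

def resample_cell(walkers, target):
    # walkers: list of (state, weight) pairs in one cell.
    # Returns `target` walkers of equal weight; total weight is conserved.
    total = sum(w for _, w in walkers)
    step = total / float(target)
    out, acc, i = [], step / 2.0, 0
    cum = walkers[0][1]
    for _ in range(target):
        while acc > cum and i < len(walkers) - 1:
            i += 1
            cum += walkers[i][1]
        # heavy walkers are copied several times (split);
        # light walkers may be skipped entirely (merge)
        out.append((walkers[i][0], step))
        acc += step
    return out
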
WW Fip35 Preparatory Simulation – 200+ µs – Amber96 force field – Generalized Born implicit solvent – Timestep of 2 fs – Gamma of 1/ps – NVIDIA GeForce GTX 480 – OpenMM – MSMBuilder

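A minimal OpenMM script matching those settings might look like this sketch; the structure file, the 300 K temperature, and the step count are assumptions not stated on the slide, and the Amber96/OBC force-field files are assumed to be the ones bundled with OpenMM:

from simtk.openmm import app, LangevinIntegrator, Platform
from simtk import unit

pdb = app.PDBFile("fip35.pdb")                         # placeholder structure
ff = app.ForceField("amber96.xml", "amber96_obc.xml")  # Amber96 + Generalized Born
system = ff.createSystem(pdb.topology, nonbondedMethod=app.NoCutoff)
integrator = LangevinIntegrator(300 * unit.kelvin,       # assumed temperature
                                1.0 / unit.picosecond,   # gamma
                                2.0 * unit.femtoseconds) # timestep
sim = app.Simulation(pdb.topology, system, integrator,
                     Platform.getPlatformByName("CUDA")) # e.g. the GTX 480
sim.context.setPositions(pdb.positions)
sim.step(500000)                                         # 1 ns at 2 fs/step
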
AWE Resources Multiple masters… sharing several pools… from different sources: – ND GRID: CPUs – Condor: ND + Purdue + UW-Madison CPUs – Stanford ICME: GPUs – Amazon EC2: High-CPU XL instances – Windows Azure: Large CPU instances

Pool Allocation for WQ Masters (figure)

Execution Times (figure)

Bottlenecks: Communication All data is sent through Work Queue – Initialization cost: 34 MB – Payloads (to/from): 100 KB – Architecture-dependent files are specified via the WQ-exposed $OS and $ARCH variables – 2% of tasks required initialization – < 1 s average communication time – < 1% of master execution time
wq.specify_input_file("binaries/$OS-$ARCH/mdrun")
wq.specify_input_file("binaries/$OS-$ARCH/assign")
wq.specify_input_file("binaries/$OS-$ARCH/g_rms")

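In context, those calls attach per-architecture binaries to the work: Work Queue substitutes $OS and $ARCH per worker, so one master can serve heterogeneous workers from the same script. A sketch of how the calls might sit in task setup, with caching so large inputs cross the wire only once per worker, follows; everything beyond the slide's three calls is an assumption:

from work_queue import Task, WORK_QUEUE_CACHE

t = Task("./mdrun ...; ./assign ...; ./g_rms ...")  # placeholder pipeline
for exe in ("mdrun", "assign", "g_rms"):
    # the local path has $OS/$ARCH resolved per worker;
    # the file is cached on the worker after the first transfer
    t.specify_input_file("binaries/$OS-$ARCH/" + exe, exe, WORK_QUEUE_CACHE)
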
Bottlenecks: Straggling Workers 50% of the time to complete an iteration was due to certain workers taking too long. Mitigations: – WORK_QUEUE_SCHEDULE_TIME – Fast-abort multiplier = 3. The recently implemented WQ task cancellation allows task replication to overcome this.

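Those knobs map onto the Python API roughly as follows; the replication race at the end is a sketch of the idea enabled by task cancellation, not the shipped AWE code:

from work_queue import WorkQueue, Task, WORK_QUEUE_SCHEDULE_TIME

q = WorkQueue(port=9123)
q.specify_algorithm(WORK_QUEUE_SCHEDULE_TIME)  # send work to historically fast workers
q.activate_fast_abort(3)                       # abort tasks running > 3x the average

# Replicate a straggling task and keep whichever copy finishes first.
a = Task("./simulate input.42")                # placeholder command
b = Task("./simulate input.42")
q.submit(a)
q.submit(b)
first = None
while first is None:
    first = q.wait(10)
q.cancel_by_taskid(b.id if first.id == a.id else a.id)
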
Preliminary Results (Figures: transition network from AWE; sample folding pathways)

Conclusions Work Queue allows @home-style applications to be run easily. Work Queue allowed the first application of the AWE algorithm to a large biomolecular system – An important aspect was the use of different kinds of resources: CPU + GPU on Cloud + Grid + Dedicated clusters. The synchronization barrier in AWE lets straggling workers have a large impact – Currently testing a solution using new WQ features. These preliminary AWE results are promising, but further validation is needed.

Acknowledgements Prof. Jesús A. Izaguirre (Notre Dame) Prof. Douglas Thain (Notre Dame) Prof. Eric Darve (Stanford) Prof. Vijay S. Pande (Stanford) Li Yu, Dinesh Rajan, Peter Bui Haoyun Feng, RJ Nowling, James Sweet CCL Izaguirre Group Pande Group NSF NIH

Questions?