Distributed and Loosely Coupled Parallel Molecular Simulations using the SAGA API. PI: Ronald Levy; co-PIs: Emilio Gallicchio, Darrin York, Shantenu Jha.

Presentation transcript:

Distributed and Loosely Coupled Parallel Molecular Simulations using the SAGA API
PI: Ronald Levy; co-PIs: Emilio Gallicchio, Darrin York, Shantenu Jha
Consultants: Yaakoub El Khamra, Matt McKenzie

Quick Outline
– Objective
– Context
– Challenges
– Solution: SAGA
  – Not really new
  – SAGA on TeraGrid
  – SAGA on XSEDE
– Supporting Infrastructure
  – BigJob abstraction
  – DARE science gateway
– Work plan: where we are now
– References

Objective
Prepare the software infrastructure to conduct large-scale distributed uncoupled (ensemble-like) and coupled replica exchange molecular dynamics simulations using SAGA with the IMPACT and AMBER molecular simulation programs.
Main project: NSF CHE , NIH grant GM30580; it addresses forefront biophysical issues concerning basic mechanisms of molecular recognition in biological systems.

Protein-Ligand Binding Free Energy Calculations with IMPACT
Parallel Hamiltonian Replica Exchange Molecular Dynamics: λ-exchanges over λ, the protein-ligand interaction progress parameter, connecting the unbound and bound states (the standard exchange criterion is sketched below).
[Figure: ladder of λ-replicas connecting the bound and unbound states]
– Achieve equilibrium between multiple binding modes.
– Many λ-replicas, each using multiple cores.
– Coordination and scheduling challenges.
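For readers unfamiliar with the method, the λ-exchange step is normally accepted or rejected with a Metropolis criterion based on the cross-evaluated potential energies of the two replicas. The snippet below is a minimal, generic sketch of that standard test (it is not taken from IMPACT; the function and argument names are illustrative):

```python
import math
import random

def accept_lambda_exchange(u_ii, u_jj, u_ij, u_ji, beta):
    """Metropolis test for swapping configurations between Hamiltonian
    replicas i and j that run at the same temperature.

    u_ii: energy of configuration i under its own Hamiltonian (lambda_i)
    u_jj: energy of configuration j under its own Hamiltonian (lambda_j)
    u_ij: energy of configuration j evaluated under lambda_i
    u_ji: energy of configuration i evaluated under lambda_j
    beta: 1 / (k_B * T)
    """
    delta = beta * ((u_ij + u_ji) - (u_ii + u_jj))
    return delta <= 0.0 or random.random() < math.exp(-delta)
```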

Context: Not Just an Interesting Use Case
This is also an awarded CDI Type II project: "Mapping Complex Biomolecular Reactions with Large Scale Replica Exchange Simulations on National Production Cyberinfrastructure", CHE ($1.65M for 4 years), PI: R. Levy.
ECSS was requested to support the infrastructure:
– Integration, deployment
– Many aspects in cooperation with the SAGA team
A dedicated SAGA team member was assigned by the SAGA project.

Challenges
Scientific objective:
– Perform replica exchange on ~10K replicas, each replica itself a large simulation
Infrastructure challenges:
– Launch and monitor/manage 1K-10K individual or loosely coupled replicas
– Long job durations: days to weeks (replicas will have to be checkpointed and restarted)
– Pairwise asynchronous data exchange (no global synchronization); a minimal sketch of this coordination pattern follows below
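To make the last bullet concrete: pairwise asynchronous exchange means that two neighboring replicas can attempt a swap as soon as both have finished their current segments, with no barrier across the full set of replicas. The toy coordinator below is purely illustrative (it is not the project's code and its class and method names are invented); it only shows the pairing logic, while the actual data exchange and acceptance test would happen elsewhere, e.g. via the advert service described later:

```python
import threading

class PairwiseExchangeBoard:
    """Toy coordinator for pairwise asynchronous exchange: a replica
    registers as ready when its MD segment finishes and is paired with a
    waiting neighbor if one exists; no global barrier is involved."""

    def __init__(self):
        self._waiting = set()       # replica ids ready to exchange
        self._partner = {}          # replica id -> partner chosen for it
        self._lock = threading.Lock()

    def request_partner(self, rid):
        """Return a partner id if a neighboring replica is waiting,
        otherwise park this replica and return None (poll again later)."""
        with self._lock:
            if rid in self._partner:            # a neighbor already paired with us
                return self._partner.pop(rid)
            for nbr in (rid - 1, rid + 1):
                if nbr in self._waiting:
                    self._waiting.discard(nbr)
                    self._partner[nbr] = rid    # let the neighbor find us later
                    return nbr
            self._waiting.add(rid)
            return None
```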

Consultant Challenges
– Software integration and testing
– High I/O load, file clutter
– Globus issues
– Environment issues
– mpirun/ibrun and MPI host-file issues
– Run-away job issues
– Package version issues

Simple Solution: SAGA
Simple, integrated, stable, uniform and community-standard:
– Simple and stable: 80:20 restricted scope
– Integrated: similar semantics and style across primary functional areas
– Uniform: same interface for different distributed systems
– The building blocks upon which to construct "consistent" higher levels of functionality and abstractions
– OGF standard; the "official" access-layer API of EGI and NSF XSEDE
(A minimal job-submission sketch follows below.)
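As a concrete illustration of the uniform interface, a job submission through SAGA's Python bindings looks roughly like the sketch below. This is a minimal example that assumes the saga-python package is installed; the service URL, executable, and resource parameters are placeholders, and attribute names may differ slightly between SAGA versions:

```python
import saga  # saga-python bindings (assumed available)

# The URL scheme selects the middleware adaptor (e.g. ssh://, pbs+ssh://, gram://).
js = saga.job.Service("pbs+ssh://login.example-hpc.org")

# Describe the job in a middleware-independent way.
jd = saga.job.Description()
jd.executable      = "/usr/local/bin/namd2"   # placeholder application
jd.arguments       = ["equilibration.conf"]
jd.total_cpu_count = 192
jd.wall_time_limit = 60                       # minutes
jd.output          = "namd.out"
jd.error           = "namd.err"

# Submit and wait; pointing the service URL at a different back-end
# leaves the rest of the script unchanged.
job = js.create_job(jd)
job.run()
job.wait()
print("Job finished with state:", job.state)
```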

SAGA Is Production-Grade Software
– Most (if not all) of the infrastructure is tried and true and deployed, though not yet at large scale; higher-level capabilities are in development
– Sanity checks, perpetual demos and performance checks run continuously to find problems
– Focus is on hardening and resolving issues that appear when running at scale

SAGA on TeraGrid
– SAGA deployed as a CSA (Community Software Area) installation on Ranger, Kraken, Lonestar and QueenBee
– Advert service hosted by the SAGA team
– Replica exchange workflows, EnKF workflows and coupled simulation workflows using BigJob on TeraGrid (many papers; many users)
– Basic science gateway framework

SAGA on XSEDE
– Latest version (1.6.1) is available on Ranger, Kraken and Lonestar; it will be deployed on Blacklight and Trestles as a CSA installation
– Automatic deployment and bootstrapping scripts are available
– Working on virtual images for the advert service and gateway framework (hosted at Data Quarry)
– The effort is to make the infrastructure "system friendly, production ready and user accessible"

Perpetual SAGA Demo on XSEDE

FutureGrid & OGF-GIN

SAGA Supporting Infrastructure
– Advert Service: central point of persistent distributed coordination (anything from allocation project names to file locations and job info). Now a VM on Data Quarry!
– BigJob: a pilot-job framework that acts as a container job for many smaller jobs (supports parallel and distributed jobs)
– DARE: a science gateway framework that supports BigJob jobs

Aside: BigJob
SAGA BigJob comprises three components:
– the BigJob Manager, which provides the pilot-job abstraction and manages the orchestration and scheduling of BigJobs (which in turn allows the management of both bigjob objects and subjobs);
– the BigJob Agent, which represents the pilot job and thus the application-level resource manager running on the respective resource; and
– the Advert Service, which is used for communication between the BigJob Manager and Agent.
BigJob supports MPI and distributed (multiple-machine) workflows. A sketch of the pilot-job pattern follows below.
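The sketch below illustrates the pilot-job pattern just described: one container job acquires a block of cores through a single batch-queue submission, and many sub-jobs are then scheduled into it at the application level. It is not the actual BigJob API; the class and method names are invented for illustration, and the core counts are taken from the Kraken runs shown later in this deck:

```python
class PilotJob:
    """Illustrative container job: holds a block of cores obtained through a
    single batch-queue submission and hands slices of it to sub-jobs."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.free_cores = total_cores
        self.queue = []                      # sub-jobs waiting for cores

    def submit_subjob(self, command, cores):
        """Queue a sub-job; it launches as soon as enough cores are free."""
        self.queue.append((command, cores))

    def schedule(self):
        """Application-level scheduling (the agent's role): start queued
        sub-jobs that fit into the currently free cores."""
        still_waiting = []
        for command, cores in self.queue:
            if cores <= self.free_cores:
                self.free_cores -= cores
                print(f"launching '{command}' on {cores} cores")
            else:
                still_waiting.append((command, cores))
        self.queue = still_waiting

# Usage: one 24,192-core pilot filled with 126 sub-jobs of 192 cores each.
pilot = PilotJob(total_cores=24192)
for replica in range(126):
    pilot.submit_subjob(f"namd2 replica_{replica}.conf", cores=192)
pilot.schedule()
```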

Distributed, High Throughput & High Performance

BigJob Users (Limited Sampling)
– Dr. Jong-Hyun Ham, Assistant Professor, Plant Pathology and Crop Physiology, LSU/AgCenter
– Dr. Tom Keyes, Professor, Chemistry Department, Boston University
– Dr. Erik Flemington, Professor, Cancer Medicine, Tulane University
– Dr. Chris Gissendanner, Associate Professor, Department of Basic Pharmaceutical Sciences, College of Pharmacy, University of Louisiana at Monroe
– Dr. Tuomo Rankinen, Human Genome Lab, Pennington Biomedical Research Center

Case Study, SAGA BigJob: Computational Studies of Nucleosome Positioning and Stability
Main researchers: Rajib Mukherjee, Hideki Fujioka, Abhinav Thota, Thomas Bishop and Shantenu Jha
Main supporters: Yaakoub El Khamra (TACC) and Matt McKenzie (NICS)
Support from:
– LaSIGMA: Louisiana Alliance for Simulation-Guided Materials Applications, NSF award number #EPS
– NIH R01GM to Bishop, "Molecular Dynamics Studies of Nucleosome Positioning and Receptor Binding"
– TeraGrid/XSEDE allocation MCB to Bishop, "High Throughput High Performance MD Studies of the Nucleosome"

Molecular Biology 101
[Figure credits: "Under Wraps", C&E News, July 17; Felsenfeld & Groudine, Nature, Jan 2003]

High Throughput of HPC
Creation of the nucleosome: the problem space is large.
– All-atom, fully solvated nucleosome: ~158,432 atoms (simulation box 13.7 nm x 14.5 nm x 10.1 nm)
– NAMD 2.7 with Amber force fields
– 16 nucleosomes x 21 different DNA sequences = 336 simulations
– Each requires a 20 ns trajectory, broken into 1 ns segments = 6,720 simulations (see the quick check below)
– ~25 TB total output; 4.3 MSU project
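The sub-job count that drives the throughput requirement follows directly from the numbers above; a quick check:

```python
nucleosomes, sequences = 16, 21
simulations = nucleosomes * sequences               # 336 independent systems
segments_per_simulation = 20                        # 20 ns split into 1 ns runs
subjobs = simulations * segments_per_simulation     # 6,720 individual MD runs
print(simulations, subjobs)                         # -> 336 6720
```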

The 336 Starting Positions

Simulations using BigJob on different machines
[Figure: runs with 21, 42 and 63 subjobs (1 ns simulations); the simulation time seems to be the same while the wait time varies considerably.]
Reference: "Running many MD simulations on many supercomputers", Jha et al., under review.

Simulations using BigJob requesting 24,192 cores
– 9 BigJob runs were submitted to Kraken
– Each simulated 126 subjobs, each subjob using 192 cores
– BigJob size = 126 x 192 = 24,192 cores
– 60% of this study utilized Kraken
Reference: "Running many MD simulations on many supercomputers", Jha et al., under review.

Interpreting the results (comparing the simulations in reading order):
– If the protein completely controlled the deformations of the DNA, we would expect the red flashes to always appear at the same locations.
– If the DNA sequence determined the deformations (i.e. kinks occur at weak spots in the DNA), we would expect the red flashes to simply shift around the nucleosome.
– Observed: an interplay of both the mechanical properties of the DNA and the shape of the protein.

DARE Science Gateway Framework
– A science gateway to support (i) ensemble MD users and (ii) replica exchange users is attractive
– By our count: ~200 replica exchange papers a year; even more ensemble MD papers
– Build a gateway framework and an actual gateway (or two), and make it available to the community:
  – DARE-NGS: next-generation sequence data analysis (NIH supported)
  – DARE-HTHP: for executing high-performance parallel codes such as NAMD and AMBER in high-throughput mode across multiple distributed resources concurrently

DARE Infrastructure
– Secret sauce: the L1 and L2 layers
– Depending on the application type, use a single tool, a pipeline or a complex workflow

Work plan: where we are now
– Deploy SAGA infrastructure on XSEDE resources (finished on Ranger, Kraken and Lonestar) + module support
– Deploy a central DB for SAGA on XSEDE resources (finished: on an IU Data Quarry VM)
– Deploy the BigJob pilot-job framework on XSEDE resources (finished on Ranger)
– Develop BigJob-based scripts to launch NAMD and IMPACT jobs (finished on Ranger)
– Deploy the science gateway framework on a VM (done)
– Run and test scale-out

References
– SAGA website:
– BigJob website:
– DARE Gateway: