SAN DIEGO SUPERCOMPUTER CENTER Blue Gene for Protein Structure Prediction (Predicting CASP Targets in Record Time) Ross C. Walker.

SAN DIEGO SUPERCOMPUTER CENTER Blue Gene for Protein Structure Prediction (Predicting CASP Targets in Record Time) Ross C. Walker

SAN DIEGO SUPERCOMPUTER CENTER The CASP Competition
What is CASP?
Critical Assessment of Techniques for Protein Structure Prediction (CASP)
Biennial competition in protein structure prediction - the "world cup" of protein structure prediction
CASP v7 ran 10th May 2006 to 29th Aug 2006: ca. 100 sequences over 100 days

SAN DIEGO SUPERCOMPUTER CENTER Protein Structure Prediction (Rosetta)
Homology Modeling (large sequence alignment)
Template Based Modeling (some sequence alignment)
Ab Initio (no appreciable sequence alignment)
The Rosetta code of Prof. David Baker (HHMI) supports all 3 approaches

SAN DIEGO SUPERCOMPUTER CENTER Template Based Predictions
Used for the majority of CASP targets
Align the sequence with proteins of known structure
Generate initial "decoy" structures
Perform a Monte Carlo refinement of the structures
The lowest-energy structures "should" correspond to the native structure.
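
The Monte Carlo refinement step above is the expensive part of producing each decoy. As a rough illustration (not the actual Rosetta code), a Metropolis-style accept/reject loop of the kind this refinement relies on might look like the following sketch; Conformation, energy() and perturb() are placeholders standing in for Rosetta's score function and its fragment/torsion moves.

```cpp
// Minimal sketch of a Metropolis-style Monte Carlo refinement loop.
// NOT Rosetta code: Conformation, energy() and perturb() are placeholders
// standing in for Rosetta's scoring function and its fragment/torsion moves.
#include <cmath>
#include <random>
#include <vector>

struct Conformation { std::vector<double> torsions; };

// Placeholder "energy": sum of squared torsions.
double energy(const Conformation& c) {
    double e = 0.0;
    for (double t : c.torsions) e += t * t;
    return e;
}

// Placeholder move: nudge one randomly chosen torsion.
Conformation perturb(const Conformation& c, std::mt19937& rng) {
    Conformation out = c;
    std::uniform_int_distribution<std::size_t> pick(0, out.torsions.size() - 1);
    std::normal_distribution<double> step(0.0, 0.1);
    out.torsions[pick(rng)] += step(rng);
    return out;
}

// One refinement trajectory: returns the lowest-energy decoy it visited.
Conformation refine(Conformation current, int n_steps, double kT, unsigned seed) {
    std::mt19937 rng(seed);   // per-task seed (see the Blue Gene notes below)
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double e_cur = energy(current);
    Conformation best = current;
    double e_best = e_cur;

    for (int i = 0; i < n_steps; ++i) {
        Conformation trial = perturb(current, rng);
        double e_trial = energy(trial);
        double dE = e_trial - e_cur;
        // Metropolis criterion: always accept downhill moves,
        // accept uphill moves with probability exp(-dE/kT).
        if (dE <= 0.0 || uni(rng) < std::exp(-dE / kT)) {
            current = trial;
            e_cur = e_trial;
            if (e_cur < e_best) { best = current; e_best = e_cur; }
        }
    }
    return best;
}
```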

The Problem
Many thousands of refinements need to be completed in order to adequately sample phase space.
The CASP competition is time sensitive:
Sequences are released continuously
Predictions must be submitted within 3 weeks of sequence release
This requires access to large computing resources.

SAN DIEGO SUPERCOMPUTER CENTER SDSC and Rosetta
A collaboration between SDSC's Scientific Applications Computing (SAC) group and David Baker
Scientists from SDSC parallelized the Rosetta code to run on many thousands of processors
Provided tailored resource allocation on the SDSC Blue Gene and DataStar machines
Provided the Baker team with access to two orders of magnitude more computing power than they had for CASP 6 (2004).

SAN DIEGO SUPERCOMPUTER CENTER Rosetta Modifications

SAN DIEGO SUPERCOMPUTER CENTER Modifications Specific to Blue Gene
1) Aggressively account for all memory used.
2) Variable chunk-size distribution by the master thread.
3) No global communications - all communication is point to point.
4) Distributed I/O - all tasks read directly from disk and write directly to disk. (No distribution of work packets over the interconnect, which would overload the master thread; only job ID information is sent.)
5) Master generation of the random seed for each slave thread - ensures no two threads have the same random seed.
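
The scheme above is essentially a master/worker job farm. The following is a minimal sketch of how such a farm can be wired up with point-to-point MPI messages only, variable chunk sizes, and master-generated per-worker seeds; the message layout, tags, chunk-size rule, and run_decoy() are illustrative assumptions, not the actual Rosetta implementation.

```cpp
// Minimal master/worker sketch (not the actual Rosetta code) illustrating the
// points above: point-to-point messages only, the master hands out job IDs and
// seeds (never work packets), chunk sizes shrink as the job pool drains, and
// every worker gets a master-generated random seed.
#include <mpi.h>
#include <algorithm>
#include <random>

const int TAG_WORK = 1, TAG_DONE = 2, TAG_STOP = 3;

// Placeholder for one refinement; in the real code each task would read its
// inputs from disk and write its decoy back to disk itself (distributed I/O).
void run_decoy(long job_id, unsigned seed) { (void)job_id; (void)seed; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long total_jobs = 120000;           // e.g. number of decoys requested

    if (rank == 0) {                          // master: only small control messages
        std::mt19937 seeder(12345);           // master-generated seeds
        long next_job = 0;
        int active = nprocs - 1;
        while (active > 0) {
            long done;
            MPI_Status st;
            MPI_Recv(&done, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_DONE,
                     MPI_COMM_WORLD, &st);
            if (next_job < total_jobs) {
                long remaining = total_jobs - next_job;
                // variable chunk size: large early on, small near the end
                long chunk = std::max(1L, std::min(remaining / (nprocs - 1), 64L));
                long msg[3] = { next_job, chunk, (long)seeder() };
                MPI_Send(msg, 3, MPI_LONG, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next_job += chunk;
            } else {
                long stop[3] = { 0, 0, 0 };
                MPI_Send(stop, 3, MPI_LONG, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                --active;
            }
        }
    } else {                                  // worker: ask, work, repeat
        long report = 0;
        while (true) {
            MPI_Send(&report, 1, MPI_LONG, 0, TAG_DONE, MPI_COMM_WORLD);
            long msg[3];
            MPI_Status st;
            MPI_Recv(msg, 3, MPI_LONG, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            for (long j = 0; j < msg[1]; ++j)
                run_decoy(msg[0] + j, (unsigned)msg[2] + (unsigned)j);
            report = msg[1];
        }
    }
    MPI_Finalize();
    return 0;
}
```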

SAN DIEGO SUPERCOMPUTER CENTER Performance

SAN DIEGO SUPERCOMPUTER CENTER Rosetta Usage on SDSC Blue Gene
CASP 2006: 2,080,000 SUs used (average run size = 2048 CPUs)
2007 (estimated):
Protein Structure Prediction: 2,500,000 SUs (4096 CPUs)
Protein Design: 1,800,000 SUs (2048 CPUs)

SAN DIEGO SUPERCOMPUTER CENTER A Demonstration
Successful scaling to >40,000 processors allowed a demonstration to be run at IBM Watson Research Labs
Ross Walker (SDSC) and Srivatsan Raman (UW) took a CASP target released earlier in the day
Generated initial guesses
Submitted the job to all 20 racks of the IBM Watson Blue Gene
Ran for 3 hours
Generated 120,000 decoys
The best candidate was selected and submitted as the CASP prediction the same day.
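
For scale, a back-of-envelope check (assuming the usual Blue Gene/L figure of 1,024 dual-core nodes, i.e. 2,048 processors, per rack; the per-rack count is not stated on the slide):

```latex
20 \times 2{,}048 = 40{,}960 \text{ processors}, \qquad
\frac{120{,}000 \text{ decoys}}{40{,}960 \text{ processors} \times 3\,\text{h}}
  \approx 1 \text{ decoy per processor-hour}
```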

SAN DIEGO SUPERCOMPUTER CENTER Results: CASP 2006 Target T0380
Green = Prediction, Blue = X-Ray, Pink = Initial Template

SAN DIEGO SUPERCOMPUTER CENTER Results CASP 2006 Target T0380 Baker team results shown in black.

SAN DIEGO SUPERCOMPUTER CENTER The Future (1 million CPUs and beyond)
Hierarchical job distribution system (the single-master-thread approach will be overloaded).
On-the-fly detection of failed nodes and error correction.
Manual buffering of I/O? [Requires more memory per node]
Parallelization of individual refinements (SMP or MPI options).
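
One way a hierarchical job distribution system of the kind proposed above could be laid out (a sketch under assumed group sizes and roles, not an implemented design): split the ranks into groups, give each group its own sub-master, and let the global master exchange job ranges only with the sub-masters instead of with every worker.

```cpp
// Sketch of a two-level (hierarchical) job distribution layout using
// communicator splitting. Illustrative only - group size and roles are assumptions.
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    const int group_size = 1024;              // workers per sub-master (assumed)
    int group_id = world_rank / group_size;

    // Each group gets its own communicator; rank 0 within the group is the sub-master.
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, group_id, world_rank, &group_comm);
    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);

    // A separate communicator containing only the sub-masters, so the global
    // master talks to ~world_size/group_size ranks rather than to every worker.
    MPI_Comm masters_comm;
    int color = (group_rank == 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &masters_comm);

    if (group_rank == 0) {
        // sub-master: request blocks of job IDs from the global master over
        // masters_comm, then farm individual jobs out to workers over group_comm
        // using point-to-point messages.
    } else {
        // worker: talk only to the local sub-master over group_comm.
    }

    if (masters_comm != MPI_COMM_NULL) MPI_Comm_free(&masters_comm);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}
```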

SAN DIEGO SUPERCOMPUTER CENTER Acknowledgements
David Baker (UW)
Srivatsan Raman (UW)
John Karanicolas (UW)
IBM T.J. Watson Research
SDSC
NSF-funded SAC Program