1
TMC BioGrid: A GCC Consortium
Ken Kennedy
Center for High Performance Software Research (HiPerSoft)
Rice University
http://www.cs.rice.edu/~ken/Presentations/BioGrid.pdf
2
HiPerSoft
NSF VGrADS ITR
—GrADS project phasing out
DOE Los Alamos Computer Science Institute (LACSI)
—LANL, Rice, UH, Tennessee, UNC, UNM
Collaborations with two NSF PACIs
Telescoping Languages Project (NSF, DOE, DoD, Texas)
—Domain languages based on Matlab and S
DOE SciDAC Languages Project (John Mellor-Crummey)
—Co-Array Fortran
Gulf Coast Center for Computational Cancer Research
—Rice and MD Anderson Cancer Center
Two NSF Major Research Infrastructure Grants
—Two teraflop clusters (Itanium and Opteron)
Houston BioGrid
3
Texas Medical Center BioGrid
A partnership under the Gulf Coast Consortia
—Participants: Rice, UH, Baylor College of Medicine, MD Anderson Cancer Center
Goals
—Foster research and development on the application of Grid computing technology to biomedicine
—Construct a useful Grid computational infrastructure
Current Infrastructure
—Machines: Itanium and Opteron clusters at UH, Itanium cluster at Rice, Pentium cluster at Baylor, MD Anderson pending
—Interconnection: 10-gigabit optical interconnect among Rice, UH, Baylor, and MD Anderson in progress; connection to National LambdaRail pending
—Software: Globus + VGrADS software stack (see next slide)
4
BioGrid Principal Investigators
Don Berry, MD Anderson
Bradley Broom, MD Anderson
Wah Chiu, Baylor
Richard Gibbs, Baylor
Lennart Johnsson, Houston
Ken Kennedy, Rice
Charles Koelbel, Rice
John Mellor-Crummey, Rice
Moshe Vardi, Rice
5
BioGrid Research
Software Research
—Virtual Grid Application Development Software (VGrADS) Project
–Producing software that will make it easy to develop Grid applications with optimized performance
–Automatic scheduling and launching on the Grid based on performance models
–Distribution of a software stack to construct testbeds
Applications
—EMAN: 3D Image Reconstruction Application Suite (Baylor-Rice-UH)
–Automatically translated (by VGrADS) to Grid execution with load-balanced scheduling
—Script-based integration and analysis of experimental cancer databases planned (MD Anderson, Rice)
6
The VGrADS Team
VGrADS is an NSF-funded Information Technology Research project
Keith Cooper, Ken Kennedy, Charles Koelbel, Linda Torczon, Rich Wolski, Fran Berman, Andrew Chien, Henri Casanova, Carl Kesselman, Lennart Johnsson, Dan Reed, Jack Dongarra
Plus many graduate students, postdocs, and technical staff!
7
VGrADS Principal Investigators
Francine Berman, UCSD
Henri Casanova, UCSD
Andrew Chien, UCSD
Keith Cooper, Rice
Jack Dongarra, Tennessee
Lennart Johnsson, Houston
Ken Kennedy, Rice
Charles Koelbel, Rice
Carl Kesselman, USC ISI
Dan Reed, UIUC
Richard Tapia, Rice
Linda Torczon, Rice
Rich Wolski, UCSB
8
National Distributed Problem Solving
[Diagram: databases and supercomputers linked across the national Grid]
9
VGrADS Vision
Build a National Problem-Solving System on the Grid
—Transparent to the user, who sees a problem-solving system
Why don't we have this today?
—Complex application development
–Dynamic resources require adaptivity
–Unreliable resources require fault tolerance
–Uncoordinated resources require management
—Weak programming tools and models
–Tied to physical resources
–If programming is hard, the Grid will not reach its potential
What do we propose as a solution?
—Virtual Grids (vgrids) raise the level of abstraction
—Tools exploit vgrids and provide a better user interface
10
GrADSoft Architecture
[Architecture diagram: the Program Preparation System (source application, libraries, software components, whole-program compiler, binder) produces a configurable object program; the Execution Environment (scheduler, resource negotiator, Grid runtime system, real-time performance monitor) runs it, with negotiation, performance feedback, and performance-problem reports flowing back]
11
The Virtual Grid Application Development Software (VGrADS) Project
Ken Kennedy
Center for High Performance Software
Rice University
http://www.hipersoft.rice.edu/vgrads/
12
The VGrADS Vision: National Distributed Problem Solving
Where We Want To Be
—Transparent Grid computing
–Submit job
–Find & schedule resources
–Execute efficiently
Where We Are
—Low-level hand programming
What Do We Need?
—A more abstract view of the Grid
–Each developer sees a specialized "virtual grid"
—Simplified programming models built on the abstract view
–Permit the application developer to focus on the problem
[Diagram: databases and supercomputers on the Grid]
13
The Original GrADS Vision
[Architecture diagram: the same GrADSoft architecture shown on the earlier slide]
14
Lessons from GrADS
Mapping and Scheduling for MPI Jobs is Hard
—Although we were able to do some interesting experiments
Performance Model Construction is Hard
—Hybrid static/dynamic schemes are best
—Difficult for application developers to do by hand
Heterogeneity is Hard
—We completely revised the launching mechanisms to support this
—Good scheduling is critical
Rescheduling/Migration is Hard
—Requires application collaboration (generalized checkpointing)
—Requires performance modeling to determine profitability
Scaling to Large Grids is Hard
—Scheduling becomes expensive
15
VGrADS Virtual Grid Hierarchy
16
Virtual Grids and Tools
Abstract Resource Request
—Permits true scalability by mapping from requirements to a set of resources
–Scalable search produces a manageable resource set
—Virtual Grid services permit effective scheduling
–Fault tolerance, performance stability
Look-Ahead Scheduling
—Applications map to directed graphs
–Vertices are computations, edges are data transfers
—Scheduling done on the entire graph
–Using automatically constructed performance models for computations
–Depends on load prediction (Network Weather Service)
Abstract Programming Interfaces
—Application graphs constructed from scripts
–Written in standard scripting languages (Python, Perl, Matlab)
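To make the "applications map to directed graphs" idea concrete, here is a minimal Python sketch of a workflow DAG whose vertices are computations and whose edges are data transfers, walked in topological order the way a whole-graph scheduler would. The step names, work estimates, and helper function are illustrative assumptions, not the VGrADS representation.

    # Minimal sketch of a workflow DAG: vertices are computations,
    # edges are data transfers. Names and numbers are illustrative only.
    workflow = {
        # vertex -> (estimated work in normalized units, successor vertices)
        "preprocess":   (1.0,  ["classesbymra"]),
        "classesbymra": (50.0, ["classalign"]),   # the compute-intensive parallel step
        "classalign":   (5.0,  ["make3d"]),
        "make3d":       (8.0,  []),
    }

    def topological_order(dag):
        """Return vertices in an order where every edge points forward."""
        indegree = {v: 0 for v in dag}
        for _, (_, succs) in dag.items():
            for s in succs:
                indegree[s] += 1
        ready = [v for v, d in indegree.items() if d == 0]
        order = []
        while ready:
            v = ready.pop()
            order.append(v)
            for s in dag[v][1]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        return order

    print(topological_order(workflow))
    # ['preprocess', 'classesbymra', 'classalign', 'make3d']

A look-ahead scheduler would visit the vertices in such an order, estimating each computation's cost from a performance model and each edge's cost from predicted network load before committing any step to a resource.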
17
Virtual Grids
Goal: Provide an abstract view of grid resources for application use
—Will need to experiment to get the right abstractions
Assumptions:
—Underlying scalable information service
—Shared, widely distributed, heterogeneous resources
—Scaling and robustness for high load factors on the Grid
—Separation of the application and the resource management system
Basic Approach:
—Specify a vgrid as a hierarchy of …
–Aggregation operators (ClusterOf, LooseBagOf, etc.) with …
–Constraints (type of processor, installed software, etc.) and …
–Application-based rankings (e.g., predicted execution time)
—Execution system returns a (candidate) vgrid, structured like the request
—Application can use it as it sees fit, make further requests
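As a rough illustration of the "hierarchy of aggregation operators with constraints and rankings" described above, the Python sketch below encodes a hypothetical vgrid request. The field names and structure are assumptions for illustration; the slide does not give the actual vgrid specification language.

    # Hypothetical sketch of a hierarchical vgrid resource request:
    # aggregation operators (ClusterOf, LooseBagOf) with constraints
    # and an application-based ranking. Field names are illustrative only.
    vgrid_request = {
        "op": "LooseBagOf",                 # top level: loosely coupled collection
        "rank_by": "predicted_exec_time",   # application-supplied ranking criterion
        "children": [
            {
                "op": "ClusterOf",          # tightly coupled cluster for the parallel step
                "count": {"min": 16, "max": 64},
                "constraints": {
                    "arch": ["ia64", "x86_64"],
                    "installed_software": ["EMAN", "MPI"],
                    "memory_gb_per_node": 2,
                },
            },
            {
                "op": "ClusterOf",          # smaller cluster for pre/post-processing
                "count": {"min": 4, "max": 8},
                "constraints": {"installed_software": ["Python"]},
            },
        ],
    }

    def nodes_requested(req):
        """Total maximum node count implied by a request tree (illustration only)."""
        if "count" in req:
            return req["count"]["max"]
        return sum(nodes_requested(c) for c in req.get("children", []))

    print(nodes_requested(vgrid_request))   # 72

The execution system would answer such a request with a candidate vgrid structured the same way, which the application can then use directly or refine with further requests.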
18
Programming Tools
Focus: Automating critical application-development steps
—Building workflow graphs
–From the Python scripts used by EMAN
—Scheduling workflow graphs
–Heuristics required (problems are NP-complete at best)
–Good initial results if accurate predictions of resource performance are available (see EMAN demo)
—Constructing performance models
–Based on loop-level performance models of the application
–Requires benchmarking with (relatively) small data sets and extrapolating to larger cases
—Initiating application execution
–Optimize and launch the application on heterogeneous resources
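One way to read "benchmarking with small data sets, extrapolating to larger cases" is as a simple curve fit; the sketch below fits a power-law model to a few timings and extrapolates it to a larger problem size. The timings and the power-law form are assumptions for illustration, not measured VGrADS performance models.

    import math

    # Sketch: fit t = c * n**k to a few small-input benchmark timings,
    # then extrapolate to a larger problem size. Numbers are made up.
    small_runs = [(100, 2.1), (200, 8.3), (400, 33.5)]   # (problem size, seconds)

    # Least-squares fit in log space: log t = log c + k * log n
    xs = [math.log(n) for n, _ in small_runs]
    ys = [math.log(t) for _, t in small_runs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    c = math.exp(my - k * mx)

    predicted = c * (2000 ** k)          # extrapolate to a much larger input
    print(f"exponent k = {k:.2f}, predicted time at n = 2000: {predicted:.0f} s")

The real tools build such models per loop nest from compiler analysis plus benchmark runs, but the workflow scheduler only needs the resulting predictions of relative execution time on each resource class.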
19
VGrADS Demos at SC04
EMAN - Electron Microscopy Analysis [BCM, Rice, Houston]
—3D reconstruction of particles from electron micrographs
—Workflow scheduling and performance prediction to optimize mapping
[Figure: the EMAN refinement process]
20
EMAN Workflow Scheduling Experiment
Testbed
—64 dual-processor Itanium IA-64 nodes (900 MHz) at the Rice University Terascale Cluster [RTC]
—60 dual-processor Itanium IA-64 nodes (1300 MHz) at the University of Houston [acrl]
—16 Opteron nodes (2009 MHz) at the University of Houston Opteron cluster [medusa]
Experiment
—Ran the EMAN refinement cycle and compared running times for "classesbymra", the most compute-intensive parallel step in the workflow
—Determined the 3D structure of the 'rdv' virus particle with a large input data set [2 GB]
21
Results: Efficient Scheduling
We compared the following workflow scheduling strategies:
1. Heuristic scheduling with accurate performance models generated semi-automatically (HAP)
2. Heuristic scheduling with crude performance models based on the CPU power of the resources (HCP)
3. Random scheduling with no performance models (RNP)
4. Weighted random scheduling with accurate performance models (RAP)
We compared the makespan of the "classesbymra" step for the different scheduling strategies.
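For intuition about why the heuristic strategies beat the random ones, here is a toy comparison, not the VGrADS scheduler, between a min-completion-time heuristic that uses per-node predicted times and a uniformly random assignment of independent instances. The node counts and per-instance minutes are rough figures consistent with the measurements on the next slide; everything else is illustrative.

    import random

    # Toy comparison: greedy min-completion-time heuristic (using per-node
    # predicted times) versus random assignment of independent instances.
    nodes = {f"rtc{i}": 386.0 for i in range(50)}        # predicted minutes per instance
    nodes.update({f"medusa{i}": 101.0 for i in range(13)})
    n_instances = 110

    def heuristic_schedule(nodes, n):
        """Greedy: give each instance to the node that would finish it earliest."""
        finish = {name: 0.0 for name in nodes}
        for _ in range(n):
            best = min(finish, key=lambda m: finish[m] + nodes[m])
            finish[best] += nodes[best]
        return max(finish.values())                      # makespan

    def random_schedule(nodes, n, seed=0):
        """Assign each instance to a uniformly random node, ignoring predictions."""
        rng = random.Random(seed)
        finish = {name: 0.0 for name in nodes}
        names = list(nodes)
        for _ in range(n):
            m = rng.choice(names)
            finish[m] += nodes[m]
        return max(finish.values())

    print("heuristic makespan:", heuristic_schedule(nodes, n_instances))
    print("random makespan:   ", random_schedule(nodes, n_instances))

With these assumed inputs the greedy policy ends up sending 50 instances to the RTC nodes and 60 to the medusa nodes, the kind of split the performance-model-driven HAP strategy produced, while the random policy routinely piles several instances onto a slow node and inflates the makespan.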
22
Results: Efficient Scheduling
Set of resources: 50 RTC nodes, 13 medusa nodes

Scheduling method | Instances mapped to RTC (IA-64) | Instances mapped to medusa (Opteron) | Nodes picked at RTC | Nodes picked at medusa | Execution time at RTC (min) | Execution time at medusa (min) | Overall makespan (min)
HAP | 50 | 60 | 50 | 13 | 386 | 505 | 505
HCP | 58 | 52 | 50 | 13 | 757 | 410 | 757
RNP | 89 | 21 | 43 | 9 | 1121 | 298 | 1121
RAP | 57 | 53 | 34 | 10 | 762 | 530 | 762

HAP - Heuristic, Accurate PerfModel
HCP - Heuristic, Crude PerfModel
RNP - Random, No PerfModel
RAP - Random, Accurate PerfModel
23
Results: Load Balance
Set of resources: 43 RTC nodes, 14 medusa nodes, 39 acrl nodes

Instances mapped to RTC (IA-64) | Instances mapped to medusa (Opteron) | Instances mapped to acrl (IA-64) | Execution time at RTC (min) | Execution time at medusa (min) | Execution time at acrl (min) | Overall makespan (min)
29 | 42 | 39 | 383 | 410 | 308 | 410

Good load balance due to accurate performance models
24
Results: Accuracy of Performance Models
Our performance models were pretty accurate:
—rank[RTC_node] / rank[medusa_node] = 3.41
—actual_exec_time[RTC_node] / actual_exec_time[medusa_node] = 3.82
—rank[acrl_node] / rank[medusa_node] = 2.36
—actual_exec_time[acrl_node] / actual_exec_time[medusa_node] = 3.01
Accurate relative performance model values result in an efficient load balance of the classesbymra instances.
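A quick arithmetic check of those figures, comparing each predicted rank ratio with the measured execution-time ratio (values taken from the bullets above):

    # Predicted rank ratios vs. measured execution-time ratios from the slide.
    pairs = {
        "RTC/medusa":  (3.41, 3.82),
        "acrl/medusa": (2.36, 3.01),
    }
    for name, (predicted, measured) in pairs.items():
        err = abs(predicted - measured) / measured
        print(f"{name}: predicted {predicted}, measured {measured}, off by {err:.0%}")
    # RTC/medusa is about 11% off; acrl/medusa about 22% off.

Even with errors of that size, the relative ordering of the resource classes is preserved, which is what the load-balancing decision actually depends on.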
25
Final Comments
The TMC BioGrid
—An effort to solve important problems in computational biomedicine
—Use the Grid to pool resources
–10 Gbps interconnect
–Pooled computational resources at the participating institutions
The Challenge
—Making it easy to build Grid applications
Our Approach
—Build on the VGrADS tools effort
–Performance-model-based scheduling on abstract Grids
EMAN Challenge Problem
—End goal: 3000 Opterons for 100 hours