Cyber-Research: Meeting the Challenge of a Terascale Computing Infrastructure
Francine Berman
Department of Computer Science and Engineering, U. C. San Diego
San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure

Cyberinfrastructure: Facing the challenges of computation, collaboration and education in the new millennium (courtesy of Ruzena Bajcsy, NSF)
–Computation
–People & Training
–Instrumentation (large and/or many small)
–Large Databases
–Digital Libraries
–Broadband Network Connectivity
–Partnership

Cyber-Research: Synergistic, collaborative research leveraging the cyberinfrastructure's potential (components courtesy of Ruzena Bajcsy, NSF)
–High performance computation
–High-speed networks create a virtual parallel "machine"
–Collaborative research, education, outreach, training
–Remote instruments linked to computers and data archives
–Large amounts of data, and their collection, analysis and visualization

Value Added – SCALE and SYNERGY
Cyber-research goals are to foster deep impact, broad impact and individual impact. Cyberinfrastructure enables the scale and synergy necessary for fundamental advances in computational and computer science through collaborative research, development, infrastructure and education activities.

Short History of "Cyber-Research"
1980s: Grand Challenge Problems
–Multidisciplinary, cutting-edge computational science
–Platform is the supercomputer/MPP
1990s: Cutting-edge applications defined for each resource
–Communication-intensive applications developed for Gigabit networks
–Grid applications linked instruments and computation for large-scale results
–Data management becomes an integral part of applications for many communities

Cyber-Research in the Terascale Age
Cyber-research in the Terascale Age targets cutting-edge technologies
–Terascale computers
–Petascale data
–Terabit networks
Advances in network technology are eliminating the "tyranny of distance"
–Distributed environments become viable for an expanded class of applications
How do we write and deploy applications in these environments?

Infrastructure for Cyber-Research
Layers (top to bottom): Applications, User-level Middleware, System-level Middleware, Base-level Infrastructure, Hardware Resources
–Hardware resources may be heterogeneous, shared by multiple users, exhibit different performance characteristics, and be governed by distinct administrative domains
–Base-level infrastructure provides the illusion of a complex aggregate resource (e.g. Globus)
–System-level middleware reduces complexity: it better enables the user to handle the complexity of the underlying system (e.g. SRB)
–User-level middleware promotes performance: it enables the user to achieve performance on dynamic aggregate resource platforms (think programming environments)
–Applications must be anticipatory, adaptive and responsive to achieve performance

Cyber-Research Applications
At any level of the stack (applications, user-level middleware, system-level middleware, base-level infrastructure, hardware resources), the "machine" is heterogeneous and complex with dynamic performance variation
Performance-efficient applications must be
–Adaptive to a vast array of possible resource configurations and behavior
–Anticipatory of load and resource availability
–Responsive to dynamic events
We have increasing experience deploying scientific applications in these environments
–However, it is difficult to routinely achieve performance

The Need for Adaptivity
–Resource load in multi-user distributed environments varies dynamically
–Even homogeneous environments may be experienced as heterogeneous by the user
[Figures: file transfer time from SC'99 (Portland, Ore.) to OGI (Portland), UTK (Tenn.) and UCSD (Ca.), courtesy of Alan Su; Network Weather Service monitored and predicted data from SDSC resources, courtesy of Rich Wolski]
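The Network Weather Service drives this kind of adaptivity by monitoring resources and forecasting their future load. As a rough illustration of the idea only (hypothetical code, not NWS's actual forecasting machinery), here is a sketch that keeps several trivial predictors and reports whichever has tracked a measurement series best:

```python
# Toy forecaster in the spirit of NWS-style prediction (illustration only, not NWS code):
# keep several simple predictors and report the one with the lowest recent error.

def last_value(history):
    return history[-1]

def exp_smooth(history, alpha=0.3):
    est = history[0]
    for x in history[1:]:
        est = alpha * x + (1 - alpha) * est
    return est

def best_forecast(history, predictors=(last_value, exp_smooth)):
    """Pick the predictor with the lowest mean absolute error over the history."""
    def mae(predict):
        errors = [abs(predict(history[:i]) - history[i]) for i in range(1, len(history))]
        return sum(errors) / len(errors)
    best = min(predictors, key=mae)
    return best(history), best.__name__

bandwidth_mbps = [42.0, 40.5, 35.2, 38.9, 37.1, 12.4, 36.8]  # made-up measurements
print(best_forecast(bandwidth_mbps))
```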

A Tale of Cyber-Research
The beginning: an initial collaboration between disciplinary scientists and computer scientists
MCell + AppLeS/NWS = adaptive Grid-enabled MCell

A Tale of Cyber-Research: The Application
MCell – a general simulator for cellular microphysiology
–Uses a 3D Monte Carlo diffusion and chemical reaction algorithm to simulate complex biochemical interactions of molecules
–The molecular environment is represented as a 3D space in which the trajectories of ligands against cell membranes are tracked
Researchers: Tom Bartol, Terry Sejnowski [Salk], Joel Stiles [CMU/PSC], Miriam and Edwin Salpeter [Cornell]
Code in development for over a decade
–Large user community – over 20 sites
Ultimate goal: a complete molecular model of neuro-transmission at the level of the entire cell
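For intuition only, here is a minimal 3D random-walk sketch of the kind of Monte Carlo diffusion step such a simulator performs; the real MCell algorithm handles reaction probabilities, realistic surface geometry and much more, so treat this as a toy:

```python
# Toy Monte Carlo diffusion of ligands in 3D (illustration only, not MCell's algorithm).
import math
import random

def diffuse(ligands, steps, step_len=1.0, membrane_z=0.0):
    """Random-walk each ligand; count how many reach the plane z = membrane_z."""
    hits = 0
    for (x, y, z) in ligands:
        for _ in range(steps):
            theta = random.uniform(0, math.pi)       # simplified direction sampling
            phi = random.uniform(0, 2 * math.pi)
            x += step_len * math.sin(theta) * math.cos(phi)
            y += step_len * math.sin(theta) * math.sin(phi)
            z += step_len * math.cos(theta)
            if z <= membrane_z:                      # ligand reaches the "membrane"
                hits += 1
                break
    return hits

ligands = [(0.0, 0.0, 5.0)] * 1000                   # released 5 units above the membrane
print(diffuse(ligands, steps=200), "of", len(ligands), "ligands hit the membrane")
```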

A Tale of Cyber-Research: The Software
AppLeS – the Application-Level Scheduling project
Researchers: Fran Berman [UCSD], Rich Wolski [U. Tenn], Henri Casanova [UCSD] and many students
AppLeS is focused on real-world adaptive scheduling in dynamic cluster and Grid environments
AppLeS + application = self-scheduling application
[Diagram: Resource Discovery → accessible resources → Resource Selection → feasible resource sets → Schedule Planning and Performance Modeling → evaluated schedules → Decision Model → "best" schedule → Schedule Deployment, running over the Grid infrastructure, NWS and resources]

Evolutionary Roadmap
[Diagram spanning the layers from hardware resources through base-level and system-level middleware to user-level middleware and applications: NWS and AppLeS lead to adaptively scheduled MCell, APST and Virtual Instruments, targeting platforms such as Blue Horizon/MPP, Condor, Entropia/peer-to-peer, GrADS and LoCI/IBP]

Cluster User’s host and storage Network links Storage Scheduling MCell Computational tasks

Why Isn’t Scheduling Easy? Good scheduling consider the location of large shared files Computation and data transfer should minimize file transfer time Adaptive scheduling necessary to account for dynamic environment

Scheduling Model
Scheduling goals:
–Identify heuristics that minimize execution time (computation and data movement) in a multi-cluster environment
–Identify heuristics that are performance-efficient, even if load predictions are not always accurate

Scheduling Approach
Contingency Scheduling: the allocation is developed by dynamically generating a Gantt chart (over network links, the hosts of each cluster, and time) that schedules unassigned tasks between scheduling events
Basic skeleton
–Compute the next scheduling event
–Create a Gantt chart G
–For each computation and file transfer currently underway, compute an estimate of its completion time and fill in the corresponding slots in G
–Select a subset T of the tasks that have not started execution
–Until each host has been assigned enough work, heuristically assign tasks to hosts, filling in slots in G
–Implement the schedule
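A compressed sketch of this skeleton, under the same assumptions as the completion-time estimate above (hypothetical helper names; the real scheduler maintains an explicit Gantt chart per network link and host):

```python
# Sketch of the contingency-scheduling skeleton (hypothetical names, not APST source).
def run_scheduling_event(pending, hosts, est, pick, horizon):
    """
    pending : unstarted tasks
    hosts   : hosts, each with an available_at attribute for work already underway
    est     : est(task, host, free_at) -> estimated completion time on that host
    pick    : scheduling heuristic, pick(pending, hosts, gantt, est) -> (task, host)
    horizon : roughly the time of the next scheduling event
    """
    gantt = {h: h.available_at for h in hosts}      # slots already filled by running work
    plan = []
    while pending and min(gantt.values()) < horizon:
        task, host = pick(pending, hosts, gantt, est)
        gantt[host] = est(task, host, gantt[host])  # fill in the corresponding slot
        plan.append((task, host))
        pending.remove(task)
    return plan    # the schedule is then implemented until the next scheduling event
```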

MCell Adaptive Scheduling
Free "parameters"
–Frequency of scheduling events
–Accuracy of task completion time estimates
–Subset T of unexecuted tasks
–Scheduling heuristic used

Scheduling Heuristics
Scheduling heuristics useful for parameter sweeps in distributed environments
–Min-Min [the task/resource pair that can complete the earliest is assigned first]
–Max-Min [the longest of the tasks' earliest completion times is assigned first]
–Sufferage [the task that would "suffer" most if given a poor schedule is assigned first, as computed by the difference between its second-best and best completion times]
–Extended Sufferage (XSufferage) [minimal completion times are computed per cluster for each task, and the sufferage heuristic is applied to these]
–Workqueue [a randomly chosen task is assigned first]
Criteria for evaluation:
–Which scheduling heuristics minimize execution time?
–How sensitive are the heuristics to inaccurate performance information?
–How well do the heuristics exploit the location of shared input files and the cost of data transmission?
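Two of these heuristics written as `pick` functions compatible with the skeleton sketched above, in simplified per-host form (XSufferage would compute minimal completion times per cluster instead):

```python
# Simplified sketches of Min-Min and Sufferage (not the original implementations).

def min_min(pending, hosts, gantt, est):
    # Assign first the task whose best achievable completion time is smallest.
    best_host = {t: min(hosts, key=lambda h: est(t, h, gantt[h])) for t in pending}
    task = min(pending, key=lambda t: est(t, best_host[t], gantt[best_host[t]]))
    return task, best_host[task]

def sufferage(pending, hosts, gantt, est):
    # Assign first the task that would "suffer" most if denied its best host:
    # sufferage = second-best completion time minus best completion time.
    def suff(t):
        times = sorted(est(t, h, gantt[h]) for h in hosts)
        return times[1] - times[0] if len(times) > 1 else 0.0
    task = max(pending, key=suff)
    return task, min(hosts, key=lambda h: est(task, h, gantt[h]))
```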

How Do the Scheduling Heuristics Compare?
Simulation of MCell runs in a multi-cluster environment, comparing the performance of the heuristics (workqueue, Min-min, Max-min, Sufferage, XSufferage) when it is up to 40 times more expensive to send a shared file across the network than it is to compute a task
Extended Sufferage takes advantage of file sharing to achieve good application performance

How sensitive are the heuristics to inaccurate performance information?
[Simulation results comparing workqueue with the Gantt chart heuristics for a single scheduling event and for scheduling events every 125, 250 and 500 seconds]

"Regime" Scheduling
[Charts comparing workqueue, the Gantt chart heuristics and fixed schedules: average makespan plotted against increasing prediction error and increasing non-uniformity, showing which approach performs better in each regime]

From MCell to Parameter Sweeps
MCell is a parameter sweep application
Parameter Sweeps = a class of applications that are structured as multiple instances of an "experiment" with distinct parameter sets
In PS applications, independent experiments may share input files
[Diagram: clients submitting experiments through the APST middleware]
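A parameter sweep is essentially a cross product of parameter values, with many independent tasks reusing the same large input files. A minimal sketch of generating such a task set (the parameter names are made up, not an actual MCell experiment):

```python
# Sketch: generating a parameter-sweep task set whose instances share large input files
# (hypothetical parameter names, not an actual MCell experiment description).
from itertools import product

geometry_files = ["synapse_a.mdl", "synapse_b.mdl"]   # large shared inputs
release_rates = [10, 50, 100, 500]
random_seeds = range(20)

tasks = [
    {"inputs": [geometry], "params": {"rate": rate, "seed": seed}}
    for geometry, rate, seed in product(geometry_files, release_rates, random_seeds)
]
print(len(tasks), "independent experiments sharing", len(geometry_files), "input files")
```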

Evolutionary Roadmap (revisited)
[Diagram spanning the layers from hardware resources through base-level and system-level middleware to user-level middleware and applications: NWS and AppLeS lead to adaptively scheduled MCell, APST and Virtual Instruments, targeting platforms such as Blue Horizon/MPP, Condor, Entropia/peer-to-peer, GrADS and LoCI/IBP]

APST User-level Middleware
AppLeS Parameter Sweep Template (APST)
–Targets a structurally similar class of applications (parameter sweeps)
–Can be instantiated in a user-friendly timeframe
–Provides good application performance
–Can be used to target a wide spectrum of platforms adaptively
Joint work with Henri Casanova, Dmitrii Zagorodnov and Arnaud Legrand
The APST paper was a best paper finalist at SC'00

APST Architecture
[Diagram]
–APST Client: a command-line client and Controller that interact with the APST Daemon
–APST Daemon: Scheduler (Workqueue, Workqueue++, and the Gantt chart heuristic algorithms MinMin, MaxMin, Sufferage, XSufferage), Actuator and Metadata Bookkeeper, connected through scheduler, execution, transport and metadata APIs; the Scheduler triggers the Actuator to transfer and execute, and queries/retrieves through the Metadata Bookkeeper
–Underlying resources and services: Globus GRAM and GASS, NetSolve, IBP, NFS, NWS, Condor, Ninf, Legion, and platforms such as Blue Horizon and clusters

Cool things about APST
The Scheduler can be used for a structurally similar set of Parameter Sweep Applications, in addition to MCell
–INS2D, INS3D (NASA Fluid Dynamics applications)
–Tphot (SDSC, Proton Transport application)
–NeuralObjects (NSI, Neural Network simulations)
–CS simulation applications for our own research (model validation)
–PFAM (JCSG, structural genomics application), etc.
The Actuator's APIs are interchangeable and mixable (see the sketch below)
–(NetSolve+IBP) + (GRAM+GASS) + (GRAM+NFS)
The Scheduler allows for dynamic adaptation and anticipation
No Grid software is required
–However, lack of it (NWS, GASS, IBP) may lead to poorer performance
APST has been released to NPACI and NASA IPG, and is available at
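As a rough sketch of what "interchangeable and mixable" APIs buy (hypothetical class names, not APST's actual interfaces): the scheduler's plan is actuated through whatever transport and execution backends were configured, so GRAM, NetSolve, IBP, GASS or NFS can be swapped behind the same calls.

```python
# Sketch of pluggable transport/execution backends behind one actuator interface
# (hypothetical class names, not APST's actual APIs).
from abc import ABC, abstractmethod

class TransportBackend(ABC):
    @abstractmethod
    def stage(self, filename, host): ...     # move an input file toward a host

class ExecutionBackend(ABC):
    @abstractmethod
    def launch(self, task, host): ...        # start a task on a remote host

class IbpTransport(TransportBackend):
    def stage(self, filename, host):
        print(f"IBP: stage {filename} near {host}")

class GramExecution(ExecutionBackend):
    def launch(self, task, host):
        print(f"GRAM: submit {task} to {host}")

def actuate(plan, transport: TransportBackend, execution: ExecutionBackend):
    """Carry out the scheduler's plan through whichever backends were configured."""
    for task, host, files in plan:
        for f in files:
            transport.stage(f, host)
        execution.launch(task, host)

actuate([("mcell-run-001", "blue-horizon", ["synapse_a.mdl"])],
        IbpTransport(), GramExecution())
```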

Does the location of shared input files matter?
CS experiments
–We ran instances of APST/MCell across a wide-area distributed platform
–We compared execution times for both workqueue (location-insensitive) and Gantt chart heuristics (location-sensitive)
[Testbed: University of Tennessee, Knoxville; University of California, San Diego; Tokyo Institute of Technology; accessed via NetSolve + NFS, NetSolve + IBP, and GRAM + GASS]

Data Location Matters
Experimental setting: MCell application with 1,200 tasks (6 Monte Carlo simulations); input files of 1, 1, 20, 20, 100 and 100 MB
4 scenarios:
–(a) all input files replicated everywhere
–(b) 100 MB files in Japan + California + Tennessee
–(c) 100 MB files in Japan + California
–(d) all input files only in Japan
The Gantt chart heuristics are location sensitive; workqueue scheduling is location insensitive

What Can’t We Do with APST? Only mechanism for steering computation is the user: –User stops the execution at a particular point to view results –System does not alert user when interesting results have been computed –Scheduling goal is minimizing total execution rather than providing real-time feedback to user –Scheduling algorithms not designed to provide good intermediate results, only final results Rudimentary user interface No support for visualization

Evolution
[Annotated roadmap: AppLeS and NWS evolve toward expanded portability and new functionality]
–MCell serves as the driving application for the Virtual Instrument (VI) project; the AppLeS/NWS adaptive approach influences the design of the VI, which integrates computational steering and scheduling
–MCell used as an example application for the Internet Backplane Protocol; APST adapts to IBP environments (LoCI/IBP)
–NPACI Alpha Project: APST targeted to MPPs (Blue Horizon)
–APST targeted to Condor: Master's thesis in progress
–APST will be targeted to the GrADS development and execution environment
–APST will be targeted to Entropia (peer-to-peer) platforms

From APST to Virtual Instruments
Interactive steering is critical
–Large multi-dimensional parameter spaces necessitate user-directed search
–User-directed search requires feedback
APST does not accommodate steering
–APST scheduling heuristics optimize execution time, not feedback frequency
–No capability for visualization
A new approach is required to develop middleware that combines steering and visualization with effective scheduling

Work-in-Progress: Virtual (Software) Instrument Project
Virtual SW instruments provide "knobs" which allow the user to change the computational goals during execution, in order to follow promising leads
–The application is perceived as "continuous" by the SW and must be scheduled between non-deterministic steering events
–The scheduler and steering mechanism must adapt to continuously changing goals
The SW and new disciplinary results are the focus of an NSF ITR project
Project participants: Fran Berman (UCSD/SDSC) [PI], Henri Casanova (UCSD/SDSC), Mark Ellisman (UCSD/SDSC), Terry Sejnowski (Salk), Tom Bartol (Salk), Joel Stiles (CMU/PSC), Jack Dongarra (UTK), Rich Wolski (UCSB)
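As a rough sketch of the "knob" idea (a hypothetical design, not the Virtual Instrument prototype): between batches of work, the scheduler drains any steering events and re-prioritizes the remaining experiments against whatever the user now considers interesting.

```python
# Sketch of steering-sensitive scheduling (hypothetical design, not the VI prototype).
import queue

def steerable_run(tasks, run_batch, steering_events, priority, batch_size=10):
    """priority(task) ranks tasks; steering events deliver replacement priority functions."""
    remaining = list(tasks)
    while remaining:
        # Steering events are non-deterministic; drain whatever has arrived so far.
        while not steering_events.empty():
            priority = steering_events.get()      # the user turned a "knob"
        remaining.sort(key=priority, reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        run_batch(batch)                          # partial results feed back to the user

events = queue.Queue()
events.put(lambda task: -abs(task - 100))         # e.g. steer toward parameter values near 100
steerable_run(range(50), run_batch=print, steering_events=events, priority=lambda task: task)
```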

Virtual Instrument Project Status
PROJECT GOALS
–To investigate and develop the scheduling and steering heuristics and models necessary for performance for software Virtual Instruments
–To have a robust, mature and working Virtual Instrument prototype at the end of the project
–To enable the MCell user community to use Virtual Instrument technology
COMPONENTS UNDER COORDINATED DEVELOPMENT
–Dynamic event model: a model in which steering events are generated by the state of the system, user steering activities, and user-provided criteria
–Adaptive steering-sensitive scheduling algorithms
–Data management and phased (CS and Neuro) experiment strategies
–Virtual Instrument prototype: a more sophisticated and user-friendly user interface and, ultimately, a running prototype of the software suitable for MCell users

MCell/APST/VI Cyber-research
–Computation: targets high-performance distributed environments
–Broadband network connectivity: uses predictions of network load to help stage input and output data
–People & training, partnership: synergistic research, collaboration, education, training
–Instrumentation (large and/or many small): remote instruments (electron microscopes) linked to computers and data archives
–Large databases, digital libraries: targets large amounts of data requiring collection, analysis and visualization

What Does This Have to Do with SDSC and NPACI?
The MCell/APST/Virtual Instrument projects represent a model for the large-scale research, development and collaborations fostered by the PACI program
–It is nearly impossible to do work at this scale within a traditional academic department
These large-scale, multidisciplinary collaborations are becoming increasingly critical to achieving progress in science and engineering.

Cyber-Research requires Scale and Synergy
Large-scale cutting-edge results require
–Cutting-edge HW
–Usable SW
–Human resources: knowledge, relationships, synergy
–Education, communication, dissemination

A Cyber-Research Bibliography
PROJECT HOME PAGES
–MCell
–UCSD Grid Lab and AppLeS
–Virtual Instrument Project
–Network Weather Service
–APST
–NPACI MCell Alpha Project
PLATFORM HOME PAGES
–SDSC
–NASA IPG
–GrADS
–IBP
–Entropia