Using Application-Domain Knowledge in the Runtime Support of Multi-Experiment Computational Studies
Siu Yau, Dissertation Defense, Dec 08



Multi-Experiment Study (MES)
Simulation software rarely runs in isolation
Multi-Experiment Computational Study:
– Multiple executions of a simulation experiment
– Goal: identify interesting regions in the input space of the simulation code
Examples in engineering, science, medicine, finance
Interested in the aggregate result
– Not individual experiments

MES Challenges
Systematically cover the input space
– Refinement + high dimensionality → large number of experiments (100s or 1000s) and/or user interaction
Accurate individual experiments
– Spatial + temporal refinement → long-running individual experiments (days or weeks per experiment)
Subjective goal
– Requires study-level user guidance

MES on Parallel Architectures
Parallel architectures map well to MES
Dedicated, local access to small- to medium-sized parallel computers
– Interactive MES → user-directed coverage of the exploration space
Massively parallel systems
– Multiple concurrent parallel experiments → exploit the power of massively parallel systems
Traditional systems lack a high-level view

Thesis Statement
To meet the interactive and computational requirements of Multi-Experiment Studies, a parallel run-time system must view an entire study as a single entity, and use application-level knowledge that is made available from the study context to inform its scheduling and resource-allocation decisions.

Outline
MES formulation, motivating examples
– Defibrillator design, helium model validation
Related work
Research methodology
Research test bed: SimX
Optimization techniques
– Sampling, result reuse, resource allocation
Contributions

MES Formulation
Simulation Code: maps input to result
– Design Space: space of possible inputs to the simulation code
Evaluation Code: maps result to performance metric
– Performance Space: space of outputs of the evaluation code
Goal: find the Region of Interest in the Design & Performance Spaces
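
The formulation above can be sketched as a tiny harness; this is a hedged illustration, where `simulate`, `evaluate`, and the threshold-based region of interest are hypothetical stand-ins, not the study's actual codes.

```python
import itertools

def simulate(design_point):
    # toy stand-in for the simulation code: design-space input -> result
    x, y = design_point
    return x * x + y * y

def evaluate(result):
    # toy stand-in for the evaluation code: result -> performance metric
    return abs(result - 1.0)

def run_study(design_space, threshold):
    # region of interest: design points whose metric falls below threshold
    return [p for p in design_space if evaluate(simulate(p)) < threshold]

# a coarse 5x5 grid over the unit square as the design space
grid = list(itertools.product([i / 4 for i in range(5)], repeat=2))
interesting = run_study(grid, threshold=0.3)
```

The point of the formulation is that the runtime sees the whole loop, not one `simulate` call at a time.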

Example: Defibrillator Design
Help design implantable defibrillators
Simulation Code:
– Electrode placements + shock voltage → torso potential
Evaluation Code:
– Torso potential + activation/damage thresholds → % activated & damaged heart tissue
Goal: placement + voltage combination to maximize activation, minimize damage

Example: Gas Model Validation
Validate a gas-mixing model
Simulation Code:
– Prandtl number + gas inlet velocity → helium plume motion
Evaluation Code:
– Helium plume motion → velocity-profile deviation from real-life data
Goal: find the Prandtl number + inlet velocity that minimize deviation

Example: Pareto Optimization
The set of inputs that cannot be improved in all objectives simultaneously
[Figure: Pareto frontier in the performance space; axes: Damage vs. Activation]
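
A minimal sketch of the Pareto-optimality test the slide describes, assuming both objectives are minimized (the study's "maximize activation" can be encoded as "minimize negative activation"); the sample metric values are illustrative only.

```python
def pareto_frontier(points):
    # a point is Pareto-optimal if no other point is at least as good in
    # both objectives and strictly better in at least one
    frontier = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

# hypothetical (damage, -activation) pairs for five experiments
metrics = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
frontier = pareto_frontier(metrics)
```

This quadratic scan is only for exposition; the frontier, not any single experiment, is the study's aggregate result.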

Challenge: Defibrillator Design
Interactive exploration of the Pareto frontier
– Change the setup (voltage, back electrode, etc.) → new study
– Interactive exploration of the “study space”
One user action → one aggregate result
Need a study-level view at an interactive rate

Challenge: Model Validation
Multiple executions of long-running code
– 6x6 grid = 36 experiments
– ~3000 timesteps, 8 seconds per timestep
– 6.5 hours per experiment → 10 days per study
Schedule and allocate resources as a single entity: how to distribute parallel resources?
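
The cost arithmetic behind those bullets checks out as follows (same numbers as the slide):

```python
# study cost for the helium model validation MES
timesteps_per_experiment = 3000
seconds_per_timestep = 8
experiments = 36  # 6x6 grid

hours_per_experiment = timesteps_per_experiment * seconds_per_timestep / 3600
days_per_study = experiments * hours_per_experiment / 24
# ~6.7 hours per experiment, ~10 days for the serial study
```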

Related Work: Grid Schedulers
Grid schedulers
– Condor, Globus
– Each experiment treated as a “black box”
Application-aware grid infrastructures
– Nimrod/O and the Virtual Instrument
– Take advantage of application knowledge, but in an ad-hoc fashion
– No consistent set of APIs reusable across different MESs

Related Work: Parallel Steering
Grid-based steering
– RealityGrid, WEDS
– Steer execution of inter-dependent tasks
– Different focus: Grid vs. cluster
Parallel steering systems
– Falcon, CUMULVS, CSE
– Steer single executions (not collections) on parallel machines

Methodology
Four example MESs, varying properties:

Study               | Bridge Design       | Defibrillator Design | Animation Design  | Gas Model Validation
User interaction    | No                  | Yes                  | Yes               | No
No. of experiments  | 100K                | 65K                  | ~100K             | 36
Time per experiment | 7 secs              | 2 secs               | < 1 sec           | 6.5 hours
Parallel code?      | No                  | No                   | No                | Yes
Study goal          | Pareto Optimization | Pareto Optimization  | Aesthetic Measure | Pareto Optimization

Methodology (cont’d)
Identify application-aware system policies
– Scheduling, resource allocation, user interface, storage support
Construct a research test bed (SimX)
– API to import application knowledge
– Implemented on parallel clusters
Conduct the example MESs
– Implement the techniques, measure the effect of application-aware system policies

Test bed: SimX
Parallel System for Interactive Multi-Experiment Studies (SIMECS)
Supports MESs on parallel clusters
Functionality-based components
– UI, Sampler, Task Queue, Resource Allocator, Simulation Container, SISOL
Each component has a specific API
Adapt the API to the needs of the MES

Test bed: SimX
[Architecture diagram]
– Front-end (Manager Process): User Interface (visualisation & interaction), Sampler, Resource Allocator
– Task Queue: connects the manager to the workers
– Worker Process Pool: Simulation Containers running the simulation and evaluation codes behind the FUEL interface
– SISOL Server Pool (SISOL API): Data Server, Directory Server

Test bed: SimX
3 interoperable implementations for adapting to existing software:
– Standalone, as a set of libraries
– SimX/SCIRun, as a set of SCIRun modules
– SimX/Uintah, as a set of modified Uintah components
All view a computational study as a single entity

Optimization techniques
Reduce the number of experiments needed:
– Automatic sampling
– Study-level user steering
– Study-level result reuse
Reduce the run time of individual experiments:
– Reuse results from other experiments: checkpoints, internal states
Improve the resource-utilization rate:
– Minimize parallelization overhead & maximize reuse potential
– Preemption: claim idle resources

Active Sampling
If the MES is an optimization study (i.e., the region of interest optimizes a function):
– Incorporate the search algorithm in the scheduler
Pareto optimizations: active sampling
– Cover the design space from a coarse to a fine grid
– Use aggregate results from the coarse level to identify promising regions
Reduces the number of experiments needed
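
The coarse-to-fine idea can be sketched in one dimension; this is a hedged toy, not the Pareto sampler itself: `metric` stands in for simulate + evaluate, and refinement simply narrows around the best coarse-level point instead of tracking a frontier.

```python
def metric(x):
    # hypothetical stand-in for simulate + evaluate (to be minimized)
    return (x - 0.3) ** 2

def active_sample(lo, hi, levels, points_per_level=5):
    evaluated = {}  # design point -> metric, one entry per experiment run
    for _ in range(levels):
        step = (hi - lo) / (points_per_level - 1)
        xs = [lo + i * step for i in range(points_per_level)]
        for x in xs:
            evaluated.setdefault(x, metric(x))  # skip already-run points
        # use this level's aggregate result to pick the promising region
        best = min(xs, key=lambda x: evaluated[x])
        lo, hi = max(lo, best - step), min(hi, best + step)
    best_x = min(evaluated, key=evaluated.get)
    return best_x, len(evaluated)

best_x, n_experiments = active_sample(0.0, 1.0, levels=3)
```

Three levels here reach the finest grid spacing with 9 experiments instead of the 17 an exhaustive sweep at that spacing would need, which is the mechanism behind the 65K → 7.3K reduction reported later.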

Active Sampler (cont’d)
[Figure: successive refinements — initial grid → 1st-level results → first refinement → 2nd-level results → 2nd refinement → 3rd-level results]

Support for Sampling
Pluggable samplers behind the SimX Sampler API: naïve (sweep) sampler, random sampler, active (Pareto) sampler, custom sampler.
SimX Sampler API:
– void setStudy(StudySpec)
– void registerResult(experiment, performance)
– experiment getNextPointToRun()

Evaluation: Active Sampling
Helium validation study
– Resolve the Pareto frontier on a 6x6 grid
– Reduces the number of experiments from 36 to 24
Defibrillator study
– Resolve the Pareto frontier on a 256x256 grid
– Reduces the number of experiments from 65K to 7.3K
– Imperfect scaling due to dependencies
– At 128 workers: active sampling: 349 secs; grid sampling: 900 secs

Result reuse
MES: many similar runs of the simulation code
Share information between experiments
– Speeds up experiments that reuse the information
– Only need to calculate deltas
Many types, depending on the information reused
– Varying degrees of generality
Reduces individual experiment run time
– Except study-level reuse

Result reuse types

Type                      | Result reused             | Applicability
Checkpoint reuse          | Simulation code output    | Time-stepping code, iterative solver
Preconditioner reuse      | Preconditioner            | Iterative linear solver
Intermediate result reuse | Internal state            | Simulation code with shared internal states
Simulation result reuse   | Simulation code output    | Interactive MESs
Performance metric reuse  | Evaluation code output    | Interactive MESs
Study-level reuse         | Aggregate result of study | Interactive MESs

Intermediate Result Reuse
The defibrillator simulation code solves 3 linear systems and linearly combines the solutions
The same systems are needed by different experiments, so cache the solutions
[Figure: systems A_a x = b_a, A_b x = b_b, A_c x = b_c, A_d x = b_d; store A_c^-1 b_c and A_b^-1 b_b]
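
A hedged sketch of that caching pattern: experiments needing the same system A x = b fetch the stored solution instead of re-solving. The 2x2 toy solver and the in-process dict stand in for the real solver and the SISOL object store.

```python
solution_cache = {}
solve_count = 0  # counts actual solves, to show the cache working

def solve2x2(A, b):
    # toy direct solver for a 2x2 system A x = b (Cramer's rule)
    global solve_count
    solve_count += 1
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det)

def cached_solve(A, b):
    # a second experiment with the same (A, b) reuses the cached solution
    key = (A, b)
    if key not in solution_cache:
        solution_cache[key] = solve2x2(A, b)
    return solution_cache[key]

A = ((2.0, 0.0), (0.0, 4.0))
x1 = cached_solve(A, (2.0, 4.0))  # solved once
x2 = cached_solve(A, (2.0, 4.0))  # cache hit: no re-solve
```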

Support for Result Reuse
Cached solutions (e.g., A_a^-1 b_a, A_b^-1 b_b) live in the SISOL server pool and are shared across worker processes.
SISOL API:
– object StartRead(objSet, coord)
– void EndRead(object)
– object StartWrite(objSet, coord)
– void EndWrite(objSet, object)

Checkpoint Result Reuse
The helium code terminates when the kinetic energy (KE) stabilizes
Starting from another experiment's checkpoint stabilizes faster
Must have the same inlet velocity
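
A hedged sketch of the compatibility rule above: a run may restart from a checkpoint only if the inlet velocity matches. The step counts and the "checkpoint saved at the halfway mark" rule are illustrative placeholders, not measurements from the study.

```python
checkpoints = {}  # inlet velocity -> timestep of the saved checkpoint

def run_experiment(prandtl, inlet_velocity, stable_step=3000):
    # a checkpoint is valid only for runs with an identical inlet velocity
    start = checkpoints.get(inlet_velocity, 0)
    # save a mid-run checkpoint for later compatible experiments
    checkpoints[inlet_velocity] = max(start, stable_step // 2)
    return stable_step - start  # timesteps this run actually executes

from_scratch = run_experiment(0.7, inlet_velocity=1.0)  # no checkpoint yet
with_reuse = run_experiment(0.9, inlet_velocity=1.0)    # restarts mid-run
no_reuse = run_experiment(0.9, inlet_velocity=2.0)      # velocity differs
```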

Study-level Result Reuse
Interactive study: two similar studies in sequence
Use the Pareto frontier from the first study as a guide for the next study

Evaluation: Result Reuse
Checkpoint reuse in the helium model study:
– No reuse: 3000 timesteps; with reuse: 1641
– 18 of 24 experiments able to reuse
– 28% overall improvement
Defibrillator study:
– No reuse: 7.3K experiments × 2 secs each = 349 secs total on 128 procs
– With reuse: 6.5K experiments × 1.5 secs each = 123 secs total on 128 procs
– 35% overall improvement

Resource Allocation
An MES is made up of parallel simulation codes: how to divide the cluster among experiments?
– Parallelization overhead → fewer processes per experiment
– Active sampling + reuse → some experiments are more important; more processes for those experiments
Adapt the allocation policy to the MES:
– Use application knowledge to decide which experiments are prioritized
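
The first trade-off can be made concrete with a toy throughput model; the Amdahl-style speedup curve and its 5% serial fraction are assumptions for illustration, not measurements from the studies.

```python
def speedup(procs, serial_frac=0.05):
    # assumed Amdahl-style speedup: parallelization overhead grows with width
    return 1.0 / (serial_frac + (1.0 - serial_frac) / procs)

def study_throughput(total_procs, procs_per_exp):
    # experiments completed per unit time across the whole cluster
    concurrent = total_procs // procs_per_exp
    return concurrent * speedup(procs_per_exp)

# under this model, narrow-and-many beats wide-and-few
best_width = max(range(1, 129), key=lambda p: study_throughput(128, p))
```

In this model raw throughput always favors the narrowest experiments; it is the sampling dependencies and reuse potential (next slides) that justify widening or prioritizing particular experiments.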

Resource Allocation
Batching strategy: select a subset (batch), assign it high priority, and run it concurrently
Considerations for batching policies:
– Scaling behavior: maximize batch size
– Sampling policy: prioritize “useful” samples
– Reuse potential: prioritize experiments with reuse
Preemption strategy:
– Claim unused processing elements and assign them to experiments in progress

Resource Allocation: Batching
Batching for active sampling: identify independent experiments in the sampler; maximize parallelism while allowing active sampling
[Figure: design space (Prandtl number vs. inlet velocity) — first batch: 1st Pareto-optimal; second batch: 1st & 2nd Pareto-optimal; 3rd batch: 1st to 3rd Pareto-optimal; 4th batch: Pareto frontier]

Resource Allocation: Batching
[Figure: active-sampling batching timeline — 1st to 4th batches]

Resource Allocation: Batching
Batching for reuse classes: sub-divide each batch into 2 smaller batches
– 1st sub-batch: the first experiment in each reuse class; no two belong to the same reuse class
– No two concurrent from-scratch experiments can reuse each other's checkpoints (maximizes reuse potential)
– Experiments in the same batch have comparable run times (reduces holes)
[Figure: design space, Prandtl number vs. inlet velocity]
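
The reuse-class constraint above can be sketched as a round-robin partition: at most one experiment per reuse class per batch, so each batch's from-scratch runs leave checkpoints for the next batch. Treating the inlet velocity as the reuse class follows the checkpoint-reuse rule; the experiment grid is illustrative.

```python
from collections import defaultdict

def batch_by_reuse_class(experiments, reuse_class):
    # group experiments by reuse class, then take at most one experiment
    # from each class per batch
    by_class = defaultdict(list)
    for e in experiments:
        by_class[reuse_class(e)].append(e)
    batches = []
    while any(by_class.values()):
        batches.append([members.pop(0) for members in by_class.values() if members])
    return batches

# 3 Prandtl numbers x 2 inlet velocities; the reuse class is the velocity
exps = [(p, v) for p in (0.7, 0.8, 0.9) for v in (1.0, 2.0)]
batches = batch_by_reuse_class(exps, reuse_class=lambda e: e[1])
```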

Resource Allocation: Batching
[Figure: reuse-class batching timeline — 1st to 6th batches]

Resource Allocation: Preemption
[Figure: timeline with preemption — 1st to 6th batches]

Support for Resource Allocation
The Task Queue and Resource Allocator expose an API for application-aware allocation:
– TaskQueue::AddTask(Experiment)
– TaskQueue::CreateBatch(set &)
– TaskQueue::GetIdealGroupSize()
– TaskQueue::AssignNextTask(GroupID)
– Reconfigure(const int* assignment)

Evaluation: Resource Allocation

Knowledge used          | Total time   | Utilization rate | Avg. time per run | Improvement
None (run on 1 worker)  | 12 hr 35 min | 56.3%            | 6 hr 17 min       | N/A
None (run 1 experiment) | 20 hr 35 min | 100%             | 34.3 min          | N/A
+ Active sampling       | 6 hr 10 min  | 71.1%            | 63.4 min          | 51% / 70%
+ Reuse classes         | 5 hr 10 min  | 71.3%            | 39.7 min          | 59% / 75%
+ Preemption            | 4 hr 30 min  | 91.8%            | 34.5 min          | 64% / 78%
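
The paired improvement figures are relative to the two baselines in the table; the arithmetic for the final row checks out:

```python
def minutes(hours, mins):
    return 60 * hours + mins

base_concurrent = minutes(12, 35)   # baseline: every experiment on 1 worker
base_serial = minutes(20, 35)       # baseline: 1 experiment at a time
with_preemption = minutes(4, 30)    # best application-aware configuration

imp_vs_concurrent = round(100 * (1 - with_preemption / base_concurrent))
imp_vs_serial = round(100 * (1 - with_preemption / base_serial))
```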

Contributions
Demonstrate the need to consider the entire end-to-end system
Identify system policies that can benefit from application-level knowledge
– Scheduling (sampling): for optimization MESs
– User steering: for MESs with subjective goals and MESs with high design-space dimensionality
– Result reuse: for MESs made up of similar executions of a simulation code
– Resource allocation: for MESs made up of parallel simulation codes

Contributions (cont’d)
Demonstrate with a prototype system
– API to import relevant application knowledge
Quantify the benefits of application-aware techniques
– Sampling: orders-of-magnitude improvement in the bridge design and defibrillator studies; 33% improvement in the helium model validation study
– User steering: enables interactivity in the animation design and defibrillator studies
– Result reuse: multi-fold improvement in the bridge design, defibrillator, and helium model validation studies
– Application-aware resource allocation: multi-fold improvement in the helium model validation study