Application-Aware Management of Parallel Simulation Collections Siu-Man Yau, New York University Steven G. Parker

Presentation transcript:

Application-Aware Management of Parallel Simulation Collections
Siu-Man Yau, New York University
Steven G. Parker, University of Utah
Kostadin Damevski, University of Utah
Vijay Karamcheti, New York University
Denis Zorin, New York University

Multi-Experiment Studies
Computational studies require multiple runs of simulation software

Multi-Experiment Studies
Existing (batch-based) systems treat each execution as a ‘black box’:
– Issue one simulation at a time
Application-aware system:
– Schedule collection of simulations as a whole
– Use application-specific knowledge for scheduling and resource allocation decisions
Application-awareness brings 4X improvement in response time

Outline
Example MES: Helium Model Validation
Evaluation platform: SimX System
Application-specific considerations
– Parallel overhead, Sampling, Result reuse, Malleability
Application-Driven Scheduling and Resource Allocation Strategies
Conclusion

Helium Model Validation
Gas mixing model for fire simulation
“Knobs” on model:
– Prandtl number
– Smagorinsky constant
– Grid resolution
– Inlet Velocity
– etc...
To validate: compare against a real-life experiment

Helium Model Validation
Measure velocity profile from real-life experiment
Pick two “knobs”
– Prandtl number
– Inlet Velocity
Run simulated experiments
Find the combination that matches the profile at both heights

Helium Model Validation
Pareto Frontier: the set of inputs that cannot be improved in all objectives at once
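
The Pareto frontier named on this slide can be illustrated with a small dominance check. This is a minimal sketch, not the SimX evaluation code: it assumes each simulated experiment is scored by two error values (for example, the mismatch against the measured velocity profile at each of the two heights), that lower is better, and it uses the standard definition in which a point is on the frontier if no other point dominates it. The names `dominates`, `pareto_frontier`, and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error at height 1, error at height 2) pairs, one per experiment.
errors = [(0.10, 0.40), (0.20, 0.20), (0.40, 0.10), (0.30, 0.30), (0.50, 0.50)]
frontier = pareto_frontier(errors)
print(frontier)
```

The quadratic scan is adequate for the few hundred experiments of a typical study; a sort-based method would be preferable for much larger collections.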

Evaluation platform: SimX
System support for Interactive Multi-Experiment Studies (SIMECS)
View computational study as a whole
For parallel, distributed clusters
– Workers (Simulation code & Evaluation code)
– Manager (UI, Sampler, Resource Allocator)
– Spatially-Indexed Shared Object Layer (SISOL)

Evaluation platform: SimX
[Architecture diagram. Labels: SISOL API; Front-end Manager Process; Worker Process Pool; User Interface: Visualisation & Interaction; Sampler; Resource Allocator; FUEL Interface; SISOL Server Pool; Data Server; Dir Server; Task Queue; Simulation code; FUEL Interface; Evaluation code]

Application-Awareness
Decision: How many processes for each task?
Application-specific considerations
– Minimize parallelization overhead: concurrent tasks, low parallelism
– Sampling strategy: task dependency: serial tasks, high parallelism
– Reuse opportunities: maximize “reusable” work: serial tasks, high parallelism
– Malleability: claim idle resources when beneficial
These considerations work against each other

Consideration: Parallel Overhead
Parallel overhead from communications, load-imbalance, etc.
Minimize per-task parallelism
Many concurrent tasks, each using a small number of processes

Consideration: Sampling
Active sampling: incorporates a search algorithm
Introduces data dependencies
Schedule runs from coarse to fine grid; use coarse-level results to identify promising regions
[Figure: sampling grids at 1st Level, 2nd Level, 3rd Level]

Consideration: Result Reuse
Helium code terminates when KE stabilizes
Start from another checkpoint
– stabilizes in half the time
Must have same inlet velocities (reuse classes)

Application-awareness
Naïve approach: Assign one worker per task
– Eliminate per-task parallelization overhead
– Does not maximize reuse and sampling efficiency
– Left over “holes”
Naïve approach: Assign one task at a time to all workers
– Maximize reuse potential and sampling efficiency
– Maximize parallelization overhead
Application-aware approach: Batching
– Groups of tasks allowed to be concurrently executed

Solution: Application-awareness
[Architecture diagram. Labels: SISOL API; Front-end Manager Process; Worker Process Pool; User Interface: Visualisation & Interaction; Sampler; Resource Allocator; FUEL Interface; SISOL Server Pool; Data Server; Dir Server; Task Queue; Simulation code; FUEL Interface; Evaluation code; Simulation Container]
API calls shown on the diagram:
– TaskQueue::AddTask(Experiment)
– TaskQueue::CreateBatch(set &)
– TaskQueue::GetIdealGroupSize()
– Reconfigure(const int* assignment)
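
To make the role of the task-queue calls more concrete, here is a rough Python sketch of a queue that accepts experiments, groups some of them into a batch, and reports how many workers each task in the current batch could use. It is only an illustration: the slides give the C++ names `AddTask`, `CreateBatch`, and `GetIdealGroupSize` but not their semantics, so the behavior below (in particular, dividing the worker pool evenly across the first batch) is an assumption, and the Python class is hypothetical rather than the SimX API.

```python
from typing import List, Set


class TaskQueue:
    """Hypothetical stand-in for the task queue named on the slide."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.pending: List[str] = []        # experiments not yet batched
        self.batches: List[List[str]] = []  # batches in execution order

    def add_task(self, experiment: str) -> None:
        self.pending.append(experiment)

    def create_batch(self, experiments: Set[str]) -> None:
        """Move the named pending experiments into a new batch."""
        batch = [e for e in self.pending if e in experiments]
        self.pending = [e for e in self.pending if e not in experiments]
        self.batches.append(batch)

    def get_ideal_group_size(self) -> int:
        """Assumed policy: split all workers evenly over the first batch."""
        if not self.batches or not self.batches[0]:
            return self.num_workers
        return max(1, self.num_workers // len(self.batches[0]))


queue = TaskQueue(num_workers=16)
for name in ["exp-a", "exp-b", "exp-c", "exp-d", "exp-e"]:
    queue.add_task(name)
queue.create_batch({"exp-a", "exp-b", "exp-c", "exp-d"})
group_size = queue.get_ideal_group_size()
print(group_size, queue.pending)
```

Under this assumed policy, a batch of four experiments on 16 workers gives each experiment a group of four processes, which sits between the two naïve extremes of one worker per task and all workers on one task.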

Naïve Approach
Response time = 12 hr 35 mins
[Figure: worker schedule, with idle workers marked]

Batch for Sampling
Identify independent experiments in sampler
Max. parallelism while allowing active sampling
[Figure: sample points in the Prandtl Number / Inlet Velocity plane. First Batch: 1st Pareto-Optimal; Second Batch: 1st & 2nd Pareto Opt.; 3rd Batch: 1st to 3rd Pareto Opt.; 4th Batch: Pareto Frontier]
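
One way to read "identify independent experiments" is as a level-by-level grouping of a dependency graph: every experiment whose prerequisites have finished can run concurrently in the same batch. The sketch below shows that idea only. It assumes the sampler can report, for each experiment, which earlier experiments it depends on; the `deps` mapping, the function name, and the experiment labels are hypothetical and do not come from SimX.

```python
from typing import Dict, List, Set


def batches_by_dependency(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group experiments into batches so that each batch contains only
    experiments whose prerequisites all appear in earlier batches."""
    remaining = dict(deps)
    done: Set[str] = set()
    batches: List[List[str]] = []
    while remaining:
        # An experiment is ready once all of its prerequisites are done.
        ready = sorted(e for e, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle among experiments")
        batches.append(ready)
        done.update(ready)
        for e in ready:
            del remaining[e]
    return batches


# Hypothetical coarse ("c"), medium ("m"), and fine ("f") grid experiments.
deps = {
    "c1": set(),
    "c2": set(),
    "m1": {"c1"},
    "m2": {"c1", "c2"},
    "f1": {"m1"},
}
batches = batches_by_dependency(deps)
print(batches)
```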

Batch for Sampling
Response time = 6 hrs 10 mins
[Figure: worker schedule showing 1st Batch, 2nd Batch, 3rd Batch, 4th Batch]

Batch for Result Reuse
Sub-divide each batch into 2 smaller batches:
– 1st sub-batch: first in reuse class; no two belong to same reuse class
– No two concurrent from-scratch experiments can reuse each other’s checkpoints (max. reuse potential)
– Experiments in same batch have comparable run times (reduce holes)
[Figure axes: Prandtl Number, Inlet Velocity]
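
The sub-division rule can be sketched as a single pass over a batch. The example assumes, following the earlier result-reuse slide, that a reuse class is the set of experiments sharing an inlet velocity; the dictionary layout of an experiment and the function name `split_for_reuse` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Experiment = Dict[str, float]


def split_for_reuse(batch: List[Experiment]) -> Tuple[List[Experiment], List[Experiment]]:
    """Split a batch into two sub-batches.

    The first sub-batch holds the first experiment of each reuse class,
    so no two from-scratch runs share a class. The second holds the rest,
    which can restart from a checkpoint produced by the first sub-batch.
    """
    seen: Set[float] = set()
    first: List[Experiment] = []
    second: List[Experiment] = []
    for exp in batch:
        velocity = exp["inlet_velocity"]
        if velocity in seen:
            second.append(exp)
        else:
            seen.add(velocity)
            first.append(exp)
    return first, second


# Hypothetical batch: three experiments at one velocity, two at another.
batch = [
    {"prandtl": 0.5, "inlet_velocity": 0.3},
    {"prandtl": 0.7, "inlet_velocity": 0.3},
    {"prandtl": 0.5, "inlet_velocity": 0.4},
    {"prandtl": 0.9, "inlet_velocity": 0.3},
    {"prandtl": 0.7, "inlet_velocity": 0.4},
]
first, second = split_for_reuse(batch)
print(len(first), len(second))
```

Because checkpoint restarts finish in roughly half the time according to the slides, keeping from-scratch runs and restarted runs in separate sub-batches also tends to give each sub-batch comparable run times.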

Batch for Result Reuse
Total time: 5 hr 10 mins
[Figure: worker schedule showing 1st Batch through 6th Batch]

Preemption
Helium code is malleable:
– Restart a checkpointed run on a different number of workers
Preemption system:
– Manager stores a database of idle workers in SISOL
– Workers use application knowledge to determine whether they should claim idle workers
– Manager creates new worker group by adding idle workers to group
– Manager restarts the simulation on new group
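
The slides do not say what application knowledge the workers use when deciding to claim idle workers. As an illustration only, the sketch below assumes a simple cost model: run time is the remaining work divided by the group size plus a per-process overhead term, and enlarging the group costs a fixed checkpoint-and-restart delay. All names, the cost model, and the numbers are assumptions.

```python
def estimated_time(work: float, procs: int, overhead_per_proc: float) -> float:
    """Assumed cost model: ideal speedup plus overhead growing with group size."""
    return work / procs + overhead_per_proc * procs


def should_claim(remaining_work: float, current: int, idle: int,
                 restart_cost: float, overhead_per_proc: float) -> bool:
    """Claim idle workers only if restarting on the larger group
    is expected to finish sooner than continuing as is."""
    stay = estimated_time(remaining_work, current, overhead_per_proc)
    grow = restart_cost + estimated_time(remaining_work, current + idle,
                                         overhead_per_proc)
    return grow < stay


# Long-running task: doubling the group from 2 to 4 pays for the restart.
claim_long = should_claim(remaining_work=120.0, current=2, idle=2,
                          restart_cost=5.0, overhead_per_proc=1.0)
# Nearly finished task: the restart cost outweighs the gain.
claim_short = should_claim(remaining_work=8.0, current=2, idle=2,
                           restart_cost=5.0, overhead_per_proc=1.0)
print(claim_long, claim_short)
```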

Preemption
Total time: 4 hr 30 mins
[Figure: worker schedule showing 1st Batch through 6th Batch]

Evaluation: Resource Allocation
Knowledge used             Total time     Utilization rate   Avg. time per run   Improvement
None (run on 1 worker)     12 hr 35 min   56.3%              6 hr 17 min         N/A
None (run 1 experiment)    20 hr 35 min   100%               34.3 min            N/A
+ Active Sampling          6 hr 10 min    71.1%              63.4 min            51% / 70%
+ Reuse classes            5 hr 10 min    71.3%              39.7 min            59% / 75%
+ Preemption               4 hr 30 min    91.8%              34.5 min            64% / 78%
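
The Improvement column is not defined on the slide. The short calculation below checks one plausible reading: each pair is the percentage reduction in total time relative to the two baselines, first the one-worker-per-task run (12 hr 35 min) and then the one-experiment-at-a-time run (20 hr 35 min). That reading is an inference from the numbers, not something the slides state; the helper names are hypothetical.

```python
def minutes(hours: int, mins: int) -> int:
    return 60 * hours + mins


def reduction_percent(baseline: int, total: int) -> int:
    """Percentage reduction in total time relative to a baseline."""
    return round(100 * (1 - total / baseline))


one_worker_per_task = minutes(12, 35)   # first baseline row
one_task_at_a_time = minutes(20, 35)    # second baseline row

strategies = {
    "+ Active Sampling": minutes(6, 10),
    "+ Reuse classes": minutes(5, 10),
    "+ Preemption": minutes(4, 30),
}

improvements = {
    name: (reduction_percent(one_worker_per_task, total),
           reduction_percent(one_task_at_a_time, total))
    for name, total in strategies.items()
}
speedup = one_task_at_a_time / strategies["+ Preemption"]

for name, pair in improvements.items():
    print(f"{name}: {pair[0]}% / {pair[1]}%")
print(f"speedup over one task at a time: {speedup:.2f}x")
```

Under this reading the computed pairs agree with the table, and the ratio of the slower baseline to the final configuration (1235 min / 270 min, about 4.6) is consistent with the 4X response-time claim.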

Related Work
Scheduling Policies on traditional batch systems:
– Fair Share
– Dynamic Re-partitioning
– Affinity Scheduling
Multi-Processor Scheduling (MPS) Problem
– Theoretical results for various heuristics
Grid-based parameter sweep infrastructures
– Nimrod, Condor, Globus, NetSolve, Virtual Instrument

Conclusion
Application-awareness yields a more than 4X improvement in response time
Conclusions:
– View from application level is important
– Domain knowledge is important
– System API and infrastructure to exploit domain knowledge are important
Task Queue API for batching
SISOL & Resource Allocator API for preemption