JSSPP-11, Boston, MA, June 19, 2005: Pitfalls in Parallel Job Scheduling Evaluation


Pitfalls in Parallel Job Scheduling Evaluation
Eitan Frachtenberg and Dror Feitelson
Computer and Computational Sciences Division, Los Alamos National Laboratory

Scope
Numerous methodological issues arise in the evaluation of parallel job schedulers:
Experiment theory and design
Workloads and applications
Implementation issues and assumptions
Metrics and statistics
The paper covers 32 recurring pitfalls, organized into topics and sorted by severity.
The talk describes a real case study and the heroic attempts to avoid most such pitfalls, as well as the less heroic oversight of several others.

Evaluation Paths
Theoretical analysis (queueing theory):
+ Reproducible, rigorous, and resource-friendly
- Hard for time slicing due to unknown parameters, application structure, and feedbacks
Simulation:
+ Relatively simple and flexible
- Many assumptions, not all known or reported; hard to reproduce; rarely factors in application characteristics
Experiments with real sites and workloads:
+ Most representative (at least locally)
- Largely impractical and irreproducible
Emulation

Emulation Environment
Experimental platform consisting of three clusters with a high-end network.
Software: several job scheduling algorithms implemented on top of STORM:
Batch / space sharing, with optional EASY backfilling
Gang Scheduling, Implicit Coscheduling (SB), Flexible Coscheduling
Results are described in [JSSPP'03] and [TPDS'05].

Step One: Choosing a Workload
Static vs. dynamic?
Size of workload: how many different workloads are needed?
Use trace data?
Different sites have different workload characteristics
Inconvenient sizes may require imprecise scaling
"Polluted" data, flurries
Use model-generated data?
Several models exist, with different strengths
By trying to capture everything, a model may capture nothing
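Traces of the kind discussed above are distributed (e.g., by the Parallel Workloads Archive) in the Standard Workload Format (SWF): one job per line, whitespace-separated fields, with `;` comment lines. A minimal reader, kept here as an illustrative sketch rather than anything from the talk, only needs the first few fields (job number, submit time, wait time, run time, allocated processors):

```python
def parse_swf(lines):
    """Minimal Standard Workload Format (SWF) reader: skips ';' comment
    lines and keeps only job id, submit time, run time, and size."""
    jobs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        f = line.split()
        jobs.append({"id": int(f[0]),
                     "submit": float(f[1]),   # seconds from trace start
                     "runtime": float(f[3]),  # actual run time
                     "size": int(f[4])})      # processors allocated
    return jobs
```

A full SWF record has 18 fields (estimates, user, queue, etc.); this sketch deliberately ignores the rest.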

Static Workloads
We start with a synthetic application and static workloads:
Simple enough to model, debug, and calibrate
Bulk-synchronous application
Can control granularity, variability, and communication pattern
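A bulk-synchronous application like the one above alternates compute phases with barriers, so each iteration takes as long as its slowest process. A toy model of its run time, useful for seeing how the granularity and variability knobs interact (function and parameter names are mine, not from the talk):

```python
import random

def bsp_iteration_time(n_procs, granularity, variability, rng):
    """One bulk-synchronous iteration: each process computes for roughly
    `granularity` seconds, perturbed by up to +/- `variability` (a
    fraction), then all wait at a barrier, so the iteration lasts as
    long as the slowest process."""
    compute = [granularity * (1 + rng.uniform(-variability, variability))
               for _ in range(n_procs)]
    return max(compute)

def emulate_app(n_procs, n_iters, granularity, variability, seed=0):
    """Total run time of the synthetic app over n_iters iterations."""
    rng = random.Random(seed)
    return sum(bsp_iteration_time(n_procs, granularity, variability, rng)
               for _ in range(n_iters))
```

With variability set to 0 the run time is exactly `n_iters * granularity`; raising variability only ever slows the app down, since the barrier always waits for the slowest process.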

Synthetic Scenarios
Balanced
Complementing
Imbalanced
Mixed

Example: Turnaround Time

Dynamic Workloads
We chose Lublin's model [JPDC'03]:
1000 jobs per workload
Multiply run times AND arrival times by a constant to "shrink" the wall-clock run time to 2-4 hours; shrinking too much is problematic (system constants)
Multiply arrival times by a range of factors to modify load; this is unrepresentative, since it deviates from "real" correlations with run times and job sizes. A better solution is to use different workloads.
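The two scalings above behave differently, which is worth making concrete: scaling run times and arrival times by the same constant shrinks wall-clock time while preserving the offered load, whereas scaling arrival times alone changes the load (and the correlations). A sketch, with an illustrative job representation and load definition of my own choosing:

```python
def shrink_workload(jobs, factor):
    """Scale arrivals and run times together: wall-clock time shrinks,
    offered load is preserved. Jobs are (arrival, runtime, size)."""
    return [(a * factor, r * factor, size) for (a, r, size) in jobs]

def scale_load(jobs, factor):
    """Scale only arrivals: spreads or compresses the same work over a
    different span, changing the offered load (and distorting the
    correlations with run times and sizes)."""
    return [(a * factor, r, size) for (a, r, size) in jobs]

def offered_load(jobs, n_nodes):
    """Offered load = demanded node-seconds / available node-seconds
    over the span from first arrival to last completion."""
    span = max(a + r for a, r, _ in jobs) - min(a for a, _, _ in jobs)
    demand = sum(r * size for _, r, size in jobs)
    return demand / (n_nodes * span)
```

Running both transformations on the same job list shows the asymmetry: `shrink_workload` leaves `offered_load` unchanged, while `scale_load` with a factor above 1 lowers it.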

Step Two: Choosing Applications
Synthetic applications are easy to control, but:
Some characteristics are ignored (e.g., I/O, memory)
Others may not be representative, in particular communication, which is salient in parallel applications: granularity, pattern, network performance
If not sure, conduct a sensitivity analysis
Applications might be assumed malleable, moldable, or to have linear speedup, which many MPI applications are not
Real applications have no hidden assumptions, but may also have limited generality
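The sensitivity analysis suggested above can, in its simplest form, just re-run the evaluation while varying one uncertain parameter (say, communication granularity) and watching how the metric moves; large swings flag assumptions that matter. A generic helper along those lines (names and structure are illustrative, not from the talk):

```python
def sensitivity_sweep(metric_fn, base_config, param, values):
    """Re-run an evaluation varying one parameter at a time, keeping the
    rest of the configuration fixed, and collect the resulting metric
    per tested value."""
    return {v: metric_fn({**base_config, param: v}) for v in values}
```

In practice `metric_fn` would wrap a full emulation run; here any function from a configuration dict to a number will do.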

Example: Sensitivity Analysis

Application Choices
Synthetic applications in the first set:
Allow control over more parameters
Allow testing unrealistic but interesting conditions (e.g., a high multiprogramming level)
LANL applications in the second set (Sweep3D, Sage):
Real memory and communication use (MPL=2)
Important applications for LANL's evaluations, but probably only for LANL's
Run-time estimates: f-model on batch, MPL on the others

Step Three: Choosing Parameters
What are reasonable input parameters to use in the evaluation?
Maximum multiprogramming level (MPL)
Timeslice quantum
Input load
Backfilling method and its effect on multiprogramming
Run-time estimate factor (not tested)
Algorithm constants, tuning, etc.

Example 1: MPL
Verified with different offered loads

Example 2: Timeslice
Dividing into quantiles allows analysis of the effect on different job types

Considerations for Parameters
Realistic MPLs
Scaling traces to different machine sizes
Scaling offered load
Artificial user estimates and multiprogramming estimates

Step Four: Choosing Metrics
Not all metrics are easily comparable: absolute times, slowdown with time slicing, etc.
Metrics may need to be limited to a relevant context; use multiple metrics to understand characteristics.
Measuring utilization for an open model is a pitfall: it directly measures the offered load until saturation, and the same goes for throughput and makespan. Better metrics: slowdown, response time, wait time.
Avoid using the mean with asymmetric distributions.
Avoid inferring scalability from O(1) nodes.
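The bounded slowdown used in the following example slides clamps the run time from below by a threshold (10 seconds is a common choice) so that very short jobs cannot inflate the mean slowdown. A direct implementation of that standard definition:

```python
def bounded_slowdown(wait, run, tau=10.0):
    """Bounded slowdown: response time over run time, with the run time
    clamped below by tau seconds so sub-tau jobs don't dominate the
    mean; the result is also floored at 1."""
    return max((wait + run) / max(run, tau), 1.0)
```

Compare: a 1-second job that waited 90 seconds has a raw slowdown of 91, but a bounded slowdown of only about 9, which better reflects the user-perceived delay.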

Example: Bounded Slowdown

Example (continued)

Response Time

Bounded Slowdown

Step Five: Measurement
Never measure saturated workloads: when the arrival rate is higher than the service rate, queues grow to infinity and all metrics become meaningless. But finding the saturation point can be tricky.
Discard warm-up and cool-down results.
May need to measure subgroups separately (long/short, day/night, weekday/weekend, ...)
Measurements should still have enough data points for statistical meaning, especially regarding workload length.
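Discarding warm-up and cool-down can be as simple as dropping a fixed fraction of jobs at each end of the measured sequence; the 10% default below is an illustrative choice, not a value from the talk, and it trades measurement length against the remaining statistical power mentioned above:

```python
def steady_state(jobs, frac=0.1):
    """Keep only the steady-state portion of a run: sort jobs by
    arrival and drop the first and last `frac` of them, so the empty
    machine at start-up and the draining queue at the end don't skew
    the metrics."""
    jobs = sorted(jobs, key=lambda j: j["arrival"])
    k = int(len(jobs) * frac)
    return jobs[k:len(jobs) - k] if k else jobs
```

Per-job metrics (wait time, bounded slowdown, etc.) would then be averaged only over the jobs this function returns.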

Example: Saturation Point

Example: Shortest jobs CDF

Example: Longest jobs CDF

Conclusion
Parallel job scheduling evaluation is complex, but we can avoid past mistakes.
The paper can be used as a checklist when designing and executing evaluations.
Additional information in the paper:
Pitfalls, examples, and scenarios
Suggestions on how to avoid pitfalls
Open research questions (for the next JSSPP?)
Many references to positive examples
Be cognizant when choosing your compromises.

References
Workload archive: contains several workload traces and models
Dror's publication page
Eitan's publication page