Scheduling Under Uncertainty: Planning for the Ubiquitous Grid. Neal Sample, Pedram Keyani, Gio Wiederhold. Stanford University.

Presentation transcript:

Scheduling Under Uncertainty: Planning for the Ubiquitous Grid. Neal Sample, Pedram Keyani, Gio Wiederhold. Stanford University.

Coordination. Why We’re Here: Coding; Integration/Composition.

Sample Composition Tasks. Logistics: reservation and distribution systems, “find the best transportation route from A to B”. Genomics: framework for composing various processing tools and repositories. Modeling: weather prediction, complex chemical systems, basin modeling. Composition of services (vs. components, data).

Composition of Large Services. Remote, autonomous. Services are not free: fee (£), execution time. Open Service Model: GRID (principles); UDDI, IETF SLP (protocols); Globus, CPAM (runtime support).

Service Scheduling Goals. Closest to soft real-time, job shop. Objectives: minimize transaction time; minimize transaction cost. Differences: no control over service availability; no control over resource allocation; no control over workplace loads. => Schedules become inaccurate.

New Scheduling Requirements. Why not traditional scheduling (e.g., CSP)? Runtime performance changes; this is more than just scheduling: it is rescheduling in the face of runtime hazards. Why not traditional rescheduling? No resource allocation/control: “observe, not control”.

Scheduling Difficulties. Adaptation: schedules must be adaptive. Schedules for t0 are only guesses, and estimates for multiple stages may become invalid => schedules must be revised during runtime. Allocation: the scheduler does not handle resource allocation. Means: competing objectives have orthogonal scheduling techniques; changing goals for tasks or users means vastly increased scheduling complexity.

Sample Program
  //sample program
  BEGIN
    out1 = serviceA()
    out2 = serviceB(out1)
    out3 = serviceC(out2)
    out4 = serviceD(out2)
  END
  //declarative
[Figure: dependency graph over services A, B, C, and D]
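The data dependencies in the sample program imply an invocation order: serviceB needs out1, and serviceC and serviceD both need out2. The sketch below is one way to derive that order and is not the CHAIMS implementation. The `DEPENDS_ON` table is transcribed from the program, while `invocation_waves` and the use of Python's `graphlib` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services whose output it consumes,
# transcribed from the sample program (out1 feeds B; out2 feeds C and D).
DEPENDS_ON = {
    "serviceA": set(),
    "serviceB": {"serviceA"},
    "serviceC": {"serviceB"},
    "serviceD": {"serviceB"},
}

def invocation_waves(depends_on):
    """Group services into waves; services in the same wave have no
    dependency on each other and could be invoked concurrently."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the program is cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

WAVES = invocation_waves(DEPENDS_ON)
for number, wave in enumerate(WAVES, start=1):
    print(number, wave)
```

Under these assumptions serviceC and serviceD land in the same wave, which is where a scheduler has freedom to overlap work.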

Budgeting. Time: maximum allowable execution time. Expense: total resources available to lease services. Surety: schedule confidence; goal and assessment technique.

Program Schedule as a Template. Instantiated at runtime: service provider selection, etc. [Figure: the A, B, C, D template with several candidate providers for each service]

Steps in Scheduling: Estimation, Planning, Invocation, Monitoring, Completion, Rescheduling.

CHAIMS Scheduler. [Diagram components: Program Analyzer, Planner, Estimator/Bidder, Monitor, Dispatcher. Inputs and data flows: Input program, Requirements, Budget, Status, Costs/Times, Control. Interactions with services: observe, invoke, haggle.]

t0 Schedule Selection. Guided by runtime “bids”; constrained by budget. [Figure: the A, B, C, D template with candidate providers; example bids: 7±2 h for £50, 6±1 h for £40, 5±2 h for £30, 3±1 h for £30]

t0 Schedule Constraints
Budget. Time: upper bound, e.g. 22h. Cost: upper bound, e.g. £250. Surety: lower bound, e.g. 90%. {22, 250, 90}
Steered by user preferences/weights = Selection (single value convolution)
  S1 est [20, 150, 90] = (22-20)*10 + (250-150)*1 + (90-90)*5 = 120
  S2 est [22, 175, 95] = (22-22)*10 + (250-175)*1 + (95-90)*5 = 100
  S3 est [18, 190, 96] = (22-18)*10 + (250-190)*1 + (96-90)*5 = 130
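The single-value selection can be checked mechanically. The sketch below recomputes the three scores from the slide's budget {22, 250, 90} and weights 10, 1, 5. It assumes the cost term is the budget minus the estimated cost, which reproduces the totals 120, 100, and 130. Rejecting schedules that violate a bound is an added assumption, and the names `score`, `BUDGET`, `WEIGHTS`, and `ESTIMATES` are illustrative only.

```python
# Budget bounds and user weights as given on the slide.
BUDGET = {"time_h": 22, "cost_gbp": 250, "surety_pct": 90}
WEIGHTS = {"time_h": 10, "cost_gbp": 1, "surety_pct": 5}

# Estimated [time, cost, surety] for the three candidate schedules.
ESTIMATES = {
    "S1": {"time_h": 20, "cost_gbp": 150, "surety_pct": 90},
    "S2": {"time_h": 22, "cost_gbp": 175, "surety_pct": 95},
    "S3": {"time_h": 18, "cost_gbp": 190, "surety_pct": 96},
}

def score(estimate):
    """Weighted sum of margins against the budget.

    Time and cost are upper bounds, so their margin is budget - estimate;
    surety is a lower bound, so its margin is estimate - budget.
    Returns None when any bound is violated (an assumption of this sketch).
    """
    margins = {
        "time_h": BUDGET["time_h"] - estimate["time_h"],
        "cost_gbp": BUDGET["cost_gbp"] - estimate["cost_gbp"],
        "surety_pct": estimate["surety_pct"] - BUDGET["surety_pct"],
    }
    if any(margin < 0 for margin in margins.values()):
        return None
    return sum(WEIGHTS[key] * margins[key] for key in margins)

SCORES = {name: score(estimate) for name, estimate in ESTIMATES.items()}
BEST = max((name for name, value in SCORES.items() if value is not None),
           key=SCORES.get)
print(SCORES, BEST)
```

With these weights S3 wins: its cost is the highest of the three, but the time weight of 10 rewards the four hours it saves.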

Program Evaluation and Review Technique (PERT). Service times: most likely (m), optimistic (a), and pessimistic (b); N(0, 1). Equations: (1) expected duration (service); (2) standard deviation; (3) expected duration (program); (4) test value; (5) expectation test; (6) ~expectation test.
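The slide's equations did not survive extraction, so the sketch below uses the textbook PERT formulas rather than the authors' exact notation: expected duration (a + 4m + b) / 6, standard deviation (b - a) / 6, program duration as the sum along the critical path, and a test value z = (deadline - expected) / sqrt(sum of variances) compared against N(0, 1). The three (a, m, b) triples and the 20-hour deadline are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def expected_duration(a, m, b):
    """Textbook PERT mean for one service: (a + 4m + b) / 6."""
    return (a + 4 * m + b) / 6

def standard_deviation(a, b):
    """Textbook PERT spread for one service: (b - a) / 6."""
    return (b - a) / 6

def schedule_surety(critical_path, deadline):
    """Return (expected program duration, test value z, surety).

    critical_path is a list of (optimistic, most_likely, pessimistic)
    triples; surety is the standard-normal probability of finishing by
    the deadline.
    """
    expected = sum(expected_duration(a, m, b) for a, m, b in critical_path)
    variance = sum(standard_deviation(a, b) ** 2 for a, _, b in critical_path)
    if variance == 0:
        # No uncertainty: the schedule either meets the deadline or not.
        return expected, None, 1.0 if expected <= deadline else 0.0
    z = (deadline - expected) / sqrt(variance)
    return expected, z, NormalDist().cdf(z)

# Hypothetical estimates (hours) for three services on the critical path.
PATH = [(5, 7, 9), (5, 6, 7), (3, 5, 7)]
EXPECTED, Z, SURETY = schedule_surety(PATH, deadline=20)
print(EXPECTED, Z, SURETY)
```

For these inputs the expected program duration is 18 hours with a combined standard deviation of about one hour, so a 20-hour deadline sits roughly two standard deviations out and the surety is about 97.7%.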

t0 Schedule Properties. [Figure: probability density over probable completion time, marking the deadline and the surety; Bank = £100]

Coordination Runtime Hazards With resource allocation or without hazards Scheduling becomes trivial Runtime implies t 0 schedule invalidation Sample hazards Delays and slowdowns Stoppages Inaccurate estimations Communication loss Competitive displacement… OSM

Definition + Detection: progressive hazard. [Figure: surety (%) against execution time with a minimum surety of 90; after serviceA start and serviceB start, serviceB runs slow and surety falls to the hazard point below the minimum]

Definition + Detection: catastrophic hazard. [Figure: surety (%) against execution time with a minimum surety of 90; after serviceA start and serviceB start, serviceB fails and surety drops to 0%]
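Both hazard definitions reduce to a threshold test on a stream of surety estimates. The sketch below is a minimal illustration under stated assumptions: surety is sampled as a percentage at each monitoring step, a drop to 0% is catastrophic, and any other reading below the minimum is progressive. The function names and the sample traces are hypothetical.

```python
def classify_surety(surety_pct, minimum_pct=90.0):
    """Label one surety reading relative to the user's minimum surety."""
    if surety_pct <= 0.0:
        return "catastrophic"  # e.g. the service provider has failed
    if surety_pct < minimum_pct:
        return "progressive"   # e.g. a slow service eroding the margin
    return "ok"

def first_hazard(samples, minimum_pct=90.0):
    """Return (t_hazard, kind) for the first reading below the minimum.

    samples is an iterable of (time, surety_pct) pairs in time order.
    Returns None if surety never falls below the minimum.
    """
    for time, surety_pct in samples:
        kind = classify_surety(surety_pct, minimum_pct)
        if kind != "ok":
            return time, kind
    return None

# Hypothetical monitoring traces: (hours elapsed, surety %).
SLOW_TRACE = [(0, 97.0), (2, 95.0), (4, 91.0), (6, 86.0)]
FAIL_TRACE = [(0, 97.0), (2, 95.0), (4, 0.0)]

print(first_hazard(SLOW_TRACE))
print(first_hazard(FAIL_TRACE))
```

A threshold test of this kind cannot tell a real provider failure from a lost connection, which is the pseudo-hazard problem raised later in the talk.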

Monitoring. Observe, not control. CPAM runtime support. Parameter presetting. ESTIMATE(…) primitive for service cost, used at t0 and t_reschedule. Service progress: EXAMINE(…) primitive, used with PERT to detect surety hazards.

Schedule Repair. Simple cost model: early termination = linear £ recovery. Greedy selection of single repair: O(s*r). [Figure: surety (%) against execution time with the 90% minimum, t_hazard, and t_repair marked; service graph A, B, C, D]
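A greedy single-repair selection can be sketched as one pass over every (service, repair) candidate, which matches the O(s*r) bound. The code below is an assumption-laden illustration and not the authors' algorithm: candidates are ranked by estimated surety gain per pound of net cost, the refund for early termination falls linearly with the fraction of work already done, and the `Repair` record, strategy names, and figures are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Repair:
    service: str
    strategy: str
    surety_gain: float  # estimated gain in surety, percentage points
    cost: float         # net £ cost after any refund

def linear_refund(price, fraction_done):
    """Linear recovery: terminating earlier returns more of the fee."""
    return price * (1.0 - fraction_done)

def pick_repair(candidates, bank):
    """Greedy choice of one repair in a single pass over the candidates.

    Keeps repairs that are affordable and improve surety, then returns
    the one with the best surety gain per £ (free repairs rank first).
    """
    def gain_per_pound(repair):
        if repair.cost <= 0:
            return float("inf")
        return repair.surety_gain / repair.cost

    useful = [r for r in candidates if r.cost <= bank and r.surety_gain > 0]
    return max(useful, key=gain_per_pound) if useful else None

# Hypothetical candidates for a slow serviceB with £30 left in the bank.
CANDIDATES = [
    Repair("serviceB", "replacement", surety_gain=8.0, cost=20.0),
    Repair("serviceB", "duplication", surety_gain=12.0, cost=40.0),
    Repair("serviceB", "do nothing", surety_gain=0.0, cost=0.0),
]
CHOICE = pick_repair(CANDIDATES, bank=30.0)
print(CHOICE)
```

Duplication offers the larger surety gain here but exceeds the remaining bank, so the greedy pass settles on replacement.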

Strategy 1: service replacement. Pro: minimize £ lost. Pro: boost surety. Con: lost investment of £ and time. Con: concedes recovery chance. [Figure: surety (%) against execution time with the 90% minimum, t_hazard, and t_repair marked; service graph A, B, C, D with B’]

Strategy 2: service duplication. Pro: large boost to surety. Pro: leverages recovery chance. Con: large £ cost. [Figure: surety (%) against execution time with the 90% minimum, t_hazard, and t_repair marked; service graph A, B, C, D with B’]

Strategy 3: pushdown repair. Pro: cheap, no £ lost. Pro: no time lost. Con: cannot handle all hazard types, e.g. catastrophic hazards. Con: requires recovery chance. [Figure: surety (%) against execution time with the 90% minimum, t_hazard, and t_repair marked; service graph A, B, C, D with C’]

Strategy 4: do nothing/bail-out. Pro: no additional £ cost. Pro: ideal solution for partitioning hazards. Con: generally ineffective. Con: depends on self-recovery. [Figure: surety (%) against execution time with the 90% minimum, t_hazard, and t_repair marked; service graph A, B, C, D]

Experimental Results. Rescheduling options: limit repair options to one strategy (limits flexibility and effectiveness), or use all strategies. Setup: 1000 random DAG schedules, 2-10 services; 1-3 hazards per execution; fixed service availability; all schedules are recoverable.
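The experimental setup can be approximated with a small generator. The sketch below only shows how 1000 random DAG schedules with 2-10 services and 1-3 hazards each might be produced, and it is not the authors' test harness. The edge probability, the service naming, and the uniform choice of hazard targets are all assumptions.

```python
import random

def random_dag(rng, min_services=2, max_services=10, edge_prob=0.3):
    """Build a random DAG as {service: set of services it depends on}.

    Edges only run from lower-numbered to higher-numbered services,
    which guarantees the graph is acyclic.
    """
    count = rng.randint(min_services, max_services)
    names = [f"s{i}" for i in range(count)]
    depends_on = {name: set() for name in names}
    for later in range(1, count):
        for earlier in range(later):
            if rng.random() < edge_prob:
                depends_on[names[later]].add(names[earlier])
    return depends_on

def random_trial(rng):
    """One execution: a random schedule plus 1-3 services that hit hazards."""
    dag = random_dag(rng)
    hazard_count = rng.randint(1, 3)
    hazards = [rng.choice(sorted(dag)) for _ in range(hazard_count)]
    return dag, hazards

rng = random.Random(2024)  # fixed seed so a run can be repeated
TRIALS = [random_trial(rng) for _ in range(1000)]
print(len(TRIALS), TRIALS[0])
```

A fixed seed lets every repair strategy be evaluated against the same set of schedules and hazards.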

“The Numbers”. Value of close finishes? (!= 100% surety)

Why the Differences? Catastrophic hazard: service provider failure; cannot be solved by “do nothing”. Pseudo-hazard: communication failure, network partition; looks exactly like a catastrophic hazard; can’t terminate for £ recovery; appropriate solution is “do nothing”. Slowdown hazard (actual or apparent): not a complete failure, multiple solutions; “do nothing” may be ideal or futile.

A Fundamental Weakness. Observations of progress are only secondary indicators of current work rate. [Figure: projected finish compared with finish time]
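The weakness is easy to see with a linear projection. The sketch below assumes the monitor can read a fraction-complete value for a service (a simplification of whatever EXAMINE actually reports) and extrapolates the finish time from the average rate so far. The function name and the numbers are hypothetical.

```python
def projected_finish(start, now, fraction_done):
    """Extrapolate finish time from average progress since start.

    The projection uses the mean work rate over the whole run so far,
    not the current rate, so a service that has just stalled or just
    sped up gives the same answer as one that ran steadily.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return start + (now - start) / fraction_done

# Two services both report 50% done at hour 4. One ran steadily; the other
# reached 50% at hour 2 and has made no progress since. The projection
# cannot tell them apart.
STEADY = projected_finish(start=0.0, now=4.0, fraction_done=0.5)
STALLED = projected_finish(start=0.0, now=4.0, fraction_done=0.5)
print(STEADY, STALLED)
```

Both calls return hour 8, even though the stalled service will never finish at its current rate. Telling the two cases apart takes at least two observations spaced in time.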

Open Questions. Mundane issues: taxonomy of hazard/solution combinations; vary service provider densities. Monitor resolution adjustments: networks are not free or zero latency; unstudied effect of delayed status information. Pseudo-hazards: what is a good amount of delay to avoid them (without getting into deeper trouble…)? Accuracy of t0 service cost estimates: ~hazard with delayed detection; 1-way hazard.

(Deeper) Open Questions. User preferences only used in generating initial (t0) schedule: fixed least-cost repair (surety / repair cost); best cost repair (success sensitive to preference?). Second-order cost effects: £ left over in budget is purchasing power; what is the value of that purchasing power? Sampling for cost estimates during runtime. Surety = time + progress (+ budget balance). Penalty regimes.

(Deeper) Open Questions. Simultaneous rescheduling: use more than one strategy for a hazard. NP: reduction to Hamiltonian Path. NP here might not be that hard… Approximations are acceptable; small set; strong constraints; NP is worst case, not average case…

(Deeper) Open Questions: completing the cost model. [Figure: timeline with target start/run and finish (on time), plus data transportation costs]

(Deeper) Open Questions: completing the cost model. [Figure: timeline from client ready to start, through reservation, target start/run, and finish, to client ready for data; late, early, and on-time outcomes; hold fee; data transportation costs]

Conclusions. Initial results given artificial hazards. Seemingly effective rescheduling strategies. Difficult to characterize the solutions. Should translate well out of the sandbox and into an actual runtime. Clear directions for continued research. Project home