A Statistical Scheduling Technique for a Computational Market Economy. Neal Sample, Stanford University.

Presentation transcript:

A Statistical Scheduling Technique for a Computational Market Economy
Neal Sample, Stanford University

Research Interests
- Compositional computing (GRID)
  - Reliability and quality of service
  - Value-based and model-based mediation
  - Languages: “Programming for the non-programmer expert”
- Database research
  - Semistructured indexing and storage
  - Massive table/stream compression
  - Approximate algorithms for streaming data

Why We’re Here
(Figure: coding vs. integration/composition.)

GRID: Commodity Computing
- Distributed supercomputing (chip design, cryptography)
- High throughput (FightAIDSAtHome, Nug30)
- On demand (computer-in-the-loop)
- Data intensive (Large Hadron Collider)
- Collaborative (data exploration, education)

Composition of Large Services
- Remote, autonomous services
- Services are not free: fee ($), execution time, 2nd-order dependencies
- “Open Service Model”
  - Principles: GRID, CHAIMS
  - Protocols: UDDI, IETF SLP
  - Runtime: Globus, CPAM

Grid Life is Tough
- Increased complexity throughout
  - New tools and applications
  - Diverse resources: computers, storage media, networks, sensors
- Programming
  - Control flow and data flow separation
  - Service mediation
- Infrastructure
  - Resource discovery, brokering, monitoring
  - Security/authorization
  - Payment mechanisms

Our GRID Contributions
- Programming models and tools
- System architecture
- Resource management
- Instrumentation and performance analysis
- Network protocols and infrastructure
- Service mediation

Other GRID Research Areas
- The nature of applications
- Algorithms and problem-solving methods
- Security, payment/escrow, reputation
- End systems
- Programming models and tools
- System architecture
- Resource management
- Instrumentation and performance analysis
- Network protocols and infrastructure
- Service mediation

Roadmap
- Brief introduction to the CLAM language
- Some related scheduling methods
- Surety-based scheduling
- Sample program
- Monitoring
- Rescheduling
- Results
- A few future directions

CLAM Composition Language
- Decomposition of the CALL-statement
  - Parallelism by asynchrony in a sequential program
  - Reduced complexity of invoke statements
  - Control of new GRID requirements (estimation, trading, brokering, etc.)
- Data flow abstracted out
  - Mediation for data flow control and optimization
  - Extraction model mediation
- Purely compositional
  - No primitives for arithmetic
  - No primitives for input/output
  - Targets the “non-programmer expert”

CLAM Primitives
Pre-invocation:
- SETUP: set up the connection to a service
- SETPARAM, GETPARAM: set and get parameters in a service
- ESTIMATE: service cost estimation
Invocation and result gathering:
- INVOKE: start a method invocation
- EXAMINE: test the progress of an invoked method
- EXTRACT: extract results from an invoked method
Termination:
- TERMINATE: terminate a method invocation/connection to a service
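To make the primitive lifecycle concrete, here is a minimal, hypothetical Python sketch of a client driving one service through these primitives. The ClamService class, its method signatures, and the returned values are illustrative stand-ins, not the actual CHAIMS/CPAM API.

```python
class ClamService:
    """Illustrative stand-in for a remote service driven via the CLAM primitives."""

    def __init__(self, name):
        self.name = name
        self._progress = 0.0

    # Pre-invocation
    def setup(self):
        pass                                    # open a connection to the service
    def setparam(self, **params):
        self.params = params                    # pass input parameters
    def estimate(self):
        return {"time_h": 6.0, "cost": 40.0}    # a bid: expected time and fee

    # Invocation and result gathering
    def invoke(self, method):
        return "handle-1"                       # asynchronous start, returns a handle
    def examine(self, handle):
        self._progress = min(1.0, self._progress + 0.25)
        return self._progress                   # fraction of work completed
    def extract(self, handle):
        return {"result": 42}

    # Termination
    def terminate(self, handle):
        pass


def run_one_service(provider, inputs):
    """Compose one service call using only the primitives listed above."""
    svc = ClamService(provider)
    svc.setup()
    svc.setparam(**inputs)
    bid = svc.estimate()                # consulted by the scheduler before committing
    handle = svc.invoke("solve")
    while svc.examine(handle) < 1.0:    # EXAMINE drives monitoring while INVOKE runs
        pass
    result = svc.extract(handle)
    svc.terminate(handle)
    return bid, result


print(run_one_service("provider-A", {"n": 30}))
```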

Resources + Scheduling: end system
- Computational model: multithreading; automatic parallelization
- Resource management: process creation; OS signal delivery; OS scheduling

Resources + Scheduling: end system + cluster
- Computational model: synchronous communication; distributed shared memory
- Resource management: parallel process creation; gang scheduling; OS-level signal propagation

Resources + Scheduling: end system + cluster + intranet
- Computational model: client/server; loosely synchronous pipelines; IWIM
- Resource management: resource discovery; signal distribution networks

Resources + Scheduling: end system + cluster + intranet + Internet
- Computational model: collaborative systems; remote control; data mining
- Resource management: brokers; trading; mobile code negotiation

Scheduling Difficulties
Adaptation: repair and reschedule
- Schedules made at t0 are only guesses
- Estimates for multiple stages may become invalid
- => Schedules must be revised during runtime
(Figure: a timeline from t0 to finish alternating schedule and work, with a hazard forcing a reschedule.)

Scheduling Difficulties
Service autonomy: no resource allocation
- The scheduler does not handle resource allocation
- Users observe resources without controlling them
This means:
- Competing objectives have orthogonal scheduling techniques
- Changing goals for tasks or users vastly increases scheduling complexity

Some Related Work
Legend: R = rescheduling, A = autonomy of services, M = monitoring execution, Q = QoS / probabilistic execution
- PERT: Q, A, M
- CPM: M, R, A
- ePERT (AT&T), Condor (Wisconsin): M, R, Q
- Mariposa (UCB): R, Q, A
- SBS (Stanford): R, Q, A, M

Sample Program
(Figure: a sample program composed of four services, A, B, C, and D, arranged as a workflow DAG.)

Budgeting
- Time: maximum allowable execution time
- Expense: funding available to lease services
- Surety: goal for the schedule's probability of success, and the technique for assessing it

Program Schedule as a Template
- Instantiated at runtime
- Service provider selection, etc.
(Figure: the A-B-C-D program template, with several candidate providers available for each service.)

t0 Schedule Selection
- Guided by runtime “bids”
- Constrained by budget
(Figure: one provider is selected per service from competing bids such as 7±2h for $50, 6±1h for $40, 5±2h for $30, and 3±1h for $30.)

t0 Schedule Constraints
Budget:
- Time: upper bound, e.g. 22h
- Cost: upper bound, e.g. $250
- Surety: lower bound, e.g. 90%
- {Time, Cost, Surety} = {22, 250, 90}
Selection is steered by user preferences/weights (here 10 for time, 1 for cost, 5 for surety):
- S1 est [20, 150, 90]: (22-20)*10 + (250-150)*1 + (90-90)*5 = 120
- S2 est [22, 175, 95]: (22-22)*10 + (250-175)*1 + (95-90)*5 = 100
- S3 est [18, 190, 96]: (22-18)*10 + (250-190)*1 + (96-90)*5 = 130
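A minimal sketch of this scoring step, using the budget {22h, $250, 90%}, the example candidates S1-S3, and the weights 10, 1, and 5 from the slide; the function and variable names are illustrative, not from the original system.

```python
def plan_score(est, budget, weights):
    """Score a candidate plan by its weighted slack against each budget bound."""
    time_est, cost_est, surety_est = est
    time_max, cost_max, surety_min = budget
    w_time, w_cost, w_surety = weights
    return ((time_max - time_est) * w_time           # reward finishing under the time bound
            + (cost_max - cost_est) * w_cost         # reward money left in the budget
            + (surety_est - surety_min) * w_surety)  # reward surety above the required minimum


budget = (22, 250, 90)       # {Time, Cost, Surety}
weights = (10, 1, 5)         # user preference weights
candidates = {"S1": (20, 150, 90), "S2": (22, 175, 95), "S3": (18, 190, 96)}

scores = {name: plan_score(est, budget, weights) for name, est in candidates.items()}
print(scores, "->", max(scores, key=scores.get))   # {'S1': 120, 'S2': 100, 'S3': 130} -> S3
```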

Budget and User Preferences: Pareto Search Space
(Figure: candidate plans plotted by expected program cost against expected program execution time; the budgeted time and cost bound the feasible region, and user preferences steer selection within it.)

Program Evaluation and Review Technique
Service times: most likely (m), optimistic (a), and pessimistic (b); the completion test uses the standard normal N(0, 1).
(1) expected duration (service): $\mu_i = (a_i + 4m_i + b_i)/6$
(2) standard deviation: $\sigma_i = (b_i - a_i)/6$
(3) expected duration (program): $\mu_P = \sum_i \mu_i$ over the critical path
(4) test value: $z = (T_{\mathrm{deadline}} - \mu_P) / \sqrt{\sum_i \sigma_i^2}$
(5) expectation test: $P(\mathrm{finish} \le T_{\mathrm{deadline}}) = \Phi(z)$
(6) ~expectation test: $P(\mathrm{finish} > T_{\mathrm{deadline}}) = 1 - \Phi(z)$
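The surety of a t0 schedule follows directly from these PERT relations. Below is a small sketch of that computation; the (a, m, b) triples are illustrative, and only the 22h deadline comes from the slides.

```python
import math


def pert_surety(estimates, deadline):
    """estimates: list of (a, m, b) triples for the critical-path services."""
    mu = sum((a + 4 * m + b) / 6.0 for a, m, b in estimates)   # (1), (3)
    var = sum(((b - a) / 6.0) ** 2 for a, m, b in estimates)   # (2)
    z = (deadline - mu) / math.sqrt(var)                       # (4)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # (5): Phi(z)


# Illustrative (optimistic, most likely, pessimistic) hours for three services:
critical_path = [(4, 7, 13), (4, 6, 10), (2, 5, 10)]
print(f"surety = {pert_surety(critical_path, deadline=22):.1%}")
```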

t0 Complete Schedule Properties
(Figure: probability density over the probable program completion time, with the deadline, the user-specified surety, and the remaining bank of $100 marked.)

Individual Service Properties
(Figure: finish-time probability densities for the individual services, with estimates of 7±2h, 6±1h, and 5±2h.)

t0 Combined Service Properties
(Figure: the combined probability density of the program finish time; against the 22h deadline and the 90% required surety, the current surety is 99.6%.)

Tracking Surety
(Figure: surety (%) tracked over the course of execution against the user-specified surety bound.)
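One way to track surety at runtime is to redo the PERT computation with what has been observed so far: completed services contribute their actual durations, the running service's remaining time is extrapolated from its reported progress, and unstarted services keep their t0 estimates. The extrapolation rule in the sketch below is an assumption for illustration; as a later slide notes, progress reports are only secondary indicators of the true work rate.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def runtime_surety(deadline, done, running, pending):
    """done: actual durations of finished services;
    running: (hours_spent, progress, sigma) for services in flight;
    pending: (mu, sigma) t0 estimates for services not yet started."""
    mu, var = sum(done), 0.0
    for spent, progress, sigma in running:
        rate = max(progress, 1e-6) / spent       # observed work rate (assumed stable)
        mu += spent + (1.0 - progress) / rate    # extrapolated total duration
        var += sigma ** 2
    for m, s in pending:
        mu += m
        var += s ** 2
    return phi((deadline - mu) / math.sqrt(var))


# Service A finished in 8h, B reports 40% done after 4h, C (5±2h) has not started:
print(f"{runtime_surety(22, done=[8], running=[(4, 0.4, 1.0)], pending=[(5, 2.0)]):.1%}")
```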

Runtime Hazards
- With control over resource allocation, or without runtime hazards, scheduling becomes much easier
- Runtime in the Open Service Model (OSM) implies t0 schedule invalidation
- Sample hazards: delays and slowdowns; stoppages; inaccurate estimations; communication loss; competitive displacement…

Progressive Hazard: Definition + Detection
(Figure: surety (%) over execution time; serviceA and then serviceB start, serviceB runs slow, and the surety sinks toward the 90% minimum, flagging a hazard.)

Catastrophic Hazard: Definition + Detection
(Figure: surety (%) over execution time; serviceB fails outright and the surety collapses to 0%, well below the 90% minimum.)

Pseudo-Hazard: Definition + Detection
(Figure: surety (%) over execution time; a communication failure to serviceB makes the surety appear to collapse to 0%, even though the service itself may still be making progress.)
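A rough sketch of how these hazards might be detected and told apart from monitoring observations. The trigger (recomputed surety falling below the user's minimum) follows the figures above; the classification heuristic in the code (unreachable service => pseudo-hazard, reachable but stalled => catastrophic, otherwise progressive) is an assumption for illustration, not the talk's exact rule.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    service: str
    reachable: bool      # did the monitoring probe get an answer?
    progress: float      # last reported fraction of work completed
    stalled: bool        # no progress since the previous probe


def detect_hazard(current_surety, min_surety, obs):
    if current_surety >= min_surety:
        return None                                 # schedule still healthy
    if not obs.reachable:
        return ("pseudo-hazard", obs.service)       # looks fatal, may self-recover
    if obs.stalled and obs.progress < 1.0:
        return ("catastrophic", obs.service)        # provider appears to have failed
    return ("progressive", obs.service)             # running, but too slowly


# Surety dipped while B still runs slowly -> progressive hazard:
print(detect_hazard(0.84, 0.90, Observation("B", reachable=True, progress=0.3, stalled=False)))
# B is unreachable -> pseudo-hazard:
print(detect_hazard(0.10, 0.90, Observation("B", reachable=False, progress=0.3, stalled=True)))
```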

Monitoring + Repair
- Observe, not control
- Complete set of repairs: sufficient (not minimal)
- Simple cost model: early termination = linear cost recovery
- Greedy selection of a single repair: O(s·r)
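A minimal sketch of that greedy O(s·r) selection over s services and r repair strategies, ranking candidate repairs by surety gain per dollar (the Δsurety / repair cost metric that reappears under “Deeper Questions”). The strategy names mirror the next four slides; the candidate gains and costs, which a real scheduler would obtain by re-estimating the schedule and consulting bids, are stubbed.

```python
STRATEGIES = ["baseline", "replace", "duplicate", "pushdown"]


def evaluate(service, strategy, schedule):
    """Return (surety_gain, dollar_cost) for applying `strategy` at `service`.
    Stub: a real implementation would recompute the schedule's surety."""
    return schedule["candidates"].get((service, strategy), (0.0, 0.0))


def greedy_repair(schedule):
    """Scan all s*r (service, strategy) pairs and keep the single best repair."""
    best, best_value = None, float("-inf")
    for service in schedule["services"]:               # s services
        for strategy in STRATEGIES:                    # r strategies
            gain, cost = evaluate(service, strategy, schedule)
            value = gain / cost if cost > 0 else gain  # surety gain per dollar
            if value > best_value:
                best, best_value = (service, strategy), value
    return best


schedule = {
    "services": ["A", "B", "C", "D"],
    "candidates": {("B", "replace"):   (0.06, 40.0),   # +6% surety for $40
                   ("B", "duplicate"): (0.09, 70.0),   # +9% surety for $70
                   ("C", "pushdown"):  (0.04,  0.0)},  # +4% surety for free
}
print(greedy_repair(schedule))   # -> ('C', 'pushdown')
```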

Schedule Repair
(Figure: surety (%) over execution time for the A-B-C-D program; a hazard at t_hazard pushes the surety below the 90% bound, and a repair is chosen at t_repair.)

Strategy 0: baseline (no repair)
- pro: no additional $ cost
- pro: ideal solution for partitioning hazards (pseudo-hazards)
- con: depends on self-recovery
(Figure: after t_repair, nothing is changed and the surety recovers only if the hazard clears on its own.)

Strategy 1: service replacement
- pro: reduces $ lost
- con: lost investment of $ and time
- con: concedes the chance of recovery
(Figure: the troubled service B is terminated and a replacement B' is started at t_repair.)

Strategy 2: service duplication
- pro: larger surety boost; leverages the chance of recovery
- con: large $ cost
(Figure: a duplicate B' is started alongside B at t_repair.)

Strategy 3: pushdown repair
- pro: cheap, no $ lost
- pro: no time lost
- con: cannot handle catastrophic hazards
- con: requires a chance of recovery
(Figure: the repair is pushed down to a later service, substituting C' for C at t_repair.)

Experimental Results
Rescheduling options:
- Baseline: no repairs
- Single-strategy repairs (limits flexibility and effectiveness)
- Use all strategies
Setup:
- 1000 random DAG schedules, 2-10 services
- 1-3 hazards per execution
- Fixed service availability
- All schedules are repairable

“The Numbers”
- What is the value of a close finish?
(Figure: results comparing the rescheduling options by how late they finish.)

Why the Differences?
- Catastrophic hazard: service provider failure
  - “do nothing”: no solution to the hazard
- Pseudo-hazard: communication failure, network partition
  - Looks exactly like a catastrophic hazard
  - “do nothing”: the ideal solution
- Slowdown hazard: not a complete failure; multiple solutions
  - “do nothing”: ideal, futile, or merely acceptable

A Challenge
- Observations of progress are only secondary indicators of the current work rate
(Figure: the projected finish drifts relative to the eventual finish time as the work rate changes.)

Open Questions
- Simultaneous rescheduling: use more than one strategy for a hazard
  - Finding the optimal combination is NP-hard, but NP here might not be that hard…
  - Approximations are acceptable
  - Small set, strong constraints
  - NP is worst case, not average case? (e.g., DFBB search)
- Global impact of local schedule preferences
  - How do local preferences interact in, and reshape, the global market?

Open Questions
- Monitoring resolution adjustments
  - Networks are not free or zero-latency: account for the cost of monitoring
  - More frequent monitoring = more cost, but greater accuracy
  - The effect of delayed status information is unstudied
- Accuracy of t0 service cost estimates
  - Model as a hazard with delayed detection: a “1-way hazard”
  - Penalty adjustments

Deeper Questions
- User preferences are only used in generating the initial (t0) schedule
  - Fixed least-cost repair (Δsurety / repair cost)
  - Best-cost repair (is success sensitive to preference?)
- Second-order cost effects
  - $ left over in the budget is purchasing power; what is the value of that purchasing power?
  - Sampling for cost estimates during runtime
  - surety = time + progress (+ budgetBalance/valuation)

Conclusions
- Novel statistical method for service scheduling
- Effective strategies for a varied hazard mix
- Achieves per-user-defined Quality of Service
- Should translate well “out of the sandbox”
- Clear directions for continued research
- More information

Steps in Scheduling
- Estimation
- Planning
- Invocation
- Monitoring
- Completion
- Rescheduling

CHAIMS Scheduler
(Figure: scheduler architecture. The Program Analyzer reads the input program and passes requirements to the Planner, which is steered by user requirements such as the budget; the Estimator/Bidder haggles with providers over costs/times; the Dispatcher invokes services under the Planner's control; the Monitor observes and reports status.)

Simplified Cost Model
(Figure: the cost of a service's start/run relative to an on-time target, plus data transportation costs.)

Full Cost Model
(Figure: the simplified model extended with a reservation and a hold fee between “client ready to start” and the actual start/run, early/on-time/late completion relative to the target, “client ready for data”, and data transportation costs.)
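To pin down how these pieces might fit together, here is an illustrative cost function over the components named in the figure (reservation, hold fee, lateness against the target, data transportation). The way they are combined, and all of the numbers, are assumptions for illustration; the slides only name the components.

```python
def service_cost(reservation_fee, run_fee, data_transport_cost,
                 reserved_at, start_at, finish_at, target_finish,
                 hold_rate, late_penalty_rate):
    """Illustrative total cost of one leased service under the full cost model."""
    cost = reservation_fee + run_fee + data_transport_cost
    # Hold fee: the provider holds reserved capacity until the client actually starts.
    if start_at > reserved_at:
        cost += hold_rate * (start_at - reserved_at)
    # Lateness penalty: finishing after the target; early or on-time adds nothing here.
    if finish_at > target_finish:
        cost += late_penalty_rate * (finish_at - target_finish)
    return cost


# Reserved at hour 2, started at hour 4, finished at hour 25 against a 22h target:
print(service_cost(reservation_fee=10, run_fee=40, data_transport_cost=8,
                   reserved_at=2, start_at=4, finish_at=25, target_finish=22,
                   hold_rate=1.5, late_penalty_rate=5))   # -> 76
```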

The Eight Fallacies of Distributed Computing (Peter Deutsch)
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous