Dynamic Mapping of Activation Trees Thesis Proposal January 29, 1998

Dynamic Mapping of Activation Trees. Thesis Proposal, January 29, 1998. Peter A. Dinda. Committee: David O’Hallaron (chair), Thomas Gross, Peter Steenkiste, Jaspal Subhlok, David Bakken (BBN).

Outline: responsive interactive applications; best effort real-time service; dynamic mapping problem; history-based prediction approach; other approaches and related work; current results on simplified problem; proposed thesis work. Responsive interactive applications. Best effort real-time service to bound execution time of activation trees. Local history-based prediction of application and environment to dynamically map nodes of activation trees to different hosts in order to achieve bounds. Compelling preliminary results on simplified problem. Extend approach to full problem.

Interactive Application Model. [Figure labels: aperiodic user action, message, message handler, mouse_click(), activation tree, feedback.] The activation tree dynamically unfolds: we don’t know its structure until we have finished executing the message handler. Feedback helps determine the user’s next action. Make it clear that the tree is sequentially executed and dynamically unfolds.

Acoustic Room Modeling. [Figure labels: modify model, physical simulation of wave equation, impulse responses, speakers, frequency response plots.] It’s a design application: keep the listener position steady and manipulate the room to equalize the frequency response. It’s a virtualized reality application: the listener can wander around the space listening. Split into two parts: frequency response plots, and then the audio output. Make the feedback loop point on the frequency response.

Other Applications. Image editing: the Adobe Photoshop universe. Computer aided design: Quake design optimization (Malcevic-97). Computational steering: CUMULUS (Geist-96), CAVE (Disz-95), ... Collaboration: Collaborative Planning (Zinky-DUTC-95), ... Securities trading (Wolfe-Lau-95). Games: DIS (DIS-94). I have spent some time thinking about editing vast images; should have a slide to make this clear. Quake design optimization: can think of acoustic room modeling (ARM) as a model problem for it. Securities trading: trading of a basket of goods, or multiple-exchange arbitrage. Games are relatively clear.

Responsiveness. Timely feedback to individual user actions. Bound: response time ≤ tmax. Jitter bound and resource usage hint. Bound: response time ≥ tmin. Example: image editor drawing tool. Image editing drawing needs to be fast and non-jittery, but also doesn’t have to be faster than the user can perceive (or the monitor can repaint). Image editing: free-hand drawing is at the perceptual limit; an operation on a large image is not, but it is still useful to bound it. ARM: ideally we want immediate audible response to movement or change. Realistically it will take longer. As compute rates increase, the acceptable wait time may decline. “No surprises.” tmin is for resource conservation: the system will probably provide itself some slack in meeting the tmax deadline, and tmin gives it a hint as to how much slack. tmin is also a hint to the USER: you will wait at least this long, but no longer than tmax. This is as opposed to generalized utility functions (time constraint functions).

A Best Effort Real-time Service. MAP procedure() IN [tmin, tmax]: execute the activation tree rooted at procedure() so that tmin ≤ texec ≤ tmax. No guarantees. Responsiveness spec: bounds [tmin,tmax]. Performance metric: fraction of trees that meet their bounds. Programmer responsibility is limited: no specification of the tree under procedure(), etc. Bounds need only be known at the run-time call to MAP. The programmer can, of course, find out if bounds were met for a specific call, and can also interrogate the service about how it is doing. Have a separate slide to show where this service would reside in a distributed object system. Soft real-time: deterministic guarantees, statistical guarantees. BERT (best effort real-time) could also be called NON-DETERMINISTIC SOFT REAL-TIME. NOT A CONTRADICTION IN TERMS. BEST EFFORT IS NOT NO EFFORT. The service is aware of and attempts to meet your real-time requirements for task execution time, but can make no guarantees that they will be met.
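As a minimal sketch of what a call to such a service might look like from the programmer's side, the Python below times a procedure call and records whether tmin ≤ texec ≤ tmax held. The names BestEffortService, map_call, and fraction_met are hypothetical (they do not appear in the proposal), and the sketch simply runs the procedure locally instead of choosing a host; it only illustrates the interface and the performance metric.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class BestEffortService:
    """Hypothetical stand-in for the service: runs calls locally, records outcomes."""

    outcomes: List[bool] = field(default_factory=list)
    clock: Callable[[], float] = time.perf_counter  # injectable for testing

    def map_call(self, procedure: Callable[[], Any],
                 tmin: float, tmax: float) -> Tuple[Any, bool]:
        """Run procedure() and report whether tmin <= texec <= tmax (no guarantee)."""
        start = self.clock()
        result = procedure()
        t_exec = self.clock() - start
        met = tmin <= t_exec <= tmax
        self.outcomes.append(met)
        return result, met

    def fraction_met(self) -> float:
        """Performance metric: fraction of calls so far that met their bounds."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)
```

The bounds are passed only at the call itself, and the caller can ask afterwards both whether a specific call met its bounds and how the service is doing overall.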

Machine Model. Hosts on a LAN. No centralized or coordinated scheduling, or reservations. Other unrelated traffic exists. We are only a user. Remote execution facility: can execute any procedure on any host (RPC, DSM, DCE, CORBA, DCOM, ...). Measurable: at least a good real-time clock exists (<1ms). The point here is that we want to provide this service in a common sort of environment, the kind of environment most people have access to. The point is the lack of assumptions. We use the remote execution facility and the measurement capability to ... Good real-time means better than typical Unix gettimeofday(): most remote execution facilities already have 1 ms or lower overheads, and most Unix gettimeofday() implementations only give millisecond accuracy; a better example is the Alpha or Intel cycle counters. It should be clear that using the remote execution facility may mean the programmer has to rewrite. However, this is becoming more common with, e.g., CORBA, DCOM, and Java, so it’s not completely unreasonable to expect this. Probably not a huge deal. I like the idea of piggybacking on existing glue code, like RPC and DO stubs.

Execution Model. [Figure label: [tmin,tmax].] Dynamically map nodes of the unfolding activation tree to the hosts. At each procedure call, choose which host is best suited to execute the call in order to meet the bounds on the tree. Another way to look at this is that we allow the thread of execution to migrate at procedure call points.

Dynamic Mapping Problem. How do we map the nodes of the trees to the hosts so that the fraction of trees that satisfy their bounds is maximized? We are given a sequence of aperiodically arriving activation trees, where each tree has associated bounds [tmin,tmax] and its structure becomes known only as we sequentially execute it. We can execute each node on any one of a group of uncoordinated but measurable hosts interconnected with a measurable network. The hosts and network can both carry unrelated computation and communication traffic. How do we map the nodes of the trees to the hosts so that the fraction of trees that execute within their desired bounds is maximized?

Aspects of My Approach: history-based prediction; decomposition of bounds; adaptation of mapping algorithms during tree traversal.

History-based Prediction. [Figure: foo() with bounds [tmin,tmax] calls bar() with bounds [t’min,t’max]; each of the hosts H0, H1, and H2 has a history of (time, duration, bounds) records.] foo() is executing on H0 and calls bar(), which can be mapped to H0, H1, or H2. H0 has a local history of execution times of bar() on each of the other hosts. For each host, H0 predicts whether it can meet the bounds, based on past local history, and then chooses one where it is possible. Execution times include both the communication for the remote call and the actual computation. The cost model fits in here: how to choose “one where it is possible” if it is possible on several hosts: the “cheapest”.
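The prediction step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name LocalHistory, the window size, the use of "fraction of recent calls that fell inside the bounds" as the predictor, and the 0.5 score given to hosts with no history are all hypothetical choices, not the proposal's predictor or cost model.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Sequence, Tuple


class LocalHistory:
    """Caller-local record of recent execution times of each procedure on each host."""

    def __init__(self, window: int = 8, untried_rate: float = 0.5) -> None:
        # Durations include communication for the remote call plus computation.
        self._times: Dict[Tuple[str, str], Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )
        self.untried_rate = untried_rate  # assumption: score for hosts never tried

    def record(self, procedure: str, host: str, duration: float) -> None:
        self._times[(procedure, host)].append(duration)

    def hit_rate(self, procedure: str, host: str,
                 tmin: float, tmax: float) -> Optional[float]:
        """Fraction of remembered calls that fell in [tmin, tmax]; None if no history."""
        times = self._times.get((procedure, host))
        if not times:
            return None
        return sum(1 for t in times if tmin <= t <= tmax) / len(times)

    def choose_host(self, procedure: str, hosts: Sequence[str],
                    tmin: float, tmax: float) -> str:
        """Pick the host whose local history best supports meeting the bounds."""
        if not hosts:
            raise ValueError("at least one host is required")

        def score(host: str) -> float:
            rate = self.hit_rate(procedure, host, tmin, tmax)
            return self.untried_rate if rate is None else rate

        return max(hosts, key=score)  # ties go to the earliest host in the list
```

A cost model would replace the tie-breaking rule here: when several hosts look feasible, it would pick the cheapest rather than the first.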

Decomposition of Bounds. [Figure: foo(), with bounds [tmin,tmax], is partially executed and known; bar(), whose bounds [t’min,t’max] are still to be chosen, is unexecuted but known; the rest of the tree is unexecuted and unknown.] The choice of [t’min,t’max] for bar() depends on the unvisited portion of the tree. Collect a history of what fraction of the time spent in the foo() subtree was spent in the bar() subtree. Choose the fraction of the bounds to give to bar() based on that history and the current time. Note that this is ONE POSSIBLE SOLUTION. THE POINT is that we want to divide the deadline on a subtree among its children according to the past history.
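One way the rule above could be written down is sketched below. The function name child_bounds, the use of the mean of past fractions, and the cap that keeps the child's upper bound within the time remaining before the parent's tmax are assumptions made for illustration; the proposal itself presents this only as one possible solution.

```python
from statistics import mean
from typing import Sequence, Tuple


def child_bounds(parent_bounds: Tuple[float, float],
                 elapsed: float,
                 past_fractions: Sequence[float]) -> Tuple[float, float]:
    """Derive [t'min, t'max] for a child call from the parent's [tmin, tmax].

    past_fractions: history of (time in child subtree) / (time in parent subtree).
    elapsed: time already spent in the parent subtree when the child is called.
    """
    tmin, tmax = parent_bounds
    # With no history, assume the child may use the whole budget.
    fraction = mean(past_fractions) if past_fractions else 1.0
    remaining = max(0.0, tmax - elapsed)
    # Give the child its historical share, but never more than what is left.
    child_max = min(fraction * tmax, remaining)
    child_min = min(fraction * tmin, child_max)
    return child_min, child_max
```

For example, with parent bounds [0.1, 1.0], 0.2 already elapsed, and a history in which bar() took about half of foo()'s time, the child would receive roughly [0.05, 0.5].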

Adaptation of Mapping Algorithms During Tree Traversal. Tune the strategy to how deep we are in the tree and how far along we are in the traversal. Explore more aggressively early in the traversal, when the effect of a bad decision is easiest to overcome: find interesting new hosts. Spend less time making mapping decisions deep in the tree: more likely to remain on a single host.

Thesis Statement: Dynamic mapping of activation trees using history-based prediction and traversal-based adaptation is an effective way to build a best effort real-time service for responsive interactive applications running in conventional environments.

Other Approaches. Distributed soft real-time system: system modifications. Dynamic load balancing system: different goals, either even distribution of load (OS-centric) or minimization of execution times (app-centric). Resource reservation system. Shared measurement system: information level and dissemination. Distributed soft real-time systems require system modifications, which limits scope; mediation versus competition; system-centric versus application-centric. Dynamic load balancing: different goals, as above. Resource reservations: system modifications; translation of application requirements to resources may be hard. Shared measurement (“load monitor”): delayed information, granularity, etc.; lower-level information is harder to use.

Current Results: load trace collection and analysis; algorithms and evaluation for the simplified dynamic mapping problem.

Algorithms and Evaluation for Simplified Problem. Map only leaf nodes. Ignore communication.

    for I=1 to N do
        MAP leaf_procedure() IN [tmin,tmax]
    end

Looked at cases where leaf_procedure always does the same amount of work and cases where the amount varies according to different distributions.

RangeCounter(W): A Near Optimal Algorithm. Each host has a quality level Q and a window of the last W execution times (W is small). Choose the host with the highest quality level, and age the quality levels of all hosts: Q=Q-1. If bounds are met, increase the host’s quality level by the inverse of our confidence in it. If bounds are not met, reduce the host’s quality level by half: Q=Q/2.
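The update rules above can be sketched in Python as follows. The formula for the increase on success is not given in the text here, so the sketch takes it as a caller-supplied function (reward) instead of guessing it; the class layout, the initial quality of 0, and the tie-breaking rule are likewise assumptions. Note that with these starting values the aging step can push Q below zero, in which case halving moves it toward zero.

```python
from collections import deque
from typing import Callable, Deque, Dict, Sequence

# reward(window, tmin, tmax) -> amount added to Q when bounds are met.
# Stands in for "the inverse of our confidence"; the real formula is not assumed.
Reward = Callable[[Sequence[float], float, float], float]


class RangeCounter:
    def __init__(self, hosts: Sequence[str], window: int, reward: Reward) -> None:
        self.quality: Dict[str, float] = {h: 0.0 for h in hosts}  # assumed start
        self.history: Dict[str, Deque[float]] = {
            h: deque(maxlen=window) for h in hosts  # last W execution times
        }
        self.reward = reward

    def choose(self) -> str:
        """Pick the host with the highest Q, then age every host: Q = Q - 1."""
        host = max(self.quality, key=lambda h: self.quality[h])
        for h in self.quality:
            self.quality[h] -= 1.0
        return host

    def report(self, host: str, exec_time: float, tmin: float, tmax: float) -> None:
        """Feed back the observed execution time for the chosen host."""
        self.history[host].append(exec_time)
        if tmin <= exec_time <= tmax:
            self.quality[host] += self.reward(self.history[host], tmin, tmax)
        else:
            self.quality[host] /= 2.0  # bounds missed: Q = Q / 2


# Usage with a placeholder reward (a constant, purely for illustration):
rc_demo = RangeCounter(["H0", "H1"], window=4, reward=lambda w, lo, hi: 2.0)
chosen = rc_demo.choose()
rc_demo.report(chosen, 0.08, 0.0, 0.1)
```

Aging every host on each call means a host that has not been tried recently loses its advantage over time, while a host that keeps meeting the bounds keeps earning it back.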

Load Trace-based Simulation. Execution time is computed from the load trace using a simple, validated model. Mapping algorithms are given bounds, select a host, and then are told the execution time. The simulator computes the performance of: the algorithm under test; the optimal (precognizant) algorithm; random mapping; individual host mappings.
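A rough sketch of such a simulation loop is given below. Everything specific in it is an assumption made for illustration: the execution-time model (nominal time stretched by 1 + load) is a stand-in for the validated model, which is not described here; each call consumes one trace sample; and the function names are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Sequence


def exec_time(t_nominal: float, load: float) -> float:
    """Assumed model: the call shares the CPU with `load` other runnable processes."""
    return t_nominal * (1.0 + load)


def simulate(traces: Dict[str, Sequence[float]], t_nominal: float,
             tmin: float, tmax: float,
             choose: Callable[[int], str],
             report: Optional[Callable[[str, float], None]] = None) -> float:
    """Fraction of calls meeting [tmin, tmax] when `choose` picks the host per call."""
    n_calls = min(len(t) for t in traces.values())
    met = 0
    for i in range(n_calls):
        host = choose(i)                      # algorithm selects a host...
        t = exec_time(t_nominal, traces[host][i])
        if report is not None:
            report(host, t)                   # ...and is then told the exec time
        if tmin <= t <= tmax:
            met += 1
    return met / n_calls


def optimal(traces: Dict[str, Sequence[float]], t_nominal: float,
            tmin: float, tmax: float) -> float:
    """Precognizant baseline: a call counts if any host would have met the bounds."""
    n_calls = min(len(t) for t in traces.values())
    met = sum(
        any(tmin <= exec_time(t_nominal, tr[i]) <= tmax for tr in traces.values())
        for i in range(n_calls)
    )
    return met / n_calls


# Tiny made-up traces, just to show the baselines side by side.
demo_traces = {"H0": [0.0, 2.0, 0.0, 2.0], "H1": [2.0, 0.0, 2.0, 2.0]}
rng = random.Random(0)
hosts = list(demo_traces)
random_rate = simulate(demo_traces, 0.1, 0.0, 0.15, lambda i: rng.choice(hosts))
single_host_rate = simulate(demo_traces, 0.1, 0.0, 0.15, lambda i: "H0")
optimal_rate = optimal(demo_traces, 0.1, 0.0, 0.15)
```

The spread between the random, single-host, and optimal rates is what indicates how much room a mapping algorithm has to make a difference.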

Scope of Evaluation. 9 mapping algorithms. 6 different groups of hosts, chosen from 39 hosts. 1 week, 1 Hz load trace from each host. 648 different cases: combinations of nominal time and bounds. 100,000 calls for each case. I will show a group of 8 PSC hosts, four with a mean load of ~1 and four with a mean load of ~0.2.

Upper bound set at tnominal: tightest bounds. Random: mapping decisions matter. Optimal is quite high. Choosing the best host and remaining there is suboptimal. Point out the spread between random and optimal, and between best host and optimal.

And here’s our near-optimal algorithm. Note that it is dead-on optimal up to 1 s. Notice that we could possibly improve for tnominal > 1 s by making multiple decisions (migrate at calls, if available). Reserve slide shows all algorithms.

Effect of relaxing the bound tmax for tnominal = 100 ms; very similar for other values of tnominal. Notice that the best individual host would require a 50% looser bound to catch up with optimal. Other algorithms are similar.

And, again, RangeCounter tracks optimal. Other algorithms take significant extra slack to catch up with RangeCounter, behave erratically, or are very expensive. Reserve slide shows all algorithms.

Proposed Work. Extend current results to the full dynamic mapping problem. Extend the simulation environment to include communication and activation trees. Trace collection (activation trees, network): a trace for everything. Trace characterization. Simulator extension. Develop algorithm. Evaluate with benchmarks. Incorporate into a real system. It should be clear that the first two are the core items. The idea is to build as realistic a simulation environment as possible so that as much work can be done in it as possible.

Activation Tree Traces. Collect activation trees where each node is annotated with its compute time and what data it references. The goal is to instrument off-the-shelf MS Windows programs. Other options exist. This approach would give access to a LARGE group of interactive applications with CLEAR credibility: IMAGE EDITING (PHOTOSHOP), GAMES. COULD SAY IF THIS IS A USEFUL THING FOR EXISTING APPLICATIONS WITHOUT HAVING TO REWRITE THEM TO BE DISTRIBUTED. Note that building interactive apps for distributed systems is a relatively new idea, so there are few such programs that can be pulled easily off the shelf. Further, many of the existing programs are really proprietary. Make it clear that there are fallback positions. Contributions: instrumentation tools/methodology, activation tree trace database.

Network Traces. Realistic communication times. Packet traces on Ethernet with tcpdump. Simple broadcast networks seem too limiting. Remos. Existing trace databases. There is a tension between getting lots of low-level, interesting data on simple, less interesting networks, or potentially less data on more interesting networks such as ATM or switched Ethernet. Can’t easily just sniff something with a switch. Contributions: methodology, trace database.

Trace Characterization. Classify traces into families, from which we can draw benchmarks for evaluation. Ideally, parameterized models to fit the data. Characterizing activation tree traces is the most challenging. Contributions: trace analysis, models, classification scheme, benchmark suite.

Simulator Extension. Extend my existing simulator to support arbitrary activation trees and realistic communication. Communication time model. Contributions: simulator infrastructure for the full dynamic mapping problem.

Algorithm Development. Use the approaches described earlier. Extend RangeCounter(W) with a separate algorithm to recursively divide bounds among subtrees. Iterative development using the simulator and benchmarks. With the simulation infrastructure and benchmarks, it will be possible to iterate on the algorithm quickly and thus (hopefully) converge on a reasonable solution quickly. From past experience, analytical approaches to algorithm development don’t work well: the behavior of even load is quite complex. One has to find a solution and then attempt to understand it. Contributions: algorithm(s).

Evaluation. Evaluate the algorithm in simulation. Draw connections between benchmark characteristics and algorithm performance. Compare with other approaches: a load monitor with a simple heuristic; greater degrees of information sharing. Incorporate into a distributed object system as a proof of concept. Contributions: evaluation, working system.

Thesis Timeline