Performance Prediction Engineering
Francine Berman, U. C. San Diego
Rich Wolski, U. C. San Diego and University of Tennessee


Performance Prediction Engineering
Francine Berman, U. C. San Diego
Rich Wolski, U. C. San Diego and University of Tennessee

The Computational Grid
A Grid “computer” may consist of:
– computational sites
– distributed databases
– remote instruments
– visualization
– distinct networks
Computer = ensemble of resources

Grid Programs
Grid programs:
– may couple distributed and dissimilar resources
– may incorporate tasks with different implementations
– may adapt to dynamic resource load

Performance Models for Grid Programs
Grid applications may couple dissimilar resources
– models must accommodate heterogeneity
Grid applications may incorporate tasks with different implementations
– models must accommodate multiple task models
Grid applications may adapt to dynamic resource load
– models must allow for dynamic parameters

Compositional Models
Grid programs can be represented as a composition of tasks
“Tasks” consist of relevant performance activities
Model parameters may reflect performance variations of the grid
– may be parameterized by time

Using Grid Performance Models
Compositional models are particularly useful for grid application scheduling
Application schedulers use performance prediction models to:
– select resources
– estimate potential performance of candidate schedules
– compare possible schedules

AppLeS = Application-Level Scheduler
[Architecture diagram: NWS, user preferences, and the application performance model feed the planner and resource selector, which drive the application actuator on top of Grid infrastructure and resources]

Partitionings
– Block partitioning
– Compile-time non-uniform strip partitioning
– AppLeS dynamic strip partitioning

Application Scheduling: Jacobi2D
Dynamic information is key to leveraging deliverable performance from the Grid environment.

Performance is Time-Dependent
[Jacobi2D: AppLeS (strip) vs. block partitioning]

Schedulers and Performance Models
Predictions may be used at different levels of accuracy
– predictions can be “engineered”
Knowing something about a prediction can make it more useful
– the performance range of predictions may provide additional information
– meta-information about predictions can improve schedules

Performance Prediction Engineering
The Performance Prediction Engineering (PPE) system is a methodology for modeling performance in dynamic Grid environments
3 components:
– structural performance prediction models
– quantitative meta-information
– dynamic forecasting

Structural Models
Top-level model = performance equation
– describes the composition of the application within a specific time frame (performance grammar)
Component models
– represent application performance activities (nonterminals)
Model parameters
– represent system or application values (terminals)
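The equation/nonterminal/terminal structure above can be sketched in code. This is a minimal illustration, not the PPE implementation; all names and the additive composition are assumptions.

```python
# A structural model sketched as nested callables: the top-level
# performance equation composes component models (nonterminals), whose
# inputs are system/application parameters (terminals) supplied when
# the model is evaluated at time t. Illustrative only.

def make_structural_model(component_models):
    """component_models: list of functions (params, t) -> seconds.
    Returns the top-level performance equation."""
    def performance_equation(params, t):
        return sum(model(params, t) for model in component_models)
    return performance_equation

# Terminals are plugged in at evaluation time:
comp = lambda params, t: params["work"] / params["speed"]
comm = lambda params, t: params["msg_bytes"] / params["bandwidth"]

model = make_structural_model([comp, comm])
t_pred = model({"work": 1e6, "speed": 1e7,
                "msg_bytes": 1e5, "bandwidth": 1e6}, t=0)
```

Because the parameters are bound only when the equation is evaluated, the same model can be re-evaluated as dynamic values change over time.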

Example: Modeling the Performance of SOR
– Regular, iterative computation
– 5-point stencil
– Divided into a red phase and a black phase
– 2D grid of data divided into strips
– Targeted to a workstation cluster

SOR Structural Model
SOR performance equation
SOR component models: { RComp(p,t), RComm(p,t), BComp(p,t), BComm(p,t) }
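The performance equation itself did not survive transcription of this slide. Given the four component models listed, a plausible reconstruction is the sum of the red and black computation and communication phases for one iteration on machine p:

```latex
T(p,t) = RComp(p,t) + RComm(p,t) + BComp(p,t) + BComm(p,t)
```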

SOR Component Models
Dynamic parameters: FracAvailCPU(p,t), BWAvail(x,y,t)
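One way the dynamic parameters enter the component models can be sketched as follows. The work/bandwidth forms and all constants here are illustrative assumptions, not the models from the talk.

```python
# Sketch of SOR component models driven by the dynamic parameters.
# FracAvailCPU scales the machine's peak speed down to what is
# actually deliverable; BWAvail is the currently available bandwidth.

def r_comp(elements, ops_per_element, cpu_speed, frac_avail_cpu):
    """Red-phase computation time on one machine: work divided by
    the CPU capacity actually available (frac_avail_cpu in (0, 1])."""
    work = elements * ops_per_element            # operations to perform
    return work / (cpu_speed * frac_avail_cpu)

def r_comm(boundary_bytes, bw_avail):
    """Red-phase communication time: boundary exchange over the
    currently available bandwidth (bytes/second)."""
    return boundary_bytes / bw_avail

def sor_iteration_time(elements, ops, speed, frac_cpu, boundary, bw):
    """One red/black iteration = sum of the four component models
    (black phase assumed symmetric with red for this sketch)."""
    red = r_comp(elements, ops, speed, frac_cpu) + r_comm(boundary, bw)
    black = r_comp(elements, ops, speed, frac_cpu) + r_comm(boundary, bw)
    return red + black
```

As FracAvailCPU and BWAvail change over time, re-evaluating the model yields a time-dependent prediction.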

Single-User Experiments
Question: How well does the SOR model predict performance in a single-user cluster?
Platform:
– heterogeneous Sparc cluster
– 10 Mbit ethernet connection
– quiescent machines and network
Prediction within 3% before memory spill

Dedicated Platform Experiments What happens when other users share the system?

Non-dedicated SOR Experiments

Improving Predictions
Many parameters represent values which vary over time
The range of behavior of time-dependent parameters is represented by distributions
Structural models can be extended to accommodate stochastic parameters and render stochastic predictions

Stochastic Predictions
Stochastic predictions capture the range of possible behavior

Stochastic Structural Models
The structural model combines:
– stochastic and point-valued parameters
– component models
and yields:
– stochastic predictions
– the “quality” of the performance prediction (lifetime, accuracy, overhead)

Stochastic SOR Performance Model
FracAvailCPU and BWAvail are given by stochastic parameters
The Network Weather Service was improved to provide better performance information
First cut: consider stochastic parameters which can adequately be represented by normal distributions
– normal distributions make the math tractable
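Why normals keep the math tractable can be shown in a few lines: the sum of independent normal random variables is itself normal, so a performance equation that sums component times can propagate (mean, variance) in closed form instead of by simulation. The component values below are illustrative, not measurements from the talk.

```python
# Combining independent normal component-time distributions in
# closed form: means add, variances add.

def sum_normals(components):
    """components: list of (mean, std) pairs for independent,
    normally distributed component times. Returns (mean, std)
    of the total execution time."""
    mean = sum(m for m, _ in components)
    var = sum(s * s for _, s in components)
    return mean, var ** 0.5

# Example: red/black computation and communication phases.
total_mean, total_std = sum_normals([(0.8, 0.1), (0.2, 0.05),
                                     (0.8, 0.1), (0.2, 0.05)])
```

A non-normal parameter would force numerical convolution or sampling, which is exactly the tractability question raised on the "Next Step" slide below.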

Experiments with Multi-user Systems
Platform:
– Sun workstation cluster
– 10 Mbit ethernet
– experiments run in a lab environment with additional generated load
Experiments run back-to-back for multiple trials

SOR Stochastic Parameters
[Measured traces of BWAvail and FracAvailCPU]

[Plots: data staying within a single mode vs. data changing modes]

“Single-mode” Experiments
– All values captured by stochastic predictions
– Maximum absolute error between means and actual values is 10%

“Multiple-mode” Experiments
– 80% of actual values captured by stochastic prediction
– Maximum discrepancy between stochastic prediction and actual values is 14%
– Maximum absolute error between means and actual values is 39%

The Next Step
What if the performance range of parameters cannot be adequately represented by normal distributions?
– Can we identify distributions for model parameters?
– Can we combine non-normal distributions efficiently? Is the math tractable?
– Can we use empirical data to determine performance ranges if distributions cannot be identified?

Using PPE for Application Scheduling
Basic strategy:
– Develop a structural model for the application
– Use stochastic parameters to provide information about the performance range
– Use profiling to determine the desired level of accuracy for component models
– Use stochastic prediction and meta-information to develop the application schedule

Scheduling with Meta-Information
Stochastic predictions provide information about the range of behavior
Stochastic predictions and meta-information provide additional information for schedulers
[Chart: predicted execution-time ranges for processors P1, P2, P3]

Quality of Information
Meta-information = quality of information
SOR stochastic predictions provide a measure of accuracy
Other qualitative measures are possible:
– lifetime
– overhead
– complexity
Quality of Information attributes can be used to improve scheduling

Preliminary Experiments: Application Scheduling with PPE
Simple scheduling scenario: SOR with strip decomposition
Scheduling strategies adjust strip size to minimize execution time
Multi-user cluster:
– machines connected by 10 Mbit ethernet
– available CPU on at least half of the machines is multi-modal, with data changing between modes frequently

Adjusting Strip Size
Time balancing is used to determine strip size:
set all T(p,t) equal and solve for NumElts(p,t’)
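The time-balancing step above has a closed form when each machine's time is linear in its element count: if machine p needs c_p seconds per element, setting c_p · n_p equal across machines with Σ n_p = N gives n_p proportional to 1/c_p. A minimal sketch under that linearity assumption (names and values are illustrative):

```python
# Time balancing for strip decomposition: assign each machine a
# number of elements inversely proportional to its predicted
# per-element time, so all machines finish at (nearly) the same time.

def balance_strips(total_elements, per_element_times):
    """per_element_times: predicted seconds per grid element on each
    machine (e.g., from the component models with current
    FracAvailCPU). Returns the element count to assign each machine."""
    inv = [1.0 / c for c in per_element_times]
    total_inv = sum(inv)
    return [round(total_elements * w / total_inv) for w in inv]

# A fast machine and two machines predicted to be half as fast:
strips = balance_strips(1000, [0.002, 0.004, 0.004])
```

Rounding can leave the counts off by a few elements in total; a production scheduler would redistribute the remainder.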

Scheduling Strategies
Mean
– data assignments determined using mean (point-valued) application execution estimates
Conservative
– data adjusted so that machines with high-variance application execution estimates receive less work
– goal is to reduce the penalty of being wrong
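One simple way to realize the conservative strategy is to pad each machine's mean estimate by a multiple of its standard deviation before time balancing, so high-variance machines look slower and receive less work. This sketch and its tuning factor k are assumptions for illustration, not the talk's exact adjustment rule.

```python
# Conservative scheduling sketch: inflate per-element time estimates
# by k standard deviations. Feeding the padded rates into a
# time-balancing step shifts work away from unpredictable machines.

def conservative_rates(means, stds, k=1.0):
    """means, stds: per-machine mean and standard deviation of the
    predicted per-element time. Returns padded estimates."""
    return [m + k * s for m, s in zip(means, stds)]

# Two machines with equal means; the second is high-variance and
# will be treated as slower:
padded = conservative_rates([0.002, 0.002], [0.0, 0.001], k=1.0)
```

Larger k is more conservative; the results slide below notes the trade-off, since padding too aggressively leaves fast-but-variable machines underused.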

Preliminary Scheduling Results
The conservative scheduling strategy misses big spikes, but is sometimes too conservative.

Research Directions
Quality of Information (QoIn)
– How can we develop useful mechanisms for obtaining and quantifying performance meta-information?
– How do we combine different QoIn measures?
– How can QoIn measures enhance scheduling?
Contingency Scheduling
– Can we develop schedules which adapt dynamically during execution?

More Research Directions
Performance-enhanced tools
– NetSolve enhanced with NWS and AppLeS scheduling methodology
Performance contracts
– How should performance information be exchanged and brokered in grid systems?
– How can we develop “grid-aware” programs?

Project Information
Thanks to Dr. Darema and DARPA for support and very useful feedback.
Performance Prediction Engineering Home Page: apples/PPE/index.html
PPE team: Jennifer Schopf, Neil Spring, Alan Su, Fran Berman, Rich Wolski

Up Next: Rich Wolski Dynamic Forecasting for Performance Prediction Engineering with the Network Weather Service