Interactive and semiautomatic performance evaluation
W. Funika, B. Baliś, M. Bubak, R. Wismueller


Outline
– Motivation
– Tools Environment Architecture
– Tools Extensions for GRID
– Semiautomatic Analysis
– Prediction model for Grid execution
– Summary

Motivation
– Large number of tools, but mainly off-line and non-Grid-oriented ones
– Highly dynamic character of Grid-bound performance data
– Tool development needs a monitoring system
  – accessible via a well-defined interface
  – with a comprehensive range of possibilities
  – not only to observe but also to control
– Recent initiatives (DAMS – no perf, PARMON – no MP, OMIS)
– Re-usability of existing tools
– Enhancing the functionality to support new programming models
– Interoperability of tools to support each other
– When interactive tools are difficult or impossible to apply, (semi)automatic ones are of help

Component Structure of Environment

X# Task Workflow and Interfaces
[Work-plan chart: three development phases over project months 3-36, with deliverables D2.1-D2.7 and milestones M2.1-M2.4; requirements from WP1, WP3 and WP4; interface to Grid Monitoring Services and the performance data model; 1st prototype + report at month 12, 2nd prototype + report at month 24, final version and final demo + report at month 36, with internal integration, testing and refinement in between; local Grid testbed, later full Grid testbed (WP4).]

Application analysis
– Basic blocks of all applications:
  – dataflow for input and output
  – CPU-intensive cores
  – parallel tasks / threads
  – communication
– Basic structures of the (Cross-)Grid
– Flow charts, diagrams, basic blocks from the applications
– Optional information on the application's design patterns, e.g. SPMD, master/worker, pipeline, divide & conquer

Categories of performance evaluation tools
– Interactive, manual performance analysis
– Off-line tools
  – trace based (combined with visualization)
  – profile based (no time reference)
  – problem: strong intrusion when measurements are fine-grained
– On-line tools
  – measurements can be defined (and restricted) at run-time
  – well suited to cyclic programs: new measurements can be based on previous results
  => automation of the bottleneck search is possible
– Semi-automatic and automatic tools
  – batch-oriented use of the computational environment (e.g. the Grid)
  – basis: a search model that enables refining of measurements

Defining new functionality of the performance tool
– Types of measurements
– Types of presentation
– Levels of measurement granularity
– Measurement scopes: program, procedure, loop, function call, statement
– Code region identification
– Object types to be handled within an application

Definition and design work:
– architecture of the tools, based on their functional description
– hierarchy and naming policy of the objects to be monitored
– the tool/monitor interface, based on expressing measurement requests in terms of the monitoring specification's standard services
– the filtering and grouping policy for the tools
– functions for handling the measurement requests and the modes of their operation
– granularity of measurement representation and visualization modes
– the modes of delivering performance data for particular measurements

Modes of delivering performance data

Interoperability of tools
"Capability to run multiple tools concurrently and apply them to the same application"
Motivation:
– concurrent use of tools for different tasks
– combined use can lead to additional benefits
– enhanced modularity
Problems:
– structural conflicts: due to incompatible monitoring modules
– logical conflicts: e.g. a tool modifies the state of an object while another tool still keeps outdated information about it

Semiautomatic Analysis
Why (semi-)automatic on-line performance evaluation?
– ease of use: guide programmers to performance problems
– Grid: exact performance characteristics of computing resources and network are often unknown to the user
  – the tool should assess actual performance w.r.t. achievable performance
– interactive applications are not well suited for tracing
  – applications run 'all the time'
  – detailed trace files would be too large
  – on-line analysis can focus on specific execution phases
  – detailed information via selective refinement

The APART approach
– object-oriented performance data model
  – available performance data
  – different kinds and sources, e.g. profiles, traces, ...
  – make use of existing monitoring tools
– formal specification of performance properties
  – possible bottlenecks in an application
  – specific to the programming paradigm
  – APART specification language (ASL)
– specification of the automatic analysis process

APART specification language
– the specification of a performance property has three parts (see the sketch below):
  – CONDITION: when does the property hold?
  – CONFIDENCE: how sure are we? (depends on the data source) (0-1)
  – SEVERITY: how important is the property?
– basis for determining the most important performance problems
– a specification can combine different types of performance data
  – data from different hosts => global properties, e.g. load imbalance
– templates for simplified specification of related properties
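
A minimal illustrative sketch of how such a three-part property could be evaluated over profile-style summary data follows. It is written in Python rather than actual ASL syntax, and the names (RegionSummary, CommCostProperty) and the 0.2 threshold are assumptions made here for illustration only.

from dataclasses import dataclass

@dataclass
class RegionSummary:
    """Profile-style summary data for one code region (assumed input format)."""
    exec_time: float   # total time spent in the region [s]
    comm_time: float   # time spent in communication calls [s]
    from_trace: bool   # True if derived from a trace, False if from a profile

@dataclass
class CommCostProperty:
    """'Communication cost too high', expressed as condition/confidence/severity."""
    region: RegionSummary
    threshold: float = 0.2   # hypothetical tuning parameter

    def condition(self) -> bool:
        # CONDITION: does the property hold at all?
        return self.region.comm_time / self.region.exec_time > self.threshold

    def confidence(self) -> float:
        # CONFIDENCE in [0, 1]: trace data is assumed more reliable than a profile.
        return 1.0 if self.region.from_trace else 0.8

    def severity(self) -> float:
        # SEVERITY: fraction of execution time lost to communication,
        # used to rank the most important performance problems.
        return self.region.comm_time / self.region.exec_time

# Rank the detected properties by severity.
regions = [RegionSummary(10.0, 4.0, True), RegionSummary(10.0, 1.0, False)]
found = [p for p in map(CommCostProperty, regions) if p.condition()]
found.sort(key=lambda p: p.severity(), reverse=True)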

Supporting different performance analysis goals
– a performance analysis tool may be used to
  – optimize an application (independent of the execution platform)
  – find out how well it runs on a particular Grid configuration
– both goals can be supported via different definitions of SEVERITY, e.g. for communication cost:
  – relative amount of execution time spent on communication
  – relative amount of available bandwidth used for communication
– this also provides hints as to why there is a performance problem (resources not well used vs. resources exhausted)
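
As a concrete illustration of the two SEVERITY variants above, here is a small Python sketch; the function names and the link_bandwidth parameter are assumptions introduced here, not part of the slides or of ASL.

def severity_tuning(comm_time: float, exec_time: float) -> float:
    """Platform-independent view: fraction of run time lost to communication."""
    return comm_time / exec_time

def severity_capacity(bytes_sent: float, comm_time: float,
                      link_bandwidth: float) -> float:
    """Grid-configuration view: fraction of the available bandwidth actually used.
    A value near 1.0 hints at exhausted resources rather than badly used ones."""
    achieved = bytes_sent / comm_time       # bytes per second actually achieved
    return achieved / link_bandwidth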

Analytical model for predicting performance on GRID
– Extract the relationship between the application and execution features, and the actual execution time.
– Focus on the relevant kernels in the applications included in WP1.
– Assuming the message-passing paradigm (in particular MPI).

Taking features into a model
HW features:
– network speeds
– CPU speeds
– memory bandwidth
Application features:
– matrix and vector sizes
– number of the required communications
– size of these communications
– memory access patterns

Building a model
– Through statistical analysis, a model predicting the influence of several aspects on the execution of the kernels will be extracted.
– A particular model for each aspect will be obtained; a linear combination of them will be used to predict the whole execution time (see the formula below).
– Every particular model will be a function of the above features.
– Aspects to be included in the model:
  – computation time as a function of the above features
  – memory access time as a function of the features
  – communication time as a function of the features
  – synchronization time as a function of the features
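
Written out in my own notation (not taken from the slides), the predicted execution time of a kernel is the sum of the per-aspect models, each a function of the feature vector f that gathers the HW and application features listed earlier:

T_{exec}(f) \approx T_{comp}(f) + T_{mem}(f) + T_{comm}(f) + T_{sync}(f)

where each term on the right is fitted separately, through the statistical analysis, to measurements of the corresponding aspect.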

X# WP2.4 Tools w.r.t. DataGrid WP3

Requirement                    GRM                 PATOP/OMIS
1. Scalability (#u, #r, #e)    No, no, yes
2. Intrusiveness               Low (how much?)     Low (0-10 %)
3. Portability                 no                  yes
4. Extendibility
   – new mon. modules          possible            yes
   – new data types            Yes (ev. def.)      yes
5. Communication               push                query/response
6. Metrics                     Application only    comprehensive
7. Archive handling            no                  Possible (TATOO)

Summary
– New requirements for performance tools in the Grid
– Adaptation of the interactive performance evaluation tool to the GRID
  – new measurements
  – new dialogue window
  – new presentations
  – new objects
– Need for semiautomatic performance analysis
  – performance properties
  – APART specification language
  – search strategy
– Prediction model construction

Performance Measurements with PATOP
Possible types of measurement:
– CPU time
– delay in remote procedure calls (system calls executed on the front-end)
– delay in send and receive calls
– amount of data sent and received
– time in marked areas (code regions)
– number of executions of a specific point in the source code
Scope of measurement:
– system related: whole computing system, individual nodes, individual threads, pairs of nodes (communication partners, for send/receive), set of nodes specified by a performance condition
– program related: whole program, individual functions
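
To make the combination of measurement type and scope concrete, here is a minimal Python sketch of how a measurement request could be represented; the names MeasurementType, SystemScope and Measurement are illustrative assumptions, not PATOP's actual interface.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

class MeasurementType(Enum):
    CPU_TIME = auto()
    RPC_DELAY = auto()            # delay in remote procedure calls
    SEND_RECV_DELAY = auto()      # delay in send and receive calls
    DATA_VOLUME = auto()          # amount of data sent and received
    MARKED_REGION_TIME = auto()   # time in marked code regions
    POINT_EXECUTIONS = auto()     # executions of a specific source-code point

class SystemScope(Enum):
    WHOLE_SYSTEM = auto()
    NODE = auto()
    THREAD = auto()
    NODE_PAIR = auto()            # communication partners (send/receive)
    CONDITIONAL_NODE_SET = auto() # nodes selected by a performance condition

@dataclass
class Measurement:
    """One measurement = a type plus a system-related and a program-related scope."""
    mtype: MeasurementType
    system_scope: SystemScope
    program_scope: Optional[str] = None   # None = whole program, else a function name
    nodes: Tuple[str, ...] = ()           # node identifiers the scope refers to

# Example: data volume between two nodes, restricted to one function.
m = Measurement(MeasurementType.DATA_VOLUME, SystemScope.NODE_PAIR,
                program_scope="exchange_halo", nodes=("n_1", "n_2"))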

PATOP

Performance evaluation tools on top of the OCM

On-line Monitoring Interface Specification
The interface should provide the following properties:
– support for interoperable tools
– efficiency (minimal intrusion, scalability)
– support for on-line monitoring (new objects, control)
– platform independence (HW, OS, programming library)
– usability for any kind of run-time tool (observing/manipulating, interactive/automatic, centralized/distributed)

Object-based approach to monitoring
– the observed system is a hierarchical set of objects:
  1. classes: nodes, processes, threads, messages, and message queues
  2. node/process model suitable for DMPs, NOWs, SMPs, and SMP clusters
– access via abstract identifiers (tokens)
– services observe and manipulate objects:
  1. OMIS core services: platform independent
  2. others: platform-specific (HW, OS, environment) extensions
– tools define their own view of the observed system
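
A minimal sketch of this hierarchical, token-based object model follows (in Python); the class names and tokens such as "n_1", "p_1", "t_1" are illustrative assumptions, not the actual OMIS/OCM interface.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Thread:
    token: str                    # abstract identifier, e.g. "t_1"

@dataclass
class Process:
    token: str                    # e.g. "p_1"
    threads: List[Thread] = field(default_factory=list)

@dataclass
class Node:
    token: str                    # e.g. "n_1"
    processes: List[Process] = field(default_factory=list)

@dataclass
class SystemModel:
    """A tool's own view of the observed system: a hierarchy addressed by tokens."""
    nodes: Dict[str, Node] = field(default_factory=dict)

    def attach(self, node: Node) -> None:
        self.nodes[node.token] = node

    def expand(self, token: str) -> List[str]:
        """Map a node token to the process tokens one level below it."""
        return [p.token for p in self.nodes[token].processes]

# Usage: a tool builds its view and refers to objects only via tokens.
model = SystemModel()
model.attach(Node("n_1", [Process("p_1", [Thread("t_1"), Thread("t_2")])]))
assert model.expand("n_1") == ["p_1"]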

Classification of overheads
– Synchronisation (e.g. barriers and locks): coordination of data accesses, maintaining consistency
– Control of parallelism (e.g. fork/join operations and loop scheduling): controlling and managing the parallelism of a program (user, compiler)
– Additional computation: changes to the sequential code to increase parallelism or data locality, e.g. eliminating data dependences
– Loss of parallelism: imperfect parallelisation – un- or partially parallelised code, replicated code
– Data movement: any data transfer within a process or between processes

Interoperability of PATOP and DETOP
– PATOP provides high-level performance measurement and visualisation
– DETOP provides source-code level debugging
Possible scenarios:
– erroneous behaviour observed via PATOP: suspend the application with DETOP, examine the source code
– measurement of execution phases: start/stop a measurement at a breakpoint
– measurement on dynamic objects: start a measurement at a breakpoint when the object is created