Performance and Power Modeling Adolfy Hoisie Performance and Architecture Lab (PAL) Pacific Northwest National Laboratory X-stack Meeting March 19, 2013.

Presentation transcript:

Performance and Power Modeling Adolfy Hoisie Performance and Architecture Lab (PAL) Pacific Northwest National Laboratory X-stack Meeting March 19, 2013 Berkeley, CA

Outline
The vision
Beyond the Standard Model (BSM)
Modeling Execution Models (MEMS)
Summary

Challenges Exascale Poses on Modeling
Multiple constraints
  – Achieve performance
  – Power constraints
  – Fault tolerance
Adaptivity: vast numbers of “knobs” to deal with
Complexity of the system software stack
  – dynamic behavior
  – models in runtime
  – actionable models
  – guiding runtime optimizations and operation
Complexity of the architecture and associated technologies
  – need to leverage marketplace
  – the exascale system will emerge as a synthesis of technologies
  – leverage commoditization but add specific smarts for exascale
Modeling is called on to capture multiple boundaries of the HW-SW stack
Applications need to cope with and help mitigate the increased complexity
This triggers the need for modeling now, and for widespread exploration of future apps and future technologies

The vision: ubiquitous modeling
Performance & Power & Reliability, together
Bag-of-tools approach
  – not one for all but all for one
  – modeling, simulation, and emulation
Lifecycle coverage
  – software and hardware
  – from design space exploration, to analysis of early implementation, to deployment, and to run-time optimizations
Co-design
  – modeling needs to be applied to negotiate tradeoffs at all the boundaries of the hardware/software stack
Dynamic modeling
  – intelligent and informed decisions within runtime software
Introspective runtime
  – dynamic hardware and software, rapid optimizations
  – the runtime system is model driven, and the model is actionable

The Model as a first-class citizen
[Figure label: Performance/Power/Reliability Model]

Beyond the Standard Model (BSM)
Collaborative project between PNNL (PAL), LLNL, and UC San Diego/SDSC (PMaC)
Adolfy Hoisie (PI), PNNL
Kevin J. Barker (PNNL)
Greg Bronevetsky (LLNL)
Laura Carrington (SDSC)
Marc Casas (LLNL)
Daniel Chavarria (PNNL)
Roberto Gioiosa (PNNL)
Darren J. Kerbyson (PNNL)
Gokcen Kestor (PNNL)
Nathan R. Tallent (PNNL)
Ananta Tiwari (SDSC)

Main areas of emphasis in BSM
Modeling of Performance and Power
  – Establishing the modeling of performance and power in concert as the ultimate goal, beyond the current state of the art in which (except for limited instances) performance alone is the modeling target
Modeling at different scales
  – From definition of metrics, to application models, to detailed architectural descriptions, models capture the performance and power characteristics at the various boundaries of the hardware/software stack with the accuracy and predictive capability needed to make the decision at hand
Dynamic Modeling of Performance, Power and Data Movement
  – At the heart of modeling performance and power together. Aims to go beyond current practice, which is static (off-line) in nature regardless of the methodology employed. We envision models operating across the entire spectrum from static to dynamic, the latter serving as the engine of intelligent runtime systems, among others
Techniques for Model Generation
  – Aims at simplifying static model generation, including through compiler-based approaches, and at developing methodologies for generating models dynamically based on monitoring of system and application behavior at runtime

Power & Performance Modeling
Goal: Automate model generation for power and performance for large-scale HPC applications. Utilize the models to make application-aware runtime energy optimizations
Energy usage = power * time
[Figure labels: Model of performance impact; Model of power impact; Minimal Energy Usage]
Carrington et al., PMaC
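
The relation energy = power * time can be illustrated with a small sketch. The two model functions below are hypothetical placeholders (a runtime split into a frequency-scaled part and a memory-bound part, and a power draw with a static term plus a cubic dynamic term); they are assumptions for illustration, not the PMaC models, and all constants are made up.

```python
def runtime_s(freq_ghz, cpu_work=10.0, mem_time_s=4.0):
    """Hypothetical time model: compute part scales with 1/f, memory part does not."""
    return cpu_work / freq_ghz + mem_time_s


def power_w(freq_ghz, static_w=40.0, dyn_coeff=5.0):
    """Hypothetical power model: static power plus a dynamic term growing as f^3."""
    return static_w + dyn_coeff * freq_ghz ** 3


def energy_j(freq_ghz):
    """Energy usage = power * time."""
    return power_w(freq_ghz) * runtime_s(freq_ghz)


def min_energy_frequency(freqs):
    """Pick the candidate frequency with the lowest modeled energy."""
    return min(freqs, key=energy_j)


FREQS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # assumed available settings
best = min_energy_frequency(FREQS_GHZ)
print(best, round(energy_j(best), 1))
```

Under these assumed models the fastest setting is not the most energy-efficient one, because the memory-bound part of the runtime does not shrink while power keeps growing with frequency.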

Dynamic modeling & modeling at different scales
Goal: predict execution time of complex workloads
Given multiple tasks or application modules that may execute on common resources (e.g. same node, same network, same file system)
Measure each task’s execution independently
Predict execution time when multiple tasks run concurrently on common resources
Bronevetsky et al., LLNL

Represent execution as partial order of operations
Cost of operations determines length of critical path and execution time
If some resources become congested, new critical paths emerge
Execution time determined by dependencies, resource availability
[Figure labels: Control points in code; Operations that utilize resources; Critical Path; New Critical Path (shown in the second build of this slide)]
Bronevetsky et al., LLNL
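
A minimal sketch of the critical-path idea, assuming the partial order is given as a dependency map and each operation has a single scalar cost. The operation names and costs are hypothetical; the slides do not specify a data structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(deps, cost):
    """Return (length, path) of the longest-cost path through a partial order.

    deps maps each operation to the set of operations it must wait for;
    cost maps each operation to its (hypothetical) duration.
    """
    finish, prev = {}, {}
    for op in TopologicalSorter(deps).static_order():
        # The latest-finishing predecessor determines when op can start.
        last = max(deps.get(op, ()), key=finish.get, default=None)
        prev[op] = last
        finish[op] = cost[op] + (finish[last] if last is not None else 0.0)
    node = max(finish, key=finish.get)
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


# Hypothetical four-operation execution: compute and send can overlap.
DEPS = {"start": set(), "compute": {"start"}, "send": {"start"},
        "end": {"compute", "send"}}
COST = {"start": 1.0, "compute": 5.0, "send": 3.0, "end": 1.0}

baseline = critical_path(DEPS, COST)
# Assumed network congestion slows the send, so a new critical path emerges.
congested = critical_path(DEPS, {**COST, "send": 8.0})
print(baseline)
print(congested)
```

In the baseline the path runs through the compute operation; once the send becomes more expensive than the compute, the critical path switches to the send, which is the effect the slide describes.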

Active measurement of critical paths, resource impact
Measure application Compressibility
  – Run an interference workload to utilize a specific resource
  – Observe impact on application execution time
Produce resource vs time curve
[Figure labels: Resources; Utilization; Application; Time]

Active measurement of critical paths, resource impact
Measure application Impact
  – Run small workloads that utilize same resources as application
  – Infer the amount available from workload execution time
[Figure labels: Resources; Application; Measurement Workload]
Bronevetsky et al., LLNL
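
One way the two measurements could be combined is sketched below, under stated assumptions: the compressibility curve is a list of (fraction of resource available, execution time) points, the impact of a co-running task is a single fraction of the resource it consumes, and prediction is a piecewise-linear lookup. The numbers and the combination rule are hypothetical, not taken from the LLNL work.

```python
from bisect import bisect_left


def interpolate(curve, x):
    """Piecewise-linear interpolation over sorted (x, y) points, clamped at the ends."""
    xs = [p[0] for p in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical compressibility curve for task A:
# (fraction of memory bandwidth available, execution time in seconds).
CURVE_A = [(0.25, 40.0), (0.5, 24.0), (0.75, 18.0), (1.0, 16.0)]

# Hypothetical impact of task B: fraction of the bandwidth it consumes.
IMPACT_B = 0.4

# Predicted time of A when co-running with B on the same node.
predicted_a = interpolate(CURVE_A, 1.0 - IMPACT_B)
print(round(predicted_a, 2))
```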

Current Status
Developed compressibility measurements
  – Shared cache storage, bandwidth
  – Network bandwidth and latency
[Figure labels: Lulesh; MCB; Input Size]

Simplifying Model Generation With Tools
Analytical (predictive) models require human input (annotations)
Tool generates model based on static & dynamic analysis
  – modeler refines annotations using diagnostic feedback
Explore model as ‘first-class’ citizen
  – annotations coordinate w/ source code
Explore annotation language (vs. library)
  – analogy: parallelism through language instead of library
  – annotation semantics may eclipse host-language semantics
      formal semantics w.r.t. static & dynamic aspects of app
      e.g.: placement not restricted to executable-statement contexts
  – static analysis minimizes dynamic impact of an annotation instance
      may entirely eliminate runtime effects
Use source code annotations as primary modeling interface

“PALM”: PAL Model generation tool
Annotations: primary input to PAL modeling tools
Compile with PAL compiler
Execute with PAL monitor
  – collect accurate & detailed measurements
Generate model based on dynamic code structure
  – model expressions become model functions
Models are programs
Refine annotations using model diagnostics
[Figure labels: PAL Compiler; PAL Monitor; PAL Generator; annotated source; static analysis; reference & instrumented binaries; profiles; parameters; model (program); prediction & diagnostics; refine as necessary]

Modeling Execution Models (MEMS)
Collaborative project between PNNL (PAL), Indiana University, and LSU
Adolfy Hoisie (PI), PNNL
Matt Anderson (IU)
Kevin J. Barker (PNNL)
Daniel Chavarria (PNNL)
Hartmut Kaiser (LSU)
Sriram Krishnamoorthy (PNNL)
Joseph Manzano (PNNL)
Thomas Sterling (IU)
Abhinav Vishnu (PNNL)
Project coordinated with 2 other projects related to characterizing EMs from Sandia (Clay) and LBL/USC (Shalf/Lucas)

Modeling Execution Models
Goal: model execution models…quantitatively and predictively
What is an execution model?
  – “… a paradigm of computing establishing the principles of computation that govern the interrelationships of the abstract and physical components and their functions comprising the computational process” [Thomas Sterling]
  – Describes the orchestration of computation on hardware and software resources
  – Connects the application and algorithms with the underlying architecture through its semantics
The Need for New Execution Models
  – Extreme scale systems exhibit a high level of complexity
  – Adaptivity is the main keyword
  – The multi-objective optimization problem of achieving maximum performance within stringent power and reliability constraints at Exascale requires new system software stacks

Modeling Execution Models
Examples of execution models
  – Sequential, SIMD, CSP, Global Memory, ParalleX, etc.
However
  – Design & implementation of applications are highly dependent on execution model features
  – Hardware features determine the efficiency of execution model support
  – When a new execution model is introduced …
      Algorithms must be remapped to the new model
      Architecture features should be updated to support the new paradigm
How to characterize and quantify execution models?
  – Simple answer: By their attributes
  – SCaLeM → Hierarchical methodology to characterize, quantify and map the impact of execution models on hardware and applications

Modeling Execution Models: SCaLeM / AntiCiPate
Execution Model Attributes
Execution models reason about …
  S: Coordination between concurrency units
  C: Creation, management and destruction of concurrency units
  M: Availability of address ranges and operations on such ranges
  L: Differentiation between local and remote regions or units
Can characterize execution models
A sufficient set of characteristics
Not linearly independent
Need to be “composed” & “parameterized”
Represent universes of all execution models’ features and primitives

Modeling Execution Models: SCaLeM / AntiCiPate
Execution Model Compositions
  – Compositions of execution model attributes
      Based on the four initial attributes
      May not be defined for a given execution model
Execution Model Parameters
  – Costs of the compositions in a given architecture
  – Might be a vector of values per composition entry
Applicable to different levels of abstraction
  – Core → Node → System
  – Hardware → Runtime → Programming Model
Mapping
  – The process of mapping SCaLeM compositions between two levels of abstraction, i.e. “realizing” the execution model costs
The methodology of defining the Attributes, Compositions, Parameters and Mappings is called AntiCiPate
[Figure labels: ATTRIBUTES; COMPOSITIONS; PARAMETERS; AntiCiPate Modeling Methodology; Shared by all Execution Models; Relevant combination of Attributes; Quantifications of attributes; Solely architectural / system software dependent variables, not application dependent]

Modeling Execution Models: SCaLeM / AntiCiPate
Relevant costs at each abstraction level (i.e. from a full system perspective to a per core one) can be described in terms of AntiCiPate
  – e.g. Access to different Memory Hierarchies & NUMA domains
  – e.g. On-node versus Off-node communications
[Figure labels: SCaLeM Attributes (S, C, L, M); Execution Model Compositions (F_S, F_C, F_L, F_M, F_CL, F_ML, F_SL, F_MSL, F_CSL, …); Full System Level Parameter Space; Node Level Parameter Space; Core Level Parameter Space; P_w = {p_0, p_1, p_2, …}; P_n = {p_0, p_1, p_2, …}; P_c = {p_0, p_1, p_2, …}; Mapping Model; Application Workload Characterization; Extracted from Execution Model Primitives; Extracted from Architecture & System Software; Parameter List; Performance Prediction]
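
The flow from workload characterization and parameter list to a performance prediction might be sketched as follows. This is an illustrative assumption about how the pieces fit together: the additive cost rule, the level names, the composition keys and every number are hypothetical and do not come from the slides.

```python
# Hypothetical parameter list: cost in seconds per operation for each
# composition, at two abstraction levels (architecture / system software side).
PARAMS = {
    "core": {"M,L": 2e-9, "S": 5e-8},
    "node": {"M,L": 8e-8, "S": 1e-6, "C,S": 4e-6},
}

# Hypothetical workload characterization: how many operations of each
# composition the application performs at each level.
WORKLOAD = {
    "core": {"M,L": 4_000_000_000, "S": 1_000_000},
    "node": {"M,L": 50_000_000, "S": 200_000, "C,S": 10_000},
}


def predict_time(workload, params):
    """Sum count * cost over all compositions, per level and in total.

    Raises KeyError if the workload uses a composition that has no
    parameter at that level (a composition may be undefined).
    """
    per_level = {
        level: sum(n * params[level][comp] for comp, n in counts.items())
        for level, counts in workload.items()
    }
    return sum(per_level.values()), per_level


total_s, per_level_s = predict_time(WORKLOAD, PARAMS)
print(round(total_s, 2), {k: round(v, 2) for k, v in per_level_s.items()})
```

Keeping the parameters separate from the workload counts mirrors the statement that parameters depend only on architecture and system software, not on the application.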

Compositions in SCaLeM / AntiCiPate
Comp | Semantic Meaning
F() | Not applicable
F(S) | Synchronization operations in an execution model
F(C) | Concurrency style of the execution model
F(L) | Accessibility of different memory ranges
F(M) | Memory consistency characteristics of memory ranges
F(C,S) | Synchronization operations between concurrency units
F(S,M) | Classical data-centric synchronization
F(S,L) | Data-centric synchronization that enforces ordering
F(C,M) | Concurrency units and their consistency interactions
F(C,L) | Concurrency units access properties
F(M,L) | Alignment between consistency and locality ranges
F(S,C,M) | Data-centric synchronization on different consistency ranges affected by the ordering of concurrency units
F(S,C,L) | Control and termination centric synchronization with respect to locality ranges
F(S,M,L) | No application found
F(C,L,M) | No application found
F(S,C,L,M) | No application found
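
The sixteen rows of the table correspond to all subsets of the four attributes. A short sketch that enumerates them is given below; the attribute order inside each generated name follows the tuple in the code and may differ from the order used in the table (for example F(S,C) here versus F(C,S) there).

```python
from itertools import combinations

ATTRIBUTES = ("S", "C", "L", "M")


def compositions(attrs=ATTRIBUTES):
    """All subsets of the attributes, from the empty one to the full set."""
    return [combo
            for size in range(len(attrs) + 1)
            for combo in combinations(attrs, size)]


names = ["F(" + ",".join(combo) + ")" for combo in compositions()]
print(len(names))
print(names)
```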

Performance Model (CSP)
GTC Model
Modeled vs. measured performance
Maximum error < 5%
Composition of memory and locality (the performance of local stores and loads) dominates the execution runtime
[Figure label: TLB Miss Rate]
NekBone Model
Highly accurate model
Intra-node contention resulting from congestion in the memory system

Modeling Execution Models: Sensitivity Analysis
Fundamental attributes of EMs, and representative modeling parameters
Sensitivity Analysis of GTC based on ranges for EM attributes
Model-based quantitative analysis will be used for the co-design of Exascale EMs, architectures and applications
[Figure labels: Core Count; Relative Performance; 20% Improvement; 40% Improvement; 60% Improvement; 80% Improvement; 100% Improvement; EM Memory and Locality Attributes; EM Synchronization, Concurrency, and Locality Attributes]
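
A sensitivity sweep of this kind can be sketched with a toy additive model. Two assumptions are made here that the slide does not state: an improvement of p% in an attribute is taken to mean that its cost is divided by (1 + p/100), and relative performance is the baseline time divided by the new time. The cost breakdown is hypothetical and is not the GTC model.

```python
def relative_performance(costs, attribute, improvement_pct):
    """Baseline time divided by the time after improving one attribute.

    Assumes an improvement of p% divides that attribute's cost by (1 + p/100).
    """
    baseline = sum(costs.values())
    improved = dict(costs)
    improved[attribute] = costs[attribute] / (1.0 + improvement_pct / 100.0)
    return baseline / sum(improved.values())


# Hypothetical time breakdown (seconds) by attribute group.
COSTS = {"memory_locality": 6.0, "synchronization": 1.0, "other": 3.0}

sweep = {pct: relative_performance(COSTS, "memory_locality", pct)
         for pct in (20, 40, 60, 80, 100)}
for pct, rel in sweep.items():
    print(f"{pct}% improvement -> {rel:.3f}x")
```

Repeating the sweep for each attribute group shows which one the predicted runtime is most sensitive to under the assumed breakdown.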

Summary
We are making significant inroads towards the vision of ubiquitous modeling, including dynamic modeling, in related projects such as BSM & MEMS
The X-stack is a rich ecosystem, with significant opportunities, needs, and requirements for modeling
Coordinated, synergistic efforts at project level are key for integration (e.g., modeling in X-stack projects, modeling the execution models featured in X-stack for the workload of the co-design centers)
Work funded by DOE/ASCR, Sonia Sachs PM