MIT Lincoln Laboratory 999999-1 XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Damian Dechev 1,2, Amruth.

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

WHAT IS AN OPERATING SYSTEM? An interface between users and hardware - an environment "architecture ” Allows convenient usage; hides the tedious stuff.
1 Coven a Framework for High Performance Problem Solving Environments Nathan A. DeBardeleben Walter B. Ligon III Sourabh Pandit Dan C. Stanzione Jr. Parallel.
1 Lawrence Livermore National Laboratory By Chunhua (Leo) Liao, Stephen Guzik, Dan Quinlan A node-level programming model framework for exascale computing*
SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston.
Introduction CS 524 – High-Performance Computing.
1: Operating Systems Overview
OPERATING SYSTEM OVERVIEW
Scheduling with Optimized Communication for Time-Triggered Embedded Systems Slide 1 Scheduling with Optimized Communication for Time-Triggered Embedded.
Establishing the overall structure of a software system
The new The new MONARC Simulation Framework Iosif Legrand  California Institute of Technology.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
What is Concurrent Programming? Maram Bani Younes.
Introduction To System Analysis and design
Development in hardware – Why? Option: array of custom processing nodes Step 1: analyze the application and extract the component tasks Step 2: design.
Michelle Mills Strout OpenAnalysis: Representation- Independent Program Analysis CCA Meeting January 17, 2008.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Architectural Design portions ©Ian Sommerville 1995 Establishing the overall structure of a software system.
CCA Common Component Architecture Manoj Krishnan Pacific Northwest National Laboratory MCMD Programming and Implementation Issues.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 10Slide 1 Architectural Design l Establishing the overall structure of a software system.
Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.
A performance evaluation approach openModeller: A Framework for species distribution Modelling.
Model-Driven Analysis Frameworks for Embedded Systems George Edwards USC Center for Systems and Software Engineering
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
Architectural Design lecture 10. Topics covered Architectural design decisions System organisation Control styles Reference architectures.
IPDPS 2005, slide 1 Automatic Construction and Evaluation of “Performance Skeletons” ( Predicting Performance in an Unpredictable World ) Sukhdeep Sodhi.
Research Topics CSC Parallel Computing & Compilers CSC 3990.
DISTRIBUTED COMPUTING. Computing? Computing is usually defined as the activity of using and improving computer technology, computer hardware and software.
Investigating Adaptive Compilation using the MIPSpro Compiler Keith D. Cooper Todd Waterman Department of Computer Science Rice University Houston, TX.
MILAN: Technical Overview October 2, 2002 Akos Ledeczi MILAN Workshop Institute for Software Integrated.
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
1: Operating Systems Overview 1 Jerry Breecher Fall, 2004 CLARK UNIVERSITY CS215 OPERATING SYSTEMS OVERVIEW.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Manifold Execution Model and System.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Enabling Self-management of Component-based High-performance Scientific Applications Hua (Maria) Liu and Manish Parashar The Applied Software Systems Laboratory.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
Full and Para Virtualization
OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.
Code Motion for MPI Performance Optimization The most common optimization in MPI applications is to post MPI communication earlier so that the communication.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Concepts and Realization of a Diagram Editor Generator Based on Hypergraph Transformation Author: Mark Minas Presenter: Song Gu.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
1 Asstt. Prof Navjot Kaur Computer Dept PRESENTED BY.
From Use Cases to Implementation 1. Structural and Behavioral Aspects of Collaborations  Two aspects of Collaborations Structural – specifies the static.
Is MPI still part of the solution ? George Bosilca Innovative Computing Laboratory Electrical Engineering and Computer Science Department University of.
Slide 1 Chapter 8 Architectural Design. Slide 2 Topics covered l System structuring l Control models l Modular decomposition l Domain-specific architectures.
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
ECE 587 Hardware/Software Co- Design Lecture 23 LLVM and xPilot Professor Jia Wang Department of Electrical and Computer Engineering Illinois Institute.
Resource Optimization for Publisher/Subscriber-based Avionics Systems Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee.
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
From Use Cases to Implementation 1. Mapping Requirements Directly to Design and Code  For many, if not most, of our requirements it is relatively easy.
Online Performance Analysis and Visualization of Large-Scale Parallel Applications Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance Research.
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
TensorFlow– A system for large-scale machine learning
Advanced Computer Systems
OPERATING SYSTEMS CS 3502 Fall 2017
SOFTWARE DESIGN AND ARCHITECTURE
课程名 编译原理 Compiling Techniques
Structural Simulation Toolkit / Gem5 Integration
Model-Driven Analysis Frameworks for Embedded Systems
What is Concurrent Programming?
Presented By: Darlene Banta
Presentation transcript:

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Damian Dechev 1,2, Amruth Dakshinamurthy 1 1 Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL Scalable Computing R&D Department, Sandia National Laboratories, Livermore, CA

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Motivation Challenges Key Technology Components Technicalities Methodology Status of the Framework Experimental Setup Conclusions Future Work References

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Motivation –Computing on a massive scale Large Hadron Collider, Weather Forecasting, Research in Biology, Energy Particle Accelerator in LHC generates more than 2 GB of data every ten seconds –Applications and algorithms need to keep up the pace –Predicting the most effective design of a multi-core exascale architecture –Optimizing and fine-tuning the software application to efficiently execute in such a highly concurrent environment.

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Need of the Hour? –Hardware/Software co-design –Methodology depends on a bi-directional optimization of design parameters –Software requirements drive hardware design decisions and Hardware design constraints motivate changes in the software design –Need an environment to enable co-design practices to be applied to the design of future extreme-scale systems –Prototyping ideas for future programming models and software infrastructure for these machines

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Challenges –Difficult to develop a software model based on a complex scientific HPC application –Such applications often include a large number of HPC computational methods and libraries, sophisticated communication and synchronization patterns, and architecture-specific optimizations –Difficult to analyze and predict the runtime statistics for domain- specific applications using heuristic algorithms

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Key Technology Components –Structural Simulation Toolkit (SST) Simulator –ROSE Compiler –Software Skeleton Model of the Large-Scale Parallel Application

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design SST Simulator –Structural Simulation Toolkit (SST) macro discrete event simulator is an open-source simulation framework offered by Sandia National Laboratories for the development of extreme-scale hardware and software –Enables the coarse-grained study of distributed memory applications in highly concurrent systems and is implemented in C++ – Provides a parallel simulation environment based on MPI and, several network and processor models. –Enables to derive detailed information about the effect of various design parameters on execution time memory bandwidth, latency, number of nodes, processors per node, network topology and other design parameters

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design SST Simulator Application ThreadsDiscrete Event Simulator (Lightweight Threads) Process Callbacks Events Callbacks Simulator Interface Kernels Servers Request Trace Skeleton CPU Network

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design SST Simulator –Makes use of extremely lightweight application threads, allowing it to maintain simultaneous task counts ranging into the millions –Lightweight application threads perform MPI operations –Task threads create communication and compute kernels, then interact with the simulator’s back-end by pushing kernels down to the interface layer –The interface layer generates simulation events and handles the scheduling of resulting events to the simulator back-end –The interface layer implements servers to manage the interaction with the network model in the context of the application –Supports two execution modes Skeleton application execution Trace-driven simulation mode –The processor layer receives callbacks when the kernels are completed

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design ROSE Compiler –ROSE compiler is an open source-to-source compiler infrastructure for building a wide variety of customized analysis, optimization and transformation tools –Enables rapid development of source-to-source translators from C, C++, UPC, and Fortran codes to facilitate deep analysis of complex application codes and code transformations

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design ROSE Compiler –The front-end section contains tools for the construction of the AST –The mid-end section contains tools for the processing and transformation of the AST Interfaces to operate on AST nodes Traversal mechanism – Preorder and Postorder Attributes mechanism AST rewrite mechanism Loop analysis and optimization AST Merge mechanism –The back-end section contains tools for generating source code from the AST

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Software Skeleton Model –Skeleton Model is an abstract form of the full scale application similar to the native MPI implementation with the exception of the syntax of the MPI calls –Captures control flow and communication patterns of an application –Constructed by removing parts of computations that do not affect the application's state and replacing them with system calls such as compute(…), which reduce simulation time dramatically –Can be executed on the SST/macro simulator for extreme-scale studies. –Much less expensive than running the full application –Allows the simulation of applications and architectures at levels of parallelism that are not obtainable by the most powerful supercomputers today

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Technicalities –Need to match MPI expression patterns on the AST provided by the ROSE infrastructure to derive a backward slice A backward slice is a collection of all program statements and expressions that affect a given point in the user code –Our slicing algorithm operates on the Program Dependence Graph A PDG represents both data dependence edges and control dependence edges –Matching is done in the ROSE translator by making String comparisons of the input code statements within AST nodes to that of desired SST/macro statements and finally rewriting portions of the AST nodes AST, a tree representation of the source code flow graph where each node denotes a construct occurring in the source code

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Methodology

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Methodology –C++ full scale program utilizing standard MPI calls is fed to ROSE front end. Jacobi Relaxation algorithm implemented in C++ using MPI –The front end parses the input source code and generates EDG’s AST, the Edison Design Group’s compiler intermediate representation –This AST is traversed internally to generate a new AST called Sage III Intermediate representation

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Methodology –The new SAGE III AST is the input to our translator –The translator applies the program transformation and AST Rewrite/Traversal modules provided by ROSE –Using ROSE compiler extension modules: Slicing module and Transformation module –Slicing module performs program slicing by abstracting the application code and modeling only the relevant software components (MPI calls) –Operates on the Program Dependence Graphs to remove computation parts from the program

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Methodology –Transformation module specifies the desired output based on the sliced-out Abstract Syntax Tree (AST) –Operates on the sliced-out AST to perform transformation of native MPI calls to the ones compatible for SST simulator –The backend C++ source generator uses this rewritten AST and unparses it to generate C++ source code –The skeleton program will only provide information about MPI calls and their associated argument lists

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Status of the Framework –SST MPI declarations are inserted into native mpi.h header file in advance to generate SST MPI function identifiers –Avoids inserting SST MPI function symbols into symbol table during transformation phase –Most of the SST MPI calls return a type timestamp. So normal MPI functions calls are transformed into expressions –Argument reordering has been handled since method signatures of standard MPI call vary from those of SST MPI call –Currently, the framework handles MPI routines from Environment Management, Point-to-Point Communication and Collective Communication groups(there are few exceptions) –Slicing has been done using Program Dependence Graph

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Experimental Setup –We are able to derive software skeleton models from C++ implementations of Jacobi Relaxation and Poisson’s equation solutions –The derived skeleton model has to inherit from sstmac::mpiapi to gain access to the SST MPI interface and should implement the run method –SST simulator uses a simple C++ driver that sets up the simulation environment and runs the application –The driver constructs network topology and interconnect model. –The driver constructs an instance of the skeleton application and calls the run method to start the simulation –Analysis of derived Skeleton Models on SST simulator is planned

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Conclusions –Addresses many issues concerning hardware/software co-design by allowing a feedback between hardware design and software development –Use of cycle accurate simulation tools provides the data and insights to estimate the performance impact on an HPC applications when it is subjected to certain architectural constraints –Allows for the evaluation and optimization of C++ applications using the MPI programming model –Allows us to employ new programming models and algorithms in the design of large-scale parallel applications

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Future Work –Our plan is to extract skeleton models for applications provided by Mantevo project –Mantevo applications are small self-contained proxies for real applications that embody essential performance characteristics –We plan to extend our work to include other programming models, in addition to MPI. –We plan to enhance our present framework to extract skeleton models for Fortran applications.

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design References –ROSE Compiler, –Sandia's Computational Software Site, –Message Passing Interface Forum, –SST: The Structural Simulation Toolkit –C. L. Janssen, H. Adalsteinsson, S. Cranford, J. P. Kenny, A. Pinar, D. A. Evensky, J. Mayo. A Simulator for Large-Scale Parallel Computer Architectures. International Journal of Distributed Systems and Technologies –Tae-Hyuk Ahn, Damian Dechev, Heshan Lin, Helgi Adalsteinsson and Curtis Janssen. Evaluating Performance Optimizations of Large-Scale Genomic Sequence Search Applications Using SST/macro. SIMULTECH 2011 –Exascale Computing Software Study: Software Challenges in Extreme Scale Systems

MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Questions?