Leveraging Hierarchy: Is this our Undiscovered Country? John T. Daly

Undiscovered Country: Cost vs. Risk?

[Figure: Log(Performance) vs. Time, showing successive technology generations of roughly 15 years each – Vector, Parallel (IN), Parallel (OUT), and a projected Exascale generation – with Data Movement, Concurrency and Latency Hiding as the recurring challenges.]

Advanced Computing Systems (ACS)
– HPC capability doubles every 14 months, but data doubles every 9 months
– Innovative solutions required to bridge the gap
– Partner with industry, academia and national labs to develop technology enablers for next-generation computing
– Generate a steady stream of capability; no "end goal" for scaling
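The two growth rates above imply the data-to-capability gap widens on its own schedule. A back-of-envelope calculation (a sketch assuming clean exponential doubling at the slide's quoted rates) puts the gap's own doubling time at about 25 months:

```python
# Capability doubles every 14 months, data every 9 months (the slide's figures).
# The ratio data/capability then grows as 2^(t/9 - t/14), so the gap itself
# doubles when t/9 - t/14 = 1, i.e. t = 1 / (1/9 - 1/14) months.
gap_doubling_months = 1 / (1 / 9 - 1 / 14)
print(round(gap_doubling_months, 1))  # 25.2
```

So even modest-sounding rate differences compound into a gap that doubles roughly every two years, which is the motivation for "bridging" solutions rather than waiting out the hardware.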

ACS: Bridge to the research community

[Diagram: participatory research connecting Mission Problems to Technical Challenges, and Technical Solutions to Mission Capability, via a compute environment mirroring the agency's compute mission and spanning universities, national labs, government and industry.]

ACS: technical thrusts + end-to-end
– Our HPC stakeholders:
  – The system integrator optimizes power, performance and reliability for a set number of dollars
  – The system user optimizes usability, dependability and time-to-solution for a set number of deliverables
– Point solutions in six technical thrusts: power efficiency, chip I/O, interconnects, productivity, file I/O and resilience
– Innovative end-to-end solutions:
  – AMOEBA: chip-level data movement and packaging
  – MYRIAD(?): system-level modeling and simulation

Extreme is not necessarily "balanced"
– Traditional HPC is an important part of ACS, but not the only part
– A dynamic design space drives the need for simulation and an abstract machine model
– Goal: scientific understanding in HPC

[Diagram: the six thrusts (Chip I/O, Interconnect, Power Efficiency, Resilience, Productivity, File I/O & Storage) partitioned into "Traditional HPC and ACS too" and "Also ACS, but maybe not traditional HPC".]

Future "convergence"?
– Today:
  – Predictive science starts with an initial model and runs a numerical experiment to generate lots of data
  – Data analytics starts with lots of data and extracts features or information that characterize the data
– Tomorrow:
  – Predictive science uses in situ data analytics to reduce the data storage and post-processing requirements
  – Data analytics uses in situ predictive science to ask the question "what ought this data to look like?"

Energy is the next shared resource
– Off-node communication is over budget
– Off-chip communication is over budget
– Source: DOE Architectures and Technology for Extreme Scale Computing, San Diego, CA

Data is the challenge of scale
– Energy, performance and data-integrity tapers are a function of the distance between the data and the processor
– Data locality is key to computing at scale for optimizing right answers per Joule per second:
  – Spatial locality allows me to grab more data in a single memory transaction
  – Temporal locality allows me to use the same data multiple times before I have to move it
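The two kinds of locality can be made concrete with a toy NumPy traversal (an illustrative sketch; the array, block size and reductions are invented for the example, not taken from the slides):

```python
import numpy as np

n = 256
a = np.arange(n * n, dtype=np.float64).reshape(n, n)  # C (row-major) layout

# Spatial locality: sweep in storage order, so each cache line fetched
# from memory is fully consumed before the next one is needed.
row_major_sum = sum(float(a[i, :].sum()) for i in range(n))

# Temporal locality: perform both reductions on a block while it is still
# resident, instead of sweeping the whole array once per reduction.
b = 64
total, total_sq = 0.0, 0.0
for i0 in range(0, n, b):
    block = a[i0:i0 + b]                       # stays hot across both uses
    total += float(block.sum())                # first use of the block...
    total_sq += float((block * block).sum())   # ...second use before eviction

print(row_major_sum == total)  # True
```

The blocked loop moves each element from memory once but uses it twice, which is exactly the "use the same data multiple times before I have to move it" payoff.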

A role for NV in the hierarchy

Node architecture = "shops" of data
– Byte/word-addressable memory up and down the stack, block synchronous between stacks
– Control is the data aggregator (e.g., gather/scatter)

[Diagram: processor/control units paired with memory stacks.]
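The gather/scatter role of the control unit can be sketched in NumPy terms (illustrative only; the array contents and index set are invented for the example):

```python
import numpy as np

memory = np.arange(16, dtype=np.int64)  # one stack's word-addressable contents
indices = np.array([2, 5, 11, 3])       # non-contiguous addresses the processor wants

gathered = memory[indices]              # gather: pack scattered words into one dense vector
memory[indices] = gathered * 10         # scatter: write results back to the same addresses

print(gathered)          # [ 2  5 11  3]
print(memory[indices])   # [ 20  50 110  30]
```

An aggregating controller doing this near the memory stacks means the dense vector, not the scattered requests, is what crosses the expensive link to the processor.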

Exploiting Spatial Locality
– Fractal memory:
  – Create a virtual mapping of data lines to space-filling curves (e.g., Jin and Mellor-Crummey, "Using Space-filling Curves for Computation Reordering")
  – Use memory control logic to resolve mappings
  – Dynamic mapping by the user via a PM interface
– Move work to data:
  – Adaptive mesh refinement is a refine operation spawned at another memory component
  – Map memory references back to the processor
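One concrete realization of mapping data lines to a space-filling curve is a Morton (Z-order) index, which interleaves coordinate bits so points that are close in 2-D tend to be close in the 1-D address stream. A minimal sketch (not the implementation from the cited paper, and `morton_index` is a name invented here):

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of (x, y) into a single Z-order address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z

# The four cells of each 2x2 block land on consecutive 1-D addresses,
# which is what gives the linearized storage its spatial locality.
print(morton_index(0, 0))  # 0
print(morton_index(1, 0))  # 1
print(morton_index(0, 1))  # 2
print(morton_index(1, 1))  # 3
print(morton_index(2, 3))  # 14
```

In the slide's scheme this resolution step would live in the memory control logic rather than in application code; the application just sees the virtual mapping.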

Exploiting Temporal Locality
– Global one-sided memory model:
  – Different processors updating the same values in a PDE solver creates race conditions
  – You're going to get the wrong answer anyway, so checkpoint asynchronously and use QMU
  – Inherently resilient algorithms that avoid global synchronization
– Reconfigurable hierarchy: "cache" vs. "scratch pad":
  – "Cache" is seamless and easy to use, but sometimes I'd like to be able to bypass it
  – "Scratch pad" avoids duplicating memory and can be higher performing, but it is harder to use
  – Is SSD going to work like "cache" or "scratch pad"?
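The asynchronous-checkpoint idea can be sketched on a single node: snapshot the state cheaply on the compute thread, then let a background thread do the slow I/O while computation proceeds. This is only an illustration of the pattern (the `checkpoint_async` helper, JSON format and file layout are invented here; it is not a QMU implementation):

```python
import copy
import json
import os
import tempfile
import threading


def checkpoint_async(state, path):
    """Snapshot the state synchronously, persist it off the critical path."""
    snapshot = copy.deepcopy(state)   # cheap copy taken on the compute thread

    def write():
        with open(path, "w") as f:
            json.dump(snapshot, f)    # slow I/O happens in the background

    t = threading.Thread(target=write)
    t.start()
    return t


ckpt_dir = tempfile.mkdtemp()
state = {"iteration": 0, "values": [0.0] * 8}
writers = []
for step in range(1, 4):
    state["iteration"] = step
    state["values"] = [v + 1.0 for v in state["values"]]
    path = os.path.join(ckpt_dir, f"ckpt_{step}.json")
    writers.append(checkpoint_async(state, path))  # compute continues meanwhile
for w in writers:
    w.join()

with open(os.path.join(ckpt_dir, "ckpt_3.json")) as f:
    print(json.load(f)["iteration"])  # 3
```

Because each snapshot is taken before the writer thread starts, a checkpoint is internally consistent even though later iterations race ahead of the write; that consistency is what makes the "checkpoint asynchronously" strategy safe to restart from.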

Motivating example: Exa-sorting
– Many linear solution methods are already robust against errors and data race conditions (e.g., multigrid methods)
– What about an application like sorting?
  – A gradient-descent approach is robust under errors* and can be parallelized asynchronously
  – Suggests the possibility of research into asynchronous parallel minimization approaches for other classes of problems
– How about non-linear solvers?
  – Is there an analogy in minimization of the objective function via solution of the adjoint problem?
  – What about chaotic systems?

* Joseph Sloan, David Kesler, Rakesh Kumar, and Ali Rahimi. "A Numerical Optimization-based Methodology for Application Robustification: Transforming Applications for Error Tolerance". DSN 2010, Chicago, July 2010.
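A toy sketch of the idea (in the spirit of the cited robustification work, not the paper's actual method): treat sorting as minimizing the count of adjacent inversions and repeatedly apply local swap moves. Because every move is local and independent, occasional corrupted comparisons only add steps; later correct moves undo the damage, so the descent still reaches the sorted state:

```python
import random


def inversions(a):
    """Objective function: number of out-of-order adjacent pairs."""
    return sum(1 for i in range(len(a) - 1) if a[i] > a[i + 1])


def noisy_descent_sort(a, error_rate=0.01, seed=0):
    """Sort by local moves that (usually) decrease the objective.

    A fraction of comparisons gives the wrong answer, modelling transient
    faults; convergence survives because each faulty swap is eventually
    repaired by a correct one, and moves need no global coordination.
    """
    rng = random.Random(seed)
    a = list(a)
    while inversions(a) > 0:
        i = rng.randrange(len(a) - 1)
        out_of_order = a[i] > a[i + 1]
        if rng.random() < error_rate:       # inject a faulty comparison
            out_of_order = not out_of_order
        if out_of_order:
            a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(noisy_descent_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```

Each swap touches only two neighbors, so in principle many such moves could run asynchronously in parallel, which is the property the slide proposes generalizing to other problem classes.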

From the user/developer perspective
– A domain-specific language to serve as a portable wrapper for the domain user and SME
– Support for a globally addressable memory space
– Easy one-sided and two-sided, synchronous and asynchronous access to remote data
– An intuitive mechanism for lightweight thread creation and remote task invocation
– Application control over dynamically reconfigurable memory (hardware cache, software cache and software scratch) at each level of the memory hierarchy (chip, node and storage)
– Tools for monitoring memory and energy utilization, so I know when I'm swapping to DIMM!

Conclusions
– Exascale arrives at the end of the technology generation bridging concurrency to data: risk or opportunity?
– Traditional algorithms + architectures are too expensive in power, performance and reliability if data leaves cache
– Rethinking computation may yield large ROI:
  – models of computation
  – "balanced architecture"
  – predictive science vs. data analytics
– Required to facilitate new approaches:
  – programming models and tools
  – simulation and modeling framework
  – vendor partnerships and technology investment