Priority Research Direction: Algorithms


Priority Research Direction
Key challenges
– Scalability
– Fault tolerance/resilience
– Conforming to architectural requirements
– New areas/uses of algorithms
Summary of research direction
– General evaluation of current algorithms
– Evaluation of the use of algorithms in applications
– Application of "standard" approaches
– Rethinking approaches (high risk/high payoff)
Potential impact on software component
– See Frameworks and Libraries
Potential impact on usability, capability, and breadth of community
– How will this impact the range of applications that may benefit from exascale systems? Some applications will require breakthroughs in algorithms; others will be far less efficient or reliable without new algorithms.
– What is the timescale in which that impact may be felt? Algorithms must be realized efficiently in libraries, frameworks, and application codes; this depends critically on the availability of tools to develop, test, and tune codes.

Key Challenges
Scalability
– Concurrency: finding enough independent tasks
– Latency hiding (communication/computation overlap): creating "split" operations
Fault tolerance/resilience
– Algorithm role in detecting faults
– Algorithm role in providing alternative strategies for repairing faults (particularly transient faults)
Conforming to architectural requirements
– Power
– Heterogeneity of processing elements
– Memory limitations, particularly smaller memory per core
New areas/uses of algorithms
– E.g., uncertainty quantification (UQ)

Key Challenges: Scalability
Concurrency
– Exascale systems are expected to have 10^8 to 10^9 threads
– A mesh with one point per thread has a low computation/communication ratio and is typically inefficient
Latency hiding
– Even current systems have remote-access hardware latencies of thousands of cycles
– Algorithms need to permit computation/communication overlap of at least 10^4 cycles (see the C/MPI sketch after this slide)
– Many current algorithms have synchronization points (such as dot products/allreduce) that limit opportunities for latency hiding; this includes Krylov methods for solving sparse linear systems
Load balancing
– Static load balancing rarely provides an exact load balance; experience with current terascale and near-petascale systems suggests that this is already, or will become, a major scalability problem for many algorithms
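To make the latency-hiding point concrete, here is a minimal C/MPI-3 sketch of a "split" reduction: the allreduce that serializes many Krylov methods is started early and completed only when its result is needed, leaving a window for independent work. The function names and sample sizes are illustrative, not from the slides.

```c
/* Sketch: hiding allreduce latency with a split (nonblocking) reduction,
 * assuming an MPI-3 implementation. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

static double local_dot(const double *x, const double *y, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* Start the global reduction, then overlap independent work with it. */
static void overlapped_dot(const double *x, const double *y, int n,
                           double *global, MPI_Comm comm) {
    double local = local_dot(x, y, n);
    MPI_Request req;

    MPI_Iallreduce(&local, global, 1, MPI_DOUBLE, MPI_SUM, comm, &req);

    /* ... independent computation goes here, e.g. applying a
     * preconditioner to data that does not depend on the result ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* *global is valid only after this */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    double x[4] = {1, 2, 3, 4}, g;
    overlapped_dot(x, x, 4, &g, MPI_COMM_WORLD);
    printf("global dot = %g\n", g);
    MPI_Finalize();
    return 0;
}
```

MPI_Iallreduce is one existing realization of such a split operation; the algorithmic work is restructuring the method so that useful, independent computation actually exists between the start and the wait (as in pipelined Krylov variants).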

Key Challenges: Fault Tolerance
Detecting faults
– Experience shows applications may not detect faults; see "Assessing Fault Sensitivity in MPI Applications"
– Need to evaluate the role of algorithms in detecting faults (tradeoff with hardware detection); detecting faults in hardware requires additional power, memory, etc.
Repairing faults
– Regardless of who detects a fault, it must be repaired
– The general solution (e.g., checkpoint/restart) is already demanding on high-end platforms (e.g., it requires significant I/O bandwidth)
– Need to evaluate the role of algorithms in repairing faults, particularly transient faults such as memory upsets (see the checksum sketch after this slide)
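As one example of an algorithmic role in detecting and repairing transient faults, here is a minimal checksum sketch in the spirit of algorithm-based fault tolerance (Huang/Abraham-style). It is an assumption-laden illustration, not a method from the slides: it handles a single corrupted element, uses an arbitrary tolerance, and ignores subtleties of floating-point roundoff that a real scheme must address.

```c
/* Sketch: locating and repairing one corrupted vector element using two
 * checksums (a plain sum and an index-weighted sum). Illustrative only. */
#include <math.h>
#include <stdio.h>

typedef struct { double sum, wsum; } Checksum;

static Checksum checksum(const double *x, int n) {
    Checksum c = {0.0, 0.0};
    for (int i = 0; i < n; i++) { c.sum += x[i]; c.wsum += (i + 1) * x[i]; }
    return c;
}

/* Returns 1 and repairs x in place if exactly one element was corrupted,
 * 0 if no fault is detected, -1 if the checksums are inconsistent. */
static int detect_and_repair(double *x, int n, Checksum saved, double tol) {
    Checksum now = checksum(x, n);
    double ds = now.sum  - saved.sum;    /* = delta           */
    double dw = now.wsum - saved.wsum;   /* = (j + 1) * delta */
    if (fabs(ds) <= tol) return 0;       /* nothing detected  */
    int j = (int)llround(dw / ds) - 1;   /* recover the index */
    if (j < 0 || j >= n) return -1;      /* more than one fault? */
    x[j] -= ds;                          /* undo the corruption  */
    return 1;
}

int main(void) {
    double x[5] = {1, 2, 3, 4, 5};
    Checksum c = checksum(x, 5);
    x[2] += 0.5;                         /* simulate a memory upset */
    int r = detect_and_repair(x, 5, c, 1e-12);
    printf("repaired: %d, x[2] = %g\n", r, x[2]);
    return 0;
}
```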

Key Challenges: Architectural Constraints
Power
– Power is a major constraint; to date, few algorithms have been designed to minimize power, or even evaluated with respect to it
– Power may combine with other constraints, e.g., data motion consumes energy
Heterogeneity
– Many proposals for exascale systems (and power-efficient petascale systems) exploit heterogeneous processors
– Current experience with GPGPU systems, while promising for some algorithms, has not shown benefits for others
– Heterogeneous systems also require different strategies for the use of memory and functional units
Memory ratios (speeds and sizes)
– Exascale systems are likely to have orders of magnitude less memory per core than current systems (though still large amounts of memory in total)
– Power constraints may reduce the amount of fast memory available, adding to the need for latency hiding
– Algorithms need to use memory more efficiently: more accuracy per byte, fewer data moves per result (see the mixed-precision sketch after this slide)
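As a small illustration of "more accuracy per byte," the sketch below stores vectors in 32-bit floats while accumulating in a 64-bit double, halving memory traffic per operand. Whether the reduced operand precision is acceptable is algorithm-dependent; this is an illustrative example, not a recommendation from the slides.

```c
/* Sketch: halving data motion by storing operands in float while
 * accumulating in double. Illustrative only. */
#include <stdio.h>

static double dot_mixed(const float *x, const float *y, int n) {
    double s = 0.0;                 /* wide accumulator limits roundoff */
    for (int i = 0; i < n; i++)
        s += (double)x[i] * (double)y[i];
    return s;
}

int main(void) {
    enum { N = 1000000 };
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f / (i + 1); y[i] = 1.0f; }
    /* Each operand moves 4 bytes instead of 8: half the memory traffic. */
    printf("%.10f\n", dot_mixed(x, y, N));
    return 0;
}
```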

Key Challenges: New Application Areas
– Each generation of supercomputer has seemed to reduce the number of applications that can make use of HPC
– Exascale systems are likely to differ from simple extrapolations of petascale systems
– Some application areas may become suitable again; others (because of the extreme scale and degree of concurrency) may become possible for the first time
– These possibilities need to be evaluated with respect to candidate architectures and performance models

Summary of Research Directions
(Discuss the role of terascale/petascale systems in developing and testing algorithmic advances)
Concern: much of this depends on discovering new algorithms
– Discovery cannot be scheduled: we can devote effort to searching for such algorithms, but we cannot predict the development of new mathematics
– One possibility for a roadmap is to include decision points in case the hoped-for breakthroughs do not occur
– "Genius doesn't work on an assembly line basis. You can't simply say, 'Today I will be brilliant.'" -- Kirk, The Ultimate Computer, Ep 53/4729.4

Summary of Research Directions
General evaluation
– What is the suitability of current algorithms for likely exascale systems (gap analysis)? Must distinguish between current implementations and fundamental limits
– For each class of algorithms, identify strategies that may improve it for exascale (evolutionary approach)
– For each problem area, rethink the approach
– A model architecture is needed against which ideas can be evaluated

Summary of Research Directions
Scalability
– Alternative mathematical models: create concurrency by rethinking approximations; re-evaluate existing methods, e.g., use of Monte Carlo, tradeoffs between implicit and explicit methods, replacing the FFT with other approaches (to avoid all-to-all communication)
– Non-blocking algorithms
Fault tolerance
– Use or creation of redundant information (in the algorithm or the mathematical model)
Architectural requirements
– Including higher-order methods (more efficient in memory space and locality)
– Enhanced models for evaluating algorithms, including memory hierarchy and locality, power, etc. (a minimal model sketch follows this slide)
New areas
– Including strong scaling (application areas that haven't scaled to date)
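A minimal example of the kind of enhanced evaluation model called for above: a roofline-style bound that charges an algorithm for both its arithmetic and its data motion. The machine parameters and per-nonzero costs below are illustrative placeholders, not measurements; a model fit for exascale evaluation would also need terms for locality, network latency, and power.

```c
/* Sketch: a roofline-style lower bound on execution time.
 * All numbers below are illustrative placeholders. */
#include <stdio.h>

static double time_bound(double flops, double bytes,
                         double peak_flops, double peak_bw) {
    double t_compute = flops / peak_flops;  /* arithmetic-limited time */
    double t_memory  = bytes / peak_bw;     /* data-motion-limited time */
    return t_compute > t_memory ? t_compute : t_memory;
}

int main(void) {
    /* Example: sparse matrix-vector product, roughly 2 flops and
     * 12 bytes of traffic per nonzero (assumed figures). */
    double nnz = 1e9;
    double t = time_bound(2.0 * nnz, 12.0 * nnz,
                          1e12 /* flop/s */, 1e11 /* bytes/s */);
    printf("lower bound: %.3f s (memory bound for these parameters)\n", t);
    return 0;
}
```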

Potential Impact on Software Components
(What capabilities will result? What new methods and components will be developed?)
– See Frameworks and Libraries
– Latency hiding requires programming-model support
– Split and nonblocking operations introduce programming hazards (see the sketch after this slide); tools are needed to write, debug, and analyze the correctness of such code
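The hazard is easy to state in code. In the C/MPI fragment below (hypothetical names, no matching receive shown), the send buffer is written while a nonblocking operation still owns it; the program compiles cleanly and may even appear to work, which is exactly why correctness tools are needed.

```c
/* Sketch: the classic split-operation hazard. Fragment only. */
#include <mpi.h>

void hazard(double *buf, int n, int dest, MPI_Comm comm) {
    MPI_Request req;
    MPI_Isend(buf, n, MPI_DOUBLE, dest, 0, comm, &req);

    buf[0] = 42.0;   /* WRONG: buf must not be modified before the Wait */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    buf[0] = 42.0;   /* correct: the send buffer is ours again here */
}
```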

Potential Impact on Usability, Capability, and Breadth of Community
How will this impact the range of applications that may benefit from exascale systems?
– Some applications will require breakthroughs in algorithms; others will be far less efficient or reliable without new algorithms
– To date, there is little quantified evaluation of these needs
What is the timescale in which that impact may be felt?
– Algorithms must be realized efficiently in libraries, frameworks, and application codes
– The timescale depends critically on the availability of tools to develop, test, and tune codes

4.3.2 Application Element: Algorithms
[Timeline chart: milestone boxes plotted against time, rising toward "Exascale Suitability" on an axis labeled "Availability of Efficient Exascale Algorithms." Boxes, in order: Gap Analysis; Application Needs Evaluation; Better Scaling in Node Count / Better Scaling within Node; Prototype Algorithms for Heterogeneous Systems; Fault Resilience; Efficient Implementation.]

4.3.2 Application Element: Algorithms
The position of each box indicates when results become available; the expectation is that refinement continues afterward.
1. Gap analysis. Needs to be completed early to guide the rest of the effort.
2. Application needs evaluation. Needs to be initiated and completed early to guide the allocation of effort and to identify areas where applications need to rethink their approach (a cross-cutting issue). Needs to develop and use more realistic models of computation (quantify the need).
3. Better scaling in node count and within nodes can be pursued on petascale systems in this time frame (so it makes sense to deliver a first pass then).
4. Heterogeneous systems are available now but require both programming-model and algorithmic innovation; some work has already been done, but more will take time. Read this milestone as: "a significant fraction of the algorithms required by applications expected to run at exascale have effective formulations for heterogeneous-processor systems."
5. Fault resilience is a very hard problem; this milestone assumes work starts now and will take this long to meet the same definition as for heterogeneous systems: "a significant fraction of algorithms have fault resilience."
6. Efficient implementation includes realization in exascale programming models and tuning for real systems, which may involve algorithm modifications (since the real architecture will most likely differ from the models used in earlier development). The choice of data structures may also change, depending on the ability of compilers and runtimes to execute the algorithms efficiently.

4.3.2 Application Element: Algorithms
Technology drivers
– Concurrency; latency; low memory; probability of transient and permanent faults
Alternative R&D strategies
– Refine existing algorithms to expose more concurrency, adapt to heterogeneous architectures, and manage faults
– New algorithms (high risk/high payoff)
Recommended research agenda
– Follow both strategies (evolutionary refinement + re-evaluation of solution methods)
– Gap analysis needed as soon as possible
– Must re-evaluate application needs (change the model/approximation to allow use of exascale-appropriate algorithms)
Crosscutting considerations
– Realizing more complex algorithms (execution and programming models, debugging and tuning tools)
– Hardware architectural features
– Applications' mathematical models

Updates
– What specific requirements? What capabilities are needed? (For the up-and-to-the-right timeline chart.)
– Another way of looking at it:
  – Scalability/performance is needed now and will become increasingly serious
  – Fault resilience will likely be needed as we approach exascale (thus needed by about 2017, to allow three years to realize and test it and to leave time for a second round of development)
  – Connect requirements to applications: show why something is needed
– Problem: analysis to date has often relied on anecdote or on simple extrapolation from current systems