Leveraging Hierarchy: Is this our Undiscovered Country? John T. Daly

1 Leveraging Hierarchy: Is this our Undiscovered Country? John T. Daly

2 Undiscovered Country: Cost vs. Risk?
[Figure: Log(Performance) vs. Time; technology generation ~ 15 years. Labels: Vector, Parallel (IN), Parallel (OUT), Exascale?; Latency Hiding, Concurrency, Data Movement]

3 Advanced Computing Systems (ACS)
HPC capability doubles every 14 months, but data doubles every 9 months
Innovative solutions required to bridge the gap
Partner with industry, academia and national labs to develop technology enablers for next generation computing
Generate a steady stream of capability; no “end goal” for scaling
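
The gap between the two doubling rates on this slide can be worked out directly. The sketch below is illustrative only: it assumes both trends are pure exponentials, and the function names are hypothetical, not from the presentation.

```python
def growth(months, doubling_months):
    """Growth factor after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def gap_factor(months, capability_doubling=14.0, data_doubling=9.0):
    """Ratio of data growth to capability growth after `months` (assumed exponentials)."""
    return growth(months, data_doubling) / growth(months, capability_doubling)


def gap_doubling_time(capability_doubling=14.0, data_doubling=9.0):
    """Months for the data-to-capability ratio itself to double."""
    return 1.0 / (1.0 / data_doubling - 1.0 / capability_doubling)


if __name__ == "__main__":
    print(gap_doubling_time())   # months for the gap to double
    print(gap_factor(60.0))      # gap growth over five years
```

Under these assumptions the data-to-capability ratio doubles every 126/5 = 25.2 months, so the gap widens steadily even though both quantities keep growing.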

4 ACS: Bridge to research community
[Diagram labels: Mission Problems, Technical Challenges, Technical Solutions, Mission Capability; Participatory Research Mirroring Agency Compute Mission; Universities, National Labs, Government, Industry]

5 ACS: technical thrusts + end-to-end
Our HPC stakeholders
– System integrator optimizes power, performance and reliability for a set number of dollars
– System user optimizes usability, dependability and time-to-solution for a set number of deliverables
Point solutions in six technical thrusts: power efficiency, chip I/O, interconnects, productivity, file I/O and resilience
Innovative end-to-end solutions
– AMOEBA: chip level data movement and packaging
– MYRIAD(?): system level modeling and simulation

6 Extreme is not necessarily “balanced”
Traditional HPC is an important part of ACS, but not the only part
Dynamic design space drives the need for simulation and abstract machine model
Goal: Scientific understanding in HPC
[Figure labels: Chip I/O, Interconnect, Power Efficiency, Resilience, Productivity, File I/O & Storage; legend: “Traditional HPC and ACS too”, “Also ACS, but maybe not traditional HPC”]

7 Future “convergence”?
Today
– Predictive science starts with an initial model and runs a numerical experiment to generate lots of data
– Data analytics starts with lots of data and extracts features or information that characterize the data
Tomorrow
– Predictive science uses in situ data analytics to reduce the data storage and post-processing requirements
– Data analytics uses in situ predictive science to ask the question “what ought this data to look like?”

8 Energy is the next shared resource
Off node communication is over budget
Off chip communication is over budget
(DOE Architectures and Technology for Extreme Scale Computing, San Diego, CA)
[Figure labels: Power Efficiency, Resilience, Productivity, Chip I/O, Interconnect, File I/O]

9 Data is the challenge of scale
Energy, performance and data integrity tapers are a function of the distance between the data and the processor
Data locality is key to computing at scale for optimizing right answers per Joule per second
– Spatial locality allows me to grab more data in a single memory transaction
– Temporal locality allows me to use the same data multiple times before I have to move it
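
The spatial-locality point can be made concrete with a toy model. The sketch below is not from the presentation: it assumes memory is fetched in lines of 8 words and that only the most recently fetched line is kept, then counts memory transactions for two traversal orders of the same row-major array. All names are hypothetical.

```python
def count_transactions(addresses, line_words=8):
    """Count line fetches, assuming only the last fetched line stays available."""
    transactions = 0
    current_line = None
    for addr in addresses:
        line = addr // line_words
        if line != current_line:   # data is not in the line we already hold
            transactions += 1
            current_line = line
    return transactions


def row_major(rows, cols):
    """Word addresses visited when walking a row-major array row by row."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major(rows, cols):
    """Word addresses visited when walking the same array column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    print(count_transactions(row_major(64, 64)))      # unit-stride walk
    print(count_transactions(column_major(64, 64)))   # stride-64 walk
```

In this toy model the unit-stride walk of a 64 x 64 array needs 512 transactions, while the strided walk needs 4096: the same data, but eight times the movement.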

10 A role for NV in the hierarchy http://www.bit-tech.net/hardware/memory/2007/11/15/the_secrets_of_pc_memory_part_1/3

11 Node architecture = “shops” of data
Byte/Word addressable memory up and down the stack, block synchronous between stacks
Control is data aggregator (e.g., gather/scatter)
[Diagram labels: Processor/Control, Control, Processor/Control, Control]

12 Exploiting Spatial Locality
Fractal Memory
– Create a virtual mapping of data lines to space filling curves (e.g., Jin and Mellor-Crummey, “Using Space-filling Curves for Computation Reordering”)
– Use memory control logic to resolve mappings
– Dynamic mapping by user via PM interface
Move work to data
– Adaptive mesh refinement is a refine operation spawned at another memory component
– Map memory references back to processor
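
As one illustration of mapping data to a space-filling curve, the sketch below uses the Morton (Z-order) curve. The slide does not name a specific curve, so this choice is an assumption, and the function names are hypothetical. The mapping is done in software here, whereas the slide proposes resolving it in memory control logic.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def morton_decode(code, bits=16):
    """Recover (x, y) from a Z-order index."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def z_order_cells(side):
    """List the cells of a side x side grid in Z-order."""
    cells = [(x, y) for y in range(side) for x in range(side)]
    return sorted(cells, key=lambda p: morton_encode(p[0], p[1]))


if __name__ == "__main__":
    print(z_order_cells(2))   # visiting order for a 2 x 2 grid
```

Storing cells in this order keeps each aligned 2 x 2 block (and, recursively, each aligned 4 x 4 block, and so on) contiguous in memory, which is the property a locality-aware layout is after.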

13 Exploiting Temporal Locality
Global one-sided memory model
– Different processors updating same values in PDE solver creates race conditions
– You’re going to get the wrong answer anyway, so checkpoint asynchronously and use QMU (quantification of margins and uncertainties)
– Inherently resilient algorithms that avoid global synchronization
Reconfigurable hierarchy: “cache” vs. “scratch pad”
– “Cache” is seamless and easy to use, but sometimes I’d like to be able to bypass it
– “Scratch pad” avoids duplicating memory and can be higher performing, but it is harder to use
– Is SSD going to work like “cache” or “scratch pad”?
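
A small example of an algorithm that tolerates unsynchronized updates is relaxation in arbitrary order. The sketch below is a serial emulation under stated assumptions, not a real one-sided implementation: it solves a 1D Laplace problem with fixed end values by updating interior points in random order, with no sweeps and no barriers. Names, problem size and update count are all hypothetical.

```python
import random


def async_relaxation(n_points=17, updates=50000, seed=0):
    """Relax a 1D Laplace problem by updating interior points in random order."""
    rng = random.Random(seed)
    u = [0.0] * n_points
    u[-1] = 1.0                               # fixed boundary values: u[0] = 0, u[-1] = 1
    for _ in range(updates):
        i = rng.randrange(1, n_points - 1)    # any interior point, in any order
        u[i] = 0.5 * (u[i - 1] + u[i + 1])    # use whatever neighbor values exist right now
    return u


def max_error(u):
    """Largest deviation from the exact solution, a straight line from 0 to 1."""
    n = len(u)
    return max(abs(u[i] - i / (n - 1)) for i in range(n))


if __name__ == "__main__":
    print(max_error(async_relaxation()))
```

The update order carries no information the solver depends on, so the iteration still converges to the exact linear profile. Genuinely stale reads from concurrent writers are not modeled here.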

14 Motivating example: Exa-sorting
Many linear solution methods are already robust against errors and data race conditions (e.g., multigrid methods)
What about an application like sorting?
– Gradient descent approach is robust under errors* and can be parallelized asynchronously
– Suggests possibility for research in asynchronous parallel minimization approach for other classes of problems
How about non-linear solvers?
– Analogy in minimization of the objective function via solution of the adjoint problem?
– What about chaotic systems?
[Figure label: Non-linear term]
* Joseph Sloan, David Kesler, Rakesh Kumar, and Ali Rahimi. “A Numerical Optimization-based Methodology for Application Robustification: Transforming Applications for Error Tolerance”. DSN 2010, Chicago, July 2010.
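
The idea of sorting as error-tolerant minimization can be illustrated with a much simpler stand-in. The sketch below is not the continuous formulation of the cited paper: it is a discrete local-descent analogue, assumed here for illustration, in which the objective is the number of inversions and each step compares one random adjacent pair. A fraction of comparisons is deliberately corrupted. All names and parameters are hypothetical.

```python
import random


def inversions(values):
    """Objective function: number of out-of-order pairs (0 means sorted)."""
    n = len(values)
    return sum(1 for i in range(n) for j in range(i + 1, n) if values[i] > values[j])


def noisy_descent_sort(values, steps, error_rate, rng):
    """Random adjacent compare-and-swap steps with injected comparison faults."""
    data = list(values)
    for _ in range(steps):
        i = rng.randrange(len(data) - 1)
        out_of_order = data[i] > data[i + 1]
        if rng.random() < error_rate:
            out_of_order = not out_of_order    # injected fault: wrong comparison result
        if out_of_order:
            data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    values = [rng.randrange(1000) for _ in range(32)]
    noisy = noisy_descent_sort(values, 20000, 0.05, rng)    # faulty phase
    clean = noisy_descent_sort(noisy, 200000, 0.0, rng)     # fault-free cleanup
    print(inversions(values), inversions(noisy), inversions(clean))
```

A faulty step can only add one inversion, and later correct steps remove it, so errors are repaired rather than propagated. Each step touches only two neighbors, which is what makes this style of method a candidate for asynchronous parallel execution.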

15 From the user/developer perspective
Domain specific language to serve as portable wrapper for domain user and SME
Support for globally addressable memory space
Easy one-sided and two-sided, synchronous and asynchronous access to remote data
Intuitive mechanism for lightweight thread creation and remote task invocation
Application control over dynamically reconfigurable memory (hardware cache, software cache and software scratch) at each level of the memory hierarchy (chip, node and storage)
Tools for monitoring memory and energy utilization, so I know when I’m swapping to DIMM!

16 Conclusions
Exascale arrives at the end of the technology generation bridging concurrency to data: risk or opportunity?
Traditional algorithms + architectures too expensive in power, performance and reliability if data leaves cache
Rethinking computation may yield large ROI
– models of computation
– “balanced architecture”
– predictive science vs. data analytics
Required to facilitate new approaches
– programming models and tools
– simulation and modeling framework
– vendor partnerships and technology investment

