1 PMaC Performance Modeling and Characterization Efficient HPC Data Motion via Scratchpad Memory Kayla Seager, Ananta Tiwari, Michael Laurenzano, Joshua Peraza, Pietro Cicotti, Laura Carrington

2 Question 1
Do HPC workloads benefit from software-managed scratchpads? YES!
If so, how will we manage it?

3 Outline
• Motivation
• Scratchpad Background
• Simulation Framework and Methodology
• Initial Study
• Current Direction

5 Problem: HPC Power Wall
• Can't scale old systems
  – Power wall already reached by petaflop systems
  – Must redesign for power savings
• Efficiency must increase by 2x
Source: Exascale Report (Kogge, 2008)

6 How to Get Energy Savings
1. Redesign hardware
  – Simpler hardware
  – Transfer complexity to software
2. Minimize expensive data movement
  – Memory is slower
  – More cores = more contention
  – HPC codes have large working set sizes

8 What is a Scratchpad?
• Scratchpad (SPM)?
  – Local memory (like a cache)
  – SPM: software-allocated memory
• Simpler hardware
[Figure: hardware comparison ("VS"). Labels: Memory Array, Decoder, Tagging Array, Decoder]

9 Scratchpad Allocation
• Dynamic
  – Move block of code
  – Iterate over code
  – Move another block
• Static: Move block of code once
• Strategies
  – Knapsack
  – Graph coloring
    • Register allocation problem
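
The knapsack strategy named above can be sketched as follows. This is only an illustration under stated assumptions: each candidate object has a size and an estimated access count used as its benefit, and the function name, object names, sizes, and counts are all hypothetical rather than taken from the presentation.

```python
def knapsack_allocate(objects, capacity):
    """Choose objects for a static scratchpad allocation via 0/1 knapsack.

    objects:  list of (name, size, access_count) tuples (hypothetical data)
    capacity: scratchpad capacity, in the same unit as size
    Returns (total_accesses_served, chosen_names).
    """
    # best[c] holds the best (benefit, chosen names) using at most c units.
    best = [(0, [])] * (capacity + 1)
    for name, size, accesses in objects:
        # Walk capacities downward so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical candidates: (name, size, estimated access count)
candidates = [("a", 4, 40), ("b", 3, 50), ("c", 2, 100), ("d", 5, 95)]
benefit, chosen = knapsack_allocate(candidates, capacity=8)
```

With these made-up numbers the scratchpad is filled with the objects that serve the most accesses per unit of space, which is the intent of a static, allocate-once scheme.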

10 The Idea: Less Data Movement
• Scratchpad saves energy
  – Allocation burden now on software
    • Less complexity on hardware
• Move only what you use
  – Uses temporal locality
• Cache
  – Spatial locality can fail: superfluous data movement
(Spatial locality is built into cache design; note the 8-word line size in most architectures)
[Figure: A B C D E moved into cache]
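
The superfluous movement described above can be made concrete with a small count. This is a rough sketch, assuming 8-byte words (so an 8-word line is 64 bytes), an unbounded memory so that only first touches cause traffic, and a made-up access pattern that uses one word per line; none of these numbers come from the presentation.

```python
WORD_BYTES = 8                 # assumed word size
LINE_BYTES = 8 * WORD_BYTES    # 8-word line, as noted on the slide


def bytes_moved(addresses, granule_bytes):
    """Bytes transferred if each first touch of a granule moves the whole granule."""
    granules = {addr // granule_bytes for addr in addresses}
    return len(granules) * granule_bytes


# Hypothetical sparse pattern: exactly one word is used from each of 16 lines.
sparse = [i * LINE_BYTES for i in range(16)]

line_traffic = bytes_moved(sparse, LINE_BYTES)  # cache: whole lines are fetched
word_traffic = bytes_moved(sparse, WORD_BYTES)  # SPM-style: only the words used
```

Here the line-granular transfer moves eight times as many bytes as the word-granular one, because seven of every eight words fetched are never used.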

11 Implication of Scratchpads
• Current use: embedded systems
  – Smaller working set size
  – Predictable code
• GPUs
  – Coding overhead
• Issue: HPC codes
  – Large, unpredictable codes
  – How to generalize codes?
  – How to make it practical and efficient?

13 Question 2
Are there computation patterns that benefit most from SPM?

14 Why Idioms?
• Pattern of computation/memory access
• Characterize application data movement
• Metric to compare different scientific codes (good coverage)
• Easy to port HPC code

15 The Methodology
1. Idiom characterization study: SPM vs. cache favorability of each idiom
2. Find idioms in HPC codes
3. Port SPM-favorable idioms in HPC codes to scratchpad

16 Tool: PEBIL
• Binary instrumentation tool
  – Executable binary => identify basic blocks => cache simulation
• Cache simulator built on top of PEBIL
  – User-defined cache structures
  – Profiles executables (hit/miss)
[Figure: Executable binary (e.g. "A op B", "A=b+3") is split into basic blocks (Block1, Block2) in Stage 1 and passed to the cache in Stage 2. PEBIL output: Block 1 {#hits} {#misses}, Block 2 {#hits} {#misses}, ...]

17 Simulation Environment
Title       Cache Size (KB)  Cache Assoc.  Cache Line Size (Bytes)  SPM Size (KB)  SPM Assoc.  SPM Line Size (Bytes)
Cache       64               8             (not in transcript)      -              -           -
Scratchpad  -                -             -                        64             Full        8
Hybrid      32               8             64                       32             Full        8
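
A hit/miss simulation over configurations like these can be sketched with a small set-associative LRU model. This is not PEBIL's simulator; the class name and the address trace are hypothetical, and treating the scratchpad as a fully associative structure with 8-byte lines is an assumption made only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal set-associative cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, assoc, line_bytes):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (assoc * line_bytes)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(ways) >= self.assoc:
                ways.popitem(last=False)  # evict the least recently used line
            ways[line] = True


KB = 1024
# Parameters read from the Hybrid row: 32 KB, 8-way, 64-byte lines.
cache = LRUCache(32 * KB, assoc=8, line_bytes=64)
# Assumed stand-in for the SPM: one set holding every 8-byte line (fully associative).
spm = LRUCache(32 * KB, assoc=(32 * KB) // 8, line_bytes=8)

# Hypothetical trace: 64 consecutive 8-byte words.
for addr in range(0, 64 * 8, 8):
    cache.access(addr)
    spm.access(addr)
```

On this dense trace the 64-byte lines pay off: the cache misses once per line and hits on the remaining words, while the word-granular structure misses on every first touch.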

18 Cache/SPM only
[Figure labels: Executable Binary, Stage 1, Stage 2, Block1, Block2, Cache, SPM]

19 Hybrid System
[Figure labels: Executable Binary, Stage 1, Stage 2, Block1, Block2, SPM, Cache]

20 Tool: PIR (find idioms in HPC)
• Used for: automatically identifying idioms in large-scale HPC applications
• Input: Idioms.txt
  – Idioms are defined using a pattern language
• Output:
  – Idioms matched to source line number
[Figure labels: Loop1, Loop2, Transpose, Gather]

22 Under the hood: HPC Results
• Fundamental question: Is there a benefit of SPM for HPC codes?
  – Simulate full apps on cache and SPM
  – Use simple heuristic to define the mappings
  – Simulate on hybrid
• Pitfalls:
  – Sometimes SPM moves more than cache: LRU

23 Metrics
Data Movement Ratio = (SPM Data Movement) / (Cache Data Movement)
Data Moved = (Cache Misses) * (Cache Line Size)
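
The two metrics above can be worked through in a few lines. This sketch assumes the ratio is SPM data movement divided by cache data movement, and that the same misses-times-line-size formula applies to both structures; the function names and miss counts are hypothetical.

```python
def data_moved(misses, line_bytes):
    """Data Moved = misses * line size (in bytes)."""
    return misses * line_bytes


def data_movement_ratio(spm_misses, spm_line_bytes, cache_misses, cache_line_bytes):
    """SPM data movement divided by cache data movement; below 1.0 favors the SPM."""
    return data_moved(spm_misses, spm_line_bytes) / data_moved(cache_misses, cache_line_bytes)


# Hypothetical counts: many small SPM transfers vs. fewer 64-byte line fills.
ratio = data_movement_ratio(spm_misses=1000, spm_line_bytes=8,
                            cache_misses=200, cache_line_bytes=64)
```

With these made-up counts the SPM moves 8,000 bytes against the cache's 12,800, so the ratio is below one even though the SPM misses five times as often.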

24 HPC Applications
• Graph500
  – Construct and traverse weighted undirected graph
• HYCOM
  – Ocean model: hybrid isopycnal-sigma-pressure, generalized coordinate
• SMG2000
  – Parallel semicoarsening multigrid solver
• Sequoia Benchmarks
  – SPHOT
    • Monte Carlo photon transport code
  – UMT
    • Unstructured-mesh deterministic radiation transport code
  – AMG2006
    • Algebraic multigrid linear system solver for unstructured mesh

25 HPC Results

26 Question 1
Do HPC workloads benefit from software-managed scratchpads? YES!

27 Idiom: Gather/Scatter
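
For readers unfamiliar with the idiom, gather and scatter are indirect accesses through an index array. The sketch below shows the usual shape of each loop; the function and variable names are hypothetical and are not how PIR defines the patterns.

```python
def gather(src, idx):
    """Gather: read src at the positions listed in idx (out[k] = src[idx[k]])."""
    return [src[i] for i in idx]


def scatter(dst, idx, vals):
    """Scatter: write vals into dst at the positions listed in idx (dst[idx[k]] = vals[k])."""
    for i, v in zip(idx, vals):
        dst[i] = v
    return dst


source = [10, 20, 30, 40, 50, 60]
index = [5, 0, 3]                       # irregular, data-dependent positions
picked = gather(source, index)
target = scatter([0] * 6, index, picked)
```

Because the positions in `index` need not be adjacent, each access may land on a different cache line, which is the situation where line-granular transfers carry unused data.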

28 Using Methodology for HYCOM
1. Gather idiom: prefers SPM
2. Find gather in HYCOM: 33 instances
3. Port idiom blocks: hybrid structure
  – Port gather basic blocks to SPM
  – Rest on cache
Result: HYCOM (ocean modeling code) savings of 20% in data motion

30 Real SPM for PEBIL?
• Extension of PEBIL simulator
  – Fully associative cache
    • Rethink replacement policy
• Dynamic allocation scheme
  – Idioms determine loops for allocation
  – Reuse distance library
    • Track how often used
    • Track distance of use
[Figure: accesses to A, B, C, then A again; Reuse Distance = 2]
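
Reuse distance, as in the A/B/C example above, is commonly defined as the number of distinct addresses touched between two consecutive accesses to the same address. The sketch below uses that common definition, which may differ in detail from the library mentioned on the slide, and the function name is hypothetical.

```python
def reuse_distances(trace):
    """Return, per access, the number of distinct addresses seen since the
    previous access to the same address (None for a first access)."""
    last_seen = {}
    distances = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses strictly between the two accesses.
            between = set(trace[last_seen[addr] + 1:pos])
            distances.append(len(between))
        else:
            distances.append(None)
        last_seen[addr] = pos
    return distances


distances = reuse_distances(["A", "B", "C", "A"])
```

The slicing makes this quadratic in the worst case, which is acceptable for a short illustration but not for full application traces.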

31 Results Summary
• SPM
  – Simpler hardware
  – Efficient data movement
• Developed methodology for SPM
  – Idiom characterization
  – Idiom identification in HPC codes
  – Port SPM hotspots
  – 20% data movement savings for HYCOM
• Scratchpad shows potential
  – Good when spatial locality fails
  – HPC applications
  – SPM only: average 22% data movement saved
  – Hybrid: average 39%, max 69% data movement saved
  – 4x improvement for gather idiom
  – Current work on creating SPM for PEBIL

32 Acknowledgements
• PMaC team
  – Laura Carrington
  – Ananta Tiwari
  – Michael Laurenzano
  – Pietro Cicotti
  – Mitesh Meswani
• Dedicated to: Allan Snavely

33 EXTRA

34 Idioms: Strided Access
i = i + stride
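
The strided idiom is a loop whose index advances by a fixed step, as in the update shown above. A minimal sketch follows, with a hypothetical function name and data:

```python
def strided_sum(data, stride):
    """Sum every stride-th element, advancing the index with i = i + stride."""
    total = 0
    i = 0
    while i < len(data):
        total += data[i]
        i = i + stride
    return total


total = strided_sum(list(range(10)), stride=3)  # touches indices 0, 3, 6, 9
```

When the stride is at least as large as a cache line, each access falls on a new line and only one element per line is used.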

35 Looking Forward
• Idiom-driven allocation
  – PIR determines loops for allocation
• Pre-allocated array for SPM
  – Pointers to loops: trigger replacement
• Mimic dynamic compiler replacement policy

