Published by Miles Lester. Modified over 9 years ago.
1
PMaC Performance Modeling and Characterization
Efficient HPC Data Motion via Scratchpad Memory
Kayla Seager, Ananta Tiwari, Michael Laurenzano, Joshua Peraza, Pietro Cicotti, Laura Carrington
2
Question 1
Do HPC workloads benefit from software-managed scratchpads? YES!
If so, how will we manage it?
3
Outline
Motivation
Scratchpad Background
Simulation Framework and Methodology
Initial Study
Current Direction
5
Problem: HPC Power Wall
Can't scale old systems
  – Power wall already reached by petaflop systems
  – Must redesign for power savings
Efficiency must increase by 2x
Source: Exascale Report (Kogge, 2008)
6
How to get Energy Savings
1. Redesign Hardware
   – Simpler hardware
   – Transfer complexity to software
2. Minimize expensive data movement
   – Memory slower
   – More cores = more contention
   – HPC codes have large working set sizes
8
What is a Scratchpad?
Scratchpad (SPM)
  – Local memory (like a cache)
  – SPM: software-allocated memory
Simpler hardware
[Figure: SPM vs. cache hardware; labels: Memory Array, Decoder, Tagging Array, Decoder]
9
Scratchpad Allocation
Dynamic
  – Move block of code
  – Iterate over code
  – Move another block
Static: Move block of code once
Strategies
  – Knapsack
  – Graph coloring (register allocation problem)
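The knapsack strategy for static allocation can be illustrated with a minimal sketch. It assumes each candidate block has an integer size and an estimated benefit (for example, an access count); the block names, sizes, and benefits below are hypothetical and are not taken from the study.

```python
def knapsack_allocate(blocks, capacity):
    """Choose blocks for a static SPM allocation via 0/1 knapsack.

    blocks: list of (name, size, benefit) tuples with positive integer sizes.
    capacity: SPM capacity in the same units as the sizes.
    Returns (total_benefit, sorted list of chosen block names).
    """
    # best[c] = (benefit, chosen names) achievable with at most c units
    best = [(0, [])] * (capacity + 1)
    for name, size, benefit in blocks:
        # Walk capacity downward so each block is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    total, chosen = best[capacity]
    return total, sorted(chosen)


# Hypothetical candidates: (name, size in KB, estimated accesses)
blocks = [("a", 32, 900), ("b", 24, 700), ("c", 40, 1000), ("d", 16, 300)]
print(knapsack_allocate(blocks, 64))  # (1700, ['b', 'c'])
```

With a 64 KB scratchpad, blocks b and c fill the capacity exactly and give the highest total benefit among the subsets that fit.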
10
The Idea: Less Data Movement
Scratchpad saves energy
  – Allocation burden now on software
Less complexity in hardware
Move only what you use
  – Uses temporal locality
Cache
  – Spatial locality can fail: superfluous data movement
(Spatial locality is built into cache design: note the 8-word line size in most architectures)
[Figure: words A, B, C, D, E moved into cache]
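A small sketch of why spatial locality can cause superfluous movement. It assumes 8-byte words and a 64-byte (8-word) cache line, a cold cache with no evictions, and a scratchpad that moves exactly the words it uses; the address lists are hypothetical.

```python
def cache_bytes_moved(addresses, line_bytes=64):
    """Bytes fetched by a cold cache: one whole line per distinct line touched."""
    lines = {addr // line_bytes for addr in addresses}
    return len(lines) * line_bytes


def spm_bytes_moved(addresses, word_bytes=8):
    """Bytes moved when only the distinct words actually used are copied."""
    words = {addr // word_bytes for addr in addresses}
    return len(words) * word_bytes


sparse = [0, 64, 128, 192]               # one word from each of four lines
dense = [0, 8, 16, 24, 32, 40, 48, 56]   # every word of a single line

print(cache_bytes_moved(sparse), spm_bytes_moved(sparse))  # 256 32
print(cache_bytes_moved(dense), spm_bytes_moved(dense))    # 64 64
```

For the sparse pattern the cache moves eight times more data than is used, while for the dense pattern both approaches move the same amount.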
11
Implication of Scratchpads
Current use: Embedded Systems
  – Smaller working set size
  – Predictable code
GPUs
  – Coding overhead
Issue: HPC codes
  – Large unpredictable codes
  – How to generalize codes?
  – How to make it practical and efficient?
13
Question 2
Are there computation patterns that benefit most from SPM?
14
Why idioms?
Pattern of computation/memory access
Characterize application data movement
Metric to compare different scientific codes (good coverage)
Easy to port HPC code
15
The Methodology
1. Idiom characterization study: SPM vs. cache favorability of each idiom
2. Find idioms in HPC codes
3. Port SPM-favorable idioms in HPC codes to scratchpad
16
Tool: PEBIL
Binary instrumentation tool
  – Executable Binary => Identify Basic Blocks => Cache Simulation
Cache Simulator built on top of PEBIL
  – User-defined cache structures
  – Profiles executables (hit/miss)
[Figure: Stage 1: the executable binary is split into basic blocks (Block1, Block2; example instructions "A op B", "A=b+3"). Stage 2: the blocks run through the simulated cache. PEBIL output, per block: Block 1 {#hits} {#misses}, Block 2 {#hits} {#misses}, ...]
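The per-block hit/miss profiling described above can be sketched as follows. This is not PEBIL's interface: the `LRUCache` class, the `simulate` function, and the `(block_id, address)` trace format are all assumptions made for illustration, using a plain set-associative cache with LRU replacement.

```python
from collections import OrderedDict, defaultdict


class LRUCache:
    """Set-associative cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, assoc, line_bytes):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (assoc * line_bytes)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)      # mark as most recently used
            return True
        if len(ways) >= self.assoc:
            ways.popitem(last=False)    # evict the least recently used line
        ways[line] = None
        return False


def simulate(trace, cache):
    """Count hits and misses per basic block for (block_id, address) pairs."""
    stats = defaultdict(lambda: {"hits": 0, "misses": 0})
    for block_id, addr in trace:
        key = "hits" if cache.access(addr) else "misses"
        stats[block_id][key] += 1
    return dict(stats)


# Hypothetical trace: which basic block issued which address
trace = [("Block1", 0), ("Block1", 8), ("Block2", 4096), ("Block2", 0)]
print(simulate(trace, LRUCache(size_bytes=1024, assoc=2, line_bytes=64)))
```

Each block ends up with one hit and one miss here: the first touch of each line misses, and later touches of line 0 hit because the 2-way set still holds it.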
17
Simulation Environment
Title       Cache Size (KB)  Cache Assoc.  Cache Line Size (Bytes)  SPM Size (KB)  SPM Assoc.  SPM Line Size (Bytes)
Cache       64               8             ?                        -              -           -
Scratchpad  -                -             -                        64             Full        8
Hybrid      32               8             64                       32             Full        8
18
[Figure: Cache/SPM only. Stage 1: the executable binary is split into Block1 and Block2. Stage 2: all blocks are simulated on a single memory (cache or SPM).]
19
[Figure: Hybrid system. Stage 1: the executable binary is split into Block1 and Block2. Stage 2: the blocks are divided between SPM and cache.]
20
Tool: PIR (find idioms in HPC)
Used for: automatically identifying idioms in large-scale HPC applications
Input: Idioms.txt
  – Idioms are defined using a pattern language
Output:
  – Idioms matched to source line numbers
[Figure: loops (Loop1, Loop2) labeled with idioms (Transpose, Gather)]
22
Under the hood: HPC Results
Fundamental question: Is there a benefit of SPM for HPC codes?
  – Simulate full apps on cache and SPM
  – Use simple heuristic to define the mappings
  – Simulate on hybrid
Pitfalls:
  – Sometimes SPM moves more than cache: LRU
23
Metrics
Data Movement Ratio = (SPM Data Movement) / (Cache Data Movement)
Data Moved = (Cache Misses) * (Cache Line Size)
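The two metrics can be written out as a short sketch. It assumes the same misses-times-line-size formula is applied to the scratchpad using its own line size; the miss counts below are hypothetical and are not results from the study.

```python
def data_moved(misses, line_bytes):
    """Bytes moved = number of misses * line size in bytes."""
    return misses * line_bytes


def data_movement_ratio(spm_misses, spm_line_bytes, cache_misses, cache_line_bytes):
    """SPM data movement divided by cache data movement (below 1 favors SPM)."""
    return data_moved(spm_misses, spm_line_bytes) / data_moved(cache_misses, cache_line_bytes)


# Hypothetical counts: 4000 SPM misses at 8 B vs. 1000 cache misses at 64 B
print(data_movement_ratio(4000, 8, 1000, 64))  # 0.5
```

In this made-up case the scratchpad misses four times as often but still moves half the data, because each miss moves one word instead of a full line.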
24
HPC Applications
Graph500
  – Construct and traverse weighted undirected graph
HYCOM
  – Ocean model: hybrid isopycnal-sigma-pressure, generalized coordinate
SMG2000
  – Parallel semi-coarsening multigrid solver
Sequoia Benchmarks
  – SPHOT: Monte Carlo photon transport code
  – UMT: Unstructured-mesh deterministic radiation transport code
  – AMG2006: Algebraic multigrid linear system solver for unstructured meshes
25
HPC Results
26
Question 1
Do HPC workloads benefit from software-managed scratchpads? YES!
27
Idiom: Gather/Scatter
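For readers unfamiliar with the idiom, a minimal sketch of gather and scatter as index-driven loops follows. The function names and data are illustrative; the slide does not show the authors' own definition.

```python
def gather(src, idx):
    """Gather: read src at the positions listed in idx (indirect loads)."""
    return [src[i] for i in idx]


def scatter(dst, idx, values):
    """Scatter: write values into dst at the positions listed in idx."""
    for i, v in zip(idx, values):
        dst[i] = v
    return dst


print(gather([10, 20, 30, 40], [3, 0, 2]))         # [40, 10, 30]
print(scatter([0, 0, 0, 0], [3, 0, 2], [1, 2, 3]))  # [2, 0, 3, 1]
```

Because the index array decides which elements are touched, consecutive accesses may land on unrelated cache lines, which is the case where moving whole lines wastes bandwidth.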
28
Using Methodology for HYCOM
1. Gather idiom: prefers SPM
2. Find gather in HYCOM: 33 instances
3. Port idiom blocks: hybrid structure
   – Port gather basic blocks to SPM
   – Rest on cache
Result: HYCOM (ocean modeling code) saves 20% in data motion
30
Real SPM for PEBIL?
Extension of PEBIL Simulator
  – Fully associative cache
      Rethink replacement policy
Dynamic Allocation Scheme
  – Idioms determine loops for allocation
  – Reuse distance library
      Track how often used
      Track distance of use
[Figure: access sequence A, B, C, A; reuse distance = 2]
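Reuse distance, as in the A, B, C, A example, can be computed with a short sketch. It uses the common definition (the number of distinct addresses touched between two consecutive uses of the same address); the function name is illustrative and this is not the reuse distance library mentioned on the slide.

```python
def reuse_distances(trace):
    """Return one entry per access: the reuse distance, or None on first use."""
    last_seen = {}
    distances = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched since the previous use of addr
            between = trace[last_seen[addr] + 1:pos]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = pos
    return distances


print(reuse_distances(["A", "B", "C", "A"]))  # [None, None, None, 2]
```

This quadratic version favors clarity; a production tool would typically use a tree or stack structure to keep the cost per access low.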
31
Results Summary
SPM
  – Simpler hardware
  – Efficient data movement
Developed methodology for SPM
  – Idiom characterization
  – Idiom identification in HPC codes
  – Port SPM hotspots
  – 20% data movement savings for HYCOM
Scratchpad shows potential
  – Good when spatial locality fails
  – HPC applications
  – SPM only: average 22% data movement saved
  – Hybrid: average 39%, max 69% data movement saved
  – 4x improvement for gather idiom
  – Current work on creating SPM for PEBIL
32
Acknowledgements
PMaC team
  – Laura Carrington
  – Ananta Tiwari
  – Michael Laurenzano
  – Pietro Cicotti
  – Mitesh Meswani
Dedicated to: Allan Snavely
33
EXTRA
34
Idioms: Strided Access
i = i + stride
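A minimal sketch of the strided-access idiom, built around the `i = i + stride` update. The `line_utilization` helper is an added approximation, assuming a long array, an aligned start, and an 8-word cache line; it is not part of the original slide.

```python
def strided_sum(a, stride):
    """Sum every stride-th element, advancing the index with i = i + stride."""
    total = 0
    i = 0
    while i < len(a):
        total += a[i]
        i = i + stride
    return total


def line_utilization(stride, line_words=8):
    """Approximate fraction of each fetched cache line that a strided walk uses."""
    return 1.0 / min(stride, line_words)


print(strided_sum(list(range(10)), 3))  # 18
print(line_utilization(1), line_utilization(16))  # 1.0 0.125
```

Once the stride reaches the line length, only one word per fetched line is used, so a cache moves roughly eight times more data than the loop consumes.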
35
Looking Forward
Idiom-driven allocation
  – PIR determines loops for allocation
Pre-allocated array for SPM
  – Pointers to loops: trigger replacement
Mimic dynamic compiler replacement policy