1
Avoiding Initialization Misses to the Heap
Jarrod Lewis, Bryan Black, and Mikko H. Lipasti
Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Intel Labs
2
Motivation
- Memory bandwidth is expensive
  - Shouldn't waste it on useless traffic
  - Can be put to better use: multithreading, prefetching, MLP, etc.
- Search and destroy useless traffic
- Focus of this talk: heap initialization
  - Detect and optimize initialization of newly allocated memory
  - 23% of misses in a 2MB cache fetch invalid data
April 6, 2019 Avoiding Initialization Misses to the Heap – Mikko Lipasti
3
Dynamically Allocated Memory
[Figure: state diagram of heap space. States: Unallocated (invalid), Allocated (invalid), Allocated (valid). Transition labels: malloc(), initializing store, load or store, free().]
- Invalid memory need not be transferred
- Provide interface that expresses this directly?
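The lifecycle in the diagram can be sketched as a small state machine. This is a minimal illustration, not the authors' interface: the names `HeapState`, `TRANSITIONS`, `step`, and `is_initializing_store` are hypothetical, and the exact arrows are an assumed reading of the figure (malloc() moves memory to allocated-invalid, the first store makes it valid, free() returns it to unallocated).

```python
from enum import Enum


class HeapState(Enum):
    UNALLOCATED = "unallocated (invalid)"
    ALLOCATED_INVALID = "allocated, invalid"
    ALLOCATED_VALID = "allocated, valid"


# Assumed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (HeapState.UNALLOCATED, "malloc"): HeapState.ALLOCATED_INVALID,
    (HeapState.ALLOCATED_INVALID, "store"): HeapState.ALLOCATED_VALID,  # initializing store
    (HeapState.ALLOCATED_INVALID, "free"): HeapState.UNALLOCATED,
    (HeapState.ALLOCATED_VALID, "load"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_VALID, "store"): HeapState.ALLOCATED_VALID,
    (HeapState.ALLOCATED_VALID, "free"): HeapState.UNALLOCATED,
}


def step(state, event):
    """Return the next state, or raise if the event is not modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not modeled in state {state.name}")


def is_initializing_store(state, event):
    """A store to allocated-invalid memory overwrites data nobody needs."""
    return state is HeapState.ALLOCATED_INVALID and event == "store"
```

Only the store for which `is_initializing_store` returns true is a candidate for skipping the memory transfer; later loads and stores see valid data and must behave normally.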
4
Talk Outline
- Motivation
- Analysis of Heap Behavior
- Detecting Initializing Writes
- Performance Analysis
- Conclusions
5
Allocation Analysis
- Two main modes
  - Single dominant allocation (up to 100MB), or
  - Numerous moderate allocations
- Initialization of allocations
  - 88% initialized with store miss
  - Little temporal reuse of freed memory
- Phase behavior
  - Start of program often dominates
  - Even SPEC has counterexamples (gcc, vortex)
6
Cache Miss Behavior
- Init stores cause up to 60% of misses (avg 23%)
- These are 35% of all compulsory misses
7
Talk Outline
- Motivation
- Analysis of Heap Behavior
- Detecting Initializing Writes
- Performance Analysis
- Conclusions
8
Detecting Initializing Writes
- Annotate malloc()
  - Record base, size in allocation range cache
- Key questions
  - What is the working set?
  - How are ranges represented?
    - Valid bits? Not scalable for a 100MB allocation
    - Base + bound
  - How are ranges updated on writes?
    - Split vs. truncate
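The base + bound representation can be sketched as a small table that an annotated malloc() fills in. This is a minimal sketch under stated assumptions: the class name `AllocationRangeCache`, its methods, and the LRU replacement policy are all hypothetical choices for illustration; the slides do not specify how entries are replaced.

```python
from collections import OrderedDict


class AllocationRangeCache:
    """Small table of [base, bound) ranges that are allocated but not yet written."""

    def __init__(self, entries=8):
        self.entries = entries
        self.ranges = OrderedDict()  # base -> bound (exclusive), least recent first

    def on_malloc(self, base, size):
        """Called by the annotated malloc(): record the new range."""
        self.ranges[base] = base + size
        self.ranges.move_to_end(base)
        if len(self.ranges) > self.entries:
            self.ranges.popitem(last=False)  # assumed policy: evict least recently used

    def lookup(self, addr):
        """Return the (base, bound) range containing addr, or None."""
        for base, bound in list(self.ranges.items()):
            if base <= addr < bound:
                self.ranges.move_to_end(base)  # mark as recently used
                return (base, bound)
        return None
```

Losing an entry is always safe in this scheme: a store that misses in the table is simply treated as an ordinary store miss and fetches its block.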
9
Allocation Working Set
- 4-8 entries are sufficient, except for parser, which needs 64
10
Sequential Initialization
[Figure: blocks A-F of one allocation shown step by step, each marked allocated-invalid, initialized, or unknown. Pattern: 1. Sequential. Tracking scheme: 1. Forward Sweep.]
- Forward sweep captures 90%+, except for bzip, gzip, and perl
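A forward sweep over a base + bound range can be sketched as follows. This is one plausible reading of the scheme, not the authors' implementation: the names `ForwardSweepRange`, `on_store_miss`, and `captured` are hypothetical, the 64-byte block size is taken from the machine model, and the rule that skipped blocks are given up (treated as unknown) is an assumption.

```python
BLOCK = 64  # cache block size in bytes


class ForwardSweepRange:
    """Tracks allocated-invalid blocks as one range, truncated from the base."""

    def __init__(self, base, size):
        # Track whole blocks only: a partially covered block may hold valid data.
        self.base = -(-base // BLOCK) * BLOCK        # round up to a block boundary
        self.bound = (base + size) // BLOCK * BLOCK  # round down to a block boundary

    def on_store_miss(self, addr):
        """Return True if this store miss is a known initializing write."""
        block = addr // BLOCK * BLOCK
        if not (self.base <= block < self.bound):
            return False  # outside the tracked range: handle as a normal miss
        # Truncate: everything up to and including this block leaves the range.
        self.base = block + BLOCK
        return True


def captured(tracker, block_order):
    """Count how many store misses, in the given block order, are captured."""
    return sum(tracker.on_store_miss(i * BLOCK) for i in block_order)
```

With six blocks, the sequential order `[0, 1, 2, 3, 4, 5]` is fully captured, while an order that alternates between the two ends, `[0, 5, 1, 4, 2, 3]`, exhausts the range after two writes.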
11
Alternating Initialization
[Figure: blocks A-F of one allocation shown step by step, each marked allocated-invalid, initialized, or unknown. Pattern: 2. Alternating. Tracking scheme: 2. Bidirectional Sweep.]
- Bidirectional sweep captures 90%+ of perl
- Doesn't help bzip or gzip
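A bidirectional sweep lets the range shrink from either end. The sketch below is an assumption-laden illustration: `BidirectionalSweepRange` and `captured` are hypothetical names, and the rule of truncating from whichever end gives up fewer blocks is a guess at a reasonable policy rather than something the slides state.

```python
BLOCK = 64  # cache block size in bytes


class BidirectionalSweepRange:
    """Tracks allocated-invalid blocks as one range, truncated from either end."""

    def __init__(self, base, size):
        self.base = -(-base // BLOCK) * BLOCK        # round up to a block boundary
        self.bound = (base + size) // BLOCK * BLOCK  # round down to a block boundary

    def on_store_miss(self, addr):
        """Return True if this store miss is a known initializing write."""
        block = addr // BLOCK * BLOCK
        if not (self.base <= block < self.bound):
            return False
        below = block - self.base              # bytes given up if the base is raised
        above = self.bound - (block + BLOCK)   # bytes given up if the bound is lowered
        if below <= above:
            self.base = block + BLOCK
        else:
            self.bound = block
        return True


def captured(tracker, block_order):
    """Count how many store misses, in the given block order, are captured."""
    return sum(tracker.on_store_miss(i * BLOCK) for i in block_order)
```

Under this policy both the sequential order and the alternating order `[0, 5, 1, 4, 2, 3]` over six blocks are fully captured, since every write lands on one end of the remaining range.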
12
Striding Initialization
[Figure: blocks A-F of one allocation shown step by step, each marked allocated-invalid, initialized, or unknown, in the orderings A B C D E F and A C E B D F. Pattern: 3. Striding. Tracking scheme: 3. Interleaving.]
- Interleaving captures 90%+ of gzip
- Still only 60% of bzip
  - Bzip has a large allocation with random initialization
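One way to picture interleaving is to split the range into lanes by block index and run an independent forward sweep in each lane. This is an assumed interpretation for illustration only: `InterleavedSweepRange`, the `ways` parameter, and the lane assignment `(index - first) % ways` are hypothetical and are not taken from the slides.

```python
BLOCK = 64  # cache block size in bytes


class InterleavedSweepRange:
    """Forward sweep applied separately to each of `ways` interleaved lanes."""

    def __init__(self, base, size, ways=2):
        self.first = -(-base // BLOCK)      # index of the first whole block
        self.last = (base + size) // BLOCK  # one past the last whole block
        self.ways = ways
        # next_idx[w] is the lowest block index in lane w still known to be invalid.
        self.next_idx = [self.first + w for w in range(ways)]

    def on_store_miss(self, addr):
        """Return True if this store miss is a known initializing write."""
        idx = addr // BLOCK
        if not (self.first <= idx < self.last):
            return False
        lane = (idx - self.first) % self.ways
        if idx < self.next_idx[lane]:
            return False  # already swept past in this lane
        self.next_idx[lane] = idx + self.ways
        return True


def captured(tracker, block_order):
    """Count how many store misses, in the given block order, are captured."""
    return sum(tracker.on_store_miss(i * BLOCK) for i in block_order)
```

With two lanes, the stride-2 order A C E B D F (block indices `[0, 2, 4, 1, 3, 5]`) is sequential within each lane, so all six writes are captured. A random order, as described for bzip's large allocation, defeats this scheme as well.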
13
Talk Outline
- Motivation
- Analysis of Heap Behavior
- Detecting Initializing Writes
- Performance Analysis
- Conclusions
14
PharmSim Overview
[Figure: block diagram. PharmSim (OOO core, Gigaplane) alongside SimOS-PPC (AIX 4.3.1, disk driver, Ethernet driver), with boxes labeled Block, Simple, and Ethernet.]
- Device simulation, etc. from SimOS-PPC [IBM ARL]
- PharmSim replaces functional simulators
  - Full OOO core model, values in rename registers
  - Supports privileged mode, MMU, TLB, exceptions, interrupts, barriers, flushes, etc.
- Lead developer: Trey Cain (thanks Trey!)
15
Operating System Effects
- Widely accepted for SPECINT: safe to ignore O/S paths
- Most popular tool (SimpleScalar)
  - Intercepts system calls
  - Emulates on host, updates “flat” memory
  - Returns “magically” with cache contents intact
- We have found that [CAECW2002]:
  - Omitting system references leads to dramatic error (5.8x L2 miss rate, 100% IPC in worst case)
  - Specifically, AIX page fault handler eliminates many initializing write misses
- Had we not used PHARMsim?
  - Dramatically overstated performance benefit
16
AIX Page Installation
1. Heap manager calls sbrk
2. Malloc returns block < 4KB
3. Program writes to block
4. First reference causes page fault
5. AIX installs entire page using dcbz
[Figure: data segment with regions labeled Unallocated, Allocated, and Valid.]
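The effect of page installation on store misses can be illustrated with a toy model. This is a sketch under strong assumptions, not a model of AIX or PHARMsim: the cache is cold and unbounded, pages are 4KB, blocks are 64 bytes, and the hypothetical function `store_fetches` treats dcbz as establishing a zeroed block in the cache without reading memory.

```python
PAGE = 4096  # page size in bytes (assumed)
BLOCK = 64   # cache block size in bytes


def store_fetches(store_addrs, page_install):
    """Count store misses that must fetch a block from memory."""
    cached = set()  # block numbers present in the (unbounded) cache
    mapped = set()  # page numbers that have already faulted
    fetches = 0
    for addr in store_addrs:
        page, block = addr // PAGE, addr // BLOCK
        if page not in mapped:
            mapped.add(page)  # first reference to the page: page fault
            if page_install:
                # Handler zeroes the page with dcbz: every block of the page
                # is established in the cache without a memory read.
                first = page * (PAGE // BLOCK)
                cached.update(range(first, first + PAGE // BLOCK))
        if block not in cached:
            fetches += 1  # ordinary write-allocate miss
            cached.add(block)
    return fetches


# Initialize a 2KB block (smaller than a page) with 8-byte stores.
stores = range(0x10000, 0x10000 + 2048, 8)
without_install = store_fetches(stores, page_install=False)
with_install = store_fetches(stores, page_install=True)
```

In this toy setting the 2KB initialization touches 32 blocks, all of which miss without page installation and none of which miss with it. The model also shows the pollution concern: the handler brings in all 64 blocks of the page even though only 32 are written.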
17
Block vs. Page Installation
- Page installation is practically free as part of the page fault
- Shortcomings of page installation
  - Pollutes cache
  - Not scalable to superpages (AIX v5.1)
  - Does not work for heap reuse
- Our short simulations don't show this benefit
  - I.e., high overlap between initializing writes and first reference to extended data segment
18
Integrating ARC (allocation range cache)
19
Speedup
- Very aggressive core model
  - Still can't tolerate all store miss latency
- Block mode slightly better than page mode
  - Page mode suffers from cache pollution and less coverage
20
Program Phase Behavior
- Only benefits the initialization phase of a program
- Some programs initialize throughout execution
21
Conclusions
- Initializing writes
  - Cause 23% of all misses in 2MB L2
  - Avoid miss with block or page mode install
  - Up to 41% performance improvement
  - Subject to initialization:computation ratio
- Tracking allocation ranges
  - Working set very small (4-8 entries; 64 for parser)
  - Forward/bidirectional/interleaved sweep enables range truncation
22
Acknowledgments
- Originated as course project: Gordie Bell, Trey Cain, Kevin Lepak
- PHARMsim infrastructure
  - Lead developer: Trey Cain
- Financial and equipment support
  - IBM and Intel Corp
  - National Science Foundation
  - University of Wisconsin
23
Questions?
24
Backup Slides
25
Invalid Memory Traffic
- Invalid memory traffic: real data traffic that transfers invalid data
- Initializing store: initial write to a storage location that contains invalid data
26
Allocation Analysis
- Single dominant allocation vs. numerous moderate allocations
27
Initialization of Heap
- 88% initialized by store miss
- Relatively little temporal reuse of freed memory
28
PharmSim Pipeline
[Figure: pipeline diagram with stages labeled Fetch, Translate, Decode, Execute, Mem, and Commit.]
- Substantially similar to IBM Power4
  - Some instructions “cracked” (1:2 expansion)
  - Others (e.g. lmw) use a microcode stream
- Mem stage
  - Interface to 2-level cache model
  - Sun Gigaplane XB snoopy MP coherence
  - Caches contain values, must remain coherent
- No cheating!
  - No “flat” memory model for reference/redirect
29
Machine Model
- Unrealistically aggressive model to devalue the impact of store misses
- 8-wide, 6-stage pipeline
- 8K entry combining predictor
- 128 RUU, 64 LSQ entries, 64 write buffers
- 256KB 4-way associative L1D cache
- 64KB 2-way associative L1I
- 2MB 4-way associative L2 unified cache
- All cache blocks are 64 bytes
- L2 latency is 10 cycles
- Memory latency is 70 cycles
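For reference, the cache geometry implied by these parameters can be worked out with a few lines of arithmetic. The helper `num_sets` and the dictionary names are hypothetical; the sizes, associativities, and block size are the ones listed on the slide.

```python
KB, MB = 1024, 1024 * 1024
BLOCK = 64  # bytes per cache block


def num_sets(size_bytes, ways, block_bytes=BLOCK):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * block_bytes)


# (capacity in bytes, associativity) for each cache in the machine model
caches = {"L1D": (256 * KB, 4), "L1I": (64 * KB, 2), "L2": (2 * MB, 4)}

sets = {name: num_sets(size, ways) for name, (size, ways) in caches.items()}
blocks = {name: size // BLOCK for name, (size, _) in caches.items()}
```

This gives 1024 sets for the L1D, 512 for the L1I, and 8192 for the L2, with 32768 blocks in the L2 overall.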