Slide 1: Predictor Virtualization
Ioana Burcea*, Stephen Somogyi§, Andreas Moshovos*, Babak Falsafi§#
*University of Toronto, Canada; §Carnegie Mellon University; #École Polytechnique Fédérale de Lausanne
ASPLOS XIII, March 4, 2008
Slide 2: Why Predictors? History Repeats Itself
CPU predictors are everywhere: branch prediction, prefetching, value prediction, pointer caching, cache replacement predictors.
Application footprints grow, so predictors need to scale to remain effective.
Slide 3: Extra Resources: CMPs with Large On-Chip Caches
A four-core CMP: each CPU has its own I$/D$ and shares an L2 cache backed by main memory. On-chip caches now reach tens to hundreds of MB.
Slide 4: Predictor Virtualization
The same CMP, with predictor state mapped into the physical memory address space behind the shared L2 cache.
Slide 5: Predictor Virtualization (PV)
Emulate large predictor tables while reducing the resources dedicated to them.
Slide 6: Research Contributions
PV: predictor metadata stored in the conventional cache hierarchy.
Benefits: emulate larger tables for increased accuracy, with fewer dedicated resources.
Why now? Large caches, CMPs, and the need for larger predictors.
Will this work? Metadata locality is intrinsically exploited by caches.
First step: a virtualized data prefetcher. Performance within 1% of the original on average; dedicated space reduced from 60 KB to under 1 KB. This demonstrates the advantages of virtualization.
Slide 7: Talk Road Map
PV architecture; PV in action: a virtualized "Spatial Memory Streaming" prefetcher [ISCA 06]; conclusions.
[ISCA 06] S. Somogyi, T. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Spatial Memory Streaming."
Slide 9: PV Architecture
The baseline: the CPU sends a request to a dedicated predictor table with its optimization engine and receives a prediction. PV virtualizes this table into the L2 cache and main memory.
Slide 10: PV Architecture
The dedicated table becomes a small PVCache fronted by a PVProxy. On each request, the PVProxy computes an index into the PVCache; misses are served from the full PVTable, which resides in the physical memory address space starting at a PVStart register.
Slide 11: PV: Variable Prediction Latency
Common case: the index hits in the PVCache. Infrequent: the metadata is fetched from the L2 cache. Rare: the request goes all the way to main memory.
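The three-case lookup above can be sketched as a small software model. This is an illustrative sketch only, not the paper's hardware: the dict-based stores, the eviction policy, and the latency constants are all assumptions; the names PVProxy, PVCache, and PVTable follow the slides.

```python
# Minimal sketch of a PVProxy lookup with variable latency.
# Assumptions: dict-based stores, FIFO-ish eviction, illustrative latencies.

class PVProxy:
    # Illustrative latencies (cycles) for the three cases on the slide.
    LAT_PVCACHE, LAT_L2, LAT_MEMORY = 1, 20, 200

    def __init__(self, pvcache_entries=8):
        self.pvcache = {}                 # small dedicated cache: index -> entry
        self.capacity = pvcache_entries
        self.l2 = {}                      # metadata lines currently resident in L2
        self.pvtable = {}                 # full virtualized table in main memory

    def lookup(self, index):
        """Return (entry, latency). Common case: PVCache hit.
        Infrequent: metadata fetched from L2. Rare: fetched from main memory."""
        if index in self.pvcache:
            return self.pvcache[index], self.LAT_PVCACHE
        entry = self.l2.get(index)
        latency = self.LAT_L2
        if entry is None:                 # rare case: go to main memory
            entry = self.pvtable.get(index)
            self.l2[index] = entry        # the metadata line is now cached in L2
            latency = self.LAT_MEMORY
        if len(self.pvcache) >= self.capacity:     # simplistic eviction
            self.pvcache.pop(next(iter(self.pvcache)))
        self.pvcache[index] = entry
        return entry, latency
```

A first lookup pays the memory latency; a repeat of the same index hits the PVCache, which is exactly the entry-reuse the design relies on.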
Slide 12: Metadata Locality
Entry reuse: temporal (one entry is used for multiple predictions) and spatial, which can be engineered (one miss is amortized by several subsequent hits). Metadata access patterns are themselves predictable, which enables prefetching of predictor metadata.
Slide 13: Talk Road Map (revisited)
Next: PV in action, virtualizing "Spatial Memory Streaming" [ISCA 06].
Slide 14: Spatial Memory Streaming [ISCA 06]
Memory accesses exhibit spatial patterns: each region of memory yields a bit vector (e.g., 1100000001101…, 1100001010001…) recording which blocks in the region were accessed. These spatial patterns are stored in a pattern history table (PHT).
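The bit-vector idea can be made concrete with a few lines of code. This is a sketch under stated assumptions, not the SMS mechanism itself: the 32-block region size matches the later slides, but the helper name and the way addresses are folded into one region are illustrative.

```python
# Sketch: fold a region's accesses into one spatial-pattern bit vector.
# Assumption: 64-byte blocks, 32-block spatial regions (a 32-bit pattern),
# and all addresses belong to the same region.

BLOCK_SIZE = 64          # bytes per cache block
REGION_BLOCKS = 32       # blocks per spatial region -> 32-bit pattern

def spatial_pattern(addresses):
    """Bit i of the result is set when block i of the region was touched."""
    pattern = 0
    for addr in addresses:
        block = (addr // BLOCK_SIZE) % REGION_BLOCKS   # offset within the region
        pattern |= 1 << block
    return pattern
```

Accesses to blocks 0, 1, and 9 of a region, for instance, set exactly those three bits of the pattern.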
Slide 15: Virtualizing "Spatial Memory Streaming" (SMS)
SMS couples a detector, which observes the data access stream and records patterns, with a predictor, which issues prefetches on trigger accesses. Virtualization shrinks the predictor's ~60 KB of dedicated pattern storage to roughly 1 KB.
Slide 16: Virtualizing SMS
The virtual table has 1K sets of 11 ways; the PVCache holds 8 sets of 11 ways. Each entry is an 11-bit tag plus a 32-bit pattern, so 11 entries fit in a 64-byte cache block with 39 bits unused: the entries of one table set map onto a single cache block.
Slide 17: Current Implementation
Non-intrusive: the virtual table is stored in reserved physical address space, there is one table per core, and the caches are oblivious to the metadata. Options: store predictor tables in virtual memory, share a single table per application, and make the caches metadata-aware.
Slide 18: Simulation Infrastructure
SimFlex, a full-system simulator based on Simics. Base processor configuration: 4-core CMP; 8-wide out-of-order cores with 256-entry ROBs; 64 KB 4-way set-associative L1I/L1D; 8 MB 16-way set-associative unified L2. Commercial workloads: TPC-C on DB2 and Oracle; TPC-H queries 1, 2, 16, and 17; SPECweb on Apache and Zeus.
Slides 19–22: Original Prefetcher – Accuracy vs. Predictor Size
[Chart: L1 read misses covered as predictor size varies; higher coverage is better.] Takeaway: small tables diminish prefetching accuracy.
Slide 23: Virtualized Prefetcher – Performance
[Chart: speedup; higher is better.] The virtualized prefetcher, at under 1 KB of dedicated hardware, matches the original ~60 KB prefetcher.
Slide 24: Impact on L2 Memory Requests
[Chart: increase in L2 memory requests; lower is better.] The dark side of virtualization: increased L2 memory requests.
Slide 25: Impact of Virtualization on Off-Chip Bandwidth
[Chart: off-chip bandwidth increase; lower is better.] The impact on off-chip bandwidth is minimal: the extra L2 requests affect performance only indirectly, while off-chip bandwidth affects it directly.
Slide 26: Conclusions
Predictor Virtualization stores metadata in the conventional cache hierarchy. Benefits: emulate larger tables for increased accuracy, with fewer dedicated resources. First step: a virtualized data prefetcher, within 1% of the original's performance on average while shrinking dedicated space from 60 KB to under 1 KB. Opportunities: metadata sharing and persistence, application-directed prediction, and predictor adaptation.
Slide 27: Predictor Virtualization
Ioana Burcea* (ioana@eecg.toronto.edu), Stephen Somogyi§, Andreas Moshovos*, Babak Falsafi§#
*University of Toronto, Canada; §Carnegie Mellon University; #École Polytechnique Fédérale de Lausanne
ASPLOS XIII, March 4, 2008
Slide 30 (backup): PV – Motivating Trends
Dedicating resources to predictors is hard to justify: larger predictor tables do increase performance, but with diminishing returns, and in chip multiprocessors the space dedicated to predictors multiplies with the number of processors. Memory hierarchies, with their increased capacity, offer the opportunity: use conventional memory hierarchies to store predictor metadata.
Slide 31 (backup): Virtualizing the Predictor Table
The pattern history table holds (tag, pattern) entries. The PC and the trigger-access address form an index plus tag; the matching entry's pattern (e.g., 1011101010110011) drives the prefetches. Virtualized, the PHT is stored in the physical address space with multiple PHT entries packed into one memory block, so one memory request brings in an entire table set.
Slide 32 (backup): Packing Entries in One Cache Block
The index is the PC plus the offset within the spatial group. A 16-bit PC and 32 blocks per spatial group (a 5-bit offset, and hence a 32-bit spatial pattern) give a 21-bit index. With a 1K-set pattern table, 10 bits index the table and the remaining 11 bits form the tag. Each entry is 11 + 32 = 43 bits, so a 64-byte cache block holds 11 entries with 39 bits unused: the pattern table becomes 1K sets, 11-way set-associative.
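The packing arithmetic above can be reproduced as a quick check; every width here (16-bit PC, 5-bit offset, 1K sets, 32-bit pattern, 64-byte blocks) comes from the slide, and the variable names are just for readability.

```python
# Reproduce the slide's entry-packing arithmetic.

PC_BITS, OFFSET_BITS = 16, 5
INDEX_BITS = PC_BITS + OFFSET_BITS        # 21-bit index into the PHT space
SET_BITS = 10                             # 1K sets
TAG_BITS = INDEX_BITS - SET_BITS          # bits left over for the tag
PATTERN_BITS = 32
ENTRY_BITS = TAG_BITS + PATTERN_BITS      # one (tag, pattern) entry
BLOCK_BITS = 64 * 8                       # 64-byte cache block

entries_per_block = BLOCK_BITS // ENTRY_BITS                  # ways per block
unused_bits = BLOCK_BITS - entries_per_block * ENTRY_BITS     # slack per block
```

Running the numbers gives an 11-bit tag, 11 entries per block, and 39 unused bits, matching the figures on slides 16 and 32.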
Slide 33 (backup): Memory Address Calculation
Concatenate the 16-bit PC tag with the 5-bit offset, take the 10 set-index bits, append six zero bits as the block offset, and add the PVStart address to obtain the memory address of the table set.
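A minimal sketch of that address calculation, with two stated assumptions the diagram does not pin down: that the set index is the low 10 bits of the {PC, offset} concatenation, and that PVStart is block-aligned.

```python
# Sketch of the PVTable address calculation from the slide.
# Assumptions: set index = low 10 bits of {PC, offset}; PVStart block-aligned.

SET_BITS, BLOCK_OFFSET_BITS = 10, 6   # 1K sets, 64-byte blocks

def pvtable_address(pv_start, pc16, offset5):
    """Map a (PC, spatial-group offset) pair to the block holding its set."""
    index = ((pc16 << 5) | offset5) & ((1 << SET_BITS) - 1)  # 10-bit set index
    return pv_start + (index << BLOCK_OFFSET_BITS)           # one block per set
```

Since one cache block holds an entire 11-way set, a single memory request at this address fetches the whole set, which is the point of the packing on the previous slide.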
Slide 34 (backup): Increase in Off-Chip Bandwidth – Different L2 Sizes
[Chart: off-chip bandwidth increase across L2 cache sizes.]
Slide 35 (backup): Increased L2 Latency
[Chart: speedup under increased L2 latency.]
Slide 36 (backup): Conclusions
PV stores predictor metadata in the conventional cache hierarchy, reducing dedicated resources while emulating larger, more accurate tables. Example: a virtualized data prefetcher within 1% of the original's performance on average, with space reduced from 60 KB to under 1 KB. Why now? Large caches, CMPs, and the need for larger predictors. Will this work? Metadata locality is intrinsically exploited by caches, and metadata access patterns are predictable. Opportunities: metadata sharing and persistence, application-directed prediction, and predictor adaptation.