
1 Predictor Virtualization. Ioana Burcea*, Stephen Somogyi§, Andreas Moshovos*, Babak Falsafi§#. *University of Toronto, Canada; §Carnegie Mellon University; #École Polytechnique Fédérale de Lausanne. ASPLOS XIII, March 4, 2008.

2 Why Predictors? History Repeats Itself. (Diagram: predictors surround the CPU: branch prediction, prefetching, value prediction, pointer caching, cache replacement predictors.) • Application footprints grow • Predictors need to scale to remain effective

3 Extra Resources: CMPs With Large On-Chip Caches. (Diagram: four cores, each with its own I$ and D$, share an L2 cache of 10s to 100s of MB backed by main memory.)

4 Predictor Virtualization. (Diagram: the same four-core CMP, with predictor tables mapped into the physical memory address space and cached in the shared L2.)

5 Predictor Virtualization (PV) • Emulate large predictor tables • Reduce the resources dedicated to predictor tables

6 Research Contributions • PV: predictor metadata stored in the conventional cache hierarchy • Benefits: emulate larger tables → increased accuracy; fewer dedicated resources • Why now? Large caches / CMPs / need for larger predictors • Will this work? Metadata locality is intrinsically exploited by caches • First step: a virtualized data prefetcher • Performance: within 1% on average • Space: 60KB down to < 1KB • Advantages of virtualization

7 Talk Road Map • PV architecture • PV in action • Virtualized "Spatial Memory Streaming" [ISCA 06]* • Conclusions. *[ISCA 06] S. Somogyi, T. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Spatial Memory Streaming", ISCA 2006.

8 Talk Road Map (section transition: PV architecture) • PV architecture • PV in action • Virtualized "Spatial Memory Streaming" [ISCA 06] • Conclusions

9 PV Architecture. (Diagram: the optimization engine sends a request to the predictor table and receives a prediction; under PV the predictor table is virtualized into the L2 cache and main memory.)

10 PV Architecture. (Diagram: the optimization engine sends its request to a PVProxy, which indexes a small on-chip PVCache; on a miss, the PVProxy fetches entries from the PVTable, which lives in the physical memory address space starting at PVStart and is cached by the L2.)

11 PV: Variable Prediction Latency. (Diagram: in the common case the prediction is served from the PVCache; infrequently a miss is filled from the L2 cache; rarely the request goes all the way to the PVTable in main memory.)
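
A minimal sketch of the lookup flow this diagram implies, written in C. This is not the authors' implementation: the types, the simulator hooks, and the index-to-address mapping (borrowed from the packing slides later in the deck) are assumptions for illustration only.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical types standing in for the structures named on the slide. */
    typedef struct { uint16_t tag; uint32_t pattern; bool valid; } pv_entry_t;

    typedef struct {
        uint64_t pv_start;  /* PVStart: base of the PVTable's reserved physical range */
    } pv_proxy_t;

    /* Assumed simulator hooks (declarations only); not a real API. */
    bool pvcache_lookup(pv_proxy_t *p, uint64_t index, pv_entry_t *out);
    void pvcache_fill(pv_proxy_t *p, uint64_t index, const void *block64);
    bool l2_read_block(uint64_t paddr, void *block64);     /* true on L2 hit       */
    void memory_read_block(uint64_t paddr, void *block64); /* rare off-chip access */

    /* Common case: PVCache hit. Infrequent: miss filled from the L2.
       Rare: the metadata block must come from main memory. */
    bool pv_predict(pv_proxy_t *p, uint64_t index, pv_entry_t *out)
    {
        if (pvcache_lookup(p, index, out))
            return true;                           /* common case: short, fixed latency */

        uint8_t  block[64];                        /* one virtual-table set per 64B block */
        uint64_t paddr = p->pv_start + (index & 0x3FF) * 64;  /* low 10 bits pick the set */
        if (!l2_read_block(paddr, block))          /* infrequent: L2 latency    */
            memory_read_block(paddr, block);       /* rare: main-memory latency */

        pvcache_fill(p, index, block);             /* install the fetched set   */
        return pvcache_lookup(p, index, out);      /* prediction arrives late   */
    }

The point of the sketch is that the prediction interface stays the same while the latency varies with where the metadata happens to be cached.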

12 Metadata Locality • Entry reuse • Temporal: one entry used for multiple predictions • Spatial (can be engineered): one miss amortized by several subsequent hits • Metadata access pattern predictability → predictor metadata prefetching

13 Talk Road Map (section transition: PV in action) • PV architecture • PV in action • Virtualized "Spatial Memory Streaming" [ISCA 06] • Conclusions

14 Spatial Memory Streaming [ISCA 06]. (Diagram: accesses to memory regions are summarized as spatial bit patterns, e.g. 1100000001101… and 1100001010001…) Spatial patterns are stored in a pattern history table (PHT). *[ISCA 06] S. Somogyi, T. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Spatial Memory Streaming", ISCA 2006.
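
To make the bit patterns concrete, here is a toy C sketch of recording one spatial pattern. The 64-byte block and 32-block region sizes match the packing numbers later in the deck; the structure names and the recording policy are illustrative, not SMS's actual hardware.

    #include <stdint.h>

    #define BLOCK_SHIFT   6    /* 64-byte cache blocks              */
    #define REGION_BLOCKS 32   /* blocks per spatial region (group) */

    /* One in-flight spatial generation: which blocks of a region were touched. */
    typedef struct {
        uint64_t region_base;  /* address of the region's first block      */
        uint32_t pattern;      /* one bit per block, e.g. 1100000001101... */
    } spatial_gen_t;

    /* Record an access: set the bit for the block's offset within its region. */
    static void record_access(spatial_gen_t *g, uint64_t addr)
    {
        uint64_t block  = addr >> BLOCK_SHIFT;
        uint32_t offset = (uint32_t)(block & (REGION_BLOCKS - 1));
        g->pattern |= 1u << offset;
    }

When the generation ends, the accumulated pattern is stored in the PHT; on a later trigger access the matching pattern is read back and every set bit becomes a prefetch.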

15 Virtualizing "Spatial Memory Streaming" (SMS). (Diagram: the detector watches the data access stream and extracts patterns; the predictor, on a trigger access, issues prefetches. Virtualization targets the predictor's pattern table, shrinking roughly 60 KB of dedicated storage to about 1 KB.)

16 Virtualizing SMS. (Diagram: the virtual table has 1K sets, 11 ways; the on-chip PVCache keeps only 8 sets, 11 ways. Each entry is an 11-bit tag plus a 32-bit pattern; one set of 11 entries packs into a 64-byte cache block, leaving 39 bits unused.)
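
As a sanity check on those numbers, a small C sketch of the packing; the helper and its exact bit layout are illustrative, the hardware simply lays a table set out across a cache block.

    #include <stdint.h>

    #define ENTRY_BITS        43                   /* 11-bit tag + 32-bit pattern     */
    #define BLOCK_BYTES       64
    #define ENTRIES_PER_BLOCK (BLOCK_BYTES * 8 / ENTRY_BITS)                  /* = 11 */
    #define UNUSED_BITS       (BLOCK_BYTES * 8 - ENTRIES_PER_BLOCK * ENTRY_BITS) /* 39 */

    /* Pack entry i (11-bit tag, 32-bit pattern) into a raw 64-byte block image. */
    static void pack_entry(uint8_t block[BLOCK_BYTES], int i,
                           uint16_t tag11, uint32_t pattern32)
    {
        uint64_t bits  = ((uint64_t)(tag11 & 0x7FFu) << 32) | pattern32;  /* 43 bits */
        int      start = i * ENTRY_BITS;          /* bit position within the block   */

        for (int b = 0; b < ENTRY_BITS; b++) {
            int pos = start + b;
            if ((bits >> b) & 1u)
                block[pos / 8] |= (uint8_t)(1u << (pos % 8));
            else
                block[pos / 8] &= (uint8_t)~(1u << (pos % 8));
        }
    }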

17 Current Implementation • Non-intrusive: virtual table stored in a reserved physical address space; one table per core; caches oblivious to metadata • Options: predictor tables stored in virtual memory; a single, shared table per application; caches aware of metadata
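
A minimal sketch of the "one table per core in a reserved physical range" point above; PV_BASE is a hypothetical address, and the per-core table size follows the 1K-set, 64-byte-block layout of the previous slide.

    #include <stdint.h>

    #define PV_BASE        0x100000000ULL  /* hypothetical reserved physical base       */
    #define PV_TABLE_BYTES (1024u * 64u)   /* 1K sets x 64-byte blocks = 64 KB per core */

    /* Each core's PVTable begins at its own PVStart inside the reserved range. */
    static inline uint64_t pv_start_for_core(unsigned core)
    {
        return PV_BASE + (uint64_t)core * PV_TABLE_BYTES;
    }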

18 Simulation Infrastructure • SimFlex: full-system simulator based on Simics • Base processor configuration: 4-core CMP; 8-wide OoO; 256-entry ROB; L1D/L1I 64KB 4-way set-associative; UL2 8MB 16-way set-associative • Commercial workloads: TPC-C (DB2 and Oracle); TPC-H (Query 1, 2, 16, 17); SPECweb (Apache and Zeus)
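
The same configuration restated as constants, purely for reference; the names are illustrative, not SimFlex parameters.

    /* Base processor configuration from the slide (illustrative names). */
    enum {
        NUM_CORES   = 4,    /* 4-core CMP               */
        ISSUE_WIDTH = 8,    /* 8-wide out-of-order core */
        ROB_ENTRIES = 256,
        L1_SIZE_KB  = 64,   /* each of L1I and L1D      */
        L1_ASSOC    = 4,
        L2_SIZE_MB  = 8,    /* unified, shared L2       */
        L2_ASSOC    = 16
    };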

19–21 Original Prefetcher – Accuracy vs. Predictor Size. (Chart, built up over three slides: L1 read misses as the dedicated predictor table shrinks; lower is better.)

22 Original Prefetcher – Accuracy vs. Predictor Size. Takeaway: small tables diminish prefetching accuracy. (Same chart: L1 read misses; lower is better.)

23 Virtualized Prefetcher – Performance. (Chart: speedup; higher is better. Hardware cost: original prefetcher ~60KB, virtualized prefetcher < 1KB.)

24 Impact on L2 Memory Requests. Dark side: increased L2 memory requests. (Chart: L2 memory request increase; lower is better.)

25 Impact of Virtualization on Off-Chip Bandwidth. Minimal impact on off-chip bandwidth. (Chart: off-chip bandwidth increase; lower is better; annotations note which traffic increase has an indirect vs. a direct impact on performance.)

26 Conclusions • Predictor Virtualization: metadata stored in the conventional cache hierarchy • Benefits: emulate larger tables → increased accuracy; fewer dedicated resources • First step: a virtualized data prefetcher • Performance: within 1% on average • Space: 60KB down to < 1KB • Opportunities: metadata sharing and persistence; application-directed prediction; predictor adaptation

27 Predictor Virtualization. Ioana Burcea* (ioana@eecg.toronto.edu), Stephen Somogyi§, Andreas Moshovos*, Babak Falsafi§#. *University of Toronto, Canada; §Carnegie Mellon University; #École Polytechnique Fédérale de Lausanne. ASPLOS XIII, March 4, 2008.

30 PV – Motivating Trends • Dedicating resources to predictors is hard to justify • Larger predictor tables → increased performance • Chip multiprocessors: space dedicated to predictors scales with the number of processors • Memory hierarchies offer the opportunity: increased capacity, diminishing returns ⇒ Use conventional memory hierarchies to store predictor metadata

31 Virtualizing the Predictor Table. (Diagram: the pattern history table holds tag/pattern pairs; the PC and trigger access address form the index, the tag is matched, and the selected bit pattern, e.g. 1 0 1 1 1 0 1 0 …, drives prefetches.) • PHT stored in physical address space • Multiple PHT entries packed in one memory block → one memory request brings an entire table set

32 Packing Entries in One Cache Block • Index: PC + offset within spatial group • PC → 16 bits • 32 blocks in a spatial group → 5-bit offset → 32-bit spatial pattern • 21-bit index in total • Pattern table: 1K sets → 10 bits to index the table → 11-bit tag • Cache block: 64 bytes → 11 entries per cache block → pattern table is 1K sets, 11-way set-associative. (Diagram: one cache block laid out as tag | pattern | tag | pattern | …, trailing bits unused.)

33 Memory Address Calculation. (Diagram: the 21-bit index is the 16-bit PC concatenated with the 5-bit offset; 10 of those bits select a table set and the remaining 11 bits form the tag. The memory address is the PVStart address plus the set index followed by a 6-bit zero block offset, i.e. PVStart + set × 64.)
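
A C sketch of that calculation; which bits of the 21-bit index select the set is an assumption (the low-order 10 bits here), the rest follows the slide.

    #include <stdint.h>

    #define PC_BITS     16
    #define OFFSET_BITS  5    /* offset within the 32-block spatial group */
    #define SET_BITS    10    /* 1K table sets                            */
    #define BLOCK_SHIFT  6    /* 64-byte blocks -> six zero offset bits   */

    /* Where does the virtualized table set for (PC, offset) live in memory? */
    static uint64_t pv_set_address(uint64_t pv_start, uint64_t pc, uint64_t offset)
    {
        /* 21-bit index = {PC[15:0], offset[4:0]} */
        uint64_t index = ((pc & ((1u << PC_BITS) - 1)) << OFFSET_BITS)
                       | (offset & ((1u << OFFSET_BITS) - 1));

        uint64_t set = index & ((1u << SET_BITS) - 1);  /* assumed: low 10 bits  */
        uint64_t tag = index >> SET_BITS;               /* remaining 11 bits     */
        (void)tag;  /* the tag is stored in the fetched entry, not in the address */

        /* One table set per 64-byte block: address = PVStart + set * 64. */
        return pv_start + (set << BLOCK_SHIFT);
    }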

34 Increase in Off-Chip Bandwidth for Different L2 Sizes. (Chart: off-chip bandwidth increase.)

35 Increased L2 Latency. (Chart: speedup with increased L2 latency.)

36 Conclusions • PV: metadata stored in the conventional cache hierarchy • Benefits: fewer dedicated resources; emulate larger tables → increased accuracy • Example: a virtualized data prefetcher • Performance: within 1% on average • Space: 60KB down to < 1KB • Why now? Large caches / CMPs / need for larger predictors • Will this work? Metadata locality is intrinsically exploited by caches; metadata access patterns are predictable • Opportunities: metadata sharing and persistence; application-directed prediction; predictor adaptation

