Shared Last-Level TLBs for Chip Multiprocessors Abhishek Bhattacharjee Daniel Lustig Margaret Martonosi HPCA 2011 Presented by: Apostolos Kotsiolis CS.


1 Shared Last-Level TLBs for Chip Multiprocessors
Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
HPCA 2011
Presented by: Apostolos Kotsiolis
CS 7123 – Research Seminar

2 Translation Lookaside Buffer

3 Contribution
◦ SLL TLB design explored for the first time
◦ Analyze SLL TLB benefits for parallel programs
◦ Analyze multiprogrammed workloads consisting of sequential applications

4 Previous and Related Work
Private Multilevel TLB Hierarchies
◦ Intel i7, AMD K7-K8-K10, SPARC64-III
◦ No sharing between cores
◦ Waste of resources
Inter-Core Cooperative Prefetching
◦ Two types of predictable misses:
◦ Inter-Core Shared (ICS) → Leader-Follower Prefetching
◦ Inter-Core Predictable Stride (ICPS) → Distance-Based Cross-Core Prefetching

5 Shared Last-Level TLBs
◦ Exploit inter-core sharing in parallel programs
◦ Flexible regarding where entries can be placed
◦ Both parallel and sequential workloads benefit
◦ Greater hit rate
◦ CPU performance boosted
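The inter-core sharing argument can be illustrated with a toy model. The sketch below is not the design evaluated in the paper: the `LRUTLB` class, the `page_walk` stand-in, the fully associative LRU organization, and the choice to give the shared TLB the combined capacity of the private L2 TLBs are all assumptions made for illustration. It only counts how many page table walks a trace needs when two cores of the same process touch the same pages.

```python
from collections import OrderedDict


class LRUTLB:
    """Toy fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page -> physical frame

    def lookup(self, vpage):
        if vpage in self.entries:
            self.entries.move_to_end(vpage)  # mark as most recently used
            return self.entries[vpage]
        return None

    def insert(self, vpage, frame):
        self.entries[vpage] = frame
        self.entries.move_to_end(vpage)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def page_walk(vpage):
    # Stand-in for a real page table walk; the mapping is made up.
    return vpage + 1000


def count_walks(trace, num_cores, l1_size, l2_size, shared):
    """Count page walks for a trace of (core, virtual page) accesses.

    Assumes all cores run threads of one process, so a translation
    fetched by one core is valid for the others.
    """
    l1 = [LRUTLB(l1_size) for _ in range(num_cores)]
    if shared:
        # Assumption: the shared TLB gets the summed private L2 capacity.
        sll = LRUTLB(l2_size * num_cores)
        l2 = [sll] * num_cores
    else:
        l2 = [LRUTLB(l2_size) for _ in range(num_cores)]

    walks = 0
    for core, vpage in trace:
        if l1[core].lookup(vpage) is not None:
            continue  # L1 TLB hit
        frame = l2[core].lookup(vpage)
        if frame is None:
            frame = page_walk(vpage)  # last-level miss
            walks += 1
            l2[core].insert(vpage, frame)
        l1[core].insert(vpage, frame)
    return walks


# Core 0 touches pages 0..7, then core 1 touches the same pages.
trace = [(0, p) for p in range(8)] + [(1, p) for p in range(8)]
private_walks = count_walks(trace, 2, l1_size=2, l2_size=8, shared=False)
shared_walks = count_walks(trace, 2, l1_size=2, l2_size=8, shared=True)
print(private_walks, shared_walks)
```

In this contrived trace the private configuration walks the page table once per core per page, while the shared configuration lets core 1 reuse the entries core 0 brought in.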

6 Shared Last-Level TLBs

7 Shared Last-Level TLBs with simple Stride Prefetching
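As a rough illustration of stride prefetching in general, the sketch below fills a TLB entry on a miss and also brings in translations for nearby virtual pages. The function name `access_with_stride_prefetch`, the stride set `(1, 2)`, the dictionary used as an unbounded TLB, and the made-up page table are all assumptions for this example and are not taken from the slides or the paper.

```python
def access_with_stride_prefetch(tlb, vpage, page_table, strides=(1, 2)):
    """Return True on a TLB hit, False on a miss.

    On a miss, fill the missing entry and prefetch the entries at
    vpage + s for each stride s (hypothetical policy; capacity and
    replacement are ignored to keep the sketch short).
    """
    if vpage in tlb:
        return True
    tlb[vpage] = page_table[vpage]  # demand fill after the page walk
    for s in strides:
        neighbor = vpage + s
        # Only prefetch pages that are mapped and not already cached.
        if neighbor in page_table and neighbor not in tlb:
            tlb[neighbor] = page_table[neighbor]
    return False


# Made-up page table: virtual pages 0..7 map to frames 1000..1007.
page_table = {p: p + 1000 for p in range(8)}
tlb = {}
hits = [access_with_stride_prefetch(tlb, p, page_table) for p in range(8)]
misses = hits.count(False)
print(hits, misses)
```

For a sequential scan like this one, only every third access misses, because each miss also installs the next two translations.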

8 Methodology
Two distinct evaluation sets:
◦ Parallel applications
◦ Different sequential application on each core

9 Methodology Benchmarks

10 SLL TLBs: Parallel Workload Results SLL TLBs versus Private L2 TLBs

11 SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching

12 SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching

13 SLL TLBs: Parallel Workload Results SLL TLBs with Simple Stride Prefetching

14 SLL TLBs: Parallel Workload Results SLL TLBs at Higher Core Counts

15 SLL TLBs: Parallel Workload Results Performance Analysis

16 SLL TLBs: Multiprogrammed Workload Results Multiprogrammed Workloads with One Application Pinned per Core

17 SLL TLBs: Multiprogrammed Workload Results Performance Analysis

18 Conclusion-Benefits:
On parallel workloads:
◦ Eliminate 7-79% of L1 TLB misses by exploiting inter-core sharing in parallel programs
◦ Outperform conventional per-core private L2 TLBs by an average of 27%
◦ Improve CPI by up to 0.25
On multiprogrammed sequential workloads:
◦ Improve over private L2 TLBs by an average of 21%
◦ Improve CPI by up to 0.4

19 Thank You! Questions??

