Shared Last-Level TLBs for Chip Multiprocessors
Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
HPCA 2011
Presented by: Apostolos Kotsiolis
CS 7123 – Research Seminar
Translation Lookaside Buffer
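This slide introduces the TLB itself: a small cache of virtual-to-physical page translations probed on every memory access, where a miss forces a page-table walk. As background, here is a minimal software sketch of that behavior (not from the paper; the class name, LRU policy, and 4 KB page size are illustrative assumptions):

```python
# Minimal sketch of a TLB model: caches virtual-page -> physical-frame mappings
# and falls back to a page-table walk (modeled as a dict lookup) on a miss.
PAGE_SIZE = 4096  # assumed 4 KB pages

class TLB:
    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.entries = {}   # virtual page number -> physical frame number
        self.order = []     # LRU order, most recently used last

    def lookup(self, vaddr, page_table):
        vpn, offset = vaddr // PAGE_SIZE, vaddr % PAGE_SIZE
        if vpn in self.entries:                 # TLB hit: translation is cached
            self.order.remove(vpn)
            self.order.append(vpn)
            return self.entries[vpn] * PAGE_SIZE + offset, True
        pfn = page_table[vpn]                   # TLB miss: walk the page table
        if len(self.entries) >= self.num_entries:
            victim = self.order.pop(0)          # evict least recently used entry
            del self.entries[victim]
        self.entries[vpn] = pfn                 # fill the TLB with the new entry
        self.order.append(vpn)
        return pfn * PAGE_SIZE + offset, False
```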
Contribution
SLL TLB design explored for the first time
Analyze SLL TLB benefits for parallel programs
Analyze multiprogrammed workloads consisting of sequential applications
Previous and Related Work
Private multilevel TLB hierarchies
◦ Intel i7, AMD K7/K8/K10, SPARC64-III
◦ No sharing between cores
◦ Wastes resources
Inter-core cooperative prefetching, which targets two types of predictable misses:
◦ Inter-Core Shared (ICS) misses, handled by Leader-Follower Prefetching
◦ Inter-Core Predictable Stride (ICPS) misses, handled by Distance-Based Cross-Core Prefetching
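To make the leader-follower idea from this related work concrete, here is a hedged conceptual sketch (not the cited hardware design; buffer size and function names are assumptions): when one core misses on a shared page, the translation is pushed into the other cores' prefetch buffers so their later accesses hit.

```python
# Conceptual model of leader-follower prefetching for inter-core shared (ICS) misses.
from collections import deque

class Core:
    def __init__(self, buffer_size=16):
        self.prefetch_buffer = deque(maxlen=buffer_size)  # recently pushed VPNs

def on_l1_tlb_miss(leader, all_cores, vpn):
    # The leader resolves its miss normally; the virtual page number is then
    # broadcast into the follower cores' prefetch buffers.
    for core in all_cores:
        if core is not leader and vpn not in core.prefetch_buffer:
            core.prefetch_buffer.append(vpn)
```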
Shared Last-Level TLBs
Exploit inter-core sharing in parallel programs
Flexible regarding where entries can be placed
Benefit both parallel and sequential workloads
Higher hit rate
Improved CPU performance
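The organization behind these bullets is a single last-level TLB shared by all cores, backing each core's private L1 TLB, so an entry filled by one core can satisfy another core's later miss. The sketch below models that lookup path; the capacities, LRU policy, and fill-on-miss policy are assumptions for illustration, not the paper's exact hardware parameters.

```python
# Sketch of a shared last-level (SLL) TLB behind per-core private L1 TLBs.
class SimpleTLB:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # vpn -> pfn
        self.order = []     # LRU order, most recently used last

    def probe(self, vpn):
        if vpn in self.entries:
            self.order.remove(vpn)
            self.order.append(vpn)
            return self.entries[vpn]
        return None

    def fill(self, vpn, pfn):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            del self.entries[self.order.pop(0)]   # evict LRU entry
        self.entries[vpn] = pfn
        if vpn in self.order:
            self.order.remove(vpn)
        self.order.append(vpn)

class SllTlbSystem:
    def __init__(self, num_cores, l1_size=64, sll_size=512):
        self.l1 = [SimpleTLB(l1_size) for _ in range(num_cores)]
        self.sll = SimpleTLB(sll_size)   # one last-level TLB shared by all cores

    def translate(self, core_id, vpn, page_table):
        pfn = self.l1[core_id].probe(vpn)
        if pfn is not None:
            return pfn                    # private L1 hit
        pfn = self.sll.probe(vpn)
        if pfn is not None:               # SLL hit, possibly on another core's fill
            self.l1[core_id].fill(vpn, pfn)
            return pfn
        pfn = page_table[vpn]             # miss everywhere: page-table walk
        self.sll.fill(vpn, pfn)
        self.l1[core_id].fill(vpn, pfn)
        return pfn
```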
Shared Last-Level TLBs
Shared Last-Level TLBs with Simple Stride Prefetching
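This slide augments the shared TLB with simple stride prefetching: on an SLL miss, translations for nearby virtual pages are also fetched and filled into the shared TLB. The sketch below builds on the SimpleTLB model above; the specific strides (+1 and +2) are an assumption for illustration rather than the paper's exact configuration.

```python
# Hedged sketch of stride prefetching layered on the shared last-level TLB.
ASSUMED_STRIDES = (1, 2)   # illustrative choice of prefetch strides

def sll_miss_with_prefetch(sll, vpn, page_table):
    pfn = page_table[vpn]            # demand page-table walk for the missing page
    sll.fill(vpn, pfn)
    for stride in ASSUMED_STRIDES:   # also fill translations for strided neighbors
        neighbor = vpn + stride
        if neighbor in page_table:
            sll.fill(neighbor, page_table[neighbor])
    return pfn
```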
Methodology
Two distinct evaluation sets:
◦ Parallel applications
◦ A different sequential application on each core
Methodology Benchmarks
SLL TLBs: Parallel Workload Results SLL TLBs versus Private L2 TLBs
SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs with Simple Stride Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs at Higher Core Counts
SLL TLBs: Parallel Workload Results Performance Analysis
SLL TLBs: Multiprogrammed Workload Results Multiprogrammed Workloads with One Application Pinned per Core
SLL TLBs: Multiprogrammed Workload Results Performance Analysis
Conclusion: Benefits
On parallel workloads:
◦ Eliminate 7-79% of L1 TLB misses by exploiting parallel programs' inter-core sharing
◦ Outperform conventional per-core private L2 TLBs by an average of 27%
◦ Improve CPI by up to 0.25
On multiprogrammed sequential workloads:
◦ Improve over private L2 TLBs by an average of 21%
◦ Improve CPI by up to 0.4
Thank You! Questions??