Shared Last-Level TLBs for Chip Multiprocessors
Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
HPCA 2011
Presented by: Apostolos Kotsiolis
CS 7123 – Research Seminar
Translation Lookaside Buffer
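This slide introduces the TLB itself: a small cache of virtual-to-physical page translations probed on every memory access, where a miss forces a page-table walk. As background, here is a minimal software sketch of that behavior (not from the paper; the class name, LRU policy, and 4 KB page size are illustrative assumptions):

```python
# Minimal sketch of a TLB model: caches virtual-page -> physical-frame mappings
# and falls back to a page-table walk (modeled as a dict lookup) on a miss.
PAGE_SIZE = 4096  # assumed 4 KB pages

class TLB:
    def __init__(self, num_entries=64):
        self.num_entries = num_entries
        self.entries = {}   # virtual page number -> physical frame number
        self.order = []     # LRU order, most recently used last

    def lookup(self, vaddr, page_table):
        vpn, offset = vaddr // PAGE_SIZE, vaddr % PAGE_SIZE
        if vpn in self.entries:                 # TLB hit: translation is cached
            self.order.remove(vpn)
            self.order.append(vpn)
            return self.entries[vpn] * PAGE_SIZE + offset, True
        pfn = page_table[vpn]                   # TLB miss: walk the page table
        if len(self.entries) >= self.num_entries:
            victim = self.order.pop(0)          # evict least recently used entry
            del self.entries[victim]
        self.entries[vpn] = pfn                 # fill the TLB with the new entry
        self.order.append(vpn)
        return pfn * PAGE_SIZE + offset, False
```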
Contribution
SLL TLB design explored for the first time
Analyze SLL TLB benefits for parallel programs
Analyze multiprogrammed workloads consisting of sequential applications
Previous and Related Work
Private multilevel TLB hierarchies
◦ Intel i7, AMD K7/K8/K10, SPARC64-III
◦ No sharing between cores
◦ Wastes resources
Inter-core cooperative prefetching, which targets two types of predictable misses:
◦ Inter-Core Shared (ICS) misses, handled by Leader-Follower Prefetching
◦ Inter-Core Predictable Stride (ICPS) misses, handled by Distance-Based Cross-Core Prefetching
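To make the leader-follower idea from this related work concrete, here is a hedged conceptual sketch (not the cited hardware design; buffer size and function names are assumptions): when one core misses on a shared page, the translation is pushed into the other cores' prefetch buffers so their later accesses hit.

```python
# Conceptual model of leader-follower prefetching for inter-core shared (ICS) misses.
from collections import deque

class Core:
    def __init__(self, buffer_size=16):
        self.prefetch_buffer = deque(maxlen=buffer_size)  # recently pushed VPNs

def on_l1_tlb_miss(leader, all_cores, vpn):
    # The leader resolves its miss normally; the virtual page number is then
    # broadcast into the follower cores' prefetch buffers.
    for core in all_cores:
        if core is not leader and vpn not in core.prefetch_buffer:
            core.prefetch_buffer.append(vpn)
```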
Shared Last-Level TLBs
Exploit inter-core sharing in parallel programs
Flexible regarding where entries can be placed
Benefit both parallel and sequential workloads
Higher hit rate
Improved CPU performance
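The organization behind these bullets is a single last-level TLB shared by all cores, backing each core's private L1 TLB, so an entry filled by one core can satisfy another core's later miss. The sketch below models that lookup path; the capacities, LRU policy, and fill-on-miss policy are assumptions for illustration, not the paper's exact hardware parameters.

```python
# Sketch of a shared last-level (SLL) TLB behind per-core private L1 TLBs.
class SimpleTLB:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # vpn -> pfn
        self.order = []     # LRU order, most recently used last

    def probe(self, vpn):
        if vpn in self.entries:
            self.order.remove(vpn)
            self.order.append(vpn)
            return self.entries[vpn]
        return None

    def fill(self, vpn, pfn):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            del self.entries[self.order.pop(0)]   # evict LRU entry
        self.entries[vpn] = pfn
        if vpn in self.order:
            self.order.remove(vpn)
        self.order.append(vpn)

class SllTlbSystem:
    def __init__(self, num_cores, l1_size=64, sll_size=512):
        self.l1 = [SimpleTLB(l1_size) for _ in range(num_cores)]
        self.sll = SimpleTLB(sll_size)   # one last-level TLB shared by all cores

    def translate(self, core_id, vpn, page_table):
        pfn = self.l1[core_id].probe(vpn)
        if pfn is not None:
            return pfn                    # private L1 hit
        pfn = self.sll.probe(vpn)
        if pfn is not None:               # SLL hit, possibly on another core's fill
            self.l1[core_id].fill(vpn, pfn)
            return pfn
        pfn = page_table[vpn]             # miss everywhere: page-table walk
        self.sll.fill(vpn, pfn)
        self.l1[core_id].fill(vpn, pfn)
        return pfn
```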
Shared Last-Level TLBs
Shared Last-Level TLBs with Simple Stride Prefetching
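This slide augments the shared TLB with simple stride prefetching: on an SLL miss, translations for nearby virtual pages are also fetched and filled into the shared TLB. The sketch below builds on the SimpleTLB model above; the specific strides (+1 and +2) are an assumption for illustration rather than the paper's exact configuration.

```python
# Hedged sketch of stride prefetching layered on the shared last-level TLB.
ASSUMED_STRIDES = (1, 2)   # illustrative choice of prefetch strides

def sll_miss_with_prefetch(sll, vpn, page_table):
    pfn = page_table[vpn]            # demand page-table walk for the missing page
    sll.fill(vpn, pfn)
    for stride in ASSUMED_STRIDES:   # also fill translations for strided neighbors
        neighbor = vpn + stride
        if neighbor in page_table:
            sll.fill(neighbor, page_table[neighbor])
    return pfn
```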
Methodology
Two distinct evaluation sets:
◦ Parallel applications
◦ A different sequential application on each core
Methodology Benchmarks
SLL TLBs: Parallel Workload Results SLL TLBs versus Private L2 TLBs
SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs versus ICC Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs with Simple Stride Prefetching
SLL TLBs: Parallel Workload Results SLL TLBs at Higher Core Counts
SLL TLBs: Parallel Workload Results Performance Analysis
SLL TLBs: Multiprogrammed Workload Results Multiprogrammed Workloads with One Application Pinned per Core
SLL TLBs: Multiprogrammed Workload Results Performance Analysis
Conclusion: Benefits
On parallel workloads:
◦ Eliminate 7-79% of L1 TLB misses by exploiting parallel programs' inter-core sharing
◦ Outperform conventional per-core private L2 TLBs by an average of 27%
◦ Improve CPI by up to 0.25
On multiprogrammed sequential workloads:
◦ Improve over private L2 TLBs by an average of 21%
◦ Improve CPI by up to 0.4
Thank You! Questions??