The Auction: Optimizing Banks Usage in Non-Uniform Cache Architectures Javier Lira ψ Carlos Molina ψ,ф Antonio González ψ,λ λ Intel Barcelona Research.


1 The Auction: Optimizing Banks Usage in Non-Uniform Cache Architectures
Javier Lira ψ, Carlos Molina ψ,ф, Antonio González ψ,λ
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain (antonio.gonzalez@intel.com)
ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain (carlos.molina@urv.net)
ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain (javier.lira@ac.upc.edu)
ICS 2010, Tsukuba (Japan) – June 2, 2010

2 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

3 Introduction
CMPs incorporate large, shared last-level caches (LLCs).
Access latency in large caches is dominated by wire delays.
Traditional caches are no longer feasible as the LLC in CMPs.
[Figure: Intel® Nehalem and IBM® Power7, labelled 40-45%]

4 Non-Uniform Cache Architecture
NUCA divides a large cache into smaller and faster banks.
Cache access latency consists of the routing latency plus the bank access latency.
Banks close to the cache controller have lower latencies than banks farther away.
[Figure: processor and NUCA banks]
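
The latency decomposition on this slide can be sketched in a few lines. This is an illustrative model only: it assumes a round trip over a mesh where each hop costs one router delay plus one wire delay, and it borrows the 4-cycle bank latency and 1-cycle router and wire delays from the Methodology slide. The function name and the hop counts are hypothetical.

```python
def nuca_access_latency(hops, bank_latency=4, router_delay=1, wire_delay=1):
    """Round-trip cycles: routing to the bank and back, plus one bank access."""
    routing = 2 * hops * (router_delay + wire_delay)
    return routing + bank_latency


near = nuca_access_latency(hops=1)  # bank next to the cache controller
far = nuca_access_latency(hops=8)   # bank far from the cache controller
print(near, far)
```

Under these assumptions the routing term quickly dominates the fixed bank access time, which is the non-uniformity the slide describes.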

5 Motivation
Banks work independently.
The most frequently accessed data concentrates in a few banks.
In case of a replacement, a choice that is good for one particular bank could be a poor one when the whole NUCA is considered.
[Figure: NUCA floorplan with Core 0 to Core 7]

6 The Auction
A collaborative replacement technique that finds the most appropriate data to evict, not only from a particular bank but from the whole NUCA cache.

7 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

8 Methodology
Simulation tools: Simics + GEMS, CACTI v6.0
Two scenarios: multi-programmed (mix of SPEC CPU2006) and parallel applications (PARSEC)
Number of cores: 8 (UltraSPARC IIIi)
Frequency: 1.5 GHz
Main memory size: 4 GBytes
Memory bandwidth: 512 Bytes/cycle
Private L1 caches: 8 x 32 KBytes, 2-way
Shared L2 NUCA cache: 8 MBytes, 256 banks
NUCA bank: 32 KBytes, 8-way
L1 cache latency: 3 cycles
NUCA bank latency: 4 cycles
Router delay: 1 cycle
On-chip wire delay: 1 cycle
Main memory latency: 250 cycles (from core)
Auction time-out: 150 cycles
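
As a quick consistency check on the configuration above, the bank count and bank size should multiply out to the stated shared L2 capacity. The variable names below are illustrative; only the numbers come from the slide.

```python
KIB = 1024
MIB = 1024 * KIB

num_banks = 256
bank_size = 32 * KIB       # each NUCA bank
l2_total = num_banks * bank_size

num_cores = 8
l1_size = 32 * KIB         # private L1 per core
l1_total = num_cores * l1_size

print(l2_total // MIB, "MiB shared L2")
print(l2_total // l1_total, "x aggregate L1 capacity")
```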

9 Baseline NUCA cache architecture
CMP-DNUCA
8 cores
256 banks
16-way bank-set associativity (8 local + 8 central)
LRU in the bank
Zero-copy in the NUCA
[2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04

10 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

11 The Auction
“The Auction is a collaborative replacement technique that finds the most appropriate data to evict, not only from a particular bank but from the whole NUCA cache.”
Auction participants:
Owner: Owns the item, but wants to sell it. This is the bank where the replacement happens.
Bidder: Potential owners of the auctioned item. These are the other banks from the bankset.
Controller: Manages the current auction. New component: auction slots.

12 The Auction
Step 1: Owner starts the auction
Step 2: Bids for the auctioned item
Step 3: Item is sold!
[Figure: auction slots]
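
The three steps can be sketched as a small software model of the controller and its auction slots. This is a minimal sketch, not the hardware design from the talk: the class and method names, the number of slots, and the "first bidder is served" rule are assumptions made for illustration. The 150-cycle time-out is the value from the Methodology slide.

```python
from dataclasses import dataclass, field


@dataclass
class AuctionSlot:
    owner: int                 # bank where the replacement happened
    started_at: int            # cycle at which the auction began
    bids: list = field(default_factory=list)  # bidder bank ids, arrival order


class AuctionController:
    def __init__(self, num_slots=2, timeout=150):
        self.num_slots = num_slots  # assumed slot count, not from the slides
        self.timeout = timeout      # auction time-out in cycles
        self.slots = {}             # auctioned item tag -> AuctionSlot

    def start(self, owner, item, now):
        """Step 1: the owner opens an auction if a slot is free."""
        if len(self.slots) >= self.num_slots:
            return False
        self.slots[item] = AuctionSlot(owner, now)
        return True

    def bid(self, item, bidder):
        """Step 2: a bank other than the owner bids for the item."""
        slot = self.slots.get(item)
        if slot is not None and bidder != slot.owner:
            slot.bids.append(bidder)

    def close(self, item, now):
        """Step 3: sell to the first bidder, expire on time-out, else wait."""
        slot = self.slots[item]
        if slot.bids:
            del self.slots[item]
            return ("sold", slot.bids[0])
        if now - slot.started_at >= self.timeout:
            del self.slots[item]
            return ("expired", None)
        return ("pending", None)


ctrl = AuctionController()
ctrl.start(owner=3, item="0xA0", now=0)
ctrl.bid("0xA0", bidder=7)
sold = ctrl.close("0xA0", now=10)
ctrl.start(owner=3, item="0xB0", now=20)
pending = ctrl.close("0xB0", now=100)
expired = ctrl.close("0xB0", now=170)
```

In this model an expired auction simply frees its slot; what happens to the unsold item afterwards is left out.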

13 First Auction Approach: Base
Fills the gaps left by invalidating replicated data.
Owner: Invites all other banks from the bankset.
Bidder: Bids only if accepting the item causes NO new replacement.
Controller: First bid wins, but central banks are prioritised.
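
One possible reading of the Base rules, sketched as two small functions. The interpretation is an assumption: a bank bids only when it has a free way for the item, and among the bids received the earliest one from a central bank wins, falling back to the earliest bid overall. Function names and the bid tuple layout are hypothetical.

```python
def base_should_bid(free_ways):
    """A bank bids only if storing the item evicts nothing."""
    return free_ways > 0


def base_winner(bids):
    """bids: list of (bank_id, is_central) tuples in arrival order."""
    for bank_id, is_central in bids:
        if is_central:
            return bank_id            # earliest central bidder takes priority
    return bids[0][0] if bids else None  # otherwise the first bid wins


print(base_winner([(4, False), (9, True), (11, True)]))
```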

14 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

15 Performance
Significant benefits with large working sets.
Good performance in both scenarios.
Blindly relocating data could be harmful.
The Auction outperforms prior proposals.

16 Energy consumption
Leakage dominates the energy consumption.
The Auction reduces the overall energy consumed.

17 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

18 Enhanced Auction Approaches
Almost half of the auctions finished without receiving bids.
We need a metric to measure the quality of data.
By increasing auction accuracy, the controller has more options to decide the best destination, and auctions with no bids are reduced.
Auction-based global replacement policies.

19 Bank Usage Imbalance
Banks bid according to their usage rate.
Metric: capacity replacements per cache set.
Owner: Invites all other banks from the bankset.
Bidder: Bids if it is less frequently “used” than the owner.
Controller: The least “used” bidder wins.
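
A minimal sketch of this policy, assuming "usage" is the count of capacity replacements a bank has seen for the cache set in question. The function name and the dictionary layout are hypothetical; ties go to whichever bank appears first in the dictionary, which the slides do not specify.

```python
def usage_winner(owner_replacements, candidates):
    """candidates: dict of bank_id -> capacity replacements for this cache set."""
    # Only banks that are less "used" than the owner place a bid.
    bids = {bank: count for bank, count in candidates.items()
            if count < owner_replacements}
    if not bids:
        return None                   # auction ends with no bids
    return min(bids, key=bids.get)    # the least "used" bidder wins


print(usage_winner(10, {1: 12, 2: 4, 3: 7}))
```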

20 Prioritising most accessed data
Keeps the most accessed data in the NUCA cache.
Metric: access counter per line.
Owner: Invites all other banks from the bankset.
Bidder: Bids if its LRU line has been accessed less than the auctioned item.
Controller: The bidder with the least accessed LRU line wins.
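
This policy can be sketched with per-line access counters. The sketch assumes each bidder exposes only the tag and access count of its LRU line for the relevant set; the function name and data layout are hypothetical, and where the item is placed in the winner's recency order is deliberately left out.

```python
def access_auction(item_accesses, lru_lines):
    """lru_lines: dict of bank_id -> (tag, access_count) for that bank's LRU line.

    Returns (winning_bank, displaced_tag), or None if nobody bids.
    """
    # A bank bids only if its LRU line was accessed less than the auctioned item.
    bids = {bank: line for bank, line in lru_lines.items()
            if line[1] < item_accesses}
    if not bids:
        return None
    winner = min(bids, key=lambda bank: bids[bank][1])
    return winner, bids[winner][0]


print(access_auction(5, {1: ("A", 9), 2: ("B", 2), 3: ("C", 4)}))
```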

21 Auction accuracy
Fewer auctions finish with no bids.
Controller decisions are more accurate.

22 Auction network
At the cost of increased network traffic.

23 Performance
By increasing auction accuracy, we make better replacement decisions.
Network contention is a key constraint.

24 Outline: Introduction, Methodology, The Auction, Results, Enhanced Auction Approaches, Conclusions

25 Conclusions
The decentralized nature of NUCA makes replacement policies ineffective.
The Auction finds the most appropriate data to evict, not only from a particular bank but from the whole NUCA cache.
The Auction adapts to the program behaviour and relocates data only if it is worthwhile.
With auction-based replacement policies, the baseline NUCA improved its performance by 8% and reduced its energy consumption by 4%.

26 The Auction: Optimizing Banks Usage in Non-Uniform Cache Architectures Questions?

27 More results (1)

28 More results (2)

29 More results (3)

