Paging for Multi-Core Shared Caches
Alejandro López-Ortiz, Alejandro Salinger
ITCS, January 8th, 2012


3 Multi-Core challenges
Access to data is a key factor
Cache efficiency is determinant
– Algorithms
– Schedulers
– Paging strategies
Extensively studied for sequential case
Almost no previous theory for multi-core case

4 Agenda
Sequential paging
Multi-Core paging
Natural online strategies
The offline problem
Conclusions and open problems

5 Sequential Paging
[Diagram: slow memory, cache of size K, and a stream of page requests … p6 p3 p2 p4 p4 p2 p10 p11 p5 p4 …]
Page request: is p_i in the cache?
– Yes: do nothing (hit)
– No: fetch p_i from slow memory, evict one page from cache (fault)
Goal: minimize number of faults
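The hit/fault rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `count_faults`, the cache size `k=3`, and the two eviction policies (LRU and FIFO) are assumptions chosen for the example, and the request list is the page numbers shown on the slide read left to right.

```python
from collections import OrderedDict


def count_faults(requests, k, policy="lru"):
    """Serve `requests` with a cache of `k` pages and return the number of faults.

    policy="lru" evicts the least recently used page; policy="fifo" evicts the
    page that has been in the cache longest. Both are illustrative choices.
    """
    if k < 1:
        raise ValueError("cache size must be at least 1")
    if policy not in ("lru", "fifo"):
        raise ValueError("unknown policy")
    cache = OrderedDict()  # pages ordered from next eviction victim to last
    faults = 0
    for page in requests:
        if page in cache:
            # Hit: nothing is fetched; LRU only refreshes the page's recency.
            if policy == "lru":
                cache.move_to_end(page)
            continue
        # Fault: fetch the page, evicting one page if the cache is full.
        faults += 1
        if len(cache) >= k:
            cache.popitem(last=False)
        cache[page] = True
    return faults


# Page numbers shown on the slide, read left to right (an assumption).
requests = [6, 3, 2, 4, 4, 2, 10, 11, 5, 4]
print(count_faults(requests, k=3, policy="lru"))
print(count_faults(requests, k=3, policy="fifo"))
```

The only difference between the two policies is whether a hit moves the page to the safe end of the ordering, which is why a single `OrderedDict` can serve both.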

6 Sequential Paging

7 Multi-Core Paging
[Diagram: Core 1, Core 2, Core 3, Core 4; L2/L3 cache; RAM]

8 Multi-Core Paging
t:  1 2 3 4 5 6 7 8 9 10 11 12
R1: p2 p8 p1 p4 p3 p4 p10 p5 …
R2: p9 p1 ___ p8 p2 p1 p1 p4 p7 …
R3: p3 p18 p17 p8 p2 p3 p2 p9 …
(A second frame shows the same sequences with R2 drawn without the gap: p9 p1 p8 p2 p1 p1 p4 p7 …)

9 Related Models
Multiple applications or threads
Multi-Core model [Hassidim, ICS ’10]
– Makespan
– LRU is not competitive
– Scheduling
Our model:
– No scheduling of requests
– Separates scheduling and paging
– Minimize faults

10 Natural Strategies
Share the cache
– Eviction policy
Partition the cache among cores
– Partition function (static, dynamic)
– Eviction policy
Examples:
– Shared-LRU
– Optimal Static Partition with LRU
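The two example strategies can be compared with a small simulation. This is a sketch under stated assumptions rather than the authors' formal model: the fault delay `tau` (extra steps a core waits after a fault), serving the cores in index order within a time step, and the names `shared_lru_faults` and `best_static_partition` are all introduced here for illustration. The static partition is found by brute force, which is only reasonable for tiny inputs.

```python
from collections import OrderedDict
from itertools import product


def lru_faults(seq, k):
    """Faults of sequential LRU on one sequence with k >= 1 cache cells."""
    cache, faults = OrderedDict(), 0
    for page in seq:
        if page in cache:
            cache.move_to_end(page)
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)
            cache[page] = True
    return faults


def shared_lru_faults(seqs, K, tau=0):
    """Per-core fault counts when all cores share one LRU cache of K cells.

    Assumed model: in each time step every core that is not waiting issues
    its next request (cores served in index order); a fault makes the core
    wait `tau` further steps.
    """
    cache = OrderedDict()
    pos = [0] * len(seqs)
    stall = [0] * len(seqs)
    faults = [0] * len(seqs)
    while any(pos[i] < len(s) for i, s in enumerate(seqs)):
        for i, seq in enumerate(seqs):
            if stall[i] > 0:  # still waiting for an earlier fetch
                stall[i] -= 1
                continue
            if pos[i] == len(seq):  # this core has finished
                continue
            page = seq[pos[i]]
            pos[i] += 1
            if page in cache:
                cache.move_to_end(page)
            else:
                faults[i] += 1
                stall[i] = tau
                if len(cache) >= K:
                    cache.popitem(last=False)
                cache[page] = True
    return faults


def best_static_partition(seqs, K):
    """Try every split of K cells into len(seqs) positive parts (brute force)."""
    best = None
    for parts in product(range(1, K + 1), repeat=len(seqs)):
        if sum(parts) != K:
            continue
        total = sum(lru_faults(s, k) for s, k in zip(seqs, parts))
        if best is None or total < best[0]:
            best = (total, parts)
    return best


# Two cores whose working sets grow at different times (hypothetical input).
a = ["a1", "a2", "a3"] * 2 + ["a1"] * 6
b = ["b1"] * 6 + ["b1", "b2", "b3"] * 2
print(sum(shared_lru_faults([a, b], K=4)), best_static_partition([a, b], K=4))
```

With a static partition each core sees a private cache, so the fault delay does not affect its fault count and `lru_faults` can ignore `tau`. In the shared case the delay changes how the sequences interleave, which is why it has to be simulated step by step.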

11 Partition vs. Shared
For any online dynamic partition that changes o(n) times
Partitions that don’t change enough are not competitive

12 Shared strategies
The same applies to FIFO, CLOCK, FWF

13 Proof idea
Faults LRU ≥ n/2
Obs: Furthest-In-The-Future is not optimal

14 The Offline Problem

15 PARTIAL-INDIVIDUAL-FAULTS (PIF):
[Figure: request sequences laid out over time steps 1 to 20, with gaps drawn as __]
E.g. At t=18, ?

16 PARTIAL-INDIVIDUAL-FAULTS (PIF):
Theorem: PIF is NP-complete
Optimization version (MAX-PIF): given an instance of PIF, maximize the number of sequences that fault within the given bound
Theorem: MAX-PIF is APX-hard
Unless P=NP, there is no PTAS for MAX-PIF

17 PIF vs. Min Faults

18 The Offline Problem
Offline algorithm can align sequences properly by means of faults
Algorithm could “force faults” for this sake
[Figure: cache contents and two request sequences (p1 p5 p4 p5 p1 p4 p6 p9 … and p2 p3 p3 p2 p8 p8 p3 …) shown in two panels, "Regular execution" and "Forcing a fault on p1", with ___ marking gaps in the timeline]

19 The Offline Problem
However, this has no advantage over an honest offline algorithm
Theorem: Let A be an offline algorithm that forces faults. There exists an offline algorithm A’ such that for all disjoint R, A(R) = A’(R)

20 The Offline Problem

21 Conclusions
Multi-core paging is significantly different from sequential paging
Traditional paging strategies are not competitive
Serving a set of requests while limiting faults in each sequence is hard
Multi-core paging is in P when number of cores is constant

22 Open Problems
What are good online strategies?
What are good measures of performance?
– Fairness?
What is the complexity of minimizing the number of faults?
Can we obtain more efficient offline algorithms (exact or approximate)?
Thank you


24 Partition vs. Shared
E.g. K=12, p=3, OPT={5,5,2}
R1: p1 p2 p3 p4 p5 p1 p2 p3 p4 p5, then p1 repeated 20 times
R2: q1 repeated 10 times, then q1 q2 q3 q4 q5 q1 q2 q3 q4 q5, then q1 repeated 10 times
R3: s1 repeated 20 times, then s1 s2 s3 s4 s5 s1 s2 s3 s4 s5
For any online dynamic partition D that changes o(n) times

25 Dynamic Programming Alg.
Running time exponential in K and p
But polynomial in n (recall n >> p)
In practice p = 2, 4, 8. p = O(log n)
Running time is infeasible
Both minimizing the number of faults and PIF are in P for constant p
Problem’s hardness stems from the number of sequences, not their length

26 Real world numbers

27 Hassidim’s model

28 Our Model
Offline cannot delay sequences
Requests must be served as they arrive
Separates paging algorithm and scheduler
Minimize number of faults

29 Static Partitions

30 Static Partitions
The choice of a good partition is more important than the choice of the eviction policy

31 The Offline Problem
Theorem: PIF is NP-complete
[Figure: hit (h) and fault (f) patterns of three sequences over time t]
f__f__hhhhhhhhhhhf__f__f__f__f__f__f__f__f__f__f__f__f__...
f__f__f__f__f__f__f__hhhhhhhhhhhhhhhhhhhf__f__f__f__f__...
f__f__f__f__f__f__f__f__f__f__f__f__f__f__f__hhhhhhhhhhh…

32 Dynamic Programming Alg.

33 Dynamic Programming Alg.
[Figure labels: f; positions in sequences; cache]
Polynomial for constants K and p
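The slide's state description (positions in the sequences plus cache contents) suggests a memoized exhaustive search. The sketch below is not the authors' dynamic program; it is a small illustration under assumed rules: a fault makes a core wait `tau` extra steps, cores are served in index order within a time step, and the state additionally carries per-core wait counters. The name `min_total_faults` is hypothetical. It only considers "honest" executions that never force a fault, and its running time grows exponentially with K and the number of cores, so it is usable only on toy inputs.

```python
from functools import lru_cache


def min_total_faults(seqs, K, tau=0):
    """Minimum total number of faults over all eviction choices (toy sizes only)."""
    if K < 1:
        raise ValueError("cache size must be at least 1")
    seqs = tuple(tuple(s) for s in seqs)
    p = len(seqs)

    @lru_cache(maxsize=None)
    def serve(i, pos, stall, cache):
        # State: core being handled, positions in the sequences, remaining
        # wait per core, and the set of cached pages.
        if i == p:  # every core handled: move on to the next time step
            if all(pos[j] == len(seqs[j]) for j in range(p)):
                return 0
            return serve(0, pos, stall, cache)
        if stall[i] > 0:  # core i is still waiting for a fetch
            waited = stall[:i] + (stall[i] - 1,) + stall[i + 1:]
            return serve(i + 1, pos, waited, cache)
        if pos[i] == len(seqs[i]):  # core i has finished its sequence
            return serve(i + 1, pos, stall, cache)
        page = seqs[i][pos[i]]
        advanced = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
        if page in cache:  # hit
            return serve(i + 1, advanced, stall, cache)
        stalled = stall[:i] + (tau,) + stall[i + 1:]
        if len(cache) < K:  # fault with a free cell: no eviction needed
            return 1 + serve(i + 1, advanced, stalled, cache | {page})
        # Fault with a full cache: try every possible victim.
        return 1 + min(
            serve(i + 1, advanced, stalled, (cache - {victim}) | {page})
            for victim in cache
        )

    return serve(0, (0,) * p, (0,) * p, frozenset())


print(min_total_faults([[1, 2, 3, 1, 2, 3]], K=2))
print(min_total_faults([[1, 2, 1, 2], [3, 4, 3, 4]], K=4, tau=1))
```

Memoizing on the full state is what keeps the number of explored states polynomial in the sequence length when K and the number of cores are fixed: the number of distinct position tuples is bounded by the product of the sequence lengths, while the number of cache sets depends on K and the number of distinct pages.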

34 Partition vs. Shared
Partitions that don’t change enough are not competitive
Competitive strategies must be either shared or change the partition often
In fact, these are equivalent (for disjoint sequences)

35 Partition vs. Shared
For any online dynamic partition D that changes o(n) times
Partitions that don’t change enough are not competitive

36 Related Models
Reference | Model | Scheduling / Delay | Remarks
[Fiat, Karlin ’95] | Multi-pointer in access graph (applications and threads) | No | Algorithm with optimal competitive ratio
[Barve et al. ’00] | Multiple applications | No |
[Feuerstein, Strejilevich de Loma ’02] | Multi-Threaded Paging | Yes / No |
[Hassidim ’10] | Multi-core | Yes |
[This] | Multi-core | No / Yes |

37 Related Models
Reference | Model | Scheduling / Delay
[Fiat, Karlin ’95] | Multi-pointer in access graph (applications and threads) | No
[Barve et al. ’00] | Multiple applications | No
[Feuerstein, Strejilevich de Loma ’02] | Multi-Threaded Paging | Yes / No
[Hassidim ’10] | Multi-core | Yes
[This] | Multi-core | No / Yes

