Published by Jamir Grose. Modified over 10 years ago.
1
Recursive Design of Hardware Priority Queues
Liron Schiff* (TAU)
Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC)
Supported by European Research Council (ERC) Starting Grant no. 259085
2
Priority Queue (PQ)
[Figure: a priority queue with two operations, Insert and GetMin]
3
Priority Queue Applications
Networking: Scheduling Packets
– Many flows (1M)
– High rate (100 Mpps)
More applications: Scientific Simulators, Databases
[Figure: packets labelled with numeric priorities entering a priority queue (scheduler)]
4
Two Existing Approaches
– Dedicated hardware solutions: non-scalable
– Common software solutions: scalable
5
Two Existing Implementation Approaches
6
Our Approach: The Powering Technique
Merge-sort concept: [Figure: Base Priority Queue (BPQ), Sort, Merge]
7
The Powering Technique
8
The Powering Technique
Insert(x) uses the Input BPQ.
[Figure: Input BPQ and Exit BPQ; items shown: 3]
9
The Powering Technique
Insert(x) uses the Input BPQ.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3]
10
The Powering Technique
Insert(x) uses the Input BPQ.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5]
11
The Powering Technique
When Input gets full, move its items to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5]
12
The Powering Technique
When Input gets full, move its items to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5, 4 7 8]
13
The Powering Technique
When Input gets full, move its items to Exit.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5, 4 7 8, 1 2 6]
14
The Powering Technique
Get_min() extracts the min of Exit or Input.
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5, 4 7 8, 1 2 6, 9; the min is marked]
15
The Powering Technique
Get_min() extracts the min of Exit or Input, and we update the Exit (if needed).
[Figure: Input BPQ and Exit BPQ; items shown: 0 3 5, 4 7 8, 1 2 6, 9; the min is marked]
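The walkthrough on the slides above can be sketched in software roughly as follows. This is a minimal illustration, not the authors' hardware design: Python's `heapq` stands in for each BPQ, the class name `PoweredPQ` and its fields are hypothetical, and it assumes that "move to Exit" means the sorted block is stored in RAM with only its head kept in the Exit BPQ.

```python
import heapq
from collections import deque


class PoweredPQ:
    """Software sketch of the simple powering idea (all names are hypothetical)."""

    def __init__(self, block_size):
        self.block_size = block_size  # capacity of the Input BPQ
        self.input_bpq = []           # stand-in for the Input BPQ (a heap)
        self.exit_bpq = []            # stand-in for the Exit BPQ: (head value, run id)
        self.runs = {}                # assumed RAM: run id -> remaining sorted items
        self.next_run_id = 0

    def insert(self, x):
        heapq.heappush(self.input_bpq, x)
        if len(self.input_bpq) == self.block_size:
            self._flush_input()

    def _flush_input(self):
        # Input is full: store its sorted content as a run and expose its head in Exit.
        run = deque(sorted(self.input_bpq))
        self.input_bpq = []
        run_id = self.next_run_id
        self.next_run_id += 1
        heapq.heappush(self.exit_bpq, (run.popleft(), run_id))
        self.runs[run_id] = run

    def get_min(self):
        if not self.input_bpq and not self.exit_bpq:
            raise IndexError("get_min from an empty queue")
        take_input = bool(self.input_bpq) and (
            not self.exit_bpq or self.input_bpq[0] <= self.exit_bpq[0][0]
        )
        if take_input:
            return heapq.heappop(self.input_bpq)
        value, run_id = heapq.heappop(self.exit_bpq)
        run = self.runs[run_id]
        if run:
            # "Update the Exit (if needed)": refill from the run that lost its head.
            heapq.heappush(self.exit_bpq, (run.popleft(), run_id))
        else:
            del self.runs[run_id]
        return value
```

With `block_size = 3`, inserting 3, 0, 5, 7, 4, 8, 2, 6, 1, 9 produces the three runs shown on the slides (0 3 5, 4 7 8, 1 2 6) with 9 left in the Input BPQ, and repeated `get_min()` calls return the items in ascending order. Note that the flush here takes time proportional to the block size; the two difficulties discussed next are about removing exactly this kind of shortcut.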
16
Outline
– Difficulties with the simple idea
– Applying the construction recursively
– Exemplifying on TCAM base units
– Evaluation
17
Two difficulties with the simple idea
[Figure: Input and Exit]
18
Difficulty 1
Maintaining capacity N while lists are shrinking.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9]
20
Difficulty 1
Maintaining capacity N while lists are shrinking.
– We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9]
21
Difficulty 1
Maintaining capacity N while lists are shrinking.
– We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9, then 10 inserted]
22
Difficulty 1
Maintaining capacity N while lists are shrinking.
– We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6 9, then 10 and 11 inserted]
23
Difficulty 1
Maintaining capacity N while lists are shrinking.
– We continually merge inactive lists during Insert.
[Figure: Input BPQ and Exit BPQ; items shown: 3 5 4 7 8 1 2 6, with 9 10 11 as a new list]
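One way to picture "continually merging during Insert" is a merge that advances by a bounded number of steps each time it is called, so that no single operation pays for a whole merge. The sketch below is only an illustration under that assumption; the function names `incremental_merge` and `advance`, and the idea of calling `advance` once per Insert, are hypothetical and not taken from the slides.

```python
_DONE = object()  # sentinel: the merge generator is exhausted


def incremental_merge(a, b, out):
    """Merge sorted lists a and b into out, moving one item per step."""
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
        yield  # pause until the next queue operation resumes the merge


def advance(merger, steps):
    """Run at most `steps` merge steps; return False once the merge is complete."""
    for _ in range(steps):
        if next(merger, _DONE) is _DONE:
            return False
    return True
```

For example, two shrunken lists such as 3 5 and 1 2 6 could be merged two items at a time by calling `advance(merger, 2)` from each Insert; how many steps per operation are needed to keep the total capacity at N is part of the actual construction and is not modelled here.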
24
Difficulty 2
Moving all items from Input to RAM in O(1) time.
[Figure: Exit BPQ and Input BPQ]
25
Difficulty 2
Moving all items from Input to RAM in O(1) time.
– Use two Input BPQs and switch between them.
[Figure: Exit BPQ and two Input BPQs]
28
Difficulty 2
Moving all items from Input to RAM in O(1) time.
– Use two Input BPQs and switch between them.
[Figure: Exit BPQ and Input BPQ]
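The switching idea resembles double buffering: while one Input BPQ accepts new items, the other is emptied one item per operation, so no single operation has to move a whole block. The following is a rough sketch under that reading. The class `DoubleBufferedInput`, its field names, and the "drain exactly one item per Insert" schedule are assumptions made for illustration; a complete queue would also have to consult the draining BPQ in Get_min, which is omitted here.

```python
import heapq


class DoubleBufferedInput:
    """Sketch of two Input BPQs that swap roles (all names are hypothetical)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.active = []         # Input BPQ currently receiving inserts
        self.draining = []       # full Input BPQ being emptied into RAM
        self.run_in_ram = []     # sorted run currently being written to RAM
        self.finished_runs = []  # completed sorted runs

    def _drain_one(self):
        # Move at most one item per operation from the draining BPQ to RAM.
        if self.draining:
            self.run_in_ram.append(heapq.heappop(self.draining))
            if not self.draining:
                self.finished_runs.append(self.run_in_ram)
                self.run_in_ram = []

    def insert(self, x):
        self._drain_one()
        heapq.heappush(self.active, x)
        if len(self.active) == self.block_size:
            # The other BPQ has had block_size operations to drain, so it is empty.
            assert not self.draining
            self.active, self.draining = self.draining, self.active
```

With `block_size = 3`, nine inserts of 5, 1, 3, 2, 9, 4, 7, 8, 6 leave two finished runs in RAM (1 3 5 and 2 4 9) while the third block waits in the draining BPQ, and every insert performed only a constant number of BPQ operations.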
29
Block Size – Time Tradeoff
[Figure: Merge, with block parameter x]
30
Block Size – Time Tradeoff
[Figure: Exit BPQ and Input BPQ]
31
Block Size – Time Tradeoff Exit BPQ Input BPQ
32
Block Size – Time Tradeoff
[Figure: Exit BPQ and Input BPQ, each built from a smaller Exit BPQ and two Input BPQs]
33
Block Size – Time Tradeoff
[Figure: Exit BPQ and Input BPQ, expanded into a smaller Exit BPQ and four Input BPQs]
34
Block Size – Time Tradeoff
[Figure: Exit BPQ and Input BPQ, expanded into a smaller Exit BPQ and four Input BPQs; Insert is marked]
35
Block Size – Time Tradeoff Observation-1: only two queues per recursion level!
36
Block Size – Time Tradeoff
A systolic-array-like design:
[Figure: input "in" feeding an Exit BPQ and two Input BPQs]
37
Extensions - Tradeoffs
Analysis:
– Two queues of size N/x require only two sub-queues of size N/x^2.
– In each operation we act on all sub-queues.
– For any k […] we have a priority queue with […] size BPQs and with […] time per lookup and update.
38
Resulting Tradeoffs
[Table: columns are Parallel op. Time (Latency), #BPQ Ops. (per op.), #Queues * Size, and Recursion Levels; cell values not preserved]
40
TCAM example
41
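TCAMs
RAM vs. Content Addressable Memory (CAM)
[Figure: a RAM takes an address (in) and returns the stored word (out); a CAM takes a word (in) and returns the address that holds it (out). Stored words at addresses 0 to m: 01101010, 00100111, 11011011, 01010110; the example pairs address 1 with word 00100111.]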
42
Ternary CAMs (TCAMs)
Associative memory chips. Properties:
– Ternary values (0, 1 and *)
– Already used in routers (IP lookup, classification)
– High throughput (300M ops per sec for a 1 Mb TCAM)
– Latency and cost increase dramatically with size
[Figure: TCAM entries 0*10**1*, 00100111, 11***011, 01010110 at addresses 0 to m; input 00100111 gives output 0]
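The lookup behaviour in the figure can be mimicked in a few lines. This is a software model for intuition only: the function name `tcam_lookup` is hypothetical, entries are written as strings over 0, 1 and *, and the model assumes the usual convention that the lowest-address matching entry wins (a real TCAM compares all entries in parallel).

```python
def tcam_lookup(entries, key):
    """Return the index of the first entry matching key, or None if none match.

    Each entry is a string over '0', '1' and '*', where '*' matches either bit.
    """
    for index, pattern in enumerate(entries):
        if len(pattern) == len(key) and all(
            p == "*" or p == k for p, k in zip(pattern, key)
        ):
            return index
    return None


# Entries taken from the slide's figure.
entries = ["0*10**1*", "00100111", "11***011", "01010110"]
print(tcam_lookup(entries, "00100111"))  # 0: the ternary entry at address 0 matches first
```

The key 00100111 also matches the exact entry at address 1, but the entry at address 0 takes priority, which is why the ordering of entries inside a TCAM matters for the schemes discussed later.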
43
TCAM based Priority Queue
Panigrahy & Sharma (2003) presented a TCAM-based data structure for disjoint ranges (PIDR):
– 2 TCAM entries per range
– 2 TCAM updates per insertion/deletion
– 2 queries per point lookup
Can be used to implement a sorted list:
[Figure: the items a_1 ... a_6 stored as a sequence of disjoint ranges, before and after inserting x between a_4 and a_3]
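The link between disjoint ranges and a sorted list can be illustrated in plain software: each stored item starts a range that extends up to just before the next larger item, so finding the range that contains a point yields that point's predecessor, and inserting an item splits one range in two. The sketch below is a stand-in only. The class `RangeSortedList` is hypothetical, it assumes distinct integer keys, and it uses `bisect` where the PIDR structure would use TCAM queries.

```python
import bisect


class RangeSortedList:
    """Sorted list viewed as disjoint ranges (software stand-in, names hypothetical)."""

    def __init__(self):
        # Sorted range starts; range i covers [starts[i], starts[i + 1] - 1].
        self.starts = []

    def find_range(self, point):
        """Return (low, high) of the range containing point, or None if below all items.

        high is None for the last range, which is unbounded above.
        """
        i = bisect.bisect_right(self.starts, point) - 1
        if i < 0:
            return None
        low = self.starts[i]
        high = self.starts[i + 1] - 1 if i + 1 < len(self.starts) else None
        return (low, high)

    def insert(self, x):
        """Insert x by splitting the range that contains it; False if already present."""
        i = bisect.bisect_left(self.starts, x)
        if i < len(self.starts) and self.starts[i] == x:
            return False
        self.starts.insert(i, x)
        return True

    def minimum(self):
        return self.starts[0] if self.starts else None
```

After inserting 10, 30 and 20, `find_range(25)` returns `(20, 29)`: the point 25 falls in the range started by its predecessor 20, which is the position where a new item 25 would be spliced in.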
44
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003)
Pros:
– O(1) time per queue operation
– O(1) TCAM space per item
Cons:
– O(N) TCAM space
– TCAM space should be managed
[Figure: TCAM entries 0*10**1*, 00100111, 11***011, 01010110 at addresses 0 to m; input 00100111 gives output 0]
46
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003). Three versions:
A. O(1) time but O(w) entries per item (where w is the width of a priority value in bits)
B. O(log w) time
C. Empirical O(1) time but O(w) in the worst case
[Figure label: BPQ]
47
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003). Our results:
[Table: columns are Space (TCAM bits), Time (TCAM ops.), and Latency (TCAM ops.); row "original"; cell values not preserved]
49
TCAM based Priority Queue
Implied by Panigrahy & Sharma (2003). Three versions:
[Table: columns are Space (TCAM bits), Time (TCAM ops.), and Latency (TCAM ops.); rows PS[03].V1, PS[03].V2, PS[03].V3; cell values not preserved]
50
Powering the TCAM BPQ
Using small TCAM-based PQs
– Faster TCAM access
– Feasible even when N is large
Well suited to backbone routers
– TCAMs are already used for IP lookup
51
Results for TCAM based PQ
53
Results for TCAM-based PQ
[Figure: results for versions A, B and C with k=1 and k=2]
54
Applying to Shift-Registers
Considering a HW PQ implementation of R. Chandra and O. Sinnen.
[Figure: results for the original design, K=1 and K=2]
55
Summary
The Powering Technique
– Combines small HW queues and RAM
– Allows space–time tradeoffs
Powering TCAMs
– Smaller TCAMs, shorter operation time
– Matches lower bound for sorting with TCAM
– Also works for shift registers
57
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
– Requires a TCAM of size w*n
[Figure: TCAM divided into sections 0, 1, 2, ..., w-1]
58
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
– Requires a TCAM of size w*n
2. PIDR2: attach a length indicator to each pattern
– Requires log(w) queries to find the longest pattern
[Figure: Length | Pattern entries *****1** | 10011***, ****1*** | 1001****, *****1** | 10010***; query = 10010111 with length fields 00001111, 00000011, 00000111]
59
Ensuring Prefix Length order in TCAMs
How can we save n patterns with no prefix order violation?
1. Split the patterns into w sections of size n
– Requires a TCAM of size w*n
2. PIDR2: attach a length indicator to each pattern
– Requires log(w) queries to find the longest pattern
3. Maintain Chain Ancestor Order (CAO), an optimized scheme by Shah & Gupta [2001]
– Reported average O(1), but worst case O(w)
60
TCAM based Priority Queue
We use two PIDR-lists with […] size TCAMs; each can store […] segments/items using the naïve memory management.
[Figure: Input BPQ and Exit BPQ, each a TCAM divided into sections 0, 1, 2, ..., w-1]