1 Design Patterns for Tunable and Efficient SSD-based Indexes. Ashok Anand, Aaron Gember-Jacobson, Collin Engstrom, Aditya Akella.

2 Large hash-based indexes are used in WAN optimizers [Anand et al. SIGCOMM ’08], de-duplication systems [Quinlan et al. FAST ‘02], and video proxies [Anand et al. HotNets ’12]. These systems need ≈20K lookups and inserts per second (for a 1 Gbps link) over a hash table of ≥32 GB.

3 Use of large hash-based indexes: WAN optimizers, de-duplication systems, video proxies. Where to store the indexes?

4 [Figure: comparison of storage options; annotations: SSD, 8x less, 25x less.]

5 What’s the problem? Assumed: an SSD-based index with high performance and low overhead needs domain/workload-specific optimizations. (False assumption!) Existing designs have:
– Poor flexibility: they target a specific point in the cost-performance spectrum
– Poor generality: they only apply to specific workloads or data structures

6 Our contributions. Design patterns that ensure:
– High performance
– Flexibility
– Generality
Indexes based on these principles:
– SliceHash
– SliceBloom
– SliceLSH

7 Outline:
– Problem statement
– Limitations of the state of the art
– SSD architecture
– Parallelism-friendly design patterns: SliceHash (a streaming hash table)
– Evaluation

8 State-of-the-art SSD-based index: BufferHash [Anand et al. NSDI ’10], designed for high throughput. [Diagram: keys are hashed into an in-memory incarnation; full incarnations are flushed to the SSD, each with an in-memory Bloom filter.] Costs: 4 bytes of memory per K/V pair; 16 page reads in the worst case (average: ≈1). A sketch of this lookup path follows.
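
The lookup path just described (in-memory incarnation first, then one Bloom-filter check per flushed incarnation, with an SSD page read only on a positive filter) can be sketched in Python as a toy model. This is not the authors' code: the "Bloom filters" are plain sets and the "page reads" are dictionary probes.

NUM_INCARNATIONS = 16                        # worst case on this slide: 16 page reads

memtable = {}                                               # in-memory incarnation
incarnations = [dict() for _ in range(NUM_INCARNATIONS)]    # flushed incarnations ("on flash")
filters = [set() for _ in range(NUM_INCARNATIONS)]          # stand-ins for per-incarnation Bloom filters

def lookup(key):
    if key in memtable:                      # hit in RAM: no SSD I/O at all
        return memtable[key]
    for table, bf in zip(incarnations, filters):
        if key not in bf:                    # filter says "definitely absent": skip the page read
            continue
        if key in table:                     # one SSD page read per positive filter
            return table[key]
    return None                              # average ~1 page read, worst case 16

# toy usage: a key that was flushed into incarnation 3
incarnations[3]["kA"] = "vA"
filters[3].add("kA")
print(lookup("kA"))                          # -> vA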

9 State-of-the-art SSD-based index: SILT [Lim et al. SOSP ‘11], designed for low memory use and high throughput. [Diagram: log, hash, and sorted stores, with a small in-memory index.] Costs: ≈0.7 bytes of memory per K/V pair; 33 page reads in the worst case (average: 1); high CPU usage. Takeaway: existing designs target specific workloads and objectives (poor flexibility and generality) and do not leverage the SSD's internal parallelism.

10 SSD architecture. [Diagram: an SSD controller drives 32 channels; each channel connects four flash memory packages (128 in total); each package contains dies, each die contains planes with a data register, and each plane holds blocks made up of pages.] How does the SSD architecture inform our design patterns?

11 Four design principles:
I. Store related entries on the same page
II. Write to the SSD at block granularity
III. Issue large reads and large writes
IV. Spread small reads across channels
[Diagram: the principles annotated on the SSD architecture; together they yield SliceHash.]

12 I. Store related entries on the same page. Use many hash table incarnations, as in BufferHash. [Diagram: with the conventional layout, each flash page holds sequential slots from a single incarnation, so a key hashing to slot 5 may require reading a page from every incarnation.] Multiple page reads per lookup!

13 I. Store related entries on the same page (continued). Slicing: store the same hash slot from all incarnations on the same flash page. [Diagram: each page now holds a slice, i.e. a specific slot from every incarnation, so a key hashing to slot 5 is resolved from a single page.] Only 1 page read per lookup! A minimal layout sketch follows.
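
A minimal sketch of the sliced layout, under assumed sizes (2 KB pages, 16 B entries, 32 incarnations; none of these figures are stated on this slide): a page packs the same slot from every incarnation, so one page read answers a lookup regardless of which incarnation holds the key.

PAGE_BYTES   = 2048            # assumed flash page size
ENTRY_BYTES  = 16              # assumed 8 B key + 8 B value
INCARNATIONS = 32              # assumed number of incarnations

# With slicing, a page holds the same slot from all incarnations, so only a
# handful of slots fit per page -- but a lookup touches exactly one page.
SLOTS_PER_PAGE = PAGE_BYTES // (ENTRY_BYTES * INCARNATIONS)   # = 4 here

def page_for_slot(slot):
    """Flash page holding the slice for `slot` (all incarnations' copies)."""
    return slot // SLOTS_PER_PAGE

# Looking up slot 5 touches one page under slicing; with the per-incarnation
# layout on the previous slide it could touch up to INCARNATIONS pages.
print(page_for_slot(5))        # -> 1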

14 II. Write to the SSD at block granularity. Inserts go into a hash table incarnation kept in RAM. Divide the hash table so that all slices for a range of slots fit into one block; this block-sized unit is the SliceTable. [Diagram: keys inserted into the in-memory incarnation, alongside the on-flash block holding the corresponding SliceTable.] A sizing sketch follows.
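
A small sizing sketch for the rule above, again with assumed sizes (128 KB erase blocks, 16 B entries, 32 incarnations): the hash table is partitioned so that one SSD block holds one SliceTable, i.e. every incarnation's copy of a contiguous range of slots.

BLOCK_BYTES  = 128 * 1024      # assumed erase-block size
ENTRY_BYTES  = 16              # assumed entry size
INCARNATIONS = 32              # assumed number of incarnations

# One SliceTable = one block = all incarnations' slices for this many slots.
SLOTS_PER_SLICETABLE = BLOCK_BYTES // (ENTRY_BYTES * INCARNATIONS)   # = 256 here

def slicetable_for_slot(slot):
    """Which SliceTable (and hence which SSD block) a hash slot lives in."""
    return slot // SLOTS_PER_SLICETABLE

print(SLOTS_PER_SLICETABLE, slicetable_for_slot(1000))   # -> 256 3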

15 III. Issue large reads and large writes. [Diagram: page-sized accesses exploit the data register inside each package; larger accesses additionally exploit package parallelism within a channel and parallelism across channels.]

16 III. Issue large reads and large writes (continued). The SSD assigns consecutive chunks of a block (4 pages / 8 KB each) to different channels, so block-sized accesses exploit channel parallelism.

17 III. Issue large reads and large writes (continued). When flushing, read the entire SliceTable into RAM, merge in the entries from the in-memory incarnation, and write the entire SliceTable back onto the SSD. [Diagram: block-sized read-modify-write of one SliceTable.] A sketch of this flush follows.
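
A hedged sketch of the flush step: each SliceTable is read in full, the in-memory incarnation's slots take the place of one on-flash incarnation (the oldest, an eviction policy assumed here in the spirit of BufferHash), and the SliceTable is written back as one large block write. Plain Python lists stand in for SSD blocks; the I/O only appears as comments.

INCARNATIONS = 32              # assumed number of incarnations per SliceTable

def flush_slicetable(slicetable, memtable_slots, oldest):
    """Read-modify-write one SliceTable block.

    slicetable     : list of per-slot lists, each of length INCARNATIONS
    memtable_slots : the in-memory incarnation's entries for the same slot range
    oldest         : index of the incarnation being replaced (assumed policy)
    """
    # "Read entire SliceTable into RAM" -- here it is already a Python list.
    for slot, entry in enumerate(memtable_slots):
        slicetable[slot][oldest] = entry      # newest incarnation overwrites the oldest
    # "Write entire SliceTable onto SSD" -- one large, block-sized write.
    return (oldest + 1) % INCARNATIONS        # next incarnation position to reuse

# toy usage: 4 slots, all incarnations initially empty
table = [[None] * INCARNATIONS for _ in range(4)]
nxt = flush_slicetable(table, ["a", "b", "c", "d"], oldest=0)
print(table[0][0], nxt)                       # -> a 1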

18 IV. Spread small reads across channels. Recall: the SSD writes consecutive chunks (4 pages) of a block to different channels.
– Use existing techniques to reverse engineer the mapping [Chen et al. HPCA ‘11]
– The SSD uses write-order mapping: channel for chunk i = i modulo (# channels)

19 IV. Spread small reads across channels (continued). Estimate the channel serving a slot from the slot number and the chunk size: (slot # * pages per slot) modulo (# channels * pages per chunk). Then attempt to schedule one read per channel. [Diagram: pending reads spread across channels 0-3.] A sketch follows.
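
A sketch of the estimate and the scheduling step, with assumed parameters (32 channels, 4-page chunks, 4 slots per page, so "pages per slot" is 1/4). The slot-to-channel rule combines the formula on this slide with the write-order mapping from the previous one; it is written here as page-then-chunk arithmetic, which is equivalent.

NUM_CHANNELS    = 32           # assumed channel count
PAGES_PER_CHUNK = 4            # consecutive 4-page / 8 KB chunks, as on slide 16
SLOTS_PER_PAGE  = 4            # assumed, i.e. pages per slot = 1/4

def channel_for_slot(slot):
    """Estimate which channel serves the page holding this slot's slice."""
    page = slot // SLOTS_PER_PAGE                  # slot# * pages-per-slot
    chunk = page // PAGES_PER_CHUNK
    return chunk % NUM_CHANNELS                    # write-order mapping (slide 18)

def schedule_reads(pending_slots):
    """Greedily pick at most one pending read per estimated channel."""
    batch, busy = [], set()
    for slot in pending_slots:
        ch = channel_for_slot(slot)
        if ch not in busy:                         # one outstanding read per channel
            busy.add(ch)
            batch.append((slot, ch))
    return batch

print(schedule_reads([21, 1, 14, 45, 130]))        # -> [(21, 1), (1, 0), (45, 2), (130, 8)]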

20 SliceHash summary. [Diagram: inserts go into the in-memory incarnation; on the SSD, each flash page holds a slice (a specific slot from all incarnations) and each block holds a SliceTable, which is read and rewritten in full when updating.]

21 Evaluation: throughput vs. overhead. Setup: 128 GB Crucial M4 SSD, 2.26 GHz 4-core CPU; workload of 8 B keys and 8 B values, 50% inserts and 50% lookups. [Chart annotations: ↑6.6x, ↓12%; ↑2.8x, ↑15%.] See the paper for the theoretical analysis.

22 Evaluation: flexibility. SliceHash can trade off memory for throughput (workload: 50% inserts, 50% lookups). Using multiple SSDs lowers memory use and raises throughput further.

23 Evaluation: generality. The workload may change over time. [Charts: memory use (bytes/entry) keeps decreasing; CPU utilization (%) stays constantly low.]

24 Summary:
– We present design practices for low-cost, high-performance SSD-based indexes.
– We introduce slicing to co-locate related entries and leverage multiple levels of SSD parallelism.
– SliceHash achieves 69K lookups/sec (≈12% better than prior works), with consistently low memory (0.6 B/entry) and CPU (12%) overhead.

25 Evaluation: theoretical analysis. Parameters:
– 16 B key/value pairs
– 80% table utilization
– 32 incarnations
– 4 GB of memory
– 128 GB SSD
– 0.31 ms to read a block
– 0.83 ms to write a block
– 0.15 ms to read a page
Results for SliceHash: memory overhead 0.6 B/entry; insert cost ≈5.7 μs on average, 1.14 ms worst case; lookup cost 0.15 ms average and worst case. A back-of-envelope reconstruction follows.
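
The figures above can be roughly reproduced from these parameters with one extra assumption, a 128 KB erase block (the block size is not listed on the slide): a flush is one block read plus one block write, amortized over the inserts that fill one incarnation's share of that block, and a lookup is a single page read.

BLOCK_BYTES, ENTRY_BYTES, INCARNATIONS = 128 * 1024, 16, 32   # block size is an assumption
UTILIZATION = 0.80
BLOCK_READ_MS, BLOCK_WRITE_MS, PAGE_READ_MS = 0.31, 0.83, 0.15

slots_per_block = BLOCK_BYTES // (ENTRY_BYTES * INCARNATIONS)    # 256 slots per incarnation per block
inserts_per_flush = UTILIZATION * slots_per_block                # ~205 inserts absorbed per flush
worst_insert_ms = BLOCK_READ_MS + BLOCK_WRITE_MS                 # 1.14 ms: one full flush
avg_insert_us = worst_insert_ms / inserts_per_flush * 1000       # ~5.6 us, close to the ~5.7 us above
lookup_ms = PAGE_READ_MS                                         # one page read, average and worst
mem_bytes_per_entry = ENTRY_BYTES / (INCARNATIONS * UTILIZATION) # ~0.63 B/entry: RAM holds 1 of 32 incarnations

print(avg_insert_us, worst_insert_ms, lookup_ms, mem_bytes_per_entry)

The small gaps to the quoted ≈5.7 μs and 0.6 B/entry presumably come from the assumed block size and rounding.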

26 Evaluation: theoretical analysis (continued). Comparison:
– SliceHash: memory overhead 0.6 B/entry; insert cost ≈5.7 μs average, 1.14 ms worst case; lookup cost 0.15 ms average and worst case.
– BufferHash: memory overhead 4 B/entry; insert cost ≈0.2 μs average, 0.83 ms worst case; lookup cost ≈0.15 ms average, 4.8 ms worst case.

