Maintaining Large and Fast Streaming Indexes on Flash


Maintaining Large and Fast Streaming Indexes on Flash
Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison) and Suman Nath (MSR)

Memory & storage technologies
- Question: What is the role of emerging memory/storage technologies in supporting current and future measurements and applications?
- This talk: the role of flash memory in supporting applications/measurements that need large streaming indexes; improving current apps and enabling future apps

Streaming stores and indexes
Motivating apps/scenarios
- Caching, content-based networks (DOT), WAN optimization, de-duplication
- Large-scale & fine-grained measurements
  - E.g., IP Mon: compute per-packet queuing delays
  - Fast correlations across large collections of NetFlow records
Index features
- Streaming: data is stored in a streaming fashion; maintain an online index for fast access
  - Expire old data, update the index constantly
- Large size: data store ~ several TB, index ~ 100s of GB
- Need for speed (fast reads and writes)
  - Impacts usefulness of caching applications, timeliness of fine-grained TE

Index workload
Key aspects
- Index lookups and writes are random
- Equal mix of reads/writes
- New data replaces old data quickly: constant expiry
Index data structures
- Tree-like (B-tree) and log structures are not suitable
  - Slow lookup (e.g., O(log n) complexity in trees)
  - Poor support for flexible, fast garbage collection
- Hash tables are ideal...
  - ... but current options for large streaming hash tables are not optimal

Current options for >100 GB hashtables
- DRAM: large DRAMs are expensive and can get very hot
- Disk: inexpensive, but too slow
- Flash provides a good balance between cost, performance, and power efficiency
  - Bigger and more energy efficient than DRAM
  - Comparable to disk in price
  - >2 orders of magnitude faster than disk, if used carefully
- But... appropriate data structures are needed to maximize flash effectiveness and overcome its inefficiencies

Flash properties
Flash chips:
- Layout: a large number of blocks (128 KB), each block holding multiple pages (2 KB)
- Read/write granularity: page; erase granularity: block
- Read page: 50 us, write page: 400 us, block erase: 1 ms
- Cheap: any read (including random), sequential writes
- Expensive: random writes/overwrites, sub-block deletion
  - Requires moving valid pages out of the block to be erased
SSDs: disk-like interface for flash
- Sequential/random read, sequential write: 80 us
- Random write: 8 ms
Takeaways:
- Flash is good for hashtable lookups
- Insertions are hard: small random overwrites
- Expiration is hard: small random deletes
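As a rough back-of-the-envelope illustration of why batching matters, here is a sketch in Python using the chip latencies above; the 32-byte entry size and the worst-case in-place update model are assumptions for illustration, not numbers from the talk.

# Flash-chip parameters from the slide: 2 KB pages, 128 KB blocks,
# 400 us page write, 1 ms block erase.
PAGE_WRITE_US = 400
BLOCK_ERASE_US = 1000
PAGE_SIZE = 2 * 1024
BLOCK_SIZE = 128 * 1024
ENTRY_SIZE = 32                     # hypothetical hash-table entry size

pages_per_block = BLOCK_SIZE // PAGE_SIZE          # 64

# Updating one entry in place: in the worst case the whole block must be
# erased and every page copied back.
in_place_update_us = BLOCK_ERASE_US + pages_per_block * PAGE_WRITE_US   # 26,600 us

# Batching: accumulate one page worth of entries in DRAM, then write once.
entries_per_page = PAGE_SIZE // ENTRY_SIZE         # 64
batched_cost_per_entry_us = PAGE_WRITE_US / entries_per_page            # 6.25 us

print(in_place_update_us, batched_cost_per_entry_us)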

BufferHash data structure
- Batch expensive operations (random writes and deletes) on flash
- Maintain a hierarchy of small hashtables
  - Keep the upper levels in DRAM
- Efficient insertion
  - Accumulate random updates in memory
  - Flush accumulated updates to the lower level on flash (at the granularity of a flash page)
- Efficient deletion
  - Delete in batch (at flash block granularity)
  - Amortizes deletion cost

Handling small random updates
- Maintain 2^K buffers (small in-memory hashtables) in DRAM; the hashed key is split into K bits (the hashtable index) and N bits (the key used within that hashtable)
1. Buffer small random updates in DRAM as small hashtables (buffers)
2. When a hashtable is full, write it to flash, without modifying existing data
- Each supertable is a collection of small hashtables: different incarnations over time of the same buffer
- How to search them? Use (bit-sliced) Bloom filters
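A minimal Python sketch of this organization, under simplifying assumptions: plain dicts stand in for the fixed-size hashtables, Python sets stand in for the bit-sliced Bloom filters, and names such as flush_buffer are illustrative rather than taken from the paper.

import hashlib

K = 4                     # 2^K buffers/supertables (illustrative value)
NUM_BUFFERS = 1 << K
BUFFER_CAPACITY = 64      # entries a buffer holds before being flushed (assumption)
HASH_BITS = 160           # names are hashed with SHA-1 below

buffers = [dict() for _ in range(NUM_BUFFERS)]     # in-DRAM hashtables
supertables = [[] for _ in range(NUM_BUFFERS)]     # flushed incarnations ("flash pages")
filters = [[] for _ in range(NUM_BUFFERS)]         # one membership filter per incarnation

def split_key(name: bytes):
    """Hash the name, then split the hash: the top K bits select the
    buffer/supertable, the remaining bits are the key inside that hashtable."""
    h = int.from_bytes(hashlib.sha1(name).digest(), "big")
    k1 = h >> (HASH_BITS - K)
    k2 = h & ((1 << (HASH_BITS - K)) - 1)
    return k1, k2

def flush_buffer(i: int) -> None:
    """Write buffer i to flash as one new hashtable incarnation (a single page
    write; existing flash data is never modified in place), and keep a filter
    for it so lookups can skip incarnations that cannot hold a key."""
    incarnation = dict(buffers[i])
    supertables[i].append(incarnation)
    filters[i].append(set(incarnation))   # a real implementation uses bit-sliced Bloom filters
    buffers[i].clear()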

Lookup
- Let the hashed key be split into (k1, k2)
- Check the k1-th hashtable in memory for the key k2
- If not found, use the Bloom filters to decide which hashtable h of the k1-th supertable may contain the key k2
- Read and check that hashtable (e.g., in the h-th page of the k1-th block of flash)
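Continuing the sketch above, a lookup checks the in-memory buffer first and uses the per-incarnation filters to limit flash page reads; with real Bloom filters the membership test can return false positives, so an occasional page read is wasted.

def lookup(name: bytes):
    k1, k2 = split_key(name)
    # 1. Check the k1-th in-memory hashtable first.
    if k2 in buffers[k1]:
        return buffers[k1][k2]
    # 2. Otherwise scan the incarnations of the k1-th supertable, newest first,
    #    reading a flash page only when its filter says the key may be there.
    for h in range(len(supertables[k1]) - 1, -1, -1):
        if k2 in filters[k1][h]:
            incarnation = supertables[k1][h]   # one flash page read in the real system
            if k2 in incarnation:
                return incarnation[k2]
    return None                                # not found (or already expired)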

Expiry of hash entries
- A supertable is a collection of hashtables; expire the oldest hashtable from a supertable
- Option 1: use a flash block as a circular queue
  - Supertable = flash block, hashtable = flash page
  - Delete the oldest hashtable incarnation (page) and replace it with a new one
  - If a flash block has p pages, the supertable holds the p latest hashtables
  - Problem: a page cannot be deleted independently without erasing the whole block (requires copying the other pages)

Handling expiry of hash entries
- Interleave pages from different supertables when writing to flash or SSD
- (The slide's figure contrasts the two layouts: instead of filling a block with the pages of one supertable, place the same-aged incarnations of different supertables together in a block)
- Advantage: batch deletion of multiple oldest incarnations
- Other flexible expiration policies can also be supported
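A small sketch of the two layouts contrasted here, using a hypothetical example with 4 supertables and 4 pages per block (not the paper's exact figure):

PAGES_PER_BLOCK = 4
NUM_SUPERTABLES = 4

# page(s, v) = incarnation v of supertable s
page = lambda s, v: f"S{s}.v{v}"

# Non-interleaved: each block holds all incarnations of a single supertable.
# Expiring the oldest incarnation of S0 would require erasing block 0 and
# first copying S0.v1..S0.v3 elsewhere.
per_supertable_blocks = [
    [page(s, v) for v in range(PAGES_PER_BLOCK)] for s in range(NUM_SUPERTABLES)
]

# Interleaved: each block holds the same-aged incarnation of every supertable.
# Erasing block 0 discards the oldest incarnation of all supertables at once,
# with no page copying (batched deletion).
interleaved_blocks = [
    [page(s, v) for s in range(NUM_SUPERTABLES)] for v in range(PAGES_PER_BLOCK)
]

print(per_supertable_blocks[0])   # ['S0.v0', 'S0.v1', 'S0.v2', 'S0.v3']
print(interleaved_blocks[0])      # ['S0.v0', 'S1.v0', 'S2.v0', 'S3.v0']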

Insertion
- Let the hashed key be split into (k1, k2)
- Insert into the k1-th in-memory hashtable, using k2 as the key
- If that hashtable is full:
  - Expire the tail hashtable in the k1-th supertable
    - With the interleaved layout, this expires the oldest incarnation from all supertables
  - Copy the k1-th hashtable from memory to the head of the k1-th supertable
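Continuing the earlier sketch, insertion under assumed parameters; here Python list order stands in for the head/tail of a supertable, with the newest incarnation appended at the end and the oldest dropped from the front.

PAGES_PER_SUPERTABLE = 4    # incarnations kept per supertable (assumption)

def insert(name: bytes, value) -> None:
    k1, k2 = split_key(name)
    buffers[k1][k2] = value                    # cheap in-DRAM update
    if len(buffers[k1]) >= BUFFER_CAPACITY:
        # Expire the oldest incarnation when the supertable is full. With the
        # interleaved layout, this drop is part of one batched block erase that
        # also expires the oldest incarnation of the other supertables.
        if len(supertables[k1]) >= PAGES_PER_SUPERTABLE:
            supertables[k1].pop(0)
            filters[k1].pop(0)
        flush_buffer(k1)                       # new incarnation becomes the newest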

Benchmarks
- Prototyped BufferHash on 2 SSDs and a hard drive
  - 99th-percentile read and write latencies under 0.1 ms
  - Two orders of magnitude better than disks, at roughly similar cost
- Built a WAN accelerator that is 3X better than current designs
- Theoretical results on tuning BufferHash parameters
  - Low Bloom filter false positives, low lookup cost, low deletion cost on average
  - Optimal buffer size

Conclusion
- Many emerging apps and important measurement problems need fast streaming indexes with constant reads/writes/eviction
- Flash provides a good hardware platform for maintaining such indexes
- BufferHash helps maximize flash effectiveness and overcome its inefficiencies
- Open issues:
  - Role of flash in other measurement problems/architectures?
  - Role of other emerging memory/storage technologies (e.g., PCM)?
  - How to leverage persistence?

I/O operations
API
- Data store/index: StoreData(data)
  - Add data to the store; create/update the index with data_name
- Data store: address = Lookup(data_name)
- Data store: data = ReadData(address)
- Data store/index: ExpireOldData()
  - Remove old data from the store; clean up the index
Workload
- data_name is a hash over the data
- Index lookups and writes are random
- Equal mix of reads/writes
Index data structures
- Tree-like and log structures are not suitable
- Hash tables are ideal, but current options for large streaming hash tables are not optimal...
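A sketch of this API as a Python interface; the method names follow the slide, while the exact signatures and types are assumptions.

from abc import ABC, abstractmethod
from typing import Optional

class StreamingStore(ABC):
    """Streaming data store plus index, as outlined on this slide."""

    @abstractmethod
    def StoreData(self, data: bytes) -> None:
        """Add data to the store; create/update the index under
        data_name = hash(data)."""

    @abstractmethod
    def Lookup(self, data_name: bytes) -> Optional[int]:
        """Return the store address for data_name, or None if absent/expired."""

    @abstractmethod
    def ReadData(self, address: int) -> bytes:
        """Return the data stored at the given address."""

    @abstractmethod
    def ExpireOldData(self) -> None:
        """Remove old data from the store and clean up the index."""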