Hash in a Flash: Hash Tables for Solid State Devices
S M Faisal*, Tyler Clemons*, Shirish Tatikonda‡, Charu Aggarwal†, Srinivasan Parthasarathy*
*The Ohio State University, Columbus, Ohio
‡IBM Almaden Research Center, San Jose, California
†IBM T.J. Watson Center, Yorktown Heights, New York


Motivation and Introduction
 Data is growing at a fast pace
  Scientific data, Twitter, Facebook, Wikipedia, the WWW
 Traditional data mining and IR algorithms require random out-of-core data access
 Often the data is too large to fit in memory, so frequent random disk access is expected

Motivation and Introduction (2)
 Traditional hard disk drives can keep pace with storage requirements but NOT with random-access workloads
  Moving parts are a physical limitation
  They also contribute to rising energy consumption
 Flash devices have emerged as an alternative
  No moving parts
  Faster random access
  Lower energy usage
 But they have several drawbacks...

Flash Devices
 Limited lifetime
  Support a limited number of rewrites, also known as erasures or cleans
  Erasures impact response time
  Erasures are incurred at the block level; blocks consist of pages, and pages (4 KB to 8 KB) are the smallest I/O unit
 Poor random write performance
  Incurs many erasures and lowers lifetime
 Efficient sequential write performance
  Lowers erasures and increases lifetime
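The page/block asymmetry above can be made concrete with a toy cost model. This is a minimal sketch under assumed sizes and a simplified erase policy (real flash translation layers are far more sophisticated); it only illustrates why in-order page writes are cheap while overwrites force block erasures.

```python
# Hypothetical flash cost model: pages within a block must be written in
# order, and overwriting an already-written page forces a whole-block erase.
# PAGE/BLOCK sizes are illustrative assumptions, not device measurements.

PAGES_PER_BLOCK = 64

class FlashBlock:
    def __init__(self):
        self.next_free = 0      # next page that can be written sequentially
        self.erase_count = 0    # erasures consume the block's limited lifetime

    def write_page(self, page_index):
        if page_index < self.next_free:
            # Flash cannot rewrite a page in place: erase the whole block
            # first, which is slow and wears the device out.
            self.erase_count += 1
            self.next_free = 0
        self.next_free = page_index + 1

block = FlashBlock()
for p in range(4):              # sequential page writes: no erasures
    block.write_page(p)
assert block.erase_count == 0

block.write_page(1)             # random overwrite: triggers an erasure
assert block.erase_count == 1
```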

On Flash Devices, DM, and IR
 Flash devices provide fast random read access
  Common requirement for many IR and DM algorithms and data structures
 Hash tables are common in both DM and IR
  Useful for associating keys with values
  Counting hash tables associate keys with a frequency, as found in many algorithms that track word frequency
  We will examine one such algorithm common in both DM and IR (TF-IDF)
 Hash tables exhibit random access for both writes and reads
  Random writes are an issue for flash devices
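In memory, the counting hash table mentioned above is trivial; the paper's challenge is keeping it on flash. A minimal in-memory version, for reference:

```python
# A counting hash table maps each key to a running frequency: the data
# structure this work targets, shown here in its trivial in-memory form.
from collections import defaultdict

counts = defaultdict(int)
for word in ["flash", "hash", "flash", "table", "flash"]:
    counts[word] += 1       # each update is a read-modify-write on the key

assert counts["flash"] == 3
assert counts["hash"] == 1
```

On disk, each such read-modify-write becomes a random write, which is exactly the pattern flash devices handle poorly.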

Hash Tables for Flash Devices must:
 Reduce erasures/cleans and reduce random writes to the SSD
  Batch updates
 Maintain reasonable query times
 The data structure must not incur unreasonable disk overhead
 Nor should it require unreasonable memory constraints

Our Approach
 Our approach makes two key contributions:
 Optimize our designs for a counting hash table
  This has not been done by previous approaches
  (A. Anand '10), (D. Andersen '09), (B. Debnath '10), (D. Zeinalipour-Yazti '05)
 The primary hash table resides on the flash device
  Many designs instead use the SSD as a cache for the HDD
  (D. Andersen '09), (B. Debnath '10)
 Anticipate data sets with high random access and throughput requirements

Hash Tables for Flash Devices must:
 Reduce erasures/cleans and reduce random writes to the SSD
  Batch updates
  Create an in-memory structure
  Target semi-random or block-level updates
 Maintain reasonable query times
 The data structure must not incur unreasonable disk overhead
  Carefully index keys on disk
 Nor should it require unreasonable memory constraints
  Memory requirement is at most a fixed parameter

Memory Bounded (MB) Buffering
Updates are hashed into a bucket in RAM and quickly combined in memory, e.g. (64,2), (12,7). When the buffer is full, updates are batched to the corresponding disk buckets. If the disk buckets are full, the overflow region is invoked.

Memory Bounded (MB) Buffering
 Two-way hash
 On-disk closed hash table
  Hashed at the page level
  Updated via block-level writes
  Linear probing for collisions
 In-memory open hash table
  Hashed at the block level
  Combines updates
  Flushed with the merge() operation
 Overflow segment
  Holds closed hash table excess
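The MB scheme above can be sketched in a few lines. This is a simplified model, not the paper's implementation: the bucket count, buffer capacity, toy hash function, and the absence of an overflow segment are all illustrative assumptions.

```python
# Sketch of Memory Bounded (MB) buffering: updates are combined in a
# per-bucket RAM buffer and batch-applied to the on-disk bucket when full.

BUCKETS = 4
BUFFER_CAPACITY = 2    # distinct keys a RAM bucket holds before flushing

ram_buffer = [dict() for _ in range(BUCKETS)]   # key -> pending delta
disk = [dict() for _ in range(BUCKETS)]         # stand-in for on-disk buckets
flushes = 0

def bucket_of(key):
    # Deterministic toy hash so the example is reproducible.
    return sum(key.encode()) % BUCKETS

def merge(b):
    """Batch-apply buffered deltas to the disk bucket in one write."""
    global flushes
    for key, delta in ram_buffer[b].items():
        disk[b][key] = disk[b].get(key, 0) + delta
    ram_buffer[b].clear()
    flushes += 1

def insert(key, delta=1):
    b = bucket_of(key)
    ram_buffer[b][key] = ram_buffer[b].get(key, 0) + delta  # combine in RAM
    if len(ram_buffer[b]) >= BUFFER_CAPACITY:
        merge(b)

def query(key):
    """A query must check the RAM buffer as well as the disk bucket."""
    b = bucket_of(key)
    return ram_buffer[b].get(key, 0) + disk[b].get(key, 0)

for k in ["a", "a", "e", "b"]:
    insert(k)

assert query("a") == 2     # two updates combined in memory, then flushed
assert flushes == 1        # one batched write instead of three random writes
```

Note how repeated updates to the same key cost nothing extra on disk: they are absorbed in memory and amortized into a single batched write.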

Can we improve MB?
 MB reduces the number of write operations to the flash device
  Updates are batched only when the memory buffer is full
  Updates are semi-random
  (Key, value) changes are maintained in memory
 Query times are reasonable
  Memory buffer search is fast
  Relatively fast SSD random access and linear probing (see paper)
  Pages are prefetched
 But MB has disadvantages
  Sequential page-level operations are preferred: fewer block updates
  MB is limited by the amount of available memory; think large disk-resident datasets
  Updates may be numerous

Introduce an On-Disk Buffer
 Batched updates from memory to disk become page level
  Reduces expensive block-level writes (time and cleans)
  Increases sequential writes
 Increases buffering capability
  Reduces expensive non-semi-random block updates
  May decrease cleans
 Search space increases during queries
  Incurred only if inserting and reading concurrently
  However, less erasure time will decrease latency

On-Disk Buffering
 Change Segment (CS)
  Sequential log structure with sequential writes
 stage() operation
  Flushes memory to the CS
  Fast page-level operations
 merge() operation
  Invoked when the CS is full
  Combines the CS with the data segment
  Less frequent than stage()
 What is the structure of the CS?
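The stage()/merge() split can be sketched as follows. The CS capacity and the flat dictionaries standing in for on-disk structures are illustrative assumptions; the point is that cheap sequential appends dominate and the expensive merge is deferred until the log fills.

```python
# Sketch of the Change Segment: stage() appends flushed updates sequentially
# to a log (cheap page-level writes); merge() folds the log into the data
# segment only when the CS is full (expensive, but infrequent).

CS_CAPACITY = 4

change_segment = []     # sequential log of (key, delta) entries
data_segment = {}       # stand-in for the on-disk primary hash table
merges = 0

def merge():
    """Combine the CS into the data segment, then reset the log."""
    global merges
    for key, delta in change_segment:
        data_segment[key] = data_segment.get(key, 0) + delta
    change_segment.clear()
    merges += 1

def stage(updates):
    """Cheap sequential append of buffered updates to the CS."""
    change_segment.extend(updates)
    if len(change_segment) >= CS_CAPACITY:
        merge()

stage([("x", 1), ("y", 2)])     # first flush: stays in the log
assert merges == 0
stage([("x", 3), ("z", 1)])     # CS now full: triggers one merge
assert merges == 1
assert data_segment["x"] == 4
```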

Change Segment Structure v1
RAM buffer buckets are assigned specific change segment buckets; each change segment bucket is shared by multiple RAM buffer buckets.

Memory Disk Bounded Buffering (MDB)
 Associates a CS block with k data blocks
  Semi-random writes
  Only full CS blocks are merge()d
 Frequently updated blocks may incur numerous (k-1) merge() operations
 Query times incur an additional block read
  The CS block is packed with unwanted data

Change Segment Structure v2
As buckets are flushed, they are written sequentially to the change segment, one page at a time.

MDB-L
 No partitions in the CS
  Allows frequently updated blocks to have maximum space
 merge() all blocks when the CS is full
  Potentially expensive, but very infrequent
 Queries are supported by pointers
  As blocks are staged onto the CS, their pages are recorded for later retrieval
  Prefetching
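The pointer-based query path above can be sketched as follows. The page map and page layout here are assumptions for illustration, not the paper's on-disk format; the idea is that a query prefetches only the recorded CS pages instead of scanning the whole log.

```python
# Sketch of the MDB-L query path: as buffer buckets are staged into the
# unpartitioned CS, the page each bucket landed on is recorded so a query
# can jump straight to the relevant pages.

change_segment = []     # list of pages; each page holds one bucket's updates
pages_of_bucket = {}    # bucket id -> indices of its pages in the CS

def stage(bucket_id, updates):
    """Append one bucket's updates as a new sequential CS page."""
    pages_of_bucket.setdefault(bucket_id, []).append(len(change_segment))
    change_segment.append(dict(updates))

def query_cs(bucket_id, key):
    """Sum the key's pending deltas, reading only the recorded pages."""
    total = 0
    for page in pages_of_bucket.get(bucket_id, []):
        total += change_segment[page].get(key, 0)
    return total

stage(0, {"a": 2})
stage(1, {"b": 1})
stage(0, {"a": 3})      # a hot bucket may own several CS pages

assert query_cs(0, "a") == 5
assert query_cs(1, "b") == 1
```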

Expectations
 MB will incur more cleans than MDB or MDB-L
  Its frequent merge() operation incurs block erasures
 MDB and MDB-L will incur slightly higher query times
  Due to the addition of the CS
 MDB and MDB-L will have superior I/O performance
  Most operations are page level
  Fewer erasures mean lower latency

Experimental Setup (Application)
 TF-IDF: Term Frequency-Inverse Document Frequency
  Word importance is highest for infrequent words
  Requires a counting hash table
  Useful in many data mining and IR applications (document classification and search)
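As a reminder of why TF-IDF needs counting hash tables, here is the standard tf * log(N/df) formulation on a toy corpus (the corpus and the unsmoothed IDF variant are illustrative choices, not the paper's exact setup). Both the per-document term counts and the document-frequency table are counting hash tables.

```python
# TF-IDF sketch: a word appearing in every document scores zero, while a
# rare word scores high, so importance is highest for infrequent words.
import math
from collections import Counter

docs = [["flash", "hash", "flash"], ["disk", "hash"], ["hash", "disk"]]
N = len(docs)

df = Counter()                   # document frequency: a counting hash table
for doc in docs:
    df.update(set(doc))          # count each term once per document

def tf_idf(term, doc):
    tf = doc.count(term)         # term frequency within the document
    return tf * math.log(N / df[term])

assert tf_idf("hash", docs[0]) == 0.0   # "hash" is in all 3 docs: log(1) = 0
assert tf_idf("flash", docs[0]) > 0     # "flash" is rare, so it scores high
```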

Experimental Setup (Datasets)
 100,000 random Wikipedia articles
  136M keywords
  9.7M entries
 MemeTracker (Aug 2009 dump)
  402M total entries
  17M unique

Experimental Setup (Method)
 1M random queries were issued during the insertion phase
  10 random workloads; queries need not be in the table
  Measured query performance, I/O time, and cleans
 Used three SSD configurations
  One Single Level Cell (SLC) vs. two Multi Level Cell (MLC) configurations
  MLC is more popular: cheaper per GB but with less lifetime
  SLC has a lower internal error rate and faster response times (see paper for specific configurations)
 DiskSim and the Microsoft SSD plugin
  Used for benchmarking and fine-tuning our SSD

Results (Average Query Time)
Varying the in-memory buffer, as a percentage of the data segment, reduces the average query time by only fractions of a second. This suggests the majority of the query time is incurred by the disk.

Results (Average Query Time)
Varying the on-disk buffer, as a percentage of the data segment, decreases the average query time substantially for MDB-L. This reduction is seen in both datasets. MDB requires block reads in the CS.

Results (Average Query Time)
Using the Wiki dataset, we compared SLC with MLC and observed consistent performance.

Results (Average I/O)
In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size. Simulation time is highest for MB because of its block erasures (next slide). MDB-L is faster than MDB because of the increased page-level operations.

Results (Cleans/Erasures)
Cleans are extremely low for both MDB and MDB-L relative to MB. This is due to their page-level sequential operations. Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.

Discussion and Conclusion
 Flash devices are gaining popularity
  Low latency, high random read performance, low energy
  Limited lifetime, poor random write performance
 Hash tables are useful data structures in many data mining and IR algorithms
  They exhibit random write patterns, which are challenging for flash devices
 We have demonstrated that a proper hash table for flash devices should have
  An in-memory buffer for batching memory-to-disk updates
  An on-disk data buffer with page-level operations

Future Work
 Our current designs rely on hash functions that use the mod operator
  Extendible hashing
 Checkpoint methods for crash recovery
 Examination on a real SSD
  DiskSim is great for fine-tuning and examining statistics

Questions?