Some key-value stores using log-structure Zhichao Liang LevelDB Riak.

Outline
- Why log structure?
- Riak: log-structured hash table
- RethinkDB: log-structured B-tree
- LevelDB: log-structured merge tree
- Conclusion

Log Structure
A log-structured file system is a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis. It is designed for high write throughput: all updates to data and metadata are written sequentially to a continuous stream, called a log. Conventional file systems, by contrast, tend to lay out files with great care for spatial locality and make in-place changes to their data structures.

Log Structure for SSD
Random writes degrade system performance and shorten the lifetime of an SSD. Log structure is natively SSD-friendly!
[Figure: updating data on a magnetic disk compared with an SSD, where a block is read into RAM, erased, and rewritten with the new data]

Riak?
Riak is an open source, highly scalable, fault-tolerant distributed database.
Supported core features:
- operates in highly distributed environments
- no single point of failure
- highly fault-tolerant
- scales simply and intelligently
- high data availability
- low cost of operations

Bitcask
A Bitcask instance is a directory, and only one operating system process opens that Bitcask for writing at a given time. The active file is written only by appending, so sequential writes do not require disk seeks.

Hash Index: keydir
A keydir is simply a hash table that maps every key in a Bitcask to a fixed-size structure giving the file, offset and size of the most recently written entry for that key.
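The append-only file and the keydir can be illustrated with a minimal Python sketch. It is not Bitcask's actual on-disk format or API: the class name `TinyCask`, the record layout (two 4-byte lengths followed by key and value), and the single data file are assumptions made for illustration only.

```python
import os
import struct
import tempfile

# Hypothetical record layout for illustration: key length, value length, key, value.
HEADER = struct.Struct(">II")


class TinyCask:
    """One append-only data file plus an in-memory keydir (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (offset of the value, size of the value)
        self.log = open(path, "ab")  # the active file is only ever appended to

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.log.seek(0, os.SEEK_END)  # current end of the file
        self.log.write(HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        # Point the keydir at the newest entry; older entries become dead data.
        self.keydir[key] = (offset + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek, then one read
            return f.read(size)

    def close(self) -> None:
        self.log.close()


db = TinyCask(os.path.join(tempfile.mkdtemp(), "data.cask"))
db.put(b"user:1", b"alice")
db.put(b"user:1", b"bob")  # appended; the first record is left untouched
print(db.get(b"user:1"))   # b'bob'
db.close()
```

An overwrite never touches the old record. It only appends a new one and redirects the keydir entry, which is why stale data accumulates until it is cleaned up.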

Merge
The merge process iterates over all non-active files and produces as output a set of data files containing only the “live” or latest version of each present key.
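The idea behind the merge can be sketched in a few lines of Python. This is a simplification under stated assumptions: each non-active file is modelled as an in-memory list of (key, value) records in append order, the files are passed oldest first, and the hypothetical function `merge_files` returns a single output list rather than a set of files on disk.

```python
def merge_files(inactive_files):
    """Keep only the latest value of each key.

    inactive_files: list of files, oldest first; each file is a list of
    (key, value) records in the order they were appended.
    """
    latest = {}
    for records in inactive_files:
        for key, value in records:
            latest[key] = value  # a later record replaces an earlier one
    return list(latest.items())


old_files = [
    [("a", 1), ("b", 2)],  # oldest file
    [("a", 3), ("c", 4)],  # newer file overwrites "a"
]
print(merge_files(old_files))  # [('a', 3), ('b', 2), ('c', 4)]
```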

RethinkDB?
RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.
Powerful technology:
- Linear scaling across cores
- Fine-grained durability control
- Instantaneous recovery on power failure
Supported core features:
- Atomic increment/decrement
- Values up to 10MB in size
- Multi-GET support
- Up to one million transactions per second on commodity hardware

Installation & usage
RethinkDB works on modern 64-bit distributions of Linux:
- Ubuntu x86_64
- Red Hat Enterprise Linux 5 x86_64
- CentOS 5 x86_64
- SUSE Linux 10
Default installation path: /usr/bin/rethinkdb-1.0
Running the rethinkdb server:
    ./rethinkdb-1.0 -f /u01/rethinkdb_data
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p …
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -f /u03/rethinkdb_data -c 4 -p 11500

The methodology
First, the lack of mechanical parts makes random reads on an SSD very efficient. Second, random writes trigger more erases, which makes these operations expensive and shortens the drive's lifetime. RethinkDB therefore takes an append-only approach to storing data, pioneered by log-structured file systems. What are the consequences of append-only?

Append-only consequences
Benefits:
- Data Consistency
- Hot Backups
- Instantaneous Recovery
- Easy Replication
- Lock-Free Concurrency
- Live Schema Changes
- Database Snapshots
Drawbacks:
1) Eliminating data locality requires a larger number of disk accesses.
2) A large amount of data quickly becomes obsolete in an environment with a heavy insert or update workload.

Append-only B-tree
[Figure: B-tree pages written one after another into the data file]
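A rough Python sketch can show how an append-only tree behaves. It is not RethinkDB's implementation: a binary search tree stands in for the B-tree to keep the code short, the data file is modelled as a Python list of immutable pages, and the helper names `insert` and `lookup` are hypothetical. The point it illustrates is path copying: an update appends new copies of the pages on the path from the root to the changed entry and never modifies an existing page.

```python
def insert(log, root, key, value):
    """Append new pages to `log` and return the page number of the new root.

    A page is a tuple (key, value, left_page, right_page); existing pages
    are never modified.
    """
    if root is None:
        log.append((key, value, None, None))
    else:
        k, v, left, right = log[root]
        if key == k:
            log.append((k, value, left, right))
        elif key < k:
            new_left = insert(log, left, key, value)
            log.append((k, v, new_left, right))
        else:
            new_right = insert(log, right, key, value)
            log.append((k, v, left, new_right))
    return len(log) - 1  # the page just appended


def lookup(log, root, key):
    """Walk down from the given root page; return the value or None."""
    while root is not None:
        k, v, left, right = log[root]
        if key == k:
            return v
        root = left if key < k else right
    return None


data_file = []
root_v1 = insert(data_file, None, 15, "a")
root_v2 = insert(data_file, root_v1, 7, "b")
print(lookup(data_file, root_v2, 7))  # b
print(lookup(data_file, root_v1, 7))  # None
```

Because old pages stay intact, every earlier root remains a consistent view of the data, which is one way to read the snapshot and recovery benefits listed above. The cost is also visible: each update leaves obsolete pages behind in the file.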

LevelDB?
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Supported core features:
- Data is stored sorted by key
- Multiple changes can be made in one atomic batch
- Users can create a transient snapshot to get a consistent view of data
- Data is automatically compressed using the Snappy compression library

Installation & usage
LevelDB works with Snappy, which is a compression/decompression library. LevelDB is a library, not a database server!
Build Snappy:
    download snappy from …
    cd snappy
    ./configure && make && make install
Build LevelDB:
    svn checkout …
    cd leveldb-read-only
    make && cp libleveldb.a /usr/local/lib && cp -r include/leveldb /usr/local/include

Log-structured merge tree
[Figure: the log-structured merge tree in LevelDB]
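The general shape of a log-structured merge tree can be sketched in Python. This is a toy under stated assumptions, not LevelDB's implementation or API: the class `TinyLSM`, its `memtable_limit` parameter, and the single list of sorted runs (with no levels, write-ahead log, or deletes) are all made up for illustration.

```python
import bisect


class TinyLSM:
    """In-memory table flushed to immutable sorted runs (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes, held in memory
        self.runs = []       # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted run and start a new one."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value of each key."""
        merged = {}
        for run in self.runs:  # oldest first, so newer values win
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
print(db.get("a"))  # 3
db.compact()
print(db.runs)      # [[('a', 3), ('b', 2), ('c', 4)]]
```

Writes only ever go to memory or to a new sorted run, so the storage device sees sequential writes; reads pay for that by possibly checking several runs until a compaction merges them.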

Log-structure