
Bigtable: A Distributed Storage System for Structured Data
F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, R.E. Gruber (Google, Inc.)
Presenter: Tianyang HU

Outline Introduction Data Model Building Blocks Implementation Refinements Performance Evaluation Future Work

Motivation Scalability – worldwide applications & users – huge amounts of communication & data

Bigtable Distributed storage system – petabytes of data, thousands of machines – simple data model with dynamic control over data layout – wide applicability, scalability, high performance, high availability Used by more than 60 Google products and projects

Data Model – Overview A Bigtable is a sparse, distributed, persistent, multidimensional sorted map.
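The one-line definition above can be made concrete with a toy sketch. This is illustrative Python, not Bigtable's actual API; the map is indexed by (row, column, timestamp) and a read returns the most recent version:

```python
# Toy sketch of the Bigtable data model (illustrative, not the real API):
# a sparse map indexed by (row: str, column: str, timestamp: int) -> str.
table = {}

def put(row, col, ts, value):
    table[(row, col, ts)] = value

def get(row, col):
    """Return the value with the highest timestamp for (row, col)."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == col]
    if not versions:
        return None
    return max(versions)[1]

put("com.cnn.www", "contents:", 1, "<html>v1</html>")
put("com.cnn.www", "contents:", 2, "<html>v2</html>")
print(get("com.cnn.www", "contents:"))  # prints the newest version, <html>v2</html>
```

Sparseness falls out naturally: a cell that was never written simply has no entry in the map.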

Data Model – Example “Webtable” stores copies of web pages & their related information. – row key: URL with the hostname reversed – column key: attribute name – timestamp: time the page was fetched
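The reversed-hostname row key can be sketched with a small helper (a hypothetical function, not part of Bigtable itself); reversing the hostname makes pages from the same domain sort next to each other:

```python
# Hypothetical helper: build a Webtable-style row key by reversing the
# hostname components of a URL, so pages from one domain sort together.
from urllib.parse import urlparse

def row_key(url):
    p = urlparse(url)
    host = ".".join(reversed(p.hostname.split(".")))
    return host + (p.path or "/")

print(row_key("http://maps.google.com/index.html"))  # com.google.maps/index.html
```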

Data Model – Rows Row key: arbitrary string (typically 10-100 bytes, max 64 KB) Every R/W of data under a single row key is atomic

Data Model – Rows Rows are sorted by row key in lexicographic order Tablet: a contiguous range of rows – the unit of distribution & load balancing – good locality for data access
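Because rows are sorted and tablets cover contiguous row ranges, locating the tablet for a key is a binary search over tablet boundaries. A minimal sketch (the boundary keys here are invented):

```python
# Sketch: tablets cover contiguous, sorted row ranges; finding the tablet
# for a row key is a binary search over the tablets' end keys.
import bisect

tablet_end_keys = ["com.example", "com.google", "org.wikipedia", "\xff"]

def tablet_for(row_key):
    """Return the index of the tablet whose range contains row_key."""
    return bisect.bisect_left(tablet_end_keys, row_key)

assert tablet_for("com.cnn.www") == 0      # sorts before "com.example"
assert tablet_for("com.google.maps") == 2  # after "com.google", before "org..."
```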

Data Model – Columns Column families: groups of column keys (usually holding the same type of data) – the unit of access control Column key: family:qualifier

Data Model – Timestamps Timestamp: indexes multiple versions of the same data – not necessarily the “real time” – enables data clean-up & garbage collection
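One of the per-family garbage-collection settings the paper describes keeps only the newest N versions of a cell; a minimal sketch of that policy:

```python
# Sketch of timestamp-based garbage collection: keep only the newest
# n versions of a cell (one of Bigtable's two per-family GC settings).
def gc_keep_last(versions, n):
    """versions: {timestamp: value}; keep the n largest timestamps."""
    keep = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in keep}

cell = {1: "a", 2: "b", 3: "c"}
assert gc_keep_last(cell, 2) == {3: "c", 2: "b"}
```

The other setting, keeping versions newer than a time bound, would filter on `ts >= cutoff` instead.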

Building Blocks SSTable file format – a persistent, ordered, immutable map from string keys to string values – used internally to store Bigtable data
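The SSTable's essentials can be sketched as a toy class: immutable sorted pairs, an in-memory key index, and binary-search lookup. This is purely illustrative, not the real on-disk format:

```python
# Toy SSTable: an immutable, sorted list of (key, value) string pairs with
# an in-memory key index and binary-search lookup. Illustrative only.
import bisect

class SSTable:
    def __init__(self, pairs):
        self._pairs = sorted(pairs)               # immutable after construction
        self._keys = [k for k, _ in self._pairs]  # "index" kept in memory

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pairs[i][1]
        return None

sst = SSTable([("b", "2"), ("a", "1"), ("c", "3")])
assert sst.get("b") == "2" and sst.get("z") is None
```

The real format stores a sequence of compressed blocks plus a block index; loading only the index into memory is what makes single-seek lookups possible.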

Building Blocks GFS – stores log & data files – scalability, reliability, performance, fault tolerance Chubby – a highly available, persistent distributed lock service

Bigtable Components A library that is linked into every client Many tablet servers – handle R/W requests from clients to their tablets One master – assigns tablets to tablet servers – detects the addition & expiration of tablet servers – balances tablet-server load

Architecture (diagram)

Tablet Location Three-level hierarchy – root tablet (only one; stores the locations of all METADATA tablets; its own location is stored in Chubby) – METADATA tablets (store the locations of user tablets) – user tablets

Tablet Location Clients cache tablet locations – if a cached location turns out to be stale, the client queries the location hierarchy again
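The caching behavior above can be sketched as a small client-side cache; `lookup` stands in for the walk up the location hierarchy, and all names here are invented:

```python
# Sketch of the client-side location cache: a miss (or an invalidated stale
# entry) triggers a re-query of the authoritative location hierarchy.
class LocationCache:
    def __init__(self, lookup):
        self._lookup = lookup   # authoritative lookup, e.g. via METADATA
        self._cache = {}

    def locate(self, tablet):
        if tablet not in self._cache:
            self._cache[tablet] = self._lookup(tablet)
        return self._cache[tablet]

    def invalidate(self, tablet):
        self._cache.pop(tablet, None)   # called when a cached address is stale

calls = []
cache = LocationCache(lambda t: calls.append(t) or f"server-for-{t}")
cache.locate("t1"); cache.locate("t1")
assert calls == ["t1"]                 # second call served from the cache
cache.invalidate("t1"); cache.locate("t1")
assert calls == ["t1", "t1"]           # stale entry forced a re-query
```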

Tablet Assignment The master uses Chubby to keep track of – live tablet servers each live tablet server holds an exclusive lock on a corresponding Chubby file – tablet assignment status compares the tablets registered in the METADATA table with the tablets running on tablet servers

Tablet Assignment Case 1: some tablets are unassigned – the master assigns them to tablet servers with sufficient room Case 2: a tablet server stops serving – the master detects this and reassigns its tablets to other servers Case 3: too many small tablets – the master initiates a merge Case 4: a tablet grows too large – the owning tablet server initiates a split and notifies the master

Tablet Serving A tablet's persistent state is stored as a sequence of SSTables in GFS Tablet mutations are logged in a commit log – the commit log stores redo records – recent mutations are held in memory (the memtable) – older updates live in the SSTables in GFS

Tablet Serving Recovering a tablet – 1. The tablet server fetches the tablet's metadata from the METADATA table, which lists the SSTables that comprise the tablet plus a set of redo points. – 2. The server reads the SSTable indices into memory. – 3. The server replays all commit-log mutations after the redo points to reconstruct the memtable.
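Step 3 above is the core of recovery: only log entries after the redo point are replayed, since earlier mutations are already captured in the SSTables. A minimal sketch with invented log-entry shapes:

```python
# Sketch of tablet recovery: replay only commit-log entries whose sequence
# number is past the redo point; earlier ones are already in the SSTables.
def recover(memtable, log, redo_point):
    for seq, key, value in log:
        if seq > redo_point:
            memtable[key] = value
    return memtable

log = [(1, "a", "old"), (2, "b", "x"), (3, "a", "new")]
mem = recover({}, log, redo_point=1)
assert mem == {"b": "x", "a": "new"}   # entry 1 skipped; 2 and 3 replayed
```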

Tablet Serving Write operation on a tablet – 1. The tablet server checks the validity of the operation. – 2. The mutation is written to the commit log. – 3. The operation is committed. – 4. The mutation's contents are inserted into the memtable.

Tablet Serving Read operation on a tablet – 1. The tablet server checks the validity of the operation. – 2. The read is executed on a merged view of the memtable & the SSTables.
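The "merged view" in step 2 means the memtable (newest data) shadows values in older SSTables. A toy sketch where plain dicts stand in for the memtable and SSTables:

```python
# Sketch of a read over a merged view: the memtable shadows SSTables,
# and newer SSTables shadow older ones.
def merged_read(key, memtable, sstables):
    if key in memtable:
        return memtable[key]
    for sst in sstables:          # assumed ordered newest to oldest
        if key in sst:
            return sst[key]
    return None

memtable = {"a": "newest"}
sstables = [{"a": "older", "b": "2"}, {"c": "3"}]
assert merged_read("a", memtable, sstables) == "newest"
assert merged_read("c", memtable, sstables) == "3"
```

Since the memtable and each SSTable are sorted, a real implementation merges sorted iterators instead of probing each layer, which also makes range scans cheap.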

Compactions The memtable grows as write operations execute Two types of compactions – minor compaction – merging (major) compaction

Compactions Minor compaction (when the memtable size reaches a threshold) – 1. Freeze the memtable – 2. Create a new memtable – 3. Convert the frozen memtable to an SSTable and write it to GFS

Compactions Merging compaction (periodically) – 1. Freeze the memtable – 2. Create a new memtable – 3. Merge a few SSTables & the frozen memtable into a new SSTable

Compactions Major compaction – the special case of a merging compaction that merges all SSTables & the memtable into exactly one SSTable – discards deleted data, reclaiming space

Compactions Why freeze & create a new memtable? – incoming read and write operations can continue during compactions Advantages of compaction: – releases memory on the tablet server – reduces the amount of data that must be read from the commit log during recovery if the server dies
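The freeze-then-convert steps of a minor compaction can be sketched in a few lines (dicts stand in for the memtable and SSTables; all names are invented):

```python
# Sketch of a minor compaction: freeze the current memtable, start a fresh
# one so writes can continue, and emit the frozen contents as an "SSTable".
def minor_compaction(state):
    frozen = state["memtable"]
    state["memtable"] = {}                                  # fresh memtable
    state["sstables"].append(dict(sorted(frozen.items())))  # persist frozen data

state = {"memtable": {"b": "2", "a": "1"}, "sstables": []}
minor_compaction(state)
assert state["memtable"] == {}
assert state["sstables"] == [{"a": "1", "b": "2"}]
```

In the real system the frozen memtable is also still readable while the SSTable is being written, which is why reads never block on a compaction.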

Refinements Locality groups – group together multiple column families that are typically accessed together; families not accessed together go in separate locality groups – for each tablet, each locality group is stored in a separate SSTable – more efficient R/W

Refinements Compression – exploits similar data in the same column, neighbouring rows, and multiple versions – customized compression applied per SSTable block (the smallest unit read) – two-pass compression scheme 1. Bentley and McIlroy's scheme: compresses long common strings across a large window 2. a fast compression algorithm that looks for repetitions in a small window – typical compressed size: 10% of the original (Gzip: 25-33%)
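The Bentley-McIlroy pass has no standard-library implementation, so this sketch only illustrates the block-level idea, with zlib standing in for Bigtable's actual compressors; block size and data are invented:

```python
# Illustration of per-block compression only; zlib is a stand-in for
# Bigtable's real two-pass scheme. Each block compresses independently,
# so a read can decompress just the block it needs.
import zlib

def compress_blocks(data, block_size=64 * 1024):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [zlib.compress(b) for b in blocks]

page = b"<html>repetitive boilerplate</html>" * 1000
compressed = compress_blocks(page)
ratio = sum(len(b) for b in compressed) / len(page)
assert ratio < 0.1   # highly repetitive data compresses very well
```

The trade-off the paper highlights is exactly this one: compressing per block sacrifices some ratio versus compressing a whole SSTable, but lets small reads avoid decompressing the entire file.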

Refinements Caching – two-level cache on the tablet server – Scan cache (high level): caches key-value pairs helps when the same data is read repeatedly – Block cache (low level): caches SSTable blocks read from GFS helps sequential reads of nearby data
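Both levels are naturally LRU caches over different key types (key-value pairs vs. SSTable blocks). A minimal LRU sketch, with the cache contents invented for illustration:

```python
# Sketch of one cache level as an LRU cache; the Scan cache would map
# key -> value, the Block cache (sstable, block index) -> raw block bytes.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)   # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

scan_cache = LRUCache(2)
scan_cache.put("row1/col1", "v1")
scan_cache.put("row2/col1", "v2")
scan_cache.put("row3/col1", "v3")          # evicts row1/col1
assert scan_cache.get("row1/col1") is None
assert scan_cache.get("row3/col1") == "v3"
```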

Refinements Commit-log implementation – one commit log per tablet would incur a large number of concurrent disk seeks – instead, a single commit log is used for all tablets on a tablet server – benefits performance significantly during normal operation – but complicates recovery solution: first sort the commit-log entries (by table, row name, and log sequence number)
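The recovery fix is just a sort: once entries are ordered by (table, row, sequence number), each tablet's mutations are contiguous and can be read in one pass. A sketch with invented entry tuples:

```python
# Sketch of the recovery fix for a shared commit log: sort entries by
# (tablet, row, sequence) so each recovering tablet reads one contiguous run.
entries = [
    ("tablet2", "rowB", 2, "y"),
    ("tablet1", "rowA", 1, "x"),
    ("tablet1", "rowA", 3, "z"),
]
entries.sort(key=lambda e: (e[0], e[1], e[2]))
assert [e[0] for e in entries] == ["tablet1", "tablet1", "tablet2"]
```

Without the sort, each recovering tablet server would have to scan the entire shared log to pick out its own tablets' entries.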

Performance Evaluation (charts: R/W rate per tablet server; aggregate R/W rate)

Future Work Resource sharing across different applications? Hybrid with relational databases? – complex queries – security