Bigtable: A Distributed Storage System for Structured Data
Presenter: Guangdong Liu
Jan 24th, 2012

Google's Motivation: Scale!
Consider Google …
–Lots of data: copies of the web, satellite data, user data, USENET archives, a Subversion backing store
–Millions of machines
–Different projects/applications
–Hundreds of millions of users
–Many incoming requests

Why not a DBMS?
Few DBMSs support the requisite scale
–A DB with wide scalability, wide applicability, high performance, and high availability would be required
Couldn't afford one even if it existed
–Most DBMSs require very expensive infrastructure
DBMSs provide more than Google needs
–E.g., full transactions, SQL
Google has highly optimized lower-level systems that can be exploited
–GFS, Chubby, MapReduce, job scheduling

What is a Bigtable?
"A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes."

Relative to a DBMS, BigTable provides …
–A simplified data retrieval mechanism: a map (row, column, timestamp) -> string; no relational operators
–Atomic updates only possible at the row level
–An arbitrary number of columns per row
–An arbitrary data type for each column
–Designed for Google's application set
–Extremely large scale (data, throughput) at extremely small cost
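To make the "sorted map" abstraction concrete, here is a minimal, in-memory Python sketch of the (row, column, timestamp) -> value mapping. It is only a toy model of the logical data model; the put/get helpers and the example row and column names are illustrative, not part of any real API.

```python
# Toy, in-memory sketch of Bigtable's *logical* data model: a sorted map from
# (row key, column key, timestamp) to an uninterpreted byte string.
# Purely illustrative -- the real table is sparse, distributed, and persistent.

table = {}  # (row, column, timestamp) -> bytes

def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    table[(row, column, timestamp)] = value

def get(row: str, column: str) -> bytes:
    """Return the most recent version of a cell (largest timestamp wins)."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    if not versions:
        raise KeyError((row, column))
    return max(versions)[1]

put("com.cnn.www", "contents:", 5, b"<html>... old ...")
put("com.cnn.www", "contents:", 6, b"<html>... new ...")
print(sorted(table))                      # keys come back in sorted order
print(get("com.cnn.www", "contents:"))    # b'<html>... new ...'
```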

Data model: Row
–Row keys are arbitrary strings
–The row is the unit of transactional consistency: every read or write of data under a single row key is atomic
–Data is maintained in lexicographic order by row key
–Rows with consecutive keys (row ranges) are grouped together into "tablets", the unit of distribution and load balancing
–Reads of short row ranges are efficient and typically require communication with only a small number of machines
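A short sketch of why lexicographic row ordering gives locality; the reversed-hostname row keys follow the Webtable example in the Bigtable paper, and the code itself is only illustrative.

```python
# Rows are kept in lexicographic order by row key, so pages from the same
# domain become contiguous when the hostname components are reversed
# (the Webtable convention from the Bigtable paper). Illustrative only.

def row_key(host: str, path: str = "/") -> str:
    # "maps.google.com" -> "com.google.maps/"
    return ".".join(reversed(host.split("."))) + path

hosts = ["maps.google.com", "www.cnn.com", "mail.google.com", "www.google.com"]
for key in sorted(row_key(h) for h in hosts):
    print(key)
# com.cnn.www/
# com.google.mail/
# com.google.maps/
# com.google.www/
#
# A scan over the prefix "com.google." now touches one contiguous row range,
# i.e. typically only a handful of tablets/machines.
```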

Data model: Column
–Column keys are grouped into sets called "column families", which form the unit of access control
–Data stored under a column family is usually of the same type
–A column family must be created before data can be stored under any column key in it; after the family has been created, any column key within the family can be used
–A column key is named using the syntax family:qualifier
–Access control and disk/memory accounting are performed at the column-family level

Data model: Timestamps
–Each cell in Bigtable can contain multiple versions of the data, each indexed by timestamp
–Timestamps are 64-bit integers, assigned either by Bigtable (real time in microseconds) or by the client application (when unique timestamps are a necessity)
–Data is stored in decreasing timestamp order, so the most recent version is easily accessed
–The application specifies how many versions (n) of a data item are maintained in a cell; Bigtable garbage-collects obsolete versions
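A toy sketch of the per-cell versioning just described: versions kept in decreasing timestamp order with a "keep only the last n versions" garbage-collection policy. The Cell class and the values are invented for illustration (the weights echo the zoo example on the next slides).

```python
# Toy per-cell versioning: versions are stored newest-first, and a simple
# "keep only the last n versions" policy garbage-collects obsolete ones.
# (In Bigtable this policy is configured per column family.)

class Cell:
    def __init__(self, max_versions: int = 2):
        self.max_versions = max_versions
        self.versions = []                        # list of (timestamp, value)

    def put(self, timestamp: int, value: bytes) -> None:
        self.versions.append((timestamp, value))
        self.versions.sort(reverse=True)          # decreasing timestamp order
        del self.versions[self.max_versions:]     # GC obsolete versions

    def get(self, timestamp: int = None) -> bytes:
        if timestamp is None:
            return self.versions[0][1]            # newest version appears first
        for ts, value in self.versions:
            if ts <= timestamp:                   # newest version at or before `timestamp`
                return value
        raise KeyError(timestamp)

weight = Cell(max_versions=2)
weight.put(2005, b"640 lbs")
weight.put(2006, b"620 lbs")
weight.put(2007, b"600 lbs")
print(weight.get())            # b'600 lbs' -- most recent appears first
print(len(weight.versions))    # 2 -- the 2005 version was garbage-collected
```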

Data Model Example: Zoo
(table figure: row key, column key, timestamp)
– (zebras, length, 2006) --> 7 ft
– (zebras, weight, 2007) --> 600 lbs
– (zebras, weight, 2006) --> 620 lbs
Keys are sorted in lexicographic order; timestamp ordering is "most recent appears first"

Data Model Example: Web Indexing
(figure walkthrough: row keys, columns, cells, timestamps, and column families, named family:qualifier)

Client APIs
Bigtable APIs provide functions for:
–Creating/deleting tables and column families
–Changing cluster, table, and column-family metadata, such as access control rights
–Single-row transactions
–Using cells as integer counters
–Executing client-supplied scripts in the address space of the servers
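The sketch below is a hypothetical, in-memory toy of two of the operations listed above (single-row atomicity and cells as integer counters), loosely patterned on the RowMutation/Apply style of the C++ example in the Bigtable paper. The class and method names are invented; this is not a real client library.

```python
# Toy, in-memory stand-in for the client API operations listed above.
# All names here are illustrative; this is not the real Bigtable client.

import threading

class ToyTable:
    def __init__(self):
        self.rows = {}     # row key -> {column key: value}
        self.locks = {}    # one lock per row: models single-row transactions

    def _lock(self, row):
        return self.locks.setdefault(row, threading.Lock())

    def apply(self, row, sets=(), deletes=()):
        """Apply a batch of set/delete mutations to ONE row atomically."""
        with self._lock(row):
            cols = self.rows.setdefault(row, {})
            for column, value in sets:
                cols[column] = value
            for column in deletes:
                cols.pop(column, None)

    def increment(self, row, column, delta=1):
        """Use a cell as an integer counter (atomic read-modify-write on one row)."""
        with self._lock(row):
            cols = self.rows.setdefault(row, {})
            cols[column] = cols.get(column, 0) + delta
            return cols[column]

t = ToyTable()
t.apply("com.cnn.www",
        sets=[("anchor:www.c-span.org", b"CNN")],
        deletes=["anchor:www.abc.com"])
print(t.increment("com.cnn.www", "stats:crawl_count"))   # 1
```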

Building Blocks
–Bigtable uses the distributed Google File System (GFS) to store log and data files
–The Google SSTable file format is used internally to store Bigtable data
–An SSTable provides a persistent, ordered, immutable map from keys to values
–Operations are provided to look up the value associated with a specified key and to iterate over all key/value pairs in a specified key range

Building Blocks
–Bigtable relies on Chubby, a highly available and persistent distributed lock service
–Chubby provides a namespace that consists of directories and small files; each directory or file can be used as a lock
–Chubby consists of 5 active replicas; one replica is the master and serves requests
–The service is functional when a majority of the replicas are running and in communication with one another, i.e., when there is a quorum

Building Blocks
(figure: Bigtable runs on a shared pool of machines that also run other distributed applications)

SSTable
–An SSTable provides a persistent, ordered, immutable map from keys to values
–Each SSTable contains a sequence of blocks; a block index (stored at the end of the SSTable) is used to locate blocks
–The index is loaded into memory when the SSTable is opened
–A lookup then needs only one disk access to fetch the block

(SSTable layout figure: 64KB block | 64KB block | 64KB block | … | index of block ranges)
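A minimal sketch, under an invented file format, of the SSTable idea: sorted blocks followed by a block index written at the end of the file. The index is loaded into memory once, so a lookup costs a single read of one block.

```python
# Toy SSTable-like file: sorted key/value blocks, then a block index at the
# end, then an 8-byte footer recording where the index starts. The on-disk
# format here is invented; it only illustrates "in-memory index + one block read".

import json, struct

def write_sstable(path, items, block_size=64 * 1024):
    items = sorted(items)                             # SSTable keys are sorted
    index = []                                        # (first key, offset, length) per block
    with open(path, "wb") as f:
        block, first_key = [], None

        def flush():
            nonlocal block, first_key
            if block:
                data = json.dumps(block).encode()
                index.append((first_key, f.tell(), len(data)))
                f.write(data)
                block, first_key = [], None

        for key, value in items:
            if first_key is None:
                first_key = key
            block.append((key, value))
            if sum(len(k) + len(v) for k, v in block) >= block_size:
                flush()
        flush()
        index_offset = f.tell()
        f.write(json.dumps(index).encode())
        f.write(struct.pack("<Q", index_offset))      # footer

def read_index(path):
    """Load the block index into memory (done once, when the SSTable is opened)."""
    with open(path, "rb") as f:
        f.seek(-8, 2)
        (index_offset,) = struct.unpack("<Q", f.read(8))
        f.seek(index_offset)
        return json.loads(f.read()[:-8])

def lookup(path, index, key):
    """Use the in-memory index, then read exactly one block from disk."""
    candidate = None
    for first_key, offset, length in index:
        if first_key <= key:
            candidate = (offset, length)
    if candidate is None:
        return None
    with open(path, "rb") as f:
        f.seek(candidate[0])
        block = json.loads(f.read(candidate[1]))
    return dict(map(tuple, block)).get(key)

write_sstable("toy.sst", [("apple", "1"), ("aardvark", "2"), ("boat", "3")])
index = read_index("toy.sst")
print(lookup("toy.sst", index, "apple"))   # 1
```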

Tablet
–Contains some range of rows of the table
–Built out of multiple SSTables
(figure: a tablet covering start key "aardvark" to end key "apple", backed by two SSTables, each made of 64K blocks plus an index)

Table
–Multiple tablets make up a table
–SSTables can be shared between tablets
–Tablets do not overlap; SSTables can overlap
(figure: two tablets, "aardvark"–"apple" and "apple"–"boat", sharing an SSTable)

Organization
–A Bigtable cluster stores tables
–Each table consists of tablets; initially each table has just one tablet, and as the table grows it is automatically split into multiple tablets
–Tablets are assigned to tablet servers; a server holds multiple tablets, each roughly 100–200 MB
–Each tablet lives at only one server

Implementation: Three major components
–A library that is linked into every client
–One master server: assigns tablets to tablet servers, detects the addition and deletion of tablet servers, balances tablet-server load, and garbage-collects files in GFS
–Many tablet servers: each manages a set of tablets and splits tablets that grow too big
Clients communicate directly with tablet servers for reads and writes

Tablet Location
Since tablets move around from server to server, given a row, how do clients find the right machine?
–Need to find the tablet whose row range covers the target row

Tablet Location
A 3-level hierarchy, analogous to a B+-tree, stores tablet location information:
–A file stored in Chubby contains the location of the root tablet
–The root tablet contains the locations of the METADATA tablets (the root tablet never splits)
–Each METADATA tablet contains the locations of a set of user tablets
The client reads the Chubby file that points to the root tablet; this starts the location process
The client library caches tablet locations and moves back up the hierarchy when a cached location is stale or unavailable
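The sketch below walks the three lookup levels with toy data structures and a client-side cache; the tablet names, server names, and the "index" representation are invented for illustration, not the real METADATA schema.

```python
# Toy three-level tablet-location lookup: Chubby file -> root tablet ->
# METADATA tablet -> user tablet, with a client-side location cache.

from bisect import bisect_left

chubby = {"/bigtable/root-location": "root@ts1"}

# Each tablet's index: sorted (end row key, location of the covering tablet) pairs.
tablet_index = {
    "root@ts1":  [("m", "meta1@ts2"), ("\xff", "meta2@ts3")],
    "meta1@ts2": [("com.google", "user1@ts4"), ("m", "user2@ts5")],
    "meta2@ts3": [("\xff", "user3@ts6")],
}

cache = {}   # row key -> user tablet location (the client library caches lookups)

def covering(index, row_key):
    """Return the tablet whose row range covers row_key."""
    end_keys = [end for end, _ in index]
    return index[bisect_left(end_keys, row_key)][1]

def locate(row_key):
    if row_key in cache:                                  # cache hit: no round trips
        return cache[row_key]
    root = chubby["/bigtable/root-location"]              # 1. read the Chubby file
    meta = covering(tablet_index[root], row_key)          # 2. ask the root tablet
    user = covering(tablet_index[meta], row_key)          # 3. ask a METADATA tablet
    cache[row_key] = user
    return user

print(locate("com.cnn.www"))   # user1@ts4
print(locate("com.cnn.www"))   # second call is served from the client-side cache
```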

Tablet Assignment
(system diagram)
–Bigtable master: performs metadata operations and load balancing
–Bigtable tablet servers: serve tablet data
–Bigtable clients, via the Bigtable client library
–Cluster scheduling system: handles failover and monitoring
–GFS: holds tablet data and logs
–Chubby: holds metadata and handles master election

Tablet Server
–When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely named file in a specific Chubby directory (call this the servers directory)
–A tablet server stops serving its tablets if it loses its exclusive lock; this may happen if a network failure causes the tablet server to lose its Chubby session

Tablet Server
–A tablet server will attempt to reacquire an exclusive lock on its file as long as the file still exists
–If the file no longer exists, the tablet server can never serve again, so it kills itself
–At some point it can restart and return to the pool of unassigned tablet servers

Master Startup Operation
Upon startup the master needs to discover the current tablet assignment:
–Grabs a unique master lock in Chubby, which prevents concurrent master instantiations
–Scans the servers directory in Chubby for live servers
–Communicates with every live tablet server to discover all assigned tablets
–Scans the METADATA table to learn the full set of tablets; unassigned tablets are marked for assignment

Master Operation
–Detects tablet server failures and resumptions
–The master periodically asks each tablet server for the status of its lock

Master Operation
If a tablet server reports that it lost its lock, or if the master cannot contact a tablet server:
–The master attempts to acquire an exclusive lock on the server's file in the servers directory
–If the master acquires the lock, the tablets assigned to that tablet server are reassigned to others, and the master deletes the server's file in the servers directory
–Assignment of tablets should remain balanced
If the master loses its Chubby session it kills itself; an election can then take place to find a new master

Tablet Server Failure
(figure sequence showing the failure-handling steps)

Tablet Serving
–The commit log stores the updates that are made to the data
–Recent updates are stored in the memtable (in memory)
–Older updates are stored in SSTable files

Tablet Serving: reads/writes arriving at a tablet server
o Is the request well-formed?
o Authorization: Chubby holds the permission file
o For a mutation, the change is written to the commit log; group commit is used to improve throughput, and the committed contents are then inserted into the memtable

Tablet Serving: tablet recovery process
o Read the tablet's metadata, which lists its SSTables and a set of redo points
o Redo points are pointers into the commit logs
o Rebuild the memtable by replaying the updates committed after the redo points
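A toy sketch tying the last two slides together: a write-ahead commit log plus memtable on the write path, and recovery by replaying the log from a redo point. The log format and function names are invented for illustration.

```python
# Toy write path + recovery: append each mutation to a commit log first
# (write-ahead), then apply it to the in-memory memtable; recovery rebuilds
# the memtable by replaying log entries after the redo point.

import json, os

LOG = "tablet.commitlog"

def write(memtable: dict, row: str, column: str, value: str) -> None:
    if not row or not column:
        raise ValueError("malformed mutation")            # well-formedness check
    entry = {"row": row, "column": column, "value": value}
    with open(LOG, "a") as log:                           # 1. commit log (write-ahead)
        log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())          # a real server batches this via group commit
    memtable[(row, column)] = value                       # 2. apply to the memtable

def recover(redo_point: int = 0) -> dict:
    """Rebuild the memtable by replaying commit-log entries after the redo point."""
    memtable = {}
    with open(LOG) as log:
        for i, line in enumerate(log):
            if i < redo_point:          # these updates are already in SSTables
                continue
            entry = json.loads(line)
            memtable[(entry["row"], entry["column"])] = entry["value"]
    return memtable

memtable = {}
write(memtable, "com.cnn.www", "contents:", "<html>...")
write(memtable, "com.cnn.www", "anchor:www.c-span.org", "CNN")
print(recover(redo_point=0) == memtable)   # True
```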

Compactions
–Minor compaction: convert the memtable into an SSTable; reduces memory usage and reduces log traffic on restart
–Merging compaction: reduces the number of SSTables; a good place to apply a "keep only N versions" policy
–Major compaction: a merging compaction that results in only one SSTable, with no deletion records, only live data
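An in-memory sketch of the two basic moves, minor and merging compaction, using plain dicts in place of real files; it only illustrates the bookkeeping, not the on-disk work.

```python
# Toy compactions: a minor compaction freezes the memtable and turns it into
# a new (immutable) sorted SSTable; a merging compaction folds several
# SSTables into one, keeping only the newest value per key.

def minor_compaction(memtable: dict, sstables: list) -> dict:
    frozen = dict(sorted(memtable.items()))   # SSTables keep keys sorted
    sstables.append(frozen)                   # "write out" the new SSTable
    return {}                                 # start a fresh, empty memtable

def merging_compaction(sstables: list) -> list:
    merged = {}
    for sst in sstables:                      # SSTables are ordered oldest -> newest
        merged.update(sst)                    # newer values win
    return [dict(sorted(merged.items()))]     # one SSTable replaces many

sstables = []
memtable = {("zebras", "weight:"): "620 lbs"}
memtable = minor_compaction(memtable, sstables)

memtable[("zebras", "weight:")] = "600 lbs"
memtable = minor_compaction(memtable, sstables)

sstables = merging_compaction(sstables)
print(sstables)   # [{('zebras', 'weight:'): '600 lbs'}] -- only the newest value survives
```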

Refinements
–Locality groups: clients can group multiple column families together into a locality group
–Compression: applied to each SSTable block separately; uses Bentley and McIlroy's scheme plus a fast compression algorithm
–Caching for read performance: uses a Scan Cache and a Block Cache
–Bloom filters: reduce the number of disk accesses
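A minimal Bloom-filter sketch showing the idea behind the last bullet: a read can skip any SSTable whose filter says the row/column pair is definitely absent. The sizes and hash choice here are arbitrary.

```python
# Minimal Bloom filter: a probabilistic "might this SSTable contain the
# row/column pair?" test. A negative answer is definite, so the read skips
# that SSTable entirely and saves a disk access.

import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("com.cnn.www/anchor:www.c-span.org")
print(bf.might_contain("com.cnn.www/anchor:www.c-span.org"))  # True
print(bf.might_contain("com.example.www/contents:"))          # almost certainly False
```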

Refinements
–Commit-log implementation: a single commit log per tablet server rather than one log per tablet
–Exploiting SSTable immutability: no need to synchronize accesses to the file system when reading SSTables, which makes concurrency control over rows efficient
–Permanently removing deleted data becomes a matter of garbage-collecting obsolete SSTables
–Enables quick tablet splits: the child tablets keep using the parent's SSTables
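A tiny sketch of the last point: because SSTables are immutable, a split can hand both children references to the parent's SSTables with narrower row ranges, so no data is copied. The Tablet class is an illustration, not the real implementation.

```python
# Why immutable SSTables make splits cheap: the two child tablets simply
# reference the parent's existing SSTables with narrower row ranges.

class Tablet:
    def __init__(self, start_key, end_key, sstables):
        self.start_key, self.end_key = start_key, end_key
        self.sstables = sstables                  # shared, immutable files (names only)

    def split(self, split_key):
        left = Tablet(self.start_key, split_key, self.sstables)   # same SSTables,
        right = Tablet(split_key, self.end_key, self.sstables)    # narrower ranges
        return left, right

parent = Tablet("aardvark", "boat", ["sst-001", "sst-002"])
left, right = parent.split("apple")
print(left.sstables is right.sstables)   # True -- no data copied at split time
```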

Source
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.
Some slides adapted from google-bigtable.ppt.

©2007 The Board of Regents of the University of Nebraska. All rights reserved.
Thanks