Bigtable: A Distributed Storage System for Structured Data
Chang et al. (2006), Google, Inc.
Presented by: Siddhartha Sahu
September 26, 2016
CS848, University of Waterloo
What is the problem?
To build a storage system for structured data that is huge in size and distributed across many servers.
[diagram: e.g. 100 TB of data stored as 4-8 TB shards across Server 1 ... Server N]
Why is it important?
It provides a stable abstraction over which other applications can be built.
Why is it hard?
The distributed environment makes the problem inherently difficult.
Availability and fault tolerance.
Performance satisfying different workloads (e.g. a 40 TB batch job vs. a 100 GB realtime workload).
A reliable abstraction, custom requirements, and integration with other systems.
Solution: Bigtable
The word Bigtable is overloaded and applies to both the overall service architecture and the structure of the stored data.
Multi-dimensional data format; does not support a full relational data model.
Data is dynamically sharded across multiple servers, which provides parallelism and fault tolerance.
Works together with other Google services: Chubby, GFS, and the Job Scheduler.
Schema
A sparse, distributed, persistent, multi-dimensional sorted map:
Map: (row:string, column:string, time:int64) → string
Rows are kept sorted; atomic operations are supported at the single-row level only.
Columns are grouped into column families, which are the unit for access control (ACLs) and garbage collection (GC by number of versions or by time); families can in turn be grouped into locality groups.
[example table from the paper: rows such as ca.waterloo.www and ca.waterloo.www/about-us, columns contents, language, anchor1, anchor2, and timestamped versions t1 ... tn (e.g. contents = <html>, language = en, anchors = facebook.com, acm.org)]
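A minimal sketch of this data model, assuming a single-process toy table (the class and method names are illustrative, not Bigtable's API): cells are addressed by (row, column, timestamp), columns belong to declared families, and reads return the newest versions first.

```python
from collections import defaultdict

class SparseSortedMap:
    """Toy model of the data model on this slide:
    (row:string, column:string, time:int64) -> string.
    Each cell keeps multiple timestamped versions, newest first."""

    def __init__(self, column_families):
        self.column_families = set(column_families)       # part of the schema
        # row key -> column name -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        family = column.split(":")[0]
        assert family in self.column_families, "unknown column family"
        versions = self.rows[row][column]
        versions.append((timestamp, value))
        versions.sort(reverse=True)                       # newest version first

    def get(self, row, column, n_versions=1):
        return self.rows.get(row, {}).get(column, [])[:n_versions]

    def scan(self, start_row, end_row):
        """Rows are kept lexicographically sorted, so range scans are cheap."""
        for row in sorted(self.rows):
            if start_row <= row < end_row:
                yield row, dict(self.rows[row])

# The example row from the slide:
t = SparseSortedMap(column_families={"contents", "language", "anchor"})
t.put("ca.waterloo.www", "contents:", "<html>", timestamp=1)
t.put("ca.waterloo.www", "language:", "en", timestamp=1)
t.put("ca.waterloo.www", "anchor:facebook.com", "link text", timestamp=2)
print(t.get("ca.waterloo.www", "contents:"))   # [(1, '<html>')]
```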
Sharding
The table is dynamically partitioned into row ranges called tablets.
Locality properties depend on the choice of row key, and atomic operations still happen within a single partition.
[diagram: the sorted table split into tablets, e.g. one tablet ending at ca.waterloo.www/about-us and others covering com.google.com and tz.speedtest.www]
API
Create and delete tables; read, write, and scan; batching of writes; client-supplied scripts in Sawzall for transforms, filters, and aggregation.
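A sketch of row-range sharding (the splitting rule here is illustrative; real tablets split by size, roughly 100-200 MB, not by row count). Because the map is sorted by row key, keys that share a prefix stay together:

```python
def split_into_tablets(sorted_row_keys, max_rows_per_tablet):
    """Partition a lexicographically sorted list of row keys into
    contiguous row ranges (tablets). Row count stands in for the
    real size-based split to keep the sketch simple."""
    tablets = []
    for i in range(0, len(sorted_row_keys), max_rows_per_tablet):
        chunk = sorted_row_keys[i:i + max_rows_per_tablet]
        tablets.append((chunk[0], chunk[-1]))          # (first row, last row)
    return tablets

rows = sorted([
    "ca.waterloo.www",
    "ca.waterloo.www/about-us",
    "com.google.com",
    "tz.speedtest.www",
])
print(split_into_tablets(rows, max_rows_per_tablet=2))
# [('ca.waterloo.www', 'ca.waterloo.www/about-us'), ('com.google.com', 'tz.speedtest.www')]
```

Both waterloo pages land in the same tablet: this is the locality property the slide refers to, and why the paper stores web pages under reversed domain names.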
Bigtable Architecture
[diagram: client applications using the Bigtable library, a single master, many tablet servers holding tablets backed by SSTables in GFS, plus Chubby and the Job Scheduler; the METADATA table tracks tablet locations]
A Bigtable cluster stores a number of tables. Each table consists of a set of tablets, and each tablet contains all data associated with a row range. Initially, each table consists of just one tablet. As a table grows, it is automatically split into multiple tablets, each approximately 100-200 MB in size by default.
Chubby
A lock service with a hierarchical namespace of directories and files; uses Paxos for consensus; clients hold locks for as long as their session is alive.
Used to: elect the master, bootstrap the tablet location hierarchy, monitor tablet servers, and store schema information and ACLs.
METADATA tablets
[architecture diagram repeated, with the METADATA tablets highlighted]
How to locate tablets?
A special METADATA table points to all other tablets; its row key = encode(table name + last row of the tablet). The root METADATA tablet is never split.
How do you use this to look up tablets?
The METADATA rows also store logs of events for debugging.
Client optimizations: tablet locations are cached, and lookups fetch extra METADATA rows (prefetching).
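One plausible answer to the slide's question, as a hedged sketch (the encode function and data layout are illustrative; the real system uses a three-level hierarchy where a Chubby file points to the root tablet, which indexes the other METADATA tablets): because METADATA rows are sorted by encode(table, last row), locating a row's tablet is a binary search for the first entry whose last-row component is >= the target row.

```python
from bisect import bisect_left

def encode(table_name, last_row):
    """Illustrative stand-in for the METADATA row key encoding."""
    return (table_name, last_row)

# One METADATA row per user tablet, keyed by (table, last row of the tablet),
# whose value is the location (tablet server) of that tablet. Kept sorted.
metadata_keys = [
    encode("Webtable", "ca.waterloo.www/about-us"),
    encode("Webtable", "com.google.com"),
    encode("Webtable", "\xff"),            # last tablet covers the remaining rows
]
metadata_locations = ["tabletserver-1", "tabletserver-2", "tabletserver-3"]

def locate_tablet(table_name, row_key):
    """Find the tablet whose row range contains row_key:
    the first tablet whose last row is >= row_key."""
    i = bisect_left(metadata_keys, encode(table_name, row_key))
    return metadata_locations[i]

print(locate_tablet("Webtable", "com.facebook.www"))   # tabletserver-2
```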
Master
Assigns tablets to servers (how to evenly balance traffic?).
Detects tablet server failures by pinging servers periodically and verifying their lock files in Chubby.
Startup: acquire the master lock in Chubby, find live servers from Chubby, get their current assignments, cross-check with the METADATA table, and assign the unassigned tablets.
The master has no communication with clients.
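A rough, runnable sketch of the startup sequence listed above, using in-memory stand-ins (FakeChubby and FakeTabletServer are invented here; Chubby's real API is not public, and the placement policy is a naive guess, not Bigtable's):

```python
class FakeChubby:
    """In-memory stand-in for Chubby."""
    def __init__(self, live_servers):
        self._live = live_servers
    def acquire(self, path):
        return True                      # pretend we obtained the master lock
    def list_live_servers(self):
        return self._live                # servers holding locks in the servers directory

class FakeTabletServer:
    def __init__(self, name, tablets):
        self.name, self.tablets = name, set(tablets)
    def load_tablet(self, tablet):
        self.tablets.add(tablet)

def master_startup(chubby, all_tablets):
    """Acquire the master lock, find live servers via Chubby, collect their
    current assignments, cross-check against the set of tablets known from
    METADATA (all_tablets), and assign anything unassigned."""
    chubby.acquire("/bigtable/master-lock")
    servers = chubby.list_live_servers()
    assigned = {t for s in servers for t in s.tablets}
    for tablet in all_tablets:
        if tablet not in assigned:
            target = min(servers, key=lambda s: len(s.tablets))   # naive balancing
            target.load_tablet(tablet)

s1 = FakeTabletServer("ts1", {"T1", "T2"})
s2 = FakeTabletServer("ts2", {"T3"})
master_startup(FakeChubby([s1, s2]), all_tablets={"T1", "T2", "T3", "T4", "T5"})
print(s1.tablets, s2.tablets)   # T4 and T5 go to the currently least-loaded servers
```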
Tablet Servers
Each tablet server serves multiple tablets, handles all read and write requests to them, and splits tablets dynamically (dynamic sharding).
Each tablet server acquires a lock on a file in Chubby.
Optimizations: a scan cache (key-value pairs) and a block cache (SSTable blocks read from GFS).
Discussion: should a server preload another instance before killing itself?
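A sketch of the two cache levels named above (the helpers read_block_from_gfs and find_block_for are hypothetical placeholders for the real SSTable/GFS machinery, and the LRU policy is an assumption):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache used for both levels below."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()
    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None
    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)

scan_cache = LRUCache(capacity=10_000)   # caches (row, column) -> value pairs
block_cache = LRUCache(capacity=1_000)   # caches raw SSTable blocks read from GFS

def read_cell(row, column, read_block_from_gfs, find_block_for):
    """Consult the scan cache first, then the block cache, then GFS."""
    hit = scan_cache.get((row, column))
    if hit is not None:
        return hit
    block_id = find_block_for(row, column)
    block = block_cache.get(block_id)
    if block is None:
        block = read_block_from_gfs(block_id)
        block_cache.put(block_id, block)
    value = block[(row, column)]
    scan_cache.put((row, column), value)
    return value

print(read_cell("ca.waterloo.www", "contents:",
                read_block_from_gfs=lambda bid: {("ca.waterloo.www", "contents:"): "<html>"},
                find_block_for=lambda r, c: "block-0"))   # <html>
```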
SSTable
[architecture diagram repeated, with the SSTables highlighted]
SSTable
An ordered, immutable map from keys to values.
Data is stored in blocks (64 KB by default, configurable), with a block index written at the end of the file.
Why an index? The in-memory index lets a lookup binary-search for the single block that may contain a key. Why not hashing? Hashing would destroy the ordering that range scans and merges rely on.
Optimizations: a single SSTable per locality group; locality groups can be served from memory; two-pass compression applied per block over small data windows, giving roughly a 10-to-1 reduction in space.
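A toy SSTable illustrating the block-plus-index layout and the binary search above (a "block" here is a few entries instead of 64 KB, and the on-disk byte format is not reproduced):

```python
import bisect

class SSTable:
    """Sorted (key, value) pairs grouped into fixed-size blocks, with an
    in-memory index of each block's first key. The real format stores the
    index at the end of the file and loads it when the SSTable is opened."""

    def __init__(self, sorted_items, entries_per_block=4):
        self.blocks = [sorted_items[i:i + entries_per_block]
                       for i in range(0, len(sorted_items), entries_per_block)]
        self.index = [block[0][0] for block in self.blocks]   # first key per block

    def get(self, key):
        # Binary search the index to find the one block that could hold the key,
        # then read and scan just that block.
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None

items = sorted([("ca.waterloo.www", "<html>"), ("com.google.com", "<html>"),
                ("org.acm.www", "<html>"), ("tz.speedtest.www", "<html>")])
sst = SSTable(items, entries_per_block=2)
print(sst.get("org.acm.www"))   # <html>
```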
Read and Write Operations
Writes go to a commit log in GFS and to an in-memory memtable; when the memtable grows too large, a minor compaction freezes it and writes it out as a new SSTable.
Reads see a merged view of the memtable and the SSTables. How is the merged view created? Since all of them are lexicographically sorted data structures, the merged view can be formed efficiently.
Optimization: Bloom filters speed up access to non-existent data (a read can skip SSTables that cannot contain the row).
Optimization: a single commit log per tablet server; the log is sorted before recovery; two log-writing threads cope with GFS latency spikes; a server compacts all its data before shutting down, so the next server needs no log recovery.
[diagram: write path (commit log + memtable in RAM) and read path (merged view over memtable and SSTables in GFS)]
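A sketch of that merge: because the memtable and every SSTable are already sorted by key, the read path merges a handful of sorted iterators in one pass (the data layout below is illustrative; a real read also applies Bloom filters and per-cell timestamp rules).

```python
import heapq

# Each source (the memtable plus each SSTable) yields (key, timestamp, value)
# in key order; newer sources are listed first so ties favor newer data.
memtable  = [("ca.waterloo.www", 5, "<html v2>")]
sstable_1 = [("ca.waterloo.www", 3, "<html v1>"), ("com.google.com", 2, "<html>")]
sstable_2 = [("org.acm.www", 1, "<html>")]

def merged_view(*sorted_sources):
    """Merge already-sorted sources into one sorted stream, keeping only the
    first (newest) entry seen for each key."""
    last_key = None
    for key, ts, value in heapq.merge(*sorted_sources, key=lambda e: e[0]):
        if key != last_key:
            yield key, value
            last_key = key

print(list(merged_view(memtable, sstable_1, sstable_2)))
# [('ca.waterloo.www', '<html v2>'), ('com.google.com', '<html>'), ('org.acm.www', '<html>')]
```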
Merging Compaction
[diagram, animated over several slides: the commit log, the memtable in RAM, and existing SSTables in GFS being merged into a new SSTable]
Major Compaction
[diagram: all SSTables and the memtable rewritten into a single SSTable]
Merging compactions bound the number of SSTables created; a major compaction rewrites everything into exactly one SSTable and removes all deleted data.
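A sketch of the difference between the two compactions (the tuple layout and tombstone marker are illustrative): a merging compaction must keep deletion entries, because older SSTables outside the compaction may still hold data they shadow, while a major compaction covers every SSTable and can drop both the deleted data and the tombstones.

```python
import heapq

DELETED = object()    # tombstone marker standing in for a deletion entry

def merging_compaction(sorted_runs):
    """Merge several sorted runs (memtable contents and/or SSTables) into one
    sorted run, keeping only the newest version of each key (real compactions
    also respect per-family version/GC settings)."""
    out, last_key = [], None
    for key, ts, value in heapq.merge(*sorted_runs, key=lambda e: (e[0], -e[1])):
        if key != last_key:
            out.append((key, ts, value))
            last_key = key
    return out

def major_compaction(all_runs):
    """A merging compaction over *all* SSTables: since nothing older can exist,
    deleted data and the tombstones themselves are dropped."""
    return [(k, ts, v) for k, ts, v in merging_compaction(all_runs) if v is not DELETED]

sstable_old = [("a", 1, "old-a"), ("b", 1, "old-b")]
sstable_new = [("a", 2, DELETED)]                      # row "a" was deleted later
print(merging_compaction([sstable_new, sstable_old]))  # keeps the tombstone for "a"
print(major_compaction([sstable_new, sstable_old]))    # [('b', 1, 'old-b')]
```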
Immutability properties
SSTables are immutable, so there is no contention when reading them; deleted data is removed when obsolete SSTables are garbage collected; child tablets created by a split can share the parent's SSTables.
The memtable is the only mutable structure; copy-on-write on its rows lets reads and writes proceed in parallel.
How to deal with caches?
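A sketch of one consequence mentioned above: because SSTables never change, a split only creates two child row ranges that reference the parent's files (names here are illustrative), so no data is copied.

```python
class Tablet:
    """Toy tablet: a row range plus references to immutable SSTable files.
    Splitting copies no data, only references, because the SSTables can
    never change underneath the children."""
    def __init__(self, start_row, end_row, sstables):
        self.start_row, self.end_row = start_row, end_row
        self.sstables = sstables                     # shared, never mutated

    def split(self, split_row):
        left = Tablet(self.start_row, split_row, self.sstables)
        right = Tablet(split_row, self.end_row, self.sstables)
        return left, right

parent = Tablet("a", "z", sstables=["sst-0001", "sst-0002"])
left, right = parent.split("m")
print(left.sstables is right.sstables)   # True: both children share the parent's files
```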
Performance evaluation
[performance figures from the paper]
Conclusion
Lessons learned: large distributed systems see many kinds of failures; aim for a simple design; delay implementing features until they are needed; proper monitoring matters.
Discussion question: Having separate metadata and master servers is an interesting design. Can the same design be used in other systems like GFS?
Picture Acknowledgments
Slides 2, 3 & 4: https://en.wikipedia.org/wiki/Star_schema
Slides 3 & 4: https://www.flickr.com/photos/121483302@N02/14253849274, http://suprtektalk.blogspot.ca/2015/02/google-earth-pro-is-free.html, http://calpersloan.com/google-finance-logo, https://developers.google.com/analytics/terms/branding-policy, http://logos.wikia.com/wiki/Orkut
Slides 11 & 19: Images from the paper: Chang, Fay; Dean, Jeffrey; Ghemawat, Sanjay; Hsieh, Wilson C.; Wallach, Deborah A.; Burrows, Michael; Chandra, Tushar; Fikes, Andrew; Gruber, Robert E. (2006), "Bigtable: A Distributed Storage System for Structured Data".