Bigtable: A Distributed Storage System for Structured Data
Before we begin…
- Bigtable
- Sawzall
- MapReduce
- Bloom Filters
- Introduction
- Data Model
- API
- Building Blocks
- Implementation
- Refinements
- Performance Evaluation
- Real Applications
- Lessons
- Related Work
- Conclusions
What is Bigtable?
- A distributed storage system for managing structured data at Google
- Used by more than 60 Google products, including:
  - Google Analytics
  - Google Reader
  - Personalized Search
  - Orkut
Goals
- Wide applicability
- Scalability
- High performance
- High availability
Bigtable and databases
- Bigtable does not support a full relational data model
A Bigtable is a sparse, distributed, persistent, multi-dimensional sorted map
- The map is indexed by (row, column, timestamp) → cell contents
- Example: Webtable
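The data model above can be illustrated with a small in-memory sketch. This is a hypothetical Python model, not Bigtable's API: the class name `SparseSortedMap` and its methods are assumptions, while the example keys follow the Webtable example from the Bigtable paper (reversed URLs as row keys, `family:qualifier` column keys). Only written cells are stored, which is what makes the map sparse, and scans return cells ordered by row, then column, then newest timestamp first.

```python
from typing import Dict, Iterator, Optional, Tuple

Cell = Tuple[str, str, int]  # (row key, column key, timestamp)


class SparseSortedMap:
    """Hypothetical in-memory model of Bigtable's data model:
    (row, column, timestamp) -> uninterpreted bytes."""

    def __init__(self) -> None:
        # Only cells that are actually written take up space (sparseness).
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the version at `timestamp`, or the newest version if None."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        return self._cells[(row, column, max(versions))] if versions else None

    def scan(self, start_row: str = "",
             end_row: Optional[str] = None) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield cells with start_row <= row < end_row, sorted by row,
        then column, then newest timestamp first."""
        for r, c, ts in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            if r >= start_row and (end_row is None or r < end_row):
                yield r, c, ts, self._cells[(r, c, ts)]


# Usage with Webtable-style keys (reversed URL as the row key).
webtable = SparseSortedMap()
webtable.put("com.cnn.www", "contents:", 3, b"<html>v3")
webtable.put("com.cnn.www", "contents:", 5, b"<html>v5")
webtable.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
print(webtable.get("com.cnn.www", "contents:"))  # prints b'<html>v5'
```

Sorting on every scan keeps the sketch short; a real implementation keeps data physically sorted so range scans over nearby row keys are cheap.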
Rows
- Row keys are arbitrary strings up to 64 KB in size
- Every read or write of data under a single row key is atomic, regardless of the number of columns involved
- Row ranges are dynamically partitioned into tablets
Column Families
- Column keys are grouped into sets called column families
- Data stored in a column family is usually of the same type
- The number of column families should be small
- The number of columns is unbounded
- Access control is performed at the column-family level
Timestamps
- Each cell in a Bigtable can contain multiple versions of the same data
- Versions are indexed by 64-bit integer timestamps
- Garbage-collection settings are per column family: keep only the last n versions of a cell, or keep only versions that are new enough
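A minimal sketch of the two per-column-family garbage-collection policies, assuming a cell's versions are held as a list of `(timestamp, value)` pairs. The function names and the `now`/`max_age` parameters are hypothetical; in Bigtable the policy is configured per column family rather than invoked directly by clients.

```python
from typing import List, Tuple

Version = Tuple[int, bytes]  # (timestamp, value) for one cell


def keep_last_n(versions: List[Version], n: int) -> List[Version]:
    """Policy 1: keep only the n most recent versions of a cell."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return newest_first[:max(n, 0)]


def keep_new_enough(versions: List[Version], now: int, max_age: int) -> List[Version]:
    """Policy 2: keep only versions written within max_age of `now`."""
    cutoff = now - max_age
    return sorted((v for v in versions if v[0] >= cutoff),
                  key=lambda v: v[0], reverse=True)


cell = [(10, b"a"), (30, b"c"), (20, b"b")]
print(keep_last_n(cell, 2))                       # [(30, b'c'), (20, b'b')]
print(keep_new_enough(cell, now=35, max_age=20))  # [(30, b'c'), (20, b'b')]
```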
[Figure: rows, columns, and timestamps]
Metadata operations
- Create/delete tables or column families
- Change metadata (e.g., access control rights)
Writes
- Writes to a single row are atomic
- Bigtable does not support general transactions across row keys
Data processing
- Client-supplied Sawzall scripts support filtering, summarization, and transformation, but do not support writing to Bigtable
- Bigtable can be used with MapReduce
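Single-row atomicity can be sketched as a batch of operations applied under a per-row lock. The names `RowMutation`, `set`, `delete`, and `apply` loosely echo the C++ example in the Bigtable paper, but this Python class is a hypothetical illustration, not Bigtable's client library. Building the new row state first and then swapping it in means a reader sees either all of a mutation or none of it.

```python
import threading
from typing import Dict, List, Optional, Tuple


class RowMutation:
    """Hypothetical batch of column writes/deletes that all target ONE row."""

    def __init__(self, row: str) -> None:
        self.row = row
        self.ops: List[Tuple[str, Optional[bytes]]] = []  # (column, value or None = delete)

    def set(self, column: str, value: bytes) -> "RowMutation":
        self.ops.append((column, value))
        return self

    def delete(self, column: str) -> "RowMutation":
        self.ops.append((column, None))
        return self


class Table:
    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = {}
        self._row_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, row: str) -> threading.Lock:
        with self._guard:
            return self._row_locks.setdefault(row, threading.Lock())

    def apply(self, mutation: RowMutation) -> None:
        """Apply every op in the mutation atomically with respect to readers."""
        with self._lock_for(mutation.row):
            new_row = dict(self._rows.get(mutation.row, {}))
            for column, value in mutation.ops:
                if value is None:
                    new_row.pop(column, None)
                else:
                    new_row[column] = value
            self._rows[mutation.row] = new_row  # single swap: all ops or none

    def read_row(self, row: str) -> Dict[str, bytes]:
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))


t = Table()
t.apply(RowMutation("com.cnn.www").set("anchor:www.abc.com", b"ABC"))
t.apply(RowMutation("com.cnn.www")
        .set("anchor:www.c-span.org", b"CNN")
        .delete("anchor:www.abc.com"))
print(t.read_row("com.cnn.www"))  # {'anchor:www.c-span.org': b'CNN'}
```

Because each lock covers one row, there is no way to express a mutation spanning two row keys, which matches the absence of general cross-row transactions.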
- Google File System (GFS): used to store log and data files
- Scheduler: cluster management system used to manage jobs and resources
- SSTable: file format used internally to store Bigtable data
- Chubby: distributed lock service
  - Highly available, with five active replicas
  - ( )% unavailability across 14 Bigtable clusters
  - ( )% unavailability for the most affected cluster
What is a tablet?
- A Bigtable cluster stores a number of tables
- Each table consists of a set of tablets
- Each tablet is managed by a specific tablet server
- As a table grows, it is automatically split into multiple tablets, each ( ) MB in size by default
- Tablet servers handle read/write requests for their tablets
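A hypothetical sketch of size-triggered splitting. The `Tablet` class, the per-row byte counts, and the `max_bytes` threshold are assumptions for illustration; splitting at the median row key is a simplification. The point is only that a tablet covers a contiguous row range and divides into two contiguous child ranges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tablet:
    start_row: str                 # inclusive
    end_row: Optional[str]         # exclusive; None = end of the table
    row_bytes: Dict[str, int] = field(default_factory=dict)  # row key -> size

    def size(self) -> int:
        return sum(self.row_bytes.values())


def maybe_split(tablet: Tablet, max_bytes: int) -> List[Tablet]:
    """Split a tablet into two contiguous row ranges once it exceeds max_bytes.
    Splitting at the median row key is a simplification for this sketch."""
    if tablet.size() <= max_bytes or len(tablet.row_bytes) < 2:
        return [tablet]
    keys = sorted(tablet.row_bytes)
    split_key = keys[len(keys) // 2]
    left = Tablet(tablet.start_row, split_key,
                  {k: v for k, v in tablet.row_bytes.items() if k < split_key})
    right = Tablet(split_key, tablet.end_row,
                   {k: v for k, v in tablet.row_bytes.items() if k >= split_key})
    return [left, right]


whole = Tablet("", None, {"a": 40, "b": 40, "c": 40, "d": 40})
for part in maybe_split(whole, max_bytes=100):
    print(repr(part.start_row), repr(part.end_row), sorted(part.row_bytes))
```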
Bigtable: Servers
- The master manages the assignment of tablets to tablet servers
[Figure: the Bigtable master assigning tablets to tablet server 1 and tablet server 2]
Tablet Location
- A three-level hierarchy of tablets is used to store tablet locations
- The root tablet is never split
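The three-level lookup can be sketched as follows. Per the Bigtable paper, the first level is a Chubby file holding the root tablet's location, the root tablet indexes the tablets of a special METADATA table, and each METADATA tablet indexes user tablets. This sketch assumes a single user table, models each tablet's index as a sorted list of `(end_row_key, location)` pairs with inclusive end keys, and uses hypothetical location names and a sentinel `END` key; none of these are Bigtable's actual encodings.

```python
import bisect
from typing import Dict, List, Tuple

END = "\U0010ffff"  # hypothetical sentinel: sorts after any realistic row key
Index = List[Tuple[str, str]]  # sorted (inclusive end row key, location)


def find(index: Index, row: str) -> str:
    """Return the location of the first entry whose end key is >= row."""
    end_keys = [end for end, _ in index]
    i = bisect.bisect_left(end_keys, row)
    if i == len(index):
        raise KeyError(row)
    return index[i][1]


def locate(root_location: str, tablets: Dict[str, Index], row: str) -> str:
    """Resolve a row key to the location of its user tablet.
    Level 1: root_location, as read from the Chubby file.
    Level 2: the root tablet indexes METADATA tablets.
    Level 3: a METADATA tablet indexes user tablets."""
    metadata_location = find(tablets[root_location], row)
    return find(tablets[metadata_location], row)


tablets = {
    "ts-root":   [("m", "ts-meta-1"), (END, "ts-meta-2")],
    "ts-meta-1": [("f", "ts-user-a"), ("m", "ts-user-b")],
    "ts-meta-2": [("t", "ts-user-c"), (END, "ts-user-d")],
}
print(locate("ts-root", tablets, "apple"))  # ts-user-a
print(locate("ts-root", tablets, "zebra"))  # ts-user-d
```

Because the root tablet is never split, a lookup never needs more than these three levels.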
Tablet Assignment
- A master server is responsible for assigning tablets to tablet servers
- The master server also:
  - detects the addition and expiration of tablet servers
  - balances tablet-server load
  - initiates garbage collection of files in GFS
  - reassigns tablets when a tablet server is lost
- If the master server dies, a new master server is started
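A simplified sketch of the master's bookkeeping: tracking live servers, assigning unassigned tablets to the least-loaded server, and reassigning tablets when a server expires. In Bigtable, server expiration is detected through Chubby; here it is a direct method call. The `Master` class, its methods, and the least-loaded placement rule are assumptions for illustration.

```python
from typing import Dict, List, Set


class Master:
    """Hypothetical sketch of the master's tablet-assignment bookkeeping."""

    def __init__(self) -> None:
        self.servers: Set[str] = set()          # live tablet servers
        self.assignment: Dict[str, str] = {}    # tablet -> tablet server
        self.unassigned: List[str] = []         # tablets waiting for a server

    def load(self, server: str) -> int:
        return sum(1 for s in self.assignment.values() if s == server)

    def add_server(self, server: str) -> None:
        self.servers.add(server)
        self._assign_pending()

    def add_tablet(self, tablet: str) -> None:
        self.unassigned.append(tablet)
        self._assign_pending()

    def server_expired(self, server: str) -> None:
        """Called when a server is lost; its tablets go back to the queue."""
        self.servers.discard(server)
        for tablet in [t for t, s in self.assignment.items() if s == server]:
            del self.assignment[tablet]
            self.unassigned.append(tablet)
        self._assign_pending()

    def _assign_pending(self) -> None:
        # Place each waiting tablet on the currently least-loaded server
        # (ties broken by server name so the result is deterministic).
        while self.unassigned and self.servers:
            tablet = self.unassigned.pop(0)
            target = min(sorted(self.servers), key=self.load)
            self.assignment[tablet] = target


m = Master()
m.add_server("ts1")
m.add_server("ts2")
for name in ["t1", "t2", "t3", "t4"]:
    m.add_tablet(name)
m.server_expired("ts1")
print(m.assignment)
```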
Tablet Serving
- The persistent state of a tablet is stored in GFS
[Figure: read and write operations on a tablet, with the memtable in memory and SSTable files in GFS]
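A hypothetical sketch of the read and write paths. Following the Bigtable paper, a write is first recorded in a commit log and then applied to the in-memory memtable; a read consults the memtable first and then the SSTables from newest to oldest. Plain Python lists and dicts stand in for the log and SSTable files in GFS, keys are simplified to single strings, and `None` is used as a deletion marker; all of these are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[bytes]  # None marks a deletion


class TabletSketch:
    """Hypothetical model of a tablet's write and read paths."""

    def __init__(self, sstables: List[Dict[str, Value]]) -> None:
        self.commit_log: List[Tuple[str, Value]] = []  # stand-in for the log in GFS
        self.memtable: Dict[str, Value] = {}           # recent writes, in memory
        self.sstables = sstables  # stand-ins for SSTable files, oldest first; never modified

    def write(self, key: str, value: Value) -> None:
        self.commit_log.append((key, value))  # log first so the write survives a crash
        self.memtable[key] = value

    def read(self, key: str) -> Value:
        # Merged view: the memtable holds the newest data, then SSTables newest to oldest.
        for source in [self.memtable, *reversed(self.sstables)]:
            if key in source:
                return source[key]
        return None


tablet = TabletSketch(sstables=[{"a": b"old", "b": b"keep"}])
tablet.write("a", b"new")
tablet.write("b", None)  # delete
print(tablet.read("a"), tablet.read("b"))
```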
Compactions
- Minor compactions
  - When the memtable size reaches a threshold, the memtable is frozen
  - A new memtable is created
  - The frozen memtable is converted into a new SSTable
- Merging compactions
  - Read a few SSTables and the memtable, and write out a single new SSTable
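Both compaction types can be sketched as pure functions over a dict memtable, with sorted, immutable tuples standing in for SSTables. The entry-count threshold (rather than a size in bytes), the function names, and the `None` deletion marker are assumptions. A merging compaction here keeps deletion markers, because older SSTables outside the merge may still hold values for the deleted keys.

```python
from typing import Dict, List, Optional, Tuple

Value = Optional[bytes]                      # None marks a deletion
SSTable = Tuple[Tuple[str, Value], ...]      # immutable, sorted by key


def minor_compaction(frozen_memtable: Dict[str, Value]) -> SSTable:
    """Convert a frozen memtable into a new sorted, immutable SSTable."""
    return tuple(sorted(frozen_memtable.items()))


def write(memtable: Dict[str, Value], sstables: List[SSTable],
          key: str, value: Value, threshold: int) -> Dict[str, Value]:
    """Apply a write; return the memtable to use next (a new one after a
    minor compaction). `threshold` counts entries, a simplification."""
    memtable[key] = value
    if len(memtable) >= threshold:
        sstables.append(minor_compaction(memtable))  # freeze and flush
        return {}                                     # fresh memtable
    return memtable


def merging_compaction(sstables: List[SSTable], memtable: Dict[str, Value]) -> SSTable:
    """Merge a few SSTables (oldest first) and the memtable into one SSTable.
    Newer values win; deletion markers are kept."""
    merged: Dict[str, Value] = {}
    for table in sstables:
        merged.update(table)
    merged.update(memtable)
    return tuple(sorted(merged.items()))


sstables: List[SSTable] = []
mem: Dict[str, Value] = {}
mem = write(mem, sstables, "b", b"1", threshold=2)
mem = write(mem, sstables, "a", b"2", threshold=2)  # triggers a minor compaction
mem = write(mem, sstables, "a", None, threshold=2)  # delete "a"
print(merging_compaction(sstables, mem))
```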
A number of refinements were required for the Bigtable implementation to achieve high:
- performance
- availability
- reliability
Locality groups
- Clients can group multiple column families together into a locality group
- A separate SSTable is generated for each locality group
- Segregating column families that are not typically accessed together enables more efficient reads
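Locality groups can be illustrated by routing each cell to a per-group output, as if each output were written to its own SSTable. The `family:qualifier` column-key syntax comes from the Bigtable paper; the function name and the family-to-group mapping are hypothetical configuration for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[str, str, bytes]  # (row key, "family:qualifier", value)


def split_by_locality_group(cells: List[Cell],
                            group_of_family: Dict[str, str]) -> Dict[str, List[Cell]]:
    """Route each cell to its locality group's output, as if each output
    were written to a separate SSTable, sorted by (row, column)."""
    outputs: Dict[str, List[Cell]] = defaultdict(list)
    for row, column, value in cells:
        family = column.split(":", 1)[0]
        outputs[group_of_family[family]].append((row, column, value))
    return {group: sorted(group_cells) for group, group_cells in outputs.items()}


cells = [
    ("com.cnn.www", "contents:", b"<html>..."),
    ("com.cnn.www", "language:", b"EN"),
    ("com.cnn.www", "checksum:", b"1a2b"),
]
groups = {"contents": "page", "language": "meta", "checksum": "meta"}
by_group = split_by_locality_group(cells, groups)
print(by_group["meta"])
```

A read that needs only page metadata then touches only the small `meta` output and never the large page contents.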
Refinements: Compression
- Clients can control whether compression is used for a locality group
- Many clients use a two-pass compression scheme:
  - First pass: Bentley and McIlroy's scheme, which compresses long common strings across a large window
  - Second pass: a fast compression algorithm that looks for repetitions in a small 16 KB window
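A rough sketch of a two-pass scheme under stated assumptions. The first pass is a simplified Bentley-McIlroy encoder: it indexes aligned fixed-size blocks already seen and replaces later occurrences (extended as far as they match) with back-references, using an exact dictionary lookup instead of the rolling fingerprints of the original scheme. The second pass uses zlib at its fastest level with a 16 KB window (`wbits=14`) as a stand-in for the fast second-pass compressor. The block size and byte-level token format are hypothetical.

```python
import struct
import zlib

BLOCK = 32  # hypothetical block size for the first pass


def first_pass_encode(data: bytes, block: int = BLOCK) -> bytes:
    """Simplified Bentley-McIlroy pass: replace long repeats of earlier
    aligned blocks with (offset, length) back-references."""
    seen = {}          # contents of an aligned block -> its offset
    out = bytearray()
    next_block = 0     # next aligned block offset to index
    lit_start = i = 0
    n = len(data)
    while i + block <= n:
        # Index every aligned block that lies entirely before position i.
        while next_block + block <= i:
            seen.setdefault(data[next_block:next_block + block], next_block)
            next_block += block
        src = seen.get(data[i:i + block])
        if src is None:
            i += 1
            continue
        length = block
        while i + length < n and data[src + length] == data[i + length]:
            length += 1  # extend the match as far as it goes
        if lit_start < i:
            out += b"L" + struct.pack(">I", i - lit_start) + data[lit_start:i]
        out += b"C" + struct.pack(">II", src, length)
        i = lit_start = i + length
    if lit_start < n:
        out += b"L" + struct.pack(">I", n - lit_start) + data[lit_start:]
    return bytes(out)


def first_pass_decode(encoded: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        tag = encoded[pos:pos + 1]
        pos += 1
        if tag == b"L":
            (length,) = struct.unpack_from(">I", encoded, pos)
            pos += 4
            out += encoded[pos:pos + length]
            pos += length
        else:  # b"C": copy byte by byte so overlapping references work
            src, length = struct.unpack_from(">II", encoded, pos)
            pos += 8
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)


def compress(data: bytes) -> bytes:
    # Second pass: fast zlib with a 16 KB window (wbits=14), a stand-in compressor.
    c = zlib.compressobj(level=1, wbits=14)
    return c.compress(first_pass_encode(data)) + c.flush()


def decompress(blob: bytes) -> bytes:
    return first_pass_decode(zlib.decompress(blob))


page = b"<html>" + bytes(range(200)) + b"</html>"
data = page * 50  # many near-identical pages, as in crawled web data
print(len(data), len(first_pass_encode(data)), len(compress(data)))
```

The first pass catches repeats that are far apart (such as boilerplate shared by pages stored next to each other), which a small-window compressor alone would miss.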
Refinements: Caching & Bloom Filters
- Tablet servers use two levels of caching to improve read performance:
  - the scan cache (key-value pairs) is useful for data that tends to be read repeatedly
  - the block cache (SSTable blocks read from GFS) is useful when reads tend to be close to recently read data
- Bloom filters reduce disk seeks by allowing a read to check whether an SSTable might contain any data for a given row/column pair
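A minimal Bloom filter sketch keyed on row/column pairs. The bit-array size, the number of hash functions, the double-hashing scheme derived from SHA-256, and the `\x00` separator in `row_col_key` are all assumptions. The important property is one-sided error: `might_contain` can return a false positive but never a false negative, so a `False` answer means the SSTable can be skipped without a disk seek.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def row_col_key(row: str, column: str) -> str:
    """Hypothetical encoding of a row/column pair as one filter key."""
    return row + "\x00" + column


# Hypothetical workflow: build one filter per SSTable when it is written.
sstable_filter = BloomFilter(num_bits=1024, num_hashes=5)
sstable_filter.add(row_col_key("com.cnn.www", "contents:"))

if not sstable_filter.might_contain(row_col_key("com.example", "contents:")):
    print("skip this SSTable: no disk seek needed")
```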
Refinements: Speeding Up Tablet Recovery
- When a tablet is moved to another tablet server:
  - the source server performs a minor compaction
  - the source server stops serving the tablet
  - it performs another (usually very fast) minor compaction to flush any writes that arrived during the first one
  - the tablet can then be loaded on the new server without requiring any recovery of log entries
Refinements: Exploiting Immutability
- Because SSTables are immutable, various parts of the Bigtable system have been simplified:
  - no synchronization of file-system accesses is needed when reading SSTables
  - permanently removing deleted data is handled entirely through garbage collection of obsolete SSTables
  - splitting tablets is efficient because child tablets can share the SSTables of the parent tablet
Google set up a Bigtable cluster with N tablet servers to measure performance and scalability as N is varied
- Tablet servers were configured to use 1 GB of memory
- Each machine had two 400 GB IDE hard drives, two dual-core 2 GHz chips, and a single gigabit Ethernet link
- N client machines generated the Bigtable load used for the tests
- Every machine ran a GFS server
Performance Evaluation: Single tablet-server performance
[Table: experiments (random reads, random reads (mem), random writes, sequential reads, sequential writes, scans) versus number of tablet servers]
Performance Evaluation: Scaling
- Aggregate throughput increases by over a factor of 100 as the number of tablet servers is increased from 1 to 500
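The scaling claim implies a lower bound on per-server efficiency. Writing $T(N)$ for aggregate throughput with $N$ tablet servers, and using only the factor of 100 stated above:

```latex
\frac{T(500)}{T(1)} > 100
\quad\Longrightarrow\quad
\frac{T(500)/500}{T(1)} > \frac{100}{500} = 0.2
```

In other words, the average server in a 500-server configuration delivers more than one fifth of the throughput of a lone server.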
Real Applications
- As of August 2006, there were 388 non-test Bigtable clusters
[Table: distribution of clusters by number of tablet servers; e.g., 12 clusters had more than 500 tablet servers]
Real Applications
[Table: data about a few of the tables currently in use]
- Table size (measured before compression) and # Cells indicate approximate sizes
Real Applications: Google Analytics
- Google Analytics is supported by 2 Bigtables:
  - a 200 TB raw click table
  - a 20 TB summary table
Real Applications: Google Earth
- Google Earth is supported by 2 Bigtables:
  - a 70 TB images table, with compression turned off
  - a 500 GB index table
Real Applications: Personalized Search
- Personalized Search is supported by 1 Bigtable:
  - one row per user ID
  - a separate column family for each type of action
Lessons learned
- Large distributed systems are vulnerable to many types of failures:
  - memory and network corruption
  - hung machines
  - extended and asymmetric network partitions
  - bugs in other systems (e.g., Chubby)
  - overflow of GFS quotas
  - planned and unplanned hardware maintenance
- To address these problems:
  - some protocols have been changed
  - some assumptions have been modified
Lessons learned
- It is important to delay adding new features until it is clear how they will be used
- It is important to support system-level monitoring:
  - it allowed many issues to be detected and fixed
  - it also enables tracking clusters to answer common questions
Related Work
- The Boxwood project's goal is to provide infrastructure for building higher-level services such as file systems or databases, while the goal of Bigtable is to directly support client applications that wish to store data
Related Work
- C-Store and Bigtable share many characteristics:
  - a shared-nothing architecture
  - two different data structures: one for recent writes and one for long-lived data
- However, the two systems differ significantly in their APIs and performance optimizations
Conclusions
- Bigtable is a distributed system for storing structured data at Google
  - in production use since April 2005
  - about seven person-years to design and implement
  - more than 60 projects using it as of August 2006
  - users like its performance and high availability
- Users can scale the capacity of their applications by simply adding more machines to the system
Conclusions
- Google has begun deploying Bigtable as a service to product groups
- Google has gained significant advantages by building its own storage solution:
  - it has control over the implementation and infrastructure
  - it can remove bottlenecks and inefficiencies as they arise
Strengths
- Implemented and usable
- Optimization
- Performance evaluation
- Used by more than 60 Google products
Weaknesses
- Complexity
- Chubby
- Master
- Network