Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,

Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Reporter: yu-wen chang

Abstract  Bigtable(Bt) is a distributed storage system for managing structured data that is designed to scale to a very large size.  Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.

Outline  Introduction  Data Model  APIs  Building Blocks  Implementation  Refinements  Performance  Real Applications  Conclusion

Introduction  Bigtable is designed to reliably scale to petabytes of data and thousands of machines.  Bigtable has achieved several goals:  Wide applicability.  Scalability.  High performance.  High availability.

Data Model  A Bigtable is a sparse, distributed, persistent multidimensional sorted map.  The map is indexed by a row key, column key, and a timestamp.

Data Model-Row  The row keys in a table are arbitrary strings.  Bigtable maintains data in lexicographic order by row key.  Each row range is called a tablet, which is the unit of distribution and load balancing.

Data Model-Column  Column keys are grouped into sets called column families.  A column key is named using the following syntax: family : qualier.  Another useful column family for this table is anchor.

Data Model-Timestamp  Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp.

Data Model http://tw.yahoo.com/ com.yahoo.tw row keyrow key http://tw.bid.y ahoo.com http://buy.yahoo.com.tw TTT Anchor: 拍賣 Anchor: 購物中心 ColumnColumn Family

APIs  The Bigtable API provides functions :  Creating and deleting tables and column families.  Changing cluster, table and column family metadata.

Building Blocks  Bigtable uses the distributed Google File System (GFS) to store log and data files.  The Google SSTable file format is used internally to store Bigtable data.  An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings.

Building Blocks(cont.)  Bigtable relies on a highly-available and persistent distributed lock service called Chubby.  Chubby provides a namespace that consists of directories and small files. Each directory or file can be used as a lock.

Implementation  The Bigtable implementation has three major components:  Library that linked into client.  Master server.  Table server.

Implementation

 Each tablet server manages a set of tablets  The tablet server handles read and write requests to the tablets that it has loaded, and also splits tablets that have grown too large.

Implementation-Tablet Location  We use a three-level hierarchy analogous to that of a B+-tree to store tablet location information.  Each METADATA row stores approximately 1KB of data in memory. With a modest limit of 128 MB METADATA tablets.

Implementation-Tablet Location

 The client library caches tablet locations.  Tablet locations are stored in memory, so no GFS accesses are required.  We also store secondary information in the METADATA table, including a log of all events pertaining to each tablet

Implementation-Tablet Assignment  Each tablet is assigned to one tablet server at a time.  Bigtable uses Chubby to keep track of tablet servers.  When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely- named file in a specific Chubby directory.

Implementation-Tablet Assignment  When a master is started by the cluster management system, it needs to discover the current tablet assignments before it can change them.

Implementation-Tablet Assignment  The master executes the following steps at startup:  The master grabs a unique master lock in Chubby.  The master scans the servers directory in Chubby to find the live servers.  The master communicates with every live tablet server to discover what tablets are already assigned to each server.  The master scans the METADATA table to learn the set of tablets.

Tablet Serving

Implementation-Compactions  When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.  A merging compaction that rewrites all SSTables into exactly one SSTable is called a major compaction.

Renements  Locality groups:  Clients can group multiple column families together into a locality group.  Compression:  Uses Bentley and McIlroy's scheme and fast compression algorithm.  Caching for read performance:  Uses Scan Cache and Block Cache.  Bloom filters:  Reduce the number of accesses.

Performance Evaluation

Real Applications  Google Analytics  http://analytics.google.com  Google Earth  http://earth.google.com  Personalized search  www.google.com/psearch

Conclusion

Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,

Similar presentations

Presentation on theme: "Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,

Similar presentations

Presentation on theme: "Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,"— Presentation transcript:

Similar presentations

About project

Feedback