Bigtable - Slides by Jatin
Goals: wide applicability, scalability, high performance, and high availability.
Bigtable resembles a database, but it does not support a full relational data model. Data is indexed using row and column names that can be arbitrary strings.
What is Bigtable? A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. (row:string, column:string, time:int64) -> string
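To make the "sparse, sorted, multidimensional map" definition concrete, here is a minimal in-memory sketch. It is only an illustration of the data model, not how Bigtable is implemented: the class name `ToyBigtable` and its methods `put`, `get`, and `scan` are hypothetical, and the choice to return the newest version by default is an assumption made for this example.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical toy model: one dict keyed by (row, column, timestamp).
Key = Tuple[str, str, int]


class ToyBigtable:
    """Sparse map: only cells that were written take up space."""

    def __init__(self) -> None:
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Values are uninterpreted bytes; the map never inspects them.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before `timestamp`
        (the newest overall if `timestamp` is None)."""
        versions = [
            (ts, val)
            for (r, c, ts), val in self._cells.items()
            if r == row and c == column
            and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def scan(self) -> Iterator[Tuple[Key, bytes]]:
        """Yield cells sorted by row, then column, then newest timestamp first."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]
```

Writing two versions of the same cell and then calling `get` without a timestamp returns the later one; a cell that was never written simply returns `None`, which is what "sparse" means here.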
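For example, Bigtable stores data for maps.google.com/index.html under the key com.google.maps/index.html. Reversing the hostname components keeps pages from the same domain next to each other in the sorted row order.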
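The key transformation in this example can be sketched in a few lines. The helper name `row_key` is hypothetical, and the handling of URLs without a scheme is an assumption made so that the slide's example works as written.

```python
from urllib.parse import urlsplit


def row_key(url: str) -> str:
    """Reverse the hostname components and keep the path unchanged."""
    # urlsplit only recognizes the host when "//" precedes it,
    # so add it for bare inputs such as "maps.google.com/index.html".
    parts = urlsplit(url if "//" in url else "//" + url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in {url!r}")
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return reversed_host + parts.path
```

With this sketch, `row_key("maps.google.com/index.html")` yields `com.google.maps/index.html`, matching the example above.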
Columns: A table may have an unbounded number of columns. Column keys are grouped into sets called column families. A column key is named using the following syntax: family:qualifier.
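A small sketch of how a column key in the family:qualifier syntax could be taken apart. The function name `split_column_key` is hypothetical, and the sample keys in the usage note are illustrative only. The sketch assumes the family name contains no colon, so the split happens at the first one.

```python
from typing import Tuple


def split_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' into (family, qualifier)."""
    # Split at the first colon; the qualifier may itself contain colons.
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column_key!r}")
    return family, qualifier
```

For instance, `split_column_key("anchor:cnnsi.com")` gives the family `anchor` and the qualifier `cnnsi.com`. An empty qualifier, as in `contents:`, is accepted by this sketch.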
Storage: Bigtable uses the distributed Google File System (GFS) to store log and data files. The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range.
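The two SSTable operations described above, point lookup and range iteration, can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions, not the real file format: `ToySSTable` is a hypothetical name, nothing is persisted, and the half-open range `[start, end)` is a choice made for this example.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToySSTable:
    """Immutable, ordered map from byte-string keys to byte-string values."""

    def __init__(self, items: Dict[bytes, bytes]) -> None:
        # Sort once at construction; there are no mutating methods afterwards.
        self._keys: List[bytes] = sorted(items)
        self._values: List[bytes] = [items[k] for k in self._keys]

    def get(self, key: bytes) -> Optional[bytes]:
        """Binary-search for `key`; return its value, or None if absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]
```

Because the keys are kept sorted, both operations reduce to binary search, which is what makes an ordered immutable map convenient for range scans.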
Implementation: The implementation has three parts: library code at each client, a master server, and tablet servers. Each table starts with a single tablet. When this tablet grows large, it is split into two tablets. Tablet location information is stored in a three-level hierarchy analogous to a B+ tree. Bigtable relies on a highly available and persistent distributed lock service called Chubby.
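The splitting rule can be sketched as follows. This is a deliberately simplified illustration: the function name `split_tablet` is hypothetical, the tablet is modeled as a plain list of row keys, and a row-count threshold (`max_rows`) stands in for the size-based limit, which the slides do not quantify.

```python
from typing import List


def split_tablet(rows: List[str], max_rows: int) -> List[List[str]]:
    """Return [rows] if the tablet is small enough,
    otherwise two tablets covering contiguous row ranges."""
    ordered = sorted(rows)
    if len(ordered) <= max_rows:
        return [ordered]
    # Split at the middle row key so each half covers a contiguous range.
    mid = len(ordered) // 2
    return [ordered[:mid], ordered[mid:]]
```

Each resulting tablet holds a contiguous range of the sorted row keys, so a lookup for a given row only ever needs to consult one tablet.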
Tablet location hierarchy
Finding Tablet Location: The client library caches tablet locations. If the cache is empty, locating a tablet takes three network round trips; if the cache is stale, it can take up to six, because stale entries are discovered only on misses. Tablet locations are stored in memory, so no GFS accesses are required.
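A rough sketch of the client-side cache, counting round trips for the empty-cache case only. The class `ToyLocator` and its members are hypothetical. The sketch assumes the three levels are a Chubby file, the root tablet, and a METADATA tablet, each costing one round trip; it does not model the stale-cache path, which can take up to six.

```python
from typing import Dict


class ToyLocator:
    """Caches tablet -> server mappings and counts network round trips."""

    def __init__(self, locations: Dict[str, str]) -> None:
        self._authoritative = locations      # stands in for the hierarchy
        self._cache: Dict[str, str] = {}
        self.round_trips = 0

    def _lookup_through_hierarchy(self, tablet: str) -> str:
        # Assumed: one round trip each for the Chubby file,
        # the root tablet, and the METADATA tablet.
        self.round_trips += 3
        return self._authoritative[tablet]

    def locate(self, tablet: str) -> str:
        """Return the server for `tablet`, using the cache when possible."""
        if tablet not in self._cache:
            self._cache[tablet] = self._lookup_through_hierarchy(tablet)
        return self._cache[tablet]

    def invalidate(self, tablet: str) -> None:
        """Drop a cached entry, e.g. after the tablet has moved."""
        self._cache.pop(tablet, None)
```

The first `locate` call for a tablet costs three round trips in this model, and repeated calls cost none until the entry is invalidated.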