1
Lecture 7 – Bigtable
CSE 490h – Introduction to Distributed Computing, Winter 2008
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
2
Previous Classes
- MapReduce: toolkit for processing data
- GFS: file system
- RPC, etc.
3
GFS vs. Bigtable
- GFS provides raw data storage
- We need:
  - More sophisticated storage
  - Flexible enough to be useful
  - Storage for semi-structured data
  - Reliable, scalable, etc.
4
Examples
- URLs: contents, crawl metadata, links, anchors, pagerank, ...
- Per-user data: user preference settings, recent queries/search results, ...
- Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, ...
5
Commercial DB
Why not use a commercial database?
- Not scalable enough
- Too expensive
- We need to do low-level optimizations
6
BigTable Features
- Distributed key-value map
- Fault-tolerant, persistent
- Scalable:
  - Thousands of servers
  - Terabytes of in-memory data
  - Petabytes of disk-based data
  - Millions of reads / writes per second, efficient scans
- Self-managing:
  - Servers can be added / removed dynamically
  - Servers adjust to load imbalance
7
Basic Data Model
A distributed, multi-dimensional, sparse map:
(row, column, timestamp) -> cell contents
[Figure: the row "www.cnn.com", column "contents:", with cell versions at timestamps t3, t11, and t17]
Good match for most of our applications
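A minimal sketch of this model in C++, assuming a single-machine toy store (the real map is distributed; Key and its comparison convention are illustrative, not Bigtable's API):

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>

    // Toy model of the Bigtable map:
    //   (row, column, timestamp) -> cell contents
    // Rows and columns are arbitrary strings; a cell can hold
    // several timestamped versions of its value.
    struct Key {
        std::string row;
        std::string column;   // "family:qualifier"
        int64_t timestamp;
        bool operator<(const Key& o) const {
            if (row != o.row) return row < o.row;   // rows sort lexicographically
            if (column != o.column) return column < o.column;
            return timestamp > o.timestamp;         // newest version first
        }
    };

    int main() {
        std::map<Key, std::string> table;  // sparse: absent cells cost nothing
        table[{"www.cnn.com", "contents:", 17}] = "<html>v3...";
        table[{"www.cnn.com", "contents:", 11}] = "<html>v2...";
        table[{"www.cnn.com", "contents:", 3}]  = "<html>v1...";

        // The newest version of a cell is the first entry at or after
        // (row, column, maximum timestamp) in this ordering.
        auto it = table.lower_bound({"www.cnn.com", "contents:", INT64_MAX});
        if (it != table.end() && it->first.row == "www.cnn.com" &&
            it->first.column == "contents:")
            std::cout << "latest: " << it->second << "\n";  // prints v3
    }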
8
Rows
- Name is an arbitrary string
- Access to data in a row is atomic
- Row creation is implicit upon storing data
- Rows are ordered lexicographically
- Rows close together lexicographically usually reside on one or a small number of machines
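Because rows are sorted and contiguous row ranges live together, row-key design controls locality. A small sketch of one common trick, assuming reversed hostnames as row keys (the same convention as the "com.cnn.www" row in the API example later):

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Storing pages under reversed hostnames keeps all pages from
        // one domain lexicographically adjacent, so they tend to land
        // on one or a few machines (a contiguous row range).
        std::map<std::string, std::string> rows;  // sorted, like Bigtable rows
        rows["com.cnn.money/markets"]    = "...";
        rows["com.cnn.www/index.html"]   = "...";
        rows["com.cnn.www/world"]        = "...";
        rows["org.wikipedia.en/Main_Page"] = "...";

        // All cnn.com rows form one contiguous run in iteration order.
        for (const auto& kv : rows)
            std::cout << kv.first << "\n";
    }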
9
Columns
[Figure: the row "www.cnn.com" with cells in column "contents:" ("CNN home page"), "anchor:cnnsi.com" ("CNN"), and "anchor:stanford.edu"]
- Columns have a two-level name structure: family:optional_qualifier
- Column family:
  - Unit of access control
  - Has associated type information
- Qualifier gives unbounded columns:
  - Additional level of indexing, if desired
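A minimal sketch of the two-level column name, assuming the name splits at the first colon (ParseColumn is an illustrative helper, not part of Bigtable's API):

    #include <iostream>
    #include <string>
    #include <utility>

    // Split "family:optional_qualifier" at the first ':'. The family
    // must be declared up front; the qualifier is free-form, which is
    // what makes the number of columns unbounded.
    std::pair<std::string, std::string> ParseColumn(const std::string& name) {
        auto pos = name.find(':');
        if (pos == std::string::npos) return {name, ""};  // family only
        return {name.substr(0, pos), name.substr(pos + 1)};
    }

    int main() {
        auto col = ParseColumn("anchor:cnnsi.com");
        std::cout << "family=" << col.first
                  << " qualifier=" << col.second << "\n";
        // family=anchor qualifier=cnnsi.com
    }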
10
Column Families
- Must be created before data can be stored
- Small number of column families
- Unbounded number of columns
11
Timestamps
- Used to store different versions of data in a cell
- New writes default to the current time, but timestamps for writes can also be set explicitly by clients
12
Timestamps: Garbage Collection
Per-column-family settings tell Bigtable what to GC:
- "Only retain most recent K values in a cell"
- "Keep values until they are older than K seconds"
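A minimal sketch of how these two policies could prune a cell's version list, assuming versions are kept sorted newest-first (GcPolicy and its fields are illustrative names):

    #include <cstdint>
    #include <string>
    #include <vector>

    // One timestamped version of a cell's value.
    struct Version {
        int64_t timestamp;  // seconds, for this sketch
        std::string value;
    };

    // Per-column-family GC settings (illustrative).
    struct GcPolicy {
        size_t max_versions;   // "only retain most recent K values in a cell"
        int64_t max_age_secs;  // "keep values until they are older than K seconds"
    };

    // Prune one cell's versions (assumed sorted newest-first).
    void GarbageCollect(std::vector<Version>& versions,
                        const GcPolicy& policy, int64_t now) {
        if (versions.size() > policy.max_versions)
            versions.resize(policy.max_versions);  // version-count rule
        while (!versions.empty() &&
               now - versions.back().timestamp > policy.max_age_secs)
            versions.pop_back();                   // age rule: drop oldest
    }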
13
API
- Create / delete tables and column families
- Writing: updates to a single row are grouped into one atomic mutation

    // Open the table (aborts the process on failure).
    Table *T = OpenOrDie("/bigtable/web/webtable");
    // Collect updates to the row "com.cnn.www" into one mutation.
    RowMutation r1(T, "com.cnn.www");
    r1.Set("anchor:www.c-span.org", "CNN");  // write one anchor cell
    r1.Delete("anchor:www.abc.com");         // delete another
    Operation op;
    Apply(&op, &r1);  // applies the mutation to the row atomically
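For reads, the companion example from the Bigtable paper (OSDI 2006) scans all anchors in the same row with a Scanner, using the table T opened above:

    Scanner scanner(T);
    ScanStream *stream;
    stream = scanner.FetchColumnFamily("anchor");
    stream->SetReturnAllVersions();
    scanner.Lookup("com.cnn.www");
    // Iterate over every (column, timestamp, value) in the "anchor" family.
    for (; !stream->Done(); stream->Next())
        printf("%s %s %lld %s\n",
               scanner.RowName(),
               stream->ColumnName(),
               stream->MicroTimestamp(),
               stream->StringValue());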
14
Locality Groups
- Column families can be assigned to a locality group
- Used to organize the underlying storage representation for performance:
  - Scans over one locality group are O(bytes_in_locality_group), not O(bytes_in_table)
- Data in a locality group can be explicitly memory-mapped
15
SSTable
- File format for storing data
- Key-value map:
  - Persistent
  - Ordered
  - Immutable
- Keys and values are strings
16
SSTable Operations
- Look up value for key
- Iterate over all key/value pairs in a specified range
- An SSTable is a sequence of blocks (64 KB), with a block index used to locate blocks
- How do we find a block via the block index?
  - Binary search on the in-memory index
  - Or, map the complete SSTable into memory
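A minimal sketch of that lookup path, assuming an in-memory index that maps each block's last key to its file offset (SSTableIndex and FindBlock are illustrative names):

    #include <cstdint>
    #include <map>
    #include <string>

    // In-memory block index: last key of each ~64 KB block -> file offset.
    // Loaded once when the SSTable is opened.
    using SSTableIndex = std::map<std::string, int64_t>;

    // Binary search (std::map::lower_bound) for the single block that
    // could contain `key`; returns its offset, or -1 if the key is past
    // the end of the table. The caller then reads that block from disk
    // (or from the memory-mapped file) and searches within it.
    int64_t FindBlock(const SSTableIndex& index, const std::string& key) {
        auto it = index.lower_bound(key);  // first block whose last key >= key
        if (it == index.end()) return -1;
        return it->second;
    }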
17
Chubby
Bigtable relies on a lock service called Chubby, used to:
- Ensure there is at most one active master
- Store the bootstrap location of Bigtable data
- Discover tablet servers and finalize tablet server deaths
- Store column family information
- Store access control lists
18
Chubby
- Namespace that consists of directories and small files
- Each directory or file can be used as a lock
- A Chubby client maintains a session with the Chubby service
  - The session expires if the client cannot renew its session lease within the expiration time
  - If the session expires, the client loses any locks and open handles
- Atomic reads / writes
19
Tablets
[Figure: a table split into three tablets holding rows A–E, F–R, and S–Z]
As the table grows, it is split into tablets
20
Tablets
- Large tables are broken into tablets at row boundaries
- A tablet holds a contiguous range of rows
  - Aim for ~100 MB to 200 MB of data per tablet
- Each tablet server is responsible for ~100 tablets
- Fine-grained load balancing:
  - Migrate tablets away from an overloaded machine
  - Master makes load-balancing decisions
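A minimal sketch of routing a row key to its tablet, assuming tablets are indexed by the last row they contain (the real system resolves this through the METADATA table; TabletMap and LocateTablet are illustrative names):

    #include <map>
    #include <string>

    struct TabletInfo {
        std::string server;  // tablet server currently holding this tablet
    };

    // Tablets indexed by their last (end) row key. Because rows are
    // sorted, the tablet for `row` is the first whose end row >= row.
    using TabletMap = std::map<std::string, TabletInfo>;

    const TabletInfo* LocateTablet(const TabletMap& tablets,
                                   const std::string& row) {
        auto it = tablets.lower_bound(row);
        return it == tablets.end() ? nullptr : &it->second;
    }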
21
Tablet Server
- Master assigns tablets to tablet servers
- Tablet server:
  - Handles read / write requests to its tablets
  - Splits tablets that have grown too large
- Clients do not move data through the master
22
Master Startup
- Grab a unique master lock in Chubby
- Scan the servers directory in Chubby to find live servers
- Communicate with every live tablet server to discover which tablets are already assigned
- Scan the METADATA table to learn the full set of tablets
- Track the unassigned tablets
23
Tablet Assignment
- Master keeps a list of unassigned tablets
- When a tablet is unassigned and a tablet server has room for it, the master sends a tablet load request to that tablet server
24
Tablet Serving
- Persistent state of a tablet is stored in GFS
- Updates are committed to a log that stores redo records
- Recently committed updates go into the memtable, a sorted in-memory buffer
- Older updates are stored in a sequence of SSTables
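A minimal sketch of the write and read paths, assuming the memtable is a sorted map and a read must see the newest value across the memtable and the SSTables (all names are illustrative):

    #include <map>
    #include <optional>
    #include <string>
    #include <vector>

    struct TabletState {
        // Sorted in-memory buffer of recent commits.
        std::map<std::string, std::string> memtable;
        // Older updates, newest SSTable first (each sorted and immutable).
        std::vector<std::map<std::string, std::string>> sstables;

        void Write(const std::string& key, const std::string& value) {
            AppendRedoRecord(key, value);  // commit to the log in GFS first
            memtable[key] = value;         // then apply to the memtable
        }

        std::optional<std::string> Read(const std::string& key) const {
            if (auto it = memtable.find(key); it != memtable.end())
                return it->second;            // newest data wins
            for (const auto& sst : sstables)  // then older and older SSTables
                if (auto it = sst.find(key); it != sst.end())
                    return it->second;
            return std::nullopt;
        }

        void AppendRedoRecord(const std::string&, const std::string&) {
            // Stub: the real system appends a redo record to a commit
            // log stored in GFS before acknowledging the write.
        }
    };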
25
What if the memtable gets too big?
Minor compaction:
- Create a new memtable
- Convert the old memtable to an SSTable and write it to GFS
Note: every minor compaction creates a new SSTable
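Continuing the sketch above, a minor compaction could look like this (WriteSSTableToGFS is a stub standing in for the real file write):

    #include <map>
    #include <string>
    #include <vector>

    using Memtable = std::map<std::string, std::string>;
    using SSTable  = std::map<std::string, std::string>;  // immutable once written

    // Stub: the real system writes the SSTable as a file in GFS.
    void WriteSSTableToGFS(const SSTable&) {}

    // Freeze the full memtable, persist it as a new SSTable, and start
    // a fresh, empty memtable. Each minor compaction adds one SSTable.
    void MinorCompaction(Memtable& memtable, std::vector<SSTable>& sstables) {
        sstables.insert(sstables.begin(), std::move(memtable));  // newest first
        WriteSSTableToGFS(sstables.front());
        memtable = Memtable();  // new, empty memtable
    }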
26
Compactions
Merging compaction:
- Bounds the number of SSTables
- Reads the contents of a few SSTables and the memtable, and writes out a new SSTable
Major compaction:
- A merging compaction that rewrites all SSTables into exactly one SSTable
- Produces an SSTable that contains no deletion info
- Allows Bigtable to reclaim resources from deleted data
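A minimal sketch of a merging compaction over the sorted maps used above, with the major-compaction case dropping deletion markers (kDeletionMarker is an illustrative convention, not Bigtable's actual encoding):

    #include <iterator>
    #include <map>
    #include <string>
    #include <vector>

    using SSTable = std::map<std::string, std::string>;
    const std::string kDeletionMarker = "__DELETED__";  // illustrative

    // Merge several SSTables (ordered newest first) into one; for each
    // key the newest entry wins. In a major compaction *all* SSTables
    // participate, so deletion markers can be dropped: no older SSTable
    // survives whose data they would need to suppress.
    SSTable MergeCompaction(const std::vector<SSTable>& inputs, bool major) {
        SSTable out;
        // Iterate oldest -> newest so later inserts overwrite older values.
        for (auto it = inputs.rbegin(); it != inputs.rend(); ++it)
            for (const auto& kv : *it)
                out[kv.first] = kv.second;
        if (major)
            for (auto it = out.begin(); it != out.end();)
                it = (it->second == kDeletionMarker) ? out.erase(it)
                                                     : std::next(it);
        return out;
    }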
27
System Structure
[Figure: a Bigtable cell and its supporting services]
- Bigtable client + client library: Open(), reads / writes, metadata ops
- Bigtable master: performs metadata ops + load balancing
- Bigtable tablet servers: serve data
- Lock service: holds metadata, handles master election
- GFS: holds tablet data, logs
- Cluster scheduling system: handles failover, monitoring
28
Google: The Big Picture
Custom solutions for unique problems!
- GFS: stores data reliably, but just raw files
- BigTable: gives us a key/value map; database-like, but doesn't provide everything we need
- Chubby: locking mechanism
- SSTable: file format
- MapReduce: lets us process data from BigTable (and other sources)
29
Common Principles
One master, multiple helpers:
- MapReduce: master coordinates work amongst map / reduce workers
- Bigtable: master knows about the location of tablet servers
- GFS: master coordinates data across chunkservers
Issues with a single master:
- What about master failure?
- How do you avoid bottlenecks?
30
Next Class
Chord: A Scalable P2P Lookup Service for Internet Applications (a distributed hash table)
Guest speaker: John McDowell, Microsoft