Published by Doreen Dawson. Modified over 8 years ago.
1
Bigtable: A Distributed Storage System for Structured Data
Google Inc., OSDI 2006
2
Contents
– Introduction
– Data Model
– API and DEMO
– Building Blocks
– Implementation
– Conclusion
3
Introduction
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.
Goals:
– wide applicability
– scalability
– high performance
– high availability
4
Introduction
5
Bigtable resembles a database, but isn’t one:
– Bigtable does not support a full relational data model.
– Bigtable treats data as uninterpreted strings.
– Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
6
Data Model
A Bigtable is a sparse, distributed, persistent, multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes:
(row:string, column:string, time:int64) -> string
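To make the map signature concrete, here is a minimal Python sketch of a sparse, versioned map keyed by (row, column, timestamp). It is an in-memory illustration only, not Bigtable's implementation or client API: the class name `SparseTable`, its methods, and the example keys are hypothetical. Only cells that were actually written are stored, and the versions of a cell are kept newest first.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class SparseTable:
    """Toy model of (row: str, column: str, timestamp: int) -> bytes."""

    def __init__(self) -> None:
        # row -> column -> [(timestamp, value), ...], newest version first.
        # Only cells that were actually written take up space (sparse map).
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Newest value written at or before `timestamp` (newest overall if None)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield every cell in sorted row order, then sorted column order."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                for ts, value in self._rows[row][column]:
                    yield row, column, ts, value


if __name__ == "__main__":
    t = SparseTable()
    t.put("com.cnn.www", "contents:", 3, b"<html>v3")
    t.put("com.cnn.www", "contents:", 5, b"<html>v5")
    t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    print(t.get("com.cnn.www", "contents:"))     # b'<html>v5'
    print(t.get("com.cnn.www", "contents:", 4))  # b'<html>v3'
```

Reading with an explicit timestamp returns the newest version that is not newer than it, which is how a timestamp acts as the third dimension of the map.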
7
Data Model
8
Rows
– The row range of a table is partitioned into tablets; each tablet is a row range and is the unit of distribution and load balancing.
Column Families
– Column keys are grouped into sets called column families.
– Syntax: family:qualifier. Column family names must be printable, but qualifiers may be arbitrary strings.
Timestamps
– 64-bit integers that index the multiple versions of a cell
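The row and column rules can be sketched as two small Python helpers. This assumes, purely for illustration, that a table's tablets are described by a sorted list of inclusive start keys beginning with the empty string; the helper names `parse_column_key` and `find_tablet` and this tablet representation are hypothetical, not Bigtable's.

```python
from bisect import bisect_right
from typing import List, Tuple


def parse_column_key(column_key: str) -> Tuple[str, str]:
    """Split a column key written as family:qualifier."""
    # partition() splits at the first ':', so the qualifier may itself contain ':'.
    family, sep, qualifier = column_key.partition(":")
    if not sep:
        raise ValueError(f"{column_key!r} is not of the form family:qualifier")
    if not family.isprintable():
        raise ValueError("column family names must be printable")
    return family, qualifier


def find_tablet(start_keys: List[str], row_key: str) -> int:
    """Index i of the tablet whose range [start_keys[i], start_keys[i+1]) holds row_key.

    Assumes start_keys is sorted and starts with "" so every row key is covered.
    """
    return bisect_right(start_keys, row_key) - 1


if __name__ == "__main__":
    print(parse_column_key("anchor:cnnsi.com"))  # ('anchor', 'cnnsi.com')
    tablets = ["", "com.cnn", "org.apache"]      # three hypothetical tablets
    print(find_tablet(tablets, "com.cnn.www"))   # 1
```

Because rows are sorted and each tablet is a contiguous row range, finding the tablet for a row reduces to a binary search over the range boundaries.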
9
API and DEMO
11
Building Blocks
Google File System (GFS)
– stores log and data files
– runs in a shared pool of machines
Google SSTable: file format
– maps keys to values
– index map for locating data blocks
Chubby: a highly available and persistent distributed lock service
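An SSTable can be pictured as an immutable, sorted key/value map stored as a sequence of blocks plus an index over those blocks, so a lookup binary-searches the index and then reads a single block. The Python sketch below is a hypothetical in-memory stand-in under those assumptions: blocks live in a Python list rather than in a GFS file, and `block_size` counts entries rather than bytes.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class SSTable:
    """Immutable, sorted key -> value map stored as blocks plus a block index."""

    def __init__(self, items: Dict[str, bytes], block_size: int = 2) -> None:
        if block_size < 1:
            raise ValueError("block_size must be at least 1")
        pairs = sorted(items.items())
        # Consecutive runs of up to block_size sorted (key, value) pairs.
        self._blocks: List[List[Tuple[str, bytes]]] = [
            pairs[i:i + block_size] for i in range(0, len(pairs), block_size)
        ]
        # Block index: the first key of each block, kept in memory.
        self._index: List[str] = [block[0][0] for block in self._blocks]

    def get(self, key: str) -> Optional[bytes]:
        # Binary-search the index for the only block that could hold key,
        # then scan just that block (one block read in a real SSTable).
        pos = bisect_right(self._index, key) - 1
        if pos < 0:
            return None
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    table = SSTable({"b": b"2", "a": b"1", "c": b"3"})
    print(table.get("c"))   # b'3'
    print(table.get("zz"))  # None
```

The class deliberately has no write method: once built, the map never changes, and new data would go into a new SSTable instead.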
12
Implementation
Components: client, master server, tablet server
13
Implementation
Master server:
– assigns tablets to tablet servers
– detects the addition and expiration of tablet servers
– balances tablet-server load
– garbage-collects files in GFS
Tablet server:
– manages a set of tablets
– handles read and write requests to its tablets
– splits tablets that have grown too large
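Two of the duties listed above can be illustrated with a hypothetical Python sketch: a tablet server splitting an oversized tablet at its middle row, and a master assigning tablets greedily to the least-loaded tablet server. Both policies are assumptions for illustration only; measuring tablet size by row count (`max_rows`) and measuring load by number of tablets stand in for whatever size and load metrics a real system uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tablet:
    start_key: str  # inclusive
    end_key: str    # exclusive; "" is used here to mean "no upper bound"
    rows: List[str] = field(default_factory=list)  # sorted, distinct row keys


def split_tablet(tablet: Tablet, max_rows: int) -> List[Tablet]:
    """Tablet-server side: split a tablet at its middle row once it is too big."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if len(tablet.rows) <= max_rows:
        return [tablet]
    mid = len(tablet.rows) // 2
    split_key = tablet.rows[mid]
    # Both halves stay contiguous row ranges that meet at split_key.
    return [
        Tablet(tablet.start_key, split_key, tablet.rows[:mid]),
        Tablet(split_key, tablet.end_key, tablet.rows[mid:]),
    ]


def assign_tablets(tablets: List[Tablet],
                   servers: List[str]) -> Dict[str, List[Tablet]]:
    """Master side: give each tablet to the server with the fewest tablets."""
    if not servers:
        raise ValueError("need at least one tablet server")
    plan: Dict[str, List[Tablet]] = {server: [] for server in servers}
    for tablet in tablets:
        target = min(servers, key=lambda server: len(plan[server]))
        plan[target].append(tablet)
    return plan


if __name__ == "__main__":
    big = Tablet("", "", ["a", "b", "c", "d", "e"])
    halves = split_tablet(big, max_rows=4)
    print([(t.start_key, t.end_key) for t in halves])  # [('', 'c'), ('c', '')]
    plan = assign_tablets(halves, ["ts1", "ts2"])
    print({s: len(ts) for s, ts in plan.items()})      # {'ts1': 1, 'ts2': 1}
```

Splitting at an existing row key keeps both halves as contiguous row ranges, so each half can be moved to a different tablet server independently.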
14
Implementation
16
Others
Refinements
– Locality groups
– Compression
– Caching for read performance
– Bloom filters
– Commit-log implementation
Performance Evaluation
Lessons
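For the Bloom-filter refinement, the idea is that a read can consult a small in-memory filter and skip an SSTable that cannot contain the requested row/column pair. The following Python Bloom filter is an illustrative sketch: the bit-array size, the number of hashes, the SHA-256-based hashing, and the `row/column` key encoding are all hypothetical choices, not Bigtable's.

```python
import hashlib
from typing import List


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Derive num_hashes bit positions by hashing the key with different salts.
        positions = []
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent", so the SSTable read can be skipped.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


if __name__ == "__main__":
    sstable_filter = BloomFilter()
    sstable_filter.add("com.cnn.www/contents:")  # hypothetical row/column encoding
    print(sstable_filter.might_contain("com.cnn.www/contents:"))  # True
```

A "True" answer still requires reading the SSTable, since it may be a false positive; only "False" lets the read skip a disk access.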
18
THANK YOU