Lecture 6. NoSQL and Bigtable

Lecture 6. NoSQL and Bigtable. COSC6376 Cloud Computing. Instructor: Weidong Shi (Larry), PhD, Computer Science Department, University of Houston

Outline: NoSQL, HW2, Bigtable

SQL vs. NoSQL
Data Storage: SQL follows the relational model, which comprises rows and columns; each row represents all the information about one entity, and columns are distinct data points. NoSQL follows a non-relational model: data is not stored in tabular form but in smaller units called collections, which may be graphs, key-value pairs, or documents.
Schemas and Flexibility: SQL schemas are locked and static before data entry. NoSQL schemas are dynamic and can be altered at runtime.
Scalability: An RDBMS scales vertically (a bigger, more expensive server), while a non-relational store scales horizontally across cheap commodity servers, which tends to be more cost-effective.
ACID compliance (Atomicity, Consistency, Isolation, Durability): RDBMSs are ACID-compliant; NoSQL is generally not an ACID-compliant technology.
CAP theorem (Consistency, Availability, Partition tolerance): the CAP trade-off does not really apply to a single-node SQL database, whereas NoSQL databases let you choose which two of the three properties to prioritize.

Pros
Mostly open source.
Horizontal scalability: no need for complex joins, and data can easily be spread across servers and processed in parallel.
Support for Map/Reduce: a simple paradigm that allows computation to scale across a cluster of nodes.
No need to develop a fine-grained data model, which saves development time.
Very fast for adding new data and for simple operations/queries.
No need for significant code changes when the data structure is modified.
Ability to store complex data types (for document-based solutions) in a single item of storage.

Cons
Immaturity: still lots of rough edges.
Possible database administration issues: NoSQL often sacrifices features that are present in SQL solutions "by default" for the sake of performance.
Weak indexing support: some solutions such as MongoDB do have indexes, but they are not as powerful as in SQL solutions.
No ACID.
Complex consistency models: eventual consistency. The CAP theorem states that it is not possible to achieve consistency, availability, and partition tolerance at the same time; NoSQL vendors try to make their solutions as fast as possible, and consistency is the most common trade-off.

Types of noSQL

HW2

OpenStreetMap

OSM Size and Growth
Current data: c. 0.5-1 TB. Current plus historical data: 5.15 TB, growing at about 1 TB per annum (source: Planet OSM, http://planet.openstreetmap.org). The historical data contains every version ever of everything in the database, including now-deleted items.
Hardware growth source: OSM, http://munin.openstreetmap.org/openstreetmap/katla.openstreetmap/postgres_size_openstreetmap_9_1_main.html

NoSQL Spatial
Implementations that add spatial capabilities to NoSQL databases: SpatialHadoop, Hadoop GIS, and the ESRI tools for Hadoop; SpatialSpark and GeoTrellis; GeoMesa and GeoWave; MongoDB (extension); GeoCouch.
Geographic data is problematic for databases because it is at minimum two-dimensional (X and Y). A list of names or numbers is easy to order because the key is one-dimensional: with keys between 1 and 1 billion, I know that 1-10 million are on computer A, 10-20 million on computer B, and so on, and I can fetch 1-500,000 easily because they sit on the same computer. That cannot be done directly for two-dimensional space, so the 2-D coordinates need to be mapped onto a 1-D key for efficiency.
Space-filling curves map 2-D onto 1-D so that a query over a geographic area does not need to touch every node in the cluster. Examples are the Z-order curve and the Hilbert curve; geohashing is a form of Z-order curve.
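To make the 2-D-to-1-D mapping concrete, here is a minimal, self-contained sketch of Morton (Z-order) encoding. It is a generic illustration rather than code from any of the libraries above, and it assumes the coordinates have already been discretized into non-negative integer grid cells.

```java
public class ZOrder {
    // Interleave the lower 32 bits of x and y into one 64-bit Morton code:
    // bit i of x goes to bit 2i, bit i of y goes to bit 2i+1.
    public static long interleave(int x, int y) {
        long code = 0L;
        for (int i = 0; i < 32; i++) {
            code |= ((long) (x >>> i) & 1L) << (2 * i);
            code |= ((long) (y >>> i) & 1L) << (2 * i + 1);
        }
        return code;
    }

    public static void main(String[] args) {
        // Hypothetical grid cells: neighbouring cells get nearby Morton codes,
        // so a 1-D range scan over keys covers a compact 2-D region.
        System.out.println(Long.toHexString(interleave(1000, 2000)));
        System.out.println(Long.toHexString(interleave(1001, 2000)));
    }
}
```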

Yelp Dataset
The dataset contains 1,100k reviews of 42k businesses written by 190k users in five cities (Phoenix, Las Vegas, Madison, Waterloo in Canada, and Edinburgh in the UK) over nine years, from 2005 to the present.

Yelp Dataset

Map Yelp Reviews

Tools https://github.com/Yelp/dataset-examples https://github.com/stev-0/osm-Hbase

Bigtable Fay Chang, et al @google.com

Global Picture

Why Bigtable? RDBMS performance is good for transaction processing, but for very large-scale analytic processing the available solutions are commercial, expensive, and specialized. Very large-scale analytic processing means big queries, typically range or table scans, over big databases (100s of TB).

Why Bigtable? (2) MapReduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution. Sharding is not a solution to scale open-source RDBMS platforms: it is application-specific and requires labor-intensive (re)partitioning.

Bigtable
BigTable is a distributed storage system for managing data, designed to scale to a very large size: petabytes of data across thousands of servers.
Used for many Google projects: web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, and more.
A flexible, high-performance solution for all of Google's products.

BigTable
A distributed multi-level map; fault-tolerant and persistent.
Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, efficient scans.
Self-managing: servers can be added or removed dynamically, and servers adjust to load imbalance.
Often want to examine data changes over time, e.g. the contents of a web page over multiple crawls.

Building Blocks
Building blocks: Google File System (GFS) for raw storage; a scheduler that schedules jobs onto machines; a lock service (distributed lock manager); MapReduce for simplified large-scale data processing.
How BigTable uses them: GFS stores persistent data (in the SSTable file format); the scheduler schedules jobs involved in BigTable serving; the lock service handles master election; MapReduce is often used to read/write BigTable data.

Basic Data Model
A BigTable is a sparse, distributed, persistent multi-dimensional sorted map:
(row, column, timestamp) -> cell contents
A good match for most Google applications.

WebTable Example
Want to keep a copy of a large collection of web pages and related information.
Use URLs as row keys and various aspects of the web page as column names.
Store the contents of web pages in the contents: column under the timestamps at which they were fetched.
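A toy way to picture the (row, column, timestamp) -> value map is as a nested sorted map in plain Java. The row key, column name, and timestamps below are illustrative assumptions, not real WebTable data.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WebTableModel {
    public static void main(String[] args) {
        // row -> column -> timestamp -> value; timestamps sorted descending so
        // the newest version of a cell comes first.
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table = new TreeMap<>();

        table.computeIfAbsent("com.cnn.www", r -> new TreeMap<>())
             .computeIfAbsent("contents:", c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(20230101L, "<html>old crawl</html>");
        table.get("com.cnn.www").get("contents:").put(20240601L, "<html>new crawl</html>");

        // Most recent version of the cell (com.cnn.www, contents:).
        System.out.println(table.get("com.cnn.www").get("contents:").firstEntry().getValue());
    }
}
```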

Rows
The row name is an arbitrary string; row creation is implicit upon storing data.
Access to data in a row is atomic.
Rows are ordered lexicographically, so rows close together lexicographically are usually on one or a small number of machines.

Rows (cont.) Reads of short row ranges are efficient and typically require communication with a small number of machines. Can exploit this property by selecting row keys so they get good locality for data access. Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu VS edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
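A small sketch of the row-key trick used in the example above: reversing the hostname labels so that pages from the same domain sort next to each other. The helper name and URLs are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowKeys {
    // "math.gatech.edu/index.html" -> "edu.gatech.math/index.html", so all
    // gatech.edu pages become lexicographically adjacent row keys.
    static String reverseDomainKey(String url) {
        int slash = url.indexOf('/');
        String host = slash >= 0 ? url.substring(0, slash) : url;
        String path = slash >= 0 ? url.substring(slash) : "";
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels) + path;
    }

    public static void main(String[] args) {
        System.out.println(reverseDomainKey("math.gatech.edu/index.html"));
        System.out.println(reverseDomainKey("phys.gatech.edu/index.html"));
    }
}
```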

Columns
Columns have a two-level name structure: family:optional_qualifier.
The column family is the unit of access control and has associated type information.
The qualifier gives unbounded columns: additional levels of indexing, if desired.

Timestamps
Used to store different versions of data in a cell. New writes default to the current time, but timestamps for writes can also be set explicitly by clients.
Lookup options: "return the most recent K values" or "return all values in a timestamp range (or all values)".
Column families can be marked with attributes such as "only retain the most recent K values in a cell" or "keep values until they are older than K seconds".

API
Metadata operations: create/delete tables and column families, change metadata.
Writes (atomic): Set() writes cells in a row; DeleteCells() deletes cells in a row; DeleteRow() deletes all cells in a row.
Reads: a Scanner reads arbitrary cells in a bigtable. Each row read is atomic. Returned rows can be restricted to a particular range; you can ask for just one row, all rows, etc., and for all columns, just certain column families, or specific columns.

API Examples: Write/Modify atomic row modification

API Examples: Read
Return sets can be filtered using regular expressions, e.g. anchor: com.cnn.*
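Bigtable's own client API is C++ and not public, so as a hedged illustration here is roughly what the write and filtered read above could look like against the open-source analog, the HBase 1.x Java client. The table name, families, and the regular expression are assumptions for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class WebTableClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("webtable"))) {

            // Atomic row mutation: all cells in one Put apply to a single row.
            Put put = new Put(Bytes.toBytes("com.cnn.www"));
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes(""), Bytes.toBytes("<html>...</html>"));
            put.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"), Bytes.toBytes("CNN"));
            table.put(put);

            // Read one row, keeping only anchor: columns whose qualifier matches a regex.
            Get get = new Get(Bytes.toBytes("com.cnn.www"));
            get.addFamily(Bytes.toBytes("anchor"));
            get.setFilter(new QualifierFilter(CompareFilter.CompareOp.EQUAL,
                    new RegexStringComparator("cnnsi.*")));
            Result result = table.get(get);
            System.out.println(result);
        }
    }
}
```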

HBase is an open-source, distributed, column-oriented database built on top of HDFS based on BigTable!

HBase is .. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate on top of the Hadoop distributed file system (HDFS) or Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability.

Why HBase?
HBase is a Bigtable clone. It is open source, it has a good community and promise for the future, and it is developed on top of (and has good integration with) the Hadoop platform, if you are using Hadoop already.

HBase Is Not ...
No join operators. Limited atomicity and transaction support: HBase supports batched mutations of single rows only. Data is unstructured and untyped. Not accessed or manipulated via SQL: programmatic access is via Java, REST, or Thrift APIs, with scripting via JRuby.

HBase benefits over an RDBMS
No real indexes; automatic partitioning; scales linearly and automatically with new nodes; commodity hardware; fault tolerance; batch processing.

Testing
$ hbase shell
> create 'test', 'data'
0 row(s) in 4.3066 seconds
> list
test
1 row(s) in 0.1485 seconds
> put 'test', 'row1', 'data:1', 'value1'
0 row(s) in 0.0454 seconds
> put 'test', 'row2', 'data:2', 'value2'
0 row(s) in 0.0035 seconds
> put 'test', 'row3', 'data:3', 'value3'
0 row(s) in 0.0090 seconds
> scan 'test'
ROW    COLUMN+CELL
row1   column=data:1, timestamp=1240148026198, value=value1
row2   column=data:2, timestamp=1240148040035, value=value2
row3   column=data:3, timestamp=1240148047497, value=value3
3 row(s) in 0.0825 seconds
> disable 'test'
09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
0 row(s) in 6.0426 seconds
> drop 'test'
09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
0 row(s) in 0.0210 seconds
> list
0 row(s) in 2.0645 seconds

Connecting to HBase
Java client: get(byte [] row, byte [] column, long timestamp, int versions);
Non-Java clients: a Thrift server hosting an HBase client instance (sample Ruby, C++, and Java-via-Thrift clients); a REST server hosting an HBase client.
TableInput/OutputFormat for MapReduce: HBase as a MapReduce source or sink.
HBase Shell: ./bin/hbase shell YOUR_SCRIPT

Bigtable Applications

Application 1: Google Analytics
Enables webmasters to analyze traffic patterns at their web sites: statistics such as the number of unique visitors per day, the page views per URL per day, and the percentage of users that made a purchase given that they earlier viewed a specific page.
How? A small JavaScript program that the webmaster embeds in their web pages. Every time a page is visited, the program is executed, and it records the following information about each request: a user identifier and the page being fetched.

Application 1: Google Analytics
Two of the Bigtables:
Raw click table (~200 TB): a row for each end-user session; the row name includes the website's name and the time at which the session was created, which clusters sessions that visit the same web site and puts them in sorted chronological order. Compression factor of 6-7.
Summary table (~20 TB): stores predefined summaries for each web site; generated from the raw click table by periodically scheduled MapReduce jobs, each of which extracts recent session data from the raw click table. The row name includes the website's name, and the column family holds the aggregate summaries. Compression factor of 2-3.
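A hypothetical sketch of the kind of row key that produces this clustering: the site name first, then a fixed-width session-creation time so the sessions of one site sort chronologically. The separator and padding are assumptions, not the production format.

```java
public class SessionRowKey {
    // Site name first (clusters sessions of the same site), then a zero-padded
    // creation timestamp (sorts those sessions chronologically).
    static String rowKey(String site, long sessionCreatedMillis) {
        return String.format("%s#%013d", site, sessionCreatedMillis);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("example.com", 1136073600000L));
    }
}
```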

Application 2: Google Earth & Maps
Functionality: pan, view, and annotate satellite imagery at different resolution levels.
One Bigtable stores raw imagery (~70 TB): the row name is a geographic segment, and names are chosen so that adjacent geographic segments are clustered together; a column family maintains the sources of data for each segment.

Google File System
A large-scale distributed "filesystem". The master is responsible for metadata; chunk servers are responsible for reading and writing large chunks of data. Chunks are replicated on 3 machines, and the master is responsible for ensuring the replicas exist.

SSTable
An immutable, sorted file of key-value pairs: chunks of data (64 KB blocks) plus an index. The index is of block ranges, not values. (Figure: an SSTable composed of 64K blocks, an index, and a Bloom filter.)
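A minimal sketch of the lookup structure this describes, with the "blocks" kept in memory for simplicity: the table is an immutable sorted list of key/value pairs split into fixed-size blocks, and a small index of each block's first key tells a read which single block to scan. This is an illustration of the idea, not Bigtable's file format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

public class MiniSSTable {
    private final List<String[]> entries = new ArrayList<>();  // sorted (key, value) pairs, never mutated
    private final List<String> blockIndex = new ArrayList<>(); // first key of each block
    private final int blockSize;

    MiniSSTable(SortedMap<String, String> data, int blockSize) {
        this.blockSize = blockSize;
        int i = 0;
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (i % blockSize == 0) blockIndex.add(e.getKey());
            entries.add(new String[]{e.getKey(), e.getValue()});
            i++;
        }
    }

    String get(String key) {
        // Binary-search the index to find the block whose first key <= key ...
        int pos = Collections.binarySearch(blockIndex, key);
        int block = pos >= 0 ? pos : Math.max(0, -pos - 2);
        // ... then scan only that one block.
        int end = Math.min(entries.size(), (block + 1) * blockSize);
        for (int j = block * blockSize; j < end; j++) {
            if (entries.get(j)[0].equals(key)) return entries.get(j)[1];
        }
        return null; // a Bloom filter (see below) would usually avoid this miss
    }
}
```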

Tablet
Contains some range of rows of the table; built out of multiple SSTables. (Figure: a tablet covering rows from aardvark to apple, backed by two SSTables of 64K blocks with their indexes.)

Table
Multiple tablets make up the table, and SSTables can be shared between tablets. Tablets do not overlap; SSTables can overlap. (Figure: two tablets, aardvark-apple and apple_two_E-boat, sharing some of their SSTables.)

Chubby
A persistent and distributed lock service. Consists of 5 active replicas; one replica is the master and serves requests. The service is functional when a majority of the replicas are running and in communication with one another, i.e. when there is a quorum. Implements a nameservice that consists of directories and files.

Bigtable and Chubby Bigtable uses Chubby to: Ensure there is at most one active master at a time, Store the bootstrap location of Bigtable data (Root tablet), Discover tablet servers and finalize tablet server deaths, Store Bigtable schema information (column family information), Store access control list. If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.

Tablet Assignment
Each tablet is assigned to one tablet server at a time. The master server keeps track of the set of live tablet servers, the current assignments of tablets to servers, and the unassigned tablets. When a tablet is unassigned, the master assigns it to a tablet server with sufficient room.

Bigtable Master Assigns tablets to tablet servers Detects addition and expiration of tablet servers Balances tablet server load. Tablets are distributed randomly on nodes of the cluster for load balancing. Handles garbage collection Handles schema changes

Bigtable Tablet Servers
Each tablet server manages a set of tablets, typically between ten and a thousand, each 100-200 MB by default. It handles read and write requests to its tablets and splits tablets that have grown too large. The master is responsible for load balancing and fault tolerance, and uses Chubby to monitor the health of tablet servers and restart failed servers.

A 3-level Hierarchy
1st level: a file stored in Chubby contains the location of the root tablet, i.e., a directory of ranges (tablets) and associated meta-data. The root tablet never splits.
2nd level: each meta-data tablet contains the location of a set of user tablets.
3rd level: a set of SSTable identifiers for each tablet.

A 3-level Hierarchy
Each meta-data row stores ~1 KB of data. With 128 MB meta-data tablets, the three-level scheme addresses 2^34 tablets (2^61 bytes in 128 MB tablets), on the order of exabytes (thousands of petabytes).
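A quick back-of-the-envelope check of those numbers, using the ~1 KB-per-row and 128 MB figures above:

```latex
\frac{2^{27}\,\text{B per metadata tablet}}{2^{10}\,\text{B per row}} = 2^{17}\ \text{rows per metadata tablet},\qquad
\left(2^{17}\right)^{2} = 2^{34}\ \text{user tablets},\qquad
2^{34} \times 2^{27}\,\text{B} = 2^{61}\,\text{B} \approx 2.3\ \text{EB}.
```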

Editing a Table
Mutations are logged, then applied to an in-memory version (the memtable); the logfile is stored in GFS. (Figure: a tablet applying insert and delete mutations to its memtable alongside its SSTables.)

Tablet Serving “Log Structured Merge Trees” Image Source: Chang et al., OSDI 2006

Tablet Representation
A tablet is served from an append-only log on GFS, SSTables on GFS, and a write buffer in memory (the memtable, random-access): writes go to the log and the buffer, reads see both.
SSTable: an immutable on-disk ordered map from string to string, where the string keys are <row, column, timestamp> triples.

Client Write & Read Operations
Write operation arrives at a tablet server: the server ensures the client has sufficient privileges for the write operation (access control via Chubby); a log record is appended to the commit log file; once the write commits, its contents are inserted into the memtable.
Read operation arrives at a tablet server: the server ensures the client has sufficient privileges for the read operation (Chubby); the read is performed on a merged view of (a) the SSTables that constitute the tablet and (b) the memtable.

Write Operations
As writes execute, the size of the memtable increases. Once the memtable reaches a threshold, it is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.

Compactions
Minor compaction: converts the memtable into an SSTable; reduces memory usage and log traffic on restart.
Merging compaction: reads the contents of a few SSTables and the memtable, and writes out a new SSTable; reduces the number of SSTables.
Major compaction: a merging compaction that results in only one SSTable; no deletion records, only live data.
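Tying the last few slides together, here is a toy sketch (an assumption-level illustration, not the real implementation) of the write path and minor compaction: writes append to a commit log and go into a memtable; when the memtable passes a threshold it is frozen and becomes a new immutable SSTable; reads merge the memtable with the SSTables, newest first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ToyTablet {
    private final List<String> commitLog = new ArrayList<>();        // stand-in for the GFS log
    private NavigableMap<String, String> memtable = new TreeMap<>();
    private final Deque<NavigableMap<String, String>> sstables = new ArrayDeque<>(); // newest first
    private final int flushThreshold;

    ToyTablet(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);        // 1. append to the commit log
        memtable.put(key, value);                // 2. apply to the memtable
        if (memtable.size() >= flushThreshold) { // 3. minor compaction: freeze and flush
            sstables.addFirst(Collections.unmodifiableNavigableMap(memtable));
            memtable = new TreeMap<>();
        }
    }

    String read(String key) {
        if (memtable.containsKey(key)) return memtable.get(key); // newest data wins
        for (NavigableMap<String, String> sst : sstables) {
            if (sst.containsKey(key)) return sst.get(key);
        }
        return null;
    }
}
```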

Refinements: Locality Groups
Clients can group multiple column families into a locality group; a separate SSTable is created for each locality group in each tablet. Segregating column families that are not typically accessed together enables more efficient reads. In the WebTable, page metadata can be in one group and the contents of the page in another group.

Refinements: Compression
Many opportunities for compression: similar values in the same row/column at different timestamps, similar values in different columns, similar values across adjacent rows.
Two-pass custom compression scheme: the first pass compresses long common strings across a large window; the second pass looks for repetitions in a small window. Speed is emphasized, but the space reduction is still good (10-to-1).

Refinements: Bloom Filters
A read operation has to read from disk when the desired SSTable isn't in memory. We can reduce the number of accesses by specifying a Bloom filter, which allows us to ask whether an SSTable might contain data for a specified row/column pair. A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations; in practice most lookups for non-existent rows or columns do not need to touch disk.

Bloom Filters

Approximate set membership problem
Suppose we have a set S = {s1, s2, ..., sm} ⊆ universe U. We want to represent S in such a way that we can quickly answer "Is x an element of S?" while taking as little space as possible. To do so, we allow false positives (i.e. x ∉ S, but we answer yes). If x ∈ S, we must answer yes.

Bloom filters
A Bloom filter consists of an array A[n] of n bits (the space) and k independent random hash functions h1, ..., hk : U -> {0, 1, ..., n-1}.
1. Initially set the array to 0.
2. For each s ∈ S, set A[hi(s)] = 1 for 1 ≤ i ≤ k (an entry can be set to 1 multiple times; only the first time has an effect).
3. To check if x ∈ S, check whether all locations A[hi(x)] for 1 ≤ i ≤ k are set to 1. If not, clearly x ∉ S. If all A[hi(x)] are set to 1, we assume x ∈ S.
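The slides do not quantify how often the last step is wrong; the standard estimate (a well-known result, not stated on the slide), using the same m elements, n bits, and k hash functions, is

```latex
\Pr[\text{false positive}] \;=\; \Bigl(1-\bigl(1-\tfrac{1}{n}\bigr)^{km}\Bigr)^{k} \;\approx\; \bigl(1-e^{-km/n}\bigr)^{k},
\qquad \text{minimized near } k = \tfrac{n}{m}\ln 2 .
```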

(Figure: the bit array starts as all 0s; each element of S, e.g. x1 and x2, is hashed k times and each hash location is set to 1.)

(Figure: to query y, check its k hash locations; if only 1s appear, conclude that y is in S. This may yield a false positive.)
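A minimal Bloom filter following the steps above. Deriving the k hashes from two base hashes (double hashing) is a common simplification assumed here, not necessarily what Bigtable does.

```java
import java.util.BitSet;

public class BloomFilter {
    private final BitSet bits;
    private final int n;   // number of bits
    private final int k;   // number of hash functions

    public BloomFilter(int n, int k) {
        this.bits = new BitSet(n);
        this.n = n;
        this.k = k;
    }

    // i-th derived hash in [0, n), built from two base hashes of the key.
    private int hash(String s, int i) {
        int h1 = s.hashCode();
        int h2 = (h1 >>> 16) | 1; // cheap second hash, forced odd
        return Math.floorMod(h1 + i * h2, n);
    }

    public void add(String s) {
        for (int i = 0; i < k; i++) bits.set(hash(s, i));
    }

    public boolean mightContain(String s) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(hash(s, i))) return false; // definitely not in the set
        }
        return true; // probably in the set (may be a false positive)
    }

    public static void main(String[] args) {
        BloomFilter f = new BloomFilter(1 << 16, 5);
        f.add("com.cnn.www/contents:");
        System.out.println(f.mightContain("com.cnn.www/contents:"));     // true
        System.out.println(f.mightContain("com.example.www/contents:")); // almost certainly false
    }
}
```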

BigTable – Bloom Filters: drastically reduces the number of disk seeks required for read operations!

Benchmarks