Data-Intensive Distributed Computing

Slides:

Advertisements

Similar presentations

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China

Advertisements

Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.

CS525: Special Topics in DBs Large-Scale Data Management HBase Spring 2013 WPI, Mohamed Eltabakh 1.

Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.

Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session 8: NoSQL This work is licensed under a Creative Commons Attribution-Noncommercial-Share.

Bigtable: A Distributed Storage System for Structured Data Presenter: Guangdong Liu Jan 24 th, 2012.

HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam

Lecture 7 – Bigtable CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed.

Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.

Peer To Peer Distributed Systems Pete Keleher. Why Distributed Systems? l Aggregate resources! –memory –disk –CPU cycles l Proximity to physical stuff.

-A APACHE HADOOP PROJECT

7/2/2015EECS 584, Fall Bigtable: A Distributed Storage System for Structured Data Jing Zhang Reference: Handling Large Datasets at Google: Current.

NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.

NoSQL Database.

 Pouria Pirzadeh  3 rd year student in CS  PhD  Vandana Ayyalasomayajula  1 st year student in CS  Masters.

Distributed storage for structured data

Bigtable: A Distributed Storage System for Structured Data

BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.

Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.

Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.

Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.

Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.

HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.

Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data.

Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay.

Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.

1 Dennis Kafura – CS5204 – Operating Systems Big Table: Distributed Storage System For Structured Data Sergejs Melderis 1.

Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.

Bigtable: A Distributed Storage System for Structured Data 1.

Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.

Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.

Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cloud Data Models Lecturer.

Key/Value Stores CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.

CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.

CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.

NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...

NOSQL DATABASE Not Only SQL DATABASE

Bigtable : A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows,

Bigtable: A Distributed Storage System for Structured Data

Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.

Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.

Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.

Apache Accumulo CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.

Bigtable A Distributed Storage System for Structured Data.

Big Data Infrastructure Week 10: Mutable State (1/2) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States.

1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.

CS 405G: Introduction to Database Systems

and Big Data Storage Systems

Amit Ohayon, seminar in databases, 2017

Bigtable A Distributed Storage System for Structured Data

Lecture 7 Bigtable Instructor: Weidong Shi (Larry), PhD

HBase Mohamed Eltabakh

Software Systems Development

Bigtable: A Distributed Storage System for Structured Data

How did it start? • At Google • • • • Lots of semi structured data

INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER

CS122B: Projects in Databases and Web Applications Winter 2017

Data Management in the Cloud

CSE-291 (Cloud Computing) Fall 2016

Gowtham Rajappan.

NOSQL databases and Big Data Storage Systems

NoSQL Systems Overview (as of November 2011).

Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper

Data-Intensive Distributed Computing

A Distributed Storage System for Structured Data

THE GOOGLE FILE SYSTEM.

Presentation transcript:

Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 7: Mutable State (1/2) March 13, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are available at http://lintool.github.io/bigdata-2018w/ This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details

Structure of the Course Analyzing Text Analyzing Graphs Analyzing Relational Data Data Mining “Core” framework features and algorithm design

The Fundamental Problem We want to keep track of mutable state in a scalable manner Assumptions: State organized in terms of logical records State unlikely to fit on single machine, must be distributed MapReduce won’t do! Want more? Take a real distributed systems course!

The Fundamental Problem We want to keep track of mutable state in a scalable manner Assumptions: State organized in terms of logical records State unlikely to fit on single machine, must be distributed Uh… just use an RDBMS?

What do RDBMSes provide? Relational model with schemas Powerful, flexible query language Transactional semantics: ACID Rich ecosystem, lots of tool support

RDBMSes: Pain Points Source: www.flickr.com/photos/spencerdahl/6075142688/

#1: Must design up front, painful to evolve

Flexible design doesn’t mean no design! This should really be a list… Consistent field names? { "token": 945842, "feature_enabled": "super_special", "userid": 229922, "page": "null", "info": { "email": "my@place.com" } } Is this really an integer? Is this really null? What keys? What values? JSON to the Rescue! Flexible design doesn’t mean no design!

#2: Pay for ACID! Source: Wikipedia (Tortoise)

#3: Cost! Source: www.flickr.com/photos/gnusinn/3080378658/

What do RDBMSes provide? Relational model with schemas Powerful, flexible query language Transactional semantics: ACID Rich ecosystem, lots of tool support What if we want a la carte? Source: www.flickr.com/photos/vidiot/18556565/

Features a la carte? Enter… NoSQL! What if I’m willing to give up consistency for scalability? What if I’m willing to give up the relational model for flexibility? What if I just want a cheaper solution? Enter… NoSQL!

Source: geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html

NoSQL (Not only SQL) Horizontally scale “simple operations” Replicate/distribute data over many servers Simple call interface Weaker concurrency model than ACID Efficient use of distributed indexes and RAM Flexible schemas But, don’t blindly follow the hype… Often, MySQL is what you really need! Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.

“web scale”

(Major) Types of NoSQL databases Key-value stores Column-oriented databases Document stores Graph databases

Three Core Ideas Keeping track of the partitions? Partitioning (sharding) To increase scalability and to decrease latency Consistency? Replication To increase robustness (availability) and to increase throughput Consistency? Caching To reduce latency

Key-Value Stores Source: Wikipedia (Keychain)

Key-Value Stores: Data Model Stores associations between keys and values Keys are usually primitives For example, ints, strings, raw bytes, etc. Values can be primitive or complex: often opaque to store Primitives: ints, strings, etc. Complex: JSON, HTML fragments, etc.

Key-Value Stores: Operations Very simple API: Get – fetch value associated with key Put – set value associated with key Optional operations: Multi-get Multi-put Range queries Secondary index lookups Consistency model: Atomic puts (usually) Cross-key operations: who knows?

Key-Value Stores: Implementation Non-persistent: Just a big in-memory hash table Examples: Redis, memcached Persistent Wrapper around a traditional RDBMS Examples: Voldemort What if data doesn’t fit on a single machine?

Simple Solution: Partition! Partition the key space across multiple machines Let’s say, hash partitioning For n machines, store key k at machine h(k) mod n Okay… But: How do we know which physical machine to contact? How do we add a new machine to the cluster? What happens if a machine fails?

Clever Solution Hash the keys Hash the machines also! Distributed hash tables! (following combines ideas from several sources…)

h = 2n – 1 h = 0

Send request to any node, gets routed to correct one in O(n) hops h = 2n – 1 h = 0 Each machine holds pointers to predecessor and successor Send request to any node, gets routed to correct one in O(n) hops Can we do better? Routing: Which machine holds the key?

Send request to any node, gets routed to correct one in O(log n) hops h = 2n – 1 h = 0 Each machine holds pointers to predecessor and successor + “finger table” (+2, +4, +8, …) Send request to any node, gets routed to correct one in O(log n) hops Routing: Which machine holds the key?

Routing: Which machine holds the key? h = 2n – 1 h = 0 Simpler Solution Service Registry Routing: Which machine holds the key?

How do we rebuild the predecessor, successor, finger tables? Stoica et al. (2001). Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM. h = 2n – 1 h = 0 Cf. Gossip Protoccols How do we rebuild the predecessor, successor, finger tables? New machine joins: What happens?

Solution: Replication h = 2n – 1 h = 0 N = 3, replicate +1, –1 Covered! Covered! Machine fails: What happens?

Three Core Ideas Keeping track of the partitions? Partitioning (sharding) To increase scalability and to decrease latency Consistency? Replication To increase robustness (availability) and to increase throughput Caching To reduce latency

Another Refinement: Virtual Nodes Don’t directly hash servers Create a large number of virtual nodes, map to physical servers Better load redistribution in event of machine failure When new server joins, evenly shed load from other servers

Bigtable Source: Wikipedia (Table)

Bigtable Applications Gmail Google’s web crawl Google Earth Google Analytics Data source and data sink for MapReduce HBase is the open-source implementation…

Data Model A table in Bigtable is a sparse, distributed, persistent multidimensional sorted map Map indexed by a row key, column key, and a timestamp (row:string, column:string, time:int64)  uninterpreted byte array Supports lookups, inserts, deletes Single row transactions only Image Source: Chang et al., OSDI 2006

Rows and Columns Rows maintained in sorted lexicographic order Applications can exploit this property for efficient row scans Row ranges dynamically partitioned into tablets Columns grouped into column families Column key = family:qualifier Column families provide locality hints Unbounded number of columns At the end of the day, it’s all key-value pairs!

row, column family, column qualifier, timestamp Key-Values row, column family, column qualifier, timestamp value

Okay, so how do we build it? In Memory On Disk Mutability Easy Mutability Hard Small Big

Log Structured Merge Trees MemStore Writes Reads What happens when we run out of memory?

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Immutable, indexed, persistent, key-value pairs What happens to the read path?

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Immutable, indexed, persistent, key-value pairs What happens as more writes happen?

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Store Store Store Immutable, indexed, persistent, key-value pairs What happens to the read path?

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Store Store Store Immutable, indexed, persistent, key-value pairs What’s the next issue?

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Store Store Compaction! Immutable, indexed, persistent, key-value pairs

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Store Compaction! Immutable, indexed, persistent, key-value pairs

Log Structured Merge Trees MemStore Writes Reads Memory Flush to disk Disk Logging for persistence WAL Store Immutable, indexed, persistent, key-value pairs One final component…

Log Structured Merge Trees The complete picture… Merge MemStore Writes Reads Memory Disk Flush to disk Logging for persistence WAL Store Store Store Immutable, indexed, persistent, key-value pairs Compaction!

Log Structured Merge Trees The complete picture… Okay, now how do we build a distributed version?

Bigtable building blocks HBase Bigtable building blocks HDFS GFS HFile SSTable Region Tablet Regions Server Tablet Server Chubby Zookeeper

HFile SSTable Persistent, ordered immutable map from keys to values Stored in GFS: replication “for free” Supported operations: Look up value associated with key Iterate key/value pairs within a key range

Region Tablet Dynamically partitioned range of rows Tablet SSTable Comprised of multiple SSTables Tablet aardvark - base SSTable SSTable SSTable SSTable

Region Server Tablet Server MemStore Writes Reads Flush to disk Memory Disk Flush to disk Logging for persistence WAL SSTable SSTable SSTable Immutable, indexed, persistent, key-value pairs Compaction!

Table Comprised of multiple tablets Tablet Tablet SSTable SSTable SSTables can be shared between tablets Tablet aardvark - base Tablet basic - database SSTable SSTable SSTable SSTable SSTable SSTable

Tablet to Tablet Server Assignment Region Region Server Tablet to Tablet Server Assignment Each tablet is assigned to one tablet server at a time Exclusively handles read and write requests to that tablet What happens when a tablet grow too big? What happens when a tablet server fails? We need a lock service!

Bigtable building blocks HBase Bigtable building blocks HDFS GFS HFile SSTable Region Tablet Regions Server Tablet Server Chubby Zookeeper

Architecture Client library HMaster Bigtable master Tablet servers Regions Servers

Bigtable Master Roles and responsibilities: Tablet structure changes: Assigns tablets to tablet servers Detects addition and removal of tablet servers Balances tablet server load Handles garbage collection Handles schema changes Tablet structure changes: Table creation/deletion (master initiated) Tablet merging (master initiated) Tablet splitting (tablet server initiated)

Compactions Minor compaction Merging compaction Major compaction Converts the memtable into an SSTable Reduces memory usage and log traffic on restart Merging compaction Reads a few SSTables and the memtable, and writes out a new SSTable Reduces number of SSTables Major compaction Merging compaction that results in only one SSTable No deletion records, only live data

Table Comprised of multiple tables Tablet Tablet How does this happen? SSTables can be shared between tablets Tablet aardvark - base Tablet basic - database How does this happen? SSTable SSTable SSTable SSTable SSTable SSTable

Three Core Ideas Keeping track of the partitions? Partitioning (sharding) To increase scalability and to decrease latency Consistency? Replication To increase robustness (availability) and to increase throughput Caching To reduce latency

HBase Image Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

Questions? Source: Wikipedia (Japanese rock garden)