CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#28: Modern Systems.



System Votes
  MongoDB            32
  Google Spanner/F1  22
  LinkedIn Espresso  16
  Apache Cassandra   16
  Facebook Scuba     16
  Apache HBase       14
  VoltDB             10
  Redis              10
  Vertica             5
  Cloudera Impala     5
  DeepDB              2
  SAP HANA            1
  CockroachDB         1
  SciDB               1
  InfluxDB            1
  Accumulo            1
  Apache Trafodion    1


MongoDB
Document data model:
–Think JSON, XML, Python dicts.
–Not Microsoft Word documents.
Different terminology:
–Document → Tuple
–Collection → Table/Relation

MongoDB
A customer has orders, and each order has order items:
–R1(custId, name, …)     Customers
–R2(orderId, custId, …)  Orders
–R3(itemId, orderId, …)  Order Items

MongoDB
A customer has orders, and each order has order items — all embedded in a single customer document:

{ "custId" : 1234,
  "custName" : "Trump",
  "orders" : [
    { "orderId" : 10001,
      "orderItems" : [
        { "itemId" : "XXXX", "price" : … },
        { "itemId" : "YYYY", "price" : … } ] },
    { "orderId" : 10050,
      "orderItems" : [
        { "itemId" : "ZZZZ", "price" : … } ] } ] }
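The embedding can be reproduced from the three relations on the earlier schema slide. A minimal Python sketch (the price values are made up, since the slide's values were lost in transcription):

```python
# "Pre-joining" the relational rows into one nested customer document.
customers = [{"custId": 1234, "custName": "Trump"}]
orders = [{"orderId": 10001, "custId": 1234},
          {"orderId": 10050, "custId": 1234}]
order_items = [
    {"itemId": "XXXX", "orderId": 10001, "price": 19.99},  # price: made up
    {"itemId": "YYYY", "orderId": 10001, "price": 5.00},   # price: made up
    {"itemId": "ZZZZ", "orderId": 10050, "price": 7.50},   # price: made up
]

def embed(customer):
    """Nest each customer's orders, and each order's items, inside the doc."""
    doc = dict(customer)
    doc["orders"] = []
    for o in orders:
        if o["custId"] != customer["custId"]:
            continue
        doc["orders"].append({
            "orderId": o["orderId"],
            "orderItems": [i for i in order_items
                           if i["orderId"] == o["orderId"]],
        })
    return doc

doc = embed(customers[0])
print(len(doc["orders"]))   # 2
```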

MongoDB
JSON-only query API.
Single-document atomicity.
–OLD: No server-side joins. Had to "pre-join" collections by embedding related documents inside of each other.
–NEW: Server-side joins (only left-outer equi-joins).
No cost-based query planner / optimizer.
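The new server-side join behaves like a left-outer equi-join. A toy stand-in for that behavior (function name and arguments are made up; this is not MongoDB's implementation):

```python
# Left-outer equi-join: every left document is kept, with an empty
# match list when nothing on the right side joins to it.
def lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    out = []
    for l in left_docs:
        matches = [r for r in right_docs
                   if r.get(foreign_field) == l.get(local_field)]
        joined = dict(l)
        joined[as_field] = matches   # [] if no match (left outer)
        out.append(joined)
    return out

customers = [{"custId": 1, "name": "Alice"}, {"custId": 2, "name": "Bob"}]
orders = [{"orderId": 7, "custId": 1}]
result = lookup(customers, orders, "custId", "custId", "orders")
print(result[1]["orders"])   # [] — Bob has no orders but is still returned
```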

MongoDB
Heterogeneous distributed components:
–Centralized query router.
Master-slave replication.
Auto-sharding:
–Define "partitioning" attributes for each collection (hash or range).
–When a shard gets too big, the DBMS automatically splits the shard and rebalances.
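Both sharding flavors are easy to sketch. A toy model (function names and the split threshold are made up; real routing also tracks chunk-to-shard maps):

```python
import hashlib

# Hash sharding: route a document by a stable hash of its shard key.
def shard_for(key: str, num_shards: int = 4) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

# Range sharding with automatic split: when a chunk holds too many
# keys, split it at the median key so the balancer can move one half.
def split_chunk(chunk_keys, max_size=4):
    keys = sorted(chunk_keys)
    if len(keys) <= max_size:
        return [keys]
    mid = len(keys) // 2
    return [keys[:mid], keys[mid:]]

print(shard_for("cust-1234"))                    # deterministic shard id
print(split_chunk(list("fedcba"), max_size=4))   # two chunks of 3 keys
```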

MongoDB
Originally used an mmap storage manager:
–No buffer pool.
–Let the OS decide when to flush pages.
–Single lock per database.

MongoDB
Version 3 (2015) now supports pluggable storage managers:
–WiredTiger from BerkeleyDB alumni.
–RocksDB from Facebook ("MongoRocks").


LinkedIn Espresso
Distributed document DBMS deployed in production at LinkedIn. Think of it as a custom version of MongoDB with better transactions.
Replaces legacy Oracle installations:
–Started with the InMail messaging service.

LinkedIn Espresso
Supports distributed transactions across documents.
Strong consistency, to act as a single source-of-truth for user data.
Integrates with the entire data ecosystem.

LinkedIn Espresso
–Cluster management system in charge of data partitioning.
–Pub/sub message bus that supports timeline consistency.
–Centralized query router.
Source: On Brewing Fresh Espresso: LinkedIn's Distributed Data Serving Platform, SIGMOD 2013.


History
Amazon publishes a paper in 2007 on the Dynamo system:
–Eventually consistent key/value store.
–Partitions based on consistent hashing.
People at Facebook start writing Cassandra as a clone of Dynamo in 2008 for their message service:
–Ended up not using the system, and released the source code.

Apache Cassandra
Borrows a lot of ideas from other systems:
–Consistent hashing (Amazon Dynamo).
–Column-family data model (Google BigTable).
–Log-structured merge trees.
Originally one of the leaders of the NoSQL movement, but now pushing "CQL".

Consistent Hashing
[Diagram: nodes A–F placed on a hash ring; keys key1 and key2 hash to positions on the ring and are stored on the next N=3 nodes clockwise.]
Source: Avinash Lakshman & Prashant Malik (Facebook)
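A minimal sketch of the ring, assuming MD5 placement and the clockwise-walk preference list described above (class and function names are made up):

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    # Hash a node name or key onto the ring [0, 2**32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, n_replicas=3):
        self.n = n_replicas
        # Each node occupies one point on the ring (real systems use
        # many virtual nodes per physical node).
        self.points = sorted((ring_hash(node), node) for node in nodes)

    def preference_list(self, key):
        # Walk clockwise from the key's position; the first N distinct
        # nodes encountered hold the replicas (Dynamo/Cassandra style).
        idx = bisect.bisect(self.points, (ring_hash(key),))
        out = []
        for i in range(len(self.points)):
            node = self.points[(idx + i) % len(self.points)][1]
            if node not in out:
                out.append(node)
            if len(out) == self.n:
                break
        return out

ring = Ring(["A", "B", "C", "D", "E", "F"])
print(ring.preference_list("key1"))   # three distinct replica nodes
```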

Column-Family Data Model
Source: Gary Dusbabek (Rackspace)

LSM Storage Model
The log is the database: a read may have to reconstruct the record from the log.
–MemTable: in-memory cache.
–SSTables: read-only portions of the log.
–Uses indexes + Bloom filters to speed up reads.
See the RocksDB talk from this semester.
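The read path above can be sketched in Python. This is a toy model: the "Bloom filter" is stood in by an exact key set, and the flush threshold is made up.

```python
# Toy LSM tree: writes go to an in-memory MemTable; full MemTables are
# flushed as read-only SSTables. Reads check the MemTable first, then
# scan SSTables from newest to oldest.
class LSMTree:
    def __init__(self, flush_at=2):
        self.memtable = {}
        self.sstables = []        # list of (keyset, dict), newest first
        self.flush_at = flush_at

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_at:
            # Flush: the SSTable is a read-only snapshot of the MemTable.
            self.sstables.insert(0, (set(self.memtable), dict(self.memtable)))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keyset, table in self.sstables:   # newest wins
            if key in keyset:                 # stand-in for a Bloom filter
                return table[key]
        return None

db = LSMTree()
db.put("a", 1)
db.put("b", 2)     # triggers a flush
db.put("a", 99)    # newer value stays in the MemTable
print(db.get("a"))  # 99
```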


Two-Phase Commit
[Diagram: an application server sends a commit request to Node 1 (the coordinator); Nodes 2 and 3 are participants. Phase 1: Prepare — the coordinator asks each participant to prepare and collects OK votes. Phase 2: Commit — the coordinator tells the participants to commit.]
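The two phases can be sketched as follows. This is a toy model: class and method names are made up, and real 2PC also durably logs each step for recovery.

```python
# Minimal two-phase commit: the coordinator commits only if every
# participant votes OK in the prepare phase; otherwise it aborts.
def two_phase_commit(participants):
    # Phase 1: Prepare — collect votes from all participants.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the decision to all participants.
    for p in participants:
        p.finish(decision)
    return decision

class Node:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = None
    def prepare(self):
        return self.can_commit
    def finish(self, decision):
        self.state = decision

nodes = [Node(), Node(), Node(can_commit=False)]
print(two_phase_commit(nodes))   # "abort": one participant voted no
```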

Paxos

Paxos
Consensus protocol where a coordinator proposes an outcome (e.g., commit or abort) and the participants then vote on whether that outcome should succeed.
Does not block if a majority of participants are available, and has provably minimal message delays in the best case.
–First correct protocol that was provably resilient in the face of asynchronous networks.

Paxos
[Diagram: an application server sends a commit request to Node 1 (the proposer); Nodes 2–4 are acceptors. The proposer sends Propose to the acceptors, collects Agree replies, sends Commit, and receives Accept.]

Paxos
[Diagram: the same exchange, but one acceptor has failed (X). The round still succeeds because a majority of acceptors respond.]

Paxos
Message sequence between a proposer and the acceptors when a competing, higher-numbered proposal arrives:
–Proposer → Acceptors: Propose(n)
–Acceptors → Proposer: Agree(n)
–A second proposer sends Propose(n+1)
–Proposer → Acceptors: Commit(n)
–Acceptors: Reject(n, n+1) — n has been superseded by n+1
–Acceptors → second proposer: Agree(n+1)
–Second proposer → Acceptors: Commit(n+1)
–Acceptors: Accept(n+1)
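The Reject in this exchange comes from acceptors tracking the highest proposal number they have promised. A toy acceptor (class and method names are made up; this omits the learned-value handoff of full Paxos) behaves like this:

```python
# An acceptor promises the highest proposal number it has seen and
# rejects anything lower — which is why Commit(n) fails once the
# acceptor has promised n+1 to another proposer.
class Acceptor:
    def __init__(self):
        self.promised = -1
        self.accepted = None

    def propose(self, n):
        if n > self.promised:
            self.promised = n
            return "agree"
        return "reject"

    def accept(self, n, value):
        if n >= self.promised:
            self.accepted = (n, value)
            return "accept"
        return "reject"

a = Acceptor()
print(a.propose(1))          # agree
print(a.propose(2))          # agree — higher number supersedes
print(a.accept(1, "commit")) # reject — already promised 2
print(a.accept(2, "commit")) # accept
```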

2PC vs. Paxos
2PC is a degenerate case of Paxos:
–Single coordinator.
–Only works if everybody is up.
Use leases to determine who is allowed to propose new updates, to avoid continuous rejection.

Google Spanner
Google's geo-replicated DBMS (in production since ~2011).
Schematized, semi-relational data model.
Concurrency control:
–2PL + T/O (pessimistic).
–Externally consistent global write-transactions with synchronous replication.
–Lock-free read-only transactions.

Google Spanner

CREATE TABLE users {
  uid INT NOT NULL,
  email VARCHAR,
  PRIMARY KEY (uid)
};
CREATE TABLE albums {
  uid INT NOT NULL,
  aid INT NOT NULL,
  name VARCHAR,
  PRIMARY KEY (uid, aid)
} INTERLEAVE IN PARENT users ON DELETE CASCADE;

Resulting key order (albums rows interleaved under their parent users row):
  users(1001)
    albums(1001, 9990)
    albums(1001, 9991)
  users(1002)
    albums(1002, 6631)
    albums(1002, 6634)
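The interleaved layout falls out of plain primary-key ordering, because a child key is prefixed by its parent's key. A quick Python illustration using the uid/aid values from the slide:

```python
# Rows from both tables, keyed by primary key tuples. A parent key
# (uid,) sorts immediately before its children (uid, aid), so sorting
# by key places each user's albums right after the user row — they
# land in the same contiguous key range (and thus the same tablet).
rows = [
    ("users",  (1002,)),
    ("albums", (1001, 9991)),
    ("users",  (1001,)),
    ("albums", (1002, 6634)),
    ("albums", (1001, 9990)),
    ("albums", (1002, 6631)),
]
ordered = sorted(rows, key=lambda r: r[1])
for table, key in ordered:
    print(table, key)
```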

Google Spanner
Ensures ordering through globally unique timestamps generated from atomic clocks and GPS devices (exposed via the TrueTime API).
The database is broken up into tablets:
–Use Paxos to elect the leader in a tablet group.
–Use 2PC for txns that span tablets.

Google Spanner
[Diagram: two application servers issue concurrent transactions T1 (Set A=2, B=9) and T2 (Set A=0, B=7) over the network to Node 1 (holding A=1) and Node 2 (holding B=8); the nodes coordinate via Paxos or 2PC.]

Google F1
OCC engine built on top of Spanner:
–In the read phase, F1 returns the last-modified timestamp with each row. No locks.
–The timestamp for a row is stored in a hidden lock column.
–The client library returns these timestamps to the F1 server.
–If the timestamps differ from the current timestamps at the time of commit, the transaction is aborted.
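A toy version of that timestamp check (function and field names are made up; this is not F1's implementation):

```python
# Optimistic commit: reads recorded each row's last-modified timestamp;
# at commit time, abort if any read row has changed since, otherwise
# apply the writes and bump the timestamps.
def occ_commit(db, read_set, writes):
    # read_set maps key -> timestamp observed during the read phase.
    for key, seen_ts in read_set.items():
        if db[key]["ts"] != seen_ts:
            return False                 # someone modified it -> abort
    for key, value in writes.items():
        db[key] = {"value": value, "ts": db[key]["ts"] + 1}
    return True

db = {"row1": {"value": 10, "ts": 5}}
read_set = {"row1": 5}
db["row1"]["ts"] = 6                     # concurrent writer bumps the ts
print(occ_commit(db, read_set, {"row1": 11}))   # False: aborted
```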

Andy's Final Comments
Both MySQL and Postgres are getting very good these days.
Avoid premature optimizations.