CSCI5570 Large Scale Data Processing Systems

Introduction to NoSQL and NewSQL
James Cheng, CSE, CUHK

NoSQL
- NoSQL: "Not only SQL". Many such systems also support SQL-like query languages.
- Data is stored and retrieved using models other than the tabular relations used in relational databases.
- Examples:
  - Key-value: Dynamo, MemcacheDB
  - Document: MongoDB, CouchDB
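A minimal sketch of the difference between the two data models named above, assuming toy in-memory stores. `KeyValueStore` and `DocumentStore` are hypothetical classes, not the APIs of Dynamo, MemcacheDB, MongoDB, or CouchDB. A key-value store treats each value as opaque bytes reachable only by key, while a document store understands the fields inside each document and can match on them without a fixed schema.

```python
import json


class KeyValueStore:
    """Hypothetical key-value store: values are opaque bytes found only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


class DocumentStore:
    """Hypothetical document store: schema-free documents with queryable fields."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc: dict) -> None:
        self._docs[doc_id] = dict(doc)

    def find(self, **criteria):
        # Return every document whose fields equal all of the given criteria.
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]


kv = KeyValueStore()
kv.put("user:42", json.dumps({"name": "Ann", "city": "HK"}).encode())
# The store cannot look inside the value; the application decodes it itself.
print(json.loads(kv.get("user:42"))["city"])  # HK

docs = DocumentStore()
docs.insert(1, {"name": "Ann", "city": "HK"})
docs.insert(2, {"name": "Bob", "city": "SZ", "tags": ["admin"]})  # extra field, no schema change
print([d["name"] for d in docs.find(city="HK")])  # ['Ann']
```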

Why not SQL?
- MySQL performance becomes unsatisfactory as data grows, leaving two options [1]:
  - partition data across several sites, but distributed data is hard to manage in the application;
  - abandon MySQL, but pay large licensing fees for an enterprise SQL DBMS.
- MySQL is inflexible [1]: data do not conform to a rigid relational schema.

Why NoSQL?
- Simplicity of design
- Horizontal scaling
- Finer control over availability
- Faster operations on non-relational data (e.g., key-value pairs, graphs, or documents)
- New application needs (e.g., big data and real-time web applications)

Tradeoffs of NoSQL
- Sacrifice consistency
- Lack true ACID transactions
- Lack standardized interfaces
- Use low-level, less expressive query languages

Availability vs Consistency
Ref: [2]
- Many Internet-scale computing platforms today have strict requirements on security, scalability, availability, performance, and cost-effectiveness, while continuously serving millions of customers around the globe.
- Solution: use replication techniques ubiquitously to guarantee consistent performance and high availability.
- Replication makes consistency expensive: updating all replicas synchronously across distributed sites is very costly.
- Tradeoff: high availability or data consistency.

Availability vs Consistency
- The CAP theorem: Consistency, Availability, Partition tolerance.
- A distributed system cannot guarantee all three (C, A, and P) at all times.
- When a network partition occurs, the system cannot provide both C and A.
- Solution: relax C to get A.
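To make the choice concrete, here is a toy sketch of what a replicated register can do when a partition cuts its replicas apart. It assumes two replicas and uses a boolean flag standing in for a real network partition; `Replica` and `ReplicatedRegister` are hypothetical names. A "CP" register refuses writes and stays consistent, while an "AP" register accepts them and lets the replicas diverge.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class ReplicatedRegister:
    """Toy register stored on two replicas.

    mode "CP": under partition, refuse writes (stay consistent, lose availability).
    mode "AP": under partition, accept writes locally (stay available, diverge).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("r1"), Replica("r2")]
        self.partitioned = False  # stands in for a real network partition

    def write(self, via, value):
        """Write through replica `via`; return True if the write was accepted."""
        if not self.partitioned:
            for r in self.replicas:  # synchronous update of every replica
                r.value = value
            return True
        if self.mode == "CP":
            return False             # cannot reach the other replica: unavailable
        via.value = value            # accepted, but the replicas now disagree
        return True

    def consistent(self):
        return len({r.value for r in self.replicas}) == 1


cp = ReplicatedRegister("CP")
cp.partitioned = True
print(cp.write(cp.replicas[0], "x"), cp.consistent())  # False True

ap = ReplicatedRegister("AP")
ap.partitioned = True
print(ap.write(ap.replicas[0], "x"), ap.consistent())  # True False
```

Relaxing C to get A corresponds to the "AP" mode; the diverged replicas then have to be reconciled after the partition heals.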

Availability vs Consistency
- Strong consistency: all replicas are updated synchronously.
- Weak consistency: an updated value may not be reflected immediately, i.e., there is an inconsistency window (a period during which reads may still return the old value).
- Eventual consistency: a form of weak consistency; if no new updates are made, all accesses eventually return the updated value (there is no guaranteed bound on the length of the delay).
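A minimal sketch of the inconsistency window, assuming a toy store in which replica 0 applies each write immediately and the other replicas receive it later through a queue. `EventuallyConsistentStore` and `propagate_one` are hypothetical names. A strongly consistent store would instead update all replicas inside `write()`.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: replica 0 applies writes at once, the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet delivered

    def write(self, key, value):
        self.replicas[0][key] = value  # visible at replica 0 immediately
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # delivered asynchronously

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate_one(self):
        """Deliver the oldest pending update; return False if none is left."""
        if not self.pending:
            return False
        i, key, value = self.pending.popleft()
        self.replicas[i][key] = value
        return True


s = EventuallyConsistentStore()
s.write("x", 1)
print(s.read(0, "x"), s.read(2, "x"))  # 1 None: inside the inconsistency window
while s.propagate_one():               # no new updates arrive...
    pass
print([s.read(i, "x") for i in range(3)])  # [1, 1, 1]: ...so all replicas converge
```

Updates are delivered in FIFO order here, so later writes to a key also land later on every replica; with out-of-order delivery a real store would need a rule such as comparing version numbers.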

Eventual Consistency
Setting in a distributed store:
- Process A writes to and reads from the store.
- Processes B and C are independent of A and also write to and read from the store.
Causal consistency: if A has communicated to B that it has updated a data item, a subsequent access by B returns the updated value, and a write by B is guaranteed to supersede the earlier write. Accesses by C, which has no causal relationship to A, are subject to the normal eventual consistency rules.
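A minimal sketch of the causal guarantee, under simplifying assumptions of our own: a single global counter hands out version numbers (real systems often track causality with vector clocks or similar metadata), replication between the two replicas is triggered by hand, and `VersionedReplica`, `Process`, `tell`, and `StaleReplica` are hypothetical names. B's dependency on A's write is recorded when A tells B about it. From then on B refuses replicas too old to reflect that write, while C, which has no such dependency, may read a stale copy.

```python
import itertools

# Toy global version source (assumption for illustration only).
_next_version = itertools.count(1)


class StaleReplica(Exception):
    """The replica has not yet applied a write this process depends on."""


class VersionedReplica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version of each key.
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)


class Process:
    def __init__(self, name):
        self.name = name
        self.deps = {}  # key -> lowest version this process may observe

    def write(self, replica, key, value):
        version = next(_next_version)  # larger than every earlier version
        replica.apply(key, version, value)
        self.deps[key] = version
        return version

    def tell(self, other, key):
        # "A has communicated to B that A has updated a data item."
        other.deps[key] = max(other.deps.get(key, 0), self.deps.get(key, 0))

    def read(self, replica, key):
        version, value = replica.data.get(key, (0, None))
        if version < self.deps.get(key, 0):
            raise StaleReplica(f"{self.name} needs a newer copy of {key!r}")
        self.deps[key] = version
        return value


r1, r2 = VersionedReplica(), VersionedReplica()
a, b, c = Process("A"), Process("B"), Process("C")
v = a.write(r1, "x", "new")  # r2 has not received this write yet
a.tell(b, "x")               # B now causally depends on A's write
try:
    b.read(r2, "x")
except StaleReplica as e:
    print(e)                 # B needs a newer copy of 'x'
print(b.read(r1, "x"))       # new
print(c.read(r2, "x"))       # None: no causal link, so a stale read is allowed
r2.apply("x", v, "new")      # replication eventually delivers A's write
b.write(r2, "x", "newer")    # higher version, so it supersedes A's write
```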

Eventual Consistency
- Read-your-writes consistency: after A has updated a data item, A always accesses the updated value and never sees an older value. This is a special case of causal consistency.
- Session consistency: a practical version of read-your-writes consistency. A process accesses the store within a session, and the system guarantees read-your-writes consistency for as long as the session lasts.
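A minimal sketch of session consistency, assuming a toy store whose writes land on replica 0 and reach the other replicas only when `sync()` runs. `Store`, `Session`, and the `home` replica parameter are hypothetical names. The session remembers the version of each of its own writes and skips any replica that has not caught up to it; a new session starts with no such memory, so the guarantee ends with the session.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)


class Store:
    """Toy store: each write lands on one replica; sync() propagates later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.version = 0

    def write(self, key, value, replica_index=0):
        self.version += 1
        self.replicas[replica_index].data[key] = (self.version, value)
        return self.version

    def sync(self):
        # Eventual propagation: copy the newest version of every key everywhere.
        newest = {}
        for r in self.replicas:
            for key, (ver, val) in r.data.items():
                if ver > newest.get(key, (0, None))[0]:
                    newest[key] = (ver, val)
        for r in self.replicas:
            r.data.update(newest)


class Session:
    """Read-your-writes consistency that lasts only as long as this object."""

    def __init__(self, store, home=0):
        self.store = store
        self.home = home    # preferred (e.g. nearest) replica for reads
        self.written = {}   # key -> version written during this session

    def write(self, key, value):
        self.written[key] = self.store.write(key, value)

    def read(self, key):
        need = self.written.get(key, 0)
        n = len(self.store.replicas)
        # Try the home replica first, then fall back to the others.
        for i in [self.home] + [j for j in range(n) if j != self.home]:
            version, value = self.store.replicas[i].data.get(key, (0, None))
            if version >= need:
                return value
        raise RuntimeError("no replica has this session's write yet")


store = Store()
s = Session(store, home=2)
s.write("cart", ["book"])                   # lands on replica 0 only
print(s.read("cart"))                       # ['book']: stale replica 2 is skipped
print(Session(store, home=2).read("cart"))  # None: a new session has no guarantee
store.sync()
print(Session(store, home=2).read("cart"))  # ['book'] once replication catches up
```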

Eventual Consistency
- Monotonic read consistency: if a process has seen a particular value of an object, subsequent accesses will never return an earlier value.
- Monotonic write consistency: the system serializes the writes issued by the same process. Systems without this guarantee are very difficult to program.
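A minimal sketch of both guarantees, assuming versioned values and per-process sequence numbers. `MonotonicReader` and `MonotonicWriteReplica` are hypothetical names. The reader remembers the highest version it has returned and rejects any replica older than that. The replica buffers writes that arrive ahead of their turn and applies each process's writes in the order the process issued them.

```python
class MonotonicReader:
    """Client-side monotonic reads: never accept a version older than one seen."""

    def __init__(self):
        self.seen = {}  # key -> highest version returned so far

    def read(self, replica_data, key):
        version, value = replica_data.get(key, (0, None))
        if version < self.seen.get(key, 0):
            raise LookupError(f"replica is older than a version of {key!r} already seen")
        self.seen[key] = version
        return value


class MonotonicWriteReplica:
    """Applies each process's writes in the order that process issued them."""

    def __init__(self):
        self.data = {}
        self.next_seq = {}  # process -> sequence number it must apply next
        self.early = {}     # process -> {seq: (key, value)} that arrived early

    def receive(self, process, seq, key, value):
        pending = self.early.setdefault(process, {})
        pending[seq] = (key, value)
        expected = self.next_seq.get(process, 0)
        while expected in pending:  # apply every write that is now in order
            k, v = pending.pop(expected)
            self.data[k] = v
            expected += 1
        self.next_seq[process] = expected


reader = MonotonicReader()
fresh, stale = {"x": (2, "b")}, {"x": (1, "a")}
print(reader.read(fresh, "x"))  # b
try:
    reader.read(stale, "x")     # would go back in time
except LookupError:
    print("stale replica rejected")

replica = MonotonicWriteReplica()
replica.receive("A", 1, "x", "second")  # arrives first, but is buffered
print(replica.data)                     # {}
replica.receive("A", 0, "x", "first")   # now both apply, in issue order
print(replica.data)                     # {'x': 'second'}
```

Applying the two writes in arrival order instead would leave the older value "first" in place, which is exactly the anomaly that makes such systems hard to program.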

Reasons for Eventual Consistency
- Improve read and write performance under highly concurrent conditions.
- Handle network partitions, where a majority model (e.g., a quorum protocol) would render part of the system unavailable even though its nodes are up and running.
- Whether inconsistencies are acceptable depends on the client application.
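The second point can be illustrated with a toy quorum store. This is a minimal sketch under our own assumptions: N replicas, a read quorum R and a write quorum W with R + W > N (so every read quorum overlaps every write quorum), and version numbers supplied by the caller. `QuorumStore` and `Unavailable` are hypothetical names. The replicas on the minority side of a partition are running, yet they cannot accept a write.

```python
from itertools import combinations


class Unavailable(Exception):
    """Too few replicas are reachable to form the required quorum."""


class QuorumStore:
    """Toy N-replica store using read quorum R and write quorum W."""

    def __init__(self, n, r, w):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.n, self.r, self.w = n, r, w
        self.replicas = [(0, None)] * n  # (version, value) held by each replica

    def write(self, version, value, reachable):
        if len(reachable) < self.w:
            raise Unavailable("cannot reach a write quorum")
        for i in list(reachable)[: self.w]:
            self.replicas[i] = (version, value)

    def read(self, reachable):
        if len(reachable) < self.r:
            raise Unavailable("cannot reach a read quorum")
        replies = [self.replicas[i] for i in list(reachable)[: self.r]]
        return max(replies, key=lambda reply: reply[0])[1]  # newest version wins


store = QuorumStore(n=5, r=3, w=3)
store.write(1, "v1", reachable=[0, 1, 2])
# Every 3-replica read quorum overlaps the write quorum {0, 1, 2}.
print(all(store.read(q) == "v1" for q in combinations(range(5), 3)))  # True

# A partition leaves replicas 3 and 4 running but cut off from the rest.
try:
    store.write(2, "v2", reachable=[3, 4])
except Unavailable as e:
    print(e)  # cannot reach a write quorum
```

An eventually consistent store would instead accept the write on replicas 3 and 4 and reconcile it later.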

What really is NoSQL?
- NoSQL applications focus on update- and lookup-intensive OLTP workloads, not on query-intensive, data-warehousing workloads.
- OLTP performance can be improved by automatic sharding over shared-nothing systems and by raising per-server performance.
- Sharding is a type of database partitioning that separates a very large database into smaller, faster, more easily managed parts called shards. The word shard means a small part of a whole.
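A minimal sketch of automatic sharding, assuming hash partitioning with a simple modulo placement over in-memory dictionaries that stand in for independent, shared-nothing servers. `ShardedStore` and `shard_for` are hypothetical names. Each key is routed to exactly one shard, so a lookup or update touches a single server.

```python
import hashlib


class ShardedStore:
    """Hash partitioning of keys over independent, shared-nothing shards."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash: Python's built-in hash() of a str changes between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(n_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print(store.get("user:7"))             # {'id': 7}, read from a single shard
print([len(s) for s in store.shards])  # roughly even split across 4 shards
```

With modulo placement, changing the number of shards moves most keys to a different shard; systems such as Dynamo use consistent hashing to limit that movement.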

What really is NoSQL?
Per-server OLTP performance has little to do with SQL; it depends instead on:
- Overhead of communicating with the DBMS through ODBC/JDBC (reducible by using stored procedures or by running the DBMS in the same address space as the application)
- Overhead of logging, locking, latching, and buffer management
The performance of NoSQL systems comes from avoiding disk, ACID, and multithreading overheads, none of which is really related to SQL.

NewSQL
- Scalable performance comparable to NoSQL systems (for OLTP workloads), while still offering ACID
- Supports the relational data model and uses SQL as the primary interface
- Examples: H-Store, VoltDB, Google Spanner, Calvin, Schism, NuoDB, Clustrix, SQLFire, MemSQL

Why NewSQL?
- Target applications: web-based applications (e.g., multiplayer games, social networking sites, online gambling networks) and smartphone applications
- They demand high OLTP throughput and real-time analytics => the motivation for NoSQL
- But they also want the expressiveness of SQL and real ACID transactions

Why NewSQL?
These applications are characterized by a large number of transactions that are short-lived (i.e., no user stalls), touch a small subset of data using index lookups (i.e., no full table scans or large distributed joins), and are repetitive (i.e., executing the same queries with different inputs).

Why NewSQL?
These characteristics of the target applications allow NewSQL systems to eschew heavyweight recovery and distributed concurrency control, and thereby achieve the same high throughput and low latency as NoSQL systems.
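The slides do not show how a NewSQL engine avoids heavyweight concurrency control. One well-known approach, used by H-Store (listed among the examples above), is to run short transactions serially on single-threaded partitions. The sketch below is a toy model of that idea under our own assumptions: the partition fits in memory, every transaction touches only one partition, and the undo log is a naive full snapshot. `Partition`, `deposit`, and `transfer` are hypothetical names; this is not the H-Store or VoltDB API.

```python
from collections import deque


class Partition:
    """Single-threaded execution site: one transaction at a time, to completion.

    Because nothing runs concurrently inside a partition, no locks or latches
    are needed (a toy model of the serial-execution idea, not a real engine).
    """

    def __init__(self):
        self.data = {}
        self.queue = deque()

    def submit(self, procedure, *args):
        self.queue.append((procedure, args))

    def run(self):
        results = []
        while self.queue:
            procedure, args = self.queue.popleft()
            undo = dict(self.data)  # toy undo log: a full snapshot
            try:
                results.append(procedure(self.data, *args))
            except ValueError:
                # Abort: restore the snapshot so the transaction has no effect.
                self.data.clear()
                self.data.update(undo)
                results.append(None)
        return results


# Hypothetical "stored procedures": the same logic run with different inputs.
def deposit(data, account, amount):
    data[account] = data.get(account, 0) + amount
    return data[account]


def transfer(data, src, dst, amount):
    data[src] = data.get(src, 0) - amount
    data[dst] = data.get(dst, 0) + amount
    if data[src] < 0:
        raise ValueError("insufficient funds")  # partial updates get rolled back
    return data[src], data[dst]


p = Partition()
p.submit(deposit, "alice", 100)
p.submit(transfer, "alice", "bob", 30)
p.submit(transfer, "alice", "bob", 500)
print(p.run())  # [100, (70, 30), None]
print(p.data)   # {'alice': 70, 'bob': 30}
```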

Characteristics of NewSQL
Ref: [3]
- SQL as the primary mechanism for application interaction
- ACID support for transactions
- A non-locking concurrency control mechanism (so that real-time reads do not conflict with writes and thereby cause them to stall)
- High per-node performance
- A scale-out, shared-nothing architecture
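The slide does not name a particular non-locking scheme. Multiversion concurrency control (MVCC) is one common choice, and the sketch below models it under simplifying assumptions of our own: a single commit counter, no write-write conflict detection, and no garbage collection of old versions. `MVCCStore` is a hypothetical name. A reader works from a snapshot timestamp, so a concurrent writer adds a new version instead of blocking the reader or being blocked by it.

```python
class MVCCStore:
    """Toy multiversion store: writers add versions, readers use snapshots."""

    def __init__(self):
        self.versions = {}  # key -> [(commit_ts, value), ...] in commit order
        self.clock = 0      # timestamp of the latest committed transaction

    def snapshot(self):
        # A reader sees exactly what was committed up to this timestamp.
        return self.clock

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the snapshot; no lock is taken.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit(self, writes):
        # Install all of a transaction's writes under one new timestamp.
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock


db = MVCCStore()
db.commit({"x": 1})
snap = db.snapshot()                # a long-running reader starts here
db.commit({"x": 2})                 # a writer commits without waiting for the reader
print(db.read("x", snap))           # 1: the reader's view is unchanged
print(db.read("x", db.snapshot()))  # 2: new readers see the new version
```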

References
[1] M. Stonebraker. SQL Databases v. NoSQL Databases. Communications of the ACM, 2010.
[2] W. Vogels. Eventually Consistent. ACM Queue, 2009.
[3] M. Stonebraker. New Opportunities for NewSQL. Communications of the ACM, 2012.