CPT-S 580-06 Advanced Databases 11 Yinghui Wu EME 49.

NoSQL: concept
NoSQL refers to non-relational database management systems that differ from traditional RDBMSs in significant ways.
Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
In 2009, Eric Evans reused the term to refer to databases that are non-relational, distributed, and do not conform to ACID.
The NoSQL term should be read as "Not Only SQL", not as "No to SQL" or "Never SQL".

Motives Behind NoSQL
Big data. Scalability. Data format. Manageability.

Scalability
Scale up, vertical scalability:
–Increasing server capacity.
–Adding more CPU, RAM.
–Managing is hard.
–Possible downtime.
Scale out, horizontal scalability:
–Adding servers to an existing system with little effort, a.k.a. elastically scalable.
Bugs, hardware errors: things fail all the time.
It should become cheaper. Cost efficiency:
–Shared nothing.
–Use of commodity/cheap hardware.
–Heterogeneous systems.
–Controlled concurrency (avoid locks).
–Service-oriented architecture. Local states.
Decentralized to reduce bottlenecks. Avoid single points of failure:
–Asynchrony.
–Symmetry: you don't have to know what is happening. All nodes should be symmetric.

NoSQL Distinguishing Characteristics
Large data volumes
–Google's "big data"
Scalable replication and distribution
–Potentially thousands of machines
–Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous inserts & updates
Schema-less
ACID transaction properties are not needed – BASE
CAP Theorem
Open source development

NoSQL Data Models
Key/value pairs, row/tabular, columns, documents, graphs, and correspondingly…

Categories of NoSQL storages
Key-Value
–memcached
–Redis
–Dynamo
Column Family
–Tabular: BigTable, HBase
–Cassandra
Document-oriented
–MongoDB
Graph (beyond NoSQL)
–Neo4j
–TITAN

Key-Value Stores
"Dynamo: Amazon's Highly Available Key-Value Store" (2007)
Data model:
–Global key-value mapping
–Highly fault tolerant (typically)
Examples:
–Riak, Redis, Voldemort
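The sketch below shows how a "global key-value mapping" can be spread over several machines and still tolerate a failed node. It uses consistent hashing with replication to successor nodes, which is the partitioning scheme the Dynamo paper describes. Dynamo also uses virtual nodes, versioning and quorums, and all of those are omitted here. The class names `HashRing` and `ReplicatedKV`, the SHA-256 ring positions, and the `down` parameter are hypothetical illustrations, not any real store's API.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a key or node name to a point on a 2**32 ring (SHA-256 prefix; hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Hypothetical consistent-hashing ring: a key belongs to the first `replicas`
    nodes found walking clockwise from the key's position on the ring."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def preference_list(self, key):
        start = bisect.bisect(self.ring, (ring_position(key), ""))
        owners = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replicas:
                break
        return owners


class ReplicatedKV:
    """Hypothetical global key-value mapping spread over nodes, each key stored `replicas` times."""

    def __init__(self, nodes, replicas=2):
        self.ring = HashRing(nodes, replicas)
        self.data = {n: {} for n in nodes}  # per-node local storage (shared nothing)

    def put(self, key, value):
        for node in self.ring.preference_list(key):
            self.data[node][key] = value

    def get(self, key, down=()):
        """Read from the first replica that is not in `down` (simulated failed nodes)."""
        for node in self.ring.preference_list(key):
            if node not in down and key in self.data[node]:
                return self.data[node][key]
        raise KeyError(key)
```

For example, `ReplicatedKV(["n1", "n2", "n3"])` stores each key on two of the three nodes, so a read still succeeds when the first owner is marked as down. A ring is used instead of `hash(key) % n` because adding a node then only takes over keys from its clockwise neighbour, rather than reshuffling most keys.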

KV-stores and Relational Tables
You can add indices with new KV-tables. Thus KV-tables are used for column-based storage, as opposed to the row-based storage typical of older DBMSs. … Or: the value field can contain complex data.
Index: State → ID
  Alabama 1, Alaska 2, Arizona 3, Arkansas 4, California 5, Colorado 6, …
Data: ID → Population
  1 → 4,822,…; 2 → …,449; 3 → 6,553,255; 4 → 2,949,…; 5 → …,041,430; 6 → 5,187,582; …
Index_2: Senator_1 → ID
  Sessions 1, Begich 2, Boozman 3, Flake 4, Boxer 5, Bennet 6, …
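This slide treats each attribute as its own KV-table keyed by row ID, and each index as yet another KV-table that maps an attribute value back to the ID. A minimal Python sketch of that idea follows. Only the two population values that are fully legible on the slide (IDs 3 and 6) are used. The helper names `build_index` and `population_of` are hypothetical.

```python
# Column-style KV-tables: each one maps the row ID to a single attribute.
state_by_id = {1: "Alabama", 2: "Alaska", 3: "Arizona", 4: "Arkansas", 5: "California", 6: "Colorado"}
senator_1_by_id = {1: "Sessions", 2: "Begich", 3: "Boozman", 4: "Flake", 5: "Boxer", 6: "Bennet"}
population_by_id = {3: 6_553_255, 6: 5_187_582}  # only the fully legible values


def build_index(column):
    """An index is just another KV-table: attribute value -> row ID."""
    return {value: row_id for row_id, value in column.items()}


state_index = build_index(state_by_id)        # "Index" on the slide
senator_index = build_index(senator_1_by_id)  # "Index_2" on the slide


def population_of(index, key):
    """Two KV lookups: index key -> ID, then ID -> population."""
    return population_by_id[index[key]]


# Alternative layout: the value field holds a complex (nested) record.
states = {3: {"state": "Arizona", "population": 6_553_255}}
```

`population_of(state_index, "Arizona")` performs two point lookups and never touches the other columns. The nested `states` variant trades that column separation for a single lookup per record.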

Column Family (BigTable)
Google's "Bigtable: A Distributed Storage System for Structured Data" (2006)
Data model:
–A big table, with column families
–Map-reduce for querying/processing
Examples:
–HBase, Hypertable, Cassandra, Accumulo
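The Bigtable paper describes the table as a sparse, sorted map indexed by row key, column key ("family:qualifier") and timestamp. The in-memory sketch below mimics that model. Column families are declared up front, qualifiers can be added freely, each cell keeps timestamped versions, and rows can be scanned in key order. The class `ColumnFamilyTable` is hypothetical and is not the HBase or Cassandra API. The row and column names follow the paper's Webtable example.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Hypothetical sparse map: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed; qualifiers are not
        self.rows = defaultdict(dict)  # row key -> {column: {timestamp: value}}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Newest version of a cell, or None if the cell is absent (the map is sparse)."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield (row key, columns) for start <= row key < stop, in sorted row-key order."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]


t = ColumnFamilyTable(["contents", "anchor"])
t.put("com.cnn.www", "contents:", "<html>v1", 1)
t.put("com.cnn.www", "contents:", "<html>v2", 2)
t.put("com.cnn.www", "anchor:cnnsi.com", "CNN", 1)
```

A real system keeps rows sorted on disk and splits them into ranges across servers. Here `sorted()` on every scan stands in for that.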

Row Store and Column Store
In a row store, data are stored on disk tuple by tuple, whereas in a column store data are stored on disk column by column.
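To make the two disk orders concrete, the sketch below serializes the same small table both ways. The three-row table is hypothetical illustrative data. In the column layout, all values of one column are adjacent, so an aggregate over that column reads one contiguous slice.

```python
# Hypothetical table with columns (id, name, dept).
table = [
    (1, "Ann", 10),
    (2, "Bob", 20),
    (3, "Cy", 20),
]


def row_layout(rows):
    """Tuple by tuple: all fields of row 1, then all fields of row 2, ..."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column by column: all ids, then all names, then all depts."""
    return [value for column in zip(*rows) for value in column]


def read_column(layout, n_rows, j):
    """In the column layout, column j occupies one contiguous slice."""
    return layout[j * n_rows:(j + 1) * n_rows]


print(row_layout(table))     # [1, 'Ann', 10, 2, 'Bob', 20, 3, 'Cy', 20]
print(column_layout(table))  # [1, 2, 3, 'Ann', 'Bob', 'Cy', 10, 20, 20]
```

`sum(read_column(column_layout(table), 3, 2))` adds up the `dept` column without reading any names. In the row layout, the same query has to skip over every other field.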

Document Databases
Data model:
–Collections of documents
–A document is a key-value collection
–Index-centric, lots of map-reduce
Examples:
–CouchDB, MongoDB

MongoDB: Hierarchical Objects
A MongoDB instance may have zero or more 'databases'.
A database may have zero or more 'collections'.
A collection may have zero or more 'documents'.
A document may have one or more 'fields'.
MongoDB 'indexes' function much like their RDBMS counterparts.
[Figure: hierarchy Databases → Collections → Documents → Fields]

RDB Concepts to NoSQL
RDBMS          MongoDB
Database       Database
Table, View    Collection
Row            Document (BSON)
Column         Field
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
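The "Join → Embedded Document" and "Foreign Key → Reference" rows of this mapping are the least obvious, so here is a plain-Python sketch of the three options. Python dicts stand in for rows and documents. The department name "Research" and all field names are hypothetical.

```python
# Relational style: two tables linked by a foreign key (dept_id).
departments = [{"dept_id": 20, "name": "Research"}]  # hypothetical data
employees = [{"emp_id": 1, "name": "Hassan Mir", "dept_id": 20}]


def join(emps, depts):
    """Hash join on the foreign key: attach the department name to each employee row."""
    by_id = {d["dept_id"]: d for d in depts}
    return [{**e, "dept_name": by_id[e["dept_id"]]["name"]} for e in emps]


# Document style (embedded): the department lives inside the employee document,
# so reading it needs no join.
employee_embedded = {"_id": 1, "name": "Hassan Mir",
                     "department": {"id": 20, "name": "Research"}}

# Document style (reference): only the key is stored, like a foreign key;
# the application resolves it with a second lookup.
employee_ref = {"_id": 1, "name": "Hassan Mir", "department_id": 20}
departments_by_id = {d["dept_id"]: d for d in departments}
```

Embedding makes the common read a single lookup, but the department data is copied into every employee that uses it. References avoid that duplication at the cost of an extra lookup.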

BSON Example
{ "_id" : "37010",
  "city" : "ADAMS",
  "pop" : 2660,
  "state" : "TN",
  "councilman" : { "name" : "John Smith", "address" : "13 Scenic Way" } }
{ "_id" : "1", "first name" : "Hassan", "last name" : "Mir", "department" : 20 }
{ "_id" : "1", "first name" : "Bill", "last name" : "Gates" }
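MongoDB lets queries reach into embedded documents with dot notation. In PyMongo, for instance, that looks like `collection.find({"councilman.name": "John Smith"})`. The sketch below imitates only exact-match dot-notation filters over the documents above, written as Python dicts. It also shows that documents in one list need not share fields. `get_path` and `find` are hypothetical helpers, not the PyMongo API.

```python
def get_path(doc, path):
    """Resolve a dotted path such as "councilman.name" through embedded documents."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # missing field
        current = current[part]
    return current


def find(documents, query):
    """Documents whose every queried path equals the given value (exact match only)."""
    return [d for d in documents
            if all(get_path(d, path) == value for path, value in query.items())]


documents = [
    {"_id": "37010", "city": "ADAMS", "pop": 2660, "state": "TN",
     "councilman": {"name": "John Smith", "address": "13 Scenic Way"}},
    {"_id": "1", "first name": "Hassan", "last name": "Mir", "department": 20},
    {"_id": "1", "first name": "Bill", "last name": "Gates"},  # no "department" field
]
```

`find(documents, {"department": 20})` returns only the Hassan Mir document. The Bill Gates document simply lacks that field, which is allowed in a schema-less model.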

Graph Databases
Data model:
–Nodes with properties
–Named relationships with properties
–Hypergraph, sometimes
Examples:
–Neo4j, Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
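A minimal property-graph sketch follows. Nodes carry property dicts, relationships are named and carry their own properties, and a query is a traversal along relationships rather than a join. The class `PropertyGraph`, the `KNOWS` relationship and the sample people are hypothetical. Graph databases such as Neo4j expose this model through their own query languages, which are not shown here.

```python
from collections import deque


class PropertyGraph:
    """Hypothetical in-memory property graph."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.edges = []  # (source, relationship name, target, property dict)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel, dst, **props):
        self.edges.append((src, rel, dst, props))

    def neighbors(self, node_id, rel):
        return [dst for src, r, dst, _ in self.edges if src == node_id and r == rel]

    def reachable(self, start, rel, max_hops):
        """Breadth-first traversal along `rel` edges, at most max_hops away from start."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_hops:
                continue
            for nxt in self.neighbors(node, rel):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return {n for n, d in depth.items() if d > 0}


g = PropertyGraph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="Person")
g.add_edge("alice", "KNOWS", "bob", since=2010)
g.add_edge("bob", "KNOWS", "carol", since=2015)
g.add_edge("carol", "KNOWS", "dave", since=2020)
```

`g.reachable("alice", "KNOWS", 2)` is a friends-of-friends query and yields `{"bob", "carol"}`. In a relational schema, the same question needs one self-join per hop.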

XML Databases
One of the oldest "NoSQL" databases.

Complexity
[Figure: data size vs. complexity across store types, annotated "90% of use cases" and "still billions of nodes & relationships"]

CAP Theory

CAP Theorem
Also known as Brewer's Theorem, after Prof. Eric Brewer of UC Berkeley, who presented it in 2000.
[Photo: Eric Brewer, 2001]

Theory of NoSQL: CAP
GIVEN: many nodes; nodes contain replicas of partitions of the data.
Consistency: all replicas contain the same version of the data; a client always has the same view of the data (no matter which node it contacts).
Availability: the system remains operational on failing nodes; all clients can always read and write.
Partition tolerance: multiple entry points; the system remains operational on a system split (communication malfunction); the system works well across physical network partitions.
CAP Theorem: satisfying all three at the same time is impossible.
[Figure: triangle with vertices C, A, P]

CAP theorem for NoSQL
What the CAP theorem really says: if you cannot limit the number of faults, and requests can be directed to any server, and you insist on serving every request you receive, then you cannot possibly be consistent.
How it is interpreted: you must always give something up: consistency, availability, or tolerance to failure and reconfiguration.
"Of three properties of a shared data system: data consistency, system availability and tolerance to network partitions, only two can be achieved at any given moment."
Proven by Nancy Lynch et al. at MIT (Gilbert and Lynch, 2002).

Proof: a trivial two-node system
[Figure: nodes A and B each hold a replica of the Data; an App reads and writes through them]

A Simple Proof
[Figure: A and B partitioned; one replica holds the new Data, the other still holds Old Data]
Available and partitioned: not consistent, we get back old data.

A Simple Proof
[Figure: A and B partitioned; the App waits for the New Data]
Consistent and partitioned: not available, waiting…

A Simple Proof
[Figure: A and B connected; both replicas hold the same Data]
Consistent and available: no partition.
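The three proof slides can be replayed as a toy simulation. A write goes to node A. When the link to B is cut, a read at B must either answer with the old value (available, not consistent) or refuse and wait (consistent, not available). The class `TwoNodeSystem` and its `mode` switch are hypothetical. Real systems make this choice through quorums, leases and timeouts rather than a flag.

```python
class TwoNodeSystem:
    """Hypothetical replicated register on nodes A and B. Writes land on A and are
    copied to B only while the network link between them is up."""

    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.value_a = "v0"
        self.value_b = "v0"
        self.partitioned = False

    def write(self, value):
        self.value_a = value
        if not self.partitioned:
            self.value_b = value  # replication message gets through

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            # B cannot confirm it is up to date, so it refuses rather than risk stale data.
            raise TimeoutError("partitioned: waiting for new data")
        # AP (or no partition): always answer, possibly with old data.
        return self.value_b


ap = TwoNodeSystem("AP")
ap.partitioned = True
ap.write("v1")
print(ap.read_from_b())  # v0 -> available, but not consistent
```

With `partitioned = False`, both modes return the new value: consistent and available, because there is no partition.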

Where would SQL lie on this triangle?
[Figure: the C/A/P triangle with "SQL RDBMS" placed on it]

Consistent, Available (CA) systems have trouble with partitions and typically deal with them through replication.
Available, Partition-Tolerant (AP) systems achieve "eventual consistency" through replication and verification.
Consistent, Partition-Tolerant (CP) systems have trouble with availability while keeping data consistent across partitioned nodes.
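One common way for AP systems to reach "eventual consistency" is last-writer-wins replication with periodic anti-entropy: replicas exchange their state and keep the newer version of each key. The sketch below shows that technique. It is only one of several possible choices (vector clocks and CRDTs are others). `LWWReplica` is hypothetical, and timestamps are assumed to be unique, for example `(clock, replica_id)` pairs, so that ties cannot make replicas diverge.

```python
class LWWReplica:
    """Hypothetical last-writer-wins replica: key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        # Timestamps are assumed unique and totally ordered, e.g. (clock, replica_id).
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: pull every entry from `other`, keeping the newer version."""
        for key, (ts, value) in other.store.items():
            self.write(key, value, ts)


r1, r2 = LWWReplica(), LWWReplica()
r1.write("x", "a", (1, "r1"))  # concurrent writes on two sides of a partition
r2.write("x", "b", (2, "r2"))
```

Before the replicas exchange state, they disagree: `r1` reads `"a"` and `r2` reads `"b"`. After `r1.merge_from(r2)` and `r2.merge_from(r1)`, both read `"b"`. The older write is silently dropped, which is the price of this simple rule.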

ACID vs BASE

Database Attributes
Traditional databases require four transaction properties:
Atomicity: when an update happens, it is "all or nothing".
Consistency: the state of the various tables must be consistent (relations, constraints) at all times.
Isolation: concurrent execution of transactions produces the same result as if they had occurred sequentially.
Durability: once committed, the results of a transaction persist despite problems such as power failure.
Big picture: "Principles of Transaction Processing" by P. Bernstein and E. Newcomer: rs/01~Front_Matter.pdf
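Atomicity is the easiest ACID property to demonstrate with Python's built-in `sqlite3` module. Used as a context manager, a connection commits the enclosed statements on success and rolls them back if an exception escapes. In the sketch below, the `accounts` table, the `transfer` and `balances` helpers, and the overdraft rule are hypothetical illustrations.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts: both updates commit, or neither does."""
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts").fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
```

After `transfer(conn, "alice", "bob", 30)`, the balances are `{"alice": 70, "bob": 30}`. A later attempt to transfer 500 raises `ValueError`, and the partial debit is rolled back, so the balances stay the same.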

BASE Transactions
Acronym contrived to be the opposite of ACID:
–Basically Available,
–Soft state,
–Eventually Consistent
Characteristics:
–Weak consistency: stale data OK
–Availability first
–Best effort
–Approximate answers OK
–Aggressive (optimistic)
–Simpler and faster

RDB ACID to NoSQL BASE
Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id= )
ACID: Atomicity, Consistency, Isolation, Durability.
  Data constraints: smaller, horizontal scalable, schema-driven, normalized, relational, pre-social network.
BASE: Basically Available; Soft state (the state of the system may change over time); Eventually consistent (asynchronous propagation).
  Unstructured data: big data, non-relational, schema-less, distributed, open-linked data.
Systems along the spectrum from ACID to BASE: RDBMS (MySQL), Vertica, BigTable, HBase, MongoDB, Cassandra, Dynamo, CouchDB.

A Clash of Cultures
ACID: strong consistency. Less availability. Pessimistic concurrency. Complex.
BASE: availability is the most important thing; willing to sacrifice for it (CAP). Weaker (eventual) consistency. Best effort. Simple and fast. Optimistic.
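The "pessimistic vs. optimistic" contrast can be made concrete with version numbers. A pessimistic system locks a record before reading it. An optimistic one reads freely and lets a write succeed only if the version it read is still current (compare-and-set), retrying otherwise. The sketch below is single-threaded for clarity, and in a real store the check-and-swap step itself would have to be atomic. `VersionedStore` and `increment` are hypothetical.

```python
class VersionedStore:
    """Hypothetical optimistic store: key -> (value, version)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, (None, 0))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if nobody else has written since `expected_version` was read."""
        _, version = self._data.get(key, (None, 0))
        if version != expected_version:
            return False  # conflict: caller must re-read and retry
        self._data[key] = (new_value, version + 1)
        return True


def increment(store, key):
    """Optimistic read-modify-write loop: no lock is held while computing."""
    while True:
        value, version = store.read(key)
        if store.compare_and_set(key, version, (value or 0) + 1):
            return


s = VersionedStore()
_, seen = s.read("counter")           # two clients read version 0
assert s.compare_and_set("counter", seen, 1)
stale_ok = s.compare_and_set("counter", seen, 99)  # second client is now stale
```

The second writer's `compare_and_set` returns `False` instead of overwriting the first write. Optimistic control pays off when conflicts are rare. When they are frequent, retries pile up and locking becomes the better fit.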