Copyright © 2011-2013 Curt Hill NoSQL Databases No SQL or Not Only SQL.

Historically …
Typical relational databases live in a pleasant niche
  – Their data is relatively small and usually on one machine
  – The meaning of the data is well understood
  – Schemas are tightly defined
  – Transactional consistency (ACID) is maintained
  – Results of transactions are accurate

Things can be different
Extremely large amounts of data
Data is spread over many machines, possibly geographically distant
Change to this data is continuous
Data quality may be poor, obtained from many sources
Schemas are fuzzy and uncertain
  – Or completely lacking

Relaxing principles
Classic database principles have been left behind
Locking is usually absent
Schemas are often inconsistent or lacking
Data comes from many sources
  – How does this get integrated into a rigid schema?
Accuracy of the data is not guaranteed
  – By the time we update it, it has already changed

People and Business
A normal relational database gives us accurate results
  – The only limitation is the accuracy of the data itself
People are used to making decisions without all the facts
Businesses often make decisions without all the facts or complete analysis
  – Otherwise the window of opportunity will have passed

CAP Theorem
A distributed database or web service cannot guarantee all three of the following at the same time:
Consistency
  – Every read sees the most recent write, as if there were a single copy of the data
Availability
  – Every request to a working node eventually receives a response
Partition tolerance
  – The system keeps operating even when messages between nodes are lost or delayed

ACID absent
ACID, in particular, is in danger
The goal of a transaction is to make it look as if it occurs by itself, without regard to other transactions
When multiple computers are communicating and each has its own data, this is in danger
Locking and unlocking is a problem
  – Things are changing too fast to let one transaction lock data
  – Without locking, serializability is in danger

Now and Then
Suppose a transaction is made
One computer messages all the others
By the time that message arrives it reflects a past state
By the time it is processed that state may have changed
Virtually everything on the Internet represents a past state, not the current one

Now and Then Again
A single computer may think of its data as current
It must treat all messages from other computers as describing the past
Absolute consistency cannot be obtained
Eventual consistency is now the norm
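
The stale-then-converging behaviour described on the last two slides can be sketched in a few lines of Python. This is only an illustration, assuming a last-write-wins rule based on a logical timestamp; the `Replica` class and its methods are hypothetical names and not part of any product mentioned in these slides.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica holds.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge(self, other):
        # Simulates the arrival of messages describing the other replica's state.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")
a.write("status", "offline", timestamp=1)
b.merge(a)                       # b learns the old state
a.write("status", "online", timestamp=2)

stale = b.read("status")         # b still reflects a past state
b.merge(a)                       # the update message finally arrives
converged = b.read("status")     # both replicas now agree
```

Between the second write and the second merge, replica `b` answers with the past state; after the merge the two replicas agree, which is all that eventual consistency promises.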

BASE not ACID
BASE is an alternative to ACID
Basically Available, Soft state, Eventually consistent
  – Clearly contrived to complement ACID
This acknowledges that when the data becomes too widely distributed something has to give

Not the only relaxation of requirements
NoSQL databases usually abandon the whole relational format
They may also include the relational database as a subset of the entire database
The most common form is the data store
  – AKA key-value store

NoSQL Databases
Must provide APIs to various programming languages
Must scale well to very large sizes
Indexing is the key to rapid access
These NoSQL databases are targeted at different niches
Generally not interchangeable
  – Unlike most RDBMSs

Kinds of NoSQL
Key Value
Columnar
Document
Graph

Key Value
Simplest model
There is a key (which must be unique) linked to a group of values
It gets more interesting if the values may include key-value pairs as well
Often not much of a schema
Think of a database with one table
  – Unlimited string as key
  – Unlimited string as second field
Two examples: Riak and Redis

Key-Value stores
A relational table is a restricted form of key-value
  – The key is the primary key
  – The data is all the fields associated with that key
  – However, a key-value store may not even be in First Normal Form
There is only one table
  – Key is an unrestricted-size string
  – Data is whatever needs to be there
  – The values may be completely different
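
The "one table, string key, arbitrary value" idea maps naturally onto a dictionary. The sketch below uses an in-memory Python `dict` as a stand-in for a key-value store; the key naming scheme and the `lookup` helper are illustrative assumptions only.

```python
# The single "table": an unrestricted string key mapped to whatever data is needed.
store = {}

store["user:42"] = {"name": "Ada", "langs": ["en", "fr"]}  # nested key-value pairs
store["page:/home"] = "<html>...</html>"                   # a plain string
store["counter:visits"] = 1017                             # a number


def lookup(key, default=None):
    """Fetch by key; the store knows nothing about what the value means."""
    return store.get(key, default)
```

Note that the three values share no structure at all, which is exactly what a relational column would forbid.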

KV Picture
[Figure: key-value store illustration]

Key Value Again
In a relational database we always know what the value extracted from a cell is
It has the same meaning as everything else in the column
This is no longer the case in key-value stores

Columnar
Also known as a column store
A lot of similarity to relational, but the dominant item is the column, not the row
We lack the rectangularity that relational has
Columns are stored together
Halfway between Relational and Key Value
HBase, Cassandra, Hypertable, Calpont, MonetDB are examples

Columnar
[Figure: column store illustration]

Columnar again
Often used in data warehouses
Columns are stored together (rather than rows) and each column has only one data type
  – This gives an opportunity to compress a column that row-oriented relational DBs lack
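
To see why storing a column together helps compression, the sketch below pivots a few rows into columns and run-length encodes one of them. Run-length encoding is just one possible scheme, chosen here for brevity; the sample data and function names are made up for illustration.

```python
from itertools import groupby

rows = [
    ("alice", "US", 30),
    ("bob", "US", 25),
    ("carol", "US", 41),
    ("dave", "DE", 37),
    ("erin", "DE", 29),
]

# Column-oriented layout: one list per column instead of one tuple per row.
names, countries, ages = (list(column) for column in zip(*rows))


def run_length_encode(column):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


encoded = run_length_encode(countries)
```

Here the five-entry `countries` column shrinks to two pairs. In a row layout the same values are interleaved with names and ages, so no such runs exist.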

Document
The basic object is now a document instead of a simple field like a number
  – Document is often XML or JSON
Each document has an ID and other identifying values
A document is an arbitrary and complicated item
  – As if every field were a BLOB
Examples: MongoDB, CouchDB, Oracle NoSQL, Amazon’s SimpleDB
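
A rough feel for the document model: each document carries an ID, and two documents in the same collection need not share any other fields. The sketch below is an in-memory stand-in, not the API of any product listed above; the `_id` field name and the `insert` and `find` helpers are assumptions made for the example.

```python
import json

documents = {}  # document ID -> document


def insert(doc):
    """Store a document under its own ID."""
    documents[doc["_id"]] = doc


def find(**criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [
        doc for doc in documents.values()
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


insert({"_id": "p1", "type": "person", "name": "Ada",
        "emails": ["ada@example.org"]})
insert({"_id": "b7", "type": "book", "title": "NoSQL Notes",
        "chapters": [{"n": 1, "title": "Intro"}]})

books = find(type="book")
as_json = json.dumps(documents["p1"])  # documents serialize directly to JSON
```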

Graph
A mathematical graph consists of nodes (the data) and links between these
  – This is the network model revisited
Used for highly interconnected data
Processing rides the links
Neo4j and Zope are examples
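
"Processing rides the links" can be made concrete with a breadth-first search over an adjacency list. This is a generic graph sketch with made-up names, not how any particular graph database stores or queries its data.

```python
from collections import deque

# Each node maps to the nodes it is linked to.
links = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}


def hops(start, goal):
    """Breadth-first search: number of links on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None
```

For example, `hops("ann", "eve")` follows ann, bob, dan, eve and returns 3. The same question asked of relational tables would need a self-join per hop.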

Commentary
These classifications are incomplete
Many examples exist that are combinations of several
We next look at some example databases
  – Most of these are open source

Riak
Key-value store designed to be distributed over many nodes
Designed to be fault-tolerant
  – Peer-to-peer architecture: no master
  – All the data is scattered over many servers and disks
  – The failure of one or more nodes does not compromise the data
Everything is done through web queries
Used by a quarter of the Fortune 50
Includes Best Buy, GitHub, Comcast
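
The idea of scattering data over many nodes so that a failure does not lose it can be sketched as hash-based placement with replication. This is a simplified illustration, not Riak's actual placement algorithm: the node names, the replication factor of 3, and the modulo placement rule are all assumptions.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3", "node4"]
REPLICAS = 3  # assumed replication factor

storage = {node: {} for node in NODES}


def owners(key):
    """Pick REPLICAS consecutive nodes, starting at a position derived from the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICAS)]


def replicated_put(key, value):
    """Write the value to every node that owns the key."""
    for node in owners(key):
        storage[node][key] = value


def replicated_get(key, down=()):
    """Read from the first owning node that is not marked as failed."""
    for node in owners(key):
        if node not in down:
            return storage[node].get(key)
    return None


replicated_put("cart:17", ["book", "pen"])
primary = owners("cart:17")[0]
survived = replicated_get("cart:17", down={primary})  # still readable
```

Because every node runs the same `owners` calculation, no master is needed to decide where a key lives.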

Redis
Key-value store, optimized for speed
Creator is Salvatore Sanfilippo, who calls it a data structure server
  – Data can be more than a string or number linked to a key
May also treat data as a sorted or unsorted set of strings
  – This enables set operations on keys
Keeps data in memory and occasionally updates disk
  – No ACID guarantees in that
Used by Craigslist, Flickr

MongoDB
Designed to be a very scalable document-model database
  – Used by CERN for Large Hadron Collider data
Data is formatted as JavaScript objects
  – JavaScript Object Notation (JSON)
Attributes are indexed
Queries now become JavaScript functions
APIs in the major languages
Who is Mongo?

JSON
A lightweight data interchange format
Derived from JavaScript but widely used outside of JavaScript
Most languages have a library routine to parse and generate JSON
A short JSON presentation
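
As a small example of parsing and generating JSON from another language, the sketch below uses Python's standard `json` module. The record itself is invented for the example.

```python
import json

text = '{"name": "Ada", "tags": ["math", "engines"], "born": 1815, "active": false}'

record = json.loads(text)          # JSON text -> dict, list, int, bool, ...
record["tags"].append("poetry")    # work with it as ordinary Python data

round_trip = json.dumps(record, sort_keys=True)  # back to JSON text
```

JSON's `false` becomes Python's `False` on the way in and is written back as `false` on the way out.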

MongoDB and ACID
Atomicity: yes
Consistency: no schema, so no consistency or inconsistency
Isolation: good, but not perfect
Durability: yes

Terms
RDBMS        MongoDB
Table        Collection
Row          JSON Document
Index        Index
Join         Embedding and linking
Partition    Shard
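
The "Join" row is the least obvious mapping, so here is a sketch of the two alternatives using plain Python dictionaries as stand-ins for documents. The field names (`_id`, `customer_id`) and the `resolve` helper are illustrative assumptions, not MongoDB API calls.

```python
# Linking: the order stores only a reference, much as a foreign key would.
customers = {"c1": {"name": "Ada", "city": "London"}}
order_linked = {"_id": "o1", "customer_id": "c1", "items": ["pen", "ink"]}

# Embedding: the customer data is copied into the order document itself.
order_embedded = {
    "_id": "o1",
    "customer": {"name": "Ada", "city": "London"},
    "items": ["pen", "ink"],
}


def resolve(order):
    """Application-side 'join': follow the link and build the embedded form."""
    resolved = {k: v for k, v in order.items() if k != "customer_id"}
    resolved["customer"] = dict(customers[order["customer_id"]])
    return resolved
```

Embedding makes a single read sufficient but duplicates the customer data in every order; linking avoids the duplication at the cost of a second lookup done by the application.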

CouchDB
Document based with JSON content
Each document has a set of keys that link to it
Written in Erlang, but with a JavaScript API
  – Other languages interface to that
Very fault tolerant
Used by LinkedIn, Orbitz

HBase
A columnar database
Very scalable: designed for big data
Each field is versioned, making it 3D rather than 2D
  – Columns are stored together
  – Rows are the related data
  – Depth is the older versions
Used by Facebook, Twitter, Yahoo, eBay
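
The "3D" picture (row, column, version) can be modelled as a dictionary keyed by row and column whose values are timestamped version lists. This is a toy in-memory model with hypothetical function names, not the HBase client API.

```python
from collections import defaultdict

# (row key, column name) -> list of (timestamp, value), oldest first
cells = defaultdict(list)


def put_cell(row, column, value, timestamp):
    """Add a new version instead of overwriting the old one."""
    versions = cells[(row, column)]
    versions.append((timestamp, value))
    versions.sort(key=lambda version: version[0])


def get_cell(row, column, as_of=None):
    """Return the newest value, or the newest one at or before `as_of`."""
    versions = cells.get((row, column), [])
    if as_of is not None:
        versions = [v for v in versions if v[0] <= as_of]
    return versions[-1][1] if versions else None


put_cell("user42", "info:city", "Paris", timestamp=1)
put_cell("user42", "info:city", "Rome", timestamp=5)

latest = get_cell("user42", "info:city")            # newest version
earlier = get_cell("user42", "info:city", as_of=3)  # reaching into the depth
```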

Cassandra
Project started at Facebook, originally to power inbox search
Became an Apache project
Intended to create a network of equal nodes
Eventual consistency, not perfect consistency
Mostly written in Java but provides APIs in Python, Ruby, PHP among others
Used by IBM, HP, Netflix among others

Neo4j
Graph database
  – Network of nodes and links
Each node holds information on a person or thing
Links are the connections between one datum and another
Numerous graph algorithms have been implemented
  – Consider Facebook connections
Used by Adobe, Lufthansa, Mozilla

CAP
Several of these are distributed
Since they cannot do all three, they are generally good at two of the three
See the following picture

CAP
[Figure: Consistency, Availability, and Partition tolerance, with Riak, MongoDB, HBase, and CouchDB placed among them]

Niches
For a product to be successful it must find one or more niches where it may do well
A niche is a particular set of circumstances and requirements
Next we want to consider some of these products and what they do well and what they do poorly

Relational
Layout and form of the data is well known in advance and relatively stable
  – We do not need to know in advance what will be done with the data, but we do need to know the form
  – Most business processes have this kind of requirement
Not as effective for deeply hierarchical and widely varying data

Key Value
Easy to make fast or horizontally scalable or both
Useful where data does not conform to a well-known schema or the data is not very well related
Searches by key are easy but more complicated queries are not
  – No indices
  – No linkages, i.e. foreign keys
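
The gap between an easy search and a complicated query shows up even in a tiny in-memory stand-in. In the sketch below (sample data and function names invented for the example), a lookup by key is a single step, while a question about the values has to inspect every entry because no secondary index exists.

```python
users = {
    "u1": {"name": "Ada", "city": "London"},
    "u2": {"name": "Linus", "city": "Helsinki"},
    "u3": {"name": "Grace", "city": "London"},
}


def by_key(key):
    """Direct lookup: the one operation a key-value store is built for."""
    return users.get(key)


def by_city(city):
    """No index on the values, so every entry must be scanned."""
    return sorted(key for key, value in users.items()
                  if value.get("city") == city)
```

`by_city` costs time proportional to the size of the whole store, which is why such queries are usually avoided or served from a separately maintained index.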

Columnar
Horizontal scalability is based on storing columns in different nodes
  – Thus good for big data
Allows for versioning
Like relational, schema needs to be done in advance
  – Based on what queries are needed
  – Does poorly with ad hoc data and queries

Document
Works well with data that is highly variable and not known in advance
Content is often JSON, so these resemble object-oriented databases
Normalization is not enforced, so redundancies are mostly unavoidable
Many of the more interesting queries, such as joins across documents, are hard or unsupported

Graph
Particularly useful for modeling networks
For social networking applications
  – Nodes are people and edges their relationships
  – Hard to model this in other models
Not easy to partition, so not easy to scale
No common query language

Déjà vu?
In the early 1970s the database world was in some disarray
There were several models
None had achieved dominance
Commercial offerings were present, but a theoretical foundation was lacking
There was no uniformity to these products
Interchanging products was very difficult

The End or Start of an Era
Codd changed that by developing a theoretical foundation for relational databases
SQL became the common language
For several decades now relational databases have been the undisputed king
RDBMS is a 32 billion dollar industry
The products are to some degree interchangeable

Again
The situation around NoSQL databases has a lot of the same feel as the 1970s
They are not interchangeable and not even directed towards the same ends
Is this the end of the RDBMS era?
Unlikely that we will soon get rid of RDBMS, but it is not likely to be as exclusive as it has been

Finally
Some of the motivations of the NoSQL movement are:
  – Big Data
  – Requirements to be distributed
  – Volatility of data, largely caused by the web
Check out the following link
  – DB-Engines.com rates popularity of databases