1 Ahmed K. Ezzat, Tradeoffs Between SQL and NoSQL Data Mining and Big Data.

Slides:



Advertisements
Similar presentations
Data Modeling and Database Design Chapter 1: Database Systems: Architecture and Components.
Advertisements

Chapter 13 (Web): Distributed Databases
NoSQL Databases: MongoDB vs Cassandra
Introduction to Structured Query Language (SQL)
Overview Distributed vs. decentralized Why distributed databases
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
NoSQL Database.
A Comparison of SQL and NoSQL Databases
Working with SQL and PL/SQL/ Session 1 / 1 of 27 SQL Server Architecture.
NoSQL W2013 CSCI 2141.
Introduction To Databases IDIA 618 Fall 2014 Bridget M. Blodgett.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Databases with Scalable capabilities Presented by Mike Trischetta.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
CSC2012 Database Technology & CSC2513 Database Systems.
Database Technical Session By: Prof. Adarsh Patel.
Database System Concepts and Architecture Lecture # 2 21 June 2012 National University of Computer and Emerging Sciences.
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Panagiotis Antonopoulos Microsoft Corp Ioannis Konstantinou National Technical University of Athens Dimitrios Tsoumakos.
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
© , OrangeScape Technologies Limited. Confidential 1 Write Once. Cloud Anywhere. Building Highly Scalable Web applications BASE gives way to ACID.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Introduction to Hadoop and HDFS
Modern Databases NoSQL and NewSQL Willem Visser RW334.
DBSQL 14-1 Copyright © Genetic Computer School 2009 Chapter 14 Microsoft SQL Server.
NoSQL Databases NoSQL Concepts SoftUni Team Technical Trainers Software University
NoSQL Databases Oracle - Berkeley DB Rasanjalee DM Smriti J CSC 8711 Instructor: Dr. Raj Sunderraman.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
Lecture2: Database Environment Prepared by L. Nouf Almujally 1 Ref. Chapter2 Lecture2.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
IT System Administration Lesson 3 Dr Jeffrey A Robinson.
Session 1 Module 1: Introduction to Data Integrity
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NOSQL DATABASE Not Only SQL DATABASE
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT IT Monitoring WG Technology for Storage/Analysis 28 November 2011.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
Chapter 18 Object Database Management Systems. Outline Motivation for object database management Object-oriented principles Architectures for object database.
Introduction to Core Database Concepts Getting started with Databases and Structure Query Language (SQL)
Context Aware RBAC Model For Wearable Devices And NoSQL Databases Amit Bansal Siddharth Pathak Vijendra Rana Vishal Shah Guided By: Dr. Csilla Farkas Associate.
CS422 Principles of Database Systems Introduction to NoSQL Chengyu Sun California State University, Los Angeles.
SQL Basics Review Reviewing what we’ve learned so far…….
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
ISC321 Database Systems I Chapter 2: Overview of Database Languages and Architectures Fall 2015 Dr. Abdullah Almutairi.
Abstract MarkLogic Database – Only Enterprise NoSQL DB Aashi Rastogi, Sanket V. Patel Department of Computer Science University of Bridgeport, Bridgeport,
Neo4j: GRAPH DATABASE 27 March, 2017
Cloud Computing and Architecuture
Introduction In the computing system (web and business applications), there are enormous data that comes out every day from the web. A large section of.
Modern Databases NoSQL and NewSQL
NOSQL.
NOSQL databases and Big Data Storage Systems
A Comparison of SQL and NoSQL Databases
NoSQL CPSC 4670/5670.
A Comparison of SQL and NoSQL Databases
MongoDB Introduction, Installation & Execution
Massively Parallel Cloud Data Storage Systems
1 Demand of your DB is changing Presented By: Ashwani Kumar
NoSQL Databases An Overview
Teaching slides Chapter 8.
A Comparison of SQL and NoSQL Databases
NoSQL W2013 CSCI 2141.
relational thoughts on NoSql
Transaction Properties: ACID vs. BASE
Database Management Systems
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Presentation transcript:

1 Ahmed K. Ezzat, Tradeoffs Between SQL and NoSQL Data Mining and Big Data

2 Outline SQL Databases  SQL Standard  SQL Characteristics  SQL Database Examples NoSQL Databases  NoSQL Defintion  General Characteristics  NoSQL Database Types  NoSQL Database Examples Summary

333 SQL Databases

4 SQL DBMS: SQL Standard The following is a short, incomplete history of the SQL Standards – ISO/IEC – Initial ISO/IEC Standard 1989 – Referential Integrity 1992 – SQL2  1995 SQL/CLI (ODBC)  1996 SQL/PSM – Procedural Language extensions 1999 – User Defined Types 2003 – SQL/XML 2008 – Expansions and corrections 2012 – System Versioned and Application Time Period Tables

5 SQL DBMS: SQL Characteristics Data stored in rows or columns into tables (relations) Relationships represented by data Data adheres strictly to schema stored in the catalog. Complete separation between data and schema (provides flexibility) Data Definition Language (DDL – create table)) Data Manipulation Language (DML - IUD) Complete separation between logical and physical layer/view of the DBMS. User sees only the logical view. Optimizer that allows the user to ask for the “what” and it generate code to decide the “how.” Transactions – ACID properties with strong consistency

6 SQL DBMS: SQL Characteristics Data stored in rows or columns into tables (relations):  Row-oriented database is more appropriate for insert  Column-oriented database is more appropriate for Data Warehouse (SELECT statements)  Row vs. Column is pure performance issue based on the workload; the user is not aware and deals with tables.

7 SQL DBMS: SQL Characteristics Relationships represented by data:  Data in a database instance is partitioned among tables but useful operations require access to relevant data from the different tables; hence we need to be able to do JOIN.  Relationship between two tables is represented by primary and foreign keys attributes.

8 SQL DBMS: SQL Characteristics Data adheres strictly to schema stored in the catalog:  Data has to be 100% compliant with the schema (data types) as well as meeting integrity constraints  This guarantee is done before the data gets into the database (ETL).  Since ETL typically is semi-automatic process; hence access and process data in a SQL engine from other data sources is a challenge.  The data in the database is assumed to be accurate and as a result you can perform analysis and come up with reliable recommendations

9 SQL DBMS: SQL Characteristics Complete separation between data and schema (provides flexibility):  By having the schema (table definitions) away from the data in the tables, we can easily at the query level change the attribute name being presented to the user.  While user create table with set of attributes in some order, the database engine can change the order of attributes for performance reasons (all fixed width attributes first before variable width attributes.

10 SQL DBMS: SQL Characteristics Data Definition Language (DDL):  Schema defined at the start  Create Table (Column1 Datatype1, Column2 Datatype 2, …)  Constraints to define and enforce relationships  Primary Key  Foreign Key  Etc.  Triggers to respond to Insert, Update, & Delete  Stored Modules  Alter …  Drop …  Security and Access Control

11 SQL DBMS: SQL Characteristics Data Manipulation Language (DML):  Schema defined at tData manipulated with Select, Insert, Update, & Delete statements  Select T1.Column1, T2.Column2 … From Table1, Table2 … Where T1.Column1 = T2.Column1 …  Data Aggregation  Compound statements (sub-queries)  Functions and Procedures  Explicit transaction control

12 SQL DBMS: SQL Characteristics Complete separation between logical and physical layer/view of the DBMS:  Physical layer can change without any impact on the application; table grow to use two disk drives instead of one – transparent to the application  Order of a table attributes is transparent to the application; controlled by the physical layer for performance reasons.  Create an index to support query performance.  Fixed size attributes are always first before any variable width attribute.  In-Memory Database.

13 SQL DBMS: SQL Characteristics Optimizer:  Allows the user to ask for the “what” and the optimizer generates code to decide the “how.”  Programming in SQL is very productive as the optimizer does most of the work that otherwise you have to do in 3 rd generation programming languages.

14 SQL DBMS: SQL Characteristics Transactions (ACID Properties):  Atomic – All of the work in a transaction completes (commit) or none of it completes.  Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.  Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.  Durable – The results of a committed transaction survive failures.

15 SQL DBMS: SQL Database Examples Commercial:  IBM DB2  Oracle RDMS  Microsoft SQL Server  Teradata  HP/Vertica  Sybase SQL Anywhere Open Source (with commercial options):  MySQL  Ingres Significant portions of the world’s economy use SQL databases!

16 NoSQL Databases

17 NoSQL Database: NoSQL Definition From  Next Generation Databases mostly addressing some of the points:  Being non-relational,  Distributed,  Open-source and  Horizontal scalable.  The original intention has been modern web-scale databases.  The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

18 NoSQL Database: NoSQL Definition NoSQL Products & Projects:  lists 122 NoSQL Databases:  Cassandra  Hive  Pig  Couchbase  Hadoop & Hbase  MongoDB  StupidDB  Etc.

19 NoSQL Database: General Characteristics NoSQL General/Distinguishing Characteristics:  Large data volumes  Google’s “big data”  Scalable replication and distribution  Potentially thousands of machines  Potentially distributed around the world  Queries need to return answers quickly  Mostly query, few updates  Asynchronous Inserts & Updates  Schema-less  ACID transaction properties are not needed – BASE (Basically Available Soft state, Eventually consistent)  CAP (Consistency, Availability, Partition tolerance) Theorem  Open source development

20 BASE Transactions:  Acronym contrived to be the opposite of ACID  Basically Available,  Soft state,  Eventually Consistent  Characteristics  Weak consistency – stale data OK  Availability first  Best effort  Approximate answers OK  Aggressive (optimistic)  Simpler and faster NoSQL Database: General Characteristics

21 Brewer’s CAP Theorem: A distributed system can support only two of the following characteristics:  Consistency  Availability  Partition tolerance P.S. The slides from Brewer’s July 2000 talk do not define these characteristics. NoSQL Database: General Characteristics

22 Consistency:  all nodes see the same data at the same time – Wikipedia  client perceives that a set of operations has occurred all at once – Pritchett.  More like Atomic in ACID transaction properties. Availability:  Node failures d not prevent survivors from continuing to operate.  Every operation must terminate in an intended response. Partition Tolerance:  The system continues to operate despite arbitrary message loss.  Operations will complete, even if individual components are unavailable. NoSQL Database: General Characteristics

23 NoSQL Database: NoSQL Database Types Discussing NoSQL databases is complicated because there are a variety of types: Column Store – Each storage block contains data from only one column Document Store – stores documents made up of tagged elements Key-Value Store – Hash table of keys

24 Additional Non-SQL Databases:  XML Databases  Graph Databases  Codasyl Databases  Object Oriented Databases  Etc… Will not address the above today.. NoSQL Database: NoSQL Database Types

25 NoSQL Database: NoSQL Examples Column Store:  Each storage block contains data from only one column  Example: Hadoop/Hbase   Yahoo, Facebook  Example: Ingres VectorWise  Column Store integrated with an SQL database 

26 NoSQL Database: NoSQL Examples Column Store Comments:  More efficient than row (or document) store if:  Multiple row/record/documents are inserted at the same time so updates of column blocks can be aggregated  Retrievals access only some of the columns in a row/record/document

27 NoSQL Database: NoSQL Examples Document Store:  Example: CouchDB   BBC  Example: MongoDB   Foursquare, Shutterfly  JSON – JavaScript Object Notation

28 NoSQL Database: NoSQL Examples CouchDB JSON Example: "_id": "guid goes here", "_rev": "314159", "type": "abstract", "author": "Keith W. Hare" "title": "SQL Standard and NoSQL Databases", "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.", "creation_timestamp": "2011/05/10 13:30: " }

29 NoSQL Database: NoSQL Examples CouchDB JSON Tags:  "_id"  GUID – Global Unique Identifier  Passed in or generated by CouchDB  "_rev"  Revision number  Versioning mechanism  "type", "author", "title", etc.  Arbitrary tags  Schema-less  Could be validated after the fact by user-written routine

30 Key-Value Store:  Hash tables of Keys  Values stored with Keys  Fast access to small data values  Example – Project-Voldemort   Linkedin  Example – MemCacheDB   Backend storage is Berkeley-DB NoSQL Database: NoSQL Examples

31 Map Reduce:  Technique for indexing and searching large data volumes  Two Phases, Map and Reduce  Map Extract sets of Key-Value pairs from underlying data Potentially in Parallel on multiple machines  Reduce Merge and sort sets of Key-Value pairs Results may be useful for other searches  Map Reduce techniques differ across products  Implemented by application developers, not by underlying software NoSQL Database: NoSQL Examples

32 Map Reduce Patent:  Google granted US Patent 7,650,331, January 2010 System and method for efficient large-scale data processing A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data. NoSQL Database: NoSQL Examples

33 Storing & Modifying Data:  Syntax varies  HTML  Java Script  Etc.  Asynchronous – Inserts and updates do not wait for confirmation  Versioned  Optimistic Concurrency NoSQL Database: NoSQL Examples

34 Retrieving Data:  Syntax Varies  No set-based query language  Procedural program languages such as Java, C, etc.  Application specifies retrieval path  No query optimizer  Quick answer is important  May not be a single “right” answer NoSQL Database: NoSQL Examples

35 Open Source:  Small upfront software costs  Suitable for large scale distribution on commodity hardware NoSQL Database: NoSQL Examples

36 NoSQL Summary:  NoSQL databases reject:  Overhead of ACID transactions  “Complexity” of SQL  Burden of up-front schema design  Declarative query expression  Yesterday’s technology  Programmer responsible for  Step-by-step procedural language  Navigating access path NoSQL Database: NoSQL Examples

37 Summary

38 Summary SQL Databases  Predefined Schema  Standard definition and interface language  Tight consistency  Well defined semantics NoSQL Database  No predefined Schema  Per-product definition and interface language  Getting an answer quickly is more important than getting a correct answer

39 END