RDBMSs The NoSQL movement

Slides:



Advertisements
Similar presentations
Based on the text by Jimmy Lin and Chris Dryer; and on the yahoo tutorial on mapreduce at index.html
Advertisements

Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Jennifer Widom NoSQL Systems Overview (as of November 2011 )
In 10 minutes Mohannad El Dafrawy Sara Rodriguez Lino Valdivia Jr.
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo How to Scale a Database System.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
A Study in NoSQL & Distributed Database Systems John Hawkins.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
A Brief Overview by Aditya Dutt March 18 th ’ Aditya Inc.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
MapReduce April 2012 Extract from various presentations: Sudarshan, Chungnam, Teradata Aster, …
MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Introduction to Hadoop and HDFS
Modern Databases NoSQL and NewSQL Willem Visser RW334.
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
Cloud Computing Clase 8 - NoSQL Miguel Johnny Matias
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
Lecture 8: Databases and Data Infrastructure CS 6071 Big Data Engineering, Architecture, and Security Fall 2015, Dr. Rozier.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
NOSQL DATABASE Not Only SQL DATABASE
NoSQL: Graph Databases. Databases Why NoSQL Databases?
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
NoSQL databases A brief introduction NoSQL databases1.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
BIG DATA/ Hadoop Interview Questions.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
NoSQL: Graph Databases
2nd year Computer Science & Engineer
Neo4j: GRAPH DATABASE 27 March, 2017
2 Phase Commit Protocol In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment.
CS 405G: Introduction to Database Systems
NO SQL for SQL DBA Dilip Nayak & Dan Hess.
NoSQL: Graph Databases
and Big Data Storage Systems
Cloud Computing and Architecuture
CSE 775 – Distributed Objects Bekir Turkkan & Habib Kaya
NoSQL Know Your Enemy Shelly Noll SRT Solutions, Ann Arbor, MI
CS122B: Projects in Databases and Web Applications Winter 2017
A free and open-source distributed NoSQL database
Introduction In the computing system (web and business applications), there are enormous data that comes out every day from the web. A large section of.
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Open Source distributed document DB for an enterprise
Dremel.
Modern Databases NoSQL and NewSQL
NOSQL.
The Client/Server Database Environment
Christian Stark and Odbayar Badamjav
NOSQL databases and Big Data Storage Systems
NoSQL Systems Overview (as of November 2011).
Massively Parallel Cloud Data Storage Systems
1 Demand of your DB is changing Presented By: Ashwani Kumar
NOSQL and CAP Theorem.
NoSQL Databases An Overview
NoSQL Databases Antonino Virgillito.
Overview of big data tools
Database Software.
Database Systems Summary and Overview
April 13th – Semi-structured data
relational thoughts on NoSql
Introduction to NoSQL Database Systems
Big DATA.
UFCEUS-20-2 Web Programming
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Presentation transcript:

RDBMSs The NoSQL movement Database Systems RDBMSs The NoSQL movement 2011-02-23 (NB): Everything from here on is new.

(R)DBMSs ”today” (2012) DMBS Approx. market share Oracle 40-50% IBM DB2 20-30% Microsoft SQL Server ~15% Sybase 3% (…) MySQL 1% PostgreSQL 0.5% (Microsoft Access)

Oracle DB First commercially sold database engine V2 release 1979. HUGE market share historically (~80%) Still clear market leader. Standard incompliant Might makes right?

I believe this explains a lot… IBM DB2 First (in-house) SQL database 1974, DB2 released commercially 1982. Near-monopoly on ”mainframes” Towards Oracle-compliant(!) IBM and Oracle cooperate on e.g. tools. “At the time IBM didn't believe in the potential of Codd's ideas, leaving the implementation to a group of programmers not under Codd's supervision, who violated several fundamentals of Codd's relational model; the result was Structured English QUEry Language or SEQUEL.” - Wikipedia article on DB2 I believe this explains a lot…

Microsoft SQL Server First release 1989. Market leading on Windows application platforms. Code base originally by Sybase, aquired by Microsoft 1991.

MySQL Open Source – but owned by Oracle since 2010 Historically: fast but feature-poor Subqueries(!) added in ”recent” release FK constraints only with Oracle’s (InnoDB) backend Not optimized for joins Still missing features (but getting closer) No CHECK constraints (including assertions) No sequencing (WITH) No INSTEAD OF triggers on views. Big on the web: used by Wikipedia, Facebook, … Early support in PHP helped boost

PostgreSQL Open Source – community development Historically: full-featured but (relatively) slow Much faster today – and optimized for complex tasks Efficient support for joins Almost standard-compliant Full constraint support – except assertions! Full ACID transactions Sequencing (WITH) Prominent users: Yahoo, MySpace, Skype, …

SQLite Small-ish C library Embedded into application means no communication overhead Stores all data as a single file on host platform. Not as fast as dedicated systems. Not full-featured. Prominent users: Firefox, Chrome, Opera, Skype

Feature Comparison - Queries OUTER JOIN INTERSECT/EXCEPT WITH Oracle Yes Yes (MINUS) IBM DB2 MS SQL Server MySQL No PostgreSQL SQLite LEFT

Feature Comparison - DDL FK REF CHECK ASSERTION TRIGGER PROC/FUN Oracle Yes No IBM DB2 MS SQL Server MySQL InnoDB PostgreSQL SQLite Assertions (schema-level constraints) are incompatible with most existing research on database optimization. No existing implementation supports them (except by using triggers to mimic).

Feature Comparison - Misc OS Support Programming support Oracle Win, Mac, Unix, z/OS PL/SQL IBM DB2 Win, Mac, Unix, z/OS, iOS SQL/PSM MS SQL Server Windows T-SQL MySQL Win, Mac, Unix, z/OS, Symbian PostgreSQL Win, Mac, Unix, Android PL/pgSQL, SQL/PSM, lots more! SQLite Any N/A PostgreSQL allows procedures to be written in a wide variety of languages, including Java, Python, Tcl, Perl, PHP… SQLite is a C library, so can be used with any language that has C FFI support.

Beyond SQL? SQL was first developed around 1970 Contemporaries: Forth, PL/I, Pascal, C, SmallTalk, ML, … With one prominent exception – C – these have all been succeded by newer, cooler languages. Has the time finally come for SQL as well?

SQL injection attacks http://xkcd.com/327/ The possibility for SQL injection attacks has lead development away from literal SQL, towards higher-level interfaces, tools and libraries.

The world is a-changin’ ”The Claremont Report”, 2008 Big Data is coming big-time Social networks, e-science, Web 2.0, streaming media, … Data IS the business Efficient data management is no longer just a cost reduction – it’s a selling point! Rapidly expanding user base … means rapidly expanding user needs Architectural shifts in computing Storage hardware is still improving exponentially Data networks, cloud computing, …

Examples of database sizes Digg: 3 TB – just to store the up/down votes Facebook: 50 TB – for the private messaging feature eBay: 2 PB data overall (2 000 000 GB)

RDBMS weakness RDBMSs typically handle ”massive” amounts of data in complex domains, with frequent small read/writes. The archetypical RDBMS serves a bank. Data-intensive applications don’t fit this pattern: MASSIVE+++ amounts of data (e.g. eBay) Super-fast indexing of documents (e.g. Google) Serving pages on high-traffic websites (e.g. Facebook) Streaming media (e.g. Spotify)

Column-oriented databases Typical RDBMSs are row-oriented All data associated with a particular key is stored close together. Allows horizontal partitioning (sharding) Different rows stored on different distributed nodes. Column-oriented (R)DBMSs All data in a given column is stored close together. Allows vertical partitioning (normalization) Different columns stored on different nodes. Fast computation of aggregations, e.g. data mining.

The ”NoSQL” movement NoSQL (”Not only SQL”) SELECT fun, profit FROM Real_World WHERE relational = FALSE; NoSQL (”Not only SQL”) Originally the name of an RDBMS with a different interface language. Nowadays a term that encompasses a wide variety of non-relational DBMSs (”NoRel”?).

Non-relational databases MapReduce framework Google originally; Hadoop (Apache), … Key-Value stores BigTable (Google), Cassandra (Apache), … Document stores CouchDB, MongoDB, SimpleDB, … Graph databases Neo4j, FlockDB, … Semi-structured databases (Native) XML databases, …

MapReduce No data model – all data stored in files Operations supplied by user: Reader :: file → [input record] Map :: input record → <key, value> Reduce :: <key, [value]> → [output record] Writer :: [output record] → file Everything else done behind the scenes: Consistency, atomicity, distribution and parallelism, ”glue” Optimized for broad data analytics Running simple queries over all data at once

MapReduce implementations The ”secret” behind Google’s success Still going strong. Hadoop (Apache) Open Source implementation of the MapReduce framework Used by Ebay, Amazon, Last.fm, LinkedIn, Twitter, Yahoo, Facebook internal logs (~15PB), …

Key-Value Stores Key-Value stores is a fancy name for persistant maps (associative arrays): Extremely simple interface – extremely complex implementations. void Put(string key, byte[] data); byte[] Get(string key); void Remove(string key);

Key-Value Stores Built for extreme efficiency, scalability and fault tolerance Records distributed to different nodes based on key Replication of data to several nodes Sacrifice consistency and querying power Single-record transactions Not good for ”joins” ”Eventual consistency” between nodes AKA: ”why does Facebook tell me I have a notification when there isn’t anything new when I click the icon??!?”

Key-Value store implementations BigTable (Google) Sparse, distributed, multi-dimensional sorted map Proprietary – used in Google’s internals: Google Reader, Google Maps, YouTube, Blogger, … Cassandra (Apache) Originally Facebook’s PM database – now Open Source (Apache top-level project) Used by Netflix, Digg, Reddit, Spotify, …

Semi-structured data (SSD) More flexible than the relational model. The type of each ”entity” is its own business. Labels indicate meanings of substructures. Semi-structured: it is structured, but not everything is structured the same way! Support for XML and XQuery in e.g. Oracle, DB2, SQL Server. Special case: Document databases

Document stores Roughly: Key-Value stores where the values are ”documents” XML, JSON, mixed semistructured data sets Typically incorporate a query language for the document type. See previous lecture for discussion on XML querying.

Document store implementations MongoDB Name short for ”Humongous” Open source – owned by 10gen JSON(-like) semi-structured storage JavaScript query language Supports MapReduce for aggregations Apache CouchDB

Graph Databases Data modeled in a graph structure Nodes = ”entities” Properties = ”tags”, attribute values Edges connect Nodes to nodes (relationships) Nodes to properties (attributes) Fast access to associative data sets All entities that share a common property Computing association paths

Graph database implementations Neo4j Developed in Malmö Specialized query language: Cypher FlockDB Initially developed by Twitter to store user relationships Apache licence

NoSQL – a hype? NoSQL is not ”the right choice” just because it’s new! Relational DBMSs still rule at what they were first designed for: efficient access to large amounts of data in complex domains. That’s still the vast majority!

NoSQL summary NoSQL = ”Not only SQL” Different data models optimized for different tasks MapReduce, Key-Value stores, Document stores, Graph databases, … Typically: + efficiency, scalability, flexibility, fault tolerance - (no) query language, (less) consistency

Course summary Assignment retrospective Exam information Next Lecture Course summary Assignment retrospective Exam information