NoSQL Know Your Enemy Shelly Noll SRT Solutions, Ann Arbor, MI


NoSQL Know Your Enemy Shelly Noll SRT Solutions, Ann Arbor, MI shelly.noll@srtsolutions.com @shellynoll

Disclaimer
- There is a lot of disagreement about this topic
- Everything I say could be wrong depending on who you ask
- Even if it’s right today, it will probably be wrong soon

What is NoSQL?
It is a database management system with the following features:
- Queries do not use SQL
- Doesn’t guarantee ACID properties
- Fault-tolerant, distributed architecture
The term was coined by Carlo Strozzi in 1998 to describe a database he created that did not expose a SQL interface.
The term was co-opted in 2009 when Eric Evans from Rackspace and Johan Oskarsson from Last.fm organized an event to discuss the growing trend of open-source, distributed databases.

CAP Theorem
- Consistency: all nodes see the same data at the same time
- Availability: every request receives a success/failure response
- Partition Tolerance: the system operates despite failure of part of the system
A distributed system can satisfy any two of these guarantees at the same time, but not all three.
Notes: a couple of basic theories we need to talk about to understand the difference between relational and NoSQL databases.

ACID vs BASE
ACID:
- Atomicity: all or nothing
- Consistency: data must adhere to the schema and rules
- Isolation: no transaction interferes with another
- Durability: permanency
BASE:
- Basically Available: an application works basically all the time
- Soft State: it does not have to be consistent all the time
- Eventual Consistency: but it will be in some known state eventually
Notes: Instead of the ACID properties found in relational databases, NoSQL has something different. What is the opposite of an acid? NoSQL databases exhibit BASE properties.
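The "soft state" and "eventual consistency" ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Replica` and `anti_entropy` are hypothetical names, versions are plain integers supplied by the caller, and conflicts are resolved by keeping the highest version. Real systems use more careful schemes.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical sketch)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this node already has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


node1, node2 = Replica(), Replica()
node1.write("status", "hello world", version=1)
stale = node2.read("status")      # node2 has not seen the write yet (soft state)
anti_entropy(node1, node2)
fresh = node2.read("status")      # after syncing, both nodes agree
```

Between the write and the sync, a reader hitting `node2` gets an out-of-date answer; that window is the price paid for staying available.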

ACID vs BASE
ACID:
- Strong consistency
- Isolation
- Focus on “commit”
- Nested transactions
- Conservative (pessimistic)
- Difficult to change schema
BASE:
- Weak consistency
- Best effort
- Approximate answer OK
- Aggressive (optimistic)
- Simpler
- Faster
- Easier to change
Notes: Consistency: adheres to the rules. Isolation: transactions do not interfere.
Source: Dr. Eric A. Brewer (2000) http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

Why Did This Happen???
Data-related reasons:
- Avoidance of unneeded complexity
- Avoidance of object-relational mapping
- Avoidance of making schema changes
Performance-related reasons:
- Higher throughput
- Horizontal scalability and running on commodity hardware
- Complexity and cost of setting up database clusters
Notes:
- Complexity: consider Twitter. You have users, status updates, relationships between users, direct messages, and not much else.
- Object-relational mapping: object-oriented programmers have to create a layer in their applications that takes the data from the database and transforms it into objects the application can use. This also creates overhead in syncing the state of the objects in memory with the entities in the database, which is expensive and time-consuming. NoSQL APIs look more like the objects programmers use.
- NoSQL compromises reliability for better performance.

Database Types
- Key-Value
- Graph
- Document Store
- Column Store

Database type disagreement Stephen Yen Ken North Rick Cattel Jonathan Ellis Wikipedia Amazon SimpleDB Entity-Attribute-Value Data Store Document Store Apache Hadoop Tabular Cassandra Wide Columnar Store Extensible Record Store Columnfamily Eventually-Consistent Key-Value Store Google Bigtable Key-Value Store HBase HyperTable Redis Data-Structures Server Collection Key-Value Cache

Key-Value
- Data is stored in a schema-less way with a key and a value
- Limited querying capability
- Values can usually be of any data type, or could be a serialized object
- Variations: eventually consistent, hierarchical, ordered, key-value cache (in RAM or on disk)
Examples: Memcached, Redis, Riak (Basho), Voldemort
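To make the key-value model concrete, here is a minimal in-memory sketch in Python. `KeyValueStore` is a hypothetical class written for illustration and is not the API of Memcached, Redis, Riak, or Voldemort. It assumes values are JSON-serializable and stores them as serialized strings, as the slide describes.

```python
import json


class KeyValueStore:
    """Toy schema-less store: opaque serialized values looked up by key only."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The store never inspects the value; it just keeps the serialized form.
        self._items[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._items.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._items.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "followers": 3})
store.put("counter:visits", 17)
profile = store.get("user:42")
```

Note what is missing: there is no way to ask "which users have more than two followers" without fetching values by key and checking them in application code. That is the limited querying capability mentioned above.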

Popular Key-Value stores
Name | Vendor | Language | Used By
Memcached | Danga | C | LiveJournal, YouTube, Reddit, Zynga, Facebook, Twitter
Redis | VMware | ANSI C | GitHub, Craigslist, Blizzard, Digg, Twitter, Flickr, Stack Overflow
Riak | Basho | Erlang, C, C++, JavaScript | Comcast, Mozilla, AOL, Ask.com
Voldemort | LinkedIn | Java |

Graph
- Based on graph theory
- Data is stored as nodes (entities), properties, and edges (relationships)
- Allows for calculations between nodes: shortest distance between nodes, analysis of relationships
Examples: AllegroGraph, FlockDB, GraphDB, InfiniteGraph, Neo4j, OrientDB
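The "shortest distance between nodes" calculation can be sketched with a breadth-first search over an adjacency map. This is a plain-Python illustration, not the query API of any graph database listed here. The function name `shortest_path` and the sample data are made up, and edges are assumed to be undirected and unweighted.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest list of nodes from start to goal, or None if unreachable."""
    # Build an undirected adjacency map from (node, node) relationship pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None

    previous = {start: None}   # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


friendships = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("alice", "erin"), ("erin", "dave"),
]
route = shortest_path(friendships, "alice", "dave")
```

In a relational schema the same question needs a chain of self-joins whose depth is not known in advance, which is why relationship-heavy data is a natural fit for graph stores.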

Popular graph databases
Name | Vendor | Language | Used By
AllegroGraph | Franz, Inc. | Lisp | Pfizer, Ford, Kodak, NASA, DoD
FlockDB | Twitter | |
GraphDB | Sones | .NET |
InfiniteGraph | Objectivity | | CIA, DoD
Neo4j | Neo Technology | Java | Adobe, Cisco
OrientDB | Apache | | A bunch of small companies no one’s heard of

Document Store
- Stores document-oriented or semi-structured data
- Documents may be encoded as XML, YAML, JSON, BSON, PDF, MS Word, MS Excel, etc.
- Documents are not required to adhere to a standard schema
- Offers a query language to retrieve documents based on content
Examples: Amazon SimpleDB, Apache CouchDB, Lotus Notes, MongoDB
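A rough Python sketch of the document model follows. `DocumentStore`, `insert`, and `find` are hypothetical names chosen for this example and do not reproduce the MongoDB or CouchDB APIs. Documents are assumed to be plain dictionaries, and the only supported query is exact matching on top-level fields.

```python
class DocumentStore:
    """Toy document store: schema-free dicts, queried by their content."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(document)  # copy so callers cannot mutate our state
        return doc_id

    def find(self, **criteria):
        # A document matches when every requested field is present with that value.
        return [
            doc for doc in self._docs.values()
            if all(field in doc and doc[field] == value
                   for field, value in criteria.items())
        ]


db = DocumentStore()
db.insert({"kind": "post", "author": "shelly", "tags": ["nosql", "talk"]})
db.insert({"kind": "post", "author": "ben", "title": "Comparisons"})
db.insert({"kind": "user", "name": "shelly", "city": "Ann Arbor"})
posts_by_shelly = db.find(kind="post", author="shelly")
```

The three documents carry different fields and none of them had to be declared in advance, yet they can still be retrieved by content rather than only by key.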

Popular Document stores
Name | Vendor | Language | Used By
CouchDB | Apache | Erlang | Various Facebook applications
MongoDB | 10gen | C++ | MTV Networks, Craigslist, Foursquare
SimpleDB | Amazon | |

Column store
- Stores data in a tabular format
- Different names for the exact same thing: Wide Columnar Store, ColumnFamily, Tabular, Entity-Attribute-Value Data Store, Extensible Record Store, Multivalue, BigTable
Examples: Apache Hadoop, Cassandra, Google Bigtable, HBase, HyperTable
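One way to picture the tabular layout is a nested map of row key, then column family, then column name. The Python sketch below assumes that simplified model. `ColumnFamilyStore` and its methods are hypothetical and are not the Cassandra, HBase, or Bigtable client API.

```python
class ColumnFamilyStore:
    """Toy wide-column layout: row key -> column family -> column -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get(self, row_key, family, column, default=None):
        return self._rows.get(row_key, {}).get(family, {}).get(column, default)

    def get_family(self, row_key, family):
        # Return a copy of all columns stored for one family of one row.
        return dict(self._rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyStore()
table.put("user:1", "profile", "name", "Ada")
table.put("user:1", "profile", "city", "Ann Arbor")
table.put("user:1", "stats", "logins", 12)
table.put("user:2", "profile", "name", "Ben")   # this row has fewer columns
ada_profile = table.get_family("user:1", "profile")
```

Rows in the same table do not need the same set of columns, which is what separates this layout from a fixed relational schema.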

Popular column stores Vendor Language Used By Bigtable Google Google File System Cassandra Apache Java Netflix, Twitter, Constant Contact, Reddit, Digg Hadoop Yahoo! HBase Facebook's messaging platform HyperTable Zvents C++ Baidu

Map reduce
- An algorithm for dividing work across a distributed system
- Breaks a big task into smaller tasks that can be done in parallel
- Map query: maps the input into a final format
- Reduce query: operates over a set of results
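The map and reduce steps can be sketched on a single machine with a word count, the usual teaching example. This is an illustrative Python sketch only. The function names are made up, everything runs sequentially in one process, and a real framework would run the map calls on many nodes in parallel and move the grouped pairs across the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input document into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values that share a key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Each map_phase call is independent, so these could run in parallel.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["nosql is fast", "NoSQL is flexible"])
```

Because each document is mapped independently and each key is reduced independently, the big task splits into small tasks with no shared state between them.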

Comparisons Performance Scalability Flexibility Complexity Key-Value Stores High None Column Stores Moderate Low Document Stores Variable (High) Graph Databases Variable Relational Databases Ben Scofield (2010) http://nosql.mypopescu.com/post/396337069/presentation-nosql-codemash-an-interesting-nosql

Mongodb example

Where wouldn’t you use NoSQL?
- Data is critical to the function of the business/application
- Data has a strong and/or slowly changing schema
- Need true transactional capabilities
- Need data mining capabilities
- Set-based updates
Examples: banking apps, healthcare apps, enterprise apps

Where would you use NoSQL?
- Heavy read/write
- Single-user
- Simple, non-structured data
- Lack of interconnected data
- Doesn’t matter if it takes a while to get the data consistent
- Data is not critical
Examples: social networking apps, mobile apps

Future of NoSQL
- UnSQL: a query language for NoSQL databases; does not have a data definition language
- Acquisition of NoSQL databases by larger companies, similar to what happened in the BI space where IBM, Microsoft, and HP acquired smaller players

Shelly Noll SRT Solutions, Ann Arbor, MI shelly.noll@srtsolutions.com Twitter - @shellynoll