NoSQL & Document Stores

NoSQL & Document Stores BCHB697

Outline
  NoSQL
  Document Stores
  XML, JSON
  Partition, Replication, Availability
  Map / Reduce
BCHB697 - Edwards

NoSQL
  Not Only SQL
  Blanket term for non-traditional databases
    Minor and/or radical departures from tables, relational data modeling
    Column Stores, Document Stores, Triple Stores
  Typical rationale (at some cost)
    Scale/performance, data model flexibility, deployment

NoSQL vs Relational Databases
  Relational Databases: ACID
    Atomicity, Consistency, Isolation, Durability
    Ensures data is always self-consistent
    SQL query language w/ joins (relational)
  NoSQL: BASE
    Basically Available, Soft State, Eventually Consistent
    Give up on guarantees to achieve performance and scalability
    Simple queries / manual "joins"
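
The manual "joins" mentioned above can be sketched in plain Python. This is only an illustration: the two lists stand in for two collections, and the `_id` and `artist_id` field names and the `albums_with_artist` helper are assumptions, not part of any particular database API.

```python
# Hypothetical documents; in a document store these would live in two collections.
artists = [
    {"_id": "a1", "name": "Iron Maiden"},
    {"_id": "a2", "name": "Rush"},
]
albums = [
    {"_id": "b1", "artist_id": "a1", "title": "Killers", "year": 1981},
    {"_id": "b2", "artist_id": "a1", "title": "Powerslave", "year": 1984},
    {"_id": "b3", "artist_id": "a2", "title": "Moving Pictures", "year": 1981},
]


def albums_with_artist(artists, albums):
    """Manual 'join': resolve each album's artist in application code."""
    by_id = {a["_id"]: a for a in artists}  # lookup table built from the first query
    return [
        {"artist": by_id[b["artist_id"]]["name"], "title": b["title"], "year": b["year"]}
        for b in albums
        if b["artist_id"] in by_id  # skip albums whose artist is missing
    ]


joined = albums_with_artist(artists, albums)
for row in joined:
    print(row)
```

The work a relational database would do inside a single SQL join is done here by the application, which is the cost the slide alludes to.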

NoSQL: Key Features
  Spread database over many nodes
    Partition, replication, query execution
  Simple queries, indexes
  Flexible data-model:
    Data may be inconsistent, duplicated
    Data attributes may be dynamically added
    Data attributes may be inconsistent
    Data attributes may be complex structures

NoSQL: Column Store
  Column Stores:
    Organize table by column rather than rows
    Better compression of values
    Every column becomes an index
    Easy to add new columns to a table
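
A rough sketch of the column-oriented layout, assuming a tiny in-memory table. The `to_columns` and `run_length_encode` helpers, the `in_stock` column, and the genre values for the last three albums are made up for illustration; real column stores use far more elaborate encodings.

```python
from itertools import groupby

names = ["albumname", "datereleased", "genre"]
# Row-oriented table; genre values after the first row are assumed for illustration.
rows = [
    ("The Book of Souls", 2015, "Hard Rock"),
    ("Killers", 1981, "Hard Rock"),
    ("Powerslave", 1984, "Hard Rock"),
    ("Somewhere in Time", 1986, "Hard Rock"),
]


def to_columns(rows, names):
    """Re-organize a list of row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, names)

# Adding a column is a single new list; existing columns are untouched.
columns["in_stock"] = [True] * len(rows)

# Values in one column share a type and often repeat, so they compress well.
encoded_genre = run_length_encode(columns["genre"])
print(encoded_genre)
```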

NoSQL: Document Store
  Database is a collection of "documents"
    Structured format: JSON, XML
  Collection ↔ Table
  Document ↔ Row of Table
  Every document can have its own structure
    Different attributes, complex values
  Usually documents in a collection have (somewhat) consistent keys
  Query for documents using their keys
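
As a minimal sketch of these ideas, a collection can be modeled as a list of Python dictionaries. The `find` helper, the `_id` values, and the second document are hypothetical and only meant to show documents with different attributes being queried by key.

```python
# A "collection" of documents; each document may have its own structure.
collection = [
    {
        "_id": "1",
        "artistName": "Iron Maiden",
        "albums": [{"albumname": "Killers", "datereleased": 1981}],  # complex value
    },
    {"_id": "2", "artistName": "Hypothetical Band", "country": "UK"},  # no "albums" key
]


def find(collection, key, value):
    """Return every document whose top-level `key` equals `value`."""
    # dict.get tolerates documents that lack the key entirely.
    return [doc for doc in collection if doc.get(key) == value]


matches = find(collection, "artistName", "Iron Maiden")
print(matches)
print(find(collection, "country", "UK"))
```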

XML Document
    <artist>
      <artistname>Iron Maiden</artistname>
      <albums>
        <album>
          <albumname>The Book of Souls</albumname>
          <datereleased>2015</datereleased>
          <genre>Hard Rock</genre>
        </album>
        <album>
          <albumname>Killers</albumname>
          <datereleased>1981</datereleased>
        </album>
        <album>
          <albumname>Powerslave</albumname>
          <datereleased>1984</datereleased>
        </album>
        <album>
          <albumname>Somewhere in Time</albumname>
          <datereleased>1986</datereleased>
        </album>
      </albums>
    </artist>
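
One possible way to read such a document in Python is the standard-library `xml.etree.ElementTree` module. The sketch below uses a shortened, well-formed copy of the artist document; the resulting dictionary layout is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """<artist>
  <artistname>Iron Maiden</artistname>
  <albums>
    <album>
      <albumname>The Book of Souls</albumname>
      <datereleased>2015</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Killers</albumname>
      <datereleased>1981</datereleased>
    </album>
  </albums>
</artist>"""

artist = ET.fromstring(xml_text)

# findtext returns None when a child element (here <genre>) is absent.
albums = [
    {
        "albumname": album.findtext("albumname"),
        "datereleased": int(album.findtext("datereleased")),
        "genre": album.findtext("genre"),
    }
    for album in artist.findall("albums/album")
]

print(artist.findtext("artistname"))
print(albums)
```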

JSON Document
    {
      "artistName" : "Iron Maiden",
      "albums" : [
        {
          "albumname" : "The Book of Souls",
          "datereleased" : 2015,
          "genre" : "Hard Rock"
        },
        {
          "albumname" : "Killers",
          "datereleased" : 1981
        },
        {
          "albumname" : "Powerslave",
          "datereleased" : 1984
        },
        {
          "albumname" : "Somewhere in Time",
          "datereleased" : 1986
        }
      ]
    }

JSON Syntax
  Dictionaries
    { <key1>: <value1>, …, <keyn>: <valuen> }
  Lists
    [ <value1>, …, <valuen> ]
  Strings, Numbers, Boolean, Null
    "string", 1, 5.6, true, null
  White-space is ignored
    Newlines, spaces, tabs
  Maps directly to modern programming languages
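
The mapping to a programming language can be seen with Python's standard `json` module. This is a small sketch; the `active`, `rating`, and `label` keys are invented so that each JSON value type appears once.

```python
import json

text = """
{
  "artistName": "Iron Maiden",
  "active": true,
  "rating": 5.6,
  "label": null,
  "albums": [ {"albumname": "Killers", "datereleased": 1981} ]
}
"""

doc = json.loads(text)  # white-space in the JSON text is ignored

# JSON dictionary -> dict, list -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None
for key, value in doc.items():
    print(key, type(value).__name__)

# Serializing and parsing again gives back an equal structure.
round_trip = json.loads(json.dumps(doc))
```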

CouchDB
  Document Store for JSON documents
  Apache Foundation project
  Can act as web-application back-end server
  Interactive browsing using Fauxton
  EdwardsLab: CouchDB, Fauxton
    UniProt database
  See also: MongoDB, Couchbase, …
    curl -X POST -H 'Content-Type: application/json' -u admin:admin 'http://localhost:5984/uniprot/_bulk_docs' -d @uniprot.json
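
The same bulk upload could be prepared from Python roughly as follows. CouchDB's `_bulk_docs` endpoint expects a JSON body of the form `{"docs": [...]}`; the `bulk_docs_request` helper and the sample document are hypothetical, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def bulk_docs_request(base_url, db, docs, user, password):
    """Build (but do not send) a POST request for CouchDB's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,  # same role as curl's -u admin:admin
        },
        method="POST",
    )


# Hypothetical document; a real upload would use the entries in uniprot.json.
docs = [{"_id": "example-1", "name": "hypothetical entry"}]
req = bulk_docs_request("http://localhost:5984", "uniprot", docs, "admin", "admin")
print(req.get_method(), req.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```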

Why Document Stores
  Documents can be partitioned across many commodity compute nodes
  Query requests sent to each compute node, executed in parallel on data partition
  Writes can be executed against any convenient node and in parallel
  Data can be replicated for performance and robustness reasons
  Flexible attributes can be determined later

Why Not Document Stores
  No complex relational queries
  Complex values can't be readily indexed
  Inconsistent keys can make the application logic convoluted
  Flexible data-model can lead to ad-hoc and on-the-fly modeling decisions
  Logical data-model is still needed, even if only "on paper," for application success

Partition, Replication, Availability
  Documents spread across (many) commodity servers (sharding):
    Cheaper, more fault tolerant than massive server
  Replicate documents for availability
  Inserts, retrievals can operate in parallel
    All documents must be self-contained
  Query by id can be sent to a single server
  Query by key value is executed by all servers in parallel and results merged
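
The routing described above can be sketched with a toy in-memory store. The `ShardedStore` class is hypothetical, uses hash-based placement as one possible partitioning scheme, omits replication, and visits shards in a loop where a real cluster would query them in parallel.

```python
import hashlib


class ShardedStore:
    """Toy document store that partitions documents across shards by id."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, doc_id):
        # A stable hash of the id decides which shard owns the document.
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, doc):
        self._shard_for(doc["_id"])[doc["_id"]] = doc

    def get(self, doc_id):
        """Query by id: only the owning shard is consulted."""
        return self._shard_for(doc_id).get(doc_id)

    def find(self, key, value):
        """Query by key value: every shard is searched and the results merged."""
        results = []
        for shard in self.shards:
            results.extend(d for d in shard.values() if d.get(key) == value)
        return results


store = ShardedStore(3)
for doc in [
    {"_id": "killers", "datereleased": 1981},
    {"_id": "powerslave", "datereleased": 1984},
    {"_id": "somewhere-in-time", "datereleased": 1986},
    {"_id": "the-book-of-souls", "datereleased": 2015},
]:
    store.insert(doc)

print(store.get("killers"))
print(store.find("datereleased", 1984))
```

Because each document is self-contained, no shard needs data from another shard to answer either kind of query.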

Map Reduce / Hadoop
  Simple computational model for large scale parallel data processing
    Especially good for partitioned document store queries
  Map: Each server executes on its portion of the document collection independently
  Reduce: Results from each server are merged in batches as they become available
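
A minimal single-process sketch of the map and reduce steps, assuming the album documents are already split into two partitions. The `map_partition` and `reduce_counts` names are illustrative; a framework such as Hadoop would run the map step on separate servers.

```python
from collections import Counter
from functools import reduce

# Each inner list stands for the documents held by one server.
partitions = [
    [
        {"albumname": "Killers", "datereleased": 1981},
        {"albumname": "Powerslave", "datereleased": 1984},
    ],
    [
        {"albumname": "Somewhere in Time", "datereleased": 1986},
        {"albumname": "The Book of Souls", "datereleased": 2015},
    ],
]


def map_partition(docs):
    """Map: count albums per decade using only this partition's documents."""
    return Counter((doc["datereleased"] // 10) * 10 for doc in docs)


def reduce_counts(left, right):
    """Reduce: merge two partial results into one."""
    return left + right


partials = [map_partition(docs) for docs in partitions]  # independent per server
totals = reduce(reduce_counts, partials, Counter())       # merged as results arrive
print(dict(totals))
```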

Document Stores
  Flexible data-model: rapid prototyping
  Simple queries, clear retrieval priorities
  Push application logic to the middle-tier or client: avoid complex queries in the RDBMS
  Scalability, sharding, partitioning, especially for writes, updates

Exercise
  Explore Fauxton interface, URL interface to CouchDB: CouchDB, Fauxton
  Explore JSON web-services here: https://www.ebi.ac.uk/proteins/api/doc
  Python script to interact with CouchDB: urllib, json modules
    Extract documents from uniprot database

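Exercise
    import base64, json, urllib.request

    # Base URL of the uniprot database on the course CouchDB server
    base = 'https://edwardslab.bmcb.georgetown.edu/couchdb/uniprot/'
    # Python 3's urlopen does not accept user:password@ in the URL,
    # so the admin:admin credentials are sent as a Basic auth header
    auth = {'Authorization': 'Basic ' + base64.b64encode(b'admin:admin').decode()}

    def get(path):
        # Fetch base + path and parse the JSON response
        req = urllib.request.Request(base + path, headers=auth)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # _all_docs lists one row per document; fetch and print each document by id
    for r in get('_all_docs')['rows']:
        print(get(r['id']))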