NOSQL By: Joseph Cooper MIS 409 MIS 409

TABLE OF CONTENTS  Why NoSQL  History of NoSQL  SQL vs NoSQL  How did we get here?  Main characteristics of a NoSQL model  Dynamo and BigTable  CAP Theorem  Availability vs Consistency  Kinds of NoSQL  What am I giving up?  Cassandra  Code Examples  Statistics  Things to think about  Don’t forget about DBA  Where would I use it  Summary  Questions  Resources

HISTORY OF NO SQL  Relational databases  RDBMS style databases are becoming problematic  NoSQL was coined by Carlo Strozzi in the year 1998

HISTORY OF NO SQL (CONTINUED)  Facebook open-sourced the Cassandra project (built for inbox search) in 2008  In 2009, Last.fm (an online music streaming website) wanted to organize an event on open-source distributed databases.  NoSQL Conferences

SQL VS NO SQL  Large datasets and a growing acceptance of alternatives have created a market for NoSQL  NoSQL is not a backlash/rebellion against RDBMS  SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings.

WHO’S USING IT?

WHY NOSQL?  For data storage, an RDBMS cannot be the only option.  Just as there are different programming languages, there need to be different storage options.  NoSQL solutions are becoming more acceptable to clients because of the flexibility and performance gains they can bring to companies.

WHY NO SQL (CONTINUED)  Three trends are disrupting the database status quo  Big Data  Big Users (Facebook for example)  Cloud Computing  NoSQL is increasingly being used by companies as a viable alternative to relational databases.  NoSQL allows for performance and flexibility not available in traditional relational databases.

HOW DID WE GET HERE?  An explosion of social media sites (Instagram, LinkedIn, Facebook, Twitter and Google Plus) handling massive amounts of data (terabytes to petabytes)  Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)  Open-source community

MAIN CHARACTERISTICS OF NOSQL DBMS  NoSQL stands for “not only SQL”.  NoSQL is considered to be a class of non-relational data storage systems.  All NoSQL offerings relax one or more of the ACID properties (see the CAP theorem slide)

DYNAMO AND BIGTABLE  Three major papers were the seeds of the NoSQL movement  BigTable (Google)  Dynamo (Amazon): gossip protocol (discovery and error detection), distributed key-value data store, eventual consistency  CAP Theorem

CAP THEOREM  Consistency  Availability  Partition tolerance  You can pick only two of these three for your system.  When you scale out across partitions, you must choose between consistency and availability. Normally, companies choose availability.

AVAILABILITY VS CONSISTENCY  Traditionally, a server or process is considered available when it reaches five 9’s (99.999%).  However, in a system with a large number of nodes, at any point in time there is a strong chance that a node is down or that there is a network disruption among the nodes.  A consistency model sets the rules for visibility and apparent order of updates.  The CAP theorem states that strict consistency cannot be achieved at the same time as availability and partition tolerance.
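
The arithmetic behind the availability slide can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, the per-node availability of 99.9% is an assumed figure, and node failures are assumed to be independent, which real clusters do not guarantee.

```java
class AvailabilityMath {
    // Minutes in an average year (365.25 days).
    static final double MINUTES_PER_YEAR = 365.25 * 24 * 60;

    /** Expected downtime per year, in minutes, for an availability such as 0.99999. */
    static double downtimeMinutesPerYear(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    /** Probability that at least one of n nodes is down, ASSUMING independent failures. */
    static double probAnyNodeDown(double nodeAvailability, int nodes) {
        return 1.0 - Math.pow(nodeAvailability, nodes);
    }

    public static void main(String[] args) {
        System.out.printf("Five 9's allows about %.2f minutes of downtime per year%n",
                downtimeMinutesPerYear(0.99999));
        // Hypothetical per-node availability of 99.9%, for illustration only.
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%4d nodes: P(at least one down) = %.3f%n",
                    n, probAnyNodeDown(0.999, n));
        }
    }
}
```

Under these assumptions the chance that some node is unavailable grows quickly with cluster size, which is why large systems are designed to keep working through individual node failures.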

WHAT KINDS OF NOSQL  NoSQL solutions fall into two major areas:  Key/Value, or ‘the big hash table’: Amazon S3 (Dynamo), Voldemort, Scalaris  Schema-less, which comes in multiple flavors (column-based, document-based or graph-based): Cassandra (column-based), CouchDB (document-based), Neo4J (graph-based), HBase (column-based)

KEY/VALUE  Pros: very fast; very scalable; simple model; able to distribute horizontally  Cons: many data structures (objects) can't be easily modeled as key/value pairs

SCHEMA-LESS  Pros: schema-less data model is richer than key/value pairs; eventual consistency; many are distributed; still provide excellent performance and scalability  Cons: typically no ACID transactions or joins

COMMON ADVANTAGES  Cheap, easy to implement (open source)  Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned  Down nodes easily replaced  No single point of failure  Easy to distribute  Don't require a schema  Can scale up and down  Relax the data consistency requirement (CAP)

WHAT AM I GIVING UP?  joins  group by  order by  ACID transactions  SQL as a sometimes frustrating but still powerful query language  easy integration with other applications that support SQL

CASSANDRA  Originally developed at Facebook  Follows the BigTable data model: column-oriented  Uses the Dynamo Eventual Consistency model  Written in Java  Open-sourced and exists within the Apache family  Uses Apache Thrift as its API

THRIFT  Created at Facebook along with Cassandra  Is a cross-language, service-generation framework  Binary Protocol (like Google Protocol Buffers)  Compiles to: C++, Java, PHP, Ruby, Erlang, Perl,...

SEARCHING  Relational  SELECT `column` FROM `database`.`table` WHERE `id` = key;  SELECT product_name FROM rockets WHERE id = 123;  Cassandra (standard)  keyspace.getSlice(key, "column_family", "column")  keyspace.getSlice(123, new ColumnParent("rockets"), getSlicePredicate());

TYPICAL NOSQL API  Basic API access:  get(key) -- Extract the value given a key  put(key, value) -- Create or update the value given its key  delete(key) -- Remove the key and its associated value  execute(key, operation, parameters) -- Invoke an operation on the value (given its key), where the value is a special data structure (e.g. List, Set, Map, etc.).
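
As a rough illustration of the four calls listed above, the following Java sketch backs them with an in-memory map. It is not the API of any particular NoSQL product: the class `SimpleKeyValueStore`, the helper `appendToList` and the sample keys are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class SimpleKeyValueStore {
    private final Map<String, Object> data = new HashMap<>();

    /** get(key): extract the value given a key (null if absent). */
    Object get(String key) { return data.get(key); }

    /** put(key, value): create or update the value given its key. */
    void put(String key, Object value) { data.put(key, value); }

    /** delete(key): remove the key and its associated value. */
    void delete(String key) { data.remove(key); }

    /** execute(key, operation, parameter): apply an operation to the stored value and keep the result. */
    Object execute(String key, BiFunction<Object, Object, Object> operation, Object parameter) {
        Object updated = operation.apply(data.get(key), parameter);
        data.put(key, updated);
        return updated;
    }

    /** Example operation (hypothetical): append an item to a stored list. */
    @SuppressWarnings("unchecked")
    static Object appendToList(Object value, Object item) {
        List<Object> list = (List<Object>) value;
        list.add(item);
        return list;
    }

    public static void main(String[] args) {
        SimpleKeyValueStore store = new SimpleKeyValueStore();
        store.put("rocket:1", "Rocket-Powered Roller Skates");
        System.out.println(store.get("rocket:1"));

        store.put("cart:42", new ArrayList<Object>());
        store.execute("cart:42", SimpleKeyValueStore::appendToList, "rocket:1");
        System.out.println(store.get("cart:42"));

        store.delete("rocket:1");
        System.out.println(store.get("rocket:1"));
    }
}
```

A real store would distribute the map across nodes; the point here is only how small the access surface is compared with SQL.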

DATA MODEL  Within Cassandra, you will refer to data this way:  Column: the smallest data element, a tuple with a name and a value. For example, :Rockets, '1' might return: {'name' => 'Rocket-Powered Roller Skates', 'toon' => 'Ready Set Zoom', 'inventoryQty' => '5', 'productUrl' => 'rockets\1.gif'}

DATA MODEL CONTINUED  ColumnFamily: a single structure used to group both Columns and SuperColumns. Think of it as a table; it has two types, Standard and Super.  Column families must be defined at startup  Key: the permanent name of the record  Keyspace: the outer-most level of organization. This is usually the name of the application, for example 'Acme' (think database name).
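
One way to picture the Keyspace, ColumnFamily, Key and Column nesting is as maps inside maps. The Java sketch below is only a mental model using the Acme/Rockets sample data from the slides: the class name is hypothetical, SuperColumns are left out, and the timestamp that each real column carries is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DataModelSketch {
    /** Returns the keyspace as: column family -> row key -> (column name -> column value). */
    static Map<String, Map<String, Map<String, String>>> buildAcmeKeyspace() {
        // Columns: name/value pairs belonging to one row.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("name", "Rocket-Powered Roller Skates");
        columns.put("toon", "Ready Set Zoom");
        columns.put("inventoryQty", "5");
        columns.put("productUrl", "rockets\\1.gif");

        // ColumnFamily "Rockets" (think table): row key -> columns.
        Map<String, Map<String, String>> rockets = new LinkedHashMap<>();
        rockets.put("1", columns);

        // Keyspace "Acme" (think database name): column family name -> column family.
        Map<String, Map<String, Map<String, String>>> acme = new LinkedHashMap<>();
        acme.put("Rockets", rockets);
        return acme;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Map<String, String>>> acme = buildAcmeKeyspace();
        // Equivalent in spirit to looking up :Rockets, '1'.
        System.out.println(acme.get("Rockets").get("1"));
        System.out.println(acme.get("Rockets").get("1").get("name"));
    }
}
```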

CASSANDRA AND CONSISTENCY  Cassandra has tunable read/write consistency  One: Return from the first node that responds  Quorum: Query all nodes and respond with the value that has the latest timestamp once a majority of nodes have responded  All: Query all nodes and respond with the value that has the latest timestamp once all nodes have responded. An unresponsive node will fail the request

CASSANDRA AND CONSISTENCY  Zero  Any  One  Quorum  All
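
The read behavior described for One, Quorum and All can be sketched as a small simulation. This is a simplified model, not Cassandra's implementation: the class `ConsistencySketch`, the `Replica` type and the sample values are hypothetical, the list order stands in for the order in which replicas respond, and the Zero and Any levels are not modeled.

```java
import java.util.Arrays;
import java.util.List;

class ConsistencySketch {
    enum Level { ONE, QUORUM, ALL }

    /** One replica's copy of a value, with its write timestamp. */
    static final class Replica {
        final String value;
        final long timestamp;
        final boolean responsive;

        Replica(String value, long timestamp, boolean responsive) {
            this.value = value;
            this.timestamp = timestamp;
            this.responsive = responsive;
        }
    }

    /** Number of replica responses needed before answering the client. */
    static int requiredResponses(Level level, int replicaCount) {
        switch (level) {
            case ONE: return 1;
            case QUORUM: return replicaCount / 2 + 1; // strict majority
            default: return replicaCount;             // ALL
        }
    }

    /** Returns the newest value seen once enough replicas responded; fails otherwise. */
    static String read(List<Replica> replicas, Level level) {
        int needed = requiredResponses(level, replicas.size());
        int responded = 0;
        Replica newest = null;
        for (Replica replica : replicas) {
            if (!replica.responsive) {
                continue; // an unresponsive node contributes nothing
            }
            responded++;
            if (newest == null || replica.timestamp > newest.timestamp) {
                newest = replica;
            }
            if (responded == needed) {
                return newest.value;
            }
        }
        throw new IllegalStateException("only " + responded + " of " + needed + " required replicas responded");
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("qty=5", 100L, true),   // stale copy, answers first
                new Replica("qty=4", 200L, true),   // newer copy
                new Replica("qty=4", 200L, false)); // node is down
        System.out.println("ONE:    " + read(replicas, Level.ONE));
        System.out.println("QUORUM: " + read(replicas, Level.QUORUM));
        try {
            read(replicas, Level.ALL);
        } catch (IllegalStateException e) {
            System.out.println("ALL failed: " + e.getMessage());
        }
    }
}
```

In this made-up scenario ONE can return a stale value, QUORUM sees the newer timestamp, and ALL fails because one node is down, which is the availability-versus-consistency trade the levels let you tune per request.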

CONSISTENT HASHING  Partition using consistent hashing  Keys hash to a point on a fixed circular space  The ring is partitioned into a set of ordered slots, and servers and keys are hashed over these slots  Nodes take positions on the circle.  A, B, and D exist.  B is responsible for the AB range.  D is responsible for the BD range.  A is responsible for the DA range.  C joins.  B, D split ranges.  C gets BC from D.
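
The ring behavior above can be sketched with a sorted map. This is a minimal illustration, not Cassandra's partitioner: the class `HashRing`, the use of MD5 to place labels on the ring, and the sample keys are all assumptions made for this example. Each key belongs to the first node at or after its position, wrapping around at the end of the ring.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Ring position -> node name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Maps a label (node name or key) to a point on the ring using the first 8 bytes of its MD5 digest. */
    static long position(String label) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(label.getBytes(StandardCharsets.UTF_8));
            long pos = 0;
            for (int i = 0; i < 8; i++) {
                pos = (pos << 8) | (digest[i] & 0xFF);
            }
            return pos;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String name) {
        ring.put(position(name), name);
    }

    /** The owner is the first node at or after the key's position, wrapping to the first node. */
    String ownerOf(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("A");
        ring.addNode("B");
        ring.addNode("D");

        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            before.put("item-" + i, ring.ownerOf("item-" + i));
        }

        ring.addNode("C"); // C joins the ring

        int moved = 0;
        for (Map.Entry<String, String> e : before.entrySet()) {
            if (!ring.ownerOf(e.getKey()).equals(e.getValue())) {
                moved++;
            }
        }
        System.out.println(moved + " of 1000 keys moved, all of them to C");
    }
}
```

When C joins, only the keys in the range C takes over change owner, and they all come from a single neighbor; every other key stays where it was. That limited movement is the reason for hashing onto a ring rather than using `hash(key) % nodeCount`.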

CODE EXAMPLES: CASSANDRA GET OPERATION
    try {
        // Borrow a pooled client connection.
        cassandraClient = cassandraClientPool.borrowClient();
        // keyspace is Acme
        Keyspace keyspace = cassandraClient.getKeyspace(getKeyspace());
        // inventoryType is Rockets
        List<Column> result = keyspace.getSlice(Long.toString(inventoryId),
                new ColumnParent(inventoryType), getSlicePredicate());
        inventoryItem.setInventoryItemId(inventoryId);
        inventoryItem.setInventoryType(inventoryType);
        // Copy the returned columns into the inventory item.
        loadInventory(inventoryItem, result);
    } catch (Exception exception) {
        logger.error("An Exception occurred retrieving an inventory item", exception);
    } finally {
        // Always return the client to the pool.
        try {
            cassandraClientPool.releaseClient(cassandraClient);
        } catch (Exception exception) {
            logger.warn("An Exception occurred returning a Cassandra client to the pool", exception);
        }
    }

CODE EXAMPLES: CASSANDRA UPDATE OPERATION
    try {
        cassandraClient = cassandraClientPool.borrowClient();
        // Column family name -> columns to write for this row.
        Map<String, List<ColumnOrSuperColumn>> data = new HashMap<String, List<ColumnOrSuperColumn>>();
        List<ColumnOrSuperColumn> columns = new ArrayList<ColumnOrSuperColumn>();
        // Create the inventoryId column.
        ColumnOrSuperColumn column = new ColumnOrSuperColumn();
        columns.add(column.setColumn(new Column("inventoryItemId".getBytes("utf-8"),
                Long.toString(inventoryItem.getInventoryItemId()).getBytes("utf-8"), timestamp)));
        // Create the inventoryType column.
        column = new ColumnOrSuperColumn();
        columns.add(column.setColumn(new Column("inventoryType".getBytes("utf-8"),
                inventoryItem.getInventoryType().getBytes("utf-8"), timestamp)));
        ….
        data.put(inventoryItem.getInventoryType(), columns);
        // Write all columns for the row key in one call.
        cassandraClient.getCassandra().batch_insert(getKeyspace(),
                Long.toString(inventoryItem.getInventoryItemId()), data, ConsistencyLevel.ANY);
    } catch (Exception exception) {
        …
    }

SOME STATISTICS  Facebook Search  MySQL, > 50 GB data: writes average ~300 ms, reads average ~350 ms  Rewritten with Cassandra, > 50 GB data: writes average 0.12 ms, reads average 15 ms

SOME THINGS TO THINK ABOUT  You would have to build your own object-relational mapping to work with NoSQL, although some plugins may already exist.  The same goes for Java/C#: there is no Hibernate-like framework.  A simple Java Data Object framework does exist.  Support does exist for languages like Ruby.

SOME MORE THINGS TO THINK ABOUT  Troubleshooting performance problems  Concurrency on non-key accesses  Are the replicas working?  No TOAD for Cassandra, though some NoSQL offerings have GUI tools  SQLPlus-like capabilities are available through the Ruby IRB interpreter.

DON’T FORGET ABOUT THE DBA  It does not matter if the data is deployed on a NoSQL platform instead of an RDBMS.  Still need to address:  Backups & recovery  Capacity planning  Performance monitoring  Data integration  Tuning & optimization  What happens when things don’t work as expected and nodes are out of sync or you have a data corruption occurring at 2am?

WHERE WOULD I USE IT?  Most of us will work in corporate IT.  Where would I use a NoSQL database?  Do you have a large set of uncontrolled, unstructured data somewhere that you are trying to fit into an RDBMS?  Log analysis  Social networking feeds (many firms are hooked in through Facebook or Twitter)  Data that is not easily analyzed in an RDBMS, such as time-based data  Large data feeds that need to be massaged before entry into an RDBMS

SUMMARY  Leading users of NoSQL datastores are social networking sites such as Twitter, Facebook, LinkedIn, and Reddit.  To implement a single feature in Cassandra, Facebook has a dataset that is in the terabytes, with billions of columns.  Even so, not every problem is a NoSQL fix and not every solution is a SQL statement.

QUESTIONS

RESOURCES  Cassandra  NoSQL news websites  High Scalability