NoSQL Databases: MongoDB vs Cassandra

Introduction
- What is a database? “… a repository with organized and structured data, …” (Abramova & Bernardino, 2013-07)
- Data can be accessed using a DBMS (database management system).
- What is a DBMS? “DBMS can be defined as a collection of mechanisms that enables storage, edit and extraction of data” (Abramova & Bernardino, 2013-07)

SQL
- SQL: Structured Query Language
- Became the standard for data interaction and data manipulation.
- Data is stored as a set of tables.
- Data from different tables can be accessed at the same time.

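NoSQL
- Carlo Strozzi introduced the name NoSQL in 1998 for an open source database that did not use an SQL interface.
- Strozzi pronounced it “noseequel” and suggested that today's non-relational movement would be better named “NoREL”.
- Principal difference: Strozzi's database was still relational; it only dropped the SQL interface.
- The term became popular after a conference held in San Francisco in 2009.
- Why do we need NoSQL? In SQL databases, the efficiency of information extraction degrades as the amount of data stored and used grows.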

CAP theorem
- The CAP theorem defines three guarantees:
  - Consistency
  - Availability
  - Partition tolerance
- Both relational and NoSQL principles are derived from the CAP theorem.

ACID
- “ACID is a principle based on CAP theorem and used as set of rules for relational database transactions.” (Abramova & Bernardino, 2013-07)
- ACID guarantees:
  - Atomic
  - Consistent
  - Isolated
  - Durable
- What if the amount of data is large? ACID may be hard to accomplish!
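The atomicity guarantee can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in relational database; the `accounts` table, the `transfer` function and the sample balances are assumptions made for this illustration and do not come from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising here undoes both updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return all balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

A transfer that would overdraw the source account leaves both rows exactly as they were, which is the "all or nothing" behaviour the Atomic guarantee describes.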

BASE Principle & NoSQL
- BASE principle:
  - Basically Available
  - Soft state
  - Eventually consistent
- BASE still follows the CAP theorem.
- If the system is distributed, two of the three CAP guarantees should be selected.
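As a rough illustration of "eventually consistent", the toy store below acknowledges a write after it reaches one replica and lets the others catch up later. The class names, the `sync` step and the three-replica default are assumptions for this sketch, not the mechanism of any particular database.

```python
class Replica:
    """One copy of the data."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Writes land on replica 0 first; other replicas are updated on sync()."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # The write is acknowledged as soon as one replica has it.
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # A read may hit a replica that has not seen the latest write yet.
        return self.replicas[replica].data.get(key)

    def sync(self):
        # Apply queued updates in order; afterwards all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()
```

Between `write` and `sync`, a read from a lagging replica returns stale data; after `sync`, every replica returns the same value.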

Types of NoSQL Databases
- More than 150 different NoSQL databases exist.
- They are based on the same principles but differ in some characteristics.
- Categories:
  - Key-value store
  - Document store
  - Column family
  - Graph database

Key-value store
- Data is stored as a group of key-value pairs.
- All keys are unique.
- Data is accessed by relating keys to their values.
- A hash holds all keys so that values can be looked up when needed.
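A minimal sketch of the idea, assuming an in-memory store backed by a Python dict (which is itself a hash table). The class and method names are illustrative only.

```python
class KeyValueStore:
    """In-memory key-value store; each unique key maps to exactly one value."""
    def __init__(self):
        self._table = {}  # hash table holding every key

    def put(self, key, value):
        # Writing to an existing key replaces its value, so keys stay unique.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)
```

The only access path is the key: there is no way to ask for "all entries whose value looks like X" without scanning everything.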

Document Store
- Databases are defined as sets of key-value pairs that are grouped into documents.
- Each document is identified by a unique key.
- Data can be accessed using:
  - the key
  - a specific value
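The two access paths, by key and by a specific value, can be sketched as follows. The `DocumentStore` class and the sample fields are assumptions for illustration, not the API of a real document database.

```python
class DocumentStore:
    """Toy document store: documents are dicts, each stored under a unique key."""
    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        if key in self._docs:
            raise KeyError(f"duplicate key: {key!r}")
        self._docs[key] = dict(document)  # store a copy of the document

    def get(self, key):
        """Access by key."""
        return self._docs.get(key)

    def find(self, field, value):
        """Access by a specific value: every document whose field equals value."""
        return [doc for doc in self._docs.values() if doc.get(field) == value]
```

Unlike a plain key-value store, the store can look inside the value, so documents with different sets of fields can live side by side and still be queried.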

Column Family
- Similar to the relational database model.
- Structure:
  - Super-column
  - Column family
- The structure of the database is defined by super-columns and column families.
- Data is accessed by specifying the column family, key and column to get a value, using the following structure: <columnFamily>.<key>.<column> = <value>
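The access structure <columnFamily>.<key>.<column> = <value> maps naturally onto nested dictionaries. The sketch below assumes that simplification and leaves out super-columns; the names are illustrative only.

```python
from collections import defaultdict

class ColumnFamilyStore:
    """Toy column-family store addressed as family -> row key -> column -> value."""
    def __init__(self):
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, key, column, value):
        # <columnFamily>.<key>.<column> = <value>
        self._families[family][key][column] = value

    def get(self, family, key, column):
        # .get() is used at every level so that reads never create empty entries.
        return self._families.get(family, {}).get(key, {}).get(column)

    def row(self, family, key):
        """All columns stored for one row key; rows may have different columns."""
        return dict(self._families.get(family, {}).get(key, {}))
```

Each row holds only the columns actually written to it, which is the main departure from a fixed relational schema.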

Graph database
- Used when data can be represented as a graph, for example, social networks.
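For the social network example, a sketch with people as nodes and friendships as undirected edges. The `SocialGraph` class is hypothetical; a real graph database would add persistence and a query language on top of this idea.

```python
from collections import defaultdict

class SocialGraph:
    """Undirected graph stored as adjacency sets."""
    def __init__(self):
        self._edges = defaultdict(set)

    def add_friendship(self, a, b):
        self._edges[a].add(b)
        self._edges[b].add(a)

    def friends(self, person):
        return set(self._edges.get(person, ()))

    def friends_of_friends(self, person):
        """People two hops away who are not already direct friends."""
        direct = self.friends(person)
        two_hops = set()
        for friend in direct:
            two_hops |= self.friends(friend)
        return two_hops - direct - {person}
```

A query such as "friends of friends" is a short traversal here, whereas in a table-based model it would typically require joining the friendship table with itself.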

MONGODB
- “MongoDB is an open source NoSQL database developed in C++” (Abramova & Bernardino, 2013-07).
- MongoDB is a document store database.
- Documents are gathered into groups (collections) according to their structure.
- CAP theorem guarantees chosen:
  - Consistency
  - Partition tolerance

MONGODB (Cont.)
- Description
  - Data is flushed to disk every 60 seconds.
  - Everything is also flushed to disk once new files are created.
  - Each document is identified by an “_id” field.
  - An index on the “_id” field is created.
- Characteristics
  - Durability
  - Concurrency

MongoDB Characteristics
- Durability
  - Durability of data is accomplished by creating replicas.
  - Master-slave technique:
    - Master: read & write
    - Slave: read
  - The slave with the most recent data becomes master if the master goes down.
  - Replication is asynchronous.
- Concurrency
  - Handled with locks.
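The failover rule on this slide (the slave with the most recent data takes over) can be sketched with a toy model. This is not MongoDB's actual election protocol; the `Node` and `ReplicaSet` classes, and the use of log length as the measure of "most recent", are assumptions for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []      # ordered list of writes this node has applied
        self.alive = True

class ReplicaSet:
    """Toy master-slave group with asynchronous replication."""
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.master = self.nodes[0]

    def write(self, value):
        # Only the master accepts writes; slaves copy them later.
        self.master.log.append(value)

    def replicate(self, slave_name, upto=None):
        # Asynchronous copy: a slave may receive only a prefix of the master log.
        slave = next(n for n in self.nodes if n.name == slave_name)
        end = len(self.master.log) if upto is None else upto
        slave.log = self.master.log[:end]

    def fail_master(self):
        # Promote the surviving node holding the most replicated writes.
        self.master.alive = False
        candidates = [n for n in self.nodes if n.alive]
        self.master = max(candidates, key=lambda n: len(n.log))
        return self.master.name
```

Because replication is asynchronous, writes that no slave had copied when the master failed are missing from the new master's log, which is the trade-off behind this design.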

CASSANDRA
- “Cassandra is a NoSQL database developed by Apache Software Foundation; written in Java” (Abramova & Bernardino, 2013-07)
- Similar to the usual relational model; the difference is that stored data can be semi-structured or unstructured.
- CAP theorem guarantees chosen:
  - Partition tolerance
  - High availability
- Designed to store large amounts of data and to handle huge volumes efficiently.

CASSANDRA (Cont.)
- Peer-to-peer architecture (no master)
  - High availability
  - High scalability
- Replicates data over multiple nodes in a cluster.
- Replication factor (RF): total number of replicas.
  - RF(1): 1 copy of each row, on 1 node
  - RF(2): 2 copies of each row, on 2 nodes
- Failed nodes are detected using “gossip” protocols and replaced with no downtime.
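To make the replication factor concrete, the sketch below places each row on RF consecutive nodes of a ring, starting from a position derived from a hash of the row key. It is a simplification in the spirit of single data center placement, not Cassandra's partitioner; the function name and node names are assumptions.

```python
import hashlib

def replicas_for(key, nodes, replication_factor):
    """Return the nodes that hold copies of `key` (toy ring placement)."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    # Hash the key to pick the first replica deterministically.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest, "big") % len(nodes)
    # Further replicas go on the next nodes clockwise around the ring.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

# Hypothetical four-node cluster.
nodes = ["node1", "node2", "node3", "node4"]
rf1 = replicas_for("row-42", nodes, 1)  # RF(1): one copy on one node
rf2 = replicas_for("row-42", nodes, 2)  # RF(2): two copies on two nodes
```

With no master, any node can compute the same placement from the key alone, so every peer knows where a row lives.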

CASSANDRA (Cont.)
- Replication strategy:
  - Simple: single data center
  - Network Topology: multiple data centers
- Cassandra characteristics:
  - Durability:
    - Two replication types: synchronous and asynchronous.
    - Writes and their redundant copies are tracked using a commit log.
  - Indexing: “Each node maintains the indexes of the table it manages”
- Data is manipulated using CQL (Cassandra Query Language).

YCSB
- “The YCSB – Yahoo! Cloud Serving Benchmark is one of the most used benchmarks to test NoSQL databases” (Abramova & Bernardino, 2013-07).
- The YCSB client consists of two parts:
  - Workload generator
  - Set of workloads
- Workloads are combinations of read, write and update operations, performed on randomly chosen records.

YCSB (Cont.)
- The predefined workloads are:
  - Workload A: update heavy
  - Workload B: read mostly
  - Workload C: read only
  - Workload D: read latest
  - Workload E: short ranges
  - Workload F: read, modify and write
  - Workload G: update mostly
  - Workload H: update only
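A workload generator can be sketched as a weighted random choice of operation plus a uniformly chosen record. The code below is not YCSB itself; it only reproduces the read/update mixes that the slides state for workloads A, B, C, G and H, and the function name, seed and record count are assumptions.

```python
import random

# Read/update proportions as given on the per-workload slides.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00, "update": 0.00},
    "G": {"read": 0.05, "update": 0.95},
    "H": {"read": 0.00, "update": 1.00},
}

def generate_operations(workload, n_ops, n_records, seed=0):
    """Return a list of (operation, record_id) pairs for one workload."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    mix = WORKLOADS[workload]
    ops = rng.choices(list(mix), weights=list(mix.values()), k=n_ops)
    # Each operation targets a randomly chosen record.
    return [(op, rng.randrange(n_records)) for op in ops]
```

A benchmark client would then execute each pair against the database under test and record the execution time.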

Workload A: 50% reads & 50% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 19

Workload B: 95% reads & 5% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Workload C: 100% reads Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Workload D: read latest workload

Workload E: Short Ranges

Workload F: Read-Modify-Write Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Workload G: 5% reads & 95% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Workload H: 100% updates Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 21