A free and open-source distributed NoSQL database

Slides:



Advertisements
Similar presentations
CASSANDRA-A Decentralized Structured Storage System Presented By Sadhana Kuthuru.
Advertisements

Large Scale Computing Systems
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
NoSQL Databases: MongoDB vs Cassandra
 Need for a new processing platform (BigData)  Origin of Hadoop  What is Hadoop & what it is not ?  Hadoop architecture  Hadoop components (Common/HDFS/MapReduce)
NoSQL Database.
Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.
Cloud Storage: All your data belongs to us! Theo Benson This slide includes images from the Megastore and the Cassandra papers/conference slides.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
A Study in NoSQL & Distributed Database Systems John Hawkins.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Distributed Data Stores and No SQL Databases S. Sudarshan IIT Bombay.
Databases with Scalable capabilities Presented by Mike Trischetta.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
Zhang Gang Big data High scalability One time write, multi times read …….(to be add )
Distributed Data Stores and No SQL Databases S. Sudarshan Perry Hoekstra (Perficient) with slides pinched from various sources such as Perry Hoekstra (Perficient)
Modern Databases NoSQL and NewSQL Willem Visser RW334.
High Throughput Computing on P2P Networks Carlos Pérez Miguel
LOGO Discussion Zhang Gang 2012/11/8. Discussion Progress on HBase 1 Cassandra or HBase 2.
Apache Cassandra - Distributed Database Management System Presented by Jayesh Kawli.
The Lightning Way XIV Encontro da comunidade SQLPort LX
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
NoSQL Databases Oracle - Berkeley DB Rasanjalee DM Smriti J CSC 8711 Instructor: Dr. Raj Sunderraman.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NOSQL DATABASE Not Only SQL DATABASE
HDB++: High Availability with
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Introduction to NoSQL Databases Chyngyz Omurov Osman Tursun Ceng,Middle East Technical University.
Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.
BIG DATA/ Hadoop Interview Questions.
Amirhossein Saberi May CASSANDRA NAME A daughter of the Trojan king Priam, who was given the gift of prophecy by Apollo. When she cheated him, however,
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
NoSQL: Graph Databases
Neo4j: GRAPH DATABASE 27 March, 2017
Plan for Final Lecture What you may expect to be asked in the Exam?
CS 405G: Introduction to Database Systems
NO SQL for SQL DBA Dilip Nayak & Dan Hess.
NoSQL: Graph Databases
and Big Data Storage Systems
Cloud Computing and Architecuture
Hadoop Aakash Kag What Why How 1.
Introduction to Distributed Platforms
How did it start? • At Google • • • • Lots of semi structured data
CS122B: Projects in Databases and Web Applications Winter 2017
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
NoSQL Database and Application
Azure Cosmos DB Venitta J Microsoft Connect /6/2018 4:36 PM
Modern Databases NoSQL and NewSQL
NOSQL.
The NoSQL Column Store used by Facebook
Dineesha Suraweera.
Christian Stark and Odbayar Badamjav
Introduction to NewSQL
NOSQL databases and Big Data Storage Systems
A Comparison of SQL and NoSQL Databases
NoSQL Systems Overview (as of November 2011).
Storage Systems for Managing Voluminous Data
Massively Parallel Cloud Data Storage Systems
1 Demand of your DB is changing Presented By: Ashwani Kumar
NoSQL Databases An Overview
NoSQL Databases Antonino Virgillito.
Overview of big data tools
NoSQL W2013 CSCI 2141.
CSE 482 Lecture 5: NoSQL.
Transaction Properties: ACID vs. BASE
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Presentation transcript:

A free and open-source distributed NoSQL database Computer Clusters and Grids Prof. Andrey Shevel A free and open-source distributed NoSQL database Carl Kugblenu ITMO University – St. Petersbury

What is NoSQL? Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins -> NoSQL was a term coined by Eric Evans. He states that ‘… but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. … -> This is why people are continually interpreting nosql to be anti-RDBMS, it's the only rational conclusion when the only thing some of these projects share in common is that they are not relational databases.’ -> Emil Elfrem stated it is not a ‘NO’ SQL but more of a ‘Not Only” SQL.

Distributed Database

Eventual Consistency When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID -> http://en.wikipedia.org/wiki/Eventual_consistency -> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html The types of large systems based on CAP aren't ACID they are BASE (http://queue.acm.org/detail.cfm?id=1394128): * Basically Available - system seems to work all the time * Soft State - it doesn't have to be consistent all the time * Eventually Consistent - becomes consistent at some later time Everyone who builds big applications builds them on CAP and BASE: Google, Yahoo, Facebook, Amazon, eBay, etc

Cassandra Originally developed at Facebook Follows the BigTable data model: column-oriented Uses the Dynamo Eventual Consistency model Written in Java Open-sourced and exists within the Apache family

Why Use Cassandra

Features Fault Tolerant Architecture: Cassandra has no single point of failure. Every node in the cluster is identical and can run on commodity hardware. Performant: Cassandra supports very high throughput. In particular the Cassandra write path allows for high performance, robust writes. Resilience: Data is automatically replicated to multiple nodes Scalable: Cassandra is horizontally scalable. Flexible data storage: Cassandra accommodates all possible data formats including: structured, semi-structured, and unstructured.

Architecture presentation title

Cassandra: Requests Flow presentation title

Internode Communication Cassandra uses Gossip communication protocol in which nodes periodically exchange state information about themselves and other nodes they know about Gossip messages are versioned During a gossip exchange, older information is overwritten with the most current state for a particular node

Data Model

Advantages Write Speeds Multi-DC Replication Tunable Consistency

Drawbacks No Ad-Hoc Queries No Aggregations Unpredictable Performance JVM Based

Use Cases Immutable data where updates and deletions are the exception rather than the rule. High-throughput firehoses of immutable events such as personalization, fraud detection, time series and IOT/sensors.

Demo Deployment

References http://cassandra.apache.org/ http://www.datastax.com/dev/blog/multi-datacenter-replication https://www.edureka.co/blog/apache-cassandra-advantages/ https://en.wikipedia.org/wiki/Apache_Cassandra https://www.digitalocean.com/community/tutorials/how-to-install-cassandra-and-run-a-single-node-cluster-on-ubuntu-14-04 https://opencredo.com/fulfilling-promise-apache-cassandra/ https://www.quora.com/What-are-the-pros-and-cons-of-using-the-Cassandra-database