Shujaat Hussain. A single column A single row.

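CAP properties (a distributed system cannot guarantee all three at once when the network partitions):
- Consistency: every read sees the most recent write, so the system is in a consistent state after each operation
- Availability: the system is "always on", with no downtime
- Partition tolerance: the system keeps functioning even when a network disruption splits it into disconnected subsets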

Latency comparison:
- MySQL: 300 ms per write, 350 ms per read
- Cassandra: 0.12 ms per write, 15 ms per read
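Taking the slide's figures at face value (the workload and data size behind them are not stated), the speedups work out as:

$$\frac{300\ \text{ms}}{0.12\ \text{ms}} = 2500 \quad \text{(writes)}, \qquad \frac{350\ \text{ms}}{15\ \text{ms}} \approx 23.3 \quad \text{(reads)}$$

So the gap is about two orders of magnitude larger for writes than for reads, which fits Cassandra's write path: a write is appended to a commit log and an in-memory table without first reading existing data.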

Reads are addressed by row key and column name rather than by column value:
- You need a key or keys:
  - Single: key='a'
  - Range: key='a' through 'f'
- And the columns to retrieve:
  - Slice: cols={bar through kite}
  - By name: key='b' cols={bar, cat, llama}
- There is nothing like SQL's "WHERE col='faz'"
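A minimal sketch of this query model, assuming rows are a map from row key to a map of column name to value, with keys and column names compared in sorted order. `ToyColumnFamily` and its methods are hypothetical names for illustration, not Cassandra's client API; in a real cluster, key ranges follow the partitioner's ordering, so a range like 'a' through 'f' only means alphabetical order under an order-preserving partitioner.

```python
from bisect import bisect_left, bisect_right


class ToyColumnFamily:
    """Hypothetical in-memory model of the key/column query model (not Cassandra's API).

    Each row key maps to a dict of column name -> value. Row keys and column
    names are compared in sorted order, which is what makes ranges and slices work.
    """

    def __init__(self):
        self.rows = {}

    def insert(self, key, column, value):
        self.rows.setdefault(key, {})[column] = value

    def _slice(self, row, start, finish):
        # Slice: every column whose name falls in [start, finish], in sorted order.
        names = sorted(row)
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return {name: row[name] for name in names[lo:hi]}

    def get_slice(self, key, start, finish):
        # Single key plus a column slice.
        return self._slice(self.rows.get(key, {}), start, finish)

    def get_by_name(self, key, names):
        # By name: only the listed columns that actually exist in the row.
        row = self.rows.get(key, {})
        return {name: row[name] for name in names if name in row}

    def get_range(self, start_key, end_key, start, finish):
        # Key range: every row whose key falls in [start_key, end_key],
        # each one restricted to the same column slice.
        return {
            key: self._slice(self.rows[key], start, finish)
            for key in sorted(self.rows)
            if start_key <= key <= end_key
        }


cf = ToyColumnFamily()
cf.insert("a", "bar", 1)
cf.insert("a", "kite", 2)
cf.insert("a", "zebra", 3)
cf.insert("b", "cat", 4)
cf.insert("b", "llama", 5)
cf.insert("g", "bar", 6)

print(cf.get_slice("a", "bar", "kite"))              # {'bar': 1, 'kite': 2}
print(cf.get_by_name("b", ["bar", "cat", "llama"]))  # {'cat': 4, 'llama': 5}
print(cf.get_range("a", "f", "bar", "kite"))         # {'a': {'bar': 1, 'kite': 2}, 'b': {'cat': 4}}
```

Every lookup starts from a key (or key range) and then narrows by column name. Answering "WHERE col='faz'" in this model would mean scanning every column of every row, which is why the query model has no equivalent.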

Digg is a social news site that lets people discover and share content from anywhere on the Internet by submitting stories and links, and by voting on and commenting on the submitted stories and links.

- Problems:
  - Terabytes of data; high transaction rate (read-dominated)
  - Multiple clusters
  - Management nightmare (high effort, error-prone)
  - Availability requirements not met (geographic isolation)
- Solution:
  - Cassandra as the primary data store
  - Datacenter- and rack-aware replication
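Rack-aware replication means the copies of a row are placed on nodes in different racks, so losing one rack (power, switch) does not lose every copy. Below is a simplified sketch of that idea on a token ring, assuming an MD5-based token function and a hypothetical six-node, three-rack cluster; it is not Cassandra's exact placement strategy, and datacenter awareness (applying the same spreading one level up) is left out.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hypothetical partitioner: the MD5 digest of the key read as an integer,
    # similar in spirit to Cassandra's old RandomPartitioner.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def place_replicas(key, ring, replication_factor):
    """Pick replica nodes for `key` on a token ring, spreading them over racks.

    `ring` is a list of (token, node, rack) tuples sorted by token.
    Simplified sketch; not Cassandra's exact placement strategy.
    """
    tokens = [t for t, _, _ in ring]
    # The walk starts at the first node whose token is >= the key's token,
    # wrapping around to the start of the ring.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    replicas, racks_used = [], set()
    # Pass 1: walk clockwise, taking at most one node per rack.
    for _, node, rack in walk:
        if len(replicas) == replication_factor:
            break
        if rack not in racks_used:
            replicas.append(node)
            racks_used.add(rack)
    # Pass 2: if there are fewer racks than replicas, fill up in ring order.
    for _, node, _ in walk:
        if len(replicas) == replication_factor:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


# Six hypothetical nodes on three racks, with evenly spaced tokens.
step = 2**128 // 6
ring = [
    (0 * step, "n1", "rack1"),
    (1 * step, "n2", "rack1"),
    (2 * step, "n3", "rack2"),
    (3 * step, "n4", "rack2"),
    (4 * step, "n5", "rack3"),
    (5 * step, "n6", "rack3"),
]
print(place_replicas("user42", ring, 3))  # three nodes, one from each rack
```

Walking the ring in order keeps placement deterministic (every node computes the same replica set for a key), while the rack check trades strict ring adjacency for fault isolation.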

- Twitter is a social networking and microblogging service that lets its users send and read tweets, text-based posts of up to 140 characters (the limit at the time).
- Terabytes of data, roughly 1,000,000 operations per second

Facebook Inbox Search:
- 100 TB of data
- 160 nodes
- Half a billion writes per day (a figure that may be about two years old)
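To put the Inbox Search figures in per-second and per-node terms, assuming the writes were spread evenly over the day and the data evenly over the nodes (peaks are certainly higher, and replication is ignored):

$$\frac{5 \times 10^{8}\ \text{writes/day}}{86\,400\ \text{s/day}} \approx 5.8 \times 10^{3}\ \text{writes/s}, \qquad \frac{5.8 \times 10^{3}\ \text{writes/s}}{160\ \text{nodes}} \approx 36\ \text{writes/s per node}, \qquad \frac{100\ \text{TB}}{160\ \text{nodes}} = 0.625\ \text{TB per node}$$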

Advantages:
- Massive scalability
- High availability
- Lower cost than competing solutions at the same scale
- Elasticity that is usually predictable
- Flexible schema, suited to sparse and semi-structured data
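Much of the scalability and predictable elasticity comes from placing rows on a hash ring (consistent hashing): adding a node takes over only part of one neighbor's range instead of reshuffling everything. A minimal sketch, assuming one MD5-derived token per node (a hypothetical simplification; real clusters assign tokens explicitly or use many virtual nodes per server), compared against naive `hash % number_of_servers` placement:

```python
import bisect
import hashlib


def h(value: str) -> int:
    # MD5-based token, standing in for a random partitioner's hash (illustrative).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes):
    # One token per node, derived from the node name (hypothetical simplification).
    token_to_node = {h(node): node for node in nodes}
    return sorted(token_to_node), token_to_node


def owner(key, ring_tokens, token_to_node):
    # A key belongs to the first node whose token is >= the key's token,
    # wrapping around to the start of the ring.
    i = bisect.bisect_left(ring_tokens, h(key)) % len(ring_tokens)
    return token_to_node[ring_tokens[i]]


keys = [f"user{i}" for i in range(10_000)]
before = build_ring(["node1", "node2", "node3", "node4"])
after = build_ring(["node1", "node2", "node3", "node4", "node5"])

moved = [k for k in keys if owner(k, *before) != owner(k, *after)]
# With naive modulo placement, going from 4 to 5 servers moves roughly 80% of keys.
moved_mod = [k for k in keys if h(k) % 4 != h(k) % 5]

print(f"consistent hashing moved {len(moved) / len(keys):.1%} of keys")
print(f"modulo placement moved {len(moved_mod) / len(keys):.1%} of keys")
# Every key that moved went to the new node; nothing shuffled between old nodes.
print(all(owner(k, *after) == "node5" for k in moved))  # True
```

The exact share that moves under consistent hashing depends on where the new node's token lands, which is one reason elasticity is only "usually" predictable; virtual nodes smooth this out.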

Disadvantages:
- Limited query capabilities (so far)
- Eventual consistency is not intuitive to program for and makes client applications more complicated
- No standardization, so portability may be an issue
- Insufficient access control
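One reason eventual consistency complicates client code is that each request decides how many replicas must answer (Cassandra's consistency levels, such as ONE and QUORUM). With N replicas, W replicas acknowledging a write and R replicas answering a read, a read is only guaranteed to include the newest value when R + W > N, because then every read set must overlap every write set. The brute-force check below is a simplified model that ignores timestamps, failures, and read repair; the function name is hypothetical.

```python
from itertools import combinations


def read_sees_latest(n: int, w: int, r: int) -> bool:
    """Brute force over a simplified model: a write is acknowledged by some W of the
    N replicas, and a later read is answered by some R of them. The read is
    guaranteed to see the newest value only if every possible read set overlaps
    every possible write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # non-empty overlap is truthy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


N = 3  # replication factor; with N = 3, QUORUM means 2 replicas
for w, r in [(1, 1), (2, 1), (2, 2), (3, 1), (1, 3)]:
    print(f"W={w} R={r}  R+W>N: {r + w > N}  always fresh: {read_sees_latest(N, w, r)}")
```

With N = 3, writing and reading at ONE (W = R = 1) can return stale data, while QUORUM writes plus QUORUM reads (W = R = 2) always overlap. The application has to choose, per request, between lower latency and that guarantee.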