Cassandra Database Project Alireza Haghdoost, Jake Moroshek Computer Science and Engineering University of Minnesota-Twin Cities Nov. 17, 2011 News Presentation:

Presentation transcript:

Cassandra Database Project
Alireza Haghdoost, Jake Moroshek
Computer Science and Engineering, University of Minnesota-Twin Cities
Nov. 17, 2011
News Presentation: Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011.

What was the Problem?
- Facebook Messages Inbox Search
  - A feature that lets users search through their Facebook Inbox
- Millions of messages are sent every day on Facebook
- Messages are stored in different data centers
- How can all of this information be indexed for Inbox Search?

What is Cassandra?
- Distributed storage system
- Designed as a kind of NoSQL database
  - NoSQL: key-value, schema-less database
- Scales to a very large size across many servers spread over different datacenters
  - At that scale, small and large components fail continuously
- No single point of failure
  - Data is replicated at several nodes

Cassandra Goals
- High scalability
  - The ability to scale incrementally
- High performance
  - The ability to respond quickly
- High availability
  - The ability to keep data available to users

Cassandra Data Model
- Cassandra does not support a full relational data model
- Key-value data model
- Every row is identified by a unique key
- Every row can hold a very large number of columns, grouped into column families
  - The new release can pack up to two billion columns into a row
- Columns within a row are sorted in one of two ways
  - By name
  - By time (required for inbox search)
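The data model above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not Cassandra's API: the names `Table`, `Row`, `put`, and `columns` are hypothetical, and time ordering is approximated by using a timestamp as the column name so that sorting by name yields time order.

```python
class Row:
    """One row: a unique key plus column families (hypothetical sketch)."""

    def __init__(self, key):
        self.key = key
        # column family name -> {column name: value}
        self.families = {}

    def put(self, family, name, value):
        # Writing an existing column name overwrites its value.
        self.families.setdefault(family, {})[name] = value

    def columns(self, family, newest_first=False):
        # Columns are returned sorted by column name. When names are
        # timestamps (an assumption of this sketch), this is time order.
        cols = self.families.get(family, {})
        return sorted(cols.items(), reverse=newest_first)


class Table:
    """Key-value store: row key -> Row. No schema, no joins."""

    def __init__(self):
        self.rows = {}

    def row(self, key):
        if key not in self.rows:
            self.rows[key] = Row(key)
        return self.rows[key]


# Illustrative inbox index: one row per user, one column per message.
inbox = Table()
user = inbox.row("user42")
user.put("messages", 1700000300, "msg-c")
user.put("messages", 1700000100, "msg-a")
user.put("messages", 1700000200, "msg-b")
for timestamp, message_id in user.columns("messages", newest_first=True):
    print(timestamp, message_id)
```

Keeping each user's columns sorted is what makes "most recent messages first" a cheap read in this kind of model: the lookup is one row fetch followed by a scan of already-ordered columns.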

Distribution and Replication
- Data is distributed across the nodes using a consistent hashing function
- High availability is achieved through replication
  - If one storage node fails, the data it held is still available from the other nodes that hold replicas
- Each data item is actively replicated at N nodes, which may span data centers
- Replication policies:
  - Rack Unaware
  - Rack Aware
  - Datacenter Aware
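A minimal sketch of consistent hashing with N replicas may help make the slide concrete. It assumes the simplest placement, in the spirit of the Rack Unaware policy: a key is stored on the first node clockwise from its hash position and on the next N-1 nodes on the ring. The `Ring` class, the `replicas_for` method, the node names, and the choice of SHA-256 are all assumptions made for illustration; they are not taken from Cassandra's code.

```python
import hashlib
from bisect import bisect_right


def ring_position(value):
    """Map a string to a position on the hash ring (SHA-256 is an assumption)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring with N-way replication."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node owns one position (token) on the ring.
        self.tokens = sorted((ring_position(node), node) for node in nodes)

    def replicas_for(self, key):
        """Return the nodes that should hold a copy of `key`."""
        if not self.tokens:
            return []
        positions = [position for position, _ in self.tokens]
        # First node clockwise from the key's position, wrapping around.
        start = bisect_right(positions, ring_position(key)) % len(self.tokens)
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replicas=3)
owners = ring.replicas_for("user42")
print("replicas:", owners)

# Simulate the failure of the first replica: the remaining two replicas
# still hold the key, and only one new node has to receive a copy.
survivors = [node for node in nodes if node != owners[0]]
print("after failure:", Ring(survivors, replicas=3).replicas_for("user42"))
```

The point of the ring is that a node joining or leaving only affects its neighbors' key ranges, which is what allows incremental scaling. Rack Aware and Datacenter Aware policies would additionally constrain which ring successors are chosen; that logic is not modeled here.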

Users of the Cassandra System
- First deployment:
  - 2008 by Facebook, inspired by Google's Bigtable and Amazon's Dynamo
  - Designed for the message inbox search system
  - Stores terabytes of indexes across a cluster of 600+ cores and 120+ TB of disk space
  - Each node can handle over 5,000 requests per second
- Well-known users: (shown as logos on the original slide)

References
- Prashant Malik, “Inbox Search”
- Joab Jackson, “Apache Cassandra Ready for the Enterprise”
- Joab Jackson, “New Cassandra Can Pack Two Billion Columns Into a Row”, PCWorld News, January 2011
- Avinash Lakshman and Prashant Malik, “Cassandra: A Decentralized Structured Storage System”, SIGOPS Oper. Syst. Rev. 44, 2 (April 2010)

Thank You