Cassandra Structured Storage System over a P2P Network Avinash Lakshman, Prashant Malik.

Slides:



Advertisements
Similar presentations
Cassandra and Sigmod contest Cloud computing group Haiping Wang
Advertisements

Dynamo: Amazon’s Highly Available Key-value Store
CASSANDRA-A Decentralized Structured Storage System Presented By Sadhana Kuthuru.
Building a Database on S3 Matthias Brantner, Daniela Florescu, David Graf, Donald Kossmann, Tim Kraska Xiang Zhang
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications.
AMAZON’S KEY-VALUE STORE: DYNAMO DeCandia,Hastorun,Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels: Dynamo: Amazon's highly available.
Amazon’s Dynamo Simple Cloud Storage. Foundations 1970 – E.F. Codd “A Relational Model of Data for Large Shared Data Banks”E.F. Codd –Idea of tabular.
Dynamo: Amazon's Highly Available Key-value Store Distributed Storage Systems CS presented by: Hussam Abu-Libdeh.
Cloud Storage Yizheng Chen. Outline Cassandra Hadoop/HDFS in Cloud Megastore.
Giovanni Chierico | May 2012 | Дубна Consistency in a distributed world.
NoSQL Databases: MongoDB vs Cassandra
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Cassandra Database Project Alireza Haghdoost, Jake Moroshek Computer Science and Engineering University of Minnesota-Twin Cities Nov. 17, 2011 News Presentation:
Scalable Adaptive Data Dissemination Under Heterogeneous Environment Yan Chen, John Kubiatowicz and Ben Zhao UC Berkeley.
Distributed Systems Fall 2011 Gossip and highly available services.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Client-Centric.
Concurrency Control & Caching Consistency Issues and Survey Dingshan He November 18, 2002.
A Decentralized Structure Storage Model - Avinash Lakshman & Prashanth Malik - Presented by Srinidhi Katla CASSANDRA.
Wide-area cooperative storage with CFS
Distributed Databases
Distributed storage for structured data
Amazon’s Dynamo System The material is taken from “Dynamo: Amazon’s Highly Available Key-value Store,” by G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati,
Cloud Storage – A look at Amazon’s Dyanmo A presentation that look’s at Amazon’s Dynamo service (based on a research paper published by Amazon.com) as.
Dynamo: Amazon’s Highly Available Key-value Store Presented By: Devarsh Patel 1CS5204 – Operating Systems.
Cloud Storage: All your data belongs to us! Theo Benson This slide includes images from the Megastore and the Cassandra papers/conference slides.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
Peer-to-Peer in the Datacenter: Amazon Dynamo Aaron Blankstein COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
Dynamo: Amazon’s Highly Available Key-value Store COSC7388 – Advanced Distributed Computing Presented By: Eshwar Rohit
1 The Google File System Reporter: You-Wei Zhang.
CS Storage Systems Lecture 14 Consistency and Availability Tradeoffs.
Consistency And Replication
Replication and Consistency. Reference The Dangers of Replication and a Solution, Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. In Proceedings.
Cassandra-A Decentrilized Structured Storage System Azad Kurdistan University Subject : Cassandra - A Decentralized Structured Storage System Professor.
Cloud Computing Cloud Data Serving Systems Keke Chen.
High Throughput Computing on P2P Networks Carlos Pérez Miguel
Dynamo: Amazon's Highly Available Key-value Store Dr. Yingwu Zhu.
Apache Cassandra - Distributed Database Management System Presented by Jayesh Kawli.
Cassandra - A Decentralized Structured Storage System
Cassandra – A Decentralized Structured Storage System Lecturer : Prof. Kyungbaek Kim Presenter : I Gde Dharma Nugraha.
Google Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Gossiping Steve Ko Computer Sciences and Engineering University at Buffalo.
CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.
Peer to Peer Networks Distributed Hash Tables Chord, Kelips, Dynamo Galen Marchetti, Cornell University.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Partitioning and Replication.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
Keepin it real (time) Cassandra in a real time bidding infrastructure.
Replication (1). Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
GPFS: A Shared-Disk File System for Large Computing Clusters Frank Schmuck & Roger Haskin IBM Almaden Research Center.
Robustness in the Salus scalable block store Yang Wang, Manos Kapritsos, Zuocheng Ren, Prince Mahajan, Jeevitha Kirubanandam, Lorenzo Alvisi, and Mike.
An Introduction to Super-Scalability But first…
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cassandra Architecture.
Big Data Yuan Xue CS 292 Special topics on.
Mobility Victoria Krafft CS /25/05. General Idea People and their machines move around Machines want to share data Networks and machines fail Network.
Distributed Systems CS 425 / ECE 428 Fall 2012
Cassandra - A Decentralized Structured Storage System
P2P: Storage.
Dynamo: Amazon’s Highly Available Key-value Store
The NoSQL Column Store used by Facebook
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT -Sumanth Kandagatla Instructor: Prof. Yanqing Zhang Advanced Operating Systems (CSC 8320)
EECS 498 Introduction to Distributed Systems Fall 2017
EECS 498 Introduction to Distributed Systems Fall 2017
IS 698/800-01: Advanced Distributed Systems Membership Management
Presentation transcript:

Cassandra Structured Storage System over a P2P Network Avinash Lakshman, Prashant Malik

Why Cassandra? Lots of data –Copies of messages, reverse indices of messages, per user data. Many incoming requests resulting in a lot of random reads and random writes. No existing production ready solutions in the market meet these requirements.

Design Goals High availability Eventual consistency –trade-off strong consistency in favor of high availability Incremental scalability Optimistic Replication “Knobs” to tune tradeoffs between consistency, durability and latency Low total cost of ownership Minimal administration

Data Model KEY ColumnFamily1 Name : MailList Type : Simple Sort : Name Name : tid1 Value : TimeStamp : t1 Name : tid2 Value : TimeStamp : t2 Name : tid3 Value : TimeStamp : t3 Name : tid4 Value : TimeStamp : t4 ColumnFamily2 Name : WordList Type : Super Sort : Time Name : aloha C1 V1 T1 C2 V2 T2 C3 V3 T3 C4 V4 T4 Name : dude C2 V2 T2 C6 V6 T6 Column Families are declared upfront Columns are added and modified dynamically SuperColumns are added and modified dynamically Columns are added and modified dynamically

Write Operations A client issues a write request to a random node in the Cassandra cluster. The “Partitioner” determines the nodes responsible for the data. Locally, write operations are logged and then applied to an in-memory version. Commit log is stored on a dedicated disk local to the machine.

Write Properties No locks in the critical path Sequential disk access Behaves like a write back Cache Append support without read ahead Atomicity guarantee for a key per replica “Always Writable” –accept writes during failure scenarios

Read Query Closest replica Cassandra Cluster Replica A Result Replica BReplica C Digest Query Digest Response Result Client Read repair if digests differ

Cluster Membership and Failure Detection Gossip protocol is used for cluster membership. Super lightweight with mathematically provable properties. State disseminated in O(logN) rounds where N is the number of nodes in the cluster. Every T seconds each member increments its heartbeat counter and selects one other member to send its list to. A member merges the list with its own list.

Accrual Failure Detector Valuable for system management, replication, load balancing etc. Defined as a failure detector that outputs a value, PHI, associated with each process. Also known as Adaptive Failure detectors - designed to adapt to changing network conditions. The value output, PHI, represents a suspicion level. Applications set an appropriate threshold, trigger suspicions and perform appropriate actions. In Cassandra the average time taken to detect a failure is seconds with the PHI threshold set at 5.

Properties of the Failure Detector If a process p is faulty, the suspicion level Φ(t)  ∞as t  ∞. If a process p is faulty, there is a time after which Φ(t) is monotonic increasing. A process p is correct  Φ(t) has an ub over an infinite execution. If process p is correct, then for any time T, Φ(t) = 0 for t >= T.

Performance Benchmark Loading of data - limited by network bandwidth. Read performance for Inbox Search in production: Search InteractionsTerm Search Min7.69 ms7.78 ms Median15.69 ms18.27 ms Average26.13 ms44.41 ms

Lessons Learnt Add fancy features only when absolutely required. Many types of failures are possible. Big systems need proper systems-level monitoring. Value simple designs

Questions?