Cassandra - A Decentralized Structured Storage System (Azad Kurdistan University)

Cassandra - A Decentralized Structured Storage System. Azad Kurdistan University. Subject: Cassandra - A Decentralized Structured Storage System. Professor: Dr. Sh. Esmaili. Presenters: Mr. Houshyar Mohammadi Talvar (slides 4 to 17), Miss Hakimi (slides 19 to 27), Mr. Hossien Sadrizadeh (slides 29 to 65). Date: June 6th, 2012 (Thursday, 25th Khordad 1391).

Contents of the Presentation: Abstract, Introduction, Related Work, Data Model, API, System Architecture, Partitioning, Replication, Membership, Bootstrapping, Scaling the Cluster, Local Persistence, Implementation Details, Practical Experiences, Facebook Inbox Search, Conclusion, Acknowledgements.

Mr. Houshyar Mohammadi Talvar: slides 4 to 17.

Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing a highly available service with no single point of failure.

Introduction: Facebook runs the largest social networking platform, which serves hundreds of millions of users at peak times using tens of thousands of servers located in many data centers around the world.

Related Work: Systems like Ficus and Coda replicate files for high availability at the expense of consistency. Update conflicts are typically managed using specialized conflict resolution procedures. Systems covered: Bayou, Coda, Ficus, Dynamo.

Data Model: A table in Cassandra is a distributed multidimensional map indexed by a key. The value is an object which is highly structured. Cassandra exposes two kinds of column families: simple column families and super column families.

Data Model (continued). Example row KEY with two column families:
ColumnFamily1 (Name: MailList, Type: Simple, Sort: Name): columns tid1, tid2, tid3 and tid4, each with a value and a timestamp (t1 to t4).
ColumnFamily2 (Name: WordList, Type: Super, Sort: Time): super column "aloha" holds columns C1 to C4 (values V1 to V4, timestamps T1 to T4); super column "dude" holds columns C2 (V2, T2) and C6 (V6, T6).
Column families are declared upfront; super columns and columns are added and modified dynamically.

Data Model (continued): Any column within a column family is accessed using the convention column family:column, and any column within a column family of type super is accessed using the convention column family:super column:column.
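To make the access conventions concrete, here is a minimal sketch that models one row of the example above as nested Python dictionaries and resolves "cf:column" and "cf:super:column" paths. The dictionary layout, the placeholder values ("v1", "V1", ...) and the `lookup` helper are hypothetical illustrations, not Cassandra's internal classes.

```python
from typing import Any, Dict, Tuple

Cell = Tuple[Any, int]  # (value, timestamp)

# Hypothetical in-memory picture of one row; values are placeholders.
row: Dict[str, Any] = {
    # Simple column family: column name -> cell
    "MailList": {"tid1": ("v1", 1), "tid2": ("v2", 2)},
    # Super column family: super column name -> {column name -> cell}
    "WordList": {
        "aloha": {"C1": ("V1", 1), "C2": ("V2", 2)},
        "dude": {"C2": ("V2", 2), "C6": ("V6", 6)},
    },
}

# Column families (and their types) are declared upfront.
SUPER_FAMILIES = {"WordList"}


def lookup(row: Dict[str, Any], path: str) -> Cell:
    """Resolve 'cf:column' for simple families or 'cf:super:column' for super families."""
    parts = path.split(":")
    cf = parts[0]
    if cf in SUPER_FAMILIES:
        if len(parts) != 3:
            raise ValueError(f"{cf} is a super column family; use cf:super:column")
        _, super_col, col = parts
        return row[cf][super_col][col]
    if len(parts) != 2:
        raise ValueError(f"{cf} is a simple column family; use cf:column")
    return row[cf][parts[1]]
```

For example, `lookup(row, "WordList:dude:C6")` walks three levels, while `lookup(row, "MailList:tid2")` walks two.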

API: The Cassandra API consists of the following three simple methods: insert(table, key, rowMutation), get(table, key, columnName), and delete(table, key, columnName).
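A single-process toy store helps show how the three calls fit together. This is a sketch under stated assumptions, not Cassandra code: the `ToyStore` class is hypothetical, a rowMutation is assumed to be a dict of column name to value, and conflicting writes to the same column are assumed to be resolved by keeping the newer timestamp.

```python
import time
from typing import Dict, Optional, Tuple

Cell = Tuple[str, int]  # (value, timestamp)


class ToyStore:
    """Hypothetical stand-in for the insert/get/delete API on one machine."""

    def __init__(self) -> None:
        # table -> row key -> column name -> (value, timestamp)
        self._tables: Dict[str, Dict[str, Dict[str, Cell]]] = {}

    def insert(self, table: str, key: str, row_mutation: Dict[str, str],
               timestamp: Optional[int] = None) -> None:
        """Apply every column in row_mutation to one row; newer timestamps win."""
        ts = time.time_ns() if timestamp is None else timestamp
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        for column, value in row_mutation.items():
            current = row.get(column)
            if current is None or ts >= current[1]:
                row[column] = (value, ts)

    def get(self, table: str, key: str, column_name: str) -> Optional[str]:
        cell = self._tables.get(table, {}).get(key, {}).get(column_name)
        return None if cell is None else cell[0]

    def delete(self, table: str, key: str, column_name: str) -> None:
        # Simplification: the cell is removed outright.
        self._tables.get(table, {}).get(key, {}).pop(column_name, None)
```

Usage: `store.insert("Inbox", "user1", {"MailList:tid1": "hello"})` followed by `store.get("Inbox", "user1", "MailList:tid1")` returns `"hello"`.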

System Architecture: The architecture of a storage system that needs to operate in a production setting is complex. In addition to the actual data persistence component, the system needs to have the following characteristics:

System Architecture (continued): scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshalling, request routing, system monitoring and alarming, and configuration management.

Figure (read path): the client sends a read query to the Cassandra cluster. The closest replica (Replica A) returns the full result, while Replicas B and C receive a digest query; the result is then returned to the client.
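The read path in the figure can be sketched as follows: the full value is read from the closest replica, and only digests (hashes) are read from the others, so the coordinator can check agreement without shipping every copy. The use of MD5 and the `coordinated_read` helper are assumptions for illustration; what the system does on a mismatch is not covered on this slide and is left out of the sketch.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(value: bytes) -> str:
    """Hash of a stored value (MD5 is an illustrative choice)."""
    return hashlib.md5(value).hexdigest()


def coordinated_read(replicas: List[Dict[str, bytes]], key: str) -> Tuple[bytes, bool]:
    """replicas[0] is assumed to be the closest replica.

    Returns the full value from the closest replica and whether every
    other replica's digest matches it.
    """
    closest, others = replicas[0], replicas[1:]
    data = closest[key]                           # full data read
    expected = digest(data)
    digests = [digest(r[key]) for r in others]    # digest-only reads
    return data, all(d == expected for d in digests)
```

A mismatch (`False`) tells the coordinator that at least one replica holds a different version of the row.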

Miss Hakimi: slides 19 to 27.

Partitioning: One of the key design features of Cassandra is the ability to scale incrementally.

Figure: Partitioning and Replication. Nodes A to F sit on a ring of hash values running from 0 to 1 (1/2 marks the halfway point); keys are placed on the ring at h(key1) and h(key2), and each key is replicated on N=3 nodes.
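The figure corresponds to consistent hashing with replication. In this sketch a key is hashed to a point in [0, 1), the coordinator is the first node clockwise whose position is larger than the key's, and the remaining N-1 replicas are the next nodes clockwise. The even node positions, the MD5-based hash and the `Ring` class are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> float:
    """Hash a key to a point in [0, 1); MD5 is an illustrative choice."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


class Ring:
    def __init__(self, node_positions: Dict[str, float]) -> None:
        self.points = sorted((pos, node) for node, pos in node_positions.items())
        self.positions = [pos for pos, _ in self.points]

    def replicas_at(self, pos: float, n: int = 3) -> List[str]:
        # Coordinator: first node whose position is larger than pos (wrapping past 1 to 0).
        start = bisect.bisect_right(self.positions, pos) % len(self.points)
        count = min(n, len(self.points))
        # The remaining replicas are the next n-1 nodes clockwise.
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]

    def replicas(self, key: str, n: int = 3) -> List[str]:
        return self.replicas_at(ring_position(key), n)


# Six nodes spaced evenly around the ring, as in the figure.
ring = Ring({name: i / 6 for i, name in enumerate("ABCDEF")})
```

For a key hashed to 0.1, `ring.replicas_at(0.1)` gives `["B", "C", "D"]`; a key near 0.9 wraps around to `["A", "B", "C"]`.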

Membership: Cluster membership in Cassandra is based on Scuttlebutt, a very efficient anti-entropy gossip-based mechanism.

Membership: A gossip protocol is used for cluster membership. It is super lightweight, with mathematically provable properties: state is disseminated in O(log N) rounds, where N is the number of nodes in the cluster. Every T seconds, each member increments its heartbeat counter and selects one other member to send its list to. The receiving member merges that list with its own list.
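The heartbeat-and-merge loop described above can be simulated in a few lines. This is a sketch, not Scuttlebutt itself: each member is assumed to start knowing only itself, one loop iteration stands in for one period of T seconds, and merging keeps the highest heartbeat seen per member.

```python
import random
from typing import Dict, List


class Member:
    def __init__(self, name: str) -> None:
        self.name = name
        self.heartbeats: Dict[str, int] = {name: 0}  # member -> highest heartbeat seen

    def tick(self) -> None:
        self.heartbeats[self.name] += 1

    def merge(self, incoming: Dict[str, int]) -> None:
        # Keep whichever heartbeat is newer for each member.
        for member, beat in incoming.items():
            if beat > self.heartbeats.get(member, -1):
                self.heartbeats[member] = beat


def gossip_round(members: List[Member], rng: random.Random) -> None:
    for m in members:
        m.tick()                                            # increment own heartbeat
        peer = rng.choice([p for p in members if p is not m])
        peer.merge(dict(m.heartbeats))                      # send our list to one peer


def rounds_until_converged(n: int, seed: int = 0) -> int:
    """Count rounds until every member has heard of every other member."""
    rng = random.Random(seed)
    members = [Member(f"node{i}") for i in range(n)]
    names = {m.name for m in members}
    rounds = 0
    while any(set(m.heartbeats) != names for m in members):
        gossip_round(members, rng)
        rounds += 1
    return rounds
```

Running `rounds_until_converged(64)` typically needs only a handful of rounds, in line with the O(log N) dissemination claim.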

Failure Detection: Failure detection is a mechanism by which a node can locally determine whether any other node in the system is up or down.

Accrual Failure Detector: valuable for system management, replication, load balancing, etc. It is defined as a failure detector that outputs a value, PHI, associated with each process. Accrual detectors are also known as adaptive failure detectors, because they are designed to adapt to changing network conditions. The output value, PHI, represents a suspicion level. Applications set an appropriate threshold, trigger suspicions, and perform appropriate actions. In Cassandra, the average time taken to detect a failure is on the order of seconds with the PHI threshold set at 5.
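A minimal PHI computation, assuming (as the Cassandra paper describes) that heartbeat inter-arrival times are approximated by an exponential distribution. Then the probability that the next heartbeat arrives later than t seconds after the last one is exp(-t/mean), and PHI = -log10 of that probability = t / (mean * ln 10). The sliding-window size and the class name are assumptions.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    def __init__(self, window: int = 1000) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # recent inter-arrival times
        self.last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        if not self.intervals or self.last_arrival is None:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        # -log10(exp(-elapsed / mean)) == elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))

    def suspect(self, now: float, threshold: float = 5.0) -> bool:
        return self.phi(now) >= threshold
```

With one heartbeat per second, PHI reaches 5 about 11.5 seconds after the last heartbeat, so the suspicion level rises smoothly instead of flipping at a fixed timeout.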

Bootstrapping: When a node starts for the first time, it chooses a random token for its position in the ring.

Scaling the Cluster: When a new node is added to the system, it is assigned a token such that it can alleviate a heavily loaded node.
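Both token choices can be sketched together. The token space size, the per-token load figures, and the rule "split the most loaded node's range at its midpoint" are simplifying assumptions (the midpoint only halves the load if keys are spread uniformly). Each node is assumed to own the range from the previous token (exclusive) to its own token (inclusive).

```python
import random
from typing import Dict, List

RING_SIZE = 2**127  # assumed token space, for illustration


def random_token(rng: random.Random, ring_size: int = RING_SIZE) -> int:
    """First start: pick a random position on the ring."""
    return rng.randrange(ring_size)


def relieving_token(tokens: List[int], loads: Dict[int, int],
                    ring_size: int = RING_SIZE) -> int:
    """New node: take a token halfway into the range of the most loaded node."""
    ordered = sorted(tokens)
    heaviest = max(ordered, key=lambda t: loads[t])
    i = ordered.index(heaviest)
    prev = ordered[i - 1]  # for i == 0 this wraps to the highest token
    width = (heaviest - prev) % ring_size or ring_size  # a lone node owns the whole ring
    return (prev + width // 2) % ring_size
```

With tokens 0, 25, 50, 75 on a ring of size 100 and the node at 25 carrying most of the load, `relieving_token` returns 12, so the new node takes over roughly half of the range (0, 25].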

Local Persistence: The Cassandra system relies on the local file system for data persistence.

Hossien Sadrizadeh: slides 29 to 65.

Implementation Details: A Cassandra process on a single machine requires the following abstractions: a partitioning module; a cluster membership and failure detection module; and a storage engine module. Each of these modules has been implemented from the ground up in Java. Each module relies on an event-driven substrate in which the message processing pipeline and the task pipeline are split into multiple stages along the lines of the SEDA (Staged Event-Driven Architecture) architecture.

SEDA (1): SEDA combines threads and event-based programming models to manage the concurrency, I/O, scheduling, and resource management needs of Internet services. In SEDA, an application consists of a network of event-driven stages, each connected by explicit queues. SEDA is intended to support massive concurrency demands and to simplify the construction of well-conditioned services. (1) SEDA: An Architecture for Well-Conditioned, Scalable Internet Services (Matt Welsh, David Culler, and Eric Brewer).

SEDA (continued): Threaded server design: each incoming request is dispatched to a separate thread, which processes the request and returns a result to the client. Edges represent control flow between components.
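In contrast to one thread per request, a SEDA-style pipeline splits the work into stages, each with its own explicit queue and small thread pool. The sketch below uses three hypothetical stages (parse, execute, respond) built on Python's `queue` and `threading` modules; the stage names, handlers and pool size are assumptions, not Cassandra's actual stages.

```python
import queue
import threading
from typing import Any, Callable, Optional


class Stage:
    """One SEDA-style stage: an explicit input queue drained by a small thread pool."""

    def __init__(self, name: str, handler: Callable[[Any], Any],
                 next_stage: Optional["Stage"] = None, threads: int = 2) -> None:
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.inbox: "queue.Queue[Any]" = queue.Queue()
        for _ in range(threads):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event: Any) -> None:
        self.inbox.put(event)

    def _run(self) -> None:
        while True:
            event = self.inbox.get()
            try:
                result = self.handler(event)
                if self.next_stage is not None:
                    self.next_stage.enqueue(result)  # hand off through the next queue
            except Exception as exc:
                print(f"{self.name}: dropped event: {exc}")
            finally:
                self.inbox.task_done()


results = []
respond = Stage("respond", results.append)
execute = Stage("execute", lambda req: req.upper(), next_stage=respond)
parse = Stage("parse", lambda raw: raw.strip(), next_stage=execute)

for raw in [" get a ", " get b "]:
    parse.enqueue(raw)
# Drain the stages in pipeline order.
parse.inbox.join()
execute.inbox.join()
respond.inbox.join()
```

Because each stage has its own queue, a slow stage shows up as a growing queue that can be observed and throttled, instead of as an ever-growing number of threads.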

Routing: All system control messages rely on UDP-based (1) messaging, while the application-related messages for replication and request routing rely on TCP (2). The request routing modules are implemented using a certain state machine. (1) UDP: User Datagram Protocol (a connectionless protocol). (2) TCP: Transmission Control Protocol (a connection-oriented protocol).

What Happens When a Node in the Cluster Receives a Read/Write Request?

Partitioning: In Cassandra, the total data managed by the cluster is represented as a circular space, or ring. The ring is divided into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the overall data. Before a node can join the ring, it must be assigned a token. The token determines the node's position on the ring and the range of data it is responsible for.

Partitioning – Single Data Center (continued): Consider a cluster of 4 nodes in which the row keys managed by the cluster are numbers in the range 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75.
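The token example above can be turned into a lookup function. This sketch assumes that each node owns the range from the previous token (exclusive) up to its own token (inclusive), and that keys above the highest token wrap around to the node with the lowest token; `owner` is a hypothetical helper.

```python
import bisect
from typing import List


def owner(tokens: List[int], key: int) -> int:
    """Return the token of the node responsible for `key`."""
    ordered = sorted(tokens)
    i = bisect.bisect_left(ordered, key)  # first token >= key
    return ordered[i % len(ordered)]      # past the highest token: wrap to the lowest


TOKENS = [0, 25, 50, 75]
```

Under this convention, key 30 belongs to the node with token 50, key 25 to the node with token 25, and key 80 wraps around to the node with token 0.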

Partitioning – Replica Placement (continued): In multi-data center deployments, replica placement is calculated per data center. Additional replicas in the same data center are placed by walking the ring clockwise until reaching the first node in another rack.
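The rack-aware walk can be sketched for a single data center. The first replica is the node owning the key's token; the walk then continues clockwise, preferring nodes on racks not yet used. The fallback when there are fewer racks than replicas (fill with the next nodes in ring order) and the `place_replicas` name are assumptions.

```python
from typing import List, Tuple


def place_replicas(ring: List[Tuple[int, str, str]], key_token: int, rf: int) -> List[str]:
    """ring holds (token, node, rack) for the nodes of ONE data center."""
    ordered = sorted(ring)
    # Start at the first node whose token is >= the key's token (wrapping to index 0).
    start = next((i for i, (t, _, _) in enumerate(ordered) if t >= key_token), 0)
    walk = [ordered[(start + i) % len(ordered)] for i in range(len(ordered))]
    chosen: List[str] = []
    racks = set()
    # First pass: walk clockwise, taking only nodes on racks not used yet.
    for _, node, rack in walk:
        if len(chosen) == rf:
            break
        if rack not in racks:
            chosen.append(node)
            racks.add(rack)
    # Second pass: too few racks, so fill up with the remaining nodes in ring order.
    for _, node, _ in walk:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen
```

With nodes n1 and n2 on rack r1 and n3 and n4 on rack r2, a key with token 10 lands on n2 first, then skips ahead to n3 because it is the first node on a different rack.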

Partitioning – Multi Data Center (continued)

Partitioning – Multi Data Center (continued): The goal is to ensure that the nodes in each data center have token assignments that evenly divide the overall range.
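One way to meet that goal is to compute evenly spaced tokens separately for each data center, then shift each data center's tokens by a small offset so that no two nodes share the same token. The offset of one position per data center and the `dc_tokens` helper are illustrative assumptions.

```python
from typing import Dict, List


def dc_tokens(nodes_per_dc: Dict[str, int], ring_size: int = 2**127) -> Dict[str, List[int]]:
    """Evenly spaced tokens per data center, offset so tokens never collide."""
    tokens: Dict[str, List[int]] = {}
    for offset, (dc, n) in enumerate(sorted(nodes_per_dc.items())):
        # Spacing ring_size / n divides this data center's share of the ring evenly.
        tokens[dc] = [(i * ring_size // n + offset) % ring_size for i in range(n)]
    return tokens
```

On a ring of size 100 with four nodes in each of two data centers, DC1 gets 0, 25, 50, 75 and DC2 gets 1, 26, 51, 76, so each data center on its own still divides the range evenly.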

About Client Requests: All nodes in Cassandra are peers. A client read/write request can go to any node in the cluster. When a client connects to a node and issues a read/write request, that node serves as the proxy, or coordinator, for that particular operation. The job of the coordinator is to act as an intermediary between the client application and the nodes (replicas) that own the data being requested.

About Client Requests (continued): The coordinator sends the write request to all replicas that own the row being written. If all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgement in order for the write to be considered successful.

An Example Write: In a single data center, 10-node cluster with a replication factor of 3, an incoming write will go to all 3 nodes that own the requested row.

Figure: a client write request forwarded to the three replicas R1, R2, and R3.
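The coordinator's role in this example can be sketched as follows: the write is sent to every replica, and the consistency level only decides how many acknowledgements are needed before success is reported. The level names ONE, QUORUM and ALL are borrowed from Cassandra's consistency levels; replicas are modelled as hypothetical callables that return True on success, and they are contacted sequentially here for simplicity rather than in parallel.

```python
from typing import Callable, List


def required_acks(level: str, rf: int) -> int:
    """Acknowledgements needed for a write to count as successful."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def coordinate_write(replicas: List[Callable[[], bool]], level: str) -> bool:
    """Send the write to every replica; succeed once enough acknowledgements arrive."""
    needed = required_acks(level, len(replicas))
    acks = 0
    for send in replicas:  # the write goes to all replicas regardless of the level
        if send():
            acks += 1
    return acks >= needed
```

With a replication factor of 3 and one replica down, a QUORUM write (2 acknowledgements) still succeeds, while an ALL write fails.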

Replication Factor and Replication in Cassandra: Replication factor: the total number of replicas across the cluster is often referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, and a replication factor of 2 means two copies of each row. Replication: the process of storing copies of data on multiple nodes.

Replication Factor in Cassandra: Replication is the process of storing copies of data on multiple nodes.

Commit Log: The Cassandra system relies on the local file system for data persistence. Each machine has a dedicated disk for the commit log. The write into the in-memory data structure is performed only after a successful write into the commit log.

Commit Log and In-Memory Structure: Cassandra first writes data to a commit log (for durability), and then to an in-memory table structure called a memtable. A write is successful when: 1. first, it is written to the commit log; 2. second, it is written to memory. Writes are batched in memory and periodically written to disk in a persistent table structure called an SSTable.
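The ordering above (commit log first, then memtable, then a periodic flush to a sorted, immutable SSTable) can be sketched as follows. The JSON-lines log format, the tiny flush threshold, and representing an SSTable as a sorted list in memory are assumptions made to keep the example short. `sync_each_write=False` skips the fsync, which loosely mirrors the buffered "fast sync" mode: data can sit in the OS cache and be lost if the machine crashes.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


class WritePath:
    """Commit log first, then memtable, then periodic flush to an SSTable (sketch)."""

    def __init__(self, log_path: str, flush_threshold: int = 3,
                 sync_each_write: bool = True) -> None:
        self.log_path = log_path
        self.flush_threshold = flush_threshold
        self.sync_each_write = sync_each_write
        self.memtable: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []

    def write(self, key: str, value: str) -> None:
        # Step 1: append the mutation to the commit log.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
            if self.sync_each_write:
                log.flush()
                os.fsync(log.fileno())
        # Step 2: only after the log write, update the in-memory memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Dump the memtable as an immutable, key-sorted SSTable and start a new one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


with tempfile.TemporaryDirectory() as tmp:
    wp = WritePath(os.path.join(tmp, "commitlog"))
    for k, v in [("b", "2"), ("a", "1"), ("c", "3")]:
        wp.write(k, v)
    with open(wp.log_path, encoding="utf-8") as f:
        logged_entries = len(f.readlines())
```

After the third write the memtable reaches the threshold and is flushed as one SSTable sorted by key, while all three writes remain in the commit log.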

Structure of the Commit Log: Every commit log has a header, which is basically a bit vector of fixed size. The size of the bit vector is larger than the number of column families. These bit vectors are kept per commit log and are also held in memory.

Write Operation into the Commit Log: Writes into the commit log can be performed either in normal mode or in fast sync mode. In fast sync mode, writes to the commit log are buffered (if the machine crashes, some data may be lost).

Implementation of the Commit Log (continued): Traditional databases are not designed to handle high write throughput. Cassandra turns writes to disk into sequential writes, thus maximizing disk write throughput. Since the files dumped to disk are never changed, no locks need to be taken while reading them. As a result, the Cassandra server is practically lockless for read/write operations.

When Should We Delete the Commit Log? In any logging system, we need a mechanism to purge commit log entries. Question: is there any difference between deleting a commit log and deleting the entries of a commit log?
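One way the header bit vector can drive purging is sketched below. The semantics here are an assumption, not the slide's: bit i is set when column family i writes into that commit log file and cleared when that column family's memtable is flushed to disk. A rolled commit log whose bits are all clear holds nothing that is not already persisted, so the whole file can be deleted; under this model individual entries are never deleted, only whole files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One commit log file with a fixed-size header bit vector (hypothetical model)."""
    name: str
    num_families: int
    dirty: List[bool] = field(init=False)

    def __post_init__(self) -> None:
        self.dirty = [False] * self.num_families

    def record_write(self, cf_id: int) -> None:
        self.dirty[cf_id] = True  # this file now holds unflushed data for cf_id


def on_memtable_flush(segments: List[Segment], cf_id: int) -> None:
    # The memtable for cf_id is on disk, so its entries in these files are no longer needed.
    for seg in segments:
        seg.dirty[cf_id] = False


def purgeable(rolled: List[Segment]) -> List[str]:
    """Rolled commit log files whose data has all been persisted."""
    return [seg.name for seg in rolled if not any(seg.dirty)]
```

A file only becomes deletable after every column family that wrote into it has been flushed.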

Implementation of the Index: The Cassandra system indexes all data based on the primary key. The data file on disk is broken down into a sequence of blocks. Each block contains at most 128 keys and is demarcated by a block index. The block index captures the relative offset of a key within the block and the size of its data.

Layout of a Sample Block: the structure of a block and its index, demarcated in memory.

Implementation of the Index (continued): When an in-memory data structure is dumped to disk, a block index is generated and the offsets are written out to disk as indices. This index is also held in memory for fast access.
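The block index described above can be sketched as a two-level lookup: a binary search over the first key of each block picks the block, and that block's index gives the key's relative offset and data size. Laying the data file out as one in-memory byte string and the helper names `build_index` and `find` are assumptions for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 128  # at most 128 keys per block, as on the slide


def build_index(sorted_rows: List[Tuple[str, bytes]]):
    """Lay out rows as one byte string and build per-block indexes.

    Each block index maps key -> (offset relative to the block start, size of its data).
    """
    data = bytearray()
    first_keys: List[str] = []
    starts: List[int] = []
    block_indexes: List[Dict[str, Tuple[int, int]]] = []
    for b in range(0, len(sorted_rows), BLOCK_SIZE):
        block = sorted_rows[b:b + BLOCK_SIZE]
        start = len(data)
        index: Dict[str, Tuple[int, int]] = {}
        for key, value in block:
            index[key] = (len(data) - start, len(value))
            data.extend(value)
        first_keys.append(block[0][0])
        starts.append(start)
        block_indexes.append(index)
    return bytes(data), first_keys, starts, block_indexes


def find(key: str, data: bytes, first_keys: List[str], starts: List[int],
         block_indexes: List[Dict[str, Tuple[int, int]]]) -> Optional[bytes]:
    i = bisect.bisect_right(first_keys, key) - 1  # last block whose first key <= key
    if i < 0 or key not in block_indexes[i]:
        return None
    rel_offset, size = block_indexes[i][key]
    begin = starts[i] + rel_offset
    return data[begin:begin + size]
```

Only the small list of first keys and the block indexes must stay in memory; a lookup then touches a single block of the data file.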

What Happens When a Typical Read Takes Place?
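A typical read can be sketched under the assumption, taken from the Cassandra paper rather than this slide, that a read first queries the in-memory structure and then the files on disk from newest to oldest. Rows are simplified to one value per key, so the first hit wins; merging individual columns across files is left out.

```python
from typing import Dict, List, Optional


def read(key: str, memtable: Dict[str, str],
         sstables: List[Dict[str, str]]) -> Optional[str]:
    """Check the memtable, then SSTables from newest (last) to oldest (first)."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):
        if key in table:
            return table[key]
    return None
```

Because newer files are consulted first, a value rewritten after an older flush shadows the stale copy still present in the older file.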

What Should We (1) Do When the Number of Files on Disk Increases? Over time, the number of data files on disk will increase. We perform a compaction process, very much like the one in Bigtable, which merges multiple files into one; it is essentially a merge sort on a collection of sorted data files. Periodically, a compaction process is run to compact all related data files into one big file. (1) The research team.
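Compaction as a merge sort can be sketched with `heapq.merge`, which lazily merges already-sorted inputs. Here each data file is assumed to be a list of (key, timestamp, value) rows sorted by key, and only the newest version of each key is kept in the output; that conflict rule is an assumption for illustration.

```python
import heapq
from typing import List, Tuple

Row = Tuple[str, int, str]  # (key, timestamp, value)


def compact(sstables: List[List[Row]]) -> List[Row]:
    """Merge sorted files into one sorted file, keeping the newest version of each key."""
    # Order by key, and for equal keys put the newest timestamp first.
    merged = heapq.merge(*sstables, key=lambda row: (row[0], -row[1]))
    out: List[Row] = []
    for key, ts, value in merged:
        if out and out[-1][0] == key:
            continue  # an older version of a key that was already emitted
        out.append((key, ts, value))
    return out
```

Since every input is already sorted, the merge reads each file once, sequentially, which fits the sequential-I/O design described earlier.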

Practical Experiences: In the design process of Cassandra, we learned many useful lessons that were very beneficial to us. We experimented with various implementations of failure detectors; as the size of the cluster grew, the time to detect a failure increased. Most applications only require atomic operations per key per replica, but some applications need secondary indexes (because most developers are used to working with RDBMSs). Cassandra is a completely decentralized (distributed) system.

Ganglia /ɡæŋ.lia/: Older monitoring approaches are no longer needed, because the Cassandra system is well integrated with Ganglia (1), a distributed monitoring tool. Ganglia is a scalable distributed monitoring tool for high-performance computing systems such as clusters. It uses a distributed tree structure that enables organizations to monitor an arbitrarily large number of clusters while placing bounds on the required processing load. (1) Matthew L. Massie, Brent N. Chun, and David E. Culler. The Ganglia distributed monitoring system.

Ganglia (continued): Ganglia consists of two components: gmon, a local-area monitoring system, and gmeta, a wide-area system. Figure: Ganglia local- and wide-area monitor interaction. Gmon runs on each cluster node; gmeta can fail over between nodes. Gmon uses UDP multicast. Gmon communicates with its gmeta counterpart using XML streams sent over TCP connections.

Facebook Inbox Search: What is the problem? Millions of messages are sent every day on Facebook, and the messages are stored in different data centers. How can all of this information be indexed for Inbox search?

Facebook Inbox Search: For Inbox search, we have to maintain, for each user, a list of all messages that have been exchanged between the sender and the recipients. There are two kinds of search features: term search and interaction search.

Term Search / Interaction Search: Term search: key = user ID; super columns = the words that make up the message. Interaction search: key = user ID; super columns = the recipients' IDs.
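The two schemas can be sketched as nested dictionaries keyed by user ID, then by super column. The slide does not say what the columns inside each super column hold; the sketch assumes they are message IDs, as described in the Cassandra paper. Splitting message text on whitespace and lower-casing is an additional simplification.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# user ID -> super column -> set of message IDs (the columns)
Index = Dict[str, Dict[str, Set[str]]]

term_index: Index = defaultdict(lambda: defaultdict(set))
interaction_index: Index = defaultdict(lambda: defaultdict(set))


def index_message(user_id: str, message_id: str, text: str,
                  recipient_ids: Iterable[str]) -> None:
    for word in set(text.lower().split()):
        term_index[user_id][word].add(message_id)         # super column = word
    for rid in recipient_ids:
        interaction_index[user_id][rid].add(message_id)   # super column = recipient ID


def term_search(user_id: str, word: str) -> List[str]:
    return sorted(term_index.get(user_id, {}).get(word.lower(), set()))


def interaction_search(user_id: str, other_id: str) -> List[str]:
    return sorted(interaction_index.get(user_id, {}).get(other_id, set()))
```

Both searches become a single-row read: the user ID selects the row, and the word or recipient ID selects one super column within it.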

An Actual Example: The current system stores about 50 TB of data on a 150-node cluster. The data is spread out between east and west coast data centers. Some of the measurements we produced are shown in the following table.

What Work Remains for the Future? Future work includes adding compression and secondary index support.

Cassandra Goals (Conclusion): high scalability; high performance (throughput and response time); high availability.

Headline Summary: The entire implementation is in Java. UDP and TCP are used for routing. A ring mechanism is used for clustering. All nodes in the ring are peers. Data is replicated. A commit log is used for persistence. Because writes are sequential, throughput is high. All data files are broken into blocks. Reads and writes are practically lockless. Compaction is used to compact the files.

Now, please ask your questions!