Cassandra – A Decentralized Structured Storage System Lecturer : Prof. Kyungbaek Kim Presenter : I Gde Dharma Nugraha
Outline: Introduction, History, Data Model, System Architecture, Cassandra Configuration, CQL (Cassandra Query Language), Cassandra Driver, Practical Example
Introduction Apache Cassandra ™ is a massively scalable open source NoSQL database. Cassandra is perfect for managing large amounts of data across multiple data centers and cloud. Cassandra delivers continuous availability, linear scalability, and operational simplicity across many commodity servers with no Single Point of Failure (SPOF), along with a powerful data model designed for maximum flexibility and fast response times.
Introduction (Cont’d) Cassandra has a “masterless” architecture. Cassandra provides customizable replication, storing redundant copies of data across nodes that participate in a Cassandra ring.
History Cassandra was created to power the Facebook Inbox Search. Facebook open-sourced Cassandra in 2008, and it then became an Apache Incubator project. In 2010, Cassandra graduated to a top-level Apache project; regular updates and releases followed.
General Design Features Emphasis on performance over analysis Still has support for analysis tools such as Hadoop. Organization Rows are organized into tables. First component of a table’s primary key is the partition key. Rows are clustered by the remaining columns of the key. Columns may be indexed separately from the primary key. Tables may be created, dropped, altered at runtime without blocking queries. Language CQL (Cassandra Query Language) introduced, similar to SQL (flattened learning curve).
Data Model A table is a multidimensional map indexed by a key (the row key). Columns are grouped into Column Families. There are 2 types of Column Families: Simple and Super (nested Column Families). Each Column has a Name, a Value and a Timestamp. A row is a collection of columns, each labeled with a name.
Data Model [Figure: a keyspace (with settings) contains column families (with settings); each column has a name, a value and a timestamp.]
Data Model Cassandra Row The value of a row is itself a sequence of key-value pairs. Such nested key-value pairs are columns. Key = column name. A row must contain at least 1 column.
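The nested-map view of a row described above can be sketched in Python. This is only an illustration, not Cassandra's storage code: the names Column, Row, put and get, the sample keys, and the integer timestamps are all hypothetical. The sketch assumes that when two versions of a column conflict, the one with the newer timestamp wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: object
    timestamp: int  # writer-supplied timestamp (unit is an assumption)


class Row:
    """A row: a map from column name to Column."""

    def __init__(self, key):
        self.key = key
        self.columns = {}

    def put(self, name, value, timestamp):
        # Keep the version with the newest timestamp (simplified tie rule:
        # on equal timestamps the later call wins).
        current = self.columns.get(name)
        if current is None or timestamp >= current.timestamp:
            self.columns[name] = Column(name, value, timestamp)

    def get(self, name):
        return self.columns[name].value


# keyspace -> column family -> row key -> Row
keyspace = {"users": {}}
row = keyspace["users"].setdefault("alice", Row("alice"))
row.put("bio", "first draft", timestamp=1)
row.put("bio", "final text", timestamp=3)
row.put("bio", "stale update", timestamp=2)  # older than the stored version, ignored
```

The last call carries an older timestamp than the stored column, so the stored value is left unchanged.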
Data Model Example of Column
Data Model Key Space A Key Space is a group of column families together. It is only a logical grouping of column families and provides an isolated scope for names.
System Architecture The ring represents a cyclic range of token values (i.e., the token space). Each node is assigned a position on the ring based on its token. Nodes communicate with one another using the Gossip protocol. A write is first recorded in the commit log for durability and applied to the in-memory memtable; once the memtable is full, its data is flushed to an SSTable on disk. [Figure: ring with nodes A, B, C, D]
System Architecture Important keywords. Node: where the data is stored; the basic infrastructure component of Cassandra. Data Center: a collection of related nodes; it can be a physical or a virtual data center. Cluster: contains one or more data centers and can span physical locations. Commit log: all data is written first to the commit log for durability; after all its data has been flushed to SSTables, it can be archived, deleted or recycled. Table: a collection of ordered columns fetched by row; a row consists of columns and has a primary key, the first part of which is a column name. SSTable: a sorted string table (SSTable) is an immutable data file to which Cassandra writes memtables periodically; SSTables are append-only, stored on disk sequentially, and maintained for each Cassandra table.
System Architecture Involves: Partitioning: how data is partitioned across nodes. Replication: how data is duplicated across nodes. Cluster Membership: how nodes are added to or removed from the cluster.
System Architecture Partitioning Nodes are logically structured in a ring topology. The hashed value of the key associated with a data partition is used to assign it to a node in the ring. The hash space wraps around after its largest value, which gives the ring structure. Cassandra has 3 types of partitioner: Murmur3Partitioner, RandomPartitioner, ByteOrderedPartitioner.
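The mapping from a key to a node on the ring can be sketched as follows. This is a simplified illustration under stated assumptions: MD5 hashing and a token space of 2^127 are used in the style of RandomPartitioner, each node holds a single token, and the names Ring, token_for, owner and owner_of_token are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed size of the token space


def token_for(key: str) -> int:
    """Hash a partition key to a token on the ring (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: dict mapping node name -> token
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token):
        # The owner is the first node whose token is >= the given token;
        # past the largest token the search wraps around to the first node.
        idx = bisect.bisect_left(self._tokens, token)
        if idx == len(self._tokens):
            idx = 0
        return self._entries[idx][1]

    def owner(self, key):
        return self.owner_of_token(token_for(key))


# Four nodes spaced evenly around the ring.
quarter = RING_SIZE // 4
ring = Ring({"A": 0, "B": quarter, "C": 2 * quarter, "D": 3 * quarter})
```

Calling ring.owner("some-key") always returns the same node for the same key, which is what lets any node route a request without a central directory.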
System Architecture Replication Each data item is replicated at N (replication factor) nodes. Different Replication Policies Rack Unaware – replicate data at N-1 successive nodes after its coordinator. Rack Aware – uses ‘Zookeeper’ to choose a leader which tells nodes the range they are replicas for. Datacenter Aware – similar to Rack Aware but leader is chosen at Datacenter level instead of Rack Level.
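The Rack Unaware policy above can be sketched in a few lines: the coordinator keeps one copy and the next N-1 nodes clockwise on the ring hold the others. The function name and the list-based ring are hypothetical simplifications; rack and data center awareness are not modeled.

```python
def rack_unaware_replicas(ring_order, coordinator, n):
    """Return the n nodes that store a data item.

    ring_order: node names in clockwise ring order.
    coordinator: the node responsible for the item's key.
    n: replication factor (the coordinator plus n-1 successors).
    """
    if n > len(ring_order):
        raise ValueError("replication factor exceeds number of nodes")
    start = ring_order.index(coordinator)
    return [ring_order[(start + i) % len(ring_order)] for i in range(n)]


replicas = rack_unaware_replicas(["A", "B", "C", "D"], "C", 3)
```

With coordinator C and N = 3, the replica set wraps around the end of the ring and comes out as C, D, A.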
System Architecture Gossip Protocol A network communication protocol inspired by real-life rumour spreading. Periodic, pairwise, inter-node communication. Low-frequency communication ensures low cost. Random selection of peers. Example – Node A wishes to search for a pattern in data. Round 1 – Node A searches locally and then gossips with node B. Round 2 – Nodes A and B gossip with C and D. Round 3 – Nodes A, B, C and D gossip with 4 other nodes …… Round-by-round doubling makes the protocol very robust.
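The round-by-round spread can be simulated with a small sketch. It assumes a simple push model in which every informed node contacts one randomly chosen peer per round; the function name and parameters are hypothetical. Doubling is the best case here, since a random peer may already be informed.

```python
import random


def gossip_rounds(num_nodes, seed=0, max_rounds=1000):
    """Simulate push gossip; return the number of informed nodes after each round."""
    rng = random.Random(seed)
    informed = {0}          # node 0 starts with the information
    history = [len(informed)]
    while len(informed) < num_nodes and len(history) <= max_rounds:
        newly_informed = set()
        for node in informed:
            peer = rng.randrange(num_nodes)  # random peer selection
            if peer != node:
                newly_informed.add(peer)
        informed |= newly_informed
        history.append(len(informed))
    return history


history = gossip_rounds(16)
```

Because each informed node reaches at most one new peer per round, the informed count can at most double, so spreading to 16 nodes takes at least 4 rounds and usually a few more.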
System Architecture Cluster Membership Uses Scuttlebutt (a Gossip protocol) to manage nodes. Uses gossip for node membership and to transmit system control state. The failure state of a node is given by a variable ‘phi’ (Φ), which expresses how likely it is that the node has failed (a suspicion level) instead of a simple binary value (up/down). This type of system is known as an Accrual Failure Detector.
System Architecture Accrual Failure Detector If a node is faulty, the suspicion level monotonically increases with time: Φ(t) grows with t until it reaches k, where k is a threshold (dependent on system load) at which the node is declared dead. If the node is correct, Φ stays at a constant set by the application; generally Φ(t) = 0.
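One way to compute such a suspicion level is sketched below. It assumes heartbeat inter-arrival times follow an exponential distribution, so that Φ is the negative base-10 logarithm of the probability that a heartbeat is merely late; the class name, method names and the threshold value of 8 are illustrative assumptions, not taken from the slides.

```python
import math


class PhiAccrualDetector:
    def __init__(self, threshold=8.0):
        self.threshold = threshold   # k: suspicion level at which a node is declared dead
        self.intervals = []          # observed gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level at time `now`; grows as the silence gets longer."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # P(heartbeat still pending after `elapsed`) = exp(-elapsed / mean)
        # phi = -log10(P) = elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))

    def is_dead(self, now):
        return self.phi(now) >= self.threshold


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):   # one heartbeat per second
    detector.heartbeat(t)
```

With heartbeats arriving every second, Φ is 0 right after the last heartbeat and rises linearly during silence, so the node is only convicted after a silence many times longer than the usual interval.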
System Architecture Local Persistence Relies on the local file system for data persistence. A write operation happens in 2 steps: (1) write to the commit log on the node's local disk; (2) update the in-memory data structure. A read operation looks up the in-memory data structure first before looking up files on disk. It uses a Bloom filter (a summary of the keys in a file, kept in memory) to avoid looking up files that do not contain the key.
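The write and read steps above can be sketched with in-memory stand-ins. Everything here is a hypothetical simplification: lists and dicts replace the on-disk commit log and SSTables, the memtable is flushed after a fixed number of keys, and the Bloom filter sizes and SHA-256 hashing are arbitrary choices.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # stand-in for the on-disk commit log
        self.memtable = {}        # in-memory data structure
        self.sstables = []        # (bloom filter, sorted data), oldest first
        self.memtable_limit = memtable_limit
        self.disk_reads = 0       # counts simulated SSTable lookups

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: commit log
        self.memtable[key] = value            # step 2: memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:              # memory first
            return self.memtable[key]
        for bloom, data in reversed(self.sstables):  # newest file first
            if not bloom.might_contain(key):
                continue                       # skip files that cannot hold the key
            self.disk_reads += 1
            if key in data:
                return data[key]
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is full: flushed to an SSTable
node.write("c", 3)   # stays in the memtable
```

Reading "c" is served from memory without touching any SSTable, while reading "a" passes the Bloom filter check and costs one simulated disk lookup.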
System Architecture Write Path
System Architecture Read Path
System Architecture Example write and read process. Data Model
System Architecture Write Process
System Architecture Replication Process
System Architecture Read Process
Cassandra Configuration Key components for configuring Cassandra Gossip A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a cluster. Gossip information is also persisted locally by each node to use immediately when a node restarts. Partitioner A partitioner determines how to distribute the data across the nodes in the cluster and which node to place the first copy of data on. Replication factor The total number of replicas across the cluster.
Cassandra Configuration Key components for configuring Cassandra (cont'd) Replica placement strategy Cassandra stores copies (replicas) of data on multiple nodes to ensure reliability and fault tolerance; the strategy determines which nodes hold the replicas. Snitch Groups machines into data centers and racks (the topology) that the replication strategy uses to place replicas. The cassandra.yaml configuration file The main configuration file for setting the initialization properties for a cluster, caching parameters for tables, properties for tuning and resource utilization, timeout settings, client connections, backups and security.
CQL = Cassandra Query Language The default and primary interface to the Cassandra DBMS. Provides SQL-like commands. CQL and SQL share the same abstract idea of a table constructed of columns and rows. The main difference from SQL is that CQL does not support joins or subqueries. Run cqlsh in a terminal window; the command is inside the bin directory.
CQL = Cassandra Query Language Creating and updating a keyspace A Cassandra keyspace is a namespace that defines how data is replicated on nodes. To create a keyspace: cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}; To update a keyspace: cqlsh> ALTER KEYSPACE demodb WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'replication_factor' : 2}; (Cassandra releases before 4.0 expect per-data-center replica counts keyed by data center name here instead of 'replication_factor'.) To use the keyspace: cqlsh> USE demodb;
CQL – Cassandra Query Language Creating Tables: CREATE TABLE users( varchar, bio varchar, birthday timestamp, active boolean, PRIMARY KEY ( ));
CQL – Cassandra Query Language Inserting Data: **timestamp fields are specified in milliseconds since epoch. INSERT INTO users ( , bio, birthday, active) VALUES ‘RoomMate’, ‘ , true);
CQL – Cassandra Query Language Querying Tables: A SELECT statement reads one or more records from a Cassandra table (column family) and returns a result set of rows. SELECT * FROM users; SELECT FROM users WHERE active = true; (Filtering on active requires a secondary index on that column or ALLOW FILTERING.)
Cassandra Driver To connect from a programming language, Cassandra provides driver packages. The programming languages supported by Cassandra drivers are: C#, Java, Node.js, Python. URL Cassandra driver download:
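As a rough sketch of what using a driver looks like, the example below targets the Python driver (package cassandra-driver). The helper names create_keyspace_cql and run_demo are hypothetical, the contact point 127.0.0.1 assumes a node running locally, and run_demo additionally assumes a users table already exists in the demodb keyspace.

```python
def create_keyspace_cql(name, replication_factor):
    """Build a CREATE KEYSPACE statement using SimpleStrategy.

    `name` is interpolated directly, so it must be a trusted identifier.
    """
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH REPLICATION = {'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}}"
    )


def run_demo(contact_points=("127.0.0.1",)):
    """Connect, create the keyspace and read the users table.

    Needs a reachable Cassandra node and `pip install cassandra-driver`.
    """
    # Imported lazily so the helper above works without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(list(contact_points))
    session = cluster.connect()
    try:
        session.execute(create_keyspace_cql("demodb", 1))
        session.set_keyspace("demodb")
        # Assumes the users table has already been created in demodb.
        return list(session.execute("SELECT * FROM users"))
    finally:
        cluster.shutdown()


statement = create_keyspace_cql("demodb", 1)
```

Only create_keyspace_cql runs without a cluster; run_demo is left uncalled here because it needs a live node.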
Reference Lakshman, Avinash, and Prashant Malik. "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review 44.2 (2010): Hewitt, Eben. Cassandra: the definitive guide. O'Reilly Media, /cassandra/gettingStartedCassandraIntro.html ql_intro_c.html ava-driver/2.1/java-driver/whatsNew2.html apache-cassandra-and-java/
Installation guide and practical example.