Introduction to Cassandra

Introduction to Cassandra
Russ Katz, Solutions Engineer (Central) @ DataStax

How do we handle massive transactional data and never go down?
- Schema
- Memtables
- Compaction
- SSTables
- Commit Log
- Cluster Architecture
- Partitioning
- Replication
- Gossip
- Anti-Entropy
- Hints

What is Cassandra?
- A distributed database
- Individual DBs (nodes) working in a cluster
- Nothing is shared

Why Cassandra? It’s Fast (Low Latency)

Why Cassandra? It’s Always On

Why Cassandra? It’s Hugely Scalable (High Throughput)

Why Cassandra? It is natively multi-data center with distributed data

Operational Simplicity
(diagram: a single cluster spanning data centers in San Francisco, New York, and Stockholm)

Why Cassandra? It has a flexible data model
- Tables and wide rows, partitioned and distributed
- Data blobs (documents, files, images)
- Collections (sets, lists, maps)
- UDTs (user-defined types)
- Access it with CQL, a syntax familiar from SQL (see the sketch below)
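For illustration only, a minimal CQL sketch of that flexibility; the type, table, and column names below are invented, not taken from the presentation:

-- Hypothetical user-defined type and table (names are illustrative).
CREATE TYPE IF NOT EXISTS address (
    street text,
    city   text,
    zip    text
);

CREATE TABLE IF NOT EXISTS user_profiles (
    domain  text,              -- partition key: distributes rows across the cluster
    user    text,              -- clustering column: orders the wide row within a partition
    avatar  blob,              -- binary data (documents, files, images)
    emails  set<text>,         -- collections: set, list, map
    prefs   map<text, text>,
    home    frozen<address>,   -- UDT
    PRIMARY KEY (domain, user)
);

-- Accessed with CQL, which reads much like SQL:
SELECT user, emails FROM user_profiles WHERE domain = '@example.com';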

How Does The Database Work? The Basics

It’s a Cluster of Nodes

Each Cassandra node is…
- A fully functional database
- Very fast (low latency)
- In-memory and/or persistent storage
- One machine or virtual machine
(diagram: reads and writes against data tables held in memory and on disk or SSD)

Why is Cassandra so fast? Low-latency nodes, distributed workload
- Writes: a durable append to the log file (fast); no db file read or lock needed
- Reads: get data from memory first; optimized storage-layer IO
- No locking or two-phase commit

A Cassandra Cluster
- Nodes in a peer-to-peer cluster
- No single point of failure
- Built-in data replication
- Data is always available: 100% uptime, across data centers, failure avoidance

Multi-data center deployment
(diagram: Cassandra data centers in San Francisco, New York, and Amazon UK)

If a data center goes offline… your data is always available from the other data centers.
(diagram: San Francisco, Amazon UK, and New York, with one data center offline)

…and recovers automatically
(diagram: the offline data center rejoins San Francisco, Amazon UK, and New York)

Cassandra Cluster Architecture
- Each node ~ a box or VM (technically it’s a JVM)
- Each node has the same Cassandra database functionality
- System/hardware failures happen
- Snitch: topology (data center and rack)
- Gossip: the state of each node
(diagram: a token ring of eight nodes)

Transaction Load Balancing
- The application driver manages the connection pool
- The driver has load-balancing policies that can be applied
- Each transaction has a different connection point in the cluster
- Async operations are faster
(diagram: the application driver connecting to the token ring in Data Center 1)

Data Partitioning
- The driver selects the coordinator node per operation via load balancing
- Partitioned token ranges, randomly distributed via Murmur3: no hot spots
- Each entire row lives on a node
(diagram: insert key='x'; hash(key) => token(43); the token ring in Data Center 1)
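As a hedged illustration of where a row lands, CQL’s built-in token() function shows the Murmur3 token a partition key hashes to (reusing the hypothetical user_profiles table from the earlier sketch):

-- With the default Murmur3Partitioner, the partition key is hashed to a
-- 64-bit token in the range -2^63 .. 2^63 - 1; the node(s) owning that
-- token range store the entire row.
SELECT token(domain), domain, user
FROM user_profiles
WHERE domain = '@example.com';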

Replication
- Replication factor = the number of copies
- All replication operations run in parallel
- "Eventual" = micro- or milliseconds
- No master or primary node
- Each node acknowledges the op
(diagram: insert key='x'; hash(key) => token(43); the write goes to three replicas on the ring in Data Center 1, replication factor = 3)
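The replication factor is declared per keyspace when it is created; a minimal sketch, with an invented keyspace name:

-- Three copies of every row; SimpleStrategy is appropriate for a single
-- data center or for development clusters.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};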

Multi-Data Center Replication
(diagram: insert key='x'; hash(key) => token(43); the write is replicated to three nodes in Data Center 1 and three nodes in Data Center 2, replication factor = 3 in each)
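For multiple data centers, the replication factor is set per data center with NetworkTopologyStrategy; a hedged sketch, assuming the snitch names the data centers DC1 and DC2:

-- Three replicas in each data center; the data center names must match
-- what the snitch reports (DC1/DC2 here are assumptions).
CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  };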

Consistency
- Replication factor (RF) = the number of copies
- Consistency level (CL) = the number of nodes that must acknowledge an operation
- Read and write "consistency" is set per operation
- CL(write) + CL(read) > RF ➔ consistency
- If needed, tune consistency for performance (see the cqlsh sketch below)
(diagram: insert key='x'; hash(key) => token(43); the ring in Data Center 1, replication factor = 3)
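A small cqlsh sketch of tuning consistency per operation (CONSISTENCY is a cqlsh command; drivers expose the same setting per statement). The table and values are the hypothetical ones from the earlier sketch:

-- With RF = 3, QUORUM means 2 nodes, so QUORUM writes plus QUORUM reads
-- satisfy CL(write) + CL(read) > RF (2 + 2 > 3): consistent reads.
CONSISTENCY QUORUM;
INSERT INTO user_profiles (domain, user) VALUES ('@example.com', 'alice');
SELECT * FROM user_profiles WHERE domain = '@example.com';

-- If latency matters more than read-your-writes, relax to ONE:
-- 1 + 1 = 2 is not > 3, so a read may briefly see older data.
CONSISTENCY ONE;
SELECT * FROM user_profiles WHERE domain = '@example.com';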

Netflix Replication Experiment

Slow Node Anti-Entropy: Hints
- If node 60 is slow enough to time out operations, or down for a short time, then…
- Node 80 holds the operations missed by node 60; the held operations are called hints
- When node 60 becomes responsive, node 80 sends the hints as operations until node 60 is synchronized
- Read repair is a further anti-entropy mechanism
(diagram: insert key='x'; hash(key) => token(43); the ring in Data Center 1, replication factor = 3)

What if I need more nodes?
- The data set size is growing
- Need more TPS; client application demand
- Hardware limits being reached
- Looking for lower latency
- Moving some tables to in-memory
(diagram: the ring in Data Center 1, replication factor = 3)

Cluster Expansion: Add Nodes
- Introduce new nodes to the cluster
- The new nodes do not yet own any token ranges (data)
- The cluster operates as normal
(diagram: new nodes joining the ring in Data Center 1, replication factor = 3)

Cluster Expansion: Rebalance
- Rebalance = redistribution of token ranges around the cluster
- Data is streamed in small chunks to synchronize the cluster; see "vnodes", which minimize streaming and distribute the rebalance workload
- New nodes begin taking ops
- Cleanup = reclaims space on nodes for token ranges they no longer own
(diagram: the rebalanced ring in Data Center 1, replication factor = 3)

Read and Write path

Write Path
- Example write: INSERT INTO email_users (domain, user, username) VALUES ('@datastax.com', 'rreffner', 'rreffner@datastax.com');
- The write is appended (and fsynched) to the commit log and applied to the active memtable in memory
- Memtables are flushed to SSTables on disk or SSD; SSTables are immutable
- Compaction merges SSTables, evicts tombstones, and rebuilds indexes; it is a background process

Memtable Flush During Write Load
(diagram: successive writes accumulate in the memtable and are appended to the commit log; on flush, the memtable’s contents are written to a new SSTable on disk or SSD)
- SSTables are immutable

Compaction During Write Load
(diagram: two SSTables on disk are merge-sorted into a single new SSTable while writes continue)
- SSTables are immutable

Compaction And Tombstones
(diagram: during the merge, deleted and overwritten cells are dropped via tombstones; this is a logical representation, not the SSTable physical file format)
- SSTables are immutable
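A hedged note on where tombstones come from, using the hypothetical table from the earlier sketch: a CQL DELETE (or an expired TTL) does not remove data in place, it writes a tombstone marker that compaction eventually evicts once gc_grace_seconds has passed.

-- Each DELETE writes a tombstone; the shadowed values and, later, the
-- tombstone itself are only purged by compaction after gc_grace_seconds.
DELETE prefs FROM user_profiles WHERE domain = '@example.com' AND user = 'alice';
DELETE       FROM user_profiles WHERE domain = '@example.com' AND user = 'bob';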

Read Path
- Example read: SELECT * FROM email_users WHERE domain = '@datastax.com' AND user = 'rreffner@datastax.com';
- The read checks the row cache and memtables in memory first (a table can also be fully in-memory), then uses the bloom filter, key cache, and OS cache to limit how many SSTables on disk or SSD must be touched
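The email_users statements on the write- and read-path slides imply that domain is the partition key and user a clustering column; a hedged reconstruction of what that table might look like (it is not shown in the deck):

-- Hypothetical schema consistent with the slide queries: all users of a
-- domain share one wide row (partition), ordered by user.
CREATE TABLE IF NOT EXISTS email_users (
    domain   text,
    user     text,
    username text,
    PRIMARY KEY (domain, user)
);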

Data Modeling Best Practices
- Start with your data access patterns: what does your app do? what are your query patterns?
- Optimize for fast writes and reads at the correct consistency
- Consider the result set: order, grouping, filtering
- Use TTL to manage data aging
- 1 query = 1 table (see the sketch below)
- Each SELECT should target a single row (1 seek)
- Range queries should traverse a single row (efficient)
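A hedged sketch of query-first modeling with TTL-based aging; the table and query are invented to match the guidance above, not taken from the deck:

-- Query: "latest events for a device, newest first" -> one table, one
-- partition per device, so each SELECT is a single-seek range scan.
CREATE TABLE IF NOT EXISTS events_by_device (
    device_id  uuid,
    event_time timestamp,
    payload    text,
    PRIMARY KEY (device_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- TTL ages data out automatically (here roughly 30 days, in seconds).
INSERT INTO events_by_device (device_id, event_time, payload)
VALUES (123e4567-e89b-12d3-a456-426655440000, toTimestamp(now()), 'login')
USING TTL 2592000;

SELECT * FROM events_by_device
WHERE device_id = 123e4567-e89b-12d3-a456-426655440000
LIMIT 100;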

Denormalization Is Expected
Remember: you’re not optimizing for storage efficiency, you are optimizing for performance.
Do this:
- Forget what you’ve learned about 3rd normal form
- Repeat after me… "slow is down", "storage is cheap"
- Denormalize and do parallel writes (see the sketch below)
Don’t do this:
- Client-side joins
- Reads before writes
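A hedged illustration of denormalizing for two query patterns; the tables and values are invented:

-- The same user is written to two tables, one per access path, instead of
-- doing a client-side join or a read-before-write.
CREATE TABLE IF NOT EXISTS users_by_id (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
);

CREATE TABLE IF NOT EXISTS users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text
);

-- The application issues both inserts in parallel (asynchronously).
INSERT INTO users_by_id (user_id, email, name)
VALUES (123e4567-e89b-12d3-a456-426655440000, 'alice@example.com', 'Alice');

INSERT INTO users_by_email (email, user_id, name)
VALUES ('alice@example.com', 123e4567-e89b-12d3-a456-426655440000, 'Alice');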

Thank You! Questions? Russ.Katz@Datastax.com