Apache Cassandra for the SQLServer DBA

Presentation transcript:

Apache Cassandra for the SQLServer DBA

It’s all about me...

Case studies

History
Originally developed at Facebook for its Inbox Search feature
Released to Google Code as open source in 2008
Became an Apache Incubator project in 2009
Top-level project in 2010
Named after a character from Greek mythology (something about a curse on an Oracle)

CAP theorem
Consistency, Availability, Partition tolerance
Cassandra is “AP”
Not to say that C isn’t possible
Consistency is ‘tunable’

Peer-to-peer
Cluster
Nodes
Token Ranges / vNodes

Token Assignment Token ranges

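Availability / Tolerance
[Diagram: five nodes owning the token ranges 1 - 20, 21 - 40, 41 - 60, 61 - 80 and 81 - 100. With RF = 3, the row with token 24 (value F) and the row with token 78 (value K) are each stored on three nodes.]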
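The replica placement on that slide can be sketched in a few lines of Python. This is a simplified illustration, not how Cassandra itself is implemented: the node names are hypothetical, the tokens are the toy 1 - 100 range from the slide rather than real hashed tokens, and replicas are assumed to be the owning node plus the next nodes clockwise around the ring.

```python
from bisect import bisect_left

# Hypothetical five-node ring: each node owns the tokens up to and
# including its upper bound (1-20, 21-40, 41-60, 61-80, 81-100).
RING = [(20, "node1"), (40, "node2"), (60, "node3"), (80, "node4"), (100, "node5")]


def replicas_for(token, rf=3):
    """Return the rf nodes that hold a token: the owner, then the next nodes clockwise."""
    bounds = [upper for upper, _ in RING]
    owner = bisect_left(bounds, token) % len(RING)
    return [RING[(owner + i) % len(RING)][1] for i in range(rf)]


print(replicas_for(24))  # owner of range 21-40 plus the next two nodes
print(replicas_for(78))  # owner of range 61-80; placement wraps around the ring
```

With RF = 3, any single node can be lost and two copies of each row remain, which is where the availability comes from.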

Consistency Levels
ALL – All replicas must respond
ANY – Closest available replica
ONE – Closest available repl… wait… what??!??
QUORUM – for RF3, 2 out of 3
LOCAL_QUORUM – for RF3, 2 out of… wait… not again??!!??
…plus some others that are less easy to describe…
ANY applies to writes only; ONE can be used for reads and writes. LOCAL_QUORUM counts its quorum within the local datacenter only.

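Consistency ALL
[Diagram: a read of key 32. All three replicas answer A, so the client gets 32 = A.]

Consistency ONE
[Diagram: a read of key 32. The replicas hold A and the client gets 32 = A from a single replica.]

Consistency QUORUM
[Diagram: a read of key 32. Two replicas answer B and one still holds A, so the client gets 32 = B.]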
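The arithmetic behind these consistency levels is small enough to write down. The sketch below is illustrative only: the function names are made up for this example, and the read resolution assumes the newest timestamp wins when replicas disagree.

```python
def quorum(rf):
    """Number of replicas that must respond for QUORUM: a strict majority of rf."""
    return rf // 2 + 1


def overlaps(rf, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


def resolve(responses):
    """Pick the value with the newest timestamp from (value, timestamp) responses."""
    return max(responses, key=lambda response: response[1])[0]


print(quorum(3))                      # replicas needed for QUORUM at RF 3
print(overlaps(3, 2, 2))              # QUORUM writes + QUORUM reads
print(overlaps(3, 1, 1))              # ONE writes + ONE reads
print(resolve([("A", 1), ("B", 2)]))  # stale A loses to the newer B
```

This is what “tunable” consistency means in practice: picking read and write levels whose replica counts overlap gives consistent reads, at the cost of waiting for more nodes.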

Write path
[Diagram: incoming data is written to the Commit Log and the MemTable; the MemTable is later flushed to an SSTable.]

Write path
[Animation: rows 1 A, 2 B, 3 A, 4 c and 5 F arriving in the MemTable and being flushed to SSTables.]
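As a rough model of that write path, here is a toy store in Python. It is a sketch under simplifying assumptions, not Cassandra's storage engine: the class name `TinyStore` and the `memtable_limit` parameter are invented for illustration, SSTables are plain sorted lists in memory, and the commit log is simply cleared on flush.

```python
class TinyStore:
    """Toy model of the write path: commit log, memtable, immutable SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only record of writes, kept for crash recovery
        self.memtable = {}    # in-memory table holding the latest value per key
        self.sstables = []    # flushed tables, each a sorted list that is never modified
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durable first
        self.memtable[key] = value            # then visible in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted table and start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


store = TinyStore(memtable_limit=3)
for key, value in [(1, "A"), (2, "B"), (4, "c"), (3, "A"), (5, "F")]:
    store.write(key, value)
print(store.sstables)  # one flushed table
print(store.memtable)  # rows still waiting in memory
```

Because flushed tables are never updated in place, the same key can end up in several of them, which is the problem compaction exists to tidy up.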

Compaction
Worth mentioning here, so I have
Quite complex to cover in less than an hour
Optimises SSTables based on Level, Size-tier (default), or Time-Window strategies
Has a major impact on your storage requirements
It happens…

Demo – Creating a keyspace

Design for speed
Table per query
Partition per query

CQL
“SELECT * FROM ?? WHERE x = ”
INSERT
UPDATE
DELETE
…it’s quite simple, really…

Simple datatypes
17 data types, 12 of which are most common

CQL Type    Constants           Description
bigint      integers            64-bit signed long
counter     integers            Distributed counter value (64-bit long)
float       integers, floats    32-bit IEEE-754 floating point Java type
int         integers            32-bit signed integer
text        strings             UTF-8 encoded string
timestamp   integers, strings   Date plus time, encoded as 8 bytes since epoch (01/01/1970)
timeuuid    uuids               Type 1 UUID only
uuid        uuids               A UUID in standard UUID format
varchar     strings             UTF-8 encoded string
list        n/a                 A collection of one or more ordered elements
map         n/a                 A JSON-style array of literals: { literal : literal, literal : literal ... }
set         n/a                 A collection of one or more elements

Consider this table…
CREATE TABLE pubs (
    id INT,
    title TEXT,
    author TEXT,
    genre TEXT,
    published INT,
    PRIMARY KEY (id)
);

Cassandra data layout
[Diagram: partitions, each identified by a partition key and made up of cells, each cell being a key and a value.]

Primary Key determines partition key
PRIMARY KEY (id)

Composite Partition Keys
CREATE TABLE pubs_by_title_year (
    title TEXT,
    author TEXT,
    genre TEXT,
    published INT,
    PRIMARY KEY ((title, published))
);
Note double brackets on PRIMARY KEY – we’ll come on to that later.

Clustering columns
CREATE TABLE pubs_by_year_title (
    published INT,
    title TEXT,
    author TEXT,
    genre TEXT,
    id INT,
    PRIMARY KEY ((published), title)
);
Clustering columns need to be part of the Primary Key to be searchable, but not part of the Partition Key. They order the rows within each partition.

Effect on layout

Partition   Cell                                    Value
2014        I did DBCC you coming : author          O. DeeBeaSea
            I did DBCC you coming : Genre           Horror
            Slow train to SOS_Scheduler : author    R.T. Fishall
            Slow train to SOS_Scheduler : Genre     Humour
2015        Mamma Mia ! That index… !! : author     Richard Munn
            Mamma Mia ! That index… !! : Genre      Horror
            Plan the plan's planny plan : author    A. Noob
            Plan the plan's planny plan : Genre     Management
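The same layout can be mimicked with nested dictionaries in Python: the partition key (published) picks the partition, and the clustering column (title) decides the order of rows inside it. This is only a mental model, assuming plain string ordering for the clustering column; the `layout` function is invented for the example and the rows are taken from the slide.

```python
from collections import defaultdict

# (published, title, author, genre), deliberately inserted out of order
ROWS = [
    (2015, "Plan the plan's planny plan", "A. Noob", "Management"),
    (2014, "Slow train to SOS_Scheduler", "R.T. Fishall", "Humour"),
    (2015, "Mamma Mia ! That index… !!", "Richard Munn", "Horror"),
    (2014, "I did DBCC you coming", "O. DeeBeaSea", "Horror"),
]


def layout(rows):
    """Group rows by partition key, then sort each partition by clustering column."""
    partitions = defaultdict(dict)
    for published, title, author, genre in rows:
        partitions[published][title] = {"author": author, "genre": genre}
    return {key: dict(sorted(rows_in.items())) for key, rows_in in partitions.items()}


for published, titles in sorted(layout(ROWS).items()):
    print(published)
    for title, cells in titles.items():
        print("   ", title, cells)
```

A query for one year touches a single partition and reads its titles in order, which is the point of choosing the keys around the query.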

Demo – creating tables and adding data

Things to remember
INSERT vs. UPDATE
Tombstones
    Garbage collection removes tombstoned records – default interval 10 days.
    Cleaned up on compaction
Hinted hand-off
Anti-entropy operations (read-repair)
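To make the tombstone point concrete, here is a small Python sketch of a delete being kept as a marker and only purged once the 10-day grace period has passed. It is a simplified assumption of the behaviour, not Cassandra's compaction code: the `compact` function, the use of `None` as a tombstone marker, and the integer timestamps in seconds are all invented for illustration.

```python
GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # the 10-day default interval, in seconds


def compact(cells, now):
    """Merge cells of the form (key, value, write_time); value None marks a tombstone.

    Keeps the newest cell per key and drops tombstones older than the grace period.
    """
    latest = {}
    for key, value, written in cells:
        if key not in latest or written > latest[key][1]:
            latest[key] = (value, written)

    kept = {}
    for key, (value, written) in latest.items():
        if value is None and now - written > GC_GRACE_SECONDS:
            continue  # tombstone has outlived the grace period: purge it
        kept[key] = (value, written)
    return kept


cells = [("a", "x", 0), ("a", None, 100), ("b", "y", 50)]
print(compact(cells, now=200))                         # tombstone for "a" is still kept
print(compact(cells, now=100 + GC_GRACE_SECONDS + 1))  # tombstone for "a" is purged
```

Keeping the marker around for a while gives replicas that missed the delete time to hear about it before the evidence disappears.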

Things you can’t do
JOIN *
Use Foreign Keys
Implement 3rd Normal form
Actually, throw away any thought of normalisation
Add additional indexes *
Create Stored Procedures – use BATCH instead
Maths (except min, max, avg, sum, and count)
Use non-key columns as search predicates

Demo - Let’s do some shenanigans….

exit;