Cassandra as Memcache Edward Capriolo Media6Degrees.com.

What we learned in Operating Systems ➲ CPU (and registers) - Super FAST! ➲ Main Memory - Fast ➲ Hard Disks - Slow

What has changed since my first computer ➲ Then ● 100 MHz ● 8 MB RAM ● 1 GB disk ● 14.4 kbps modem ● 686, Windowz 3.11 ● Packard Bell ➲ Now ● Multiple cores @ 4 GHz ● 2 GB RAM ● 2 TB disk ● 1/10 Gb Ethernet ● 64-bit FC 14 ● Sadly no more Packard Bell

The Present Situation ➲ Computers are not and never will be fast or big enough ➲ Until they take over and then they will be too fast and too big

Traditional two tier Web Application ➲ User facing tier ● Usually Apache|Tomcat|... ● Speaks some CGI alternative php|jsp|cfm|... ● Logging ● Display ➲ Back end ● Usually an RDBMS ● Stores and indexes data ● Supports a data abstraction and manipulation language

Simple Schema ➲ create table user ( id int auto_increment, name varchar unique, pass varchar ) ➲ create table book ( id int auto_increment, name varchar(25) unique, author varchar(25) ) ➲ create table users_books ( uid int, bid int, unique (uid, bid), index (bid) )

Some queries you might see (user login) ➲ select id, pass from user where user.name = ? ➲ Effectively random lookups, driven by which users log in ➲ Rows are not read often, so caching may not help

Some queries you might see (books a user has read) ➲ select user.name, book.name from user join users_books on user.id = users_books.uid join book on book.id = users_books.bid where user.id = ? ➲ More complex query ➲ Two join conditions ➲ Result might be on the user's start page ➲ Result might be used often by algorithms
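
As a rough illustration of the join above, here is a minimal sketch using Python's built-in sqlite3 module in place of a production RDBMS. The sample rows, the column lengths on the user table, and the helper name books_read_by are assumptions made for the example; the SQLite flavor of the schema (integer primary key autoincrement, a separate create index) differs slightly from the slide.

```python
import sqlite3

# In-memory SQLite database standing in for the back-end RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user (id integer primary key autoincrement,
                       name varchar(25) unique, pass varchar(25));
    create table book (id integer primary key autoincrement,
                       name varchar(25) unique, author varchar(25));
    create table users_books (uid int, bid int, unique (uid, bid));
    create index users_books_bid on users_books (bid);
""")

# Hypothetical sample data: one user who has read two books.
conn.execute("insert into user (name, pass) values ('alice', 'secret')")
conn.executemany("insert into book (name, author) values (?, ?)",
                 [("Dune", "Herbert"), ("Emma", "Austen")])
conn.executemany("insert into users_books (uid, bid) values (?, ?)",
                 [(1, 1), (1, 2)])

def books_read_by(conn, user_id):
    """Run the two-join query from the slide for one user id."""
    return conn.execute("""
        select user.name, book.name
        from user
        join users_books on user.id = users_books.uid
        join book on book.id = users_books.bid
        where user.id = ?
        order by book.name
    """, (user_id,)).fetchall()
```

Because the result depends only on the user id, it is a natural candidate for caching under a key derived from that id.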

Some queries you might see (count all the read books) ➲ select users_books.bid, book.name, count(*) from users_books inner join book on users_books.bid = book.id group by users_books.bid, book.name ➲ No where clause! ➲ Possible table scan ➲ Possible intermediate results written to a temp file ➲ Result displayed on main index page
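
The aggregate can be sketched the same way. This is a minimal example with Python's sqlite3 module and made-up rows; the function name read_counts is hypothetical. With no where clause, every row of users_books contributes to the result, which is why the query gets expensive as the table grows.

```python
import sqlite3

# In-memory SQLite database with a few hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table book (id integer primary key,
                       name varchar(25) unique, author varchar(25));
    create table users_books (uid int, bid int, unique (uid, bid));
    insert into book (id, name, author)
        values (1, 'Dune', 'Herbert'), (2, 'Emma', 'Austen');
    insert into users_books (uid, bid) values (1, 1), (2, 1), (2, 2);
""")

def read_counts(conn):
    """Count how many users have read each book (no where clause)."""
    return conn.execute("""
        select users_books.bid, book.name, count(*)
        from users_books
        inner join book on users_books.bid = book.id
        group by users_books.bid, book.name
        order by users_books.bid
    """).fetchall()
```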

How fast are these queries? ➲ Trick question! ➲ How much data? ● The O(log n) lookup cost for 'small' data sets is negligible ➲ How fast are the disks? ● Streaming is much faster than seeking* ➲ How many QPS? ● More requests means more contention ➲ How much RAM? ● Unallocated RAM works as page cache...

Wait... Page Cache... what? ➲ Virtual File System or VFS cache ➲ RAM not in use by a process ➲ Used to cache disk ➲ Blocks read often get cached in RAM ➲ A large disk-to-RAM ratio reduces the hit chance

Scaling RDBMS challenges ➲ Scaling up ● More RAM, disk ● Upper limit ➲ Adding slaves ● Adds read capacity ● Does not add write capacity ● Monitoring/fixing replication ➲ Sharding ● Possibly giving up DB features ● Re-shard with growth

Enter Memcache ➲ Key value store with no persistence* ➲ Works with memory slabs ➲ Set a key, value, and a Time To Live ➲ Typically client-controlled sharding ➲ Normal use case ● Check cache ● If found in cache, return ● Else query and save in cache ➲ Saves resources by not re-querying mostly static, non-transactional, and non-time-sensitive data
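
The normal use case above (check cache, return on hit, else query and save) can be sketched as follows. This is a minimal in-process stand-in, not a memcache client: the TTLCache class, the cached_query helper, and the injectable clock are all assumptions made so the example runs without a server.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for a memcache server (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be simulated

    def set(self, key, value, ttl_seconds):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

def cached_query(cache, key, run_query, ttl_seconds=60):
    """Check cache; on a miss run the query and save the result."""
    value = cache.get(key)
    if value is not None:
        return value
    value = run_query()
    cache.set(key, value, ttl_seconds)
    return value
```

A caller would wrap an expensive query, for example cached_query(cache, "books:1", lambda: run_the_join(1)), where run_the_join is whatever function hits the database. Note that a query whose legitimate result is None is never cached in this sketch.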

Memcache... Good Things ➲ More control of cache than VFS cache ➲ Saves web server memory vs HttpSession ➲ Fast to store and access data ➲ Simple to use ➲ Clients for many languages

Memcache (possibly not so good things) ➲ Memcache is emptied on shutdown ➲ Is an 8 GB hash table better than 8 GB more RAM in your database machine? ➲ Another tier to manage ➲ Is it scalable?...

A highly un-suggested deployment

Enter Cassandra... ➲ Data sharding and replication ➲ Writing ● Structured log format ● Linear Writes to sorted memtable ● Memtables flush (time,size,ops) ➲ Reading ● VFS Cache ● Bloom filters ● Row Cache ● Key Cache ➲ 0.7.X brings TTL fields!
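
The write and read paths listed above can be pictured with a toy model. This is a sketch under heavy simplifying assumptions and is not how Cassandra is implemented: the ToyStore class is hypothetical, the flush trigger uses only an operation count (the slide also lists time and size), and bloom filters and the row and key caches are left out.

```python
class ToyStore:
    """Toy log-structured store: sequential log, memtable, flushed segments."""

    def __init__(self, flush_after_ops=3):
        self.commit_log = []   # append-only, written sequentially
        self.memtable = {}     # in-memory view of recent writes
        self.segments = []     # flushed, immutable, sorted (key, value) lists
        self.flush_after_ops = flush_after_ops
        self._ops = 0

    def write(self, key, value):
        self.commit_log.append((key, value))  # linear write, no seek
        self.memtable[key] = value
        self._ops += 1
        if self._ops >= self.flush_after_ops:
            self.flush()

    def flush(self):
        # Persist the memtable as one sorted, immutable segment.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}
        self._ops = 0

    def read(self, key):
        # Newest data first: memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None
```

The point of the model is that writes never modify data in place, while reads may have to consult several places; that read cost is what the caches and bloom filters on the slide exist to reduce.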

So then... Cassandra is faster than memcache? ➲ No! ● Memcache is an in-memory datastore ● Cassandra has to persist data ➲ But it may be faster, more efficient, and easier to manage than a separate memcache + database tier

Configuration 1: De facto Standard ➲ 5 nodes ➲ Replication Factor = 3 ➲ Key Cache ➲ Results in: ● Good performance ● Strong consistency ● Highly fault tolerant

Configuration 2: Do not care about stale reads ➲ 5 nodes ➲ Replication Factor = 3 ➲ Row Cache ➲ Read Repair Chance = 0% ➲ Results in: ● 1/3 of the read traffic ● Minor possibility of not found/out of sync data (not much different than memcache)
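
The "1/3" figure can be reproduced with a back-of-the-envelope model. The sketch below assumes reads at consistency level ONE and a baseline read repair chance of 100%, so that a repaired read touches every replica; the function name is hypothetical and the model ignores that repair requests are cheaper digest reads.

```python
def replica_reads_per_request(replication_factor, read_repair_chance):
    """Expected number of replicas contacted per client read.

    Simplified model (assumption): one replica always serves the read, and
    with probability read_repair_chance the other replicas are contacted too.
    """
    return 1 + read_repair_chance * (replication_factor - 1)

with_repair = replica_reads_per_request(3, 1.0)     # 3.0 replicas per read
without_repair = replica_reads_per_request(3, 0.0)  # 1.0 replica per read
ratio = without_repair / with_repair                # one third of the traffic
```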

Configuration 3: Snitches get stitches ➲ 5 nodes ➲ Replication Factor = 3 ➲ Row Cache ➲ Read Repair Chance = 0% ➲ Dynamic Snitches + Pinning ➲ Results in: ● Reads should hit the same node, not a random replica ● Caches on each node have less duplication
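
Why pinning reduces cache duplication can be seen in a small simulation. This is a toy model, not Cassandra's snitch logic: integer keys, the ring placement in replicas_for, and modeling "pinning" as always reading from the first replica are all assumptions for illustration.

```python
import random

NODES = 5
RF = 3

def replicas_for(key):
    """Hypothetical placement: RF consecutive nodes starting at key % NODES."""
    first = key % NODES
    return [(first + i) % NODES for i in range(RF)]

def cached_copies(keys, choose, reads_per_key=50):
    """Count distinct (node, key) row-cache entries after all reads."""
    cached = set()
    for key in keys:
        for _ in range(reads_per_key):
            node = choose(replicas_for(key))
            cached.add((node, key))
    return len(cached)

keys = range(100)
rng = random.Random(42)  # seeded so the run is repeatable

pinned_copies = cached_copies(keys, lambda replicas: replicas[0])
random_copies = cached_copies(keys, rng.choice)
```

With pinned reads each key is cached on exactly one node; with random replica selection each key tends to end up cached on every replica, so the same cache memory holds fewer distinct rows.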

Configuration 4: Little Data, Big Request load! ➲ 20 nodes ➲ Replication Factor = 20! (only this keyspace) ➲ Row Cache ➲ Read Repair Chance = 0% ➲ Results in: ● 20 nodes capable of serving these reads! ● Writes do not scale (like master-slave replication)
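
The read/write trade-off of this configuration follows from simple arithmetic. The sketch below assumes evenly spread load and one replica per read (read repair chance of 0); the function name and the request rates are made up for the example.

```python
def per_node_load(nodes, replication_factor, reads_per_sec, writes_per_sec):
    """Return (reads, writes) per second that each node handles on average.

    Assumptions: load is spread evenly, each read touches one replica,
    and each write is applied on every replica.
    """
    reads = reads_per_sec / nodes
    writes = writes_per_sec * replication_factor / nodes
    return reads, writes

# Hypothetical workload: 10,000 reads/s and 1,000 writes/s.
five_nodes_rf3 = per_node_load(5, 3, 10_000, 1_000)       # (2000.0, 600.0)
twenty_nodes_rf20 = per_node_load(20, 20, 10_000, 1_000)  # (500.0, 1000.0)
```

With Replication Factor equal to the node count, per-node read load keeps dropping as nodes are added, but every node still applies every write, which is the master-slave-like limit the slide mentions.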

To recap... Cassandra ➲ 0.7.X brings Time To Live ➲ 0.7.X brings Read Repair Chance ➲ Can serve purely from memory ➲ Can serve from disk ➲ Replication Factor, caching, sharding: many ways to tune ➲ General awesomeness
