Lecture 11: Other NoSQL. Instructor: Weidong Shi (Larry), PhD


COSC6376 Cloud Computing. Computer Science Department, University of Houston

Outline
- Cassandra
- Memcache

Cassandra

What Cassandra is
- Cassandra is a massively scalable, decentralized, structured data store (aka database).
- The name: Cassandra was a prophetess in Troy during the Trojan War. Her predictions were always true, but never believed.

Cassandra: Bigtable/Dynamo Hybrid
- Originally from Facebook
  - Written by an original Dynamo developer
  - Now an Apache project
  - Facebook now uses HBase
  - Netflix is based on Cassandra
- Written in Java
- Follows the Bigtable data model: column-oriented
- Uses the Dynamo eventual consistency model
- Uses Apache Thrift as its API
  - An interface definition language
  - Defines and creates services for numerous languages

Thrift
- Created at Facebook along with Cassandra
- A cross-language, service-generation framework
- Binary protocol (like Google Protocol Buffers)
- Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...

Thrift: what it handles and the alternatives
- What a service framework has to cover:
  - Sending requests, getting results
  - Waiting for requests (known location, known port)
  - Communication protocol, data format
- Alternatives:
  - SOAP: XML, XML, and more XML
  - CORBA: over-designed and heavyweight
  - COM: embraced mainly in Windows client software
  - Pillar: slick, but no versioning/abstraction
  - Protocol Buffers etc.: closed source ("Google deliciousness")

Principle of Operation
1. Create a Thrift file (e.g., demo.thrift): define data types and service interfaces.
2. Build the Thrift platform files with the Thrift code generator tool (written in C++): Demo.php, Demo.cpp, Demo.py, Demo.java.
3. Create the server/client app: the server implements the services and the client calls them.
4. Run the server.

Projects Using Thrift
- Cassandra
- ThriftDB
- Scribe
- Hadoop / HBase
- Facebook

Approaches of influence
- Bigtable
  - Sparse map data model
  - GFS, Chubby, et al.
- Dynamo
  - O(1) distributed hash table (DHT)
  - BASE (aka eventual consistency)
  - Client-tunable consistency/availability
- Cassandra ~= Bigtable + Dynamo

Design Goals
- High availability
- Eventual consistency: trades strong consistency for high availability
- Incremental scalability
- Optimistic replication
- "Knobs" to tune trade-offs between consistency, durability, and latency
- Low total cost of ownership
- Minimal administration

Proven in Web 2.0
- Facebook stores 150 TB of data on 150 nodes
- Used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, and others

Cassandra Data Model

Typical NoSQL API
Basic API access:
- get(key): extract the value given a key
- put(key, value): create or update the value given its key
- delete(key): remove the key and its associated value
- execute(key, operation, parameters): invoke an operation on the value (given its key), where the value is a special data structure (e.g., List, Set, Map, etc.)
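The four calls above can be sketched as a toy in-memory store. This is only an illustration: the class name KeyValueStore and the choice to implement execute by calling a method on the stored Python object are assumptions, not any particular product's API.

```python
class KeyValueStore:
    """Toy in-memory store exposing get/put/delete/execute (illustrative only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Extract the value for a key (None if the key is absent).
        return self._data.get(key)

    def put(self, key, value):
        # Create or update the value for a key.
        self._data[key] = value

    def delete(self, key):
        # Remove the key and its value; deleting a missing key is a no-op.
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        # Invoke a named operation on the stored structure (e.g. list.append).
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("followers:eben", ["alison"])
store.execute("followers:eben", "append", "arin")
print(store.get("followers:eben"))
store.delete("followers:eben")
print(store.get("followers:eben"))
```

The point of execute is that the structure is updated where it is stored, rather than being fetched, changed, and written back by the client.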

Data Model (diagram)
- keyspace: settings (e.g., partitioner) and column families
- column family: settings (e.g., type [Std]) and columns
- column: name, value, clock

Data Model
- Keyspace
  - Uppermost namespace
  - Typically one per application
- ColumnFamily
  - Associates records of a similar kind
  - Record-level atomicity
  - Indexed
- Column
  - Basic unit of storage

Keyspace
- ~= database
- Typically one per application
- Some settings are configurable only per keyspace

Column
- Column: the smallest data element, a tuple with a name and a value
- Each column has 3 parts:
  - name: determines sort order; used in queries
  - value
  - timestamp: long (clock)
- Here's a column represented in JSON-ish notation:

    { // this is a column
      name: "emailAddress",
      value: "arin@example.com",
      timestamp: 123456789
    }

Column Family
- Groups records of a similar kind
- Not the same kind, because CFs are sparse tables
- Example:

    UserProfile = { // this is a ColumnFamily
      phatduckk: { // this is the key to this Row inside the CF
        // now we have an unbounded # of columns in this row
        username: "phatduckk",
        email: "phatduckk@example.com",
        phone: "(900) 976-6666"
      }, // end row
      ieure: { // this is the key to another row in the CF
        // now we have another unbounded # of columns in this row
        username: "ieure",
        email: "ieure@example.com",
        phone: "(888) 555-1212",
        age: "66",
        gender: "undecided"
      },
    }
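A minimal sketch of the column and column family ideas in Python, assuming a column family can be modeled as a dict of rows and each row as a dict of columns. The helper names insert and get_row are hypothetical, and resolving two writes to the same column by keeping the larger timestamp is a simplification of what the clock is for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int


def insert(column_family, row_key, column):
    """Store a column in a row, keeping the newest timestamp per column name."""
    row = column_family.setdefault(row_key, {})
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


def get_row(column_family, row_key):
    """Return a row's columns sorted by column name."""
    row = column_family.get(row_key, {})
    return [row[name] for name in sorted(row)]


user_profile = {}  # a column family: row key -> {column name -> Column}
insert(user_profile, "phatduckk", Column("username", "phatduckk", 1))
insert(user_profile, "ieure", Column("username", "ieure", 1))
insert(user_profile, "ieure", Column("age", "66", 1))  # sparse: only this row has "age"
print([c.name for c in get_row(user_profile, "ieure")])
```

Rows in the same column family do not need the same set of columns, which is what "sparse table" means here.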

Column Family (diagram)
- key123: user=eben, nickname=The Situation
- key456: user=alison, icon=, n=42
- Think of it as a hashmap or associative array
- Each row is uniquely identifiable by its key

Super Column
- Super columns group columns under a common name
- A SuperColumn is a tuple with a name and a value, where the value is a map containing an unbounded number of Columns (a map of columns)

Super Column

    { // this is a SuperColumn
      name: "homeAddress",
      // with an unbounded list of Columns
      value: {
        // note that each key is the name of the Column
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
      }
    }

Super Column Family
- A column family can be of type standard or super
- Standard column family: each row contains a map of normal columns
- Super column family: each row contains a map of super columns

Super Column Family: example

    AddressBook = { // this is a ColumnFamily of type Super
      phatduckk: { // this is the key to this row inside the Super CF
        // the key here is the name of the owner of the address book
        // now we have an unbounded # of super columns in this row
        // the keys inside the row are the names for the SuperColumns
        // each of these SuperColumns is an address book entry
        friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
        // this is the address book entry for John in phatduckk's address book
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
        Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
        ...
        // we can have an unbounded # of SuperColumns (aka address book entries)
      }, // end row
      ieure: { // this is the key to another row in the Super CF
        // all the address book entries for ieure
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
      },
    }
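Seen from code, a super column family adds one more level of nesting: row key, then super column name, then column name. The sketch below models that with plain dicts and omits timestamps for brevity; get_subcolumn is a hypothetical helper, not a Cassandra call.

```python
# Row key -> super column name -> column name -> value (timestamps omitted).
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210"},
        "John": {"street": "Howard street", "zip": "94404"},
    },
    "ieure": {
        "joey": {"street": "A ave", "zip": "55485"},
    },
}


def get_subcolumn(super_cf, row_key, super_column, column):
    """Look up one column value; return None if any level is absent."""
    return super_cf.get(row_key, {}).get(super_column, {}).get(column)


print(get_subcolumn(address_book, "phatduckk", "John", "zip"))
print(get_subcolumn(address_book, "ieure", "John", "zip"))
```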

Data model explained by example (Twitter)

Example - Twitter

Example - Twitter Supercolumn family

Write Operations
- A client issues a write request to a random node in the Cassandra cluster.
- The "Partitioner" determines the nodes responsible for the data.
- Locally, write operations are logged and then applied to an in-memory version.
- The commit log is stored on a dedicated disk local to the machine.

Write Operations (diagram)
- Path: write, then commit log, then memtable; past a threshold the memtable is written out as an SSTable
- No locks in the critical path
- No reads
- No seeks
- Append support
- Fast, sequential disk access
- Atomic within a column family
- ≈ 0.2 ms
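The local write path (log first, then the in-memory table, then a flush once a threshold is reached) can be sketched as follows. The class name Node, the JSON log format, and the threshold counted in keys are all assumptions made for illustration.

```python
import json
import os
import tempfile


class Node:
    """Toy write path: commit log, then memtable, then SSTable flush."""

    def __init__(self, log_path, memtable_limit=3):
        self.log_path = log_path
        self.memtable = {}
        self.sstables = []  # each SSTable is an immutable, key-sorted list
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # 1. Append to the commit log: a sequential write, no reads or seeks.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        # 2. Apply the write to the in-memory table.
        self.memtable[key] = value
        # 3. Flush to an SSTable once the threshold is reached.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


with tempfile.TemporaryDirectory() as tmp:
    node = Node(os.path.join(tmp, "commit.log"), memtable_limit=2)
    node.write("k2", "b")
    node.write("k1", "a")
    print(node.sstables)
```

Nothing on this path reads existing data from disk, which is why writes are so much cheaper than reads in this design.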

Compaction (diagram)
- Input: several sorted SSTables of key / serialized-data entries (keys such as K1, K2, K3, K4, K5, K10); some entries are marked DELETED
- MERGE SORT combines them into one sorted data file: K1, K2, K3, K4, K5, K10, K30
- Index file, loaded in memory: K1 offset, K5 offset, K30 offset
- Bloom filter
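A minimal sketch of the merge step, assuming each SSTable is a key-sorted list of (key, timestamp, value) entries and that a deleted entry carries a marker value. The TOMBSTONE sentinel and the compact function are hypothetical names. Unlike a real system, which may keep deletion markers for a grace period, this sketch drops them immediately.

```python
import heapq

TOMBSTONE = object()  # hypothetical deletion marker for this sketch


def compact(sstables):
    """Merge key-sorted SSTables of (key, timestamp, value) into one.

    Keeps only the newest entry per key and drops keys whose newest
    entry is a tombstone.
    """
    # Order by key, and within a key by newest timestamp first.
    merged = heapq.merge(*sstables, key=lambda entry: (entry[0], -entry[1]))
    result = []
    last_key = object()  # sentinel that equals no real key
    for key, timestamp, value in merged:
        if key == last_key:
            continue  # an older version of a key that was already handled
        last_key = key
        if value is not TOMBSTONE:
            result.append((key, timestamp, value))
    return result


old = [("K1", 1, "a"), ("K2", 1, "b"), ("K3", 1, "c")]
new = [("K2", 2, TOMBSTONE), ("K4", 2, "d")]
print(compact([old, new]))
```

Because every input is already sorted, the merge is a single sequential pass over the inputs.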

Reads (diagram)
- A read can go to any node
- The read consults the memtable and the SSTables
- Each SSTable has a Bloom filter (Bf) to determine whether a provided key is in the SSTable
- Each SSTable has an index (Idx) for quick reads
- Read repair
- ≈ 15 ms
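A rough sketch of how a Bloom filter lets a read skip SSTables that cannot contain the key. The filter size, the number of hashes, the SHA-256-based hashing, and the dict standing in for an SSTable and its index are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """May report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable, then SSTables newest first, skipping any
    SSTable whose Bloom filter rules the key out."""
    if key in memtable:
        return memtable[key]
    for bloom, data in reversed(sstables):
        if bloom.might_contain(key) and key in data:
            return data[key]
    return None


bloom = BloomFilter()
bloom.add("K1")
print(read("K1", {}, [(bloom, {"K1": "v1"})]))
```

A "no" from the filter is definite, so the disk lookup for that SSTable can be skipped; a "maybe" still requires the lookup.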

Read Operations (diagram)
- The client sends a query to the Cassandra cluster and receives the result
- The closest replica (Replica A) returns the result
- Replicas B and C receive a digest query and return a digest response
- Read repair if digests differ
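The digest comparison can be sketched as below, assuming each replica is a dict mapping a key to a (timestamp, value) pair and that the first replica in the list stands in for the closest one. The function names and the SHA-256 digest are illustrative choices.

```python
import hashlib


def digest(value):
    """Short fingerprint of a stored value (illustrative choice of hash)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def read_with_repair(key, replicas):
    """replicas: list of dicts mapping key -> (timestamp, value).

    Reads the full value from the first replica and only digests from
    the others; if any digest differs, pushes the newest version to
    every replica.
    """
    result = replicas[0].get(key)
    digests = [digest(r.get(key)) for r in replicas[1:]]
    if any(d != digest(result) for d in digests):
        versions = [r[key] for r in replicas if key in r]
        newest = max(versions, key=lambda version: version[0])
        for r in replicas:
            r[key] = newest  # read repair
        result = newest
    return result


replica_a = {"k": (1, "old")}
replica_b = {"k": (2, "new")}
replica_c = {}
print(read_with_repair("k", [replica_a, replica_b, replica_c]))
```

Sending digests instead of full values keeps the cross-replica traffic small in the common case where the replicas already agree.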

Cassandra and Consistency (reads)
- We talked previously about eventual consistency
- Cassandra has tunable read and write consistency
- One: return from the first node that responds
- Quorum: query all nodes and, once a majority of nodes have responded, respond with the value that has the latest timestamp
- All: query all nodes and, once all nodes have responded, respond with the value that has the latest timestamp. An unresponsive node will fail the read

Cassandra and Consistency (writes)
- Zero: ensure nothing. Asynchronous write done in the background
- Any: ensure that the write is written to at least 1 node
- One: ensure that the write is written to at least 1 node's commit log and memory table before acknowledging the client
- Quorum: ensure that the write goes to (number of replica nodes / 2) + 1 nodes
- All: ensure that the write goes to all nodes. An unresponsive node would fail the write
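The arithmetic behind the levels can be written out as a small sketch, assuming a replication factor N and integer division for the quorum. The function names are hypothetical, and the overlap rule R + W > N is the usual Dynamo-style condition under which a read is guaranteed to touch at least one replica that holds the latest write.

```python
def quorum(replication_factor):
    """Majority of the replicas: N // 2 + 1."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Replica acknowledgements needed for a write consistency level."""
    levels = {
        "ZERO": 0,
        "ANY": 1,
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(r, w, n):
    """True when every read set overlaps every write set (R + W > N)."""
    return r + w > n


n = 3
print(quorum(n))
print(reads_see_latest_write(quorum(n), quorum(n), n))
print(reads_see_latest_write(1, 1, n))
```

With N = 3, quorum reads and quorum writes overlap, while reading and writing at One does not guarantee that a read sees the latest write.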

Architecture

Tombstones
- In relational applications, a common idea is the "soft delete": instead of actually executing a delete SQL statement, the application issues an update statement that changes a value in a column called something like "deleted".
- In Cassandra, the equivalent is called a tombstone.
- When you execute a delete operation, the data is not immediately deleted. Instead, it is treated as an update operation that places a tombstone on the value.
- A tombstone is a deletion marker that is required to suppress older data in SSTables until compaction can run.

Hinted Handoff
- An optimization technique for writing data to replicas
- When a write is made and a replica node for the key is down, Cassandra writes a hint to a live replica node
- That replica node will remind the downed node of the changes once it is back online
- Hinted handoff reduces write latency when a replica is temporarily down
- Hinted handoff provides high write availability at the cost of consistency
- A hinted write does NOT count towards Consistency Level requirements for ONE, QUORUM, or ALL
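A toy sketch of the idea, assuming hints are stored on the first live replica and replayed explicitly when the downed node returns. The Replica class and both function names are hypothetical. The return value counts only real replica writes, mirroring the rule that hints do not count towards ONE, QUORUM, or ALL.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target replica name, key, value)


def write(key, value, replicas):
    """Write to live replicas; store hints for the ones that are down.

    Returns the number of real (non-hinted) replica writes.
    """
    live = [r for r in replicas if r.up]
    for r in live:
        r.data[key] = value
    if live:
        for down in (r for r in replicas if not r.up):
            live[0].hints.append((down.name, key, value))
    return len(live)


def replay_hints(holder, recovered):
    """Deliver the holder's hints to a replica that is back online."""
    remaining = []
    for target, key, value in holder.hints:
        if target == recovered.name and recovered.up:
            recovered.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining


node_a, node_b, node_c = Replica("a"), Replica("b"), Replica("c")
node_c.up = False
print(write("k", "v", [node_a, node_b, node_c]))
node_c.up = True
replay_hints(node_a, node_c)
print(node_c.data)
```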

MySQL Comparison (> 50 GB of data)

    System      Writes (average)   Reads (average)
    MySQL       ~300 ms            ~350 ms
    Cassandra   0.12 ms            15 ms

Lessons Learnt
- Add fancy features only when absolutely required.
- Many types of failures are possible.
- Big systems need proper systems-level monitoring.
- Value simple designs.

Memcache

Memcache
- Memcache is not a database.
- Memcache is a distributed cache system.
- Memcache is not meant to provide any backup support.
- It's all about simple reads and writes.
- Memcache is very fast.

Memcache users
- LiveJournal
- Wikipedia
- Flickr
- Twitter
- YouTube
- Digg
- WordPress
- Craigslist
- Facebook (around 200 dedicated memcache servers)

Memcache
Memcache is an in-memory key-value store for small chunks of arbitrary data (strings, objects).

> In-memory (volatile) key-value store

    $memcache->set('unique_key', $value, $flag, $expiration_time);
    // $flag = 0, or MEMCACHE_COMPRESSED to store the item compressed
    // $expiration_time = 0 (never expire), 30 (30 seconds), etc.
    $memcache->get('unique_key');

NOTE: A missing key doubles the fetch time.

> Distributed memory caching system (you can use more than one server to cache your data)

    $memcache->addServer('host1', 11211);
    $memcache->addServer('host2', 11211);
    $memcache->addServer('host3', 11211);
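To show what "volatile" and "distributed" mean in practice, here is a self-contained Python sketch that imitates the calls above without talking to a real memcache server. The classes CacheServer and CacheClient are stand-ins invented for this example, and picking a server with a CRC32 hash modulo the server count is an assumption; real clients often use other schemes such as consistent hashing.

```python
import time
import zlib


class CacheServer:
    """Stand-in for one memcache server: in-memory, with expiry."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, expiration_time=0):
        # expiration_time = 0 means the item never expires.
        expires_at = (time.monotonic() + expiration_time) if expiration_time else None
        self._items[key] = (value, expires_at)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]  # expired: behave as a miss
            return None
        return value


class CacheClient:
    """Spreads keys over several servers by hashing the key."""

    def __init__(self):
        self.servers = []

    def add_server(self, server):
        self.servers.append(server)

    def _server_for(self, key):
        return self.servers[zlib.crc32(key.encode("utf-8")) % len(self.servers)]

    def set(self, key, value, expiration_time=0):
        self._server_for(key).set(key, value, expiration_time)

    def get(self, key):
        return self._server_for(key).get(key)


client = CacheClient()
for _ in range(3):
    client.add_server(CacheServer())
client.set("unique_key", "some value", expiration_time=30)
print(client.get("unique_key"))
```

Each key lives on exactly one server, so adding servers adds cache capacity rather than copies.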

What can you store in Memcache?
- Results of database calls, API calls (XML as a string), page rendering (HTML as a string), etc.
- NOTE: Objects are serialized before being stored in memcache.

Caching

Caching
- Use memcache
  - In-memory key-value store
  - Distributed
- What can be cached?
  - Common queries, results of database calls
  - Page rendering (HTML as a string)
  - Sessions
- What are the benefits?
  - Decreases load on the DB
  - Faster response than the DB
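The usual way to get these benefits is to check the cache first and fall back to the database only on a miss. The sketch below uses a plain dict as the cache and a hypothetical slow_query function as the database call; both are placeholders, not part of any memcache client.

```python
def cached_fetch(cache, key, load_from_db):
    """Return the cached value, loading and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # slow path: hit the database
        cache[key] = value
    return value


db_calls = []


def slow_query(key):
    """Hypothetical database call; records how often it is invoked."""
    db_calls.append(key)
    return f"result for {key}"


cache = {}
cached_fetch(cache, "top_posts", slow_query)
cached_fetch(cache, "top_posts", slow_query)
print(len(db_calls))
```

The second call is served from the cache, so the database is queried only once.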

Caching

Additional NoSQL DBs

What else?
- MongoDB
- Voldemort
- Riak / Basho
- CouchDB
- Hibari
- Virtuoso
- Many, many others!
  - http://nosql.mypopescu.com/
  - http://en.wikipedia.org/wiki/NoSQL_(concept)

MongoDB
- Document-oriented
- Documents stored as JSON objects
- All writes and reads are through the master
- Written in C++
- Native Python bindings
- Simple configuration

MemcacheDB
- Memcached with persistence
- Uses the Memcache API
- Uses Berkeley DB
- Master/slave
  - Read from any slave
  - Write only to the master

Hypertable
- Open source
- Inspired by Google's Bigtable
- Runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS)
- Written almost entirely in C++
- Developed in-house at Zvents Inc.

Voldemort
- Dynamo clone by LinkedIn
- Eventually consistent
- Multiple versions may be returned on a Get
- Uses Berkeley DB for persistence
- Thrift interface
- Written in Java

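Datastores compared
Columns: Datastore, Replication, Consistency, CAP, Data Model, Range Queries. The table was flattened in extraction; the values that survive for each datastore are listed in their original order.
- Cassandra: Yes, Eventual, AP, Column oriented
- HBase: Strong, CP
- Hypertable: (no values survive)
- MemcacheDB: Key/Value
- MongoDB: Document
- MySQL: Relational
- Voldemort: No

Eventual consistency
- Apps can see inconsistent data if they are not careful about the choice of R and W
- An app might not see its own writes, or successive reads might see a row's state jump back and forth in time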

List of NoSQL databases [122+]
- Wide Column Store / Column Families: HBase, Cassandra, Hypertable, Cloudata, Cloudera, Amazon SimpleDB
- Document Stores: CouchDB, MongoDB, Terrastore, ThruDB, OrientDB, RavenDB, Citrusleaf, SisoDB
- Key Value / Tuple Store: Azure Table Storage, MEMBASE, Riak, Redis, Chordless, GenieDB, Scalaris, Tokyo Cabinet / Tyrant, Keyspace, Berkeley DB, MemcacheDB, Faircom C-Tree, Mnesia, LightCloud, Hibari, HamsterDB, STSdb, Pincaster, RaptorDB
- Eventually Consistent Key Value Stores: Amazon Dynamo, Voldemort, Dynomite, KAI
- Graph Databases: Neo4J, Infinite Graph, Sones, InfoGrid, HyperGraphDB, Trinity, AllegroGraph, Bigdata, DEX, OpenLink Virtuoso, VertexDB, FlockDB
- Object Databases: db4o, Versant, Objectivity, Gemstone, Progress, Starcounter, Perst, Caching, ZODB, NEO, PicoLisp, Sterling
- More and more databases