Cassandra and SIGMOD contest
Cloud computing group, Haiping Wang, 2009-12-19

Outline
- Cassandra: overview, data model, architecture, read and write
- SIGMOD contest 2009
- SIGMOD contest 2010

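Cassandra overview
- Highly scalable and distributed
- Eventually consistent
- Structured key-value store
- Combines Dynamo (distribution and replication) with Bigtable (data model and storage)
- Peer-to-peer (P2P)
- Supports random reads and random writes
- Written in Java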

Data model
Figure: one row, KEY, with three column families.
- ColumnFamily1: Name = MailList, Type = Simple, Sort = Name. Columns tid1 to tid4, each with a value and a timestamp (t1 to t4).
- ColumnFamily2: Name = WordList, Type = Super, Sort = Time. Super column "aloha" holds columns C1 to C4 (values V1 to V4, timestamps T1 to T4). Super column "dude" holds columns C2 and C6 (values V2 and V6, timestamps T2 and T6).
- ColumnFamily3: Name = System, Type = Super, Sort = Name. Super columns hint1 to hint4.
Column families are declared upfront. Columns and super columns are added and modified dynamically.
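The nesting in the figure can be modelled with a small in-memory sketch. It assumes a row maps column-family names to columns and that each column is a (value, timestamp) pair. Super column families add one more level: super column, then column. The names `ColumnFamilyDef`, `Row`, `insert` and `columns` are hypothetical and are not Cassandra's API. "Time" ordering is assumed to be newest first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of one row; this is not Cassandra's API.
Column = Tuple[str, bytes, int]  # (name, value, timestamp)


@dataclass
class ColumnFamilyDef:
    name: str
    cf_type: str  # "Simple" or "Super"
    sort: str     # "Name" or "Time"


@dataclass
class Row:
    key: str
    schema: Dict[str, ColumnFamilyDef]  # column families are declared upfront
    # Simple CF: {column: (value, ts)}; Super CF: {super_column: {column: (value, ts)}}
    data: Dict[str, dict] = field(default_factory=dict)

    def insert(self, cf: str, column: str, value: bytes, ts: int,
               super_column: Optional[str] = None) -> None:
        cf_def = self.schema[cf]  # KeyError if the column family was not declared
        family = self.data.setdefault(cf, {})
        if cf_def.cf_type == "Super":
            if super_column is None:
                raise ValueError("a Super column family needs a super column name")
            family = family.setdefault(super_column, {})  # super columns appear dynamically
        old = family.get(column)
        if old is None or ts >= old[1]:  # columns appear dynamically; newer timestamp wins
            family[column] = (value, ts)

    def columns(self, cf: str, super_column: Optional[str] = None) -> List[Column]:
        cf_def = self.schema[cf]
        family = self.data.get(cf, {})
        if cf_def.cf_type == "Super":
            family = family.get(super_column, {})
        cols = [(name, value, ts) for name, (value, ts) in family.items()]
        if cf_def.sort == "Time":
            return sorted(cols, key=lambda c: c[2], reverse=True)  # assumption: newest first
        return sorted(cols, key=lambda c: c[0])


schema = {
    "MailList": ColumnFamilyDef("MailList", "Simple", "Name"),
    "WordList": ColumnFamilyDef("WordList", "Super", "Time"),
}
row = Row("KEY", schema)
row.insert("MailList", "tid2", b"", 2)
row.insert("MailList", "tid1", b"", 1)
row.insert("WordList", "C1", b"V1", 1, super_column="aloha")
row.insert("WordList", "C2", b"V2", 2, super_column="aloha")
print([c[0] for c in row.columns("MailList")])           # ['tid1', 'tid2']
print([c[0] for c in row.columns("WordList", "aloha")])  # ['C2', 'C1']
```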

Cassandra Architecture

Cassandra API
- Data structures
- Exceptions
- Service API
- ConsistencyLevel (4)
- Retrieval methods (5)
- Range query, which returns the matching keys (1)
- Modification methods (3)
- Others

Cassandra commands

Partitioning and replication (1)
- Consistent hashing, as in a DHT; desired properties: balance, monotonicity, spread, load
- Virtual nodes
- Coordinator
- Preference list
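A minimal sketch of a Dynamo-style consistent-hashing ring with virtual nodes. Each physical node is hashed onto the ring several times. A key's preference list is the first N distinct physical nodes met while walking clockwise from h(key), and the first of these acts as coordinator. The token naming scheme (`"A#0"`, ...), the use of MD5, and the names `Ring` and `preference_list` are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(s: str) -> int:
    # Map a string onto a 2^32-point ring; MD5 is used only as a stable, well-spread hash.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 8):
        # Each physical node owns several virtual nodes (tokens) scattered around the ring.
        self.tokens: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.positions = [pos for pos, _ in self.tokens]

    def preference_list(self, key: str, n: int) -> List[str]:
        """First n distinct physical nodes clockwise from h(key); the first is the coordinator."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        result: List[str] = []
        for i in range(len(self.tokens)):
            node = self.tokens[(start + i) % len(self.tokens)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = Ring(["A", "B", "C", "D", "E", "F"])
print(ring.preference_list("key1", 3))  # three distinct nodes, coordinator first
```

Monotonicity follows from the walk. Removing a node that is not in a key's preference list leaves that list unchanged, because the removed node's tokens only appear after the first N distinct nodes.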

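Partitioning and replication (2)
Figure: a hash ring over the interval [0, 1), with 1/2 marked and nodes A to F placed around it. Keys h(key1) and h(key2) are hashed onto the ring, and each key is replicated on N = 3 nodes.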

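Data versioning
- Always writeable
- Multiple versions can coexist:
  - put() may return before all replicas have applied the update
  - get() may return many versions
- Versions are tracked with vector clocks
- Conflicting versions are reconciled by clients during reads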

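Vector clock
- A list of (node, counter) pairs
- Examples:
  - [x,2][y,3] vs. [x,3][y,4][z,1]: every counter in the second clock is at least as large as in the first. The second version therefore descends from the first, which can be discarded.
  - [x,1][y,3] vs. [z,1][y,3]: neither clock dominates the other. The versions are concurrent and must be reconciled.
- Each pair also carries a timestamp, e.g. D([x,1]:t1,[y,1]:t2)
- When the number of pairs reaches a threshold, the oldest pair is removed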
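The comparison rule and the timestamp-based truncation above can be sketched as follows. A clock is represented as a hypothetical `dict` mapping node to (counter, timestamp of that node's last update). The function names are illustrative only.

```python
from typing import Dict, Tuple

# node -> (counter, wall-clock time of that node's last update), e.g. D([x,1]:t1, [y,1]:t2)
VectorClock = Dict[str, Tuple[int, float]]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a descends from (or equals) b: no counter in b exceeds the matching one in a."""
    return all(a.get(node, (0, 0.0))[0] >= counter for node, (counter, _) in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    a_after_b, b_after_a = descends(a, b), descends(b, a)
    if a_after_b and b_after_a:
        return "equal"
    if b_after_a:
        return "before"      # a is an ancestor of b and can be discarded
    if a_after_b:
        return "after"       # b is an ancestor of a and can be discarded
    return "concurrent"      # conflict: keep both for reconciliation on read


def increment(clock: VectorClock, node: str, now: float) -> VectorClock:
    counter = clock.get(node, (0, 0.0))[0]
    updated = dict(clock)
    updated[node] = (counter + 1, now)
    return updated


def truncate(clock: VectorClock, threshold: int) -> VectorClock:
    """Drop the oldest pairs (by timestamp) so that at most `threshold` pairs remain."""
    if len(clock) <= threshold:
        return dict(clock)
    newest_first = sorted(clock.items(), key=lambda item: item[1][1], reverse=True)
    return dict(newest_first[:threshold])


x2y3 = {"x": (2, 1.0), "y": (3, 1.0)}
x3y4z1 = {"x": (3, 2.0), "y": (4, 2.0), "z": (1, 2.0)}
print(compare(x2y3, x3y4z1))  # before
x1y3 = {"x": (1, 1.0), "y": (3, 1.0)}
z1y3 = {"z": (1, 1.0), "y": (3, 1.0)}
print(compare(x1y3, z1y3))    # concurrent
```

Truncation keeps clocks small at a cost. Once a pair is dropped, a descendant can wrongly look concurrent with its ancestor, which only causes an unnecessary reconciliation.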

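Vector clock
- A read returns all the object versions at the leaves of the version graph, e.g. the concurrent versions D3 and D4, together with their combined context D3,4([Sx,2],[Sy,1],[Sz,1])
- After reconciling them, the client writes a single new version that supersedes both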
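Reconciliation produces a clock that descends from every leaf. It takes the pointwise maximum of the leaf clocks and then increments the counter of the node that coordinates the new write. The sketch assumes D3 = ([Sx,2],[Sy,1]) and D4 = ([Sx,2],[Sz,1]), as in the Dynamo paper's example. It also assumes node Sx coordinates the reconciled write. The names `merge` and `reconcile_write` are hypothetical. Timestamps are omitted for brevity.

```python
from typing import Dict, List


def merge(*clocks: Dict[str, int]) -> Dict[str, int]:
    """Pointwise maximum: the smallest clock that descends from every input clock."""
    merged: Dict[str, int] = {}
    for clock in clocks:
        for node, counter in clock.items():
            merged[node] = max(merged.get(node, 0), counter)
    return merged


def reconcile_write(leaf_clocks: List[Dict[str, int]], coordinator: str) -> Dict[str, int]:
    """Clock of the single new version written after the client reconciles the leaves."""
    new = merge(*leaf_clocks)
    new[coordinator] = new.get(coordinator, 0) + 1
    return new


d3 = {"Sx": 2, "Sy": 1}
d4 = {"Sx": 2, "Sz": 1}
print(merge(d3, d4))                    # {'Sx': 2, 'Sy': 1, 'Sz': 1}
print(reconcile_write([d3, d4], "Sx"))  # {'Sx': 3, 'Sy': 1, 'Sz': 1}
```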

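Executing operations
Two strategies for routing a client request:
- Through a generic load balancer that picks a node based on load information: easy, because the client does not have to link any Cassandra-specific code
- Directly to the responsible node: achieves lower latency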

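put() operation
Figure: the client sends the object with its vector clock to the coordinator. The coordinator forwards it to the other replicas P1, P2, ..., PN-1, and the write succeeds once W-1 of them have responded.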
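The coordinator's side of put() can be sketched as a quorum count. The coordinator's own write counts as one, so W-1 remote acknowledgements are enough. The transport is a hypothetical `send` callback. Dynamo sends to replicas in parallel and replies as soon as W-1 acks arrive, whereas this sequential sketch simply counts all acks.

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[bytes, Dict[str, int]]  # (value, vector clock as node -> counter)


def put(key: str, value: bytes, context: Dict[str, int], coordinator: str,
        other_replicas: List[str], send: Callable[[str, str, Version], bool],
        w: int, local_store: Dict[str, Version]) -> bool:
    """Coordinator side of put(): True if the write reached at least W replicas."""
    # 1. The coordinator creates the new version's vector clock and writes locally.
    clock = dict(context)
    clock[coordinator] = clock.get(coordinator, 0) + 1
    local_store[key] = (value, clock)
    # 2. It forwards the version to the other N-1 replicas (P1 .. PN-1) of the preference list.
    acks = sum(1 for replica in other_replicas if send(replica, key, (value, clock)))
    # 3. The local write counts as one, so W-1 remote responses are enough.
    return acks >= w - 1


up = {"P1": False, "P2": True}  # hypothetical replica availability
store: Dict[str, Version] = {}
print(put("k", b"v", {}, "P0", ["P1", "P2"], lambda r, k, v: up[r], 2, store))  # True
```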

Cluster membership
- Gossip protocol; state is disseminated in O(log N) rounds
- Every T seconds, each node increments its heartbeat counter and sends its membership list to another node
- The receiving node merges the incoming list with its own
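A small simulation of this heartbeat gossip follows. In each period, every member increments its own counter and pushes its list to one other member, assumed here to be chosen uniformly at random. The receiver keeps the larger heartbeat for every node. The class and function names are hypothetical.

```python
import random
from typing import Dict, List, Optional


class Member:
    def __init__(self, name: str):
        self.name = name
        self.heartbeats: Dict[str, int] = {name: 0}  # membership list: node -> heartbeat counter

    def tick(self) -> None:
        self.heartbeats[self.name] += 1  # increase own heartbeat counter

    def merge(self, incoming: Dict[str, int]) -> None:
        # Merge operation: keep the larger heartbeat for every node we hear about.
        for node, hb in incoming.items():
            if hb > self.heartbeats.get(node, -1):
                self.heartbeats[node] = hb


def gossip_round(members: List[Member], rng: random.Random) -> None:
    """One period T: every member ticks and sends its list to one other random member."""
    for m in members:
        m.tick()
        peer = rng.choice([p for p in members if p is not m])
        peer.merge(dict(m.heartbeats))


def rounds_to_converge(n: int, seed: int = 0, max_rounds: int = 100) -> Optional[int]:
    rng = random.Random(seed)
    members = [Member(f"n{i}") for i in range(n)]
    for r in range(1, max_rounds + 1):
        gossip_round(members, rng)
        if all(len(m.heartbeats) == n for m in members):
            return r  # every member now knows about every other member
    return None


print(rounds_to_converge(64))
```

Running `rounds_to_converge` for increasing `n` should show the round count growing roughly logarithmically, in line with the O(log N) claim.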

Failure
- Data center failure: replicate across multiple data centers
- Temporary failure: hinted handoff
- Permanent failure: replica synchronization with Merkle trees

Temporary failure

Merkle tree
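Merkle trees, used for permanent-failure recovery, let two replicas find which key ranges differ by exchanging hashes top-down. They only descend into subtrees whose hashes disagree. The sketch assumes each leaf holds the serialized data of one key range and that both replicas build trees over the same ranges. It also assumes a power-of-two number of ranges. The names `build` and `differing_ranges` are hypothetical.

```python
import hashlib
from typing import List

Tree = List[List[bytes]]  # levels[0] = leaf hashes, levels[-1] = [root hash]


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(leaves: List[bytes]) -> Tree:
    """Binary Merkle tree over per-range data; assumes len(leaves) is a power of two."""
    levels = [[sha(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_ranges(a: Tree, b: Tree, level: int = -1, index: int = 0) -> List[int]:
    """Leaf indices whose data differ, descending only into subtrees whose hashes differ."""
    if level == -1:
        level = len(a) - 1  # start at the root
    if a[level][index] == b[level][index]:
        return []  # identical subtree: nothing below needs to be exchanged
    if level == 0:
        return [index]
    return (differing_ranges(a, b, level - 1, 2 * index)
            + differing_ranges(a, b, level - 1, 2 * index + 1))


ranges = [f"range-{i}".encode() for i in range(8)]
replica = list(ranges)
replica[5] = b"stale"
print(differing_ranges(build(ranges), build(replica)))  # [5]
```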

Bloom filter
A space-efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives (reporting an element that was never added), but never false negatives.
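A minimal Bloom filter: an m-bit array and k hash positions per element. The k positions are derived from one SHA-256 digest by double hashing, a common technique chosen here for brevity. Cassandra's own hash functions are not shown in the slides. `BloomFilter` and `might_contain` are hypothetical names.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: position_i = h1 + i * h2 (mod m), from one SHA-256 digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(1024, 5)
for key in ("K1", "K2", "K3"):
    bf.add(key)
print(bf.might_contain("K2"))  # True
```

After n insertions, the false-positive rate is approximately (1 - e^(-kn/m))^k. This is why a filter kept in memory can rule out most disk lookups for absent keys.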

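Compactions
Figure: three sorted files, (K1, K2, K3), (K2, K10, K30) and (K4, K5, K10), are merged with a merge sort into one sorted file (K1, K2, K3, K4, K5, K10, K30). Keys that appear in several inputs (K2, K10) appear once in the output. The result consists of a data file, an index file holding sampled keys with their offsets (K1, K5, K30), and a Bloom filter that is loaded in memory. The input files are deleted after the merge.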
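The merge step can be sketched with `heapq.merge` over already-sorted files. Entries are ordered by key and then by newest timestamp, so the first entry seen for each key is its latest version. The sketch assumes integer keys, so that the order matches the figure. It also assumes a `None` value marks a deletion and that the index stores entry positions rather than byte offsets. Dropping deleted keys immediately is only safe when every file holding the key takes part in the merge.

```python
import heapq
from typing import List, Optional, Tuple

Entry = Tuple[int, int, Optional[str]]  # (key, timestamp, value); value None marks a deletion


def compact(files: List[List[Entry]], index_interval: int = 2):
    """Merge sorted files into one sorted file plus a sparse (key, position) index."""
    # Merge sort on (key, newest timestamp first): the first entry per key is the latest.
    merged = heapq.merge(*files, key=lambda e: (e[0], -e[1]))
    data: List[Tuple[int, str]] = []
    last_key = None
    for key, _ts, value in merged:
        if key == last_key:
            continue  # an older version of a key already handled
        last_key = key
        if value is not None:  # deleted keys are not copied into the new file
            data.append((key, value))
    index = [(data[i][0], i) for i in range(0, len(data), index_interval)]
    return data, index


files = [
    [(1, 1, "v1"), (2, 1, "v2"), (3, 1, "v3")],
    [(2, 2, "v2-new"), (10, 1, "v10"), (30, 1, "v30")],
    [(4, 1, "v4"), (5, 1, "v5"), (10, 2, "v10-new")],
]
data, index = compact(files)
print([k for k, _ in data])  # [1, 2, 3, 4, 5, 10, 30]
```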

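Write
Figure: a write to a key touching column families CF1, CF2 and CF3 is first appended, binary serialized, to the commit log on a dedicated disk. It is then applied to one memtable per column family (Memtable (CF1), Memtable (CF2), ...). A memtable is flushed to disk when it crosses a threshold on data size, number of objects or lifetime. The resulting data file on disk is split into blocks. Its block index maps keys to offsets (K128, K256, K384, ...), and both the index and a Bloom filter are kept in memory.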
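A minimal single-column-family sketch of this write path:
- Every write is appended to a log first, then applied to an in-memory memtable.
- When the memtable crosses a threshold, it is flushed as a sorted data file with a sparse block index.
- Reads use the index to scan just one block.

The class name `ColumnFamilyStore` and the small thresholds are illustrative assumptions. Only the object-count trigger is modelled, and a Python list stands in for the on-disk commit log.

```python
import bisect
from typing import Dict, List, Optional, Tuple

SSTable = Tuple[List[Tuple[str, str]], List[Tuple[str, int]]]  # (sorted data, sparse index)


class ColumnFamilyStore:
    """Hypothetical single-column-family write path: commit log -> memtable -> data file."""

    def __init__(self, max_objects: int = 4, index_interval: int = 2):
        self.commit_log: List[Tuple[str, str]] = []  # stands in for the log on a dedicated disk
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []
        self.max_objects = max_objects        # flush threshold; size and lifetime are omitted
        self.index_interval = index_interval  # one index entry per this many keys

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. in-memory update
        if len(self.memtable) >= self.max_objects:
            self.flush()

    def flush(self) -> None:
        data = sorted(self.memtable.items())  # the data file is sorted by key
        index = [(data[i][0], i) for i in range(0, len(data), self.index_interval)]
        self.sstables.append((data, index))
        self.memtable = {}
        self.commit_log.clear()  # simplification: every logged write is now on disk

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for data, index in reversed(self.sstables):  # newest data file first
            i = bisect.bisect_right([k for k, _ in index], key) - 1
            if i < 0:
                continue  # key sorts before the first indexed key
            start = index[i][1]
            end = index[i + 1][1] if i + 1 < len(index) else len(data)
            for k, v in data[start:end]:  # scan a single block
                if k == key:
                    return v
        return None
```

A real store would also consult each data file's Bloom filter before touching the index.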

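Read
Figure: the client's query goes to the Cassandra cluster. The closest replica (Replica A) returns the full result, while Replicas B and C receive digest queries and send back digest responses. The result is returned to the client, and a read repair is performed if the digests differ.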
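In this digest read, only the closest replica ships data and the others ship hashes. This sketch assumes each replica stores (value, timestamp) pairs and that the newest timestamp wins on repair. For simplicity, the repair is done synchronously before answering. Replicas are plain dicts and `read` is a hypothetical helper.

```python
import hashlib
from typing import Dict, List, Tuple

Versioned = Tuple[str, int]  # (value, timestamp)


def digest(v: Versioned) -> str:
    # A digest lets a replica show it holds the same data without shipping the data.
    return hashlib.md5(repr(v).encode()).hexdigest()


def read(key: str, replicas: List[Dict[str, Versioned]]) -> str:
    """replicas[0] is the closest replica; the others only answer digest queries."""
    closest, others = replicas[0], replicas[1:]
    result = closest[key]                       # full data query
    digests = [digest(r[key]) for r in others]  # digest queries
    if any(d != digest(result) for d in digests):
        # Read repair: pick the version with the newest timestamp and write it everywhere.
        result = max((r[key] for r in replicas), key=lambda v: v[1])
        for r in replicas:
            r[key] = result
    return result[0]


a, b, c = {"k": ("old", 1)}, {"k": ("new", 2)}, {"k": ("old", 1)}
print(read("k", [a, b, c]))  # new
```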

SIGMOD contest 2009
Outline: task overview, API, data structures, architecture, tests

Task overview
- Index system for main-memory data
- Runs on a multi-core machine
- Many threads working with multiple indices
- Serializable execution of user-specified transactions
- Basic functions: exact-match queries, range queries, updates, inserts, deletes

API

Record

HashTable

HashShared

TxnState

IdxState
- Keeps track of an index
- Created by openIndex(), destroyed by closeIndex()
- Inherited by IdxStateType
- Contains pointers to:
  - a hash table
  - a FixedAllocator
  - an Allocator
  - an array with the type of action

Architecture

IndexManager

DeadLockDetector

Transactor
A HashOnlyGet object with type TxnState

Allocator
- Allocates the memory for the payloads
- Uses pools and linked lists
- Pools are sized by payload length; the maximum payload length is 100
- Payloads of the same length are kept in the same list
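A size-class allocator keeps one pool and one free list per payload length. An allocation is then a pop from the matching list, and a release is a push back onto it. This is a Python sketch of the idea, not the contest's code. A Python list stands in for the linked list, the pool size of 64 blocks is an arbitrary assumption, and `PayloadAllocator` is a hypothetical name.

```python
from typing import Dict, List

MAX_PAYLOAD_LEN = 100  # maximum payload length, as stated on the slide


class PayloadAllocator:
    """Hypothetical size-class allocator: one pool and free list per payload length."""

    def __init__(self, blocks_per_pool: int = 64):
        self.blocks_per_pool = blocks_per_pool
        # payload length -> free blocks of exactly that length (stands in for a linked list)
        self.free: Dict[int, List[bytearray]] = {}

    def allocate(self, length: int) -> bytearray:
        if not 0 <= length <= MAX_PAYLOAD_LEN:
            raise ValueError(f"payload length must be 0..{MAX_PAYLOAD_LEN}")
        free_list = self.free.setdefault(length, [])
        if not free_list:
            # Refill: create a new pool of equal-sized blocks for this length.
            free_list.extend(bytearray(length) for _ in range(self.blocks_per_pool))
        return free_list.pop()

    def release(self, block: bytearray) -> None:
        # Return the block to the list for its length so it is reused without a new allocation.
        self.free.setdefault(len(block), []).append(block)
```

Grouping blocks by exact length avoids fragmentation within a list. With payloads capped at 100, there are at most 101 lists.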

Unit tests
Three threads run over three indices:
- The primary thread creates the primary index, then inserts, deletes and accesses data in it
- The second thread simultaneously runs some basic tests over a separate index
- The third thread checks the transactional guarantees by continuously querying the primary index

SIGMOD contest 2010

Task overview
- Implement a simple distributed query executor on top of the in-memory index
- Given centralized query plans, translate them into distributed query plans
- Given a parsed SQL query, return the correct results
- Data is stored on disk; the indexes are all in memory
- Performance is measured as the total time cost

SQL query form
SELECT alias_name.field_name, ... FROM table_name AS alias_name, ... WHERE condition1 AND ... AND conditionN
Each condition has one of three forms:
- alias_name.field_name = fixed value
- alias_name.field_name > fixed value
- alias_name.field_name1 = alias_name.field_name2
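As a reference for what such a query must return, here is a naive nested-loop evaluator over in-memory rows. It supports the three condition forms above. It is a semantic sketch only: the contest executor works from query plans, disk-resident data and in-memory indexes. `Field`, `holds` and `execute` are hypothetical names, and tables are assumed to be lists of dicts keyed by alias.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    alias: str  # alias_name
    name: str   # field_name


# (left field, operator, right side): the right side is a Field or a fixed value.
Condition = Tuple[Field, str, object]


def holds(env: Dict[str, dict], cond: Condition) -> bool:
    left_field, op, rhs = cond
    left = env[left_field.alias][left_field.name]
    right = env[rhs.alias][rhs.name] if isinstance(rhs, Field) else rhs
    if op == "=":
        return left == right
    if op == ">":
        return left > right
    raise ValueError(f"unsupported operator: {op}")


def execute(tables: Dict[str, List[dict]], select: List[Field],
            conditions: List[Condition]) -> List[tuple]:
    """Nested-loop evaluation of SELECT ... FROM ... WHERE c1 AND ... AND cN."""
    aliases = list(tables)
    results = []
    for combo in product(*(tables[a] for a in aliases)):
        env = dict(zip(aliases, combo))  # alias -> current row
        if all(holds(env, c) for c in conditions):
            results.append(tuple(env[f.alias][f.name] for f in select))
    return results


emp = [{"id": 1, "dept": 10, "sal": 50}, {"id": 2, "dept": 20, "sal": 80}]
dept = [{"id": 10, "name": "R&D"}, {"id": 20, "name": "Ops"}]
# SELECT e.id, d.name FROM emp AS e, dept AS d WHERE e.dept = d.id AND e.sal > 60
print(execute({"e": emp, "d": dept},
              [Field("e", "id"), Field("d", "name")],
              [(Field("e", "dept"), "=", Field("d", "id")),
               (Field("e", "sal"), ">", 60)]))  # [(2, 'Ops')]
```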

Initialization phase

Connection phase

Query phase

Closing phase

Tests
- An initial computation on synthetic and real-world datasets
- Tested on a single machine
- Tested on an ad hoc cluster of peers
- Entries that passed a collection of unit tests were provided with an Amazon Web Services account worth 100 USD

Benchmarks (stage 1)
Assumption: a partition always covers the entire table, and the data is not replicated.
- Unit tests
- Benchmarks:
  - On a single node, selects with an equality condition on the primary key
  - On a single node, selects with an equality condition on an indexed field
  - On a single node, 2 to 5 joins on tables of different sizes
  - On a single node, 1 join and a "greater than" condition on an indexed field
  - On three nodes, one join on two tables of different sizes, the two tables being on two different nodes

Benchmarks (stage 2)
- Tables are now stored on multiple nodes
- Part of a table, or the whole table, may be replicated on multiple nodes
- Queries are sent in parallel, over up to 50 simultaneous connections
- Benchmarks:
  - Selects with an equality condition on the primary key, the values being uniformly distributed
  - Selects with an equality condition on the primary key, the values being non-uniformly distributed
  - Multiple joins on tables spread across different nodes

Important Dates

Thank you!!!