Published by Jackeline Edmundson. Modified over 10 years ago.
1
Cassandra and SIGMOD Contest
Cloud Computing Group
Haiping Wang, 2009-12-19
2
Outline
– Cassandra
  – Cassandra overview
  – Data model
  – Architecture
  – Read and write
– SIGMOD contest 2009
– SIGMOD contest 2010
3
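Cassandra overview
– Highly scalable and distributed
– Eventually consistent
– Structured key-value store
– Dynamo + Bigtable: Dynamo-style distribution with a Bigtable-style data model
– Peer-to-peer (P2P) architecture
– Random reads and random writes
– Written in Java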
4
Data Model
[Figure: one row, KEY, with three column families]
– ColumnFamily1: Name = MailList, Type = Simple, Sort = Name; columns tid1, tid2, tid3, tid4, each with a value and a timestamp (t1 to t4)
– ColumnFamily2: Name = WordList, Type = Super, Sort = Time; super column "aloha" holds columns C1 to C4 (values V1 to V4, timestamps T1 to T4), and super column "dude" holds columns C2 and C6 (V2, T2 and V6, T6)
– ColumnFamily3: Name = System, Type = Super, Sort = Name; super columns hint1, hint2, hint3, hint4
Column families are declared upfront; super columns and columns are added and modified dynamically.
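To make the nesting of row key, column family, optional super column and column concrete, here is a minimal in-memory sketch. The class `Table`, its methods and the "newer timestamp wins" rule are assumptions for illustration; this is not Cassandra's storage code or its client API.

```python
from collections import defaultdict


class Table:
    """Toy model of the data model: row key -> column family -> columns.

    A column is stored as (value, timestamp). A Super column family adds one
    extra level: super column name -> columns.
    """

    def __init__(self, column_families):
        # Column families are declared upfront: name -> "Simple" or "Super".
        self.schema = dict(column_families)
        self.rows = defaultdict(dict)

    def insert(self, key, cf, column, value, timestamp, super_column=None):
        kind = self.schema[cf]  # raises KeyError for an undeclared column family
        if (kind == "Super") != (super_column is not None):
            raise ValueError("super_column must be given exactly for Super column families")
        family = self.rows[key].setdefault(cf, {})
        if super_column is not None:
            family = family.setdefault(super_column, {})  # super columns appear on demand
        current = family.get(column)
        # Columns are added or modified dynamically; the newer timestamp wins (assumed rule).
        if current is None or timestamp >= current[1]:
            family[column] = (value, timestamp)

    def get(self, key, cf, column, super_column=None):
        family = self.rows[key][cf]
        if super_column is not None:
            family = family[super_column]
        return family[column]
```

Only the column families are fixed in the schema; any number of columns or super columns can be created per row at write time, mirroring the caption on the slide.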
5
Cassandra Architecture
6
Cassandra API
– Data structures
– Exceptions
– Service API
– ConsistencyLevel (4)
– Retrieval methods (5)
– Range query, which returns matching keys (1)
– Modification methods (3)
– Others
7
Cassandra commands
8
Partitioning and replication (1)
– Consistent hashing (DHT): balance, monotonicity, spread, load
– Virtual nodes
– Coordinator
– Preference list
9
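Partitioning and replication (2)
[Figure: a hash ring running from 0 to 1 (1/2 marked) with nodes A to F; h(key1) and h(key2) mark key positions on the ring, with replication factor N=3]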
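The two partitioning slides can be illustrated with a small consistent-hashing ring. This is a sketch under stated assumptions: MD5 as the ring hash, 8 virtual nodes per physical node, and the hypothetical names `Ring` and `preference_list`. The coordinator for a key is the first node met walking clockwise from h(key), and the preference list is the first N distinct physical nodes on that walk.

```python
import bisect
import hashlib
from typing import List


def ring_position(label: str) -> int:
    """Map a string to a point on a ring of size 2**64 (MD5 is an assumed choice)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 8):
        # Each physical node owns several virtual nodes (tokens) on the ring.
        self.tokens = sorted((ring_position(f"{node}#{i}"), node)
                             for node in nodes for i in range(vnodes_per_node))
        self.positions = [pos for pos, _ in self.tokens]

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """Walk clockwise from h(key); the first node met is the coordinator.

        The first n distinct physical nodes met on the walk form the preference list.
        """
        start = bisect.bisect_right(self.positions, ring_position(key))
        result: List[str] = []
        for step in range(len(self.tokens)):
            node = self.tokens[(start + step) % len(self.tokens)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

Skipping tokens of an already chosen physical node is what keeps the N replicas on N different machines even though each machine appears on the ring several times. Adding a node only inserts new tokens, so a key's coordinator either stays the same or becomes the new node (monotonicity).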
10
Data Versioning
– Always writeable
– Multiple versions
  – put() may return before all replicas have applied the update
  – get() may return many versions
– Vector clocks
– Reconciliation during reads, by clients
11
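Vector clock
– A list of (node, counter) pairs
– E.g. [x,2][y,3] vs. [x,3][y,4][z,1]: the second clock descends from the first, so the first version can be discarded
– E.g. [x,1][y,3] vs. [z,1][y,3]: neither clock descends from the other, so the versions conflict
– Each pair also carries a timestamp, e.g. D([x,1]:t1,[y,1]:t2)
– When the number of pairs reaches a threshold, the oldest pair is removed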
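A minimal sketch of the comparison and truncation rules above. The function names, the representation of a clock as a dict from node to (counter, timestamp), and the default threshold of 10 are assumptions for illustration.

```python
from typing import Dict, Tuple

# A clock maps node -> (counter, time of that node's last update),
# mirroring the slide's notation D([x,1]:t1, [y,1]:t2).
Clock = Dict[str, Tuple[int, float]]


def descends(a: Clock, b: Clock) -> bool:
    """True if a has seen every update recorded in b."""
    return all(a.get(node, (0, 0.0))[0] >= counter for node, (counter, _) in b.items())


def compare(a: Clock, b: Clock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(b, a):
        return "before"      # a is an ancestor of b and can be discarded
    if descends(a, b):
        return "after"
    return "concurrent"      # neither descends from the other: a conflict


def increment(clock: Clock, node: str, now: float, threshold: int = 10) -> Clock:
    """Record an update coordinated by node; truncate the clock if it grows too long."""
    counter = clock.get(node, (0, now))[0]
    updated = dict(clock)
    updated[node] = (counter + 1, now)
    while len(updated) > threshold:
        # Drop the (node, counter) pair with the oldest timestamp.
        oldest = min(updated, key=lambda n: updated[n][1])
        del updated[oldest]
    return updated
```

With the slide's first example, `compare` reports the clock [x,2][y,3] as "before" [x,3][y,4][z,1]; with the second example it reports "concurrent". Truncation keeps clocks small at the cost of occasionally treating a descendant as concurrent.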
12
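Vector clock
– A read returns all the objects at the leaves of the version graph, e.g. D3 and D4 with the summarized context ([Sx,2],[Sy,1],[Sz,1])
– After reconciliation, the client writes back a single new version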
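The read-side behaviour can be sketched as follows: keep only the leaf versions, summarize their clocks by taking the pointwise maximum, and let the coordinator of the reconciled write increment its own counter. The helper names and the ancestor version "D2" used in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]       # node -> counter
Version = Tuple[str, Clock]  # (value, clock)


def descends(a: Clock, b: Clock) -> bool:
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def leaves(versions: List[Version]) -> List[Version]:
    """Versions not strictly dominated by another version (the leaves of the graph)."""
    return [(value, clock) for value, clock in versions
            if not any(other != clock and descends(other, clock) for _, other in versions)]


def read_context(versions: List[Version]) -> Clock:
    """Summary clock returned with a read: the pointwise maximum of the leaf clocks."""
    context: Clock = {}
    for _, clock in leaves(versions):
        for node, counter in clock.items():
            context[node] = max(context.get(node, 0), counter)
    return context


def write_reconciled(value: str, context: Clock, coordinator: str) -> Version:
    """The client writes back one reconciled value; the coordinator bumps its counter."""
    clock = dict(context)
    clock[coordinator] = clock.get(coordinator, 0) + 1
    return value, clock
```

For versions D2 ([Sx,2]), D3 ([Sx,2],[Sy,1]) and D4 ([Sx,2],[Sz,1]), only D3 and D4 are leaves, their summary is ([Sx,2],[Sy,1],[Sz,1]), and a reconciled write coordinated by Sx gets ([Sx,3],[Sy,1],[Sz,1]), which descends from both.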
13
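Execution of operations
Two strategies:
– Route requests through a generic load balancer that selects a node based on load information
  – Easy: the client does not have to link any system-specific code
– Route requests directly to the responsible (coordinator) node
  – Achieves lower latency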
14
Put() operation
[Figure: the client sends the object with its vector clock to the coordinator, which forwards it to replicas P1, P2, ..., PN-1 and waits for W-1 responses]
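The coordinator's side of put() can be sketched with a thread pool: store locally, forward to the other replicas, and report success as soon as W-1 of them acknowledge. Modeling replicas as callables that return `True` on success, and the name `coordinate_put`, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# Hypothetical replica interface: a callable that stores the object and returns True on success.
Replica = Callable[[str, object, Dict[str, int]], bool]


def coordinate_put(key: str, obj: object, clock: Dict[str, int],
                   replicas: List[Replica], local_store: dict, w: int) -> bool:
    """Coordinator side of put(): write locally, forward to P1..PN-1, wait for W-1 acks."""
    local_store[key] = (obj, clock)        # the coordinator's own write counts toward W
    needed = w - 1
    if needed <= 0:
        return True
    pool = ThreadPoolExecutor(max_workers=max(1, len(replicas)))
    try:
        futures = [pool.submit(send, key, obj, clock) for send in replicas]
        acks = 0
        for future in as_completed(futures):
            try:
                ok = future.result()
            except Exception:
                ok = False                 # an unreachable replica simply does not ack
            if ok:
                acks += 1
                if acks >= needed:
                    return True            # enough responses; do not wait for the rest
        return False
    finally:
        pool.shutdown(wait=False)
```

`shutdown(wait=False)` lets the call return once the quorum is reached while slower replicas finish in the background, which is what allows put() to return before all replicas have the update.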
15
Cluster Membership
– Gossip protocol
– State is disseminated in O(log N) rounds
– Every T seconds, each node increments its heartbeat counter and sends its membership list to another node
– Merge operations
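A small simulation of the gossip steps above. Choosing the peer uniformly at random, push-only exchange, and the class name `Member` are assumptions for illustration; the merge keeps the highest heartbeat seen for each node.

```python
import random
from typing import Dict, List


class Member:
    def __init__(self, name: str):
        self.name = name
        self.heartbeats: Dict[str, int] = {name: 0}  # membership list: node -> heartbeat

    def tick(self, cluster: List["Member"]) -> None:
        """One gossip round (every T seconds): bump own counter, push list to a random peer."""
        self.heartbeats[self.name] += 1
        peers = [m for m in cluster if m is not self]
        if peers:
            random.choice(peers).merge(self.heartbeats)

    def merge(self, remote: Dict[str, int]) -> None:
        """Merge operation: keep the highest heartbeat counter seen for each node."""
        for node, beat in remote.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat
```

Running `tick` on every member for a few dozen rounds is enough for a 16-node cluster to learn about every member; the number of rounds needed grows roughly logarithmically with cluster size.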
19
Failure
– Data center(s) failure: multiple data centers
– Temporary failure
– Permanent failure: Merkle trees
20
Temporary failure
21
Merkle tree
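Merkle trees are listed above for permanent failures: two replicas compare tree hashes and descend only into subtrees that differ, so they locate divergent keys without exchanging all the data. The sketch below assumes SHA-256, duplicates the last node on odd levels, and requires both trees to cover the same number of leaves; the function names are hypothetical.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(leaves: List[bytes]) -> List[List[bytes]]:
    """Return the tree level by level: levels[0] are leaf hashes, levels[-1] is [root]."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level  # duplicate an odd last node
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def diff(a, b, level=None, index=0) -> List[int]:
    """Indices of leaves that differ; only subtrees with different hashes are visited.

    Assumes both trees were built over the same number of leaves (same key ranges).
    """
    if level is None:
        level = len(a) - 1
    if index >= len(a[level]) or a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    return diff(a, b, level - 1, 2 * index) + diff(a, b, level - 1, 2 * index + 1)
```

If the roots match, the replicas are in sync after a single comparison; otherwise the work is proportional to the number of differing leaves times the tree height.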
22
Bloom filter
– A space-efficient probabilistic data structure used to test whether an element is a member of a set
– False positives are possible; false negatives are not
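A minimal Bloom filter sketch. The bit-array size, the number of hash functions, and deriving the k positions from salted SHA-256 digests are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

An added item always sets all of its k bits, so it can never be reported absent; an unseen item is reported present only if all of its k bits happen to have been set by other items.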
23
Compactions
[Figure: three sorted files (K1 K2 K3), (K2 K10 K30) and (K4 K5 K10) are merge-sorted into one sorted data file K1 K2 K3 K4 K5 K10 K30; an index file holds the offsets of K1, K5 and K30; a Bloom filter is loaded in memory; the input files are then deleted]
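The merge step in the figure can be sketched with `heapq.merge`. Integer keys, the (key, timestamp, value) entry layout, the `TOMBSTONE` marker and the "highest timestamp wins" rule are assumptions for illustration; positions stand in for the byte offsets of a real index file.

```python
import heapq
from typing import List, Tuple

TOMBSTONE = object()  # hypothetical marker for a deleted column/row
Entry = Tuple[int, int, object]  # (key, timestamp, value)


def compact(sstables: List[List[Entry]], index_interval: int = 128):
    """Merge-sort sorted runs into one run with a sparse index.

    Each input run must be sorted by key with at most one entry per key.
    For duplicate keys across runs the highest timestamp wins; keys whose
    newest entry is a tombstone are dropped (a simplification: a real system
    must keep tombstones until no older copy can resurface).
    """
    newest: List[Entry] = []
    for entry in heapq.merge(*sstables, key=lambda e: (e[0], -e[1])):
        if newest and newest[-1][0] == entry[0]:
            continue  # an older version of a key already emitted
        newest.append(entry)
    data = [(key, value) for key, _, value in newest if value is not TOMBSTONE]
    # Sparse index: every index_interval-th key with its position in the data file.
    index = [(data[i][0], i) for i in range(0, len(data), index_interval)]
    return data, index
```

Because every input run is already sorted, the merge streams through the files once without loading them fully, and the old files can be deleted after the new one is written.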
24
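Write
[Figure: a write for a key with column families CF1, CF2 and CF3 is appended, binary serialized, to the commit log on a dedicated disk and applied to one memtable per column family (Memtable(CF1), Memtable(CF2), ...). A memtable is flushed to a data file on disk when it crosses a threshold on data size, number of objects or lifetime. Each data file has a block index with one offset every 128 keys (K128, K256, K384, ...) and a Bloom filter; the index is kept in memory.]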
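A sketch of the write path for a single column family, under stated assumptions: the class name `WritePath`, JSON lines in place of the binary commit-log format, an in-memory list standing in for data files, and the default thresholds are all illustrative. Only the one-index-entry-every-128-keys default comes from the figure.

```python
import json
import time


class WritePath:
    """Hypothetical single-column-family write path: commit log, memtable, flush."""

    def __init__(self, log_file, max_bytes=1 << 20, max_objects=10_000,
                 max_lifetime_s=3600.0, index_interval=128):
        self.log = log_file                   # append-only commit log (ideally a dedicated disk)
        self.max_bytes = max_bytes            # flush threshold: data size
        self.max_objects = max_objects        # flush threshold: number of objects
        self.max_lifetime_s = max_lifetime_s  # flush threshold: memtable lifetime
        self.index_interval = index_interval  # one index entry every N keys
        self.flushed = []                     # stand-in for data files on disk
        self._new_memtable()

    def _new_memtable(self):
        self.memtable = {}
        self.mem_bytes = 0                    # approximate: counts every record written
        self.created = time.monotonic()

    def write(self, key, value):
        record = json.dumps([key, value])     # JSON stands in for the binary serialization
        self.log.write(record + "\n")         # 1. append to the commit log first
        self.log.flush()
        self.memtable[key] = value            # 2. then update the memtable
        self.mem_bytes += len(record)
        if (self.mem_bytes >= self.max_bytes
                or len(self.memtable) >= self.max_objects
                or time.monotonic() - self.created >= self.max_lifetime_s):
            self.flush()

    def flush(self):
        # 3. Write rows sorted by key, plus a sparse index of (key, position).
        rows = sorted(self.memtable.items())
        index = [(rows[i][0], i) for i in range(0, len(rows), self.index_interval)]
        self.flushed.append((rows, index))
        self._new_memtable()
```

Writes touch the disk only through sequential appends to the log; random-access structures live in memory until a flush writes them out in sorted order.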
25
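Read
[Figure: the client's query enters the Cassandra cluster; the full query goes to the closest replica (Replica A), which returns the result, while digest queries go to Replica B and Replica C, which return digest responses; the result is returned to the client, with a read repair if the digests differ]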
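A simplified sketch of the read in the figure: one full read from the closest replica, digests from the others, and a read repair that pushes the newest value by timestamp to stale replicas. The names, MD5 as the digest, and doing the repair synchronously before returning are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

Replica = Dict[str, Tuple[bytes, int]]  # key -> (value, timestamp)


def digest(value: bytes) -> str:
    return hashlib.md5(value).hexdigest()


def read(key: str, replicas: List[Replica]) -> bytes:
    """replicas[0] is the closest replica; it is assumed to hold the key."""
    value, _ = replicas[0][key]                  # full read from the closest replica
    expected = digest(value)
    # Digest queries: the other replicas return only a hash of their value.
    mismatch = any(key not in r or digest(r[key][0]) != expected for r in replicas[1:])
    if mismatch:
        # Read repair: take the newest version by timestamp and push it to
        # every replica that is missing it or holds a different one.
        newest = max((r[key] for r in replicas if key in r), key=lambda vt: vt[1])
        for r in replicas:
            if r.get(key) != newest:
                r[key] = newest
        value = newest[0]
    return value
```

Sending digests instead of full values keeps the common case (all replicas agree) cheap; the full values are needed only when a mismatch triggers the repair.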
26
Outline
– Cassandra
  – Cassandra overview
  – Data model
  – Architecture
  – Read and write
– SIGMOD contest 2009
– SIGMOD contest 2010
27
SIGMOD contest 2009
– Task overview
– API
– Data structure
– Architecture
– Test
28
Task overview
– An index system for main-memory data
– Runs on a multi-core machine
– Many threads work with multiple indices
– Serializes the execution of user-specified transactions
– Basic functions: exact-match queries, range queries, updates, inserts and deletes
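The basic functions listed above can be illustrated with a sorted in-memory index. This is a sketch only: the class `SortedIndex` is hypothetical, and a single lock serializes individual operations, which is much weaker than the contest's serialized multi-index transactions.

```python
import bisect
import threading


class SortedIndex:
    """Hypothetical in-memory index of (key, payload) pairs; duplicate keys allowed."""

    def __init__(self):
        self._keys = []
        self._payloads = []
        self._lock = threading.RLock()  # one lock serializes every operation

    def insert(self, key, payload):
        with self._lock:
            i = bisect.bisect_right(self._keys, key)
            self._keys.insert(i, key)
            self._payloads.insert(i, payload)

    def delete(self, key, payload) -> bool:
        with self._lock:
            lo = bisect.bisect_left(self._keys, key)
            hi = bisect.bisect_right(self._keys, key)
            for i in range(lo, hi):
                if self._payloads[i] == payload:
                    del self._keys[i], self._payloads[i]
                    return True
            return False

    def update(self, key, old_payload, new_payload) -> bool:
        with self._lock:  # RLock: delete() and insert() re-acquire it safely
            if not self.delete(key, old_payload):
                return False
            self.insert(key, new_payload)
            return True

    def range(self, low, high):
        """Range query: all (key, payload) pairs with low <= key <= high, in key order."""
        with self._lock:
            lo = bisect.bisect_left(self._keys, low)
            hi = bisect.bisect_right(self._keys, high)
            return list(zip(self._keys[lo:hi], self._payloads[lo:hi]))

    def get(self, key):
        """Exact-match query: every payload stored under key."""
        return [payload for _, payload in self.range(key, key)]
```

Keeping keys in a sorted array makes exact-match and range queries binary searches; inserts and deletes cost O(n) shifts, which a tree or hash structure would avoid.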
29
API
30
Record
31
HashTable
32
HashShared
33
TxnState
34
IdxState
– Keeps track of an index
– Created by openIndex() and destroyed by closeIndex()
– Inherited by IdxStateType
– Contains pointers to:
  – a hash table
  – a FixedAllocator
  – an Allocator
  – an array with the type of action
35
Architecture
36
IndexManager
37
DeadLockDetector
38
Transactor
– A HashOnlyGet object with type TxnState
39
Allocator
– Allocates the memory for the payloads
– Uses pools and linked lists
– Pools are sized by payload length; the maximum payload length is 100
– Payloads of the same length are kept in the same list
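The allocator scheme above (one list per payload length, refilled from pools) can be sketched as follows. The class name, the pool size of 64 blocks, and the use of a Python list as a LIFO free list in place of a linked list are assumptions for illustration.

```python
class PayloadAllocator:
    """Hypothetical sketch: one free list per payload length, refilled from pools."""

    MAX_LEN = 100  # maximum payload length

    def __init__(self, blocks_per_pool: int = 64):
        self.blocks_per_pool = blocks_per_pool
        # free_lists[n] holds reusable blocks of exactly n bytes.
        self.free_lists = [[] for _ in range(self.MAX_LEN + 1)]

    def alloc(self, length: int) -> memoryview:
        if not 0 <= length <= self.MAX_LEN:
            raise ValueError(f"payload length must be in 0..{self.MAX_LEN}")
        free = self.free_lists[length]
        if not free:
            # Carve a fresh pool into equal-sized blocks for this length.
            pool = memoryview(bytearray(length * self.blocks_per_pool))
            free.extend(pool[i * length:(i + 1) * length] for i in range(self.blocks_per_pool))
        return free.pop()

    def free(self, block: memoryview) -> None:
        # A freed block goes back on the list for its length and is reused first.
        self.free_lists[len(block)].append(block)
```

Because every block in a list has the same size, allocation and release are constant-time list operations and freed memory is reused without fragmentation.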
40
Unit Tests
– Three threads, run over three indices
– The primary thread creates the primary index, then inserts, deletes and accesses data in it
– The second thread simultaneously runs some basic tests over a separate index
– The third thread continuously queries the primary index to check the transactional guarantees
41
Outline
– Cassandra
  – Cassandra overview
  – Data model
  – Architecture
  – Read and write
– SIGMOD contest 2009
– SIGMOD contest 2010
42
Task overview
– Implement a simple distributed query executor built on the in-memory index
– Given centralized query plans, translate them into distributed query plans
– Given a parsed SQL query, return the correct results
– Data is stored on disk; the indexes are all in memory
– The total time cost is measured
43
SQL query form
SELECT alias_name.field_name, ...
FROM table_name AS alias_name, ...
WHERE condition1 AND ... AND conditionN
Each condition has one of three forms:
– alias_name.field_name = fixed value
– alias_name.field_name > fixed value
– alias_name.field_name1 = alias_name.field_name2
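The semantics of this query form can be pinned down with a naive reference evaluator: a nested loop over all row combinations that keeps those satisfying every condition. The tuple encoding of a parsed query and the function names are hypothetical, and a contest entry would use the indexes rather than a full cross product.

```python
from itertools import product
from typing import Dict, List, Tuple

Ref = Tuple[str, str]  # (alias_name, field_name)


def value(row: Dict[str, dict], ref: Ref):
    alias, field = ref
    return row[alias][field]


def run(select: List[Ref], tables: Dict[str, List[dict]], conditions) -> List[tuple]:
    """Naive nested-loop evaluation of SELECT ... FROM ... WHERE c1 AND ... AND cN.

    tables maps each alias to its rows; each condition is one of
    ("eq", ref, constant), ("gt", ref, constant) or ("eq_field", ref1, ref2).
    """
    aliases = list(tables)
    results = []
    for combo in product(*(tables[a] for a in aliases)):
        row = dict(zip(aliases, combo))
        satisfied = True
        for kind, left, right in conditions:
            if kind == "eq":
                satisfied = value(row, left) == right
            elif kind == "gt":
                satisfied = value(row, left) > right
            elif kind == "eq_field":
                satisfied = value(row, left) == value(row, right)
            else:
                raise ValueError(f"unsupported condition {kind!r}")
            if not satisfied:
                break
        if satisfied:
            results.append(tuple(value(row, ref) for ref in select))
    return results
```

Such an evaluator is too slow for the benchmarks but is useful as an oracle when testing an optimized or distributed executor on small inputs.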
44
Initialization phase
45
Connection phase
46
Query phase
47
Closing phase
48
Tests
– An initial computation on synthetic and real-world datasets
– Tested on a single machine
– Tested on an ad-hoc cluster of peers
– Entries that pass a collection of unit tests are provided with an Amazon Web Services account worth 100 USD
49
Benchmarks (stage 1)
– Assume that a partition always covers the entire table and that the data is not replicated
– Unit tests
– Benchmarks:
  – On a single node, selects with an equality condition on the primary key
  – On a single node, selects with an equality condition on an indexed field
  – On a single node, 2 to 5 joins on tables of different sizes
  – On a single node, 1 join and a "greater than" condition on an indexed field
  – On three nodes, one join on two tables of different sizes, the two tables being on two different nodes
50
Benchmarks (stage 2)
– Tables are now stored on multiple nodes
– Part of a table, or the whole table, may be replicated on multiple nodes
– Queries are sent in parallel, over up to 50 simultaneous connections
– Benchmarks:
  – Selects with an equality condition on the primary key, the values being uniformly distributed
  – Selects with an equality condition on the primary key, the values being non-uniformly distributed
  – Multiple joins on tables spread across different nodes
51
Important Dates
52
Thank you!!!