Experience with Cassandra
Wenjing Wu

Outline
About Cassandra
Data Model
Deployment
Client Programming
An example: implementing a name space
Stress tests

What is Cassandra (1)
Decentralized, fault-tolerant, scalable, durable distributed hash storage
Originally developed by Facebook, now maintained by Apache
Some big users: Cloudkick, Digg, Facebook, Twitter, Rackspace, Cisco, etc.
A combination of Bigtable and Dynamo
Like a big hash table (both 2- and 3-dimensional)

What is Cassandra (2)
Eventual consistency
CAP theorem: AP, with configurable trade-offs between availability (A) and consistency (C)
Easy to deploy
Rich client APIs for your own application, easy to install and use

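The A/C trade-off is usually reasoned about in terms of replica counts: with N replicas, a read that contacts R replicas and a write that contacts W replicas are guaranteed to overlap on at least one replica when R + W > N. The sketch below only illustrates that arithmetic; the helper names are hypothetical and are not part of any Cassandra API.

```python
def quorum(n_replicas):
    """Smallest majority of n_replicas."""
    return n_replicas // 2 + 1


def read_overlaps_write(n_replicas, r, w):
    """True when every read set shares a replica with every write set."""
    return r + w > n_replicas


N = 3                # replication factor
Q = quorum(N)        # quorum size for N = 3

print(Q)                             # 2
print(read_overlaps_write(N, Q, Q))  # True: quorum reads + quorum writes
print(read_overlaps_write(N, 1, 1))  # False: favors availability and latency
```

Reading and writing at quorum leans toward consistency; reading and writing a single replica leans toward availability.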
Data model (1)
Non-SQL
Queries can use a single index:
  – select username from user where city='beijing' (Yes)
  – select username from user where city='beijing' and age='28' (No!)
No joins, no complicated queries
A good fit only when the application's queries are this simple

Data model (2)
Keyspace: one for each application, equivalent to a database
Column: an attribute of the structured data; it has a name, a value and a timestamp, and is equivalent to a column of a table
  (column=username, value=tom, timestamp= )
Column family: a set of columns like the one above
Define a column family User:
  – (column=username, value=tom, timestamp= )
  – (column= , timestamp= )
  – (column=city, value=beijing, timestamp= )

Data Model (3)
A row: identified by a key; it instantiates one or more of the columns in the column family:
  – RowKey: userkey1
  – (column=username, value=tom, timestamp= )
  – (column= , timestamp= )
The application creates a unique key for each row (usually a UUID, to avoid collisions)
Each row can have a different number of columns within the column family
Analogous to a 2-dimensional hash table: User{row_key1}{username}=tom

Data Model (4)
Super column family
  – Each column of a super column family is itself a set of columns, like a column family
3-dimensional hash table
  – Person{row_key1}{user}{user_name}=tom
  – Person{row_key2}{manager}{user_name}=Alice

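The hash-table analogy can be made concrete with nested dictionaries. This is only an in-memory illustration of the shape of the data, not how Cassandra stores it; the column() helper is hypothetical.

```python
import time


def column(value):
    """A column carries a value and a timestamp; its name is the dict key."""
    return {"value": value, "timestamp": int(time.time() * 1e6)}


# Column family: row key -> column name -> column (2-dimensional)
User = {
    "row_key1": {"username": column("tom"), "city": column("beijing")},
}

# Super column family: row key -> super column -> column name -> column
# (3-dimensional)
Person = {
    "row_key1": {"user": {"user_name": column("tom")}},
    "row_key2": {"manager": {"user_name": column("Alice")}},
}

print(User["row_key1"]["username"]["value"])                # tom
print(Person["row_key2"]["manager"]["user_name"]["value"])  # Alice
```

Rows in the same column family need not share the same set of columns, which is why each row is its own dictionary here.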
Deployment (1)
Pretty easy!
  – wget ra/0.7.5/apache-cassandra bin.tar.gz
  – tar zxvf apache-cassandra bin.tar.gz
  – cd apache-cassandra
  – sudo mkdir -p /var/log/cassandra
  – sudo chown -R `whoami` /var/log/cassandra
  – sudo mkdir -p /var/lib/cassandra
  – sudo chown -R `whoami` /var/lib/cassandra

Deployment (2)
Start the service
  – bin/cassandra -f
Connect with the command-line client:
  – bin/cassandra-cli --host localhost --port 9160
How to start:
  – create keyspace Keyspace1;
  – use Keyspace1;
  – create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type;
  – set Users[jsmith][first] = 'John';
  – set Users[jsmith][last] = 'Smith';
What you see:
  – get Users[jsmith];
  – => (column=last, value=Smith, timestamp= )
  – => (column=first, value=John, timestamp= )

Run over a cluster
Configuration file
  – conf/cassandra.yaml
  – listen_address: fst01.ihep.ac.cn (for gossip)
  – rpc_address: fst01.ihep.ac.cn (for clients)
  – seeds:
      - fst02.ihep.ac.cn
      - fst03.ihep.ac.cn
      - fst04.ihep.ac.cn
Test the cluster
  – bin/nodetool --host fst01.ihep.ac.cn ring

Client Programming
Rich client options (C/Java/PHP/Perl/Python)
Python client driver: pycassa
Easy to install
  – Install with easy_install (requires easy_install to be installed)
      $ easy_install pycassa
  – Manual install
      $ easy_install thrift05
      $ git clone git://github.com/pycassa/pycassa.git
      $ cd pycassa/
      $ sudo python setup.py install

API examples
>>> import pycassa
>>> pool = pycassa.connect('Keyspace1', ['localhost:9160'])
>>> col_fam = pycassa.ColumnFamily(pool, 'User')
>>> col_fam.insert('user_key1', {'username': 'tom'})
>>> col_fam.get('user_key1')
>>> col_fam.remove('user_key1')

An example: implement a name space
Use pycassa to implement a name space
Similar to the ext3 file system, which uses inodes to represent metadata
Two column families (Directory, FFile) describe the metadata
CF Directory columns include:
  – Metadata: create/modify/access time, owner, group
  – Contents of the directory: subdirectory names, file names

Directory (1)
Row key: dir_key1
  Column      Value
  owner       filestore
  group       filestore
  testdir1    dir_keyxxxxx1
  testdir2    dir_keyxxxxx2
  testfile1   file_keyyyyyy1

Directory (2)
A row:
  – RowKey: dirkey_372c5d e0-bc71-001a64631cb0
  – => (column=dir3, value=3e180f00-459b-11e a64631cb0, timestamp= )
  – => (column=f2, value=c69f2ac2-45a6-11e0-9c79-001a64631cb0, timestamp= )
  – => (column=f3, value=ddd77c2e-45a5-11e0-934f-001a64631cb0, timestamp= )
  – => (column=group, value=root, timestamp= )
  – => (column=owner, value=root, timestamp= )
  – => (column=p3, value=edf0ed73-45a6-11e0-bf90-001a64631cb0, timestamp= )

FFile (1)
CF FFile stores the metadata and contents of a specific file
FFile columns include:
  – Metadata: create/modify/access time, owner, group, size, checksum
  – Contents of the file

FFile (2)
Row key: file_key_yyyy1
  Column      Value
  owner       filestore
  group       filestore
  size        1023
  content     Bla bla….

FFile (3)
A row:
  – RowKey: filekey_edf0ed73-45a6-11e0-bf90-001a64631cb0
  – => (column=content, value=
         localhost.localdomain localhost
         lcg002.ihep.ac.cn lcg002
         lwn011.ihep.ac.cn lwn011
         ...., timestamp= )
  – => (column=group, value=root, timestamp= )
  – => (column=owner, value=root, timestamp= )
  – => (column=size, value=11281, timestamp= )

Name space operations
  – fs_ls (list a dir/file)
  – fs_mkdir (make a dir)
  – fs_rename (rename a file/dir)
  – fs_mv (move a file/dir to another file/dir)
  – fs_rm (remove a file/dir)
  – fs_cpw (write a file to the storage)
  – fs_cpr (read a file from the storage)

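How does it work
Example: looking up /testdir1/testfile11
Directory row dir_key1 (the root directory):
  owner        filestore
  group        filestore
  testdir1     dir_keyxxx1
  testdir2     dir_keyxxx2
Directory row dir_keyxxx1:
  owner        filestore
  testdir12    dir_keyxxx4
  testfile11   file_keyyyy1
  testfile12   file_keyyyy2
FFile row file_keyyyy1:
  owner        filestore
  group        filestore
  size         1023
  content      This is a test file….
Lookup chain: dir_key1 -> column testdir1 -> dir_keyxxx1 -> column testfile11 -> file_keyyyy1
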
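The lookup chain above can be sketched as a walk over the Directory rows, one path component per lookup. Plain dictionaries stand in for the two column families here so the example runs without a cluster. The resolve() function, ROOT_KEY and METADATA_COLUMNS are hypothetical names, not taken from the original implementation.

```python
# Column names that describe the directory itself rather than its entries
METADATA_COLUMNS = {"owner", "group"}
ROOT_KEY = "dir_key1"

# Stand-ins for the Directory and FFile column families (row key -> columns)
directory = {
    "dir_key1": {"owner": "filestore", "group": "filestore",
                 "testdir1": "dir_keyxxx1", "testdir2": "dir_keyxxx2"},
    "dir_keyxxx1": {"owner": "filestore", "testdir12": "dir_keyxxx4",
                    "testfile11": "file_keyyyy1",
                    "testfile12": "file_keyyyy2"},
}
ffile = {
    "file_keyyyy1": {"owner": "filestore", "group": "filestore",
                     "size": "1023", "content": "This is a test file...."},
}


def resolve(path):
    """Return the row key for path by walking down from the root row."""
    key = ROOT_KEY
    for name in [part for part in path.split("/") if part]:
        row = directory.get(key)
        if row is None:
            # An earlier component resolved to something that is not a dir
            raise NotADirectoryError(path)
        if name in METADATA_COLUMNS or name not in row:
            raise FileNotFoundError(path)
        key = row[name]
    return key


file_key = resolve("/testdir1/testfile11")
print(file_key)  # file_keyyyy1
print(ffile[file_key]["size"])  # 1023
```

Because metadata and directory entries share one row in this layout, an entry named "owner" or "group" would collide with the metadata columns; the sketch simply refuses to resolve such names.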
How to implement?
mkdir: fs_mkdir /testdir1/testdir2/testdir3 (/testdir1/testdir2 already exists)
  – 1. Generate a key for the new entry: new_key=dirkey_`uuid`
  – 2. Walk from the root directory (/, whose key is dirkey_1) to get the key of the parent directory (testdir2); assume this key is dirkey_XXX
  – 3. Insert a column into the parent directory's row (testdir2, with key dirkey_XXX): the column name is the name of the new directory (testdir3) and its value is new_key
  – 4. Create a new row for the new directory, with all the metadata columns (owner, group)

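The four steps can be sketched as follows. A dictionary stands in for the Directory column family so the example is self-contained; with pycassa, the dictionary reads and writes would become get and insert calls on a ColumnFamily. The function and variable names other than fs_mkdir are hypothetical, and the default owner and group are assumptions.

```python
import uuid

ROOT_KEY = "dirkey_1"
METADATA_COLUMNS = {"owner", "group"}

# Stand-in for the Directory column family: row key -> columns
directory = {ROOT_KEY: {"owner": "root", "group": "root"}}


def lookup_dir(components):
    """Walk from the root row and return the key of the named directory."""
    key = ROOT_KEY
    for name in components:
        row = directory[key]
        if name in METADATA_COLUMNS or name not in row:
            raise FileNotFoundError("/" + "/".join(components))
        key = row[name]
    return key


def fs_mkdir(path, owner="root", group="root"):
    parts = [part for part in path.split("/") if part]
    if not parts or parts[-1] in METADATA_COLUMNS:
        raise ValueError("invalid directory name: %r" % path)
    parent_parts, name = parts[:-1], parts[-1]

    new_key = "dirkey_" + str(uuid.uuid1())        # step 1: new row key
    parent_key = lookup_dir(parent_parts)          # step 2: find the parent
    if name in directory[parent_key]:
        raise FileExistsError(path)
    directory[parent_key][name] = new_key          # step 3: link from parent
    directory[new_key] = {"owner": owner, "group": group}  # step 4: new row
    return new_key


fs_mkdir("/testdir1")
fs_mkdir("/testdir1/testdir2")
new_key = fs_mkdir("/testdir1/testdir2/testdir3")
print(lookup_dir(["testdir1", "testdir2", "testdir3"]) == new_key)  # True
```

Steps 3 and 4 are two separate writes to two different rows, so a real implementation would need to decide what happens if the client fails between them.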
Stress test
Testbed: a small cluster
  – 4-node cluster
  – Replica number is 3
  – One client
Test methodology:
  – Operation sequence: mkdir / touch a file / list dir and file
  – Depth of directory (4): /dir1/dir2/dir3/dir4
  – Test result: finished operation (mkdir, create file, list dir, list file) in seconds, second for each operation sequence
  – Another test (more than 10 million operations) failed due to a memory crash

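A timing harness for this kind of test could look like the sketch below. It is not the script used for the test above: time_sequences() is a hypothetical helper, and run_sequence() is a stand-in that works on an in-memory dictionary where the real test would call fs_mkdir, fs_cpw and fs_ls against the cluster.

```python
import time


def time_sequences(run_sequence, count):
    """Run run_sequence(i) count times.

    Returns (total_seconds, seconds_per_sequence).
    """
    start = time.perf_counter()
    for i in range(count):
        run_sequence(i)
    total = time.perf_counter() - start
    return total, total / count


store = {}  # stand-in for the name space


def run_sequence(i):
    """One mkdir / touch / list dir / list file sequence at depth 4."""
    path = "/dir1/dir2/dir3/dir4/d%d" % i
    store[path] = {}            # mkdir
    store[path]["file"] = ""    # touch a file
    sorted(store[path])         # list the directory
    store[path]["file"]         # list the file


total, per_sequence = time_sequences(run_sequence, 1000)
print("1000 sequences in %.4f s, %.6f s per sequence" % (total, per_sequence))
```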