
1 Experience Cassandra
Wenjing Wu
2011-5-17

2 Outline
About Cassandra
Data model
Deployment
Client programming
An example: implementing a name space
Stress tests

3 What is Cassandra (1)
Decentralized, fault-tolerant, scalable, durable distributed hash storage
Originally developed by Facebook, now maintained by Apache
Big users include Cloudkick, Digg, Facebook, Twitter, Rackspace and Cisco
A combination of Bigtable and Dynamo
Like a big hash table (both 2- and 3-dimensional)

4 What is Cassandra (2)
Eventual consistency
CAP theorem: AP, with configurable tradeoffs between A and C
Easy to deploy
Rich client APIs for your own application, easy to install and use
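The configurable tradeoff between A and C can be illustrated with replica counts. The sketch below is not Cassandra code, and its function names are made up for illustration. It only applies the usual quorum rule: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every read replica set must overlap every write replica set."""
    return r + w > n


N = 3  # replica number, the same value used in the stress test slide

print(quorum(N))                            # 2
print(reads_see_latest_write(N, r=1, w=1))  # False: fast, but reads may be stale
print(reads_see_latest_write(N, r=2, w=2))  # True: quorum reads and writes overlap
```

Lower R and W favour availability and latency; higher values favour consistency.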

5 Data model (1)
Non-SQL
Supports a single index per query
  – select username from user where city='beijing' (Yes)
  – select username from user where city='beijing' and age='28' (No!)
No joins, no complicated queries
Useful for workloads that fit these limits

6 Data model (2)
Keyspace: one for each application, equivalent to a database
Column: an attribute of the structured data; it has a name, a value and a timestamp, and is equivalent to a column of a table
  – (column=username, value=tom, timestamp=1299137043078874)
Column family: a series of columns like the one above. For example, a column family User:
  – (column=username, value=tom, timestamp=1299137043078874)
  – (column=email, value=tom@google.com, timestamp=1299137043078133)
  – (column=city, value=beijing, timestamp=1299137043078141)

7 Data model (3)
A row is identified by a key and instantiates one or more of the columns in the column family:
  – RowKey: userkey1
  – (column=username, value=tom, timestamp=1299137043078874)
  – (column=email, value=tom@google.com, timestamp=1299137043078133)
The application creates a unique key for each row (usually a UUID, to avoid collisions)
Each row can have a different number of columns within the column family
Analogous to a 2-dimensional hash table
  – User{row_key1}{username}=tom
  – User{row_key1}{email}=tom@google.com
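The 2-dimensional hash table analogy can be tried out with plain Python dictionaries. This is an in-memory sketch, not the Cassandra or pycassa API: make_column and insert are hypothetical helpers, and the second row with the value "alice" is invented to show that rows may hold different columns. The rule that the newest timestamp wins reflects how Cassandra reconciles conflicting writes to the same column.

```python
import time


def make_column(name, value, timestamp=None):
    """Build a column: a name, a value and a microsecond timestamp."""
    if timestamp is None:
        timestamp = int(time.time() * 1_000_000)
    return {"name": name, "value": value, "timestamp": timestamp}


def insert(column_family, row_key, name, value, timestamp=None):
    """Store a column under column_family[row_key][name]; the newest timestamp wins."""
    row = column_family.setdefault(row_key, {})
    column = make_column(name, value, timestamp)
    current = row.get(name)
    if current is None or column["timestamp"] >= current["timestamp"]:
        row[name] = column


User = {}  # column family: row key -> column name -> column
insert(User, "row_key1", "username", "tom", 1299137043078874)
insert(User, "row_key1", "email", "tom@google.com", 1299137043078133)
insert(User, "row_key2", "username", "alice")  # hypothetical row with fewer columns

print(User["row_key1"]["username"]["value"])  # tom
print(sorted(User["row_key2"]))               # ['username']
```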

8 Data model (4)
Super column family
  – Each column of a super column family is itself a group of columns, like a small column family
3-dimensional hash table
  – Person{row_key1}{user}{user_name}=tom
  – Person{row_key1}{user}{email}=tom@google.com
  – Person{row_key2}{manager}{user_name}=Alice
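The 3-dimensional case adds one more level of nesting. The sketch below again uses plain dictionaries rather than a real client, and get_super_column is a hypothetical helper name.

```python
from collections import defaultdict

# super column family: row key -> super column -> column name -> value
Person = defaultdict(lambda: defaultdict(dict))

Person["row_key1"]["user"]["user_name"] = "tom"
Person["row_key1"]["user"]["email"] = "tom@google.com"
Person["row_key2"]["manager"]["user_name"] = "Alice"


def get_super_column(family, row_key, super_column):
    """Return a copy of all columns stored under one super column."""
    return dict(family[row_key][super_column])


print(get_super_column(Person, "row_key1", "user"))
# {'user_name': 'tom', 'email': 'tom@google.com'}
```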

9 Deployment (1)
Pretty easy!
  – wget http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.5/apache-cassandra-0.7.5-bin.tar.gz
  – tar zxvf apache-cassandra-0.7.5-bin.tar.gz
  – cd apache-cassandra-0.7.5
  – sudo mkdir -p /var/log/cassandra
  – sudo chown -R `whoami` /var/log/cassandra
  – sudo mkdir -p /var/lib/cassandra
  – sudo chown -R `whoami` /var/lib/cassandra

10 Deployment (2)
Start the service
  – bin/cassandra -f
Try to connect with the client:
  – bin/cassandra-cli --host localhost --port 9160
How to start:
  – create keyspace Keyspace1;
  – use Keyspace1;
  – create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type;
  – set Users[jsmith][first] = 'John';
  – set Users[jsmith][last] = 'Smith';
What you see:
  – [default@Keyspace1] get Users[jsmith];
  – => (column=last, value=Smith, timestamp=1287604215498000)
  – => (column=first, value=John, timestamp=1287604214111000)

11 Run over a cluster
Configuration file
  – conf/cassandra.yaml
  – listen_address: fst01.ihep.ac.cn (for gossip)
  – rpc_address: fst01.ihep.ac.cn (for clients)
  – seeds:
      - fst02.ihep.ac.cn
      - fst03.ihep.ac.cn
      - fst04.ihep.ac.cn
Test the cluster
  – bin/nodetool --host fst01.ihep.ac.cn ring

12 Client programming
Rich client options (C/Java/PHP/Perl/Python)
Python client driver: pycassa
Easy to install
  – Install with easy_install (easy_install must already be installed)
      $ easy_install pycassa
  – Manual install
      $ easy_install thrift05
      $ git clone git://github.com/pycassa/pycassa.git
      $ cd pycassa/
      $ sudo python setup.py install

13 API examples
>>> import pycassa
>>> pool = pycassa.connect('Keyspace1', ['localhost:9160'])
>>> col_fam = pycassa.ColumnFamily(pool, 'User')
>>> col_fam.insert('user_key1', {'username': 'tom'})
>>> col_fam.get('user_key1')
>>> col_fam.remove('user_key1')

14 An example: implementing a name space
Use pycassa to implement a name space
Similar to the ext3 file system, which uses inodes to represent metadata
Two column families (Directory, FFile) describe the metadata
CF Directory columns include:
  – Metadata: create/modify/access time, owner, group
  – Contents of the directory: subdirectory names, file names

15 Directory (1)
Row key: dir_key1
Column      Value
Owner       filestore
Group       filestore
testdir1    dir_keyxxxxx1
testdir2    dir_keyxxxxx2
testfile1   file_keyyyyyy1

16 Directory (2)
A row:
  – RowKey: dirkey_372c5d87-4567-11e0-bc71-001a64631cb0
  – => (column=dir3, value=3e180f00-459b-11e0-8846-001a64631cb0, timestamp=1299159388519845)
  – => (column=f2, value=c69f2ac2-45a6-11e0-9c79-001a64631cb0, timestamp=1299329058698329)
  – => (column=f3, value=ddd77c2e-45a5-11e0-934f-001a64631cb0, timestamp=1299328989534849)
  – => (column=group, value=root, timestamp=1299137043078874)
  – => (column=owner, value=root, timestamp=1299137043078874)
  – => (column=p3, value=edf0ed73-45a6-11e0-bf90-001a64631cb0, timestamp=1299164408007020)
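A Directory row mixes metadata columns with one column per entry, so a listing has to tell them apart. The sketch below shows one possible way to do that on a plain dictionary. The set of metadata column names is an assumption (only owner and group appear in the row above), and split_directory_row is a hypothetical helper, not part of the original implementation.

```python
# Assumed metadata column names; only "owner" and "group" appear in the slides.
METADATA_COLUMNS = {"owner", "group", "ctime", "mtime", "atime"}


def split_directory_row(row):
    """Separate a directory row into (metadata, entries)."""
    metadata = {k: v for k, v in row.items() if k in METADATA_COLUMNS}
    entries = {k: v for k, v in row.items() if k not in METADATA_COLUMNS}
    return metadata, entries


row = {
    "dir3": "3e180f00-459b-11e0-8846-001a64631cb0",
    "f2": "c69f2ac2-45a6-11e0-9c79-001a64631cb0",
    "group": "root",
    "owner": "root",
}

metadata, entries = split_directory_row(row)
print(sorted(entries))  # ['dir3', 'f2']
print(metadata)         # {'group': 'root', 'owner': 'root'}
```

With this layout a file literally named owner would collide with the metadata column, so a real implementation would probably need a prefix or a separate row for one of the two groups.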

17 FFile (1)
CF FFile stores the metadata and contents of a specific file
FFile columns include:
  – Metadata: create/modify/access time, owner, group, size, checksum
  – Contents of the file

18 FFile (2)
Row key: file_key_yyyy1
Column      Value
owner       filestore
group       filestore
size        1023
content     Bla bla….

19 FFile (3)
A row:
  – RowKey: filekey_edf0ed73-45a6-11e0-bf90-001a64631cb0
  – => (column=content, value=
      127.0.0.1 localhost.localdomain localhost
      202.122.33.12 lcg002.ihep.ac.cn lcg002
      192.168.56.11 lwn011.ihep.ac.cn lwn011
      ...., timestamp=1299164408007882)
  – => (column=group, value=root, timestamp=1299164408007882)
  – => (column=owner, value=root, timestamp=1299164408007882)
  – => (column=size, value=11281, timestamp=1299164408007882)
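An FFile row can be assembled from the file contents before it is written. The sketch below builds such a row as a dictionary of strings. The helper name make_file_row, the time column names and the choice of MD5 for the checksum are all assumptions; the slides do not say which checksum is used.

```python
import hashlib
import time


def make_file_row(content, owner, group):
    """Build the columns of an FFile row for the given text content."""
    data = content.encode("utf-8")
    now = str(int(time.time()))
    return {
        "owner": owner,
        "group": group,
        "size": str(len(data)),                      # size in bytes
        "checksum": hashlib.md5(data).hexdigest(),   # assumed checksum algorithm
        "ctime": now,                                # assumed column names
        "mtime": now,
        "atime": now,
        "content": content,
    }


row = make_file_row("This is a test file", "filestore", "filestore")
print(row["size"])  # 19
```

A dictionary of this shape could then be passed to a client call such as the col_fam.insert shown on the API examples slide, with a file key as the row key.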

20 Name space operations
fs_ls (list a dir/file)
fs_mkdir (make a dir)
fs_rename (rename a file/dir)
fs_mv (move a file/dir to another file/dir)
fs_rm (remove a file/dir)
fs_cpw (write a file to the storage)
fs_cpr (read a file from the storage)

21 How does it work
Resolving /testdir1/testfile11:

Row key: dir_key1
Column      Value
owner       filestore
group       filestore
testdir1    dir_keyxxx1
testdir2    dir_keyxxx2

Row key: dir_keyxxx1
Column      Value
owner       filestore
testdir12   dir_keyxxx4
testfile11  file_keyyyy1
testfile12  file_keyyyy2

Row key: file_keyyyy1
Column      Value
owner       filestore
group       filestore
size        1023
content     This is a test file….
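The lookup on this slide follows one row per path component. The sketch below reproduces it with dictionaries holding the rows shown above; resolve is a hypothetical helper, and the in-memory store stands in for the two column families.

```python
ROOT_KEY = "dir_key1"

# In-memory stand-in for the Directory and FFile rows on this slide.
store = {
    "dir_key1": {
        "owner": "filestore", "group": "filestore",
        "testdir1": "dir_keyxxx1", "testdir2": "dir_keyxxx2",
    },
    "dir_keyxxx1": {
        "owner": "filestore", "testdir12": "dir_keyxxx4",
        "testfile11": "file_keyyyy1", "testfile12": "file_keyyyy2",
    },
    "file_keyyyy1": {
        "owner": "filestore", "group": "filestore",
        "size": "1023", "content": "This is a test file",
    },
}


def resolve(path, store, root_key=ROOT_KEY):
    """Walk from the root row, following one column per path component."""
    key = root_key
    for name in [part for part in path.split("/") if part]:
        row = store[key]
        if name not in row:
            raise FileNotFoundError(path)
        key = row[name]
    return key


print(resolve("/testdir1/testfile11", store))  # file_keyyyy1
```

Each path component costs one row read, so the lookup cost grows with directory depth.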

22 How to implement?
mk_dir: fs_mkdir /testdir1/testdir2/testdir3 (/testdir1/testdir2 already exists)
  – 1. Generate a key for this entry: new_key=dirkey_`uuid`
  – 2. Walk from the root directory (/, whose key is dirkey_1) to get the key of the parent directory (testdir2); assume the key is dirkey_XXX
  – 3. Insert a column in the parent directory entry (testdir2, with key dirkey_XXX). The column name is the name of the new directory (testdir3) and its value is new_key
  – 4. Create a new entry for the new directory, with all the metadata columns (owner, group)
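The four steps above can be sketched against an in-memory dictionary instead of a Cassandra cluster. The function signature, the default owner and group, and the existence check are assumptions added for illustration; the step order inside the function differs slightly from the slide so that no key is generated when the parent lookup fails.

```python
import uuid

ROOT_KEY = "dirkey_1"  # key of the root directory, as on the slide


def fs_mkdir(store, path, owner="root", group="root"):
    """Create the last component of path; all parent directories must exist."""
    parts = [part for part in path.split("/") if part]
    if not parts:
        raise ValueError("cannot create the root directory")
    *parents, name = parts

    # Step 2: walk from the root to find the key of the parent directory.
    parent_key = ROOT_KEY
    for part in parents:
        parent_key = store[parent_key][part]  # KeyError if a parent is missing
    if name in store[parent_key]:
        raise FileExistsError(path)

    # Step 1: generate a key for the new entry.
    new_key = "dirkey_" + str(uuid.uuid1())
    # Step 3: insert a column (name -> new_key) in the parent directory row.
    store[parent_key][name] = new_key
    # Step 4: create the row for the new directory with its metadata columns.
    store[new_key] = {"owner": owner, "group": group}
    return new_key


store = {ROOT_KEY: {"owner": "root", "group": "root"}}
fs_mkdir(store, "/testdir1")
testdir2_key = fs_mkdir(store, "/testdir1/testdir2")
testdir3_key = fs_mkdir(store, "/testdir1/testdir2/testdir3")
print(store[testdir2_key]["testdir3"] == testdir3_key)  # True
```

On a real cluster steps 3 and 4 are separate writes to different rows, so a failure between them could leave a directory entry that points to a missing row.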

23 Stress test
Testbed: a small cluster
  – 4-node cluster
  – Replica number is 3
  – One client
Test methodology:
  – Operation sequence: mkdir / touch a file / list dir & file
  – Directory depth: 4 (/dir1/dir2/dir3/dir4)
  – Test result: finished 255102 operations (mkdir, create file, list dir, list file) in 111397.302446 seconds, 0.436 seconds for each operation sequence
  – Another test (more than 10 million operations) failed because of a memory crash
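The per-operation figure can be rechecked from the two reported numbers. The short calculation below uses only the operation count and elapsed time from the slide, and assumes the count refers to the same unit as the reported latency.

```python
operations = 255102
elapsed_seconds = 111397.302446

seconds_per_operation = elapsed_seconds / operations
operations_per_second = operations / elapsed_seconds
elapsed_hours = elapsed_seconds / 3600

print(f"{seconds_per_operation:.3f} s per operation")        # 0.437 s per operation
print(f"{operations_per_second:.2f} operations per second")  # 2.29 operations per second
print(f"{elapsed_hours:.1f} hours in total")                 # 30.9 hours in total
```

The quotient is about 0.4367 seconds, which agrees with the reported 0.436 once truncated, and corresponds to a little over two operations per second from the single client.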

