Download presentation
Presentation is loading. Please wait.
Published byJuniper Bartholomew Holmes Modified over 9 years ago
1
Performance Evaluation on Hadoop Hbase By Abhinav Gopisetty Manish Kantamneni
2
Introduction HBase is an open source, non-relational, distributed database modeled. HBase is a clone of Google’s BigTable and is written in Java It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. It has a Column oriented semi-structured data store. That is, it provides a fault-tolerant way of storing large quantities of sparse data and also provides strong consistancy.
3
Interesting facts Facebook's Messaging Platform is built using HBase. Twitter runs HBase across its entire Hadoop cluster. Yahoo! Uses HBase to store document fingerprints for detecting Duplicates.
4
HBase HBase acts as the input/output for MapReduce jobs run in Hadoop Accessed through Java API Also accessed through Avro and REST Serialization systems
5
Related Work Previous evaluations of HBase – versions 0.19, 0.20, 0.89 Research since 2007. There hasn’t been significant performance evaluation on HBase 0.90 FutureGrid project LZO Compression - It’s a real time compression library
6
LZO Compression apt-get install liblzo2-dev > cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0- dev.jar $HBASE_HOME/lib/ > tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native. | tar -xBvf - -C $HBASE_HOME/lib/native To compile it: $ export CFLAGS="-m64“ Now Using LZO, we can access the database like this: create 'mytable', {NAME=>'colfam:', COMPRESSION=>'lzo'}
7
Compatibility Issue All HBase and Hadoop versions aren’t compatible with each other. Hence we’re focusing on evaluating the HBase 0.90.2 on Hadoop 0.20
8
Implementation A blackbox approach is not enough. Performance testing helps determine the cost of the system. As a data store for loading/inserting large datasets Store large datasets analyzed by MapReduce jobs Real-time query services
9
Implementation. Install and configure Hadoop and HBase. Study the Hadoop/HBase API and write several HBase test programs to demonstrate functionality. To run a Performance evaluation HBase by performing different data model operations. which are get, put scans and delete.
10
Data Operations Get Returns attributes for a specific row Put Add new rows to a table or updates existing rows. Scans Allows iteration over multiple rows for specified attributes. Delete Removes a row from the table To do a performance analysis by varying the data size and the number of nodes to observe the behavior.
11
Thank you
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.