Published by Alison Horton. Modified over 6 years ago.
1
Analysis of Lucene Index on HBase in an HPC Environment
Prerna Shraff, Anand Hegde
2
Concept
BigTable: a compressed, high-performance database system built on GFS, the Chubby lock service, SSTable, etc.
HBase: the Hadoop database. Open source, distributed, versioned, column-oriented, and modeled after BigTable.
3
Outline
Data-intensive computing requires storage solutions for huge amounts of data.
The requirement is to host very large tables on clusters of commodity hardware.
HBase helps fulfill this requirement by providing Bigtable-like capabilities on top of Hadoop.
4
The Idea
Existing work in this field includes an experiment using a Lucene index on HBase in an HPC environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu).
Our goals are to expand the scope of that project and to evaluate its performance in terms of additional parameters.
5
Architecture
6
Implemented solution
The solution uses an inverted index built with Apache Lucene.
A forward index maps a document to its terms: doc1 -> “cloud”.
An inverted index maps a term to its documents: “cloud” -> doc1.
Apache Lucene, which supports full-text search, was used to implement the inverted indices.
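The difference between a forward index and an inverted index can be sketched in a few lines of plain Python. This is only an illustration of the data structure: it does not use Lucene, the helper names and sample documents are made up for the example, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_forward_index(docs):
    """Map each document ID to the list of lowercase terms it contains."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def build_inverted_index(forward_index):
    """Map each term to the sorted list of document IDs that contain it."""
    inverted = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


# Hypothetical sample documents, not data from the experiment.
docs = {
    "doc1": "cloud computing on clusters",
    "doc2": "storage for cloud data",
}

forward = build_forward_index(docs)
inverted = build_inverted_index(forward)

print(forward["doc1"])    # ['cloud', 'computing', 'on', 'clusters']
print(inverted["cloud"])  # ['doc1', 'doc2']
```

A full-text query for a term then becomes a single lookup in the inverted index instead of a scan over every document.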
7
Implemented Design
The existing design has separate tables for book images, book texts, and Lucene indices.
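One way to picture the three-table layout is with in-memory dictionaries standing in for the tables. The slide only states that images, texts, and indices live in separate tables; the table names, the column names such as `content:text`, and the choice of the term as the row key of the index table are all assumptions made for this sketch, and no real HBase client is involved.

```python
from collections import Counter

# In-memory stand-ins for three separate tables: row key -> {column: value}.
# Table and column names are hypothetical.
book_text_table = {}
book_image_table = {}
index_table = {}


def store_book(book_id, text, image_bytes):
    """Write one book into the text and image tables and update the index table."""
    book_text_table[book_id] = {"content:text": text}
    book_image_table[book_id] = {"content:image": image_bytes}
    # Assumed index layout: one row per term, one column per book,
    # with the term frequency as the cell value.
    for term, freq in Counter(text.lower().split()).items():
        index_table.setdefault(term, {})["docs:" + book_id] = freq


def search(term):
    """Return the sorted IDs of the books whose text contains the term."""
    row = index_table.get(term.lower(), {})
    return sorted(column.split(":", 1)[1] for column in row)


store_book("book1", "cloud storage for cloud data", b"placeholder-image")
store_book("book2", "cluster storage", b"placeholder-image")

print(search("storage"))  # ['book1', 'book2']
print(search("cloud"))    # ['book1']
```

Keeping the index in its own table means a search touches only the index rows and fetches text or image rows afterwards, by book ID.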
8
System Implementation
9
Initial Analysis
The experiment was performed on the Alamo HPC cluster of FutureGrid.
The experiment was conducted with 5 books.
Total terms evaluated: 8263
10
Initial Data Analysis
11
Initial Data Analysis
12
Proposed Work
To test across more data sets.
To test across different clusters, such as India on FutureGrid.
To test across different numbers of HDFS data nodes.
To test across more client nodes with different numbers of client queries.
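The proposed test dimensions can be laid out as a parameter grid, sketched below. Only the cluster names come from the slides; the node counts and query counts are placeholder values chosen for illustration, not planned or measured settings, and the number of data sets is left out because no values are given for it.

```python
from itertools import product

clusters = ["alamo", "india"]    # both named in the slides
data_node_counts = [2, 4, 8]     # placeholder values
client_node_counts = [1, 2]      # placeholder values
queries_per_client = [10, 100]   # placeholder values


def test_configurations():
    """Return one dict per combination of the test dimensions."""
    keys = ("cluster", "data_nodes", "client_nodes", "queries_per_client")
    combos = product(
        clusters, data_node_counts, client_node_counts, queries_per_client
    )
    return [dict(zip(keys, combo)) for combo in combos]


configs = test_configurations()
print(len(configs))  # 24
print(configs[0])
```

Listing the grid up front makes the cost of each added dimension visible: every new value multiplies the number of runs.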
13
Obstacles we may face
Hardware differs from cluster to cluster, and performance will differ accordingly.
Problems may occur when the number of data nodes is increased or decreased.
Important items to consider are the switching capacity of the device, the number of systems connected, and the uplink capacity.
Finding an appropriate number of data sets.
14
References
HBase: http://hbase.apache.org/book.html#ops_mgt
BigTable
Experiment using Lucene Index on HBase in an HPC Environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu)
15
Thank You