Bitmap Indices for Fast End-User Physics Analysis in ROOT
Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3)
(1) Berkeley Lab, Berkeley, USA
(2) CERN, Geneva, Switzerland
(3) Fermilab, Batavia, USA
Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2005), Zeuthen, Germany, May 2005

Contents
- Introduction to Bitmap Indices
- Integration of Bitmap Indices into ROOT
  - Support for TTree::Draw and TChain::Draw
  - Example Usage
- Experimental Results
  - Index Size
  - Performance of Bitmap Index vs. TTreeFormula
- Conclusions

Bitmap Indices
- Bitmap indices are efficient data structures for accelerating multi-dimensional queries, e.g. pT > 195 AND nTracks 12.4
- Supported by most commercial database management systems and data warehouses
- Optimized for read-only data

Equality Encoding vs. Range Encoding
[Figure: a) list of attributes, b) equality encoding, c) range encoding with cardinality 10]
Range encoding is optimized for one-sided range queries, e.g. a0 <= 3.
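
To make the difference between the two encodings concrete, here is a minimal sketch in plain C++. It is not the FastBit implementation: the type `Bitmap` and all function names are hypothetical, the bitmaps are uncompressed, and attribute values are assumed to be integers in [0, cardinality). It shows that a one-sided query such as a0 <= 3 needs an OR over several bitmaps with equality encoding but only a single bitmap lookup with range encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One bitmap holds one bit per record (std::vector<bool> keeps the sketch short).
using Bitmap = std::vector<bool>;

// Equality encoding: bitmap v has bit i set iff values[i] == v.
std::vector<Bitmap> buildEqualityEncoding(const std::vector<int>& values, int cardinality) {
    std::vector<Bitmap> bitmaps(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        bitmaps[values[i]][i] = true;
    }
    return bitmaps;
}

// Range encoding: bitmap v has bit i set iff values[i] <= v.
// The last bitmap is all ones by construction and could be omitted.
std::vector<Bitmap> buildRangeEncoding(const std::vector<int>& values, int cardinality) {
    std::vector<Bitmap> bitmaps(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        for (int v = values[i]; v < cardinality; ++v) bitmaps[v][i] = true;
    }
    return bitmaps;
}

// "value <= bound" with equality encoding: OR together bound + 1 bitmaps.
Bitmap lessEqualWithEquality(const std::vector<Bitmap>& eq, int bound) {
    Bitmap result(eq.front().size(), false);
    for (int v = 0; v <= bound; ++v)
        for (std::size_t i = 0; i < result.size(); ++i)
            if (eq[v][i]) result[i] = true;
    return result;
}

// "value <= bound" with range encoding: a single bitmap already is the answer.
const Bitmap& lessEqualWithRange(const std::vector<Bitmap>& re, int bound) {
    return re[bound];
}
```

For the attribute list {2, 5, 3, 9, 0, 3} with cardinality 10, both functions select records 0, 2, 4 and 5 for the bound 3. The price of range encoding is that an equality query then needs two bitmaps (v and v - 1) instead of one.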

Bitmap Indices with Binning
- Simple bitmap indices work well for low-cardinality attributes, i.e. attributes with few distinct values (< 10,000)
- For high-cardinality attributes, the bitmap index is often too large to be of practical use, even with good compression algorithms
- Solution:
  - Keep one bitmap per attribute range rather than one per distinct attribute value (binning)
  - Requires an additional step that evaluates the candidates in the boundary bin (“Candidate Check”); see the example on the next slide

Range Query on Bitmap Index with Binning
[Figure: range query on a binned bitmap index, annotated "bitmap 3 XOR bitmap 4"]
The “Candidate check” is performed on bitmap 4 to identify attribute values where x < 63.
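
The candidate check can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: the struct `BinnedIndex`, both function names, and the equal-width bins over [lo, hi) are hypothetical, and one equality-encoded bitmap is kept per bin. Bins that lie entirely below the threshold are sure hits; only records in the bin that straddles the threshold are compared against the raw values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BinnedIndex {
    double lo;                            // lower edge of the first bin
    double width;                         // width of every bin
    std::vector<std::vector<bool>> bins;  // bins[b][i] is set iff record i falls into bin b
};

// Build one bitmap per bin for values in [lo, hi).
BinnedIndex buildBinnedIndex(const std::vector<double>& x, double lo, double hi, int nBins) {
    assert(nBins > 0 && hi > lo);
    BinnedIndex idx{lo, (hi - lo) / nBins,
                    std::vector<std::vector<bool>>(nBins, std::vector<bool>(x.size(), false))};
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i] >= lo && x[i] < hi);
        int b = static_cast<int>((x[i] - lo) / idx.width);
        if (b >= nBins) b = nBins - 1;    // guard against floating-point rounding
        idx.bins[b][i] = true;
    }
    return idx;
}

// Evaluate "x < threshold" and return the matching record numbers.
// With candidateCheck == false the boundary bin is accepted wholesale,
// so the result is a superset of the exact answer.
std::vector<std::size_t> queryLessThan(const BinnedIndex& idx, const std::vector<double>& x,
                                       double threshold, bool candidateCheck) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t b = 0; b < idx.bins.size(); ++b) {
            if (!idx.bins[b][i]) continue;
            const double binLo = idx.lo + b * idx.width;
            const double binHi = binLo + idx.width;
            if (binHi <= threshold) {
                rows.push_back(i);        // whole bin is below the threshold: sure hit
            } else if (binLo < threshold) {
                // Boundary bin: read the raw value unless the check is skipped.
                if (!candidateCheck || x[i] < threshold) rows.push_back(i);
            }
        }
    }
    return rows;
}
```

With five bins of width 20 over [0, 100) and the query x < 63, bins 0 to 2 are sure hits and bin 3, covering [60, 80), is the boundary bin. Only its records require access to the raw data, which is why the check is cheap when bins are narrow.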

Implementation Details
- FastBit:
  - Bitmap index software developed at Berkeley Lab
  - Includes a very efficient bitmap compression algorithm
- Integrated bitmap indices to support:
  - TTree::Draw
  - TChain::Draw
- Each attribute to be indexed is stored as a separate branch
- The index is currently stored as a binary file

Example - Build Index

    // open ROOT-file and fetch the tree to be indexed
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // directory in which the index files are written
    char indexLocation[1024] = "/data/index/";
    bitmapIndex.ReadRootWriteIndexFile(tree, indexLocation);

    // build index for two attributes
    bitmapIndex.BuildIndex(tree, "a1", indexLocation);
    bitmapIndex.BuildIndex(tree, "a2", indexLocation);

Example - TTree::Draw with Index

    // open ROOT-file and fetch the tree
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // draw a1 vs. a2 for the records selected by the cut
    bitmapIndex.Draw(tree, "a1:a2", "a1 700");

Performance Measurements
- Compare the performance of TTreeFormula with TBitmapIndex::EvaluateQuery
- Do not include the time for drawing histograms
- Run multi-dimensional queries (cuts with multiple predicates)
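
A cut with multiple predicates maps naturally onto bitmaps: each predicate yields one bitmap, and the conjunction is a word-wise AND. The sketch below illustrates that idea only. It is not `TBitmapIndex::EvaluateQuery`; the alias `Words` and the functions `bitmapFor`, `bitmapAnd` and `countHits` are hypothetical, and `bitmapFor` scans the column merely as a stand-in for what a real index would look up from precomputed bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Uncompressed bitmap: 64 records per machine word.
using Words = std::vector<std::uint64_t>;

// Stand-in for an index lookup: set bit i iff pred(column[i]) holds.
template <typename Pred>
Words bitmapFor(const std::vector<double>& column, Pred pred) {
    Words bm((column.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < column.size(); ++i)
        if (pred(column[i])) bm[i / 64] |= std::uint64_t{1} << (i % 64);
    return bm;
}

// Conjunction of two predicates: one AND per 64 records.
Words bitmapAnd(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    Words out(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) out[w] = a[w] & b[w];
    return out;
}

// Number of selected records (clears the lowest set bit per iteration).
std::size_t countHits(const Words& bm) {
    std::size_t n = 0;
    for (std::uint64_t w : bm)
        for (; w != 0; w &= w - 1) ++n;
    return n;
}
```

A two-predicate cut on attributes a1 and a2 would then be evaluated as `bitmapAnd(bitmapFor(a1, p1), bitmapFor(a2, p2))`, where `p1` and `p2` are the individual conditions. The cost of the AND is independent of how many records pass each predicate, in contrast to a row-by-row formula evaluation.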

Experimental Setup
- Software/Hardware:
  - Bitmap index software is implemented in C++
  - Tests carried out on:
    - Linux CentOS
    - 2.8 GHz Intel Pentium IV with 1 GB RAM
    - Hardware RAID with SCSI disk
- Data:
  - 7.6 million records with ~100 attributes each
  - BaBar data set
- Bitmap Indices:
  - Built for 10 out of ~100 attributes
  - 1000 equality-encoded bins
  - 100 range-encoded bins
  - Bitmap index compression algorithm: WAH (Word-Aligned Hybrid)
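
The idea behind WAH can be sketched in a few lines. The following is a simplified illustration, not FastBit's code: it uses 32-bit words in which the most significant bit distinguishes a literal word (31 payload bits) from a fill word (one fill bit plus a 30-bit run length counted in 31-bit groups). The function names, the bit order inside a literal word, and the zero-padding of the last group are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kFillFlag  = 0x80000000u;  // MSB set: this is a fill word
constexpr std::uint32_t kFillValue = 0x40000000u;  // the bit value repeated by the fill
constexpr std::uint32_t kCountMask = 0x3FFFFFFFu;  // run length in 31-bit groups
constexpr std::uint32_t kAllOnes   = 0x7FFFFFFFu;  // a group of 31 one-bits

std::vector<std::uint32_t> wahEncode(const std::vector<bool>& bits) {
    std::vector<std::uint32_t> out;
    for (std::size_t start = 0; start < bits.size(); start += 31) {
        // Pack the next 31 bits (zero-padded at the end) into one group.
        std::uint32_t group = 0;
        for (std::size_t j = 0; j < 31 && start + j < bits.size(); ++j)
            if (bits[start + j]) group |= 1u << j;

        if (group == 0 || group == kAllOnes) {
            const std::uint32_t fill = kFillFlag | (group ? kFillValue : 0u);
            // Extend the previous fill word if it repeats the same bit and has room.
            if (!out.empty() && (out.back() & ~kCountMask) == fill &&
                (out.back() & kCountMask) < kCountMask) {
                ++out.back();
            } else {
                out.push_back(fill | 1u);
            }
        } else {
            out.push_back(group);  // literal word, MSB is 0
        }
    }
    return out;
}

std::vector<bool> wahDecode(const std::vector<std::uint32_t>& words, std::size_t nBits) {
    std::vector<bool> bits;
    for (std::uint32_t w : words) {
        if (w & kFillFlag) {
            const bool value = (w & kFillValue) != 0;
            bits.insert(bits.end(), std::size_t{31} * (w & kCountMask), value);
        } else {
            for (int j = 0; j < 31; ++j) bits.push_back(((w >> j) & 1u) != 0);
        }
    }
    assert(bits.size() >= nBits);
    bits.resize(nBits);  // drop the padding of the last group
    return bits;
}
```

A bitmap of 200 bits with a single bit set compresses to three words here: a zero fill, one literal, and another zero fill. Because words stay aligned, logical operations can work on fills and literals directly without decompressing bit by bit, which is the property that makes this family of schemes attractive for sparse bitmaps.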

Size of Compressed Bitmap Indices
[Figure: size of compressed bitmap indices]
EE-BMI: equality-encoded bitmap index
RE-BMI: range-encoded bitmap index

Query Performance - TTreeFormula vs. Bitmap Indices
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 10.

Approximate Answers
- For bitmap indices with binning, exact answers are produced during the candidate-check phase
  - Certain records are read from disk to check whether they fulfill the query constraint
- Approximate answers are returned if the candidate check is omitted
- The error of the approximate answer depends on the number of bins:
  - The approximate result may include additional events that do not satisfy the query
  - However, no correct events are dropped
- We used two different binning strategies:
  - Equality encoding with 1000 bins: error rate 0.1%
  - Range encoding with 100 bins: error rate 1%
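
The two error rates are consistent with a simple back-of-the-envelope bound. As assumptions not stated on the slide, suppose that the error rate is measured as false positives relative to all records, that the $N$ bins hold roughly equal numbers of records, and that the query is a one-sided range cut, so that only the single boundary bin can contribute false positives:

```latex
\text{error rate}
  = \frac{\#\,\text{false positives}}{\#\,\text{records}}
  \le \frac{\#\,\text{records in the boundary bin}}{\#\,\text{records}}
  \approx \frac{1}{N},
\qquad
\frac{1}{1000} = 0.1\,\%,
\quad
\frac{1}{100} = 1\,\%.
```

Under these assumptions, increasing the number of bins by a factor of ten reduces the worst-case error by the same factor, at the cost of a larger index.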

Query Performance - Approximate Answers (Error 0.1-1%)
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 30.

Conclusions
- We integrated bitmap indices into ROOT to support:
  - TTree::Draw
  - TChain::Draw
- Bitmap indices improve the performance of end-user analysis by up to a factor of 10.
- With approximate answers (0.1-1% error), the performance improvement is up to a factor of 30.
- Bitmap indices are also used successfully in the STAR experiment at Brookhaven to access ROOT files with GridCollector.
- Future work:
  - Store bitmap indices as a ROOT tree.
  - Integrate with PROOF to support parallel index evaluation.