NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Charles Leggett
A Lightweight Histogram Interface Layer
CHEP 2000, Session F (F320), Thursday Feb
Charles Leggett, CHEP 2000, Feb 10
Introduction
• Current histogramming software packages, such as PAW, ROOT, and JAS, have enormous functionality.
• They are no longer simply histogramming packages, but have added data analysis and visualization features.
• The tight integration between these features has made it difficult to separate the statistical data gathering feature from the analysis and graphical presentation features.
• This results in significant overhead if only the histogramming aspect is needed.
Introduction (cont)
• Many histogramming packages are wedded to a specific i/o format.
• Very few translation programs exist to convert between the various formats.
• This makes it very hard to use analysis and visualization tools that are not part of the package used to generate the histogram.
• Users have very little freedom to choose the package best suited to their needs, or the ones they are most familiar with.
Why an “Interface Layer”?
• Since it is format independent, and has no i/o (file or visual) requirements, it is not wedded to a specific part of the analysis procedure.
• It can sit between components, such as between the data acquisition component and the analysis component, offering the ability to use various formats in different applications.
Design Requirements
• Platform and i/o format independent
• Lightweight: low overhead, minimal non-histogram features
• Possibility to histogram any data type
• Ability to use within an analysis schema, as an interface between different components, or as a standalone utility
• Ability to use as a translator between various i/o formats
• i/o formats user extensible
• Easy implementation by user
Required Qualities of a Histogram
• A collection of statistical data related to a particular process.
• Should not contain any information unrelated to the statistical data, such as colour, fitting parameters, line width, cuts, etc.
• Number of bins + overflow/underflow
• Bin edges
• Entries per bin + associated errors
• Identification information, such as an ID or name
= n+3 + 2n
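As a rough illustration of the data listed on this slide, a 1D histogram with n bins could be held in a plain struct like the one below. This is only a sketch: the name `MinimalHistogram1D` and all of its members are hypothetical and are not taken from the package.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: only the statistical data of a 1D histogram,
// with no colour, fit parameters, cuts or other presentation details.
struct MinimalHistogram1D {
    std::string id;                // identification information
    std::vector<double> edges;     // n+1 bin edges
    std::vector<double> contents;  // n bin contents
    std::vector<double> errors;    // n associated errors
    double underflow = 0.0;        // entries below the first edge
    double overflow = 0.0;         // entries at or above the last edge

    MinimalHistogram1D(std::string name, std::vector<double> binEdges)
        : id(std::move(name)), edges(std::move(binEdges)) {
        assert(edges.size() >= 2);  // at least one bin is required
        contents.assign(edges.size() - 1, 0.0);
        errors.assign(edges.size() - 1, 0.0);
    }

    std::size_t nbins() const { return contents.size(); }
};
```

Everything else (drawing attributes, fit results) would live in whichever analysis or visualization tool consumes the histogram.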
Minimal Set of Useful Methods
• weighted entries
• reset()
• bin contents, errors, centers, edges
• conversion between bin numbers and bin edges/centers
• simple operations: =, +, -
• mean(), rms()
• min(), max()
• rebin(), resize()
• change title
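To make mean() and rms() concrete, here is one way such quantities could be computed from bin centers and bin contents. The function names `BinnedMean` and `BinnedRms` are hypothetical, and "rms" is assumed here to mean the spread about the mean; the package may define it differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of a binned distribution: bin centers weighted by bin contents.
inline double BinnedMean(const std::vector<double>& centers,
                         const std::vector<double>& contents) {
    assert(centers.size() == contents.size());
    double sumw = 0.0, sumwx = 0.0;
    for (std::size_t i = 0; i < centers.size(); ++i) {
        sumw += contents[i];
        sumwx += contents[i] * centers[i];
    }
    return sumw > 0.0 ? sumwx / sumw : 0.0;  // empty histogram -> 0
}

// Spread about the mean: sqrt(<x^2> - <x>^2), again weighted by contents.
inline double BinnedRms(const std::vector<double>& centers,
                        const std::vector<double>& contents) {
    assert(centers.size() == contents.size());
    double sumw = 0.0, sumwx = 0.0, sumwx2 = 0.0;
    for (std::size_t i = 0; i < centers.size(); ++i) {
        sumw += contents[i];
        sumwx += contents[i] * centers[i];
        sumwx2 += contents[i] * centers[i] * centers[i];
    }
    if (sumw <= 0.0) return 0.0;
    const double mean = sumwx / sumw;
    const double var = sumwx2 / sumw - mean * mean;
    return var > 0.0 ? std::sqrt(var) : 0.0;  // guard against rounding below 0
}
```

Both functions see only binned data, so the result is limited by the bin width.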
What Gets Histogrammed
Normally we are used to histogramming ints and floats.
• What about entire objects?
• To histogram an object, one has to define which aspect of the object is used to order the histogram.
• This ordering can be provided every time a histogram is filled, but it is nicer to associate an ordering mechanism with the histogram itself.
• Define a function which provides this ordering, and give a pointer to it to the histogram object.
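The idea of attaching an ordering function to the histogram can be sketched as follows. The class `ObjectHistogram`, the `Muon` layout and the function `MuonPt` are all hypothetical stand-ins chosen for illustration; they are not the package's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical user type; only pt is used for ordering in this sketch.
struct Muon {
    float pt;
    int charge;
};

// User-supplied ordering function: maps an object to a number.
inline float MuonPt(const Muon& m) { return m.pt; }

// Fixed-width histogram that stores a pointer to the ordering function.
template <class T>
class ObjectHistogram {
public:
    using QuantFunction = float (*)(const T&);

    ObjectHistogram(float lo, float hi, std::size_t nbins, QuantFunction f)
        : lo_(lo), hi_(hi), counts_(nbins, 0), quant_(f) {
        assert(hi > lo && nbins > 0 && f != nullptr);
    }

    // The caller passes the whole object; the histogram decides where it goes.
    void Fill(const T& obj) {
        const float x = quant_(obj);
        if (x < lo_ || x >= hi_) return;  // under/overflow ignored in this sketch
        std::size_t bin =
            static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * counts_.size());
        if (bin >= counts_.size()) bin = counts_.size() - 1;  // rounding guard
        ++counts_[bin];
    }

    unsigned long Count(std::size_t bin) const { return counts_.at(bin); }

private:
    float lo_, hi_;
    std::vector<unsigned long> counts_;
    QuantFunction quant_;
};
```

Because the function pointer lives in the histogram, the filling code never needs to know which property of the object is being histogrammed.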
Types of Histograms
• BINNED
  – bin edges defined when created
  – either fixed or variable width
• UNBINNED
  – only for very small data samples
  – can be converted to BINNED
• AUTO-BINNED
  – starts off as UNBINNED, automatically converted to BINNED after a set number of entries
  – conversion routines calculate bin edges either with fixed width or to maximize occupancy in each bin
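The two conversion strategies for AUTO-BINNED histograms might look roughly like this. The function names are hypothetical, and "maximize occupancy" is interpreted here as giving each bin about the same number of entries, which is an assumption rather than the package's documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-width edges spanning the full range of the unbinned sample.
inline std::vector<double> FixedWidthEdges(const std::vector<double>& sample,
                                           std::size_t nbins) {
    assert(!sample.empty() && nbins > 0);
    const auto mm = std::minmax_element(sample.begin(), sample.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i <= nbins; ++i)
        edges[i] = lo + (hi - lo) * static_cast<double>(i) /
                            static_cast<double>(nbins);
    return edges;
}

// Edges placed at sample quantiles, so bins hold roughly equal numbers of
// entries (assumed interpretation of "maximize occupancy").
inline std::vector<double> EqualOccupancyEdges(std::vector<double> sample,
                                               std::size_t nbins) {
    assert(!sample.empty() && nbins > 0);
    std::sort(sample.begin(), sample.end());  // works on a local copy
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i < nbins; ++i)
        edges[i] = sample[i * sample.size() / nbins];
    edges[nbins] = sample.back();
    return edges;
}
```

Either function would be called once, when the set number of entries is reached, after which the stored values are filled into the new bins and the unbinned storage is released.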
Use Overview
[Diagram: flow from user object to continued analysis]
• Input: User Object, User defined quantization function
• Book as: Binned, Unbinned, Auto
• Basic Operations: Fill, Weighted Fill, Add, Subtract, ..., Resize, Rebin, Convert Type, etc.
• Output: hbook/PAW, ROOT, JAS, text, User Defined
• Continued Analysis
Internal Storage
• If memory utilization is very tight, the user may want to limit the precision of the statistical data.
• User can choose between 4 and 8 byte internal record keeping for:
  – bin contents
  – bin errors
  – number of entries
  – number of equivalent entries
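One way to offer the 4 versus 8 byte choice is to make the record-keeping type a template parameter. The struct `BinStorage` below is a hypothetical sketch of that idea, not the package's actual storage class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: the record-keeping type P is a template parameter, so the same code
// can keep its statistics in float (typically 4 bytes) or double (typically 8).
template <class P = float>
struct BinStorage {
    std::vector<P> contents;   // bin contents
    std::vector<P> errors;     // bin errors
    P entries = 0;             // number of entries
    P equivalentEntries = 0;   // number of equivalent entries

    explicit BinStorage(std::size_t nbins)
        : contents(nbins, P(0)), errors(nbins, P(0)) {
        assert(nbins > 0);
    }

    // Bytes used by the per-bin arrays, excluding container overhead.
    std::size_t PayloadBytes() const {
        return (contents.size() + errors.size()) * sizeof(P);
    }
};
```

With many histograms of many bins each, halving the per-bin size roughly halves the total memory footprint, at the cost of precision for very large bin contents.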
Memory Usage
• Dynamic memory allocation is neat, but implementation (often) sucks. There will always be an overhead to using it.
• Pre-allocate memory: fairly easy to do with a BINNED histogram.
• Limit use of dynamic structures.
• Only run into trouble if a histogram needs to be re-sized or re-binned after it’s been created.
• UNBINNED histograms can either pre-allocate memory, or dynamically allocate on the fly.
• Total overhead per histogram: 80 bytes.
Implementation Details
• The requirement to be able to histogram objects has a serious implication: use of templates.
• The histogram object becomes a templated object, whose parameters are the type of object to be histogrammed and the type of the internal record keeping data:
    Histogram<object type, record keeping type>
• For UNBINNED histograms, STL vectors are used if dynamic memory management is chosen.
• Similar syntax for 2D histograms.
Usage
• Simple histogram of floats, fixed bin width
    Histogram<> h1(-10.,10.,100);
    h1.Fill(X);
• Histogram of ints, variable bin width, double precision
    Histogram<int,double> h2(Xedge);
• Histogram of Muon object, automatically binned to maximize occupancy
    float MuonQuantFunction(const Muon &M) { /* return the value used to order Muons */ }
    Histogram<Muon> h3(AUTOBINNED);
    h3.SetQuantFunction( MuonQuantFunction );
I/O
• File manager class used to read and write histograms from/to disk in a variety of formats.
• Internal histograms are only converted to a particular format when they are written.
• File manager can easily be extended to encompass new file formats.
• Current formats:
  – ASCII flat file
  – HBOOK
  – ROOT
  – XDR / DSL
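An extensible file manager of this kind could be organised around one writer class per format, registered by name. The sketch below is an assumption about how that might be structured: `FormatWriter`, `AsciiWriter`, `FileManager`, `SimpleHistogram` and the ASCII layout are all invented for illustration, and only writing to a stream is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for trying a writer in memory
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory histogram, reduced to the essentials.
struct SimpleHistogram {
    std::string name;
    std::vector<double> edges;     // n+1 values
    std::vector<double> contents;  // n values
};

// One subclass per output format; conversion happens only inside Write().
class FormatWriter {
public:
    virtual ~FormatWriter() = default;
    virtual void Write(const SimpleHistogram& h, std::ostream& out) const = 0;
};

// Made-up ASCII layout: a header line, then "low high content" per bin.
class AsciiWriter : public FormatWriter {
public:
    void Write(const SimpleHistogram& h, std::ostream& out) const override {
        assert(h.edges.size() == h.contents.size() + 1);
        out << h.name << ' ' << h.contents.size() << '\n';
        for (std::size_t i = 0; i < h.contents.size(); ++i)
            out << h.edges[i] << ' ' << h.edges[i + 1] << ' '
                << h.contents[i] << '\n';
    }
};

// Users extend the manager by registering a writer under a new format name.
class FileManager {
public:
    void Register(const std::string& format, std::unique_ptr<FormatWriter> w) {
        writers_[format] = std::move(w);
    }
    // Returns false if no writer is registered for the requested format.
    bool Write(const std::string& format, const SimpleHistogram& h,
               std::ostream& out) const {
        const auto it = writers_.find(format);
        if (it == writers_.end()) return false;
        it->second->Write(h, out);
        return true;
    }

private:
    std::map<std::string, std::unique_ptr<FormatWriter>> writers_;
};
```

Supporting a new format then means writing one more `FormatWriter` subclass and registering it, with no change to the histogram class itself.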
Ntuples
• Ntuples are trickier than histograms, as there are several different types (column-wise vs. row-wise, ROOT trees, etc).
• For the moment, they have been implemented in the most trivial way: arrays/vectors of structs.
    struct S { float E; int np; Muon M; };
    ntuple<S> nt;
    S s;
    s.E = .... ;
    nt.Fill(s);
• Simple accessor methods also provided.
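A vector-of-structs ntuple with simple accessors could be as small as the sketch below. The class name `SimpleNtuple`, its methods and the `Event` row type are hypothetical; the row loosely follows the slide, with the Muon member left out to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a row-wise ntuple: each Fill() appends one copy of the user's struct.
template <class Row>
class SimpleNtuple {
public:
    void Fill(const Row& r) { rows_.push_back(r); }

    std::size_t Entries() const { return rows_.size(); }

    // Simple accessor: read-only access to the i-th stored row.
    const Row& At(std::size_t i) const {
        assert(i < rows_.size());
        return rows_[i];
    }

private:
    std::vector<Row> rows_;
};

// Example row type (hypothetical).
struct Event {
    float E;
    int np;
};
```

Since Fill() copies the struct, the same local variable can be updated and reused for every entry.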
Additional Functionality
• Even though no complex functions are provided within the package, users may find it necessary to create them as needed.
• Library functions can easily be added to provide user-specific histogram/ntuple operations.
• For instance, if a user needs to perform a double Gaussian fit to a histogram, it is very easy to add this function in an external library, declared as a friend.
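The friend mechanism can be sketched with a much simpler operation than a double Gaussian fit: summing the bin contents. The class `TinyHistogram` and the function `SumOfContents` are hypothetical; the point is only that an externally implemented function gains access to the private data once it is declared a friend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class TinyHistogram;

// Declared here, implemented in a separate (user) library.
double SumOfContents(const TinyHistogram& h);

class TinyHistogram {
public:
    explicit TinyHistogram(std::size_t nbins) : contents_(nbins, 0.0) {
        assert(nbins > 0);
    }
    // Throws std::out_of_range for an invalid bin number.
    void AddToBin(std::size_t bin, double w) { contents_.at(bin) += w; }

private:
    std::vector<double> contents_;
    // Grants the external function direct access to the bin contents.
    friend double SumOfContents(const TinyHistogram& h);
};

// "External library" code: reads private data without extra accessors.
double SumOfContents(const TinyHistogram& h) {
    double sum = 0.0;
    for (double c : h.contents_) sum += c;
    return sum;
}
```

A fit routine would follow the same pattern, reading the contents and errors directly and returning its parameters to the caller rather than storing them in the histogram.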
Additions in the Pipeline
• Ability to use shared memory
• Extend i/o format to include JAS
• Internal conversion to ROOT/HBOOK/JAS
• Profile histograms
• Further support for ntuples
• Adhere to AIDA interface
Pipedreams
• Create an adaptor to a memory resident histogram object to allow multi-format access.
• Basic histogram object sits in memory and presents different representations of itself to various components, e.g. it looks like an HBOOK histogram to MINUIT, and like a ROOT histogram to a ROOT specific process. If modifications are made to the histogram by other applications, it can re-synchronize and update itself.
Conclusions
• Makes a clean break between statistical data gathering, and analysis and visualization tasks.
• Enables histogramming of complex types.
• Simple and small implementation that is well suited to memory restricted tasks, such as online data taking.
• Provides the user with the freedom to choose from a wide variety of different analysis and visualization tools.
• Easily extensible, whether to new i/o formats or specific analysis functions.