Published by Alan May. Modified over 9 years ago.
1
CHEP01 CERN IT/API, Jakub.Moscicki@cern.ch
Anaphe OO libraries for data analysis
Jakub T. Mościcki, CERN IT/API, jakub.moscicki@cern.ch
2
Outline
- Overview of Anaphe and LHC Computing
- Anaphe components
- Lizard - Interactive Data Analysis Tool
- Summary
3
LHC Computing challenge
4
LHC & The Alps
(Figure labels: 27 km circumference; ~100 m deep; interaction points)
5
LHC Computing Challenge
- 4 experiments will create a huge amount of data
  - >1 PetaByte/year for each experiment!
    - 10^15 Bytes
    - 1,000 TeraBytes
    - 20,000 Redwood tapes
    - 100,000 dual-sided DVD-RAM disks
    - 1,500,000 sets of the Encyclopaedia Britannica (w/o photos)
- Need lots of CPU power to reconstruct/analyse
  - about 1000 PC boxes per experiment (2004 ones!)
    - complex data models
- Data mining and analysis by thousands of geographically dispersed scientists around the globe
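The storage equivalences on this slide can be checked with a little arithmetic. The sketch below uses only the counts quoted on the slide; the per-medium capacities it prints are what those round numbers imply, not official media specifications.

```python
# Back-of-the-envelope check of the slide's figures for one experiment-year.
PETABYTE = 10**15   # bytes, as stated on the slide
TERABYTE = 10**12
GIGABYTE = 10**9

# Counts quoted on the slide for 1 PB of data.
redwood_tapes = 20_000
dvd_ram_disks = 100_000

terabytes = PETABYTE // TERABYTE
gb_per_tape = PETABYTE / redwood_tapes / GIGABYTE
gb_per_dvd = PETABYTE / dvd_ram_disks / GIGABYTE

print(f"1 PB = {terabytes} TB")
print(f"implied capacity per tape: {gb_per_tape:.0f} GB")
print(f"implied capacity per DVD-RAM disk: {gb_per_dvd:.0f} GB")
```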
6
Lifetime of LHC software = 25 yrs
(Timeline figure; recoverable label: WWW)
7
Technology (R)Evolution
- 10 yrs major cycle length (HW, SW, OS)
  - ~12 evolutionary changes in the market
  - 1 revolutionary change
  - towards greater diversity
  - don’t forget changes of requirements
- Consequences
  - SW written today will most probably be rewritten tomorrow
  - We must anticipate changes
8
Anaphe: what it is
- Modular (OO/C++) replacement of CERNLIB functionality for use in HEP experiments (previously LHC++)
  - memory management and I/O
  - foundation classes
  - histogramming, minimizing/fitting
  - visualization
  - interactive data analysis
- Trying to use standards wherever possible
- Trying to re-use existing class libraries
- This talk will not cover detector simulation (GEANT-4)
9
Anaphe Components
10
‘Layered’ Approach
- Components are individual C++ class libraries.
  - Easy to replace one part without throwing away everything
  - Alternative implementations interchangeable
    - HepODBMS versus HBOOK Ntuples
    - NAG C minimizers versus MINUIT
  - Easy customization to match experiment-specific needs
  - Runtime flexibility
  - Components may be used individually (limited interdependencies)
- Insulate components through Abstract Interfaces
  - “wrapper” layer to implement Interfaces in terms of existing libs
- Identify and use patterns - avoid anti-patterns
  - learn from other people’s experiences/failures
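To make the idea of insulating components through abstract interfaces concrete, here is a minimal sketch in Python (the deck's scripting language) rather than C++. All names (`ITuple`, `InMemoryTuple`, `CsvTuple`, `mean`) are hypothetical and are not Anaphe or AIDA classes; the CSV-backed class merely stands in for a "wrapper" layer over an existing library.

```python
import csv
import io
from abc import ABC, abstractmethod


class ITuple(ABC):
    """Abstract interface (hypothetical): the only thing client code sees."""

    @abstractmethod
    def rows(self):
        """Yield each row as a dict mapping column name to float."""


class InMemoryTuple(ITuple):
    """One implementation: rows held in a plain Python list."""

    def __init__(self, rows):
        self._rows = list(rows)

    def rows(self):
        yield from self._rows


class CsvTuple(ITuple):
    """Wrapper layer: implements the interface on top of the csv module."""

    def __init__(self, text):
        self._text = text

    def rows(self):
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {name: float(value) for name, value in row.items()}


def mean(source: ITuple, column: str) -> float:
    """Client code: depends on ITuple only, never on a concrete store."""
    values = [row[column] for row in source.rows()]
    return sum(values) / len(values)


in_memory = InMemoryTuple([{"MASS": 1.0}, {"MASS": 3.0}])
from_csv = CsvTuple("MASS,PT1\n1.0,5.0\n3.0,15.0\n")
print(mean(in_memory, "MASS"), mean(from_csv, "MASS"))
```

Swapping one implementation for the other requires no change to `mean`, which is the property the slide attributes to interchangeable alternative implementations.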
11
Anaphe Components: Overview
12
Users and Collaborations
- AIDA spoken here!
  - IGUANA (CMS visualization)
  - GAUDI (LHCb) framework
  - ATHENA (ATLAS) framework
  - Analyzer modules in Geant 4
  - JAS
  - Open Scientist
  - …you?
13
Anaphe components
14
CLHEP
- HEP foundation class library
  - Random number generators
  - Physics vectors
    - 3- and 4-vectors
  - Geometry
  - Linear algebra
  - System of units
  - more packages recently added
    - will continue to evolve
- wwwinfo.cern.ch/asd/lhc++/clhep/
15
2D Graphics libraries
- Qt
  - multi-platform C++ GUI toolkit
    - C++ class library, not wrapper around C libs
    - superset of Motif and MFC
    - available on Unix and MS Windows
    - no change for developer
  - commercial but with public domain version
  - www.troll.no
- Qplotter
  - “add-on” functionality for HEP
    - “HIGZ/HPLOT”
16
Basic 3D Graphic Libraries
- OpenGL (basic graphics)
  - De-facto industry standard for basic 3D graphics
  - Used in CAD/CAE, games, VR, medical imaging
- OpenInventor (scene mgmt.)
  - OO 3D toolkit for graphics
  - Cubes, polygons, text, materials
  - Cameras, lights, picking
  - 3D viewers/editors, animation
  - Based on OpenGL/MesaGL
17
Mathematical Libraries
- NAG (Numerical Algorithms Group) C Library
  - Covers a broad range of functionality
    - Linear algebra
    - differential equations
    - quadrature, etc.
  - Special functions of CERNLIB added to Mark-6 release
    - mostly for theory and accelerator
    - Quality assurance
    - extensive testing done by NAG
  - www.nag.com
18
Histograms: the HTL package
- Histograms are the basic tool for physics analysis
  - Statistical information of density distributions
- Histogram Template Library (HTL)
  - design based on C++ templates
  - Modular: separation between sampling and display
  - Extensible: open for user-defined binning systems
  - Flexible: support transient/persistent at the same time
  - Open: large use of abstract interfaces
  - recent addition: 3D histograms
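The "open for user-defined binning systems" point can be illustrated with a small sketch in which the histogram only accumulates weights and delegates the value-to-bin mapping to a separate binning object. This is a Python illustration with hypothetical names (`UniformBinning`, `EdgeBinning`, `Histo1D`); it does not reproduce the HTL C++ template classes.

```python
import bisect


class UniformBinning:
    """Equal-width bins on [lo, hi)."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi

    def index(self, x):
        if not (self.lo <= x < self.hi):
            return None  # value is outside the histogram range
        i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
        return min(i, self.nbins - 1)  # guard against rounding at the upper edge


class EdgeBinning:
    """User-defined variable-width bins given by sorted bin edges."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.nbins = len(self.edges) - 1

    def index(self, x):
        if not (self.edges[0] <= x < self.edges[-1]):
            return None
        return bisect.bisect_right(self.edges, x) - 1


class Histo1D:
    """Accumulates weights; knows nothing about how bins are laid out."""

    def __init__(self, binning):
        self.binning = binning
        self.counts = [0.0] * binning.nbins
        self.out_of_range = 0.0

    def fill(self, x, weight=1.0):
        i = self.binning.index(x)
        if i is None:
            self.out_of_range += weight
        else:
            self.counts[i] += weight


uniform = Histo1D(UniformBinning(5, 0.0, 5000.0))
for mass in (100.0, 1200.0, 4999.9, 6000.0):
    uniform.fill(mass)

variable = Histo1D(EdgeBinning([0.0, 10.0, 100.0, 1000.0]))
for value in (5.0, 10.0, 999.0):
    variable.fill(value)
```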
19
Fitting and Minimization
- Fitting and Minimization Library (FML)
  - common OO interface
    - NAG-C, MINUIT
  - based on Abstract Interfaces IVector, IModelFunction, …
  - fitting as a special case of minimization
    - minimize “distance” between data and model
  - replacement for HepFitting (and Gemini)
- Gemini
  - common minimization interface
  - very thin layer
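"Fitting as a special case of minimization" can be sketched as follows: a generic one-dimensional minimizer knows nothing about data, and the fit is obtained by handing it a sum-of-squares "distance" between data and model. The golden-section search here is a stand-in chosen for brevity, not the algorithm used by NAG-C or MINUIT, and the data points and function names are invented for illustration.

```python
def golden_section_minimize(f, lo, hi, tol=1e-9):
    """Generic minimizer for a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Invented data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def model(x, slope):
    return slope * x


def distance(slope):
    """Sum of squared residuals between data and model."""
    return sum((y - model(x, slope)) ** 2 for x, y in zip(xs, ys))


# The fit is just a minimization of the distance function.
fitted_slope = golden_section_minimize(distance, 0.0, 10.0)

# Closed-form least-squares slope for y = slope * x, used as a cross-check.
exact_slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(fitted_slope, exact_slope)
```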
20
Tags, Ntuples and Events
- NtupleTag Library
  - Ntuple navigation and analysis
  - common OO interface for different storage
    - ODBMS
    - HBook (CERNLIB)
- Exploiting Tag concept
  - enhanced Ntuples
  - associated with an underlying persistent store
  - optional association to the Event may be used to navigate to any other part of the Event
    - even from an interactive visualization program
  - main use: speedup data selection for analysis…
    - Tag data is typically better clustered than the original data
(Figure label: Object Association)
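The Tag concept can be sketched as a small record that carries a few selection attributes plus an association back to the full Event. The selection runs over the compact tags only, and the full events are fetched just for the survivors. The classes, attribute names and values below are hypothetical and use an in-memory dict in place of a persistent store.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    tracks: list  # stands in for the large reconstructed payload


@dataclass
class Tag:
    mass: float
    pt1: float
    event_id: int  # association back to the full Event


# A dict stands in for the underlying persistent store.
event_store = {
    i: Event(i, tracks=[f"track-{i}-{k}" for k in range(3)]) for i in range(5)
}
tags = [Tag(mass=1000.0 * i, pt1=4.0 * i, event_id=i) for i in range(5)]

# Fast selection: only the small tag records are scanned.
selected = [tag for tag in tags if tag.pt1 > 10]

# Navigation: follow the association to the full events that passed the cut.
selected_events = [event_store[tag.event_id] for tag in selected]
print([event.event_id for event in selected_events])
```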
21
Interactive Data Analysis
22
Interactive Data Analysis
- Aim: “OO replacement for PAW”
  - analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)
  - visualisation of data (Histograms, scatter-plot, “Vectors”)
  - fitting of histograms (and other data)
  - access to experiment specific data/code
- Maximize flexibility and re-use
  - plug-in structure
  - careful design with limited source and binary dependencies
- Foresee customization/integration
  - allow use from within experiment’s s/w
  - framework!
23
Lizard Internals: Interfaces
24
Anaphe components
25
Architectural issue: Scripting
- Typical use of scripting is quite different from programming (reconstruction, analysis, ...)
  - history: “go back to where I was before”
  - repetition/looping - with “modifiable parameters”
- SWIG to (semi-)automatically create connection to chosen scripting language
  - allows flexibility to choose amongst several scripting languages
  - Python, Perl, Tcl, Guile, Ruby, (Java) …
- Python - OO scripting, no “strange $!%-variables”
  - other scripting languages possible (through SWIG)
- Can be enhanced and/or replaced by a GUI
  - scripting window within GUI application
26
Example script (ntuple)

    # ntm (tuple manager) and hm (histogram manager) are provided by the Lizard session
    # get list of names of all tuples from tuplemanager
    ntm.listTuples()
    nt1 = ntm.findNtuple("Charm1")  # retrieve tuple by name
    # create 1D histos to project into
    h1 = hm.create1D(10, "mass", 100, 0., 5000.)
    h2 = hm.create1D(20, "mass for pt1>10", 100, 0., 5000.)
    # project the attribute "MASS" into histo h1 without cut ("")
    nt1.project1D(h1, "", "MASS")
    # project the attribute "MASS" into histo h2 with cut ("PT1>10")
    nt1.project1D(h2, "PT1>10", "MASS")
28
Lizard: History and Present Status
- Started after CHEP-2000
- Full version out since June 2001
  - “PAW like” analysis functionality plus
  - on-demand loading of compiled code using shared libraries
    - gives full access to experiment’s analysis code and data
  - based on Abstract Interfaces
    - flexible and extensible
29
Possible Future Enhancements
- Access to other implementations of components
  - HBOOK histograms and ntuples (RWN) /coming soon/
  - OpenScientist, ROOT histograms?
- Adding other “scripting” languages
  - Perl, Tcl, cint?
- Communication with Java tools/packages
  - via AIDA
    - JAS
    - WIRED
30
Architectural issue: Distributed Computing
- Motivation
  - move code to data
  - parallel analysis
- Techniques
  - services via Abstract Interfaces (AI)
  - late binding
  - plug-in architecture
- End-user (Lizard)
  - look-and-feel of local analysis
- R&D started and first prototype available soon
  - CORBA
31
Summary
- The architecture of Anaphe shows some important items for flexible and modular data analysis:
  - weak coupling between components through use of Abstract Interfaces
  - basic functionality is covered by C++ class libraries
- Major criteria are flexibility, extensibility and interoperability
  - recent example: GEANT-4 space examples using G4Analysis component (based on AIDA)
- Lizard is based on Anaphe components and the Python scripting language (through SWIG)
  - Lizard is young but has a very solid base in mature Anaphe libraries
  - real plug-in structure
32
More information
- cern.ch/Anaphe
- cern.ch/Anaphe/Lizard
- aida.freehep.org/
- cern.ch/DB
- wwwinfo.cern.ch/asd/lhc++/clhep/
34
Opening bracket: Persistency
35
Ntuple versus TagDB Model
(Figure labels: Event Data Files; Ntuple File; Ad hoc extraction prg.; Object Association; Federated DB of Event & Tag)
36
Object persistency
Two concepts: serial and page I/O
- “Sequential access to objects” (streaming)
  - good in networking context or serial writes to files
  - much like “good old Fortran”
  - often perceived to be “simpler” to implement (“ >”)
- “Navigational access to objects” (buffered)
  - I/O on demand for complex data models
  - optimized for (random) disk access (disks deliver pages)
  - sequential write to file still ok
- Both concepts need to take care of changes to the internal structure of the objects (schema evolution)
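The difference between the two access styles can be shown on a toy scale with Python's standard `pickle` module and an in-memory byte buffer standing in for a file. This is only an analogy: the offset table below is a hand-rolled index, not the page-based I/O of an object database, and the event records are invented.

```python
import io
import pickle

events = [{"id": i, "ntracks": 10 * i} for i in range(4)]

# Sequential write: objects are appended one after another.
# Recording each object's byte offset is what later enables direct access.
buf = io.BytesIO()
offsets = {}
for event in events:
    offsets[event["id"]] = buf.tell()
    pickle.dump(event, buf)

# Sequential (streaming) access: read everything back in write order.
buf.seek(0)
streamed = [pickle.load(buf) for _ in events]

# Navigational access: jump straight to one object and read only that one.
buf.seek(offsets[2])
picked = pickle.load(buf)
print(picked)
```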
37
Architectural Issue: Persistency (“Object-I/O”)
- Brings a completely new quality into the design
- Objects now have a lifetime
  - don’t “delete” until you are really sure you want to
  - persistency is a kind of “intended memory leak”
- Objects may change during their (extended) life
  - “schema evolution”
  - additions/deletions of attributes
  - changes of inheritance relations
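One common way to cope with additions and deletions of attributes is to store a schema version with every object and upgrade old records when they are read. The sketch below shows that idea with JSON records; the attribute names and version numbers are invented, and this is not how Objectivity/DB or HepODBMS implement schema evolution.

```python
import json

CURRENT_VERSION = 2


def upgrade(record):
    """Bring a stored record up to the current schema, one version at a time."""
    record = dict(record)  # do not modify the caller's object
    version = record.get("version", 1)
    if version == 1:
        # v1 -> v2: attribute 'energy' was added, attribute 'comment' was dropped.
        record["energy"] = None  # value is unknown for objects written under v1
        record.pop("comment", None)
        version = 2
    record["version"] = version
    return record


# Two persistent objects written at different times under different schemas.
old = json.dumps({"version": 1, "px": 1.0, "py": 2.0, "comment": "x"})
new = json.dumps({"version": 2, "px": 1.0, "py": 2.0, "energy": 5.0})

loaded = [upgrade(json.loads(text)) for text in (old, new)]
print(loaded)
```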
38
Architectural Issue: Persistency (“Object-I/O”) (II)
- Objects can be placed (“clustering”)
  - de-coupling of logical and physical view of data
- Special care needed to ensure consistency in data set
  - avoid reading group of objects (tracks, events, ...) for which writing/updating is not (yet) complete
  - clean up if only part of the objects are written
  - typically taken care of by using transactions
- Complications possible in distributed computing
  - need to protect disk access now like memory access in past (“Segmentation violation”)
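The "clean up if only part of the objects are written" case is what a transaction handles. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the object database (a deliberate substitution; the talk's system is Objectivity/DB): an event whose write fails midway leaves no partial data behind. The table layout and the `write_event` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (event_id INTEGER, pt REAL)")
conn.commit()


def write_event(conn, event_id, pts, fail_midway=False):
    """Write all tracks of one event inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, pt in enumerate(pts):
                if fail_midway and i == 1:
                    raise RuntimeError("simulated crash while writing")
                conn.execute("INSERT INTO tracks VALUES (?, ?)", (event_id, pt))
    except RuntimeError:
        pass  # the partially written event has already been rolled back


write_event(conn, 1, [10.0, 20.0])                  # completes normally
write_event(conn, 2, [5.0, 6.0], fail_midway=True)  # fails after one insert

rows = conn.execute("SELECT event_id, pt FROM tracks ORDER BY pt").fetchall()
print(rows)
```

Readers therefore never see event 2 in a half-written state, only event 1 in full.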
39
Physical Model and Logical Model
Physical model may be changed to optimise performance
Existing applications continue to work
40
Concurrent Access
- Data changes are part of a Transaction
  - ACID: Atomicity, Consistency, Isolation, Durability
  - Guarantees consistency of data
- Support for multiple concurrent writers
  - e.g. Multiple parallel data streams
  - e.g. Filter or reconstruction farms
  - e.g. Distributed simulation
- Access is co-ordinated by a lock server
  - MROW: Multiple Reader, One Writer per container (Objectivity/DB)
41
Plain files vs. Databases
- HEP has been using DBs since LEP
  - e.g., FATMEN, HepDB
- Mainly for “Meta data”
  - event -> file -> tape mappings
  - calibration data (conditions)
- Why not use a single system for “Meta data” and “data”?
  - Overhead for DB administration is there anyway
  - accessing “Meta data” from “data” is significantly easier
    - simple navigation from event -> calibrationData
  - transaction safety also for event/reconstructed data
42
Persistency: Objectivity/DB
- ODMG compliant database
  - Object Data Management Group defined a standard
- Language binding for C++, Java, Smalltalk
  - ODBMS allow persistent objects to be used directly as variables of the OO language
- Storage entity is a complete object
  - State of all data members & Object class
  - Guarantees consistent view of data (DB feature)
- C++ Language Support
  - Abstraction, Inheritance, Polymorphism
  - Parameterised Types (Templates)
- Location transparent access to objects
43
Closing bracket: Persistency