Genova 10-Dec-2001 Andreas Pfeiffer, CERN/IT-API, Anaphe OO Libraries for Data Analysis using C++ and Python Andreas Pfeiffer.

Presentation transcript:

Genova 10-Dec-2001 Andreas Pfeiffer, CERN/IT-API, Anaphe OO Libraries for Data Analysis using C++ and Python Andreas Pfeiffer CERN IT/API

Outline
- Motivation
  - LHC computing challenge
- Anaphe Components
  - C++
- Lizard: Interactive Data Analysis
  - Python
- Software quality control
- Summary

LHC Computing challenge

LHC & The Alps
[Figure: the LHC ring; labels: 27 km circumference, ~100 m deep, interaction points]

The Large Hadron Collider
- A completely new particle collider (start-up in 2006)
  - the largest superconductor installation in the world
- A collision will take place every 25 nanoseconds
- But only one in a billion will be interesting…
- And only one in a trillion will be really interesting!
- Real-time data filtering: Petabytes per second to Gigabytes per second
- Accumulated data: Petabytes per year
- Data mining by thousands of geographically dispersed scientists in hundreds of teams
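The rates quoted on this slide can be turned into rough event frequencies. The following is a back-of-the-envelope sketch only. It assumes that "one in a billion" means a fraction of 1e-9 and "one in a trillion" a fraction of 1e-12 of all collisions, and that collisions occur continuously every 25 ns; the variable names are illustrative and not from the talk.

```python
# Rough rates derived from the slide's numbers (see assumptions above).
BUNCH_SPACING_S = 25e-9      # one collision every 25 nanoseconds
SECONDS_PER_DAY = 86_400

# 1 / 25 ns = 4e7 collisions per second (40 MHz)
collisions_per_second = 1.0 / BUNCH_SPACING_S

# "one in a billion" of all collisions, per second
interesting_per_second = collisions_per_second * 1e-9

# "one in a trillion" of all collisions, per day of continuous running
really_interesting_per_day = collisions_per_second * 1e-12 * SECONDS_PER_DAY

print(f"collisions:         {collisions_per_second:.1e} per second")
print(f"interesting:        {interesting_per_second:.2f} per second")
print(f"really interesting: {really_interesting_per_day:.2f} per day")
```

Under these assumptions the "really interesting" collisions arrive at a rate of a few per day, which is why the real-time filtering step has to be so selective.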

LHC Computing Challenge
- 4 experiments will create a huge amount of data
  - >1 PetaByte/year for each experiment!
    - 10^15 Bytes
    - 1,000 TeraBytes
    - 20,000 Redwood tapes
    - 100,000 dual-sided DVD-RAM disks
    - 1,500,000 sets of the Encyclopaedia Britannica (w/o photos)
- Need lots of CPU power to reconstruct/analyse
  - about 1000 PC boxes per experiment (2005 ones!)
    - […] of today’s boxes (dual P-III 800 MHz)
    - complex data models
  - reconstruction s/w is also used for online filtering
    - needs high quality s/w in order not to waste beam time

Lifetime of LHC software = 25 yrs
[Figure: technology timeline; only the label "WWW" survives in the transcript]
Thanks to Dino Ferrero Merlino (IT)

Technology (R)Evolution
- 10 yrs major cycle length (HW, SW, OS)
  - ~12 evolutionary changes in the market
  - 1 revolutionary change
  - towards greater diversity
  - don’t forget changes of requirements
- Consequences
  - s/w written today most probably will be rewritten tomorrow
  - we must anticipate changes

Anaphe: what it is
- Modular (OO/C++) replacement of CERNLIB functionality for use in HEP experiments
  - memory management
  - I/O
  - foundation classes
  - histogramming
  - minimizing/fitting
  - visualization
  - interactive data analysis
- Trying to use standards wherever possible
- Trying to re-use existing class libraries
- This talk will not cover detector simulation (GEANT-4)

Anaphe Components

Use of Components with Abstract Interfaces
- User Code uses only Interface classes
    IHistogram1D * hist = histoFactory->create1D("track quality", 100, 0., 10.)
- Actual implementations are selected at run-time
  - loading of shared libraries
- No change at all to user code, but freedom to choose the implementation is kept
[Diagram: User Code uses Histo-IF and Fitter-IF; Histo-IF is implemented by Histo-Impl. 1 and Histo-Impl. 2; Fitter-IF is implemented by Fitter-Impl. X and Fitter-Impl. Y]
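The pattern on this slide (user code sees only an interface, a factory picks the implementation at run time) can be sketched in Python. This is an illustration of the idea, not Anaphe or AIDA code: the two implementations, the `HistoFactory` constructor argument, and the `bin_height` method are hypothetical, and a dictionary lookup stands in for the loading of shared libraries.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Abstract interface: the only histogram type user code refers to."""

    @abstractmethod
    def fill(self, x, weight=1.0): ...

    @abstractmethod
    def entries(self): ...

    @abstractmethod
    def bin_height(self, index): ...


class _BinnedBase(IHistogram1D):
    """Shared binning logic for the two hypothetical implementations."""

    def __init__(self, title, nbins, low, high):
        self.title, self.nbins, self.low, self.high = title, nbins, low, high
        self._entries = 0

    def fill(self, x, weight=1.0):
        self._entries += 1                      # count every fill
        if self.low <= x < self.high:           # out-of-range values are not binned
            index = int((x - self.low) / (self.high - self.low) * self.nbins)
            self._add(index, weight)

    def entries(self):
        return self._entries


class ListHistogram1D(_BinnedBase):
    """Implementation 1: dense storage, one slot per bin."""

    def __init__(self, *args):
        super().__init__(*args)
        self._bins = [0.0] * self.nbins

    def _add(self, index, weight):
        self._bins[index] += weight

    def bin_height(self, index):
        return self._bins[index]


class SparseHistogram1D(_BinnedBase):
    """Implementation 2: sparse storage, only filled bins are kept."""

    def __init__(self, *args):
        super().__init__(*args)
        self._bins = {}

    def _add(self, index, weight):
        self._bins[index] = self._bins.get(index, 0.0) + weight

    def bin_height(self, index):
        return self._bins.get(index, 0.0)


# Stand-in for run-time selection via shared libraries (an assumption).
_IMPLEMENTATIONS = {"list": ListHistogram1D, "sparse": SparseHistogram1D}


class HistoFactory:
    def __init__(self, implementation):
        self._cls = _IMPLEMENTATIONS[implementation]

    def create1D(self, title, nbins, low, high):
        return self._cls(title, nbins, low, high)


def user_code(histo_factory):
    """Uses only the interface; never names a concrete class."""
    hist = histo_factory.create1D("track quality", 100, 0.0, 10.0)
    for value in (0.5, 2.5, 2.5, 11.0):
        hist.fill(value)
    return hist.entries(), hist.bin_height(25)


result_list = user_code(HistoFactory("list"))
result_sparse = user_code(HistoFactory("sparse"))
print(result_list, result_sparse)
```

Swapping "list" for "sparse" changes the storage strategy without touching `user_code`, which is the property the slide is after.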

The AIDA project
- AIDA project (Abstract Interfaces for Data Analysis) was initiated at the HepVis’99 workshop in Orsay
- Presently active: mainly developers from existing packages
  - Tony Johnson (JAS)
  - Andreas Pfeiffer (Lizard/Anaphe)
  - Guy Barrand (OpenScientist)
  - Mark Dönszelmann (Wired)
  - Developers from LHCb/Gaudi
- more on AIDA tomorrow...

‘Layered’ Approach
- Basic functionalities (histograms, fitting, etc.) are available as individual C++ class libraries.
- Easy to replace one part without throwing away everything
  - Objectivity/DB to provide persistence
  - HepODBMS library (“insulating layer”, “tags”)
  - Histogram library (HTL)
  - Fitting libraries (Gemini, HepFitting)
  - Graphics libraries (Qt, Qplotter)
- Insulate components through Abstract Interfaces
  - “wrapper” layer to implement Interfaces in terms of existing libs
- Apply s/w quality control tools
  - code checking, testing

Anaphe Components: Overview

Anaphe Internals: Abstract Interfaces

Anaphe components

Basic 3D Graphic Libraries
- OpenGL (basic graphics)
  - De-facto industry standard for basic 3D graphics
  - Used in CAD/CAE, games, VR, medical imaging
- OpenInventor (scene mgmt.)
  - OO 3D toolkit for graphics
  - Cubes, polygons, text, materials
  - Cameras, lights, picking
  - 3D viewers/editors, animation
  - Based on OpenGL/MesaGL

2D Graphics libraries
- Qt
  - multi-platform C++ GUI toolkit
    - C++ class library, not wrapper around C libs
    - superset of Motif and MFC
    - available on Unix and MS Windows
    - no change for developer
  - commercial but with public domain version
  - www.troll.no
- Qplotter
  - “add-on” functionality for HEP
    - “HIGZ/HPLOT”

Mathematical Libraries
- NAG (Numerical Algorithms Group) C Library
  - Covers a broad range of functionality
    - Linear algebra
    - differential equations
    - quadrature, etc.
  - Special functions of CERNLIB added to Mark-6 release
    - mostly for theory and accelerator
    - Quality assurance
    - extensive testing done by NAG
  - www.nag.com

CLHEP - foundation classes
- HEP foundation class library
  - Random number generators
  - Physics vectors
    - 3- and 4-vectors
  - Geometry
  - Linear algebra
  - System of units
  - more packages recently added
    - will continue to evolve
- wwwinfo.cern.ch/asd/lhc++/clhep/

Histograms: the HTL package
- Histograms are the basic tool for physics analysis
  - Statistical information of density distributions
- Histogram Template Library (HTL)
  - design based on C++ templates
  - Modular: separation between sampling and display
  - Extensible: open for user defined binning systems
  - Flexible: support transient/persistent at the same time
  - Open: large use of abstract interfaces
  - recent addition: 3D histograms

Fitting and Minimization
- Fitting and Minimization Library (FML)
  - common OO interface
    - NAG-C, MINUIT
  - based on Abstract Interfaces
    - IVector, IModelFunction, …
  - fitting as a special case of minimization
    - minimize “distance” between data and model
  - replacement for HepFitting (and Gemini)
- Gemini
  - common interface to minimizer engine
  - very thin layer

Opening bracket: Persistency

Object persistency
Two concepts: serial and page I/O
- “Sequential access to objects” (streaming)
  - good in networking context or serial writes to file(s)
  - much like “good old Fortran”
  - often perceived to be “simpler” to implement (“ >”)
- “Navigational access to objects” (buffered)
  - I/O on demand for complex data models
  - location transparent (for user) access to object
    - typically by de-referencing of a smart pointer
  - optimized for (random) disk access (disks deliver pages)
  - sequential write to file(s) still ok
- Both concepts need to take care of changes to the internal structure of the objects (schema evolution)
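The "navigational access" idea, where an object is read only when a smart pointer to it is de-referenced, can be sketched as follows. This is a toy model, not the Objectivity/DB or HepODBMS API: `ObjectStore`, `Ref`, and the page layout are hypothetical names chosen for illustration.

```python
class ObjectStore:
    """Hypothetical page-oriented store: objects are grouped into pages."""

    def __init__(self, pages):
        self._pages = pages      # page id -> {object id -> object}
        self.pages_read = []     # log of page reads, kept for illustration

    def read_page(self, page_id):
        self.pages_read.append(page_id)
        return self._pages[page_id]


class Ref:
    """Smart-pointer stand-in: stores an address, loads on first de-reference."""

    def __init__(self, store, page_id, object_id):
        self._store = store
        self._page_id = page_id
        self._object_id = object_id
        self._target = None

    def deref(self):
        if self._target is None:
            # I/O on demand: the page is read only when the object is needed.
            page = self._store.read_page(self._page_id)
            self._target = page[self._object_id]
        return self._target


store = ObjectStore({
    0: {"track1": {"pt": 12.5}},
    1: {"track2": {"pt": 3.1}},
})
refs = [Ref(store, 0, "track1"), Ref(store, 1, "track2")]

reads_before = list(store.pages_read)   # nothing has been read yet
pt = refs[1].deref()["pt"]              # triggers exactly one page read
refs[1].deref()                         # already loaded: no further I/O
print(reads_before, pt, store.pages_read)
```

The user code never says where `track2` lives; that location transparency is what lets the physical layout change without touching applications.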

Architectural Issue: Persistency (“Object-I/O”)
- Brings a completely new quality into the design
- Objects now have a lifetime
  - don’t “delete” until you really are sure you want to
  - persistency is a kind of “intended memory leak”
  - would like to see no difference between memory and disk
- “Layout” of objects may change during (extended) life
  - “schema evolution”
  - additions/deletions of attributes
  - changes of inheritance relations

Architectural Issue: Persistency (“Object-I/O”) (II)
- Objects can be placed (“clustering”)
  - de-coupling of logical and physical view of data
- Special care needed to ensure consistency in data set
  - avoid reading group of objects (tracks, events, ...) for which writing/updating is not (yet) complete
  - clean up if only part of the objects are written
  - typically taken care of by using transactions
- Complications possible in distributed computing
  - need to protect disk access now like memory access in past (“Segmentation violation”)

Physical Model and Logical Model
- Physical model may be changed to optimise performance
- Existing applications continue to work transparently!

Object Model
[Figure] Thanks to Vincenzo Innocente (CMS)

Physical clustering
[Figure] Thanks to Vincenzo Innocente (CMS)

Closing bracket: Persistency

“Tags”, Ntuples and Events
- Tags - a special kind of Ntuple
  - Always associated with an underlying persistent store
  - Tags may be used to store “ntuple-like” data
    - extracted from all over the event
    - minPt, maxEmiss, nJets, nMuon, trigger, …
- Main use: speedup data selection for analysis …
  - Tag simplifies selection without losing complexity
- Events more complex than a tree structure (“CWN”)
  - lots of cross-references between classes, containers
- Association from the Tag to the Event may be used to navigate to any other part of the Event
  - even from an interactive visualization program
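The Tag idea (select on a few summary attributes, then follow the association to the full event) can be sketched in Python. The classes below are illustrative stand-ins, not the HepODBMS tag API; the attribute names `minPt`, `nJets`, and `nMuon` come from the slide, while `Event`, `Tag`, and the sample values are made up for the example. In a real persistent store the `event` link would be a persistent reference, loaded only when followed.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Stand-in for the full, complex event (tracks, clusters, ...)."""
    event_id: int
    tracks: list = field(default_factory=list)


@dataclass
class Tag:
    """Small ntuple-like summary with an association back to its event."""
    minPt: float
    nJets: int
    nMuon: int
    event: Event


events = [
    Event(1, tracks=["t1"]),
    Event(2, tracks=["t1", "t2", "t3"]),
    Event(3, tracks=["t1", "t2"]),
]
tags = [
    Tag(minPt=4.0, nJets=1, nMuon=2, event=events[0]),
    Tag(minPt=15.0, nJets=3, nMuon=2, event=events[1]),
    Tag(minPt=22.0, nJets=2, nMuon=0, event=events[2]),
]

# Step 1: cheap selection that touches only the tag attributes.
selected = [tag for tag in tags if tag.nMuon >= 2 and tag.minPt > 10.0]

# Step 2: navigate from the surviving tags to the full events.
selected_event_ids = [tag.event.event_id for tag in selected]
selected_track_counts = [len(tag.event.tracks) for tag in selected]
print(selected_event_ids, selected_track_counts)
```

Only the events that pass the tag-level cut are ever opened, which is where the speed-up in data selection comes from.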

AIDA compliance of Anaphe
- Presently (Anaphe 3.x) only AIDA 1.0 compliant
- Plan to implement AIDA 2.2 Interfaces by end of 2001 (Anaphe 4.x)
  - initially as wrappers to existing interfaces/packages
- Will maintain 3.x for some time
  - ensures stability for users
- Development will concentrate on 4.x
  - while AIDA will evolve further
- Similar time schedule as JAS (Tony Johnson)
- OpenScientist (Guy Barrand) already there

Lizard: a tool for Interactive Data Analysis

Interactive Data Analysis
- Aim: “OO replacement for PAW” (at least)
  - analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)
  - visualisation of data (Histograms, scatter-plot, “Vectors”)
  - fitting of histograms (and other data)
  - access to experiment specific data/code
- Maximize flexibility and re-use
- Foresee customization/integration
  - allow use from within experiment’s s/w
- Plan for extensions
  - “code for now, design for the future”
- Ensure maintainability
  - use of s/w quality control tools

Lizard
- An AIDA-compliant interactive analysis tool
  - Python scripting
  - Visualization with Qt
  - HTL histograms (via AIDA)
  - Persistence with Objectivity
  - Fitting with NAG Libraries (or Minuit)
- Components available as shared libraries
  - independent of the scripting language
  - can also be used in C++ programs (Geant4)

Scripting - why
- Typical use of scripting is quite different from programming (reconstruction, analysis, ...)
  - history: “go back to where I was before”
  - repetition/looping - with “modifiable parameters”
- avoid “one size fits all” or “using power-tool as hammer”
  - rapid prototyping in “scripting language”
    - quick turn-around times
  - performance critical code in “core language”
    - exploit richer set of features/functionality (e.g. templates in C++)
- scripting languages usually less susceptible to changes than “mainstream languages”
  - potentially longer lifetimes

Python - why
- Python - OO (scripting) language
  - no “strange $!%-variables”
  - sensitive to indentation
- Easier for users
  - as Java
- Lots of user supplied modules available and ready for use
  - scientific, numerics, graphics, GUI, network, OS, games, DBs, …
  - example: Parnassus Totals: 1173 items in 49 categories.
- Also usable in Java (Jython)
  - used in JAS for scripting
  - minimize changes needed within AIDA compliant environments

Python - how
- SWIG to (semi-) automatically create connection to chosen scripting language
  - allows flexibility to choose amongst several scripting languages
  - Python, Perl, Tcl, Guile, Ruby, (Java) …
- Very easy to use
    swig -c++ -python -shadow -c myClass.h
  - create shared lib from myClass.cpp and myClass_wrap.c
  - start python and import myClass to use it
- Very easy to extend
  - simply inherit from “swiggified” class in python
  - modifications can later be fed back into C++
    - performance, type safety, special language features (templates), …

PAW -> Lizard translation
- Ntuple projection Lizard
    lizard --useHBook
    :-) nt = ntm.findNtuple("higgscand.hbk::cands")
    :-) nplot1D(nt, "mass", "quality=5 && cut > 198")
- Ntuple projection PAW
    pawX11
    paw> h/file 1 higgscand.hbk
    paw> nt/pl 10.mass quality=5.and.cut>198
- Assuming file higgscand.hbk contains ntuple with number 10 and title cands
- Annotation on the slide: any valid C++ expression

Example script (ntuple)
    # get list of names of all tuples from tuplemanager
    ntm.listTuples()
    nt1 = ntm.findNtuple("Charm1")   # retrieve tuple by name
    # create 1D histos to project into
    h1 = hm.create1D(10, "mass", 100, 0., 5000.)
    h2 = hm.create1D(20, "mass for pt1>10", 100, 0., 5000.)
    # project the attribute "MASS" into histo h1 without cut ("")
    nt1.project1D(h1, "", "MASS")
    # project the attribute "MASS" into histo h2 with cut ("PT1>10")
    nt1.project1D(h2, "PT1>10", "MASS")

Lizard: History and Present Status
- Started after CHEP-2000
- Full version out since June 2001
  - “PAW like” analysis functionality plus:
  - on-demand loading of compiled code using shared libraries
    - gives full access to experiment’s analysis code and data
  - based on Abstract Interfaces
    - flexible and extensible
- “License free” version since Sep
  - HBook for RWNtuples and Histogram storage
  - Minuit as minimizer engine

Users and Collaborations
- AIDA spoken here!
  - IGUANA (CMS visualization)
  - GAUDI (LHCb/HARP) framework
  - ATHENA (Atlas) framework
  - Analyzer modules in Geant 4
  - JAS
  - Open Scientist
  - …you?

Software quality control

Software quality control
- Using tools for testing/checking has started
  - Insure++, CodeWizard
- Package dependencies: Ignominy
  - Set of perl and shell scripts by Lassi Tuura (CMS)
  - Ignominy scans…
    - Make dependency data produced by the compilers (*.d files)
    - Source code for #includes (resolved against the ones actually seen)
    - Shared library dependencies (“ldd” output)
    - Defined and required symbols (“nm” output)
  - And maps…
    - Source code and binaries into packages
    - #include dependencies into package dependencies
    - Unresolved/defined symbols into package dependencies
ignominy: dishonour, disgrace, shame; infamy; the condition of being in disgrace, etc. (Oxford English Dictionary)
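The "maps" step, turning file-level #include dependencies into package-level dependencies, can be illustrated with a small sketch. This is not Ignominy's code (Ignominy is a set of perl and shell scripts); the file names, the package names, and the rule "package = first path component" are assumptions made for the example.

```python
def package_of(path):
    """Assumed convention: the first path component names the package."""
    return path.split("/")[0]


def package_dependencies(includes):
    """Collapse file -> included-headers data into package -> packages."""
    deps = {}
    for source, headers in includes.items():
        source_package = package_of(source)
        deps.setdefault(source_package, set())
        for header in headers:
            header_package = package_of(header)
            if header_package != source_package:   # ignore intra-package includes
                deps[source_package].add(header_package)
    return deps


# Hypothetical scan result: source file -> headers it includes.
includes = {
    "HistoLib/Histo1D.cpp": ["HistoLib/Histo1D.h", "Foundation/Vector.h"],
    "FitLib/Fitter.cpp": ["FitLib/Fitter.h", "HistoLib/Histo1D.h",
                          "Foundation/Vector.h"],
    "Foundation/Vector.cpp": ["Foundation/Vector.h"],
}

deps = package_dependencies(includes)
for package in sorted(deps):
    print(package, "->", sorted(deps[package]))
```

The same collapsing step would apply to the symbol-level data from "nm" or the library-level data from "ldd", with a different lookup from item to package.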

Ignominy Analysis of Anaphe
- Distribution of tools and utilities for LHC era physics
  - Combination of commercial, free and HEP software
  - Claims to be a toolkit
- Seems to live up to its toolkit claims
  - Good work on modularity
  - Clean design is evident in many places
  - Dependency diagrams often split naturally into functional units
Thanks to Lassi Tuura (CMS)

Package Metrics
- Size = total amount of source code (not normalised across projects!)
- ACD = average component dependency (~ libraries linked in)
- CCD = cumulative component dependency: sum of single-package component dependencies over whole release
- NCCD = measure of CCD compared to a balanced binary tree
  - A good toolkit’s NCCD will be close to 1.0
  - < 1.0: structure is flatter than a binary tree (= independent packages)
  - > 1.0: structure is more strongly coupled (vertical or cyclic)
  - Aim: NCCD ~ 1 for given software/functionality
Thanks to Lassi Tuura (CMS)
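The metrics above can be computed on a toy dependency graph. The slide gives only verbal definitions, so this sketch assumes the common formulation from Lakos's "Large-Scale C++ Software Design": each package counts itself plus everything it depends on transitively, ACD = CCD / N, and the reference value is the CCD of a balanced binary tree with N nodes, (N + 1) * log2(N + 1) - N. Whether Ignominy uses exactly these conventions is not stated in the talk; the graphs and names below are made up.

```python
import math


def reachable(graph, start):
    """Packages needed by `start`, including itself (transitive closure)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def metrics(graph):
    """Return (CCD, ACD, NCCD) for a package -> direct-dependencies mapping."""
    n = len(graph)
    ccd = sum(len(reachable(graph, package)) for package in graph)
    acd = ccd / n
    ccd_balanced_tree = (n + 1) * math.log2(n + 1) - n
    return ccd, acd, ccd / ccd_balanced_tree


# Three hypothetical 7-package layouts.
tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"],
        "d": [], "e": [], "f": [], "g": []}
flat = {name: [] for name in "abcdefg"}
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"],
         "e": ["f"], "f": ["g"], "g": []}

tree_metrics = metrics(tree)
flat_metrics = metrics(flat)
chain_metrics = metrics(chain)
for name, result in (("tree", tree_metrics), ("flat", flat_metrics),
                     ("chain", chain_metrics)):
    print(f"{name}: CCD={result[0]} ACD={result[1]:.2f} NCCD={result[2]:.2f}")
```

The balanced tree gives NCCD = 1.0 by construction, the seven independent packages fall below 1.0, and the vertical chain rises above it, matching the three regimes listed on the slide.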

Metrics: NCCD vs Cycles
[Plot: toolkits and frameworks compared: ATLAS, ORCA, IGUANA, COBRA, G4, ROOT, Anaphe; one entry is annotated "Includes Fortran"]
NCCD (“spaghetti index”): ~1.0: good toolkit; < 1.0: indep. packages; > 1.0: strongly-coupled
Thanks to Lassi Tuura (CMS)

Future enhancements
- Access to other implementations of components
  - HBOOK CWNtuples
- Reading of ROOT (> V3.0) files
  - similar to Tony Johnson’s (Java) RootIO package
- AIDA Ntuple/Histo store
  - optimized for Ntuples, Histograms as (compressed) XML
- Communication with Java tools/packages (JAS, Wired)
  - via AIDA
- Adding other “scripting” languages
  - Perl, Tcl, cint ?

Challenge: Distributed Computing
- Motivation
  - move code to data
  - parallel analysis
- Techniques
  - services via AI
  - late binding
  - plug-in architecture
- End-user (Lizard)
  - look-and-feel of local analysis
- R&D started and first prototype available soon
  - CORBA based

Summary
- The architecture of Anaphe shows some important items for flexible and modular data analysis:
  - weak coupling between components through use of Abstract Interfaces
  - basic functionality is covered by individual C++ class libraries
  - emphasis on usability and maintainability
- Major criteria are flexibility, extensibility and interoperability
  - recent example: GEANT-4 examples (based on AIDA)
- Lizard is an Interactive Data Analysis Tool based on Anaphe components and the Python scripting language (through SWIG)
  - Lizard is young but has a very solid base in mature Anaphe libraries
  - real plug-in structure
- Software quality control is important
  - tools help to optimize dependencies / minimize maintenance effort

More information
- cern.ch/Anaphe
- cern.ch/Anaphe/Lizard
- aida.freehep.org/
- cern.ch/DB
- wwwinfo.cern.ch/asd/lhc++/clhep/

Additional slides

Analysis of Geant4
- Fairly large C++ project
  - Very fine-grained (and multi-level) package structuring
  - Seems quite clean from the preliminary analysis
- Fine package subdivision helps in many ways but makes analysis and code understanding more complicated
- One subsystem seems strongly coupled and needs attention
- Need to study the use of the internal command system
Thanks to Lassi Tuura (CMS)

Analysis of ROOT
- ROOT developers have done a formidable job of breaking binary (shared library) dependencies, but…
  - For example: by static analysis, nothing seems to use the postscript package directly (no incoming dependencies), but there is this code:
      void TPad::Print (const char *filename, Option_t *option) {
        […]
        TVirtualPS *psave = gVirtualPS;
        if (gROOT->LoadClass("TPostScript","Postscript")) return;
        gROOT->ProcessLineFast("new TPostScript()");
        gVirtualPS->Open(psname,pstype);
        gVirtualPS->SetBit(kPrintingPS);
        […]
      }
  - Taking these and global objects into account makes the dependency diagrams very different
- Sign of fast growth? Need a “next evolutionary step”?
  - So “coherent” that replacing parts could get painful…
Thanks to Lassi Tuura (CMS)

Analysis of ROOT…
[Figures: dependency diagrams, "Binary only" compared with "Binary + Source + Logical = Real"]
Thanks to Lassi Tuura (CMS)

Metrics: NCCD vs ACD
[Plot: toolkits and frameworks compared: ATLAS, ORCA, Anaphe, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Metrics: NCCD vs Size
[Plot: toolkits and frameworks compared: ATLAS, ORCA, Anaphe, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Metrics: NCCD vs AID
[Plot: toolkits and frameworks compared: ATLAS, ORCA, Anaphe, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Metrics: Packages vs Size
[Plot: toolkits and frameworks compared: ATLAS, ORCA, Anaphe, IGUANA, COBRA, G4, ROOT]
Thanks to Lassi Tuura (CMS)

Scripting in Lizard
[Diagram: layers labelled User, Python, Controller, Shadow classes, C++ interfaces, C++ implementations; annotations: "Automatically generated by SWIG", "AIDA Interfaces", "Anaphe implementations"]

Software life cycle for LHC expts.
- LHC starts ~ 2006
- at least 10 yr of running
- additionally at least 5 yr of data analysis

Lifetime of LHC software = 25 yrs
[Figure: timeline. Dated labels: SPS 1969; Unix V6 first public version 1975; K&R C 1978; IBM PC 1981; Ethernet standard 1983; W and Z 1983; LEP 1989; Intel Pentium 1992; Java 1995; LEP ends 2000. Labels whose dates did not survive: WWW, XML, Linux, "V C"]