Common Software
Pere Mato, CERN
XXXIII International Conference on High-Energy Physics, Moscow, Russia, July 26 - August 2, 2006
Foreword
- I will be focusing mostly on the software for the LHC experiments, simply because they are the experiments I know best.
- "Common software" is software that is used by at least two experiments.
- In general, common software is of a generic nature and not specific to one experiment.
- The borderline between generic and specific is somewhat arbitrary: it depends very much on the willingness to re-use (i.e. trust) software developed by others and to adapt one's own requirements to fit it.
Outline
- Main software requirements
- Software structure
- Programming languages
- Non-HEP packages
- HEP generic packages
- Experiments' software frameworks
- The LCG Applications Area
- Summary
Main Software Requirements
- The new software being developed by the LHC experiments must cope with the unprecedented conditions and challenges that characterize these experiments (trigger rate, data volumes, etc.).
- The software should not become the limiting factor for the trigger, detector performance and physics reach of these experiments.
- In spite of its complexity it should be easy to use: each one of the ~4000 LHC physicists (including people from remote/isolated countries, physicists who have built the detectors, senior physicists used to older software) should be able to run the software, modify parts of it (reconstruction, ...), analyze the data and extract physics results.
- Users demand simplicity (i.e. hiding complexity) and stability.
Processing Stages and Datasets
[Diagram: data flow from the detector through the event filter (selection & reconstruction) and event reconstruction to batch and individual physics analysis, with event simulation feeding in; datasets: raw data, Event Summary Data (ESD), Analysis Object Data (AOD, extracted by physics topic).]
- Raw data are staged to disk and archived on tape.
- Data for subsequent processing (ESD, AOD) are cached on disk.
- Production analysis typically involves skims and archiving according to event type, for ease of replication and caching (department servers and desktops).
- Few cycles of re-reconstruction: the most data/CPU intensive.
- Several cycles of production analysis.
- Very many cycles of end-user analysis: the least data/CPU intensive.
Software Structure
[Diagram: layered structure, from non-HEP specific software packages at the bottom, through core libraries and the specialized domains (simulation, data management, distributed analysis), up to the experiment framework (event model, detector description, calibration) and the applications on top.]
- Applications are built on top of frameworks and implement the required algorithms.
- Every experiment has a framework for basic services and various specialized frameworks: event model, detector description, visualization, persistency, interactivity, simulation, calibration, etc.
- The specialized domains (simulation, data management, distributed analysis) are common among the experiments.
- Core libraries and services are widely used and provide basic functionality.
- Many non-HEP libraries are widely used.
Software Components
- Foundation Libraries: basic types, utility libraries, system isolation libraries.
- Mathematical Libraries: special functions, minimization, random numbers.
- Data Organization: event data, event metadata (event collections), detector conditions data.
- Data Management Tools: object persistency, data distribution and replication.
- Simulation Toolkits: event generators, detector simulation.
- Statistical Analysis Tools: histograms, N-tuples, fitting.
- Interactivity and User Interfaces: GUI, scripting, interactive analysis.
- Data Visualization and Graphics: event and geometry displays.
- Distributed Applications: parallel processing, Grid computing.
Programming Languages
- Object-oriented (O-O) programming languages have become the norm for developing the software for HEP experiments.
- C++ is in use by (almost) all experiments:
  - Pioneered by BaBar and Run II (D0 and CDF).
  - LHC experiments with an initial FORTRAN code base have basically completed the migration to C++.
- Large common software projects in C++ have been in production for many years (ROOT, Geant4, ...).
- FORTRAN is still in use, mainly by the MC generators; large development efforts are being put into the migration to C++ (Pythia8, Herwig++, Sherpa, ...).
Scripting Languages
- Scripting has been an essential component of HEP analysis software for the last decades:
  - PAW macros (kumac) in the FORTRAN era.
  - The C++ interpreter (CINT) in the C++ era.
  - Python recently introduced and gaining momentum.
- Most of the statistical data analysis and final presentation is done with scripts: interactive analysis and rapid prototyping to test new ideas.
- Scripts are also used to "configure" the complex C++ programs developed and used by the experiments: "simulation" and "reconstruction" programs with hundreds or thousands of options to configure (see the sketch below).
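To make the configuration idea concrete, here is a minimal sketch of what such a Python configuration script can look like. All names (Component, RecoApp, TrackFinder, the option keys) are illustrative inventions, not the options language of any particular experiment framework.

```python
# Conceptual sketch only: a Python "job options" style configuration for a
# hypothetical reconstruction application; names and values are illustrative.

class Component:
    """Minimal stand-in for a configurable C++ algorithm or service."""
    def __init__(self, name, **options):
        self.name = name
        self.options = options

# The script only assembles a description of the job; the C++ framework would
# read it and instantiate/configure the corresponding components.
job = {
    "application": Component("RecoApp", maxEvents=1000,
                             inputFile="run1234.raw"),
    "algorithms": [
        Component("TrackFinder", chi2Cut=5.0),
        Component("CaloClusterizer", seedThreshold=0.3),  # GeV, illustrative
    ],
}

for alg in job["algorithms"]:
    print(alg.name, alg.options)
```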
Python Role
- The Python language is interesting for two main reasons:
  - High-level programming language: simple, elegant, easy to learn, ideal for rapid prototyping; used for scientific programming (www.scipy.org).
  - Framework to "glue" different functionalities: any two pieces of software can be glued at runtime if they offer a Python interface (see the example below).
- A word of caution: Python is interpreted, so it is not suited to heavy numerical computation.
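As a small illustration of the "glue" role, the sketch below combines the Python standard library's random module with ROOT histogramming through the PyROOT bindings (assumed to be installed); the two pieces are joined at runtime with no compilation step, and the histogram name and toy distribution are arbitrary.

```python
# Illustration only: Python gluing two unrelated pieces of software at runtime,
# the standard-library 'random' module and ROOT histogramming via PyROOT.
import random
import ROOT  # assumes ROOT with its Python bindings is available

h = ROOT.TH1F("toy_mass", "Toy mass distribution;m [GeV];entries", 100, 0.0, 10.0)
for _ in range(10000):
    h.Fill(random.gauss(5.0, 0.5))   # toy data generated on the Python side

print("mean = %.3f  rms = %.3f" % (h.GetMean(), h.GetRMS()))
```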
Non-HEP Packages Widely Used in HEP
- Non-HEP specific functionality required by HEP programs can be implemented using existing packages, favoring free and open-source software.
- About 30 packages are currently in use by the LHC experiments. Some examples:
  - Boost: portable and free C++ source libraries intended to be widely useful and usable across a broad spectrum of applications.
  - GSL: GNU Scientific Library.
  - Coin3D: high-level 3D graphics toolkit for developing cross-platform real-time 3D visualization.
  - XercesC: XML parser written in a portable subset of C++.
HEP Generic Packages
- Foundation and core libraries.
- MC generators: the best example of common code used by all the experiments; well defined functionality and fairly simple interfaces.
- Detector simulation: provided in the form of toolkits/frameworks (Geant4, FLUKA); the user needs to supply the geometry description, primary particles, user actions, etc.
- Data persistency and management: to store and manage the data produced by the experiments.
- Data visualization: GUI, 2D and 3D graphics.
- Distributed and Grid analysis: to support end-users using the distributed computing resources (PROOF, Ganga, ...).
ROOT - Core Libraries and Services
- ROOT provides the basic functionality needed by any application and is used by essentially all HEP experiments.
- Current ROOT work packages:
  - BASE: foundation and system classes, documentation and releases.
  - DICT: reflection system, meta classes, CINT and Python interpreters.
  - I/O: basic I/O, trees, queries.
  - PROOF: parallel ROOT facility, xrootd.
  - MATH: mathematical libraries, histogramming, fitting.
  - GUI: graphical user interfaces and object editors.
  - GRAPHICS: 2-D and 3-D graphics.
  - GEOM: geometry system.
ROOT - Core Integrating Elements
- The common application software should facilitate the integration of independently developed components to build a coherent application.
- Dictionaries: provide meta data (reflection) that allows introspection and interaction of objects in a generic manner; the ROOT strategy is to evolve to a single reflection system (Reflex). A small introspection example is sketched below.
- Scripting languages: interpreted languages are ideal for rapid prototyping and allow integration of independently developed software modules (software bus); standardizing on the CINT (C++) and Python scripting languages.
- Component model and plugin management: modeling the application as components with well defined interfaces and loading the required functionality at runtime.
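A minimal sketch of what the dictionary makes possible, using PyROOT (assumed available): generic code can look up a class by name and list its data members without any compile-time knowledge of the type.

```python
# Dictionary-based introspection from Python via PyROOT (illustrative sketch).
import ROOT

cls = ROOT.TClass.GetClass("TH1F")            # look the class up by name
print("class:", cls.GetName())
for member in cls.GetListOfDataMembers():     # iterate over the reflection data
    print("  %-12s %s" % (member.GetTypeName(), member.GetName()))
```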
ROOT: Strategic Role of C++ Reflection
- Reflection underpins object I/O, scripting (CINT, Python), plug-in management, etc.
[Diagram: Python, CINT and the ROOT meta classes access the C++ class information through the Reflex API and the Reflex/CINT dictionary data structures; dictionaries are generated from class headers (X.h) with rootcint (-cint, -reflex, -gccxml variants) into dictionary libraries (XDict.so).]
ROOT - Math Libraries Organization
[Diagram of the organization of the ROOT mathematical libraries.]
PROOF - Parallel ROOT Facility
- PROOF aims to provide the functionality needed to run ROOT data analysis in parallel.
- A major upgrade of the PROOF system was started in 2005: the system is evolving from processing short, interactive blocking queries to one that also supports long-running queries in a stateless client mode.
- Currently working with ALICE to get it deployed on the CERN CAF for the next data challenge.
- A minimal usage sketch is given below.
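The following is a hedged sketch, not a tested recipe, of how an analysis is handed to PROOF from a PyROOT session; the cluster address, tree name, input files and selector are all placeholders.

```python
# Hedged PROOF usage sketch (placeholders throughout); the general pattern is:
# open a session, build a chain, and route its Process() call through PROOF.
import ROOT

proof = ROOT.TProof.Open("proofmaster.example.org")  # placeholder cluster
chain = ROOT.TChain("Events")                        # assumed tree name
chain.Add("data/run*.root")                          # placeholder input files
chain.SetProof()                                     # process via PROOF workers
chain.Process("MySelector.C+")                       # user-provided TSelector
```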
ROOT I/O
- ROOT provides support for object input/output from/to platform-independent files.
- The system is designed to be particularly efficient for the objects frequently manipulated by physicists: histograms, ntuples, trees and events.
- I/O is possible for any user class; it is non-intrusive, only the class "dictionary" needs to be defined.
- Extensive support for "schema evolution": class definitions are not immutable over the lifetime of an experiment.
- The ROOT I/O area is still evolving after 10 years. Recent additions: full STL support, data compression, tree I/O from ASCII, tree indices, etc.
- All new experiments rely on ROOT I/O to store their data (see the sketch below).
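A minimal PyROOT sketch of the object I/O described above: a histogram is written to a platform-independent ROOT file and read back by name. The file and object names are arbitrary.

```python
# Write an object to a ROOT file and read it back; the class dictionary drives
# the streaming, so no user I/O code is needed.
import ROOT

f = ROOT.TFile("example.root", "RECREATE")
h = ROOT.TH1F("h_pt", "Transverse momentum;p_{T} [GeV];entries", 50, 0.0, 100.0)
for _ in range(5000):
    h.Fill(ROOT.gRandom.Exp(20.0))     # toy exponential p_T spectrum
h.Write()
f.Close()

f2 = ROOT.TFile("example.root")
h2 = f2.Get("h_pt")                    # retrieve the object by its key name
print("entries read back:", int(h2.GetEntries()))
```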
Persistency Framework
- Files, based on ROOT I/O:
  - Targeted at complex data structures: event data, analysis data.
  - Management of object relationships: file catalogues.
  - Interface to Grid file catalogs and Grid file access.
- Relational databases (Oracle, MySQL, SQLite):
  - Suitable for conditions, calibration, alignment and detector description data, possibly produced by online systems.
  - Complex use cases and requirements, multiple 'environments', difficult to satisfy with a single solution.
- Isolating applications from the database implementations with a standardized relational database interface facilitates the life of the application developers: no change in the application to run in different environments, and "good practices" are encoded once for all. A conceptual illustration of the conditions-data use case follows below.
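To illustrate the conditions-data use case (and emphatically not the COOL or CORAL API), the sketch below stores calibration values with an interval of validity in an SQLite database, using only the Python standard library, and looks up the value valid for a given run.

```python
# Conceptual illustration only (not the COOL/CORAL API): conditions data are
# typically stored as values tagged with an interval of validity (IOV) and
# looked up by the run/event "time" being processed; plain sqlite3 stands in
# for the relational backend here.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pedestal (since INTEGER, until INTEGER, value REAL)")
db.executemany("INSERT INTO pedestal VALUES (?, ?, ?)",
               [(0, 1000, 12.1), (1000, 2000, 12.4), (2000, 9999999, 11.9)])

run = 1500  # the "time" for which a calibration constant is needed
(value,) = db.execute(
    "SELECT value FROM pedestal WHERE since <= ? AND ? < until",
    (run, run)).fetchone()
print("pedestal for run", run, "=", value)
```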
POOL - Persistency Framework
- The POOL project is delivering a number of "products":
  - POOL: object and reference persistency framework.
  - CORAL: generic database access interface.
  - ORA: mapping of C++ objects into relational databases.
  - COOL: detector conditions database.
- Object storage and references have been successfully used in large-scale production in ATLAS, CMS and LHCb.
- The focus on database access and deployment on the Grid is basically starting now.
[Diagram: user code sits on the POOL and COOL APIs; POOL combines a storage manager, collections and a file catalog, implemented over ROOT I/O and relational backends (Oracle, SQLite, MySQL) accessed through CORAL.]
MC Generators
- Many MC generators and tools are available to the experiments, provided by a strong community; each experiment chooses the tools most adequate for its physics.
- Example: ATLAS alone currently uses
  - Generators: AcerMC (Zbb~, tt~, single top, tt~bb~, Wbb~); Alpgen with MLM matching (W+jets, Z+jets, QCD multijets); Charybdis (black holes); HERWIG (QCD multijets, Drell-Yan, SUSY, ...); Hijing (heavy ions, beam-gas, ...); MC@NLO (tt~, Drell-Yan, boson pair production); Pythia (QCD multijets, B-physics, Higgs production, ...).
  - Decay packages: TAUOLA and PHOTOS (interfaced to work with Pythia, Herwig and Sherpa); EvtGen (used in B-physics channels).
Geant4 - Detector Simulation
- Geant4 has become an established tool, in production for the majority of LHC experiments during the past two years, and is in use in many other HEP experiments and for applications in medical, space and other fields.
- Ongoing work on physics validation.
- A good example of common software.
- Example detector geometries: LHCb ~18 million volumes; ALICE ~3 million volumes.
Experiment Data Processing Frameworks
- Experiments develop software frameworks:
  - General architecture of any event-processing application (simulation, trigger, reconstruction, analysis, etc.).
  - To achieve coherency and to facilitate software re-use.
  - Hide technical details from the end-user physicists and help them focus on their physics algorithms.
- Applications are developed by customizing the framework:
  - By the "composition" of elemental algorithms to form complete applications.
  - Using third-party components wherever possible and configuring them.
- ALICE: AliROOT; ATLAS and LHCb: Athena/Gaudi; CMS: moved to a new framework recently.
Example: The GAUDI Framework
- User "algorithms" consume event data from the "transient data store" with the help of "services" and "tools" with well defined interfaces, and produce new data that is made available to other "algorithms".
- Data can have various representations, and "converters" take care of their transformation.
- The GAUDI framework is used by LHCb, ATLAS, HARP, GLAST and BES III.
- The pattern is sketched below.
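The sketch below shows the pattern in a few lines of plain Python; it is a conceptual illustration of algorithms and a transient store, not the real Gaudi interfaces, and all class and key names are invented.

```python
# Conceptual sketch of the pattern described above (not the Gaudi API):
# "algorithms" read objects from a transient event store, produce new objects,
# and register them so that downstream algorithms can consume them.

class TransientStore(dict):
    """Per-event key/value store for data objects (stand-in for the real one)."""

class TrackFit:                              # hypothetical algorithm
    def execute(self, store):
        hits = store["Hits"]                 # consume existing event data
        store["Tracks"] = [sum(hits) / len(hits)]   # publish a new (toy) product

class WriteSummary:                          # hypothetical downstream algorithm
    def execute(self, store):
        print("tracks:", store["Tracks"])

store = TransientStore(Hits=[1.0, 2.0, 3.0])
for alg in (TrackFit(), WriteSummary()):     # the framework drives the sequence
    alg.execute(store)
```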
Software Configuration
- Re-using existing software packages saves development effort but complicates "software configuration"; we need to hide this complexity.
- A configuration is a combination of packages and versions that are coherent and compatible (see the toy illustration below).
- E.g. the LHC experiments build their application software on a given "LCG/AA configuration", which is decided by the "architects".
- Interfaces to the experiments' configuration systems (SCRAM, CMT).
- Running several different configurations concurrently is an everyday situation.
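As a toy illustration of what a "coherent configuration" means in practice, the snippet below checks an experiment's required package versions against a named configuration; the package names and version numbers are purely illustrative.

```python
# Toy consistency check of package versions against a named configuration.
LCG_CONFIG = {"ROOT": "5.10.00", "Geant4": "8.1", "Boost": "1.33.1"}   # illustrative
experiment_needs = {"ROOT": "5.10.00", "Geant4": "8.1"}                # illustrative

mismatches = {pkg: ver for pkg, ver in experiment_needs.items()
              if LCG_CONFIG.get(pkg) != ver}
print("configuration OK" if not mismatches else "mismatch: %s" % mismatches)
```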
Grid Deployment of Common Software
- The current model is that the experiments take the responsibility for deploying ALL software packages (external, common and experiment-specific) to the Grid.
- Concurrent versions of packages need to be available to allow running applications based on different configurations.
- For most of the packages the deployment is "trivial": copy the shared library to the adequate place.
- Some packages require coordination between areas for consistent external software configurations.
- The current model is a problem for "small" experiments (VOs) that cannot afford to have dedicated people for the deployment.
LCG Applications Area
- The Applications Area is one of the six activity areas of the LHC Computing Grid project (LCG); it delivers the common physics applications software for the LHC experiments.
- The area is organized to ensure focus on real experiment needs:
  - Experiment-driven requirements and monitoring.
  - Architects in management and execution.
  - Open information flow and decision making.
  - Participation of experiment developers.
  - Frequent releases enabling iterative feedback.
- Success is defined by adoption and validation of the developed products by the experiments: integration, evaluation, successful deployment.
Applications Area Organization
[Organization chart: the AA Manager interacts with the LCG Management Board (work plans, quarterly reports, resources), the LHCC (reviews) and the experiments (ALICE, ATLAS, CMS, LHCb); the AA Manager chairs the Architects Forum, whose decisions go to the Application Area Meeting; the LCG AA projects (SPI, ROOT, POOL, SIMULATION) are structured in work packages and subprojects, with external collaborations with ROOT, Geant4 and EGEE.]
AA Projects
- SPI - Software Process and Infrastructure (A. Pfeiffer): software and development services: external libraries, Savannah, software distribution, support for build, test, QA, etc.
- ROOT - Core Libraries and Services (R. Brun): foundation class libraries, math libraries, framework services, dictionaries, scripting, GUI, graphics, SEAL libraries, etc.
- POOL - Persistency Framework (D. Duellmann): storage manager, file catalogs, event collections, relational access layer, conditions database, etc.
- SIMU - Simulation project (G. Cosmo): simulation framework, physics validation studies, MC event generators, Garfield, participation in Geant4 and FLUKA.
Summary
- The next generation of software for the experiments needs to cope with more stringent requirements and new, challenging conditions: the software should not be the limiting factor and should allow the physicists to extract the best physics from the experiment.
- The new software is more powerful but at the same time more complex.
- Some techniques and tools allow us to integrate functionality developed independently into a single, coherent application: dictionaries, scripting languages, component models and plugin management.
- Substantial effort is put into software configuration to provide a stable and coherent set of versions of the packages needed by the experiments.
- The tendency is to push the line of what is called common software upwards.
- The LCG project is helping in this direction by organizing the requirements gathering, the development, and the adoption by the experiments of the common software products.