ALICE analysis preservation

Mihaela Gheata, DASPOS/DPHEP7 workshop

Outline
- ALICE data flow
- ALICE analysis
- Data & software preservation
- Open access and sharing analysis tools
- Conclusions

ALICE data flow
[Diagram: ONLINE, the detector algorithms and the SHUTTLE feed calibration data into the OCDB while raw data is "rootified" into RAW.root; OFFLINE, calibration, reconstruction, filtering, quality assurance and analysis run over AliEn and its file catalog, with the OCDB (calibration data) and the AODB (analysis database) as inputs, producing AliESDs.root, AliAOD.root and results.root.]

Filtering analysis input data
- Event summary data (ESD) are produced by reconstruction
  - Can be used as input by analysis
  - Not efficient for distributed data processing (large size, much information not relevant for analysis)
- Filtering is the procedure that creates a lighter event format (AOD) starting from ESD data
  - Reduces the size by a factor of >3
- AOD files contain events organized in ROOT trees
  - Only events coming from physics triggers
  - Additional filters for specific physics channels (muons, jets, vertexing) produce AOD tree friends (adding branches); see the sketch below
  - Big files (up to 1 GB)
- The first filtering runs in the same process as reconstruction
  - Filtering can be redone at any time on demand
- Backward compatibility: any old AOD file can be reprocessed with the current version
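A minimal sketch of how such a friend tree could be attached when reading a filtered AOD, assuming the tree is named "aodTree" and that a muon filter produced a delta-AOD file "AliAOD.Muons.root" (both names are illustrative, not taken from the slides):

// read_aod.C -- open a filtered AOD and attach channel-specific friend branches
#include <cstdio>
#include "TFile.h"
#include "TTree.h"

void read_aod()
{
   TFile *f = TFile::Open("AliAOD.root");
   if (!f || f->IsZombie()) return;
   TTree *aod = (TTree*)f->Get("aodTree");
   if (!aod) return;
   // Branches added by a specific physics filter live in a friend tree:
   aod->AddFriend("aodTree", "AliAOD.Muons.root");
   std::printf("Physics-triggered events: %lld\n", aod->GetEntries());
}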

Base ingredients for analysis
- Code describing the AOD data format
  - AOD event model (AliAODEvent) class library, containing tracks, vertices, cascades, PID probabilities, centrality, trigger decisions and physics selection flags (a minimal task skeleton using this event model is sketched after this list)
- AOD data sets
  - Filtered reconstructed events embedding the best knowledge of detector calibration
  - Classified by year, collision type, accelerator period, ...
- Run conditions table (RCT)
  - All data relevant to the quality and conditions of each individual run, per detector
  - The basis for quality-based data selections
- Analysis database (OADB)
  - Contains "calibrated" objects used by critical analysis services (centrality, particle identification, trigger analysis) plus analysis-specific configurations (cuts)
- The infrastructure allowing distributed analysis
  - GRID, local clusters with access to data, PROOF farms, ...
- Monte Carlo AODs produced from simulated data
  - Contain both the reconstructed information and the MC truth (kinematics)
  - Almost 1/1 ratio with real-data AODs, associated with anchor runs for each data set
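A sketch of a user task built on these ingredients, using the AliAnalysisTaskSE base class and the AliAODEvent model from the AliRoot analysis framework; the task and histogram names are illustrative, and boilerplate such as the ClassDef macro is omitted:

// MyTask.cxx -- sketch of a user analysis task running on the AOD event model
#include "TH1F.h"
#include "AliAnalysisTaskSE.h"
#include "AliAODEvent.h"
#include "AliAODTrack.h"

class MyTask : public AliAnalysisTaskSE {
public:
   MyTask(const char *name = "MyTask") : AliAnalysisTaskSE(name), fHistPt(0) {
      DefineOutput(1, TH1F::Class());
   }
   void UserCreateOutputObjects() {
      fHistPt = new TH1F("hPt", "track p_{T};p_{T} (GeV/c)", 100, 0., 10.);
      PostData(1, fHistPt);
   }
   void UserExec(Option_t *) {
      // The framework delivers one physics-triggered event per call
      AliAODEvent *aod = dynamic_cast<AliAODEvent*>(InputEvent());
      if (!aod) return;                       // not an AOD input
      for (Int_t i = 0; i < aod->GetNumberOfTracks(); ++i) {
         AliAODTrack *trk = static_cast<AliAODTrack*>(aod->GetTrack(i));
         if (trk) fHistPt->Fill(trk->Pt());
      }
      PostData(1, fHistPt);
   }
private:
   TH1F *fHistPt;                             // output p_T spectrum
};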

What will we need to process data after 10+ years?
- The ROOT library
- A minimal, versioned, lightweight library with the data format and analysis framework
  - Allowing the ROOT trees in the AOD files to be processed (a minimal reading macro is sketched below)
- The RCT and cross-links from the ALICE logbook (snapshot)
- The relevant AOD datasets (data & MC), produced by filtering the most recent reconstruction passes, with the corresponding versioned OADB
- Relevant documentation, analysis examples and processing tools
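To illustrate the minimal-dependency scenario, a sketch of inspecting an AOD file with ROOT alone; full object access would additionally need the versioned format library, and the tree name "aodTree" is an assumption:

// inspect_aod.C -- browse an AOD tree with nothing but core ROOT
#include "TFile.h"
#include "TTree.h"

void inspect_aod(const char *fname = "AliAOD.root")
{
   TFile *f = TFile::Open(fname);
   if (!f || f->IsZombie()) return;
   TTree *aod = (TTree*)f->Get("aodTree");
   if (!aod) return;
   aod->Print();   // lists the branches; ROOT schema evolution absorbs version drift
}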

Software preservation
- GPL- or LGPL-licensed software based on ROOT
  - Assuming that ROOT will still "exist" and provide similar functionality (such as schema evolution) in 10-20 years
- A single SVN repository for the experiment code
  - External packages: ROOT, GEANT3/4, event generators, AliEn GRID
- Several work packages bundled in separate libraries (per detector or analysis group)
- Several production tags with documented changes
- Multiple platform support
- The documentation describes the analysis procedure clearly
  - The provided documentation is generally enough for a newcomer doing analysis
  - The problems show up once the software is no longer maintained

Data preservation for analysis
- If reprocessing is needed (improved reconstruction algorithms, major bug fixes, AOD data loss), the primary data is needed
  - Raw data, calibration data, conditions metadata
- The most recent AOD datasets
- The metadata related to analysis and run conditions
  - Essential for dataset selection

ALICE analysis shareable tools
- ~3000 jobs running on average at any given time last year
  - ~15-20% of the used resources go to organized analysis
- ~20000 jobs per train
  - Between 10 and 100 wagons
  - ~40 trains
- A specific framework allows the main data types (AOD, ESD, MC kinematics) to be processed
- Organized in "trains" processing common data (see the train-macro sketch below)
  - LEGO-like analysis modules created by users via a simple web interface
- ALICE internal usage; too complex for outreach
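A sketch of how such a train could be assembled with the AliRoot analysis manager; the wagon macro names are hypothetical stand-ins for the modules that the LEGO web interface generates, and the AliRoot analysis libraries are assumed to be loaded:

// run_train.C -- assemble a train of wagons sharing one input data stream
void run_train()
{
   AliAnalysisManager *mgr = new AliAnalysisManager("train");
   mgr->SetInputEventHandler(new AliAODInputHandler());

   // Each wagon is an AddTask macro that creates and connects one module:
   gROOT->LoadMacro("AddTaskWagon1.C");   // hypothetical wagon macro
   AddTaskWagon1();
   gROOT->LoadMacro("AddTaskWagon2.C");   // hypothetical wagon macro
   AddTaskWagon2();

   if (!mgr->InitAnalysis()) return;
   TChain *chain = new TChain("aodTree"); // common data read once for all wagons
   chain->Add("AliAOD.root");
   mgr->StartAnalysis("local", chain);
}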

Level 3 preservation and open access
- A replication of the current LEGO analysis services on dedicated resources is possible
  - Technically feasible in a short time (on clusters or clouds)
  - The only problems relate to partitioning data and services for external usage (in case the GRID is shared with ALICE users)
- The procedure to process ALICE data is very well documented
  - Several examples, walkthroughs and tutorials
- External code can be compiled on the fly in our framework (see the sketch below)
  - Simple examples on how to pack it in tarballs
- The resources to be assigned for open access are still to be assessed by management
  - Like other important details: when, which data, procedure, access control, outcome validation, publication rules...
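A sketch of the on-the-fly compilation step, using ROOT's ACLiC mechanism; the tarball and source file names are hypothetical:

// load_external.C -- compile externally supplied analysis code on the fly
#include "TROOT.h"
#include "TSystem.h"

void load_external()
{
   gSystem->Exec("tar xzf mytask.tar.gz");     // unpack the user's tarball
   gROOT->LoadMacro("MyExternalTask.cxx++g");  // "++g" forces a fresh ACLiC debug build
   // The compiled task can now be added to a train like any built-in wagon.
}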

Data access and sharing
- No external sharing is foreseen for RAW, calibration and conditions data
- For the analysis formats, a simplification is being considered
  - Both for the data format and the processing SW
- Sharing of data has started for level 2 (small samples used for outreach)
  - Not yet for the other levels
  - To be addressed during the SW upgrade process
- The time for public data and SW releases is not defined at this point

Conclusions
- ALICE is aware of the importance of long-term data preservation
  - A medium- to long-term plan is to be accommodated within the ambitious upgrade projects
- We are already investigating technical solutions allowing results to be reproduced and analyses re-run in 10+ years
  - Using existing tools, but also others (e.g. virtualization technology)
- Sharing of data is possible, but the implementation depends on the availability of resources
  - The practical sharing conditions have not been considered yet

Backup slides

Raw and calibration data
- Raw files are "rootified" on the global data concentrators
  - Copied to CASTOR, then replicated to T1 tape
  - /alice/data/<yyyy>/<period>/<run%09>/raw/<yy><run%09><GDC_no>.<seq>.root (see the path sketch below)
- All runs are copied, but not all are reconstructed
  - Some runs are marked as bad
- The ALICE logbook contains metadata and important observations related to data taking for each run
- The primary calibration data is extracted from online by the "Shuttle" program
  - Running detector algorithms (DA) and copying the calibration to the OCDB
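A sketch of building a catalog path from the naming scheme above; the run, period, GDC and sequence values are made up, and the field widths of the GDC and sequence numbers are assumptions (only the %09 run padding is given by the scheme):

// raw_path.C -- construct a raw-file catalog path from its components
#include <cstdio>

void raw_path()
{
   int year = 2012, run = 188359, seq = 10;       // illustrative values
   const char *period = "LHC12g", *gdc = "023";   // illustrative values
   char path[256];
   // %09d zero-pads the run number to nine digits, as in the catalog layout
   std::snprintf(path, sizeof(path),
                 "/alice/data/%04d/%s/%09d/raw/%02d%09d%s.%d.root",
                 year, period, run, year % 100, run, gdc, seq);
   std::printf("%s\n", path);
}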

Calibration strategy – CPass0 + CPass1
[Diagram: in each calibration pass, parallel AliEn jobs reconstruct the raw-data chunks (chunk 0 ... chunk N) against an OCDB snapshot; the resulting ESDs feed a calibration train (plus a QA train in the second pass), whose merged output produces new OCDB entries and an OCDB update before the next pass.]
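A sketch of what one such AliEn reconstruction job runs, using the AliRoot reconstruction and CDB manager classes; the snapshot storage URI and file names are illustrative:

// recpass.C -- reconstruct one raw chunk against a local OCDB snapshot
#include "AliCDBManager.h"
#include "AliReconstruction.h"

void recpass()
{
   // Point the CDB manager at the snapshot shipped with the job (path assumed)
   AliCDBManager::Instance()->SetDefaultStorage("local://OCDB");
   AliReconstruction rec;
   rec.SetInput("raw.root");   // one raw-data chunk
   rec.Run();                  // produces the ESDs consumed by the calibration train
}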

Reconstruction
- Input: raw data + OCDB
- Output: event summary data (ESD)
  - Copied to at least 2 locations (disk)
- Content: the ALICE event model in ROOT trees
  - Events, tracks with extended PID info, vertices, cascades, V0's, ...
  - Trigger information and physics selection masks, detector-specific information (clusters, tracklets)
- The ESD format is used as input by the filtering procedures and by analysis (see the sketch below)
- ESD files can always be reproduced using the latest software + calibration -> reconstruction passes
  - Backward compatibility (old data always readable)
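A sketch of opening an ESD file the way filtering or analysis would, assuming the standard tree name "esdTree"; reading the full event model would additionally require the AliESDEvent class library:

// read_esd.C -- open an ESD chunk and count its reconstructed events
#include <cstdio>
#include "TFile.h"
#include "TTree.h"

void read_esd(const char *fname = "AliESDs.root")
{
   TFile *f = TFile::Open(fname);
   if (!f || f->IsZombie()) return;
   TTree *esd = (TTree*)f->Get("esdTree");
   if (esd)
      std::printf("Reconstructed events in this chunk: %lld\n", esd->GetEntries());
}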