Metadata and Supporting Tools on Day One David Malon Argonne National Laboratory Argonne ATLAS Analysis Jamboree Chicago, Illinois 22 May 2009.

Day One: “What will this day be like? I wonder.”
N.B.: from “I Have Confidence,” The Sound of Music, Rodgers and Hammerstein

Disclaimer
Rik asked me to say something about what the day one situation will be. Here are some guesses. They are optimistic without being unrealistic
– but they do require that people get involved and make things happen.

Finding data
Identifying which datasets are the ESD, AOD, and performance DPD for your stream will be straightforward
– Metadata in AMI will improve substantially, but
– most people’s dataset selections will continue to be based upon well-known names, and
– the early “Good Runs” will be well known, mentioned by number on Twikis and in meetings everywhere

Quality and status information: versions
Quality information will be updated continually
Getting the first DQ flags from quasi-real-time monitoring by shifters will be in good shape
Getting subsequent updates into the database reliably, consistently, and in a timely way will not yet be automatic
Data Preparation/Data Quality coordinators will try to “tag” consistent versions of quality/status updates, but the process may at first look a bit like software release coordination
Infrastructure for maintaining offline status/quality metadata will be limited and fragile
– and cause painful problems for some of today’s presenters

Good Run Lists
Good Run Lists will be a reality, at least for some purposes
– Not all the flags in the three-tier hierarchy will necessarily be filled
– Some Good Run Lists may not quite be based upon DQ flags as proposed
– Volunteers could certainly accelerate progress here
Standard samples for a physics group, based upon Good Run Lists (explicitly or implicitly), will be okay at the D1PD level, but further derived data products (D2PD, D3PD) will give everyone headaches
– particularly when an expert user’s dataset needs to be “promoted” to become a production dataset
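At its core, a good run list is just a map from run numbers to the lumi-block ranges judged good. A minimal sketch of the membership check, with invented run numbers and structure (the real ATLAS good run lists are distributed as XML with their own tooling):

```python
# Minimal good-run-list lookup; data and names are illustrative,
# not the actual ATLAS GoodRunsLists format or API.
GOOD_RUNS = {
    # run number -> list of (first, last) lumi-block ranges judged good
    90270: [(1, 57), (60, 112)],
    90272: [(5, 88)],
}

def is_good(run, lumi_block):
    """Return True if (run, lumi_block) falls inside the good run list."""
    for first, last in GOOD_RUNS.get(run, []):
        if first <= lumi_block <= last:
            return True
    return False
```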

Tools for luminosity calculation
Tools like LumiCalc.py myDataFile.py --trigger=myTrigger will work under normal conditions
– and under certain reasonable error conditions
Bookkeeping for multi-stream analyses will be fragile and complex
– and much of the complexity is inherent
LumiCalc will work even when one begins with a TAG-based selection
– at least if all the events in the selection can be retrieved, but the prospects for doing better are good
Getting trustworthy lumi estimates, and getting them into the database, is another matter, but the tools to access the data for analysis will be in place
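The core of the bookkeeping such a tool performs is summing per-lumi-block integrated luminosity over the blocks surviving your selection. A back-of-envelope sketch (all numbers and keys invented for illustration):

```python
# Back-of-envelope integrated-luminosity bookkeeping of the kind a
# LumiCalc-like tool performs; numbers and structures are illustrative.
lumi_per_block = {  # (run, lumi block) -> integrated luminosity, nb^-1
    (90270, 1): 0.8,
    (90270, 2): 0.9,
    (90270, 3): 0.7,
}
selected = [(90270, 1), (90270, 3)]  # lumi blocks surviving your selection
total = sum(lumi_per_block[lb] for lb in selected)  # 1.5 nb^-1
```

The inherent complexity for multi-stream analyses comes from doing this sum consistently when the same lumi blocks appear in more than one stream.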

Analysis tools and metadata
Simple tools, e.g., to cull your sample to Good Run Lists or to the latest quality updates, will be available
– “Iterate through the events in my file, but skip any events that are (or are not) in this auxiliary list of Bad/Good {run, lumi block}s.”
Augmenting your sample because lumi blocks that were previously “Yellow” have been changed to “Green” will be much harder
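The "skip events in a bad {run, lumi block} list" idiom quoted above can be sketched as a simple skimming generator; event records and field names here are invented for illustration:

```python
# Skim an event sample against an auxiliary bad-{run, lumi block} list;
# the event dictionaries and field names are illustrative.
BAD_BLOCKS = {(90270, 14), (90270, 15), (90272, 3)}

def skim(events):
    """Yield only events whose (run, lumi_block) is not flagged bad."""
    for ev in events:
        if (ev["run"], ev["lumi_block"]) not in BAD_BLOCKS:
            yield ev

sample = [
    {"run": 90270, "lumi_block": 14, "pt": 41.2},
    {"run": 90270, "lumi_block": 20, "pt": 17.8},
]
good = list(skim(sample))  # keeps only the second event
```

The converse problem, augmenting a sample when blocks change from “Yellow” to “Green,” is harder precisely because the newly good events are not in your file at all.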

Metadata chains and repositories
Some metadata pipelines are today quite fragile and heterogeneous, particularly for globally distributed production:
– Components to publish metadata from inside Athena
– Components in the wrapper transforms to marshal metadata from many sources
– Components in the “production system” to pass file-level, job-level, and task-level metadata back to a repository
– Repository infrastructure
All of these need work. On Day One, the situation will be much improved
– No more “we don’t know how many events are in that file” (?)
– But a new core transform layer is needed, and it appears that it may not be ready unless new effort is found
– Probably already too late given release deadlines
On the plus side, more and more physics metadata will be available in AMI

Metadata from online to offline
Replication from online COOL to offline COOL is relatively reliable
We therefore need to ensure that what we need offline actually gets into online COOL on more than a “we’ll take care of it when we have a chance” basis
My biggest worry (perhaps needless) is getting the necessary trigger counts and deadtime-related information from the HLT processors through the trigger’s IS to the databases reliably
– People are thinking about this, and something will be working
– Or maybe you don’t believe the HLT will be a player on Day One
Some unmanaged practices that don’t matter in the trigger do matter offline
– Maps of L1 triggers to CTP bits, and of L2/EF chains to chain counter numbers
– Work is in progress to make this as stable as possible

Metadata component integration
Interfaces to select which runs and lumi blocks to query will be integrated with interfaces for event selection
– “Which runs and lumi blocks” is usually a selection, from a range of runs, of the ones that meet your quality and status criteria AND for which the triggers you care about were part of the menu
– “Which events” is usually based upon physics quantities, triggers passed, and so on (information in the TAGs)
You will be able to use the TAG database to select, from the top group’s Good Run List Version 3.1, all the events that satisfy your trigger and physics cuts
You will be able to use a single interface to select your runs and lumi blocks, and then optionally to proceed with an event-level selection
– You will be able to see integrated luminosity, trigger counts, and so on at the end of the first step
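The two-step pattern described above can be sketched in miniature: first select (run, lumi block) pairs on quality and menu criteria, then select events from TAG-like records restricted to those blocks. Every structure, trigger name, and field here is invented; the real workflow goes through the TAG database and good-run-list services:

```python
# Illustrative two-step selection: lumi blocks first, then events.
# Trigger names, DQ values, and record layouts are all hypothetical.
lumi_blocks = [
    {"run": 90270, "lb": 1, "dq": "Green", "menu": {"EF_e10", "EF_mu4"}},
    {"run": 90270, "lb": 2, "dq": "Red",   "menu": {"EF_e10"}},
]
# Step 1: blocks meeting the quality criteria AND carrying our trigger
selected_lbs = {(b["run"], b["lb"])
                for b in lumi_blocks
                if b["dq"] == "Green" and "EF_e10" in b["menu"]}

tags = [  # event-level TAG-like records: physics quantities + triggers
    {"run": 90270, "lb": 1, "n_electrons": 2, "passed": {"EF_e10"}},
    {"run": 90270, "lb": 2, "n_electrons": 3, "passed": {"EF_e10"}},
]
# Step 2: events inside the selected blocks that satisfy physics cuts
events = [t for t in tags
          if (t["run"], t["lb"]) in selected_lbs
          and t["n_electrons"] >= 2 and "EF_e10" in t["passed"]]
```

Reporting integrated luminosity and trigger counts after step 1 is natural in this scheme, since the selected blocks are known before any event is touched.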

Metadata access
Access patterns to auxiliary non-event data from Tier 3 users are not well understood
– We already know that a surprising number of tasks require non-event data, for less than obvious reasons
Most metadata needed for post-AOD analysis will be cached in the data files, but some tasks will need more
– And you will always want to get the latest lumi estimates and quality information for your sample before hitting the “publish” button – you cannot assume that what is in your file is authoritative
Some access at Tier 3s to metadata will be required
– Probably it will be extractable by a command-line utility into a local file
– But this may require that the site allow outbound database connections, and that a remote site allow the incoming connections
– This is an occasional connection: one user connects one time for small-volume data, even if she is running N jobs
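The "extract once into a local file" pattern might look like the following sketch, where the payload fields, sample name, and function are all hypothetical stand-ins for whatever such a utility would actually fetch from the database:

```python
# Sketch of one-time, small-volume metadata extraction to a local cache
# for Tier 3 use; payload fields and names are hypothetical.
import json

def extract_metadata(sample_name, out_path):
    """Fetch auxiliary metadata once (faked here) and cache it locally."""
    payload = {"sample": sample_name,
               "lumi_estimate_nb": 1.5,   # placeholder lumi estimate
               "dq_version": "v3.1"}      # placeholder quality tag
    with open(out_path, "w") as f:
        json.dump(payload, f)
    return payload

cached = extract_metadata("mySample", "metadata_cache.json")
```

The point of the local cache is exactly the "occasional connection" property: N jobs read the file, but only one extraction touches the database.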

Event-level selection
Event selections will properly carry and process provenance information
For efficiency, TAGs will be writable into event data files (AOD, DPD)
Forward pointing from TAGs (i.e., to data that had not yet been produced when the TAGs were written) will not be possible (yet)
Replicas of the TAG database at sites other than CERN will not be fully functional
– E.g., the data may be there, but surrounding services may not

Day one in an alternate universe…
“Try not to look worried, David. The metadata are tracking this.”