Metadata in ATLAS
Elizabeth Gallas – Oxford
2nd ATLAS-South Caucasus Software/Computing Workshop & Tutorial, October 26, 2012

Metadata … Outline

• Metadata
  - What is it? Challenges in ATLAS
  - Survey some user-oriented systems using metadata
• Describe a few areas: metadata in evolution
  - Dataset Nomenclature, PhysicsShort, AMI tags
    - Delicate balance between constraints and usability
    - Focus on datasets containing events in this talk
  - Transforms and metadata
  - AMI Hierarchical Search
    - New interface … aims to help with metadata issues in MC
  - COMA content evolution and aggregation
    - Runs, Periods, and Conditions DB management metadata
• Summary and Conclusions

Oct 2012, E Gallas / Metadata

Metadata ?

• What is metadata?
  - Concisely: "data about data"
  - More precisely: "data used to describe the context, content or structure of data"
    - Structural or descriptive
• Metadata is used extensively in ATLAS … in fact, every process in ATLAS uses metadata … examples range from:
  - Upstream: data taking with the correct calibrations …
  - Downstream: physicists finding events of interest …
• Metadata challenges:
  - Size/scope of ATLAS data … volume/diversity of metadata
  - Data and metadata have grown organically as the experiment evolved
  - Try to offer a coherent, integrated view to physicists while devising strategic placement for processing and analysis
• Impossible to inventory all ATLAS metadata in reasonable time
  - Today: highlight a few areas of user metadata which play important roles in ATLAS and are currently evolving
  - Note: unable to cover many areas

ATLAS Metadata User Application Overview

• Subsystem-specific metadata: driven by subsystem-specific needs
  - Trigger: TriggerTool … Web: trigconf and trigger timelines
  - Geometry DB: Detector Description Browser
  - Conditions DB:
    - runQuery (Run information from Conditions DB)
    - ATLAS WEB DQ
    - COOL Tag Browser
  - Data Summary Reports (Luminosity, Beam)
  - Beam Spot Summary
  - GANGA and PAthena
  - Panda / monitor
  - DQ2 Client
  - Tag Collector – software releases
  - ... (not a complete list!)
• Dedicated Metadata Catalogs
  - TAGs (and TAG Catalog) – event-level metadata
    - iELSSI and suite of TAG services
  - AMI – datasets, processing … other metadata
    - And the AMI suite of services
  - COMA – Run/LB-level conditions and configuration
    - Plus Conditions DB management metadata
• Important metadata facilitator: ATLAS Job Transforms

Dataset Names

• Dataset names are used extensively:
  - Storage and operating systems, DDM, ProdSys, metadata repositories
  - But they need to be mnemonic from the user's point of view
• Dataset naming rules:
  - Carefully defined by experts, evolved somewhat, have served us well
  - But were last updated in 2010 … the needs of ATLAS have grown
  - 2012 Task Force formed to try to amend the rules to address these needs
• Overall length < 231 characters (base directory name): hard limit!
  - If each field is at its field limit, the overall limit is exceeded!
  - Many pressures on component lengths … highest areas of concern:
    - "physicsShort" – for MC datasets
    - AMI tag – for both data and MC
• Importance of the name: coherence must be understood at all ATLAS levels
  - From Management to Users … and sometimes limits are good
  - Keep a rational balance!

Name patterns:
  MC:   Project.datasetNumber.physicsShort.productionStep.dataType.AMITag[/]
  Data: Project.runNumber.streamType.productionStep.dataType.AMItag[/]
  - Project: dataNN_* or mcNN_*
  - dataType: ESD, AOD, …
  - AMITag: concatenation of configurations
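The six-field naming scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function name, the `DatasetName` type and the example dataset name are made up here and do not come from any ATLAS tool, and stripping the trailing "/" before applying the length limit is an assumption.

```python
from typing import NamedTuple

MAX_DATASET_NAME_LENGTH = 231  # slide: overall length must be < 231 characters


class DatasetName(NamedTuple):
    project: str          # dataNN_* or mcNN_*
    number: str           # datasetNumber (MC) or runNumber (data)
    description: str      # physicsShort (MC) or streamType (data)
    production_step: str
    data_type: str        # ESD, AOD, ...
    ami_tag: str          # concatenation of configurations


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into its six dot-separated fields."""
    stripped = name.rstrip("/")  # assumption: optional trailing "/" is not counted
    if len(stripped) >= MAX_DATASET_NAME_LENGTH:
        raise ValueError(f"name has {len(stripped)} characters, limit is < 231")
    fields = stripped.split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}")
    return DatasetName(*fields)


# Hypothetical dataset name, invented for this example only.
example = "mc12_8TeV.123456.ExampleGen_ExampleProcess.merge.AOD.r2713_p705/"
parsed = parse_dataset_name(example)
print(parsed.project, parsed.data_type, parsed.ami_tag)
```

Keeping the fields in a named structure rather than re-splitting the string everywhere makes it easier to move descriptive information out of the name and into a metadata catalog later.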

"PhysicsShort" for MC

• Example of one proposed "physicsShort": MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_SingleTopTChanWelenu_LeptonFilter
• Rules: "physicsShort field must not exceed 40 characters."
  - This one: 78 characters (and is it really user friendly?)
  - This kind of 'growth' is oblivious to the rules and shows how much experts and users depend entirely on the dataset name to identify and find their data
• General frustration → finding the MC needed, TWiki pages, understanding the MC they use, and identifying additional MC samples they need or what exists …
• Jamie Boyd: "General feeling is this level of info should be encoded in AMI rather than the filename – need to follow up with generators group on this"
• Progress in 2012:
  - "Simulation Metadata Workshop" – held in April 2012
  - Commendable effort by MC Coordination: add more metadata to AMI
• Metadata systems need to provide better tools which
  - Better explain and relay the metadata behind the dataset AND
  - Better allow browsing of the datasets and the metadata

Name pattern: Project.datasetNumber.physicsShort.productionStep.dataType.AMITag[/]
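The length rule quoted on this slide is easy to check mechanically. The sketch below uses a hypothetical helper, `physics_short_excess`, which is not part of any ATLAS tool, and applies the 40-character limit to the proposed string from the slide.

```python
PHYSICS_SHORT_MAX = 40  # limit quoted from the naming rules on the slide


def physics_short_excess(physics_short: str, limit: int = PHYSICS_SHORT_MAX) -> int:
    """Return by how many characters the field exceeds the limit (0 if it fits)."""
    return max(0, len(physics_short) - limit)


proposed = (
    "MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_"
    "SingleTopTChanWelenu_LeptonFilter"
)
print(len(proposed), physics_short_excess(proposed))  # prints: 78 38
```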

AMI Introduction

• AMI: 'ATLAS Metadata Interface'
  - AMI Portal Page:
• Means of finding datasets to use in analysis using physics metadata predicates
  - Not just dataset names, but also the underlying metadata
• AMI contains a LOT more than just a list of datasets
  - Dataset provenance
  - Files
  - Lost Lumi blocks
  - Links to other applications
  - Nomenclature reference tables
• New AMI interface: dataset 'browsing' (hierarchical search)
  - Now available to users (first version!)
  - Good feedback from users … important for evolution
  - AMI team working on refining this tool based on feedback

AMI Dataset Browsing

• User is guided to the AMI catalog specific to the project of interest
  - Information varies according to project
• Allows users progressive selection to iteratively narrow the result set

AMI ("Config") Tags

• The AMI Tag:
  - Definition: is composed of concatenated strings encoding processing steps
  - Example: r2713_p705 … encodes information about
    - which ATLAS releases ( )
    - which database releases ( )
    - which transforms (reco_trf.py), job configurations, …
• Why is it called the "AMI tag"?
  - AMI provides interfaces for its interpretation
• Rules for AMI tags also listed in Nomenclature doc
  - Original specification now also needs revision
  - Max length sometimes exceeds limit (22) – multiple factors driving this …
• Highlight some issues to be addressed:
  - Running out of lower case letters
  - Numeric parts … require more characters (99 … 999 … 9999 …?)
  - More processing/merging steps: add more and more fields
  - Must find a way to consolidate steps in a managed way
• These issues are also being discussed in the Task Force
  - Solveig Albrand: evolving document describing issues/possibilities
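To make the structure of an AMI tag concrete, here is a minimal Python sketch that splits a tag such as r2713_p705 into its steps and checks the length limit of 22. The pattern (one lower-case letter followed by digits, steps joined by underscores) is an assumption inferred from the example and from the remark about running out of lower-case letters. The helper names are hypothetical, and the real interpretation of each step is provided by AMI itself.

```python
import re
from typing import List, Tuple

AMI_TAG_MAX_LENGTH = 22  # limit quoted on the slide

# Assumed step pattern: one lower-case letter followed by a number.
_STEP = re.compile(r"([a-z])(\d+)")


def split_ami_tag(tag: str) -> List[Tuple[str, int]]:
    """Split an AMI tag into (letter, number) steps, in order."""
    steps = []
    for part in tag.split("_"):
        match = _STEP.fullmatch(part)
        if match is None:
            raise ValueError(f"unexpected step {part!r} in tag {tag!r}")
        steps.append((match.group(1), int(match.group(2))))
    return steps


def exceeds_limit(tag: str, limit: int = AMI_TAG_MAX_LENGTH) -> bool:
    """True if the tag is longer than the allowed number of characters."""
    return len(tag) > limit


print(split_ami_tag("r2713_p705"))  # [('r', 2713), ('p', 705)]
print(exceeds_limit("r2713_p705"))  # False
```

Each additional processing or merging step adds another field, which is why long chains approach the limit quickly.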

Transforms and Metadata

• "ATLAS Transforms": a wrapper to Athena & python job options
  - Thanks to the Transform Group!
    - Graeme Stewart, Stephen Beal, Thomas Gadfort, Harvey Maddocks, Bjorn Sarrazin
    - See Graeme's talk during Software week (last week)
• Required, for example, by the ATLAS production system
  - Provides uniform, coherent mechanisms for specifying and executing tasks
  - Even multi-step transforms
• New Transforms:
  - General merging capabilities
    - Also needed for the merging of file-based metadata
  - Provide important computations
    - Such as event counts
  - Bridge the gap in metadata communication
    - Uniform information transfer to other systems and metadata repositories
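The idea of merging file-based metadata, including computed quantities such as event counts, can be illustrated with a small Python sketch. It is not the Transform code: the function and the field names (`name`, `n_events`, `run_numbers`) are hypothetical and chosen only to show the principle.

```python
from typing import Dict, Iterable


def merge_file_metadata(files: Iterable[Dict]) -> Dict:
    """Combine per-file metadata records into one record for the merged file."""
    n_events = 0
    run_numbers = set()
    inputs = []
    for meta in files:
        n_events += meta["n_events"]            # event counts add up
        run_numbers.update(meta["run_numbers"])  # runs are unioned
        inputs.append(meta["name"])              # keep provenance of inputs
    return {
        "n_events": n_events,
        "run_numbers": sorted(run_numbers),
        "inputs": inputs,
    }


# Made-up input records, for illustration only.
merged = merge_file_metadata([
    {"name": "file_a", "n_events": 1000, "run_numbers": [100001]},
    {"name": "file_b", "n_events": 250, "run_numbers": [100001, 100002]},
])
print(merged["n_events"], merged["run_numbers"])  # 1250 [100001, 100002]
```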

What is COMA ?

• COMA: "Conditions/Configuration Metadata for ATLAS"
  - Originally: built to support dynamic queries of TAG DB
  - Evolved: into a standalone system with interfaces
  - Now part of general effort to consolidate/relate ATLAS metadata
• COMA Components:
  - Relational Database
    - Contains extracted, refined, reduced, and derived information from system-specific data sources (e.g. Conditions DB, …) plus information from non-database sources.
  - Interfaces
    - A set of unique Browsers and Reports
  - Special relationship to AMI
    - Enhances COMA in many ways (e.g. Data Periods);
    - pyAMI methods make COMA information available to many systems.
• Main Documentation
• COMA Portal (grid certificate in browser is required)

COMA: ATLAS Data Periods … + aggregating new content

• A Data Period is a set of ATLAS Runs grouped for a purpose
  - Defined by Data Preparation Coordinators
  - Used in ATLAS data processing, assessment, and selection …
• Each Period is uniquely defined by a combination of
  - Project name (e.g. 'data10_7TeV')
  - Period name (e.g. 'C1', 'C2', 'C', 'AllYear' …)
• Before 2011, Data Periods were
  - Described on a TWiki page
  - Stored in a file-based system
  - Edited by hand by Data Prep Coordination (experts)
  - Structure evolved over 2010 with experience
  - This experience → valuable to decide/define the long-term solution
• In 2011: Data Periods stored in the COMA DB
  - Coordination/Effort: Data Prep, AMI, COMA experts
• Since then, COMA has enhanced content in many areas
  - Allows for more detailed reports and information to other systems
  - Enables aggregation of this information by Run, … Period.
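A Period identified by the pair (project name, period name) and the aggregation of per-run information up to the Period level can be sketched as follows. This is only an illustration in plain Python, not the COMA schema: the run numbers, the per-run values and the function name are all invented.

```python
from typing import Dict, List, Tuple

PeriodKey = Tuple[str, str]  # (project name, period name)


def aggregate_by_period(
    periods: Dict[PeriodKey, List[int]],
    per_run: Dict[int, int],
) -> Dict[PeriodKey, int]:
    """Sum a per-run quantity over the runs belonging to each period."""
    return {
        key: sum(per_run.get(run, 0) for run in runs)
        for key, runs in periods.items()
    }


# Made-up runs and values, for illustration only.
periods = {
    ("data10_7TeV", "C1"): [100001, 100002],
    ("data10_7TeV", "C2"): [100003],
}
events_per_run = {100001: 10, 100002: 20, 100003: 5}

totals = aggregate_by_period(periods, events_per_run)
print(totals[("data10_7TeV", "C1")], totals[("data10_7TeV", "C2")])  # 30 5
```

Using the (project, period) pair as the key reflects the point on the slide that a period name such as 'C1' is only unique within a project.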

COMA multi-Run report: Latest 8 runs

Shows "at a glance": the latest Period Runs with Magnet states, 'ready fraction', link to Stable Beam fill(s), beam information …

COMA Period Documentation Report: enhanced content

New aggregated information

Conditions DB Management Metadata

ATLAS Conditions Database: non-event-wise conditions and configuration information

• Designed for primary purpose, extensive documentation
  - Processing of event-wise data
  - IOV (interval of validity) based structure is ideal
• To non-experts
  - Mysterious, cryptic, specialized
  - What's a COOL Global Tag ???
• DB and COOL Tag Coordination need an overview
  - Storage and tools based on individual schemas make an overview difficult
• Extra part of COMA: COOL Management Metadata (for Folders & Tags)
  - July 2011 SW week: discussion with Lasha Sharmazanashvili
  - Then with Misha Borodin, Andrea Formica, Paul Laycock … other experts
  - NOT intended to replace existing storage or tools
  - Makes possible:
    - Quicker access to overview information from external systems
    - Web-based reports
    - A different view, which can offer links to other metadata tools
• Since then
  - In production operation (daily synchronization)
  - Other applications have started development to incorporate it (such as COOL Tag Browser)
• Now working on:
  - Improving content and interfaces
  - Speeding up synchronization
  - Possible incorporation of "Current", "Next" … COOL Tags into COMA (move from AFS files)
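For readers unfamiliar with the interval-of-validity idea mentioned on this slide, the following Python sketch shows how a payload can be looked up for a given point (for example a run number or a time). It does not use the COOL API. The half-open interval convention, the tuple layout and the payload names are assumptions made for illustration.

```python
import bisect
from typing import List, Optional, Tuple

# Each IOV is (since, until, payload), valid for since <= point < until.
# The list is assumed sorted by `since` with no overlapping intervals.
IOV = Tuple[int, int, str]


def find_payload(iovs: List[IOV], point: int) -> Optional[str]:
    """Return the payload whose interval contains `point`, or None."""
    starts = [since for since, _, _ in iovs]
    index = bisect.bisect_right(starts, point) - 1  # last IOV starting <= point
    if index < 0:
        return None
    since, until, payload = iovs[index]
    return payload if since <= point < until else None


# Made-up intervals and payload names.
iovs = [(0, 100, "calib_v1"), (100, 250, "calib_v2")]
print(find_payload(iovs, 99))   # calib_v1
print(find_payload(iovs, 100))  # calib_v2
print(find_payload(iovs, 250))  # None
```

This structure suits event processing, where each event needs exactly the conditions valid at its time, but it does not by itself give the overview of folders and tags that the COMA management metadata aims to provide.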

COMA Conditions DB Folder Browser Menu

• Enter criteria into textbox at left:
  - Type manually or
  - Click on options at right or re-generates Menu applying selection
• Choose for the Global Report
• Choose for the Folder Report

COMA (Multi) Global Tag Report

• COMA_CBAMI_GTAGS table
  - AMI Team has populated it with counts showing Global Tag usage in dataset processing

Summary and Conclusions

• A lot of progress in many areas using metadata:
  - Transforms, Data Processing, dataset-related metadata
  - Dedicated Metadata Catalogs: AMI, COMA, (TAGs)
• Metadata in ATLAS continues to evolve
  - Naming conventions/rules
    - Important to form a coherent view over datasets, runs, periods, …
  - Increased cooperation between systems
    - Upstream and downstream
  - Use cases continue to expand
  - Improvements in metadata
    - Storage
    - Consistency
    - Delivery
    - Usage
• Challenges ahead
  - Offer coherence at Management and User levels
  - To keep pace with
    - System evolution (such as DDM → Rucio, ProdSys, … upgrades)
    - Analysis pattern evolution and use cases

Backup

COMA COOL Folder&Tag Metadata: Report improvements

• Motivated by: COOL Tag Coordination, upcoming reprocessing, LS1
• Significant new "Expert level" selection criteria:
  - Allows viewing of Folders, Folder Tags, and Global Tags by, for example, creation dates (or ranges), by themselves or in combination with any of the other criteria.
• New selection criteria: open the Expert Section of the Folder Browser Menu
• Example links:
  - Global Tags created since March 2012: CBAction=GlobalTagReport&cbgtdate=
  - Global Tags created in the last 30 days: CBAction=GlobalTagReport&cbgtdate=latest30
  - Global Tags for MC: CBAction=GlobalTagReport&cbgt=OFLCOND*
  - Global Tags for real data: CBAction=GlobalTagReport&cbgt=COMCOND*
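The query strings in the example links can be assembled programmatically. The Python sketch below builds only the query part, because the base address of the report is not given in the transcript. The two helper functions are hypothetical. Treating OFLCOND* and COMCOND* as glob-style patterns for a local classification is an assumption, as is the example tag name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlencode


def global_tag_report_query(**criteria: str) -> str:
    """Build the query string for a Global Tag report with the given criteria."""
    params = {"CBAction": "GlobalTagReport", **criteria}
    # Keep "*" unescaped so wildcard selections look like the links on the slide.
    return urlencode(params, safe="*")


def tag_kind(tag: str) -> str:
    """Classify a Global Tag name using the prefixes shown on the slide."""
    if fnmatchcase(tag, "OFLCOND*"):
        return "MC"
    if fnmatchcase(tag, "COMCOND*"):
        return "real data"
    return "other"


print(global_tag_report_query(cbgtdate="latest30"))
print(global_tag_report_query(cbgt="OFLCOND*"))
print(tag_kind("COMCOND-EXAMPLE-00"))  # hypothetical tag name
```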