AMI, Metadata, and Software Infrastructure
David Malon
30 August 2010
ATLAS AMI and Metadata Workshop

Introduction
• Assigned title seems to be “metadata: technology overview”
• Will not exactly be a technology overview
  – Lots of technologies related to metadata will not be discussed
  – But I do represent the software project side, not data preparation, not physics, not distributed analysis, ..., so I suppose I do bring a technological perspective
• Many technologies at work in support of metadata
  – Databases, of course
  – ELSSI and TAGs for event-level metadata
  – COOL for interval-of-validity and lumi-block-level metadata
  – RunQuery as an interface thereto
  – XML DTDs as exchange formats (e.g., for Good Run Lists)
  – Relatively sophisticated in-file metadata infrastructure (storage, incident handling, …)
  – AMI at the dataset level
  – Sources of metadata for AMI, tools that transport it, …
• Possible focus on technology related to getting metadata into and out of AMI
  – But Solveig, who knows many aspects of this work better (and who wrote her slides earlier!), will cover much of this

Earlier work
• Metadata and luminosity task forces
  – Reasonable job of enumerating and classifying various kinds of metadata
  – And describing a few key requirements and use cases
• These task forces did not, however, provide much guidance on required tools or infrastructure or architecture
• Nonetheless, the collaboration has developed an array of metadata tools that meet most of the needs foreseen in those efforts
  – The ATLAS “liaison” organization has worked well in many metadata areas:
    · Database plus Data Prep: Data Quality and COOL
    · Database plus Data Prep plus software project: luminosity infrastructure
    · Software plus database plus Data Prep: Good Run Lists and related work
    · Software plus Data Prep: in-file metadata for job configuration
    · ... etc.

Sources of metadata
• General consensus after the AMI review: physics metadata about datasets should be in, or mediated by, AMI
  – As opposed to in middleware and DDM catalogs/databases
  – Includes higher-level datasets (physics containers, super-datasets; we used to talk about hierarchical datasets when the meaning was less limited, i.e., before they existed)
• There are always items on the boundary (is this datum a physics metadatum?)
  – Provenance is an example, but clearly provenance has a physics aspect, and should be recorded in AMI
• And there are physics metadata about files
• Review followups required some reverse engineering by the AMI team to extract data from other sources (task request databases, production system databases, T0 management databases, …)
  – Not necessarily intended as a long-term strategy
  – Perhaps this workshop provides an opportunity to think about a longer-term strategy to accomplish our collaborative goals

Levels of metadata from production
• Some metadata known a priori
  – At task definition time
• Some metadata known only after the last job completes and output is checked/validated
• We mix these sometimes
• Some metadata are the same for all jobs in a task
• Some metadata are the same for all files written by a job
• Some metadata vary by file
• Currently, in distributed production, much of this metadata is reported redundantly, repeated for every file
  – We return far too much redundant metadata
• Improvements to reduce this were discussed and some were even coded, but apparently not put into production
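A minimal sketch of the factoring argued for here: task-level values recorded once, job-level values once per job, and only genuinely file-varying values per file. The field names and layout are hypothetical, not the actual production-system report schema.

```python
# Illustrative only: one way to factor a production metadata report so that
# task-level values are not repeated for every output file.  Field names are
# hypothetical, not the actual production-system schema.

task_report = {
    "task_id": 123456,                      # known a priori, at task definition
    "ami_tag": "r1234",                     # same for all jobs in the task
    "release": "AtlasProduction-15.6.X",    # same for all jobs in the task
    "jobs": [
        {
            "job_id": 1,
            "exit_status": 0,               # known only after the job completes
            "files": [
                # only quantities that genuinely vary per file are listed here
                {"lfn": "AOD.123456._000001.pool.root", "nevents": 4821},
            ],
        },
    ],
}

def flatten(report):
    """Expand the factored report into one record per file -- roughly what
    per-file reporting transmits today, with all the redundancy."""
    for job in report["jobs"]:
        for f in job["files"]:
            rec = {"task_id": report["task_id"], "ami_tag": report["ami_tag"],
                   "release": report["release"], "job_id": job["job_id"]}
            rec.update(f)
            yield rec
```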

Metadata transport
• Technologies for metadata transport are different for Tier 0 than for distributed production
  – Job report pickle files versus metadata.xml files, and so on
  – Metadata XML files are a legacy format: reuse of the POOL file catalog DTD and its limited provisions for metadata
  – Alvin Tan proposed using Python dictionaries in place of XML files, but …
• Metadata packaging and transport from jobs should probably be revisited
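For concreteness, a rough sketch of the two transport styles: an XML payload modelled loosely on a POOL-file-catalog-style layout with flat attribute/value metadata entries, and a pickled Python dictionary of the kind a dictionary-based job report could carry. Both payloads are invented for illustration; neither is the exact production format.

```python
# Sketch only: contrasting the two transport styles.  The XML layout is
# modelled loosely on a POOL file catalog with generic attribute/value
# metadata entries; the dictionary is the kind of structured report a
# pickle-based scheme could carry.  Neither is the exact production format.
import pickle
import xml.etree.ElementTree as ET

legacy_xml = """<POOLFILECATALOG>
  <File ID="some-guid">
    <logical><lfn name="AOD.123456._000001.pool.root"/></logical>
    <metadata att_name="events" att_value="4821"/>
    <metadata att_name="size"   att_value="1073741824"/>
  </File>
</POOLFILECATALOG>"""

def parse_legacy(text):
    """Everything comes back as flat name/value strings; any structure
    (per-run, per-lumi-block, nested provenance) has to be encoded by hand."""
    root = ET.fromstring(text)
    return {m.get("att_name"): m.get("att_value")
            for m in root.iter("metadata")}

flat = parse_legacy(legacy_xml)

# A dictionary-based report can carry typed, nested values directly.
job_report = {
    "outputs": [{"lfn": "AOD.123456._000001.pool.root",
                 "events": 4821,
                 "lumiblocks": {152166: [1, 2, 3]}}],
}
payload = pickle.dumps(job_report)          # what a pickle job report would ship
restored = pickle.loads(payload)
```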

Task-level metadata and AMI
• Discussion at the time of the AMI review of a tighter integration of task request infrastructure and AMI
• Much dataset-level metadata is known at the time the task that will create the dataset is defined
• Introduction of configuration tags (“AMI tags”) has been a major improvement in this direction
  – Configuration tags provide clear documentation of the Athena configuration used to produce an output dataset
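A toy illustration of what a configuration tag buys: the short tag carried in the output dataset name resolves to the full configuration used to produce it. The tag value, dataset name, and configuration fields below are invented; the real definitions are stored in and served by AMI.

```python
# Toy illustration of configuration ("AMI") tags: the short tag in the dataset
# name resolves to the Athena configuration used to produce it.  The tag value
# and fields below are invented examples; the real definitions live in AMI.
CONFIG_TAGS = {
    "r1234": {
        "transform": "Reco_trf.py",
        "release": "AtlasProduction-15.6.X",
        "conditions_tag": "COMCOND-EXAMPLE-00",
        "geometry": "ATLAS-GEO-EXAMPLE-00",
    },
}

def configuration_for_dataset(dataset_name):
    """Pick the trailing configuration tag out of a dataset name and look up
    the configuration it documents (known already at task definition time)."""
    tag = dataset_name.rstrip("/").split(".")[-1].split("_")[-1]
    return CONFIG_TAGS.get(tag)

print(configuration_for_dataset(
    "data10_7TeV.00152166.physics_Example.recon.AOD.r1234"))
```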

Transforms and metadata
• Metadata packaging for return to AMI (and elsewhere?) is handled largely at the transform level
• Transforms have grown into something of a control framework of their own
  – Production step sequencing, but also input file peeking for configuration, output post-processing and limited validation of sorts, metadata handling
• Plans years ago to rationalize the transform infrastructure
• Slow progress for various reasons, and now the developer is leaving ATLAS
• An important part of transform improvements will be ensuring that metadata are correctly and robustly packaged and returned
  – Have seen problems here
• Transforms generally, and metadata handling by transforms, need work
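A schematic sketch of the control-flow responsibilities a transform has accumulated: configure from the input, run the production steps, validate the output, and package metadata for return. All names are hypothetical; this is not the real transform infrastructure.

```python
# Schematic sketch of the control-flow responsibilities a transform has
# accumulated: configure from the input, run the steps, validate, and package
# metadata for return.  All names are hypothetical; this is not the real
# transform infrastructure.
class TransformStep:
    def __init__(self, name, run_fn):
        self.name = name
        self.run_fn = run_fn            # (input_file, config) -> (output, metadata)

class Transform:
    def __init__(self, steps):
        self.steps = steps
        self.metadata = {}              # what gets returned to AMI and elsewhere

    def peek_input(self, input_file):
        """Stand-in for input-file peeking used to configure the job."""
        return {"run_number": 152166, "stream": "physics_Example"}

    def execute(self, input_file):
        config = self.peek_input(input_file)
        for step in self.steps:
            output, step_meta = step.run_fn(input_file, config)
            self.validate(output)       # limited post-run validation
            self.metadata[step.name] = step_meta
            input_file = output         # chain production steps
        return self.metadata            # packaged for the job report

    def validate(self, output):
        if output is None:
            raise RuntimeError("step produced no output")

reco = Transform([TransformStep("reco",
                                lambda inp, cfg: (inp + ".recon", {"events": 4821}))])
job_metadata = reco.execute("input.RAW")
```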

Metadata and files
• If you have a file, can you figure out what’s in it?
  – Which release produced it, which runs and streams and lumi blocks are in it, …
• If you don’t have a file, can you figure out what you’re missing?
• We’re getting pretty good at the first of these
  – Lots of work on in-file metadata infrastructure and content
  – Even for eventless files (from sparse selections, etc.)
  – Example: ability to auto-configure jobs based upon file contents
• We’re not quite as good at the second
  – But that’s where AMI should help
• Lots of work on correct propagation of in-file metadata from input to output
  – {run, lumi block} ranges, for example
  – Can even merge N eventless files and get the metadata “right”
• Still some work needed to make this robust, though most standard cases are handled reasonably well
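A small sketch of the bookkeeping behind “merge N eventless files and get the metadata right”: each input carries its {run, lumi block} coverage as metadata independent of whether it contains events, and merging is a union of those ranges plus an event-count sum. The representation is illustrative, not the actual in-file metadata schema.

```python
# Sketch of the bookkeeping behind merging in-file metadata: each input file
# declares the {run, lumi block} ranges it covers, independently of whether it
# actually contains events, and the merged output carries the union.  The
# representation is illustrative, not the real in-file metadata schema.
from collections import defaultdict

def merge_lumi_coverage(input_files):
    """input_files: list of dicts with 'events' and 'lumiblocks' ({run: LBs}).
    Returns merged coverage plus the total event count."""
    coverage = defaultdict(set)
    total_events = 0
    for f in input_files:
        total_events += f["events"]
        for run, lbs in f["lumiblocks"].items():
            coverage[run] |= set(lbs)
    return {"events": total_events, "lumiblocks": dict(coverage)}

# An eventless file (e.g. from a sparse selection) still contributes coverage:
inputs = [
    {"events": 0,  "lumiblocks": {152166: {1, 2}}},     # eventless, covers LBs 1-2
    {"events": 37, "lumiblocks": {152166: {3, 4, 5}}},
]
print(merge_lumi_coverage(inputs))
```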

Metadata and auto-configuration
• Can largely auto-configure jobs to process data files by peeking at contents
• Have worked hard to make this robust
  – Can usually do this even when the first input file, or several, is/are eventless
• Over-reliance, perhaps, though, on peeking
• In most cases, it should be possible to configure all jobs in a task from task- and dataset-level metadata, before the first file is opened
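A hedged sketch of the alternative suggested here: take the configuration from a dataset- or task-level metadata source when it is available, and fall back to peeking at the first usable input file only when it is not. Both lookup functions are placeholders, not real ATLAS interfaces.

```python
# Hedged sketch of the configuration strategy suggested above: prefer
# dataset/task-level metadata, and fall back to peeking at input files only
# when that source is missing.  Both lookups are placeholders, not real ATLAS
# interfaces.
def dataset_level_config(dataset_name):
    """Placeholder for a query to a dataset-level metadata source such as AMI."""
    catalog = {"data10_7TeV.00152166.physics_Example.recon.ESD.r1234":
               {"release": "AtlasProduction-15.6.X", "stream": "physics_Example"}}
    return catalog.get(dataset_name)

def peek_first_usable_file(input_files):
    """Placeholder for opening input files until one yields usable in-file metadata."""
    for f in input_files:
        meta = f.get("in_file_metadata")
        if meta:                        # eventless or unreadable files are skipped
            return meta
    raise RuntimeError("no input file could be peeked for configuration")

def configure_job(dataset_name, input_files):
    # Dataset-level metadata first; peeking only as a fallback.
    return dataset_level_config(dataset_name) or peek_first_usable_file(input_files)
```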

Executing jobs and metadata
• Athena jobs have little access to metadata
  – Metadata about the input dataset? About the task?
  – In a simple {dataset in, dataset out} task, try to find out from inside Athena the name of the dataset to which the job’s input file belongs
• Writing metadata? Suppose you want to add a metadatum to the output
  – There are several ways that information can get into the metadata.xml file
  – From inside Athena, none is pretty
  – There are no Services for this
• Data currently written from executing jobs can be iffy
  – Event counting seems inconsistent and data-product-dependent, for example
• This is an area that needs thought, and work (thought first)
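Since there are no Services for this, the following is purely a thought experiment: the rough shape an Athena-style metadata service might take, letting a job read dataset- and task-level metadata and register output metadata without touching metadata.xml directly. Nothing here exists in Athena; all names are invented.

```python
# Thought experiment only: the rough shape an Athena-style metadata service
# could take, so that algorithms neither parse nor write metadata.xml
# themselves.  No such service exists; all names are hypothetical.
class JobMetadataSvc:
    def __init__(self, task_metadata, dataset_metadata):
        self._task = task_metadata          # e.g. task id, configuration tag
        self._dataset = dataset_metadata    # e.g. input dataset name
        self._output = {}                   # metadata to be returned with the job

    # --- read side: what an executing job currently cannot easily get ---
    def input_dataset_name(self):
        return self._dataset.get("name")

    def task_parameter(self, key, default=None):
        return self._task.get(key, default)

    # --- write side: a single, consistent path into the job report ---
    def record(self, key, value):
        self._output[key] = value

    def finalize(self):
        """Hand the collected metadata to the transport layer (pickle, XML, ...)."""
        return dict(self._output)

svc = JobMetadataSvc({"ami_tag": "r1234"},
                     {"name": "data10_7TeV.00152166.physics_Example.recon.ESD.r1234"})
svc.record("events_written", 4821)
```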

Some issues for followup: I
Transform and metadata transport issues
• Transform architecture and its support for metadata
• Architecture for writing/reading metadata at the job (and Athena) level
• Metadata packaging and transport
  – Also improved consistency between T0 and distributed production
• Would be nice to be able to configure tasks/jobs from a dataset-level metadata source rather than by peeking
General
• Metadata validation as part of the validation process for production
• Standard and optional metadata content

Some issues for followup: II
AMI-related
• Improvements to DDM/production/AMI agreement, understanding, and representation of dataset status
  – (Is the dataset ready, and what does that mean?)
• Detection and recording in metadata of unexpected event count differences and other production anomalies
• AMI provenance improvement
  – Not always clear or complete; particularly problematic at the file level
  – Which lumi blocks are in which file?
  – Beyond input/output provenance, most other provenance questions are hard to answer
    · Though configuration tags are a big help
• File naming differences between Tier 0 and distributed production?
  – Metadata in the file names is different (maybe this matters only to those who try to use the DDM system to find data without AMI?)

Some issues for followup: III
Simulation metadata (defer to Borut here)
  – Input of simulation metadata seems fragile, and reliant upon human input that may or may not happen
  – Recurring issues related to use of run number in simulation
    · Affects metadata of various kinds adversely
Offline “quality” and similar metadata
• Stream-dependent offline “quality” flags?
  – More generally, a means to flag files that are problematic or interesting or anomalous in some way
Robustness improvements to in-file metadata handling
Metadata about the content of standard data products
• Can we answer questions like: does the AOD for period E2 contain AntiKt6H1TopoJets?
  – What is the EDM content of a given dataset?
  – Some kinds of queries/provenance questions are difficult without reading (running?) all the nested job options that went into a job’s definition