Finding Information: Metadata in ATLAS Elizabeth Gallas – Oxford ATLAS UK: Software Session Lancaster, UK January 9, 2013.

1 Finding Information: Metadata in ATLAS
Elizabeth Gallas – Oxford
ATLAS UK: Software Session, Lancaster, UK, January 9, 2013

2 Jan 2013, E. Gallas - Metadata
Outline
- What is "Metadata"?
- Challenges in ATLAS
- Survey of some user-oriented systems using metadata
  - Show the utility of collecting metadata into dedicated systems
  - Tour of some COMA Reports
    - Features: Runs, Periods, Triggers, Luminosity … metadata
    - New content and newly aggregated quantities
- Describe a few areas: metadata in evolution
  - (Event) Dataset Nomenclature: physicsShort, AMI tags
  - Transforms and metadata
  - AMI Hierarchical Search (aka Dataset Browser)
    - New interface … a different way to find datasets of interest
    - Aims to help with metadata issues in MC
- Summary and Conclusions

3 What is Metadata?
Metadata definition:
- Concisely: "data about data"
- More precisely: "data used to describe the context, content or structure of data"
- Structural or descriptive metadata is used extensively in ATLAS … in fact, no process runs without metadata
  - Descriptive examples: dataset name, run number, channel number in some detector, TWiki name, trigger names, dates/times, DQ defect, ATLAS software release number, …
  - Structural examples: number of runs, events or files, data volume, structure of compound objects, …
- Usage examples:
  - Upstream: data taking with the correct calibrations …
  - Downstream: a user finding events of interest … or the luminosity for an event sample
Metadata challenges:
- Data and metadata have grown organically as the experiment evolved
- Size/scope of ATLAS data … volume/diversity of metadata
- Following the evolution in Run 1 and trying to anticipate changes for Run 2
- Trying to offer a coherent, integrated view to physicists while devising strategic placement for processing and analysis

4 ATLAS User Application Overview
- Subsystem specific: driven by subsystem-specific needs (using metadata)
  - Trigger: wide variety of tools and interfaces
  - Geometry DB: Detector Description Browser
  - Conditions DB:
    - RunQuery (in-depth Run information from the Conditions DB)
    - ATLAS WEB DQ
    - COOL Tag Browser
    - Lumi Data Summary Reports (Luminosity, Beam)
    - GRLs (Good Run List XML) and the Luminosity calculator
    - Beam Spot Summary
  - GANGA and PAthena
  - Panda / monitor
  - DQ2 Client
  - Tag Collector – software releases ... (not a complete list!)
- Dedicated Metadata Catalogs
  - TAGs (and TAG Catalog) – event-level metadata
    - iELSSI and the suite of TAG services
  - AMI – datasets, processing … other metadata
    - And the AMI suite of services
  - COMA – Run/LB-level conditions and configuration
    - Plus Conditions DB management metadata
- Important metadata facilitator: ATLAS Job Transforms
Fundamental areas for every analysis! See the specific talks in the software tutorials.
COMA is an Oxford-based project.

5 COMA @ Oxford
The COMA Project:
- TWiki: ConditionsMetadata
- Originally built to support other systems; evolved into a standalone system with its own interfaces
Components:
- Relational database (Oracle)
  - Info: copied, refined, reduced, derived
  - Unique content (not found elsewhere): Data Periods, derived/aggregated data
- Interfaces (Reports and Browsers)
Current efforts:
- COMA database content/interfaces growing
  - Aggregating various quantities across Periods, Runs and by Trigger
  - Adding event counts: Stream, Trigger
  - Enhancing aspects of MC metadata (LS1)
- Improving content, functionality, and usability
Beyond COMA:
- COMA is part of a general effort to consolidate/relate ATLAS metadata
  - Strong ties with AMI and the TAG DB
- COMA data/links are now found in many ATLAS systems:
  - AMI, TAGs, DataQuality, RunQuery, Muon alignment, Conditions DB
  - Many links from ATLAS TWiki physics pages and personal pages
Ryan Buckingham (4th year), Kate Pachal (2nd year), Dr. Jeff Tseng, Dr. Elizabeth Gallas

6 COMA Interfaces Portal
https://atlas-tagservices.cern.ch/tagservices/RunBrowser/index.html
- The most popular Reports/Browsers are at the top of this portal page
- Shaded grey: operational … but no current/active development

7 COMA: ATLAS Data Periods … + aggregating new content
- A Data Period is a set of ATLAS Runs grouped for a purpose
  - Defined by the Data Preparation Coordinators
  - Used in ATLAS data processing, assessment, and selection …
- Each Period is uniquely defined by a combination of
  - Project name (e.g. 'data10_7TeV')
  - Period name (e.g. 'C1', 'C2', 'C', 'AllYear' …)
- Before 2011, Data Periods were
  - Described on a TWiki page: https://twiki.cern.ch/twiki/bin/view/AtlasProtected/DataPeriods
  - Stored in a file-based system
  - Edited by hand by Data Prep Coordination (experts)
  - Painful to maintain, error prone
  - The structure evolved over 2010 with experience; this experience was valuable to decide/define the long-term solution
- In 2011, Data Periods moved into COMA
  - Coordination/effort: Data Prep, AMI, COMA experts
  - Simple to enter, integrity checked, more robust, available
  - This made all aspects of Period definitions available programmatically
- Since then, COMA content has grown in many areas
  - Allows more detailed reports and information to other systems
  - Enables aggregation of LB-wise information by Run, … Period
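A minimal Python sketch of why programmatic Period definitions help: each Period is keyed by its (project, period name) pair, and a per-LB quantity is summed first by Run and then by Period. The run numbers, LB values and in-memory dictionaries are hypothetical stand-ins for illustration; they are not the COMA schema.

```python
from collections import defaultdict

# Hypothetical Period definitions: (project, period name) -> run numbers.
# The (project, period) pair is the unique key, as described above.
PERIODS = {
    ("data12_8TeV", "A1"): [200804, 200805],
    ("data12_8TeV", "A2"): [200842],
}

# Hypothetical LB-wise records: (run number, lumi block, per-LB quantity).
LB_RECORDS = [
    (200804, 1, 0.5),
    (200804, 2, 0.25),
    (200805, 1, 1.0),
    (200842, 1, 2.0),
    (200842, 2, 2.5),
]


def aggregate_by_run(records):
    """Sum the per-LB quantity for each run."""
    totals = defaultdict(float)
    for run, _lb, value in records:
        totals[run] += value
    return dict(totals)


def aggregate_by_period(periods, run_totals):
    """Sum the run totals over the runs belonging to each Period."""
    return {
        key: sum(run_totals.get(run, 0.0) for run in runs)
        for key, runs in periods.items()
    }


per_run = aggregate_by_run(LB_RECORDS)
per_period = aggregate_by_period(PERIODS, per_run)
for (project, period), total in sorted(per_period.items()):
    print(f"{project}.{period}: {total}")
```

Keeping the Run step separate means the same run totals can feed both Run-level and Period-level reports.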

8 Period Menu: https://atlas-tagservices.cern.ch/RBR/rBR_Period_Report.php
Purpose: overview of all DataPrep-defined Periods, giving links to reports of general info about their Runs.
Choose the Period of interest:
- By year, e.g. all '2011', or for 'all years'
- By project, e.g. 'data12_8TeV'
- By beam energy or type, e.g. '7TeV'
- By specific Period or group: click on the project and then on the Period of interest
General features of COMA Reports:
- "Highlighted" links open expanding sections
- Help, Doc links
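Selecting by year or by beam energy relies on information carried in the project name. The Python sketch below assumes, from examples such as 'data10_7TeV' and 'data12_8TeV', that a project name is a data/mc prefix, a two-digit year, and an energy or beam-type label; that pattern is an inference for illustration, not the official nomenclature rule.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from names such as 'data10_7TeV' and 'data12_8TeV'.
PROJECT_RE = re.compile(r"(data|mc)(\d{2})_(\w+)")


class Project(NamedTuple):
    kind: str    # 'data' or 'mc'
    year: int    # four-digit year, e.g. 2012
    energy: str  # beam energy or type label, e.g. '8TeV'


def parse_project(name: str) -> Optional[Project]:
    """Split a project name into kind, year and energy; None if it does not match."""
    m = PROJECT_RE.fullmatch(name)
    if m is None:
        return None
    kind, yy, energy = m.groups()
    return Project(kind, 2000 + int(yy), energy)


def select_projects(names, year=None, energy=None):
    """Keep the project names matching the optional year and energy filters."""
    selected = []
    for name in names:
        p = parse_project(name)
        if p is None:
            continue
        if year is not None and p.year != year:
            continue
        if energy is not None and p.energy != energy:
            continue
        selected.append(name)
    return selected
```

For example, `select_projects(["data10_7TeV", "data11_7TeV", "data12_8TeV"], energy="7TeV")` keeps the 2010 and 2011 projects.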

9 Highlighted links: show/hide period members (the members of data12_8TeV.A are A1-A8)
- Links: to COMA, RunQuery, AMI container production
- Header: input criteria
- Links in table column headers: short description of the column
- Hovering on a link indicates what will happen
Note: some columns were removed using the "customize report" feature (not shown).

10 Trigger Intro
- "Event": detector output during a single particle bunch crossing
- "Lots": the LHC maximum particle bunch crossing rate is 31.6 MHz
- "Fewer": a few hundred events per second
- "Trigger" is a multi-component selection filter for events:
  - ATLAS detector hardware/electronics: many subsystems … TDAQ
  - ATLAS software: HLT release
    - Mostly C++ algorithms collected in a specific ATLAS software release
    - Executed by the HLT (2nd and 3rd trigger levels)
  - Trigger Menu: defines ~500 to 1000 triggers
    - Every distinct Menu is assigned a unique integer ID: the SMK (Super Master Key)
    - Configurable input to the trigger hardware and software
    - Specifies what logic or algorithms to execute, including configurable parameters (e.g. thresholds)
    - Assigns each trigger to one or more output Streams
    - The Menu (SMK) is FIXED during each Run (prescales excluded)
  - Each trigger has 3 levels (Level 1, HLT: L2, HLT: EF), each of which passes OR fails
    - Each Event either passes or fails each Trigger
  - Prescales: a blind filter applied by TDAQ when the trigger logic above does not sufficiently reduce the event output rate
    - Prescales can change during a Run (on LB boundaries)
    - Integer identifiers are assigned to sets of prescales: Level 1 and HLT Prescale Keys
- An Event is recorded for offline physics analysis if it passes at least one trigger (and its prescale)
[Diagram: "Lots" of "Events" → Trigger (Level1, HLT: L2, HLT: EF) → "Fewer" but more interesting Events]
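The last two rules on this slide (a prescale blindly thins events that already passed the trigger logic, and an event is kept if any trigger passes including its prescale) can be sketched in Python. The counter-based prescale and the treatment of values below 1 as "disabled" are assumptions for illustration; they do not describe the actual TDAQ implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PrescaledTrigger:
    """A trigger whose raw decision is thinned by an integer prescale N.

    Assumption for illustration: a simple counter keeps every N-th event
    that passes the trigger logic; a prescale below 1 disables the trigger.
    """
    name: str
    prescale: int
    _count: int = field(default=0, repr=False)

    def accept(self, raw_pass: bool) -> bool:
        if not raw_pass or self.prescale < 1:
            return False
        self._count += 1
        return self._count % self.prescale == 0


def record_event(triggers, raw_decisions) -> bool:
    """Record the event if at least one trigger passes, including its prescale."""
    # Build the full list first so every prescale counter sees the event,
    # instead of letting any() stop at the first accepting trigger.
    results = [t.accept(raw_decisions.get(t.name, False)) for t in triggers]
    return any(results)


menu = [PrescaledTrigger("EF_e9_tight_e5_tight_Jpsi", 1),
        PrescaledTrigger("EF_g20_loose", 3)]
kept = sum(record_event(menu, {"EF_g20_loose": True}) for _ in range(6))
print(kept)
```

With a prescale of 3, six events that pass only `EF_g20_loose` yield two recorded events.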

11 Trigger Metadata: just the tip of the iceberg
- Highest-level Trigger Configuration metadata:
  - SMK trigger chains: EF chain, L2 chain, L1 item
    - Names, versions, bit assignments, Streams, ReRun
  - LVL1, HLT Prescale Keys:
    - EF, L2, L1 prescales
    - EF, L2 passthrough values
  - COMA stores this metadata and combines it with Period, Run and Lumi data to provide unique reports.
- Details behind the Trigger Configuration and what is stored event-wise need tools from the Trigger Experts:
  - Understanding trigger execution and info storage
    - Algorithms, cuts, multiplicities, bunch groups
    - Dead-time veto, BCID / Train / Lumi dependence
  - Trigger objects related to trigger decisions
  - HLT algorithm error codes
  - Trigger EDM and the Trigger Decision Tool
  - How to work with Chain Groups (trigger 'OR's)
  - See the trigger-related talks in the Software Tutorials: https://indico.cern.ch/conferenceDisplay.py?confId=212225

12 COMA Chain Wildcard Reports
Examples: L1_2EM*_MU* (over all periods), EF_*ZEE*

13 Chain report sections (example: EF_g20_loose)
1. Configuration Summary: shows where this element is configured:
   - Super Master Key(s)
   - Project (Summary)
2. Period Evolution: shows the chain/item bit and version evolution for EF_g20_loose chains during Period Runs
3. Activation Summary: shows the Runs where this chain is "active"
   - via prescale
   - via pass-through
   - via rerun

14 COMA Chain Report (EF_e9_tight_e5_tight_Jpsi): expanded Run-wise Activation …
- "Physics" EF-L2-L1 signatures
- Active via prescale
- Runs in Data Periods: 70 Period Runs where this chain is "active"
The table shows (with Run count):
- Periods; links: Run, SMK Reports
- Level bit assignments; links to Chain/Item Reports (3)
- Range of aggregate prescale while the chain is active via prescale in the Run; links: COMA Prescale Report (3)
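The "range of aggregate prescale" column can be illustrated with a short Python sketch. It assumes that the aggregate prescale of an EF-L2-L1 chain is the product of its three level prescales and that a value below 1 at any level means the chain is disabled for that prescale set; both are assumptions here, not statements of how COMA computes the column.

```python
from math import prod


def aggregate_prescale(l1, l2, ef):
    """Aggregate prescale of an EF-L2-L1 chain.

    Assumption for illustration: the product of the three level prescales,
    with any value below 1 meaning the chain is disabled (returns None).
    """
    levels = (l1, l2, ef)
    if any(ps < 1 for ps in levels):
        return None
    return prod(levels)


def aggps_range(prescale_sets):
    """(min, max) aggregate prescale over the prescale sets used in one run.

    `prescale_sets` holds one (L1, L2, EF) tuple per prescale key used in
    the run (prescales can change on LB boundaries); disabled sets are skipped.
    """
    values = [v for v in (aggregate_prescale(*ps) for ps in prescale_sets)
              if v is not None]
    if not values:
        return None
    return min(values), max(values)
```

For a run whose prescale sets were (1, 1, 1), (2, 1, 5) and one disabled set, the sketch reports the range (1, 10).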

15 COMA Chain Report (EF_e9_tight_e5_tight_Jpsi): Period Evolution section …
- For "Physics" chains in Period Runs
- Separate table for each EF-L2-L1 signature at each beam energy
Each table shows:
- Row-wise: a distinct set of bit and chain/item versions
- Columns:
  - Bit assignments
  - Chain version (links to Trig diff)
  - Chain Report links
  - Range of AggPS, SMK, Run, Period, Date, HLTRelease
Thanks to Tomasz and Joerg for many useful discussions.
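The row structure of the Period Evolution table (one row per distinct set of bits and chain version, with ranges of Run and SMK) is essentially a group-by. A minimal Python sketch, where the `ChainConfig` record and its field names are hypothetical:

```python
from typing import NamedTuple


class ChainConfig(NamedTuple):
    """Hypothetical per-run configuration record for one chain signature."""
    run: int
    smk: int
    period: str
    ef_bit: int
    l2_bit: int
    l1_bit: int
    chain_version: int


def period_evolution(configs):
    """One row per distinct (bits, version) set, with Run/SMK ranges and Periods."""
    groups = {}
    for c in configs:
        key = (c.ef_bit, c.l2_bit, c.l1_bit, c.chain_version)
        g = groups.setdefault(key, {"runs": [], "smks": set(), "periods": set()})
        g["runs"].append(c.run)
        g["smks"].add(c.smk)
        g["periods"].add(c.period)
    rows = [
        {
            "bits_and_version": key,
            "run_range": (min(g["runs"]), max(g["runs"])),
            "smk_range": (min(g["smks"]), max(g["smks"])),
            "periods": sorted(g["periods"]),
        }
        for key, g in groups.items()
    ]
    # Order rows chronologically by their first run.
    return sorted(rows, key=lambda r: r["run_range"][0])
```

Each returned row corresponds to one line of the table; a change of bit or version starts a new row.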

16 COMA Chain Wildcard Report (input: "EF_g10%")
Purpose: see all the names matching a pattern, or find an exact name from part of the name.
Report: displays the chain/item names matching the input string … text size proportional to occurrence in SMKs
- In Period Runs and in all Runs
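The wildcard reports accept both `*` (as in "L1_2EM*_MU*") and the SQL-style `%` (as in "EF_g10%"). A Python sketch, assuming both characters mean "any run of characters" and that the occurrence count is the number of SMK menus containing each name; the `menus` mapping is hypothetical.

```python
import re
from collections import Counter


def wildcard_to_regex(pattern: str):
    """Translate a chain-name pattern using '*' or '%' into a compiled regex.

    Assumption: '*' and '%' both match any run of characters; every other
    character is matched literally.
    """
    parts = [".*" if ch in "*%" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def match_counts(pattern, menus):
    """Count, for each matching chain name, how many SMK menus contain it.

    `menus` maps a (hypothetical) SMK number to the chain names in that menu.
    """
    rx = wildcard_to_regex(pattern)
    counts = Counter()
    for names in menus.values():
        for name in set(names):
            if rx.fullmatch(name):
                counts[name] += 1
    return counts
```

A display could then scale each name's text size by `count / max(counts.values())`.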

17 Summary and Plans
- COMA: an integral part of the ATLAS metadata infrastructure
  - Essential to ATLAS event-level metadata decoding
  - Ideally placed to provide links and interfaces to other metadata
    - Special relationship to AMI (and the TAG catalog)
    - Launch iELSSI to take a quick look at any Run
  - Primary source for "ATLAS Data Periods"
    - Periods in Lumi, DQ, Run Summary and AMI reports come from COMA
  - Reports feature "derived" information not available elsewhere
    - Trigger experts recommend the COMA Trigger/Prescale reports
  - Report usage: from ~200 to over 5000 pages viewed/month
    - Peaked in July as users made final preparations for the summer conferences
- Current efforts:
  - COMA database content growing
    - Watching use cases to identify new areas on which to focus growth
  - COMA Report and Browser development
    - Keeping pace with content, improving functionality and usability
- Beyond COMA: interface development
  - Connecting COMA to other parts of the infrastructure

18 COMA Conclusions
- This is an evolving system … the information in it grows based on the information available and on use cases
  - Adding more dimensions to the Conditions data
  - With suitable relationships to facilitate queries
  - Making those criteria available in dynamic, usable interfaces
- We want to ensure the metadata is
  - complete enough to satisfy use cases, while
  - accurately reflecting its limitations
- Interfaces are being constructed to use the selection syntax, criteria, and communication in common use in ATLAS
  - This facilitates cross-checks with other systems
- Continuous process: talking with various experts to ensure data integrity, completeness, and compatibility with other systems …
Very positive feedback so far … more is always welcome: hn-atlas-physicsMetadata@cern.ch

19 COMA multi-Run report: Latest 6 runs
Shows "at a glance" the latest Period Runs with magnet states, 'ready fraction', link to Stable Beam fill(s), beam information …

20 COMA Period Documentation Report: enhanced content (new aggregated information)

21 COMA Period Documentation Report: enhanced content (new aggregated information)

22 COMA multi-Run report: Latest 8 runs
Shows "at a glance" the latest Period Runs with magnet states, 'ready fraction', link to Stable Beam fill(s), beam information …

23 Summary and Conclusions
- A lot of progress in many areas using metadata:
  - Transforms, data processing, dataset-related metadata
  - Dedicated metadata catalogs: AMI, COMA, (TAGs)
- Metadata in ATLAS continues to evolve
  - Naming conventions/rules
    - Important to form a coherent view over datasets, runs, periods, …
  - Increased cooperation between systems, upstream and downstream
  - Use cases continue to expand
  - Improvements in metadata storage, consistency, delivery, and usage
- Challenges ahead
  - Offer coherence at management and user levels
  - Keep pace with
    - system evolution (such as DDM → Rucio, ProdSys, … upgrades)
    - analysis pattern evolution and use cases

24 Metadata Issues: Dataset Names
- Dataset names are used extensively:
  - Storage and operating systems, DDM, ProdSys, metadata repositories
  - But they need to be mnemonic from the user's point of view
- Dataset naming rules: http://cdsweb.cern.ch/record-restricted/1070318/
  - Carefully defined by experts, evolved somewhat, and have served us well
  - But last updated in 2010 … the needs of ATLAS have grown
- 2012: Task Force formed to amend the rules to address these needs
  - https://twiki.cern.ch/twiki/bin/viewauth/Atlas/DatasetNomenclature
- Overall length < 231 characters (base directory name): hard limit!
  - If every field is at its field limit, the overall limit is exceeded!
- Many pressures on component lengths … highest areas of concern:
  - "physicsShort" – for MC datasets
  - AMI tag – for both data and MC
- Importance of the name: coherence must be understood at all ATLAS levels
  - From management to users … and sometimes limits are good
  - Keep a rational balance!!!
Dataset name formats:
- MC: Project.datasetNumber.physicsShort.productionStep.dataType.AMITag[/]
- Data: Project.runNumber.streamType.productionStep.dataType.AMItag[/]
  - Project: dataNN_* or mcNN_*; dataType: ESD, AOD, …; AMI tag: concatenation of configurations
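A minimal Python sketch of the two formats above: split a name into its six dot-separated fields and enforce the overall length limit. Counting the whole string as given (including any trailing '/') and reading the optional '[/]' as marking a container are assumptions here; the example name used in the check is made up.

```python
from typing import NamedTuple

MAX_NAME_LENGTH = 231  # overall length must be < 231 characters (hard limit)


class DatasetName(NamedTuple):
    project: str          # dataNN_* or mcNN_*
    number: str           # datasetNumber (MC) or runNumber (data)
    descriptor: str       # physicsShort (MC) or streamType (data)
    production_step: str
    data_type: str        # ESD, AOD, ...
    ami_tag: str          # concatenation of configurations
    is_container: bool    # assumed meaning of the optional trailing '/'


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name into its six fields after checking the length limit."""
    if len(name) >= MAX_NAME_LENGTH:
        raise ValueError(f"{len(name)} characters; the limit is < {MAX_NAME_LENGTH}")
    is_container = name.endswith("/")
    fields = name.rstrip("/").split(".")
    if len(fields) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(fields)}")
    return DatasetName(*fields, is_container)


# Hypothetical example name, for illustration only.
d = parse_dataset_name("mc12_8TeV.123456.ExampleShort.merge.AOD.e1_s2_r3/")
print(d.descriptor, d.ami_tag, d.is_container)
```

Because the AMI tag uses underscores rather than dots, a plain split on '.' is enough to separate the fields.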

25 "PhysicsShort" for MC: Project.datasetNumber.physicsShort.productionStep.dataType.AMITag[/]
- Example of one proposed physicsShort: MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_SingleTopTChanWelenu_LeptonFilter
  - Rules: "physicsShort field must not exceed 40 characters."
  - This one: 78 characters (and is it really user friendly?)
- This kind of 'growth' is oblivious to the rules; it shows how addicted experts/users are to depending entirely on the dataset name to identify/find their data
- General frustration: finding the MC needed, TWiki pages, understanding the MC they use, and identifying additional MC samples they need or what exists …
  - Jamie Boyd: "General feeling is this level of info should be encoded in AMI rather than the filename – need to follow up with generators group on this"
- Progress in 2012:
  - Commendable effort by MC Coordination to add more metadata to AMI
  - "Simulation Metadata Workshop" held in April 2012
- Metadata systems need to provide better tools which
  - better explain/relay the metadata behind the dataset AND
  - better allow browsing of the datasets and the metadata
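The length arithmetic on this slide can be checked mechanically. This Python sketch holds only the two field limits quoted in this talk (physicsShort: 40, AMI tag: 22); the `FIELD_LIMITS` table and `check_field` helper are illustrative, not part of any ATLAS tool.

```python
# Field limits quoted in this talk; the other field limits are not listed here.
FIELD_LIMITS = {"physicsShort": 40, "AMITag": 22}


def check_field(field_name: str, value: str):
    """Return (within_limit, length) for one dataset-name field."""
    limit = FIELD_LIMITS[field_name]
    return len(value) <= limit, len(value)


proposed = ("MadgraphPythia8_NNPDF21NLOME_AU2NNPDF21LOMPI_"
            "SingleTopTChanWelenu_LeptonFilter")
ok, n = check_field("physicsShort", proposed)
print(ok, n)  # False 78
```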

26 How does Metadata Help?
- AMI: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/AtlasMetadataInterface
  - AMI portal page: http://ami.in2p3.fr/
- A means of finding datasets to use in analysis using physics metadata predicates
  - Not just dataset names, but also the underlying metadata
- AMI contains a LOT more than just a list of datasets
  - Dataset provenance
  - Files
  - Lost lumi blocks
  - Links to other applications
  - Nomenclature reference tables
  - Connection to COMA and all its data
- New AMI interface: dataset 'browsing' (hierarchical search)
  - Now available to users (first version!)
  - Good feedback from users … important for evolution
  - The AMI team is refining this tool based on feedback
  - Adding available metadata coherently: always a challenge

27 AMI Dataset Browsing
- The user is guided to the AMI catalog specific to the project of interest
  - Information varies according to project
- Allows progressive selection to iteratively narrow the result set
- This is a working/evolving example … the major point is:
  - Always open to ideas for new interfaces using the wealth of metadata that exists
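Progressive, hierarchical selection can be sketched as repeated filtering, where at each step the browser shows the values still available for the next field. The catalog records and their keys below are hypothetical and do not reflect the AMI schema or API.

```python
from collections import Counter

# Hypothetical dataset metadata records (not the AMI schema).
CATALOG = [
    {"name": "ds1", "project": "mc12_8TeV", "data_type": "AOD", "generator": "Pythia8"},
    {"name": "ds2", "project": "mc12_8TeV", "data_type": "ESD", "generator": "Pythia8"},
    {"name": "ds3", "project": "mc12_8TeV", "data_type": "AOD", "generator": "Herwig"},
    {"name": "ds4", "project": "mc11_7TeV", "data_type": "AOD", "generator": "Pythia8"},
]


def narrow(datasets, **criteria):
    """Keep the datasets whose metadata match every key=value criterion."""
    return [d for d in datasets if all(d.get(k) == v for k, v in criteria.items())]


def facet(datasets, key):
    """Count the values of `key` in the current result set (the next menu)."""
    return Counter(d.get(key) for d in datasets)


step1 = narrow(CATALOG, project="mc12_8TeV")
print(facet(step1, "data_type"))
step2 = narrow(step1, data_type="AOD")
step3 = narrow(step2, generator="Pythia8")
print([d["name"] for d in step3])
```

Showing the facet counts before each step is what lets a user narrow the set without knowing exact dataset names in advance.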

28 Critical Component: "Transforms" and Metadata
- "ATLAS Transforms": a wrapper around Athena & Python job options
  - Thanks to the Transform Group! Graeme Stewart, Stephen Beal, Thomas Gadfort, Harvey Maddocks, Bjorn Sarrazin
  - See Graeme's talk during Software Week
- Required, for example, by the ATLAS production system
- Provide uniform, coherent mechanisms for specifying and executing tasks
  - Even multi-step transforms
- New Transforms:
  - General merging capabilities
    - Also needed for the merging of file-based metadata
  - Provide important computations, such as event counts
  - Bridge the gap in metadata communication: uniform information transfer to other systems and metadata repositories
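When a transform merges files, their file-level metadata must be merged too. A minimal Python sketch, assuming hypothetical metadata keys: event counts are summed, lumi blocks are unioned, and the release must agree across inputs. The real transform metadata format is not described in this talk.

```python
def merge_file_metadata(inputs):
    """Merge per-file metadata dicts into one record for the merged output.

    Assumed (hypothetical) keys: 'nevents' (summed), 'lumi_blocks' (set
    union) and 'release' (must be identical for all inputs).
    """
    if not inputs:
        raise ValueError("nothing to merge")
    releases = {m["release"] for m in inputs}
    if len(releases) != 1:
        raise ValueError(f"inputs come from different releases: {sorted(releases)}")
    return {
        "nevents": sum(m["nevents"] for m in inputs),
        "lumi_blocks": set().union(*(m["lumi_blocks"] for m in inputs)),
        "release": releases.pop(),
    }


a = {"nevents": 100, "lumi_blocks": {1, 2}, "release": "17.0.3.3"}
b = {"nevents": 50, "lumi_blocks": {2, 3}, "release": "17.0.3.3"}
merged = merge_file_metadata([a, b])
print(merged["nevents"], sorted(merged["lumi_blocks"]))
```

Refusing to merge inputs with inconsistent values is one way to keep the merged metadata trustworthy for downstream repositories.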

29 Summary and Conclusions
- There are significant challenges ahead
  - LS1 planning is well underway
    - With a longer-term view: we hope it will handle future data volumes
  - Many major systems need to evolve in major ways
    - Take advantage of accumulated experience and new technology
    - While maintaining operations
  - Maintaining the experts we need!!!
- Metadata in ATLAS continues to evolve
  - Naming conventions/rules
    - Important to form a coherent view over datasets, runs, periods, …
  - Increased cooperation between systems, upstream and downstream
  - Use cases continue to expand
  - Improvements in metadata storage, consistency, delivery, and usage

30 2nd issue in Dataset Names: AMI ("Config") Tags
- The AMI tag:
  - Definition: composed of concatenated strings encoding processing steps
  - Example: r2713_p705 … encodes information about
    - which ATLAS releases (17.0.3.3)
    - which database releases (16.9.1.1)
    - which transforms (reco_trf.py), job configurations, …
  - Why is it called the "AMI tag"?
    - AMI provides interfaces for its interpretation
- Rules for AMI tags are also listed in the Nomenclature document
  - The original specification now also needs revision
  - The maximum length sometimes exceeds the limit (22) – multiple factors drive this …
- Some issues to be addressed:
  - Running out of lower-case letters
  - Numeric parts … require more characters (99 … 999 … 9999 …?)
  - More processing/merging steps add more fields
  - Must find a way to consolidate steps in a managed way
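The structure of an AMI tag and its length problem can be shown in a few lines of Python. The sketch assumes each underscore-separated component is one lower-case letter followed by digits, as in the examples r2713, p705, e1494 and t85; it does not interpret what each letter means.

```python
import re

AMI_TAG_MAX = 22  # length limit quoted on the slide
STEP_RE = re.compile(r"([a-z])(\d+)")  # assumed: one letter, then digits


def parse_ami_tag(tag: str):
    """Split an AMI tag such as 'r2713_p705' into (letter, number) steps."""
    steps = []
    for part in tag.split("_"):
        m = STEP_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"unexpected AMI tag component: {part!r}")
        steps.append((m.group(1), int(m.group(2))))
    return steps


def within_limit(tag: str) -> bool:
    return len(tag) <= AMI_TAG_MAX


print(parse_ami_tag("r2713_p705"))
print(within_limit("e1494_s1499_s1504_r3658_r3549_t85"))
```

Each extra processing step adds an underscore plus at least two characters, so multi-step MC chains quickly pass 22 characters.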

31 AMI tags: Evolution
- AMI tag issues are also being discussed in the Nomenclature Task Force
  - Solveig Albrand: evolving document describing issues/possibilities: https://twiki.cern.ch/twiki/pub/Atlas/DatasetNomenclature/AMItags.pdf
- The scope and use of AMI tags have turned out to be much wider than the original design anticipated
- When considering all issues: a major phase change is required
  - A step-wise solution (case-by-case concatenation of some parts of the AMI tag but not others) would be a long-term mistake: confusing, a waste of developer time, inevitably incomplete
- Example proposal: AMI tag "e1494_s1499_s1504_r3658_r3549_t85" would become "mc1201234_t85"
  - where "mcYYnnnnn" stands for e1494_s1499_s1504_r3658_r3549 and would be substituted for the AMI tag used to produce the merged AOD: the nnnnn-th chain for the mcYY data
- This is under discussion. A complete set of rules will be written and proposed for approval by Data Preparation
  - The proposal must include how other systems cope with the change, and take advantage of it
  - Describe the interfaces: help users understand the underlying information
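The proposal can be illustrated with a small Python registry that assigns "mcYYnnnnn" aliases to full processing chains and substitutes them into longer tags. The `ChainRegistry` class and its sequential numbering are hypothetical; the actual rules were still under discussion.

```python
class ChainRegistry:
    """Hypothetical registry mapping AMI tag chains to 'mcYYnnnnn' aliases."""

    def __init__(self, year: int):
        self.yy = year % 100
        self._aliases = {}

    def register(self, chain: str) -> str:
        """Return the alias for `chain`, assigning the next number if it is new."""
        if chain not in self._aliases:
            n = len(self._aliases) + 1
            self._aliases[chain] = f"mc{self.yy:02d}{n:05d}"
        return self._aliases[chain]

    def compress(self, tag: str) -> str:
        """Replace the longest registered leading chain in `tag` by its alias."""
        parts = tag.split("_")
        for cut in range(len(parts), 0, -1):
            prefix = "_".join(parts[:cut])
            if prefix in self._aliases:
                return "_".join([self._aliases[prefix]] + parts[cut:])
        return tag


reg = ChainRegistry(2012)
reg.register("e1494_s1499_s1504_r3658_r3549")
print(reg.compress("e1494_s1499_s1504_r3658_r3549_t85"))
```

Here the first registered chain becomes mc1200001; on the slide, mc1201234 would be the 1234th chain registered for mc12.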

33 Overview of Plans for LS1
Sept 2012: DB coordination asked all database developers about their plans for LS1:
- Plans to modify in any way the use of central (Oracle) databases
- Needs to scale up Oracle data sizes and/or load in Run 2
- Intentions to move any activities to Hadoop
  - Foreseen load (data and CPU) for the Hadoop applications
- Requests for web servers or centrally managed machines
Sub-system plans for change vary widely:
- From NONE to major changes in storage
- TOO many to list individually here
Responses/plans collected in TWikis:
- DatabasesLS1Planning (All)
- LS1ConditionsDB
- CompUpgPlanDistriComputing (ADC)
- DCS workshop (PVSS): https://indico.cern.ch/conferenceDisplay.py?confId=208712
- Some details also in the talks of the SC Week DB session: https://indico.cern.ch/conferenceDisplay.py?confId=169697

34 Metadata tools: now → upgrade
- Users need appropriate tools to find, understand, process, and analyse the data they need to produce results
  - The increase in data rate will make this even more critical → improve and expand the use of metadata tools
- AMI, COMA, and TAG systems are currently undergoing a lot of growth and evolution, driven by use cases arising with existing data
- The 2012 data volume is forcing changes
  - Heavy Ion processing MUST use TAGs: currently in use
  - Group processing is also testing TAG usage
  - TIM workshop: many jobs peeking at files, but reading no events?
    - Better usage of metadata might eliminate the need to provide/access unneeded files
    - Reduce unneeded use of grid resources
- Recent TAG "Brainstorming" (November 2012): https://indico.cern.ch/conferenceDisplay.py?confId=215781
  - Collect feedback from users and experts
  - Identify issues and use cases
- Parallel efforts:
  - Keep the system running while improving existing TAG performance
  - Look into possible use of alternative storage (Hadoop / HBase)
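The point about jobs opening files without reading any events can be made concrete: if event-level TAG records are queried first, only files holding selected events need to be opened. The record layout and the `files_to_read` helper below are hypothetical, not the TAG database schema or its services.

```python
from collections import defaultdict


def files_to_read(tag_records, predicate):
    """Map each file to the (run, event) pairs in it that pass `predicate`.

    `tag_records` is an iterable of hypothetical event-level records: dicts
    with 'file', 'run', 'event' and physics attributes. Files with no
    selected event are absent from the result, so a job need not open them.
    """
    selection = defaultdict(list)
    for rec in tag_records:
        if predicate(rec):
            selection[rec["file"]].append((rec["run"], rec["event"]))
    return dict(selection)


records = [
    {"file": "f1.root", "run": 200804, "event": 1, "n_muons": 2},
    {"file": "f1.root", "run": 200804, "event": 2, "n_muons": 0},
    {"file": "f2.root", "run": 200804, "event": 3, "n_muons": 1},
]
print(files_to_read(records, lambda r: r["n_muons"] >= 2))
```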

35 Cartoon Break: Cycles of Change
- Must recognize: any change is painful for users
  - Disruptive to workflow; their immediate interest is to get results out quickly
- Any change must, in parallel, come with the tools they need to "GET OVER IT"
  - It helps the process if we provide MORE of the information they need

36 Challenges of building a New World
- New/replacement systems require:
  - Motivation: "why do we need that?"
  - Vision (long term)
  - Resources: developers, infrastructure
  - New/improved technology, and knowledge of how/when to use it
- Existing data/systems are
  - A blessing
    - Reflect real usage
    - Populate the new system with real data
  - A curse
    - Maintain existing operations
    - LOTS of real data
    - Backward compatibility
- Carries inherent risks (failure) and rewards (a better world)

