AMI S.A. Datasets… Solveig Albrand

A set is…
A number of things grouped together according to a system of classification, or conceived as forming a whole.
A number of things connected in temporal or spatial succession, or by natural production or formation.
A collection of instruments, tools, or machines used together in a particular operation.
(Just a few of the definitions of “set” in the Shorter Oxford Dictionary.)

Applied to ATLAS
Our production is TDAQ or Monte-Carlo.
Our operation is moving data from one ATLAS site to another.
An ATLAS dataset is a number of files which have been produced together, or which are usefully grouped together for transport.

The things we put in the sets
Our things are in general files (usually of binary data, but not always).
What we really want out of the datasets is not the files themselves but the events in the files; it’s just that we have to transport files.
The connection between files and events is quite “natural” in Monte Carlo production.

Dataset Definition Document
“A set of data produced under the same logical conditions and is a minimal portion of data movable across GRID by ATLAS Distributed Data Management system, and is expected to consist of uniform files suitable for processing with the same application in the transformation chain.”
(ATLAS Dataset Definition Document)

Monte-Carlo Production
[Diagram: TASK (EVGEN) and TASK (SIMUL), with datasets labelled EVNTS, HITS and LOG. Task = “a set of jobs”.]

Notion of “Task”
A “Task” is a transformation of the events in one or more datasets of a given type into one or more datasets of other types, which are usually (but not necessarily) different from the input type.
Note that if more than one type is produced by a task, then we define an output dataset for each type, because the input of a succeeding step will be defined as a unique type.
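The rule of one output dataset per produced type can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the dataset naming scheme and the example names are hypothetical and are not taken from AMI or the ATLAS production system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Dataset:
    name: str       # hypothetical name, not a real ATLAS dataset name
    data_type: str  # e.g. "EVNTS", "HITS", "LOG"


@dataclass
class Task:
    name: str
    inputs: List[Dataset]
    output_types: List[str]
    outputs: List[Dataset] = field(init=False)

    def __post_init__(self):
        # One output dataset is defined for each type the task produces.
        self.outputs = [Dataset(f"{self.name}.{t}", t) for t in self.output_types]

    def output_of_type(self, data_type: str) -> Dataset:
        # A succeeding step takes its input as a single, unique type.
        matches = [d for d in self.outputs if d.data_type == data_type]
        if len(matches) != 1:
            raise LookupError(f"{self.name} has no unique {data_type} output")
        return matches[0]


# Assumed example: a simulation task reading one EVNTS dataset.
events = Dataset("example.evgen.EVNTS", "EVNTS")
simul = Task("example.simul", inputs=[events], output_types=["HITS", "LOG"])
hits = simul.output_of_type("HITS")
```

Because each output dataset holds a single type, a following task can name its input dataset without having to filter files by type.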

AMI Provenance Diagram

What about real data?
Discussions are ongoing about how datasets will be formed for real data, and even for commissioning.
For the CTB in 2004: 1 run = 1 dataset of “RAW” type; then, from each “RAW” dataset, several “recon” tasks produced ESD. This was in pre-DDM days.

DDM requires
“A set of data produced under the same logical conditions and is a minimal portion of data movable across GRID by ATLAS Distributed Data Management system.”
It seems that one CSC run is too small to be moved across the grid, so several runs are grouped together according to their metadata.
New VERSIONS of the dataset must be defined as runs become validated, or not.
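A minimal sketch of the two ideas on this slide, grouping runs by shared metadata and defining a new dataset version when the set of validated runs changes. The metadata keys ("detector", "run_type"), the run numbers and the class name are assumptions made for illustration only.

```python
from collections import defaultdict


def group_runs(runs, keys):
    """Group run numbers whose metadata agree on the given keys."""
    groups = defaultdict(list)
    for run in runs:
        groups[tuple(run[k] for k in keys)].append(run["run_number"])
    return dict(groups)


class VersionedDataset:
    """Keeps every version of a dataset's content (a set of run numbers)."""

    def __init__(self, name):
        self.name = name
        self.versions = []

    def update(self, run_numbers):
        # A new version is defined only when the content really changes.
        content = frozenset(run_numbers)
        if not self.versions or self.versions[-1] != content:
            self.versions.append(content)
        return len(self.versions)  # current version number


# Hypothetical run records.
runs = [
    {"run_number": 1, "detector": "larg", "run_type": "Pedestal"},
    {"run_number": 2, "detector": "larg", "run_type": "Pedestal"},
    {"run_number": 3, "detector": "tile", "run_type": "Pedestal"},
]
groups = group_runs(runs, ("detector", "run_type"))

dataset = VersionedDataset("larg.Pedestal")
v1 = dataset.update([1])      # only run 1 validated so far
v2 = dataset.update([1, 2])   # run 2 becomes validated: new version
v3 = dataset.update([2, 1])   # same content: no new version
```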

Tiles & Larg
[Figure: green runs, blue runs and red runs; labels: larg BarrelP3C.Pedestal.high.v, larg EC_Installation.Trigger.high.v000001]

How will the datasets be formed?
TDAQ will write a certain amount of metadata into the header of each file.
Probably this should be written into a database also. Surely we should not have to open each file to decide which dataset it belongs to?
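To illustrate why a database copy of the header metadata helps, here is a hedged sketch using an in-memory SQLite table. The table layout, column names, file names and the way a dataset name is derived are all hypothetical; the slides do not specify any of them.

```python
import sqlite3


def build_catalogue(records):
    """Store (file_name, detector, run_type) records in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        "file_name TEXT PRIMARY KEY, detector TEXT, run_type TEXT)"
    )
    conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?)", records)
    return conn


def dataset_for(conn, file_name):
    """Decide dataset membership from the database, without opening the file."""
    row = conn.execute(
        "SELECT detector, run_type FROM file_metadata WHERE file_name = ?",
        (file_name,),
    ).fetchone()
    if row is None:
        return None
    return ".".join(row)  # assumed naming scheme: detector.run_type


conn = build_catalogue([
    ("file_a.data", "larg", "Pedestal"),
    ("file_b.data", "tile", "Trigger"),
])
name = dataset_for(conn, "file_a.data")
```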

Event Collections
Cf. Caitrina’s talk yesterday.
We are interested in the events, but we can only transport events in files. The files should be transparent to the user.
Note that the SAME set of files can be required for several DIFFERENT event collections. (How will we tell DDM this? Perhaps we don’t need to.)
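The point that different event collections can need the same files can be shown with a small sketch. Here an event reference is assumed to be a pair (file name, event index within the file); that representation and the example values are hypothetical.

```python
def files_for(collection):
    """Return the set of files that must be transported for a collection."""
    return {file_name for file_name, _event_index in collection}


# Two different selections of events (hypothetical references).
collection_a = [("file1", 0), ("file2", 5)]
collection_b = [("file1", 3), ("file2", 7), ("file2", 8)]

same_files = files_for(collection_a) == files_for(collection_b)
```

The collections differ event by event, yet the transport request, expressed in files, is identical for both.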

[Figure: 2 collections, same set of files]

Other datasets
For Monte-Carlo production, to get the “cross-section” calculated by EVGEN we need to parse the log files (done by AMI). We need only look at one log file per task.
Either get ALL evgen logs for all evgen tasks,
OR make a “secondary” dataset containing the first two evgen log files of each task’s primary evgen log dataset, and open a subscription to this dataset on some site accessible to AMI.
Actually, even doing this we end up transporting rather more than we need to, because the “log” datasets contain the whole sandbox, and we only need the “log” file output by the job.
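The “secondary” dataset option can be sketched as a simple selection over the primary log datasets. The mapping from task name to log file names, and the names themselves, are assumptions for illustration; the actual DDM subscription step is not modelled here.

```python
def secondary_log_dataset(primary_logs, per_task=2):
    """Pick the first `per_task` log files of each task's primary log dataset."""
    secondary = []
    for task in sorted(primary_logs):
        secondary.extend(primary_logs[task][:per_task])
    return secondary


# Hypothetical primary evgen log datasets, keyed by task name.
primary_logs = {
    "evgen_task_A": ["A.log.1", "A.log.2", "A.log.3"],
    "evgen_task_B": ["B.log.1"],
}
secondary = secondary_log_dataset(primary_logs)
```

Taking two files rather than one leaves a spare if one log cannot be read, while still moving far fewer files than the full set of evgen logs.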

Conditions DB
Some trials have been made of transporting snapshots of the conditions DB to ATLAS sites, using DDM.

How many Datasets are we expecting?
We used the computing model in 2 ways:
Raw data + analysis model → 128 million
Storage estimate → N files ( ), then nFiles/dataset → 22 million
But 42 million is just as good an answer as any…