David Adams Brookhaven National Laboratory September 28, 2006

Presentation transcript:

Data validation
DDM Workshop, BNL
David Adams
Brookhaven National Laboratory
September 28, 2006
Updated September 27, 2006

Contents
- Goals
- Publication
- Issues
- Status
- Conclusions

Goals
Goal of the BNL data validation effort:
- Determine which data is available at BNL
  - Which datasets
  - Which files in each of these datasets
- Validate each dataset
  - Validity of GUIDs and LFNs
  - LFN corresponds to dataset name
  - Duplicate file numbers within datasets
  - Consistency of BNL replica catalog
- Publish results
- Create “BNL” datasets
  - Include only files at BNL
  - Remove duplicate and invalid files
  - Registered as DSNAME_bnl in DQ2
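The per-dataset checks listed above (GUID validity, LFN matching the dataset name, duplicate file numbers) can be sketched roughly as follows. This is an illustration only, not the code used at BNL. It assumes GUIDs use the standard 8-4-4-4-12 hexadecimal form and that an LFN is the dataset name followed by `._<file number>.` and further suffixes; both layouts, the function name `validate_dataset`, and the sample names are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumption: GUIDs follow the standard textual UUID layout (8-4-4-4-12 hex digits).
GUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")
# Assumption: after the dataset name, an LFN continues with "._<file number>."
FILE_NUMBER_RE = re.compile(r"^\._(\d+)\.")


def validate_dataset(dsname, files):
    """Check a list of (guid, lfn) pairs against one dataset name.

    Returns (problems, duplicates): problems is a list of (reason, lfn),
    duplicates maps a file number to the LFNs that share it.
    """
    problems = []
    by_number = defaultdict(list)
    for guid, lfn in files:
        if not GUID_RE.match(guid):
            problems.append(("bad GUID", lfn))
            continue
        match = None
        if lfn.startswith(dsname):
            match = FILE_NUMBER_RE.match(lfn[len(dsname):])
        if match is None:
            problems.append(("LFN does not match dataset name", lfn))
            continue
        by_number[int(match.group(1))].append(lfn)
    duplicates = {n: lfns for n, lfns in by_number.items() if len(lfns) > 1}
    return problems, duplicates


# Hypothetical dataset and file names, invented for illustration.
ds = "csc11.005145.PythiaZmumu.recon.AOD.v11004205"
sample = [
    ("0A1B2C3D-1111-2222-3333-444455556666", ds + "._00001.pool.root.1"),
    ("0A1B2C3D-1111-2222-3333-444455556667", ds + "._00001.pool.root.2"),
    ("not-a-guid", ds + "._00002.pool.root.1"),
    ("0A1B2C3D-1111-2222-3333-444455556668", "other.dataset._00003.pool.root.1"),
]
problems, duplicates = validate_dataset(ds, sample)
print(problems)
print(duplicates)
```

A "BNL" dataset in the sense above would then keep one file per file number and drop everything reported in `problems`; which duplicate to keep is a separate decision.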

Publication
- Validation is published on a series of web pages
  - Starting point: http://www.usatlas.bnl.gov/~dial/atprod/validation/html
  - BNL summary: http://www.usatlas.bnl.gov/~dial/atprod/validation/html/bnl_datasets.html
- Tables are updated twice a day
  - Update time at the top of each page
  - Automatic and fairly robust procedure
- Tables provide a field that can be used to restrict the listing
  - Simple pattern matching with * for wildcard
  - E.g. *Zmumu*AOD*
- Tables for tasks, task names, datasets and BNL resident datasets
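The wildcard filter described above behaves like shell-style globbing. A minimal sketch using Python's standard `fnmatch` module is shown below; the helper name `filter_names` and the dataset names are invented for illustration, and the actual web pages may implement the matching differently. Note that `fnmatch` also treats `?` and `[...]` as special characters, which goes beyond the single `*` wildcard mentioned on the slide.

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Return the names matching a pattern in which * matches any run of characters."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical dataset names, invented for illustration only.
datasets = [
    "csc11.005145.PythiaZmumu.recon.AOD.v11004205",
    "csc11.005145.PythiaZmumu.recon.ESD.v11004205",
    "csc11.005144.PythiaZee.recon.AOD.v11004205",
]
print(filter_names(datasets, "*Zmumu*AOD*"))
```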

Issues: Which datasets
- Which datasets should be validated?
- I start from the task table (BNL replica)
  - Select tasks that begin with “csc” (task table)
  - Combine tasks with the same name (task name table)
  - Follow conventions to guess the datasets produced by each task
  - Check if dataset name is registered in DQ2 (DQ2 tasks)
  - Check if BNL is a DQ2 location for the dataset (BNL datasets)
- This has potential problems
  - Conventions change and my code has to keep up
  - Datasets become obsolete and should be dropped from validation
  - Restricted to production datasets
- Preferable to have an external source listing datasets of interest
  - Perhaps the metadata catalog
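The selection chain above can be sketched as a small function. This is only an outline under stated assumptions: the naming convention that maps a task to its datasets is not given in the slides, so it is passed in as a caller-supplied function (`guess_datasets`), and the DQ2 registration and BNL location lookups are represented by plain sets rather than real catalog queries. All names and sample values are hypothetical.

```python
def select_datasets(task_names, guess_datasets, dq2_registered, bnl_locations):
    """Classify the datasets expected from production tasks.

    task_names: task names as read from the task table (may repeat).
    guess_datasets: function mapping a task name to candidate dataset names
        (stands in for the naming conventions, which are not specified here).
    dq2_registered: set of dataset names registered in DQ2.
    bnl_locations: set of dataset names for which BNL is a DQ2 location.
    """
    # Keep tasks beginning with "csc" and merge tasks that share a name.
    tasks = sorted({name for name in task_names if name.startswith("csc")})
    status = {}
    for task in tasks:
        for dataset in guess_datasets(task):
            if dataset not in dq2_registered:
                status[dataset] = "not in DQ2"
            elif dataset not in bnl_locations:
                status[dataset] = "not at BNL"
            else:
                status[dataset] = "at BNL"
    return status


# Hypothetical inputs; the convention below (one AOD and one ESD dataset
# per task) is a placeholder, not the real ATLAS convention.
tasks = ["csc11.A.recon", "csc11.A.recon", "csc11.B.recon", "rome.C.recon"]
convention = lambda task: [task + ".AOD", task + ".ESD"]
registered = {"csc11.A.recon.AOD", "csc11.A.recon.ESD", "csc11.B.recon.AOD"}
at_bnl = {"csc11.A.recon.AOD"}
result = select_datasets(tasks, convention, registered, at_bnl)
for name, state in result.items():
    print(name, state)
```

Keeping the convention in a separate function reflects the first problem listed above: when conventions change, only that one function has to keep up.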

Issues: Additional validation
- What additional validation is desired?
- Check existence of physical files at BNL
  - BNL dcache sometimes loses files and the replica catalog is not updated to reflect this
  - Not too difficult if check is done with ls command
- Data inside file
  - Right type (AOD, ESD, RDO, …)
  - Event numbers consistent with file name
  - Difficult because these checks require sophisticated code and reading each file
- Accessibility: can files be read?
  - Again difficult because it is time consuming to open and read files
- Staging: report how many files in each dataset are staged
  - Expensive to check each file with dc_check
  - Status often changes faster than my twice daily validation checks
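The existence check mentioned above amounts to comparing replica catalog entries against a directory listing. A rough sketch follows, assuming the storage namespace is visible as an ordinary mounted file system so that a listing behaves like `ls`; that assumption, the function name `find_missing`, and the idea of listing each directory once are choices made for this example, not a description of the BNL setup.

```python
import os
from collections import defaultdict


def find_missing(catalog_paths):
    """Return catalog paths that have no corresponding entry on disk.

    Each directory is listed once (like a single ls) instead of
    checking every file individually.
    """
    by_directory = defaultdict(list)
    for path in catalog_paths:
        by_directory[os.path.dirname(path)].append(path)
    missing = []
    for directory, paths in by_directory.items():
        try:
            present = set(os.listdir(directory))
        except OSError:
            # Directory absent or unreadable: treat all of its files as missing.
            present = set()
        missing.extend(p for p in paths if os.path.basename(p) not in present)
    return sorted(missing)
```

This only confirms that a name exists in the namespace. It says nothing about whether the file can be read or is staged, which are the harder checks listed above.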

Issues: Remediation
- When there are problems (and there are some), who should resolve them?
  - E.g. duplicate files, files missing in dcache
- Problems such as duplicate files are a feature of the dataset definition and are not BNL-specific
  - Need production expert to sort out which file should be kept
  - Need authorization to change the dataset definition
- Other problems, such as files disappearing from dcache but not from the replica catalog, need to be resolved locally

Issues: Data movement
- Replicating data at BNL
  - Which data?
    - Long term model is all AOD and ESD
    - And 2/N of raw?
    - Should support reasonable user requests
  - Can we do this now? Are we trying?
    - 245/402 AOD datasets are at BNL
    - Those at BNL are mostly complete
    - Big improvement since the spring
- Validation
  - At present the BNL validation table lists datasets registered at BNL
  - Replace this selection with a policy or an external list?
  - Or just register BNL as an (incomplete) location for desired datasets?
  - Can/should table let users know what data is coming?
  - Or why data is missing?

Issues: Historical information
- Current validation pages only provide a snapshot view
  - Difficult to know if the situation is improving or deteriorating
- Historical data is available
  - Results from each scan are saved
  - Last week on disk, web accessible
  - All data since June stored in dcache
- Interesting to track the number of datasets and files of each type as a function of time
  - Both in DQ2 and at BNL
- Volunteers?
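Tracking datasets and files of each type over time could start from the saved scan results. The sketch below assumes each scan can be reduced to a list of (dataset name, data type, number of files) records keyed by an ISO-formatted scan date; the format of the saved scans is not described in the slides, so this layout, the function name `history_by_type`, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def history_by_type(scans):
    """Turn per-scan snapshots into one time series per data type.

    scans: {scan_date: [(dataset_name, data_type, n_files), ...]}
    Returns {data_type: [(scan_date, n_datasets, n_files), ...]} in date order.
    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    series = defaultdict(list)
    for scan_date in sorted(scans):
        totals = defaultdict(lambda: [0, 0])
        for _name, data_type, n_files in scans[scan_date]:
            totals[data_type][0] += 1
            totals[data_type][1] += n_files
        for data_type, (n_datasets, total_files) in totals.items():
            series[data_type].append((scan_date, n_datasets, total_files))
    return dict(series)


# Invented numbers, for illustration only.
scans = {
    "2006-09-02": [("ds1", "AOD", 10), ("ds2", "AOD", 5), ("ds3", "ESD", 7)],
    "2006-09-01": [("ds1", "AOD", 8), ("ds3", "ESD", 7)],
}
history = history_by_type(scans)
for data_type, points in history.items():
    print(data_type, points)
```

Running the same reduction on the DQ2 view and on the BNL view would give the two curves whose comparison shows whether the situation is improving or deteriorating.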

Status
- Validation has been running at BNL since Spring
  - Automated: I only need to update my proxy every week or two
  - Fairly robust
    - Down for a couple of weeks when I was on vacation because database passwords changed
    - Major DB failures will occasionally leave it in a state that requires me to clean up
- BNL table provides a nice answer to the question “What data is available at BNL?”
  - Easy to select by physics process, data type and release
  - Up to date without requiring DB query for each request

Conclusions
- Validation pages provide useful summaries
  - For users and production experts
  - Easy to use and understand
- Can be improved
  - External list or list of datasets
  - Additional validation
  - Active reporting of problems
  - Information about data movement (or lack thereof)
  - Historical information
  - What else?
- Volunteers welcome
  - To address any of the above or whatever other features you would like to see