Consistency Checking And RUCIO Progress Update
Sarah Williams, Indiana University
ADC Weekly Meeting, 2015-02-24

1 Consistency Checking And RUCIO Progress Update
Sarah Williams, Indiana University
ADC Weekly Meeting, 2015-02-24

2 Consistency Checking in ATLAS
There are currently two twikis available for consistency checking
– DarkData cleanup procedure
– MissingData detection and cleanup procedure (in validation)
– These work by comparing the list of files at a site to the file replica catalog in RUCIO
Site-developed tool CCC provides a top-to-bottom audit of data catalogs: dataset level, file replica level, local file level, and, at dCache sites, PNFS catalog and disk level.
CCC is being developed into a tool, rucio-rse-auditor, that can be distributed with the RUCIO toolset.
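The comparison behind both twiki procedures can be illustrated with a minimal sketch. It assumes both listings have already been reduced to plain lists of file paths; the function name, the paths, and the result keys are hypothetical and are not taken from CCC or the RUCIO tools.

```python
def compare_site_to_catalog(site_files, catalog_files):
    """Compare a site storage listing against a replica catalog listing.

    Both arguments are iterables of file paths (a hypothetical format).
    """
    on_site = set(site_files)
    in_catalog = set(catalog_files)
    return {
        # Present at the site but unknown to the catalog: dark data candidates.
        "dark": sorted(on_site - in_catalog),
        # Known to the catalog but absent at the site: missing data candidates.
        "missing": sorted(in_catalog - on_site),
    }


# Made-up listings for illustration only.
site_dump = ["/data/a.root", "/data/b.root", "/data/d.root"]
catalog_dump = ["/data/a.root", "/data/b.root", "/data/c.root"]
result = compare_site_to_catalog(site_dump, catalog_dump)
print(result)
```

Set differences keep the comparison linear in the number of files, which matters when a site holds millions of replicas.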

3 Outstanding Issues
Need to stop using DQ2 API
– DQ2 API is deprecated
– Will become harder to support as time goes on
– RUCIO API is available and does most of what we need
– The RUCIO API version of the tool is complete and going through some final tests
Performance
– DQ2 version of the tool takes 24 hours on a 400K dataset site
Rucio tool wrapper
– So that the tool can be shipped with the other rucio tools to sites
Terminology
Other uses of dumps
– If dumps identify datasets which are unique, having no replica at another site, we can more easily assess the impact of data loss.
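The "other uses of dumps" idea can be sketched as follows, assuming each site's dump has been reduced to a list of dataset names. The site names, dataset names, and the function itself are hypothetical; the real dump format is not described here.

```python
from collections import defaultdict


def unique_datasets(dumps_by_site):
    """Return, per site, the datasets that appear in no other site's dump."""
    sites_by_dataset = defaultdict(set)
    for site, datasets in dumps_by_site.items():
        for name in datasets:
            sites_by_dataset[name].add(site)

    unique = defaultdict(list)
    for name, sites in sites_by_dataset.items():
        if len(sites) == 1:
            # Only one site holds this dataset, so losing it there loses it everywhere.
            unique[next(iter(sites))].append(name)
    return {site: sorted(names) for site, names in unique.items()}


# Made-up dumps for illustration only.
dumps = {
    "SITE_A": ["data15.001", "mc15.002"],
    "SITE_B": ["mc15.002", "user.003"],
}
at_risk = unique_datasets(dumps)
print(at_risk)
```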

4 Previously Used DQ2 API
listDatasetsByNameInSite(site,complete=1) – complete datasets
listDatasetsByNameInSite(site) – complete and incomplete datasets
listFilesInDataset(dataset)
getState(dataset) – to see if the dataset is frozen. If it is, we can cache its info for future runs, since it won’t change.

5 Equivalent RUCIO API Calls
listDatasetsByNameInSite(site,complete=1) is replaced by the RUCIO Dataset dump file
listDatasetsByNameInSite(site) will be replaced by the forthcoming dump file
listFilesInDataset(dataset) is replaced by RucioClient.list_files(dataset)
getState(dataset) has no equivalent. Datasets no longer freeze, so this feature has gone away. Now we compare the modification date reported in the RUCIO Dataset dump file to the cached version. If the modification date has changed, the dataset is re-downloaded.
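The modification-date check that replaces getState() might look like the sketch below. The cache layout, the date strings, and the function name are assumptions made for illustration; treating a dataset that is not yet cached as needing a download is also an assumption, since the slide only covers the changed-date case.

```python
def datasets_to_refresh(dump_entries, cache):
    """Pick the datasets whose file lists must be fetched again.

    dump_entries: dict of dataset name -> modification date from the dump file
    cache:        dict of dataset name -> {"modified": date, "files": [...]}
    Both layouts are hypothetical.
    """
    stale = []
    for name, modified in dump_entries.items():
        cached = cache.get(name)
        # Refresh when nothing is cached yet or the dump reports a different date.
        if cached is None or cached["modified"] != modified:
            stale.append(name)
    return sorted(stale)


# Made-up entries for illustration only.
dump = {"ds1": "2015-02-20", "ds2": "2015-02-23", "ds3": "2015-02-24"}
cache = {
    "ds1": {"modified": "2015-02-20", "files": ["f1"]},
    "ds2": {"modified": "2015-02-01", "files": ["f2"]},
}
stale = datasets_to_refresh(dump, cache)
print(stale)
```

Only the datasets returned here would need a fresh file listing; everything else can be served from the cache.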

6 Performance
– A cold run (no cache) of the DQ2 version at MWT2 takes 24 hours, running over approx 400k datasets
– The daily runs with caching turned on take approx 8 hours
– The RUCIO version has not yet done a complete check, but on smaller runs it takes about 28% less time to complete
– When the incomplete dataset dump file and the files-per-dataset dump files are available, the performance should be dramatically better

7 Terminology
(Figure: typical report)
CCC was built using some non-standard terminology for file inconsistencies
– GHOST: A file that is present in a higher level storage catalog, but not in a lower one or on disk. Ghost files often cause job failures when a job runs at the site and fails to fetch the missing input file. Sites should report these to DDM Ops.
– ORPHAN: A file that is missing in the higher level storage catalog, but is present in the lower one or on disk. Orphans do not cause job failures. They are typically 'dark data', and may need to be removed manually.
Dataset terms are better but could use improvement
– DAMAGED DATASET: A dataset that is listed as having a complete replica at the site, but lower level catalogs or disk show missing files. Damaged datasets may cause job failures, for the same reason as ghost files.
– MISSING DATASET: A dataset that is listed as having a complete replica at the site, but lower level catalogs or disk show all files are missing. Missing datasets may cause job failures, for the same reason as ghost files.
– INCOMPLETE DATASET: A dataset that is listed as having an incomplete replica at the site, and the lower level catalog and file system confirm that. These are not a problem.
– EMPTY DATASET: A dataset listed as having 0 files in the catalog. These are not a problem.
– UNKNOWN DATASET: The catalog lists the dataset as present at the site, but when the catalog was queried for a list of files it returned an 'unknown dataset' error.
– OK DATASET: A replica that is complete at all levels. Most datasets should be of this type.
Historically this terminology has caused confusion for users
Need an agreed-upon and easier-to-understand way to identify types of inconsistencies
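The definitions above can be restated as a small classifier. This is a sketch of the terminology only, not CCC's implementation: the function names and arguments are hypothetical, and a dataset listed as incomplete is labeled INCOMPLETE without further checks, because the case where the lower levels disagree is not defined above.

```python
def classify_files(higher, lower):
    """Label files by comparing a higher-level catalog with a lower level or disk."""
    higher, lower = set(higher), set(lower)
    return {
        "ghost": sorted(higher - lower),   # in the higher catalog, not below
        "orphan": sorted(lower - higher),  # below, but not in the higher catalog
    }


def classify_dataset(listed_complete, catalog_files, present_files):
    """Return the dataset label for one replica.

    listed_complete: True if the catalog lists the replica as complete
    catalog_files:   files the catalog reports, or None on an 'unknown dataset' error
    present_files:   files found in the lower-level catalog or on disk
    """
    if catalog_files is None:
        return "UNKNOWN"
    expected = set(catalog_files)
    if not expected:
        return "EMPTY"
    if not listed_complete:
        return "INCOMPLETE"  # simplification: lower levels are not re-checked
    found = expected & set(present_files)
    if found == expected:
        return "OK"
    if not found:
        return "MISSING"
    return "DAMAGED"


print(classify_files(["a", "b"], ["b", "c"]))
print(classify_dataset(True, ["a", "b"], ["a"]))
```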

8 Questions?

