RUCIO Consistency Status


1 RUCIO Consistency Status
Sarah Williams Indiana University

2 Making Consistency Checks with RUCIO
The previous version of CCC (Complete Consistency Check), which uses the DQ2 API, is still usable. However, changes brought by RUCIO have side effects: queries for the list of datasets at an endpoint return only complete replicas, not incomplete ones, and the command to list the files in a dataset is slow.
We have a working version of CCC that uses the RUCIO API. The RUCIO developers would prefer that we not query each dataset to produce a list of files, as the previous version did. They are concerned that multiple sites running the tool at once could cause load issues. They have agreed to produce a dump file with a list of datasets and the files in each dataset.

3 RUCIO Dump Files
Two types of dump files are currently available: a list of files per endpoint and a list of datasets per endpoint. A dump of files per dataset is still pending.
The request for the dumps was made in mid-December, with an ETA of a few weeks. The timeline for producing this dump has been pushed back several times. Latest news: “Waiting on non trivial changes to Rucio that unfortunately take some time. The current time frame is end of April (after CHEP)”

4 RUCIO rse-audit vs CCC
rucio rse-audit compares the list of files per endpoint to a list of files on disk. It knows nothing about datasets.
CCC fetches a list of datasets, then a list of files per dataset, and compares that to a list of files on disk. Due to the change in the API, it knows nothing about incomplete datasets.
When a dataset is no longer listed as having a replica at a site, its files are not deleted right away. They are instead recorded as still being at the site and are candidates for deletion if space is needed. CCC does not know about these files either.
The two tools therefore report different numbers for dark data. At present the rucio rse-audit numbers are more accurate.
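The difference between the two checks can be illustrated with a minimal Python sketch. It does not use the RUCIO API or the real dump format. The function names and the toy file and dataset names are invented for illustration, and the inputs are assumed to be already loaded into plain Python collections.

```python
def dark_files(known_files, disk_files):
    """Files present on disk that the catalog view does not account for."""
    return set(disk_files) - set(known_files)


def files_in_datasets(files_by_dataset):
    """Union of the files of every dataset in a CCC-style dataset listing."""
    known = set()
    for files in files_by_dataset.values():
        known.update(files)
    return known


# Toy data, invented for illustration only.
disk = {"f1", "f2", "f3", "f4"}

# rse-audit view: every file RUCIO records at the endpoint. This includes
# f2 (part of an incomplete dataset) and f3 (a deletion candidate).
files_per_endpoint = {"f1", "f2", "f3"}

# CCC view: only complete datasets are returned, so f2 and f3 are unknown.
complete_datasets = {"ds_complete": ["f1"]}

audit_dark = dark_files(files_per_endpoint, disk)
ccc_dark = dark_files(files_in_datasets(complete_datasets), disk)

print(sorted(audit_dark))  # ['f4']
print(sorted(ccc_dark))    # ['f2', 'f3', 'f4']
```

In this toy case only f4 is truly dark. The CCC-style view also flags f2 and f3 because it cannot see incomplete datasets or deletion candidates.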

5 RUCIO rse-audit vs CCC
RSE                    | RUCIO dark file # | RUCIO bytes | CCC dark file # | CCC bytes
MWT2_UC_PERF-JETS      | 200               | 25.98G      | -               | 27.89G
MWT2_UC_LOCALGROUPDISK | 13.64K            | 730G        | 251.9K          | 13.39T
MWT2_UC_TRIG-DAQ       | -                 | 0B          | 115             | 21.88G
MWT2_UC_PRODDISK       | 155.4K            | 30.66T      | 378K            | 102.5T
MWT2_UC_SCRATCHDISK    | 222               | 75.71G      | 232             | 82.3G
MWT2_DATADISK          | 10.4K             | 16.86T      | 25.69K          | 53.33T
MWT2_UC_USERDISK       | 496.4K            | 18.63T      | 2.156M          | 55.75T
MWT2_UC_PHYS-TOP       | 1                 | 183.3M      | -               | -
MWT2_UC_PHYS-HIGGS     | 376               | 288.6G      | -               | 309.9G
MWT2_UC_PERF-TAU       | 38                | 117.5G      | -               | 126.2G
Totals                 | 676.7K            | 67.36T      | 2.812M          | 225.6T

6 Other Monitors: rucio rse-usage and DDM Accounting Visualization
RUCIO reports the used space of an RSE as the files written to the endpoint plus the files scheduled to be written. With this information a user can decide whether there is enough space to subscribe a dataset to an endpoint. But this is misleading to a site manager, who wants to know how much data RUCIO thinks is at the site.
The following plot from DDM monitoring compares the used space in MWT2_DATADISK as reported by SRM (green dotted line) with the used space reported by RUCIO (red bars). Because the used space RUCIO reports is higher, it looks as if data is missing, but in fact the difference is the subscribed data. The total space allocated by SRM (free + used) is represented by the magenta dotted line.

7 Other Monitors: rucio rse-usage and DDM Accounting Visualization (cont.)
Chart taken from the DDM Accounting Visualization site with MWT2_DATADISK selected.
Magenta dotted line: free + used SRM space (847TB on Mar 22)
Green dotted line: used SRM space (723TB on Mar 22)
Red bar: RUCIO reported used space (746TB on Mar 22)
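As a quick worked example, the Mar 22 figures can be turned into the two quantities a site manager would likely care about. This is only arithmetic on the numbers quoted for the chart. The variable names are invented here, and reading the gap as scheduled writes assumes the definition of RUCIO used space given on the previous slide.

```python
# Values quoted for MWT2_DATADISK on Mar 22, in TB.
total_srm_tb = 847   # magenta line: free + used SRM space
used_srm_tb = 723    # green line: used SRM space
used_rucio_tb = 746  # red bar: RUCIO reported used space

# Free space according to SRM.
free_srm_tb = total_srm_tb - used_srm_tb

# Space RUCIO counts as used that SRM does not yet see on disk. Under the
# assumption above, this is data scheduled to be written.
gap_tb = used_rucio_tb - used_srm_tb

print(free_srm_tb)  # 124
print(gap_tb)       # 23
```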

8 Questions?

