RUCIO Consistency Status


RUCIO Consistency Status Sarah Williams Indiana University 2015-03-23

Making Consistency Checks with RUCIO
- The previous version of CCC (Complete Consistency Check), which uses the DQ2 API, is still usable. However, the changes brought by RUCIO have side effects:
  - A query for the list of datasets at an endpoint returns only complete replicas, not incomplete ones.
  - The command to list the files in a dataset is slow.
- We have a working version of CCC that uses the RUCIO API.
- RUCIO developers would prefer that we not query each dataset to produce a list of files, as the previous version did. They are concerned that if multiple sites run the tool at once, it could cause load issues.
- They have agreed to produce a dump file with a list of datasets and the files in each dataset.

RUCIO Dump Files
- Two types of dump files are currently available:
  - A list of files per endpoint.
  - A list of datasets per endpoint.
- A dump of files per dataset is still pending.
  - The request for the dumps was made in mid-December, with an ETA of a few weeks.
  - The timeline for producing this dump has been pushed back several times.
  - Latest news: “Waiting on non trivial changes to Rucio that unfortunately take some time. The current time frame is end of April (after CHEP)”

RUCIO rse-audit vs CCC
- rucio rse-audit compares the list of files per endpoint to a list of files on disk. It knows nothing about datasets.
- CCC fetches a list of datasets, then a list of files per dataset, and compares that to a list of files on disk. Due to the change in the API, it knows nothing about incomplete datasets.
- When a dataset is no longer listed as having a replica at a site, its files are not deleted right away. They are instead recorded as still being at the site and are candidates for deletion if space is needed. CCC does not know about these files either.
- Therefore, the two tools report different numbers for dark data. At present the rucio rse-audit numbers are more accurate.
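The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the set comparison described above, not the code of rucio rse-audit or CCC: the function names, the file and dataset names, and the in-memory sets are all hypothetical stand-ins for the real dump files and storage listings.

```python
def dark_files(on_disk, catalog):
    """Files present on disk that the catalog view does not know about."""
    return set(on_disk) - set(catalog)


def lost_files(on_disk, catalog):
    """Files the catalog view expects that are missing from disk."""
    return set(catalog) - set(on_disk)


def ccc_catalog_view(datasets):
    """CCC-style view: the union of the files of every dataset it can list.

    `datasets` maps a dataset name to its list of files (hypothetical layout).
    """
    files = set()
    for dataset_files in datasets.values():
        files.update(dataset_files)
    return files


# Hypothetical example data.
on_disk = {"f1", "f2", "f3", "f4", "f5"}

# rse-audit style: flat list of every file the catalog records at the endpoint.
files_per_endpoint = {"f1", "f2", "f3", "f4"}

# CCC style: only complete replicas are listed, so the incomplete dataset
# holding f3 and the no-longer-listed dataset holding f4 are invisible.
complete_datasets = {"dataset_a": ["f1", "f2"]}

audit_dark = dark_files(on_disk, files_per_endpoint)
ccc_dark = dark_files(on_disk, ccc_catalog_view(complete_datasets))

print(sorted(audit_dark))  # ['f5']
print(sorted(ccc_dark))    # ['f3', 'f4', 'f5']
```

In this toy case the dataset-based view flags f3 and f4 as dark even though the catalog still records them at the site, which is the kind of over-reporting attributed to CCC above.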

RUCIO rse-audit vs CCC
RSE                      RUCIO dark file #   RUCIO bytes   CCC dark file #   CCC bytes
MWT2_UC_PERF-JETS        200                 25.98G                          27.89G
MWT2_UC_LOCALGROUPDISK   13.64K              730G          251.9K            13.39T
MWT2_UC_TRIG-DAQ                             0B            115               21.88G
MWT2_UC_PRODDISK         155.4K              30.66T        378K              102.5T
MWT2_UC_SCRATCHDISK      222                 75.71G        232               82.3G
MWT2_DATADISK            10.4K               16.86T        25.69K            53.33T
MWT2_UC_USERDISK         496.4K              18.63T        2.156M            55.75T
MWT2_UC_PHYS-TOP *       1                   183.3M
MWT2_UC_PHYS-HIGGS       376                 288.6G                          309.9G
MWT2_UC_PERF-TAU         38                  117.5G                          126.2G
Totals                   676.7K              67.36T        2.812M            225.6T
Blank cells have no value in the transcript.
* Only two values survive for MWT2_UC_PHYS-TOP. They are shown in the order they appear; their columns cannot be confirmed.
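As a rough check on how far apart the two tools are, the totals row can be turned into ratios. The sketch below is illustrative only: `parse_quantity` is a hypothetical helper, and treating K/M/G/T as powers of 1000 is an assumption, since the slides do not say whether the units are decimal or binary.

```python
# Multipliers for the suffixes used in the table. Decimal (powers of 1000)
# is an assumption; the slides do not say whether the units are binary.
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_quantity(text):
    """Convert a value such as '676.7K', '67.36T' or '0B' to a float."""
    text = text.strip().rstrip("B")  # drop an optional trailing byte marker
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


# Totals row of the table.
rucio_files = parse_quantity("676.7K")
rucio_bytes = parse_quantity("67.36T")
ccc_files = parse_quantity("2.812M")
ccc_bytes = parse_quantity("225.6T")

file_ratio = ccc_files / rucio_files
byte_ratio = ccc_bytes / rucio_bytes

print(f"CCC reports {file_ratio:.1f}x the dark files")  # 4.2x
print(f"CCC reports {byte_ratio:.1f}x the dark bytes")  # 3.3x
```

The byte ratio does not depend on the decimal-versus-binary assumption, because both totals carry the same suffix.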

Other Monitors: rucio rse-usage and DDM Accounting Visualization
- RUCIO reports the used space of an RSE as the files written to the endpoint plus the files scheduled to be written. With this information a user can decide whether there is enough space to subscribe a dataset to an endpoint.
- But this is misleading to a site manager, who wants to know how much data RUCIO thinks is at the site.
- The following plot from DDM monitoring compares the used space in MWT2_DATADISK as reported by SRM (green dotted line) with the used space reported by RUCIO (red bars).
- Because RUCIO reports more used space than SRM, it looks as if data is missing from the site, but in fact the difference is the subscribed data that has not yet been written.
- The total space allocated by SRM (free + used) is represented by the magenta dotted line.

Other Monitors: rucio rse-usage and DDM Accounting Visualization (cont.)
- Chart taken from the DDM Accounting Visualization site with MWT2_DATADISK selected.
- Magenta dotted line: free + used SRM space (847TB on Mar 22)
- Green dotted line: used SRM space (723TB on Mar 22)
- Red bar: RUCIO reported used space (746TB on Mar 22)
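The Mar 22 values from the chart can be worked through as a small calculation. This sketch assumes, as the slides state, that the whole gap between the RUCIO and SRM used-space figures is data scheduled to be written; the variable names are illustrative.

```python
# Values read from the chart for MWT2_DATADISK on Mar 22, in TB.
srm_total_tb = 847   # free + used space reported by SRM
srm_used_tb = 723    # used space reported by SRM
rucio_used_tb = 746  # used space reported by RUCIO (written + scheduled)

# Space SRM still reports as free.
srm_free_tb = srm_total_tb - srm_used_tb

# Data RUCIO counts as used that is not yet on disk (scheduled writes).
scheduled_tb = rucio_used_tb - srm_used_tb

# Space left once everything scheduled has been written.
free_after_scheduled_tb = srm_total_tb - rucio_used_tb

print(srm_free_tb, scheduled_tb, free_after_scheduled_tb)  # 124 23 101
```

Under that assumption a site manager would read 723TB as the data actually on disk, while a user deciding on a new subscription would care about the 101TB that remains after pending transfers.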

Questions?