Tier 3 Data Management, Tier 3 Rucio Caches
Doug Benjamin, Duke University

Current situation (using information from AGIS)
- SRM-based sites: 9 sites
  o ANLASC (HPC front end), Bellarmine-T3, IllinoisHEP, Lucille, NERSC (local group disk only), OUHEP, SMU (local group disk only), Penn (local group disk only), Wisc (local group disk only)
- Gridftp-only endpoints: 8 + 1 sites
  o ANLASC (for the ANL Tier 3), Duke, Indiana (online?), Nevis, NYU (still online?), Stony Brook, UC Santa Cruz (not online), UT Dallas; SLAC has a gridftp-only site but it is listed in AGIS as a Tier 3
- Gridftp-only sites are not officially supported within ATLAS ADC; the DDM team helps on a best-effort basis
- Transfers are monitored on the test Dashboard: tbed.cern.ch/dashboard/request.py/dataset?site=NYU-ATLAS_GRIDFTP

Why gridftp-only endpoints in the first place?
- Facilities need a controlled way to transfer data
- Users want a simple, efficient way to transfer the data
- SLAC moves a lot of data through its gridftp-only endpoint
- No worries about consistency between files on site and the central LFC or Rucio catalogs
- File cleanup is a local issue (no deletion service)
- No way to broker PanDA jobs against this storage (a blessing and a curse)
- Can we do better?

Rucio Cache

Rucio Cache (2)
“A cache is storage service which keeps additional copies of files to reduce response time and bandwidth usage. In Rucio, a cache is an RSE, tagged as volatile. The control of the cache content is usually handled by an external process or applications (e.g. Panda) and not by Rucio. Thus, as Rucio doesn’t control all file movements on these RSEs, the application populating the cache must register and unregister these file replicas in Rucio. The information about replica location on volatile RSEs can have a lifetime. Replicas registered on volatile RSEs are excluded from the Rucio replica management system (replication rules, quota, replication locks) described in the section Replica management. Explicit transfer requests can be made to Rucio in order to populate the cache.”
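
The quote above implies that the cache-populating application, not Rucio, is responsible for registering and unregistering replicas on the volatile RSE. Below is a minimal sketch of what that could look like with the Rucio Python client; the RSE name, scope, file name, and checksum are placeholders, and the exact client calls may differ between Rucio versions.

    # Sketch: an external cache agent registering/unregistering replicas
    # on a volatile RSE.  Assumes a configured Rucio Python client
    # (rucio.cfg, valid proxy); all names below are placeholders.
    from rucio.client import Client

    client = Client()

    RSE = "DUKE_T3_CACHE"  # hypothetical volatile RSE
    files = [{
        "scope": "user.dbenjamin",    # placeholder scope
        "name": "example.NTUP.root",  # placeholder file name
        "bytes": 1234567890,          # size of the cached copy
        "adler32": "deadbeef",        # checksum of the cached copy
    }]

    # The cache application, not Rucio, decides when a file enters the storage:
    client.add_replicas(rse=RSE, files=files, ignore_availability=True)

    # ...and the same application must unregister the replica on eviction:
    client.delete_replicas(rse=RSE,
                           files=[{"scope": f["scope"], "name": f["name"]}
                                  for f in files])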

Rucio Cache (RSE) development
- Have had several meetings, in person and remotely, with the Rucio development team to identify the issues
- Technologies currently under development:
  o Xrootd FRM (File Residency Manager)
  o GridFTP server
  o WebDAV server (after the first two)
- The Rucio team is determining the best way to get the file population and depopulation information from the cache:
  o The current baseline is to use callouts to an ActiveMQ server at CERN (illustrative sketch below)
  o Need to agree on the API (or at least the message format)
  o The Xrootd solution looks straightforward
  o Just started to look at the Globus DSI (Data Storage Interface); Wei has given me the xrootd example for the DSI libraries
  o Also some thought about publishing the cache content (a la the ARC cache)
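
Since the message format for the cache callouts has not been agreed yet, the sketch below only illustrates the general idea of the ActiveMQ baseline: the cache publishes a small message whenever it adds or evicts a file. It uses the stomp.py client; the broker host, credentials, topic name, and payload fields are all assumptions, not an agreed API.

    # Illustrative only: the real callout API/message format is still
    # under discussion with the Rucio team.  Broker, topic and payload
    # are hypothetical.
    import json
    import time

    import stomp  # stomp.py, a STOMP client usable with ActiveMQ

    conn = stomp.Connection([("activemq.example.cern.ch", 61613)])  # placeholder broker
    conn.connect("cache-agent", "secret", wait=True)                # placeholder credentials

    message = {
        "event": "cache_add",        # or "cache_delete" on eviction
        "rse": "DUKE_T3_CACHE",      # hypothetical volatile RSE
        "scope": "user.dbenjamin",
        "name": "example.NTUP.root",
        "timestamp": int(time.time()),
    }

    conn.send(destination="/topic/rucio.cache", body=json.dumps(message))
    conn.disconnect()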

Can we use the federated storage at Tier 1 and Tier 2?
- US ATLAS has the potential to have a decent amount of storage in Local Group Disk at the Tier 1 and Tier 2 sites
- This storage has excellent network connectivity
- Many US ATLAS institutions have been given NSF funds for significant network upgrades to the edge of the campus (Duke, for example, is going to many tens of Gb/s on the WAN; other places even higher, 100 GbE)
- Many US ATLAS institutions are close in network time to a Tier 1 or Tier 2 site (others are not)

FAX can be part of the Tier 3 data handling solution, but...
- Need a consistent way to reference the data; with the new Rucio N2N the system is much more scalable (a read sketch follows below)
- Testing needs to identify problems with the system before the end users hit them
  o For example, while doing data analysis recently I discovered a severe incompatibility between the MWT2 networking configuration and Xrootd
  o Pilot error was blamed initially because the current level of testing did not show the problem; the current testing is not doing what users are doing some of the time
- Worry: if the system is not made more robust (far fewer failures when running ROOT code from a Tier 3), initial users might walk away from the system and give it bad press
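
To make the FAX use case concrete, here is a minimal PyROOT sketch of a Tier 3 user reading a file through a federation redirector using the global logical file name that the Rucio N2N resolves; the redirector host, scope, file and tree names are placeholders, and the cache size is only a suggestion for WAN reads.

    # Sketch: reading one file over FAX from a Tier 3 with PyROOT.
    # Redirector host and scope:filename are placeholders.
    import ROOT

    redirector = "fax-redirector.example.org"  # placeholder FAX redirector
    glfn = "root://%s//atlas/rucio/user.dbenjamin:example.NTUP.root" % redirector

    f = ROOT.TFile.Open(glfn)
    if not f or f.IsZombie():
        raise RuntimeError("Could not open %s over the federation" % glfn)

    tree = f.Get("physics")                  # placeholder tree name
    tree.SetCacheSize(100 * 1024 * 1024)     # 100 MB TTreeCache helps WAN reads
    print("Entries readable over FAX: %d" % tree.GetEntries())
    f.Close()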

Next steps
- Need to identify more effort for the various development activities
- Will need help from the Rucio team, but they are focused on getting the core system out (as they should be)
- Expect a prototype Rucio Cache before the summer
- Expect a production-ready component for Tier 3's by the end of September
- More labor from the facilities would help
- Should likely think about better network monitoring at the Tier 3 sites (perfSONAR boxes built from Tier 1/Tier 2 cast-off machines?)

Summary
- Need to consider an end-to-end solution for data at the Tier 3: from the Grid to final plots for talks, conferences, and papers
- All solutions need to be trivial, robust, and "set and forget"
- Need to minimize the support load on end users and local site system administrators
  o Time spent administering the system reduces the time for scientific discovery
- Federated storage and new tools (Rucio Cache, for example) will play a crucial role in Run 2, and we need to be prepared and ready by the data challenge next summer if at all possible