CCRC'08 experience at PIC storage POV


CCRC'08 experience at PIC: storage POV. Francisco Martínez, Gerard Bernabeu, Esther Acción

Outline: Preparation; Results; Issues (with proposals); Conclusions

Preparation: Little preparation was needed. Most operations were carried over from the Feb run. Only the definitions of the tokens and namespace needed to be modified. These were handed to us by the experiments on the last day, which is at least an improvement over the Feb run :) LHCb did not provide us with its requirements :(

Preparation: dCache version was 1.8.0-12p6 (Feb run ready). We decided not to upgrade to the latest dCache version, the one recommended for CCRC08 run2, because it arrived late. It turned out to be an excellent choice. Enstore version was 1.0.1. No special versions for CCRC08; Enstore seems to be pretty LHC-agnostic.

Results: network (plots: pools, WAN, WN)

Results: Enstore. Only the IBM robot was used. No special load or behavior; the CCRC08 run was not apparent in Enstore.

Issues: Data on pools was poorly balanced; Interferences; PinManager bug; Mover queues dimensioning; Enstore movers

Issues: Data on pools was poorly balanced. Some pools were almost full. Some pools were freshly added and had no data on them. This led to multiple pool crashes due to overload (dc046).

Issues: Proposal for better balancing. Deep analysis of the cost assignment policy in dCache: formerly no work at all in this area; we have the naive default policy. Implementation of Brian Bockelman's scripts for load balancing of files and non-automatic file replication: Replica Manager that-works®; Dcache-pfm: Physical File Manager.
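
The balancing idea can be illustrated with a minimal sketch. This is not the scripts mentioned on the slide and uses no dCache API; the `Pool` class, the pool names, and the sizes are all hypothetical. It assumes the goal is to bring every pool to the same fill fraction and greedily pairs the fullest pools with the emptiest ones.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str        # hypothetical pool name, not a real PIC pool
    used_gb: float   # space currently occupied
    total_gb: float  # pool capacity


def plan_moves(pools, min_move_gb=1.0):
    """Return (source, destination, gigabytes) moves that bring every
    pool to the global average fill fraction (assumed balancing goal)."""
    target = sum(p.used_gb for p in pools) / sum(p.total_gb for p in pools)

    # Pools above the target donate data; pools below it receive data.
    donors = [[p.name, p.used_gb - target * p.total_gb]
              for p in pools if p.used_gb > target * p.total_gb]
    receivers = [[p.name, target * p.total_gb - p.used_gb]
                 for p in pools if p.used_gb < target * p.total_gb]
    donors.sort(key=lambda d: d[1], reverse=True)
    receivers.sort(key=lambda r: r[1], reverse=True)

    moves = []
    i = j = 0
    while i < len(donors) and j < len(receivers):
        amount = min(donors[i][1], receivers[j][1])
        if amount >= min_move_gb:  # skip moves too small to be worth it
            moves.append((donors[i][0], receivers[j][0], round(amount, 1)))
        donors[i][1] -= amount
        receivers[j][1] -= amount
        if donors[i][1] <= 1e-9:
            i += 1
        if receivers[j][1] <= 1e-9:
            j += 1
    return moves


if __name__ == "__main__":
    # Illustrative numbers only: one nearly full pool, one freshly added.
    example = [Pool("full-1", 900, 1000), Pool("new-1", 0, 1000)]
    for src, dst, gb in plan_moves(example):
        print(f"move about {gb} GB from {src} to {dst}")
```

A real plan would also have to respect file boundaries, pool groups, and the load that the moves themselves put on the pools.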

Issues: Interferences. IFCA was trying to read files that we had marked as not accessible (STK robot). Those accesses generated unnecessary mover queuing. Thanks to Brian Bockelman for sorting this out! It would be nice if the experiments could enforce files marked offline in some way. This cannot be done at the dCache level.

Issues: PinManager bug. The running dCache version, 1.8.0-12p6, has a bug in the PinManager. Some files become inaccessible due to lcg-gt hanging. A cron job restarting the PinManager solved this. The current dCache release, 1.8.0-15p5, has already fixed this.
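
The slide only says that a cron job restarted the PinManager. As a variation on that workaround, the sketch below restarts only when a probe command hangs or fails. It is an assumption, not the script used at PIC: `probe_cmd` and `restart_cmd` are placeholders for whatever site-specific commands apply, and neither the real lcg-gt arguments nor the real restart command are shown.

```python
import subprocess
import sys


def probe_ok(cmd, timeout_s):
    """Return True only if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # a hanging probe is killed after this long
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def watchdog(probe_cmd, restart_cmd, timeout_s=60):
    """Run the probe; if it hangs or fails, run the restart command.

    Returns True when a restart was triggered. Both commands are
    placeholders supplied by the caller.
    """
    if probe_ok(probe_cmd, timeout_s):
        return False
    subprocess.run(restart_cmd, check=False)
    return True


if __name__ == "__main__":
    # Harmless stand-in commands so the sketch can be run anywhere.
    healthy_probe = [sys.executable, "-c", "pass"]
    no_op_restart = [sys.executable, "-c", "pass"]
    print("restart triggered:", watchdog(healthy_probe, no_op_restart, 10))
```

Invoked from cron every few minutes, a script like this would avoid restarting a service that is still answering requests.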

Issues: Mover queues dimensioning. Has to be done depending on the access type of the jobs. Low-throughput, long-duration profile, with locking of files: these need a high number of movers. Maybe a deadlock detection mechanism at the application level would help! Throughput-intensive profile: these suggest a low number of movers.
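
One rough way to reason about the two profiles is sketched below. It is not the method used at PIC: the first function applies Little's law (average open transfers = arrival rate × mean duration) and the second simply divides pool bandwidth among movers. The function names, the headroom factor, and all numbers are illustrative assumptions.

```python
import math


def movers_for_concurrency(jobs_per_hour, mean_open_seconds, headroom=1.2):
    """Movers needed so long-lived, low-throughput jobs do not queue.

    Little's law: average concurrent transfers = arrival rate x duration.
    headroom is an assumed safety factor on top of the average.
    """
    concurrent = jobs_per_hour * mean_open_seconds / 3600.0
    return math.ceil(concurrent * headroom)


def movers_for_throughput(pool_mb_per_s, min_mb_per_s_per_mover):
    """Upper limit on movers so each active transfer keeps a useful rate."""
    return max(1, math.floor(pool_mb_per_s / min_mb_per_s_per_mover))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from CCRC'08.
    print("long-lived profile:", movers_for_concurrency(600, 7200))
    print("throughput profile:", movers_for_throughput(400, 40))
```

The two functions pull in opposite directions, which is the tension the slide describes: files held open for hours push the mover count up, while bulk transfers are better served by a small count so each one gets enough bandwidth.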

Issues: Proposal for mover queues. Jon Bakken, head dCache admin at FNAL, is coming to visit PIC. They have implemented drastic solutions to this problem. We will need to see if we can use those configurations (queues with 1800 movers).

Issues: Enstore movers. We had an issue with a memory leak that made the movers go offline. It was solved before run2 started. Enstore accounting: we have the info, but it is difficult to parse or analyze. It is advisable to implement some way to visualize that data.

Conclusions: The storage component at PIC seems to be roughly ready to start data-taking, but there are still pending issues with dCache. Load balancing is especially critical. Kudos to Enstore: no issues. We have good feedback with the experiments thanks to the liaisons, but the experiments as a whole are often not very communicative.