The bulk dataTransfer procedures (CNAF, Nov 2015) L. Salconi, A. Bozzi.

Technical specs
A modular program (written in Python) that manages the file replication to the remote repositories (the datasender.py module):
 class-based architecture, modular and expandable for future requirements;
 database with MySQL interface (with full stats for each file);
 logging system with support for file, smtp, http;
 new command line interface with feedback support;
 thread-based;
 run-time parameter controls (bandwidth limits, destination dirs, etc.);
 support for both data sender engines (GRID lcg tools and iRODS) via wrapper classes with standard methods (sendFile, checkFile), as sketched below;
 support for standard .ini section-oriented configuration files;
 management of the 2 main data flows (rawdata and RDS) for each remote repository;
 socket interface with the DAQ data production flow.
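A minimal sketch of how such an engine-wrapper layer can look; sendFile and checkFile are the standard methods named above, while the class names and command lines below are illustrative assumptions, not the actual datasender.py code:

```python
# Illustrative sketch only: sendFile/checkFile are the standard methods named
# on the slide; class names and command lines are assumptions.
import subprocess

class TransferEngine:
    """Common interface wrapped around each data sender engine."""
    def sendFile(self, local_path, remote_path):
        raise NotImplementedError

    def checkFile(self, remote_path):
        raise NotImplementedError

class LcgEngine(TransferEngine):
    """GRID lcg tools wrapper (hypothetical command lines)."""
    def sendFile(self, local_path, remote_path):
        return subprocess.call(["lcg-cr", "-d", remote_path, "file:" + local_path]) == 0

    def checkFile(self, remote_path):
        return subprocess.call(["lcg-ls", remote_path]) == 0

class IrodsEngine(TransferEngine):
    """iRODS icommands wrapper (hypothetical command lines)."""
    def sendFile(self, local_path, remote_path):
        return subprocess.call(["iput", local_path, remote_path]) == 0

    def checkFile(self, remote_path):
        return subprocess.call(["ils", remote_path]) == 0
```

With a common interface the sender threads can treat both destinations uniformly; only the engine instance differs per remote repository.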

Operational schema [diagram] — components: thread socket server; MySQL (db + queries); new files (from DAQ or archive); multimode log; LCG tools (to Bologna); iRODS (to Lyon); operator (from CLI).
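As a rough illustration of the socket interface with the DAQ data production flow, a minimal thread-based server could look like the sketch below (the port, address and newline-terminated message format are assumptions, not the actual protocol):

```python
# Sketch: thread-based socket server that queues file names announced by the
# DAQ (or archive); the sender threads consume the queue. Port/protocol assumed.
import socketserver
from queue import Queue

new_files = Queue()

class NewFileHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                    # one file name per line
            new_files.put(line.decode().strip())   # hand it to the sender threads

class ThreadedServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

if __name__ == "__main__":
    ThreadedServer(("0.0.0.0", 5000), NewFileHandler).serve_forever()
```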

The “fake” ffl generation [diagram] — elements: datagw.virgo.infn.it; local files on disk; local ffl files; dataSender information (log + ini); new remote ffl file (Lyon, Bologna); FrDump and tools.

Achieved performance
Last spring we tested both engines (iRODS and LCG tools) in depth, with all parameter combinations and with a bucket of rawdata files from the current production; the typical speed rates we found are:
 to Lyon: 42 MBytes/sec
 to Bologna: 65 MBytes/sec
(we sent 959 rawdata files of about 1.25 GByte each; no errors were left unmanaged by the software)
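For scale, these figures correspond to roughly 1.2 TBytes in total (959 × 1.25 GBytes), i.e. about 5 hours of continuous transfer at 65 MBytes/sec to Bologna and about 8 hours at 42 MBytes/sec to Lyon.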

Tips, tricks and desiderata
Some tips, tricks and desiderata to improve the software:
 a permanent daemon module installed at the CCs: no more fake ffl, better control over a file’s true final location, use of a true md5 or adler32 checksum (see the sketch after this list), and ffl files can be produced “on the fly” on request;
 a recursive parser: the current architecture does not perform well on a large subdirectory layout (like LIGO’s) and does not support recursive replication; this issue is closely related to the engines in use;
 a local disk bucket at the remote CCs: this would avoid the coupling with the local engines. Any engine can be used for the transfer and, after a successful replication, the file can be moved and registered locally at its final destination;
 a longer proxy certificate expiry time: this would help with LCG tools and daily authentication.
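On the checksum point, a minimal sketch of how a true md5 or adler32 could be computed file by file on the sender side (illustrative only, not part of the current dataSender):

```python
# Sketch: streaming md5 and adler32 of a file, suitable for verifying a replica.
import hashlib
import zlib

def file_checksums(path, chunk_size=1 << 20):
    md5 = hashlib.md5()
    adler = 1  # adler32 initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            adler = zlib.adler32(chunk, adler)
    return md5.hexdigest(), format(adler & 0xFFFFFFFF, "08x")
```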

Final architecture (desiderata) [diagram] — elements: local ffl files; Storage farm (Cascina); Remote bucket; Storage farm (remote CC); Transfer tool (x-ftp); local tool.