Grid data transfer tools
Tier 3(g,w) meeting at ANL ASC – May 19, 2009
Marco Mambelli – University of Chicago

Grid/OSG components
- Standard installation
- Set of services or resources containing data and providing processing power:
  - Computing Elements (CE)
  - Storage Elements (SE)
  - Information systems
  - Clients
[Diagram: site layout with worker nodes (WN) and disk behind a CE gatekeeper, an SE door with a local resource manager (LRM), the information system, clients, and auxiliary systems]

ATLAS Distributed Data Management (DQ2) at a site
[Diagram: DQ2 dataset catalogs record which files are in a dataset and where datasets are located; DQ2 "Queued Transfers" and Subscription Agents drive the File Transfer Service, which moves data between local and remote storage; a Local File Catalog maps LFN -> PFN. Labels mark which components are ATLAS provided (DQ2) and which are Grid provided.]

How data is organized at sites (SEs): ATLAS space tokens

token name           storage type   used for
ATLASDATATAPE        T1D0           RAW data, ESD, AOD from re-processing    XX
ATLASDATADISK        T0D1           ESD, AOD from data                       XXX
ATLASMCTAPE          T1D0           HITS from G4, AOD from ATLFAST           X
ATLASMCDISK          T0D1           AOD from MC                              XXX
ATLASPRODDISK        T0D1           buffer for import and export             X
ATLASGROUPDISK       T0D1           DPD                                      XXX
ATLASUSERDISK        T0D1           User data                                XX *)
ATLASLOCALGROUPDISK  T0D1           Local users

Names used to refer to files
- Logical File Name (LFN)
  lfn:/grid/dteam/vkoukis-gfal-test2
- Grid Unique IDentifier (GUID)
  guid:4f74b453-5aaf-4af5-af2d-b438d081bb63
- Storage URL (SURL; for a specific replica, on a specific Storage Element)
  srm://se01.athena.hellasgrid.gr/pnfs/athena.hellasgrid.gr/data/dteam/generated/ /file775276c2-7bbf-4f3d-92cf-5d5cdbaba254
- Transport URL (TURL; for a specific replica, on an SE, with a specific protocol)
  gsidcap://se01.athena.hellasgrid.gr:22128//pnfs/athena.hellasgrid.gr/data/dteam/generated/ /file775276c2-7bbf-4f3d-92cf-5d5cdbaba254
[Diagram: an LFN or GUID resolves through the catalogs to a SURL, which in turn resolves to a TURL]
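To illustrate how these names resolve on the command line, here is a minimal sketch assuming a configured gLite/LCG-utils client and a valid proxy; the LFC hostname is a placeholder, while the LFN and GUID are the ones from the slide:

  export LFC_HOST=lfc.example.org   # placeholder: the file catalog host

  # List the SURLs of all replicas registered for a logical name
  lcg-lr --vo dteam lfn:/grid/dteam/vkoukis-gfal-test2

  # The same lookup works starting from the GUID
  lcg-lr --vo dteam guid:4f74b453-5aaf-4af5-af2d-b438d081bb63

  # Ask the SE for a transport URL for a given SURL and protocol
  lcg-gt srm://se01.athena.hellasgrid.gr/pnfs/.../file775276c2-7bbf-4f3d-92cf-5d5cdbaba254 gsidcap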

Datasets and Containers
- A dataset in DDM is a set of files.
- Container datasets are the logical equivalent of physics datasets: they contain the same files.
- A container's name is the dataset name up to the _tid suffix (or the T suffix for cosmic data), and it ends with a '/'.
- Containers can contain only DDM datasets; you cannot make containers of containers.
- For example, MC containers contain the files belonging to the tid datasets (the names containing tidxxxx, with xxxx the task number).
- Most of the dq2 commands behave the same for datasets and containers, with a few exceptions (see the sketch below).
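The distinction shows up directly on the command line; a minimal sketch with placeholder dataset names (the trailing '/' marks the container):

  # Expand a container into its constituent _tid datasets
  dq2-list-datasets-container mc08.105128.example.recon.AOD.e123_s456_r789/

  # List the files of one constituent dataset
  dq2-ls -f mc08.105128.example.recon.AOD.e123_s456_r789_tid012345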

Examples
- Data: data08_cosmag physics_RPCwBeam.recon.ESD_FILTERED.o4_f70/
- MC: mc PythiaWenu_1Lepton.recon.AOD.e352_s462_r541/

How to move data
- DQ2 Client Tools:
  - command-line tools
  - package developed by ATLAS
  - distributed as RPM and via Pacman
  - uses Grid middleware
  - tested on WLCG-Client and gLite UI

Discover and get data
- DDM central catalog:
  - list of sites containing at least a fraction of the dataset
  - list of GUIDs per dataset, plus checksum and size (for consistency checks)
- LFC catalog (list of files on a site):
  - each catalog contains the list of physical replicas for a GUID
  - input: list of GUIDs
  - middle: for each GUID, the list of replicas managed by the T1 LFC
  - output: list of GUIDs present on the site; in addition, the list of SURLs of the files (used by jobs)
- Storage Element catalog:
  - maps the SURL to the actual file on storage
- WARNING: there is no consistency check between the catalogs (DDM / LFC / SE)
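For illustration, a site's catalog can also be inspected directly with the LFC client tools; a minimal sketch, assuming LFC_HOST points at the T1 LFC of the cloud (the hostname is a placeholder):

  export LFC_HOST=lfc.example.org   # placeholder: the T1 LFC serving this cloud

  # Browse the catalog namespace; ATLAS entries live under /grid/atlas
  lfc-ls -l /grid/atlas

  # Remember: DDM, LFC and SE are not automatically cross-checked, so an
  # entry here does not guarantee the file is actually present on the SE.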

Client installation
- Install DQ2 clients (and WLCG-client):
    $ source /share/pacman/setup.sh
    $ mkdir /share/wlcg-client; cd /share/wlcg-client
    $ pacman -get
      (choose local installation 'l')
    $ source setup.sh
    $ vi $VDT_LOCATION/vdt/etc/vdt-update-certs.conf
      (choose the OSG certificate set)
    $ source $VDT_LOCATION/vdt-questions.sh; $VDT_LOCATION/vdt/sbin/vdt-setup-ca-certificates
- DQ2 client itself:
    $ mkdir /share/dq2-client; cd /share/dq2-client
    $ pacman -trust-all-caches -allow tar-overwrite -get pacman/cache:DQ2Clients
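A quick sanity check that both installations landed, assuming the paths used above; dq2-ls is expected to print its usage text:

  source /share/wlcg-client/setup.sh
  source /share/dq2-client/setup.sh
  which dq2-ls        # should point inside /share/dq2-client
  dq2-ls -h           # should print the command's usage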

Client setup
- Setup with the WLCG client (a superset of the OSG/VDT client):
    $ source /share/osg-client/setup.sh
    $ source /share/dq2-client/setup.sh
- You need a grid certificate registered in the ATLAS VO:
    $ voms-proxy-init -voms atlas:/atlas
- Now you can:
  - move datasets
  - find files
  - copy files
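Before running dq2 commands it is worth confirming that the proxy really carries the ATLAS attributes; a minimal check:

  voms-proxy-info -all
  # Look for an "attribute" line mentioning /atlas and a non-zero "timeleft"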

Commands and use
- dq2-ls -fp DATASETNAME
- dq2-ls -L SITENAME -fp DATASETNAME
- dq2-get DATASETNAME
- dq2-get -f FILENAME DATASETNAME
- dq2-put -s SOURCEDIRECTORY DATASETNAME
- dq2-list-datasets-container CONTAINERNAME
- dq2-list-dataset-replicas-container CONTAINERNAME
- dq2-get -f FILENAME1 CONTAINERNAME
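A minimal end-to-end session stitched together from the commands above; the dataset, site, and file names are placeholders, not real catalog entries:

  # 1. Which sites hold the container, and which of them are complete?
  dq2-list-dataset-replicas-container mc08.105128.example.recon.AOD.e123_s456_r789/

  # 2. List files and local paths at one of the complete sites
  dq2-ls -L MWT2_UC_MCDISK -fp mc08.105128.example.recon.AOD.e123_s456_r789_tid012345

  # 3. Fetch a single file rather than the whole dataset
  dq2-get -f AOD.012345._00001.pool.root.1 mc08.105128.example.recon.AOD.e123_s456_r789_tid012345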

References and other ways to move data
- DQ2 Clients (how-to pages)
- OSG documentation
- Subscriptions
- Direct copy
Examples for the first two follow.

Subscription service

DDM Dashboard
- Can be used to check the status of your (and others') subscriptions
- Use the drill-down menus

DDM Dashboard detail
Here you can see the subscriptions to Chicago (BNL cloud).

Some observations
- Works well in general for O(TB) datasets
- Datasets made of a large number of small files are inefficient; avoid O(100)-file datasets (one workaround is sketched below)
- Contact DDM operations for support
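One workaround for the small-file problem (not from the slide, just a hedged sketch) is to pack the outputs into a few large archives before registering them with dq2-put; all names below are placeholders:

  # Pack many small files into a handful of large tarballs
  mkdir packed
  tar czf packed/outputs_part01.tar.gz smalldir/run01/
  tar czf packed/outputs_part02.tar.gz smalldir/run02/

  # Register the packed directory as a user dataset
  dq2-put -s packed user.JohnDoe.myanalysis.packed.v1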

DQ2 Clients  Use the how-to page as reference (open in browser)  Live demo (terminal)  setup  voms proxy  dq2-ls, -L ROAMING, name*; hat’s in the output  containers (= physics datasets)  _tidXXXX datasets  non _tid datasets – DO NOT USE THIS!  dq2-list-dataset-replicas-container, explain output  which site (SE) has all your files?  # of complete (datasets) = Total datasets  verify that the files are there: dq2-ls -f (verify path -p)  use –L site_that_you_want_to_check

DQ2 Clients (2)
- (cont.) Create a PoolFileCatalog.xml with dq2-ls -P
  - --replace-io=REPLACEIO ('regular expression^new value')
  - e.g. srm://uct2-dc1.uchicago.edu(:[0-9]+)(/srm/managerv2?SFN=)/pnfs/^dcache:/pnfs/
- Errors in dq2-get:
  - no replica available (check with dq2-ls -r)
  - files not copied: try a different 'protocol': ng, lcg, lcg1, srm, dcap, rfio, …
  - checksum error: report to DDM operations
- Help:
  - DDM operations Savannah portal or the hn-atlas-DDMOperations hypernews
  - hn-atlas-dist-analysis-help hypernews
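Putting the catalog step together, a hedged sketch combining dq2-ls -P with the --replace-io rule above; the dataset name is a placeholder, and the rewrite rule is the one shown for the UChicago SE:

  dq2-ls -L MWT2_UC_MCDISK -P \
    --replace-io='srm://uct2-dc1.uchicago.edu(:[0-9]+)(/srm/managerv2?SFN=)/pnfs/^dcache:/pnfs/' \
    mc08.105128.example.recon.AOD.e123_s456_r789_tid012345
  # Writes a PoolFileCatalog.xml whose PFNs start with dcache:/pnfs/,
  # so jobs can read the files directly from local storage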

Demo 1

[uct3-edge5] /ecache/marco/test_dq2-get
> . /share/wlcg-client/setup.sh
> . /share/dq2-client/setup.sh
> voms-proxy-init -voms atlas
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Marco Mambelli
Creating temporary proxy ... Done
Contacting lcg-voms.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "atlas" Done
Creating proxy ... Done
Your proxy is valid until Thu Feb 26 22:13:
> dq2-ls -L ROAMING mc singlepart_e_Et25.recon.AOD.*
mc singlepart_e_Et25.recon.AOD.e342_s462_r604_tid032272_sub
mc singlepart_e_Et25.recon.AOD.e342_s462_r604_tid032272_sub
mc singlepart_e_Et25.recon.AOD.e342_s462_r563
…
mc singlepart_e_Et25.recon.AOD.e342_s462_r604/
mc singlepart_e_Et25.recon.AOD.e342_s462_r604_tid
mc singlepart_e_Et25.recon.AOD.e342_s462_r604_tid032272

Demo 2

> dq2-list-dataset-replicas-container \
  mc singlepart_e_Et25.recon.AOD.e342_s462_r563/
mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid027926:
  INCOMPLETE: WISC_MCDISK
  COMPLETE: AGLT2_MCDISK,BNL-OSG2_MCDISK,BNLPANDA,…
mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid027925:
  INCOMPLETE: RRC-KI_MCDISK
  COMPLETE: AGLT2_MCDISK,BNL-OSG2_MCDISK,BNLPANDA,…
Container name: mc singlepart_e_Et25.recon.AOD.e342_s462_r563/
Total datasets: 2
Summary: SITE / # COMPLETE / # INCOMPLETE / TOTAL
UKI-SCOTGRID-GLASGOW_MCDISK
WISC_MCDISK
CERN-PROD_MCDISK
…
SARA-MATRIX_MCDISK
MWT2_UC_MCDISK 2 0 2

Demo 3

> dq2-ls -L MWT2_UC_MCDISK -f mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid
mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid
[X] AOD _00507.pool.root.3 48B4D116-38AE-DD11-B876-00A0D1E7FB98 md5:ba76b8c771a11c d0fb
[X] AOD _00196.pool.root.1 5E39D1C3-31AE-DD11-9C54-00A0D1E7BC5E md5:6a2fd3ad0464c4da30faaf05ca55b86a
[X] AOD _00249.pool.root.1 0A AE-DD11-9E81-00A0D1E70C54 md5:797dfea0f1b e18f96c6e060e
…
[X] AOD _00209.pool.root.1 3C2D1AC7-31AE-DD A0D1E7B60E md5:2b85fcf c8ec3d5b7725e5f
[X] AOD _00765.pool.root.1 B0773A0C-A8AF-DD11-ABD6-00A0D1E50051 md5:099b1c154adf4e4376def5afd93c
total files: 720
local files: 720
total size:
date: :53:23

> dq2-ls -L MWT2_UC_MCDISK -f mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid | tail -n 4
total files: 80
local files: 80
total size:
date: :38:26

> dq2-ls -L MWT2_UC_MCDISK -fp mc singlepart_e_Et25.recon.AOD.e342_s462_r563_tid027926