Enrico Fattibene INFN-CNAF

Grid data management
Enrico Fattibene, INFN-CNAF
26 September 2011, Calcolo Parallelo su Grid e CSN4cluster

Outline
  Motivation
  The Grid data management challenge
  Data management components
  LFC File Catalog
  Hands on LFC
  LCG Utils commands
  Hands on lcg-utils

Motivation
Data-intensive sciences depend on Grid infrastructures.
Characteristics:
  Data is inherently distributed
  Data is produced in large quantities
  Data is produced at a very high rate
  Data has complex interrelations
  Data is needed by many people
A single person or computer cannot do all the work alone.
Several groups collaborate in data analysis.

The data flood
  Instrument data: satellites, microscopes, telescopes, accelerators, ..
  Simulation data: climate, material science, physics, chemistry
  Imaging data: medical imaging, visualizations, animations, ..
  Generic metadata: description data, libraries, publications

High-level data processing scenario
[Diagram, built up over three slides. Labels in reading order: Data Source; Preprocessing; Formatting; Data descriptors; Storage; Security; Distribution; Transfer; Replication; Caching; Analysis; Computation; Workflows; Science; Data Interpretation; Publications; Knowledge; New ideas; Science Library; Indexing. Overlays added in the second and third builds: "Distributed data management" and "COMPLEXITY".]

High Energy Physics
Large Hadron Collider (LHC) at CERN
  One of the most powerful instruments ever built to investigate matter
  4 experiments: ALICE, ATLAS, CMS, LHCb
  4 Virtual Organizations
  27 km circumference tunnel
  Generating 10 PB/year
[Figure labels: Mont Blanc (4810 m), downtown Geneva]

Biomedical data – making connections
  12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
  12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
  12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
  12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
  12421 taggtgactt gcctgttttt ttttaattgg

The Grid data management challenge
Heterogeneity
  Data are stored on different storage systems using different access technologies
Distribution
  Data are stored in different locations; in most cases there is no shared file system or common namespace
  Data need to be moved between different locations
Needs:
  A common interface to storage resources: Storage Resource Manager (SRM)
  Keeping track of where data are stored: File and Replica Catalogs
  Scheduled, reliable file transfer: File Transfer Service

gLite/UMD components
[Figure: gLite/UMD components diagram; only the label "worker nodes" survives]

Data management components

Storage Resource Manager (SRM)

Name conventions
Logical File Name (LFN)
  An alias created by a user to refer to some item of data, e.g.
    lfn:/grid/gilda/budapest23/run2/track1
Globally Unique Identifier (GUID)
  A non-human-readable unique identifier for an item of data, e.g.
    guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  The location of an actual piece of data on a storage system, e.g.
    srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
    sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
Transport URL (TURL)
  Temporary locator of a replica plus access protocol, understood by an SE, e.g.
    rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
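The four name types above can be told apart by their scheme prefix alone. The following is a minimal sketch in POSIX shell; classify_grid_name is a hypothetical helper written for illustration, not a gLite command, and it assumes that only the prefixes appearing in this deck (lfn:, guid:, srm://, sfn://, rfio://, gsiftp://) need to be recognised.

```shell
#!/bin/sh
# classify_grid_name (hypothetical helper, not part of gLite):
# print the kind of grid name based only on its scheme prefix.
classify_grid_name() {
    case "$1" in
        lfn:*)               echo "LFN" ;;      # logical file name
        guid:*)              echo "GUID" ;;     # globally unique identifier
        srm://*|sfn://*)     echo "SURL" ;;     # site URL on a storage element
        rfio://*|gsiftp://*) echo "TURL" ;;     # transport URL (protocol-specific)
        *)                   echo "unknown" ;;
    esac
}

# Examples taken from the name conventions listed above.
classify_grid_name "lfn:/grid/gilda/budapest23/run2/track1"
classify_grid_name "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
classify_grid_name "srm://pcrd24.cern.ch/flatfiles/cms/output10_1"
classify_grid_name "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat"
```

A prefix check like this says nothing about whether the name actually exists in a catalog or on a storage element; it only helps a script decide which tool to hand the name to.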

Name conventions
Users primarily access and manage files through "logical filenames" (LFNs), which are defined by the user.
LFN namespace
  An LFN has a directory tree structure:
    lfn:/grid/<VO_name>/<you create it>
Mapping is done by the "LFC" catalogue server.

Resolving LFN
(Slide inherited from EDG, the European Data Grid)
[Diagram: on the User Interface the user refers to "Myfile.dat", the logical filename. The LCG File Catalogue (LFC) maps it to a GUID (Globally Unique Identifier) and to the replicas File_on_se1 on Storage Element 1 and File_on_se2 on Storage Element 2, each identified by a SURL (site URL).]
Content is available on 2 SEs.
File content cannot change, so there is no need to synchronize replicas.

LFC Catalog commands
  lfc-chmod        Change access mode of the LFC file/directory
  lfc-chown        Change owner and group of the LFC file/directory
  lfc-delcomment   Delete the comment associated with the file/directory
  lfc-getacl       Get file/directory access control lists
  lfc-ln           Make a symbolic link to a file/directory
  lfc-ls           List file/directory entries in a directory
  lfc-mkdir        Create a directory
  lfc-rename       Rename a file/directory
  lfc-rm           Remove a file/directory
  lfc-setacl       Set file/directory access control lists
  lfc-setcomment   Add/replace a comment

LFC hands on
Several environment variables must be set before you start, to ensure that the correct catalog service is used. The default settings in your account should be correct, but check them. The variables to check are:
  $LCG_CATALOG_TYPE
  $LFC_HOST
  $LCG_GFAL_INFOSYS
If any of them is empty or has a different value, set it like this:
  export LCG_CATALOG_TYPE=lfc
  export LCG_GFAL_INFOSYS=egee-bdii.cnaf.infn.it:2170
Then look up the LFC server of your VO and set LFC_HOST to it (in the examples that follow, replace dteam with your VO):
  lcg-infosites --vo <your_vo> lfc
  export LFC_HOST=prod-lfc-shared-central.cern.ch   # VO dependent
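The check described above can be automated before starting a session. This is a minimal sketch, assuming a POSIX shell; check_catalog_env is a hypothetical helper name, and it only tests that the three variables are non-empty, not that their values are the right ones for your VO.

```shell
#!/bin/sh
# check_catalog_env (hypothetical helper): report which of the three
# catalog-related variables are unset or empty in the current environment.
check_catalog_env() {
    missing=""
    for name in LCG_CATALOG_TYPE LFC_HOST LCG_GFAL_INFOSYS; do
        # Indirect lookup of the variable called $name; empty if unset.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Print the result; on failure, remind the user what to do next.
check_catalog_env || echo "set the variables listed above before continuing"
```

Running it at the top of a hands-on script avoids the less obvious errors that the lfc-* and lcg-* tools produce when they cannot locate a catalog.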

LFC hands on
  $ lfc-mkdir /grid/dteam/veronesi
  $ lfc-setcomment /grid/dteam/veronesi/ "Veronesi LFC working dir"
  $ lfc-ls -l --comment /grid/dteam | grep veronesi
  -rw-rw-r-- 1 18956 2688 5343544320 Nov 05 09:51 /grid/dteam/veronesi/ Veronesi LFC working dir
  $ lfc-getacl /grid/dteam/veronesi
  # file: /grid/dteam/veronesi
  # owner: /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Paolo Veronesi
  # group: dteam
  user::rw-
  group::rw-                        #effective:rw-
  group:dteam:rwx                   #effective:rw-
  group:dteam/Role=lcgadmin:rwx     #effective:rw-
  group:dteam/Role=production:rwx   #effective:rw-
  mask::rw-
  other::r--
  $ export LFC_HOME=/grid/dteam/veronesi
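Setting LFC_HOME is what lets the later commands use the short form lfn:my_first_grid_file, which the lcg-cr output in the upload example reports as lfn:/grid/dteam/veronesi/my_first_grid_file. The sketch below imitates that expansion in plain shell; expand_lfn is a hypothetical helper for illustration only, and the real tools perform this resolution internally.

```shell
#!/bin/sh
# expand_lfn (hypothetical helper): turn a relative LFN into an absolute one
# by prefixing it with $LFC_HOME, leaving absolute LFNs untouched.
expand_lfn() {
    path="${1#lfn:}"              # drop the lfn: scheme if present
    home="${LFC_HOME:-}"
    case "$path" in
        /*) echo "lfn:$path" ;;                  # already absolute
        *)  echo "lfn:${home%/}/$path" ;;        # relative to LFC_HOME
    esac
}

LFC_HOME=/grid/dteam/veronesi
expand_lfn "lfn:my_first_grid_file"
expand_lfn "lfn:/grid/dteam/other/file"
```

With LFC_HOME set as above, the first call prints the same absolute LFN that appears in the upload log, and the second call returns its argument unchanged.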

LCG Utils hands on
  $ dd if=/dev/urandom of=bigfile.dat bs=100k count=100
  $ lcg-infosites --vo dteam se | grep infn
      73134641       2799493  n.a  se01-lhcb-t2.cr.cnaf.infn.it
   53415528890   88392995962  n.a  storm-fe-lhcb.cr.cnaf.infn.it
   17157000000  153659000000  n.a  srm-v2.cr.cnaf.infn.it
  119810365063  110625494663  n.a  storm-fe-alice.cr.cnaf.infn.it
  [...]
Main operations:
  Upload to Grid (lcg-cr)
  Replicate a file (lcg-rep)
  Download from Grid (lcg-cp)
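When many Storage Elements are listed, a script can pick one for the upload. The sketch below reads rows shaped like the listing above and prints the SE with the largest first column. It assumes, without confirmation from the slide, that the first column is the available space and the last column is the SE host name; pick_largest_se is a hypothetical helper, and the sample rows are copied from the listing.

```shell
#!/bin/sh
# pick_largest_se (hypothetical helper): read "avail used type host" rows on
# stdin and print the host whose first column is numerically largest.
# Rows whose first column is not a plain number (e.g. "n.a") are skipped.
pick_largest_se() {
    awk '$1 ~ /^[0-9]+$/ && ($1 + 0) > max { max = $1 + 0; se = $NF }
         END { if (se != "") print se }'
}

# Sample rows taken from the listing above.
pick_largest_se <<'EOF'
73134641 2799493 n.a se01-lhcb-t2.cr.cnaf.infn.it
53415528890 88392995962 n.a storm-fe-lhcb.cr.cnaf.infn.it
17157000000 153659000000 n.a srm-v2.cr.cnaf.infn.it
119810365063 110625494663 n.a storm-fe-alice.cr.cnaf.infn.it
EOF
```

On a real User Interface the same function could be fed directly, for example with lcg-infosites --vo dteam se | pick_largest_se, provided the column layout matches the assumption stated above.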

LCG Utils hands on: upload file to Grid
  $ lcg-cr -v --vo dteam -d se01-lhcb-t2.cr.cnaf.infn.it \
        -l lfn:my_first_grid_file file:///home/veronesi/bigfile.dat
  Using grid catalog type: lfc
  Using grid catalog : prod-lfc-shared-central.cern.ch
  Checksum type: None
  SE type: SRMv2
  Destination SURL : srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
  Source SRM Request Token: 89539494-e6c6-4df5-a674-d4f2708cb4b9
  Source URL: file:/home/veronesi/bigfile.dat
  File size: 10240000
  VO name: dteam
  Destination specified: se01-lhcb-t2.cr.cnaf.infn.it
  Destination URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
  # streams: 1
  7340032 bytes 4586.79 KB/sec avg 6144.00 KB/sec inst
  Transfer took 3070 ms
  Using LFN: lfn:/grid/dteam/veronesi/my_first_grid_file
  Using GUID: guid:18115931-fb12-48ef-8365-ad99b809783f
  Registering LFN: /grid/dteam/veronesi/my_first_grid_file (18115931-fb12-48ef-8365-ad99b809783f)
  Registering SURL: srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 (18115931-fb12-48ef-8365-ad99b809783f)
  guid:18115931-fb12-48ef-8365-ad99b809783f
  $ lcg-lr lfn:my_first_grid_file
  srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14

LCG Utils hands on: replicate a file
  $ lcg-rep -v --vo dteam -d gridsrm.ts.infn.it lfn:my_first_grid_file
  Using grid catalog type: LFC
  Using grid catalog : prod-lfc-shared-central.cern.ch
  VO name: dteam
  Checksum type: None
  Trying SURL srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 ...
  Source SE type: SRMv2
  Source SRM Request Token: 2a65cfe6-d0f5-4834-b317-80ebd4baf1fa
  Destination SE type: SRMv2
  Destination SRM Request Token: ec404fb0-f4ae-46ea-ad85-4fb623670a54
  Source URL: /grid/dteam/veronesi/my_first_grid_file
  File size: 10240000
  Destination specified: gridsrm.ts.infn.it
  Source URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
  Destination URL: gsiftp://gridsrm.ts.infn.it:2811//gpfs/grid/srm/dteam/generated/2009-11-24/file88326738-ee99-4faa-abbf-3d8aeeab7813
  # streams: 1
  10240000 bytes 5555.56 KB/sec avg 5555.56 KB/sec inst
  Transfer took 4410 ms
  Using LFN: lfn:/grid/dteam/veronesi/my_first_grid_file
  Using GUID: guid:18115931-fb12-48ef-8365-ad99b809783f
  Registering SURL: srm://gridsrm.ts.infn.it/dteam/generated/2009-11-24/file88326738-ee99-4faa-abbf-3d8aeeab7813 (18115931-fb12-48ef-8365-ad99b809783f)
  Destination URL registered in file catalog: srm://gridsrm.ts.infn.it/dteam/generated/2009-11-24/file88326738-ee99-4faa-abbf-3d8aeeab7813
  $ lcg-lr lfn:my_first_grid_file
  srm://gridsrm.ts.infn.it/dteam/generated/2009-11-24/file88326738-ee99-4faa-abbf-3d8aeeab7813
  srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14

LCG Utils hands on: download from Grid
  $ lfc-ls -l /grid/dteam/veronesi
  -rw-rw-r-- 1 18956 2688 10240000 Nov 24 12:00 my_first_grid_file
  $ lcg-cp --vo dteam -v lfn:my_first_grid_file file:/home/veronesi/bigfile2.dat
  Using grid catalog type: LFC
  Using grid catalog : prod-lfc-shared-central.cern.ch
  VO name: dteam
  Checksum type: None
  Trying SURL srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 ...
  Source SE type: SRMv2
  Source SRM Request Token: a9aeffa8-bdca-476a-a8a7-04765a4f9917
  Source URL: /grid/dteam/veronesi/my_first_grid_file
  File size: 10240000
  Source URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/2009-11-24/fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
  Destination URL: file:/home/veronesi/bigfile2.dat
  # streams: 1
  0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
  Transfer took 1010 ms
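The upload, replicate, list and download steps of the hands-on can be chained in one script. The following is a minimal sketch that uses only the command forms shown in the sessions above; run and grid_round_trip are hypothetical helper names, and the DRY_RUN switch is an addition for illustration so that the sequence can be inspected on a machine without the gLite tools installed.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN=1, else execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# grid_round_trip (hypothetical helper): upload a local file to one SE,
# replicate it to a second SE, list the replicas, then download a copy.
# Arguments: VO, first SE, second SE, LFN, local input path, local output path.
grid_round_trip() {
    vo="$1"; first_se="$2"; second_se="$3"; lfn="$4"; local_in="$5"; local_out="$6"
    run lcg-cr --vo "$vo" -d "$first_se" -l "$lfn" "file://$local_in" &&
    run lcg-rep --vo "$vo" -d "$second_se" "$lfn" &&
    run lcg-lr "$lfn" &&
    run lcg-cp --vo "$vo" "$lfn" "file:$local_out"
}

# Dry run with the hosts and paths used in the hands-on sessions.
DRY_RUN=1 grid_round_trip dteam se01-lhcb-t2.cr.cnaf.infn.it gridsrm.ts.infn.it \
    lfn:my_first_grid_file /home/veronesi/bigfile.dat /home/veronesi/bigfile2.dat
```

Chaining the steps with && stops the sequence at the first failing command, so a failed upload is never followed by a replication attempt. Without DRY_RUN=1 the script needs a valid proxy and the environment variables from the LFC hands-on.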

Thank you
Questions?