Enrico Fattibene INFN-CNAF


1 Grid data management
Enrico Fattibene, INFN-CNAF
26 September 2011, Calcolo Parallelo su Grid e CSN4cluster (Parallel Computing on Grid and CSN4cluster)

2 Outline
Motivation
The Grid data management challenge
Data management components
LFC File Catalog
Hands on LFC
LCG Utils commands
Hands on lcg-utils

3 Motivation
Data Intensive Sciences depend on Grid Infrastructures
Characteristics:
  Data is inherently distributed
  Data is produced in large quantities
  Data is produced at a very high rate
  Data has complex interrelations
  Data is needed by many people
A single person / computer alone cannot do all the work
Several groups collaborating in data analysis

4 The data flood
Instrument data: Satellites, Microscopes, Telescopes, Accelerators, ..
Simulation data: Climate, Material science, Physics, Chemistry
Imaging data: Medical imaging, Visualizations, Animations, ..
Generic metadata: Description data, Libraries, Publications

5 High-level data processing scenario
Data Source Preprocessing Formatting Data descriptors Storage Security Distribution Transfer Replication Caching Analysis Computation Workflows Science Data Interpretation Publications Knowledge New ideas Science Library Indexing

6 High-level data processing scenario
Distributed data management Data Source Preprocessing Formatting Data descriptors Storage Security Distribution Transfer Replication Caching Analysis Computation Workflows Science Data Interpretation Publications Knowledge New ideas Science Library Indexing

7 High-level data processing scenario
Distributed data management Data Source Preprocessing Formatting Data descriptors Storage Security Distribution Transfer Replication Caching Analysis Computation Workflows Science Data COMPLEXITY Interpretation Publications Knowledge New ideas Science Library Indexing

8 High Energy Physics
Large Hadron Collider (LHC) at CERN
One of the most powerful instruments ever built to investigate matter
4 Experiments: ALICE, ATLAS, CMS, LHCb
4 Virtual Organizations
27 km circumference tunnel
Generating 10 PB/year
[Image labels: Mont Blanc (4810 m), Downtown Geneva]

9 Biomedical data – making connections
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg

10 The Grid data management challenge
Heterogeneity
  Data are stored on different storage systems using different access technologies
  Need a common interface to storage resources: Storage Resource Manager (SRM)
Distribution
  Data are stored in different locations; in most cases there is no shared file system or common namespace
  Need to keep track of where data are stored: File and Replica Catalogs
  Data need to be moved between different locations
  Need scheduled, reliable file transfer: File Transfer Service

11 gLite/UMD components
[Diagram label: worker nodes]

12 Data management components

13 Data management components

14 Storage Resource Manager (SRM)

15 Name conventions
Logical File Name (LFN)
  An alias created by a user to refer to some item of data, e.g.
  lfn:/grid/gilda/budapest23/run2/track1
Globally Unique Identifier (GUID)
  A non-human-readable unique identifier for an item of data, e.g.
  guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  The location of an actual piece of data on a storage system, e.g.
  srm://pcrd24.cern.ch/flatfiles/cms/output10_ (SRM)
  sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
Transport URL (TURL)
  Temporary locator of a replica + access protocol: understood by a SE, e.g.
  rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
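The four kinds of name can be told apart by their prefix alone. A minimal sketch of that idea follows; the function name classify_grid_name is hypothetical (it is not part of lcg-utils), and treating rfio:// and gsiftp:// as the TURL protocols is an assumption based only on the examples in these slides.

```shell
# classify_grid_name NAME
# Prints which naming convention NAME appears to follow, judged only
# by its prefix. Hypothetical helper, not part of lcg-utils.
classify_grid_name() {
    case "$1" in
        lfn:*)               echo "LFN" ;;
        guid:*)              echo "GUID" ;;
        srm://*|sfn://*)     echo "SURL" ;;
        rfio://*|gsiftp://*) echo "TURL" ;;   # assumed protocol list
        *)                   echo "unknown" ;;
    esac
}

# The examples from the slide, one per convention.
classify_grid_name lfn:/grid/gilda/budapest23/run2/track1
classify_grid_name guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
classify_grid_name sfn://lxshare0209.cern.ch/data/alice/ntuples.dat
classify_grid_name rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
```

A prefix check says nothing about whether the name is registered anywhere; that still requires asking the catalogue.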

16 Name conventions
Users primarily access and manage files through "logical filenames" (LFN)
  Defined by the user
LFN Namespace
  LFN has a directory tree structure
  lfn:/grid/<VO_name>/<you create it>
Mapping by the "LFC" catalogue server

17 Resolving LFN
LCG File Catalogue (LFC)
[Diagram: the User Interface asks the LFC for "Myfile.dat"; the logical filename maps to a GUID (Globally Unique Identifier), which maps to two SURLs (site URLs): File_on_se1 on Storage Element 1 and File_on_se2 on Storage Element 2]
Content is available on 2 SEs
File content cannot change => no need to synchronize replicas
Slide inherited from EDG (European Data Grid)
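The mapping in the diagram (one logical name, one GUID, several SURLs) can be mimicked with a flat lookup table. The sketch below is only a toy model: the CATALOG variable, the list_replicas function, the GUID and the example.org hosts are all made-up placeholders, and a real lookup goes through the LFC server rather than a shell variable.

```shell
# Toy stand-in for the catalogue: one "LFN GUID SURL" record per line.
# Every entry is a made-up placeholder, not real Grid data.
CATALOG='lfn:/grid/demo/Myfile.dat guid:0000-demo srm://se1.example.org/File_on_se1
lfn:/grid/demo/Myfile.dat guid:0000-demo srm://se2.example.org/File_on_se2'

# list_replicas LFN
# Prints the SURL of every replica recorded for LFN (nothing if none).
list_replicas() {
    printf '%s\n' "$CATALOG" | while read -r lfn guid surl; do
        if [ "$lfn" = "$1" ]; then
            echo "$surl"
        fi
    done
}

# Both replicas share the same GUID, so either copy can serve a read.
list_replicas lfn:/grid/demo/Myfile.dat
```

Because file content cannot change, the table never needs a "which copy is newest" column: every SURL listed for a GUID holds the same bytes.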

18 LFC Catalog commands
lfc-chmod       Change access mode of the LFC file/directory
lfc-chown       Change owner and group of the LFC file/directory
lfc-delcomment  Delete the comment associated with the file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment

19 LFC hands on
Several environment variables need to be set before you start, to ensure that the correct catalog service is used. The default settings for these variables in your account should be correct, but this needs to be checked. The variables to check are:
$LCG_CATALOG_TYPE
$LFC_HOST
$LCG_GFAL_INFOSYS
If one or more of them has a different or empty value, set it as follows:
export LCG_CATALOG_TYPE=lfc
export LCG_GFAL_INFOSYS=egee-bdii.cnaf.infn.it:2170
Now you are ready to start (replace dteam with your VO and LFC_HOST with the LFC server of your VO).
lcg-infosites --vo <your_vo> lfc
export LFC_HOST=prod-lfc-shared-central.cern.ch #VO dependent
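The check described above can be scripted. This is a minimal sketch that only tests whether each of the three variables is non-empty; the function name check_lfc_env is hypothetical, and it does not verify that the values point at a working catalogue.

```shell
# check_lfc_env
# Reports each of the three catalogue variables, flagging any that is
# unset or empty. Returns 0 if all are set, 1 otherwise.
# Hypothetical helper, not part of the LFC client tools.
check_lfc_env() {
    missing=0
    for name in LCG_CATALOG_TYPE LFC_HOST LCG_GFAL_INFOSYS; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}

check_lfc_env || echo "Set the missing variables before continuing."
```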

20 LFC hands on
$ lfc-mkdir /grid/dteam/veronesi
$ lfc-setcomment /grid/dteam/veronesi/ "Veronesi LFC working dir"
$ lfc-ls -l --comment /grid/dteam | grep veronesi
-rw-rw-r Nov 05 09:51 /grid/dteam/veronesi/ Veronesi LFC working dir
$ lfc-getacl /grid/dteam/veronesi
# file: /grid/dteam/veronesi
# owner: /C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Paolo Veronesi
# group: dteam
user::rw-
group::rw- #effective:rw-
group:dteam:rwx #effective:rw-
group:dteam/Role=lcgadmin:rwx #effective:rw-
group:dteam/Role=production:rwx #effective:rw-
mask::rw-
other::r--
$ export LFC_HOME=/grid/dteam/veronesi
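Setting LFC_HOME lets later commands use relative names: the upload slide passes lfn:my_first_grid_file and the tool reports lfn:/grid/dteam/veronesi/my_first_grid_file. The sketch below only imitates that expansion so the rule is visible; expand_lfn is a hypothetical helper, and the real tools perform the expansion internally.

```shell
# expand_lfn NAME
# Turns a relative "lfn:" name into an absolute one by prefixing
# $LFC_HOME; absolute names are returned unchanged.
# Hypothetical helper that imitates what the catalogue tools do.
expand_lfn() {
    path=${1#lfn:}                 # drop the scheme prefix
    case "$path" in
        /*) echo "lfn:$path" ;;    # already absolute
        *)  echo "lfn:${LFC_HOME%/}/$path" ;;
    esac
}

LFC_HOME=/grid/dteam/veronesi
expand_lfn lfn:my_first_grid_file
# prints lfn:/grid/dteam/veronesi/my_first_grid_file
```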

21 LCG Utils hands on
$ dd if=/dev/urandom of=bigfile.dat bs=100k count=100
$ lcg-infosites --vo dteam se |grep infn
n.a se01-lhcb-t2.cr.cnaf.infn.it
n.a storm-fe-lhcb.cr.cnaf.infn.it
n.a srm-v2.cr.cnaf.infn.it
n.a storm-fe-alice.cr.cnaf.infn.it
[...]
Upload to Grid (lcg-cr);
Replicate a file (lcg-rep);
Download from Grid (lcg-cp);
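For orientation, the size of the test file follows directly from the dd arguments. The arithmetic below assumes GNU dd, where the k suffix means 1024 bytes; other dd implementations may interpret the suffix differently.

```shell
# Size of bigfile.dat as created by "dd bs=100k count=100",
# assuming GNU dd semantics (k = 1024 bytes).
bs_bytes=$((100 * 1024))
count=100
echo "$((bs_bytes * count)) bytes"
# prints 10240000 bytes
```

That is roughly 10 MB, small enough to transfer in a few seconds yet large enough for the transfer statistics to be meaningful.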

22 LCG Utils hands on: upload file to Grid
$ lcg-cr -v --vo dteam -d se01-lhcb-t2.cr.cnaf.infn.it \
    -l lfn:my_first_grid_file file:///home/veronesi/bigfile.dat
Using grid catalog type: lfc
Using grid catalog : prod-lfc-shared-central.cern.ch
Checksum type: None
SE type: SRMv2
Destination SURL : srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
Source SRM Request Token: e6c6-4df5-a674-d4f2708cb4b9
Source URL: file:/home/veronesi/bigfile.dat
File size:
VO name: dteam
Destination specified: se01-lhcb-t2.cr.cnaf.infn.it
Destination URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
# streams:
bytes KB/sec avg KB/sec inst
Transfer took 3070 ms
Using LFN: lfn:/grid/dteam/veronesi/my_first_grid_file
Using GUID: guid: fb12-48ef-8365-ad99b809783f
Registering LFN: /grid/dteam/veronesi/my_first_grid_file ( fb12-48ef-8365-ad99b809783f)
Registering SURL: srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 ( fb12-48ef-8365-ad99b809783f)
guid: fb12-48ef-8365-ad99b809783f
$ lcg-lr lfn:my_first_grid_file
srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14

23 LCG Utils hands on: replicate a file
$ lcg-rep -v --vo dteam -d gridsrm.ts.infn.it lfn:my_first_grid_file
Using grid catalog type: LFC
Using grid catalog : prod-lfc-shared-central.cern.ch
VO name: dteam
Checksum type: None
Trying SURL srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 ...
Source SE type: SRMv2
Source SRM Request Token: 2a65cfe6-d0f b317-80ebd4baf1fa
Destination SE type: SRMv2
Destination SRM Request Token: ec404fb0-f4ae-46ea-ad85-4fb623670a54
Source URL: /grid/dteam/veronesi/my_first_grid_file
File size:
Destination specified: gridsrm.ts.infn.it
Source URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
Destination URL: gsiftp://gridsrm.ts.infn.it:2811//gpfs/grid/srm/dteam/generated/ /file ee99-4faa-abbf-3d8aeeab7813
# streams:
bytes KB/sec avg KB/sec inst
Transfer took 4410 ms
Using LFN: lfn:/grid/dteam/veronesi/my_first_grid_file
Using GUID: guid: fb12-48ef-8365-ad99b809783f
Registering SURL: srm://gridsrm.ts.infn.it/dteam/generated/ /file ee99-4faa-abbf-3d8aeeab7813 ( fb12-48ef-8365-ad99b809783f)
Destination URL registered in file catalog: srm://gridsrm.ts.infn.it/dteam/generated/ /file ee99-4faa-abbf-3d8aeeab7813
$ lcg-lr lfn:my_first_grid_file
srm://gridsrm.ts.infn.it/dteam/generated/ /file ee99-4faa-abbf-3d8aeeab7813
srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14

24 LCG Utils hands on: download from Grid
$ lfc-ls -l /grid/dteam/veronesi
-rw-rw-r Nov 24 12:00 my_first_grid_file
$ lcg-cp --vo dteam -v lfn:my_first_grid_file file:/home/veronesi/bigfile2.dat
Using grid catalog type: LFC
Using grid catalog : prod-lfc-shared-central.cern.ch
VO name: dteam
Checksum type: None
Trying SURL srm://se01-lhcb-t2.cr.cnaf.infn.it/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14 ...
Source SE type: SRMv2
Source SRM Request Token: a9aeffa8-bdca-476a-a8a a4f9917
Source URL: /grid/dteam/veronesi/my_first_grid_file
File size:
Source URL for copy: gsiftp://se01-lhcb-t2.cr.cnaf.infn.it:2811//storage/dteam/generated/ /fileb9656a46-11cc-4e67-b76f-92f6cb3cfa14
Destination URL: file:/home/veronesi/bigfile2.dat
# streams:
bytes KB/sec avg KB/sec inst
Transfer took 1010 ms
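Since the transfers above ran with "Checksum type: None", one way to confirm the round trip is to compare the downloaded bigfile2.dat with the original bigfile.dat locally. The sketch below uses the standard cmp utility; same_content is a hypothetical wrapper, and the demonstration runs on throwaway files so that it works without Grid access.

```shell
# same_content A B
# Succeeds only if the two files are byte-for-byte identical.
# Hypothetical wrapper around the standard cmp utility.
same_content() {
    cmp -s "$1" "$2"
}

# Demonstration on temporary files standing in for the upload source
# and the downloaded copy.
tmpdir=$(mktemp -d)
dd if=/dev/urandom of="$tmpdir/bigfile.dat" bs=1k count=10 2>/dev/null
cp "$tmpdir/bigfile.dat" "$tmpdir/bigfile2.dat"

if same_content "$tmpdir/bigfile.dat" "$tmpdir/bigfile2.dat"; then
    echo "files match"
else
    echo "files differ"
fi

rm -rf "$tmpdir"
```

On the user interface machine the same call would be same_content bigfile.dat bigfile2.dat, run from /home/veronesi.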

25 Thank you
Questions?

