Part Four: The LSC DataGrid Part Four: LSC DataGrid A: Data Replication B: What is the LSC DataGrid? C: The LSCDataFind tool.


2 Part Four: The LSC DataGrid

3 Part Four: LSC DataGrid
- A: Data Replication
- B: What is the LSC DataGrid?
- C: The LSCDataFind tool

4 A: Data Replication

5 General Principle
Not all pipes are created equal. Neither are all storage locations.

6 Data Requirements
- Catalog 10^8 files and their locations
- What files are where (possibly at more than one place)
- Across multiple sites within a Grid
- No single point of failure
- No central catalog/server

7 Data Replication Services: Concepts
- Abstract the logical file name (LFN) from the physical file name (PFN).
- Maintain a local replica catalog (LRC) mapping LFNs to PFNs, only for local files.
- Maintain a replica location index (RLI) mapping LFNs to other sites' LRCs, for files that aren't local.

8 Replica Location Service
Site A (rls://serverA:39281), holding file1 and file2
- LRC: file1 → gsiftp://serverA/file1
- LRC: file2 → gsiftp://serverA/file2
- RLI: file3 → rls://serverB/file3
- RLI: file4 → rls://serverB/file4
Site B (rls://serverB:39281), holding file3 and file4
- LRC: file3 → gsiftp://serverB/file3
- LRC: file4 → gsiftp://serverB/file4
- RLI: file1 → rls://serverA/file1
- RLI: file2 → rls://serverA/file2
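The two-site picture can be written down as plain dictionaries. This is only a hypothetical in-memory sketch, not the Globus RLS API: the `catalogs` layout and the `dangling_index_entries` helper are made up for illustration, and each RLI entry is simplified to point at the remote server rather than at a per-file `rls://` URL.

```python
# Hypothetical in-memory model of the two-site diagram (not the Globus RLS API).
SITE_A = "rls://serverA:39281"
SITE_B = "rls://serverB:39281"

catalogs = {
    SITE_A: {
        # LRC: LFN -> PFN, only for files stored at this site
        "lrc": {
            "file1": "gsiftp://serverA/file1",
            "file2": "gsiftp://serverA/file2",
        },
        # RLI: LFN -> remote RLS server whose LRC knows the file
        "rli": {"file3": SITE_B, "file4": SITE_B},
    },
    SITE_B: {
        "lrc": {
            "file3": "gsiftp://serverB/file3",
            "file4": "gsiftp://serverB/file4",
        },
        "rli": {"file1": SITE_A, "file2": SITE_A},
    },
}


def dangling_index_entries(catalogs):
    """Return (site, lfn) pairs whose RLI entry points at an LRC lacking that LFN."""
    problems = []
    for site, catalog in catalogs.items():
        for lfn, remote in catalog["rli"].items():
            if lfn not in catalogs[remote]["lrc"]:
                problems.append((site, lfn))
    return problems


print(dangling_index_entries(catalogs))
```

Note that neither site holds a complete catalog: each one knows its own files in full and only "who to ask" for everything else, which is what removes the central server.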

9 RLS: Replica Location Service
Globus RLS. Each RLS server usually runs two catalogs:
- LRC (Local Replica Catalog): catalog of the files you have (LFNs) and their mappings to URL(s) or PFNs
- RLI (Replica Location Index): catalog of the files (LFNs) that other LRCs in your data grid know about

10 A Site’s LRC
- Each site has an LRC with mappings of LFNs to PFNs.
- It usually contains the “local” mappings, that is, where files are located at the site.
- Example: UWM might have this mapping in its LRC:
  H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf
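The example LFN packs several fields into its name. The slides do not spell out the naming scheme, so the sketch below assumes the usual LIGO frame-file pattern of observatory, frame type, GPS start time, and duration in seconds, separated by hyphens. `FrameName` and `parse_lfn` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class FrameName(NamedTuple):
    observatory: str   # e.g. "H"
    frame_type: str    # e.g. "R"
    gps_start: int     # GPS time of the first sample
    duration: int      # length of the file in seconds


# Assumed pattern: <observatory>-<type>-<gps start>-<duration>.gwf
_FRAME_RE = re.compile(r"^([A-Z]+)-([A-Za-z0-9_]+)-(\d+)-(\d+)\.gwf$")


def parse_lfn(lfn):
    """Split a frame-file LFN into its assumed fields."""
    match = _FRAME_RE.match(lfn)
    if match is None:
        raise ValueError(f"not a recognised frame file name: {lfn!r}")
    observatory, frame_type, gps_start, duration = match.groups()
    return FrameName(observatory, frame_type, int(gps_start), int(duration))


frame = parse_lfn("H-R-792845521-16.gwf")
print(frame)
```

Under that assumption, searching by observatory, data type, and time range amounts to filtering on these four fields.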

11 LRCs Inform Each Other
- The LRC at each site tells remote RLIs which LFNs it has mappings for.
- Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf
- So the Caltech RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
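A minimal sketch of that update step, assuming a toy `RlsServer` class (hypothetical, not the Globus RLS client). A real deployment may send updates differently; here the LRC simply pushes its full LFN list to the remote index.

```python
class RlsServer:
    """Toy RLS server with one LRC and one RLI (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.lrc = {}  # LFN -> list of PFNs for files stored here
        self.rli = {}  # LFN -> set of remote server names that have a mapping

    def add_local_file(self, lfn, pfn):
        self.lrc.setdefault(lfn, []).append(pfn)

    def publish_to(self, remote):
        """Tell a remote RLI which LFNs this LRC has mappings for."""
        for lfn in self.lrc:
            remote.rli.setdefault(lfn, set()).add(self.name)


uwm = RlsServer("UWM")
caltech = RlsServer("Caltech")

uwm.add_local_file(
    "H-R-792845521-16.gwf",
    "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf",
)
uwm.publish_to(caltech)

print(caltech.rli)
```

Only the LFN travels to the remote index. The PFN stays in the UWM LRC, so Caltech learns where to ask, not where the file physically lives.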

12 How it Works (Under the Hood)
- Ask your local LRC: “Do you know about file X?”
- If yes, you can ask your local LRC for the corresponding URL (PFN).
- If no, ask your local RLI: “Who do I ask about X?” It will answer, “The RLS server at Site Y.”
- Ask the LRC at Site Y: “Do you know about file X?” It will return the PFN.
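The lookup steps above can be sketched as a single function. The `servers` dictionary and `resolve` are hypothetical stand-ins for real RLS queries, and the server and file names are reused from the two-site diagram.

```python
# Hypothetical catalogs for two sites; each RLI entry names the server to ask.
servers = {
    "siteA": {
        "lrc": {"file1": "gsiftp://serverA/file1"},
        "rli": {"file3": "siteB"},
    },
    "siteB": {
        "lrc": {"file3": "gsiftp://serverB/file3"},
        "rli": {"file1": "siteA"},
    },
}


def resolve(lfn, local, servers):
    """Return a PFN for lfn, starting from the local site, or None if unknown."""
    site = servers[local]

    # Step 1: ask the local LRC.
    if lfn in site["lrc"]:
        return site["lrc"][lfn]

    # Step 2: ask the local RLI who knows about the file.
    remote = site["rli"].get(lfn)
    if remote is None:
        return None

    # Step 3: ask the LRC at the remote site for the PFN.
    return servers[remote]["lrc"].get(lfn)


print(resolve("file1", "siteA", servers))  # answered by the local LRC
print(resolve("file3", "siteA", servers))  # answered via the RLI, then site B's LRC
```

The final `.get` matters: an index entry can be stale, so the remote LRC may no longer have the file even though the RLI pointed there.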

13 SRB: Storage Resource Broker
http://www.sdsc.edu/srb/
- Distributed data management solution
- Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections
- Provides a rich set of APIs available to higher-level applications
- Provides a management layer on top of a wide variety of storage systems

14 SRB
SRB can be thought of as a:
- Distributed file system
- Datagrid management system
- Digital Library system
- Semantic Web

15 SRB as Data Grid Management
- Transparent replication
- Archiving, caching, synchs, and backups
- Heterogeneous storage
- Container and aggregated data movement
- Bulk data ingestion
- Third-party copy & move

16 LDR: Lightweight Data Replicator
http://www.lsc-group.phys.uwm.edu/LDR
- Replicates datasets within a data grid
- High-speed data transfers with Globus GridFTP
- Globus RLS stored using a MySQL backend
- Metadata stored in a MySQL backend
- Uses GSI for security

17 LDR
- Collections of files to be replicated are defined by the LDR administrator as a SQL query
- Priority queue for scheduling replication
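To make those two ideas concrete, here is a rough sketch that defines collections as SQL queries and feeds the results into a priority queue. It is not LDR code. The `metadata` table, its columns, the second and third file names, and the rule "lower number means higher priority" are all assumptions for illustration, and `sqlite3` stands in for the MySQL backend so the example runs on its own.

```python
import heapq
import sqlite3

# Stand-in metadata store (LDR itself uses MySQL; the schema here is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "lfn TEXT PRIMARY KEY, observatory TEXT, frame_type TEXT, gps_start INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),
        ("H-R-792845537-16.gwf", "H", "R", 792845537),  # made-up neighbour file
        ("L-R-792845521-16.gwf", "L", "R", 792845521),  # made-up second site
    ],
)

# Each collection is (priority, SQL query); lower number = replicate first (assumed).
collections = [
    (2, "SELECT lfn FROM metadata WHERE observatory = 'L' ORDER BY gps_start"),
    (1, "SELECT lfn FROM metadata WHERE observatory = 'H' ORDER BY gps_start"),
]

# Fill the priority queue; seq keeps files within a collection in query order.
queue = []
for priority, query in collections:
    for seq, (lfn,) in enumerate(conn.execute(query)):
        heapq.heappush(queue, (priority, seq, lfn))

# Pop files in scheduling order.
transfer_order = []
while queue:
    _, _, lfn = heapq.heappop(queue)
    transfer_order.append(lfn)

print(transfer_order)
```

Defining a collection as a query means newly catalogued files that match it are picked up the next time the query runs, without editing a file list by hand.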

18 B: What is the LSC DataGrid?

19 What is the LSC DataGrid?
A collection of LSC computational and storage resources, linked through Grid middleware into a uniform LSC data analysis environment.

20 LSC DataGrid Sites
- Tier 1: Caltech
- Tier 2: UWM and PSU
- Tier 3: UT-Brownsville and Salish Kootenai College (SKC)
- Linux clusters at GEO sites Birmingham, Cardiff, and the Albert Einstein Institute (AEI)
- LDAS instances at Caltech, MIT, PSU, and UWM

21 Monitoring the LSC DataGrid http://watchtower.phys.uwm.edu/ganglia-webfrontend/

22 Lab 4: LSCDataFind

23 In this lab, you’ll:
- Verify your DataFind configuration
- Find observatories
- Find data types
- Find actual data (wow!)
- Refine a search
- Retrieve data you’ve found

24 Credits
NSF disclaimer
Portions of this presentation were adapted from the following sources:
- GriPhyN Grid Summer Workshop
- NEESgrid Sysadmin Workshop

