Part Four: The LSC DataGrid

Part Four: LSC DataGrid A: Data Replication B: What is the LSC DataGrid? C: The LSCDataFind tool

A: Data Replication

General Principle Not all pipes are created equal. Neither are all storage locations.

Data Requirements: Catalog 10^8 files and their locations. Record which files are where (possibly at more than one place), across multiple sites within a Grid. No single point of failure. No central catalog/server.

Data Replication Services: Concepts. Abstract the logical file name (LFN) from the physical file name (PFN). Maintain a local replica catalog (LRC) mapping LFNs to PFNs only for local files. Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.

Replica Location Service (diagram of two sites)
Site A, RLS server rls://serverA:39281, holds file1 and file2.
LRC: file1 → gsiftp://serverA/file1; file2 → gsiftp://serverA/file2
RLI: file3 → rls://serverB/file3; file4 → rls://serverB/file4
Site B, RLS server rls://serverB:39281, holds file3 and file4.
LRC: file3 → gsiftp://serverB/file3; file4 → gsiftp://serverB/file4
RLI: file1 → rls://serverA/file1; file2 → rls://serverA/file2

RLS: Replica Location Service (Globus RLS). Each RLS server usually runs two catalogs. LRC (Local Replica Catalog): catalog of the files you have (LFNs) and their mappings to URL(s) or PFNs. RLI (Replica Location Index): catalog of the files (LFNs) that other LRCs in your data grid know about.

A Site’s LRC: Each site has an LRC with mappings of LFNs to PFNs. It usually contains the “local” mappings, that is, for files located at the site. Example: UWM might have this mapping in its LRC: H-R gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R gwf

LRCs Inform Each Other: The LRC at each site tells remote RLIs which LFNs it has mappings for. Example: UWM tells Caltech it has a mapping for H-R gwf, so the Caltech RLI has the mapping H-R gwf → LRC at Milwaukee.

How it Works (Under the Hood): Ask your local LRC, “Do you know about file X?” If yes, ask your local LRC for the corresponding URL (PFN). If no, ask your local RLI, “Who do I ask about X?” It will answer, “The RLS server at Site Y.” Then ask the LRC at Site Y, “Do you know about file X?” It will return the PFN.
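The lookup steps above can be sketched in a few lines of Python. This is a toy in-memory model, not the Globus RLS client API: the RlsServer class, the publish_to method, and the find_pfns function are hypothetical names chosen for illustration, and the server and file names follow the two-site example from the earlier slide.

```python
# Toy model of the LRC/RLI lookup; NOT the Globus RLS API.
# All class and function names here are hypothetical.

class RlsServer:
    def __init__(self, name):
        self.name = name
        self.lrc = {}  # Local Replica Catalog: LFN -> list of PFNs
        self.rli = {}  # Replica Location Index: LFN -> set of remote server names

    def add_local(self, lfn, pfn):
        """Register a local file: map its LFN to a PFN in this site's LRC."""
        self.lrc.setdefault(lfn, []).append(pfn)

    def publish_to(self, remote):
        """Tell a remote server's RLI which LFNs this LRC has mappings for."""
        for lfn in self.lrc:
            remote.rli.setdefault(lfn, set()).add(self.name)


def find_pfns(lfn, local, servers):
    """Resolve an LFN: try the local LRC first, then follow the local RLI."""
    if lfn in local.lrc:                      # "Do you know about file X?"
        return list(local.lrc[lfn])
    pfns = []
    for site in sorted(local.rli.get(lfn, ())):   # "Who do I ask about X?"
        pfns.extend(servers[site].lrc.get(lfn, []))  # ask the LRC at Site Y
    return pfns


site_a = RlsServer("serverA")
site_b = RlsServer("serverB")
site_a.add_local("file1", "gsiftp://serverA/file1")
site_b.add_local("file3", "gsiftp://serverB/file3")
site_a.publish_to(site_b)
site_b.publish_to(site_a)
servers = {"serverA": site_a, "serverB": site_b}

print(find_pfns("file1", site_a, servers))  # resolved by the local LRC
print(find_pfns("file3", site_a, servers))  # resolved via the RLI, then site B's LRC
```

Note that the RLI stores only which server to ask, never the PFN itself, so each site remains the single authority for the physical locations of its own files.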

SRB: Storage Resource Broker. A distributed data management solution. Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections. Provides a rich set of APIs available to higher-level applications. Provides a management layer on top of a wide variety of storage systems.

SRB SRB can be thought of as a: Distributed file system Datagrid management system Digital Library system Semantic Web

SRB as Data Grid Management: Transparent replication. Archiving, caching, syncs, and backups. Heterogeneous storage. Container and aggregated data movement. Bulk data ingestion. Third-party copy and move.

LDR: Lightweight Data Replicator. Replicates datasets within a data grid. High-speed data transfers with Globus GridFTP. Globus RLS catalogs stored in a MySQL backend. Metadata stored in a MySQL backend. Uses GSI for security.

LDR: Collections of files to be replicated are defined by the LDR administrator as a SQL query. A priority queue schedules replication.
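The idea of defining collections as SQL queries and scheduling them through a priority queue can be illustrated with a small sketch. This is not LDR's actual code or schema: the metadata table, its columns, the file names, and the priority values are all hypothetical, and sqlite3 stands in for the MySQL backend so the example runs on its own.

```python
# Illustrative only: hypothetical schema and file names, with sqlite3
# standing in for LDR's MySQL metadata backend.
import heapq
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata (lfn TEXT PRIMARY KEY, observatory TEXT, frame_type TEXT)"
)
db.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("H-R-0001.gwf", "H", "R"),
        ("L-R-0001.gwf", "L", "R"),
        ("H-RDS-0001.gwf", "H", "RDS"),
    ],
)

# Each collection is a (priority, SQL query) pair; lower number = replicate sooner.
collections = [
    (1, "SELECT lfn FROM metadata WHERE frame_type = 'RDS' ORDER BY lfn"),
    (2, "SELECT lfn FROM metadata WHERE observatory = 'H' ORDER BY lfn"),
]

queue = []       # heap of (priority, lfn)
queued = set()   # avoid scheduling the same file twice
for priority, query in collections:
    for (lfn,) in db.execute(query):
        if lfn not in queued:
            queued.add(lfn)
            heapq.heappush(queue, (priority, lfn))

# Pop files in the order they would be handed to the transfer step.
order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)
```

Because a collection is just a query over the metadata, an administrator can change what a site replicates by editing the query rather than maintaining explicit file lists.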

B: What is the LSC DataGrid?

What is the LSC DataGrid? A collection of LSC computational and storage resources… … linked through Grid middleware… … into a uniform LSC data analysis environment.

LSC DataGrid Sites: Tier 1: Caltech. Tier 2: UWM and PSU. Tier 3: UT-Brownsville and Salish Kootenai College (SKC). Linux clusters at GEO sites: Birmingham, Cardiff, and the Albert Einstein Institute (AEI). LDAS instances at Caltech, MIT, PSU, and UWM.

Monitoring the LSC DataGrid

Lab 4: LSCDataFind

In this lab, you’ll: verify your DataFind configuration; find observatories; find data types; find actual data (wow!); refine a search; retrieve data you’ve found.

Credits: NSF disclaimer. Portions of this presentation were adapted from the following sources: GriPhyN Grid Summer Workshop; NEESgrid Sysadmin Workshop.