Case Study 1: Data Replication for LIGO Scott Koranda Ann Chervenak.

Slides:



Advertisements
Similar presentations
Giggle: A Framework for Constructing Scalable Replica Location Services Ann Chervenak, Ewa Deelman, Ian Foster, Leanne Guy, Wolfgang Hoschekk, Adriana.
Advertisements

The Replica Location Service In wide area computing systems, it is often desirable to create copies (replicas) of data objects. Replication can be used.
RLS and DRS Roadmap Items Ann Chervenak Robert Schuler USC Information Sciences Institute.
Globus DataGrid Overview Bill Allcock, ANL GridPP Meeting 30 June 2003.
Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki
Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
GridFTP: File Transfer Protocol in Grid Computing Networks
Seminar Grid Computing ‘05 Hui Li Sep 19, Overview Brief Introduction Presentations Projects Remarks.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Application of GRID technologies for satellite data analysis Stepan G. Antushev, Andrey V. Golik and Vitaly K. Fischenko 2007.
Globus Toolkit 4 hands-on Gergely Sipos, Gábor Kecskeméti MTA SZTAKI
Office of Science U.S. Department of Energy Grids and Portals at NERSC Presented by Steve Chan.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
4b.1 Grid Computing Software Components of Globus 4.0 ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson, slides 4b.
Grid Computing, B. Wilkinson, 20046c.1 Globus III - Information Services.
Globus Computing Infrustructure Software Globus Toolkit 11-2.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Data Grid Web Services Chip Watson Jie Chen, Ying Chen, Bryan Hess, Walt Akers.
Globus 4 Guy Warner NeSC Training.
Sergey Belov, Tatiana Goloskokova, Vladimir Korenkov, Nikolay Kutovskiy, Danila Oleynik, Artem Petrosyan, Roman Semenov, Alexander Uzhinskiy LIT JINR The.
QCDgrid Technology James Perry, George Beckett, Lorna Smith EPCC, The University Of Edinburgh.
The Data Replication Service Ann Chervenak Robert Schuler USC Information Sciences Institute.
Grid Computing. What is a Grid? Many definitions exist in the literature Early definitions: Foster and Kesselman, 1998 –“A computational grid is a hardware.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Don Quijote Data Management for the ATLAS Automatic Production System Miguel Branco – CERN ATC
10/20/05 LIGO Scientific Collaboration 1 LIGO Data Grid: Making it Go Scott Koranda University of Wisconsin-Milwaukee.
Globus Data Replication Services Ann Chervenak, Robert Schuler USC Information Sciences Institute.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
1 Introduction to Grid Computing. 2 What is a Grid? Many definitions exist in the literature Early definitions: Foster and Kesselman, 1998 “A computational.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
INFSO-RI Enabling Grids for E-sciencE The US Federation Miron Livny Computer Sciences Department University of Wisconsin – Madison.
Why GridFTP? l Performance u Parallel TCP streams, optimal TCP buffer u Non TCP protocol such as UDT u Order of magnitude greater l Cluster-to-cluster.
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
Grid Resource Allocation and Management (GRAM) Execution management Execution management –Deployment, scheduling and monitoring Community Scheduler Framework.
Secure, Collaborative, Web Service enabled and Bittorrent Inspired High-speed Scientific Data Transfer Framework.
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008 John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and.
Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Part Four: The LSC DataGrid Part Four: LSC DataGrid A: Data Replication B: What is the LSC DataGrid? C: The LSCDataFind tool.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
Replica Management Services in the European DataGrid Project Work Package 2 European DataGrid.
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
Data Management and Transfer in High-Performance Computational Grid Environments B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman,
Wide Area Data Replication for Scientific Collaborations Ann Chervenak, Robert Schuler, Carl Kesselman USC Information Sciences Institute Scott Koranda.
GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid
CGW 04, Stripped replication for the grid environment as a web service1 Stripped replication for the Grid environment as a web service Marek Ciglan, Ondrej.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Globus – Part II Sathish Vadhiyar. Globus Information Service.
CEDPS Data Services Ann Chervenak USC Information Sciences Institute.
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
Rights Management in Globus Data Services Ann Chervenak, ISI/USC Bill Allcock, ANL/UC.
BNL Service Challenge 3 Status Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.
Scott Koranda, UWM & NCSA 14 January 2016www.griphyn.org Lightweight Data Replicator Scott Koranda University of Wisconsin-Milwaukee & National Center.
AERG 2007Grid Data Management1 Grid Data Management GridFTP Carolina León Carri Ben Clifford (OSG)
ALCF Argonne Leadership Computing Facility GridFTP Roadmap Bill Allcock (on behalf of the GridFTP team) Argonne National Laboratory.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT Upcoming Features and Roadmap Ricardo Rocha ( on behalf of the.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
Data Manipulation with Globus Toolkit Ivan Ivanovski TU München,
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
Protocols and Services for Distributed Data- Intensive Science Bill Allcock, ANL ACAT Conference 19 Oct 2000 Fermi National Accelerator Laboratory Contributors:
LHCC Referees Meeting – 28 June LCG-2 Data Management Planning Ian Bird LHCC Referees Meeting 28 th June 2004.
OGSA-DAI.
Scott Koranda, UWM & NCSA 20 November 2016www.griphyn.org Lightweight Replication of Heavyweight Data Scott Koranda University of Wisconsin-Milwaukee &
GGF OGSA-WG, Data Use Cases Peter Kunszt Middleware Activity, Data Management Cluster EGEE is a project funded by the European.
Evaluation of “data” grid tools
Study course: “Computing clusters, grids and clouds” Andrey Y. Shevel
Milestone 2 Include the names of the papers
Presentation transcript:

Case Study 1: Data Replication for LIGO Scott Koranda Ann Chervenak

eSS 2007Service-Oriented Science: Globus Software in Action2 Laser Interferometer Gravitational Wave Observatory l Goal: Observe gravitational waves predicted by theory l Three physical detectors in two locations (plus GEO detector in Germany) l 10+ data centers for data analysis l Collaborators in ~40 institutions on at least three continents

eSS 2007Service-Oriented Science: Globus Software in Action3 LIGO by the Numbers l LIGO records thousands of channels, generating approx. 1 TB per day of data during a detection run u Data is published and data centers subscribe to portions that local users want for analysis or for local storage l Data analysis results in derived data u Currently ~30% of all LIGO data u This also is published and replicated l Over 30 million files in LDR network (April 2005) * l 6+ million unique logical files l 30+ million physical copies of those files * Scott Koranda, UW-Milwaukee, April 2005

eSS 2007Service-Oriented Science: Globus Software in Action4 The Challenge Replicate 1 TB/day of data to 10+ international sites u Publish/subscribe (pull) model u Provide scientists with the means to specify and discover data based on application criteria (metadata) u Provide scientists with the means to locate copies of data

eSS 2007Service-Oriented Science: Globus Software in Action5 Issues - Technical l Efficiency u Avoid unused bandwidth while data transfers are taking place, esp. on high- bandwidth links (10+ Gbps) u Avoid idle time on the network between transfers

eSS 2007Service-Oriented Science: Globus Software in Action6 Issues - Social (1) l Workflow u The publish/subscribe model matches how scientists think about their use of the data u Use of metadata is critical l Security u Authenticate endpoints of data transfers to prevent unauthorized data manipulation l Heterogeneity u Cant tell data centers what storage systems to use u Cant get everyone to do accounts the same way

eSS 2007Service-Oriented Science: Globus Software in Action7 Issues - Social (2) l Maintenance u LIGO is focused on science, not IT u Ideally, they would not have to build or maintain data management software u GOAL: If software must be produced, keep it simple and do it in a way that non-CS people can understand u GOAL: Produce a solution that others will adopt and maintain for them

eSS 2007Service-Oriented Science: Globus Software in Action8 GridFTP l A high-performance, secure data transfer service optimized for high- bandwidth wide-area networks u FTP with extensions u Uses basic Grid security (control and data channels) u Multiple data channels for parallel transfers u Partial file transfers u Third-party (direct server- to-server) transfers l GGF recommendation GFD.20 Basic Transfer One control channel, several parallel data channels Third-party Transfer Control channels to each server, several parallel data channels between servers

eSS 2007Service-Oriented Science: Globus Software in Action9 Striped GridFTP l GridFTP supports a striped (multi-node) configuration u Establish control channel with one node u Coordinate data channels on multiple nodes u Allows use of many NICs in a single transfer l Requires shared/parallel filesystem on all nodes u On high-performance WANs, aggregate performance is limited by filesystem data rates

eSS 2007Service-Oriented Science: Globus Software in Action10 globus-url-copy l Command-line client for GridFTP servers u Text interface u No interactive shell (single command per invocation) l Many features u Grid security, including data channel(s) u HTTP, FTP, GridFTP u Server-to-server transfers u Subdirectory transfers and lists of transfers u Multiple parallel data channels u TCP tuning parameters u Retry parameters u Transfer status output

eSS 2007Service-Oriented Science: Globus Software in Action11 RFT - File Transfer Queuing l A WSRF service for queuing file transfer requests u Server-to-server transfers u Checkpointing for restarts u Database back-end for failovers l Allows clients to request transfers and then disappear u No need to manage the transfer u Status monitoring available if desired

eSS 2007Service-Oriented Science: Globus Software in Action12 RLS - Replica Location Service l A distributed system for tracking replicated data u Consistent local state maintained in Local Replica Catalogs (LRCs) u Collective state with relaxed consistency maintained in Replica Location Indices (RLIs) l Performance features u Soft state maintenance of RLI state u Compression of state updates u Membership and partitioning information maintenance Simple Hierarchy The most basic deployment of RLS Fully Connected High availability of the data at all sites Tiered Hierarchy For very large systems and/or very Large collections

eSS 2007Service-Oriented Science: Globus Software in Action13 pyGlobus * l High-level, object- oriented interface in Python to GT4 Pre- WS APIs. u GSI security u GridFTP u GRAM u XIO u GASS u MyProxy u RLS * pyGlobus contributed to GT4 by the Distributed Computing Department of LBNL l Also includes tools and services u GridFTP server u GridFTP GUI client u Other GT4 clients

eSS 2007Service-Oriented Science: Globus Software in Action14 Lightweight Data Replicator * Ties together 3 basic Grid services: 1. Metadata Service info about files such as size, md5, GPS time,... sets of interesting metadata propagate answers question What files or data are available? 2. Globus Replica Location Service (RLS) catalog service maps filenames to URLs also maps filenames to sites answers question Where are the files? 3. GridFTP Service server and customized, tightly integrated client use to actually replicate files from site to site answers question How do we move the files? * Slide courtesy of Scott Koranda, UW-Milwaukee

eSS 2007Service-Oriented Science: Globus Software in Action15 LDR Architecture l Each site has its own machinery for pulling data needed to satisfy local users l Scheduler daemon queries metadata and replica catalogs to identify missing files, which it records in a prioritized list l Transfer daemon manages transfers for files on the list, and updates LRC when files arrive l If a transfer fails, the transfer daemon simply re-adds the file to the list for future transfers l Daemons and metadata service written in Python

eSS 2007Service-Oriented Science: Globus Software in Action16 Results - LIGO * l LIGO/GEO S4 science run completed March u Replicated over 30 TB in 30 days u Mean time between failure now one month (outside of CIT) u Over 30 million files in LDR network l 6+ million unique LFNs in RLS network l 30+ million PFNs for those LFNs l Performance currently appears to be limited by Python performance l Partnership with Globus Alliance members is leading to LDR-like tools being added to Globus Toolkit * Slide courtesy of Scott Koranda, UW-Milwaukee

eSS 2007Service-Oriented Science: Globus Software in Action17 Results - GT and e-Science l Data Replication Service (DRS) u Reimplementation of LDRs subscription capabilities u Uses WSRF-compliant services written in Java u Works with RLS and RFT services in GT4 l Tech preview in GT4 l LIGO evaluating as a replacement for LDRs subscription system l Remaining piece is metadata service

eSS 2007Service-Oriented Science: Globus Software in Action18 A Few LDR/DRS Lessons l Understanding performance requirements is important u Performance is complicated. Requirements must be expressed very specifically and must be thoroughly informed by use u LIGOs operational use was specific enough to give the RLS team good requirements l Reliability doesnt have to be sophisticated u LDRs strategy for recovering from a transfer failure is simply to put the transfer request back on the queue l Standard interfaces are important u LIGO was not successful at getting other projects to rally around LDRs non-standard interfaces u DRS might overcome that hurdle by using WSRF interfaces