Lightweight Data Replicator
Scott Koranda, University of Wisconsin-Milwaukee & National Center for Supercomputing Applications
Brian Moe, University of Wisconsin-Milwaukee
14 January 2016, www.griphyn.org

LIGO data replication needs
Sites at Livingston, LA (LLO) and Hanford, WA (LHO)
2 interferometers at LHO, 1 at LLO
Thousands of channels recorded at rates of 16 kHz, 16 Hz, 1 Hz, ...
Output is binary "frame" files, each holding 16 seconds of data with a GPS timestamp
– ~100 MB per frame from LHO
– ~50 MB per frame from LLO
– ~1 TB/day in total (see the back-of-the-envelope check below)
S1 run ~ 2 weeks
S2 run ~ 8 weeks
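A quick back-of-the-envelope check of the ~1 TB/day figure, assuming one ~100 MB frame from LHO and one ~50 MB frame from LLO every 16 seconds (a sketch; actual frame sizes vary):

    # Rough check of the ~1 TB/day figure quoted above.
    # Assumes one ~100 MB frame (LHO) and one ~50 MB frame (LLO)
    # every 16 seconds; the sizes are the approximate slide values.
    SECONDS_PER_DAY = 86400
    FRAME_LENGTH_S = 16

    frames_per_day = SECONDS_PER_DAY // FRAME_LENGTH_S    # 5400 frame intervals/day
    mb_per_day = frames_per_day * (100 + 50)               # LHO + LLO, in MB

    print(f"{frames_per_day} frames/day, ~{mb_per_day / 1e6:.2f} TB/day")
    # -> 5400 frames/day, ~0.81 TB/day, i.e. roughly 1 TB/day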

Networking to IFOs Limited
LIGO IFOs are remote, making bandwidth expensive
– A couple of T1 lines, used for administration only
Ship tapes to Caltech (SAM-QFS): the "GridFedEx" protocol
Reduced data sets (RDS) generated and stored on disk
– ~20% the size of the raw data
– ~200 GB/day
Bandwidth to LHO increases dramatically for S3!

Replication to University Sites
(Map of replication sites: CIT, UWM, PSU, MIT, UTB, Cardiff, AEI, LHO)

Why Bulk Replication to University Sites?
Each has compute resources (Linux clusters)
– Early plan was to provide one or two analysis centers
– Now everyone has a cluster
Storage is cheap
– $1/GB for drives
– A TB of RAID-5 for < $10K
– Throw more drives into your cluster
Analysis applications read a lot of data
– Different ways to slice some problems, but most want access to large sets of data for a particular instance of search parameters

LIGO Data Replication Challenge
Replicate 200 GB/day of data to multiple sites securely, efficiently, robustly (no babysitting...)
Support a number of storage models at sites
– CIT → SAM-QFS (tape) and large IDE farms
– UWM → 600 partitions on 300 cluster nodes
– PSU → multiple 1 TB RAID-5 servers
– AEI → 150 partitions on 150 nodes with redundancy
Coherent mechanism for data discovery by users and their codes
Know what data we have, where it is, and replicate it fast and easily

Prototyping "Realizations"
Need to keep the "pipe" full to achieve desired transfer rates
– Mindful of the overhead of setting up connections
– Set up a GridFTP connection with multiple channels, tuned TCP windows, and tuned I/O buffers, and leave it open (see the sketch below)
– Sustained 10 MB/s between Caltech and UWM, with peaks up to 21 MB/s
Need cataloging that scales and performs
– Globus Replica Catalog (LDAP) handles < 10^5 entries and is not acceptable
– Need a solution with a relational database backend that scales to 10^7 entries with fast updates/reads
Do not necessarily need "reliable file transfer" (RFT)
– Problem with any single transfer? Forget it, come back later...
Need a robust mechanism for selecting collections of files
– Users/sites demand flexibility in choosing what data to replicate
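For illustration, the kind of tuned GridFTP transfer described above might be driven from Python roughly as below; the hostnames, paths, and buffer sizes are hypothetical placeholders, not LDR's actual invocation.

    # Sketch only: run globus-url-copy with parallel data channels and an
    # enlarged TCP buffer, along the lines of the tuning described above.
    # The endpoints and numbers are hypothetical placeholders.
    import subprocess

    src = "gsiftp://ldas.ligo.caltech.edu/data/frames/H-R-751234816-16.gwf"  # hypothetical
    dst = "file:///data/frames/H-R-751234816-16.gwf"                         # hypothetical

    subprocess.run([
        "globus-url-copy",
        "-p", "8",             # 8 parallel data channels to keep the pipe full
        "-tcp-bs", "2097152",  # 2 MB TCP buffer size
        "-vb",                 # report transfer performance
        src, dst,
    ], check=True)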

LIGO, err... Lightweight Data Replicator (LDR)
What data we have...
– Globus Metadata Catalog Service (MCS)
Where data is...
– Globus Replica Location Service (RLS)
Replicate it fast...
– Globus GridFTP protocol
– What client to use? Right now we use our own
Replicate it easy...
– Logic we added
– Is there a better solution?
(The sketch below illustrates how these pieces fit together.)
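A minimal sketch of the what/where/move split, with the MCS, RLS, and GridFTP calls replaced by stub helpers so it runs on its own; the helper names are hypothetical, not the real LDR or Globus APIs.

    # Conceptual sketch of the what/where/move split described above.
    # The helpers stand in for MCS, RLS, and GridFTP and are stubbed out;
    # they are not real APIs.

    def query_metadata_catalog(collection):
        # Stand-in for a Globus MCS query: return logical file names (LFNs).
        return ["H-R-751234816-16.gwf", "L-R-751234816-16.gwf"]

    def lookup_replicas(lfn):
        # Stand-in for a Globus RLS lookup: return physical file names (PFNs).
        return ["gsiftp://ldas.ligo.caltech.edu/data/" + lfn]

    def gridftp_transfer(source_pfn, local_storage):
        # Stand-in for a tuned GridFTP pull; returns the new local PFN.
        return local_storage + "/" + source_pfn.rsplit("/", 1)[-1]

    def replicate_collection(collection, local_storage="/data/frames"):
        """What data we have -> where it is -> replicate it."""
        for lfn in query_metadata_catalog(collection):
            replicas = lookup_replicas(lfn)
            if any(pfn.startswith("file://") for pfn in replicas):
                continue                      # a local copy is already registered
            local_pfn = gridftp_transfer(replicas[0], local_storage)
            print("replicated", lfn, "->", local_pfn)

    replicate_collection("S2-RDS")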

Lightweight Data Replicator
Replicated > 20 TB to UWM thus far
Less to MIT, PSU
Just deployed a new version to MIT, PSU, AEI, CIT, UWM, LHO, LLO for the LIGO/GEO S3 run
Deployment in progress at Cardiff
LDRdataFindServer running at UWM for S2, soon at all sites for S3

Lightweight Data Replicator
"Lightweight" because we think it is the minimal collection of code needed to get the job done
Logic coded in Python
– Use SWIG to wrap Globus RLS
– Use pyGlobus from LBL elsewhere
Each site is any combination of publisher, provider, subscriber
– Publisher populates the metadata catalog (illustrated in the sketch below)
– Provider populates the location catalog (RLS)
– Subscriber replicates data using information provided by publishers and providers
Small, independent daemons that each do one thing
– LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer, ...
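As an illustration of the publisher role, the sketch below derives a metadata record for a frame file from its name; it assumes a SITE-TYPE-GPSSTART-DURATION.gwf naming convention and is not the actual LDR publishing code.

    # Illustrative publisher-side sketch: build the metadata record for a
    # frame file from its name, assuming a SITE-TYPE-GPSSTART-DURATION.gwf
    # convention. Not the actual LDRMetadata/MCS publishing code.
    import os

    def frame_metadata(path):
        name = os.path.basename(path)             # e.g. H-R-751234816-16.gwf
        site, frame_type, gps_start, duration = name[:-len(".gwf")].split("-")
        return {
            "lfn": name,
            "site": site,                         # H = Hanford, L = Livingston
            "frameType": frame_type,
            "gpsStart": int(gps_start),
            "gpsEnd": int(gps_start) + int(duration),
            "size": os.path.getsize(path),        # bytes, useful for verification
        }

    # A publisher daemon would walk its storage area, compute a record like
    # this for each new frame file, and insert it into the metadata catalog.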

Future?
Held an LDR face-to-face meeting at UWM last summer
– CIT, MIT, PSU, UWM, AEI, Cardiff all represented
LDR "needs":
– Better/easier installation and configuration
– A "dashboard" giving admins insight into LDR state
– More robustness, especially with RLS server hangs (fixed with the latest version)
– API and templates for publishing

Future?
LDR is a tool that works now for LIGO
Still, we recognize that a number of projects need bulk data replication
– There has to be common ground
What middleware can be developed and shared?
– We are looking for "opportunities" (code for "solve our problems for us...")
– Still want to investigate Stork, DiskRouter, ...?
– Do contact me if you do bulk data replication...