
1 Lightweight Data Replicator
Scott Koranda, University of Wisconsin-Milwaukee & National Center for Supercomputing Applications
Brian Moe, University of Wisconsin-Milwaukee
Scott Koranda, UWM & NCSA, 14 January 2016, www.griphyn.org

2 LIGO data replication needs
Sites at Livingston, LA (LLO) and Hanford, WA (LHO)
2 interferometers at LHO, 1 at LLO
1000s of channels recorded at rates of 16 kHz, 16 Hz, 1 Hz, ...
Output is binary "frame" files, each holding 16 seconds of data with a GPS timestamp
– ~100 MB from LHO
– ~50 MB from LLO
– ~1 TB/day in total (rough check in the sketch below)
S1 run ~ 2 weeks
S2 run ~ 8 weeks
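A quick back-of-the-envelope check of the "~1 TB/day" figure, using only the per-frame sizes quoted on this slide; the constants below are those approximate numbers, not measured values.

```python
# Rough sanity check of the "~1 TB/day" figure from the slide, using the
# approximate per-frame sizes quoted there.
SECONDS_PER_DAY = 86_400
FRAME_LENGTH_S = 16         # each frame file holds 16 seconds of data
MB_PER_FRAME_LHO = 100      # ~100 MB per frame from Hanford (2 IFOs)
MB_PER_FRAME_LLO = 50       # ~50 MB per frame from Livingston (1 IFO)

frames_per_day = SECONDS_PER_DAY // FRAME_LENGTH_S          # 5400 frames/day
mb_per_day = frames_per_day * (MB_PER_FRAME_LHO + MB_PER_FRAME_LLO)
print(f"{frames_per_day} frames/day, ~{mb_per_day / 1_000_000:.2f} TB/day")
# -> 5400 frames/day, ~0.81 TB/day, i.e. roughly 1 TB/day as stated
```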

3 Networking to IFOs Limited
LIGO IFOs are remote, making bandwidth expensive
A couple of T1 lines, used for email/administration only
Ship tapes to Caltech (SAM-QFS): the "GridFedEx" protocol
Reduced data sets (RDS) generated and stored on disk
– ~20% of the size of the raw data
– ~200 GB/day
Bandwidth to LHO increases dramatically for S3!

4 Replication to University Sites
[Site diagram: CIT, UWM, PSU, MIT, UTB, Cardiff, AEI, LHO]

5 Why Bulk Replication to University Sites?
Each site has compute resources (Linux clusters)
– Early plan was to provide one or two analysis centers
– Now everyone has a cluster
Storage is cheap
– $1/GB for drives
– A TB of RAID-5 for < $10K
– Throw more drives into your cluster
Analysis applications read a lot of data
– There are different ways to slice some problems, but most want access to large data sets for a particular instance of search parameters

6 LIGO Data Replication Challenge
Replicate 200 GB/day of data to multiple sites securely, efficiently, and robustly (no babysitting...)
Support a number of storage models at sites
– CIT → SAM-QFS (tape) and large IDE farms
– UWM → 600 partitions on 300 cluster nodes
– PSU → multiple 1 TB RAID-5 servers
– AEI → 150 partitions on 150 nodes, with redundancy
Provide a coherent mechanism for data discovery by users and their codes
Know what data we have and where it is, and replicate it fast and easily

7 Prototyping "Realizations"
Need to keep the "pipe" full to achieve the desired transfer rates
– Be mindful of the overhead of setting up connections
– Set up a GridFTP connection with multiple channels, tuned TCP windows, and tuned I/O buffers, and leave it open (see the sketch below)
– Sustained 10 MB/s between Caltech and UWM, with peaks up to 21 MB/s
Need cataloging that scales and performs
– Globus Replica Catalog (LDAP) handles < 10^5 entries and is not acceptable
– Need a solution with a relational database backend that scales to 10^7 entries with fast updates and reads
Do not necessarily need "reliable file transfer" (RFT)
– Problem with any single transfer? Forget it, come back later...
Need a robust mechanism for selecting collections of files
– Users/sites demand flexibility in choosing what data to replicate
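A minimal sketch of the "keep the pipe full" point: launching a GridFTP transfer with parallel data channels and a tuned TCP buffer. The slides do not say which client LDR used for prototyping; globus-url-copy is shown here only as one commonly used GridFTP client, and the stream count, buffer size, hosts, and paths are illustrative assumptions, not production values.

```python
# Sketch only: one way to drive a GridFTP transfer with multiple parallel
# data channels and a tuned TCP buffer, in the spirit of the slide above.
import subprocess

def gridftp_copy(src_url: str, dst_url: str,
                 parallel_streams: int = 8,
                 tcp_buffer_bytes: int = 2 * 1024 * 1024) -> None:
    """Copy one file over GridFTP with several parallel data channels."""
    cmd = [
        "globus-url-copy",
        "-p", str(parallel_streams),       # number of parallel TCP data channels
        "-tcp-bs", str(tcp_buffer_bytes),  # TCP buffer size per channel
        src_url,
        dst_url,
    ]
    subprocess.run(cmd, check=True)

# Example call (hypothetical hosts and paths):
# gridftp_copy("gsiftp://ldas.ligo.caltech.edu/data/H-R-7300.gwf",
#              "file:///scratch/frames/H-R-7300.gwf")
```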

8 LIGO, err… Lightweight Data Replicator (LDR)
What data we have...
– Globus Metadata Catalog Service (MCS)
Where the data is...
– Globus Replica Location Service (RLS)
Replicate it fast... (flow sketched below)
– Globus GridFTP protocol
– Which client to use? Right now we use our own
Replicate it easily...
– Logic we added
– Is there a better solution?
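To make the division of labor concrete, here is a sketch of the flow this slide describes: a metadata query finds logical file names, a replica lookup maps them to physical URLs, and a GridFTP client moves the data. The metadata, replicas, and transfer objects and their methods are hypothetical stand-ins for illustration, not the real MCS/RLS/pyGlobus APIs or LDR's actual code.

```python
# Illustrative sketch of the MCS -> RLS -> GridFTP replication flow.
from typing import Iterable

def replicate_collection(metadata, replicas, transfer,
                         query: str, local_prefix: str) -> None:
    """Replicate every file matching a metadata query to local storage."""
    lfns: Iterable[str] = metadata.find_lfns(query)   # MCS: what data exists
    for lfn in lfns:
        pfns = replicas.lookup(lfn)                   # RLS: where replicas live
        if not pfns:
            continue                                  # no known source replica yet
        src = pfns[0]                                 # pick any available replica
        dst = f"file://{local_prefix}/{lfn}"
        transfer.copy(src, dst)                       # GridFTP client does the move
        replicas.register(lfn, dst)                   # publish the new local replica
```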

9 Lightweight Data Replicator
Replicated > 20 TB to UWM thus far
Less to MIT and PSU
Just deployed version 0.5.5 to MIT, PSU, AEI, CIT, UWM, LHO, and LLO for the LIGO/GEO S3 run
Deployment in progress at Cardiff
LDRdataFindServer running at UWM for S2, soon at all sites for S3

10 Lightweight Data Replicator
"Lightweight" because we think it is the minimal collection of code needed to get the job done
Logic coded in Python
– Use SWIG to wrap the Globus RLS client
– Use pyGlobus from LBL elsewhere
Each site is any combination of publisher, provider, and subscriber
– Publisher populates the metadata catalog
– Provider populates the location catalog (RLS)
– Subscriber replicates data using information provided by publishers and providers
Small, independent daemons that each do one thing (see the sketch below)
– LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer, ...
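A sketch of the "small, independent daemons" pattern: one process with a single responsibility, run in a loop. LDRSchedule is named on the slide, but the class below is only an illustration of the idea under assumed metadata/replica/queue interfaces; it is not LDR's actual implementation.

```python
# Illustrative scheduler daemon: decide which files this site still needs
# and hand them to a separate transfer daemon via a queue.
import time

class ScheduleDaemon:
    """Periodically queue files that exist elsewhere but not at this site."""

    def __init__(self, metadata, replicas, queue, site_url_prefix, poll_seconds=60):
        self.metadata = metadata            # what data exists (MCS-like interface)
        self.replicas = replicas            # where data lives (RLS-like interface)
        self.queue = queue                  # hand-off point to the transfer daemon
        self.site_url_prefix = site_url_prefix
        self.poll_seconds = poll_seconds

    def run_forever(self):
        while True:
            for lfn in self.metadata.find_lfns("wanted-by-this-site"):
                pfns = self.replicas.lookup(lfn)
                have_local = any(p.startswith(self.site_url_prefix) for p in pfns)
                if pfns and not have_local:
                    self.queue.put(lfn)     # let the transfer daemon pick it up
            time.sleep(self.poll_seconds)
```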

11 Future?
Held an LDR face-to-face at UWM last summer
CIT, MIT, PSU, UWM, AEI, and Cardiff were all represented
LDR "needs":
– Better/easier installation and configuration
– A "dashboard" giving admins insight into LDR state
– More robustness, especially with RLS server hangs (fixed with version 2.0.9)
– An API and templates for publishing

12 Future?
LDR is a tool that works now for LIGO
Still, we recognize that a number of projects need bulk data replication
– There has to be common ground
What middleware can be developed and shared?
– We are looking for "opportunities" (code for "solve our problems for us...")
– Still want to investigate Stork, DiskRouter, ?
– Do contact me if you do bulk data replication...

