Published by Randell Merritt. Modified over 8 years ago.
1
Scott Koranda, UWM & NCSA 20 November 2016 www.griphyn.org
Lightweight Replication of Heavyweight Data
Scott Koranda
University of Wisconsin-Milwaukee & National Center for Supercomputing Applications
2
Heavyweight Data from LIGO
Sites at Livingston, LA (LLO) and Hanford, WA (LHO)
2 interferometers at LHO, 1 at LLO
1000s of channels recorded at rates of 16 kHz, 16 Hz, 1 Hz,…
Output is binary ‘frame’ files holding 16 seconds of data with GPS timestamp
–~ 100 MB from LHO
–~ 50 MB from LLO
–~ 1 TB/day in total
S1 run ~ 2 weeks
S2 run ~ 8 weeks
[Figure: 4 km LIGO interferometer at Livingston, LA]
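The quoted daily volume can be checked with a little arithmetic. The sketch below assumes the ~100 MB and ~50 MB figures are per 16-second frame file, that the detectors record continuously, and that units are decimal (1 TB = 10^6 MB); none of these assumptions is stated on the slide.

```python
# Back-of-envelope check of the raw data volume quoted on the slide.
# Assumptions (not from the slide): sizes are per frame file, recording is
# continuous, and units are decimal (1 TB = 1e6 MB).
FRAME_SECONDS = 16            # each frame file holds 16 s of data
LHO_FRAME_MB = 100            # approximate frame size from LHO
LLO_FRAME_MB = 50             # approximate frame size from LLO
SECONDS_PER_DAY = 24 * 60 * 60

frames_per_day = SECONDS_PER_DAY // FRAME_SECONDS
raw_mb_per_day = frames_per_day * (LHO_FRAME_MB + LLO_FRAME_MB)
raw_tb_per_day = raw_mb_per_day / 1e6

print(f"{frames_per_day} frames/day per site, ~{raw_tb_per_day:.2f} TB/day raw")
```

Under these assumptions the result is about 0.8 TB/day, consistent with the rounded "~ 1 TB/day" figure.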
3
Networking to IFOs Limited
LIGO IFOs remote, making bandwidth expensive
Couple of T1 lines for email/administration only
Ship tapes to Caltech (SAM-QFS)
Reduced data sets (RDS) generated and stored on disk
–~ 20 % size of raw data
–~ 200 GB/day
GridFedEx protocol
4
Replication to University Sites
CIT, UWM, PSU, MIT, UTB, Cardiff, AEI
5
Why Bulk Replication to University Sites?
Each has compute resources (Linux clusters)
–Early plan was to provide one or two analysis centers
–Now everyone has a cluster
Cheap storage is cheap
–$1/GB for drives
–TB RAID-5 < $10K
–Throw more drives into your cluster
Analysis applications read a lot of data
–Different ways to slice some problems, but most want access to large sets of data for a particular instance of search parameters
6
LIGO Data Replication Challenge
Replicate 200 GB/day of data to multiple sites securely, efficiently, robustly (no babysitting…)
Support a number of storage models at sites
–CIT → SAM-QFS (tape) and large IDE farms
–UWM → 600 partitions on 300 cluster nodes
–PSU → multiple 1 TB RAID-5 servers
–AEI → 150 partitions on 150 nodes with redundancy
Coherent mechanism for data discovery by users and their codes
Know what data we have, where it is, and replicate it fast and easy
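As a rough guide to what "200 GB/day" demands of the network, the sketch below computes the average throughput a single receiving site would need just to keep pace. It assumes decimal units (1 GB = 1000 MB) and an even spread over 24 hours; real transfers would need headroom for outages and catch-up.

```python
# Average throughput needed to keep up with 200 GB/day at one site.
# Assumptions (not from the slide): decimal units, transfers spread evenly
# across the whole day.
DATA_GB_PER_DAY = 200
SECONDS_PER_DAY = 24 * 60 * 60

required_mb_per_s = DATA_GB_PER_DAY * 1000 / SECONDS_PER_DAY
print(f"~{required_mb_per_s:.2f} MB/s sustained per receiving site")
```

This works out to roughly 2.3 MB/s per site.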
7
Prototyping “Realizations”
Need to keep “pipe” full to achieve desired transfer rates
–Mindful of overhead of setting up connections
–Set up GridFTP connection with multiple channels, tuned TCP windows and I/O buffers, and leave it open
–Sustained 10 MB/s between Caltech and UWM, peaks up to 21 MB/s
Need cataloging that scales and performs
–Globus Replica Catalog (LDAP) < 10^5 and not acceptable
–Need solution with relational database backend that scales to 10^7 with fast updates/reads
No need for “reliable file transfer” (RFT)
–Problem with any single transfer? Forget it, come back later…
Need robust mechanism for selecting collections of files
–Users/sites demand flexibility choosing what data to replicate
Need to get network people interested
–Do your homework, then challenge them to make your data flow faster
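The "forget it, come back later" idea can be sketched as a loop that never blocks on a failed file: failures are set aside and retried on a later pass. This is only an illustration of the idea, not LDR's actual code; the `replicate` function, the `transfer` callback, and the file names are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List


def replicate(files: Iterable[str],
              transfer: Callable[[str], bool],
              max_passes: int = 3) -> List[str]:
    """Try each file once per pass; defer failures to the next pass.

    Returns the files still missing after max_passes passes.
    """
    pending = deque(files)
    for _ in range(max_passes):
        if not pending:
            break
        deferred = deque()
        while pending:
            name = pending.popleft()
            if not transfer(name):      # failed: do not retry now
                deferred.append(name)   # come back to it on a later pass
        pending = deferred
    return list(pending)


# Hypothetical transfer function: one file fails on its first attempt only.
attempts = {}

def flaky_transfer(name: str) -> bool:
    attempts[name] = attempts.get(name, 0) + 1
    return not (name == "frame-0002" and attempts[name] == 1)


missing = replicate(["frame-0001", "frame-0002", "frame-0003"], flaky_transfer)
print(missing)   # []
```

The design choice is that one bad file never stalls the rest of the queue, which keeps the open connection busy with files that can be moved.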
8
LIGO, err… Lightweight Data Replicator (LDR)
What data we have…
–Globus Metadata Catalog Service (MCS)
Where data is…
–Globus Replica Location Service (RLS)
Replicate it fast…
–Globus GridFTP protocol
–What client to use? Right now we use our own
Replicate it easy…
–Logic we added
–Is there a better solution?
9
Lightweight Data Replicator
Replicated 20 TB to UWM thus far
Just deployed at MIT, PSU, AEI
Deployment in progress at Cardiff
LDRdataFindServer running at UWM
10
Lightweight Data Replicator
“Lightweight” because we think it is the minimal collection of code needed to get the job done
Logic coded in Python
–Use SWIG to wrap Globus RLS
–Use pyGlobus from LBL elsewhere
Each site is any combination of publisher, provider, subscriber
–Publisher populates metadata catalog
–Provider populates location catalog (RLS)
–Subscriber replicates data using information provided by publishers and providers
Take “Condor” approach with small, independent daemons that each do one thing
–LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer,…
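One way to picture the publisher/provider/subscriber split is as a join between two catalogs. The sketch below uses plain dictionaries as stand-ins for the metadata and location catalogs; it does not use the MCS or RLS APIs, and the `plan_transfers` function, attribute names, file names, and host are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the two catalogs.
metadata_catalog = {   # filled by publishers: logical file name -> attributes
    "frame-0001": {"site": "LHO", "gps_start": 1000},
    "frame-0002": {"site": "LHO", "gps_start": 1016},
    "frame-0003": {"site": "LLO", "gps_start": 1000},
}
location_catalog = {   # filled by providers: logical file name -> source URLs
    "frame-0001": ["gsiftp://provider-a.example.org/data/frame-0001"],
    "frame-0002": ["gsiftp://provider-a.example.org/data/frame-0002"],
    "frame-0003": [],  # published, but no provider holds a copy yet
}
local_files = {"frame-0001"}   # what the subscriber already holds


def plan_transfers(wanted, metadata, locations, have):
    """Return {logical name: source URL} for wanted files not yet held locally."""
    plan = {}
    for name, attrs in metadata.items():
        if name in have or not wanted(attrs):
            continue                    # already replicated, or not selected
        urls = locations.get(name, [])
        if urls:                        # skip files no provider offers yet
            plan[name] = urls[0]
    return plan


# A subscriber that only wants LHO data.
plan = plan_transfers(lambda attrs: attrs["site"] == "LHO",
                      metadata_catalog, location_catalog, local_files)
print(plan)
```

Here the subscriber ends up with a single planned transfer, `frame-0002`, since `frame-0001` is already local and `frame-0003` is neither selected nor available from a provider.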
11
Future?
LDR is a tool that works now for LIGO
Still, we recognize a number of projects need bulk data replication
–There has to be common ground
What middleware can be developed and shared?
–We are looking for “opportunities” (code for “solve our problems for us…”)
–Want to investigate Stork, DiskRouter, ?
–Do contact me if you do bulk data replication…