Presentation transcript:

Slide 1: Cataloging, Replicating, and Managing LIGO Data on the Grid
Scott Koranda, UW-Milwaukee
On behalf of the LIGO Scientific Collaboration
MardiGras Conference, February 4, 2005, LSU

Slide 2: Laser Interferometer Gravitational-wave Observatory
LIGO is opening a new frontier in observational astrophysics
- Detect & use gravitational waves (GW) to observe the Universe and provide a more complete picture of the Cosmos
- Complementary to radio/infrared/optical/X-ray/γ-ray astronomy; EM emitters are not likely to be strong GW emitters & vice versa
- Detect & observe cataclysmic events leading to the death of stars and the birth of neutron stars & black holes
- Study Einstein’s theory of general relativity in the strong-field regime near massive compact objects, where GW are produced
LIGO is now observing, acquiring science data, and in full analysis production

Slide 3: Who is LIGO?
LIGO = Laser Interferometer Gravitational-wave Observatory

Slide 4: What is the LSC?
LSC = LIGO Scientific Collaboration
LSC = (or more) other institutions

Slide 5: LIGO Data Challenges
Revealing the full science content of LIGO data is a computationally and data-intensive challenge
- LIGO interferometers generate ~10 MB/s, or almost 1 TB/day
Several classes of data analysis challenges require large-scale computational resources
- In general, for analysis: FFT a data segment, choose a template (based on physical parameters) and filter, then repeat again and again and again… (a sketch of this loop follows below)
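A minimal illustrative sketch of that analysis loop, using NumPy; the data source and template bank names are assumptions for illustration only, not LIGO's actual analysis code:

    # Illustrative matched-filter loop: FFT a data segment, filter against
    # each template, repeat. Placeholder names (load_segments, template_bank)
    # are assumptions, not real LIGO pipeline functions.
    import numpy as np

    def matched_filter(segment, template):
        """Frequency-domain correlation of one data segment with one template."""
        seg_f = np.fft.rfft(segment)                   # FFT the data segment
        tmp_f = np.fft.rfft(template, n=len(segment))  # FFT the template, zero-padded
        snr_t = np.fft.irfft(seg_f * np.conj(tmp_f))   # back to a time-domain SNR-like series
        return np.max(np.abs(snr_t))

    # for segment in load_segments():          # hypothetical data source
    #     for template in template_bank():     # hypothetical template bank
    #         peak = matched_filter(segment, template)
    #         if peak > threshold:
    #             record_candidate(segment, template, peak)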

Slide 6: One Adventure Story on the “Grid”
... an evolutionary tale of one physicist’s adventures
Reveals some things we learned:
- technical
- social (collaborations)

Slide 7: First Challenge
Early 2001: Bruce Allen asked me to “use some Grid tools to get LIGO data to UWM”
- LIGO data in HPSS archive at CIT (CACR)
- LIGO E7 data set: 10’s of TBs (full frames at the time)
- Data at UWM to go onto 296 disk partitions
- Make it fast!
- Make it robust! If a disk at UWM dies, the data should automatically reappear (quickly, of course!)
- And, make it fast!

Slide 8: Initial Prototype
Pulled together existing production tools
Globus GridFTP server and client
- Move data fast!
- globus-url-copy command-line client
- Multiple parallel data streams
- GSI authentication
- Tunable TCP windows and I/O buffers
Globus Replica Catalog for tracking what we have where
- Mappings of logical filenames to physical locations
- LDAP based
Python scripts as “glue” to hold it together (a sketch of such glue appears below)
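A minimal sketch of the kind of Python “glue” involved, driving the real globus-url-copy client with parallel streams and tuned buffers; the hosts, paths, and the stream/buffer values are illustrative assumptions, not the actual configuration used:

    # Illustrative glue script: wrap the globus-url-copy command-line client.
    # The -p (parallel streams) and -tcp-bs (TCP buffer size) options are real
    # globus-url-copy flags; the hosts, paths, and values are made up.
    import subprocess

    def replicate(src_url, dest_url, streams=4, tcp_buffer=1048576):
        """Copy one file between GridFTP servers using GSI-authenticated globus-url-copy."""
        cmd = [
            "globus-url-copy",
            "-p", str(streams),          # multiple parallel data streams
            "-tcp-bs", str(tcp_buffer),  # tunable TCP buffer size
            src_url,
            dest_url,
        ]
        subprocess.check_call(cmd)

    # Example (hypothetical URLs):
    # replicate("gsiftp://archive.example.edu/frames/E7/H-R-0001.gwf",
    #           "gsiftp://data.phys.uwm.edu/data/E7/H-R-0001.gwf")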

Slide 9: How did we do for E7?
Failure!
- Low HPSS effective throughput
- Globus Replica Catalog (LDAP) not up to the challenge: bad scaling past 10^5 filenames
- Nothing fault tolerant or robust about this prototype
Still…
- ~2 TB came over the network
- 10 MB/s transfers once data were out of HPSS (good sign)
- Learned GridFTP would be a firm foundation
- Feedback into Globus via GriPhyN and iVDGL

Slide 10: Second Prototype for S1
Limited task again: replicate S1 data from Caltech to UWM
- S1 data spins at Caltech
Pull together pieces:
GridFTP
- Code against the API to create a custom, tightly-integrated client
- Use the default (but improved) GridFTP server
- Cache open connections and fill the pipe with data
Plain-text catalogs!
- Simple flat files to keep track of what files are available in the “collection” and what we already have
- Begin storing file metadata like size and md5 checksums
- Added simple data verification on the client side (see the sketch below)
Python as glue again
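A minimal sketch of that client-side verification, assuming a whitespace-delimited flat-file catalog with one name/size/md5 entry per line (the exact flat-file layout is an assumption):

    # Illustrative client-side verification against a plain-text catalog.
    # Assumed catalog format (one entry per line): <filename> <size-bytes> <md5-hex>
    import hashlib
    import os

    def load_catalog(path):
        catalog = {}
        with open(path) as f:
            for line in f:
                name, size, md5 = line.split()
                catalog[name] = (int(size), md5)
        return catalog

    def verify(local_path, name, catalog):
        """Return True if the downloaded file matches the catalog's size and md5."""
        size, md5 = catalog[name]
        if os.path.getsize(local_path) != size:
            return False
        h = hashlib.md5()
        with open(local_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == md5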

Slide 11: How did we do for S1?
So-so…
- Did get all S1 locked data to UWM
- Plain-text catalogs don’t scale (not surprisingly)
- Not much fault tolerance or robustness in the system; it needed administrative attention each and every day
Still…
- Did get all S1 locked data to UWM
- 10 MB/s transfer rate when the system was working
- GridFTP definitely a firm foundation
- Verified integrity of replicas using checksums and sizes
- Provided data catalog requirements to the Globus team

Slide 12: A Lesson Learned
Too much focus early on data transfer rates; moving data fast is the easier problem to solve
The real challenges for data replication:
What data exists?
- How does a site learn about what data exists?
Where is it?
- How does a site learn about what data other sites have?
How does a site get the data it wants?
- What mechanism can be used to schedule and prioritize data for replication?
How do users find the data?
- In what ways will users try to find the data? What tools are necessary?
And of course, data should move fast… Is it here yet?

Slide 13: LIGO Data Replicator for S2
Replicate S2 RDS data from Caltech to MIT, PSU, UWM
- AEI, Cardiff added late
LIGO Data Replicator (LDR):
- GridFTP server, customized clients
- Globus Replica Location Service (RLS)
  - Local Replica Catalog (LRC) maps logical filenames to URLs
  - Replica Location Index (RLI) maps logical filenames to other LRCs
  - RDBMS based (MySQL)
- First attempt at a real metadata catalog
  - Use MySQL, but with very naïve tables (a sketch follows below)
- Python glue
LIGO Data Replicator → “Lightweight Data Replicator”
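A purely illustrative sketch of what a “very naïve” metadata table might look like; the table name, column names, and the use of the MySQLdb driver are assumptions, not LDR's actual schema or code:

    # Illustrative "naive" metadata catalog: one wide MySQL table keyed on the
    # logical filename. Table and column names are assumptions, not LDR's schema.
    import MySQLdb  # assumes the classic MySQL-python driver is installed

    CREATE = """
    CREATE TABLE IF NOT EXISTS metadata (
        lfn        VARCHAR(255) PRIMARY KEY,  -- logical filename
        size       BIGINT,
        md5        CHAR(32),
        frameType  VARCHAR(64),
        runTag     VARCHAR(16),
        gpsStart   INT,
        gpsEnd     INT
    )
    """

    def find_lfns(conn, frame_type, run_tag):
        """Return logical filenames matching a frame type and run tag."""
        cur = conn.cursor()
        cur.execute(
            "SELECT lfn FROM metadata WHERE frameType = %s AND runTag = %s",
            (frame_type, run_tag),
        )
        return [row[0] for row in cur.fetchall()]

    # conn = MySQLdb.connect(host="localhost", user="ldr", db="ldr_metadata")
    # conn.cursor().execute(CREATE)
    # print(find_lfns(conn, "RDS_R_L3", "S3"))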

Slide 14: How did we do for S2?
Better, but not quite there…
- Network transfer rate problems
- Smaller files lead to new problems/insights
- Naïve metadata table design limits performance & scalability
- Robustness and fault tolerance better, but not good enough
- Publishing is awkward
- No automatic data discovery
Still…
- Replication to MIT, AEI, Cardiff from both Caltech and UWM
- Replicated from UWM back to Caltech after a disk was lost
- LDRdataFindServer and LALdataFind expose data to users at UWM
  - a first attempt that goes too well… becomes a necessary feature!
- GridFTP continues to be a solid foundation
- Globus RLS will be a firm foundation
  - there have been reliability issues, but all addressed

Slide 15: LDR for S3
Wish list by admins for LDR features/enhancements:
- Better/easier installation, configuration ☺
- “Dashboard” for admins for insights into LDR state X
- More robustness, especially with RLS server hangs ☺
- API and templates for publishing X
- New schema for metadata tables X
- Transfer rate database X
- Latest version of the Globus Replica Location Service (RLS) ☺
- Latest upgrades to GridFTP server and API ☺
- Simple installation using Pacman ☺
Deploy at end of September 2003

Slide 16: How did we do for S3?
Still…
- Replicated data to 4 sites with minimal latency for most of the extended run
- Average LDR intervention time up to a few days (for the most part)
- Deployment using Pacman a solid foundation
  - But now there is yum and apt-get???
RLS statistics:
- ~6 million LFNs per LRC
- between 6 and 30 million PFNs per LRC
- a network of 5 to 7 RLS servers all updating each other

Slide 17: Great Collaboration
ISI Globus team and LDR team
- close collaboration over RLS
- many performance issues solved
- new client API functions added
Collaboration challenges: what the CS people had to “put up with”
- physicists more concerned about performance than new CS research ideas
- irregular update schedule, based on the experiment’s needs, not CS needs
- server performance statistics not a high priority
- use cases change, sometimes daily

Slide 18: Great Collaboration
Collaboration challenges: what the physics people had to “put up with”
- tendencies toward a “throw it over the fence” approach
- lack of interest in some user/admin issues
- landscape shifts (to web services, for example)
- Java...

Slide 19: Great Collaboration
Why did it work?
- Credit the RLS developers with great listening
  - much effort went into understanding the LSC use case(s)
- Single points of contact between the two groups
  - make 1 physicist and 1 CS person responsible
  - ignore other “inputs”
- Good logging helped communicate state
- Regular face-2-face meetings
  - sounds simple, but prevents useless tangents due to poor communication

Slide 20: What’s Next for S4/S5?
A new metadata schema has to be the top priority
Current schema makes queries to find data too slow
- More users demanding LSCdataFind/framequery capabilities and performance
Current metadata propagation is not scaling well
- Probably can’t even make it to S4, much less into S5
New metadata based on the Globus MCS project
- We don’t want to be in the metadata catalog business, but have to be at this time
- We are making particular assumptions/choices in order to implement a propagation strategy
- Feed our experience and requirements back into the Grid community

Slide 21: What’s Next for S4/S5?
Need to solve the “small file problem”
- Bruce is right: we do still have to worry about replication rates
- Trend is toward more, but smaller, files published into LDR
Plan is a “tar on the fly, move, untar” approach (a sketch follows below)
New technologies make this attainable:
- pyGlobus GridFTP-enabled server class
- New Globus GridFTP server base in beta
Proof of concept already done by IBM using Java CoG
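A minimal sketch of the “tar, move, untar” idea using Python's standard tarfile module; this is a disk-staged approximation of the on-the-fly streaming plan, not the pyGlobus implementation, and the paths plus the replicate() call (the hypothetical GridFTP wrapper sketched earlier) are assumptions:

    # Illustrative bundling of many small files into one large transfer.
    import tarfile
    import os

    def bundle(small_files, bundle_path):
        """Tar many small files into a single archive before transfer."""
        with tarfile.open(bundle_path, "w") as tar:
            for path in small_files:
                tar.add(path, arcname=os.path.basename(path))
        return bundle_path

    def unbundle(bundle_path, dest_dir):
        """Untar the archive at the destination after transfer."""
        with tarfile.open(bundle_path, "r") as tar:
            tar.extractall(dest_dir)

    # On the source site:
    #   bundle(list_of_small_gwf_files, "/tmp/s4_chunk_0001.tar")
    #   replicate("gsiftp://src.example.edu/tmp/s4_chunk_0001.tar",
    #             "gsiftp://dest.example.edu/tmp/s4_chunk_0001.tar")
    # On the destination site:
    #   unbundle("/tmp/s4_chunk_0001.tar", "/data/S4/")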

Slide 22: What’s Next for S4/S5?
Data discovery and automated filesystem watching
- Admins do need to move data around in the filesystem and have the changes appear automatically in LDR
- Publishing of existing data sets needs to be quicker, easier, and automated (a sketch follows below)
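A minimal sketch of automated publishing, assuming a simple periodic directory scan (not a real filesystem-event watcher) and a hypothetical publish() hook into LDR's catalogs:

    # Illustrative periodic filesystem scan: detect newly appeared files and
    # hand them to a hypothetical publish() routine that would insert the
    # metadata and LFN->URL mappings into LDR's catalogs.
    import os
    import time

    def scan(top_dir, already_published):
        """Yield paths under top_dir that have not been published yet."""
        for dirpath, _dirnames, filenames in os.walk(top_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if path not in already_published:
                    yield path

    # published = set()
    # while True:
    #     for path in scan("/data/S4", published):
    #         publish(path)      # hypothetical: compute size/md5, insert into catalogs
    #         published.add(path)
    #     time.sleep(300)        # rescan every 5 minutes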

Slide 23: What’s Next for S4/S5?
Strong pressure to “open” the LDR network for any users/files
I have resisted this initially
- The problem of replicating bulk “raw” data sets is fundamentally different than many small sets of user files
- Users do crazy things!
- The undelete problem is hard
Have agreed to look at it in detail and try
- Need to make sure it doesn’t derail LDR’s first mission of replicating LIGO/GEO S4/S5 data
- Need to broaden the discussion

Slide 24: Lightweight Data Replicator (architecture diagram)
LDR components: Metadata Service, Discovery Service, Replication Service
Underlying technologies: MySQL, RLS, GridFTP, with GSI and SOAP for communication

Slide 25: LDR replication flow (diagram spanning the sites LHO, CIT, MIT, LLO, UWM, PSU, AEI)
- Sites “publish” data: each frame file (e.g. H-RDS_R_L…gwf, L-RDS_R_L…gwf) is entered into the Metadata Catalog with its size in bytes, frame type RDS_R_L3, run tag S3, locked flag, …
- What data do we want? Ask the metadata catalog. Collection: Instrument = ‘H’ AND frameType = ‘RDS_R_L3’ AND runTag = ‘S3’
- Where can we get it? Ask the URL catalog: H-RDS_R_L…gwf is available at LHO (“I have URLs for files…”)
- Each site’s Local Replica Catalog maps a logical filename to a physical URL, e.g. H-RDS_R_L…gwf → gsiftp://ldas.ligo-wa.caltech.edu:15000/samrds/S3/L3/LHO/H-RDS_R_L3-7526/H-RDS_R_L…gwf and L-RDS_R_L…gwf → gsiftp://ldas.ligo-la.caltech.edu:15000/samrds/S3/L3/LLO/L-RDS_R_L3-7526/L-RDS_R_L…gwf
- What is the URL for H-RDS_R_L…gwf? gsiftp://ldas.ligo-wa.caltech.edu:15000/samrds/S3/L3/LHO/H-RDS_R_L3-7526/H-RDS_R_L…gwf
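A minimal pseudo-Python sketch of that resolution chain; metadata_query(), rli_lookup(), lrc_lookup(), and replicate() are hypothetical stubs standing in for the real LDR/RLS client calls, not their actual APIs:

    # Illustrative resolution chain from a metadata query down to a gsiftp URL.

    def metadata_query(**attrs):
        """Stub: return logical filenames whose metadata match the given attributes."""
        return []

    def rli_lookup(lfn):
        """Stub: return the Local Replica Catalogs (sites) that know this LFN."""
        return []

    def lrc_lookup(site_lrc, lfn):
        """Stub: return physical gsiftp URLs for this LFN at one site."""
        return []

    def fetch_collection(instrument, frame_type, run_tag):
        # 1. What data do we want? Ask the metadata catalog for matching LFNs.
        for lfn in metadata_query(instrument=instrument,
                                  frameType=frame_type,
                                  runTag=run_tag):
            # 2. Where can we get it? The RLI says which sites' LRCs know this LFN.
            for site_lrc in rli_lookup(lfn):
                # 3. Ask that site's Local Replica Catalog for physical URLs.
                urls = lrc_lookup(site_lrc, lfn)
                if urls:
                    # 4. Pull the file over GridFTP to the local site
                    #    (replicate() is the hypothetical wrapper sketched earlier).
                    # replicate(urls[0], "gsiftp://localhost/data/" + lfn)
                    break

    # fetch_collection(instrument="H", frame_type="RDS_R_L3", run_tag="S3")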

Slide 26: Looking ahead to S6...
Metadata avalanche!
- 3 sites (LHO, LLO, GEO) for a 6-month run
- ~40 million new pieces of metadata at a minimum
- the maximum could be an order of magnitude higher
- How do we partition?
- Replication strategies?
- Performance?

Slide 27: LDR-like replication service for GT4
Collaboration between Globus ISI and UW-M
Look for a “preview technology” in GT4
- start with just a “replication service”
- send the service a list of files to replicate
- later add more components, like a metadata service?