San Diego Supercomputer Center (SDSC) Testing Summary and Recommendations
Physical Media Working Group
March 2008
2 PDS MC Policies on Media, Integrity and Backup
Data Delivery Policy
  –Data producers shall deliver one copy of each archival volume to the appropriate Discipline Node using means/media that are mutually acceptable to the two parties. The Discipline Node shall declare the volume delivery complete when the contents have been validated against PDS Standards and the transfer has been certified error free.
  –The receiving Discipline Node is then responsible for ensuring that three copies of the volume are preserved within PDS. Several options for "local back-up" are allowed including use of RAID or other fault tolerant storage, a copy on separate backup media at the Discipline Node, or a separate copy elsewhere within PDS. The third copy is delivered to the deep archive at NSSDC by means/media that are mutually acceptable to the two parties.
  (Adopted by PDS MC October 2005)
Archive Integrity Policy
  –Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council.
  (Adopted by MC November 2006)
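The three checks named in the Archive Integrity Policy (all files accounted for, none corrupted, all accessible) can be sketched as a comparison against a checksum manifest. This is an illustration only: the manifest layout (relative path mapped to an MD5 hex digest) and the names `md5_of` and `verify_holdings` are assumptions, not a PDS tool or standard.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_holdings(root, manifest):
    """Check files under root against a hypothetical {relative_path: md5} manifest.

    Returns a report with three lists: missing, unreadable, corrupted.
    """
    report = {"missing": [], "unreadable": [], "corrupted": []}
    for rel_path, expected in sorted(manifest.items()):
        path = Path(root) / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)      # not accounted for
            continue
        try:
            actual = md5_of(path)
        except OSError:
            report["unreadable"].append(rel_path)   # cannot be accessed
            continue
        if actual != expected:
            report["corrupted"].append(rel_path)    # contents changed
    return report
```

An empty report (three empty lists) would correspond to a clean verification run for that node.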
3 Background
Presented repository survey to MC (August 2007)
  –MC resolution to move all data online
  –Identified the need for a geographically separate repository as an operational backup
Began evaluating San Diego Supercomputer Center (SDSC)
  –Currently managing data storage for a number of science-related programs
  –Provides high speed data exchange and mass storage management
  –Inexpensive at $450/TB/year for near-line storage
  –Explore as a secondary storage option for PDS
  –SDSC agreed to let PDS evaluate using their beta iRODS software
  –Determine if 500 GB / day is a realistic goal for moving data to a secondary repository
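For reference, the 500 GB / day goal can be restated as a sustained transfer rate. The short calculation below is a sketch that assumes decimal units (1 GB = 1000 MB) and a transfer running around the clock; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate_mb_per_s(gb_per_day):
    """Sustained MBytes/sec needed to move gb_per_day in 24 hours (1 GB = 1000 MB)."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


print(round(required_rate_mb_per_s(500), 2))  # 5.79
```

Under those assumptions the goal corresponds to a little under 6 MBytes/sec, held continuously.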
4 Timeline
Fall 2007
  –Evaluated iRODS beta software between JPL and SDSC
  –Good performance results in moving data
  –Captured metrics for different scenarios
      Size of files transferred
      Number of files transferred
      Time when files are transferred
      Network speed
      Network connection (e.g., 10/100 vs GigE)
  –Minor bugs found
      Decision to wait for a more stable release
Winter 2008
  –New release of iRODS client (February 2008)
  –Testing between JPL and SDSC
      Excellent performance (e.g., ~ Mbytes/sec)
  –Some network problems encountered between JPL and SDSC which required resolution
  –Extended test to PPI/UCLA with good results (e.g., transferred 1/2 terabyte over a 15 hour period using real PDS data)
  –Extended test to GEO
5 Summary of Testing Results
Reliability
  – Transferring data between JPL and SDSC is only partially successful, as checksum failures appear randomly
      This appears to be a network routing issue, which is being resolved by our network administrators
  – Transferring data between PPI and SDSC has not shown any problems
Performance (using iRODS)
  – JPL to SDSC – 0.5 to 5 GByte file = between 6 and 16 MBytes/sec
  – PPI to SDSC to 3 GByte file = between 7 and 8 MBytes/sec
  – SDSC to JPL – 0.5 to 5 GByte file = ~ 8 MBytes/sec
  – SDSC to PPI to 3 GByte file = ~ 5 MBytes/sec
Usability
  –Installation and configuration have been straightforward
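To relate these rates to the 500 GB / day goal from the Background slide, each rate can be converted to a daily volume. The sketch below assumes decimal units and a transfer that sustains the quoted rate for a full 24 hours, which real transfers may not; the route labels and the `gb_per_day` helper are illustrative.

```python
GOAL_GB_PER_DAY = 500  # goal stated on the Background slide


def gb_per_day(mb_per_s):
    """Daily volume at a sustained rate, assuming 1 GB = 1000 MB and no idle time."""
    return mb_per_s * 86400 / 1000


# Rates quoted on this slide, in MBytes/sec (low and high end of each range).
measured = {
    "JPL to SDSC": (6, 16),
    "PPI to SDSC": (7, 8),
    "SDSC to JPL": (8, 8),
    "SDSC to PPI": (5, 5),
}

for route, (low, high) in measured.items():
    low_gb, high_gb = gb_per_day(low), gb_per_day(high)
    verdict = "meets" if low_gb >= GOAL_GB_PER_DAY else "falls short of"
    print(f"{route}: {low_gb:.0f} to {high_gb:.0f} GB/day, low end {verdict} the goal")
```

Note that the goal concerns moving data to the secondary repository, so the two routes into SDSC are the relevant ones; the SDSC to PPI figure describes retrieval.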
6 Recommendations
As more testing builds confidence in using iRODS s/w:
  –Bring more Nodes into testing (May 2008):
      More diverse testing environments
      More opportunity to identify PDS-wide systemic problems / areas of concern
  –Make final recommendation on using SDSC and options for making it operational (June 2008)
7 Backup Material
8 JPL/EN System Configuration
JPL/EN Server
  –Memory: k
  –CPU: dual MHz processor.
  –O/S: Red Hat Enterprise Linux ES release 4 (Nahant)
  –Hard drive: 73 GB Ultra-160 SCSI drive
  –Ethernet card 0: negotiated 100baseTx-FD
Network Bandwidth
  –100 MBits/sec – 1000 MBits/sec
9 JPL/PO.DAAC System Configuration
PO.DAAC server
  –Sun X4100
  –2 x Dual Core AMD Opteron(tm) Processor 285
  –16GB of memory
  –Linux kernel ELsmp
  –Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
  –Westwood+ turned on
Network Bandwidth
  –100 MBits/sec – 1000 MBits/sec
10 PDS/PPI System Configuration
PDS/PPI Server
  –Intel Pentium D 2.8 GHz
  –2 GB RAM
  –Kernel ver ELsmp
  –Red Hat Enterprise WS 4 update 4
Network Bandwidth
  –1000 MBits/sec
11 Repository Directories / Files Directory: /tsraid1/ Size: ~1.0036E12 Bytes clem1-1-rss-1-bsr-v1.0_s clem1-1-rss-5-bsr-v1.0_s clem1-l_e_y-a_b_u_h_l_n-2-edr-v1.0_s clem1-l-h-5-dim-mosaic-v1.0_s clem1-l-u-5-dim-basemap-v1.0_s clem1-l-u-5-dim-uvvis-v1.0_s eso-j-irspec-3-rdr-sl9-v1.0 eso-j-s-n-u-spectrophotometer-4-v2.0_s eso-j-susi-3-rdr-sl9-v1.0 go-a_c-ssi-2-redr-v1.0_s go-a_e-ssi-2-redr-v1.0_s go-j_jsa-ssi-2-redr-v1.0_s go-j_jsa-ssi-4-redr-v1.0 go-j-nims-2-edr-v2.0 go-v_e-ssi-2-redr-v1.0_s go-v-rss-1-tdf-v1.0_s group_clem_xxxx_m group_dmgsm_100x_m group_dmgsm_200x_m group_go_00xx_m group_go_100x_m group_go_1101_m group_go_110x23_m group_go_110x_m group_gp_0001_m group_hal_0024_m group_hal_0025_m group_hal_0026_m group_hal_00xx_m group_lp_00xx_m group_mg_0xxx_m group_mg_2401_m group_mg_5201_m group_mgs_0001_m group_mgs_100x_m group_mgsa_0002_m group_mgsl_000x_m group_mgsl_20xx_m group_sl9_0001_m group_sl9_0004_m hst-j-wfpc2-3-sl9-impact-v1.0_s hst-s-wfpc2-3-rpx-v1.0_s ihw-c-lspn-2-didr-crommelin-v1.0 ihw-c-lspn-2-didr-halley-v1.0 irtf-j_c-nsfcam-3-rdr-sl9-v1.0_s iue-j-lwp-3-edr-v1.0 lp-l-rss-5-gravity-v1.0_s lp-l-rss-5-los-v1.0_s mer1-m-pancam-2-edr-sci-v1.0_s mer1-m-pancam-3-radcal-rdr-v1.0_s mgn-v-rdrs-5-cdr-alt_rad-v1.0_s mgn-v-rdrs-5-dim-v1.0_s mgn-v-rdrs-5-gvdr-v1.0_s mgn-v-rdrs-5-midr-n-polar-stereogr-v1.0 mgn-v-rdrs-5-midr-s-polar-stereogr-v1.0 mgn-v-rss-5-losapdr-l2-v1.0_s mgn-v-rss-5-losapdr-l2-v1.13_s mgs-m-accel-0-accel_data-v1.0_s mgs-m-accel-2-edr-v1.1_s mgs-m-accel-5-profile-v1.2 mgs-m-moc-na_wa-2-dsdp-l0-v1.0_s mgs-m-moc-na_wa-2-sdp-l0-v1.0_s mgs-m-mola-1-aedr-10-v1.0_s mgs-m-mola-3-pedr-ascii-v1.0 mgs-m-mola-3-pedr-l1a-v1.0_s mgs-m-rss-1-cru-v1.0_s mgs-m-rss-1-ext-v1.0_s mgs-m-rss-1-map-v1.0_s mgs-m-rss-1-moi-v1.0_s mgs-m-rss-5-sdp-v1.0_s mgs-m-tes-3-tsdr-v1.0_s mgs-m-tes-3-tsdr-v2.0_s mpfl-m-imp-2-edr-v1.0_s mr9-m-iris-3-rdr-v1.0_s mr9_vo1_vo2-m-iss_vis-5-cloud-v1.0_s mssso-j-caspir-3-rdr-sl9-stds-v1.0 mssso-j-caspir-3-rdr-sl9-v1.0 
mssso-j-caspir-3-rdr-sl9-v1.0_s near-a-grs-3-edr-erosorbit-v1.0 near-a-mag-2-edr-cruise1-v1.0 near-a-mag-2-edr-cruise2-v1.0 near-a-mag-2-edr-cruise3-v1.0 near-a-mag-2-edr-cruise4-v1.0 near-a-mag-2-edr-earth-v1.0 near-a-mag-2-edr-erosflyby-v1.0 near-a-mag-2-edr-erosorbit-v1.0 near-a-mag-2-edr-erossurface-v1.0 near-a-mag-3-rdr-cruise2-v1.0 near-a-mag-3-rdr-cruise3-v1.0 near-a-mag-3-rdr-cruise4-v1.0 near-a-mag-3-rdr-earth-v1.0 near-a-mag-3-rdr-erosflyby-v1.0 near-a-mag-3-rdr-erosorbit-v1.0 vg1-s-rss-1-rocc-v1.0_s vg1-ssa-rss-1-rocc-v1.0_s vg2-s-rss-1-rocc-v1.0_s
12 PDS / SDSC Testing Timeline
–09/07 – JPL/EN writes Test Plan (for testing iRODS s/w)
    Identifies / documents PDS/SDSC architecture
    Identifies the set of parameters that are to be varied:
      – size of files transferred: Mbytes to Gbytes
      – number of files transferred: 1 to hundreds
      – time when files transferred: peak / low network access periods
      – network speed: Mbits / Gbits
      – and any other parameters that might affect reliability / transfer speed
    Identifies the set of parameters to be measured:
      –Transfer speed (Mbytes/sec)
      –Reliability (% of transmission failures)
–09/07 – JPL/EN tested pre-production version of iRODS s/w
    EN testing shows checksum errors on file transfer
    EN & SDSC agree to halt testing until SDSC can provide stable s/w
–10/07 – JPL/EN captured test results in Test Report
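The two measured parameters in the Test Plan, transfer speed and reliability, could be computed from per-transfer records along the following lines. This is a sketch, not the Test Plan's actual procedure: the `Transfer` record layout is an assumption, decimal megabytes are assumed, and a failure is taken to mean a checksum mismatch after transfer.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """One file transfer (hypothetical record layout)."""
    size_bytes: int
    seconds: float
    checksum_ok: bool


def transfer_speed_mb_per_s(transfer):
    """Transfer speed in MBytes/sec, using 1 MB = 1,000,000 bytes."""
    return transfer.size_bytes / 1e6 / transfer.seconds


def failure_rate_percent(transfers):
    """Reliability expressed as the percentage of transfers whose checksum failed."""
    if not transfers:
        raise ValueError("no transfers recorded")
    failed = sum(1 for t in transfers if not t.checksum_ok)
    return 100.0 * failed / len(transfers)


# Made-up sample values for illustration only, not results from the Test Report.
sample = [
    Transfer(size_bytes=2_000_000_000, seconds=250.0, checksum_ok=True),
    Transfer(size_bytes=500_000_000, seconds=80.0, checksum_ok=False),
]
print(transfer_speed_mb_per_s(sample[0]))
print(failure_rate_percent(sample))
```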
13 PDS/SDSC Testing Configuration
[Figure: Base Configuration, Phase 1. Labels as extracted: SDSC Data Central Server PDS Server (32 Bit) (Rebuild of Starburst) PDS Archive Starbase IRODS Client Sam-QFS SDSC Repository PDS Repository ~100 MB/S Mounted ~1 GB/S Int2/OC12 Accounts PDS Dev iCAT]
First Tests
  1. iRODS client installed at JPL
  2. iRODS metadata catalog (iCAT) running on PostgreSQL at SDSC
  3. iRODS managed data transfer from JPL to Sam-QFS at SDSC. We would use parallel I/O to do the transfer, with the goal of moving the terabyte of data within a day. In effect, we would use iRODS to move a file from a disk at JPL to storage at SDSC.
  4. iRODS checksums used to validate data integrity
14 PDS / SDSC Testing Timeline
–03/08 – JPL/PO.DAAC begins testing production version of iRODS s/w:
    PO.DAAC varied parameters:
      –Server separate from JPL/EN servers
      –Same JPL network and network speeds
      –File sizes varied from 0.5MBytes to 17GBytes
      –Single file transfer; multi-file transfer
      –Single thread transfer; multi-thread (up to 16 threads)
      –Network speed (100Mbits, 1Gbit)
    PO.DAAC testing indicates random data corruption
    Transfer rates:
      –Using 64 bit system:
      –PO.DAAC to SDSC; 0.5 GByte file: Mbytes/sec
      –PO.DAAC to SDSC; 17 GByte file: 30 Mbytes/sec
–03/08 – JPL/PO.DAAC tests using iperf s/w:
    Tested iperf between: JPL to SDSC (error detected)
    Tested iperf between: Raytheon to SDSC (no errors)
    Tested iperf between: UCLA to SDSC (no errors)
    Tested iperf between: JPL to UCLA (random errors detected)
    Tested iperf between: JPL to SDSC (error detected)
    Tested iperf between: JPL/EN to JPL/PO.DAAC (error detected)
–03/08 – Testing indicates problem within the JPL network
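The elimination reasoning behind that conclusion can be sketched as follows: a site is suspect if it appears on every failing path and on no passing path. The pairwise outcomes are taken from the iperf list on this slide; collapsing JPL/EN and JPL/PO.DAAC into a single "JPL" label, and the function name `suspect_sites`, are assumptions made for the illustration. Attribution at site level is coarse and says nothing about which device on the path is at fault.

```python
def suspect_sites(results):
    """Return sites present on every failing path and on no passing path.

    results is a list of ((site_a, site_b), ok) tuples.
    """
    failing = [set(pair) for pair, ok in results if not ok]
    passing = [set(pair) for pair, ok in results if ok]
    if not failing:
        return set()
    common = set.intersection(*failing)   # sites shared by all failing paths
    cleared = set().union(*passing)       # sites seen on an error-free path
    return common - cleared


# Outcomes as listed on the slide (True = no errors detected).
IPERF_RESULTS = [
    (("JPL", "SDSC"), False),
    (("Raytheon", "SDSC"), True),
    (("UCLA", "SDSC"), True),
    (("JPL", "UCLA"), False),
    (("JPL", "JPL"), False),  # JPL/EN to JPL/PO.DAAC, both labeled "JPL" here
]

print(suspect_sites(IPERF_RESULTS))  # {'JPL'}
```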
15 PDS / SDSC Testing Timeline
–03/08 – PDS/PPI begins testing production version of iRODS s/w
    PPI varied parameters:
      –Server separate from JPL servers
      –UCLA network
    PPI testing shows no data corruption
    Transfer rates:
      –from PPI to SDSC (0.5 TBytes; 300 transfers): ~7-8 Mbytes/sec
      –from SDSC to PPI: ~TBD Mbytes/sec
    Impressions of using iRODS s/w:
      –Easy to install and configure
      –Decent transfer rates
16 PDS / SDSC Testing Timeline
–03/27/08 – JPL/JPL.NET: identifies that a router, outside of JPL and between JPL and SDSC, is dropping bits; re-routes traffic to bypass the errant router
–03/27/08 – JPL/PO.DAAC re-tests data transfer from JPL to SDSC: no data corruption
–03/27/08 – PDS/EN asks GEO and IMG/USGS to participate in testing production version of iRODS s/w:
    Sent GEO and IMG/USGS the SDSC contact information and start-up procedures
–GEO will provide baseline for iRODS operating on Windows
17 PDS / SDSC Testing Timeline
–02/08 – SDSC releases production version of iRODS s/w
    Version 1.0; more robust / stable version
    Concurrently being tested by other SDSC clients:
      –Maryland (not SBN), Wisconsin, and UCLA (not PPI)
–02/08 – JPL/EN begins testing production version of iRODS s/w
    EN varied parameters:
      –2 different servers using different H/W and OS
      –File sizes varied from 500MBytes to 2GBytes
      –Single file transfer; multi-file transfer
      –Single thread transfer; multi-thread (up to 16 threads)
      –Network speed (100Mbits, 1Gbit)
    EN testing shows checksum errors on file transfer
    Transfer rates:
      –using 32 bit system on 2 GB file:
          »from JPL to SDSC: ~7.2 Mbytes/sec
          »from SDSC to JPL: ~8.3 Mbytes/sec
      –using 64 bit system on 2 GB file:
          »from JPL to SDSC: ~11.5 Mbytes/sec
          »from SDSC to JPL: ~26.2 Mbytes/sec