San Diego Supercomputer Center (SDSC) Testing Summary and Recommendations
Physical Media Working Group
March 2008

2 PDS MC Policies on Media, Integrity and Backup

Data Delivery Policy
– Data producers shall deliver one copy of each archival volume to the appropriate Discipline Node using means/media that are mutually acceptable to the two parties. The Discipline Node shall declare the volume delivery complete when the contents have been validated against PDS Standards and the transfer has been certified error free.
– The receiving Discipline Node is then responsible for ensuring that three copies of the volume are preserved within PDS. Several options for "local back-up" are allowed, including use of RAID or other fault-tolerant storage, a copy on separate backup media at the Discipline Node, or a separate copy elsewhere within PDS. The third copy is delivered to the deep archive at NSSDC by means/media that are mutually acceptable to the two parties. (Adopted by PDS MC October 2005)

Archive Integrity Policy
– Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council. (Adopted by MC November 2006)
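The verification the Archive Integrity Policy describes amounts to walking the holdings against a checksum manifest. Below is a minimal sketch of that kind of check, assuming a plain manifest of "<md5>  <relative path>" lines; the manifest format, function names, and command-line usage are illustrative, not part of any PDS standard.

```python
# A minimal sketch of the periodic integrity check the policy describes,
# assuming a plain manifest file with lines of the form "<md5>  <relative path>".
# The manifest format and command-line usage are illustrative, not a PDS standard.
import hashlib
import os
import sys

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_holdings(manifest_path, archive_root):
    """Confirm every manifest entry exists under archive_root and matches its checksum."""
    missing, corrupted = [], []
    with open(manifest_path) as manifest:
        for line in manifest:
            if not line.strip():
                continue
            expected, rel_path = line.split(None, 1)
            rel_path = rel_path.strip()
            full_path = os.path.join(archive_root, rel_path)
            if not os.path.exists(full_path):
                missing.append(rel_path)
            elif md5_of(full_path) != expected:
                corrupted.append(rel_path)
    return missing, corrupted

if __name__ == "__main__":
    missing, corrupted = verify_holdings(sys.argv[1], sys.argv[2])
    print(f"missing: {len(missing)}, corrupted: {len(corrupted)}")
```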

3 Background

Presented repository survey to MC (August 2007)
– MC resolution to move all data online
– Identified the need for a geographically separate repository as an operational backup

Began evaluating the San Diego Supercomputer Center (SDSC)
– Currently manages data storage for a number of science-related programs
– Provides high-speed data exchange and mass storage management
– Very inexpensive at $450/TB/year for near-line storage
– Explore as a secondary storage option for PDS
– SDSC agreed to let PDS run an evaluation using their beta iRODS software
– Determine whether 500 GB/day is a realistic goal for moving data to a secondary repository (see the back-of-envelope check below)
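For scale, the 500 GB/day goal and the quoted near-line rate work out as follows. This is my arithmetic, not from the slide; decimal units and the 10 TB archive size are assumptions for illustration only.

```python
# Back-of-envelope check of the 500 GB/day goal and the $450/TB/year near-line
# rate quoted above. Decimal units (1 GB = 1e9 bytes) and the 10 TB archive
# size are assumptions for illustration only.
GOAL_BYTES_PER_DAY = 500e9
SECONDS_PER_DAY = 24 * 3600

sustained_mb_per_s = GOAL_BYTES_PER_DAY / SECONDS_PER_DAY / 1e6
print(f"500 GB/day requires ~{sustained_mb_per_s:.1f} MBytes/sec sustained")  # ~5.8

COST_PER_TB_YEAR = 450.0
archive_size_tb = 10  # hypothetical secondary-copy size
print(f"{archive_size_tb} TB near-line: ${archive_size_tb * COST_PER_TB_YEAR:,.0f}/year")  # $4,500
```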

4 Timeline

Fall 2007
– Evaluated iRODS beta software between JPL and SDSC
– Good performance results in moving data
– Captured metrics for different scenarios:
  - Size of files transferred
  - Number of files transferred
  - Time when files are transferred
  - Network speed
  - Network connection (e.g., 10/100 vs. GigE)
– Minor bugs found
  - Decision to wait for a more stable release

Winter 2008
– New release of iRODS client (February 2008)
– Testing between JPL and SDSC
  - Excellent performance (e.g., ~ Mbytes/sec)
– Some network problems encountered between JPL and SDSC required resolution
– Extended testing to PPI/UCLA with good results (e.g., transferred 1/2 terabyte over a 15-hour period using real PDS data; see the rate check below)
– Extended testing to GEO
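As a rough consistency check (my arithmetic, not the slides'; decimal units assumed), the half-terabyte PPI/UCLA run above averages a bit over 9 MBytes/sec over the 15 hours:

```python
# Average rate implied by "1/2 terabyte over a 15-hour period" (1 TB = 1e12 bytes).
bytes_moved = 0.5e12
elapsed_s = 15 * 3600
print(f"~{bytes_moved / elapsed_s / 1e6:.1f} MBytes/sec average")  # ~9.3
```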

5 Summary of Testing Results

Reliability
– Transferring data between JPL and SDSC is only partially successful, as checksum failures appear randomly
  - This appears to be a network routing issue, which is being resolved by our network administrators
– Transferring data between PPI and SDSC has not shown any problems

Performance (using iRODS)
– JPL to SDSC – 0.5 to 5 GByte files = between 6 and 16 MBytes/sec
– PPI to SDSC – files up to 3 GBytes = between 7 and 8 MBytes/sec
– SDSC to JPL – 0.5 to 5 GByte files = ~8 MBytes/sec
– SDSC to PPI – files up to 3 GBytes = ~5 MBytes/sec

Usability
– Installation and configuration have been straightforward

6 Recommendations

As more testing builds confidence in using the iRODS s/w:
– Bring more Nodes into testing (May 2008):
  - More diverse testing environments
  - More opportunity to identify PDS-wide systemic problems / areas of concern
– Make a final recommendation on using SDSC and options for making it operational (June 2008)

7 Backup Material

8 JPL/EN System Configuration

JPL/EN Server
– Memory: k
– CPU: dual MHz processor
– O/S: Red Hat Enterprise Linux ES release 4 (Nahant)
– Hard drive: 73 GB Ultra-160 SCSI drive
– Ethernet card 0: negotiated 100baseTx-FD

Network Bandwidth
– 100 MBits/sec – 1000 MBits/sec

9 JPL/PO.DAAC System Configuration

PO.DAAC Server
– Sun X4100
– 2 x Dual Core AMD Opteron(tm) Processor 285
– 16 GB of memory
– Linux kernel ELsmp
– Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
– Westwood+ (TCP congestion control) turned on

Network Bandwidth
– 100 MBits/sec – 1000 MBits/sec

10 PDS/PPI System Configuration

PDS/PPI Server
– Intel Pentium D 2.8 GHz
– 2 GB RAM
– Kernel ver ELsmp
– Red Hat Enterprise WS 4 update 4

Network Bandwidth
– 1000 MBits/sec

11 Repository Directories / Files

Directory: /tsraid1/
Size: ~1.0036E12 Bytes

clem1-1-rss-1-bsr-v1.0_s clem1-1-rss-5-bsr-v1.0_s clem1-l_e_y-a_b_u_h_l_n-2-edr-v1.0_s clem1-l-h-5-dim-mosaic-v1.0_s clem1-l-u-5-dim-basemap-v1.0_s clem1-l-u-5-dim-uvvis-v1.0_s eso-j-irspec-3-rdr-sl9-v1.0 eso-j-s-n-u-spectrophotometer-4-v2.0_s eso-j-susi-3-rdr-sl9-v1.0 go-a_c-ssi-2-redr-v1.0_s go-a_e-ssi-2-redr-v1.0_s go-j_jsa-ssi-2-redr-v1.0_s go-j_jsa-ssi-4-redr-v1.0 go-j-nims-2-edr-v2.0 go-v_e-ssi-2-redr-v1.0_s go-v-rss-1-tdf-v1.0_s group_clem_xxxx_m group_dmgsm_100x_m group_dmgsm_200x_m group_go_00xx_m group_go_100x_m group_go_1101_m group_go_110x23_m group_go_110x_m group_gp_0001_m group_hal_0024_m group_hal_0025_m group_hal_0026_m group_hal_00xx_m group_lp_00xx_m group_mg_0xxx_m group_mg_2401_m group_mg_5201_m group_mgs_0001_m group_mgs_100x_m group_mgsa_0002_m group_mgsl_000x_m group_mgsl_20xx_m group_sl9_0001_m group_sl9_0004_m hst-j-wfpc2-3-sl9-impact-v1.0_s hst-s-wfpc2-3-rpx-v1.0_s ihw-c-lspn-2-didr-crommelin-v1.0 ihw-c-lspn-2-didr-halley-v1.0 irtf-j_c-nsfcam-3-rdr-sl9-v1.0_s iue-j-lwp-3-edr-v1.0 lp-l-rss-5-gravity-v1.0_s lp-l-rss-5-los-v1.0_s mer1-m-pancam-2-edr-sci-v1.0_s mer1-m-pancam-3-radcal-rdr-v1.0_s mgn-v-rdrs-5-cdr-alt_rad-v1.0_s mgn-v-rdrs-5-dim-v1.0_s mgn-v-rdrs-5-gvdr-v1.0_s mgn-v-rdrs-5-midr-n-polar-stereogr-v1.0 mgn-v-rdrs-5-midr-s-polar-stereogr-v1.0 mgn-v-rss-5-losapdr-l2-v1.0_s mgn-v-rss-5-losapdr-l2-v1.13_s mgs-m-accel-0-accel_data-v1.0_s mgs-m-accel-2-edr-v1.1_s mgs-m-accel-5-profile-v1.2 mgs-m-moc-na_wa-2-dsdp-l0-v1.0_s mgs-m-moc-na_wa-2-sdp-l0-v1.0_s mgs-m-mola-1-aedr-10-v1.0_s mgs-m-mola-3-pedr-ascii-v1.0 mgs-m-mola-3-pedr-l1a-v1.0_s mgs-m-rss-1-cru-v1.0_s mgs-m-rss-1-ext-v1.0_s mgs-m-rss-1-map-v1.0_s mgs-m-rss-1-moi-v1.0_s mgs-m-rss-5-sdp-v1.0_s mgs-m-tes-3-tsdr-v1.0_s mgs-m-tes-3-tsdr-v2.0_s mpfl-m-imp-2-edr-v1.0_s mr9-m-iris-3-rdr-v1.0_s mr9_vo1_vo2-m-iss_vis-5-cloud-v1.0_s mssso-j-caspir-3-rdr-sl9-stds-v1.0 mssso-j-caspir-3-rdr-sl9-v1.0 mssso-j-caspir-3-rdr-sl9-v1.0_s near-a-grs-3-edr-erosorbit-v1.0 near-a-mag-2-edr-cruise1-v1.0 near-a-mag-2-edr-cruise2-v1.0 near-a-mag-2-edr-cruise3-v1.0 near-a-mag-2-edr-cruise4-v1.0 near-a-mag-2-edr-earth-v1.0 near-a-mag-2-edr-erosflyby-v1.0 near-a-mag-2-edr-erosorbit-v1.0 near-a-mag-2-edr-erossurface-v1.0 near-a-mag-3-rdr-cruise2-v1.0 near-a-mag-3-rdr-cruise3-v1.0 near-a-mag-3-rdr-cruise4-v1.0 near-a-mag-3-rdr-earth-v1.0 near-a-mag-3-rdr-erosflyby-v1.0 near-a-mag-3-rdr-erosorbit-v1.0 vg1-s-rss-1-rocc-v1.0_s vg1-ssa-rss-1-rocc-v1.0_s vg2-s-rss-1-rocc-v1.0_s
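A minimal sketch of how an inventory like the one above could be generated: list the top-level dataset directories under the repository root and total their size. The root path comes from the slide; the rest is illustrative, not the tool actually used.

```python
# Inventory a repository root: count files, total bytes, and list the top-level
# dataset directories. /tsraid1 is the path from the slide; everything else is
# an illustrative sketch.
import os

def inventory(root):
    total_bytes, file_count = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
                file_count += 1
            except OSError:
                pass  # skip unreadable entries
    return total_bytes, file_count

if __name__ == "__main__":
    root = "/tsraid1"
    datasets = sorted(d for d in os.listdir(root)
                      if os.path.isdir(os.path.join(root, d)))
    total_bytes, file_count = inventory(root)
    print(f"{len(datasets)} dataset directories, {file_count} files, "
          f"~{total_bytes:.4E} bytes")
```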

12 PDS / SDSC Testing Timeline

– 09/07 – JPL/EN writes Test Plan (for testing iRODS s/w)
  - Identifies / documents PDS/SDSC architecture
  - Identifies the set of parameters that are to be varied:
    - Size of files transferred -- Mbytes to Gbytes
    - Number of files transferred -- 1 to hundreds
    - Time when files are transferred -- peak / low network access periods
    - Network speed -- Mbits / Gbits
    - Any other parameters that might affect reliability / transfer speed
  - Identifies the set of parameters to be measured (see the harness sketch below):
    - Transfer speed (Mbytes/sec)
    - Reliability (% of transmission failures)
– 09/07 – JPL/EN tested pre-production version of iRODS s/w
  - EN testing shows checksum errors on file transfer
  - EN & SDSC agree to halt testing until SDSC can provide stable s/w
– 10/07 – JPL/EN captured test results in Test Report
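A minimal sketch of the measurement side of that Test Plan: time a batch of transfers and report MBytes/sec per file plus the percentage of failures. The transfer itself is abstracted as a callable so the same harness could wrap the iRODS client or any other tool under test; all names are illustrative.

```python
# Measure the two Test Plan quantities -- transfer speed (MBytes/sec) and
# reliability (% of failed transfers) -- for a batch of files. transfer_fn is
# any callable that moves one file and returns True on success (e.g., a wrapper
# around the iRODS client); this harness only does the timing and bookkeeping.
import os
import time

def measure_batch(paths, transfer_fn):
    rates, failures = [], 0
    for path in paths:
        size_mb = os.path.getsize(path) / 1e6
        start = time.time()
        ok = transfer_fn(path)
        elapsed = time.time() - start
        rates.append(size_mb / elapsed)
        if not ok:
            failures += 1
        print(f"{path}: {size_mb / elapsed:.1f} MBytes/sec, {'ok' if ok else 'FAILED'}")
    print(f"mean rate: {sum(rates) / len(rates):.1f} MBytes/sec; "
          f"failure rate: {100.0 * failures / len(paths):.1f}%")
```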

13 PDS/SDSC Testing Configuration

[Architecture diagram: PDS server (32-bit, rebuild of Starburst) with the iRODS client and the PDS archive/repository on Starbase (~100 MB/s, mounted), connected over Int2/OC12 (~1 Gb/s) to the SDSC Data Central server with the iCAT and Sam-QFS backing the SDSC repository; accounts: PDS Dev]

First Tests (Base Configuration, Phase 1)
1. iRODS client installed at JPL
2. iRODS metadata catalog (iCAT) running on PostgreSQL at SDSC
3. iRODS-managed data transfer from JPL to Sam-QFS at SDSC. We would use parallel I/O to do the transfer, with the goal of moving the terabyte of data within a day. In effect, we would use iRODS to move a file from a disk at JPL to storage at SDSC.
4. iRODS checksums used to validate data integrity (see the sketch below)
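Steps 3 and 4 above might be scripted around the iRODS i-commands roughly as follows. The -r (recursive), -K (verify checksum on the server), and -N (number of transfer threads) options are assumptions based on the standard iput client; adjust to whatever the installed iRODS release actually supports.

```python
# Push dataset directories into iRODS with parallel transfer threads and
# server-side checksum verification. Assumes the client has already been
# configured and authenticated (iinit); the option letters are assumptions
# as noted in the lead-in.
import subprocess
import sys

def put_dataset(local_dir, threads=8):
    """Recursively transfer one dataset directory, verifying checksums."""
    cmd = ["iput", "-r", "-K", "-N", str(threads), local_dir]
    return subprocess.run(cmd).returncode == 0

if __name__ == "__main__":
    failed = [d for d in sys.argv[1:] if not put_dataset(d)]
    print(f"{len(sys.argv) - 1 - len(failed)} datasets transferred, "
          f"{len(failed)} failed: {failed}")
```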

14 PDS / SDSC Testing Timeline

– 03/08 – JPL/PO.DAAC begins testing production version of iRODS s/w
  - PO.DAAC varied parameters:
    - Server separate from JPL/EN servers
    - Same JPL network and network speeds
    - File sizes varied from 0.5 MBytes to 17 GBytes
    - Single-file transfer; multi-file transfer
    - Single-thread transfer; multi-thread (up to 16 threads)
    - Network speed (100 Mbits, 1 Gbit)
  - PO.DAAC testing indicates random data corruption
  - Transfer rates (using 64-bit system):
    - PO.DAAC to SDSC; 0.5 GByte file: Mbytes/sec
    - PO.DAAC to SDSC; 17 GByte file: 30 Mbytes/sec
– 03/08 – JPL/PO.DAAC tests using iperf s/w (see the sketch below):
  - JPL to SDSC (error detected)
  - Raytheon to SDSC (no errors)
  - UCLA to SDSC (no errors)
  - JPL to UCLA (random errors detected)
  - JPL to SDSC (error detected)
  - JPL/EN to JPL/PO.DAAC (error detected)
– 03/08 – Testing indicates a problem within the JPL network
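A sketch of scripting those path-by-path iperf checks from the client side. It assumes an iperf server (iperf -s) is already listening on each remote host; the host names are placeholders, and -c/-t/-f are standard iperf client options.

```python
# Run an iperf client test against each remote endpoint and print its report,
# mirroring the path-by-path isolation described above. Host names are
# hypothetical placeholders.
import subprocess

REMOTE_HOSTS = ["sdsc-test-host", "ucla-test-host"]

def probe(host, seconds=30):
    """One iperf client run: -c <host>, -t duration, -f m (report in Mbits/sec)."""
    result = subprocess.run(["iperf", "-c", host, "-t", str(seconds), "-f", "m"],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout

for host in REMOTE_HOSTS:
    ok, report = probe(host)
    print(f"--- {host}: {'completed' if ok else 'failed'} ---")
    print(report)
```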

15 PDS / SDSC Testing Timeline

– 03/08 – PDS/PPI begins testing production version of iRODS s/w
  - PPI varied parameters:
    - Server separate from JPL servers
    - UCLA network
  - PPI testing shows no data corruption
  - Transfer rates:
    - From PPI to SDSC (0.5 TBytes; 300 transfers): ~7-8 Mbytes/sec
    - From SDSC to PPI: ~TBD Mbytes/sec
  - Impressions of using the iRODS s/w:
    - Easy to install and configure
    - Decent transfer rates

16 PDS / SDSC Testing Timeline

– 03/27/08 – JPL/JPL.NET:
  - Identifies a router, outside of JPL and between JPL and SDSC, that is dropping bits
  - Re-routes traffic to bypass the errant router
– 03/27/08 – JPL/PO.DAAC re-tests data transfer from JPL to SDSC: no data corruption
– 03/27/08 – PDS/EN asks GEO and IMG/USGS to participate in testing the production version of the iRODS s/w:
  - Sent GEO and IMG/USGS the SDSC contact information and start-up procedures
  - GEO will provide a baseline for iRODS operating on Windows

17 PDS / SDSC Testing Timeline

– 02/08 – SDSC releases production version of iRODS s/w
  - Version 1.0; a more robust / stable version
  - Concurrently being tested by other SDSC clients: Maryland (not SBN), Wisconsin, and UCLA (not PPI)
– 02/08 – JPL/EN begins testing production version of iRODS s/w
  - EN varied parameters:
    - 2 different servers using different H/W and OS
    - File sizes varied from 500 MBytes to 2 GBytes
    - Single-file transfer; multi-file transfer
    - Single-thread transfer; multi-thread (up to 16 threads)
    - Network speed (100 Mbits, 1 Gbit)
  - EN testing shows checksum errors on file transfer
  - Transfer rates (see the comparison below):
    - Using 32-bit system on a 2 GB file:
      » from JPL to SDSC: ~7.2 Mbytes/sec
      » from SDSC to JPL: ~8.3 Mbytes/sec
    - Using 64-bit system on a 2 GB file:
      » from JPL to SDSC: ~11.5 Mbytes/sec
      » from SDSC to JPL: ~26.2 Mbytes/sec
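Relating those EN rates back to the 500 GB/day goal (my arithmetic, decimal units assumed), a day's 500 GB would take roughly 19 hours at the 32-bit JPL-to-SDSC rate and roughly 12 hours at the 64-bit rate:

```python
# Time to move one day's 500 GB at the observed JPL-to-SDSC rates above.
DAILY_GOAL_BYTES = 500e9

for label, bytes_per_s in [("32-bit, ~7.2 MBytes/sec", 7.2e6),
                           ("64-bit, ~11.5 MBytes/sec", 11.5e6)]:
    hours = DAILY_GOAL_BYTES / bytes_per_s / 3600
    print(f"{label}: ~{hours:.0f} hours for 500 GB")  # ~19 h and ~12 h
```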