Published by Ashlee McCormick. Modified over 9 years ago.
1
Linux Servers with JASMine
K. Edwards, A. Kowalski, S. Philpott
HEPiX, May 21, 2003
2
JASMine
- JASMine
    - JLab's Mass Storage System
        - Comparable to CASTOR, Enstore, …
    - Distributed Servers
- Data Movers (tape and disk)
    - Two tape drives per Data Mover
    - 600+GB of staging disk space (3 9840B tapes)
    - Need fast access to/from disk to keep up with the 9940B tape drives and gigabit ethernet
- Cache Servers (disk)
    - 1-2TB file servers
    - JASMine manages the files
        - Copies from Data Movers via JASMine's jcp protocol
    - User access via NFS (read-only)
3
Latest Data Mover
- Operating System
    - RedHat 7.3, kernel 2.4.20-xfs
    - XFS File System
- Hardware
    - Dual 2.2GHz Xeon CPUs
    - SuperMicro P4DPE Motherboard
    - 2 GBytes RAM
    - 2 LSI Logic MegaRAID 320-2 RAID controllers
    - 14 Seagate 73GB disk drives (hot swap)
    - Qlogic 2342 dual port fiber card ($$)
    - 2 9940B tape drives
    - Intel PRO/1000XT Server Ethernet Card
    - 3U Chassis with N+1 power supplies
- $14,200.00 US (without the 2 9940B tape drives)
4
Disk Performance Tests
- Used Standard Tests (Disktest, Bonnie++, IOZone)
    - 4GB file size used
    - Wanted to try the Fermi test, but lacked the time
- Parameters tested
    - Write-through vs. Write-back cache policy
    - Optimum disk read/write block sizes
    - RAID-5 vs. RAID-50 performance
        - RAID-5: array done in hardware (1 RAID card)
        - RAID-50:
            - 2 RAID-5 arrays done in hardware (1 per RAID card)
            - RAID-0 array done in software
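The block-size parameter above can be explored with a simple timing loop. The following is a minimal sketch only, not the Disktest, Bonnie++, or IOZone method used in the talk; the function names `time_sequential_write` and `sweep_block_sizes` are hypothetical, and the small demo file size is an assumption chosen so the example runs quickly (the talk used 4GB files, larger than the 2 GBytes of RAM).

```python
import os
import tempfile
import time


def time_sequential_write(path, total_bytes, block_size):
    """Write total_bytes to path in block_size chunks; return MB/sec."""
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    start = time.perf_counter()
    # buffering=0 issues one write() call per block (no Python-level buffer).
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            f.write(block)
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (blocks * block_size) / (1024 * 1024) / elapsed


def sweep_block_sizes(directory, total_bytes, block_sizes):
    """Return {block_size: MB/sec} for each candidate block size."""
    results = {}
    for size in block_sizes:
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results[size] = time_sequential_write(path, total_bytes, size)
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    # Hypothetical demo values: 8 MB per run, block sizes from 4K to 128K.
    sizes = [4 * 1024, 32 * 1024, 128 * 1024]
    for size, rate in sweep_block_sizes(tempfile.gettempdir(), 8 * 1024 * 1024, sizes).items():
        print(f"{size // 1024:>4}K blocks: {rate:8.1f} MB/sec")
```

A file this small mostly measures caches rather than disks, so numbers from the demo are not comparable to the results in the talk; a real run would use a file larger than RAM on the array under test.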
5
Issues/Problems Discovered
- LSI Logic MegaRAID 320-2 RAID controllers
    - Vendor support only if you use standard RedHat kernels
        - These do not have XFS support
    - RAID monitor software from LSI Logic
        - Causes SCSI bus resets
        - Occurs every 20 seconds (not changeable)
        - Throughput drops to 4-5MB/sec during each reset, as the bus is reset and the cache flushed
    - Workaround
        - Turn off RAID monitoring
    - Without RAID monitoring, there is no real way to monitor the status of the disks and RAID hardware
    - Disk failures go unnoticed
- Looking into Adaptec 2200S RAID cards
6
Disk Test Results
- Disk Results
    - Use Write-back cache on RAID card
    - 32K block sizes are optimum
    - RAID-50 was fastest (no real surprise)
- Idle system (1 reader or 1 writer)
    - 210MB/sec disk read throughput
    - 140MB/sec write throughput
- Busy system (8 readers and 8 writers)
    - 40MB/sec aggregate read throughput
    - 110MB/sec aggregate write throughput
7
Tape Performance Testing
- Used JASMine test program (Java)
    - Double-buffered
        - Threads simultaneously reading and writing from/to the buffer
    - Calculates/Verifies file checksum
    - Moves file between disk and tape
- Used real raw data from the experiments
    - 2GB files
    - HallA and HallC data in CODA format
        - Does not compress
    - CLAS data in BOS format
        - Does compress
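The double-buffered design described above can be sketched as follows. This is an illustration only, not the JASMine test program (which is written in Java): the function name `double_buffered_copy` is hypothetical, CRC32 is an assumed checksum since the talk does not name the algorithm, and in-memory streams stand in for the disk and tape devices.

```python
import io
import queue
import threading
import zlib


def double_buffered_copy(src, dst, buffer_size=1024 * 1024):
    """Copy src to dst with two buffers; return (bytes_copied, crc32)."""
    free = queue.Queue()   # empty buffers waiting to be filled
    full = queue.Queue()   # filled buffers waiting to be written
    for _ in range(2):     # exactly two buffers: one fills while one drains
        free.put(bytearray(buffer_size))

    def reader():
        try:
            while True:
                buf = free.get()
                n = src.readinto(buf)
                if not n:
                    full.put(None)      # end-of-file marker
                    return
                full.put((buf, n))
        except Exception as exc:        # hand read errors to the writer side
            full.put(exc)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    checksum = 0
    while True:
        item = full.get()
        if item is None:
            break
        if isinstance(item, Exception):
            raise item
        buf, n = item
        view = memoryview(buf)[:n]
        checksum = zlib.crc32(view, checksum)  # assumed checksum: CRC32
        dst.write(view)
        total += n
        view.release()
        free.put(buf)                   # recycle the buffer for the reader
    thread.join()
    return total, checksum


if __name__ == "__main__":
    data = bytes(range(256)) * 4096     # 1 MB of sample data
    copied, crc = double_buffered_copy(io.BytesIO(data), io.BytesIO(), 64 * 1024)
    print(copied, crc == zlib.crc32(data))
```

Computing the checksum on the writing side keeps the reader free to refill the other buffer, which is one way the checksum work can overlap with I/O.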
8
Tape Test Results
- No Issues or Problems
    - Qlogic 2342 dual port fiber card works well with Linux
    - Some extra CPU required for checksums
        - Hyper-Threading really helps the performance here
- 9940B Results as Expected
    - Direction does not matter (read/write)
        - 30MB/sec if file is not compressible
        - Up to 45MB/sec if file is compressible
    - Depends on the compressibility of the file
    - Two simultaneous copies
        - 30MB/sec each if file is not compressible (no change)
        - Expected 37.5MB/sec each for a compressible file read from tape; observed 30MB/sec each
9
Latest Cache Server
- Operating System
    - RedHat 7.3, kernel 2.4.18-xfs
    - XFS File System
- Hardware
    - Dual 2.0GHz Xeon CPUs
    - SuperMicro P4DPE Motherboard
    - 2 GBytes RAM
    - 2 3ware 7850 IDE/ATA RAID controllers (RAID-5)
    - 16 Hot Swap Disk Drives
        - Maxtor 160GB ATA133
        - Western Digital 180GB ATA100
    - Intel PRO/1000XT Server Ethernet Card
    - 4U Chassis with N+1 power supplies
- $9,000.00 US
10
Issues/Problems Discovered
- Western Digital 180GB/200GB ATA100 Drives
    - Drives go offline/idle (WD feature)
        - 3ware card thinks the drive died
    - Solution
        - Get Disk Firmware Version 63.13F70 from Western Digital
        - Use Maxtor 160GB ATA133 drives
11
Experience with IDE/ATA Drives in General
- High failure rates during the first two months of use
    - 1-3 per week
    - Need a longer burn-in period
- Failure rates decrease after two months of use
    - 1 every 6-8 weeks
    - Marginal drives gone?
    - They still fail more often than SCSI disks
        - Then again, we lost 2 SCSI disks today
- Number of disks by type used in servers
    - 191 SCSI
    - 320 ATA
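To put the post-burn-in figure in fleet terms, the rate of 1 failure every 6-8 weeks can be converted to an annualized fraction of the drive population. This is a rough sketch under two assumptions the talk does not confirm: that the failures are spread across all 320 ATA drives, and that all of those drives are past their first two months. The function name `annualized_failure_rate` is hypothetical.

```python
def annualized_failure_rate(failures, weeks, drive_count):
    """Fraction of the fleet expected to fail per year at the observed pace."""
    failures_per_year = failures * 52.0 / weeks
    return failures_per_year / drive_count


if __name__ == "__main__":
    ata_drives = 320  # from the talk; assumed to all be past burn-in
    low = annualized_failure_rate(1, 8, ata_drives)   # 1 failure every 8 weeks
    high = annualized_failure_rate(1, 6, ata_drives)  # 1 failure every 6 weeks
    print(f"{low:.1%} to {high:.1%} of ATA drives per year")
```

Under those assumptions the estimate comes to roughly 2 to 3 percent of the ATA drives per year; the talk gives no comparable figure for the SCSI drives.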