Linux Servers with JASMine
K. Edwards, A. Kowalski, S. Philpott
HEPiX, May 21, 2003

JASMine
- JASMine is JLab's Mass Storage System
  - Comparable in role to CASTOR, Enstore, …
- Distributed servers
  - Data Movers (tape and disk)
    - Two 9940B tape drives per Data Mover
    - 600+ GB of staging disk space (3 9940B tapes)
    - Need fast access to/from disk to keep up with the 9940B tape drives and gigabit Ethernet
  - Cache Servers (disk)
    - 1-2 TB file servers
    - JASMine manages the files
      - Copies from Data Movers via JASMine's jcp protocol
    - User access via NFS (read-only)

Latest Data Mover
- Operating system
  - RedHat 7.3 with an XFS-patched kernel
  - XFS file system
- Hardware
  - Dual 2.2 GHz Xeon CPUs
  - SuperMicro P4DPE motherboard
  - 2 GB RAM
  - 2 LSI Logic MegaRAID RAID controllers
  - 14 Seagate 73 GB disk drives (hot swap)
  - QLogic 2342 dual-port fiber card ($$)
  - 2 9940B tape drives
  - Intel PRO/1000 XT Server Ethernet card
  - 3U chassis with N+1 power supplies
  - $14,xxx US (without the 9940B tape drives)

Disk Performance Tests
- Used standard tests (disktest, Bonnie++, IOzone)
  - 4 GB file size used
  - Wanted to try the Fermi test, but lacked the time
- Parameters tested
  - Write-through vs. write-back cache policy
  - Optimum disk read/write block sizes (see the sketch below)
  - RAID-5 vs. RAID-50 performance
- RAID-5: one array done in hardware (1 RAID card)
- RAID-50:
  - 2 RAID-5 arrays done in hardware (1 per RAID card)
  - RAID-0 across them done in software
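The slides do not show how the block-size sweep was actually run, so here is a minimal, hypothetical Java sketch of that kind of measurement. It is not disktest, Bonnie++, or IOzone; the mount point /raid/testfile and the list of block sizes are placeholders. Using a file at least twice the size of RAM, as the 4 GB test file was on these 2 GB machines, keeps the Linux page cache from dominating the read numbers.

```java
import java.io.*;

public class BlockSizeSweep {
    // 4 GB test file, as on the slide; twice the machine's 2 GB of RAM so the
    // page cache cannot hold the whole file.
    static final long FILE_SIZE = 4L * 1024 * 1024 * 1024;

    static double writeMBps(File f, int blockSize) throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.nanoTime();
        try (OutputStream out = new FileOutputStream(f)) {
            for (long written = 0; written < FILE_SIZE; written += blockSize) {
                out.write(block);
            }
            out.flush();
        }
        double secs = (System.nanoTime() - start) / 1e9;
        return FILE_SIZE / (1024.0 * 1024.0) / secs;
    }

    static double readMBps(File f, int blockSize) throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(f)) {
            while (in.read(block) != -1) {
                // data is discarded; only the read throughput is timed
            }
        }
        double secs = (System.nanoTime() - start) / 1e9;
        return FILE_SIZE / (1024.0 * 1024.0) / secs;
    }

    public static void main(String[] args) throws IOException {
        File f = new File("/raid/testfile");        // hypothetical RAID mount point
        int[] blockSizesKB = {8, 16, 32, 64, 128};  // candidate block sizes to sweep
        for (int kb : blockSizesKB) {
            double w = writeMBps(f, kb * 1024);
            double r = readMBps(f, kb * 1024);
            System.out.printf("%4dK blocks: write %.1f MB/s, read %.1f MB/s%n", kb, w, r);
        }
        f.delete();
    }
}
```

The sweep would be repeated for each cache policy and RAID layout under test; the slide's 32K result came from comparisons of this kind.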

Issues/Problems Discovered
- LSI Logic MegaRAID RAID controllers
  - Vendor support only if you use standard RedHat kernels
    - These do not have XFS support
  - RAID monitor software from LSI Logic
    - Causes SCSI bus resets
    - Occurs every 20 seconds (not changeable)
    - Throughput drops to 4-5 MB/sec while it happens, since the reset flushes the cache
  - Workaround
    - Turn off RAID monitoring
    - Without it, there is no real way to monitor the status of the disks and RAID hardware, and disk failures go unnoticed
  - Looking into Adaptec 2200S RAID cards

Disk Test Results
- Disk results
  - Use write-back cache on the RAID card
  - 32K block sizes are optimum
  - RAID-50 was fastest (no real surprise)
- Idle system (1 reader or 1 writer)
  - 210 MB/sec read throughput
  - 140 MB/sec write throughput
- Busy system (8 readers and 8 writers)
  - 40 MB/sec aggregate read throughput
  - 110 MB/sec aggregate write throughput

Tape Performance Testing
- Used the JASMine test program (Java)
  - Double-buffered: threads simultaneously reading and writing from/to a shared buffer (see the sketch below)
  - Calculates/verifies the file checksum
  - Moves the file between disk and tape
- Used real raw data from the experiments
  - 2 GB files
  - Hall A and Hall C data in CODA format (does not compress)
  - CLAS data in BOS format (does compress)
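The JASMine test program itself is not shown on the slide; the following is a rough Java sketch of the double-buffered copy it describes, with one thread reading while another writes and a checksum computed on the data in flight. The class, the CRC32 choice, and the 1 MB buffer size are illustrative assumptions, not JASMine's actual code or checksum.

```java
import java.io.*;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.CRC32;

public class DoubleBufferedCopy {
    private static final byte[] EOF = new byte[0];       // end-of-stream marker

    // Copies src to dst with one reader thread and one writer thread sharing a
    // small bounded queue of buffers, computing a CRC32 of the data in flight.
    public static long copy(InputStream src, OutputStream dst, int bufSize)
            throws IOException, InterruptedException {
        BlockingQueue<byte[]> filled = new ArrayBlockingQueue<>(2);  // "double buffered"
        CRC32 crc = new CRC32();

        Thread reader = new Thread(() -> {
            try {
                while (true) {
                    byte[] buf = new byte[bufSize];
                    int n = src.read(buf);
                    if (n < 0) { filled.put(EOF); return; }
                    byte[] chunk = (n == bufSize) ? buf : Arrays.copyOf(buf, n);
                    crc.update(chunk);                   // checksum while the data is in memory
                    filled.put(chunk);
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();

        long total = 0;
        while (true) {                                   // the writer runs on the calling thread
            byte[] chunk = filled.take();
            if (chunk == EOF) break;
            dst.write(chunk);
            total += chunk.length;
        }
        dst.flush();
        reader.join();
        System.out.printf("copied %d bytes, crc32=%08x%n", total, crc.getValue());
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Example: disk-to-disk copy; on a Data Mover one end would be the tape device.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            copy(in, out, 1 << 20);                      // 1 MB buffers (illustrative)
        }
    }
}
```

Because the reader and writer overlap, the extra CPU cost is mostly the checksum, which is why Hyper-Threading helps on the next slide's results.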

Tape Test Results
- No issues or problems
  - QLogic 2342 dual-port fiber card works well with Linux
  - Some extra CPU required for checksums
    - Hyper-Threading really helps performance here
- 9940B results as expected
  - Direction does not matter (read/write)
    - 30 MB/sec if the file is not compressible
    - Up to 45 MB/sec if the file is compressible, depending on how compressible it is
  - Two simultaneous copies
    - 30 MB/sec each if the files are not compressible (no change)
    - Expected 37.5 MB/sec each for compressible files read from tape; observed 30 MB/sec each

Latest Cache Server
- Operating system
  - RedHat 7.3 with an XFS-patched kernel
  - XFS file system
- Hardware
  - Dual 2.0 GHz Xeon CPUs
  - SuperMicro P4DPE motherboard
  - 2 GB RAM
  - 2 3ware 7850 IDE/ATA RAID controllers (RAID-5)
  - 16 hot-swap disk drives
    - Maxtor 160 GB ATA133
    - Western Digital 180 GB ATA100
  - Intel PRO/1000 XT Server Ethernet card
  - 4U chassis with N+1 power supplies
  - $9,xxx US

Issues/Problems Discovered
- Western Digital 180 GB / 200 GB ATA100 drives
  - Drives go offline/idle (a WD feature)
    - The 3ware card thinks the drive has died
  - Solution
    - Get disk firmware version 63.13F70 from Western Digital
    - Use Maxtor 160 GB ATA133 drives

Experience with IDE/ATA Drives in General
- High failure rates during the first two months of use
  - 1-3 per week
  - Need a longer burn-in period
- Failure rates decrease after two months of use
  - 1 every 6-8 weeks (marginal drives gone?)
  - They still fail more often than SCSI disks
    - Then again, we lost 2 SCSI disks today
- Number of disks by type used in servers
  - 191 SCSI
  - 320 ATA