CERN Disk Storage Technology Choices
LCG-France Meeting, April 8th 2005
Tony.Cass@CERN.ch
History
99/00 – EIDE disk server evaluation
2001 – Problem with IBM disks
  – But no serious worries about the model at that stage
2002 – Continued expansion
2003 – Major problem with new servers
  – Significant impact on servers (and support staff!)
  – Entire low-cost disk server model questioned.
2004 – Western Digital admit disk problem
  – 1224 disks replaced.
  – Confidence in low-cost model restored.
    » Added 75 servers with 1800 SATA disks for 175TB usable capacity
    » Installed base exceeds 500TB across ~350 servers
2005 – Plan to buy 500TB @ <3CHF/GB (usable)
Cost Evolution
[Figure: cost evolution chart. Labels: usable (RAID 5), gross; Jumbos, 4U…8U rackmount, FC attached disk array]
(Some) Options
SAN vs NAS
SCSI vs FC vs SATA
“In a box” vs server and trays
“White box” vs major vendor
(Some) Options
SAN vs NAS
  – SAN-style solutions not obviously advantageous for HEP use pattern, and require expensive infrastructure.
  – iSCSI maturing, but not there yet.
  – Could be great with a global file system, but that technology not mature either.
  – CERN choice: large scale storage as NAS
    » But exploring SAN for some special uses, e.g. tape->disk transfer
SCSI vs FC vs SATA
“In a box” vs server and trays
“White box” vs major vendor
(Some) Options
SAN vs NAS
SCSI vs FC vs EIDE/SATA
  – Common view is that EIDE/SATA disks are less reliable.
    » Most reliable platters integrated with higher value electronics
  – No evidence for lower reliability of EIDE vs SCSI at CERN.
    » MTBF for both ~200,000-250,000 hrs
    » Historically, bad batch of disks (SCSI or EIDE) every 2-3 years
  – But note: some SATA disks are rated for intermittent operation, others for 24x7 operation.
  – CERN choice: SATA disks rated for 24x7 operation
    » (high capacity 7,200rpm, not lower capacity 10,000rpm)
“In a box” vs server and trays
“White box” vs major vendor
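To put the quoted MTBF range in perspective, the sketch below converts it into a per-disk annual failure probability and an expected number of failures per year for a fleet. It assumes a constant failure rate (exponentially distributed lifetimes), which is a simplification and ignores the "bad batch" effect mentioned above. The function names are illustrative, and the 1800-disk fleet size is borrowed from the 2004 expansion figure purely as an example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours of 24x7 operation


def annual_failure_probability(mtbf_hours: float) -> float:
    """Chance that one disk fails within a year (constant-rate assumption)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(n_disks: int, mtbf_hours: float) -> float:
    """Expected failures per year, assuming failed disks are replaced promptly."""
    return n_disks * HOURS_PER_YEAR / mtbf_hours


# MTBF values from the slide; fleet size is an illustrative assumption.
for mtbf in (200_000, 250_000):
    p = annual_failure_probability(mtbf)
    n = expected_failures_per_year(1800, mtbf)
    print(f"MTBF {mtbf} h: {p:.1%} per disk per year, ~{n:.0f} failures/year for 1800 disks")
```

Under these assumptions a disk has roughly a 3.4-4.3% chance of failing in a year, which for 1800 disks means on the order of 60-80 replacements per year.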
(Some) Options
SAN vs NAS
SCSI vs FC vs SATA
“In a box” vs server and trays
  – Specialist disk trays seen by some as better quality than trays in PC server chassis.
  – Possibly true, but who is responsible if there are communication problems between the disk tray and the server?
  – CERN choice: integrated system from one vendor
    » “storage in a box” has won every tender to date.
    » work specifications to ensure high quality chassis & trays.
“White box” vs major vendor
(Some) Options
SAN vs NAS
SCSI vs FC vs SATA
“In a box” vs server and trays
“White box” vs major vendor
  – Major vendors claim better reliability…
  – … but are unable to explain how they achieve this
    » the underlying components are generally identical
  – CERN “choice”: free competition and white boxes win
    » but some white boxes are more equal than others; unfortunately CERN rules make prior selection of companies based on proven past performance rather difficult
    » Long term relationship with at least 3 suppliers would be good.
RAID and filesystems
Originally mirrored the disks: redundancy with maximum performance (number of independent spindles)
  – mirrored EIDE still cheaper than SCSI per usable byte!
Gradually became less worried about disk performance
  – required I/O bandwidth per TB falls with each tender:
    » current systems can saturate GigE interface
    » disk sizes continue to increase
    » observed performance still below server capability
Current CERN choice: hardware RAID5 with xfs
  – Hardware RAID5 performance has greatly improved
  – Reiserfs still immature
  – Some tests of hardware RAID5 with software RAID0; performance poor.
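The move from mirroring to RAID5 changes how much of the raw capacity is usable, and therefore the cost per usable byte. A minimal sketch of that arithmetic follows. The 12-disk array size and the unit raw cost are hypothetical placeholders, not CERN figures, and hot spares are ignored.

```python
def usable_fraction_mirror() -> float:
    """Mirroring stores every block twice, so half the raw capacity is usable."""
    return 0.5


def usable_fraction_raid5(n_disks: int) -> float:
    """RAID5 spends the equivalent of one disk per array on parity."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 1) / n_disks


def cost_per_usable_gb(cost_per_raw_gb: float, usable_fraction: float) -> float:
    """Scale the raw price by the fraction of capacity left after redundancy."""
    return cost_per_raw_gb / usable_fraction


RAW_COST = 1.0    # hypothetical raw cost per GB, arbitrary units
ARRAY_DISKS = 12  # hypothetical RAID5 array size

for label, fraction in (
    ("mirrored", usable_fraction_mirror()),
    (f"RAID5 ({ARRAY_DISKS} disks)", usable_fraction_raid5(ARRAY_DISKS)),
):
    print(f"{label}: {fraction:.0%} usable, {cost_per_usable_gb(RAW_COST, fraction):.2f} per usable GB")
```

With these placeholder numbers, mirroring doubles the cost per usable GB, while a 12-disk RAID5 array adds about 9%.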
Hardware will fail
On delivery or due to systematic h/w problem
  – CERN choice: dual source major procurements
In service
  – RAIDx
  – Hot spares
    » Probability of 2nd disk failure during RAID array rebuild is
      • a concern for 250GB disks
      • likely a significant problem for 400GB disks
      • a certainty for 1TB disks in large scale installations?
    » However, this is a concern for any architecture with an equivalent number of disks.
  – Remember: CERN sees equivalent MTBF figures for SCSI and EIDE disks.
    » Although SCSI disks are lower capacity and higher bandwidth, so reducing the window for a 2nd failure.
  – Be prepared…
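The growth of the rebuild window with disk size can be sketched numerically. The model below assumes independent failures at a constant rate, which is optimistic given the bad-batch history, and ignores unrecoverable read errors. The 12-disk array and the 10 MB/s effective rebuild rate are hypothetical. Only the 200,000-hour MTBF and the disk sizes come from the slides.

```python
import math


def rebuild_hours(disk_gb: float, rebuild_mb_per_s: float) -> float:
    """Time to rewrite one whole disk at the given effective rebuild rate."""
    return disk_gb * 1000.0 / rebuild_mb_per_s / 3600.0


def p_second_failure(
    disk_gb: float,
    n_disks: int = 12,               # hypothetical array size
    mtbf_hours: float = 200_000.0,   # lower end of the MTBF range quoted in the talk
    rebuild_mb_per_s: float = 10.0,  # hypothetical rebuild rate under production load
) -> float:
    """Probability that any surviving disk in the array fails during one rebuild."""
    exposure_hours = (n_disks - 1) * rebuild_hours(disk_gb, rebuild_mb_per_s)
    return 1.0 - math.exp(-exposure_hours / mtbf_hours)


for size_gb in (250, 400, 1000):
    hours = rebuild_hours(size_gb, 10.0)
    print(f"{size_gb:>5} GB: rebuild ~{hours:.1f} h, P(2nd failure) ~{p_second_failure(size_gb):.2e}")
```

The absolute values depend entirely on the assumed rebuild rate and array size. What the sketch shows is that the per-rebuild risk grows roughly linearly with disk capacity, and that it is multiplied by the number of rebuilds a large installation performs each year.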
Summary
CERN
  – will have a (Gigabit-)Ethernet-based NAS configuration for bulk disk storage for LHC
  – is not convinced TCO concerns justify a higher initial purchase cost for SCSI/FC disk
  – buys (and will buy) SATA disk from the lowest bidder, but
    » with as much pre-selection of bidders as we are allowed,
    » dual sourcing all purchases to minimise risk of major problems due to systematic failures, and
    » with warranty (3 years) to encourage initial quality
  – is focussing strongly on
    » redundancy,
    » rigour and organisation in operational procedures, and on
    » anonymity for disk servers, just as for CPU servers.
These points are valid whatever the disk technology!