Computer Hardware and Procurement at CERN
Helge Meinhard (at) cern ch
HEPiX Fall, SLAC

2  Outline
Procedures
Hardware (being) procured
Power measurements
Observations

Procedures

4  Constraints (1)
CERN is an international organisation with strict administrative rules
Competitive tendering required, covering (at least) the member states
  No way to avoid this for commodity equipment
Lowest compliant bid wins
  No negotiations about the added value of higher offers

5  Constraints (2)
Different procedures depending on expected volume (see the sketch below):
  < 10'000 CHF: IT seeks 3 offers
  < 200'000 CHF: formal price enquiry by the purchasing service; four weeks response time
  < 750'000 CHF: formal call for tender preceded by a market survey; six weeks response time
  > 750'000 CHF: as for < 750'000 CHF, plus approval by CERN's Finance Committee (5 sessions/year, papers ready two months in advance)
(1 CHF = 0.78 USD = 0.65 EUR)
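For illustration, a minimal sketch of the volume thresholds above; this is not CERN tooling, and the handling of values exactly at a threshold is an assumption.

```python
# Illustrative only: map an estimated purchase volume (CHF) to the tendering
# procedure described on the slide. Behaviour at exactly 750'000 CHF is assumed.

def procurement_procedure(volume_chf: float) -> str:
    """Return the applicable procedure for an expected purchase volume in CHF."""
    if volume_chf < 10_000:
        return "IT seeks 3 offers"
    if volume_chf < 200_000:
        return "Formal price enquiry by the purchasing service (4 weeks response time)"
    if volume_chf < 750_000:
        return "Formal call for tender preceded by a market survey (6 weeks response time)"
    return ("Formal call for tender preceded by a market survey, "
            "plus approval by CERN's Finance Committee")


def chf_to_usd(volume_chf: float, rate: float = 0.78) -> float:
    """Convert CHF to USD at the rate quoted on the slide (1 CHF = 0.78 USD)."""
    return volume_chf * rate


if __name__ == "__main__":
    for v in (5_000, 150_000, 600_000, 900_000):
        print(f"{v:>9} CHF (~{chf_to_usd(v):,.0f} USD): {procurement_procedure(v)}")
```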

6  Our problems
Procedures badly adapted to the quickly evolving computing market
Difficult to give preference to "good", reliable equipment

7  Our choices (1)
For significant purchases (> 100 kCHF) we require (a) sample system(s)
  with the tender for big tenders
  on CERN's request for small tenders
Tenders include 3 years on-site warranty for hardware
  Typical requirements:
    4 working hours response / 12 working hours repair for critical machines
    3 working days response / 5 working days repair for farm nodes
  Supplier can subcontract the on-site warranty

8  Our choices (2)
Payment within 30 days after provisional acceptance, on receipt of a bank guarantee of 5% of the purchase sum, valid until the end of the warranty period
Delivery within 6 weeks; penalty for late delivery: 2% of the purchase sum per complete week, max. 10% (see the sketch below)
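A minimal sketch of the penalty clause, assuming the 2% applies only to complete weeks of delay (partial-week handling is not specified on the slide):

```python
def late_delivery_penalty(purchase_sum_chf: float, days_late: int) -> float:
    """Penalty: 2% of the purchase sum per complete week late, capped at 10%."""
    complete_weeks = days_late // 7
    rate = min(0.02 * complete_weeks, 0.10)
    return round(purchase_sum_chf * rate, 2)


if __name__ == "__main__":
    # Example: a 500 kCHF order delivered 18 days late -> 2 complete weeks -> 4%
    print(late_delivery_penalty(500_000, 18))  # 20000.0
```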

9  Our choices (3)
If more than 10% of systems fail during acceptance or during the first month after: right to return the whole batch
If a system fails 3 or more times during any 6-month period: right to request complete replacement of the system
If more than 20% of any component fail during any 6-month period: right to request complete replacement of this component across the batch
If CERN adds third-party devices: no impact on the warranty obligations for the system as delivered
(the sketch below illustrates the first three thresholds)
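An illustrative sketch of the three batch/replacement thresholds; the function names and the failure-record format are my assumptions, not CERN's actual procedures.

```python
# Illustrative only: encode the contractual thresholds from the slide above.
from collections import Counter
from typing import Iterable

def may_return_batch(batch_size: int, failed_in_acceptance_or_first_month: int) -> bool:
    """More than 10% of systems failing -> right to return the whole batch."""
    return failed_in_acceptance_or_first_month > 0.10 * batch_size

def may_replace_system(failures_in_any_6_months: int) -> bool:
    """3 or more failures of one system within any 6-month window."""
    return failures_in_any_6_months >= 3

def components_to_replace(batch_size: int, failed_components: Iterable[str]) -> list:
    """Component types whose failures exceed 20% of the batch in a 6-month window."""
    counts = Counter(failed_components)
    return [part for part, n in counts.items() if n > 0.20 * batch_size]


if __name__ == "__main__":
    print(may_return_batch(200, 25))                                  # True (12.5% > 10%)
    print(may_replace_system(3))                                      # True
    print(components_to_replace(100, ["PSU"] * 25 + ["disk"] * 10))   # ['PSU']
```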

10  Our choices (4)
If justified by volume, procure from two suppliers (lowest and second-lowest compliant)
  Better protection if one delivers crap or nothing at all
  Better chance for companies to win an order
  Increased workload on our part

11  Example of a procurement
Procurement of equipment worth < 750 kCHF
  Approval by the Finance Committee not needed
Market survey already done
  A market survey can cover different types of equipment
  Valid for 1 year
  If not done yet, add ~16 weeks

12  Steps (1) (typical case)
Fix scope: 2 w
Write technical, commercial docs: 3 w
IT-internal review
Revise technical, commercial docs: 2 w
Specification meeting
Revise technical, commercial docs: 1 w
Tender out
Deadline for replies: 6 w
Opening of replies: 1 w
(Total so far: 15 weeks, at best compressible to 12 weeks)

13  Steps (2) (typical case)
(Total from previous slide: 15 w, min. 12 w)
Technical analysis of replies: 1 w
Visual inspection, mounting: 1 w
Benchmarks, reports: 3 w
Technical clarifications: 1 w
Purchase request, order: 2 w
Delivery: 7 w
Preliminary acceptance: 6 w
Total: 36 weeks, compressible to 30 weeks
(a small sketch reproducing this timeline arithmetic follows)
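A small sketch, using the week counts from the two slides above, that reproduces the 15-week and 36-week totals:

```python
# Step names and week counts are taken directly from the "Steps" slides.

PHASE_1 = {                       # up to the opening of replies
    "Fix scope": 2,
    "Write technical, commercial docs": 3,
    "Revise docs after IT-internal review": 2,
    "Revise docs after specification meeting": 1,
    "Deadline for replies": 6,
    "Opening of replies": 1,
}

PHASE_2 = {                       # from technical analysis to acceptance
    "Technical analysis of replies": 1,
    "Visual inspection, mounting": 1,
    "Benchmarks, reports": 3,
    "Technical clarifications": 1,
    "Purchase request, order": 2,
    "Delivery": 7,
    "Preliminary acceptance": 6,
}

if __name__ == "__main__":
    phase_1_weeks = sum(PHASE_1.values())
    total_weeks = phase_1_weeks + sum(PHASE_2.values())
    print(f"Phase 1: {phase_1_weeks} weeks")   # 15 weeks, as on slide 12
    print(f"Total:   {total_weeks} weeks")     # 36 weeks, as on slide 13
```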

Hardware (being) procured

15  Objectives
Cover existing needs with as few different models and as few procurement procedures as possible
Closely follow technology and market evolution, and satisfy requirements with modern hardware at low cost
These two objectives are in contradiction

16  Fabric Infrastructure and Operations (1)
RedHat 7.3 phased out on public services
  Campaign on storage nodes far advanced
New in the machine room since Karlsruhe:
  200 farm PCs (dual Nocona): in production
  116 disk servers (> 5 TB usable each, total of 900 TB gross capacity): part in production, part under acceptance test
  112 "midrange servers": under acceptance test
  32-node InfiniBand-based cluster for Theory
Refurbishment of machine room proceeding
  LHS being populated, but power remains limited
(From the CERN site report talk, 2005/10/11)

17  Hardware being procured (1)
Large volumes: several times < 750 kCHF per year
  "Farm PCs": non-redundant, cheap dual-processor work horses
  "Disk servers": storage-in-a-box systems with many SATA disks for streaming applications

18  Hardware being procured (2)
Medium-size volumes: once < 750 kCHF per year, or once or several times < 200 kCHF per year
  "Midrange servers": redundant building blocks for specific applications
  "Tape servers": midrange servers with an FC interface
  "Disk arrays": autonomous RAID units with FC uplinks
  SAN infrastructure (most notably FC switches)
  Head nodes for the serial console infrastructure
  "Small disk servers", somewhere between disk servers and midrange servers
  Miscellaneous

19  Specifications: Farm PCs (1)
2 boxed Intel Noconas at 2.8 GHz
Mainboard:
  BMC (IPMI 1.5 or higher)
  PXE, USB boot
  BBS menu
  Console redirection
  Configurable to stay off on AC power loss
2 GB ECC memory
  From the mainboard manufacturer's approved list
  Upgradable to 4 GB without removing modules

20  Specifications: Farm PCs (2)
1 disk > 140 GB, IDE not permitted
  Certified for 24/7 operation, 3 y warranty by the disk manufacturer
1 GigE port providing PXE and IPMI access
19" chassis, max. 4 U, with rails
  Power and reset buttons
  Power and disk-activity LEDs
Power supply supporting the machine + 50 W
  Active PFC
  C13 to C14 LSZH power cord
Guaranteed to run under RHEL 3 (i386 and x86_64)
Delivery within 6 weeks from dispatch of order

21  Specifications: Disk server (1)
1 or 2 boxed Intel Xeons with EM64T
Mainboard as for Farm PCs
  Now adding support for memory mirroring
Memory as for Farm PCs
General requirements for disks etc.:
  ≥ 7200 rpm, no EIDE, 3 y warranty, certified for 24/7 by the manufacturer
  Metallic hot-swap trays certified by the chassis manufacturer
  Indicators for power and activity for each tray
  PCB backplanes for disks, multilane cabling
  "Intelligent" RAID controllers

22  Specifications: Disk server (2)
System disks: 2 x ≥ 140 GB, mirrored
Data disks: all identical
  Redundant RAIDs with hot spares (min. 1/15)
  Total usable capacity per system above 5 TB
  Battery buffer if the controller has an active cache
1 GigE port providing the required performance, PXE, IPMI access
19" chassis, rack-mountable with rails
  Min. 40 TB usable in a 42 U high rack
Power supply: N+1 redundant, active PFC
Guaranteed to run under RHEL 3 (i386 and x86_64)
Delivery within 6 weeks from dispatch of order
(a small check of the hot-spare and rack-density figures follows this slide)
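A small sketch of two of the quantitative requirements above; reading "min. 1/15" as at least one hot spare per 15 data disks is my interpretation, and the 4 U chassis in the example is only illustrative.

```python
# Illustrative checks of the hot-spare ratio and the 40 TB per 42 U rack rule.
import math

def min_hot_spares(data_disks: int) -> int:
    """At least one hot spare per 15 data disks (assumed reading of 'min. 1/15')."""
    return math.ceil(data_disks / 15)

def meets_rack_density(usable_tb_per_system: float, height_u: int,
                       rack_height_u: int = 42, required_tb: float = 40.0) -> bool:
    """Check the 'min. 40 TB usable in a 42 U rack' requirement for a chassis."""
    systems_per_rack = rack_height_u // height_u
    return systems_per_rack * usable_tb_per_system >= required_tb

if __name__ == "__main__":
    print(min_hot_spares(16))           # 2 spares for 16 data disks
    print(meets_rack_density(5.0, 4))   # 10 systems * 5 TB = 50 TB -> True
```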

23  Specifications: Disk server (3)
Performance, memory to disk: iozone with 16 GB files and 256 kB record size
  Single stream: 40 MB/s write, 40 MB/s read
  Multi-stream (at least 10 streams): 115 MB/s write, 170 MB/s read (*)
Performance, memory to network: iperf
  Single stream: 100 MB/s write, 100 MB/s read
  Two streams: 110 MB/s write, 110 MB/s read
  Two streams in, two streams out: 145 MB/s

24  Specifications: Disk server (4)
Global (disk to network) performance: at least 10 clients transferring 2 GB files via rfio
  Reading from the system: 95 MB/s (*)
  Writing to the system: 90 MB/s (*)
(*): requirements scale linearly with usable capacity; the numbers quoted are for 5000 GB usable (see the sketch below)
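A minimal sketch of the linear scaling rule in footnote (*), applied to the starred throughput figures from the last two slides:

```python
# Thresholds are quoted for 5000 GB usable and scale linearly with capacity.

REFERENCE_CAPACITY_GB = 5000
REFERENCE_REQUIREMENTS_MB_S = {     # the starred numbers from slides 23 and 24
    "iozone multi-stream write": 115,
    "iozone multi-stream read": 170,
    "rfio read from system": 95,
    "rfio write to system": 90,
}

def scaled_requirements(usable_capacity_gb: float) -> dict:
    """Scale the reference thresholds linearly with the offered usable capacity."""
    factor = usable_capacity_gb / REFERENCE_CAPACITY_GB
    return {name: mb_s * factor for name, mb_s in REFERENCE_REQUIREMENTS_MB_S.items()}

if __name__ == "__main__":
    # Example: a system offering 7.5 TB usable must meet 1.5x the quoted numbers
    for name, mb_s in scaled_requirements(7500).items():
        print(f"{name}: {mb_s:.1f} MB/s")
```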

Power measurements (done by Andras Horvath, CERN)

26  Power measurements

Observations

28  Observations (1)
Profile of winning companies:
  Tier-1 suppliers competing with large integrators
  Small 'round-the-corner' companies eliminated at the market survey stage
Almost always the integrators win
  Specially tailored solutions responding to our specifications
  Prices of Tier-1s rather high in Europe

29  Observations (2)
Stress test as (important) part of the acceptance test
  Introduced ~2 years ago (triggered by presentations from SLAC and FNAL at HEPiX)
  Very useful
Based on va-ctcs
  No longer sufficiently actively maintained
  Large number of false positives
  Looking for a replacement (a toy burn-in sketch follows this slide)
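For illustration only, a toy burn-in loop in the spirit of such stress tests (CPU load plus a memory-consistency check); this is not va-ctcs and not CERN's actual acceptance test:

```python
# Toy burn-in: one hashing worker per CPU, flagging any digest mismatch that
# would indicate silent memory/CPU corruption under load.
import hashlib
import multiprocessing as mp
import os
import time

def worker(duration_s: float) -> int:
    """Hash a fixed random buffer in a tight loop; count digest mismatches."""
    buf = os.urandom(1 << 20)                       # 1 MiB test pattern
    expected = hashlib.sha256(buf).hexdigest()
    errors = 0
    deadline = time.time() + duration_s
    while time.time() < deadline:
        if hashlib.sha256(buf).hexdigest() != expected:
            errors += 1                             # would indicate corruption
    return errors

def burn_in(duration_s: float = 10.0) -> bool:
    """Run one worker per CPU; the node 'passes' if no mismatch was seen."""
    n_cpus = os.cpu_count() or 1
    with mp.Pool(processes=n_cpus) as pool:
        results = pool.map(worker, [duration_s] * n_cpus)
    return sum(results) == 0

if __name__ == "__main__":
    print("PASS" if burn_in(10.0) else "FAIL")
```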

30  Observations (3)
Pushing these procedures through requires dedicated (and knowledgeable) person power
Not obvious how to run multiple procedures in parallel
  In particular if things go wrong, e.g. the stress test fails

31  Summary
Computer hardware procurement is an excellent experimental confirmation of two fundamental laws of human nature:
  Murphy: "Everything that can go wrong will go wrong."
  Hofstadter: "Things always take longer than you think, even if you take into account Hofstadter's law."