
Procurements at CERN: Status and Plans




1 Procurements at CERN: Status and Plans
Helge Meinhard / CERN-IT
HEPiX, DESY Hamburg, 24 April 2007

2 Outline
- September 2006 deliveries
- December 2006 deliveries
- Investigations for future tenders
- Coming next

3 September 2006: CPU servers (1)
- First CERN procurement based on total SPECint2000
- Some boundary conditions: SPEC run under controlled conditions
  - Fixed OS: SLC4 i386
  - Fixed compiler: gcc as delivered with SLC4
  - Fixed compilation options: -O2 -fPIC …
- Initial result: systems based on Pentium D
- Re-negotiated with the winner for dual Woodcrest systems
  - Same performance at a lower total price

4 September 2006: CPU servers (2)
- Got 112 machines
  - 2 x Intel Xeon 5150 (Woodcrest, 2.66 GHz)
  - 8 GB FBDIMM
  - 1 x 160 GB SATA
  - 1U chassis (Intel SR1500 with Alcolu board)
- Problems
  - Memory giving ECC errors (900 modules exchanged)
  - A few defective mainboards (exchanged); BMC firmware not well adapted (upgraded)
  - Some dead hard disks (whole batch exchanged)

5 September 2006: Other procurements
- 200 midrange servers
  - Dual Irwindale, 4 GB, 2 x 160 GB SATA, hardware RAID controller, redundant power supplies, 3U chassis
  - Used as Oracle RAC front ends, tape servers, special service nodes
- 50 small disk servers
  - Dual Irwindale, 4 GB, 8 x 250 GB SATA, hardware RAID controller, redundant power supplies, 3U chassis
- 60 disk arrays
  - Infortrend 2U chassis, single RAID controller, 8 x 250 GB SATA
- 16 Fibre Channel switches
  - 16 ports at 4 Gbit/s each, interconnected via 10 Gbit/s to form 64-port switches

6 December 2006 deliveries
- CPU servers worth about 3M CHF (1.8M EUR, 2.4M USD)
- Disk servers worth about 2M CHF (1.2M EUR, 1.6M USD)
- First server procurements that needed approval by CERN’s Finance Committee
- 9 months from sending tenders out to systems in production (cf. my talk at the SLAC meeting, fall 2005)

7 December 2006: CPU servers (1)
- Tender SPECint2000-based again
  - Requiring 1U servers
  - Fixed penalty per box
- 3 winners identified, basically the same solution
  - Batches of 235, 225 and 190 machines
  - Dual Intel Xeon 5160 (Woodcrest, 3.0 GHz)
  - 8 GB FBDIMM
  - 1 x 160 GB SATA
  - 1U chassis (2 batches based on the Intel SR1500 chassis and Alcolu board, 1 proprietary solution)

8 December 2006: CPU servers (2)
- Problems
  - BMC firmware upgrade required on two batches
  - One batch required a firmware upgrade of the disks
  - Another batch had communication problems with our switches under SLC 4.4 (e1000 driver not recent enough)
    - Temporarily solved by using previous-generation switches

9 December 2006 CPUs: LINPACK (1)
- Proposed and supported by Intel
- Theoretical max: 30 TFlops (48 GFlops per machine)
- Very little experience with parallel computing at CERN, in particular MPI
- Other systems in the Top500 are either huge multiprocessor machines or clusters with low-latency interconnects; our setup has a factor of 60 higher latency
- Standard machine setup with all daemons, no special tuning
- Intel MKL, Intel MPI
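The 48 GFlops-per-machine figure can be reproduced from the node configuration; a minimal sketch, assuming the usual counting for a dual-socket, dual-core Woodcrest node with 4 double-precision flops per core per cycle (the slide does not spell out this derivation):

```python
# Theoretical double-precision peak of one dual-Woodcrest node.
# Assumption: 2 sockets x 2 cores x 3.0 GHz x 4 DP flops/cycle (SSE).
sockets = 2
cores_per_socket = 2
clock_ghz = 3.0
flops_per_cycle = 4  # 2 SSE2 units x 2 DP operands per cycle

peak_per_machine_gflops = sockets * cores_per_socket * clock_ghz * flops_per_cycle
print(peak_per_machine_gflops)  # 48.0, matching the slide's per-machine figure
```

With roughly 625 or more such nodes delivered, this gives the quoted ~30 TFlops aggregate peak.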

10 December 2006 CPUs: LINPACK (2)
- Started with 530 machines; first tests ran successfully with 256 machines
- One batch of the three had to be taken out because of networking problems
- LINPACK tuning required to avoid bottlenecks in the 10 Gbit/s uplinks from switches to routers
- In the end: 340 machines (1360 cores) achieving 8’329 GFlops
  - N=530’000; NB=104; P=16; Q=85
  - 25 GFlops per machine = 51% of theoretical max
  - Would have been position 79 if submitted for SC fall 2006
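The per-machine and efficiency figures on this slide follow directly from the HPL result; a small check, using the 48 GFlops theoretical peak per node from the previous slide (note that the process grid P x Q = 16 x 85 = 1360 matches the core count):

```python
# Reproducing the slide's LINPACK efficiency numbers.
achieved_gflops = 8329.0   # HPL result with N=530'000, NB=104, P=16, Q=85
machines = 340
peak_per_machine = 48.0    # GFlops, theoretical peak per dual-Woodcrest node

per_machine = achieved_gflops / machines                    # ~24.5 GFlops
efficiency = achieved_gflops / (machines * peak_per_machine)

print(round(per_machine, 1), round(efficiency * 100))       # ~24.5 GFlops, ~51%
```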

11 December 2006: Disk servers (1)
- Tender as usual for capacity, with boundary conditions
  - Bandwidth requirements scaling with capacity mean 5…6.2 TB usable per machine
- 2 winners identified
  - 75 systems with a single Woodcrest, 4 GB, 18 x 500 GB SATA, 2 x 160 GB SATA, 2 x 3ware 9550SX
  - 86 systems with two Irwindales, 4 GB, 22 x 500 GB SATA, 2 x 160 GB SATA, 2 x 3ware 9550SX

12 December 2006: Disk servers (2)
- Problems:
  - System disks of one batch needed a firmware upgrade
  - Data disks of this batch currently under investigation
  - BBUs of the RAID controllers: high failure rate
  - Wrong boot order after stress testing
  - Spurious messages in the IPMI log

13 Current “Research” Activities
- Blade systems
- Large disk servers
- Virtualisation
  - Mostly interested in Xen
  - Mostly with a view to providing more virtual servers for dedicated services (virtualising midrange servers)
  - Input to the next procurement of midrange servers

14 Blade Systems
- Considered for our big CPU server tenders
  - Small footprint and manageability not important
  - Power savings are: vendors and independent users claim 10…25%
- Got fully equipped blade chassis for evaluation from two major Tier-1 suppliers
  - Same configuration as our 1U servers of December 2006
- Preliminary power measurements confirm savings of more than 20%
- Operationally no problem; a number of blades have run the lxbatch service for weeks

15 Large Disk Servers (1)
- Motivation
  - Disks getting ever larger; 5 TB usable will be small soon
  - CPUs ever more powerful
  - Potential economies per TB
- Constraints
  - Need a bandwidth of ~18 MByte/s per TB
  - Hence anything beyond 6 TB means multiple Gbit connections or a 10 Gbit link
  - Networking group strongly advised against link aggregation
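The ~6 TB break point follows from the bandwidth rule; a quick sketch, assuming a 1 Gbit/s link tops out at its theoretical 125 MByte/s (in practice the usable figure is somewhat lower, which is why the slide draws the line at 6 TB):

```python
# Why ~6 TB of usable capacity saturates a single Gbit link,
# given the slide's rule of ~18 MByte/s of bandwidth per TB.
mbyte_s_per_tb = 18.0
gbit_link_mbyte_s = 1000 / 8        # theoretical maximum of 1 Gbit/s, ~125 MB/s

max_tb_on_one_link = gbit_link_mbyte_s / mbyte_s_per_tb
print(round(max_tb_on_one_link, 1))  # just under 7 TB at the theoretical limit
```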

16 Large Disk Servers (2)
- Test systems purchased
  - Sun x4500, 48 x 500 GB SATA, 2 Opteron 285, 16 GB, Marvell SATA controllers, Intel 10GigE NIC
  - 8U system, 38 x 500 GB SATA, 2 Woodcrests, 8 GB, 3ware RAID controllers (9650 series), Chelsio 10GigE NIC
- Issues found
  - Sun machine not behaving very well under Linux; driver issues
  - Other machine had a number of controller hiccups requiring several vendor interventions
  - 10GigE networking non-trivial (in particular with Chelsio cards under Linux)

17 Coming Next
- Massive tenders for CPU and disk servers
- Procedures for midrange servers and disk arrays
- Dedicated purchase of Oracle front-end servers and NAS appliances

18 Coming Next: CPU Servers
- Given the positive experience with blades and market evolution (e.g. Port Townsend / Atoka based solutions), gave up on the strict 1U-per-server requirement
- Tender for 2’000’000 SPECint2000 per supplier (3 suppliers intended)
- Elements of adjudication:
  - Purchase price
  - Add 6 CHF per VA in the primary AC circuit (20% idle, 80% full load)
  - Add 300 CHF per mainboard (racking, network, …)
  - Add 50 CHF per dedicated connection required for IPMI
  - Fully equipped enclosures; price renormalised to 2’000’000 SPECint2000
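The adjudication formula above can be expressed as a small calculation; a sketch under stated assumptions: every numeric input in the example call is invented for illustration, and the exact renormalisation the tender uses is assumed to be a simple linear scaling to 2'000'000 SPECint2000:

```python
def adjudicated_price(purchase_chf, va_per_board, boards, ipmi_ports, specint_total):
    """Effective offer price per the slide's adjudication elements (sketch)."""
    total = (purchase_chf
             + 6 * va_per_board * boards   # 6 CHF per VA in the primary AC circuit
             + 300 * boards                # 300 CHF per mainboard (racking, network, ...)
             + 50 * ipmi_ports)            # 50 CHF per dedicated IPMI connection
    # Renormalise to the tendered 2'000'000 SPECint2000 (assumed linear scaling)
    return total * 2_000_000 / specint_total

# Hypothetical offer: 500 boards at 400 VA each, one IPMI port per board,
# 1.5M CHF purchase price, 2.2M SPECint2000 delivered
print(round(adjudicated_price(1_500_000, 400, 500, 500, 2_200_000)))
```

The power term dominates quickly here, which is consistent with the interest in blades and their claimed 10…25% power savings.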

19 Coming Next: Disk Servers
- Specifications not changed much compared with last time
  - Four cores, 8 GB
  - Requirements on synchronous read/write operations (network to disk)
- Tender for 750 TB per supplier (2 suppliers intended)
- Elements of adjudication:
  - Purchase price
  - Add 1000 CHF per machine

20 Conclusions
- Principle of multiple sourcing for large orders has worked well
- Fraction of deliveries with problems appears to be increasing
- Some potential for optimisation
  - Blades, large disk servers, virtualisation
- Exciting to follow technology and market trends, and to try to find the sweet spot for us

