Published by Gavin Rogers. Modified over 9 years ago.
1
21 May 2003 Fermi Linux Server Vendor Qualification--Steven Timm timm@fnal.gov 1 Fermi Linux Server Vendor Qualification HEPiX May 21, 2003 Steven C. Timm For the Fermi Linux Vendor Qualification Taskforce
2
OUTLINE
Fermilab Hardware Procurement Strategy
Goals of Qualification
Procedures of Qualification
Results of Qualification
3
SUMMARY
The 2003 Fermi Linux Server Vendor Qualification focused on 1U Intel servers.
The first phase was a technical evaluation, which identified 18 technically qualified vendors.
All of these vendors participate in a price/performance bid (currently ongoing); the top five make the vendor list.
We keep track of all technically qualified vendors and rotate them in as necessary.
We are not making a new qualified desktop vendor list at this time.
Public web page: http://www-oss.fnal.gov/scs/public/qualify2003
4
Members of Fermi Linux Server Vendor Qualification Taskforce
The taskforce involved personnel from five different departments plus key members of management.
All major purchasers of server hardware were represented, as were the computer room logistics staff.
Members: Steven Timm (chair), Margaret Greaney, Troy Dawson, Lance Weems, Hans Wenzel, Bruce Karrels, Don Holmgren, Phil Lutz, Stan Naymola, Mark Kaletka, Gerry Bellendir.
5
Fermi Hardware Procurement Strategy
Buy a hardware solution that is as fully integrated as possible, including installation.
Identify vendors that know Fermilab requirements and are willing to work with Fermi Linux.
Replacement parts come via a 3-year warranty; service is provided by Fermilab.
6
Fermi Linux Vendor List: History
Two previous Fermi Linux qualifications, in 1999 and 2001.
1999: desktops as farm workers, 5 vendors.
2001: separate vendor lists for desktops and 2U rackmount servers.
Also two special evaluations, for 2U rackmounts and AMD.
Vendor list used in all major Fermi acquisitions, ~1500 machines from 1999 to 2002.
Also used by outside groups: KEK, INFN, Northwestern, MIT, Geneva, Carnegie Mellon, Pittsburgh, Edinburgh, others.
7
Evaluation: Performance/price
The overriding goal has been to get the best performance possible at the lowest price.
We have succeeded well: from 1999 to 2002, Fermi cycles per dollar increased by a factor of 6, where Moore’s law alone would have given us only a factor of 4.
Users are happy with the quantity of computing they got for their money.
Still, in this evaluation we are looking for better long-term reliability, not a race to the bottom on price alone.
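The Moore's law comparison on this slide can be checked with a little arithmetic. The slides do not say which doubling period was assumed, so the sketch below takes the commonly quoted 18 months as an assumption. Under that assumption three years gives a factor of 4, and the observed factor of 6 corresponds to a doubling time of roughly 14 months.

```python
import math


def moore_factor(years, doubling_months=18.0):
    """Growth factor after `years`, assuming one doubling every
    `doubling_months`. The 18-month default is an assumption,
    not a figure taken from the slides."""
    return 2.0 ** (years * 12.0 / doubling_months)


def implied_doubling_months(factor, years):
    """Doubling period (in months) that would produce `factor` in `years`."""
    return years * 12.0 / math.log2(factor)


expected = moore_factor(2002 - 1999)   # Moore's law expectation for 1999-2002
observed = 6.0                         # factor quoted on the slide

print(f"expected factor: {expected:.1f}")
print(f"observed / expected: {observed / expected:.2f}")
print(f"implied doubling time: {implied_doubling_months(observed, 3):.1f} months")
```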
8
Evaluation: Performance/price
Problem: one node is not the best test of a company’s long-term price/performance.
Small businesses are best able to take the time to follow the directions of the evaluation process and give support.
Small businesses are not always able to deliver large orders in a timely manner with good initial quality.
Single-node prices are not a good predictor of the bid level on a real bid, and we shouldn’t be asking anyway.
Address by: getting technical qualification done first, then doing a price/performance bid.
9
Evaluation: Vendor attrition
Some vendors on the list have gone out of business.
Others were disqualified for bad performance.
Others stopped bidding on their own, or bid ridiculously high.
Address by:
– Selecting the vendor list on a performance/price basis from all those technically qualified.
– Keeping track of all technically qualified vendors and adding to the list if necessary.
– Supplementing the list if special hardware (AMD, blades, desktop) is required.
10
Evaluation: Initial quality
Problem: going too low on the price curve. Sometimes vendors bid too low and try to deliver poor-quality systems.
Addressed from the beginning with a tough 30-day acceptance test and a “lemon law”.
In various cases Fermilab has required vendors to swap power supplies, cases, motherboards, disk drives, or racks on all units.
The cost of Fermi labor to resolve the problem was less than the difference between the winning bid and the next-highest bid.
All issues have been resolved through this process, and the systems have all had productive lives.
NOW: also address with references and hard numbers on initial quality.
11
Evaluation: Components
Problem: rapidly changing components.
In the commodity market, components change rapidly.
From the beginning of the evaluation to the issuance of a purchase order takes about six months; in that time CPU speeds go up and cases change.
Impossible to track for laptops, difficult to track for desktops.
OK for the server market, but results in higher heat loads and current draws.
ADDRESS by thermal specs that are broad enough that, if there are problems, the vendor still has to fix them.
12
Goals
We want to identify vendors who are best able to deliver rackmounted solutions:
– Competent in Linux
– Build quality 1U servers
– Can integrate into rackmount environment with good thermals in a timely and professional manner
– Have high performance
– Have good support and troubleshooting
13
Vendor Selection
Existing vendors on the Fermi Linux list
Sales to other Fermi departments
Advertisements at trade shows
Survey of other DOE labs at HEPiX
Vendors’ direct contact with Fermilab asking to participate
14
Chronology
We made contact with 45 vendors in all.
29 vendors attended the Jan. 28 info meeting.
24 vendors submitted an acceptable configuration on Feb. 4.
21 vendors submitted acceptable benchmarks and were cleared to ship a unit on Mar. 4; all units arrived by Mar. 11.
18 vendors were identified as technically qualified.
15
Specifications
1U
Dual Intel Xeon, 2.4 GHz or faster
400 MHz front side bus or faster
1 GB RAM (RDRAM or DDR SDRAM)
Disks: 1 x 20 GB system, 2 x 40 GB data
100 Mbit Ethernet
Video
CDROM, floppy
16
Why just 1U Xeon
AMD hardware shows a high initial failure rate, high current, and high heat.
1U is the most challenging thermal case; if they can build 1U, we believe they can build 2U.
Intel chips are supposed to be faster than AMD at the moment.
Intel chips are supposed to run cooler and draw less current.
Simplicity: a platform we already mostly understand, and just one from each vendor.
Space: we don’t have space to put so many 2U units.
17
Linux Competence
Vendor identifies hardware that’s compatible with Linux (much easier than it used to be).
Vendor loads Fermi Linux onto the evaluation node.
Vendor has to configure lm_sensors on the node.
Vendor runs our supplied test to check whether they did it right.
They are only allowed to ship the unit to Fermilab if it is right.
18
Electrical
Electric current measured with an ammeter at startup, idle, and full CPU load.
Current draw ranges: 2.4 GHz, 1.6-2.0 A; 2.8 GHz, 2.0-2.3 A; 3.06 GHz, 2.1-2.35 A.
It is likely that with a purchase of 2.8 or 3.06 GHz machines we can only have seven machines per circuit, not eight as in the past.
Machines with higher current draw also tend to have more fans and be better cooled internally.
Bright side: this current is similar to that of the 750 MHz machines bought 3 years ago, so we get 2.5x the performance for the same current.
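The machines-per-circuit estimate follows from dividing a circuit's usable current by the per-machine draw. The slides do not state the circuit budget, so the 16 A figure below (80% of a 20 A circuit) is purely an assumption for illustration; only the current-draw ranges come from the slide. With that assumed budget, the 2.8 and 3.06 GHz ranges give six to eight machines per circuit, which brackets the seven quoted above.

```python
# Measured current-draw ranges in amps (low, high), taken from the slide.
DRAW_RANGES_A = {
    "2.4 GHz": (1.6, 2.0),
    "2.8 GHz": (2.0, 2.3),
    "3.06 GHz": (2.1, 2.35),
}

# ASSUMPTION: usable current per circuit. This value is NOT in the slides;
# it is 80% of a 20 A circuit, chosen only to make the arithmetic concrete.
ASSUMED_BUDGET_A = 16.0


def machines_per_circuit(budget_a, draw_a):
    """Whole machines that fit on one circuit.

    Works in integer milliamps so that float rounding cannot tip a
    borderline case the wrong way.
    """
    budget_ma = round(budget_a * 1000)
    draw_ma = round(draw_a * 1000)
    return budget_ma // draw_ma


for cpu, (low, high) in DRAW_RANGES_A.items():
    worst = machines_per_circuit(ASSUMED_BUDGET_A, high)
    best = machines_per_circuit(ASSUMED_BUDGET_A, low)
    print(f"{cpu}: {worst} to {best} machines per circuit")
```

Changing `ASSUMED_BUDGET_A` to the real usable current of a given circuit is the only adjustment needed to reuse the calculation.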
19
Thermal
Measured the temperature from front to back of the unit for all machines.
Used internal temperature probes on each unique type of case.
All units in the evaluation ran much cooler than the 1U units bought in FY2002, due to the better thermal characteristics of the Intel chip and many more internal fans and blowers.
“Northbridge” chipset chips in some machines ran hotter than the CPUs. It is important to watch the size of the heatsink on these chips.
We are still analyzing the data we took, but are confident that all units are acceptable.
20
Thermals continued
21
Quality 1U Servers
Open each machine to verify quality of construction.
Run burn-in on each machine for two weeks.
Thermal measurements in a real rack situation.
Electrical current measurements.
Verify all components meet specs.
22
Integration capabilities contd.
Vendors are asked to submit a sample proposal for a full rack of systems.
The standard Fermi rack configuration is the base of the proposal, but they can suggest extras.
The goal is to (1) learn whether they can integrate and (2) get new ideas on how to improve our setup.
They must also submit info on clusters they have installed before, with real temperature and reliability numbers.
23
Performance
Vendors are supplied a CD-ROM of the CDF and D0 benchmarks.
Performance is measured in Fermi Cycles, where a 1 GHz PIII = 1000 Fermi Cycles.
We repeat the test when the machine gets here.
QCD benchmark, seti@home, and tiny are also run.
It would be ideal to use SPEC CPU2000, but the published results are not repeatable with the compilers used by Fermi.
Price doesn’t enter into the technical evaluation.
24
Performance
3 CPU speeds measured: 2.4, 2.8, and 3.06 GHz (1000 Fermi Cycles = PIII 1 GHz).
Average performance: 1779, 2041, and 2223 Fermi Cycles respectively.
400 MHz vs. 533 MHz front side bus is a 2.5% effect for farms software, much bigger for QCD.
AMD MP2200+: 1771 Fermi Cycles.
Performance is projected to faster clock speeds in anticipation that some vendors will bid faster chips.
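The slide does not say how performance was projected to faster clock speeds. One plausible approach, shown here only as a sketch, is a least-squares straight line through the three measured points. The linear model and the 3.2 GHz target clock are assumptions for illustration; the three (GHz, Fermi Cycles) pairs are the averages quoted above.

```python
# Average measured performance from the slide: (clock in GHz, Fermi Cycles).
MEASURED = [(2.4, 1779.0), (2.8, 2041.0), (3.06, 2223.0)]


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(clock_ghz, points=MEASURED):
    """Projected Fermi Cycles at `clock_ghz` under the assumed linear model."""
    slope, intercept = fit_line(points)
    return intercept + slope * clock_ghz


# Fermi Cycles delivered per GHz at each measured clock speed.
cycles_per_ghz = {ghz: cycles / ghz for ghz, cycles in MEASURED}

# ASSUMPTION: hypothetical faster chip that a vendor might bid.
HYPOTHETICAL_CLOCK_GHZ = 3.2

for ghz, value in cycles_per_ghz.items():
    print(f"{ghz} GHz: {value:.0f} Fermi Cycles per GHz")
print(f"projected at {HYPOTHETICAL_CLOCK_GHZ} GHz: "
      f"{project(HYPOTHETICAL_CLOCK_GHZ):.0f} Fermi Cycles")
```

The per-GHz figures fall slightly as the clock rises, so a straight proportional scaling from the 2.4 GHz result would overestimate faster chips; a fitted line with a nonzero intercept accounts for that.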
25
Support and Troubleshooting
Each vendor gets a software call, related to the configuration of Fermi Linux and solvable by e-mail or phone.
Each vendor gets a hardware call, designed to trigger an on-site service call. We manufacture one if necessary.
Points for prompt response and correct response.
26
Conclusions
18 technically qualified vendors, in alphabetical order: Ace, Angstrom, APPRO, ASA, Aspen, Atipa, Concentric, Dell, HP, IBM, Koi, Penguin, Promicro, PSSC, Rackable, Racksaver, Richardson, Western Scientific.
The price/performance bid will narrow them down to five.
21 vendors is too many to bring in; we will be more discriminating next time.
27
Component issues
Boards OK: Intel SE7501 series, Supermicro X5DPx series, Tyan 2721, Tyan 2723.
Both the Tyan S2721-533 (Thunder i7501 Pro) and the Tyan S2723 (Tiger i7501) had issues with 10/100 Ethernet, resolved by changing a resistor value on the board.
Some manufacturers offer cold-swap and hot-swap capabilities on drives, which is very nice.
Issues with the Intel E7501 chipset: slower disk throughput than some earlier chipsets, but adequate for our needs.
28
Price/performance bid
All vendors who pass our technical requirements are participating in a price/performance bid on a small number of nodes (48).
The top five will be the Fermi Linux Qualified Vendors.
We will keep track of all technically qualified vendors to replenish the list if:
– A vendor goes out of business
– A vendor stops bidding, or bids consistently very high on Fermi RFPs
– A particular RFP requires special capacities: Myrinet, AMD, blade servers, desktop
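The selection step described on this slide amounts to ranking bids by performance per dollar and keeping the top five, with the remainder held in reserve. The sketch below illustrates that ranking only; the actual scoring formula used for the bid is not given in the slides. The vendor names and prices are hypothetical placeholders, and the Fermi Cycles values simply reuse the measured averages from the performance slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    vendor: str                # hypothetical placeholder name
    fermi_cycles: float        # benchmark result per node
    price_per_node: float      # hypothetical price in dollars

    @property
    def cycles_per_dollar(self):
        return self.fermi_cycles / self.price_per_node


def select_vendors(bids, list_size=5):
    """Rank bids by Fermi Cycles per dollar (highest first).

    Returns (qualified, reserve): the top `list_size` bids and the
    remaining technically qualified bids kept to replenish the list.
    """
    ranked = sorted(bids, key=lambda b: b.cycles_per_dollar, reverse=True)
    return ranked[:list_size], ranked[list_size:]


# ASSUMPTION: all vendor names and prices below are invented for illustration.
BIDS = [
    Bid("Vendor A", 1779.0, 1900.0),
    Bid("Vendor B", 2041.0, 2300.0),
    Bid("Vendor C", 2223.0, 2600.0),
    Bid("Vendor D", 1779.0, 2100.0),
    Bid("Vendor E", 2041.0, 2100.0),
    Bid("Vendor F", 2223.0, 2900.0),
    Bid("Vendor G", 2041.0, 2500.0),
]

qualified, reserve = select_vendors(BIDS)
for bid in qualified:
    print(f"{bid.vendor}: {bid.cycles_per_dollar:.3f} Fermi Cycles per dollar")
print("reserve:", ", ".join(b.vendor for b in reserve))
```

Keeping the reserve list alongside the qualified list mirrors the plan to rotate in other technically qualified vendors when one drops out or a request needs special hardware.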
29
Future Plans
Blade server evaluation coming up.
– Requires a change in install philosophy: no floppy, CDROM, or serial console available.
– Essential to address power and space concerns in Feynman and elsewhere.