SuperMike: LSU’s TeraScale, Beowulf-class Supercomputer
Presented to LACSI 2003 by Joel E. Tohline, former Interim Director, Center for Applied Information Technology and Learning
October 29, 2003
AY2001/02 was a Special Year!
A decade of experience acquiring and utilizing parallel architectures in LSU’s Concurrent Computing Laboratory for Materials Simulation
Beowulf systems maturing
– Commodity CPUs becoming good number crunchers
– Network bandwidths, robustness, and size improving
– Linux OS and message-passing software stabilizing
Numerous LSU groups building Beowulf clusters
Governor “Mike” Foster’s $23M Information Technology Initiative
Building a Beowulf-Class Supercomputer: Considerations [Fall ’01]
What processor and what chipset?
What motherboard?
How much RAM and on-board disk space?
What I/O features?
What network interconnect?
How many nodes and processors/node?
What encasement and form-factor?
What about power and A/C requirements?
Physical footprint and location?
How to assemble?
Must be installed before July 1, 2002!
Top500.org [Nov. 2001]
NCSA’s Netfinity Cluster [Nov. ’01]
Intel PIII 1 GHz processors
256 KB L2 cache
2 processors/node
512 nodes
1 TeraFlops peak
Network: Myrinet 2000
– 100 Mbit Ethernet
Actual aggregate speed: 594 GFlops
So … at worst, the same configuration with 1.8 GHz processors should give 1.0 TFlops, comparable in speed to SDSC’s IBM Power3 (Blue Horizon)
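A quick sanity check on that extrapolation (a sketch, assuming the measured Linpack rate scales roughly linearly with processor clock speed, which is only approximate since the faster parts under consideration were P4 Xeons rather than PIIIs):
594 GFlops × (1.8 GHz / 1.0 GHz) ≈ 1.07 TFlops, i.e., just over the 1.0 TFlops target.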
Competitive Invitation to Bid [Dec. ’01]
Requested bids on two configurations
– 512 dual-processor nodes, > 1.7 GHz P III
– 512 dual-processor nodes, > 1.7 GHz dual Xeon w/ Intel’s 860 chipset
Sought experienced vendors
– Must be an approved Myricom OEM
– Must have previously installed a cluster containing at least 128 nodes
Lowest Qualified Bid
Intel P4 Xeon DP Processor
1.8 GHz clock
512 KB L2 cache
Dual-processor
Hyper-Threading
400 MHz system bus
Intel® E7500 Chipset [Feb. ’02]
Tyan “Thunder i7500” Motherboard
Myrinet® Network
Building a Beowulf-Class Supercomputer: Choices
What processor and what chipset?
– Intel 1.8 GHz P4 Xeon DP w/ E7500 chipset
What motherboard?
– Tyan “Thunder i7500” motherboard
How much RAM and on-board disk space?
– 1 GB RAM and 40 GB IDE disk drive per processor
What I/O features?
– CD-ROM, floppy disk, 2 USB, video, keyboard/mouse
– Fast Ethernet
What network interconnect?
– Myricom’s Myrinet 2000, 2 Gbit bi-directional
Building a Beowulf-Class Supercomputer: Choices
How many nodes and processors/node?
– 512 nodes; 2 processors/node
What encasement and form-factor?
– Rack-mountable; 2U form-factor
What about power and A/C requirements?
– 300 kilowatts
Physical footprint?
– 1300 sq. ft.
Location?
Fred C. Frey Computing Services Center
SuperMike: Assembled at LSU
SuperMike’s Specs [Aug. ’02]
512 dual-processor nodes
– 3.6 TeraFlops peak
– 1 TeraByte RAM
– 40 TeraBytes disk space
Actual aggregate speed: ≈ 2.2 TeraFlops
So … actually 3.7 times faster than NCSA’s Netfinity!
At the time of installation, the 11th fastest machine in the world.
Still the 2nd fastest machine among U.S. academic institutions!
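A rough check on these figures (a sketch, assuming the customary two double-precision floating-point operations per clock cycle for the P4 Xeon):
512 nodes × 2 processors/node × 1.8 GHz × 2 flops/cycle ≈ 3.7 TFlops, consistent with the quoted peak;
512 nodes × 2 processors/node × 1 GB ≈ 1 TB of RAM, and 1,024 × 40 GB ≈ 40 TB of disk;
and 2.2 TFlops / 594 GFlops ≈ 3.7, the quoted speedup over NCSA’s Netfinity.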
Top500.org [June 2003]
Intel Intrigued!
SuperMike: Operation + Management
OS: Linux Red Hat 7.2 [kernel: smp]
Queueing/Scheduler: PBS/PBS (moving to PBS/Maui)
Global file system: PVFS
Nodes/network monitoring tools: xpbsmon/mute + cluster scripts
– Original plan was to use “clusterware” management tools, but they proved incompatible w/ PBS
– Ganglia is useful but is only enabled, as needed, per node in order to avoid competition with simulations (some have suggested utilizing “clumon”)
Storage: Fibre Channel connection to SANs + LTO tape drives
SuperMike Usage [Aug. 2003] – Node-days: 10,102/15,872 = 64%
Group                            Application      Node-days    %
Mech. Eng.                       CFD
Astrophys.                       CFD
Chemistry                        Q. Chem
Chem + Phys                      MD
Physics                          G. Relativity
Biol. + Exp. Phys. + Civ. Eng.
SuperMike Usage [Sept. 2003] – Node-days: 10,107/15,360 = 66%
Group                            Application      Node-days    %
Mech. Eng.                       CFD
Astrophys.                       CFD
Chemistry                        Q. Chem
Chem + Phys                      MD
Physics                          G. Relativity
Biol. + Exp. Phys. + Civ. Eng.
SuperMike: Usage Case Study
Astrophysics: CFD
– Hyperbolic + elliptic PDEs; home-grown finite-difference algorithm with explicit MPI (see the sketch below)
– Last year’s 12-month NRAC allocation was 480,000 service units (processor-hours)
– One month on SuperMike: 2,514 node-days ≈ 120,700 processor-hours (2 processors/node × 24 hours/day)
– Typical job uses 128 processors = 1/8 of SuperMike’s capacity
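To make “explicit MPI” concrete, here is a minimal sketch of the nearest-neighbor ghost-cell (halo) exchange that a distributed finite-difference solver of this kind typically performs each time step. It is illustrative only: the 1-D decomposition, array size, and 3-point stencil are assumptions, not the LSU group’s actual code.

/* Minimal sketch of a 1-D halo exchange for an explicit
   finite-difference update; illustrative, not the real CFD code. */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 1000   /* interior grid points owned by each MPI rank */

int main(int argc, char **argv)
{
    double u[NLOCAL + 2], unew[NLOCAL + 2];   /* +2 ghost cells */
    int rank, nprocs, left, right, i, step;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* neighbors in a 1-D, non-periodic decomposition */
    left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    for (i = 0; i < NLOCAL + 2; i++)          /* trivial initial data */
        u[i] = (double) rank;

    for (step = 0; step < 10; step++) {
        /* fill the left ghost cell from the left neighbor while sending
           our rightmost interior point to the right neighbor */
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 0,
                     &u[0],      1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, &st);
        /* and vice versa for the right ghost cell */
        MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, &st);

        /* explicit 3-point update standing in for the real CFD stencil */
        for (i = 1; i <= NLOCAL; i++)
            unew[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
        for (i = 1; i <= NLOCAL; i++)
            u[i] = unew[i];
    }

    if (rank == 0)
        printf("completed %d steps on %d ranks\n", step, nprocs);

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched through the queueing system described earlier (e.g., mpirun -np 128 across 64 dual-processor nodes), this exchange-then-update pattern is what lets each explicit time step scale across the cluster.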
LSU Center for Applied Information Technology and Learning
Ed Seidel, Director, hired July ’03
– From Max Planck’s AEI
– Numerical Relativity
– Grid Computing
Upcoming name change: Center for Computation & Technology [CCT]