Building a High-performance Computing Cluster Using FreeBSD (BSDCon '03, September 10, 2003). Brooks Davis, Michael AuYeung, Gary Green, Craig Lee, The Aerospace Corporation.

Presentation transcript:

Building a High-performance Computing Cluster Using FreeBSD
BSDCon '03, September 10, 2003
Brooks Davis, Michael AuYeung, Gary Green, Craig Lee
The Aerospace Corporation, El Segundo, CA

HPC Clustering Basics
● HPC cluster features:
  – Commodity computers
  – Networked to enable distributed, parallel computations
  – Vastly lower cost compared to traditional supercomputers
● Many, but not all, HPC applications work well on clusters

Cluster Overview
● Fellowship is the Aerospace corporate cluster
  – The name is short for "The Fellowship of the Ring"
● Running FreeBSD 4.8-STABLE
● Over 183 GFlops of floating-point performance on the LINPACK benchmark

Cluster Overview: Nodes and Servers
● 160 nodes (320 CPUs)
  – Dual-CPU 1U systems with Gigabit Ethernet
  – 86 Pentium III (7 at 1GHz, the rest at higher clock speeds)
  – 74 Xeon 2.4GHz
● 4 core systems
  – frodo – management server
  – fellowship – shell server
  – gamgee – backup, database, and monitoring server
  – legolas – scratch server (2.8TB)

Cluster Overview: Network and Remote Access
● Gigabit Ethernet network
  – Cisco Catalyst 6513 switch
  – Populated with 10/100/1000T blades
● Serial console access
  – Cyclades TS2000 and TS3000 terminal servers
● Power control
  – Baytech RPC4 and RPC14 serial power controllers

Cluster Overview: Physical Layout

Design Issues
● Operating system
● Hardware architecture
● Network interconnects
● Addressing and naming
● Node configuration management
● Job scheduling
● System monitoring

Operating System
● Almost anything can work
● Considerations:
  – Local experience
  – Needed applications
  – Maintenance model
  – Need to modify the OS
● Choice: FreeBSD
  – Diskless support
  – Cluster architect is a committer
  – Ease of upgrades
  – Linux emulation

Hardware Architecture
● Many choices: i386, SPARC, Alpha
● Considerations:
  – Price
  – Performance
  – Power/heat
  – Software support (OS, applications, development tools)
● Choice: Intel PIII/Xeon
  – Price
  – OS support
  – Power

Network Interconnects
● Many choices:
  – 10/100 Ethernet
  – Gigabit Ethernet
  – Myrinet
● Considerations:
  – Price
  – OS support
  – Application mix
● Choice: Gigabit Ethernet
  – Fits the application mix: a middle ground between tightly and loosely coupled applications
  – Price

Addressing and Naming Schemes
● To subnet or not?
● Public or private IPs?
● Naming conventions:
  – The usual rules apply to core servers
  – Large clusters probably want more mechanical names for nodes
● Choices:
  – 10.5/16 private subnet
  – Core servers named after Lord of the Rings characters
  – Nodes named and numbered by location, e.g. rack 1, node 1 is r01n01
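As an illustration of this kind of mechanical naming, the Python sketch below maps a rack and node position to an rNNnNN hostname and an address in the 10.5/16 private subnet. The rack-to-third-octet and node-to-fourth-octet mapping is an assumption made for the example, not a documented detail of Fellowship's allocation.

```python
# Illustrative only: one possible mapping from rack/node position to
# r01n01-style hostnames and addresses in the 10.5/16 private subnet.
# The rack -> third octet, node -> fourth octet convention is assumed.

def node_identity(rack: int, node: int) -> tuple[str, str]:
    """Return (hostname, IP address) for the node at the given position."""
    if not (1 <= rack <= 255 and 1 <= node <= 254):
        raise ValueError("rack/node out of range for this toy scheme")
    hostname = f"r{rack:02d}n{node:02d}"
    address = f"10.5.{rack}.{node}"
    return hostname, address

if __name__ == "__main__":
    # Rack 1, node 1 comes out as r01n01 / 10.5.1.1
    for rack in (1, 2):
        for node in (1, 2, 3):
            name, addr = node_identity(rack, node)
            print(f"{name}  {addr}")
```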

Node Configuration Management
● Major methods:
  – Individual installs
  – Automated installs
  – Network booting
● Automation is critical
● Choice: network-booted nodes via PXE
  – Automatic node disk configuration: a version stamp in the MBR plus the diskprep script
  – Upgrade using a copy of the root file system
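To make the PXE approach concrete, here is a hedged sketch that generates ISC dhcpd host entries pointing network-booted nodes at a TFTP server and the FreeBSD pxeboot loader. The MAC addresses, the next-server address, and the small node table are placeholders for illustration; the slides do not show the cluster's actual configuration (the wish list only notes that a database-driven PXE/DHCP server would be preferable).

```python
# Hypothetical generator for ISC dhcpd host entries for PXE-booted nodes.
# MAC addresses, the next-server address, and the node table are
# placeholders, not Fellowship's real configuration.

NODES = {
    # hostname: (MAC address, fixed IP) -- placeholder values
    "r01n01": ("00:00:00:00:01:01", "10.5.1.1"),
    "r01n02": ("00:00:00:00:01:02", "10.5.1.2"),
}

TFTP_SERVER = "10.5.0.1"  # assumed address of the boot/management server

def dhcpd_host_entry(name: str, mac: str, ip: str) -> str:
    """Return one dhcpd.conf host stanza for a PXE-booted node."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"    next-server {TFTP_SERVER};\n"
        "    filename \"pxeboot\";\n"
        "}\n"
    )

if __name__ == "__main__":
    for name, (mac, ip) in sorted(NODES.items()):
        print(dhcpd_host_entry(name, mac, ip))
```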

Job Scheduling
● Options:
  – Manual scheduling
  – Batch queuing systems (SGE, OpenPBS, etc.)
  – Custom schedulers
● Choice: Sun Grid Engine
  – Ported to FreeBSD starting with Ron Chen's patches
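For flavor, a minimal sketch of batch submission to Sun Grid Engine: it writes a job script containing embedded "#$" directives and hands it to qsub. The parallel environment name ("mpi"), the slot count, and the xhpl workload (the HPL LINPACK binary) are assumptions for the example; real sites use their own PE names and binaries.

```python
# Minimal sketch of submitting a batch job to Sun Grid Engine via qsub.
# The parallel environment name ("mpi"), slot count, and workload (xhpl)
# are placeholders, not Fellowship's actual setup.

import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/sh
#$ -N linpack_test
#$ -cwd
#$ -j y
#$ -pe mpi 8
./xhpl
"""

def submit(script_text: str) -> str:
    """Write the script to a temp file and submit it; return qsub's output."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub reads the embedded "#$" directives (job name, working directory,
    # merged stdout/stderr, parallel environment) from the script itself.
    result = subprocess.run(["qsub", path], check=True,
                            capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    print(submit(JOB_SCRIPT))
```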

System Monitoring
● Standard monitoring tools:
  – Nagios (formerly NetSaint)
  – Big Sister
● Cluster-specific tools:
  – Ganglia
  – Most schedulers
● Choices:
  – Ganglia (port: sysutils/ganglia-monitor-core)
  – Sun Grid Engine
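As a small example of how Ganglia can be queried programmatically, the sketch below pulls the XML state dump that a gmond daemon serves (by default on TCP port 8649) and prints each host's one-minute load. The host name is a placeholder; gamgee is used only because the slides name it as the monitoring server.

```python
# Minimal sketch of polling Ganglia's gmond daemon, which by default serves
# an XML dump of cluster state on TCP port 8649, and printing each host's
# one-minute load average.  The host name below is a placeholder.

import socket
import xml.etree.ElementTree as ET

GMOND_HOST = "gamgee"   # placeholder: any machine running gmond
GMOND_PORT = 8649       # gmond's default XML report port

def fetch_gmond_xml(host: str, port: int = GMOND_PORT) -> bytes:
    """Read the full XML report that gmond sends on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def report_load(xml_bytes: bytes) -> None:
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(f"{host.get('NAME')}: load_one={metric.get('VAL')}")

if __name__ == "__main__":
    report_load(fetch_gmond_xml(GMOND_HOST))
```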

System Monitoring: Ganglia

Lessons Learned
● Hardware attrition can be significant
● Neatness counts in cabling
● System automation is very important
  – If you do it to a node, automate it
● Much of the HPC community thinks the world is a Linux box

FY 2004 Plans
● Switch upgrades: Sup 720 and 48-port blades
● New racks: another row of racks, adding 6 more node racks (192 nodes)
● More nodes: either more Xeons or Opterons
● Upgrade to FreeBSD 5.x

Future Directions
● Determining a node replacement policy
● Clustering on demand
● Scheduler improvements
● Grid integration (Globus Toolkit)
● Trusted clusters

Wish List
● Userland:
  – Database-driven PXE/DHCP server
● Kernel:
  – Distributed file system support (e.g. GFS)
  – Checkpoint and restart capability
  – BProc-style distributed process management

Acknowledgements
● Aerospace:
  – Michael AuYeung
  – Brooks Davis
  – Alan Foonberg
  – Gary Green
  – Craig Lee
● Vendors:
  – iXsystems
  – Off My Server
  – Iron Systems
  – Raj Chahal – iXsystems, Iron Systems, ASA Computers

Resources ● Paper and presentation: –