Recent experience in buying and configuring a cluster
John Matrow, Director, High Performance Computing Center, Wichita State University

OR How I spent my summer vacation

2003 New Configuration
- 32p SGI Altix (1.3 GHz Itanium2)
- 34p Xeon cluster (2.666 GHz Xeon)
- 100 GB RAM
- Gigabit Ethernet
- SGI O2000 scheduler (PBSPro) and 1 TB NFS file server (user directories)

Primary Applications

- Professor X: $125K grant for a cluster for one application
- Went for bid with a minimal specification
- Bidding cancelled and bids never opened
- Preferred state contracts to bids
- Feb. 2005: first quote of 64p (3.2 GHz) for $136K
- Best Opteron (2.4 GHz): 58p for near $100K
- Best Xeon EM64T (3.6 GHz): 92p for near $100K

Intel® Extended Memory 64 Technology (Intel® EM64T)
- Addresses more than 4 GB of both virtual and physical memory
- 64-bit flat virtual address space
- 64-bit pointers
- 64-bit wide general purpose registers
- 64-bit integer support
- Up to 1 terabyte (TB) of platform address space
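In practice it is easy to confirm from user code that a node is running a 64-bit OS and a 64-bit application build; the minimal C sketch below is an illustration added here, not part of the original slides:

    /* Check pointer and integer widths on an EM64T / x86-64 build. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        printf("sizeof(void *)   = %zu bytes\n", sizeof(void *));   /* 8 on a 64-bit build */
        printf("sizeof(long)     = %zu bytes\n", sizeof(long));     /* 8 under LP64 Linux */
        printf("sizeof(uint64_t) = %zu bytes\n", sizeof(uint64_t)); /* native 64-bit integers */
        return 0;
    }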

Texas Advanced Computing Center: The Intel Xeon memory architecture provides good memory bandwidth to a single processor from all levels of the memory hierarchy. The higher CPU clock speed and slightly better bandwidth from cache to the registers relative to the AMD Opteron provide a performance edge to computationally demanding applications. However, the limitations of the shared-bus memory architecture become apparent when executing two memory-intensive processes in parallel on a dual-processor node. Because bandwidth is shared, per-processor throughput will decrease for memory-bound applications with synchronous memory access needs. The effect of memory contention, and the resulting degradation in performance, is especially bad for random or strided memory access.

Texas Advanced Computing Center: The AMD Opteron processor has a slower clock rate than the Intel Xeon, and consequently will not perform as well for compute-bound applications. Because of its memory architecture and the resulting excellent scalability of the memory subsystem, the AMD Opteron node is best suited for memory-intensive applications that are not cache-friendly and/or have random or strided access patterns.
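The contention TACC describes can be seen with a simple STREAM-style triad loop: run one copy on a node, then two copies at once, and compare the per-copy bandwidth. The sketch below is only illustrative (array size, repetition count, and timing method are arbitrary choices, and this is not the official STREAM benchmark):

    /* triad.c: rough STREAM-style triad for eyeballing per-CPU memory bandwidth.
       Compile with e.g. "gcc -O2 -std=c99 triad.c -o triad". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N    (10 * 1000 * 1000)   /* large enough to fall out of cache */
    #define REPS 10

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        if (!a || !b || !c) return 1;

        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        clock_t t0 = clock();
        for (int rep = 0; rep < REPS; rep++)
            for (long i = 0; i < N; i++)
                a[i] = b[i] + 3.0 * c[i];   /* triad: ~3 doubles of traffic per element */
        double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

        double gbytes = (double)REPS * 3.0 * N * sizeof(double) / 1e9;
        printf("approx. bandwidth: %.2f GB/s (a[0]=%g)\n", gbytes / sec, a[0]);
        free(a); free(b); free(c);
        return 0;
    }

Run one copy, then two at once, and compare the per-copy numbers: on a shared-bus Xeon node the per-copy figure drops noticeably, while on an Opteron node, with a memory controller per processor, it holds up much better, which is the behavior the quotes above describe.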

Bottom Line: SW Vendor: “Ran some 1p and 2p jobs (on one node), and indeed the Opteron is 2x faster on 2 processors. Traditionally we have only gotten something like 60-80% out of the second processor on Intels. So even if the per-CPU speed is close, the Opteron should outperform the Intel by 25% or so (2/1.6). So a 29p Opteron could perform as well as a 36p Xeon.” BUT...
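Spelling the vendor's arithmetic out: if the second Intel processor contributes only 60-80% of a CPU, a dual-processor Intel node delivers roughly 1.6-1.8 CPUs' worth of work, while the Opteron node delivers about 2.0. Taking the low end:

    2.0 / 1.6 = 1.25, i.e. about a 25% per-node advantage for the Opteron
    36p Xeon / 1.25 = ~29p Opteron of roughly equal throughput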

SW Vendor: “A lot of the logic behind Opteron being better than Irwindale is extrapolation from slower/older processors. We've never run on more than 1 Opteron node, but we have run on 2040 Irwindale processors as of last week. The Intel option may not be the best price/performance-wise, but I’ll bet it’s close, and very safe.”

Specification
- 3-year warranty
- Pull-out KVM
- 4 GB RAM/node
- Redundancy in head node
- Installation/training
- Console server
- GANGLIA

Other costs
- Disks for NFS server: $4330
- PBSPro license: $50/p
- CFD software: $ (#p - 24) * $200
- Gridgen software: $6000
- Power cables + installation: $1000
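As a worked example of the per-processor terms (the processor count below is hypothetical, and the CFD line is read as a $200-per-processor increment beyond 24 processors on top of a base price that does not survive in the slide text):

    PBSPro at 64p:   64 * $50 = $3,200
    CFD increment:   (64 - 24) * $200 = $8,000, plus the base price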

Order/Arrival
- PO faxed April 26
- Power cables/receptacles ordered
- BTU requirements calculated
- First nodes arrive May 17
- Counsel reviews Statement of Work
- By June 2: 65 boxes
- Power cables/receptacles arrive June 22; ready for rack assembly
- Conference call July 12
- Dell begins assembly July 18

Assembly
- Side panels missing
- 20' Cat 5E cables supplied, but GigE likes Cat 6
- 48-port switch had year-old firmware
- Backup power supply for switch was bad
- No console server
- Excellent assembler
- Responsive vendor

Leftover
- Management node
- 48-port switch
- 90 CAT5 cables
- 4 outlet boxes and 61 power cords
- Pair of rack side panels
- 45 unopened copies of RHEL 3

Introduction to ROCKS While earlier clustering toolkits expend a great deal of effort (i.e., software) to compare configurations of nodes, Rocks makes complete Operating System (OS) installation on a node the basic management tool. With attention to complete automation of this process, it becomes faster to reinstall all nodes to a known configuration than it is to determine if nodes were out of synchronization in the first place. Unlike a user's desktop, the OS on a cluster node is considered to be soft state that can be changed and/or updated rapidly.
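Under this model a suspect node is simply reinstalled rather than diagnosed. As a minimal sketch of what that looks like from the frontend, assuming the shoot-node and cluster-fork utilities that shipped with Rocks 3.x (command names recalled from that era, not taken from the slides; check them against the installed version):

    # Reinstall a single suspect compute node
    shoot-node compute-0-5

    # Or force every compute node to reinstall itself
    cluster-fork '/boot/kickstart/cluster-kickstart'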

Rocks Cluster Register
- 525 systems registered (average 60 CPUs each)
- NCSA system rated at 7488 GFLOPS (1040p) [#47 on TOP500]

Commercial ROCKS
- Based on V3.3.0
- Uses PXE boot to bootstrap nodes
- Includes GANGLIA, PBS, MPI, Lava, LSF, …
- Existing PBSPro uses RSH; missing server rpm
- Configured OK for NFS mounts, /temp dir
- Aug. 16: nodes stuck on anaconda bug
- Aug. 23: had to delete partitions to reinstall
- Moved 5 KVM cables among sets of 5 nodes
- Created custom config script
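For reference, NFS mounts of the kind mentioned above take roughly this shape in a compute node's /etc/fstab (the host names and export paths below are invented for illustration, not the site's actual configuration):

    # /etc/fstab entries on a compute node (hypothetical hosts and paths)
    frontend:/export/home   /home   nfs   defaults,hard,intr   0 0
    nfssrv:/export/temp     /temp   nfs   defaults,hard,intr   0 0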

Log
- Aug. 26: primary app fails
- Aug. 29: compute node dead
- Sept. 9: installed two other apps
- Sept. 14: rsh fails; SSH works
- Sept. 19: pressing the POWER button to shut down causes the node to boot in REINSTALL ROCKS mode
- Sept. 26: all nodes up to date

Future
- Get primary app to work
- Add other apps
- Configure PBSPro for group priority
- Get console server