Stallo: First impressions

Presentation transcript:

Stallo: First impressions. Roy Dragseth, Team Leader, HPC, The Computer Center, roy.dragseth@cc.uit.no

University of Tromsø. The northernmost university in the world. Staff: 2000. Students: 6000.

Background. UiT installed a new HPC system in late 2007, on an extremely tight time schedule. A new machine room was also established for the system.

Timeline. 30 Oct: machine room ready. 2 Nov: HW installation starts. 10 Nov: HW installation done. 1 Dec: first users. 1 Jan: full production.

System config. Compute: 704 HP BL460c blades, 5632 CPU cores, 12 TB memory, 384 nodes with InfiniBand. Storage: HP SFS (Lustre), 66 SFS20 arrays, 18 DL380 servers, 128 TB net storage.
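A quick sanity check of these totals (my assumption, since the slide gives only system-wide figures): a BL460c of that generation was typically a dual-socket, quad-core node with 16 GB of memory, which lines up with the quoted numbers:

704 blades × 8 cores/blade = 5632 cores
704 blades × 16 GB/blade ≈ 11 TB of memory, roughly the 12 TB quoted above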

Measured performance. Theoretical peak: 59.9 TF/s. HPL Linpack: 15 TF/s (no. 83 on the Top500; later 17 TF/s with IB). IOzone read/write: 9.5/6.5 GB/s (64 clients, 32 GB files each). MPI latency: 1.3/2.1 μs ping-pong. MPI bandwidth: 1300 MB/s one-way.
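The theoretical peak is consistent with 5632 cores at about 2.66 GHz doing 4 floating-point operations per cycle per core: 5632 × 2.66 GHz × 4 ≈ 59.9 TF/s (the clock speed is an assumption; the slides do not name the processor model). Latency and bandwidth figures like those above are normally obtained with a two-rank ping-pong test. The sketch below is a minimal illustration of that technique, assuming any MPI implementation and two ranks (mpirun -np 2); it is not the benchmark actually run on Stallo.

/* Minimal MPI ping-pong sketch (illustrative only). Ranks 0 and 1
 * bounce a buffer back and forth; half the round-trip time gives an
 * estimate of one-way latency, and message size divided by that time
 * gives an estimate of bandwidth. A real benchmark sweeps message
 * sizes: latency is quoted for tiny messages, bandwidth for large ones;
 * this sketch uses a single 1 MiB message for brevity. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int iters = 1000;
    const int size = 1 << 20;   /* 1 MiB message */
    char *buf = malloc(size);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        double one_way = elapsed / (2.0 * iters);  /* seconds per one-way trip */
        printf("one-way time: %.2f us, bandwidth: %.1f MB/s\n",
               one_way * 1e6, size / one_way / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}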

Usage profile so far. The system is designed for throughput, so the typical job uses 32-256 cores per run.

Interconnect usage. The InfiniBand nodes (c1-c24) seem to be in higher demand than the Ethernet ones.

System software. OS: Rocks cluster distribution (CentOS). Storage: HP SFS (Lustre). Batch: Torque/Maui. Compilers: Intel and GCC. MPI: OpenMPI/OFED.

Pleasant surprises. Rocks scales to this level rather effortlessly, and the same is true for the batch system. The blade management CLI works really well. The MCS cooling system makes it bearable to work in the machine room. User feedback has been overwhelmingly positive!

Unpleasant surprises. Single disk errors take down the global filesystem (HP SFS/Lustre). OpenMPI/OFED has some quirks and sometimes needs tuning on a per-application basis. If we lose cooling, the machine room will overheat even if we turn off all systems!

Wishes. Fix the SFS20 firmware! Publish all SNMP MIBs. Publish all freely available RPMs in a searchable manner (at least, do not hide them inside ISO images).

Questions?