1
Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute)
Anne Weill-Zrahia, Technion Computer Center, October 2008
2
Resources needed for applications arising from nanotechnology
Large memory – Tbytes
High floating-point computing speed – Tflops
High data throughput – state of the art …
3
SMP architecture: several processors (P) sharing a single memory (diagram)
4
Cluster architecture: nodes connected through an interconnection network (diagram)
5
Why not a cluster
A single SMP system is easier to purchase and maintain
Ease of programming in SMP systems
6
Why a cluster
Scalability
Total available physical RAM
Reduced cost
But …
7
Having an application which exploits the parallel capabilities
Studying the application or applications which will run on the cluster
8
Things to include in design
Property of code | Essential component
CPU bound | Fast computing unit
Memory bound | Large memory, fast access
Global flow of data in parallel app | Fast interconnect
9
Our choices
Property of code | Essential component | Choice
Computationally intensive, FP | Fast computing unit | 64-bit dual-core Opteron, Rev. F
Large matrices | Large memory, fast access | 8 GB/node
Finite element, spectral codes | Fast interconnect | InfiniBand DDR (20 Gb/s, low latency)
10
Other requirements
Space, power, cooling constraints, strength of floors
Software configuration:
1. Operating system
2. Compilers & application development tools
3. Load balancing and job scheduling
4. System management tools
11
Configuration: nodes of processors (P) and memory (M) connected through an InfiniBand switch (diagram)
12
Before finalizing our choice … one should check, on a similar system:
Single-processor peak performance
InfiniBand interconnect performance (see the sketch below)
SMP behaviour
Behaviour of non-commercial parallel applications
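For the interconnect check, a minimal MPI ping-pong sketch of the kind commonly used for this purpose (not the actual acceptance-test code; message size and repetition count here are arbitrary assumptions) measures round-trip latency and bandwidth between two ranks:

/* Hypothetical ping-pong microbenchmark for the interconnect check; not the
 * actual acceptance-test code. Message size and repetition count are arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int nbytes = 1 << 20;                 /* 1 MB message */
    const int reps = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / reps;          /* average round-trip time */
        double bw  = 2.0 * nbytes / rtt / 1e9;  /* GB/s moved per round trip */
        printf("avg round trip: %.1f us, bandwidth: %.2f GB/s\n", rtt * 1e6, bw);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}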
13
Parallel applications issues
Execution time
Parallel speedup: Sp = T1/Tp (worked example below)
Scalability
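A small worked example of the speedup formula; the single-process time is the Nanco Monte-Carlo figure quoted later in the deck, while the 8-process time is an invented number used only for illustration:

/* Illustrative speedup/efficiency calculation; the 8-process timing is made up. */
#include <stdio.h>

int main(void)
{
    double t1 = 4389.0;   /* serial time in seconds (Nanco 1-process Monte-Carlo run) */
    double tp = 600.0;    /* hypothetical time on p processes */
    int    p  = 8;

    double sp = t1 / tp;  /* parallel speedup   Sp = T1 / Tp */
    double ep = sp / p;   /* parallel efficiency Ep = Sp / p */
    printf("Sp = %.2f, Ep = %.0f%%\n", sp, 100.0 * ep);
    return 0;
}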
14
Benchmark design
Must give a good estimate of the performance of your application
The acceptance test should match all of its components
15
Comparison of performance (LAPACK program, N=9000)
Nanco: 3826.4 Mflops
Carmel: 487 Mflops
A ratio of 7.8 !!
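The slides do not say which LAPACK routine was timed; the sketch below assumes an LU solve (LAPACKE_dgesv) of a dense N=9000 system and reports Mflops from the usual (2/3)N^3 operation count:

/* Hypothetical LAPACK-style benchmark; assumes an LU solve of an N=9000 system.
 * Link with -llapacke -llapack -lblas (or a vendor BLAS/LAPACK). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lapacke.h>

int main(void)
{
    const lapack_int n = 9000, nrhs = 1;
    double *a = malloc((size_t)n * n * sizeof(double));
    double *b = malloc((size_t)n * nrhs * sizeof(double));
    lapack_int *ipiv = malloc((size_t)n * sizeof(lapack_int));

    srand(1);
    for (size_t i = 0; i < (size_t)n * n; i++) a[i] = rand() / (double)RAND_MAX;
    for (size_t i = 0; i < (size_t)n; i++)     b[i] = rand() / (double)RAND_MAX;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    lapack_int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs, a, n, ipiv, b, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double flops = 2.0 / 3.0 * (double)n * n * n;   /* approximate flops of an LU solve */
    printf("info=%d  time=%.1f s  %.1f Mflops\n", (int)info, secs, flops / secs / 1e6);

    free(a); free(b); free(ipiv);
    return 0;
}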
16
Execution time of Monte-Carlo parallel code (MPI), Nanco vs. Carmel: on 1 process, Nanco 4389 s (~1 hr) vs. Carmel 22042 s (~6 hrs !); timings for larger process counts (table)
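The actual Monte-Carlo code is not shown in the slides; the following generic MPI skeleton (pi estimation) only illustrates the typical pattern of such codes: independent sampling on each rank followed by a single reduction.

/* Generic MPI Monte-Carlo skeleton (pi estimation); not the benchmarked code,
 * just an illustration of the embarrassingly parallel pattern. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_per_rank = 10 * 1000 * 1000;        /* samples per process */
    unsigned int seed = 1234u + (unsigned int)rank;  /* independent stream per rank */
    long local_hits = 0;

    double t0 = MPI_Wtime();
    for (long i = 0; i < n_per_rank; i++) {
        double x = rand_r(&seed) / (double)RAND_MAX;
        double y = rand_r(&seed) / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) local_hits++;
    }

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double pi = 4.0 * total_hits / ((double)n_per_rank * size);
        printf("pi ~= %.6f  (%d ranks, %.2f s)\n", pi, size, t1 - t0);
    }
    MPI_Finalize();
    return 0;
}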
18
What did work
Running MPI code interactively
Running a serial job through the queue
Compiling C code with MPI (example below)
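For reference, a minimal MPI program in C of the kind that did compile and run; the mpicc/mpirun lines in the comment are generic usage, not the site's exact commands:

/* Minimal MPI "hello" in C.  Typical build and launch (generic, not site-specific):
 *   mpicc -O2 hello.c -o hello
 *   mpirun -np 4 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}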
19
What did not work
Compiling F90 or C++ code with MPI
Running MPI code through the queue
Queues do not do accounting per CPU
20
Parallel performance results
Theoretical peak: 2.1 Tflops
Nanco performance on HPL: 0.58 Tflops (about 28% of peak)
21
Comparison with Sun Benchmark
22
Execution time – comparison of compilers
24
Performance with different optimizations
25
Conclusions from acceptance tests
The new gcc (gcc4) is faster than Pathscale for some applications
MPI collective communication functions are implemented differently in the various MPI versions (see the timing sketch below)
Disk access times are crucial – use attached storage when possible
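A simple way to see differences between MPI implementations is to time a collective directly; the sketch below is an assumption on my part, not the acceptance-test code, and times MPI_Allreduce on a fixed-size buffer:

/* Hypothetical microbenchmark for comparing collective performance across
 * MPI implementations; times MPI_Allreduce on a fixed-size buffer. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 16;          /* 64K doubles per rank (arbitrary size) */
    const int reps = 50;
    double *in  = malloc(count * sizeof(double));
    double *out = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) in[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; r++)
        MPI_Allreduce(in, out, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("MPI_Allreduce, %d doubles: %.3f ms per call\n",
               count, (t1 - t0) / reps * 1e3);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}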
26
Scheduling decisions
Assessing priorities between user groups
Assessing the parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them
Avoiding starvation by giving weight to the urgency parameter
27
Observations during production mode
Assessing users' understanding of the machine – support in writing scripts and in efficient parallelization
Lack of visualization tools – writing a script to show current usage of the cluster
28
Utilization of cluster
29
Utilization of Nanco, September 2008
30
Nanco jobs by type
31
Conclusion
Correct benchmark design is crucial to test the capabilities of the proposed architecture
Acceptance tests make it possible to negotiate with vendors and give insight into future choices
Only after several weeks of running the cluster at full capacity can we make informed decisions on management of the cluster