1
Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute)
Anne Weill-Zrahia, Technion Computer Center, October 2008
2
Resources needed for applications arising from nanotechnology
Large memory – Tbytes
High floating-point computing speed – Tflops
High data throughput – state of the art …
3
SMP architecture: several processors (P) sharing a single memory (diagram)
4
Cluster architecture: nodes connected through an interconnection network (diagram)
5
Why not a cluster
A single SMP system is easier to purchase and maintain
Ease of programming in SMP systems
6
Why a cluster
Scalability
Total available physical RAM
Reduced cost
But …
7
Having an application which exploits the parallel capabilities
Studying the application or applications which will run on the cluster
8
Things to include in design
Property of code | Essential component
CPU bound | Fast computing unit
Memory bound | Large memory, fast access
Global flow of data in parallel app | Fast interconnect
9
Our choices
Property of code | Essential component | Choice
Computationally intensive, FP | Fast computing unit | 64-bit dual-core Opteron, Rev. F
Large matrices | Large memory, fast access | 8 GB/node
Finite element, spectral codes | Fast interconnect | InfiniBand DDR (20 Gb/s, low latency)
10
Other requirements
Space, power, cooling constraints, strength of floors
Software configuration:
1. Operating system
2. Compilers & application development tools
3. Load balancing and job scheduling
4. System management tools
11
Configuration: nodes of processors (P) and memory (M) connected through an InfiniBand switch (diagram)
12
Before finalizing our choice … one should check, on a similar system:
Single-processor peak performance
InfiniBand interconnect performance (see the sketch below)
SMP behaviour
Behaviour of non-commercial parallel applications
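For the interconnect check, a minimal MPI ping-pong sketch of the kind commonly used for this purpose (not the actual acceptance-test code; message size and repetition count here are arbitrary assumptions) measures round-trip latency and bandwidth between two ranks:

/* Hypothetical ping-pong microbenchmark for the interconnect check; not the
 * actual acceptance-test code. Message size and repetition count are arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int nbytes = 1 << 20;                 /* 1 MB message */
    const int reps = 100;
    char *buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / reps;          /* average round-trip time */
        double bw  = 2.0 * nbytes / rtt / 1e9;  /* GB/s moved per round trip */
        printf("avg round trip: %.1f us, bandwidth: %.2f GB/s\n", rtt * 1e6, bw);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}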
13
Parallel applications issues
Execution time
Parallel speedup: Sp = T1/Tp (worked example below)
Scalability
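A small worked example of the speedup formula; the single-process time is the Nanco Monte-Carlo figure quoted later in the deck, while the 8-process time is an invented number used only for illustration:

/* Illustrative speedup/efficiency calculation; the 8-process timing is made up. */
#include <stdio.h>

int main(void)
{
    double t1 = 4389.0;   /* serial time in seconds (Nanco 1-process Monte-Carlo run) */
    double tp = 600.0;    /* hypothetical time on p processes */
    int    p  = 8;

    double sp = t1 / tp;  /* parallel speedup   Sp = T1 / Tp */
    double ep = sp / p;   /* parallel efficiency Ep = Sp / p */
    printf("Sp = %.2f, Ep = %.0f%%\n", sp, 100.0 * ep);
    return 0;
}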
14
Benchmark design
Must give a good estimate of the performance of your application
The acceptance test should match all of its components
15
Comparison of performance (LAPACK program, N=9000)
Nanco: 3826.4 Mflops
Carmel: 487 Mflops
A ratio of 7.8 !!
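The slides do not say which LAPACK routine was timed; the sketch below assumes an LU solve (LAPACKE_dgesv) of a dense N=9000 system and reports Mflops from the usual (2/3)N^3 operation count:

/* Hypothetical LAPACK-style benchmark; assumes an LU solve of an N=9000 system.
 * Link with -llapacke -llapack -lblas (or a vendor BLAS/LAPACK). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lapacke.h>

int main(void)
{
    const lapack_int n = 9000, nrhs = 1;
    double *a = malloc((size_t)n * n * sizeof(double));
    double *b = malloc((size_t)n * nrhs * sizeof(double));
    lapack_int *ipiv = malloc((size_t)n * sizeof(lapack_int));

    srand(1);
    for (size_t i = 0; i < (size_t)n * n; i++) a[i] = rand() / (double)RAND_MAX;
    for (size_t i = 0; i < (size_t)n; i++)     b[i] = rand() / (double)RAND_MAX;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    lapack_int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs, a, n, ipiv, b, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double flops = 2.0 / 3.0 * (double)n * n * n;   /* approximate flops of an LU solve */
    printf("info=%d  time=%.1f s  %.1f Mflops\n", (int)info, secs, flops / secs / 1e6);

    free(a); free(b); free(ipiv);
    return 0;
}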
16
Execution time of Monte-Carlo parallel code (MPI), Nanco vs. Carmel: on 1 process, Nanco 4389 s (~1 hr) vs. Carmel 22042 s (~6 hrs !); timings for larger process counts (table)
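The actual Monte-Carlo code is not shown in the slides; the following generic MPI skeleton (pi estimation) only illustrates the typical pattern of such codes: independent sampling on each rank followed by a single reduction.

/* Generic MPI Monte-Carlo skeleton (pi estimation); not the benchmarked code,
 * just an illustration of the embarrassingly parallel pattern. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_per_rank = 10 * 1000 * 1000;        /* samples per process */
    unsigned int seed = 1234u + (unsigned int)rank;  /* independent stream per rank */
    long local_hits = 0;

    double t0 = MPI_Wtime();
    for (long i = 0; i < n_per_rank; i++) {
        double x = rand_r(&seed) / (double)RAND_MAX;
        double y = rand_r(&seed) / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) local_hits++;
    }

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double pi = 4.0 * total_hits / ((double)n_per_rank * size);
        printf("pi ~= %.6f  (%d ranks, %.2f s)\n", pi, size, t1 - t0);
    }
    MPI_Finalize();
    return 0;
}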
18
What did work
Running MPI code interactively
Running a serial job through the queue
Compiling C code with MPI (example below)
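For reference, a minimal MPI program in C of the kind that did compile and run; the mpicc/mpirun lines in the comment are generic usage, not the site's exact commands:

/* Minimal MPI "hello" in C.  Typical build and launch (generic, not site-specific):
 *   mpicc -O2 hello.c -o hello
 *   mpirun -np 4 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}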
19
What did not work
Compiling F90 or C++ code with MPI
Running MPI code through the queue
Queues do not do accounting per CPU
20
Parallel performance results
Theoretical peak: 2.1 Tflops
Nanco performance on HPL: 0.58 Tflops (about 28% of peak)
21
Comparison with Sun Benchmark
22
Execution time – comparison of compilers
24
Performance with different optimizations
25
Conclusions from acceptance tests
The new gcc (gcc4) is faster than Pathscale for some applications
MPI collective communication functions are implemented differently in the various MPI versions (see the timing sketch below)
Disk access times are crucial – use attached storage when possible
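A simple way to see differences between MPI implementations is to time a collective directly; the sketch below is an assumption on my part, not the acceptance-test code, and times MPI_Allreduce on a fixed-size buffer:

/* Hypothetical microbenchmark for comparing collective performance across
 * MPI implementations; times MPI_Allreduce on a fixed-size buffer. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 16;          /* 64K doubles per rank (arbitrary size) */
    const int reps = 50;
    double *in  = malloc(count * sizeof(double));
    double *out = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) in[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; r++)
        MPI_Allreduce(in, out, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("MPI_Allreduce, %d doubles: %.3f ms per call\n",
               count, (t1 - t0) / reps * 1e3);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}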
26
Scheduling decisions
Assessing priorities between user groups
Assessing the parallel efficiency of different job types (MPI, serial, OpenMP) and of commercial software, and designing special queues for them
Avoiding starvation by giving weight to the urgency parameter
27
Observations during production mode
Assessing users' understanding of the machine – support in writing scripts and in efficient parallelization
Lack of visualization tools – writing a script to show current usage of the cluster
28
Utilization of cluster
29
Utilization of Nanco, September 2008
30
Nanco jobs by type
31
Conclusion
Correct benchmark design is crucial to test the capabilities of the proposed architecture
Acceptance tests make it possible to negotiate with vendors and give insight into future choices
Only after several weeks of running the cluster at full capacity can we make informed decisions on management of the cluster