Tests and tools for ENEA GRID
Performance test: HPL (High Performance Linpack)
Network monitoring
A. Funel
December 11, 2007
HPL TEST
HPL measures the floating-point execution rate for solving a system of linear equations AX = B
HPL requires the availability of MPI and libraries for linear algebra (BLAS, VSIPL, ATLAS)
HPL is scalable: parallel efficiency is constant with respect to the per-processor memory usage
HPL Results (1)
A (n × n) X = B
GFLOPS = [(2/3) n^3 + (3/2) n^2] / (t_h × 10^9), t_h = CPU time
Th. Peak = # of CORES × CPU CLOCK SPEED × FPO ISSUE RATE
Linux (bw305): 15 CORES, Th. Peak = 72 GFLOPS. Test completed
AIX (sp4-2): 32 CORES, Th. Peak = 96 GFLOPS
AIX (sp4-3-4): 32 CORES, Th. Peak = 122 GFLOPS
Test did not complete!!!
LSF SUBMISSION
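The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the clock speed, issue rate, matrix size and run time in the example are assumed values chosen for the demonstration, not figures reported in these slides.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations counted for an n x n solve: (2/3) n^3 + (3/2) n^2."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def achieved_gflops(n: int, cpu_seconds: float) -> float:
    """GFLOPS obtained when the solve takes cpu_seconds of CPU time."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return hpl_flop_count(n) / (cpu_seconds * 1e9)


def theoretical_peak_gflops(cores: int, clock_ghz: float, fpo_per_cycle: float) -> float:
    """Th. Peak = number of cores x CPU clock speed x FPO issue rate."""
    return cores * clock_ghz * fpo_per_cycle


if __name__ == "__main__":
    # Assumed inputs for illustration only (not taken from the slides).
    peak = theoretical_peak_gflops(cores=15, clock_ghz=2.4, fpo_per_cycle=2)
    rate = achieved_gflops(n=10_000, cpu_seconds=60.0)
    print(f"theoretical peak: {peak:.1f} GFLOPS")
    print(f"achieved rate:    {rate:.1f} GFLOPS ({100 * rate / peak:.0f}% of peak)")
```

Comparing the achieved rate with the theoretical peak gives the efficiency of a run in one number.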
HPL Results (2)
Expected CPU time t_h: # FPO / (# of CORES × CPU CLOCK SPEED × FPO ISSUE RATE)
USED: ATLAS Version 3.6
HIGH USER WAIT TIME (NOT CPU TIME), MAYBE DUE TO THE NETWORK INTERCONNECTS (PUBLIC WHEN THE TEST WAS DONE)
Linux (bw305), P × Q = 3 × 5 CORES (LSF SUBMISSION) (HPL COMPLETED)
Table columns: n (matrix size) | bytes (B) = 8 n^2 | % MEMORY (TOTAL = 12 GB) | obtained CPU time (sec) | expected CPU time t_h (sec) | GFLOPS
n 10^3   B   1.2 %
n 10^3   B   4.0 %   74.0
n 10^3   B   9.0 %
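The memory footprint (8 n^2 bytes for a double-precision n × n matrix) and the expected CPU time can be sketched as follows. The function names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption; the slides do not say which convention was used.

```python
import math


def matrix_bytes(n: int) -> int:
    """Bytes needed to store an n x n matrix of 8-byte doubles."""
    return 8 * n * n


def memory_percent(n: int, total_bytes: float) -> float:
    """Share of the total memory taken by the matrix, in percent."""
    return 100.0 * matrix_bytes(n) / total_bytes


def largest_n_for_fraction(fraction: float, total_bytes: float) -> int:
    """Largest n whose matrix fits in the given fraction of total memory."""
    return math.isqrt(int(fraction * total_bytes / 8))


def expected_cpu_time(n: int, cores: int, clock_hz: float, fpo_per_cycle: float) -> float:
    """Expected time = # FPO / (cores x clock speed x FPO issue rate), in seconds."""
    fpo = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return fpo / (cores * clock_hz * fpo_per_cycle)


if __name__ == "__main__":
    total = 12e9  # assumed: 12 GB taken as 12 x 10^9 bytes
    n = largest_n_for_fraction(0.09, total)
    print(f"n = {n} uses {memory_percent(n, total):.2f}% of memory")
```

Such a helper makes it easy to pick n for a target memory share before submitting a run.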
A POINT-TO-POINT COMMUNICATION TEST USING MPI
HPL POINT-TO-POINT COMMUNICATION BETWEEN PROCESSORS IS BASED ON MPI ROUTINES (MPI_Send, MPI_Recv)
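A point-to-point test of this kind is commonly written as a ping-pong between two ranks. The sketch below uses the mpi4py bindings, which is an assumption: the slides do not say how the actual test was implemented. The function names, message size and repeat count are hypothetical.

```python
def bandwidth_mb_per_s(message_bytes: int, round_trip_s: float) -> float:
    """Bandwidth estimate: the message crosses the link twice per round trip."""
    return 2 * message_bytes / round_trip_s / 1e6


def run_pingpong(message_bytes: int = 1024, repeats: int = 100):
    """Return the mean round-trip time in seconds on rank 0, None elsewhere."""
    # Imported here so the helper above stays usable without an MPI installation.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    if comm.Get_size() < 2:
        raise RuntimeError("the ping-pong needs at least two MPI processes")

    buf = bytearray(message_bytes)
    comm.Barrier()  # all ranks start together; ranks above 1 then stay idle
    start = MPI.Wtime()
    for _ in range(repeats):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=0)    # wraps MPI_Send
            comm.Recv([buf, MPI.BYTE], source=1, tag=0)  # wraps MPI_Recv
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=0)
            comm.Send([buf, MPI.BYTE], dest=0, tag=0)
    elapsed = MPI.Wtime() - start
    return elapsed / repeats if rank == 0 else None
```

To try it, call run_pingpong() from a script started with an MPI launcher on at least two processes and print the value returned on rank 0.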
HPL Results (3)
PROBLEMS FOR AIX (LSF SUBMISSION)
HPL MAKES THE MACHINES HANG; THE TEST DOES NOT COMPLETE EVEN IF MEMORY USAGE < 10%
ONLY A FEW CPU SECONDS OVER DAYS OF RUNNING TIME!!!
UNDER INVESTIGATION
AIX (sp4-1), 4 × 8 = 32 CORES, HPL COMPLETED
Table columns: n (matrix size) | bytes (B) = 8 n^2 | % MEMORY (TOTAL = 25.6 GB) | obtained CPU time (sec) | expected CPU time t_h (sec) | GFLOPS
n 10^3   B   10 %
AIX (sp4-2), 4 × 8 = 32 CORES, 20% TOTAL (32 GB) MEMORY, HPL NOT COMPLETED
INTERACTIVE SUBMISSIONS
AIX (ostro), 4 × 4 = 16 CORES, 20% TOTAL (16 GB) MEMORY, HPL NOT COMPLETED
NETWORK MONITORING (coll. G. Guarnieri)
A TOOL HAS BEEN PROVIDED TO DETECT WHETHER THE COMMUNICATION SPEED BETWEEN TWO HOSTS (CLIENT AND SERVER) OF THE ENEA GRID CHANGES OVER TIME
THE TEST MEASURES THE ROUND-TRIP TIME IT TAKES TO SEND A SMALL PACKET (10, 100, 1000 BYTES) OF DATA AND RECEIVE IT BACK
SMALL PACKETS: NOT CHOPPED (NO SPURIOUS DELAY EFFECTS); FAST FLUCTUATIONS ARE NOT HIDDEN BY THE INTEGRATED AVERAGE TIME SPENT WAITING FOR LARGE PACKETS
[Diagram: client and server, start and stop]
60 PACKETS SENT IN SEQUENCE EACH SECOND
BOTH CLIENT AND SERVER BLOCK UNTIL THE FULL PACKET IS SENT/RECEIVED: NO LOSS OF DATA
TCP/IP PROTOCOL
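The round-trip measurement described on this slide can be sketched with plain TCP sockets. This is not the tool itself: the function names are hypothetical, the demo runs client and server on localhost, disabling Nagle's algorithm (TCP_NODELAY) is a choice made here, and the packets are sent back to back without the per-second pacing mentioned on the slide.

```python
import socket
import threading
import time


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Block until exactly `size` bytes have been received."""
    data = bytearray()
    while len(data) < size:
        part = conn.recv(size - len(data))
        if not part:
            raise ConnectionError("peer closed before the full packet arrived")
        data.extend(part)
    return bytes(data)


def serve_echo(listener: socket.socket, size: int, count: int) -> None:
    """Accept one client and echo back `count` packets of `size` bytes."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(count):
            conn.sendall(recv_exact(conn, size))


def measure_rtts(host: str, port: int, size: int, count: int) -> list[float]:
    """Send `count` packets in sequence and return each round-trip time in seconds."""
    payload = b"x" * size
    rtts = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # assumed choice
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(payload)    # blocks until the whole packet is handed over
            recv_exact(sock, size)   # blocks until the whole echo is back
            rtts.append(time.perf_counter() - start)
    return rtts


def demo(size: int = 100, count: int = 60) -> list[float]:
    """Run the echo server in a thread and the client against it on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_echo, args=(listener, size, count))
        server.start()
        rtts = measure_rtts("127.0.0.1", port, size, count)
        server.join()
    return rtts
```

Running demo() for sizes 10, 100 and 1000 and logging the returned times at regular intervals gives a time series in which delays and spikes can be looked for.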
NETWORK MONITORING (2)
[Plot] Client: eurofel00, Server: bw305-2
HIGH SPIKES CLEARLY DETECTED
OVERALL COMMUNICATION DELAY
[Plot] Client: kleos, Server: feronix0
Conclusions
HPL BENCHMARK TEST:
Linux (LSF): THE TEST COMPLETES, HOWEVER:
1. OBTAINED CPU TIME >> EXPECTED CPU TIME, (PEAK)_exp < (PEAK)_th
2. TOO MUCH (USER) TIME TO COMPLETE
AIX (LSF): THE TEST DOES NOT COMPLETE: ONLY A FEW CPU SECONDS OVER DAYS OF RUNNING TIME!!!!
AIX (INTERACTIVE SUBMISSION): ONLY sp4-1 (32 CORES, 10% TOTAL MEMORY) TESTED. TEST COMPLETED BUT STILL (CPU TIME) >> (EXPECTED CPU TIME); USER WAIT TIME 35 minutes
NETWORK MONITORING:
A TOOL HAS BEEN PROVIDED TO DETECT VARIATIONS IN THE COMMUNICATION SPEED BETWEEN TWO HOSTS OF THE ENEA GRID
USEFUL FOR IMPROVING THE OVERALL NETWORK EFFICIENCY