Optimizing Protocol Parameters to Large Scale PC Cluster and Evaluation of its Effectiveness with Parallel Data Mining (Speaker: Che-Wei Chang)
Outline Introduction Features Communication performance Optimization of transport layer protocol parameters Parallel data mining application Conclusion
Introduction(1/2) ATM (Asynchronous Transfer Mode) technology is a strong candidate for the de facto standard for high-speed communication networks. Several features of an ATM-connected PC cluster consisting of 100 PCs are examined.
Introduction(2/2) The characteristics of the transport layer protocol used on the interconnection network are discussed, and an optimization of its parameters is proposed. The proposed method is evaluated with a parallel data mining application, and good performance scale-up is achieved up to 100 PCs.
Features(1/3) We have constructed a large-scale PC cluster that uses ATM as its communication network. Data-intensive applications are used to evaluate the cluster.
Features(2/3) Figure 1. PC cluster pilot system Table 1. Configuration of each node of the PC cluster
Features(3/3) Figure 2. An overview of the PC cluster
Communication performance(1/3) The TCP implementation of Solaris is used; its parameters can be changed with a user-level command. In this experiment, point-to-point throughput is measured on the cluster while the maximum TCP window size is varied. Two sets of experiments are performed, with the MSS (Maximum Segment Size) set to 8192 bytes and 1024 bytes, respectively.
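The effect of the maximum window size on point-to-point throughput can be reasoned about with the standard window-limited bound: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is at most window / RTT. The sketch below also counts how many full MSS-sized segments fit in a window. The RTT value and the function names are hypothetical, and the bound ignores header overhead, delayed ACKs, and link capacity.

```python
def full_segments_per_window(window_bytes: int, mss_bytes: int) -> int:
    """Count the full MSS-sized segments that fit in one send window."""
    return window_bytes // mss_bytes


def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (bytes/s) when the send window is the bottleneck:
    at most one window of unacknowledged data can be in flight per round trip."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    rtt = 0.001  # hypothetical 1 ms round-trip time, not a measured value
    for mss in (8192, 1024):
        for window in (8192, 16384, 32768, 65535):
            segments = full_segments_per_window(window, mss)
            bound_mbps = window_limited_throughput(window, rtt) * 8 / 1e6
            print(f"MSS={mss:5d} window={window:6d} "
                  f"full segments={segments:3d} bound={bound_mbps:7.1f} Mbit/s")
```

With MSS = 8192 bytes, growing the window between two multiples of 8192 adds no further full segment, which is one plausible reason a throughput curve may rise in steps rather than smoothly.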
Communication performance(2/3) Throughput does not vary smoothly with the maximum window size. Figure 3. Point-to-point throughput
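Communication performance(3/3) The window size field in the TCP header is only 16 bits wide, which limits the advertised window to 65535 bytes unless the window scale option is used. Figure 4. Number of transmitted packets (MSS=8192[bytes]) Figure 5. Number of transmitted packets (MSS=1024[bytes])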
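A minimal sketch of how the 16-bit window field interacts with the two MSS settings, assuming the window scale option (RFC 7323) is not in use: any requested window above 65535 bytes is clamped, and with MSS = 8192 bytes the clamped window holds only seven full segments. The function names are hypothetical, and the model ignores headers and partial segments.

```python
TCP_WINDOW_FIELD_MAX = 2**16 - 1  # largest value a 16-bit window field can carry


def effective_window(requested_bytes: int, scale_shift: int = 0) -> int:
    """Clamp a requested window to what the TCP header can advertise.

    scale_shift models the RFC 7323 window scale option; 0 means no scaling,
    which is the assumption used throughout this sketch.
    """
    return min(requested_bytes, TCP_WINDOW_FIELD_MAX << scale_shift)


def segments_in_flight(requested_bytes: int, mss_bytes: int) -> int:
    """Full MSS-sized segments that fit in the clamped, unscaled window."""
    return effective_window(requested_bytes) // mss_bytes


if __name__ == "__main__":
    for mss in (8192, 1024):
        for requested in (32768, 65535, 131072, 262144):
            print(f"MSS={mss:5d} requested={requested:7d} "
                  f"advertised={effective_window(requested):6d} "
                  f"segments={segments_in_flight(requested, mss):3d}")
```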
Optimization of transport layer protocol parameters(1/5) Even if the amount of broadcast data is small, many collisions occur in a large-scale ATM-connected PC cluster when all nodes broadcast at the same time. The network becomes heavily congested, cells are discarded at the ATM switch, and TCP retransmissions occur as a result.
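Why a modest cell loss rate can translate into many TCP retransmissions: on ATM, an IP packet travels as an AAL5 frame split into 48-byte cell payloads, and losing any single cell invalidates the whole frame, so the entire TCP segment must be resent. The sketch assumes AAL5 with its 8-byte trailer, 40 bytes of IPv4 and TCP headers without options, no LLC/SNAP encapsulation header, and independent cell losses with probability p. Independence is a simplification, since a congested switch tends to drop cells in bursts. Names and loss rates are hypothetical.

```python
import math

AAL5_TRAILER = 8     # bytes appended by AAL5 before segmentation into cells
CELL_PAYLOAD = 48    # payload bytes carried by one ATM cell
TCP_IP_HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def cells_per_segment(mss_bytes: int) -> int:
    """ATM cells needed to carry one full TCP segment over AAL5 (with padding)."""
    frame = mss_bytes + TCP_IP_HEADERS + AAL5_TRAILER
    return math.ceil(frame / CELL_PAYLOAD)


def segment_loss_probability(mss_bytes: int, cell_loss: float) -> float:
    """Probability that at least one cell of the segment is dropped,
    assuming independent cell losses with probability cell_loss."""
    return 1.0 - (1.0 - cell_loss) ** cells_per_segment(mss_bytes)


if __name__ == "__main__":
    for mss in (8192, 1024):
        for p in (1e-4, 1e-3, 1e-2):  # hypothetical cell loss rates
            print(f"MSS={mss:5d} cells={cells_per_segment(mss):3d} "
                  f"p={p:g} segment loss={segment_loss_probability(mss, p):.4f}")
```

Larger segments span more cells, so under this model they are more exposed to a single dropped cell.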
Optimization of transport layer protocol parameters(2/5) Several experiments are run on 100 nodes of the PC cluster to investigate the retransmission characteristics. Maximum interval of TCP retransmission: MAX = 60000[msec]. Minimum interval of TCP retransmission: MIN = 200[msec].
Optimization of transport layer protocol parameters(3/5) The retransmission interval follows exponential back-off, doubling after each timeout; the figure marks 24 sec, 12 sec, and 6 sec. Figure 6. Execution time of the broadcasting program (MIN = 200[msec])
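A minimal model of the retransmission timer: each timeout doubles the interval, and the result is clamped between MIN and MAX. The sketch assumes the first timeout equals MIN, as happens when the measured round-trip time on a LAN is far below MIN; real TCP stacks derive the initial value from smoothed RTT estimates, so this is an approximation rather than the Solaris implementation. The function name is hypothetical.

```python
def backoff_schedule(rto_min_ms: int, rto_max_ms: int, retries: int) -> list[int]:
    """Successive retransmission timeouts (ms) under exponential back-off.

    Simplifying assumption: the first timeout equals rto_min_ms. Each later
    timeout doubles the previous one and is clamped to [rto_min_ms, rto_max_ms].
    """
    schedule = []
    rto = rto_min_ms
    for _ in range(retries):
        schedule.append(rto)
        rto = min(max(rto * 2, rto_min_ms), rto_max_ms)
    return schedule


if __name__ == "__main__":
    schedule = backoff_schedule(200, 60000, 10)  # MIN = 200 ms, MAX = 60000 ms
    print("timeouts (ms):", schedule)
    print("cumulative wait (ms):", sum(schedule))
```

Under these assumptions the sixth timeout is already 6.4 s and the eighth 25.6 s, which illustrates how a few consecutive losses of the same segment can add tens of seconds to a broadcast.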
Optimization of transport layer protocol parameters(4/5) Figure 7. Execution time of the broadcasting program (MAX = MIN + 100[msec])
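Setting MAX = MIN + 100 [msec] leaves almost no room for exponential back-off: after the first timeout the interval is pinned at MAX, so retries follow each other at a nearly constant pace. Under a simplified timer model (first timeout equal to MIN, then doubling, clamped to [MIN, MAX]), the sketch compares the cumulative waiting time for MAX = 60000 [msec] and for MAX = MIN + 100 [msec]. The MIN values and function names are illustrative assumptions, not the values from the experiments.

```python
def backoff_schedule(rto_min_ms: int, rto_max_ms: int, retries: int) -> list[int]:
    """Timeouts (ms): first equals rto_min_ms, then doubling, clamped to [min, max]."""
    schedule = []
    rto = rto_min_ms
    for _ in range(retries):
        schedule.append(rto)
        rto = min(max(rto * 2, rto_min_ms), rto_max_ms)
    return schedule


def cumulative_wait_ms(rto_min_ms: int, rto_max_ms: int, retries: int) -> int:
    """Total time spent waiting on timeouts across `retries` retransmissions."""
    return sum(backoff_schedule(rto_min_ms, rto_max_ms, retries))


if __name__ == "__main__":
    retries = 8
    for rto_min in (200, 500, 1000):  # illustrative MIN values, not from the experiments
        wide = cumulative_wait_ms(rto_min, 60000, retries)
        narrow = cumulative_wait_ms(rto_min, rto_min + 100, retries)
        print(f"MIN={rto_min:5d} ms  MAX=60000: {wide:7d} ms  MAX=MIN+100: {narrow:6d} ms")
```

The trade-off, which this model does not capture, is that removing back-off also removes TCP's main way of easing pressure on a congested switch, so the benefit has to be checked experimentally.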
Optimization of transport layer protocol parameters(5/5) Figure 8. Amount of retransmission (MAX = MIN + 100[msec]) Figure 9. Amount of retransmission (MAX = MIN + 100[msec])
Parallel data mining application(1/2) Figure 10. Execution time of HPA program.
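HPA is not expanded on the slide; in the parallel data mining literature the name usually refers to Hash Partitioned Apriori, which is assumed here. In that scheme, candidate itemsets are partitioned across nodes by hashing, and each node sends every candidate it finds in its local transactions to the owning node for counting, which produces the kind of all-to-all traffic discussed above. The sketch simulates one counting pass in a single process; node counts, function names, and data are hypothetical, and the optional `candidates` set stands in for the candidate generation from frequent (k-1)-itemsets that real Apriori performs.

```python
import zlib
from collections import Counter
from itertools import combinations
from typing import Optional


def owner_node(itemset: tuple[str, ...], num_nodes: int) -> int:
    """Map a candidate itemset to the node that counts it (deterministic hash)."""
    key = ",".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % num_nodes


def hpa_count_pass(partitions: list[list[set[str]]], k: int, num_nodes: int,
                   candidates: Optional[set[tuple[str, ...]]] = None) -> list[Counter]:
    """Simulate one HPA-style counting pass.

    partitions[i] holds the transactions (sets of items) stored on node i.
    Every k-itemset found locally is 'sent' to its owner node, whose counter
    is incremented; on a real cluster each increment would be a message.
    """
    counters = [Counter() for _ in range(num_nodes)]
    for local_transactions in partitions:
        for transaction in local_transactions:
            for itemset in combinations(sorted(transaction), k):
                if candidates is None or itemset in candidates:
                    counters[owner_node(itemset, num_nodes)][itemset] += 1
    return counters


if __name__ == "__main__":
    partitions = [
        [{"a", "b", "c"}, {"a", "c"}],  # hypothetical transactions on node 0
        [{"b", "c"}, {"a", "b", "c"}],  # hypothetical transactions on node 1
    ]
    for node, counter in enumerate(hpa_count_pass(partitions, k=2, num_nodes=2)):
        print(f"node {node} counts {dict(counter)}")
```

Hashing with `zlib.crc32` rather than the built-in `hash()` keeps the itemset-to-node mapping identical across processes, which a real multi-node run would need.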
Parallel data mining application(2/2) Figure 11. Speedup ratio of HPA program.
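The speedup ratio in Figure 11 is presumably the usual ratio of execution times, S(n) = T(base) / T(n); the slide does not state which configuration is the baseline, so the sketch takes it as a parameter. Parallel efficiency, the speedup divided by the ideal node-count ratio, is included as a common companion metric. Function names are hypothetical, and no measured times from the talk are used.

```python
def speedup(t_base: float, t_n: float) -> float:
    """Speedup of a run relative to the baseline run (ratio of execution times)."""
    return t_base / t_n


def efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Speedup divided by the ideal speedup n / n_base (1.0 means linear scale-up)."""
    return speedup(t_base, t_n) / (n / n_base)
```

For example, `efficiency(t_base, 1, t_n, 100)` compares a 100-node run against a single-node baseline; values close to 1.0 indicate near-linear scale-up.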
Conclusion Retransmissions caused by cell loss at the ATM switch were analyzed. The default TCP parameters could not provide good performance, because many collisions occur during all-to-all broadcasting. With TCP parameters set according to the proposed optimization, a sufficient performance improvement was achieved for parallel data mining on 100 PCs.
Thank you for your attention.