Reduced Communication Protocol for Clusters Clunix Inc. Donghyun Kim
Introduction
  - Communication sub-system performance is determined by the following:
      - Transmission speed of the physical network
      - I/O handling capability
      - Overheads of the communication protocol
  - Communication using traditional protocols is the bottleneck of parallel systems
      - Myrinet with TCP/IP is not fast
      - Small-granularity or communication-dense apps show poor performance
Introduction - cont’d
  - A high proportion of apps don’t need very complicated communication functions
      - Shown by practice and theoretical analysis
Overheads analysis of traditional protocols
  - Traditional protocol overheads
      - Time of context switching
      - Time of data copying
          - Between user space and system space, and between adjacent protocol layers
      - Time of data partitioning, reconstruction, and data analysis
      - Time of transmitting packet headers
      - Time of routing, connection maintenance, traffic control, error detection, recovery, and buffer management
Overheads analysis of traditional protocols - cont’d
  - End-to-end latency L and bandwidth W modeling
  - Assumptions: homogeneous, low network traffic
  - T(n): transmission time for n bytes
  - n_max: max packet length of the communication subsystem
  - m: number of protocol layers
  - T_i(n): processing time of the i-th protocol layer (T_0(n): physical network transmission time)
Overheads analysis of traditional protocols - cont’d
  - Model parameters
      - Context switching time
      - Memory bandwidth
      - Physical network transmission bandwidth (subscript 0)
      - Max packet length of the i-th layer
      - Packet header length of the i-th layer
      - n_i: data length of the i-th layer
      - Calling expense of the i-th layer (routing, traffic control, error detection, buffer management, connection maintenance)
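The parameters listed above suggest an additive, per-layer cost model. The following is a minimal sketch of one such model, not the formula from the original slides. The names `Layer`, `packets_needed`, and `transmission_time`, the choice of one context switch per send, and the way headers accumulate across layers are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """One protocol layer (all fields are illustrative assumptions)."""
    max_packet: int    # max packet length of this layer, in bytes, header included
    header: int        # packet header length of this layer, in bytes
    call_cost: float   # calling expense per packet, in seconds


def packets_needed(n, layer):
    """Number of packets this layer needs to carry n bytes of data."""
    payload = layer.max_packet - layer.header
    return max(1, math.ceil(n / payload))


def transmission_time(n, layers, ctx_switch, mem_bw, net_bw):
    """Estimate the time to send n bytes through `layers` (top layer first).

    ctx_switch is in seconds; mem_bw and net_bw are in bytes per second.
    """
    total = ctx_switch              # assumed: one user/system crossing per send
    size = n
    for layer in layers:
        k = packets_needed(size, layer)
        total += size / mem_bw      # copy the data into this layer's buffers
        total += k * layer.call_cost
        size += k * layer.header    # headers are added to the bytes handed down
    total += size / net_bw          # physical network transmission
    return total
```

Dividing n by the returned time gives an effective bandwidth estimate, and calling the function with a small n gives a rough latency estimate under the same assumptions.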
Overheads analysis of traditional protocols - cont’d
  - Analytical and testing results
      - Table: analytical and measured latency L (µs) and bandwidth W (Mbps) for the TCP, UDP, and DLPI protocol layers
  - Testing conclusions
      - Very large overhead when using protocol layers above IP
      - Memory-to-memory copying is not negligible
      - If the transmission bandwidth is the same as the memory bandwidth, the data-copying cost (n_(i+1) divided by the memory bandwidth) becomes a bigger problem
Design Strategies for RCP
  - Support reliable, synchronous, and asynchronous communications
  - Implement reliable broadcast and multicast directly on the physical layer
  - Lay the protocol below the IP layer
      - Above the physical or datalink layer
  - Avoid data copying as far as possible
  - If possible, avoid buffer management by using hardware buffering
  - Run the protocol entirely in user space
      - In the form of libraries
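The copy-avoidance strategy can be illustrated by the way a message is split into packets. The sketch below is not the RCP implementation; it only shows, with a hypothetical `partition` helper, how a user-space library can hand out packet-sized views of a buffer without duplicating the data.

```python
def partition(buf, max_payload):
    """Yield views over `buf`, each at most `max_payload` bytes long.

    `buf` is assumed to be a bytes or bytearray object. Slicing a
    memoryview creates a new view on the same memory, so no payload
    bytes are copied here.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    view = memoryview(buf)
    for offset in range(0, len(view), max_payload):
        yield view[offset:offset + max_payload]
```

Each view can be passed to a send call that accepts bytes-like objects. On platforms that provide it, Python's `socket.sendmsg` takes a list of buffers, so a header and a payload view can be sent together without first joining them into one buffer.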
Implementation of RCP
  - OSI-DLPI version
      - Standard, physical-device-independent data link layer interface
      - Uniform programs can be written for different machines and network devices
  - Myrinet version
  - Provides a user interface like the TCP socket
Implementation of RCP - cont’d
  - RCP supports unicast, broadcast, and multicast
  - RCP addressing
      - Unique source/destination identified by hostname + port number
      - Static address configuration
      - Supports heterogeneous machines
  - No connection maintenance or error detection
      - Assumes that the underlying network is reliable
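A static, hostname-plus-port addressing scheme could look roughly like the sketch below. The `Endpoint` type, the `STATIC_ADDRESSES` table, the host names, and the link-layer addresses are all hypothetical; the slides do not show RCP's actual configuration format.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """A source or destination, identified by hostname and port number."""
    hostname: str
    port: int


# Hypothetical static configuration: hostname -> link-layer address.
STATIC_ADDRESSES = {
    "node01": "02:00:00:00:00:01",
    "node02": "02:00:00:00:00:02",
}


def resolve(endpoint):
    """Return (link_address, port); raises KeyError for an unknown host."""
    return STATIC_ADDRESSES[endpoint.hostname], endpoint.port


def resolve_group(endpoints):
    """Multicast sketch: resolve every member of a destination group."""
    return [resolve(ep) for ep in endpoints]
```

Because the table is fixed, no lookup traffic or connection state is needed before the first packet is sent.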
Implementation of RCP - cont’d
  - Sequencing control and traffic control
      - Sliding-window algorithm + selective retransmission
      - Window size is adjusted according to retransmission frequency
      - Fast-Adapt and Slow-Recover algorithm
      - Very efficient traffic control
  - Data partitioning and packaging algorithm
      - Almost no data copying; works in user space
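The slides do not define the Fast-Adapt and Slow-Recover rule, so the sketch below is one possible reading of it, not RCP's algorithm: the window is halved when retransmissions are frequent and grows by one packet when a round needs none. The class `AdaptiveWindow`, the function `send_all`, the 10% threshold, and the synchronous `channel` callback are assumptions. The loop also assumes that every packet is eventually acknowledged.

```python
class AdaptiveWindow:
    """Window size that shrinks quickly and grows slowly (assumed policy)."""

    def __init__(self, initial=8, minimum=1, maximum=64):
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum

    def update(self, sent, retransmitted):
        """Adjust the window from one round's retransmission frequency."""
        rate = retransmitted / sent if sent else 0.0
        if rate > 0.1:                      # threshold is an assumption
            self.size = max(self.minimum, self.size // 2)
        elif retransmitted == 0:
            self.size = min(self.maximum, self.size + 1)
        return self.size


def send_all(packets, channel, window):
    """Send `packets` in order with a sliding window.

    `channel(seq, payload)` returns True when the packet is acknowledged.
    Only unacknowledged packets are sent again (selective retransmission).
    Returns the total number of transmissions.
    """
    acked = [False] * len(packets)
    attempts = [0] * len(packets)
    base = 0                                # oldest unacknowledged packet
    while base < len(packets):
        end = min(len(packets), base + window.size)
        sent = retransmitted = 0
        for seq in range(base, end):
            if acked[seq]:
                continue                    # already delivered, do not resend
            if attempts[seq] > 0:
                retransmitted += 1
            attempts[seq] += 1
            sent += 1
            acked[seq] = channel(seq, packets[seq])
        while base < len(packets) and acked[base]:
            base += 1                       # slide past acknowledged packets
        window.update(sent, retransmitted)
    return sum(attempts)
```

A real implementation would receive acknowledgements asynchronously and use timers. The synchronous callback keeps the sketch short enough to show the window sliding and the selective resend.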
RCP Testing results
  - Figures: Bandwidth (W), Latency (L)
Conclusions and future issues
  - RCP design considerations
      - How to reduce the overheads
          - Over-complicated protocol processing
          - Context switching
          - Overhead of data copying
      - How to use the transmission control functions supported by hardware
          - To reduce the protocol processing
  - Future work
      - To guarantee the quality of the communication