High Performance Computing & Communication Research Laboratory 12/11/1997
[1] Performance Analysis of TCP/IP Data Send/Receive Processing Under UNIX Operating Systems
Hyok Kim

[2] Talk Outline
• Project overview
• Performance analysis of TCP/IP protocol
• Performance analysis of Parallel TCP/IP
• Bottlenecks in processing TCP/IP
• Performance analysis techniques
• Measurement tool and performance metrics
• Empirical results
• Future & ongoing work
• Concluding remarks

[3] Project Overview
• H/W implementation of TCP/IP protocol
• Handling ATM traffic (155 Mbps or higher)
• ATM interfacing
• Specification
• Design of TCP/IP protocol processor
• ATM interfacing
• PCI/AMBA interfacing
• API implementation for TCP/IP H/W
• Joint project with
  • Hallym U. & Pusan National U. (major institute)
  • Kwangwoon U. & Kyungpook National U.

[4] Internet Layering and Peer Model
[Figure: peer layers on two hosts. Application: FTP client and FTP server (FTP protocol). Transport: TCP (TCP protocol). Network: IP (IP protocol). Link: data link drivers (data link protocol), connected by the medium.]

[5] Bandwidth delivery by TCP/IP
[Figure: the application (bandwidth requirement) sits above TCP/IP, which sits above ATM, Fast Ethernet, and FDDI (bandwidth supply). Reasonable bandwidth delivery?]

[6] Coarse Grain Architecture of Parallel TCP/IP

[7] Parallel Architecture of TCP Data Receiver
[Figure: blocks between the IP layer and the application: queue, TCP checksum, TCP error check, flag test, security check, connection name check, ACK check, status check, window check, window sizing, urgent request, segment re-assembly; shared state: TCP control info. and TCP connection info.]

[8] Performance of TCP data receiver
[Figure: Performance of TCPReceive]

[9] Performance of Parallel TCP/IP
[Figure: Estimated speed-up against sequential execution]
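
A rough way to reason about a speed-up estimate of this kind is sketched below. It is not the method behind the figure: it simply assumes that the receiver's checks are grouped into phases, that checks within a phase run fully in parallel, and that phases run one after another. The function name and the stage times are hypothetical.

```python
def estimated_speedup(phases):
    """Estimate speed-up of phase-parallel execution over sequential execution.

    `phases` is a list of phases; each phase is a list of per-check times
    (any consistent unit). Assumption: checks inside one phase run
    concurrently, so a phase costs as much as its slowest check.
    """
    sequential_time = sum(sum(phase) for phase in phases)
    parallel_time = sum(max(phase) for phase in phases)
    return sequential_time / parallel_time


# Hypothetical timings: four concurrent checks, one serial step, two concurrent updates.
example_phases = [[4, 3, 2, 1], [5], [2, 2]]
speedup = estimated_speedup(example_phases)  # 19 time units sequential vs. 11 parallel
```

The estimate ignores synchronization and communication costs between processing elements, so it is an upper bound under these assumptions.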

[10] Bottlenecks in TCP/IP Processing
• data copies
  • between user space and kernel space
  • between kernel space and network device
• checksum calculation
• memory/timer management
• interaction between protocol and OS
• NOT the protocol itself
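
To illustrate why checksum calculation appears in this list, here is a minimal sketch of the 16-bit one's-complement Internet checksum in the style of RFC 1071. It covers only the core sum over a byte string; the TCP pseudo-header is omitted, and the function name is an illustrative choice. Every byte of the segment is read once, so the cost grows with packet size, much like a data copy.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


# Example bytes taken from the worked example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
# A receiver verifies by summing data plus checksum; a valid segment yields 0.
verified = internet_checksum(sample + checksum.to_bytes(2, "big")) == 0
```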

[11] Performance Measurement (I)
• S/W based measurement
  • unacceptable perturbation due to interrupt handling or memory swapping
• H/W based measurement
  • specially designed H/W or logic analyzer
  • limited flexibility
  • data acquisition limited to execution time
  • e.g., MultiKron chip (project) by NIST
• Probabilistic analysis: queueing theory

[12] Performance Measurement (II)
• Our measurement
  • using counters in the Intel Pentium processor
  • time resolution is one processor clock cycle
    • 166 MHz -> 6 ns
    • 200 MHz -> 5 ns
  • provides additional information
    • memory access counts (memory bandwidth)
    • number of H/W interrupts
    • mis-aligned data memory references
    • branches
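
The resolution figures follow directly from the clock frequency: one cycle lasts 1000 / f nanoseconds when f is given in MHz. A small sketch of that arithmetic, and of converting a measured cycle count into elapsed time, is given below; the function names and the 20000-cycle example are illustrative assumptions, not measured values.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Duration of one processor clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def cycles_to_microseconds(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count read from a cycle counter into microseconds."""
    return cycles / clock_mhz


resolution_166 = cycle_time_ns(166)  # about 6.02 ns, quoted as 6 ns
resolution_200 = cycle_time_ns(200)  # 5 ns
elapsed_us = cycles_to_microseconds(20000, 200)  # hypothetical count: 100 microseconds
```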

[13] Performance Measurement Setup - sender’s part -
[Figure: the system under measurement (user process, socket, TCP, IP, data link) and the communicating party, connected by an isolated 10BaseT Ethernet. The user process performs socket initialization, connection, write, and disconnect.]
Legend:
(1) memory allocation and data copy
(2) TCP processing
(3) IP processing
(4) data send to media
(5) ACK arrives at data link layer
(6) ACK processing at IP
(7) ACK processing at TCP

[14] Performance Measurement Setup - receiver’s part -
[Figure: the system under measurement (user process, socket, TCP, IP, data link) and the communicating party, connected by an isolated 10BaseT Ethernet. The user process calls socket(), bind(), listen(), accept(), read(), and disconnect().]
Legend:
(1) frame arrives at data link layer
(2) IP processing
(3) TCP processing
(4) data copy from kernel space to user space
(5) ACK construction at TCP
(6) IP processing
(7) data send to media
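
The user-level call sequence in the two setups (the receiver listening, accepting, and reading; the sender connecting, writing, and disconnecting) can be sketched with Python's standard socket module. This is an illustration of the call order only, not the measurement program: it runs both ends on the loopback interface instead of an isolated Ethernet, and the helper names and the 1440-byte payload are assumptions.

```python
import socket
import threading


def run_receiver(server: socket.socket, received: list) -> None:
    """Receiver side: accept one connection and read until the sender disconnects."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)  # kernel-to-user copy happens on each read
            if not chunk:
                break  # the peer closed the connection
            chunks.append(chunk)
    received.append(b"".join(chunks))


def transfer_once(payload: bytes) -> bytes:
    """Send `payload` over a loopback TCP connection and return what was received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind(("127.0.0.1", 0))                               # bind(), OS picks the port
    server.listen(1)                                            # listen()
    port = server.getsockname()[1]
    received = []
    worker = threading.Thread(target=run_receiver, args=(server, received), daemon=True)
    worker.start()
    # Sender side: connect, write, then disconnect by leaving the with-block.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)  # user-to-kernel copy, then TCP/IP processing
    worker.join(timeout=5)
    server.close()
    return received[0]


echoed = transfer_once(b"x" * 1440)
```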

[15] Empirical Result (I)
[Figure: Cycle counts in TCP/IP send processing]

[16] Empirical Result (II)
[Figure: Dynamic instruction counts in TCP/IP send processing]

[17] Empirical Result (III)
[Figure: Memory access counts in TCP/IP send processing]

[18] Empirical Result (IV)
[Figure: Cycle counts in TCP/IP receive processing]

[19] Empirical Result (V)
[Figure: Dynamic instruction counts in TCP/IP receive processing]

[20] Empirical Result (VI)
[Figure: Memory access counts in TCP/IP receive processing]

[21] Memory Bandwidth Requirement (I)
• “By matching the memory to the special needs of packet processing, one could achieve high performance at an acceptable cost” (V. Jacobson)

[22] Memory Bandwidth Requirement (II)
• Then, how many memory accesses occur?
  • we measured it
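
Given a measured memory access count per packet, the memory bandwidth requirement can be estimated by multiplying by the packet rate of the target link. The sketch below assumes a 155 Mbps link and 1440-byte packets (figures that appear elsewhere in this talk); the access count of 3000 and the 4 bytes per access are hypothetical placeholders, not measured results.

```python
def packets_per_second(link_mbps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link of `link_mbps` with payload."""
    return link_mbps * 1e6 / (packet_bytes * 8)


def memory_bandwidth_mbytes_per_s(accesses_per_packet: int, bytes_per_access: int,
                                  link_mbps: float, packet_bytes: int) -> float:
    """Estimated memory traffic in MB/s (1 MB = 10**6 bytes) at full link rate."""
    bytes_per_packet = accesses_per_packet * bytes_per_access
    return bytes_per_packet * packets_per_second(link_mbps, packet_bytes) / 1e6


rate = packets_per_second(155, 1440)  # roughly 13455 packets per second
# Hypothetical: 3000 accesses per packet, 4 bytes (32 bits) per access.
required = memory_bandwidth_mbytes_per_s(3000, 4, 155, 1440)  # roughly 161 MB/s
```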

[23] Pure TCP/IP performance
* Calculated for a 1440-byte packet
• not considering data link latency
• considering only data send/receive and ACK segment send/receive time in the TCP/IP layer
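
A calculation of this kind can be sketched as follows: the time per packet is the cycle count spent in the TCP/IP layer for the data segment plus that for the ACK segment, divided by the clock frequency, and the throughput is the payload size divided by that time. The function name and the cycle counts are hypothetical; only the 1440-byte payload and the 200 MHz clock come from the talk.

```python
def pure_tcpip_throughput_mbps(payload_bytes: int, data_cycles: int,
                               ack_cycles: int, clock_mhz: float) -> float:
    """Throughput in Mbps when only TCP/IP-layer processing time is counted.

    Cycles divided by MHz gives microseconds, and bits per microsecond is
    Mbps, so the result is payload bits * clock_mhz / total cycles.
    """
    total_cycles = data_cycles + ack_cycles
    return payload_bytes * 8 * clock_mhz / total_cycles


# Hypothetical cycle counts: 20000 for the data segment, 3040 for the ACK.
throughput = pure_tcpip_throughput_mbps(1440, 20000, 3040, 200)  # 100 Mbps
```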

[24] From Empirical Results
• To enhance the performance of TCP/IP
  • design of an efficient interface between the protocol stack and the OS is required
• And how?

[25] Future & Ongoing Work
• Feasibility study of ATM internetworking
  • Analysis of
    • AAL5 traffic
    • signaling protocols
    • commercial SAR chips & bus interfaces
  • Internetworking technology
    • LANE, IP over ATM, Multiprotocol over ATM
    • Next Hop Resolution Protocol, etc.
• Development of TCP/IP H/W module
  • currently an Ethernet-based implementation

[26] Overview of TCP/IP H/W Implementation
[Figure: TCP timer module, checksum module, memory management unit, ARM]
Target System
• ARM7TDMI RISC processor
• AMBA expansion connectors
• FPGA implementation

[27] Timer Management Module
[Figure: blocks: Duration Register, ID Manager, Lookup Table, State Scheduler, Expired & Reference Time Generator, Timing Scheduler, CAM, Timer Record Memory, Stack Manager, Zero Detect]

[28] Conclusion
• OS overheads play a major role in high-performance TCP/IP processing
• Measurement of memory access counts
  • estimation of memory bandwidth requirement
• H/W implementation is needed for time-consuming modules