University of Mannheim
ATOLL: ATOmic Low Latency
A high-performance, low-cost SAN
Patrick R. Haspel
Computer Architecture Group, University of Mannheim, Germany
Cluster Computing
- Cluster Computing is emerging as a new approach to High Performance Computing because of its superior price/performance ratio
- The key to Cluster Computing is a SAN that delivers the communication performance normally found in supercomputers
- Several SANs have been developed in recent years: ServerNet, Memory Channel, QsNet, SCI
ATOLL Basic Architecture
- ATOLL chip: 4.5 million transistors
- 0.18 µm CMOS process
- 5.7 x 5.7 mm chip
- Fastest and second-biggest design of a European university
Optimization for Performance and Cost
ATOLL Latency
- Only 27 clock cycles (~100 ns) latency per hop
- Test system: P (Serverworks), PCI 66 MHz/64-bit
- Measured HW latency *
* Sampling granularity of the PCI bus is ~500 ns
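The per-hop figure on this slide can be turned into two quick estimates: the clock rate it implies and the hardware latency of a multi-hop path. The following is a minimal sketch of that arithmetic, assuming latency adds up linearly per hop with no other overhead. The function names are illustrative and not part of any ATOLL software, and because the slide gives only "~100 ns", the resulting clock rate is a rough estimate rather than a specification.

```python
CYCLES_PER_HOP = 27          # from the slide
LATENCY_PER_HOP_NS = 100.0   # approximate value from the slide


def implied_clock_mhz(cycles=CYCLES_PER_HOP, latency_ns=LATENCY_PER_HOP_NS):
    """Clock frequency (MHz) implied by `cycles` taking `latency_ns` nanoseconds."""
    # cycles per nanosecond equals GHz; multiply by 1000 to get MHz
    return cycles * 1000.0 / latency_ns


def path_latency_ns(hops, latency_ns=LATENCY_PER_HOP_NS):
    """Hardware latency over `hops` hops, assuming a purely additive per-hop cost."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * latency_ns


if __name__ == "__main__":
    print(f"implied clock: ~{implied_clock_mhz():.0f} MHz")
    for hops in (1, 2, 4, 8):
        print(f"{hops} hop(s): ~{path_latency_ns(hops):.0f} ns")
```

Under this assumption even an eight-hop path stays below 1 µs of hardware latency, which is close to the ~500 ns sampling granularity of the PCI bus mentioned in the footnote.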
ATOLL Performance: DMA-Mode Test
- Test system: P (Serverworks), PCI 66 MHz/64-bit
- 240-byte message, stages in the timing diagram (SW send, SW receive): 4 µs, 3.8 µs, 1.2 µs; sum 9 µs
- Not fully optimized yet
- 533 MB/s write burst rate
- 137 MB/s read burst rate (bridge problem with stop)
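The numbers on this slide can be cross-checked with a little arithmetic. The sketch below assumes the nominal PCI clock of 66.67 MHz (200/3 MHz) and a 64-bit data path, and it treats the three stage times as an unlabeled list because the slide text does not make clear which value belongs to which stage. All names are illustrative.

```python
PCI_CLOCK_MHZ = 200 / 3        # nominal "66 MHz" PCI clock (66.67 MHz), an assumption
PCI_WIDTH_BITS = 64

STAGE_LATENCIES_NS = [4000, 3800, 1200]   # 4 µs, 3.8 µs, 1.2 µs from the slide
MESSAGE_BYTES = 240

WRITE_BURST_MB_S = 533         # measured, from the slide
READ_BURST_MB_S = 137          # measured, from the slide


def pci_peak_mb_per_s(clock_mhz, bus_width_bits):
    """Theoretical peak PCI bandwidth in MB/s: one bus-width transfer per clock."""
    return clock_mhz * bus_width_bits / 8


def total_latency_us(stages_ns):
    """Sum of the stage latencies, converted from ns to µs."""
    return sum(stages_ns) / 1000


if __name__ == "__main__":
    peak = pci_peak_mb_per_s(PCI_CLOCK_MHZ, PCI_WIDTH_BITS)
    print(f"PCI peak: ~{peak:.0f} MB/s")
    print(f"read/write burst ratio: {READ_BURST_MB_S / WRITE_BURST_MB_S:.0%}")
    total = total_latency_us(STAGE_LATENCIES_NS)
    print(f"end-to-end latency for {MESSAGE_BYTES} bytes: {total} µs")
```

The write burst rate on the slide matches the theoretical peak of the bus, while the read burst rate reaches only about a quarter of it, consistent with the bridge problem noted on the slide.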
ATOLL Performance
A module has been developed in collaboration with the University of Mannheim to evaluate their ATOLL network cards. This experimental hardware delivers the best performance for messages smaller than 10 kB, and matches the 2 Gbps throughput seen with many proprietary solutions like SCI and Myrinet.
ATOLL Software
- Components in the stack diagram: User Application; MPI, PVM, TCP/IP; ATOLL API; Kernel Driver; ATOLL HW
- ATOLL daemon:
  - Controls network startup (clock distribution, routing)
  - Supervises NIC at runtime
  - Provides routing information
- Open Source SW
Future Development
- Future of ATOLL hardware development: EXTOLL
- MHz clock
- Higher-dimensional crossbar for multidimensional IN structures
- Multithreaded, cached host interface
- Memory management support
- Command extension for direct memory operations (put, get, …) => MPI-2
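The direct memory operations mentioned on this slide correspond to the one-sided communication model of MPI-2 (MPI_Put, MPI_Get on a window created with MPI_Win_create). In that model the initiator reads or writes remote memory without the target posting a matching receive. The toy model below only illustrates those semantics inside a single process. The `Window` class is hypothetical: it is neither the EXTOLL command set nor the MPI API.

```python
class Window:
    """Toy stand-in for a memory region that a node exposes for remote access."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def _check(self, offset, length):
        # Reject accesses that fall outside the exposed region.
        if offset < 0 or length < 0 or offset + length > len(self.mem):
            raise IndexError("access outside the exposed window")

    def put(self, offset, data):
        """Write `data` into the window at `offset` (initiator-driven)."""
        self._check(offset, len(data))
        self.mem[offset:offset + len(data)] = data

    def get(self, offset, length):
        """Read `length` bytes from the window at `offset` (initiator-driven)."""
        self._check(offset, length)
        return bytes(self.mem[offset:offset + length])


if __name__ == "__main__":
    target = Window(16)          # memory exposed by the "remote" node
    target.put(4, b"ATOLL")      # the target takes no action of its own
    print(target.get(4, 5))
```

The point of the sketch is that only the initiator issues commands. Supporting this in the network interface avoids involving the target's CPU in each transfer, which is the fit with MPI-2 that the slide refers to.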
Chip Photo
Chip Photo
ATOLL Board
Interconnect