Presented by: Quinn Gaumer CPS 221
16,384 Processing Nodes (32 MHz)
30 m x 30 m
Teraflop
1992
With 16,384 processors, the interconnect plays a large role
3 Types of Networks
  ◦ Data
  ◦ Control
  ◦ Diagnostic
Easily Attainable High Performance
Scaling
Data Parallel Programming
High Reliability and Availability
Space/Time Shared
Fast Time to Market
Modular
Include
  ◦ Control Processor
  ◦ Processing Nodes
  ◦ Slices of Data and Control Networks
Privileged vs. Non-Privileged
Program Isolation
Time Sharing
Provide Simple View of Network to Processors
Sharing and Fault Tolerance
Decouple Network/Processor by Providing Contract
  ◦ Software -> ISA -> Hardware
“The data network promises to eventually accept and deliver all messages injected into the network by the processors as long as the processors promise to eventually eject all messages from the network when they are delivered to the processors.”
Collection of Memory Mapped FIFOs
  ◦ Outgoing/Incoming
Restricted Operations
  ◦ Implemented with protected pages
Physical/Relative (Virtual) Address
  ◦ Programs use only relative addresses
Network Independent of User
  ◦ Delivery guaranteed by network, not processing node
  ◦ Requires network diagnostics
Fat Tree Structure
  ◦ Closer to the root, thicker the tree
  ◦ Ensures no bottlenecks at root
User Partitions and I/O are Sub-trees
  ◦ Guarantees network independence
  ◦ Messages in partition stay within partition
Many Optimal Node to Node Paths
  ◦ Choose randomly among open links
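
The routing idea on this slide can be sketched in a few lines: a message climbs only as far as the smallest sub-tree containing both endpoints, picking any open upward link at random, and then descends along the single path to its destination. This is a minimal sketch, not the machine's actual router. The branching factor `ARITY`, the leaf numbering, and the `up_links_per_level` parameter are assumptions made for illustration.

```python
import random

ARITY = 4  # assumed branching factor, chosen only for illustration


def lca_height(src, dst, arity=ARITY):
    """Levels a message must climb before src and dst share a sub-tree."""
    height = 0
    while src != dst:
        src //= arity
        dst //= arity
        height += 1
    return height


def route(src, dst, up_links_per_level, rng=random):
    """Return (up_choices, down_digits) for a message from leaf src to leaf dst.

    up_links_per_level[k] is the (hypothetical) number of parent links
    available at level k; it grows toward the root in a fat tree.
    """
    height = lca_height(src, dst)
    # Going up: every parent link is equally good, so pick one at random.
    up_choices = [rng.randrange(up_links_per_level[k]) for k in range(height)]
    # Going down: the path is fixed by the base-ARITY digits of dst.
    down_digits = [(dst // ARITY ** k) % ARITY for k in reversed(range(height))]
    return up_choices, down_digits
```

Because the climb stops at the lowest common sub-tree, a message between two nodes of the same partition never leaves that partition's sub-tree, which is the isolation property stated above.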
Data can be only 1-5 Words
Wormhole Routing
CRC Checking done at every Link
  ◦ Additional !CRC sent when error first found
Primary Errors allow Diagnostic Network to Determine location
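
One way to read the !CRC bullet: the first link that sees a bad checksum records a primary error and forwards the complemented CRC, so every later link can tell that the error was already found upstream. The sketch below illustrates that reading under stated assumptions. It uses Python's `zlib.crc32` as a stand-in for the hardware CRC, and the status names `ok`, `primary`, and `secondary` are labels chosen here.

```python
import zlib

MASK = 0xFFFFFFFF  # keep values in 32 bits


def check_link(payload, crc):
    """Check one message at one link; return (status, crc_to_forward)."""
    good = zlib.crc32(payload) & MASK
    if crc == good:
        return "ok", crc
    if crc == (~good & MASK):
        # Complemented CRC: an upstream link already reported this error.
        return "secondary", crc
    # First link to notice the damage: report it and mark the message.
    return "primary", ~good & MASK
```

Only the link reporting `primary` sits next to the fault, which is what lets a diagnostic pass narrow down the location.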
Message Counters at every Link
Kirchhoff’s Law to Determine Missing Messages
What to do with a Bad Chip or Link?
  ◦ Route Messages Away from Failure
  ◦ Map Out Nearby Processors
  ◦ Which is better? Both.
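
The Kirchhoff's-law check amounts to a conservation test: once the network is quiet, the messages counted into a switch should equal the messages counted out. A minimal sketch, assuming the per-link counters have already been read out into a dictionary (the `counters` layout and node names are hypothetical):

```python
def lossy_nodes(counters):
    """Map each node name to its missing-message count, if nonzero.

    counters maps a node name to a pair
    (messages_in_per_link, messages_out_per_link).
    """
    report = {}
    for name, (counts_in, counts_out) in counters.items():
        missing = sum(counts_in) - sum(counts_out)
        if missing != 0:
            report[name] = missing
    return report


counters = {
    "sw0": ([10, 12], [22]),  # balanced
    "sw1": ([7, 8], [14]),    # one message went in and never came out
}
print(lossy_nodes(counters))
```

The check is only meaningful when no messages are still in flight, since a message sitting inside a switch also makes the counts disagree.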
Solution: Virtual Channels
  ◦ One channel each for request and response
  ◦ 4 channels per chip (Incoming and Outgoing)
Deadlock still possible!
  ◦ User sends but never attempts to receive messages
  ◦ Higher level languages to implement communication protocol
Objectives
  ◦ Clear all messages for new user
  ◦ Allow all messages in transit to eventually finish
“All Fall Down” Method
  ◦ Evenly misroute all messages in transit to nodes
  ◦ Message saved at node
  ◦ Resent when swapped in
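
The swap-out and swap-in steps can be modeled as follows. This is a toy sketch, not the hardware mechanism: round-robin assignment stands in for "evenly misroute", and the message tuples and function names are assumptions made for illustration.

```python
def all_fall_down(in_transit, num_nodes):
    """Swap-out: drop every in-transit message onto some node and save it there."""
    saved = {node: [] for node in range(num_nodes)}
    for i, message in enumerate(in_transit):
        saved[i % num_nodes].append(message)  # spread the load evenly
    return saved


def swap_in(saved):
    """Swap-in: each node re-injects the messages it was holding."""
    return [message for node in sorted(saved) for message in saved[node]]


in_transit = [(3, "a"), (0, "b"), (2, "c"), (1, "d"), (3, "e")]  # (dest, data)
saved = all_fall_down(in_transit, num_nodes=4)
assert sorted(swap_in(saved)) == sorted(in_transit)  # nothing is lost
```

After `all_fall_down` the network holds no messages, so the next user starts clean, and every saved message still carries its original destination for when the old user is swapped back in.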
Control Processor broadcasts program
  ◦ Not instructions (SIMD)
Each Processor runs program on data set
Inter-Processor Communication
  ◦ Hardware Barriers allow for processes to communicate without shared semaphores
Program smaller than instructions
  ◦ Easier to deliver
Local fetch allows commodity processors
  ◦ Fast new RISC processors, less R&D
Control system useful for other problems
Execution of generic MIMD code
  ◦ Message passing
Broadcasting
  ◦ User/Supervisor
  ◦ Interrupt
  ◦ Utility
Combining
  ◦ Reduction
  ◦ Forward/Backward Scan
  ◦ Router Done
Global Operations
  ◦ Synchronous/Asynchronous OR
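
For readers unfamiliar with the combining operations, here is what reduction and the two scans compute, written as ordinary sequential Python. It is a sketch of the results only, not of how the control network produces them. The scans are assumed to be inclusive, and `reduce_all` mimics a pairwise tree combine over a non-empty list.

```python
import operator
from itertools import accumulate


def reduce_all(values, op=operator.add):
    """Combine one value per node pairwise, level by level, up a binary tree."""
    level = list(values)  # must be non-empty
    while len(level) > 1:
        level = [
            op(level[i], level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]


def forward_scan(values, op=operator.add):
    """Node i receives the combination of values 0..i."""
    return list(accumulate(values, op))


def backward_scan(values, op=operator.add):
    """Node i receives the combination of values i..n-1 (associative op assumed)."""
    return list(accumulate(reversed(values), op))[::-1]


print(reduce_all([1, 2, 3, 4, 5]))   # 15
print(forward_scan([1, 2, 3, 4]))    # [1, 3, 6, 10]
print(backward_scan([1, 2, 3, 4]))   # [10, 9, 7, 4]
```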
Binary Tree
Four Types of Packets
  ◦ Single Source: Broadcasting
  ◦ Multiple Source: Combining
  ◦ Idle: Filler
  ◦ Abstain: Allow control node to skip waiting
Collisions on Network
  ◦ Multiple/Multiple: Buffering based on arrival time
  ◦ Multiple/Single: Single Source Packets Prioritized
  ◦ Single/Single: Error
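
The three collision rules above can be written as a small decision function. This is only a sketch of the rules as listed on the slide. The packet representation `(kind, arrival_time)` and the function name are assumptions.

```python
SINGLE = "single-source"
MULTIPLE = "multiple-source"


def resolve_collision(a, b):
    """Return two colliding packets in the order they should be served.

    Each packet is a (kind, arrival_time) tuple.
    """
    kinds = (a[0], b[0])
    if kinds == (SINGLE, SINGLE):
        # Two broadcasts meeting at one switch is treated as an error.
        raise ValueError("two single-source packets collided")
    if kinds == (MULTIPLE, MULTIPLE):
        # Both are combining packets: buffer them by arrival time.
        return sorted([a, b], key=lambda packet: packet[1])
    # Mixed case: the single-source packet goes first.
    return [a, b] if a[0] == SINGLE else [b, a]
```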
Control Processor for each Partition
  ◦ Executes scalar code while processing nodes execute parallel code
Connect any Control Processor to any Partition
  ◦ Problems can occur in control networks too
  ◦ Diagnostics may show part of control network must be mapped out
Binary Network
  ◦ Pods (physical subsystem) are leaves
JTAG
  ◦ Designed for Multichip…but serial
Do JTAG for each Pod
Combine Responses with OR/AND