Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering
Advanced Computer Architecture
Lecture 22
  Distributed computer
  Interconnection networks

Symmetric multiprocessor (SMP)
  [Block diagram labels: Memory, Memory, ..., CPU, Memory Controller, CPU, I/O Hub/Bridge, Keyboard, Mouse, Monitor, BIOS, Ethernet, Power Supply, Cooling Fan]
  One address space, uniform access time

Coherency requirements
  1. Memory operations occur in the order they were issued
  2. All reads return the most current value

MESI protocol
  [Table headings: Controller Observes/Action; Bus/snoop generated; Processor generated]

How can an SMP go faster?
  More CPUs
  Faster CPUs
  Bigger cache
  Optimize OS
  Better scheduling

Speedup vs. processors
  [Plot: Speedup against Number of processors; curves: Ideal, Actual]

What limits SMP speedup?
  Competing for resources: memory
  Bus performance
  Sequential program
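
The "sequential program" limit can be made concrete with Amdahl's law. The slide does not name this law, so treat the following as a minimal sketch under that assumption: the program has a fixed serial fraction, and memory and bus contention are ignored. The function name and the serial fraction of 0.1 are illustrative only.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part does not shrink; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative serial fraction (an assumption, not a measured value).
    for p in (1, 2, 4, 8, 16, 64):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

With a serial fraction of 0.1 the speedup can never reach 10, however many processors are added, which is one way to read the gap between the ideal and actual curves.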

Distributed system
  Goal
    – Connect processors (nodes) together to create a fast computer
    – Each node works on part of the problem
  Node
    – SMP processor, cache coherent
    – No global or shared memory
  Interconnect: network for node-to-node communication

Distributed computer
  No global or shared memory
  [Diagram: P/M (processor/memory) nodes attached to the Interconnect]

Distributed application
  Identify parallelism: often the same algorithm for each node
  Map data set to node ID
  Determine communication requirements
    – Synchronization
    – Data exchange of intermediate results
    – No shared data, so must use message passing paradigms
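
One common way to "map data set to node ID" is a block partition, where each node owns a contiguous slice of the data. The slide does not say which mapping is intended, so the sketch below is only one possibility; `block_range` and its parameters are hypothetical names chosen for illustration.

```python
def block_range(node_id: int, num_nodes: int, num_items: int) -> range:
    """Return the contiguous slice of item indices owned by node_id."""
    if not 0 <= node_id < num_nodes:
        raise ValueError("node_id out of range")
    base, extra = divmod(num_items, num_nodes)
    # The first `extra` nodes take one additional item,
    # so slice sizes differ by at most one.
    start = node_id * base + min(node_id, extra)
    size = base + (1 if node_id < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    for node in range(4):
        print(node, list(block_range(node, 4, 10)))
```

Every node runs the same function with its own ID, which matches the "same algorithm for each node" pattern; intermediate results on slice boundaries would still have to be exchanged by messages.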

Parallel programming
  Four key steps: done by programmer and OS

How do you communicate fast?
  Auctioneer
  Direct bus between nodes
  Guess the response before receiving it

Message passing
  [Timing diagram between P1 and P2. Labels: Send, Recv, Reply, Request Message, Reply Message, Blocked, Latency, Time]
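
A blocking request/reply exchange of the kind labelled in the diagram can be sketched with two threads standing in for P1 and P2. This is an illustration only: the in-process queues are an assumption that replaces the interconnect, and the "work" done by P2 (doubling the payload) is hypothetical.

```python
import queue
import threading


def p2_server(to_p2: queue.Queue, to_p1: queue.Queue) -> None:
    """P2: block in recv, handle one request, send the reply."""
    message = to_p2.get()        # Recv: blocks until the request arrives
    to_p1.put(message * 2)       # Reply: hypothetical work on the payload


def request_reply(payload: int) -> int:
    """P1: send a request, then block until the reply arrives."""
    to_p2: queue.Queue = queue.Queue()
    to_p1: queue.Queue = queue.Queue()
    p2 = threading.Thread(target=p2_server, args=(to_p2, to_p1))
    p2.start()
    to_p2.put(payload)               # Send the request message
    result = to_p1.get(timeout=5)    # P1 is blocked here for the round trip
    p2.join()
    return result


if __name__ == "__main__":
    print(request_reply(21))
```

The time P1 spends inside `to_p1.get` corresponds to the "Blocked" interval; on a real distributed machine the nodes share no memory, so the queues would be replaced by the message-passing layer.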

Message latency
  $O_s$ = overhead to inject message
  $L$ = time to traverse network
  $O_r$ = overhead to extract message
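
The slide defines the three terms but the formula combining them did not survive extraction. A natural reading, stated here as an assumption, is that one-way latency is their sum, and that a request/reply round trip costs two one-way messages when both directions are symmetric. The numbers used below are illustrative, in arbitrary time units.

```python
def one_way_latency(send_overhead: float, network_time: float,
                    receive_overhead: float) -> float:
    """Assumed model: latency = O_s + L + O_r."""
    return send_overhead + network_time + receive_overhead


def round_trip_latency(send_overhead: float, network_time: float,
                       receive_overhead: float) -> float:
    """Request plus reply, assuming both directions cost the same."""
    return 2 * one_way_latency(send_overhead, network_time, receive_overhead)


if __name__ == "__main__":
    # Illustrative values only (not measurements).
    print(one_way_latency(2.0, 5.0, 3.0))
    print(round_trip_latency(2.0, 5.0, 3.0))
```

In this model the software overheads are paid on every message regardless of how fast the network is, which is why low OS overhead matters alongside network speed.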

Interconnection network
  Interconnect critical to communication
  [Diagram: P/M (processor/memory) nodes attached to the Interconnect]

Interconnect goals?
  Low latency
  High bandwidth
  Direct route
  Low cost, low power
  Reliable

Interconnect properties
  Routing distance: number of links on a route
  Diameter: maximum routing distance
  Average distance: average number of links on a route
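
These definitions can be checked on any small topology with a breadth-first search. The sketch below assumes shortest-path routing, a connected network with at least two nodes, and an average taken over ordered pairs of distinct nodes; the slide does not state which averaging convention it uses. All function names are hypothetical.

```python
from collections import deque


def hop_counts(adjacency: dict, source: int) -> dict:
    """Shortest routing distance (in links) from source to every node."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                frontier.append(neighbor)
    return dist


def diameter_and_average(adjacency: dict) -> tuple:
    """Return (diameter, average distance) over distinct ordered pairs."""
    distances = []
    for source in adjacency:
        hops = hop_counts(adjacency, source)
        distances.extend(h for node, h in hops.items() if node != source)
    return max(distances), sum(distances) / len(distances)


def linear_array(n: int) -> dict:
    """Adjacency lists for n nodes connected in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    print(diameter_and_average(linear_array(4)))
```

For a four-node linear array the longest route is end to end, three links, so the diameter is 3.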

Interconnect performance
  Latency (ms): overhead, routing delay, capacity, contention
  Bandwidth (B/s)
    – Total: sum of all links
    – Bisection: sum of links that cut the network in half

Interconnect topologies
  Linear array and rings
  Meshes and tori
  Trees
  Hypercubes

Linear array and rings
  [Figure panels: Linear array; Ring; Ring with short links]

Find the following? (Ring, n nodes)
  1. Diameter: $n/2$
  2. Average distance: $n/4$
  3. Bisection BW: 2
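
The ring answers can be sanity-checked by direct counting. The sketch assumes an even number of nodes, bidirectional links, shortest-path routing, and bisection bandwidth measured in links of unit bandwidth; `ring_metrics` is a hypothetical helper, not something from the lecture.

```python
def ring_distance(a: int, b: int, n: int) -> int:
    """Links on the shorter way around an n-node ring."""
    d = abs(a - b) % n
    return min(d, n - d)


def ring_metrics(n: int) -> tuple:
    """Return (diameter, average distance, bisection links) for even n."""
    # By symmetry, distances measured from node 0 are representative.
    hops = [ring_distance(0, b, n) for b in range(1, n)]
    diameter = max(hops)
    average = sum(hops) / len(hops)
    # Split nodes into [0, n/2) and [n/2, n); count links that cross the cut.
    half = n // 2
    bisection = sum(
        1 for i in range(n)
        if (i < half) != (((i + 1) % n) < half)
    )
    return diameter, average, bisection


if __name__ == "__main__":
    print(ring_metrics(8))
    print(ring_metrics(64))
```

The exact average over distinct nodes is slightly above $n/4$ for small rings (16/7 for n = 8) and approaches $n/4$ as n grows, so the slide's value is best read as the large-n approximation.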

Mesh and torus
  [Figure panels: 2-D mesh; 2-D torus]
  Both fairly easy to implement

Example mesh
  Paragon: 1824 nodes, 16 x 114 mesh

Find the following? (Square mesh, n nodes)
  1. Diameter: $2(n^{1/2} - 1)$
  2. Average distance: $(n^{1/2} - 1)$
  3. Bisection BW: $n^{1/2}$
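
The same kind of brute-force check works for a square mesh with k nodes per side (n = k x k). The sketch assumes an even k, no wraparound links, shortest-path (Manhattan) routing, and bisection bandwidth counted in unit-bandwidth links; `mesh_metrics` is a hypothetical helper.

```python
from itertools import product


def mesh_metrics(k: int) -> tuple:
    """Return (diameter, average distance, bisection links) for a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    # Without wraparound links, the shortest route is the Manhattan distance.
    hops = [
        abs(r1 - r2) + abs(c1 - c2)
        for (r1, c1) in nodes
        for (r2, c2) in nodes
        if (r1, c1) != (r2, c2)
    ]
    diameter = max(hops)
    average = sum(hops) / len(hops)
    # Cut between columns k/2 - 1 and k/2: only horizontal links cross it.
    half = k // 2
    bisection = sum(
        1 for r in range(k) for c in range(k - 1)
        if c < half <= c + 1
    )
    return diameter, average, bisection


if __name__ == "__main__":
    print(mesh_metrics(4))
```

For a 4 x 4 mesh the count gives a diameter of 6 and a bisection of 4 links, matching $2(n^{1/2} - 1)$ and $n^{1/2}$. The counted average over distinct ordered pairs is 8/3, somewhat below $n^{1/2} - 1 = 3$, so the average-distance entry is best read as a rough estimate.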

Multi-dimensional meshes
  [Figure panels: 6 x 3 x 2, n = 36; 3 x 3 x 3 x 3, n = 81]
  May require long wires between nodes

How can an SMP go faster?
  Faster clock (smaller feature size, )
  More processors, 64-bit
  Larger, faster shared memory
  Larger, faster local cache memory
  Wider, faster, shorter shared bus

What limits SMP speedup?
  Physics
    – IC technology (speed)
    – Time of flight across bus
  Cost
    – Power
    – Memory
    – Disks
  Key limiter

How do you communicate fast?
  Low latency between nodes
  High bandwidth between nodes
  Low OS overhead

Interconnect goals?
  Latency as small as possible
  As many concurrent transfers as possible
    – operation bandwidth
    – data bandwidth
  Cost as low as possible

Find the following? (Square mesh, n nodes)
  1. Diameter: $2(n^{1/2} - 1)$
  2. Average distance: n 1/
  3. Bisection BW: $n^{1/2}$