Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 11
Contents: Dynamic Networks (Cont.); Static Networks (Revisited); Performance Analysis
Multistage Interconnection Networks (figure: stages of switches separated by inter-stage connection patterns ISC1, ISC2, …, ISCn)
Perfect-Shuffle Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, $P(x) = \{a_{n-1}, \ldots, a_2, a_1, a_n\}$
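The perfect-shuffle definition above amounts to a one-bit left rotation of the address. A minimal Python sketch, assuming addresses are stored as integers with $a_1$ as the least significant bit (the function name `perfect_shuffle` is chosen here for illustration and is not from the slides):

```python
def perfect_shuffle(x: int, n: int) -> int:
    """Rotate the n-bit address x left by one bit: a_n moves to the a_1 position."""
    msb = (x >> (n - 1)) & 1           # the leading bit a_n
    mask = (1 << n) - 1                # keep only n bits after shifting
    return ((x << 1) & mask) | msb

if __name__ == "__main__":
    n = 3
    for x in range(2 ** n):
        print(f"{x:0{n}b} -> {perfect_shuffle(x, n):0{n}b}")
```

Applying the function n times returns every address to itself, which is a quick sanity check on the rotation.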
Perfect Shuffle Example (figure: 3-bit addresses 000 to 111)
Perfect-Shuffle (figure)
Exchange Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, $E_i(x) = \{a_n, a_{n-1}, \ldots, \bar{a}_i, \ldots, a_2, a_1\}$, that is, bit $a_i$ is complemented and all other bits are unchanged. The slide's example uses $E_3$.
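The exchange function complements a single address bit. A minimal Python sketch, assuming integer addresses with $a_1$ as the least significant bit (the name `exchange` is illustrative, not from the slides):

```python
def exchange(x: int, i: int) -> int:
    """Complement bit a_i of address x, where a_1 is the least significant bit."""
    return x ^ (1 << (i - 1))

if __name__ == "__main__":
    n = 3
    for x in range(2 ** n):
        # E_3 flips the leading bit of a 3-bit address.
        print(f"{x:0{n}b} -> {exchange(x, 3):0{n}b}")
```

Because XOR with the same mask undoes itself, applying `exchange` twice with the same `i` returns the original address.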
Exchange E (figure: address 110)
Exchange E (figure)
Butterfly Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, $B(x) = \{a_1, a_{n-1}, \ldots, a_2, a_n\}$, that is, the most and least significant bits are swapped.
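The butterfly function swaps the two end bits of the address and leaves the middle bits alone. A minimal Python sketch under the same assumption as before, integer addresses with $a_1$ as the least significant bit (the name `butterfly` is illustrative):

```python
def butterfly(x: int, n: int) -> int:
    """Swap the most significant bit a_n and least significant bit a_1 of an n-bit address."""
    msb = (x >> (n - 1)) & 1
    lsb = x & 1
    if msb == lsb:
        return x                        # swapping equal bits changes nothing
    return x ^ ((1 << (n - 1)) | 1)     # flip both end bits to swap them

if __name__ == "__main__":
    n = 3
    for x in range(2 ** n):
        print(f"{x:0{n}b} -> {butterfly(x, n):0{n}b}")
```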
Butterfly Example (figure: 3-bit addresses 000 to 111)
Butterfly (figure)
Multi-stage network (figure)
MIN (cont.): An 8×8 Banyan network (figure)
MIN Implementation: Control (X), Source (S), Destination (D); X = f(S, D)
Example (figure: a 2×2 switch with inputs A, B and outputs C, D, shown for X = 0 (crossed) and X = 1 (straight))
Consider this MIN (figure: sources S1 to S8, destinations D1 to D8, stages 1 to 3)
Example (Cont.): Let the control variables be $X_1, X_2, X_3$. Find the values of $X_1, X_2, X_3$ to connect: S1 → D6, S7 → D5, S4 → D1
The 3 connections (figure: sources S1 to S8, destinations D1 to D8, stages 1 to 3)
Boolean Functions: $X = x_1, x_2, x_3$; $S = s_1, s_2, s_3$; $D = d_1, d_2, d_3$. Find X = f(S, D)
Crossbar Switch (figure: memories M1 to M8, processors P1 to P8)
Analysis and performance metrics: dynamic networks

Network        Delay      Cost         Blocking  Degree of FT
Bus            O(N)       O(1)         Yes       0
Multiple-bus   O(mN)      O(m)         Yes       (m-1)
MIN            O(log N)   O(N log N)   Yes       0
Crossbar       O(1)       O(N^2)       No        0
Static Network Analysis (Revisited): Graph Representation; Parameters: Cost, Degree, Diameter, Fault tolerance
Graph Review: G = (V, E), where V is the set of nodes and E the set of edges. Directed vs. undirected. Weighted graphs. Path, path length, shortest path. Cycles, cyclic vs. acyclic. Connectivity: connected, weakly connected, strongly connected, fully connected.
Linear Array: N nodes, N-1 edges. To determine: node degree, diameter, cost, fault tolerance.
Ring: N nodes, N edges. To determine: node degree, diameter, cost, fault tolerance.
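The parameters left open for the linear array and the ring can be checked empirically on small instances. The sketch below builds both topologies as adjacency lists and measures edge count, maximum node degree, and diameter by breadth-first search. All function names are illustrative, and nodes are assumed to be numbered 0 to N-1.

```python
from collections import deque

def linear_array(n):
    """Adjacency lists for n nodes joined in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def ring(n):
    """Adjacency lists for a ring of n nodes (assumes n >= 3)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def eccentricity(adj, start):
    """Longest shortest-path distance from start, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)

def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())

def edge_count(adj):
    # Each undirected edge appears in two adjacency lists.
    return sum(len(nbrs) for nbrs in adj.values()) // 2

if __name__ == "__main__":
    for name, build in (("linear array", linear_array), ("ring", ring)):
        g = build(8)
        print(name, edge_count(g), max_degree(g), diameter(g))
```

Running it for several values of N suggests how each parameter grows with N before deriving the closed forms by hand.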
Chordal Ring: N nodes, N edges. To determine: node degree, diameter, cost, fault tolerance.
Barrel Shifter: number of nodes $N = 2^n$. Start with a ring, then add extra edges from each node to the nodes at a power-of-2 distance: nodes i and j are connected if $|j - i| = 2^r$, $r = 0, 1, 2, \ldots, n-1$.
Mesh and Torus ($N = n \times n$). Mesh: node degree 4 for internal nodes, 3 or 2 for the others; diameter $2(n-1)$. Torus: node degree 4; diameter $2\lfloor n/2 \rfloor$.
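The two diameter formulas can be verified by brute force on small grids. The sketch below assumes that the shortest path between two mesh nodes is their Manhattan distance, and that on a torus each coordinate may also travel the short way around the wraparound link. Function names are illustrative.

```python
from itertools import product

def mesh_distance(a, b):
    """Shortest-path length between grid nodes a and b in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_distance(a, b, n):
    """Shortest-path length in an n x n torus (wraparound in both dimensions)."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q in zip(a, b))

def diameters(n):
    """Return (mesh diameter, torus diameter) for an n x n network."""
    nodes = list(product(range(n), repeat=2))
    mesh = max(mesh_distance(a, b) for a in nodes for b in nodes)
    torus = max(torus_distance(a, b, n) for a in nodes for b in nodes)
    return mesh, torus

if __name__ == "__main__":
    for n in range(2, 7):
        mesh, torus = diameters(n)
        print(n, mesh, 2 * (n - 1), torus, 2 * (n // 2))
```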
Hypercubes: $N = 2^d$, d dimensions ($d = \log_2 N$). A cube with d dimensions is made out of 2 cubes of dimension d-1. Symmetric. Degree, diameter, cost, fault tolerance. Node labeling: number of bits.
Hypercubes (figure: cubes of dimension d = 0, d = 1, d = 2, …)
Hypercubes (figure: d = 4)
Hypercube of dimension d: $N = 2^d$, $d = \log_2 N$; node degree = d; number of bits to label a node = d; diameter = d; number of edges = $N d / 2$. Hamming distance! Routing.
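The link between Hamming distance and routing can be made concrete: two hypercube nodes are neighbors when their labels differ in one bit, so a message can reach its destination by correcting the differing bits one at a time. The sketch below fixes bits from least to most significant, which is one common ordering chosen here as an assumption; the slides do not specify one. Function names are illustrative.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")

def route(src: int, dst: int, d: int) -> list:
    """Path from src to dst in a d-cube, correcting one differing bit per hop."""
    path = [src]
    cur = src
    for bit in range(d):
        if ((cur ^ dst) >> bit) & 1:    # labels still differ in this dimension
            cur ^= 1 << bit             # traverse the link along this dimension
            path.append(cur)
    return path

if __name__ == "__main__":
    path = route(0b000, 0b101, 3)
    print([f"{node:03b}" for node in path])
    print(hamming_distance(0b000, 0b101))
```

The number of hops always equals the Hamming distance between source and destination, and the largest possible value is d, which matches the diameter stated above.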
Subcubes and Cube Fragmentation: What is a subcube? Shared environment. Fragmentation problem. Is it similar to something you know?
Cube Connected Cycles (CCC): a k-cube has $2^k$ nodes. A k-CCC is built from a k-cube by replacing each vertex of the k-cube with a ring of k nodes, so a k-CCC has $k \cdot 2^k$ nodes. Degree 3, diameter 2k. Try it for a 3-cube.
k-ary n-Cube: d = cube dimension; k = number of nodes along each dimension; $N = k^d$; wraparound connections. Hypercube: binary d-cube. Torus: k-ary 2-cube.
Performance Evaluation: Grosch's Law; Moore's Law; Von Neumann's Bottleneck; Parallelism; Speedup; Amdahl's Law; The Gustafson-Barsis Law; Benchmarks
Grosch's Law (1960s): "To sell a computer for twice as much, it must be four times as fast." Vendors skip small speed improvements in favor of waiting for large ones. Buyers of expensive machines would wait for a twofold improvement in performance for the same price.
Moore's Law: Gordon Moore (cofounder of Intel). Processor performance would double every 18 months. This prediction has held for several decades. It is unlikely that single-processor performance will continue to increase indefinitely.
Von Neumann's bottleneck: Von Neumann was a great mathematician of the 1940s and 1950s. A single control unit connects a memory to a processing unit. Instructions and data are fetched one at a time from memory and fed to the processing unit. Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit.
Problem: Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch $10^9$ times per second? What would the diameter be if the switching requirement were … times per second?
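One way to approach the problem above is to assume that a signal must cross the full diameter once per switching period, so the diameter is the signal speed divided by the switching rate. The Python sketch below encodes that reading. The function name is illustrative, and the once-per-period crossing is an assumption about how the problem is meant to be interpreted.

```python
SIGNAL_SPEED_KM_PER_S = 300_000    # simplifying assumption stated in the problem

def max_chip_diameter_m(switches_per_second: float) -> float:
    """Largest diameter (meters) a signal can cross within one switching period."""
    speed_m_per_s = SIGNAL_SPEED_KM_PER_S * 1000
    return speed_m_per_s / switches_per_second

if __name__ == "__main__":
    # 3e8 m/s divided by 1e9 switches per second gives 0.3 m, i.e. 30 cm.
    print(max_chip_diameter_m(1e9))
```

Under this reading the diameter shrinks in direct proportion to the switching rate, so any other rate can be handled by calling the function with that value.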
Parallelism: multiple CPUs; within the CPU: one pipeline, multiple pipelines
Superscalar Parallelism Scheduling
Past Trends in Parallel Architecture (inside the box): completely custom-designed components (processors, memory, interconnects, I/O); longer R&D time (2-3 years); expensive systems; quickly becoming outdated; bankrupt companies!!
New Trends in Parallel Architecture (outside the box): advances in commodity processors and network technology; a network of PCs and workstations connected via LAN or WAN forms a parallel system; network computing; competes favorably (cost/performance); utilizes unused cycles of systems sitting idle
Speedup: S = Speed(new) / Speed(old); S = (Work/time(new)) / (Work/time(old)); S = time(old) / time(new); S = time(before improvement) / time(after improvement)
Speedup: time on one CPU: T(1); time on n CPUs: T(n); speedup: S = T(1)/T(n)
Amdahl's Law: The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
Example: a trip from A to B includes a stretch that must be walked, taking 20 hours, and a 200-mile stretch that can use any mode of travel. Walk: 4 miles/hour; Bike: 10 miles/hour; Car-1: 50 miles/hour; Car-2: … miles/hour; Car-3: … miles/hour.
Example (cont.): the 20-hour walking stretch is added to the travel time over the 200 miles. Walk, 4 miles/hour: 50 + 20 = 70 hours, S = 1; Bike, 10 miles/hour: 20 + 20 = 40 hours, S = 1.8; Car-1, 50 miles/hour: 4 + 20 = 24 hours, S = 2.9; Car-2, … miles/hour: … hours, S = 3.2; Car-3, … miles/hour: … hours, S = 3.4.
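The arithmetic of the travel example can be reproduced with a few lines of Python. The sketch uses only the figures that survive on the slide (a 20-hour stretch that must be walked, a 200-mile stretch, and speeds of 4, 10, and 50 miles per hour). Function and constant names are illustrative.

```python
MUST_WALK_HOURS = 20      # stretch that cannot be sped up
DISTANCE_MILES = 200      # stretch that can use a faster mode
WALK_SPEED_MPH = 4        # baseline mode

def total_hours(speed_mph: float) -> float:
    """Total trip time when the 200-mile stretch is covered at speed_mph."""
    return MUST_WALK_HOURS + DISTANCE_MILES / speed_mph

def speedup(speed_mph: float) -> float:
    """Speedup of the whole trip relative to walking the 200 miles."""
    return total_hours(WALK_SPEED_MPH) / total_hours(speed_mph)

if __name__ == "__main__":
    for mode, speed in (("Walk", 4), ("Bike", 10), ("Car-1", 50)):
        print(mode, total_hours(speed), round(speedup(speed), 1))
```

However fast the vehicle, the total time never drops below the 20 walking hours, so the speedup stays below 70 / 20 = 3.5. That ceiling is the point of the example.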
Amdahl's Law (1967): $\alpha$ is the fraction of the program that is naturally serial; $(1-\alpha)$ is the fraction of the program that is naturally parallel.
With $\alpha$ the naturally serial fraction of the program:

$$S = \frac{T(1)}{T(N)}, \qquad T(N) = T(1)\,\alpha + \frac{T(1)\,(1-\alpha)}{N}$$

$$S = \frac{1}{\alpha + \dfrac{1-\alpha}{N}} = \frac{N}{\alpha N + (1-\alpha)}$$
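The Amdahl speedup formula is easy to explore numerically. The sketch below writes the serial fraction as `alpha` (the symbol name is an assumption, since the slide's own symbol did not survive extraction) and the processor count as `n`. The function name is illustrative.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup on n processors when a fraction alpha of the work is serial."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

if __name__ == "__main__":
    alpha = 0.1
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(alpha, n), 2))
```

As `n` grows the speedup approaches `1 / alpha`, so a program that is 10% serial can never run more than 10 times faster, no matter how many processors are added.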
Amdahl's Law (figure)
Gustafson-Barsis Law: $N$ and $\alpha$ are not independent of each other. $T(N) = 1$, $T(1) = \alpha + (1-\alpha)N$, so $S = N - (N-1)\alpha$, where $\alpha$ is the fraction of the program that is naturally serial.
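The Gustafson-Barsis speedup can be computed the same way. The sketch below writes the serial fraction as `alpha` (an assumed symbol name) and checks that the closed form equals the ratio of T(1) to T(N) when the parallel run time is normalized to 1. Function names are illustrative.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speedup S = N - (N - 1) * alpha for serial fraction alpha."""
    return n - (n - 1) * alpha

def scaled_time_ratio(alpha: float, n: int) -> float:
    """T(1) / T(N) with T(N) = 1 and T(1) = alpha + (1 - alpha) * N."""
    return (alpha + (1.0 - alpha) * n) / 1.0

if __name__ == "__main__":
    alpha = 0.1
    for n in (1, 10, 100, 1000):
        print(n, round(gustafson_speedup(alpha, n), 2))
```

Unlike the fixed-size Amdahl case, this speedup keeps growing linearly with `n`, because the parallel part of the workload is assumed to scale with the number of processors.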
Gustafson-Barsis Law (figure)
Distributed Computing Performance: Single Program Performance; Multiple Program Performance
Benchmark Performance: Serial Benchmarks; Parallel Benchmarks; PERFECT Benchmarks; NAS Kernel; The SLALOM; The Gordon Bell Prize; WebSTONE for the Web; Performance Comparisons