Computer Science and Engineering Advanced Computer Architecture CSE 8383 February 21 2008 Session 6.

Contents
- Interconnection Networks (cont.)
  - Static (cont.)
  - Dynamic
- Performance Evaluations
  - Grosch’s Law
  - Moore’s Law
  - Von Neumann’s Bottleneck
  - Speedup
  - Amdahl’s Law
  - The Gustafson-Barsis Law

Hypercubes
- $N = 2^d$ nodes
- $d$ dimensions ($d = \log_2 N$)
- A cube with $d$ dimensions is made out of 2 cubes of dimension $d-1$
- Symmetric
- Degree, diameter, cost, fault tolerance
- Node labeling: number of bits

Hypercubes (figure): hypercubes of dimension d = 0, 1, 2, and 3, with nodes labeled in binary (0 and 1 for d = 1; 00 to 11 for d = 2; 000 to 111 for d = 3).

Hypercubes (figure): a hypercube of dimension d = 4 with 16 nodes labeled 0000 to 1111.

Hypercube of dimension d
- $N = 2^d$, $d = \log_2 N$
- Node degree = $d$
- Number of bits to label a node = $d$
- Diameter = $d$
- Number of edges = $N \cdot d / 2$
- Hamming distance!
- Routing
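The link between Hamming distance and routing can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function names are hypothetical, and the routing order (correcting differing bits from least to most significant) is one common choice that is assumed here.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Neighbors of a node in a d-cube: flip each of the d label bits once."""
    return [node ^ (1 << i) for i in range(d)]


def ecube_route(src: int, dst: int, d: int) -> list[int]:
    """Route from src to dst by correcting differing bits, lowest bit first (assumed order)."""
    path = [src]
    current = src
    for i in range(d):
        if (current ^ dst) & (1 << i):
            current ^= 1 << i          # cross the link along dimension i
            path.append(current)
    return path


d = 4
n_nodes = 2 ** d
# Every edge is counted once from each endpoint, hence the division by 2.
n_edges = sum(len(hypercube_neighbors(v, d)) for v in range(n_nodes)) // 2
route = ecube_route(0b0000, 0b1011, d)
print(n_nodes, n_edges, [format(v, "04b") for v in route])
```

The number of hops on the route equals the Hamming distance between the source and destination labels, which is why the diameter of a d-cube is d.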

Subcubes and Cube Fragmentation
- What is a subcube?
- Shared environment
- Fragmentation problem
- Is it similar to something you know?

Cube Connected Cycles (CCC)
- k-cube → $2^k$ nodes
- k-CCC: from a k-cube, replace each vertex of the k-cube with a ring of k nodes
- k-CCC → $k \cdot 2^k$ nodes
- Degree, diameter → 3, 2k
- Try it for a 3-cube
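For the 3-cube exercise, the construction can be checked with a small sketch. The node encoding (cube vertex, ring position) and the function name are assumptions made for illustration; the sketch assumes k ≥ 3 so that each ring has two distinct neighbors per node.

```python
def build_ccc(k: int) -> dict[tuple[int, int], set[tuple[int, int]]]:
    """Adjacency sets of a k-CCC; node (v, i) is ring position i at cube vertex v."""
    adj = {(v, i): set() for v in range(2 ** k) for i in range(k)}
    for v, i in adj:
        adj[(v, i)].add((v, (i + 1) % k))    # next node on the ring
        adj[(v, i)].add((v, (i - 1) % k))    # previous node on the ring
        adj[(v, i)].add((v ^ (1 << i), i))   # cube link along dimension i
    return adj


ccc = build_ccc(3)
num_nodes = len(ccc)
degrees = {len(neighbors) for neighbors in ccc.values()}
num_edges = sum(len(neighbors) for neighbors in ccc.values()) // 2
print(num_nodes, degrees, num_edges)
```

For k = 3 this gives k · 2^k = 24 nodes, each of degree 3.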

k-ary n-Cube
- d = cube dimension
- k = number of nodes along each dimension
- $N = k^d$
- Wraparound
- Hypercube → binary d-cube
- Torus → k-ary 2-cube
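A short sketch of the wraparound connections, assuming nodes are addressed as tuples of d base-k digits (an addressing choice made here for illustration; the helper names are hypothetical):

```python
from itertools import product


def kary_cube_nodes(k: int, d: int) -> list[tuple[int, ...]]:
    """All N = k**d node addresses, one base-k digit per dimension."""
    return list(product(range(k), repeat=d))


def kary_cube_neighbors(node: tuple[int, ...], k: int) -> set[tuple[int, ...]]:
    """Neighbors reached by stepping +1 or -1 (mod k, i.e. with wraparound) in one dimension."""
    neighbors = set()
    for dim, digit in enumerate(node):
        for step in (1, -1):
            moved = list(node)
            moved[dim] = (digit + step) % k
            neighbors.add(tuple(moved))
    neighbors.discard(node)  # only matters in the degenerate case k = 1
    return neighbors


hypercube_degree = len(kary_cube_neighbors((0, 0, 0), k=2))  # binary 3-cube
torus_degree = len(kary_cube_neighbors((0, 0), k=4))         # 4-ary 2-cube
print(len(kary_cube_nodes(4, 2)), hypercube_degree, torus_degree)
```

With k = 2 the +1 and -1 steps reach the same node, so the degree collapses to d, as in a hypercube; for k > 2 each dimension contributes two neighbors.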

Analysis and performance metrics: static networks

| Network | Degree (d) | Diameter (D) | Cost | Symmetry | Worst delay |
| CCNs | N-1 | 1 | N(N-1)/2 | Yes | 1 |
| Linear Array | 2 | N-1 | N-1 | No | N |
| Binary Tree | 3 | 2(⌈log2 N⌉ - 1) | N-1 | No | log2 N |
| n-cube | log2 N | n | nN/2 | Yes | log2 N |
| 2D-Mesh | 4 | 2(n-1) | 2(N-n) | No | √N |
| k-ary n-cube | 2n | n⌊k/2⌋ | nN | Yes | k × log2 N |

For the n-cube, $N = 2^n$; for the 2D mesh, $N = n^2$.

Dynamic IN

Bus Based IN (figure): processors (P) and caches (C) connected to a global memory over a shared bus.

Dynamic Interconnection Networks
- Communication patterns are based on program demands
- Connections are established on the fly during program execution
- Multistage Interconnection Network (MIN) and Crossbar

Switch Modules
- A x B switch module
- A inputs and B outputs
- In practice, A = B = power of 2
- Each input is connected to one or more outputs (conflicts must be avoided)
- One-to-one (permutation) and one-to-many are allowed

Binary Switch
- 2 x 2 switch
- Legitimate states = 4
- Permutation connections = 2

Legitimate Connections
- Straight
- Exchange
- Upper-broadcast
- Lower-broadcast
The different settings of the 2 x 2 switching element (SE)
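The counts for the 2 x 2 switch can be reproduced by enumeration. The sketch below assumes that a legitimate state assigns exactly one input to every output, and that "upper-broadcast" means the upper input drives both outputs; both are modeling assumptions made for illustration, as are the function names.

```python
from itertools import product


def legitimate_states(n: int) -> list[tuple[int, ...]]:
    """All settings of an n x n switch, where state[j] is the input driving output j."""
    return list(product(range(n), repeat=n))


def is_permutation(state: tuple[int, ...]) -> bool:
    """True when no input is used twice, i.e. the setting is one-to-one."""
    return len(set(state)) == len(state)


# Assumed naming of the four 2 x 2 settings (input 0 = upper, input 1 = lower).
NAMES = {
    (0, 1): "straight",
    (1, 0): "exchange",
    (0, 0): "upper-broadcast",
    (1, 1): "lower-broadcast",
}

states = legitimate_states(2)
permutations = [s for s in states if is_permutation(s)]
for state in states:
    print(state, NAMES[state])
```

Calling `legitimate_states(n)` with a larger n is one way to explore how the two counts grow with switch size.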

Group Work
- General case?

Multistage Interconnection Networks (figure): stages of switches separated by inter-stage connection patterns ISC1, ISC2, ..., ISCn.
- ISC → Inter-stage Connection Patterns

Perfect-Shuffle Routing Function
- Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$
- $P(x) = \{a_{n-1}, \ldots, a_2, a_1, a_n\}$
- Example: $x = 110001$, $P(x) = 100011$

Perfect Shuffle Example
000 → 000
001 → 010
010 → 100
011 → 110
100 → 001
101 → 011
110 → 101
111 → 111

Perfect-Shuffle (figure): inputs 000 to 111 on the left wired to outputs 000 to 111 on the right according to the perfect-shuffle function.
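The perfect shuffle is a one-bit left rotation of the n-bit address, which can be sketched as follows (the function name and the integer encoding of addresses are choices made here for illustration):

```python
def perfect_shuffle(x: int, n: int) -> int:
    """Rotate the n-bit address x left by one position: the top bit moves to the bottom."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb


# Reproduce the 3-bit mapping.
shuffle_table = {format(x, "03b"): format(perfect_shuffle(x, 3), "03b") for x in range(8)}
for src, dst in shuffle_table.items():
    print(src, "->", dst)

six_bit_example = format(perfect_shuffle(0b110001, 6), "06b")
```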

Exchange Routing Function
- Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$
- $E_i(x) = \{a_n, a_{n-1}, \ldots, \bar{a}_i, \ldots, a_2, a_1\}$ (bit $a_i$ is complemented)
- Example: $x = 0000000$, $E_3(x) = 0000100$

Computer Science and Engineering Exchange E 1 000  001 001  000 010  011 011  010 100  101 101  100 110  111 111  110

Exchange $E_1$ (figure): inputs 000 to 111 on the left wired to outputs 000 to 111 on the right according to the exchange function $E_1$.
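The exchange function complements a single address bit, which is an XOR with a one-bit mask. A minimal sketch, with bits numbered from 1 at the least significant end as in the examples above (the function name is hypothetical):

```python
def exchange(x: int, i: int) -> int:
    """Complement bit a_i of x, where a_1 is the least significant bit."""
    return x ^ (1 << (i - 1))


# Reproduce the 3-bit E_1 mapping.
e1_table = {format(x, "03b"): format(exchange(x, 1), "03b") for x in range(8)}
for src, dst in e1_table.items():
    print(src, "->", dst)

e3_example = format(exchange(0b0000000, 3), "07b")
```

Applying the same exchange twice returns the original address.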

Butterfly Routing Function
- Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$
- $B(x) = \{a_1, a_{n-1}, \ldots, a_2, a_n\}$
- Example: $x = 010001$, $B(x) = 110000$

Butterfly Example
000 → 000
001 → 100
010 → 010
011 → 110
100 → 001
101 → 101
110 → 011
111 → 111

Butterfly (figure): inputs 000 to 111 on the left wired to outputs 000 to 111 on the right according to the butterfly function.
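The butterfly function swaps the most and least significant address bits and leaves the middle bits alone. A minimal sketch (function name and integer encoding are choices made for illustration):

```python
def butterfly(x: int, n: int) -> int:
    """Swap the most significant and least significant bits of the n-bit address x."""
    msb = (x >> (n - 1)) & 1
    lsb = x & 1
    if msb != lsb:
        # Swapping two unequal bits is the same as flipping both of them.
        x ^= (1 << (n - 1)) | 1
    return x


# Reproduce the 3-bit mapping.
butterfly_table = {format(x, "03b"): format(butterfly(x, 3), "03b") for x in range(8)}
for src, dst in butterfly_table.items():
    print(src, "->", dst)

six_bit_butterfly = format(butterfly(0b010001, 6), "06b")
```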

Multi-stage network

MIN (cont.) (figure): an 8 x 8 Banyan network.

MIN Implementation
- Control (X)
- Source (S)
- Destination (D)
- X = f(S, D)

Example (figure): two settings of a 2 x 2 switch with terminals A, B, C, and D, selected by the control bit X (X = 0 and X = 1; crossed and straight).

Consider this MIN (figure): a three-stage MIN (stage 1, stage 2, stage 3) connecting sources S1 to S8 to destinations D1 to D8.

Example (cont.)
- Let the control variables be $X_1$, $X_2$, $X_3$
- Find the values of $X_1$, $X_2$, $X_3$ to connect:
  - S1 → D6
  - S7 → D5
  - S4 → D1

The 3 connections (figure): the same three-stage MIN, sources S1 to S8 and destinations D1 to D8, showing the three connections.

Boolean Functions
- $X = x_1, x_2, x_3$
- $S = s_1, s_2, s_3$
- $D = d_1, d_2, d_3$
- Find X = f(S, D)

Crossbar Switch (figure): an 8 x 8 crossbar connecting processors P1 to P8 to memory modules M1 to M8.

Analysis and performance metrics: dynamic networks

| Network | Delay | Cost | Blocking | Degree of FT |
| Bus | O(N) | O(1) | Yes | 0 |
| Multiple-bus | O(mN) | O(m) | Yes | (m-1) |
| MIN | O(log N) | O(N log N) | Yes | 0 |
| Crossbar | O(1) | O(N^2) | No | 0 |

Performance Evaluations

Grosch’s Law (1960s)
- “To sell a computer for twice as much, it must be four times as fast”
- Vendors skip small speed improvements in favor of waiting for large ones
- Buyers of expensive machines would wait for a twofold improvement in performance for the same price

Moore’s Law
- Gordon Moore (cofounder of Intel)
- Commonly quoted as: processor performance doubles every 18 months (Moore’s original observation was about the number of transistors on a chip)
- This prediction has held for several decades
- It is unlikely that single-processor performance will continue to increase indefinitely

Von Neumann’s bottleneck
- Von Neumann: a great mathematician of the 1940s and 1950s
- Single control unit connecting a memory to a processing unit
- Instructions and data are fetched one at a time from memory and fed to the processing unit
- Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit

Past Trends in Parallel Architecture (inside the box)
- Completely custom-designed components (processors, memory, interconnects, I/O)
  - Longer R&D time (2-3 years)
  - Expensive systems
  - Quickly becoming outdated
    - Bankrupt companies!!

Current Trends in Parallel Architecture (outside the box), before multicore!!
- Advances in commodity processors and network technology
- A network of PCs and workstations connected via LAN or WAN forms a parallel system
- Network computing
- Competes favorably (cost/performance)
- Utilizes unused cycles of systems sitting idle

Speedup
- S = Speed(new) / Speed(old)
- S = [Work / time(new)] / [Work / time(old)]
- S = time(old) / time(new)
- S = time(before improvement) / time(after improvement)

Speedup
- Time (one CPU): T(1)
- Time (n CPUs): T(n)
- Speedup: S
- S = T(1) / T(n)

Two Important Laws Influenced Parallel Computing

Argument Against Massively Parallel Processing (Gene Amdahl, 1967)
“For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution... The nature of this overhead (in parallelism) appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate, even if the housekeeping were done in a separate processor... At any point in time it is difficult to foresee how the previous bottlenecks in a sequential computer will be effectively overcome.”

What does that mean?
- The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode cannot be used.
- The unparallelizable part of the code severely limits the speedup.

Trip Analogy
- Trip from A to B: a 200-mile stretch that can be covered by any means, plus a stretch that must be walked and takes 20 hours
- Walk: 4 miles/hour
- Bike: 10 miles/hour
- Car-1: 50 miles/hour
- Car-2: 120 miles/hour
- Car-3: 600 miles/hour

Speedup Analysis

| Mode | Speed | Total time | Speedup |
| Walk | 4 miles/hour | 70 hours | (baseline) |
| Bike | 10 miles/hour | 40 hours | S = 1.8 |
| Car-1 | 50 miles/hour | 24 hours | S = 2.9 |
| Car-2 | 120 miles/hour | 21.67 hours | S = 3.2 |
| Car-3 | 600 miles/hour | 20.33 hours | S = 3.4 |
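The trip times and speedups can be recomputed directly. This sketch assumes the trip is a 200-mile stretch covered at the chosen speed plus a fixed 20-hour walk, with the all-walking trip as the baseline; the constant and function names are hypothetical.

```python
WALK_ONLY_HOURS = 20.0    # the stretch that must be walked, whatever the vehicle
DISTANCE_MILES = 200.0    # the stretch that can be covered by any means

SPEEDS_MPH = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}


def trip_time(speed_mph: float) -> float:
    """Total trip time in hours: fixed walking part plus the 200 miles at the given speed."""
    return WALK_ONLY_HOURS + DISTANCE_MILES / speed_mph


baseline = trip_time(SPEEDS_MPH["Walk"])
for name, speed in SPEEDS_MPH.items():
    hours = trip_time(speed)
    print(f"{name:6s} time = {hours:6.2f} h   speedup = {baseline / hours:.2f}")

# However fast the vehicle, the 20 hours of walking remain.
speedup_limit = baseline / WALK_ONLY_HOURS
```

The speedup can never exceed 70 / 20 = 3.5, which is the point of the analogy: the part that cannot be accelerated bounds the overall gain.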

Amdahl’s Law

$$S = \frac{T(1)}{T(N)}$$

$$T(N) = T(1)\,\alpha + \frac{T(1)\,(1-\alpha)}{N}$$

$$S = \frac{1}{\alpha + \frac{1-\alpha}{N}} = \frac{N}{\alpha N + (1-\alpha)}$$

- $\alpha$: the fraction of the program that is naturally serial
- $(1-\alpha)$: the fraction of the program that is naturally parallel

Amdahl’s Law (figure): speedup (0 to 25) plotted against % serial (10% to 99%) for 4 CPUs, 16 CPUs, and 1000 CPUs.
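The curves for 4, 16, and 1000 CPUs can be regenerated numerically from Amdahl’s law, S = N / (alpha·N + (1 - alpha)), where alpha denotes the serial fraction. A minimal sketch; the function name and the sampled serial fractions are choices made for illustration:

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Amdahl speedup on n CPUs when a fraction alpha of the program is serial."""
    return n / (alpha * n + (1.0 - alpha))


serial_fractions = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.99]
cpu_counts = [4, 16, 1000]

print("serial  " + "  ".join(f"{n:>8d}" for n in cpu_counts))
for alpha in serial_fractions:
    row = "  ".join(f"{amdahl_speedup(alpha, n):8.2f}" for n in cpu_counts)
    print(f"{alpha:6.0%}  {row}")
```

Even with 1000 CPUs the speedup stays below 1/alpha, so a 10% serial fraction caps the speedup just under 10.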

Gustafson-Barsis Law (1988)
- Gordon Bell Prize
- Overcoming the conceptual barrier established by Amdahl’s law
- Scale the problem to the size of the parallel system
- No fixed-size problem
- $\alpha$: the fraction of the program that is naturally serial

$$T(N) = 1$$

$$T(1) = \alpha + (1-\alpha)\,N$$

$$S = \frac{T(1)}{T(N)} = N - (N-1)\,\alpha$$

Amdahl vs. Gustafson-Barsis (figure): speedup (0 to 100) plotted against % serial (10% to 99%) for the Gustafson-Barsis and Amdahl laws.
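The two laws can be compared side by side for the same serial fraction alpha. In this sketch the processor count N = 100 is an assumption chosen for illustration (the slide does not state the value behind its plot), and the function names are hypothetical.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Fixed-size speedup: S = N / (alpha * N + (1 - alpha))."""
    return n / (alpha * n + (1.0 - alpha))


def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speedup: S = N - (N - 1) * alpha."""
    return n - (n - 1) * alpha


N_CPUS = 100  # assumed value, for illustration only
serial_fractions = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.99]

print("serial    Amdahl  Gustafson-Barsis")
for alpha in serial_fractions:
    a = amdahl_speedup(alpha, N_CPUS)
    g = gustafson_speedup(alpha, N_CPUS)
    print(f"{alpha:6.0%}  {a:8.2f}  {g:16.2f}")
```

The scaled speedup falls off linearly in alpha, while the fixed-size speedup drops sharply as soon as any serial fraction is present.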

Data Parallelism: Scale up
- Parallelism is in the data, not the control portion of the application
- Problem size scales up to the size of the system
- Data parallelism is to the 1990s what vector parallelism was to the 1970s
- Supercomputer → data parallel

Problem
Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel at 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch $10^9$ times per second? What would the diameter be if the switching requirement were $10^{12}$ times per second?
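One way to set up the arithmetic is sketched below. It assumes that each switching event allows exactly one edge-to-edge signal crossing, so the diameter equals the distance a signal travels in one switching period; that reading of the problem, and the names used, are assumptions.

```python
SIGNAL_SPEED_M_PER_S = 3.0e8  # 300,000 km/s expressed in meters per second


def max_diameter_m(switches_per_second: float) -> float:
    """Largest diameter (in meters) a signal can cross within one switching period."""
    period_s = 1.0 / switches_per_second
    return SIGNAL_SPEED_M_PER_S * period_s


for rate in (1e9, 1e12):
    diameter = max_diameter_m(rate)
    print(f"{rate:.0e} switches/s -> diameter {diameter:.1e} m ({diameter * 1000:.1f} mm)")
```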
