Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February 21 2006 Session 11.

Presentation transcript:


Contents: Dynamic Networks (Cont.); Static Networks (Revisited); Performance Analysis

Multistage Interconnection Networks: [figure: stages of switches joined by inter-stage connection patterns ISC1, ISC2, ..., ISCn]. ISC = Inter-Stage Connection pattern.

Perfect-Shuffle Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, the perfect shuffle is $P(x) = \{a_{n-1}, \ldots, a_2, a_1, a_n\}$, a left rotation of the address bits by one position. [The slide's numeric example for X and P(x) was not preserved.]
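The perfect shuffle is a one-bit left rotation of an n-bit address, which can be sketched in a few lines. The function name `perfect_shuffle` and the convention that a_1 is the least significant bit are assumptions made for this illustration, not something the slides state.

```python
def perfect_shuffle(x: int, n: int) -> int:
    """Rotate the n-bit address x left by one position.

    Assumes a_n is the most significant bit and a_1 the least
    significant bit (an illustrative convention).
    """
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit in n bits")
    msb = (x >> (n - 1)) & 1                  # a_n, which wraps around
    return ((x << 1) & ((1 << n) - 1)) | msb  # shift left, drop overflow, append a_n


# Print the mapping for all 3-bit addresses.
for address in range(8):
    print(f"{address:03b} -> {perfect_shuffle(address, 3):03b}")
```

Addresses whose bits are all equal (000 and 111 for n = 3) map to themselves under this rotation.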

Perfect Shuffle Example: [figure: perfect-shuffle connections among the 3-bit addresses 000 through 111]

Perfect-Shuffle: [figure]

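Exchange Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, the exchange function complements bit $i$ and leaves the other bits unchanged: $E_i(x) = \{a_n, a_{n-1}, \ldots, \bar{a}_i, \ldots, a_2, a_1\}$. [The slide's numeric example for X and $E_3(x)$ was not preserved.]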
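The exchange function flips a single address bit, so it reduces to one XOR. This is a minimal sketch; the name `exchange` and the 1-based bit index with a_1 as the least significant bit are assumptions for illustration.

```python
def exchange(x: int, i: int) -> int:
    """Return x with bit a_i complemented.

    Bits are numbered from 1 at the least significant end
    (an illustrative convention).
    """
    if i < 1:
        raise ValueError("bit index i starts at 1")
    return x ^ (1 << (i - 1))


# Print E_1 for all 3-bit addresses.
for address in range(8):
    print(f"E1({address:03b}) = {exchange(address, 1):03b}")
```

Applying the same exchange twice returns the original address, so each E_i pairs addresses up.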

Exchange example: [figure: an exchange function applied to 3-bit addresses]

Exchange: [figure]

Butterfly Routing Function: given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$, the butterfly swaps the most and least significant bits: $B(x) = \{a_1, a_{n-1}, \ldots, a_2, a_n\}$. [The slide's numeric example for X and B(x) was not preserved.]
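The butterfly function swaps the outermost bits of the address and keeps the middle bits in place. The sketch below assumes a_n is the most significant bit and a_1 the least significant; the name `butterfly` is illustrative.

```python
def butterfly(x: int, n: int) -> int:
    """Swap the most significant bit (a_n) and least significant bit (a_1)."""
    if not 0 <= x < (1 << n):
        raise ValueError("x must fit in n bits")
    high = (x >> (n - 1)) & 1
    low = x & 1
    if high == low:
        return x                           # swapping equal bits changes nothing
    return x ^ ((1 << (n - 1)) | 1)        # flip both end bits to swap them


# Print the mapping for all 3-bit addresses.
for address in range(8):
    print(f"{address:03b} -> {butterfly(address, 3):03b}")
```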

Butterfly Example: [figure: butterfly connections among the 3-bit addresses 000 through 111]

Butterfly: [figure]

Multi-stage network: [figure]

MIN (cont.): an 8x8 Banyan network [figure]

MIN Implementation: the control setting X is a function of the source S and the destination D, X = f(S, D).

Example: [figure: a switch with ports A, B, C, D drawn twice, once for X = 0 and once for X = 1, with the two settings labeled "crossed" and "straight"]

Consider this MIN: [figure: a three-stage network (stage 1, stage 2, stage 3) connecting sources S1 through S8 to destinations D1 through D8]

Example (Cont.): let the control variables be $X_1, X_2, X_3$. Find the values of $X_1, X_2, X_3$ that connect S1 -> D6, S7 -> D5, and S4 -> D1.

The 3 connections: [figure: the three connections drawn on the three-stage MIN with sources S1 through S8 and destinations D1 through D8]

Boolean Functions: $X = x_1, x_2, x_3$; $S = s_1, s_2, s_3$; $D = d_1, d_2, d_3$. Find X = f(S, D).

Crossbar Switch: [figure: 8x8 crossbar between P1 through P8 and M1 through M8]

Analysis and performance metrics: dynamic networks

Network        Delay      Cost         Blocking   Degree of FT (fault tolerance)
Bus            O(N)       O(1)         Yes        0
Multiple-bus   O(mN)      O(m)         Yes        (m-1)
MIN            O(log N)   O(N log N)   Yes        0
Crossbar       O(1)       O(N^2)       No         0

Static Network Analysis (Revisited): graph representation; parameters: cost, degree, diameter, fault tolerance

Graph Review: G = (V, E), where V is the set of nodes and E the set of edges; directed vs. undirected; weighted graphs; path, path length, shortest path; cycles, cyclic vs. acyclic; connectivity: connected, weakly connected, strongly connected, fully connected

Linear Array: N nodes, N-1 edges. Parameters to determine: node degree, diameter, cost, fault tolerance.

Ring: N nodes, N edges. Parameters to determine: node degree, diameter, cost, fault tolerance.
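The node degree and diameter of a linear array and a ring can be checked numerically with a breadth-first search over the graph. This is a sketch only; the helper names (`linear_array`, `ring`, `diameter`, `max_degree`) are illustrative and do not come from the slides.

```python
from collections import deque


def linear_array(n):
    """Adjacency lists of a linear array: node i links to i-1 and i+1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    """Adjacency lists of a ring (n >= 3): a linear array with the ends joined."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def eccentricity(adj, start):
    """Largest shortest-path distance from start, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Maximum over all nodes of the BFS eccentricity."""
    return max(eccentricity(adj, s) for s in adj)


def max_degree(adj):
    return max(len(neighbors) for neighbors in adj.values())


for name, graph in (("linear array", linear_array(8)), ("ring", ring(8))):
    print(name, "degree:", max_degree(graph), "diameter:", diameter(graph))
```

The same `diameter` helper works for any topology given as adjacency lists, which makes it a convenient check for closed-form results.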

Chordal Ring: N nodes, N edges. Parameters to determine: node degree, diameter, cost, fault tolerance.

Barrel Shifter: the number of nodes is $N = 2^n$. Start with a ring, then add extra edges from each node to the nodes at a power-of-2 distance: nodes $i$ and $j$ are connected if $|j - i| = 2^r$, $r = 0, 1, 2, \ldots, n-1$.
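The barrel-shifter connection rule can be turned into a neighbor list. The sketch below assumes distances wrap around modulo N, since the construction starts from a ring; that wraparound and the function name `barrel_shifter_neighbors` are assumptions of this example.

```python
def barrel_shifter_neighbors(i: int, n: int) -> list[int]:
    """Neighbors of node i in a barrel shifter with N = 2**n nodes.

    Node j is a neighbor when it lies 2**r positions away from i,
    in either direction, for r = 0 .. n-1 (distances taken modulo N).
    """
    size = 1 << n
    neighbors = set()
    for r in range(n):
        step = 1 << r
        neighbors.add((i + step) % size)
        neighbors.add((i - step) % size)
    neighbors.discard(i)  # only relevant for the degenerate n = 0 case
    return sorted(neighbors)


# Node 0 in a 16-node barrel shifter.
print(barrel_shifter_neighbors(0, 4))
```

Under the modulo-N assumption, the steps +2^(n-1) and -2^(n-1) reach the same node, so each node ends up with 2n-1 distinct neighbors.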

Mesh and Torus (N = n*n nodes). Mesh: node degree 4 for internal nodes, 3 or 2 for the others; diameter 2(n-1). Torus: node degree 4; diameter 2*floor(n/2).
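The two diameter formulas can be checked by taking the largest node-to-node distance on an n-by-n grid. The sketch assumes the usual hop-count distances: Manhattan distance on the mesh, and Manhattan distance with wraparound in each dimension on the torus. All function names are illustrative.

```python
from itertools import product


def mesh_distance(a, b, n):
    """Hop count between two nodes of an n x n mesh (no wraparound)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_distance(a, b, n):
    """Hop count on an n x n torus: each dimension may wrap around."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def diameter(n, distance):
    """Largest distance between any pair of nodes on the n x n grid."""
    nodes = list(product(range(n), repeat=2))
    return max(distance(a, b, n) for a in nodes for b in nodes)


for n in range(2, 7):
    print(n, diameter(n, mesh_distance), diameter(n, torus_distance))
```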

Hypercubes: $N = 2^d$ nodes in d dimensions ($d = \log_2 N$). A cube with d dimensions is made out of 2 cubes of dimension d-1. Symmetric. Degree, diameter, cost, fault tolerance. Node labeling: number of bits.

Hypercubes: [figure: hypercubes of increasing dimension, starting with d = 0, d = 1, d = 2]

Hypercubes: [figure: hypercube of dimension d = 4]

Hypercube of dimension d: $N = 2^d$, $d = \log_2 N$; node degree = d; number of bits to label a node = d; diameter = d; number of edges = $N \cdot d / 2$. The distance between two nodes is the Hamming distance between their labels, which is also the basis for routing.
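Routing by Hamming distance can be illustrated with a simple scheme that corrects the differing label bits one at a time. Fixing bits from least to most significant is one common choice and an assumption here; the slides do not specify a particular order. The names `hamming_distance` and `route` are illustrative.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two node labels differ."""
    return bin(a ^ b).count("1")


def route(src: int, dst: int, d: int) -> list[int]:
    """Path from src to dst in a d-dimensional hypercube.

    Each hop flips one differing bit, scanning from the least
    significant dimension upward (an illustrative ordering).
    """
    path = [src]
    current = src
    for bit in range(d):
        if ((current ^ dst) >> bit) & 1:
            current ^= 1 << bit   # move along this dimension
            path.append(current)
    return path


path = route(0b000, 0b101, 3)
print([f"{node:03b}" for node in path])
```

The number of hops always equals the Hamming distance between the two labels, so no path is longer than d, which matches the stated diameter.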

Subcubes and Cube Fragmentation: What is a subcube? Shared environment. Fragmentation problem. Is it similar to something you know?

Cube Connected Cycles (CCC): a k-cube has $2^k$ nodes. A k-CCC is built from a k-cube by replacing each vertex of the k-cube with a ring of k nodes, so a k-CCC has $k \cdot 2^k$ nodes. Degree and diameter: 3 and 2k. Try it for the 3-cube.
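The CCC construction can be tried for the 3-cube by building the graph explicitly. In this sketch a node is a pair (v, p), with v the cube vertex and p the position in that vertex's ring; ring position p carries the cube link along dimension p. That labeling and the function name are assumptions of the example.

```python
def cube_connected_cycles(k: int) -> dict:
    """Adjacency sets of a k-CCC (intended for k >= 3).

    Node (v, p): cube vertex v in 0 .. 2**k - 1, ring position p in 0 .. k-1.
    """
    adj = {}
    for v in range(1 << k):
        for p in range(k):
            adj[(v, p)] = {
                (v, (p - 1) % k),     # previous node on the ring
                (v, (p + 1) % k),     # next node on the ring
                (v ^ (1 << p), p),    # cube link along dimension p
            }
    return adj


graph = cube_connected_cycles(3)
print("nodes:", len(graph))
print("degrees:", {len(neighbors) for neighbors in graph.values()})
```

For k = 3 this yields 3 * 2^3 = 24 nodes, each with exactly three links: two on its ring and one to a neighboring ring.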

k-ary n-Cube: d = cube dimension; k = number of nodes along each dimension; $N = k^d$; wraparound connections. Hypercube -> binary d-cube. Torus -> k-ary 2-cube.

Performance Evaluation: Grosch’s Law; Moore’s Law; Von Neumann’s Bottleneck; Parallelism; Speedup; Amdahl’s Law; The Gustafson-Barsis Law; Benchmarks

Grosch’s Law (1960s): “To sell a computer for twice as much, it must be four times as fast.” Vendors skip small speed improvements in favor of waiting for large ones. Buyers of expensive machines would wait for a twofold improvement in performance for the same price.

Moore’s Law: Gordon Moore (cofounder of Intel) predicted that processor performance would double every 18 months. This prediction has held for several decades. It is unlikely that single-processor performance will continue to increase indefinitely.

Von Neumann’s bottleneck: Von Neumann was a great mathematician of the 1940s and 1950s. In his design, a single control unit connects a memory to a processing unit. Instructions and data are fetched one at a time from memory and fed to the processing unit. Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit.

Problem: Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch $10^9$ times per second? What would the diameter be if the switching requirement were [value not preserved] times per second?
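One way to approach the problem is to require that a signal cross the full diameter once per switching period, so the diameter is the signal speed divided by the switching rate. That reading of the problem is an assumption, as are the names in this sketch.

```python
# Signal speed from the problem statement: 300,000 km/s, expressed in m/s.
SIGNAL_SPEED_M_PER_S = 300_000 * 1_000


def max_chip_diameter_m(switches_per_second: float) -> float:
    """Largest diameter (in meters) a signal can cross once per switching period.

    Assumes one edge-to-edge traversal per switch (an illustrative reading).
    """
    return SIGNAL_SPEED_M_PER_S / switches_per_second


diameter_m = max_chip_diameter_m(1e9)
print(f"{diameter_m} m = {diameter_m * 100} cm")
```

Under this reading, a rate of 10^9 switches per second gives a diameter of 0.3 m, and each tenfold increase in the rate shrinks the diameter tenfold.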

Parallelism: multiple CPUs; within the CPU: one pipeline or multiple pipelines

Superscalar: parallelism; scheduling

Past Trends in Parallel Architecture (inside the box): completely custom-designed components (processors, memory, interconnects, I/O); longer R&D time (2-3 years); expensive systems; quickly becoming outdated; bankrupt companies!

New Trends in Parallel Architecture (outside the box): advances in commodity processors and network technology; a network of PCs and workstations connected via LAN or WAN forms a parallel system; network computing; such systems compete favorably on cost/performance; they utilize unused cycles of systems sitting idle

Speedup:
S = Speed(new) / Speed(old)
S = [Work/time(new)] / [Work/time(old)]
S = time(old) / time(new)
S = time(before improvement) / time(after improvement)

Speedup: time on one CPU: T(1); time on n CPUs: T(n); speedup: S = T(1)/T(n)

Amdahl’s Law: the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Example: the trip includes a stretch that must be walked, which takes 20 hours, plus a 200-mile leg between A and B that can be covered in different ways.

Mode    Speed
Walk    4 miles/hour
Bike    10 miles/hour
Car-1   50 miles/hour
Car-2   [speed not preserved]
Car-3   [speed not preserved]

Example (with results): the trip includes a stretch that must be walked, which takes 20 hours, plus a 200-mile leg between A and B that can be covered in different ways.

Mode    Speed                   Total time             Speedup S
Walk    4 miles/hour            70 hours               1
Bike    10 miles/hour           40 hours               1.8
Car-1   50 miles/hour           24 hours               2.9
Car-2   [speed not preserved]   [time not preserved]   3.2
Car-3   [speed not preserved]   [time not preserved]   3.4
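The totals in the travel example follow from adding the fixed 20-hour walk to the time needed for the 200-mile leg. The sketch below recomputes only the rows whose speeds are legible (4, 10, and 50 miles/hour); the function names are illustrative.

```python
MUST_WALK_HOURS = 20      # the part of the trip that cannot be sped up
LEG_MILES = 200           # the part that can use a faster mode


def total_hours(speed_mph: float) -> float:
    """Total trip time when the 200-mile leg is covered at speed_mph."""
    return MUST_WALK_HOURS + LEG_MILES / speed_mph


def speedup(speed_mph: float, baseline_mph: float = 4) -> float:
    """Speedup relative to walking the whole trip."""
    return total_hours(baseline_mph) / total_hours(speed_mph)


for mode, mph in (("Walk", 4), ("Bike", 10), ("Car-1", 50)):
    print(f"{mode}: {total_hours(mph):.0f} hours, S = {speedup(mph):.2f}")
```

However fast the 200-mile leg becomes, the total never drops below the 20 walked hours, so the speedup stays below 70/20 = 3.5. This is the limit that Amdahl's Law expresses.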

Amdahl’s Law (1967): $\alpha$ is the fraction of the program that is naturally serial; $(1-\alpha)$ is the fraction of the program that is naturally parallel.

Amdahl’s Law (derivation), with $\alpha$ the serial fraction of the program: $S = T(1)/T(N)$ and $T(N) = T(1)\,\alpha + \frac{T(1)(1-\alpha)}{N}$, so $S = \frac{1}{\alpha + \frac{1-\alpha}{N}} = \frac{N}{\alpha N + (1-\alpha)}$.
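Amdahl's formula is easy to evaluate for a few processor counts. In this sketch the serial fraction is called `alpha`; that name and the function name are choices made for the example.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup on n processors when a fraction alpha of the work is serial.

    Implements S = 1 / (alpha + (1 - alpha) / n).
    """
    if not 0.0 <= alpha <= 1.0 or n < 1:
        raise ValueError("need 0 <= alpha <= 1 and n >= 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.1, n), 2))
```

As n grows, the speedup approaches 1/alpha, so a program that is 10% serial cannot run more than 10 times faster regardless of the number of processors.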

Amdahl’s Law: [figure]

Gustafson-Barsis Law: N and $\alpha$ are not independent of each other. With $T(N) = 1$ and $T(1) = \alpha + (1-\alpha)N$, the speedup is $S = N - (N-1)\alpha$, where $\alpha$ is the fraction of the program that is naturally serial.
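The Gustafson-Barsis speedup can be evaluated the same way. This sketch normalizes the parallel run time to 1 and calls the serial fraction of that run `alpha`; the function name is illustrative.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speedup S = n - (n - 1) * alpha.

    alpha is the serial fraction of the run on n processors,
    whose total time is normalized to 1.
    """
    if not 0.0 <= alpha <= 1.0 or n < 1:
        raise ValueError("need 0 <= alpha <= 1 and n >= 1")
    return n - (n - 1) * alpha


for n in (1, 10, 100, 1000):
    print(n, gustafson_speedup(0.1, n))
```

Unlike the fixed-size case, this speedup keeps growing linearly with n, because the parallel part of the workload is assumed to grow with the number of processors.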

Gustafson-Barsis Law: [figure]


Distributed Computing Performance: single-program performance; multiple-program performance


Benchmark Performance: serial benchmarks; parallel benchmarks; PERFECT Benchmarks; NAS Kernel; SLALOM; the Gordon Bell Prize; WebSTONE for the Web; performance comparisons