Parallel Architectures: Topologies Heiko Schröder, 2003
Types of sequential processors (SISD)
  Figure: processor and memory configurations, with and without a cache
  Von Neumann bottleneck
SIMD and MIMD
  SIMD: processing elements (PEs) driven by a global control unit and linked by an interconnection network
  MIMD: each PE has its own control unit (PE + control unit); the PEs are linked by an interconnection network
  SPMD, SIMD
Message passing / shared address space
  Message passing: PEs with local memory (PE + M) and a control unit, linked by an interconnection network
  Shared address space: processors (P) and memories (M) connected through a network (P/M)
Various communication networks
State of the art technology
Important aspects of routing schemes
Known results (theory)
The internet
Desirable features of a network
  1. Algorithmic
     Low diameter (1 for the complete graph)
     High bisection width (complete graph)
     The complete graph needs n(n-1)/2 edges and degree n-1
  2. Technical
     Low degree (pin limitations, constant, modular: mesh)
     Short wires (mesh)
     Small area (mesh)
     Regular structure (mesh)
Connection networks I: 1-D mesh (linear array)
  Diameter: n-1
  Bisection width: 1
Tree
  Diameter: 2 log n
  Bisection width: 1
H-tree
  Area: O(n)
  Longest wire: O(√n)
  Application: clock distribution
2-D mesh (√n x √n, n nodes)
  Diameter: 2(√n - 1)
  Bisection width: √n
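The two mesh parameters can be checked by brute force on a small instance. The sketch below is only an illustration: the helper names `mesh_2d`, `diameter` and `middle_cut` are made up for this example, and the cut through the middle column is one particular bisection (it gives an upper bound on the bisection width, which for the mesh happens to be tight).

```python
from collections import deque

def mesh_2d(side):
    """Adjacency lists of a side x side mesh without wrap-around links."""
    adj = {}
    for r in range(side):
        for c in range(side):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adj[(r, c)] = [(i, j) for i, j in candidates
                           if 0 <= i < side and 0 <= j < side]
    return adj

def diameter(adj):
    """Largest shortest-path distance, found by a BFS from every node."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def middle_cut(adj, side):
    """Number of links crossing the vertical cut through the middle (side even)."""
    half = side // 2
    return sum(1 for (r, c), nbrs in adj.items() if c < half
               for (_, j) in nbrs if j >= half)

side = 4                      # 16 nodes
adj = mesh_2d(side)
print(diameter(adj))          # equals 2 * (side - 1)
print(middle_cut(adj, side))  # equals side
```

For larger meshes the all-pairs BFS becomes slow; it is used here only because it makes no assumption about the topology.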
Torus
  Reduced diameter
  Increased bisection width
  All nodes equivalent
  Long wires?
3-D mesh (n nodes, side n^(1/3))
  Diameter: 3(n^(1/3) - 1)
  Bisection width: n^(2/3)
Hypercube
  Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D and 4-D
  Diameter: log n
  Bisection width: n/2
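A hypercube is easy to describe with bit operations, which makes both figures above checkable on a small instance. In this sketch nodes are assumed to be labelled with the integers 0 to n-1; the function names are chosen for the example only.

```python
def neighbors(node, dim):
    """Neighbours of a node: flip each of the dim address bits in turn."""
    return [node ^ (1 << i) for i in range(dim)]

def distance(u, v):
    """Shortest-path distance equals the Hamming distance of the labels."""
    return bin(u ^ v).count("1")

dim = 4
n = 1 << dim  # 16 nodes

# Diameter: the largest Hamming distance between any two labels.
diam = max(distance(u, v) for u in range(n) for v in range(n))

# Links crossing the cut that separates labels by their top bit.
cut = sum(1 for u in range(n // 2)
          for v in neighbors(u, dim) if v >= n // 2)

print(diam)  # dim, i.e. log2(n)
print(cut)   # n // 2
```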
Cube-connected cycles
  Figure: number of nodes, diameter and bisection width (values not recoverable)
Shuffle-exchange graph
  Exchange: flip the least significant bit (lsb)
  Shuffle: rotate the label, left or right
  Degree: 3
  Diameter: 2 log n - 1 (at most log n - 1 shuffles plus log n exchanges)
  Bisection width: Θ(n / log n)
16-node shuffle-exchange graph
  Exchange (ex): flip the lsb. Shuffle: rotate the label, left (ls) or right
  Routing from u_1 u_2 … u_k to v_1 v_2 … v_k:
    u_1 u_2 … u_(k-1) u_k  -ex->  u_1 u_2 … u_(k-1) v_1  -ls+ex->  u_2 … u_(k-1) v_1 v_2  -ls+ex->  …  -ls+ex->  v_1 v_2 … v_k
  Diameter: 2 log n - 1 (at most log n - 1 shuffles plus log n exchanges)
  Bisection width: Θ(n / log n)
  Degree: 3
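The routing scheme can be written down in a few lines. The sketch below assumes k-bit integer labels with u_1 as the most significant bit, and that a shuffle is a left rotation; `route` is a name invented for this example. It fixes the lsb to the next bit of the destination, then shuffles, which uses at most k exchanges and k-1 shuffles.

```python
def rotate_left(x, k):
    """Shuffle: rotate a k-bit label left by one position."""
    mask = (1 << k) - 1
    return ((x << 1) & mask) | (x >> (k - 1))

def route(u, v, k):
    """Path from u to v using only exchange (flip lsb) and left-shuffle links."""
    path = [u]
    cur = u
    for i in range(k):
        target_bit = (v >> (k - 1 - i)) & 1  # bit v_(i+1), most significant first
        if (cur & 1) != target_bit:
            cur ^= 1                         # exchange
            path.append(cur)
        if i < k - 1:
            nxt = rotate_left(cur, k)        # shuffle
            if nxt != cur:                   # skip self-loops (all 0s / all 1s)
                path.append(nxt)
            cur = nxt
    return path

k = 4
path = route(0b0000, 0b1111, k)
print([format(x, "04b") for x in path])
```

The path from 0000 to 1111 needs all 2k-1 = 7 hops, so the bound on the diameter is reached in this case.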
de Bruijn graph
  Edges: u_1 u_2 … u_(k-1) u_k  ->  u_2 u_3 … u_(k-1) u_k 0  and  u_1 u_2 … u_(k-1) u_k  ->  u_2 u_3 … u_(k-1) u_k 1
  In-degree = out-degree = 2
  Diameter: log n
  Bisection width: Θ(n / log n)
  De Bruijn sequence: each Eulerian tour yields a de Bruijn sequence, which contains each possible sub-string of length 4 exactly once
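The link between Eulerian tours and de Bruijn sequences can be made concrete. The sketch below is one possible construction, not necessarily the one used on the slide: it walks an Eulerian tour (iterative Hierholzer algorithm) through the de Bruijn graph whose nodes are (k-1)-bit strings, so that each of the 2^k edges spells one k-bit string. The function name and the choice k = 4 are assumptions for the example.

```python
def de_bruijn_sequence(k):
    """Cyclic binary sequence containing every k-bit string once (k >= 2).

    Nodes are (k-1)-bit integers; the edge u -> ((u << 1) | b) & mask
    corresponds to the k-bit string "u followed by b".
    """
    n_nodes = 1 << (k - 1)
    mask = n_nodes - 1
    next_bit = [0] * n_nodes  # next unused outgoing edge (0 or 1) per node
    stack = [0]
    tour = []
    while stack:              # Hierholzer: extend the walk, back up when stuck
        u = stack[-1]
        if next_bit[u] < 2:
            b = next_bit[u]
            next_bit[u] += 1
            stack.append(((u << 1) | b) & mask)
        else:
            tour.append(stack.pop())
    tour.reverse()
    # Every edge appends the low bit of the node it enters.
    return [v & 1 for v in tour[1:]]

seq = de_bruijn_sequence(4)
k = 4
windows = {tuple(seq[(i + j) % len(seq)] for j in range(k))
           for i in range(len(seq))}
print(len(seq), len(windows))  # both equal 2**k
```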
Butterfly network
  Unique path
  FFT, routing, sorting
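The unique-path property can be illustrated with a small routing function. The sketch assumes one common convention, which the slide does not spell out: nodes are pairs (level, row) with levels 0 to k, and the links leaving level l either keep the row or flip its l-th most significant bit. Under that assumption the path from an input row to an output row is forced, because each address bit can only be corrected at one level.

```python
def butterfly_path(src_row, dst_row, k):
    """The forced path from (0, src_row) to (k, dst_row) in a k-level butterfly."""
    path = [(0, src_row)]
    row = src_row
    for level in range(k):
        bit = 1 << (k - 1 - level)   # the only bit that level can change
        if (row ^ dst_row) & bit:
            row ^= bit               # take the cross link
        path.append((level + 1, row))  # otherwise take the straight link
    return path

print(butterfly_path(0b000, 0b101, 3))
```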
Benes network
Mesh of trees
  Diameter: Θ(log n)
  Bisection width: Θ(√n)
The Power of Hypercubes
  Hamiltonian cycle, Gray codes (figure: 4-D hypercube)
  Contains k-D meshes (tori) with N nodes
  Simulates mesh of trees
  Simulates hypercubic networks
  Contains a complete binary tree (almost)
  Normal algorithms
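Hamiltonian Cycle
  A hypercube contains a Hamiltonian cycle. Proof by induction: take a Hamiltonian cycle in each of the two (d-1)-dimensional sub-cubes, delete the same edge from both, and join the freed endpoints across the new dimension.
  Each Hamiltonian cycle corresponds to a Gray code (only one bit is changed per link).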
Gray code
  Reflection
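The reflection construction is short enough to write out. A minimal sketch, with the function name chosen for this example: the k-bit code is the (k-1)-bit code prefixed with 0, followed by the same code in reverse order prefixed with 1. Read cyclically, consecutive code words differ in exactly one bit, so the list traces a Hamiltonian cycle in the k-D hypercube.

```python
def gray_code(k):
    """Reflected Gray code on k bits, as a list of bit strings."""
    if k == 0:
        return [""]
    prev = gray_code(k - 1)
    # Keep the old code with a leading 0, then its mirror image with a leading 1.
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

def differ_in_one_bit(a, b):
    return sum(x != y for x, y in zip(a, b)) == 1

codes = gray_code(4)
is_cycle = all(differ_in_one_bit(codes[i], codes[(i + 1) % len(codes)])
               for i in range(len(codes)))
print(gray_code(2))   # ['00', '01', '11', '10']
print(is_cycle)       # True
```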
Hypercube contains meshes/tori
  Figure: mesh with wrap-around links
  Theorem: Any n_1 x n_2 x … x n_k mesh (with or without wrap-arounds) is a sub-graph of an n-D hypercube if n_1 · n_2 · … · n_k = 2^n (so every n_i is a power of two).
  Proof: see Leighton (each sub-cube has a Hamiltonian cycle).
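The embedding behind the theorem can be sketched for the 2-D case: number each mesh axis with a Gray code and concatenate the code words. The example below assumes a 4 x 8 torus mapped into a 5-D hypercube and uses the closed form i XOR (i >> 1) for the reflected Gray code; `embed` and `torus_edges_ok` are hypothetical helper names.

```python
def gray(i):
    """i-th word of the reflected Gray code, as an integer."""
    return i ^ (i >> 1)

def embed(i, j, bits_j):
    """Hypercube label of torus node (i, j): Gray code of i, then of j."""
    return (gray(i) << bits_j) | gray(j)

def hamming(a, b):
    return bin(a ^ b).count("1")

def torus_edges_ok(rows, cols, bits_j):
    """True if every torus link (including wrap-around) maps to a hypercube link."""
    for i in range(rows):
        for j in range(cols):
            here = embed(i, j, bits_j)
            right = embed(i, (j + 1) % cols, bits_j)
            down = embed((i + 1) % rows, j, bits_j)
            if hamming(here, right) != 1 or hamming(here, down) != 1:
                return False
    return True

rows, cols, bits_j = 4, 8, 3   # 4 * 8 = 2**5 nodes
labels = {embed(i, j, bits_j) for i in range(rows) for j in range(cols)}
print(torus_edges_ok(rows, cols, bits_j), len(labels))
```

The wrap-around links work because the last and first Gray code words also differ in a single bit when the axis length is a power of two.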
Hypercube contains double-rooted trees
  The hypercube (HC) can implement all tree algorithms and also all mesh-of-trees algorithms (possibly with minor delay).
  Figure: double roots (different dimension)
Normal algorithms
  A hypercube algorithm is said to be normal if only one dimension of hypercube edges is used at any step and if consecutive dimensions are used in consecutive steps.
  Most hypercube algorithms are normal.
  Normal algorithms can be embedded efficiently on hypercubic networks.
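A global sum is a simple instance of a normal algorithm: step d uses only the links of dimension d, and the dimensions are visited in order 0, 1, …, log n - 1. The sketch below simulates the parallel steps sequentially; the function name is an assumption, and each list comprehension stands for one synchronous communication step.

```python
def hypercube_allreduce_sum(values):
    """Every node ends up with the sum of all values after log2(n) steps."""
    n = len(values)
    k = n.bit_length() - 1
    assert 1 << k == n, "number of nodes must be a power of two"
    vals = list(values)
    for d in range(k):  # one hypercube dimension per step, in consecutive order
        # Node i exchanges its partial sum with its neighbour across dimension d.
        vals = [vals[i] + vals[i ^ (1 << d)] for i in range(n)]
    return vals

print(hypercube_allreduce_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # eight copies of 36
```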
Josephus graph
  Every even node k is connected to k + 2^i - 3
  Diameter: about (log n) /
Star graph
  Set of nodes: the permutations of k elements, i.e. k! nodes of degree k-1.
  Set of edges: exchange of the first element with one other.
  Small degree, diameter about 2 log n.
  Open problems: e.g. are there (k-1)/2 edge-disjoint Hamiltonian cycles?
  Number of nodes, star graph versus hypercube (HC):
    Star (k!): 24, 120, 720, 5040, 40320, 362880
    HC (2^k):  16, 32, 64, 128, 256, 512
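The definition translates directly into code. In this sketch a node is a tuple holding a permutation of 1..k, and `star_neighbors` and `star_graph` are names made up for the example; it only checks the node count and the degree, not the diameter.

```python
from itertools import permutations

def star_neighbors(perm):
    """Swap the first element with each of the other k-1 elements."""
    out = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        out.append(tuple(p))
    return out

def star_graph(k):
    """Adjacency lists of the star graph on the permutations of 1..k."""
    return {p: star_neighbors(p) for p in permutations(range(1, k + 1))}

g = star_graph(4)
print(len(g))                               # 4! = 24 nodes
print({len(nbrs) for nbrs in g.values()})   # every node has degree 3
```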
Pin limitations
  Figure: 4-D hypercube
Wiring limitations
  Figure: 4-D hypercube; labels for number of nodes, bisection width, 25 cm and 32 m (remaining values not recoverable)
Improve the topology?
  The internet
Against parallelism
  cost(large) < cost(2 small)
  All the FORTRAN / C software
  Let's stick to pipelining
  Let's wait for faster machines
  Amdahl's Law
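Amdahl's Law is only named on the slide; in its usual form the speedup on N processors is 1 / ((1 - p) + p / N), where p is the fraction of the work that can be parallelised. A minimal sketch of that formula, with parameter names chosen for this example:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given parallelisable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

for procs in (2, 10, 100, 1000):
    print(procs, round(amdahl_speedup(0.9, procs), 2))
```

With 90% of the work parallelisable the speedup stays below 10 no matter how many processors are added, which is the argument the slide alludes to.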