Heiko Schröder, 2003 Parallel Architectures 1 Various communication networks State of the art technology Important aspects of routing schemes Known results.

Presentation transcript:

Heiko Schröder, 2003 Parallel Architectures 1
-- Various communication networks
-- State-of-the-art technology
-- Important aspects of routing schemes
-- Known results (theory)
-- The internet

Heiko Schröder, 2003 Parallel Architectures 2 Routing Models
Store-and-forward (packet switching) model:
-- A packet is an entity: one packet per edge per time unit
-- Queues can be allowed to build up in nodes; try to keep them short
Circuit switching (path lockdown):
-- The entire path from source to destination is dedicated to the packet
Wormhole routing
Static routing problems: all packets are present when routing commences. (Dynamic routing: packets arrive at arbitrary times.)
Types of static routing problems (general assumption: each processor sends only one packet):
One-to-one:
-- each packet has precisely one destination
-- at most one packet is destined for each processor
Many-to-one: more than one packet can have the same destination.
One-to-many: a single packet can have more than one destination (copies).
Hot spots = bottlenecks (example: many-to-one); try to avoid them!
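The three types of static routing problems can be told apart mechanically. The following is a minimal sketch, not part of the slides: the function name `classify_routing_problem` and the input layout (one destination list per sending processor) are assumptions made for illustration.

```python
from collections import Counter

def classify_routing_problem(destinations):
    """Classify a static routing problem (hypothetical helper, not from the slides).

    destinations[i] is the list of destination processors of the single
    packet sent by processor i (an empty list means processor i sends nothing).
    Returns (kind, hot_spots), where hot_spots lists every processor that
    receives more than one packet.
    """
    # Largest number of destinations (copies) any single packet has.
    fan_out = max((len(d) for d in destinations), default=0)
    # Number of packets arriving at each destination processor.
    load = Counter(t for d in destinations for t in d)
    hot_spots = sorted(p for p, count in load.items() if count > 1)

    if fan_out > 1:
        kind = "one-to-many"
    elif hot_spots:
        kind = "many-to-one"
    else:
        kind = "one-to-one"
    return kind, hot_spots
```

For example, `classify_routing_problem([[2], [2], [2]])` reports a many-to-one problem with processor 2 as its hot spot.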

Heiko Schröder, 2003 Parallel Architectures 3 Wormhole routing Used in clusters

Heiko Schröder, 2003 Parallel Architectures 4 Hot potato routing: try to move as many packets as possible in a “good” direction. Very good average performance! Hot potato routing on the internet

Heiko Schröder, 2003 Parallel Architectures 5 Greedy routing: move along the row to the correct column, then move along the column to the destination. Queues are possible.
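The row-then-column rule can be sketched as follows. This is an illustrative sketch under assumptions: nodes are `(row, column)` pairs on a mesh, and the helper names `greedy_mesh_path` and `max_edge_load` are invented here. The edge load gives a rough idea of where queues may build up when several greedy paths share a link.

```python
from collections import Counter

def greedy_mesh_path(src, dst):
    """Greedy path on a mesh: first along the row, then along the column."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    while c != dc:                      # phase 1: move within the row
        c += 1 if dc > c else -1
        path.append((r, c))
    while r != dr:                      # phase 2: move within the column
        r += 1 if dr > r else -1
        path.append((r, c))
    return path

def max_edge_load(packets):
    """Largest number of greedy paths that share one directed edge."""
    load = Counter()
    for src, dst in packets:
        path = greedy_mesh_path(src, dst)
        load.update(zip(path, path[1:]))  # count every directed edge used
    return max(load.values(), default=0)
```

With the packets `(0,0)->(1,3)`, `(0,1)->(2,3)` and `(0,2)->(3,3)`, all three paths turn at node `(0,3)` and use the same edges around it, so `max_edge_load` returns 3 and a queue can form there.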

Heiko Schröder, 2003 Parallel Architectures 6 Butterfly network Unique path FFT routing sorting

Heiko Schröder, 2003 Parallel Architectures 7 Benes network

Heiko Schröder, 2003 Parallel Architectures 8 Benes network

Heiko Schröder, 2003 Parallel Architectures 9 Packet-Routing Algorithms
-- Most important in parallel architectures
-- Meshes have big diameter
-- Benes networks: fast routing, but no fast way of finding the paths is known (they might be computed off-line, which might not be suitable)
-- On-line algorithms?

Heiko Schröder, 2003 Parallel Architectures 10 Greedy Routing in the butterfly (BF)
[Figure: butterfly with rows 000 to 111 (1 to N) and levels 0 to log N (= k)]
Greedy path from input $u_1 u_2 \dots u_k$ to output $v_1 v_2 \dots v_k$:
$(u_1 u_2 \dots u_{k-1} u_k, 0) \to (v_1 u_2 \dots u_{k-1} u_k, 1) \to (v_1 v_2 \dots u_{k-1} u_k, 2) \to \dots \to (v_1 v_2 \dots v_{k-1} v_k, k)$
Example: $(u_1 u_2 \dots u_{(k-1)/2} 00 \dots 0, 0) \to (00 \dots 0 u_{(k-1)/2} \dots u_2 u_1, k)$
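The greedy path in the butterfly corrects one address bit per level. A minimal sketch, assuming rows are written as bit strings of length k with bit 1 on the left (the function name `butterfly_greedy_path` is made up for this illustration):

```python
def butterfly_greedy_path(u, v):
    """Greedy path from input row u (level 0) to output row v (level k).

    u and v are bit strings of equal length k. Moving from level i to
    level i + 1 replaces bit i + 1 of the current row by the corresponding
    destination bit, so the node at level i is (v_1..v_i u_{i+1}..u_k, i).
    """
    k = len(u)
    if len(v) != k:
        raise ValueError("source and destination must have the same length")
    return [(v[:i] + u[i:], i) for i in range(k + 1)]
```

For instance, `butterfly_greedy_path("000", "111")` visits rows 000, 100, 110 and 111 on levels 0 to 3. Because every bit is fixed at a predetermined level, the path between a given input and output is unique.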

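Heiko Schröder, 2003 Parallel Architectures 11 Greedy Routing: Worst Cases
Route N packets in a butterfly according to a permutation $\pi: [1,N] \to [1,N]$. Example: bit-reversal permutation
[Figure: butterfly with rows 000 to 111 (1 to N) and levels 0 to log N (= k)]
$(u_1 u_2 \dots u_{(k-1)/2} 00 \dots 0, 0) \to (0 u_2 \dots u_{(k-1)/2} 00 \dots 0, 1) \to \dots \to (00 \dots 0 u_{(k-1)/2} 00 \dots 0, (k-3)/2) \to (00 \dots 0, (k-1)/2) \to (00 \dots 0, (k+1)/2) \to (00 \dots 00 u_{(k-1)/2} 0 \dots 0, (k+3)/2) \to \dots \to (00 \dots 0 u_{(k-1)/2} \dots u_2 u_1, k)$
$2^{(k-1)/2} = \sqrt{N/2}$ paths go through the node $(00 \dots 0, (k-1)/2)$.
Time: all of these paths also share the edge to $(00 \dots 0, (k+1)/2)$, and an edge carries one packet per time unit, so at least $\sqrt{N/2}$ steps are needed.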
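The congestion caused by the bit-reversal permutation can be counted directly. The sketch below is an illustration under assumptions: rows are bit strings of odd length k, only the packets of the form u_1 ... u_(k-1)/2 0 ... 0 are routed, and the name `bit_reversal_congestion` is invented here.

```python
from collections import Counter
from itertools import product

def bit_reversal_congestion(k):
    """Count how many greedy paths visit each butterfly node.

    Only the 2**((k-1)//2) packets whose source row is u_1..u_h 0..0
    (h = (k-1)/2, k odd) are routed to their bit-reversed destination.
    Returns a Counter mapping (row, level) to the number of paths.
    """
    if k % 2 != 1:
        raise ValueError("k must be odd")
    h = (k - 1) // 2
    load = Counter()
    for bits in product("01", repeat=h):
        u = "".join(bits) + "0" * (k - h)   # source row
        v = u[::-1]                          # bit-reversed destination row
        for level in range(k + 1):
            # greedy path: the first `level` bits are already corrected
            load[(v[:level] + u[level:], level)] += 1
    return load
```

For k = 5 the node `("00000", 2)` is visited by all 4 = 2^((k-1)/2) routed packets, and the same holds one level further on, which is where the queue builds up.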

Heiko Schröder, 2003 Parallel Architectures 12 Oblivious Routing
Definition: A routing algorithm is called oblivious if the path of each packet depends only on the addresses of its source and destination. Example: greedy routing.
Theorem: Let G=(V,E) be any N-node degree-d network. Then for every oblivious routing algorithm there exists a 1-1 packet routing problem which will take $\Omega(\sqrt{N}/d)$ steps to complete. Proof: see Leighton.
Thus a “good” routing algorithm cannot be oblivious (or greedy); it has to take into account other packets and/or congestion.

Heiko Schröder, 2003 Parallel Architectures 13 Routing via sorting
Routing can be (and often is) done via sorting. Merge sort on the hypercube and hypercubic networks can be done in time $O(\log^2 N)$. Much better results are known; it might be possible to sort in time $O(\log N)$ (unknown for hypercubic networks). If M < N keys need to be sorted, it is advisable to “pack” first, then sort, then “spread”.

Heiko Schröder, 2003 Parallel Architectures 14 Packing on the butterfly
[Figure: keys A, B, C, D, E being packed in a butterfly with rows 000 to 111]
Unique greedy path => monotone packing without collisions. Proof?
Destination unknown => first determine the destination!
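Determining the destinations for packing amounts to a prefix count over the occupied rows. The sketch below is an illustration with assumed names (`packing_destinations`, `collision_free`); rows are numbered from 0, and the collision check assumes that the greedy path corrects the least significant address bit first, which is an assumption about the routing direction rather than something stated on the slide.

```python
from itertools import accumulate

def packing_destinations(rows):
    """Map each occupied row to its packed position.

    rows[i] is a key or None. The j-th key (in row order) goes to row j,
    i.e. its destination is the number of keys in the rows before it.
    """
    prefix = list(accumulate(0 if x is None else 1 for x in rows))
    return {i: prefix[i] - 1 for i, x in enumerate(rows) if x is not None}

def collision_free(dest, k):
    """Check that no two greedy paths meet in a node of a k-level butterfly.

    Assumption: after `level` steps the lowest `level` bits of the row
    number already equal those of the destination.
    """
    for level in range(k + 1):
        low = (1 << level) - 1
        nodes = [(s & ~low) | (d & low) for s, d in dest.items()]
        if len(nodes) != len(set(nodes)):
            return False
    return True
```

For `rows = [None, "A", None, "B", "C", None, "D", "E"]` the destinations are `{1: 0, 3: 1, 4: 2, 6: 3, 7: 4}`, and `collision_free` reports no shared node on any level for this monotone assignment.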

Heiko Schröder, 2003 Parallel Architectures 15 [Figure: neighbor / not neighbor; distance < 4 / distance >= 4]

Heiko Schröder, 2003 Parallel Architectures 16 Prefix sum Complete binary tree is sub-graph of butterfly
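A prefix sum on a complete binary tree uses one sweep up the tree and one sweep down. The following sequential sketch only mimics the data flow; the heap-style node numbering and the name `tree_prefix_sum` are assumptions for illustration, and on a real tree all nodes of one level would work in parallel.

```python
def tree_prefix_sum(values):
    """Inclusive prefix sums computed along a complete binary tree.

    The tree is stored heap-style: node 1 is the root, node m has the
    children 2m and 2m + 1, and the leaves are the nodes n .. 2n - 1.
    len(values) must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")

    # Up-sweep: every inner node stores the sum of its subtree.
    total = [0] * (2 * n)
    total[n:] = values
    for node in range(n - 1, 0, -1):
        total[node] = total[2 * node] + total[2 * node + 1]

    # Down-sweep: every node learns the sum of all leaves to its left.
    offset = [0] * (2 * n)
    for node in range(1, n):
        offset[2 * node] = offset[node]
        offset[2 * node + 1] = offset[node] + total[2 * node]

    return [offset[n + i] + values[i] for i in range(n)]
```

Both sweeps touch each tree level once, so on the tree the computation takes a number of steps proportional to the depth log n.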

Heiko Schröder, 2003 Parallel Architectures 17 Wrapped butterfly (WBF)

Heiko Schröder, 2003 Parallel Architectures 18 0/1 principle
If an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values.
Proof (by contradiction): Assume it sorts all input sets consisting solely of 0s and 1s, but fails to sort some sequence of arbitrary values. Instead of the correct output $x_1 \le x_2 \le x_3 \le \dots \le x_{k-1} \le x_k \le \dots \le x_n$ it outputs $x_1 \le x_2 \le x_3 \le \dots \le x_{k-1} < x_r \dots x_k \dots$, i.e. some $x_r > x_k$ takes position k. Now replace all $x_i$ with $i \le k$ by 0s and all others by 1s. Every comparison-exchange then moves the 0s and 1s exactly as it moved the original values, so a 0 ends up where $x_k$ ended up, i.e. in a wrong position (behind a 1): contradiction!
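The 0/1 principle can be tried out on small comparator networks. In this sketch a network is assumed to be a list of index pairs `(i, j)` with `i < j`, each meaning "put the smaller value at i and the larger at j"; the helper names are invented for the illustration.

```python
from itertools import permutations, product

def apply_network(network, values):
    """Run an oblivious comparison-exchange network on a sequence."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:                 # comparison-exchange on positions i, j
            v[i], v[j] = v[j], v[i]
    return v

def sorts_all_01(network, n):
    """True if the network sorts every 0/1 input of length n."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

def sorts_all_permutations(network, n):
    """True if the network sorts every permutation of 0 .. n-1."""
    return all(apply_network(network, p) == sorted(p)
               for p in permutations(range(n)))

# A 4-input sorting network and a broken variant without its last comparator.
GOOD = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
BROKEN = GOOD[:-1]
```

For `GOOD` both checks succeed, and for `BROKEN` both fail (for example on the input 0, 1, 0, 1). The 0/1 check needs only 2^n inputs instead of n! permutations.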

Heiko Schröder, 2003 Parallel Architectures 19 Inductive proof: the butterfly sorts bitonic sequences. Use the 0/1 principle. Bitonic sequence: a concatenation of two sorted sequences (of arbitrary length), sorted in opposite directions.
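The butterfly's comparison pattern for a bitonic sequence can be sketched as a bitonic merge. This is an illustrative sequential version (the name `bitonic_merge` is an assumption); each pass of the outer loop corresponds to one level of the butterfly, whose comparators would all operate in parallel.

```python
def bitonic_merge(seq):
    """Sort a bitonic sequence whose length is a power of two.

    Level by level, element i is compared with element i + half inside
    each block; the smaller value stays in the lower position.
    """
    a = list(seq)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    while half >= 1:                       # one butterfly level per pass
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                if a[i] > a[i + half]:
                    a[i], a[i + half] = a[i + half], a[i]
        half //= 2
    return a
```

Following the 0/1 principle, it is enough to convince oneself with inputs of the form 0...0 1...1 0...0 and 1...1 0...0 1...1; the merge needs log n levels for n keys.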

Heiko Schröder, 2003 Parallel Architectures 20 Inductive step
Case 1: at least n/2 1s: the (max) half consists of n/2 1s, the (min) half is bitonic.
Case 2: at most n/2 1s: the (max) half is bitonic, the (min) half consists of n/2 0s.
Cases 3 & 4: as above with 0 and 1 exchanged.

Heiko Schröder, 2003 Parallel Architectures 21 Time/area complexity for sorting on the BF?
Last merge: log n steps; previous merge: log n - 1 steps; ...; first merge: 1 step.
Total time: (log n + 1) log n / 2 steps.
Time: $\Theta(\log^2 n)$
Area of butterfly: $\Theta(n^2)$ (number of crossing points!)
$AT^2 = \Theta(n^2 \log^4 n)$: not quite optimal.
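Written out, the totals on this slide follow from summing the merge times and multiplying by the area. This is a worked restatement of the slide's figures, with T for time and A for area:

```latex
\begin{align*}
T(n) &= \sum_{i=1}^{\log n} i = \frac{(\log n + 1)\,\log n}{2} = \Theta(\log^2 n),\\
A(n) &= \Theta(n^2),\\
A\,T^2 &= \Theta(n^2)\cdot\Theta(\log^4 n) = \Theta(n^2 \log^4 n).
\end{align*}
```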

Heiko Schröder, 2003 Parallel Architectures 22 Sorting on the ISA Repeat log n times: vertical merge; horizontal merge.

Heiko Schröder, 2003 Parallel Architectures 23 Sorting on the ISA: 2x2, 4x2

Heiko Schröder, 2003 Parallel Architectures 24 Horizontal merge: in-shuffle, out-shuffle. Result? Sorted! Only one dirty row (prove!)

Heiko Schröder, 2003 Parallel Architectures 25 In-shuffle: same number per column

Heiko Schröder, 2003 Parallel Architectures 26 Time/area complexity for sorting on the mesh
Time for a merge step from k x k to 2k x 2k: Ck
Total time: $O(\log n \cdot \sqrt{n})$ on a $\sqrt{n} \times \sqrt{n}$ mesh (remark: is possible)
Area: $n \log n$
$AT^2 = n^2 \log^3 n$ ($n^2 \log^2 n$ is possible)
AT: BF: $AT = n^2 \log^2 n$; mesh: $AT = n^{3/2} \log^2 n$

Heiko Schröder, 2003 Parallel Architectures 27 Warshall’s algorithm
for k:=1 to n do
  for i:=1 to n do
    for j:=1 to n do
      a_ij := F(a_ij, a_ik, a_kj)
Algebraic path problem. Examples: all shortest paths
1.) $a_{ij} := a_{ij} \lor (a_{ik} \land a_{kj})$ -- start with adjacency matrix A
2.) $d_{ij} := \min\{d_{ij},\ d_{ik} + d_{kj}\}$ -- start with distance matrix D [also carry first/last node on path]
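The triple loop with a pluggable update function F can be sketched as below. This is a sequential illustration with assumed names (`algebraic_path`, `transitive_closure`, `all_shortest_paths`); matrices are lists of lists indexed from 0, and the bookkeeping of the first/last node on each path mentioned on the slide is left out.

```python
def algebraic_path(a, combine):
    """Warshall's scheme: a[i][j] := combine(a[i][j], a[i][k], a[k][j])."""
    n = len(a)
    a = [row[:] for row in a]           # work on a copy of the input matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = combine(a[i][j], a[i][k], a[k][j])
    return a

def transitive_closure(adjacency):
    """Example 1: Boolean update, starting from the adjacency matrix."""
    return algebraic_path(adjacency, lambda x, y, z: x or (y and z))

def all_shortest_paths(distance):
    """Example 2: min/plus update, starting from the distance matrix."""
    return algebraic_path(distance, lambda x, y, z: min(x, y + z))
```

Missing edges in the distance matrix can be represented by `float("inf")`, which behaves correctly under both `min` and `+`.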

Heiko Schröder, 2003 Parallel Architectures 28 Parallaxis versus ISA a kj a ik a ij a 1j a ij a i1 a 2j a i2 a ij ParallaxisISA   V:=C  V C:=A  V Adjacency matrix in C Instructions (only a suggestion): C:=CN C:=CE C:=CW C:=CS A:=C C:=A V:=C AC CA VC
