PERMUTATION CIRCUITS Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.

Outline  Introduction: Problem Definition, Terminology  Lower Bounds  Permutation Circuit Design  Example  Constructive Proof  Analysis  Applications

Definition: A permutation circuit is a combinational circuit that applies a given permutation ψn to its inputs x1, x2, …, xn to produce outputs y1, y2, …, yn such that (y1, y2, …, yn) = ψn(x1, x2, …, xn). An example: ψ8 = … This means: input 1 → output 4, input 2 → output 8, … Hence, y1 = 5, y2 = 4, y3 = 3, y4 = 1, …
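As a minimal sketch of what the circuit computes (0-indexed here, unlike the 1-indexed slides; the function name is ours), applying a permutation given as a destination table:

```python
def apply_perm(x, psi):
    """Route x through permutation psi, where psi[i] is the output
    position that input i is sent to (0-indexed)."""
    y = [None] * len(x)
    for i, dest in enumerate(psi):
        y[dest] = x[i]
    return y

# psi sends input 0 -> output 3, input 1 -> output 0, etc.
print(apply_perm(['a', 'b', 'c', 'd'], [3, 0, 2, 1]))  # ['b', 'd', 'c', 'a']
```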

Circuit component: Switch. A switch, as the name suggests, is a simple two-input, two-output component that can do the following: 1. OFF state – inputs are sent to the outputs in the same order. 2. ON state – inputs are switched, i.e., interchanged, at the outputs.
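The switch can be modeled as a tiny function (our naming); `on` is the preset state of the switch:

```python
def switch(a, b, on):
    """A 2x2 switch: OFF passes (a, b) straight through, ON interchanges them."""
    return (b, a) if on else (a, b)

print(switch(1, 2, False))  # (1, 2)  OFF state
print(switch(1, 2, True))   # (2, 1)  ON state
```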

Some basic terminology. Size of a circuit – the number of components in the circuit. Depth of a circuit – the maximum number of stages on any input-to-output path. Width of a circuit – the maximum number of components in a stage. Hence, size ≤ width × depth.

Lower Bounds. Let us say that for an input size n we need s switches. Each switch has 2 states (ON/OFF), so s switches admit 2^s possible settings. To satisfy any permutation, we need 2^s ≥ n!, i.e., s ≥ log(n!) = Ω(n log n). The lower bound on size is Ω(n log n).

Lower Bounds. Similarly, the depth is Ω(log n), since there are n inputs and n outputs, and since 1. each input line must have a path to each output line, 2. each switch has only two inputs and two outputs, so after d stages an input can reach at most 2^d output lines, forcing 2^d ≥ n.
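A quick numeric check of both counting arguments, using exact factorials:

```python
import math

for n in [4, 8, 16, 32]:
    # 2**s switch settings must cover all n! permutations: s >= log2(n!)
    min_switches = math.ceil(math.log2(math.factorial(n)))
    # each input reaches at most 2**d outputs after d stages: d >= log2(n)
    min_depth = math.ceil(math.log2(n))
    print(n, min_switches, min_depth)
```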

Permutation vs. Sorting. The order in which the inputs to a sorting circuit appear at the output depends on the values of the inputs. Hence, by presenting each input as a pair (i, j), meaning input i is to be sent to output j, we can perform permutations with a sorting circuit by sorting on the j values.

Permutation vs. Sorting. Sorting circuits are self-routing: each comparator decides which way the data it receives are to be directed; this decision is made when the data reach the comparator and is based on their values. In permutation circuits, by contrast, the switches must be set ahead of time.
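The pairing trick can be sketched with an ordinary sort standing in for the sorting circuit (function name is ours):

```python
def permute_by_sorting(x, dest):
    """Tag each input value with its destination j and sort on j;
    the values then come out in output order, mimicking a sorting
    circuit fed (i, j) pairs."""
    return [v for _, v in sorted(zip(dest, x))]

print(permute_by_sorting(['a', 'b', 'c'], [2, 0, 1]))  # ['b', 'c', 'a']
```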

Circuit Design. Once again we use a recursive design built from smaller permutation circuits. The basic idea is to design the circuit in three layers: Stage 1 – the first layer decides which of the two Stage-2 circuits each input goes to. Stage 2 – permutes the inputs at half the scale. Stage 3 – decides where each output of Stage 2 goes in the final output sequence.

Description: the Ping-Pong technique. We need to show that any permutation can be performed for the given input. 1. If for some output y_l we trace back to input x_2k, select its neighbor in switch I_k (namely x_2k-1) and set the switches from there to its correct output. If the neighbor is already selected, select any other input. 2. If for some input x_l output y_2k is reached, select its neighbor in switch O_k (namely y_2k-1) and set the switches from there to the correct input.

An Example. Let us construct the circuit for the example shown earlier: ψ8 = … We shall consider it step by step. Our basic building blocks are based on the following: n = 1: no switches needed. n = 2: one switch is sufficient. n > 2: the inputs are fed into switches I that direct them towards two n/2-input permutation circuits.

Constructive Proof [Waksman68]. Consider a network like the one above with no links, and let an arbitrary permutation be given. The upper n/2 circuit is called Pa and the lower Pb. Start with y1 and establish a link through Pa to some input x through its corresponding switch I; switch I is set if x is even. Proceed next with the second input associated with this I and establish a link through Pb to its output y through the O associated with it; set this O if y is even.

Repeat the process until all input-output pairs have been matched. Since by construction Pa and Pb are each associated with exactly n/2 inputs and n/2 outputs, and since by assumption Pa and Pb are permutation networks, the assignment is complete and the link pattern is as in the figure.
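The construction can be sketched recursively. The routine below is a looping/ping-pong sketch in the spirit of Waksman's proof, not his exact network (0-indexed, n a power of 2, and it keeps full switch columns at both ends for simplicity, so it uses slightly more switches than the optimal n log n − n + 1):

```python
def route(perm):
    """Build switch settings realizing perm, where perm[i] is the output of input i."""
    n = len(perm)
    if n == 1:
        return ('leaf',)
    if n == 2:
        return ('switch', perm[0] == 1)
    inv = [0] * n
    for i, j in enumerate(perm):
        inv[j] = i
    # Looping: 2-color the inputs so the two inputs of each input switch,
    # and the two connections into each output switch, use different halves.
    color = [None] * n
    for start in range(n):
        i = start
        while color[i] is None:
            color[i] = 0                  # i is routed through the upper half Pa
            mate = inv[perm[i] ^ 1]       # input sharing i's output switch
            if color[mate] is None:
                color[mate] = 1           # mate goes through the lower half Pb
            i = mate ^ 1                  # mate's partner in its input switch
    I = [color[2 * k] == 1 for k in range(n // 2)]
    O = [color[inv[2 * k]] == 1 for k in range(n // 2)]
    sub = [[None] * (n // 2), [None] * (n // 2)]
    for i in range(n):
        sub[color[i]][i // 2] = perm[i] // 2
    return ('node', I, O, route(sub[0]), route(sub[1]))

def apply_net(net, x):
    """Simulate the switched circuit on input list x."""
    if net[0] == 'leaf':
        return x
    if net[0] == 'switch':
        return [x[1], x[0]] if net[1] else x
    _, I, O, up, low = net
    up_in, low_in = [], []
    for k in range(len(I)):
        a, b = x[2 * k], x[2 * k + 1]
        if I[k]:
            a, b = b, a
        up_in.append(a)
        low_in.append(b)
    up_out, low_out = apply_net(up, up_in), apply_net(low, low_in)
    y = []
    for k in range(len(O)):
        a, b = up_out[k], low_out[k]
        if O[k]:
            a, b = b, a
        y += [a, b]
    return y

perm = [4, 7, 2, 0, 5, 3, 6, 1]   # a hypothetical psi on 8 lines, 0-indexed
print(apply_net(route(perm), list('abcdefgh')))
```

The invariant is that output position perm[i] receives input x[i] for every i.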

Analysis. 1. Depth: d(n) = d(n/2) + 2 = [d(n/4) + 2] + 2 = … = d(n/2^k) + 2k, with d(2) = 1. Setting n/2^k = 2 gives log n = k + 1, i.e., k = log n − 1, so d(n) = 2 log n − 1.

Analysis (contd.) 2. Width: n/2. 3. Size p(n): p(1) = 0, p(2) = 1, and p(n) = 2 p(n/2) + n − 1. Hence, p(n) = n log n − n + 1.
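The two recurrences can be checked against their closed forms directly:

```python
def depth(n):   # d(2) = 1, d(n) = d(n/2) + 2
    return 1 if n == 2 else depth(n // 2) + 2

def size(n):    # p(1) = 0, p(2) = 1, p(n) = 2 p(n/2) + n - 1
    if n == 1:
        return 0
    if n == 2:
        return 1
    return 2 * size(n // 2) + n - 1

for k in range(1, 11):
    n = 2 ** k
    assert depth(n) == 2 * k - 1      # d(n) = 2 log n - 1
    assert size(n) == n * k - n + 1   # p(n) = n log n - n + 1
```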

Applications  Investigate the problem of permuting n data items on an EREW PRAM with p processors using little additional storage.  Present a simple algorithm with run time O(n/p logn) and an improved algorithm with run time O(n/p+ lognlog log(n/p)).  Both algorithms require n additional global bits and O local storage per processor.  If prex summation is supported at the instruction level the run time of the improved algorithm is O(n/p)  The algorithms can be used to rehash the address space of a PRAM emulation Fast Parallel Permutation Algorithms [Hagerup 95]

Applications  Permute along the cycle until you reach it again.  Mark all positions that visited.  Continue until all positions have been visited.  O(n) to move all items and O(n) t search for unvisited positions. Sequential Algorithm

Applications  EREW PRAM with p processors.  Each processor P takes care of one block of B=n/p positions.  P starts with x in its block and follows the cycle until it meets a position y that is already marked as visited.  P is one of three states: searching, working on a cycle, terminated.  Time: O((n/p)logn) Basic Algorithm

Applications  Basic algorithm is not optimal because many processors could terminate early- unbalanced.  The array of items is dynamically partitioned into active and passive blocks.  passive: all positions have been visited.  active: split into smaller ones as the algorithm proceeds.  Time: O(n/p + logn log log (n/p)) Improved Algorithm

References
[Akl97] Selim G. Akl, Parallel Computation, Prentice Hall, New Jersey, 1997.
[Waksman68] A. Waksman, "A Permutation Network," Journal of the ACM, Vol. 15, 1968.
[Hagerup 95] "Fast Parallel Permutation Algorithms," Parallel Processing Letters, Vol. 5, No. 2, 1995.