II Extreme Models Study the two extremes of parallel computation models: Abstract SM (PRAM); ignores implementation issues Concrete circuit model; incorporates.

Slides:



Advertisements
Similar presentations
Analysis of Algorithms II
Advertisements

Turing Machines January 2003 Part 2:. 2 TM Recap We have seen how an abstract TM can be built to implement any computable algorithm TM has components:
Comp 122, Spring 2004 Order Statistics. order - 2 Lin / Devi Comp 122 Order Statistic i th order statistic: i th smallest element of a set of n elements.
CS224 Spring 2011 Computer Organization CS224 Chapter 6A: Disk Systems With thanks to M.J. Irwin, D. Patterson, and J. Hennessy for some lecture slide.
Parallel List Ranking Advanced Algorithms & Data Structures Lecture Theme 17 Prof. Dr. Th. Ottmann Summer Semester 2006.
1 Parallel Algorithms (chap. 30, 1 st edition) Parallel: perform more than one operation at a time. PRAM model: Parallel Random Access Model. p0p0 p1p1.
Chapter 4 Gates and Circuits.
Reversible Gates in various realization technologies
Other Gate Types COE 202 Digital Logic Design Dr. Aiman El-Maleh
Chapter 4 Gates and Circuits.
Introduction to Algorithms Quicksort
110/6/2014CSE Suprakash Datta datta[at]cse.yorku.ca CSE 3101: Introduction to the Design and Analysis of Algorithms.
Epp, section 10.? CS 202 Aaron Bloomfield
PERMUTATION CIRCUITS Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.
Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.
Parallel Sorting Sathish Vadhiyar. Sorting  Sorting n keys over p processors  Sort and move the keys to the appropriate processor so that every key.
Batcher’s merging network Efficient Parallel Algorithms COMP308.
Divide and Conquer. Recall Complexity Analysis – Comparison of algorithm – Big O Simplification From source code – Recursive.
Nattee Niparnan. Recall  Complexity Analysis  Comparison of Two Algos  Big O  Simplification  From source code  Recursive.
Parallel Sorting Algorithms Comparison Sorts if (A>B) { temp=A; A=B; B=temp; } Potential Speed-up –Optimal Comparison Sort: O(N lg N) –Optimal Parallel.
Advanced Topics in Algorithms and Data Structures Page 1 Parallel merging through partitioning The partitioning strategy consists of: Breaking up the given.
1 Tuesday, November 14, 2006 “UNIX was never designed to keep people from doing stupid things, because that policy would also keep them from doing clever.
Advanced Topics in Algorithms and Data Structures 1 Lecture 4 : Accelerated Cascading and Parallel List Ranking We will first discuss a technique called.
1 Lecture 11 Sorting Parallel Computing Fall 2008.
Parallel Merging Advanced Algorithms & Data Structures Lecture Theme 15 Prof. Dr. Th. Ottmann Summer Semester 2006.
Design of parallel algorithms Sorting J. Porras. Problem Rearrange numbers (x 1,...,x n ) into ascending order ? What is your intuitive approach –Take.
2 -1 Chapter 2 The Complexity of Algorithms and the Lower Bounds of Problems.
Charles Kime & Thomas Kaminski © 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Chapter 3 – Combinational Logic Design Part 1 –
Bitonic and Merging sorting networks Efficient Parallel Algorithms COMP308.
The Complexity of Algorithms and the Lower Bounds of Problems
CSCI-455/552 Introduction to High Performance Computing Lecture 22.
Sorting Networks Uri Zwick Tel Aviv University May 2015.
Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.
1 Sorting Algorithms - Rearranging a list of numbers into increasing (strictly non-decreasing) order. ITCS4145/5145, Parallel Programming B. Wilkinson.
Charles Kime & Thomas Kaminski © 2004 Pearson Education, Inc. Terms of Use (Hyperlinks are active in View Show mode) Terms of Use Lecture 12 – Design Procedure.
Lecture 12: Parallel Sorting Shantanu Dutt ECE Dept. UIC.
1 Parallel Sorting Algorithms. 2 Potential Speedup O(nlogn) optimal sequential sorting algorithm Best we can expect based upon a sequential sorting algorithm.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Parallel and Distributed Algorithms Eric Vidal Reference: R. Johnsonbaugh and M. Schaefer, Algorithms (International Edition) Pearson Education.
Lecture 6 Algorithm Analysis Arne Kutzner Hanyang University / Seoul Korea.
Comparison Networks Sorting Sorting binary values
Design for Testability By Dr. Amin Danial Asham. References An Introduction to Logic Circuit Testing.
More on Efficiency Issues. Greatest Common Divisor given two numbers n and m find their greatest common divisor possible approach –find the common primes.
UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.
Winter 2014Parallel Processing, Fundamental ConceptsSlide 1 2 A Taste of Parallel Algorithms Learn about the nature of parallel algorithms and complexity:
“Sorting networks and their applications”, AFIPS Proc. of 1968 Spring Joint Computer Conference, Vol. 32, pp
Comparison Networks Sorting Sorting binary values Sorting arbitrary numbers Implementing symmetric functions.
May 2015Sorting NetworksSlide 1 Sorting Networks A Lecture in CE Freshman Seminar Series: Ten Puzzling Problems in Computer Engineering.
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
+ Even Odd Sort & Even-Odd Merge Sort Wolfie Herwald Pengfei Wang Rachel Celestine.
Unit-8 Sorting Algorithms Prepared By:-H.M.PATEL.
CSCI-455/552 Introduction to High Performance Computing Lecture 21.
Sorting: Parallel Compare Exchange Operation A parallel compare-exchange operation. Processes P i and P j send their elements to each other. Process P.
May 2007Sorting NetworksSlide 1 Sorting Networks A Lecture in CE Freshman Seminar Series: Ten Puzzling Problems in Computer Engineering.
Lecture 4 Sorting Networks
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Interconnection Networks (Part 2) Dr.
Sorting Networks Characteristics The basic unit: a comparator
7.1 What is a Sorting Network?
Parallel Sorting Algorithms
Bitonic Sorting and Its Circuit Design
Comparison Networks Sorting Sorting binary values
Triangular Sorter Using Memristive Architecture.
8/04/2009 Many thanks to David Sun for some of the included slides!
Parallel Sorting Algorithms
Bitonic and Merging sorting networks
Lecture 6 Algorithm Analysis
Lecture 6 Algorithm Analysis
Sorting Networks A Lecture in CE Freshman Seminar Series:
Parallel Sorting Algorithms
Presentation transcript:

II Extreme Models Study the two extremes of parallel computation models: Abstract SM (PRAM); ignores implementation issues Concrete circuit model; incorporates hardware details Everything else falls between these two extremes Topics in This Part Chapter 5 PRAM and Basic Algorithms Chapter 6 More Shared-Memory Algorithms Chapter 7 Sorting and Selection Networks Chapter 8 Other Circuit-Level Examples

7 Sorting and Selection Networks Become familiar with the circuit model of parallel processing: Go from algorithm to architecture, not vice versa Use a familiar problem to study various trade-offs Topics in This Chapter 7.1 What is a Sorting Network? 7.2 Figures of Merit for Sorting Networks 7.3 Design of Sorting Networks 7.4 Batcher Sorting Networks 7.5 Other Classes of Sorting Networks 7.6 Selection Networks

7.1 What is a Sorting Network? Fig. 7.1 An n-input sorting network or an n-sorter. Fig. 7.2 Block diagram and four different schematic representations for a 2-sorter.

2-sorter Building Blocks for Sorting Networks Fig. 7.3 Parallel and bit-serial hardware realizations of a 2-sorter. Implementation with bit-parallel inputs Implementation with bit-serial inputs

Proving a Sorting Network Correct Fig. 7.4 Block diagram and schematic representation of a 4-sorter. Method 1: Exhaustive test – Try all n! possible input orders Method 2: Ad hoc proof – for the example above, note that y 0 is smallest, y 3 is largest, and the last comparator sorts the other two outputs Method 3: Use the zero-one principle – A comparison-based sorting algorithm is correct iff it correctly sorts all 0-1 sequences (2 n tests)

Validating a Sorting Network Stick diagram schematic b a d c Track 1 Track 2 Track 3 Track 4 min(a, b) max(a, b) min(c, d) max(c, d) min(a, b, c, d) max(a, b, c, d) ???? In the example above, it was fairly easy to show the validity of the sorting network. Generally, it is much more difficult How would one establish the validity of this 16-input sorting network? More importantly, how does one come up with this design in the first place?

The Zero-One Principle A sorting net built of comparators is valid if it correctly sorts all 0-1 sequences So, we can validate a sorting network using 2 n rather than n! input patterns n = 12: 2 n = 4096, n! = 479,001,600 (thousands vs. half a billion)

Elaboration on the Zero-One Principle Deriving a 0-1 sequence that is not correctly sorted, given an arbitrary sequence that is not correctly sorted. Let outputs y i and y i+1 be out of order, that is y i > y i+1 Replace inputs that are strictly less than y i with 0s and all others with 1s The resulting 0-1 sequence will not be correctly sorted either

7.2 Figures of Merit for Sorting Networks Delay: Number of levels Cost: Number of comparators Cost Delay In the following example, we have 5 comparators The following 4-sorter has 3 comparator levels on its critical path The cost-delay product for this example is 15 Fig. 7.4 Block diagram and schematic representation of a 4-sorter.

Cost as a Figure of Merit Fig. 7.5 Some low-cost sorting networks.

Delay as a Figure of Merit Fig. 7.6 Some fast sorting networks. These 3 comparators constitute one level

Cost-Delay Product as a Figure of Merit Fast 10-sorter from Fig. 7.6 Low-cost 10-sorter from Fig. 7.5 Cost Delay = 29 9 = 261Cost Delay = 31 7 = 217 The most cost-effective n-sorter may be neither the fastest design, nor the lowest-cost design Applications of sorting networks: Directing information packets to their destinations in a network router Connecting n processors to n memory modules in a parallel computer

7.3 Design of Sorting Networks Fig. 7.7 Brick-wall 6-sorter based on odd–even transposition. C(n)= n(n – 1)/2 D(n )= n Cost Delay = n 2 (n – 1)/2 = (n 3 )

Insertion Sort and Selection Sort Fig. 7.8 Sorting network based on insertion sort or selection sort. C(n) = n(n – 1)/2 D(n ) = 2n – 3 Cost Delay = (n 3 )

Theoretically Optimal Sorting Networks AKS sorting network (Ajtai, Komlos, Szemeredi: 1983) O(log n) depth O(n log n) size Unfortunately, AKS networks are not practical owing to large (4-digit) constant factors involved; improvements since 1983 not enough Note that even for these optimal networks, delay-cost product is suboptimal; but this is the best we can do Existing sorting networks have O(log 2 n) latency and O(n log 2 n) cost Given that log 2 n is only 20 for n = , the latter are more practical

7.4 Batcher Sorting Networks One type of sorting network proposed by Batcher is based on the idea of an (m, m' )-merger and uses a technique known as even–odd merge or odd-even merge. An (m, m' )-merger is a circuit that merges two sorted sequences of lengths m and m' into a single sorted sequence of length m + m'. Let the two sorted sequences be: The odd–even merge is done by merging the even- and odd-indexed elements of the two lists separately: If we now compare–exchange the pairs of elements the resulting sequence will be completely sorted. Note that v 0, which is known to be the smallest element overall, is excluded from the final compare–exchange operations

Batcher Sorting Networks

Fig. 7.9 Batchers even–odd merging network for inputs. (2, 3)-merger a0a0 a1a1 b0b0 b1b1 b2b2 (1, 2)-merger c0c0 d0d0 d1d1 (1, 1)(1, 2) (1, 1)

Batchers Even-Odd Merge Sorting Batchers (m, m) even-odd merger, for m a power of 2: C(m) = 2C(m/2) + m – 1 = (m – 1) + 2(m/2 – 1) + 4(m/4 – 1) +... = m log 2 m + 1 D(m) = D(m/2) + 1 = log 2 m + 1 Cost Delay = (m log 2 m)

Batchers Even-Odd Merge Sorting Batcher sorting networks based on the even-odd merge technique: C(n) = 2C(n/2) + (n/2)(log 2 (n/2)) + 1 n(log 2 n) 2 / 2 D(n) = D(n/2) + log 2 (n/2) + 1 = D(n/2) + log 2 n = log 2 n (log 2 n + 1)/2 Cost Delay = (n log 4 n) Fig The recursive structure of Batchers even– odd merge sorting network.

Example Batchers Even-Odd 8-Sorter Fig Batchers even-odd merge sorting network for eight inputs.

Bitonic-Sequence Sorter Fig Sorting a bitonic sequence on a linear array. Bitonic sequence: Rises, then falls Falls, then rises The previous sequence, right- rotated by 2

Batchers Bitonic Sorting Networks Fig The recursive structure of Batchers bitonic sorting network.

Batchers Bitonic Sorting Networks Batchers bitonic sorting network for 16 inputs. Bitonic Sequence of inputs

Batchers Bitonic Sorting Networks Batchers bitonic sorting network for eight inputs.

7.5 Other Classes of Sorting Networks Fig Periodic balanced sorting network for eight inputs. Desirable properties: a. Regular / modular (easier VLSI layout). b. Simpler circuits via reusing the blocks c. With an extra block tolerates some faults (missed exchanges) d. With 2 extra blocks provides tolerance to any single faults (a missed or incorrect exchange) e. Multiple passes through faulty network (graceful degradation) f. Single-block design becomes fault-tolerant by using an extra stage

Shearsort-Based Sorting Networks (1) Fig Design of an 8-sorter based on shearsort on 2 4 mesh.

Shearsort-Based Sorting Networks (2) Fig Design of an 8-sorter based on shearsort on 2 4 mesh. Some of the same advantages as periodic balanced sorting networks

7.6 Selection Networks Direct design may yield simpler/faster selection networks Deriving an (8, 3)-selector from Batchers even-odd merge 8-sorter.

Categories of Selection Networks Unfortunately we know even less about selection networks than we do about sorting networks. One can define three selection problems [Knut81]: I.Select the k smallest values; present in sorted order II.Select kth smallest value III. Select the k smallest values; present in any order Circuit and time complexity: (I) hardest, (III) easiest Classifiers: Selectors that separate the smaller half of values from the larger half Smaller 4 values Larger 4 values 8 inputs8-Classifier

Type-III Selection Networks Figure 7.17 A type III (8, 4)-selector.8-Classifier