DISTRIBUTED GENERATION OF PAIRWISE COMBINATIONS
PARALLEL GRAPH PARTITIONING ON A HYPERCUBE
F. Ercal, P. Sadayappan, and J. Ramanujan


DISTRIBUTED GENERATION OF PAIRWISE COMBINATIONS
PARALLEL GRAPH PARTITIONING ON A HYPERCUBE
F. Ercal, P. Sadayappan, and J. Ramanujan
University of Missouri-Rolla and The Ohio State University

PROBLEM DEFINITION
- Given a graph G(V,E), |V| = N, |E| = e
- Obtain K partitions of G subject to the following constraints:
  - Balanced: each partition has equal size
  - Minimum cut: the number of edges crossing between partitions is minimized
- Arises in: task allocation, VLSI layout, file placement, etc.
- Intractable: no polynomial-time algorithm is known
- Heuristics needed
- Kernighan-Lin Mincut Heuristic (1970)
  - Time complexity: O(N^2 logN)
- Extension by Fiduccia and Mattheyses (1982)
  - Uses buckets and moves. Linear-time algorithm: O(e)
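The two constraints can be made concrete with a small sketch. The function names and the 6-vertex example graph below are illustrative assumptions, not taken from the slides; the code only checks a given partition, it does not search for one.

```python
from collections import Counter

def cut_size(edges, part):
    """Count edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k):
    """True when exactly k partitions are used and all have equal size."""
    sizes = Counter(part.values())
    return len(sizes) == k and len(set(sizes.values())) == 1

# Hypothetical example: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one triangle per partition
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}    # vertices alternate

print(cut_size(edges, good), is_balanced(good, 2))
print(cut_size(edges, bad), is_balanced(bad, 2))
```

Both partitions are balanced, but only the first keeps the cut small, which is the quantity the heuristics on the following slides try to reduce.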

MINCUT ALGORITHM
[Figure: vertices v1 to v8 divided between partitions P1 and P2; CUT=5]
If v2 moves: GAIN=2 and TOT_GAIN=2
If v5 moves: GAIN=1 and TOT_GAIN=3
[Figure: the same graph; CUT=3]

MINCUT ALGORITHM (Contd..)
[Figure: the same graph, vertices v1 to v8; CUT=2]
If v1 moves: GAIN=0 and TOT_GAIN=3
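The GAIN values on these slides can be read as the reduction in cut size when one vertex changes sides: edges to the other partition minus edges to its own. The sketch below computes that quantity for a two-way partition. The graph is a hypothetical example (the edges of the slide's figure are not recoverable), and the helper names are assumptions.

```python
def cut_size(edges, part):
    """Count edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def build_adjacency(edges):
    """Undirected adjacency lists from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def move_gain(v, adj, part):
    """Cut reduction if v moves to the other side (two partitions only)."""
    external = sum(1 for w in adj[v] if part[w] != part[v])
    internal = len(adj[v]) - external
    return external - internal

# Hypothetical graph: two triangles joined by edge (2, 3), badly partitioned.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
adj = build_adjacency(edges)

before = cut_size(edges, part)
gain = move_gain(2, adj, part)
part[2] = 0                      # apply the move
after = cut_size(edges, part)
print(before, gain, after)
```

Summing the gains of successive moves gives the running total that the slides label TOT_GAIN. A full Kernighan-Lin or Fiduccia-Mattheyses pass also keeps the partitions balanced and locks vertices once moved, which this sketch omits.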

RECURSIVE BISECTION

TIME COMPLEXITY
Sequential Time Complexity for Recursive Bisection:
N + 2*(N/2) + 4*(N/4) + ... + 2^p*(N/2^p) ===> O(N*logK)
Parallel Time Complexity for Recursive Bisection:
N + N/2 + N/4 + ... + N/2^p ===> O(N)
COMMENT: Speedup is very limited. To increase speedup, the bisection algorithm itself must be parallelized.
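To see how limited the speedup is, the two series can be evaluated directly. The sketch below assumes that p = log2(K), that K is a power of two, and that bisecting a subgraph of n vertices costs n units; the function names and the values N=1024, K=16 are illustrative.

```python
def sequential_rb_work(n, k):
    """N + 2*(N/2) + ... + 2^p*(N/2^p), with p = log2(k) assumed."""
    p = k.bit_length() - 1
    return sum((2 ** i) * (n / 2 ** i) for i in range(p + 1))

def parallel_rb_time(n, k):
    """N + N/2 + ... + N/2^p: subproblems at one level run concurrently."""
    p = k.bit_length() - 1
    return sum(n / 2 ** i for i in range(p + 1))

n, k = 1024, 16
seq = sequential_rb_work(n, k)
par = parallel_rb_time(n, k)
print(seq, par, seq / par)
```

The parallel sum is bounded by 2N regardless of K, so the ratio grows only like logK even though K processors end up in use.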

PAIRWISE MINCUT
[Figure: partitions P1 to P8]
PAIRS TO BE CONSIDERED FOR MINCUT:
(1,2) (1,3) (1,4) (1,5) (1,6) (1,7) (1,8)
(2,3) (2,4) ... (2,8)
...
(7,8)

TIME COMPLEXITY
Sequential Time Complexity for Pairwise Mincut
Parallel Time Complexity for Recursive Bisection
CONCLUSIONS
Sequential Recursive Bisection (RB) has a much lower time complexity than Pairwise Mincut (PM), but the superior parallelizability of PM (100% processor utilization) makes its parallel time complexity comparable to that of parallel RB.

1) RECURSIVE BISECTION
Perform repeated bisection, each time doubling the number of partitions, until K partitions are obtained.
Time Complexity: N + 2*(N/2) + 4*(N/4) + ... + 2^p*(N/2^p) ==> O(N*logK)
2) PAIRWISE MINCUT
Initially obtain K partitions. Try to reduce the cut size between each pair of partitions. K(K-1)/2 pairs (each of size 2N/K) must be considered.
Time Complexity
3) Any combination of RECURSIVE BISECTION + PAIRWISE MINCUT
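A rough work count for pairwise mincut follows from the figures quoted in item 2. The sketch assumes, as the recursive-bisection series does, that one mincut pass costs time proportional to the number of vertices involved, and that each pair is processed exactly once; under those assumptions the total is K(K-1)/2 * 2N/K = N(K-1). The function name and the values N=1024, K=8 are illustrative.

```python
from itertools import combinations

def pairwise_mincut_work(n, k):
    """Return (number of pairs, total vertices touched over all pairs)."""
    pairs = list(combinations(range(1, k + 1), 2))  # (1,2), (1,3), ..., (k-1,k)
    pair_size = 2 * n // k                          # two partitions of n/k each
    return len(pairs), len(pairs) * pair_size

n, k = 1024, 8
num_pairs, total = pairwise_mincut_work(n, k)
print(num_pairs, total, n * (k - 1))
```

Under these assumptions the sequential cost grows linearly in K, compared with logK for recursive bisection, which is consistent with the statement that sequential RB is much cheaper than sequential PM.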

DISTRIBUTED GENERATION OF PAIRWISE COMBINATIONS ON A HYPERCUBE
Problem: Given 2P disjoint items, P*(2P-1) distinct pairs can be formed. How would you efficiently generate these pairs on the processors of a hypercube?
Similar to the problem of distributed scheduling of a round-robin tournament between 2C players using C courts, where the paths between courts form a hypercube topology:
- maximum utilization of courts (processor utilization)
- minimum walking between courts (min. comm. overhead)
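The round-robin analogy can be illustrated with the standard "circle method" for tournament scheduling, which is a sequential textbook technique and not the hypercube algorithm presented in these slides. It produces 2P-1 rounds of P disjoint pairs, so all P courts are busy in every round. It says nothing about hypercube communication cost. The function name is an assumption.

```python
def round_robin_rounds(items):
    """Circle method: fix one item, rotate the rest, pair opposite ends."""
    items = list(items)
    n = len(items)
    if n % 2 != 0:
        raise ValueError("expected an even number of items")
    fixed, rotating = items[0], items[1:]
    rounds = []
    for _ in range(n - 1):
        lineup = [fixed] + rotating
        # Pair position i with position n-1-i: n/2 disjoint pairs per round.
        rounds.append([(lineup[i], lineup[n - 1 - i]) for i in range(n // 2)])
        rotating = rotating[-1:] + rotating[:-1]   # rotate by one place
    return rounds

P = 4
rounds = round_robin_rounds(range(2 * P))
all_pairs = {frozenset(pair) for rnd in rounds for pair in rnd}
print(len(rounds), len(all_pairs))
```

With P=4 this yields 7 rounds and 28 distinct pairs, matching P*(2P-1). The distributed version additionally has to decide which processor holds which item in each round so that items travel only between neighbouring processors.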

Distributed PC Algorithm on a 2d Hypercube (4 Processors)
[Figure, three panels with columns C1 and C2:
d=0: items A00, A01, A10, A11 and B00, B01, B10, B11 shared by processors P00, P01, P10, P11
d=1: A00, A01, A10, A11 on P00, P01; B00, B01, B10, B11 on P10, P11
d=2: A00, A01 on P00; A10, A11 on P01; B00, B01 on P10; B10, B11 on P11]

[Figure: item lists A_1 ... A_K and B_1 ... B_K, split first into halves (at K/2) and then into quarters (at K/4, K/2, 3K/4). Labels: RING-FRAGMENTATION, CYCLIC-TOUR, RING-FRAGMENTATION; steps 1 and 2]

Ring Communication in different phases of Distributed PC algorithm
(a) d=0: 1 ring of size 16
(b) d=1: 2 rings of size 8

Ring Communication in different phases of Distributed PC algorithm (Contd..)
(c) d=2: 4 rings of size 4
(d) d=3: 8 rings of size 2

Ring Communication in different phases of Distributed PC algorithm (Contd..)
(e) d=4: 16 rings of size 1
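The captions follow a halving pattern: in phase d there are 2^d rings, and the surviving sizes (8, 2, 1 for d = 1, 3, 4) halve at each step, which implies a starting ring of 16 processors. The sketch below tabulates that pattern; the function name is an assumption, and the formula is inferred from the captions rather than stated on the slides.

```python
def ring_phases(num_procs):
    """List (d, number of rings, ring size) for each phase d = 0..log2(num_procs)."""
    dims = num_procs.bit_length() - 1          # hypercube dimension
    if 2 ** dims != num_procs:
        raise ValueError("processor count must be a power of two")
    return [(d, 2 ** d, num_procs >> d) for d in range(dims + 1)]

phases = ring_phases(16)
for d, rings, size in phases:
    print(f"d={d}: {rings} ring(s) of size {size}")
```

In every phase the rings together cover all 16 processors, so none is idle while the rings fragment.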