Circular q-shift - Hypercube


Circular q-shift - Hypercube
Using E-cube routing, a circular q-shift in a hypercube with p nodes has a longest path of log p - γ(q) links, where γ(q) is the largest integer j such that q is divisible by 2^j.
Pairwise communication using E-cube routing: no congestion!
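
As a quick check of the formula above, here is a minimal Python sketch (an illustration, not part of the slides; the function names are mine): γ(q) is computed as the number of trailing zero bits of q, the E-cube path length is the Hamming distance between node labels, and the longest observed q-shift path is compared against log p - γ(q).

# Minimal sketch (illustrative, not from the slides): check that the longest
# E-cube path of a circular q-shift on a p-node hypercube is log2(p) - gamma(q).
import math

def gamma(q: int) -> int:
    """Largest j such that 2**j divides q, i.e. the number of trailing zero bits."""
    j = 0
    while q > 0 and q % 2 == 0:
        q //= 2
        j += 1
    return j

def ecube_path_length(src: int, dst: int) -> int:
    """E-cube routing corrects differing address bits one dimension at a time,
    so the path length equals the Hamming distance of the two node labels."""
    return bin(src ^ dst).count("1")

def longest_qshift_path(p: int, q: int) -> int:
    """Longest path over all pairs (i, (i + q) mod p); p must be a power of two."""
    return max(ecube_path_length(i, (i + q) % p) for i in range(p))

if __name__ == "__main__":
    p = 16
    for q in (1, 2, 4, 6):
        predicted = int(math.log2(p)) - gamma(q)
        print(f"q={q}: longest path {longest_qshift_path(p, q)}, predicted {predicted}")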

Minimal cost-optimal execution time
Isoefficiency function ∈ Θ(f(p)): a problem of size W is solved cost-optimally ⇔ W ∈ Ω(f(p)), i.e. W grows at least like f(p).
Equivalently: cost-optimality for a problem of size W ⇔ p ∈ O(f⁻¹(W)), i.e. p grows at most like f⁻¹(W).
The parallel execution time of a cost-optimal parallel system is Θ(W/p). Since 1/p ∈ Ω(1/f⁻¹(W)), it follows that T_P ∈ Ω(W/f⁻¹(W)): a lower bound on the parallel runtime for solving a problem of size W cost-optimally.
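
To make the lower bound concrete, the following sketch (an illustration, not part of the slides) assumes the textbook example of adding n numbers on a hypercube, whose isoefficiency function is commonly quoted as f(p) = p log p, inverts f numerically to get the largest cost-optimal processor count f⁻¹(W), and reports the resulting runtime lower bound W/f⁻¹(W).

# Illustrative sketch (not from the slides): lower bound on cost-optimal parallel
# runtime, assuming n-number addition on a hypercube with isoefficiency
# f(p) = p log p; the problem sizes W below are arbitrary.
import math

def f(p: int) -> float:
    """Assumed isoefficiency function f(p) = p log2 p."""
    return p * math.log2(p)

def f_inverse(W: float) -> int:
    """Largest p with f(p) <= W: the maximum cost-optimal number of processors."""
    p = 1
    while f(p + 1) <= W:
        p += 1
    return p

if __name__ == "__main__":
    for W in (2**10, 2**16, 2**20):
        p_max = f_inverse(W)
        t_lower = W / p_max          # T_P = Omega(W / f^-1(W))
        print(f"W = {W:8d}   max cost-optimal p = {p_max:6d}   T_P lower bound ~ {t_lower:.1f}")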

Scaled speedup
How does the speedup S(n, p) behave as p grows? How should n be chosen?
Memory-constrained scaled speedup: the memory grows proportionally with p ⇒ this determines n.
Memory requirement: m = f_m(n); memory per node: m_0.
f_m(n) = p · m_0 ⇒ n = f_m⁻¹(p · m_0)
Time-constrained scaled speedup: the parallel runtime is held fixed, T_P = T_fix ⇒ this determines n.
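
Both ways of choosing n can be phrased as root-finding problems. The sketch below is only an illustration (the helper names, the bisection approach, and the constants are assumptions, not from the slides); it requires f_m and T_P to be monotonically increasing in n.

# Illustrative sketch (not from the slides): determine the scaled problem size n
# for a given p under memory-constrained and time-constrained scaling.
from typing import Callable

def largest_n_below(g: Callable[[int], float], target: float, hi: int = 1 << 32) -> int:
    """Largest n with g(n) <= target, assuming g is monotonically increasing (bisection)."""
    lo = 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if g(mid) <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo

def memory_constrained_n(f_m: Callable[[int], float], m0: float, p: int) -> int:
    """Solve f_m(n) = p * m0: total memory grows proportionally with p."""
    return largest_n_below(f_m, p * m0)

def time_constrained_n(t_p: Callable[[int, int], float], t_fix: float, p: int) -> int:
    """Solve T_P(n, p) = T_fix: the parallel runtime is held fixed."""
    return largest_n_below(lambda n: t_p(n, p), t_fix)

if __name__ == "__main__":
    # Example instantiation for the matrix-vector product of the next slide
    # (tc, m0, T_fix are arbitrary illustrative constants).
    tc, m0, t_fix = 1e-9, 1_000_000, 1e-3
    f_m = lambda n: n * n
    t_p = lambda n, p: tc * n * n / p
    for p in (1, 4, 16):
        print(p, memory_constrained_n(f_m, m0, p), time_constrained_n(t_p, t_fix, p))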

Scaled speedup, Example 1: Matrix-vector product
T_S = t_c · n², T_P ∈ Θ(n²/p), where t_c is the execution time of one multiply-add operation.
Memory requirement: m = f_m(n) ∈ Θ(n²).
Memory-constrained scaled speedup: available memory m = m_0 · p ∈ Θ(p), i.e. n² = c · p.
Time-constrained scaled speedup: T_P ∈ Θ(n²/p) held constant, i.e. n² = c · p; the scaled speedup is the same as above.
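
Worked numerically with assumed constants, both regimes scale the problem as n² = c · p, and the per-node work t_c · n²/p stays constant, which is why the time-constrained case coincides with the memory-constrained one:

# Illustrative sketch (constants assumed, not from the slides): matrix-vector
# product under both scaling regimes; both give n^2 proportional to p.
import math

TC = 1e-9            # time per multiply-add (assumed)
M0 = 4_000_000       # memory words per node (assumed)
T_FIX = TC * M0      # time budget chosen so both regimes coincide at p = 1

for p in (1, 16, 256, 4096):
    n_mem = math.sqrt(M0 * p)             # from f_m(n) = n^2 = m0 * p
    n_time = math.sqrt(T_FIX * p / TC)    # from T_P = tc * n^2 / p = T_fix
    work_per_node = TC * n_mem**2 / p     # constant across p
    print(f"p={p:5d}  n_mem={n_mem:8.0f}  n_time={n_time:8.0f}  per-node time={work_per_node:.4f}s")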

Example 2: Matrix-matrix multiplication
Memory requirement: Θ(n²).
Memory-constrained scaled speedup: m ∈ Θ(p) and m ∈ Θ(n²), i.e. n² = c · p.
Time-constrained scaled speedup: T_P ∈ Θ(n³/p) held constant, i.e. n³ = c · p.
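
A short numeric comparison (again with assumed constants) shows how the two regimes diverge for matrix-matrix multiplication: under memory-constrained scaling (n² = c · p) the per-node time n³/p grows like √p, whereas under time-constrained scaling (n³ = c · p) the memory needed per node n²/p shrinks like p^(-1/3).

# Illustrative sketch (constants assumed, not from the slides): matrix-matrix multiplication.
# Memory-constrained (n^2 = c*p): per-node time n^3/p grows like sqrt(p).
# Time-constrained  (n^3 = c*p): per-node memory n^2/p shrinks like p^(-1/3).
import math

TC = 1e-9                  # time per multiply-add (assumed)
M0 = 4_000_000             # memory words per node (assumed)
N0 = int(math.sqrt(M0))    # problem size that fills one node's memory

for p in (1, 64, 4096):
    n_mem = N0 * math.sqrt(p)             # memory-constrained problem size
    t_mem = TC * n_mem**3 / p             # grows with sqrt(p)
    n_time = N0 * p ** (1.0 / 3.0)        # time-constrained: n^3 = N0^3 * p
    mem_per_node = n_time**2 / p          # shrinks with p^(-1/3)
    print(f"p={p:5d}  mem-scaled T_P={t_mem:8.1f}s   time-scaled mem/node={mem_per_node:10.0f}")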