1 Tuesday, September 26, 2006 Wisdom consists of knowing when to avoid perfection. -Horowitz.

2
• Quiz 2
• Assignment 1

3 Hypercube: log p dimensions with two nodes in each dimension 0-D hypercube

4 Hypercube: log p dimensions with two nodes in each dimension 1-D hypercube 0-D hypercube

5 Hypercube: log p dimensions with two nodes in each dimension 2-D hypercube 1-D hypercube

6 Hypercube: log p dimensions with two nodes in each dimension 3-D hypercube 2-D hypercube

7 Hypercube: log p dimensions with two nodes in each dimension. A 4-D hypercube is built by connecting corresponding nodes of two 3-D hypercubes. Each node is connected to d = log p other nodes.
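The construction can be checked in a few lines: a node's neighbors in a d-dimensional hypercube are the nodes whose binary labels differ from it in exactly one bit, so every node has exactly d = log p neighbors. A minimal Python sketch (function name is illustrative):

```python
# Neighbors of a node in a d-dimensional hypercube: flip each of the d label bits.
def hypercube_neighbors(node, d):
    return [node ^ (1 << bit) for bit in range(d)]

d = 4           # 4-D hypercube
p = 2 ** d      # p = 16 nodes
for node in range(p):
    assert len(hypercube_neighbors(node, d)) == d  # degree is d = log p
print(hypercube_neighbors(0b0000, d))  # [1, 2, 4, 8]
```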

8 Numbering: with nodes labeled in binary, the minimum distance between two nodes is the number of bit positions in which their labels differ (the Hamming distance).
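Because each hop in the hypercube flips exactly one bit of the label, the minimum number of hops between two nodes equals the Hamming distance between their labels. A minimal sketch:

```python
def hypercube_distance(a, b):
    # Minimum hop count = Hamming distance between the binary labels.
    return bin(a ^ b).count("1")

print(hypercube_distance(0b000, 0b111))  # 3 hops: the farthest pair in a 3-D hypercube
print(hypercube_distance(0b101, 0b100))  # 1 hop: labels differ in one bit
```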

10
• Diameter: maximum distance between any two processing nodes in the network
  - Ring: ⌊p/2⌋
  - 2-D mesh: 2(√p - 1) without wraparound; 2⌊√p/2⌋ with wraparound
  - Hypercube: log p

11
• Connectivity: multiplicity of paths between nodes
  - Minimum number of arcs that must be removed to disconnect the network into two
  - Ring: 2
  - 2-D mesh: 2 without wraparound; 4 with wraparound
  - Hypercube: d = log p

12
• Bisection width: minimum number of arcs that must be removed to partition the network into two equal halves
  - Ring: 2
  - 2-D mesh: √p without wraparound; 2√p with wraparound
  - Hypercube: p/2
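For a concrete p, the three metrics can be computed directly from the formulas above. A minimal Python sketch (it assumes p is a perfect square for the mesh and a power of two for the hypercube; the function name is illustrative):

```python
import math

# Topology metrics for p processors, restating the formulas from the slides.
def metrics(p):
    s = math.isqrt(p)       # side of the sqrt(p) x sqrt(p) mesh
    d = int(math.log2(p))   # hypercube dimension
    return {
        "ring":      {"diameter": p // 2,       "connectivity": 2, "bisection": 2},
        "mesh":      {"diameter": 2 * (s - 1),  "connectivity": 2, "bisection": s},
        "mesh-wrap": {"diameter": 2 * (s // 2), "connectivity": 4, "bisection": 2 * s},
        "hypercube": {"diameter": d,            "connectivity": d, "bisection": p // 2},
    }

for topology, m in metrics(16).items():
    print(topology, m)
```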

14 Domain Decomposition
• In this type of partitioning, the data associated with the problem is decomposed. Each parallel task then works on a portion of the data.
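A minimal sketch of this idea: a 1-D index range split into contiguous blocks, one block per task, with the remainder spread over the first tasks (the function name is illustrative):

```python
def block_decompose(n, p):
    """Split indices 0..n-1 into p contiguous blocks as evenly as possible."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for task in range(p):
        size = base + (1 if task < extra else 0)  # first `extra` tasks get one more item
        blocks.append(range(start, start + size))
        start += size
    return blocks

print([list(b) for b in block_decompose(10, 3)])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```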

15 Domain Decomposition

16 Functional Decomposition: the problem is decomposed according to the work that must be done rather than the data; each task then performs a portion of the overall computation.

17 Signal processing

18 Climate modeling.

19 Examples of decomposition and task dependencies

22 Granularity
• Fine vs. coarse
  - Decomposition into a large number of small tasks vs. a small number of large tasks
• Maximum degree of concurrency
• Average degree of concurrency
• Concurrency vs. granularity?

23 Granularity

24 Granularity
• Critical path length: the longest directed path between any pair of start and finish nodes in the task-dependency graph
• Average degree of concurrency: the ratio of the total amount of work to the critical-path length
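Both quantities can be computed directly from a task-dependency graph. A minimal sketch, assuming unit work per task; the four-task DAG here is a made-up example:

```python
from functools import lru_cache

# Task-dependency DAG: edges point from a task to the tasks that depend on it.
deps = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
work = {t: 1 for t in deps}          # unit work per task

@lru_cache(maxsize=None)
def longest_from(t):
    # Critical-path length starting at task t (includes t's own work).
    return work[t] + max((longest_from(s) for s in deps[t]), default=0)

critical_path = max(longest_from(t) for t in deps)
total_work = sum(work.values())
print(critical_path)               # 3 (e.g. a -> c -> d)
print(total_work / critical_path)  # average degree of concurrency, here 4/3
```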

25 Granularity: another example

26 Granularity: a measure of the ratio of computation to communication.
• Fine-grain parallelism:
  - Facilitates load balancing
  - Implies high communication overhead and less opportunity for performance enhancement
• Coarse-grain parallelism:
  - High computation-to-communication ratio
  - Implies more opportunity for performance increase
  - Harder to load balance efficiently
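The ratio can be made concrete: for an n x n grid block-decomposed over p tasks, each task updates about n²/p points but exchanges only its block boundary of about 4n/√p points, so the computation-to-communication ratio grows with block size. A sketch with illustrative numbers:

```python
import math

def comp_to_comm(n, p):
    # Per-task computation: points in an (n/sqrt(p)) x (n/sqrt(p)) block.
    side = n / math.sqrt(p)
    computation = side * side   # points updated per task
    communication = 4 * side    # boundary points exchanged per task
    return computation / communication

print(comp_to_comm(1024, 16))    # coarse grain: large blocks, high ratio
print(comp_to_comm(1024, 1024))  # fine grain: ratio shrinks as p grows
```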

27 Granularity
• Example: domain decompositions for a problem involving a three-dimensional grid