Chapter 4  Partitioning and Divide-and-Conquer Strategies (parallel partitioning and divide-and-conquer techniques). Lecture 5.

1 Chapter 4  Partitioning and Divide-and-Conquer Strategies (parallel partitioning and divide-and-conquer techniques). Lecture 5

2 Partitioning Partitioning simply divides the problem into several parts, computes each part separately, and finally combines the partial results to obtain the overall result.

3 Divide and Conquer Characterized by dividing a problem into sub-problems of the same form as the larger problem, with further divisions into still smaller sub-problems, usually done by recursion. Recursive divide and conquer parallelizes naturally, since each process can work on its own sub-problem.

4 Partitioning/Divide and Conquer Examples Many possibilities: operations on sequences of numbers, such as simply adding them together; several sorting algorithms, which can often be partitioned or constructed in a recursive fashion; numerical integration; the N-body problem.

5 Partitioning a sequence of numbers into parts and adding the parts

6 Methods Send + recv (master: a for loop of sends, then a for loop of recvs). Bcast + recv (master: broadcast the whole array to the slaves, then a for loop of recvs; slaves: receive the array from the master and send their result back to the master). Scatter + reduce (sketched below).
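
A minimal MPI sketch of the scatter + reduce approach. The array length n, the dummy data values, and the variable names are illustrative assumptions (not from the slides), and p is assumed to divide n evenly.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, p;
    const int n = 1024;                       /* assumed total number of values */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int chunk = n / p;                        /* assumes p divides n evenly */
    double *data = NULL;
    if (rank == 0) {                          /* master holds the full sequence */
        data = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) data[i] = 1.0;    /* dummy values */
    }

    /* divide: scatter one part to each process */
    double *part = malloc(chunk * sizeof(double));
    MPI_Scatter(data, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* compute: each process adds its own part */
    double local_sum = 0.0, total = 0.0;
    for (int i = 0; i < chunk; i++) local_sum += part[i];

    /* combine: reduce the partial sums onto the master */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);

    free(part); free(data);
    MPI_Finalize();
    return 0;
}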

7 Tree construction (binary tree)

8 Dividing a list into parts (figure; each division is a send/recv pair)

9 Partial summation

10 Process P0
/* division phase */
Divide(s1, s1, s2);      /* divide s1 into two parts, s1 and s2 */
Send(s2, P4);
Divide(s1, s1, s2);
Send(s2, P2);
Divide(s1, s1, s2);
Send(s2, P1);
/* combining phase */
part_sum = *s1;
Recv(&part_sum1, P1);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P2);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P4);
part_sum = part_sum + part_sum1;

11 Process P4
/* division phase */
Recv(s1, P0);
Divide(s1, s1, s2);
Send(s2, P6);
Divide(s1, s1, s2);
Send(s2, P5);
/* combining phase */
part_sum = *s1;
Recv(&part_sum1, P5);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P6);
part_sum = part_sum + part_sum1;
Send(&part_sum, P0);     /* send this subtree's accumulated partial sum back to P0 */
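
The same pattern generalizes to any power-of-two number of processes. The following is a hedged MPI C sketch (not the textbook's code): the function name tree_sum, the use of doubles, and the assumption that every rank knows the total length n and has a list buffer large enough (n values on rank 0, n/2 elsewhere) are illustrative choices.

#include <mpi.h>

/* Returns the complete sum on rank 0 (other ranks return their subtree's sum).
   n is the total number of values, known on every rank; rank 0 initially holds
   all of them in list.  p is assumed to be a power of two that divides n. */
double tree_sum(double *list, int n, MPI_Comm comm) {
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int have = (rank == 0) ? n : 0;           /* values currently held here */

    /* division phase: at step s, every rank holding data sends the upper
       half of its sublist to rank + s */
    for (int s = p / 2; s >= 1; s /= 2) {
        if (have > 0 && rank % (2 * s) == 0) {
            MPI_Send(list + have / 2, have / 2, MPI_DOUBLE, rank + s, 0, comm);
            have /= 2;
        } else if (have == 0 && rank % (2 * s) == s) {
            have = n * s / p;                 /* size of the incoming sublist */
            MPI_Recv(list, have, MPI_DOUBLE, rank - s, 0, comm, MPI_STATUS_IGNORE);
        }
    }

    /* every rank now holds n/p values: add them up */
    double sum = 0.0;
    for (int i = 0; i < have; i++) sum += list[i];

    /* combining phase: mirror image of the division phase */
    for (int s = 1; s < p; s *= 2) {
        if (rank % (2 * s) == s) {            /* send partial sum to the parent */
            MPI_Send(&sum, 1, MPI_DOUBLE, rank - s, 0, comm);
            break;
        } else if (rank % (2 * s) == 0) {     /* receive and accumulate a child's sum */
            double other;
            MPI_Recv(&other, 1, MPI_DOUBLE, rank + s, 0, comm, MPI_STATUS_IGNORE);
            sum += other;
        }
    }
    return sum;
}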

12 M-way (m-ary) divide and conquer

13 Many sorting algorithms can be parallelized by partitioning and by divide and conquer. Example: bucket sort.

14 Bucket sort One "bucket" is assigned to hold the numbers that fall within each region. The numbers in each bucket are then sorted using a sequential sorting algorithm. Sequential time complexity: O(n log(n/m)). Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
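
For reference, a sequential sketch (an illustrative implementation, not the course's code): the values are assumed to be ints uniformly distributed in [0, a), and qsort() is used inside each bucket.

#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sort v[0..n) assuming all values lie in [0, a), using m buckets. */
void bucket_sort(int *v, int n, int a, int m) {
    int **bucket = malloc(m * sizeof(int *));
    int *count = calloc(m, sizeof(int));
    for (int b = 0; b < m; b++) bucket[b] = malloc(n * sizeof(int));

    /* place each number in the bucket for its region of [0, a) */
    for (int i = 0; i < n; i++) {
        int b = (int)((long long)v[i] * m / a);
        bucket[b][count[b]++] = v[i];
    }

    /* sort each bucket sequentially, then concatenate the buckets in order */
    int k = 0;
    for (int b = 0; b < m; b++) {
        qsort(bucket[b], count[b], sizeof(int), cmp_int);
        memcpy(v + k, bucket[b], count[b] * sizeof(int));
        k += count[b];
        free(bucket[b]);
    }
    free(bucket);
    free(count);
}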

15 Parallel version of bucket sort Simple approach: assign one processor to each bucket.

16
big_array = malloc(n*sizeof(long int));            /* root's copy of the full array */
Make_numbers(big_array, n, n/p, p);                /* helper (presumably fills the array with the numbers to sort) */
n_bar = n/p;                                       /* numbers per process (one bucket each) */
local_array = malloc(n_bar*sizeof(long int));
/* scatter one region's numbers to each process */
MPI_Scatter(big_array, n_bar, MPI_LONG, local_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD);
Sequential_sort(local_array, n_bar);               /* sort this process's bucket locally */
/* gather the sorted buckets back, in rank order, on the root */
MPI_Gather(local_array, n_bar, MPI_LONG, big_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD);

19 Further Parallelization Partition the sequence into m regions, one region per processor (hence the number of processors, p, equals m). Each processor maintains one "big" bucket for its own region and m "small" buckets, one for each region. Each processor separates the numbers in its region into its own small buckets. All the small buckets are then emptied into the p big buckets, and each big bucket is sorted by its processor.

20 Another parallel version of bucket sort

21 All-to-all broadcast An MPI function that can be used to advantage here: it sends one data element from each process to every other process, corresponding to multiple scatter operations implemented together. (Should it perhaps be called all-to-all scatter instead?)


23 Applying "all-to-all" broadcast to emptying the small buckets into the big buckets. Suppose 4 regions and 4 processors. (Figure: each processor's small buckets are delivered to the matching big buckets.)
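
In MPI, the standard routine for this step is MPI_Alltoall (with MPI_Alltoallv when the buckets have unequal sizes). Below is a hedged sketch of emptying the small buckets into the big buckets; the function and variable names (empty_small_buckets, sendbuf, sendcounts, and so on) are illustrative rather than the course's code, and the p small buckets are assumed to be packed back-to-back in one send buffer.

#include <mpi.h>
#include <stdlib.h>

/* Empty the small buckets into the big buckets.  sendbuf holds this process's
   p small buckets packed back-to-back, with sendcounts[j] numbers destined for
   process j.  On return, *recvbuf holds this process's big bucket and
   *recvtotal its length. */
void empty_small_buckets(int *sendbuf, int *sendcounts,
                         int **recvbuf, int *recvtotal, MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);

    int *recvcounts = malloc(p * sizeof(int));
    int *sdispls = malloc(p * sizeof(int));
    int *rdispls = malloc(p * sizeof(int));

    /* first exchange the counts so each process knows how much it will receive */
    MPI_Alltoall(sendcounts, 1, MPI_INT, recvcounts, 1, MPI_INT, comm);

    /* displacements are prefix sums of the counts */
    sdispls[0] = rdispls[0] = 0;
    for (int j = 1; j < p; j++) {
        sdispls[j] = sdispls[j - 1] + sendcounts[j - 1];
        rdispls[j] = rdispls[j - 1] + recvcounts[j - 1];
    }
    *recvtotal = rdispls[p - 1] + recvcounts[p - 1];
    *recvbuf = malloc(*recvtotal * sizeof(int));

    /* the actual small-bucket to big-bucket exchange: small bucket j from
       every process ends up on process j */
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  *recvbuf, recvcounts, rdispls, MPI_INT, comm);

    free(recvcounts);
    free(sdispls);
    free(rdispls);
}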

24 The "all-to-all" routine actually transfers the rows of an array to columns: it transposes a matrix.

25 Numerical Integration Computing the "area under the curve." Parallelized by dividing the area into smaller areas (partitioning). Can also apply divide and conquer: repeatedly divide the areas recursively.

26 Numerical integration using rectangles Each region calculated using an approximation given by rectangles: Aligning the rectangles:
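
A minimal MPI C sketch of the partitioned rectangle rule, under illustrative assumptions: the integrand f, the interval [0, 1], and the number of rectangles n are examples chosen here, and each process simply takes every p-th rectangle.

#include <mpi.h>
#include <stdio.h>

static double f(double x) { return 4.0 / (1.0 + x * x); }   /* example integrand */

int main(int argc, char *argv[]) {
    int rank, p;
    const double a = 0.0, b = 1.0;            /* assumed interval */
    const int n = 1000000;                    /* total number of rectangles */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double d = (b - a) / n;                   /* width of one rectangle */
    double local = 0.0, area = 0.0;

    /* partitioning: process `rank` handles rectangles rank, rank+p, rank+2p, ... */
    for (int i = rank; i < n; i += p)
        local += f(a + (i + 0.5) * d) * d;    /* midpoint ("aligned") rectangle */

    MPI_Reduce(&local, &area, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("area = %.10f\n", area);

    MPI_Finalize();
    return 0;
}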

27 Numerical integration using trapezoidal method May not be better!

28 Applying Divide and Conquer

29 Adaptive Quadrature The solution adapts to the shape of the curve. Use three areas, A, B, and C. Computation is terminated when the largest of A and B is sufficiently close to the sum of the remaining two areas.
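
A hedged C sketch of the idea: it uses the common scheme of comparing a whole-interval trapezoid estimate with the sum of the two half-interval estimates (areas A and B) and recursing until they agree within eps. The slide's exact A/B/C termination rule may differ; the two recursive calls are independent, so they could be handed to separate processes.

#include <math.h>

static double trap(double (*f)(double), double a, double b) {
    return 0.5 * (b - a) * (f(a) + f(b));     /* single trapezoid */
}

double adaptive(double (*f)(double), double a, double b, double eps) {
    double m = 0.5 * (a + b);
    double whole = trap(f, a, b);
    double left = trap(f, a, m), right = trap(f, m, b);   /* areas A and B */

    /* terminate when the refined estimate is close enough to the coarse one */
    if (fabs(left + right - whole) <= eps)
        return left + right;

    /* otherwise divide and conquer: the two halves are independent
       sub-problems that could be computed in parallel */
    return adaptive(f, a, m, 0.5 * eps) + adaptive(f, m, b, 0.5 * eps);
}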

30 Adaptive quadrature with false termination. Some care might be needed in choosing when to terminate. Might cause us to terminate early, as two large regions are the same (i.e., C = 0).

31 Barnes-Hut Algorithm Start with whole space in which one cube contains the bodies (or particles). First, this cube is divided into eight subcubes. If a subcube contains no particles, subcube deleted from further consideration. If a subcube contains one body, subcube retained. If a subcube contains more than one body, it is recursively divided until every subcube contains one body.

32 Creates an octtree - a tree with up to eight edges from each node. The leaves represent cells each containing one body. After the tree has been constructed, the total mass and center of mass of the subcube is stored at each node.
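
An illustrative sketch (not the book's code) of the octtree node and the bottom-up pass that fills in total mass and center of mass; the field names and the assumption that leaves already hold their single body's mass and position are choices made here.

#include <stdlib.h>

typedef struct Node {
    struct Node *child[8];      /* up to eight subcubes; NULL if empty        */
    int nbodies;                /* number of bodies in this subcube           */
    double mass;                /* total mass of those bodies                 */
    double cm[3];               /* their center of mass                       */
    double center[3], half;     /* the subcube itself: center and half-width  */
} Node;

/* After the tree has been constructed, total mass and center of mass are
   filled in bottom-up by one post-order traversal. */
void summarize(Node *n) {
    if (n == NULL || n->nbodies <= 1) return;   /* leaves already hold one body */
    n->mass = 0.0;
    n->cm[0] = n->cm[1] = n->cm[2] = 0.0;
    for (int i = 0; i < 8; i++) {
        Node *c = n->child[i];
        if (c == NULL) continue;
        summarize(c);
        n->mass += c->mass;
        for (int d = 0; d < 3; d++) n->cm[d] += c->mass * c->cm[d];
    }
    for (int d = 0; d < 3; d++) n->cm[d] /= n->mass;
}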

33 Constructing the tree requires O(n log n) time, and so does computing all the forces, so the overall time complexity of the method is O(n log n).

34 Recursive division of 2-dimensional space

35 Orthogonal Recursive Bisection (For a 2-dimensional area.) First, a vertical line is found that divides the area into two areas, each with an equal number of bodies. For each of these areas, a horizontal line is then found that divides it into two areas, each with an equal number of bodies. This is repeated as required.
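
A minimal 2-D sketch in C (illustrative, not the book's code): bodies are split at the median coordinate, alternating between vertical and horizontal cuts. The Body type, the use of qsort() to find the median, and the power-of-two parts count are assumptions.

#include <stdlib.h>

typedef struct { double x, y; } Body;

static int cmp_x(const void *a, const void *b) {
    double d = ((const Body *)a)->x - ((const Body *)b)->x;
    return (d > 0) - (d < 0);
}
static int cmp_y(const void *a, const void *b) {
    double d = ((const Body *)a)->y - ((const Body *)b)->y;
    return (d > 0) - (d < 0);
}

/* Split bodies[0..n) into `parts` groups of (nearly) equal size.
   axis = 0 splits by a vertical line, axis = 1 by a horizontal line. */
void orb(Body *bodies, int n, int parts, int axis) {
    if (parts <= 1) return;             /* this group is one processor's region */
    qsort(bodies, n, sizeof(Body), axis == 0 ? cmp_x : cmp_y);
    int half = n / 2;                   /* median split: equal numbers of bodies */
    orb(bodies, half, parts / 2, 1 - axis);
    orb(bodies + half, n - half, parts / 2, 1 - axis);
}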