Chapter 4  Partitioning and Divide-and-Conquer Strategies (parallel partitioning and divide-and-conquer techniques). Lecture 5.

1 Chapter 4  Partitioning and Divide-and-Conquer Strategies (parallel partitioning and divide-and-conquer techniques). Lecture 5

2 Partitioning Partitioning simply divides the problem into several parts, computes each part separately, and finally combines the partial results to obtain the overall result.

3 Divide and Conquer Characterized by dividing a problem into sub-problems of the same form as the larger problem, with further divisions into still smaller sub-problems, usually done by recursion. Recursive divide and conquer parallelizes naturally, since each process can work on its own sub-problem.

4 Partitioning/Divide and Conquer Examples Many possibilities: operations on sequences of numbers, such as simply adding them together; several sorting algorithms, which can often be partitioned or constructed in a recursive fashion; numerical integration; the N-body problem.

5 Partitioning a sequence of numbers into parts and adding the parts

6 Methods Send + recv (master: a for loop of sends, then a for loop of recvs). Bcast + recv (master: broadcast the whole array to the slaves, then a for loop of recvs; slaves: receive the array from the master and send their result back to the master). Scatter + reduce (sketched below).
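
A minimal MPI sketch of the scatter + reduce approach. The array length n, the dummy data values, and the variable names are illustrative assumptions (not from the slides), and p is assumed to divide n evenly.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, p;
    const int n = 1024;                       /* assumed total number of values */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int chunk = n / p;                        /* assumes p divides n evenly */
    double *data = NULL;
    if (rank == 0) {                          /* master holds the full sequence */
        data = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) data[i] = 1.0;    /* dummy values */
    }

    /* divide: scatter one part to each process */
    double *part = malloc(chunk * sizeof(double));
    MPI_Scatter(data, chunk, MPI_DOUBLE, part, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* compute: each process adds its own part */
    double local_sum = 0.0, total = 0.0;
    for (int i = 0; i < chunk; i++) local_sum += part[i];

    /* combine: reduce the partial sums onto the master */
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);

    free(part); free(data);
    MPI_Finalize();
    return 0;
}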

7 Tree construction (binary tree)

8 Dividing a list into parts (figure; each division is a send/recv pair)

9 Partial summation

10 Process P0
/* division phase */
Divide(s1, s1, s2);      /* divide s1 into two parts, s1 and s2 */
Send(s2, P4);
Divide(s1, s1, s2);
Send(s2, P2);
Divide(s1, s1, s2);
Send(s2, P1);
/* combining phase */
part_sum = *s1;
Recv(&part_sum1, P1);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P2);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P4);
part_sum = part_sum + part_sum1;

11 Process P4
/* division phase */
Recv(s1, P0);
Divide(s1, s1, s2);
Send(s2, P6);
Divide(s1, s1, s2);
Send(s2, P5);
/* combining phase */
part_sum = *s1;
Recv(&part_sum1, P5);
part_sum = part_sum + part_sum1;
Recv(&part_sum1, P6);
part_sum = part_sum + part_sum1;
Send(&part_sum, P0);     /* send this subtree's accumulated partial sum back to P0 */
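
The same pattern generalizes to any power-of-two number of processes. The following is a hedged MPI C sketch (not the textbook's code): the function name tree_sum, the use of doubles, and the assumption that every rank knows the total length n and has a list buffer large enough (n values on rank 0, n/2 elsewhere) are illustrative choices.

#include <mpi.h>

/* Returns the complete sum on rank 0 (other ranks return their subtree's sum).
   n is the total number of values, known on every rank; rank 0 initially holds
   all of them in list.  p is assumed to be a power of two that divides n. */
double tree_sum(double *list, int n, MPI_Comm comm) {
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int have = (rank == 0) ? n : 0;           /* values currently held here */

    /* division phase: at step s, every rank holding data sends the upper
       half of its sublist to rank + s */
    for (int s = p / 2; s >= 1; s /= 2) {
        if (have > 0 && rank % (2 * s) == 0) {
            MPI_Send(list + have / 2, have / 2, MPI_DOUBLE, rank + s, 0, comm);
            have /= 2;
        } else if (have == 0 && rank % (2 * s) == s) {
            have = n * s / p;                 /* size of the incoming sublist */
            MPI_Recv(list, have, MPI_DOUBLE, rank - s, 0, comm, MPI_STATUS_IGNORE);
        }
    }

    /* every rank now holds n/p values: add them up */
    double sum = 0.0;
    for (int i = 0; i < have; i++) sum += list[i];

    /* combining phase: mirror image of the division phase */
    for (int s = 1; s < p; s *= 2) {
        if (rank % (2 * s) == s) {            /* send partial sum to the parent */
            MPI_Send(&sum, 1, MPI_DOUBLE, rank - s, 0, comm);
            break;
        } else if (rank % (2 * s) == 0) {     /* receive and accumulate a child's sum */
            double other;
            MPI_Recv(&other, 1, MPI_DOUBLE, rank + s, 0, comm, MPI_STATUS_IGNORE);
            sum += other;
        }
    }
    return sum;
}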

12 M-way (m-ary) divide and conquer

13 Many sorting algorithms can be parallelized by partitioning and by divide and conquer. Example: bucket sort.

14 Bucket sort One "bucket" is assigned to hold the numbers that fall within each region. The numbers in each bucket are then sorted using a sequential sorting algorithm. Sequential time complexity: O(n log(n/m)). Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
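
For reference, a sequential sketch (an illustrative implementation, not the course's code): the values are assumed to be ints uniformly distributed in [0, a), and qsort() is used inside each bucket.

#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sort v[0..n) assuming all values lie in [0, a), using m buckets. */
void bucket_sort(int *v, int n, int a, int m) {
    int **bucket = malloc(m * sizeof(int *));
    int *count = calloc(m, sizeof(int));
    for (int b = 0; b < m; b++) bucket[b] = malloc(n * sizeof(int));

    /* place each number in the bucket for its region of [0, a) */
    for (int i = 0; i < n; i++) {
        int b = (int)((long long)v[i] * m / a);
        bucket[b][count[b]++] = v[i];
    }

    /* sort each bucket sequentially, then concatenate the buckets in order */
    int k = 0;
    for (int b = 0; b < m; b++) {
        qsort(bucket[b], count[b], sizeof(int), cmp_int);
        memcpy(v + k, bucket[b], count[b] * sizeof(int));
        k += count[b];
        free(bucket[b]);
    }
    free(bucket);
    free(count);
}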

15 Parallel version of bucket sort Simple approach: assign one processor to each bucket.

16
big_array = malloc(n*sizeof(long int));            /* root's copy of the full array */
Make_numbers(big_array, n, n/p, p);                /* helper (presumably fills the array with the numbers to sort) */
n_bar = n/p;                                       /* numbers per process (one bucket each) */
local_array = malloc(n_bar*sizeof(long int));
/* scatter one region's numbers to each process */
MPI_Scatter(big_array, n_bar, MPI_LONG, local_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD);
Sequential_sort(local_array, n_bar);               /* sort this process's bucket locally */
/* gather the sorted buckets back, in rank order, on the root */
MPI_Gather(local_array, n_bar, MPI_LONG, big_array, n_bar, MPI_LONG, 0, MPI_COMM_WORLD);

19 Further Parallelization Partition the sequence into m regions, one region per processor (hence the number of processors, p, equals m). Each processor maintains one "big" bucket for its own region and m "small" buckets, one for each region. Each processor separates the numbers in its region into its own small buckets. All the small buckets are then emptied into the p big buckets, and each big bucket is sorted by its processor.

20 Another parallel version of bucket sort

21 All-to-all broadcast An MPI function that can be used to advantage here: it sends one data element from each process to every other process, corresponding to multiple scatter operations implemented together. (Should it perhaps be called all-to-all scatter instead?)


23 Applying "all-to-all" broadcast to emptying the small buckets into the big buckets. Suppose 4 regions and 4 processors. (Figure: each processor's small buckets are delivered to the matching big buckets.)
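
In MPI, the standard routine for this step is MPI_Alltoall (with MPI_Alltoallv when the buckets have unequal sizes). Below is a hedged sketch of emptying the small buckets into the big buckets; the function and variable names (empty_small_buckets, sendbuf, sendcounts, and so on) are illustrative rather than the course's code, and the p small buckets are assumed to be packed back-to-back in one send buffer.

#include <mpi.h>
#include <stdlib.h>

/* Empty the small buckets into the big buckets.  sendbuf holds this process's
   p small buckets packed back-to-back, with sendcounts[j] numbers destined for
   process j.  On return, *recvbuf holds this process's big bucket and
   *recvtotal its length. */
void empty_small_buckets(int *sendbuf, int *sendcounts,
                         int **recvbuf, int *recvtotal, MPI_Comm comm) {
    int p;
    MPI_Comm_size(comm, &p);

    int *recvcounts = malloc(p * sizeof(int));
    int *sdispls = malloc(p * sizeof(int));
    int *rdispls = malloc(p * sizeof(int));

    /* first exchange the counts so each process knows how much it will receive */
    MPI_Alltoall(sendcounts, 1, MPI_INT, recvcounts, 1, MPI_INT, comm);

    /* displacements are prefix sums of the counts */
    sdispls[0] = rdispls[0] = 0;
    for (int j = 1; j < p; j++) {
        sdispls[j] = sdispls[j - 1] + sendcounts[j - 1];
        rdispls[j] = rdispls[j - 1] + recvcounts[j - 1];
    }
    *recvtotal = rdispls[p - 1] + recvcounts[p - 1];
    *recvbuf = malloc(*recvtotal * sizeof(int));

    /* the actual small-bucket to big-bucket exchange: small bucket j from
       every process ends up on process j */
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  *recvbuf, recvcounts, rdispls, MPI_INT, comm);

    free(recvcounts);
    free(sdispls);
    free(rdispls);
}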

24 The "all-to-all" routine actually transfers the rows of an array to columns: it transposes a matrix.

25 Numerical Integration Computing the "area under the curve." Parallelized by dividing the area into smaller areas (partitioning). Can also apply divide and conquer: repeatedly divide the areas recursively.

26 Numerical integration using rectangles Each region calculated using an approximation given by rectangles: Aligning the rectangles:
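
A minimal MPI C sketch of the partitioned rectangle rule, under illustrative assumptions: the integrand f, the interval [0, 1], and the number of rectangles n are examples chosen here, and each process simply takes every p-th rectangle.

#include <mpi.h>
#include <stdio.h>

static double f(double x) { return 4.0 / (1.0 + x * x); }   /* example integrand */

int main(int argc, char *argv[]) {
    int rank, p;
    const double a = 0.0, b = 1.0;            /* assumed interval */
    const int n = 1000000;                    /* total number of rectangles */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double d = (b - a) / n;                   /* width of one rectangle */
    double local = 0.0, area = 0.0;

    /* partitioning: process `rank` handles rectangles rank, rank+p, rank+2p, ... */
    for (int i = rank; i < n; i += p)
        local += f(a + (i + 0.5) * d) * d;    /* midpoint ("aligned") rectangle */

    MPI_Reduce(&local, &area, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("area = %.10f\n", area);

    MPI_Finalize();
    return 0;
}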

27 Numerical integration using trapezoidal method May not be better!

28 Applying Divide and Conquer

29 Adaptive Quadrature The solution adapts to the shape of the curve. Use three areas, A, B, and C. Computation is terminated when the largest of A and B is sufficiently close to the sum of the remaining two areas.
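
A hedged C sketch of the idea: it uses the common scheme of comparing a whole-interval trapezoid estimate with the sum of the two half-interval estimates (areas A and B) and recursing until they agree within eps. The slide's exact A/B/C termination rule may differ; the two recursive calls are independent, so they could be handed to separate processes.

#include <math.h>

static double trap(double (*f)(double), double a, double b) {
    return 0.5 * (b - a) * (f(a) + f(b));     /* single trapezoid */
}

double adaptive(double (*f)(double), double a, double b, double eps) {
    double m = 0.5 * (a + b);
    double whole = trap(f, a, b);
    double left = trap(f, a, m), right = trap(f, m, b);   /* areas A and B */

    /* terminate when the refined estimate is close enough to the coarse one */
    if (fabs(left + right - whole) <= eps)
        return left + right;

    /* otherwise divide and conquer: the two halves are independent
       sub-problems that could be computed in parallel */
    return adaptive(f, a, m, 0.5 * eps) + adaptive(f, m, b, 0.5 * eps);
}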

30 Adaptive quadrature with false termination. Some care might be needed in choosing when to terminate. Might cause us to terminate early, as two large regions are the same (i.e., C = 0).

31 Barnes-Hut Algorithm Start with whole space in which one cube contains the bodies (or particles). First, this cube is divided into eight subcubes. If a subcube contains no particles, subcube deleted from further consideration. If a subcube contains one body, subcube retained. If a subcube contains more than one body, it is recursively divided until every subcube contains one body.

32 Creates an octtree - a tree with up to eight edges from each node. The leaves represent cells each containing one body. After the tree has been constructed, the total mass and center of mass of the subcube is stored at each node.
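
An illustrative sketch (not the book's code) of the octtree node and the bottom-up pass that fills in total mass and center of mass; the field names and the assumption that leaves already hold their single body's mass and position are choices made here.

#include <stdlib.h>

typedef struct Node {
    struct Node *child[8];      /* up to eight subcubes; NULL if empty        */
    int nbodies;                /* number of bodies in this subcube           */
    double mass;                /* total mass of those bodies                 */
    double cm[3];               /* their center of mass                       */
    double center[3], half;     /* the subcube itself: center and half-width  */
} Node;

/* After the tree has been constructed, total mass and center of mass are
   filled in bottom-up by one post-order traversal. */
void summarize(Node *n) {
    if (n == NULL || n->nbodies <= 1) return;   /* leaves already hold one body */
    n->mass = 0.0;
    n->cm[0] = n->cm[1] = n->cm[2] = 0.0;
    for (int i = 0; i < 8; i++) {
        Node *c = n->child[i];
        if (c == NULL) continue;
        summarize(c);
        n->mass += c->mass;
        for (int d = 0; d < 3; d++) n->cm[d] += c->mass * c->cm[d];
    }
    for (int d = 0; d < 3; d++) n->cm[d] /= n->mass;
}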

33 Constructing the tree requires O(n log n) time, and so does computing all the forces, so the overall time complexity of the method is O(n log n).

34 Recursive division of 2-dimensional space

35 Orthogonal Recursive Bisection (For a 2-dimensional area.) First, a vertical line is found that divides the area into two areas, each with an equal number of bodies. For each of these areas, a horizontal line is then found that divides it into two areas, each with an equal number of bodies. This is repeated as required.
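
A minimal 2-D sketch in C (illustrative, not the book's code): bodies are split at the median coordinate, alternating between vertical and horizontal cuts. The Body type, the use of qsort() to find the median, and the power-of-two parts count are assumptions.

#include <stdlib.h>

typedef struct { double x, y; } Body;

static int cmp_x(const void *a, const void *b) {
    double d = ((const Body *)a)->x - ((const Body *)b)->x;
    return (d > 0) - (d < 0);
}
static int cmp_y(const void *a, const void *b) {
    double d = ((const Body *)a)->y - ((const Body *)b)->y;
    return (d > 0) - (d < 0);
}

/* Split bodies[0..n) into `parts` groups of (nearly) equal size.
   axis = 0 splits by a vertical line, axis = 1 by a horizontal line. */
void orb(Body *bodies, int n, int parts, int axis) {
    if (parts <= 1) return;             /* this group is one processor's region */
    qsort(bodies, n, sizeof(Body), axis == 0 ? cmp_x : cmp_y);
    int half = n / 2;                   /* median split: equal numbers of bodies */
    orb(bodies, half, parts / 2, 1 - axis);
    orb(bodies + half, n - half, parts / 2, 1 - axis);
}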