Introduction 因為需要 large amount of computation. Why parallel?

Slides:

Advertisements

Similar presentations

PRAM Algorithms Sathish Vadhiyar. PRAM Model - Introduction Parallel Random Access Machine Allows parallel-algorithm designers to treat processing power.

Advertisements

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.

Parallel vs Sequential Algorithms

PRAM (Parallel Random Access Machine)

Efficient Parallel Algorithms COMP308

TECH Computer Science Parallel Algorithms  several operations can be executed at the same time  many problems are most naturally modeled with parallelism.

: A-Sequence 星級 : ★★☆☆☆ 題組： Online-judge.uva.es PROBLEM SET Volume CIX 題號： Problem D : A-Sequence 解題者：薛祖淵解題日期： 2006 年 2 月 21 日題意：一開始先輸入一個.

Advanced Topics in Algorithms and Data Structures Classification of the PRAM model In the PRAM model, processors communicate by reading from and writing.

1 Q10276: Hanoi Tower Troubles Again! 星級 : ★★★ 題組： Online-judge.uva.es PROBLEM SET Volume CII 題號： Q10276: Hanoi Tower Troubles Again! 解題者：薛祖淵解題日期： 2006.

PRAM Models Advanced Algorithms & Data Structures Lecture Theme 13 Prof. Dr. Th. Ottmann Summer Semester 2006.

Simulating a CRCW algorithm with an EREW algorithm Efficient Parallel Algorithms COMP308.

矩陣乘法實作矩陣乘法利用 threads 來加速運算速度 – Matrix1 row x Matrix2 column = Ans (x,y) Matrix 1Matrix 2Answer.

Slide 1 Parallel Computation Models Lecture 3 Lecture 4.

McGraw-Hill/Irwin © 2003 The McGraw-Hill Companies, Inc.,All Rights Reserved. 參實驗法.

: Factstone Benchmark ★★☆☆☆ 題組： Problem Set Archive with Online Judge 題號： : Factstone Benchmark 解題者：鐘緯駿解題日期： 2006 年 06 月 06 日題意：假設 1960.

Chapter 3 Growth of Functions Asymptotic notation Θ-notation: f(n) = Θ(g(n)) ， g(n) is an asymptotically tight bound for f(n) 。 Θ(g(n)) = {f(n)|

1.1 電腦的特性電腦能夠快速處理資料：電腦可在一秒內處理數百萬個基本運算，這是人腦所不能做到的。原本人腦一天的工作量，交給電腦可能僅需幾分鐘的時間就處理完畢。電腦能夠快速處理資料：電腦可在一秒內處理數百萬個基本運算，這是人腦所不能做到的。原本人腦一天的工作量，交給電腦可能僅需幾分鐘的時間就處理完畢。

STAT0_sampling Random Sampling  母體： Finite population & Infinity population  由一大小為 N 的有限母體中抽出一樣本數為 n 的樣本，若每一樣本被抽出的機率是一樣的，這樣本稱為隨機樣本 (random sample)

Advanced Topics in Algorithms and Data Structures An overview of the lecture 2 Models of parallel computation Characteristics of SIMD models Design issue.

Overview Efficient Parallel Algorithms COMP308. COMP 308 Exam Time allowed : 2.5 hours Answer four questions (out of six). If you attempt to answer more.

JAVA 程式設計與資料結構第十四章 Linked List. Introduction Linked List 的結構就是將物件排成一列，有點像是 Array ，但是我們卻無法直接經由 index 得到其中的物件在 Linked List 中，每一個點我們稱之為 node ，第一個 node.

1 94 學年度碩士班新生座談擬定修正. 2 李之中 Chi-Chung Lee Assistant professor Department of Information Management, Chung Hwa University Office.

Network Connections ★★★☆☆ 題組： Contest Archive with Online Judge 題號： Network Connections 解題者：蔡宗翰解題日期： 2008 年 10 月 20 日題意：給你電腦之間互相連線的狀況後，題.

Chapter 8 消費可能性偏好選擇 Part 3 家庭的選擇

選舉制度、政府結構與政黨體系 Cox (1997) Electoral institutions, cleavage strucuters, and the number of parties.

Fourier Series. Jean Baptiste Joseph Fourier (French)(1763~1830)

: A-Sequence ★★★☆☆ 題組： Problem Set Archive with Online Judge 題號： 10930: A-Sequence 解題者：陳盈村解題日期： 2008 年 5 月 30 日題意： A-Sequence 需符合以下的條件， 1 ≤ a.

Teacher : Ing-Jer Huang TA : Chien-Hung Chen 2015/6/25 Course Embedded Systems : Principles and Implementations Weekly Preview Question CH 2.4~CH 2.6 &

Image Interpolation Use SSE 指導教授 : 楊士萱學生 : 楊宗峰日期 :

JAVA 程式設計與資料結構第二十章 Searching. Sequential Searching Sequential Searching 是最簡單的一種搜尋法，此演算法可應用在 Array 或是 Linked List 此等資料結構。 Sequential Searching 的 worst-case.

資料結構實習-二.

演算法 8-1 最大數及最小數找法 8-2 排序 8-3 二元搜尋法.

: Flip Sort ★★☆☆☆ 題組： Problem Set Archive with Online Judge 題號： 10327: Flip Sort 解題者：歐子揚解題日期： 2010 年 2 月 26 日題意：在這個問題中使用一種排序方式 (Flip) ，意思就是只能交換相鄰的.

Models of Parallel Computation Advanced Algorithms & Data Structures Lecture Theme 12 Prof. Dr. Th. Ottmann Summer Semester 2006.

INFORMATION RETRIEVAL AND EXTRACTION 作業： Program 1 第十四組組員：林永峰、洪承雄、謝宗憲.

:Commandos ★★★☆☆ 題組： Contest Archive with Online Judge 題號： 11463: Commandos 解題者：李重儀解題日期： 2008 年 8 月 11 日題意：題目會給你一個敵營區內總共的建築物數，以及建築物之間可以互通的路有哪些，並給你起點的建築物和終點.

1 高等演算法 -Introduction 1. Analysis 2. Basic arithmetic 3. Modular arithmetic 4. GCD 5. Primality testing 6. Cryptography.

資料結構實習-六.

: Finding Paths in Grid ★★★★☆ 題組： Contest Archive with Online Judge 題號： 11486: Finding Paths in Grid 解題者：李重儀解題日期： 2008 年 10 月 14 日題意：給一個 7 個 column.

Introduction to Parallel Processing Ch. 12, Pg

第七章計算複雜度概論：排序問題 7.1計算複雜度 7.2插入排序與選擇排序 7.3每次比較至多移除一個導致之演算法的下限

Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.

RAM and Parallel RAM (PRAM). Why models? What is a machine model? – A abstraction describes the operation of a machine. – Allowing to associate a value.

1 Parallel computing and its recent topics. 2 Outline 1. Introduction of parallel processing (1)What is parallel processing (2)Classification of parallel.

1 Lecture 2: Parallel computational models. 2  Turing machine  RAM (Figure )  Logic circuit model RAM (Random Access Machine) Operations supposed to.

Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 8, 2005 Session 8.

1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.

Parallel Algorithms Sorting and more. Keep hardware in mind When considering ‘parallel’ algorithms, – We have to have an understanding of the hardware.

COMP308 Efficient Parallel Algorithms

CHAPTER 12 INTRODUCTION TO PARALLEL PROCESSING CS 147 Guy Wong page

Sorting. Quick Sort Example 2, 1, 3, 4, 5 2, 4, 3, 1, 5, 5, 6, 9, 7, 8, 5 S={6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8} 1, 234, 5 5 5, 6, 7, 8, 9.

RAM, PRAM, and LogP models

Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 3, 2005 Session 7.

Parallel Algorithms. Parallel Models u Hypercube u Butterfly u Fully Connected u Other Networks u Shared Memory v.s. Distributed Memory u SIMD v.s. MIMD.

Parallel Processing & Distributed Systems Thoai Nam Chapter 2.

Data Structures and Algorithms in Parallel Computing Lecture 1.

© 2004 Goodrich, Tamassia Merge Sort1 7 2  9 4   2  2 79  4   72  29  94  4.

3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-3.

Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.

Classification of parallel computers Limitations of parallel processing.

Overview Parallel Processing Pipelining

Higher Level Parallelism

Lecture 3: Parallel Algorithm Design

Distributed and Parallel Processing

Lecture 2: Parallel computational models

Data Structures and Algorithms in Parallel Computing

AN INTRODUCTION ON PARALLEL PROCESSING

Parallel Algorithms A Simple Model for Parallel Processing

Module 6: Introduction to Parallel Computing

Presentation transcript:

Introduction 因為需要 large amount of computation. Why parallel?

Example /10 7 ≒ 10 8 秒 10 8 /10 5 ≒ 10 3 天≒ 3 年用在外科手術時，每秒約需個計算 3D 立體影像一般的電腦每秒約可算 10 7 個計算

其他需要大量計算的領域 Aircraft Testing New Drug Development Oil Exploration Modeling Fusion Reactors Economic Planning Cryptanalysis Managing Large Databases Astronomy Biomedical Analysis …

電腦的速度光速 : 3×10 8 m/s Components 太近 : 會互相干擾每 3 、 4 年增快一倍 Unfortunately It is evident that this trend will soon come to an end. A simple law of physics

Parallelism 人 (CPU) 海戰術目前許多電腦中都不只一顆 CPU

Models of Computation Flynn's classification: SISD(Single Instruction, Single Data) MISD(Multiple Instruction, Single Data) SIMD(Single Instruction, Multiple Data) MIMD(Multiple Instruction, Multiple Data)

SISD Computers Sequential(serial) Computation

Example summation of n numbers: Sum = 0 For I = 1 to n Sum = Sum + I Next need n-1 additions O(n) time

MISD Computers

Example test whether a positive integer is prime or not. n is prime: no divisors except 1 & n. n is composite: otherwise Each processor tests a number between 1 and n. Each processor needs O(1) operation. O(1) time

SIMD Computers

PRAM (Parallel Random Access Machine) Share-Memory(SM) SIMD computers

How does a processor i pass a number to another processor j? O(1) time

WriteRead 1. EREW (Exclusive Read, Exclusive Write) 2. CREW (Concurrent Read, Exclusive Write) 3. ERCW (Exclusive Read, Concurrent Write) 4. CRCW (Concurrent Read, Concurrent Write)

How to resolve conflicts in CRCW? (b) the values to be written in all processors are equal, otherwise access is denied to all processor. (c) sum the values to be written. (a) the smallest-numbered processor is allowed to write, and access is denied to all other processor.

Simulating multiple accesses on an EREW computer: (i)N multiple accesses (read) : broadcasting procedure: 1. P1→P2 2. P1,P2, → P3,P4 3. P1,P2,P3,P4 → P5,P6,P7,P8 : : O(log N) steps

D P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 A

(ii) m out of N multiple access(read), m ≦ N Each memory location is associated with a binary tree.

[1][2][4][7] [1][4] [7][1][7] d P1P1 P2P2 P4P4 P7P7 [1][2][4][7][1][2][4]

[1] [4][2] [1][7] d P1P1 P2P2 P4P4 P7P7 d[1] d [7]d [4][7] d d dd dd

Dividing a shared memory into blocks

Interconnection networks SIMD computers The memory is distributed among the N processors. Every pair of processors can communicate with each other if they are connected by a line(link). Instantaneous communication between several pairs of processors is allowed.

(1) fully connected networks There are links.

(2) linear array(1-dimensional array) 1-D array P i-1 Pi Pi P i+1, 2 ≦ i ≦ N – 1

(3) two-dimensional array (2-D mesh-connected)

P(j,k) has links connecting to P(j+1,k), P(j-1,k), P(j,k+1) and P(j,k-1).

(4) tree connection (tree machines)

The sons of P i : P 2i and P 2i+1 The parent of P i : P i/2

(5) shuffle-exchange connection shuffle links: P i  P j j=2i if 0  i  j=2i-(N-1) if exchange links: P i  P i+1 if i is even  i  N-1

(6) cube connection (n-cube,hypercube) 3-cube:

4-cube:

An n-cube connected network consists of 2 n node. (N=2 n ) binary representation of node i: i n-1 i n-2... i j... i 1 i 0 connects to i n-1 i n i 1 i 0 Each processor has n links.

adding 8 numbers:

This concept can be implemented on a tree machine. This concept can also be implemented on other models. time: O(log n), n:# of inputs m sets, each of n numbers: pipelining: log n+(m-1) steps.

MIMD Computers

MIMD computers sharing a common memory: multiprocessors, tightly coupled machines MIMD computers with an interconnection network: multicomputers, loosely coupled machines, distributed system

Analyzing Algorithms ● running time ● speedup ● cost ● efficiency 以工程為例 : ● 到完工所花的時間 ● 和一個人所花的時間比較 ● 人年 ( 或人月 ) ● 一人之人年 /n 人之人年

Running Time 從 the first processor begins computing 到 the last processor ends computing 的時間

Running Time Sum = 0 For I = 1 to n Sum = Sum + I Next 1次1次 N+1 次 N次N次共 3N+3 次 = O(N)

g(n)=O(f(n)): at most f(n), g(n) = 3n + 3 f(n) = n 3n+3 ≦ 5 n 當 n ≧ 2 時 c n0n0

Upper Bound g(n)=O(f(n)): at most f(n),

g(n) cf(n) n0n0

g(n) = 3n + 3 f(n) = n 3n+3 ≧ 2 n 當 n ≧ 1 時 c n0n0 g(n)= Ω(f(n)): at least f(n),

Lower Bound g(n)= Ω(f(n)): at least f(n),

g(n) cf(n) n0n0

adding 8 numbers: time: O(log n)

The time complexity of heapsort is O(n log n). The lower bound of sorting is Ω(n log n). (why?) Thus, heapsort is optimal. The upper bound of sorting is O(n log n).

For matrix multiplication, the lower bound is Ω(n 2 ) and the upper bound is O(n 2.5 ). (There exists such an algorithm.)

speedup = 即 : 一個 CPU 所花的時間 / 平行所花的時間

Adding n number on a tree machine Time: O(log n) Processor: n-1 speedup=O(n/log n)

Bitonic sorting speedup=O(nlogn/log 2 n) =O(n/log n) Time: O(log 2 n) Processor: O(n)

cost = parallel time * # of processor = t(n) * p(n)

s(n): time for the fastest sequential algo. If s(n)=t(n)*p(n), the algorithm is cost optimal (achieves optimal speedup).

adding n numbers processor: O(n/log n)}  cost optimal time: O(log n) Each processor first adds logn numbers sequentially.

efficiency = e.g. bitonic sorting efficiency: