
1 Introduction Why parallel? Because a large amount of computation is required.

2 Example Real-time 3D imaging in surgery requires about 10^15 operations per second, while an ordinary computer performs about 10^7 operations per second. 10^15 / 10^7 ≈ 10^8 seconds; 10^8 / 10^5 ≈ 10^3 days ≈ 3 years.
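A quick back-of-the-envelope check of the slide's arithmetic (the operation rates are the slide's own estimates, and 10^5 seconds per day is the slide's rounding of 86,400):

```python
# Rates taken from the slide's example.
required_ops_per_sec = 10**15   # real-time 3D imaging for surgery
machine_ops_per_sec = 10**7     # a typical sequential computer

# One second's worth of imaging work, run on one machine:
seconds = required_ops_per_sec / machine_ops_per_sec   # 10^8 seconds
days = seconds / 10**5                                 # ~10^5 seconds per day
years = days / 365

print(seconds, days, round(years, 1))  # 1e8 seconds, 1e3 days, about 2.7 years
```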

3 Other fields that need massive computation Aircraft Testing New Drug Development Oil Exploration Modeling Fusion Reactors Economic Planning Cryptanalysis Managing Large Databases Astronomy Biomedical Analysis …

4 The speed of computers Speed of light: 3×10^8 m/s. Components placed too close together interfere with each other. Speed has doubled every 3 or 4 years. Unfortunately, it is evident that this trend will soon come to an end: a simple law of physics.

5 Parallelism A "human-wave" tactic, but with CPUs: throw many of them at the problem. Nowadays many computers contain more than one CPU.

6 Models of Computation Flynn's classification: SISD (Single Instruction, Single Data) MISD (Multiple Instruction, Single Data) SIMD (Single Instruction, Multiple Data) MIMD (Multiple Instruction, Multiple Data)

7 SISD Computers Sequential (serial) computation

8 Example Summation of n numbers: Sum = 0 For I = 1 to n Sum = Sum + A(I) Next Needs n-1 additions: O(n) time
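The sequential loop above can be written directly in Python (a straightforward transcription; the list `a` plays the role of the n numbers):

```python
def sequential_sum(a):
    """Sum n numbers one at a time, as on an SISD machine: O(n) time."""
    total = 0          # Sum = 0
    for x in a:        # For I = 1 to n
        total += x     #   Sum = Sum + A(I)
    return total

print(sequential_sum([3, 1, 4, 1, 5]))  # 14
```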

9 MISD Computers

10 Example Test whether a positive integer n is prime or not. n is prime: it has no divisors except 1 and n. n is composite: otherwise. Each processor tests one number between 1 and n as a candidate divisor. Each processor needs O(1) operations, so O(1) time.
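A sketch of the MISD idea in Python. On a real MISD machine the candidate divisors would all be tested simultaneously in O(1) time; here the "processors" are simulated one after another by the loop:

```python
def is_prime_misd(n):
    """Simulate the MISD primality test: 'processor' i checks whether i
    divides n. With n - 2 processors working at once, each does O(1)
    work, giving O(1) parallel time; this sequential simulation is O(n)."""
    if n < 2:
        return False
    # Processor i (2 <= i <= n-1) tests one candidate divisor.
    return all(n % i != 0 for i in range(2, n))

print(is_prime_misd(13), is_prime_misd(15))  # True False
```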

11 SIMD Computers

12 PRAM (Parallel Random Access Machine) Shared-Memory (SM) SIMD computers

13 How does a processor i pass a number to another processor j? Through the shared memory: i writes the number into a memory location and j reads it. O(1) time

14 Read/Write access modes 1. EREW (Exclusive Read, Exclusive Write) 2. CREW (Concurrent Read, Exclusive Write) 3. ERCW (Exclusive Read, Concurrent Write) 4. CRCW (Concurrent Read, Concurrent Write)

15 How to resolve conflicts in CRCW? (a) the smallest-numbered processor is allowed to write, and access is denied to all other processors. (b) writing is allowed only if the values to be written by all processors are equal; otherwise access is denied to all processors. (c) sum the values to be written.

16 Simulating multiple accesses on an EREW computer: (i) N multiple accesses (read): broadcasting procedure: 1. P1 → P2 2. P1, P2 → P3, P4 3. P1, P2, P3, P4 → P5, P6, P7, P8 … O(log N) steps
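The doubling procedure above, simulated sequentially in Python. Each "step" of the outer loop corresponds to one parallel step in which every current holder copies the value to one new processor, so the number of holders doubles and only O(log N) steps are needed:

```python
def erew_broadcast(value, n):
    """Broadcast `value` from processor 1 to all n processors by doubling.
    have[i] is the local copy held by processor i+1 (None = not yet received).
    In each step, processor i copies to processor holders+i: every read and
    write touches a distinct location, so the accesses are exclusive."""
    have = [value] + [None] * (n - 1)
    holders, steps = 1, 0
    while holders < n:
        for i in range(holders):          # these copies happen in parallel
            j = holders + i
            if j < n:
                have[j] = have[i]
        holders = min(2 * holders, n)
        steps += 1
    return have, steps

have, steps = erew_broadcast(5, 8)
print(have, steps)  # [5, 5, 5, 5, 5, 5, 5, 5] 3
```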

17 [Figure: broadcasting the datum D = 5 from P1 through an array A to P1–P8; after each doubling step, twice as many processors hold the value 5.]

18 (ii) m out of N multiple accesses (read), m ≤ N Each memory location is associated with a binary tree.

19 [Figure: the m = 4 requesting processors P1, P2, P4, P7 climb the binary tree associated with datum d, recording their numbers at the tree nodes.]

20 [Figure: the datum d then travels back down the tree along the recorded path, and a copy of d reaches each of P1, P2, P4, P7.]

21 Dividing a shared memory into blocks

22 Interconnection networks SIMD computers The memory is distributed among the N processors. Two processors can communicate with each other if they are connected by a line (link). Simultaneous communication between several pairs of processors is allowed.

23 (1) fully connected networks There are N(N-1)/2 links.

24 (2) linear array (1-dimensional array) Processor P_i is connected to P_{i-1} and P_{i+1}, for 2 ≤ i ≤ N-1.

25 (3) two-dimensional array (2-D mesh-connected)

26 P(j, k) has links connecting to P(j+1, k), P(j-1, k), P(j, k+1) and P(j, k-1), when those processors exist.
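The mesh links can be listed with a small helper (a sketch assuming a mesh without wraparound, so boundary processors simply have fewer links):

```python
def mesh_neighbors(j, k, rows, cols):
    """Neighbors of P(j, k) in a rows x cols 2-D mesh (no wraparound):
    P(j+1,k), P(j-1,k), P(j,k+1), P(j,k-1), kept only when inside the mesh."""
    candidates = [(j + 1, k), (j - 1, k), (j, k + 1), (j, k - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < rows and 0 <= b < cols]

print(mesh_neighbors(0, 0, 4, 4))       # corner processor: 2 links
print(len(mesh_neighbors(2, 2, 4, 4)))  # interior processor: 4 links
```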

27 (4) tree connection (tree machines)

28 The sons of P_i: P_2i and P_2i+1 The parent of P_i: P_⌊i/2⌋
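These index rules translate directly into code (a sketch assuming processors are numbered from the root P_1 downward, as on the slide):

```python
def tree_links(i):
    """Links of processor P_i in a tree machine rooted at P_1:
    parent P_{i // 2} (None for the root), sons P_{2i} and P_{2i+1}."""
    parent = i // 2 if i > 1 else None
    return parent, 2 * i, 2 * i + 1

print(tree_links(1))  # (None, 2, 3)
print(tree_links(5))  # (2, 10, 11)
```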

29 (5) shuffle-exchange connection shuffle links: P_i → P_j where j = 2i if 0 ≤ i ≤ N/2 - 1, and j = 2i - (N-1) if N/2 ≤ i ≤ N-1 exchange links: P_i ↔ P_{i+1} if i is even, 0 ≤ i ≤ N-1
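The two link types can be sketched as index functions. The shuffle link is equivalently a one-bit left cyclic shift of i's binary representation, which the comments note below:

```python
def shuffle_link(i, n):
    """Shuffle link of P_i in an n-processor shuffle-exchange network:
    j = 2i for 0 <= i <= n/2 - 1, else j = 2i - (n - 1).
    This is a left cyclic shift of i's binary representation."""
    return 2 * i if i <= n // 2 - 1 else 2 * i - (n - 1)

def exchange_link(i):
    """Exchange link: P_i <-> P_{i+1} when i is even (complement bit 0)."""
    return i + 1 if i % 2 == 0 else i - 1

n = 8
print([shuffle_link(i, n) for i in range(n)])  # [0, 2, 4, 6, 1, 3, 5, 7]
print([exchange_link(i) for i in range(n)])    # [1, 0, 3, 2, 5, 4, 7, 6]
```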

30 (6) cube connection (n-cube,hypercube) 3-cube:

31 [Figure: a 3-cube network with 8 nodes.]

32 4-cube:

33 An n-cube connected network consists of 2^n nodes (N = 2^n). Binary representation of node i: i_{n-1} i_{n-2} … i_j … i_1 i_0. Node i connects to each node obtained by complementing one bit i_j of this representation. Each processor has n links.
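Complementing one bit is an XOR with a power of two, so the n neighbors of a node can be listed in one line (a sketch; node numbers run from 0 to 2^n - 1):

```python
def hypercube_neighbors(i, n):
    """Neighbors of node i in an n-cube: complement bit j of i's n-bit
    binary representation for each j, giving exactly n links per node."""
    return [i ^ (1 << j) for j in range(n)]

print(hypercube_neighbors(0, 3))  # [1, 2, 4]
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
```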

34 adding 8 numbers:

35 This concept can be implemented on a tree machine, and also on other models. Time: O(log n), where n = # of inputs. For m sets, each of n numbers, pipelining takes log n + (m-1) steps.
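The tree-machine addition can be simulated by combining values pairwise: each pass of the loop corresponds to one level of the tree (one parallel step), so the number of values halves each step and O(log n) steps suffice:

```python
def parallel_sum(a):
    """Add n numbers by pairwise combination, as on a tree machine.
    Each iteration simulates one parallel step at one tree level."""
    values = list(a)
    steps = 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values)
                  else values[i]                    # odd element carries over
                  for i in range(0, len(values), 2)]
        steps += 1
    return values[0], steps

total, steps = parallel_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, steps)  # 36 3  (adding 8 numbers takes log 8 = 3 steps)
```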

36 MIMD Computers

37 MIMD computers sharing a common memory: multiprocessors, tightly coupled machines. MIMD computers with an interconnection network: multicomputers, loosely coupled machines, distributed systems.

38 Analyzing Algorithms ● running time ● speedup ● cost ● efficiency By analogy with a construction project: ● the time until the job is finished ● compared with the time one worker would take ● person-years (or person-months) ● one worker's person-years / n workers' person-years

39 Running Time The time from when the first processor begins computing until the last processor ends computing.

40 Running Time Sum = 0 (1 time) For I = 1 to n (N+1 times) Sum = Sum + A(I) (N times) Next (N+1 times) Total: 3N+3 times = O(N)

41 g(n) = O(f(n)): at most f(n). Example: g(n) = 3n+3, f(n) = n: 3n+3 ≤ 5n when n ≥ 2, so c = 5, n0 = 2.

42 Upper Bound g(n) = O(f(n)): at most f(n); there exist constants c > 0 and n0 such that g(n) ≤ c·f(n) for all n ≥ n0.

43 [Figure: for n ≥ n0, the curve g(n) lies below c·f(n).]

44 g(n) = Ω(f(n)): at least f(n). Example: g(n) = 3n+3, f(n) = n: 3n+3 ≥ 2n when n ≥ 1, so c = 2, n0 = 1.

45 Lower Bound g(n) = Ω(f(n)): at least f(n); there exist constants c > 0 and n0 such that g(n) ≥ c·f(n) for all n ≥ n0.

46 [Figure: for n ≥ n0, the curve g(n) lies above c·f(n).]

47 adding 8 numbers: time: O(log n)

48 The time complexity of heapsort is O(n log n), so the upper bound of sorting is O(n log n). The lower bound of (comparison-based) sorting is Ω(n log n). (Why?) Thus, heapsort is optimal.

49 For matrix multiplication, the lower bound is Ω(n^2) and the upper bound is O(n^2.5). (There exists such an algorithm.)

50 speedup = (running time of the fastest sequential algorithm) / (running time of the parallel algorithm), i.e., the time one CPU takes / the time taken in parallel.

51 Adding n numbers on a tree machine Time: O(log n) Processors: n-1 speedup = O(n / log n)

52 Bitonic sorting Time: O(log^2 n) Processors: O(n) speedup = O(n log n / log^2 n) = O(n / log n)

53 cost = parallel running time × # of processors = t(n) × p(n)

54 s(n): time for the fastest sequential algo. If s(n)=t(n)*p(n), the algorithm is cost optimal (achieves optimal speedup).

55 Adding n numbers Processors: O(n / log n), Time: O(log n) → cost optimal Each processor first adds log n numbers sequentially.
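The cost-optimal scheme can be simulated in two phases, matching the slide: each of roughly n / log n "processors" first sums its own block of about log n numbers sequentially, then the partial sums are combined pairwise as on a tree machine. Cost = O(n / log n) processors × O(log n) time = O(n), matching the sequential algorithm:

```python
import math

def cost_optimal_sum(a):
    """Cost-optimal addition of n numbers.
    Phase 1: ~n/log n processors each sum a block of ~log n numbers.
    Phase 2: partial sums are combined pairwise in O(log n) steps."""
    n = len(a)
    chunk = max(1, math.ceil(math.log2(n)))        # ~log n numbers per processor
    partial = [sum(a[i:i + chunk]) for i in range(0, n, chunk)]
    while len(partial) > 1:                        # pairwise combining phase
        partial = [partial[i] + partial[i + 1] if i + 1 < len(partial)
                   else partial[i]
                   for i in range(0, len(partial), 2)]
    return partial[0]

print(cost_optimal_sum(list(range(1, 17))))  # 136
```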

56 efficiency = s(n) / cost = s(n) / (t(n) × p(n)) e.g., bitonic sorting: efficiency = O(n log n) / (n × log^2 n) = O(1 / log n)

