ECE 1747: Parallel Programming Basics of Parallel Architectures: Shared-Memory Machines.

1 ECE 1747: Parallel Programming Basics of Parallel Architectures: Shared-Memory Machines

2 Two Parallel Architectures Shared memory machines. Distributed memory machines.

3 Shared Memory: Logical View [Figure: processors proc1, proc2, proc3, ..., procN all attached to a single shared memory space.]

4 Shared Memory Machines Small number of processors: shared memory with coherent caches (SMP). Larger number of processors: distributed shared memory with coherent caches (CC-NUMA).

5 SMPs 2- or 4-processor PCs are now commodity hardware. Good price/performance ratio. Memory is sometimes a bottleneck (see later). Typical price (8-node): ~$20-40k.

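6 Physical Implementation [Figure: processors proc1, proc2, proc3, ..., procN, each with a private cache (cache1, cache2, cache3, ..., cacheN), connected by a bus to a single shared memory.]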

7 Shared Memory Machines Small number of processors: shared memory with coherent caches (SMP). Larger number of processors: distributed shared memory with coherent caches (CC-NUMA).

8 CC-NUMA: Physical Implementation [Figure: processors proc1, proc2, proc3, ..., procN, each with its own cache (cache1, ..., cacheN) and memory module (mem1, ..., memN), connected by an interconnect.]

9 Caches in Multiprocessors
Suffer from the coherence problem:
– the same line appears in two or more caches
– one processor writes a word in that line
– other processors can now read stale data
This leads to the need for a coherence protocol:
– it avoids coherence problems
Many exist; we will look at just a simple one.
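The stale-data scenario above can be reproduced in a few lines. This is a minimal sketch in C, assuming two private caches that each hold at most one copy of a single word x and no coherence protocol at all; `toy_cache`, `toy_read`, `toy_write` and `stale_read_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): each private cache holds at most one copy of a
   single shared word x, and there is NO coherence protocol. */
typedef struct {
    bool valid;  /* does this cache currently hold a copy of x? */
    int value;   /* the cached copy of x */
} toy_cache;

static int memory_x = 0;  /* the copy of x in shared memory */

/* Read x: a miss fetches from memory, a hit returns the cached copy. */
int toy_read(toy_cache *c) {
    assert(c != NULL);
    if (!c->valid) {
        c->value = memory_x;
        c->valid = true;
    }
    return c->value;
}

/* Write x into this cache only; no other cache is told about it. */
void toy_write(toy_cache *c, int v) {
    assert(c != NULL);
    c->value = v;
    c->valid = true;
}

/* The slide's scenario: returns what processor 1 reads after
   processor 0 has written 1. */
int stale_read_demo(void) {
    toy_cache c0 = {false, 0}, c1 = {false, 0};
    memory_x = 0;
    toy_read(&c0);          /* the line now appears in two caches */
    toy_read(&c1);
    toy_write(&c0, 1);      /* one processor writes a word in the line */
    return toy_read(&c1);   /* the other hits on its old copy: stale 0 */
}
```

`stale_read_demo()` returns 0 even though the most recent write stored 1; ruling out exactly this outcome is the job of a coherence protocol.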

10 What is coherence? What does it mean to be shared? Intuitively, a read returns the last value written. This notion is not well-defined in a system without a global clock.

11 The Notion of “last written” in a Multi-processor System [Figure: timelines for processors P0, P1, P2, P3, with a write w(x) on one processor and a read r(x) on another.]

12 The Notion of “last written” in a Single-machine System [Figure: a single timeline containing w(x) and r(x).]

13 Coherence: a Clean Definition It is achieved by referring back to the single-machine case. This is called sequential consistency.

14 Sequential Consistency (SC) Memory is sequentially consistent if and only if it behaves “as if” the processors were executing in a time-shared fashion on a single machine.

15 Returning to our Example [Figure: the timelines of P0, P1, P2, P3 with w(x) and r(x), as on slide 11.]

16 Another Way of Defining SC All memory references of a single process execute in program order. All writes are globally ordered.
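These conditions can be checked mechanically against one proposed global order of memory references. A minimal sketch in C, assuming at most 4 processors and 4 variables, all starting at 0; `mem_ref` and `is_sc_trace` are hypothetical names, and the read check spells out the requirement, implicit in the single-machine view, that every read returns the value of the latest write to that variable in the global order.

```c
#include <assert.h>
#include <stdbool.h>

/* One memory reference in a proposed global order (hypothetical encoding). */
typedef struct {
    int proc;    /* issuing processor, 0..NPROCS-1 */
    int seq;     /* position in that processor's program order: 0, 1, 2, ... */
    bool write;  /* true for w(var, value), false for r(var) */
    int var;     /* variable index, 0..NVARS-1 */
    int value;   /* value written, or value the read returned */
} mem_ref;

enum { NPROCS = 4, NVARS = 4 };

/* True if `trace` is a legal SC execution: each processor's references
   appear in program order, and each read returns the value of the latest
   earlier write to that variable (initial values 0). */
bool is_sc_trace(const mem_ref *trace, int n) {
    int next_seq[NPROCS] = {0};
    int mem[NVARS] = {0};
    for (int k = 0; k < n; k++) {
        const mem_ref *r = &trace[k];
        assert(r->proc >= 0 && r->proc < NPROCS);
        assert(r->var >= 0 && r->var < NVARS);
        if (r->seq != next_seq[r->proc])
            return false;                 /* violates program order */
        next_seq[r->proc]++;
        if (r->write)
            mem[r->var] = r->value;       /* writes take effect in trace order */
        else if (r->value != mem[r->var])
            return false;                 /* read did not see the last write */
    }
    return true;
}
```

A trace in which one processor's second reference appears before its first, or in which a read returns anything other than the latest write, is rejected.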

17 SC: Example 1 [Figure: one processor performs w(x,1) then w(y,1); another performs r(x) then r(y).] Initial values of x and y are 0. What are the possible final values?

18 SC: Example 2 [Figure: one processor performs w(x,1) then w(y,1); another performs r(y) then r(x).]
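For examples this small, the possible outcomes can be found by brute force: enumerate every interleaving that keeps each processor's operations in program order (the time-shared, single-machine view of SC) and record what the reads return. A minimal sketch in C, assuming one processor performs both writes and another both reads, as in Examples 1 and 2; the encoding (`op`, `outcome_set`, `sc_outcomes`, variable 0 = x, variable 1 = y) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: variable 0 is x, variable 1 is y; every value is
   0 or 1, and both variables start at 0. */
typedef enum { OP_WRITE, OP_READ } op_kind;
typedef struct { op_kind kind; int var; int value; } op;  /* value used by writes */

/* seen[vx][vy] is true if some SC execution has the reads return x = vx
   and y = vy (a variable that is never read counts as 0). */
typedef struct { bool seen[2][2]; } outcome_set;

/* Execute one operation against memory, recording what a read returns. */
static void run_op(const op *o, int mem[2], int reads[2]) {
    assert(o->var == 0 || o->var == 1);
    if (o->kind == OP_WRITE) {
        assert(o->value == 0 || o->value == 1);
        mem[o->var] = o->value;
    } else {
        reads[o->var] = mem[o->var];
    }
}

/* Choose P0's or P1's next operation, never reordering a processor's own
   operations, and recurse until both programs have finished. */
static void explore(const op *p0, int n0, int i0, const op *p1, int n1, int i1,
                    const int mem_in[2], const int reads_in[2], outcome_set *out) {
    if (i0 == n0 && i1 == n1) {
        out->seen[reads_in[0]][reads_in[1]] = true;
        return;
    }
    if (i0 < n0) {
        int mem[2] = {mem_in[0], mem_in[1]}, reads[2] = {reads_in[0], reads_in[1]};
        run_op(&p0[i0], mem, reads);
        explore(p0, n0, i0 + 1, p1, n1, i1, mem, reads, out);
    }
    if (i1 < n1) {
        int mem[2] = {mem_in[0], mem_in[1]}, reads[2] = {reads_in[0], reads_in[1]};
        run_op(&p1[i1], mem, reads);
        explore(p0, n0, i0, p1, n1, i1 + 1, mem, reads, out);
    }
}

outcome_set sc_outcomes(const op *p0, int n0, const op *p1, int n1) {
    outcome_set out = {{{false}}};
    const int mem[2] = {0, 0}, reads[2] = {0, 0};
    explore(p0, n0, 0, p1, n1, 0, mem, reads, &out);
    return out;
}
```

With P0 = w(x,1), w(y,1): in Example 1 (r(x) then r(y)) all four combinations of (x, y) are reachable, while in Example 2 (r(y) then r(x)) the outcome y = 1, x = 0 never appears, because observing w(y,1) means w(x,1) has already happened.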

19 SC: Example 3 [Figure: operations w(x,1), w(y,1), r(y), r(x) placed on processor timelines.]

20 SC: Example 4 [Figure: operations w(x,1), w(x,2), r(x) placed on processor timelines.]

21 Implementation There are many ways of implementing SC; in fact, some implementations enforce even stronger conditions. We will look at a simple one: the MSI protocol.

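22 Physical Implementation [Figure: processors proc1, proc2, proc3, ..., procN, each with a private cache (cache1, cache2, cache3, ..., cacheN), connected by a bus to a single shared memory.]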

23 Fundamental Assumption
The bus is a reliable, ordered broadcast bus:
– every message sent by a processor is received by all other processors in the same order.
Also called a snooping bus:
– processors (or caches) snoop on the bus.

24 States of a Cache Line
Invalid
Shared
– read-only, one of many cached copies
Modified
– read-write, sole valid copy

25 Processor Transactions processor read(x) processor write(x)

26 Bus Transactions
bus read(x)
– asks for a copy with no intent to modify
bus read-exclusive(x)
– asks for a copy with intent to modify

27 State Diagram: Step 0 [Figure: states I, S, M; no transitions yet.]

28 State Diagram: Step 1 [Figure: adds transition PrRd/BuRd.]

29 State Diagram: Step 2 [Figure: adds transition PrRd/-.]

30 State Diagram: Step 3 [Figure: adds transition PrWr/BuRdX.]

31 State Diagram: Step 4 [Figure: transition labels as in Step 3: PrRd/BuRd, PrRd/-, PrWr/BuRdX.]

32 State Diagram: Step 5 [Figure: adds transition PrWr/-.]

33 State Diagram: Step 6 [Figure: adds transition BuRd/Flush.]

34 State Diagram: Step 7 [Figure: adds transition BuRd/-.]

35 State Diagram: Step 8 [Figure: adds transition BuRdX/-.]

36 State Diagram: Step 9 [Figure: complete diagram over states I, S, M with transitions PrRd/BuRd, PrRd/-, PrWr/BuRdX, PrWr/-, BuRd/Flush, BuRd/-, BuRdX/-, BuRdX/Flush.]
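The complete diagram can be written as one small state machine per cache. A minimal sketch in C, assuming a single one-word cache line, four caches, and a bus modeled as a loop that delivers each message to every other cache in the same order (the fundamental assumption of slide 23). Labels read event/bus action, with - meaning no bus transaction. The arrow endpoints used here (I to S on PrRd/BuRd; I or S to M on PrWr/BuRdX; M to S on BuRd/Flush; S to I on BuRdX/-; M to I on BuRdX/Flush) are those of the standard MSI protocol and are our reading of the diagram; all names are hypothetical.

```c
#include <assert.h>

/* Sketch (assumption): one cache line of one word, NCACHES caches, and an
   ordered snooping bus.  All names are hypothetical. */
typedef enum { INVALID, SHARED, MODIFIED } line_state;
typedef enum { BUS_RD, BUS_RDX } bus_msg;

enum { NCACHES = 4 };

typedef struct {
    line_state state[NCACHES];  /* state of the line in each cache */
    int data[NCACHES];          /* each cache's copy of the word */
    int memory;                 /* the copy in shared memory */
} bus_system;

/* Cache c snoops message m sent by another cache. */
static void snoop(bus_system *s, int c, bus_msg m) {
    switch (s->state[c]) {
    case MODIFIED:                                /* BuRd/Flush, BuRdX/Flush */
        s->memory = s->data[c];
        s->state[c] = (m == BUS_RD) ? SHARED : INVALID;
        break;
    case SHARED:                                  /* BuRd/-, BuRdX/- */
        if (m == BUS_RDX)
            s->state[c] = INVALID;
        break;
    case INVALID:                                 /* nothing to do */
        break;
    }
}

/* Ordered broadcast: every other cache sees m, one message at a time. */
static void broadcast(bus_system *s, int from, bus_msg m) {
    for (int c = 0; c < NCACHES; c++)
        if (c != from)
            snoop(s, c, m);
}

/* Processor read through cache c. */
int pr_read(bus_system *s, int c) {
    assert(c >= 0 && c < NCACHES);
    if (s->state[c] == INVALID) {                 /* I -> S on PrRd/BuRd */
        broadcast(s, c, BUS_RD);                  /* an M holder flushes first */
        s->data[c] = s->memory;
        s->state[c] = SHARED;
    }                                             /* S, M: PrRd/- */
    return s->data[c];
}

/* Processor write of v through cache c. */
void pr_write(bus_system *s, int c, int v) {
    assert(c >= 0 && c < NCACHES);
    if (s->state[c] != MODIFIED) {                /* I, S -> M on PrWr/BuRdX */
        broadcast(s, c, BUS_RDX);                 /* all other copies invalidated */
        /* a real cache would also receive the rest of the line here */
        s->state[c] = MODIFIED;
    }                                             /* M: PrWr/- */
    s->data[c] = v;
}
```

After cache 0 writes 42, cache 1's next read misses, cache 0 flushes the word and drops to S, and cache 1 reads 42 rather than a stale value.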

37 In Reality Most machines use a slightly more complicated protocol (4 states instead of 3). See architecture books (MESI protocol).

38 Problem: False Sharing Occurs when two or more processors access different data in the same cache line, and at least one of them writes. This leads to a ping-pong effect: the line moves back and forth between the caches.

39 False Sharing: Example (1 of 3)
for (i = 0; i < n; i++) a[i] = b[i];
Let’s assume we parallelize this code:
– p = 2
– each element of a takes 4 words
– a cache line has 32 words
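The slide's numbers give 32 / 4 = 8 elements of a per cache line. A minimal sketch in C of the resulting layout, assuming a[0] starts on a line boundary and a cyclic split in which P0 writes the even elements and P1 the odd ones (the pattern the following slides show); `line_of`, `writer_of` and `writers_in_line` are hypothetical helpers.

```c
#include <assert.h>

/* Parameters from the slide. */
enum { WORDS_PER_ELEM = 4, WORDS_PER_LINE = 32, NPROCS = 2 };
enum { ELEMS_PER_LINE = WORDS_PER_LINE / WORDS_PER_ELEM };  /* 8 */

/* Cache line holding a[i], assuming a[0] starts at a line boundary. */
int line_of(int i) {
    assert(i >= 0);
    return i / ELEMS_PER_LINE;
}

/* Processor that writes a[i] under a cyclic split (assumption). */
int writer_of(int i) {
    return i % NPROCS;
}

/* Number of distinct processors writing into cache line `line`;
   more than one means the line is falsely shared. */
int writers_in_line(int line) {
    int seen[NPROCS] = {0};
    int count = 0;
    for (int i = line * ELEMS_PER_LINE; i < (line + 1) * ELEMS_PER_LINE; i++) {
        int p = writer_of(i);
        if (!seen[p]) {
            seen[p] = 1;
            count++;
        }
    }
    return count;
}
```

Every line is written by both processors under this split. With a block split instead (P0 takes the first half of the array), only a line straddling the boundary between the halves could be shared.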

40 False Sharing: Example (2 of 3) [Figure: one cache line holding a[0] through a[7], with some elements written by processor 0 and the others written by processor 1.]

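41 False Sharing: Example (3 of 3) [Figure: P0 writes a[0], a[2], a[4], ... and P1 writes a[1], a[3], a[5], ...; the shared line moves back and forth between their caches through invalidations (inv) and data transfers (data).]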
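The ping-pong effect can be reproduced, and avoided, on a real machine. A minimal sketch in C with POSIX threads (compile with -pthread), assuming 64-byte cache lines, a common size today rather than the slide's 32 words: two threads each increment their own counter, first with both counters in one line, then with each counter padded to a line of its own. All names are hypothetical, and the size of the slowdown from false sharing varies by machine, so timing the two runs (e.g. with `clock_gettime`) is left to the reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. */
#define LINE_BYTES 64
#define ITERS 5000000L

/* Unpadded: both counters sit in the same cache line. */
struct shared_counters { _Alignas(LINE_BYTES) volatile long c[2]; };

/* Padded: each counter occupies a cache line of its own. */
struct padded_counter { _Alignas(LINE_BYTES) volatile long c; };
struct padded_counters { struct padded_counter p[2]; };

/* volatile keeps every increment as a real memory write; it is not used
   for synchronization, since each counter has a single writer. */
static struct shared_counters unpadded;
static struct padded_counters padded;

static void *bump_unpadded(void *arg) {
    int id = *(int *)arg;
    for (long i = 0; i < ITERS; i++)
        unpadded.c[id]++;      /* the shared line ping-pongs between caches */
    return NULL;
}

static void *bump_padded(void *arg) {
    int id = *(int *)arg;
    for (long i = 0; i < ITERS; i++)
        padded.p[id].c++;      /* each thread's line stays in its own cache */
    return NULL;
}

/* Run fn on two threads, passing ids 0 and 1, and wait for both. */
void run_pair(void *(*fn)(void *)) {
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int k = 0; k < 2; k++) {
        int rc = pthread_create(&t[k], NULL, fn, &ids[k]);
        assert(rc == 0);
        (void)rc;
    }
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);
}
```

Both versions compute the same counts; only the placement of the counters differs, which is why padding (or giving each thread data in a different line) is the usual fix for false sharing.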

42 Summary Sequential consistency. Bus-based coherence protocols. False sharing.

