Multiprocessor Architecture Basics
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Ahmed Khademzadeh, Azad University of Mashhad, khademzadeh@mshdiau.ac.ir
Multiprocessor Architecture
–Abstract models are (mostly) OK for understanding algorithm correctness and progress
–To understand how concurrent algorithms actually perform, you need to understand something about multiprocessor architectures
Pieces
–Processors
–Threads
–Interconnect
–Memory
–Caches
Design an Urban Messenger Service in 1980
–Downtown Manhattan
–Should you use cars (1980 Buicks, 15 MPG) or bicycles (hire recent graduates)?
–Better use bicycles
Technology Changes
–Since 1980, car technology has changed enormously: better mileage (hybrid cars, 35 MPG), more reliable
–Should you rethink your Manhattan messenger service?
Processors
–Cycle: fetch and execute one instruction
–Cycle times change: 1980: 10 million cycles/sec; 2005: 3,000 million cycles/sec
Computer Architecture
–Measure time in cycles (absolute cycle times change)
–Memory access: ~100s of cycles; changes slowly, mostly gets worse
Threads
–Execution of a sequential program
–Software, not hardware
–A processor can run a thread, put it aside (the thread does I/O, or runs out of time), and run another thread
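As a minimal sketch (illustrative only, not code from the slides), two Java software threads that the processors may run, suspend, and resume:

```java
// Illustrative sketch: two software threads; the runtime and OS decide when
// a processor runs each one, sets it aside, and resumes it.
public class ThreadsExample {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println("thread 1"));
        Thread t2 = new Thread(() -> System.out.println("thread 2"));
        t1.start();          // both threads become runnable
        t2.start();
        t1.join();           // wait for them to finish
        t2.join();
    }
}
```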
Interconnect
–Bus: like a tiny Ethernet; a broadcast medium; connects processors to memory and processors to processors
–Network: like a tiny LAN; mostly used on large machines
Interconnect
–Interconnect is a finite resource
–Processors can be delayed if others are consuming too much
–Avoid algorithms that use too much bandwidth
Analogy
–You work in an office
–When you leave for lunch, someone else takes over your office
–If you don’t take a break, a security guard shows up and escorts you to the cafeteria
–When you return, you may get a different office
Processor and Memory are Far Apart
–(Figure: a processor connected to memory through the interconnect)
Reading from Memory
–(Figure sequence: the processor sends an address over the interconnect, waits many cycles ("zzz…"), and the value comes back)
Writing to Memory
–(Figure sequence: the processor sends an address and a value, waits, and receives an acknowledgment ("ack"))
Remote Spinning
–A thread waits for a bit in memory to change (maybe it tried to dequeue from an empty buffer)
–Spins: repeatedly rereads the flag bit
–Huge waste of interconnect bandwidth
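As a minimal sketch (illustrative only; the class and field names are made up, not from the book), spinning on a shared flag in Java might look like this:

```java
// Illustrative sketch of spinning on a shared flag.
public class SpinExample {
    // volatile so each reread really goes to the memory system
    // (the volatile keyword is discussed at the end of these slides)
    private volatile boolean flag = false;

    public void producer() {
        // ... enqueue an item ...
        flag = true;               // signal the spinning thread
    }

    public void consumer() {
        while (!flag) {
            // spin: each iteration rereads the flag; without a cache,
            // every reread crosses the interconnect
        }
        // ... dequeue the item ...
    }
}
```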
Analogy
–In the days before the Internet…
–Alice is writing a paper on aardvarks; her sources are in the university library
–She requests a book by campus mail, the book arrives by return mail, and she sends it back when it is not in use
–She spends a lot of time in the mail room
Analogy II
–Alice buys a desk in her office, to keep the books she is using now
–And a bookcase in the hall, to keep the books she will need soon
Cache: Reading from Memory
–(Figure sequence: the processor first checks its cache for the address)
Cache Hit
–(Figure: the address is in the cache, so the value is returned immediately with no interconnect traffic)
Cache Miss
–(Figure sequence: the address is not in the cache, so the request goes to memory over the interconnect and the returned line is installed in the cache)
Local Spinning
–With caches, spinning becomes practical
–First time: load the flag bit into the cache
–As long as it doesn’t change: hit in the cache (no interconnect used)
–When it changes: a one-time cost (see cache coherence below)
Granularity
–Caches operate at a larger granularity than a word
–Cache line: a fixed-size block containing the address
Locality
–If you use an address now, you will probably use it again soon: fetch from the cache, not memory
–If you use an address now, you will probably use a nearby address soon: one in the same cache line
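For illustration (a minimal sketch, not code from the slides; it relies on each row of a Java 2D array being a contiguous block), the traversal order below changes how often consecutive accesses fall in the same cache line:

```java
// Illustrative sketch of spatial locality.
public class LocalityDemo {
    // Row-major traversal touches consecutive addresses within each row,
    // so most accesses hit in an already-loaded cache line.
    public static long rowMajorSum(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                sum += a[i][j];        // neighbors share a cache line
        return sum;
    }

    // Column-major traversal jumps between rows on every access,
    // so far fewer accesses reuse the line fetched by the previous one.
    public static long columnMajorSum(int[][] a) {
        long sum = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                sum += a[i][j];        // each access lands on a different line
        return sum;
    }
}
```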
Hit Ratio
–The proportion of requests that hit in the cache
–A measure of the effectiveness of the caching mechanism
–Depends on the locality of the application
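As a sketch with made-up counts, the hit ratio is simply hits divided by total accesses:

```java
// Illustrative only: hit ratio = hits / (hits + misses).
public class HitRatio {
    public static void main(String[] args) {
        long hits = 950, misses = 50;                       // assumed counts
        double hitRatio = (double) hits / (hits + misses);  // 0.95
        System.out.println(hitRatio);
    }
}
```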
L1 and L2 Caches
–L1: small and fast, 1 or 2 cycles, ~16-byte line
–L2: larger and slower, 10s of cycles, ~1K line size
When a Cache Becomes Full…
–Need to make room for a new entry by evicting an existing one
–Need a replacement policy, usually some kind of least-recently-used heuristic
Fully Associative Cache
–Any line can be anywhere in the cache
–Advantage: can replace any line
–Disadvantage: hard to find lines
Direct-Mapped Cache
–Every address has exactly one slot
–Advantage: easy to find a line
–Disadvantage: must replace that fixed line
K-way Set-Associative Cache
–Each slot holds k lines
–Advantage: pretty easy to find a line
–Advantage: some choice in replacing a line
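As an illustrative sketch (the line size and set count below are assumptions, not values from the slides), an address is split into a line offset, a set index, and a tag:

```java
// Illustrative sketch of how an address maps to a set in a
// k-way set-associative cache (parameters are assumed).
public class CacheIndexing {
    static final int LINE_SIZE = 64;    // bytes per line (assumed)
    static final int NUM_SETS  = 256;   // number of sets (assumed)

    // The low bits pick the byte within the line, the next bits pick the set,
    // and the remaining high bits form the tag stored with the line.
    static int setIndex(long address) {
        return (int) ((address / LINE_SIZE) % NUM_SETS);
    }

    static long tag(long address) {
        return address / LINE_SIZE / NUM_SETS;
    }
}
```

In a direct-mapped cache the set holds one line; in a fully associative cache there is effectively a single set holding every line.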
Contention
–Alice and Bob are both writing research papers on aardvarks
–Alice has encyclopedia volume AA–AC
–Bob asks the library for it: the library asks Alice to return it, Alice returns it and re-requests it, the library asks Bob to return it…
Contention
–Good to avoid memory contention
–It idles processors and consumes interconnect bandwidth
Contention
–Alice is still writing a research paper on aardvarks
–Carol is writing a tourist guide to the German city of Aachen
–No conflict? The library deals with volumes, not articles, and both require the same encyclopedia volume
False Sharing
–Two processors may conflict over disjoint addresses
–If those addresses lie on the same cache line
False Sharing
–A large cache line size increases locality
–But it also increases the likelihood of false sharing
–Sometimes you need to “scatter” data to avoid this problem
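A minimal sketch of such scattering (illustrative only; it assumes a 64-byte cache line, and JVMs are free to lay out fields differently, so production code typically uses a dedicated padding mechanism):

```java
// Illustrative sketch: manual padding so two hot counters, each written by
// a different thread, are unlikely to share a cache line.
public class PaddedCounters {
    volatile long counterA;               // updated only by thread A
    long p1, p2, p3, p4, p5, p6, p7;      // padding (~56 bytes) to push
                                          // counterB onto another line
    volatile long counterB;               // updated only by thread B
}
```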
Cache Coherence
–Processors A and B both cache address x
–A writes to x and updates its cache
–How does B find out?
–Many cache coherence protocols appear in the literature
MESI
–Modified: the cached data has been modified and must be written back to memory
–Exclusive: not modified; I have the only copy
–Shared: not modified; may be cached elsewhere
–Invalid: the cache contents are not meaningful
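As a simplified sketch built from the state definitions above (illustrative only; a real protocol also has to generate and snoop the corresponding bus messages and write-backs), the transitions for a single cached line might be modeled like this:

```java
// Simplified sketch of MESI transitions for one line, as seen by one cache.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLineState {
    MesiState state = MesiState.INVALID;

    // Our processor writes the line: we now own the only valid copy.
    void onLocalWrite() { state = MesiState.MODIFIED; }

    // Our processor reads the line; otherCopiesExist comes from snooping the bus.
    void onLocalRead(boolean otherCopiesExist) {
        if (state == MesiState.INVALID)
            state = otherCopiesExist ? MesiState.SHARED : MesiState.EXCLUSIVE;
    }

    // Another processor reads the line: demote to SHARED
    // (a MODIFIED copy would be written back first).
    void onRemoteRead() {
        if (state != MesiState.INVALID) state = MesiState.SHARED;
    }

    // Another processor writes the line: our copy is no longer meaningful.
    void onRemoteWrite() { state = MesiState.INVALID; }
}
```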
MESI on a Bus (figure sequence)
–A processor issues a load request for x on the bus
–Memory responds with the data; the requester caches the line in state E
–A second processor issues a load request for the same x
–The first processor responds with the data; both cached copies become S
–A processor then modifies its cached data
–With a write-through cache, the write to x is immediately broadcast on the bus
Write-Through Caches
–Immediately broadcast changes
–Good: memory and the caches always agree; more read hits, maybe
–Bad (“show stoppers”): bus traffic on all writes, and most writes are to unshared data (for example, loop indexes…)
Write-Back Caches
–Accumulate changes in the cache
–Write back when the line is evicted: the cache is needed for something else, or another processor wants the line
Invalidate
–(Figure: the writer broadcasts “invalidate x” on the bus; its own copy becomes M and the other cached copy becomes I)
Multicore Architectures
–The university president, alarmed by the fall in productivity, puts Alice, Bob, and Carol in the same corridor
–Private desks, a shared bookcase
–Contention costs go way down
Old-School Multiprocessor
–(Figure: separate processors, each with its own cache, connected to memory over the bus)
Multicore Architecture
–(Figure: cores on one chip sharing a cache, connected to memory over the bus)
Multicore
–Private L1 caches
–Shared L2 caches
–Communication between same-chip processors is now very fast
–Between different-chip processors, still not so fast
NUMA Architectures
–Alice and Bob transfer to NUMA State University
–There is no centralized library
–Each office basement holds part of the library
Distributed Shared-Memory Architectures
–Alice’s basement has the volumes that start with A
–Aardvark papers are convenient: run downstairs
–Zebra papers are inconvenient: run across campus
SMP vs. NUMA
–(Figures: SMP with a single shared memory vs. NUMA with memory distributed among the processors)
–SMP: symmetric multiprocessor
–NUMA: non-uniform memory access
–CC-NUMA: cache-coherent NUMA
Spinning Again
–NUMA without caches: OK if the variable is local, bad if it is remote
–CC-NUMA: like SMP
Recall: Real Memory is Relaxed
–Remember the flag principle?
–Alice’s and Bob’s flag variables start out false
–Alice writes true to her flag and reads Bob’s
–Bob writes true to his flag and reads Alice’s
–One of them must see the other’s flag as true
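A minimal sketch of the flag principle in Java (illustrative only; the class and field names are made up):

```java
// Illustrative sketch: without volatile, the write and the read in each
// method may be reordered, so both threads can read false.
public class FlagExample {
    boolean aliceFlag = false, bobFlag = false;

    void alice() {
        aliceFlag = true;            // write my flag
        boolean sawBob = bobFlag;    // then read the other flag
    }

    void bob() {
        bobFlag = true;
        boolean sawAlice = aliceFlag;
    }
    // The flag principle says at least one of sawBob / sawAlice is true,
    // but only if the memory operations are not reordered.
}
```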
Not Necessarily So
–Sometimes the compiler (or the hardware) reorders memory operations
–Reordering can improve cache performance and interconnect use
–But it leads to unexpected concurrent interactions
Write Buffers
–(Figure: writes are held in a per-processor buffer on their way to memory)
–Absorbing: a later write to the same address replaces an earlier buffered write
–Batching: several buffered writes are sent to memory together
Volatile
–In Java, if a variable is declared volatile, operations on it won’t be reordered
–Expensive, so use it only when needed
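Revisiting the earlier flag sketch (illustrative only, same made-up names): declaring the flags volatile rules out the reordering, so the flag principle holds.

```java
// Illustrative sketch: volatile writes and reads are not reordered with
// each other, at the cost of extra memory traffic.
public class VolatileFlagExample {
    volatile boolean aliceFlag = false;
    volatile boolean bobFlag = false;

    void alice() {
        aliceFlag = true;            // volatile write
        boolean sawBob = bobFlag;    // volatile read, not moved before the write
    }

    void bob() {
        bobFlag = true;
        boolean sawAlice = aliceFlag;
    }
}
```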
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
–to Share: to copy, distribute and transmit the work
–to Remix: to adapt the work
Under the following conditions:
–Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
–Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.