Slide 1: Computer Systems Principles - Architecture
Emery Berger and Mark Corner
University of Massachusetts Amherst, Department of Computer Science
Slide 2: Architecture
Slide 3: Von Neumann
Slide 4: "von Neumann architecture"
Slide 5: Fetch, Decode, Execute
Slide 6: The Memory Hierarchy
Registers
Caches
– Associativity
– Misses
"Locality"
(Diagram: registers, L1, L2, RAM)
Slides 7-8: Registers
(Diagram: a stack frame with SP, FP, and arguments)
Register = dedicated name for one word of memory managed by the CPU
– General-purpose: "AX", "BX", "CX" on x86
– Special-purpose:
  – "SP" = stack pointer
  – "FP" = frame pointer
  – "PC" = program counter
Change processes: save current registers & load saved registers = context switch (see the sketch below)
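A user-level illustration of that last point, assuming the POSIX ucontext API (getcontext, makecontext, swapcontext; deprecated but still available on Linux with glibc): swapcontext saves the calling context's registers, including SP and PC, and loads another saved set, which is exactly the save-and-restore step of a context switch. This is only a sketch, not the kernel's actual context-switch code.

    // Sketch: save this context's registers, load another's (the core of a context switch).
    #include <ucontext.h>
    #include <iostream>

    static ucontext_t main_ctx, work_ctx;
    static char work_stack[256 * 1024];      // the second context gets its own stack (its own SP)

    void worker() {
        std::cout << "worker: running on its own saved register state" << std::endl;
        swapcontext(&work_ctx, &main_ctx);   // save worker's registers, restore main's
    }

    int main() {
        getcontext(&work_ctx);                        // capture a register context
        work_ctx.uc_stack.ss_sp = work_stack;
        work_ctx.uc_stack.ss_size = sizeof(work_stack);
        work_ctx.uc_link = &main_ctx;
        makecontext(&work_ctx, worker, 0);            // point its PC at worker()

        std::cout << "main: switching contexts" << std::endl;
        swapcontext(&main_ctx, &work_ctx);            // save main's registers, run worker
        std::cout << "main: back again" << std::endl;
        return 0;
    }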
Slide 9: Caches
Access to main memory is "expensive": ~100 cycles (slow, but relatively cheap in $)
Caches: small, fast, expensive memory
– Hold recently accessed data (D$) or instructions (I$)
– Different sizes & locations:
  – Level 1 (L1): on-chip, smallish
  – Level 2 (L2): on or next to the chip, larger
  – Level 3 (L3): pretty large, on the bus
– Manage memory in lines (32-128 bytes)
Slide 10: Memory Hierarchy
Higher levels = small, fast, more $, lower latency; lower levels = large, slow, less $, higher latency
– Registers: 1-cycle latency
– L1 (separate D$ and I$): 2-cycle latency
– L2 (unified D$/I$): 7-cycle latency
– RAM: 100-cycle latency
– Disk: 40,000,000-cycle latency
– Network: 200,000,000+ cycle latency
(Diagram: lines are loaded into a level on access and evicted to make room)
Slide 11: "Locality"
Slide 12: "Level 0 Cache"
Slide 13: "Level 1 Cache"
Slide 14: "RAM"
Slides 15-18: "Disk"
Slide 19: "Book Hierarchy"
Slides 20-29: Orders of Magnitude
Approximate latency in cycles (log scale):
– 10^0: registers, L1
– 10^1: L2
– 10^2: RAM
– 10^3 to 10^6: (nothing; the gap between RAM and disk)
– 10^7: Disk
– 10^8 to 10^9: Network
Slide 30: Cache Jargon
The cache is initially cold; accessing data initially misses
– Fetch from a lower level in the hierarchy
– Bring the line into the cache (populate the cache)
– Next access: hit
Warmed up: the cache holds the most frequently used data
– Context switch implications?
LRU (Least Recently Used): use the past as a predictor of the future
Slide 31: Cache Details
An ideal cache would be fully associative
– That is, an LRU (least recently used) queue
– Generally too expensive
Instead, partition memory addresses into separate bins, each divided into ways
– 1-way = direct-mapped
– 2-way = 2 entries per bin
– 4-way = 4 entries per bin, etc.
Slide 32: Associativity Example
Hash memory addresses to different indices in the cache (a sketch of the index computation follows)
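A minimal sketch of that hashing; the 64-byte line size and 256 sets are assumptions for illustration, not the course's parameters.

    // How a set-associative cache splits an address into offset, set index, and tag.
    #include <cstdint>
    #include <iostream>

    int main() {
        const uint64_t LINE_SIZE = 64;    // bytes per cache line (assumed)
        const uint64_t NUM_SETS  = 256;   // number of bins/sets (assumed)

        uint64_t addr   = 0x7ffe12345678; // an arbitrary example address
        uint64_t offset = addr % LINE_SIZE;               // byte within the line
        uint64_t set    = (addr / LINE_SIZE) % NUM_SETS;  // which bin the line hashes to
        uint64_t tag    = addr / (LINE_SIZE * NUM_SETS);  // distinguishes lines within a bin

        // An N-way cache keeps up to N lines (compared by tag) per bin;
        // a direct-mapped (1-way) cache keeps only one, so two hot lines
        // that hash to the same bin keep evicting each other.
        std::cout << "offset=" << offset << " set=" << set
                  << " tag=0x" << std::hex << tag << std::endl;
        return 0;
    }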
Slide 33: Miss Classification
– First access = compulsory miss: unavoidable without prefetching
– Too many items mapping to one bin = conflict miss: avoidable if we had higher associativity
– No space left in the cache = capacity miss: avoidable if the cache were larger
– Invalidated by another cache = coherence miss: avoidable if the cache were unshared
Slide 34: Quick Activity
Trace: 3 7 11 2 3 7 7 9 9 6 1 3 7 2 5 8 1 0
Cache with 8 slots, 2-way associativity
– Assume hash(x) = x % 4 (modulus)
How many misses?
– # compulsory misses? 10
– # conflict misses? 2
– # capacity misses? 0
(A sketch that replays this trace follows.)
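A throwaway sketch that replays the activity's trace on the cache described above (4 bins of 2 ways, hash(x) = x % 4); LRU replacement within each bin is an assumption, since the slide does not name a replacement policy.

    // Replay the trace on a 2-way set-associative cache and count misses.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const int trace[] = {3, 7, 11, 2, 3, 7, 7, 9, 9, 6, 1, 3, 7, 2, 5, 8, 1, 0};
        const int SETS = 4, WAYS = 2;
        std::vector<std::vector<int>> sets(SETS);   // each bin: most recently used first
        int misses = 0;

        for (int addr : trace) {
            std::vector<int>& set = sets[addr % SETS];
            bool hit = false;
            for (std::size_t i = 0; i < set.size(); ++i) {
                if (set[i] == addr) {                  // found: it will move to the MRU slot
                    set.erase(set.begin() + i);
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                ++misses;
                if ((int)set.size() == WAYS)           // bin full: evict its LRU entry
                    set.pop_back();
            }
            set.insert(set.begin(), addr);             // newest entry goes in front
        }
        std::cout << "misses: " << misses << std::endl; // 12 = 10 compulsory + 2 conflict
        return 0;
    }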
Slide 35: Locality
Locality = re-use of recently used items
– Temporal locality: re-use in time
– Spatial locality: use of nearby items (in the same cache line or the same page, a 4K chunk)
Intuitively, greater locality = fewer misses
– The number of misses depends on cache layout, number of levels, associativity, ...
– Machine-specific
(An illustration follows.)
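An illustration (mine, not from the slides) of spatial locality: both loops below touch the same n*n elements, but the row-major loop walks consecutive addresses within each cache line, while the column-major loop strides n doubles per access and misses far more often on large matrices.

    // Row-major vs. column-major traversal of the same matrix.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t n = 2048;
        std::vector<double> m(n * n, 1.0);

        double row_sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                row_sum += m[i * n + j];   // adjacent elements: good spatial locality

        double col_sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i)
                col_sum += m[i * n + j];   // stride-n accesses: poor spatial locality

        std::cout << row_sum << " " << col_sum << std::endl;
        return 0;
    }

Timing the two loops shows the gap on any recent machine; its exact size is machine-specific, as the slide notes.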
Slides 36-41: Quantifying Locality
Instead of counting misses, compute the hit curve from an LRU histogram
– Assume a perfect LRU cache
– Ignore compulsory misses
These slides step through the example trace 3 7 7 2 3 7, building up the LRU histogram one access at a time.
Slide 42: Quantifying Locality
Instead of counting misses, compute the hit curve from an LRU histogram
– Start with the total misses on the right-hand side
– Subtract histogram values
For the example trace, hits by LRU queue size:
Queue size: 1 2 3 4 5 6
Hits:       1 1 3 3 3 3
Slide 43: Quantifying Locality
Instead of counting misses, compute the hit curve from an LRU histogram
– Start with the total misses on the right-hand side
– Subtract histogram values
– Normalize
Normalized hit curve for the example (each value divided by the 3 total hits): .3 .3 1 1 1 1
(A sketch of this computation follows.)
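A small sketch (my code, not the course's) of the computation these slides describe: find each access's LRU stack depth, histogram the depths, and take a running sum to get hits as a function of cache size, ignoring compulsory (first) accesses.

    // Compute the hit curve of a trace under a perfect LRU cache.
    #include <cstddef>
    #include <iostream>
    #include <list>
    #include <vector>

    int main() {
        const std::vector<int> trace = {3, 7, 7, 2, 3, 7};   // the slides' example trace
        std::list<int> stack;                                // front = most recently used
        std::vector<long> histogram(trace.size() + 1, 0);    // histogram[d] = reuses at depth d

        for (int addr : trace) {
            int depth = 1;
            std::list<int>::iterator it = stack.begin();
            for (; it != stack.end() && *it != addr; ++it) ++depth;
            if (it != stack.end()) {        // reuse: record its LRU stack depth
                ++histogram[depth];
                stack.erase(it);
            }                               // first access: compulsory miss, ignored
            stack.push_front(addr);
        }

        long hits = 0;                      // running sum of the histogram = hit curve
        for (std::size_t size = 1; size <= trace.size(); ++size) {
            hits += histogram[size];
            std::cout << "cache size " << size << ": " << hits << " hits" << std::endl;
        }
        return 0;                           // prints 1 1 3 3 3 3, matching Slide 42
    }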
Slides 44-46: Hit Curve Exercise
Derive the hit curve for the following trace:
3 5 4 2 8 3 6 9 9 6 1 3 7 2 5 8 1 0
The slides then work through the answer as a table; the rows shown are:
1 2 3 4 5 6 7 8 9
1 2 2 2 3 3 4 5 6
Slide 47: What Can We Do With This?
What would the hit rate be with a cache size of 4 or 9?
Slide 48: Simple Cache Simulator
The only argument is N, the length of the LRU queue
– Read addresses (ints) from cin
– Output hits & misses to cout
Queue operations:
– push_front(v) = put v on the front of the queue
– pop_back() = remove the element at the back of the queue
– erase(i) = erase the element at iterator i
– size() = number of elements
– Iterating over the queue (e.g., with a std::deque<int> named q): for (deque<int>::iterator i = q.begin(); i != q.end(); ++i) cout << *i << endl;
(A full sketch follows.)
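A minimal sketch of the simulator described above; it uses std::deque<int> for the LRU queue, since the slide's exact container type was not preserved, and it assumes N is at least 1.

    // LRU cache simulator: one argument N, addresses on stdin, hit/miss counts on stdout.
    #include <cstddef>
    #include <cstdlib>
    #include <deque>
    #include <iostream>

    int main(int argc, char* argv[]) {
        if (argc < 2 || std::atoi(argv[1]) < 1) {
            std::cerr << "usage: " << argv[0] << " N" << std::endl;
            return 1;
        }
        const std::size_t N = std::atoi(argv[1]);   // length of the LRU queue
        std::deque<int> q;                          // front = most recently used
        long hits = 0, misses = 0;

        int addr;
        while (std::cin >> addr) {
            std::deque<int>::iterator it = q.begin();
            while (it != q.end() && *it != addr) ++it;
            if (it != q.end()) {
                ++hits;
                q.erase(it);                        // remove from its old position...
            } else {
                ++misses;
                if (q.size() == N)                  // queue full: evict least recently used
                    q.pop_back();
            }
            q.push_front(addr);                     // ...and make it the most recently used
        }
        std::cout << "hits: " << hits << " misses: " << misses << std::endl;
        return 0;
    }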
Slide 49: Important CPU Internals
Other issues that affect performance:
– Pipelining
– Branches & prediction
– System calls (kernel crossings)
Slide 50: Scalar Architecture
Straight-up sequential execution: fetch an instruction, decode it, execute it
Problem: an I-cache or D-cache miss
– Result: a stall; everything stops
– How long do we wait for the miss? A long time, compared to the CPU
Slide 51: Superscalar Architectures
Out-of-order processors:
– Keep a pipeline of instructions in flight
– Instead of stalling on a load, guess!
  – Branch prediction
  – Value prediction
  – Predictors are based on history and on location in the program
– Speculatively execute instructions
  – Actual results are checked asynchronously
  – If mispredicted, squash the instructions
Accurate prediction = massive speedup: it hides the latency of the memory hierarchy
(A toy branch predictor is sketched below.)
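A toy illustration of history-based prediction, not the hardware design from the slides: a table of 2-bit saturating counters, indexed by (hypothetical) branch-address bits, that drifts toward "taken" or "not taken" as outcomes arrive. The table size and the outcome stream are made up.

    // 2-bit saturating-counter branch predictor over a made-up outcome stream.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<uint8_t> counters(1024, 1);     // 0-1 predict not taken, 2-3 predict taken
        const uint64_t branch_addr = 0x400;         // hypothetical branch address
        const bool outcomes[] = {true, true, true, false, true, true, false, true};
        int mispredictions = 0;

        for (bool taken : outcomes) {
            uint8_t& c = counters[branch_addr % counters.size()];
            bool prediction = (c >= 2);             // what the fetch stage would speculate
            if (prediction != taken) ++mispredictions;   // a real CPU would squash and refetch
            if (taken && c < 3) ++c;                // update history, saturating at 0 and 3
            if (!taken && c > 0) --c;
        }
        std::cout << "mispredictions: " << mispredictions << std::endl;
        return 0;
    }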
Slide 52: Pipelining
Pipelining overlaps instructions to exploit parallelism and allows the clock rate to be increased.
Slide 53: Pipelining
Branches cause bubbles in the pipeline: stages are left idle.
Slide 54: Pipelining and Branches
(Diagram: the five pipeline stages (instruction fetch, instruction decode, execute, memory access, write back), with an unresolved branch)
Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle.
Slide 55: Branch Prediction
(Diagram: the same five pipeline stages, with speculative execution past the unresolved branch)
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
Slide 56: Kernel Mode
Kernel mode protects the OS from users
– "Kernel" = English for nucleus (think atom)
– Only privileged code executes in the kernel
A system call is expensive because it:
– Enters kernel mode: flushes the pipeline and saves the context (where we are in user land)
– Executes code "in kernel land"
– Returns to user mode, restoring the context
(A rough timing sketch follows.)
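A rough, Linux-specific sketch of the cost difference: time a loop of real system calls (forced via syscall(SYS_getpid)) against a loop that stays in user land. The absolute numbers vary widely by machine and kernel; the point is the gap.

    // Compare the cost of entering the kernel with ordinary user-land work.
    #include <chrono>
    #include <iostream>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main() {
        const int N = 1000000;
        using clk = std::chrono::steady_clock;

        auto t0 = clk::now();
        for (int i = 0; i < N; ++i)
            syscall(SYS_getpid);            // enters and leaves the kernel every iteration
        auto t1 = clk::now();

        volatile long sink = 0;
        auto t2 = clk::now();
        for (int i = 0; i < N; ++i)
            sink = sink + i;                // stays entirely in user land
        auto t3 = clk::now();

        using ms = std::chrono::milliseconds;
        std::cout << "syscall loop:   " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
                  << "user-land loop: " << std::chrono::duration_cast<ms>(t3 - t2).count() << " ms\n";
        return 0;
    }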
Slide 57: Timers & Interrupts
The OS needs to respond to events periodically, e.g. to change the executing process
– Quantum: the time limit for process execution
– Fairness: when the timer goes off, an interrupt fires
  – The current process stops
  – The OS takes control through the interrupt handler
  – The scheduler chooses the next process
Interrupts also signal I/O events: network packet arrival, disk read completion, ...
(A user-level analogy is sketched below.)
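A user-level analogy only, assuming POSIX signals and setitimer (this is not the kernel's timer interrupt): a periodic SIGALRM plays the role of the timer that fires at the end of each quantum, and the signal handler stands in for the OS interrupt handler.

    // Periodic "timer interrupts" delivered to a busy process via SIGALRM.
    #include <csignal>
    #include <cstdio>
    #include <sys/time.h>

    volatile sig_atomic_t ticks = 0;

    void on_tick(int) { ticks = ticks + 1; }     // plays the role of the interrupt handler

    int main() {
        std::signal(SIGALRM, on_tick);

        itimerval quantum = {};
        quantum.it_value.tv_usec = 10000;        // first tick after 10 ms
        quantum.it_interval.tv_usec = 10000;     // then every 10 ms (the "quantum")
        setitimer(ITIMER_REAL, &quantum, nullptr);

        while (ticks < 100) {                    // the "current process" just keeps computing
            // busy work until interrupted enough times
        }
        std::printf("received %d timer ticks\n", (int)ticks);
        return 0;
    }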
Slide 58: The End
Slide 59: Branch Prediction
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.