1 CPRE 585 Term Review
Performance evaluation, ISA design, dynamically scheduled pipeline, and memory hierarchy
2 Exam Schedule
- In-class (40 points): Wednesday, Nov. 12, class time (75 minutes)
- Take-home (60 points): distributed Wednesday, Nov. 12; return by 5:00pm Friday
- Not two versions of the same exam: they differ in purpose, question types, and difficulty levels
3 Performance Evaluation
- Performance metrics: latency, throughput, and others
- Speedup
- Benchmarks: design considerations, categories, examples (SPEC and TPC)
- Summarizing performance
- Amdahl's Law: idea and equation (worked example below)
- CPU time equation
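For the last two bullets, a quick worked sketch may help; the fractions, instruction counts, and latencies below are made-up illustrations, not course data:

```python
# A minimal sketch of Amdahl's Law and the CPU time equation.

def amdahl_speedup(enhanced_fraction, enhanced_speedup):
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhanced_speedup)

def cpu_time(inst_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return inst_count * cpi * cycle_time_s

# Example: 40% of execution time sped up 10x -> overall speedup of 1.5625x
print(amdahl_speedup(0.4, 10))        # 1.5625

# Example: 1e9 instructions, CPI = 1.5, 0.5 ns cycle -> 0.75 s
print(cpu_time(1e9, 1.5, 0.5e-9))     # 0.75
```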
4 ISA Design
- ISA types
- GPR ISA variants: number of operands; use of register, immediate, and memory operands
- GPR ISA design issues
  - Memory addressing
  - Endianness (see the sketch below) and alignment
- Comparing RISC and CISC
- ISA impact on processor performance
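For the endianness point, a small sketch using Python's standard struct module shows the two byte orders for one example value:

```python
# A minimal sketch of endianness, one of the memory-addressing issues above.
import struct

value = 0x12345678
little = struct.pack('<I', value)   # little-endian: least significant byte first
big    = struct.pack('>I', value)   # big-endian: most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```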
5 Instruction Scheduling Fundamentals
- Dependence analysis (example below)
  - Data (RAW) and name (anti/WAR and output/WAW) dependences
  - Dependences through registers and memory
  - Control dependence
- CPU time = #inst x CPI x cycle time, with CPI = CPI_ideal + CPI_data-hazard + CPI_control-hazard
- Deep pipelining: reduces cycle time
- Multi-issue and dynamic scheduling: reduce CPI_ideal
- Branch prediction and speculative execution: reduce CPI_control-hazard
- Memory hierarchy: reduces CPI_data-hazard
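A tiny sketch of register dependence analysis over a hypothetical four-instruction trace; the register names and operations are invented for illustration:

```python
# Classify RAW, WAR, and WAW dependences; each instruction is (dest, [sources]).
trace = [
    ('r1', ['r2', 'r3']),   # I0: r1 <- r2 op r3
    ('r4', ['r1', 'r5']),   # I1: r4 <- r1 op r5  (RAW on r1 with I0)
    ('r2', ['r6', 'r7']),   # I2: r2 <- r6 op r7  (WAR on r2 with I0)
    ('r1', ['r8', 'r9']),   # I3: r1 <- r8 op r9  (WAW on r1 with I0)
]

for j, (dj, sj) in enumerate(trace):
    for i in range(j):
        di, si = trace[i]
        if di in sj:
            print(f'I{i} -> I{j}: RAW on {di}')   # true dependence
        if dj in si:
            print(f'I{i} -> I{j}: WAR on {dj}')   # anti-dependence
        if dj == di:
            print(f'I{i} -> I{j}: WAW on {dj}')   # output dependence
```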
6 Tomasulo Algorithm
- Study focus: data and name dependences through registers
- Hardware structures
  - Register status table (renaming table): helps remove name dependences and build data dependences (sketch below)
  - Reservation stations: preserve data dependences, buffer instruction state, wake up dependent instructions
  - Common data bus: broadcasts tag and data
- What are the stages of Tomasulo?
- Understand the big example!
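A minimal sketch of the renaming idea at issue time; the toy tag scheme (RS0, RS1, ...) is invented for illustration:

```python
# Register status table for Tomasulo-style renaming: each register maps to
# the reservation-station tag that will produce it, so later readers wait on
# tags (data dependences) instead of register names (name dependences).
reg_status = {}          # register -> producing RS tag; absent means value ready
next_tag = 0

def issue(dest, sources):
    """Issue one instruction: read source tags, claim dest with a new tag."""
    global next_tag
    # Sources with a pending producer become tag operands (RAW preserved);
    # others are read from the register file immediately.
    src_tags = {s: reg_status.get(s) for s in sources}
    tag = f'RS{next_tag}'
    next_tag += 1
    reg_status[dest] = tag   # overwriting an old tag removes WAR/WAW hazards
    return tag, src_tags

print(issue('r1', ['r2', 'r3']))  # ('RS0', {'r2': None, 'r3': None})
print(issue('r4', ['r1']))        # ('RS1', {'r1': 'RS0'})  RAW kept via tag
print(issue('r1', ['r5']))        # ('RS2', {'r5': None})   WAW removed by rename
```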
7 Precise Interrupts and Speculative Execution
- What is a precise interrupt, and why is it needed?
- In-order commit: a solution for both
- Central idea: maintain the architectural state
  - Must buffer instruction outputs after execution
  - Commit instruction outputs to the architectural state in program order
  - Flush the pipeline on exceptions or mis-speculations
- Q: What is the ROB, and what is its structure? (sketch below)
- Q: How does the pipeline change?
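One possible sketch of a reorder buffer, reduced to allocation, completion, and in-order commit; the entry fields are a simplification for illustration, not the full structure from lecture:

```python
# A minimal ROB: entries allocated in program order at dispatch, results
# buffered at completion, committed to the architectural state in order.
from collections import deque

class ROB:
    def __init__(self):
        self.entries = deque()          # FIFO keeps program order

    def dispatch(self, dest):
        entry = {'dest': dest, 'value': None, 'done': False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        entry['value'], entry['done'] = value, True   # buffered, not committed

    def commit(self, arch_regs):
        # Only the oldest completed instructions may update architectural state.
        while self.entries and self.entries[0]['done']:
            e = self.entries.popleft()
            arch_regs[e['dest']] = e['value']

    def flush(self):
        self.entries.clear()            # on exception or mis-speculation

rob, regs = ROB(), {}
e0, e1 = rob.dispatch('r1'), rob.dispatch('r2')
rob.complete(e1, 22)                    # younger instruction finishes first
rob.commit(regs); print(regs)           # {} - head not done, nothing commits
rob.complete(e0, 11)
rob.commit(regs); print(regs)           # {'r1': 11, 'r2': 22} in program order
```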
8 Modern Instruction Scheduling
- Major differences: more pipeline stages, data forwarding, decoupled tag broadcasting, possible use of an issue queue
- Issue queue: the RS becomes an IQ; two pipeline stages are swapped; significant changes to the registers, renaming, and the ROB
- Why data forwarding?
- How is the IQ different from the RS?
- How does the pipeline change?
- What is a physical register?
- How does the rename stage change? (sketch below)
- Understand the generic superscalar processor model
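A minimal sketch of the rename stage with physical registers; the register counts and the p0/p4 naming are assumptions chosen for illustration:

```python
# Rename-stage mapping: a map table translates architectural to physical
# registers, and a free list supplies a fresh physical register per destination.
NUM_PHYS = 8
map_table = {f'r{i}': f'p{i}' for i in range(4)}       # initial mapping
free_list = [f'p{i}' for i in range(4, NUM_PHYS)]      # unused physical regs

def rename(dest, sources):
    srcs = [map_table[s] for s in sources]   # read current mappings (RAW kept)
    new_phys = free_list.pop(0)              # fresh destination removes WAR/WAW
    old_phys = map_table[dest]               # freed later, when dest commits
    map_table[dest] = new_phys
    return new_phys, srcs, old_phys

print(rename('r1', ['r2', 'r3']))   # ('p4', ['p2', 'p3'], 'p1')
print(rename('r1', ['r1']))         # ('p5', ['p4'], 'p4')  reads the new value
```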
9 Branch Prediction
- Objective: deliver instructions continuously
- Several functions: predict the target, the direction, and the return address
- Review BTB and BHT designs
- Why use a saturating counter? (sketch below)
- Why use correlating prediction?
- How are the BTB and BHT updated?
- How is the mis-prediction penalty calculated?
- What is the return address stack?
- Understand the tournament predictor
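A sketch of why a saturating counter helps, using a BHT of 2-bit counters; the table size and PC indexing are illustrative choices:

```python
# 2-bit saturating counters: states 0-1 predict not-taken, 2-3 predict taken;
# the hysteresis means one atypical outcome does not flip the prediction.
BHT_SIZE = 16
bht = [2] * BHT_SIZE                       # start weakly taken

def predict(pc):
    return bht[(pc >> 2) % BHT_SIZE] >= 2  # index with low PC bits

def update(pc, taken):
    i = (pc >> 2) % BHT_SIZE
    if taken:
        bht[i] = min(3, bht[i] + 1)        # saturate at strongly taken
    else:
        bht[i] = max(0, bht[i] - 1)        # saturate at strongly not-taken

# A loop branch taken 9 times then not taken once: one mispredict at the loop
# exit, and the counter falls only to 2 (weakly taken), so the next visit to
# the loop still predicts correctly.
for outcome in [True] * 9 + [False]:
    print(predict(0x400), outcome)
    update(0x400, outcome)
```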
10 Memory Data Flow Techniques
- Address dependences through memory: store->load dependences
- Must buffer store outputs => store queue
- Want memory-level parallelism => memory disambiguation
  - Load bypassing and forwarding (sketch below)
  - May speculate when a store address is not yet known
- Need to detect mis-speculation => load queue (violations detected on stores)
- Q: Where is the performance gain?
- Q: What are the structures of the LQ and SQ?
- Q: How are the store queue and load queue synchronized with the ROB?
- Q: Which portion of the SQ preserves the architectural state? How are the SQ and LQ flushed?
- Superscalar techniques: instruction flow, register data flow, and memory data flow
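A sketch of load forwarding and bypassing against a store queue, with made-up addresses and data; real hardware also handles partially overlapping accesses, which this omits:

```python
# Store-to-load forwarding: a load searches the store queue from youngest to
# oldest; a matching older store forwards its data, otherwise the load
# bypasses the buffered stores and reads the cache.
store_queue = []   # program-ordered (addr, data) of stores not yet committed

def execute_store(addr, data):
    store_queue.append((addr, data))

def execute_load(addr, cache):
    for st_addr, st_data in reversed(store_queue):  # youngest match wins
        if st_addr == addr:
            return st_data          # forwarded from the store queue
    return cache.get(addr, 0)       # bypassed all buffered stores

cache = {0x100: 11}
execute_store(0x200, 22)
execute_store(0x100, 33)
print(execute_load(0x100, cache))   # 33, forwarded from the buffered store
print(execute_load(0x300, cache))   # 0, no match, reads memory/cache
```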
11 Limits of ILP
- What may limit ILP in realistic programs?
- What is the strategy for evaluating ILP limits?
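One standard strategy (as in Hennessy and Patterson's limit studies) is to idealize everything except true dependences: assume perfect branch prediction, renaming, and memory disambiguation, then measure the remaining dataflow parallelism. A sketch of that computation on a made-up trace:

```python
# Dataflow ILP limit: with ideal resources, each instruction can issue one
# cycle after its latest producer, so ILP = #inst / critical-path length.
trace = [
    ('r1', []),            # I0
    ('r2', ['r1']),        # I1 depends on I0
    ('r3', []),            # I2 independent
    ('r4', ['r2', 'r3']),  # I3 depends on I1 and I2
]

ready = {}                 # register -> cycle its value becomes available
depth = []
for dest, srcs in trace:
    cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
    ready[dest] = cycle
    depth.append(cycle)

print(len(trace) / max(depth))   # 4 instructions / 3-cycle critical path
```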
12 Cache Fundamentals
- Cache design
  - What is a cache, and why use one?
  - What are the four Qs of cache design?
  - Note that caching happens on memory blocks
  - Be very familiar with the cache address mapping format (sketch below)
- Cache performance
  - Three factors: miss rate, miss penalty, and hit time
  - What are AMAT and memory stall time?
  - What is the final measure of cache performance?
  - How are set-associative caches evaluated?
  - Know how to analyze memory access patterns
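A sketch of the address mapping format and of AMAT; the cache geometry and timing numbers below are hypothetical:

```python
# Split a byte address into (tag, index, block offset) for a set-associative
# cache, then compute AMAT = hit time + miss rate x miss penalty.
CACHE_SIZE = 32 * 1024      # 32 KB
BLOCK_SIZE = 64             # bytes
ASSOC = 4                   # 4-way set associative
NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * ASSOC)   # 128 sets

def split_address(addr):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SETS
    tag = addr // (BLOCK_SIZE * NUM_SETS)
    return tag, index, offset

print(split_address(0x12345))   # (9, 13, 5)

hit_time, miss_rate, miss_penalty = 1, 0.05, 100    # cycles, fraction, cycles
print(hit_time + miss_rate * miss_penalty)          # AMAT = 6.0 cycles
```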
13 Cache Optimization Techniques
- What is desired: low miss rates, fast hits, and small miss penalties, with minimal complexity (the ideal world)
- Understand cache misses
  - What are the three Cs?
  - Which techniques reduce each type?
- Tradeoffs are involved, e.g., cache size, block size, and set associativity
14 Improving Cache Performance
1. Reducing miss rates
  - Larger block size
  - Larger cache size
  - Higher associativity
  - Way prediction
  - Pseudo-associativity
  - Compiler optimizations (see the sketch after this list)
2. Reducing miss penalty
  - Multilevel caches
  - Critical word first
  - Read miss first
  - Merging write buffers
  - Victim caches
3. Reducing miss penalty or miss rates via parallelism
  - Non-blocking caches
  - Hardware prefetching
  - Compiler prefetching
4. Reducing cache hit time
  - Small and simple caches
  - Avoiding address translation
  - Pipelined cache access
  - Trace caches
Bold type: know details; others: understand concepts
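For the compiler-optimization bullet, a sketch of loop interchange; it assumes a row-major array (NumPy here as a stand-in), and the locality effect matters most in compiled code where the loop body is cheap:

```python
# Loop interchange for miss rate: in a row-major array, iterating rows in the
# outer loop walks memory sequentially, so each fetched block is fully reused
# (spatial locality); the column-major order strides a full row per access.
import numpy as np

a = np.zeros((1024, 1024))      # row-major: a[i, j] and a[i, j+1] are adjacent

def column_major_sum(a):        # poor locality: stride of one full row
    total = 0.0
    for j in range(a.shape[1]):
        for i in range(a.shape[0]):
            total += a[i, j]
    return total

def row_major_sum(a):           # good locality: consecutive addresses
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            total += a[i, j]
    return total
```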
15 Virtual Memory
- Why VM?
- What are the four Qs of VM design?
- How do caches and VM compare?
- Be familiar with the VM address mapping format
- Understand the flat page table; what is in a PTE?
- What is the TLB? How does the TLB work?
- Why multi-level page tables?
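A sketch of a two-level page table walk that also answers the last bullet; the 10+10+12-bit split of a 32-bit address is a common textbook layout assumed for illustration:

```python
# Two-level page table walk: directory index -> table index -> frame + offset.
PAGE_SHIFT = 12                         # 4 KB pages

def walk(page_dir, vaddr):
    dir_idx = (vaddr >> 22) & 0x3FF           # top 10 bits
    tab_idx = (vaddr >> PAGE_SHIFT) & 0x3FF   # next 10 bits
    offset = vaddr & 0xFFF                    # low 12 bits
    page_table = page_dir.get(dir_idx)        # may be entirely absent
    if page_table is None or tab_idx not in page_table:
        raise Exception('page fault')
    frame = page_table[tab_idx]               # frame number from the PTE
    return (frame << PAGE_SHIFT) | offset

# Only the second-level tables for mapped regions exist; that space saving
# over one huge flat table is why multiple levels are used.
page_dir = {1: {2: 0x55}}                # maps vaddr 0x0040_2xxx -> frame 0x55
print(hex(walk(page_dir, 0x00402ABC)))   # 0x55abc
```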
16 Typical Memory Hierarchy Today
- L1 instruction cache: small; combined with prediction (way prediction, trace cache) and prefetching (e.g., stream buffers); virtually indexed and virtually tagged
- L1 data cache: small and fast, pipelined, and likely set associative (Intel: 8KB, 4-way set associative); virtually indexed and physically tagged; write-through
- TLB: small (128 entries in the 21264), split for instructions and data, tends to be fully associative; the D-TLB runs in parallel with the L1 data cache
- L2 unified cache: as large as the transistor budget allows; today highly set associative (e.g., 512KB 8-way for the P4); write-back to reduce memory traffic
- Optional L3 cache: even larger, off-chip
- Page table: multi-level, software-managed (21264) or hardware-managed (Intel)
- Main memory: large but slow; high bandwidth
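A sketch of AMAT through a two-level hierarchy like this one; all miss rates and latencies below are hypothetical:

```python
# Multi-level AMAT: each level's miss penalty is the AMAT of the level below.
l1_hit, l1_miss_rate = 1, 0.05          # cycles; fraction of L1 accesses
l2_hit, l2_miss_rate = 10, 0.20         # cycles; fraction of L2 accesses
mem_latency = 200                       # cycles

l2_amat = l2_hit + l2_miss_rate * mem_latency   # 50 cycles
l1_amat = l1_hit + l1_miss_rate * l2_amat       # 3.5 cycles
print(l2_amat, l1_amat)
```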