Published by George Bennett. Modified over 9 years ago.
1
Part 2
2
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0 A five-level memory hierarchy. Note cost vs. size.
3
Design principles for modern computers:
1. All instructions are directly executed by hardware.
2. Maximize the rate at which instructions are issued.
3. Instructions should be easy to decode.
4. Only loads and stores should reference memory.
5. Provide many registers.
4
1. All instructions are directly executed by hardware. Eliminating the microcode interpreter removes a level of interpretation and speeds up most instructions.
5
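2. Maximize the rate at which instructions are issued. If you can issue 500 million instructions per second, you have a 500-MIPS machine, no matter how long each instruction takes to complete. Parallelism is the way to get there.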
6
3. Instructions should be easy to decode. This is made possible by regular, fixed-length instructions with a small number of fields. Fewer instructions are better. Fewer instruction formats are better.
7
4. Only loads and stores should reference memory. Memory access takes a long time, so most instructions should operate on registers. With separate load and store operations, memory accesses can be done in parallel with other instructions.
8
5. Provide many registers. At least 32! It is time-consuming to save registers to memory temporarily and reload them later.
9
Ways to increase speed:
a. Increase the clock speed.
b. Parallelism, of two types:
   1. processor/core level
   2. instruction level
10
Fetching instructions from memory is slow, so use a prefetch buffer: a set of registers (memory) holding instructions that are about to be executed. Fetch and execution can now be done in parallel!
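The benefit of overlapping fetch and execution can be sketched with a simple timing model. This is only an illustration under stated assumptions: every instruction takes the same fetch time and the same execution time, the prefetch buffer never runs empty, and the function names and the 10 nsec figures are hypothetical, not taken from the slides.

```python
def sequential_time_ns(num_instructions, fetch_ns, exec_ns):
    """Total time when each instruction is fetched, then executed, one after another."""
    return num_instructions * (fetch_ns + exec_ns)


def prefetch_time_ns(num_instructions, fetch_ns, exec_ns):
    """Total time when the fetch of the next instruction overlaps the current execution.

    Assumes an idealized two-step overlap: the first fetch and the last
    execution cannot be hidden; every step in between costs the slower of the two.
    """
    if num_instructions == 0:
        return 0
    return fetch_ns + (num_instructions - 1) * max(fetch_ns, exec_ns) + exec_ns


# Hypothetical numbers: 4 instructions, 10 nsec to fetch, 10 nsec to execute.
print(sequential_time_ns(4, 10, 10))  # 80
print(prefetch_time_ns(4, 10, 10))    # 50
```

In this model the saving grows with the number of instructions, because only the first fetch and the last execution remain exposed.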
11
A five-stage pipeline. The state of each stage as a function of time. Nine clock cycles are illustrated.
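The figure itself is not reproduced in this transcript, but the stage-by-cycle table it describes can be regenerated with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction k enters stage s in clock cycle k + s - 1; the stage labels S1 to S5 and the function names are illustrative.

```python
def pipeline_schedule(num_stages=5, num_cycles=9):
    """Return a list of (stage_name, row); row[c] is the instruction number
    occupying that stage in clock cycle c + 1, or None if the stage is idle."""
    table = []
    for stage in range(1, num_stages + 1):
        row = []
        for cycle in range(1, num_cycles + 1):
            instr = cycle - stage + 1  # ideal pipeline: no stalls, one instruction enters per cycle
            row.append(instr if instr >= 1 else None)
        table.append((f"S{stage}", row))
    return table


def format_schedule(table):
    """Render the schedule as text, using '.' for an idle stage."""
    lines = []
    for name, row in table:
        cells = " ".join("." if i is None else str(i) for i in row)
        lines.append(f"{name}: {cells}")
    return "\n".join(lines)


print(format_schedule(pipeline_schedule()))
```

Reading down a column gives the instructions in flight during one clock cycle; from cycle 5 onward all five stages are busy and one instruction completes per cycle.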
12
Latency = time to execute one instruction from start to finish = number of stages × cycle time
Bandwidth = instructions completed per second, typically in millions (MIPS)
Cycle time = time to move through 1 stage of the pipeline = clock cycle (period) = 1 / clock rate
16
Problem: Let the cycle time = 3 nsec/stage, and let the execution of each instruction require 6 stages (steps).
a. What is the bandwidth in MIPS for a machine without any pipeline (i.e., without any instruction-level parallelism)?
   6 stages/inst × 3×10^-9 sec/stage = 18×10^-9 sec/inst
   1 inst / 18×10^-9 sec ≈ 55.6×10^6 inst/sec ≈ 56 MIPS
b. What is the bandwidth in MIPS for a machine with a pipeline?
   Once the pipeline is full, one instruction completes every cycle: 3×10^-9 sec/inst
   1 inst / 3×10^-9 sec ≈ 333×10^6 inst/sec ≈ 333 MIPS
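The arithmetic in this problem can be checked with a small sketch. It assumes the steady-state case for the pipelined machine (the pipeline is full and never stalls), and the function names are illustrative. The conversion used is that one instruction every T nsec equals 1000 / T MIPS.

```python
def latency_ns(cycle_time_ns, num_stages):
    """Time for one instruction to pass through every stage."""
    return cycle_time_ns * num_stages


def mips_unpipelined(cycle_time_ns, num_stages):
    """Bandwidth when the next instruction starts only after the previous one finishes."""
    return 1000.0 / latency_ns(cycle_time_ns, num_stages)


def mips_pipelined(cycle_time_ns):
    """Steady-state bandwidth: one instruction completes every cycle (no stalls assumed)."""
    return 1000.0 / cycle_time_ns


print(latency_ns(3, 6))               # 18
print(round(mips_unpipelined(3, 6)))  # 56
print(round(mips_pipelined(3)))       # 333
```

Note that pipelining leaves the latency of a single instruction at 18 nsec; only the rate at which instructions complete improves.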
17
Dual five-stage pipelines with a common instruction fetch unit, which fetches pairs of instructions.
19
Note: Since 2 instructions can be executed at the same time (S4), they must not conflict over resource usage (e.g., a register), and neither may depend on the result of the other. How can we ensure this? (1) hardware, (2) compiler
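Whether the check is done by hardware or by the compiler, its core can be sketched as a comparison of the registers each instruction reads and writes. The `Instr` type, the register names, and the `can_pair` function are hypothetical; the rule shown is a deliberately conservative assumption and does not describe any particular processor.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    op: str
    dest: set = field(default_factory=set)  # registers written
    srcs: set = field(default_factory=set)  # registers read


def can_pair(first, second):
    """Return True if two instructions may execute side by side.

    Conservative rule (an assumption for illustration): reject the pair if
    the second reads what the first writes, if the second writes what the
    first reads, or if both write the same register.
    """
    second_needs_first_result = first.dest & second.srcs
    second_overwrites_first_input = first.srcs & second.dest
    both_write_same_register = first.dest & second.dest
    return not (second_needs_first_result
                or second_overwrites_first_input
                or both_write_same_register)


add = Instr("ADD", dest={"R1"}, srcs={"R2", "R3"})
sub = Instr("SUB", dest={"R4"}, srcs={"R1", "R5"})  # reads R1, which ADD writes
mul = Instr("MUL", dest={"R6"}, srcs={"R7", "R8"})  # shares no registers with ADD

print(can_pair(add, sub))  # False
print(can_pair(add, mul))  # True
```

A compiler can apply a test like this when ordering instructions so that compatible pairs sit next to each other; hardware can apply it at issue time and fall back to issuing only the first instruction of a pair.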
20
386 – no pipeline
486 – one pipeline
First-generation Pentium – two 5-stage pipelines:
1. u pipeline – can execute any instruction
2. v pipeline – limited; only simple integer instructions or FXCH
P4 – 20 stages
"The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline, the longest in mainstream consumer computing." - http://en.wikipedia.org/wiki/Instruction_pipeline
Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge microarchitecture (next few slides; see http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf)
24
A superscalar processor with five functional units. S3 issues an instruction every clock cycle; S4 may require more than 1 clock cycle.