Computer Architecture: Wrap-up CENG331 - Computer Organization Instructors: Murat Manguoglu(Section 1) Erol Sahin (Section 2 & 3) Adapted from slides of.

Slides:



Advertisements
Similar presentations
Lecture 4: CPU Performance
Advertisements

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
1 Code Optimization(II). 2 Outline Understanding Modern Processor –Super-scalar –Out-of –order execution Suggested reading –5.14,5.7.
1 Seoul National University Wrap-Up. 2 Overview Seoul National University Wrap-Up of PIPE Design  Exception conditions  Performance analysis Modern.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
Chapter 12 CPU Structure and Function. CPU Sequence Fetch instructions Interpret instructions Fetch data Process data Write data.
Computer Architecture Carnegie Mellon University
PipelinedImplementation Part I CSC 333. – 2 – Overview General Principles of Pipelining Goal Difficulties Creating a Pipelined Y86 Processor Rearranging.
Chapter 12 Pipelining Strategies Performance Hazards.
Chapter 12 CPU Structure and Function. Example Register Organizations.
Wrap-Up CSC 333. – 2 – Overview Wrap-Up of PIPE Design Performance analysis Fetch stage design Exceptional conditions Modern High-Performance Processors.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
Appendix A Pipelining: Basic and Intermediate Concepts
Pipelining. Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
David O’Hallaron Carnegie Mellon University Processor Architecture PIPE: Pipelined Implementation Part I Processor Architecture PIPE: Pipelined Implementation.
Architecture Basics ECE 454 Computer Systems Programming
Instruction Sets and Pipelining Cover basics of instruction set types and fundamental ideas of pipelining Later in the course we will go into more depth.
COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept.
1 Seoul National University Pipelined Implementation : Part I.
University of Amsterdam Computer Systems – optimizing program performance Arnoud Visser 1 Computer Systems Optimizing program performance.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
Pipeline Architecture I Slides from: Bryant & O’ Hallaron
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
PipelinedImplementation Part II PipelinedImplementation.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
LECTURE 10 Pipelining: Advanced ILP. EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls,
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
1 Pipelined Implementation. 2 Outline Handle Control Hazard Handle Exception Performance Analysis Suggested Reading 4.5.
PipeliningPipelining Computer Architecture (Fall 2006)
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
Real-World Pipelines Idea Divide process into independent stages
Computer Architecture Chapter (14): Processor Structure and Function
Computer Organization CS224
Data Prefetching Smruti R. Sarangi.
Systems I Pipelining IV
Pipelining: Advanced ILP
Morgan Kaufmann Publishers The Processor
Lecture 6: Advanced Pipelines
Pipelined Implementation : Part I
Seoul National University
Seoul National University
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Systems I Pipelining II
Pipelined Implementation : Part I
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Seoul National University
Control unit extension for data hazards
Pipeline Architecture I Slides from: Bryant & O’ Hallaron
Pipelined Implementation : Part I
* From AMD 1996 Publication #18522 Revision E
Data Prefetching Smruti R. Sarangi.
Pipelined Implementation
Computer Architecture
Control unit extension for data hazards
CSC3050 – Computer Architecture
Systems I Pipelining II
Systems I Pipelining II
Control unit extension for data hazards
Lecture 11: Machine-Dependent Optimization
Presentation transcript:

Computer Architecture: Wrap-up CENG331 - Computer Organization Instructors: Murat Manguoglu(Section 1) Erol Sahin (Section 2 & 3) Adapted from slides of the textbook:

– 2 – CS:APP3e Exceptions Conditions under which processor cannot continue normal operationCauses Halt instruction(Current) Bad address for instruction or data(Previous) Invalid instruction(Previous) Typical Desired Action Complete some instructions Either current or previous (depends on exception type) Discard others Call exception handler Like an unexpected procedure call Our Implementation Halt when instruction causes exception

– 3 – CS:APP3e Exception Examples Detect in Fetch Stage irmovq $100,%rax rmmovq %rax,0x10000(%rax) # invalid address jmp $-1 # Invalid jump target.byte 0xFF # Invalid instruction code halt # Halt instruction Detect in Memory Stage

– 4 – CS:APP3e Rest of Real-Life Exception Handling Call Exception Handler Push PC onto stack Either PC of faulting instruction or of next instruction Usually pass through pipeline along with exception status Jump to handler address Usually fixed address Defined as part of ISAImplementation Haven’t tried it yet!

– 5 – CS:APP3e Performance Metrics Clock rate Measured in Gigahertz Function of stage partitioning and circuit design Keep amount of work per stage small Rate at which instructions executed CPI: cycles per instruction On average, how many clock cycles does each instruction require? Function of pipeline design and benchmark programs E.g., how frequently are branches mispredicted?

– 6 – CS:APP3e CPI for PIPE CPI  1.0 Fetch instruction each clock cycle Effectively process new instruction almost every cycle Although each individual instruction has latency of 5 cycles CPI > 1.0 Sometimes must stall or cancel branches Computing CPI C clock cycles I instructions executed to completion B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = B/I Factor B/I represents average penalty due to bubbles

– 7 – CS:APP3e CPI for PIPE (Cont.) B/I = LP + MP + RP LP: Penalty due to load/use hazard stalling Fraction of instructions that are loads0.25 Fraction of load instructions requiring stall0.20 Number of bubbles injected each time1  LP = 0.25 * 0.20 * 1 = 0.05 MP: Penalty due to mispredicted branches Fraction of instructions that are cond. jumps 0.20 Fraction of cond. jumps mispredicted0.40 Number of bubbles injected each time 2  MP = 0.20 * 0.40 * 2 = 0.16 RP: Penalty due to ret instructions Fraction of instructions that are returns0.02 Number of bubbles injected each time 3  RP = 0.02 * 3 = 0.06 Net effect of penalties = 0.27  CPI = 1.27 (Not bad!) Typical Values

– 8 – CS:APP3e Modern CPU Design

– 9 – CS:APP3e Instruction Control Grabs Instruction Bytes From Memory Based on Current PC + Predicted Targets for Predicted Branches Hardware dynamically guesses whether branches taken/not taken and (possibly) branch target Translates Instructions Into Operations Primitive steps required to perform instruction Typical instruction requires 1–3 operations Converts Register References Into Tags Abstract identifier linking destination of one operation with sources of later operations

– 10 – CS:APP3e Execution Unit Multiple functional units Each can operate in independently Operations performed as soon as operands available Not necessarily in program order Within limits of functional units Control logic Ensures behavior equivalent to sequential program execution Execution Functional Units Integer/ Branch FP Add FP Mult/Div LoadStore Data Cache Prediction OK? Data Addr.. General Integer Operation Results Register Updates Operations

– 11 – CS:APP3e CPU Capabilities of Intel Haswell Multiple Instructions Can Execute in Parallel 2 load 1 store 4 integer 2 FP multiply 1 FP add / divide Some Instructions Take > 1 Cycle, but Can be Pipelined InstructionLatencyCycles/Issue Load / Store41 Integer Multiply31 Integer Divide3—303—30 Double/Single FP Multiply51 Double/Single FP Add31 Double/Single FP Divide10—156—11

– 12 – CS:APP3e Haswell Operation Translates instructions dynamically into “Uops” ~118 bits wide Holds operation, two sources, and destination Executes Uops with “Out of Order” engine Uop executed when Operands available Functional unit available Execution controlled by “Reservation Stations” Keeps track of data dependencies between uops Allocates resources