Published by Ami Snow. Modified over 9 years ago.
1
Instruction Issue Logic for High-Performance Interruptible Pipelined Processors
Gurinder S. Sohi, Professor, UW-Madison Computer Architecture Group, University of Wisconsin-Madison
Sriram Vajapeyam, Real-Time Collaboration space at Oracle, Bangalore, India
2
What is this about?
The performance of pipelined processors is severely limited by data dependencies and branch instructions. Another major problem in pipelined computer design is that interrupts can be imprecise. Both problems degrade performance. This paper offers a hardware solution.
3
Problems and previous solutions
Data dependencies: code scheduling; waiting or reservation stations
Branch instructions: delayed branching; branch prediction
Imprecise interrupts: reorder buffer; reorder buffer with bypass logic
4
Basic Architecture
Same instruction set as the scalar unit of the CRAY-I
Several functional units connected to a common result bus
Instruction Fetch Unit
Decode and Issue Unit
144 registers
5
Tomasulo’s Algorithm
First presented for the floating-point unit of the IBM 360/91. An extension of this algorithm for the scalar unit of the CRAY-I is presented later.
Algorithm:
An instruction whose operands are not available is forwarded to a reservation station (RS).
It waits in the RS until its operands are available; it is then dispatched to the appropriate functional unit.
Each register is assigned a bit that indicates whether the register is busy (that is, whether it is the destination of an instruction that has not yet completed).
A busy register is assigned a tag that identifies the result to be stored in the register.
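The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the IBM 360/91 design: the class names (`Register`, `Station`, `Machine`), the integer tags, and the simplification that every instruction passes through a station are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Register:
    value: int = 0
    busy: bool = False          # True while an instruction will write this register
    tag: Optional[int] = None   # identifies the pending result when busy


@dataclass
class Station:
    op: str
    dest_tag: int
    vals: List[Optional[int]] = field(default_factory=list)  # operand values, if known
    tags: List[Optional[int]] = field(default_factory=list)  # tags still awaited

    def ready(self) -> bool:
        # An instruction can be dispatched once no operand is waiting on a tag.
        return all(t is None for t in self.tags)


class Machine:
    """Hypothetical toy model: registers with busy bits plus a list of stations."""

    def __init__(self, nregs: int) -> None:
        self.regs = [Register() for _ in range(nregs)]
        self.stations: List[Station] = []
        self.next_tag = 0

    def issue(self, op: str, dest: int, srcs: List[int]) -> Station:
        vals: List[Optional[int]] = []
        tags: List[Optional[int]] = []
        for s in srcs:                      # read sources before marking dest busy
            r = self.regs[s]
            if r.busy:
                vals.append(None)
                tags.append(r.tag)          # wait for the tagged result
            else:
                vals.append(r.value)
                tags.append(None)
        tag = self.next_tag
        self.next_tag += 1
        self.regs[dest].busy = True
        self.regs[dest].tag = tag
        st = Station(op, tag, vals, tags)
        self.stations.append(st)
        return st

    def broadcast(self, tag: int, value: int) -> None:
        # Result bus: every waiting operand and register compares its tag.
        for st in self.stations:
            for i, t in enumerate(st.tags):
                if t == tag:
                    st.vals[i] = value
                    st.tags[i] = None
        for r in self.regs:
            if r.busy and r.tag == tag:
                r.value, r.busy, r.tag = value, False, None
```

The loop over all registers in `broadcast` stands in for the per-register associative comparison, which is the hardware cost the next slide identifies as the disadvantage.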
6
Tomasulo’s Algorithm (contd.)
[Figure: fields in a reservation station]
Disadvantage: high hardware cost for register tagging and the associative comparison it requires.
7
Extension to Tomasulo’s Algorithm
A separate Tag Unit (TU)
Because only a few sink registers (busy registers) are active at any time, all tags from active registers are consolidated into a Tag Unit. Each register retains its busy bit.
Algorithm:
At instruction issue time, if a source register is busy, the TU is queried for the current tag of that register and the tag is forwarded to the reservation station.
If the destination register is not busy, obtaining a tag is straightforward. If it is busy, a new tag is obtained.
The Latest field keeps the register busy even after the older instruction has executed.
If the TU is full, instruction issue is stopped.
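A small sketch may help make the Tag Unit bookkeeping concrete. It assumes that a tag is simply the index of a TU entry and that a completing instruction releases its entry at once; the names `TagEntry`, `TagUnit`, `source_tag`, `allocate`, and `complete` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagEntry:
    free: bool = True
    reg: int = -1          # register this tag stands for
    latest: bool = False   # True if this is the newest pending write to reg


class TagUnit:
    def __init__(self, size: int, nregs: int) -> None:
        self.entries: List[TagEntry] = [TagEntry() for _ in range(size)]
        self.busy = [False] * nregs   # the busy bit stays with the register

    def source_tag(self, reg: int) -> Optional[int]:
        """Tag to wait on for a source register, or None if its value is ready."""
        if not self.busy[reg]:
            return None
        for i, e in enumerate(self.entries):
            if not e.free and e.reg == reg and e.latest:
                return i
        raise RuntimeError("busy register without a latest tag")

    def allocate(self, reg: int) -> Optional[int]:
        """Get a tag for a destination register; None means the TU is full."""
        slot = next((i for i, e in enumerate(self.entries) if e.free), None)
        if slot is None:
            return None                   # instruction issue stops
        for e in self.entries:
            if not e.free and e.reg == reg:
                e.latest = False          # an older pending write is superseded
        self.entries[slot] = TagEntry(free=False, reg=reg, latest=True)
        self.busy[reg] = True
        return slot

    def complete(self, tag: int) -> bool:
        """Release a tag; True means this result should update the register."""
        e = self.entries[tag]
        was_latest = e.latest
        if was_latest:
            self.busy[e.reg] = False      # only the latest write clears the busy bit
        self.entries[tag] = TagEntry()
        return was_latest
```

With two pending writes to the same register, completing the older one leaves the register busy, which is the role the Latest field plays in the description above.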
8
Extension to Tomasulo’s Algorithm (contd.)
[Figure: fields in a reservation station]
9
Other Extensions
Merging the reservation stations into an RS pool (Disadvantage: only one instruction can be issued at a time! NO)
Merging the RS pool with the Tag Unit to form the RS Tag Unit (RSTU)
[Figure: fields in the RSTU]
10
Implementation of Precise Interrupts
Reorder Buffer: It allows instructions to finish execution out of order but updates registers, memory, etc. in the order in which the instructions appear in the program. This ensures that a precise state of the machine is recoverable at any time.
Bypass Logic: An instruction does not have to wait for the reorder buffer to update a source register. It can fetch the value from the reorder buffer (if it is available) and issue.
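The following sketch illustrates both ideas under simplifying assumptions: only register results are tracked (no memory), entries are plain dictionaries, and `None` is used to mean "value not yet available". The class name `ReorderBuffer` and its method names are chosen for the example only.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ReorderBuffer:
    def __init__(self, regs: List[int]) -> None:
        self.regs = regs                      # architectural register file
        self.queue: Deque[Dict] = deque()     # entries kept in program order

    def allocate(self, dest: int) -> Dict:
        entry = {"dest": dest, "value": None, "done": False}
        self.queue.append(entry)
        return entry

    def finish(self, entry: Dict, value: int) -> None:
        # Execution may finish in any order; the register file is not touched here.
        entry["value"] = value
        entry["done"] = True

    def commit(self) -> int:
        """Update registers strictly in program order; return how many committed."""
        count = 0
        while self.queue and self.queue[0]["done"]:
            e = self.queue.popleft()
            self.regs[e["dest"]] = e["value"]
            count += 1
        return count

    def read(self, reg: int) -> Optional[int]:
        """Bypass: use the newest uncommitted result for reg if it has finished."""
        for e in reversed(self.queue):
            if e["dest"] == reg:
                return e["value"] if e["done"] else None   # None: must wait
        return self.regs[reg]
```

Because `commit` stops at the first unfinished entry, the register file always reflects a prefix of the program, which is what makes the state precise when an interrupt arrives.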
11
MERGING DEPENDENCY RESOLUTION AND PRECISE INTERRUPTS
The RSTU can be made to behave like a reorder buffer if it is managed as a queue, which forces it to update the state of the machine in the order in which the instructions are encountered. The modified unit is called the Register Update Unit (RUU).
The RUU:
(i) determines which instruction should be issued to the functional units for execution, reserves the result bus, and dispatches the instruction to the functional unit;
(ii) determines which instruction can commit, i.e., update the state of the machine;
(iii) monitors the result bus to resolve dependencies; and
(iv) provides tags to, and accepts new instructions from, the decode and issue unit.
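The four responsibilities can be mapped onto four methods of a queue-like structure. The sketch below is illustrative only: tags are assumed to be unique strings, one instruction is dispatched per call, result-bus reservation is not modeled, and the names `Entry`, `RUU`, `accept`, `dispatch`, `result`, and `commit` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set


@dataclass
class Entry:
    dest_tag: str
    waiting: Set[str]          # source tags whose results are still outstanding
    dispatched: bool = False
    executed: bool = False


class RUU:
    def __init__(self, size: int) -> None:
        self.size = size
        self.q: Deque[Entry] = deque()    # program order, oldest at the left

    def accept(self, dest_tag: str, src_tags: List[str]) -> Optional[Entry]:
        """(iv) Take a new instruction from decode/issue; None if the RUU is full."""
        if len(self.q) >= self.size:
            return None
        pending = {e.dest_tag for e in self.q if not e.executed}
        entry = Entry(dest_tag, set(src_tags) & pending)
        self.q.append(entry)
        return entry

    def dispatch(self) -> Optional[Entry]:
        """(i) Pick the oldest instruction whose operands are all available."""
        for e in self.q:
            if not e.dispatched and not e.waiting:
                e.dispatched = True
                return e
        return None

    def result(self, tag: str) -> None:
        """(iii) Monitor the result bus and resolve dependencies on this tag."""
        for e in self.q:
            if e.dest_tag == tag:
                e.executed = True
            e.waiting.discard(tag)

    def commit(self) -> List[str]:
        """(ii) Commit finished instructions from the head, in program order."""
        done = []
        while self.q and self.q[0].executed:
            done.append(self.q.popleft().dest_tag)
        return done
```

Dispatch may skip over a waiting instruction, so execution is out of order, while `commit` only ever removes entries from the head of the queue.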
12
Fields in RUU
13
Merging … (Contd…)
Destination Field
In the RSTU, the issue logic needed to search the TU to obtain the correct tag for a source operand and to update the latest-copy field for the destination.
Here a counter is used instead of multiple copies of a destination.
Each register has two n-bit counters: Number of Instances (NI) and Latest Instance (LI).
When an instruction that writes into a destination register is issued to the RUU, both NI and LI of that register are incremented. LI is incremented modulo n.
When such an instruction leaves the RUU, the associated NI is decremented.
The register tag consists of the register number appended with the LI counter.
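A short sketch of the counter scheme follows. Two points are assumptions rather than statements from the slide: the n-bit LI counter is taken to wrap at 2^n (the natural behavior of an n-bit counter), and issue is blocked when NI reaches 2^n - 1 so that NI fits in n bits and tags of in-flight instances stay distinct. The class and method names are hypothetical.

```python
from typing import List, Optional


class InstanceCounters:
    def __init__(self, nregs: int, n_bits: int) -> None:
        self.n_bits = n_bits
        self.limit = 1 << n_bits            # assumed wrap point of an n-bit counter
        self.ni: List[int] = [0] * nregs    # Number of Instances in the RUU
        self.li: List[int] = [0] * nregs    # Latest Instance number

    def busy(self, reg: int) -> bool:
        return self.ni[reg] > 0

    def tag(self, reg: int) -> int:
        # Register number with the LI counter appended in the low bits.
        return (reg << self.n_bits) | self.li[reg]

    def issue_dest(self, reg: int) -> Optional[int]:
        """Called when a writer of reg enters the RUU; None means issue must wait."""
        if self.ni[reg] == self.limit - 1:  # assumption: keep NI within n bits
            return None
        self.ni[reg] += 1
        self.li[reg] = (self.li[reg] + 1) % self.limit
        return self.tag(reg)

    def retire_dest(self, reg: int) -> None:
        """Called when a writer of reg leaves the RUU."""
        self.ni[reg] -= 1
```

A source operand needs no search: if `busy(reg)` is true, the tag to wait on is `tag(reg)`, read directly from the register's own counters.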
14
Merging … (Contd…)
Bypass Logic in the RUU
One case in which bypass logic might be helpful is when Ij has completed execution but has not yet committed at the time Ii is issued to the RUU (Ii is issued after Ij and uses its result).
To provide bypass logic for this case, the monitoring capabilities of the reservation stations are extended to monitor both the result bus and the RUU-to-register bus.
15
SIMULATION
Simulation Results
The benchmark programs used were the Lawrence Livermore loops.
A large RUU is needed to achieve a performance improvement.
An RUU of size 10 has the same hardware requirements as an architecture with a reservation station for each functional unit.
17
BRANCH PREDICTION AND CONDITIONAL INSTRUCTIONS
To allow conditional execution of instructions, a hardware mechanism is needed that allows the machine to recover from an incorrect branch prediction.
The RUU provides a method for nullifying instructions, as it does for interrupts.
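Nullification can be pictured as dropping every queue entry younger than the mispredicted branch. The sketch below assumes that instructions receive increasing sequence numbers as they enter the queue and that nothing behind the branch has committed yet, which holds when commits happen only from the head. The names `SpeculativeQueue`, `accept`, and `nullify_after` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class SpeculativeQueue:
    def __init__(self) -> None:
        self.q: Deque[Dict] = deque()   # program order, oldest at the left
        self.next_id = 0

    def accept(self, name: str) -> int:
        """Enter an instruction; return its sequence number."""
        seq = self.next_id
        self.next_id += 1
        self.q.append({"id": seq, "name": name})
        return seq

    def nullify_after(self, branch_id: int) -> List[str]:
        """Discard all instructions younger than the branch; return their names."""
        squashed = [e["name"] for e in self.q if e["id"] > branch_id]
        self.q = deque(e for e in self.q if e["id"] <= branch_id)
        return squashed
```

Since the discarded entries never updated registers or memory, no further undo work is needed in this model; the same mechanism serves for an interrupting instruction.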
18
Conclusions
The paper combines hardware dependency resolution with the implementation of precise interrupts.
A scheme that resolves dependencies and allows out-of-order execution is devised at low hardware cost, and it is integrated with precise interrupts.
Treating the two issues together made each simpler than before.
The results of the performance evaluation are quite encouraging.
The mechanism can be easily extended to support conditional execution of instructions from a predicted path.