Computer Organization Chapter 4


Computer Organization Chapter 4 Prof. Qi Tian Fall 2013

Topics
Dec. 6 (Friday): Final Exam Review; Record Check
Dec. 4 (Wednesday): 5-variable Karnaugh Map; Quiz 5
Dec. 2 (Monday): 3- and 4-variable Karnaugh Maps
Reminder: Assignment 6 is due (extended) on Wednesday, Dec. 4. The last quiz is on Wednesday, Dec. 4. The final exam review is on Friday. Complete the course evaluation on ASAP by Dec. 2.

Topics
Nov. 27 (Wednesday): Minimum sum-of-product solution; 2-variable Karnaugh map
Nov. 25 (Monday): Truth Table; Minterm and Maxterm
Nov. 22 (Friday): Practice Problem 4.2; Digital Logic Design; Function Complete

Topics
Nov. 20 (Wednesday): Midterm Exam Two; Practice Problem 4.1
Nov. 18 (Monday): Guest Lecture by Prof. Dakai Zhu
Nov. 13 (Wednesday): Y86 Instruction Set; Slides 1-13

Section 4.1 The Y86 Instruction Set Architecture
We will look at an assembly-language instruction set, Y86, which is similar to IA32 but simpler. Compared to IA32, Y86 has fewer data types, instructions, and addressing modes. Y86 is inspired by the IA32 instruction set, which is colloquially referred to as “x86”. The goal is to understand how Y86 is encoded and how you would build hardware to implement it.

Section 4.1.1 Programmer-Visible State
[Figure: Y86 programmer-visible state. RF: program registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp); CC: condition codes (ZF, SF, OF); Stat: program status; PC; DMEM: memory]
Y86 has:
8 32-bit registers with the same names as the IA32 32-bit registers
3 condition codes: ZF, SF, OF (no carry flag; integers are interpreted as signed)
A program counter (PC)
A program status byte: AOK, HLT, ADR, INS
Memory: up to 4 GB to hold program and data
Y86 does not have:
A carry flag
Floating-point registers

Section 4.1.1 Programmer-Visible State (continued)
Register %esp is used as the stack pointer by the push, pop, call, and return instructions. The other registers have no fixed meanings or values.
The single-bit condition codes ZF, SF, and OF store information about the effect of the most recent arithmetic or logical instruction.
The program counter (PC) holds the address of the instruction currently being executed.
Memory is conceptually a large array of bytes, holding both program and data.
The status code Stat indicates the overall state of program execution: either normal operation, or that some sort of exception has occurred.

Section 4.1.2 Y86 Instructions
Y86 instruction set
Instruction encodings range between 1 and 6 bytes.
An instruction consists of a 1-byte instruction specifier, possibly a 1-byte register specifier, and possibly a 4-byte constant word.
Field fn specifies a particular integer operation (OP1), data movement condition (cmovXX), or branch condition (jXX).
All numeric values are shown in hexadecimal.

Section 4.1.3 Instruction Encoding
rA and rB each represent one of the registers, encoded as follows:
Number  Register name
0       %eax
1       %ecx
2       %edx
3       %ebx
4       %esp
5       %ebp
6       %esi
7       %edi
F       No register
There are different opcodes for the 4 types of moves:
(rr) register to register
(ir) immediate to register
(rm) register to memory
(mr) memory to register

Section 4.1.3 Instruction Encoding
The only memory addressing mode is base register + displacement; there is no second register and no scaling factor.
Memory operations always move 4 bytes (there are no byte or 2-byte word memory operations).
The source or destination of a memory move must be a register.
The operations supported (OP1) are:
fn  operation
0   addl
1   subl
2   andl
3   xorl
There is no or and no not. These operations take only registers as operands and work only on 32-bit values.

Section 4.1.3 Instruction Encoding
7 jump instructions:
fn  jump
0   jmp
1   jle
2   jl
3   je
4   jne
5   jge
6   jg
6 conditional move instructions, with encodings similar to those of the conditional jump instructions and behavior similar to IA32. Note that rrmovl is a special case: it shares the conditional-move instruction code, with function code 0.
You can tell the type of an instruction and how many bytes it has by looking at the first byte of the instruction.

Figure 4.3. Function codes for Y86 instruction set
Each entry is: name, instruction code, function code.
Operations      Branches                    Moves
addl 6 0        jmp 7 0     jne 7 4         rrmovl 2 0    cmovne 2 4
subl 6 1        jle 7 1     jge 7 5         cmovle 2 1    cmovge 2 5
andl 6 2        jl  7 2     jg  7 6         cmovl  2 2    cmovg  2 6
xorl 6 3        je  7 3                     cmove  2 3
The code specifies a particular integer operation, branch condition, or data transfer condition. These instructions are shown as OP1, jXX, and cmovXX in Figure 4.2.
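The remark that the first byte determines both the instruction type and its length can be sketched in Python. This is only an illustration: the instruction codes and lengths for nop and mrmovl do not appear on these slides and are assumed from the textbook's Figure 4.2, and the names `LENGTHS`, `VARIANTS`, `FIXED`, and `classify` are hypothetical.

```python
# Length in bytes for each instruction code (high nibble of the first byte).
# Assumption: codes 0x1 (nop) and 0x5 (mrmovl) follow the textbook's Figure 4.2.
LENGTHS = {0x0: 1, 0x1: 1, 0x2: 2, 0x3: 6, 0x4: 6, 0x5: 6,
           0x6: 2, 0x7: 5, 0x8: 5, 0x9: 1, 0xA: 2, 0xB: 2}

# Instruction codes whose function code (low nibble) selects a variant.
VARIANTS = {
    0x2: ["rrmovl", "cmovle", "cmovl", "cmove", "cmovne", "cmovge", "cmovg"],
    0x6: ["addl", "subl", "andl", "xorl"],
    0x7: ["jmp", "jle", "jl", "je", "jne", "jge", "jg"],
}

# Instruction codes that only accept function code 0.
FIXED = {0x0: "halt", 0x1: "nop", 0x3: "irmovl", 0x4: "rmmovl",
         0x5: "mrmovl", 0x8: "call", 0x9: "ret", 0xA: "pushl", 0xB: "popl"}


def classify(first_byte):
    """Return (mnemonic, length in bytes) for an instruction's first byte."""
    icode, ifun = first_byte >> 4, first_byte & 0xF
    if icode in VARIANTS and ifun < len(VARIANTS[icode]):
        return VARIANTS[icode][ifun], LENGTHS[icode]
    if icode in FIXED and ifun == 0:
        return FIXED[icode], LENGTHS[icode]
    raise ValueError(f"invalid instruction byte 0x{first_byte:02x}")
```

For instance, `classify(0x73)` identifies a 5-byte je, and `classify(0x24)` a 2-byte cmovne.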

Summary of Sections 4.1.2-4.1.3: Y86 instruction set
Program register identifiers: 0 %eax, 1 %ecx, 2 %edx, 3 %ebx, 4 %esp, 5 %ebp, 6 %esi, 7 %edi, F no register
Operations supported (fn): 0 addl, 1 subl, 2 andl, 3 xorl
7 jump functions (fn): 0 jmp, 1 jle, 2 jl, 3 je, 4 jne, 5 jge, 6 jg
Function codes (name, instruction code, function code):
Operations: addl 6 0, subl 6 1, andl 6 2, xorl 6 3
Branches: jmp 7 0, jle 7 1, jl 7 2, je 7 3, jne 7 4, jge 7 5, jg 7 6
Moves: rrmovl 2 0, cmovle 2 1, cmovl 2 2, cmove 2 3, cmovne 2 4, cmovge 2 5, cmovg 2 6

Section 4.1.2 Y86 Instructions
Y86 is largely a subset of the IA32 instruction set. It includes only 4-byte integer operations, has fewer addressing modes, and includes a smaller set of operations. Since we only use 4-byte data, we can refer to these as “words” without ambiguity.

Instruction Encoding Examples
rrmovl %eax, %ecx
The encoding is 2001. It is stored in 2 bytes of memory, the first containing 0x20 and the second containing 0x01.
rmmovl %ecx, 0x24(%ebp)
The encoding is 401524000000. The first two bytes are 0x40 and 0x15, and the displacement is 0x24. On a little-endian machine the next byte would be 0x24, followed by 3 bytes of 0.
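The two encodings above can be reproduced with a short Python sketch. The helper names `encode_rrmovl` and `encode_rmmovl` are hypothetical, and the second call uses the displacement 0x24 that matches the encoding shown.

```python
import struct

# Register identifiers from the encoding table in Section 4.1.3.
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}


def encode_rrmovl(ra, rb):
    """rrmovl rA, rB: byte 0x20, then one byte holding rA:rB."""
    return bytes([0x20, (REG[ra] << 4) | REG[rb]])


def encode_rmmovl(ra, disp, rb):
    """rmmovl rA, D(rB): byte 0x40, rA:rB, then D as 4 little-endian bytes."""
    return bytes([0x40, (REG[ra] << 4) | REG[rb]]) + struct.pack("<i", disp)


print(encode_rrmovl("%eax", "%ecx").hex())        # 2001
print(encode_rmmovl("%ecx", 0x24, "%ebp").hex())  # 401524000000
```

The `"<i"` format packs a signed 32-bit integer in little-endian order, so a negative displacement such as -3 comes out as fdffffff.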

Practice Problem 4.1
Determine the byte encoding of the Y86 instruction sequence that follows. The line “.pos 0x100” indicates that the starting address of the object code should be 0x100.
    .pos 0x100                   # start code at address 0x100
    irmovl $15, %ebx             # load 15 into %ebx
    rrmovl %ebx, %ecx            # copy 15 to %ecx
loop:                            # loop
    rmmovl %ecx, -3(%ebx)        # save %ecx at address 15-3=12
    addl %ebx, %ecx              # increment %ecx by 15
    jmp loop                     # goto loop

Practice Problem 4.1 - Solution
Determine the byte encoding of the Y86 instruction sequence that follows. The line “.pos 0x100” indicates that the starting address of the object code should be 0x100.
0x100:              |     .pos 0x100             # start code at address 0x100
0x100: 30f30f000000 |     irmovl $15, %ebx       # load 15 into %ebx
0x106: 2031         |     rrmovl %ebx, %ecx      # copy 15 to %ecx
0x108:              | loop:                      # loop
0x108: 4013fdffffff |     rmmovl %ecx, -3(%ebx)  # save %ecx at address 15-3=12
0x10e: 6031         |     addl %ebx, %ecx        # increment %ecx by 15
0x110: 7008010000   |     jmp loop               # goto loop

Practice Problem 4.2
For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.
A. 0x100: 30f3fcffffff40630008000000
B. 0x200: a06f80080200000030f30a00000090

Practice Problem 4.2 - Solution
For each byte sequence listed, determine the Y86 instruction sequence it encodes. If there is some invalid byte in the sequence, show the instruction sequence up to that point and indicate where the invalid value occurs. For each sequence, we show the starting address, then a colon, and then the byte sequence.
A. 0x100: 30f3fcffffff40630008000000
0x100: irmovl $-4, %ebx
0x106: rmmovl %esi, 0x800(%ebx)
0x10c: halt
Note: -4 = 0xfffffffc
B. 0x200: a06f80080200000030f30a00000090
0x200: pushl %esi
0x202: call proc
0x207: halt
0x208: proc:
0x208: irmovl $10, %ebx
0x20e: ret

Y86 vs IA32
Y86 encodings are simpler than IA32 encodings, but not as compact.
IA32 is sometimes labeled as CISC and is deemed to be the opposite of RISC.
RISC and CISC
RISC = reduced instruction set computer
CISC = complex instruction set computer
Basic ideas of RISC:
Small number of instructions
Most instructions have the same length
Simple addressing formats
Arithmetic and logical operations only work on registers
Memory operations only move data between a register and memory
No condition codes: test instructions store their results in registers
There has been a long controversy between RISC and CISC since the 1980s (read textbook pp. 342-344).
Which is better? Answer: a combination.
Which is Y86? It includes attributes of both RISC and CISC.
On the CISC side, it has condition codes, variable-length instructions, and stack-intensive procedure linkages.
On the RISC side, it uses a load-store architecture and a regular encoding.
Y86 can be viewed as taking IA32 and simplifying it by applying the principles of RISC.

Section 4.1.4 Y86 Exceptions
What happens when an invalid assembly instruction is found? This generates an exception. In Y86, an exception halts the machine: it stops executing.
What are some possible causes of exceptions?
Invalid operation
Divide by 0
Square root of a negative number
Memory access error (e.g., address too large)
Hardware error

Section 4.1.4 Y86 Exceptions
Value  Name  Meaning
1      AOK   Normal operation
2      HLT   Halt instruction encountered
3      ADR   Invalid address encountered
4      INS   Invalid instruction encountered
Y86 status codes. In our design, the processor halts for any code other than AOK.

Y86 Examples
Example 1:
IA32: addl (%ecx), %eax
Y86: this cannot be done in one instruction; it takes 2 instructions:
    mrmovl (%ecx), %esi
    addl %esi, %eax
Example 2:
IA32: addl $4, %ecx
Y86:
    irmovl $4, %ebx
    addl %ebx, %ecx
Example 3:
IA32: addl (%ebx, %edx, 4), %eax
Y86: How many Y86 instructions are needed to do this? For example, it can be done with 5 Y86 instructions (doubling %edx twice multiplies it by 4):
    addl %edx, %edx
    addl %edx, %edx
    addl %edx, %ebx
    mrmovl (%ebx), %ecx
    addl %ecx, %eax
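As a sanity check on Example 3, the following Python sketch models a five-instruction Y86 sequence that doubles %edx twice (to scale it by 4) and compares the result with the single IA32 instruction. The register dictionary, the word-addressed `mem` dictionary, and both function names are simplifying assumptions made for illustration.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits


def ia32_addl_indexed(regs, mem):
    """Effect of IA32 'addl (%ebx,%edx,4), %eax' on %eax."""
    return (regs["eax"] + mem[regs["ebx"] + 4 * regs["edx"]]) & MASK


def y86_sequence(regs, mem):
    """Five Y86 instructions; note they overwrite %edx, %ebx and %ecx."""
    r = dict(regs)
    r["edx"] = (r["edx"] + r["edx"]) & MASK  # addl %edx, %edx   (edx * 2)
    r["edx"] = (r["edx"] + r["edx"]) & MASK  # addl %edx, %edx   (edx * 4)
    r["ebx"] = (r["ebx"] + r["edx"]) & MASK  # addl %edx, %ebx
    r["ecx"] = mem[r["ebx"]]                 # mrmovl (%ebx), %ecx
    r["eax"] = (r["eax"] + r["ecx"]) & MASK  # addl %ecx, %eax
    return r


regs = {"eax": 10, "ebx": 0x100, "ecx": 0, "edx": 3}
mem = {0x10C: 32}  # word stored at 0x100 + 4*3
assert y86_sequence(regs, mem)["eax"] == ia32_addl_indexed(regs, mem)
```

Unlike the IA32 instruction, the Y86 version destroys the original contents of three registers, which is part of the cost of having only base + displacement addressing.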

Section 4.2 Logic Design
Section 4.2.1 Logic gates
Logic gate: the simplest building block, with 1-2 inputs and 1 output; it computes a Boolean function such as AND, OR, or NOT.
Hardware Description Language (HDL): currently, circuits are designed using an HDL. The notation is much like C code: for example, an AND gate is represented by a && b.
[Figure: AND, OR, and NOT gate symbols]

Section 4.2.2 Combinational Circuits
Combinational circuits have no memory, in contrast to clocked sequential circuits, which do.
Building blocks: logic gates
Goal: design an economical circuit
Algebraic methods for simplification
Karnaugh maps: an alternative way

Section 4.2.2 Combinational Circuits
Example: bit equality
1) bool eq = (a && b) || (!a && !b);
[Figure: alternative way]

Section 4.2.2 Combinational Circuits
We can make a multi-bit equality circuit out of 1-bit equality circuits.
[Figure: block diagram of the multi-bit equality circuit]
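The 1-bit equality expression and its word-level extension can be modeled in a few lines of Python. This is a sketch, not HCL: `bit_eq` and `word_eq` are hypothetical names, and words are represented as lists of booleans.

```python
from itertools import product


def bit_eq(a, b):
    """1-bit equality: (a && b) || (!a && !b)."""
    return (a and b) or ((not a) and (not b))


def word_eq(a_bits, b_bits):
    """Word-level equality: AND together the per-bit equality outputs."""
    return all(bit_eq(a, b) for a, b in zip(a_bits, b_bits))


# Exhaustive check of the 1-bit circuit against Python's own equality.
for a, b in product([False, True], repeat=2):
    assert bit_eq(a, b) == (a == b)
```

The `all(...)` call plays the role of the AND gate that combines the outputs of the individual 1-bit circuits.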

Example: 1-bit multiplexer
It allows you to select one of two one-bit inputs (a data selector) and is described by:
bool out = (s && a) || (!s && b);
When s = 1, out = a; when s = 0, out = b.
[Figure: block diagram of the 1-bit multiplexer]

Example: a multi-bit multiplexer
We can make a multi-bit (word-level) mux out of 1-bit muxes.
HCL description of the mux:
int Out = [ s: A; 1: B; ];
The bracketed expression […] works like a select: the cases are checked in order. If s is true, the result is A. Otherwise, we check the next case; 1 is always true, so we select B.

Example: 4-word MUX
Here is a 4-word mux (4-way mux).
HCL description:
int Out4 = [ !s1 && !s0 : A; !s1 : B; !s0 : C; 1 : D; ];
s1s0  out
00    A
01    B
10    C
11    D
Question: How many control inputs would be needed for a 7-way mux?
Ans: 3 control inputs: s2s1s0
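The first-match behavior of the HCL case expression, and the count of control inputs for an n-way mux, can be sketched in Python as follows. The function names `mux4` and `control_inputs` are hypothetical.

```python
def mux4(s1, s0, A, B, C, D):
    """Model of the Out4 case expression: the first true condition wins."""
    cases = [
        (not s1 and not s0, A),
        (not s1, B),   # only reached when s0 is 1
        (not s0, C),   # only reached when s1 is 1
        (True, D),     # default case, like the '1' in HCL
    ]
    for condition, value in cases:
        if condition:
            return value


def control_inputs(n_ways):
    """Smallest k with 2**k >= n_ways."""
    return (n_ways - 1).bit_length()
```

Because the cases are tested in order, the later conditions can be simpler than a full decode of s1 and s0: `!s1` alone suffices for B since the s1s0 = 00 case has already been handled.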

Other Gates and Basic Building Blocks
XOR gate: Out = a ^ b = (!a && b) || (a && !b)

Function Complete
Functionally complete: any circuit that can be made from and, or, and not gates can also be made using just and and not gates, or just or and not gates.
Because: a || b = !(!a && !b); a && b = !(!a || !b)
The functionally complete sets: (and, or, not), (and, not), (or, not)
Can any single gate be used as a functionally complete set? Ans: Yes, the NAND gate and the NOR gate.
Question: Prove that {NAND} and {NOR} are each functionally complete.

Function Complete
Proof of functional completeness for {NAND}
Proof of functional completeness for {NOR}
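One way to see why each single gate suffices is to build NOT, AND, and OR from it and check every input combination. The Python sketch below does this by exhaustive enumeration; the particular constructions are a standard choice, not necessarily the ones shown on the original slide, and all function names are hypothetical.

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


# NOT, AND, OR built only from NAND.
def not_from_nand(a):
    return nand(a, a)

def and_from_nand(a, b):
    return not_from_nand(nand(a, b))

def or_from_nand(a, b):
    return nand(not_from_nand(a), not_from_nand(b))


# NOT, OR, AND built only from NOR.
def not_from_nor(a):
    return nor(a, a)

def or_from_nor(a, b):
    return not_from_nor(nor(a, b))

def and_from_nor(a, b):
    return nor(not_from_nor(a), not_from_nor(b))


for a, b in product([False, True], repeat=2):
    assert not_from_nand(a) == (not a) and not_from_nor(a) == (not a)
    assert and_from_nand(a, b) == (a and b) and and_from_nor(a, b) == (a and b)
    assert or_from_nand(a, b) == (a or b) and or_from_nor(a, b) == (a or b)
```

Since (and, or, not) is functionally complete and each of the three can be expressed with NAND alone or NOR alone, each single gate is functionally complete on its own.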

Adders
1-bit Half Adder: 2 inputs (A, B) and 2 outputs (S, C)
Note: 0+0=0, 0+1=1, 1+0=1, 1+1=2 (binary 10)
Truth Table
A  B  C  S
0  0  0  0
0  1  0  1
1  0  0  1
1  1  1  0
S = A ^ B
C = A && B

1-bit Full Adders
1-bit Full Adder: 3 inputs (A, B, Cin) and 2 outputs (S, Cout)
Truth Table
A  B  Cin  Cout  S
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1
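The full-adder truth table can be generated and checked with a small Python sketch. The sum and carry equations used here (S = A ^ B ^ Cin, Cout = majority of the inputs) are the standard ones and are an assumption relative to the slides, which do not state them; `full_adder` is a hypothetical name.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (s, cout) for three 1-bit inputs given as 0 or 1."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)  # 1 when at least two inputs are 1
    return s, cout


# Every row must satisfy a + b + cin == 2*cout + s.
for a, b, cin in product([0, 1], repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == 2 * cout + s
```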

1-bit Full Adder (Assignment 6)
[Figure: 1-bit full adder; inputs A, B]

Class Notes
Topics:
Minterm mi
Maxterm Mi
Standard sum-of-product
Standard product-of-sum
Karnaugh map
Minimum sum-of-product
Minimum product-of-sum
Note: See the class notes on the course web page under the Resources link.

Karnaugh Maps
Design flow: start from a truth table => Karnaugh map => Boolean expression
Karnaugh map: a useful tool for simplifying and manipulating switching functions of three or four variables. It is similar to a truth table, but in a different representation.

4-bit Full Adder
A 4-bit full adder takes as input two 4-bit numbers and a carry in, and produces 5 bits of output.
Input: 9 bits
Output: 5 bits
How to design it? Using a truth table? How big is it? (With 9 inputs, the table has 2^9 = 512 rows.)
A Karnaugh map is not efficient for this many variables.

4-bit Full Adder
A 4-bit full adder takes as input two 4-bit numbers and a carry in, and produces 5 bits of output.
It can be designed as a cascade of 1-bit full adders.
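The cascade idea can be sketched in Python as a ripple-carry adder: each 1-bit full adder passes its carry out to the next bit position. The 1-bit equations are the standard ones (an assumption relative to the slides), and the names `full_adder` and `add4` are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum bit, carry out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def add4(a, b, cin=0):
    """Add two 4-bit numbers and a carry in; returns a 5-bit result."""
    result, carry = 0, cin
    for i in range(4):  # bit 0 first, so the carry ripples upward
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)  # the final carry is the 5th output bit


# Check all 2**9 = 512 input combinations against ordinary addition.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            assert add4(a, b, cin) == a + b + cin
```

Four small 8-row truth tables replace one 512-row table, at the price of a carry that must propagate through all four stages.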

A little more about Logic Design
Propagation Delay
Real gates are made from transistors; voltages are used to represent the Boolean values true (1) and false (0).
A voltage greater than the true-threshold is true, and a voltage less than the false-threshold is false. Voltages between these two thresholds give undefined results.
When you change an input from high to low, it takes some time, called the propagation delay or gate delay, for the output voltage to reach its correct value.
Propagation delay determines how fast your CPU can run.

ALU (Arithmetic Logic Unit)
An ALU is a circuit that can produce one of several arithmetic (add, subtract, etc.) or logical (and, or, etc.) functions.
ALU design
[Figure: block diagram of this ALU]

Section 4.2.5 Memory and Clocking
So far, we have talked about combinational circuits.
Clocked sequential circuits: have memory and a clock input.
Flip-flops: S-R flip-flop, D flip-flop, J-K flip-flop, T flip-flop, and the edge-triggered D flip-flop, which is the building block of a multi-bit register.

Section 4.3.1 Organizing Processing Steps into Stages
SEQ: a “sequential” processor
Processing an instruction involves a number of operations. We organize them in a particular sequence of stages, attempting to make all instructions follow a uniform sequence.
Goal: design a processor that makes the best use of the hardware.

SEQ Hardware Structure
The computations required to implement all of the Y86 instructions can be organized into six basic stages: fetch, decode, execute, memory, write back, and PC update.
See Figure 4.22 for a better-quality diagram.

An Informal Description
Fetch: Read the instruction from memory using the address in the PC.
Decode: If possible, read values from the register file and set valA and valB. The registers are specified by rA and rB, except for push and pop, which use %esp in place of rB.
Execute: What happens depends on the icode. Some instructions (e.g., OP1, rmmovl, mrmovl) feed values into the ALU to obtain valE and possibly set the condition codes. Jump instructions check the condition codes to decide whether the next PC will be valC or valP.
Memory: May read from or write to memory.
Write back: May write up to two values to the register file. Pop updates both the stack pointer and the register popped into.
PC update: The PC is set to the address of the next instruction (valP, unless control is transferred).

Sample Y86 instruction sequence
Stage | OP1 rA, rB | rrmovl rA, rB | irmovl V, rB
Fetch | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valP ← PC+2 | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valP ← PC+2 | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valC ← M4[PC+2]; valP ← PC+6
Decode | valA ← R[rA]; valB ← R[rB] | valA ← R[rA] |
Execute | valE ← valB OP valA; Set CC | valE ← 0 + valA | valE ← 0 + valC
Memory | | |
Write back | R[rB] ← valE | R[rB] ← valE | R[rB] ← valE
PC update | PC ← valP | PC ← valP | PC ← valP
Figure 4.18: Computations in the sequential implementation of the Y86 instructions OP1, rrmovl, and irmovl. OP1 covers the integer and logical operations; rrmovl is the register-to-register move and irmovl is the immediate-to-register move.

Sample Y86 instruction sequence
0x000: 30f209000000 | irmovl $9, %edx
0x006: 30f315000000 | irmovl $21, %ebx
0x00c: 6123         | subl %edx, %ebx
0x00e: 30f480000000 | irmovl $128, %esp
0x014: 404364000000 | rmmovl %esp, 100(%ebx)
0x01a: a02f         | pushl %edx
0x01c: b00f         | popl %eax
0x01e: 7328000000   | je done
0x023: 8029000000   | call proc
0x028:              | done:
0x028: 00           | halt
0x029:              | proc:
0x029: 90           | ret
We will trace the processing of these instructions.

Practice Problem
Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.
Stage | Generic: irmovl V, rB | Specific: irmovl $128, %esp
Fetch | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valC ← M4[PC+2]; valP ← PC+6 |
Decode | |
Execute | valE ← 0 + valC |
Memory | |
Write back | R[rB] ← valE |
PC update | PC ← valP |

Practice Problem - Solution
Fill in the right-hand column of the following table to describe the processing of the irmovl instruction on line 4 of the object code in the previous slide.
Stage | Generic: irmovl V, rB | Specific: irmovl $128, %esp
Fetch | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valC ← M4[PC+2]; valP ← PC+6 | icode:ifun ← M1[0x00e] = 3:0; rA:rB ← M1[0x00f] = f:4; valC ← M4[0x010] = 128; valP ← 0x00e + 6 = 0x014
Decode | |
Execute | valE ← 0 + valC | valE ← 0 + 128 = 128
Memory | |
Write back | R[rB] ← valE | R[%esp] ← 128
PC update | PC ← valP | PC ← 0x014
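The stage-by-stage values for irmovl $128, %esp can be reproduced with a short Python sketch of the stage computations. It handles only irmovl, models memory as a dictionary from byte address to byte value and the register file as a dictionary from register number to value, and the name `seq_irmovl` is hypothetical.

```python
# Bytes of 'irmovl $128, %esp' placed at address 0x00e, as in the listing.
code = bytes.fromhex("30f480000000")
mem = {0x00E + i: byte for i, byte in enumerate(code)}


def seq_irmovl(mem, regs, pc):
    """Run one irmovl through the SEQ stages; returns (trace, new PC)."""
    # Fetch
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    valC = int.from_bytes(bytes(mem[pc + 2 + i] for i in range(4)), "little")
    valP = pc + 6
    # Decode: irmovl reads no registers
    # Execute
    valE = 0 + valC
    # Memory: irmovl does not access data memory
    # Write back
    regs[rB] = valE
    # PC update
    trace = {"icode": icode, "ifun": ifun, "rA": rA, "rB": rB,
             "valC": valC, "valP": valP, "valE": valE}
    return trace, valP


regs = {4: 0}  # register 4 is %esp
trace, new_pc = seq_irmovl(mem, regs, 0x00E)
print(trace, hex(new_pc))
```

The little-endian conversion is what turns the bytes 80 00 00 00 into the constant 128.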

Sample Y86 instruction sequence
Stage | rmmovl rA, D(rB) | mrmovl D(rB), rA
Fetch | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valC ← M4[PC+2]; valP ← PC+6 | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valC ← M4[PC+2]; valP ← PC+6
Decode | valA ← R[rA]; valB ← R[rB] | valB ← R[rB]
Execute | valE ← valB + valC | valE ← valB + valC
Memory | M4[valE] ← valA | valM ← M4[valE]
Write back | | R[rA] ← valM
PC update | PC ← valP | PC ← valP
Figure 4.19: Computations in the sequential implementation of the Y86 instructions rmmovl and mrmovl. These instructions read or write memory.

Sample Y86 instruction sequence
Stage | pushl rA | popl rA
Fetch | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valP ← PC+2 | icode:ifun ← M1[PC]; rA:rB ← M1[PC+1]; valP ← PC+2
Decode | valA ← R[rA]; valB ← R[%esp] | valA ← R[%esp]; valB ← R[%esp]
Execute | valE ← valB + (-4) | valE ← valB + 4
Memory | M4[valE] ← valA | valM ← M4[valA]
Write back | R[%esp] ← valE | R[%esp] ← valE; R[rA] ← valM
PC update | PC ← valP | PC ← valP
Figure 4.20: Computations in the sequential implementation of the Y86 instructions pushl and popl. These instructions push and pop the stack.
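The decode, execute, memory, and write-back steps for pushl and popl can be modeled in Python as follows. This sketch simplifies memory to a dictionary holding one 4-byte word per address, omits fetch and PC update, and uses hypothetical function names; the starting register values are the ones the sample program reaches before its pushl at 0x01a.

```python
def pushl(regs, mem, rA):
    valA, valB = regs[rA], regs["%esp"]  # Decode
    valE = valB + (-4)                   # Execute: decrement stack pointer
    mem[valE] = valA                     # Memory: store at the NEW top of stack
    regs["%esp"] = valE                  # Write back


def popl(regs, mem, rA):
    valA = valB = regs["%esp"]           # Decode: both operands read %esp
    valE = valB + 4                      # Execute: increment stack pointer
    valM = mem[valA]                     # Memory: read from the OLD top of stack
    regs["%esp"] = valE                  # Write back, stack pointer first...
    regs[rA] = valM                      # ...then the popped value


regs = {"%esp": 128, "%edx": 9, "%eax": 0}
mem = {}
pushl(regs, mem, "%edx")  # pushl %edx
after_push = dict(regs)
popl(regs, mem, "%eax")   # popl %eax
```

Reading %esp into both valA and valB is what lets popl use the old stack pointer as the memory address while the ALU computes the incremented one.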

Sample Y86 instruction sequence
Stage | jXX Dest | call Dest | ret
Fetch | icode:ifun ← M1[PC]; valC ← M4[PC+1]; valP ← PC+5 | icode:ifun ← M1[PC]; valC ← M4[PC+1]; valP ← PC+5 | icode:ifun ← M1[PC]; valP ← PC+1
Decode | | valB ← R[%esp] | valA ← R[%esp]; valB ← R[%esp]
Execute | Cnd ← Cond(CC, ifun) | valE ← valB + (-4) | valE ← valB + 4
Memory | | M4[valE] ← valP | valM ← M4[valA]
Write back | | R[%esp] ← valE | R[%esp] ← valE
PC update | PC ← Cnd ? valC : valP | PC ← valC | PC ← valM
Figure 4.21: Computations in the sequential implementation of the Y86 instructions jXX, call, and ret. These instructions cause control transfers.
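The Cond(CC, ifun) computation and the PC selection for jXX can be sketched in Python. The condition formulas below are the usual signed-comparison definitions and are an assumption here, since the slides do not spell them out; `cond` and `next_pc_jxx` are hypothetical names.

```python
def cond(zf, sf, of, ifun):
    """Evaluate a branch condition from the condition codes (booleans)."""
    less = sf ^ of  # signed 'less than' after a subtraction
    table = {
        0: True,                     # jmp: always taken
        1: less or zf,               # jle
        2: less,                     # jl
        3: zf,                       # je
        4: not zf,                   # jne
        5: not less,                 # jge
        6: (not less) and (not zf),  # jg
    }
    return bool(table[ifun])


def next_pc_jxx(cc, ifun, valC, valP):
    """PC update for jXX: PC <- Cnd ? valC : valP."""
    return valC if cond(*cc, ifun) else valP


# 'je done' at 0x01e in the sample program: the last instruction to set
# the condition codes was subl, which left 21 - 9 = 12 in %ebx, so ZF = 0.
cc = (False, False, False)  # (ZF, SF, OF)
print(hex(next_pc_jxx(cc, 3, 0x028, 0x023)))  # 0x23
```

With ZF clear the jump is not taken, so execution falls through to the call at 0x023.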