Computer Architecture I - Class 9

Presentation transcript:

Today’s class: Microarchitecture
Saturday, October 20, 2007, Computer Architecture I - Class 9

Data Path
The data path is the part of the CPU containing the ALU, its inputs, and its outputs. The figure on this slide is an example data path for the IJVM microarchitecture developed in your text. IJVM = Integer Java Virtual Machine (a subset of the Java Virtual Machine that only does integer operations). The registers identified here are accessible only at the microarchitecture level (by the microprogram).

ALU Signals and Functions
The ALU has 6 control lines:
F0 and F1 determine the ALU operation.
ENA and ENB enable the two inputs.
INVA inverts the left input.
INC forces a carry into the low-order bit, effectively adding 1 to the result.
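The effect of the six control lines can be sketched in a few lines of Python. The slide does not list which F0/F1 combination selects which function, so the mapping below (00 = A AND B, 01 = A OR B, 10 = NOT B, 11 = A + B) is an assumption taken from the usual textbook description of the Mic-1 ALU. The function name `alu` and the 32-bit mask are illustrative.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data path


def alu(a, b, f0, f1, ena, enb, inva, inc):
    """Return the 32-bit ALU output for one setting of the six control lines."""
    a = a if ena else 0      # ENA gates the left input
    b = b if enb else 0      # ENB gates the right input
    if inva:
        a = ~a & MASK32      # INVA inverts the (gated) left input

    # Assumed function select: 00 AND, 01 OR, 10 NOT B, 11 ADD
    if (f0, f1) == (0, 0):
        out = a & b
    elif (f0, f1) == (0, 1):
        out = a | b
    elif (f0, f1) == (1, 0):
        out = ~b
    else:
        out = a + b + inc    # INC is the carry into the low-order bit
    return out & MASK32


# A + B, A + 1, and B - A (invert A, add B, carry in 1)
print(alu(5, 3, 1, 1, 1, 1, 0, 0))
print(alu(7, 0, 1, 1, 1, 0, 0, 1))
print(hex(alu(5, 3, 1, 1, 1, 1, 1, 1)))
```

The last call shows why INVA and INC are useful together: inverting A and forcing a carry in yields B - A in two's complement without a separate subtract function.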

Data Path Timing
Subcycles of the data path cycle:
1. The control signals are set up (Δw).
2. The registers are loaded onto the B bus (Δx).
3. The ALU and shifter operate (Δy).
4. The results propagate along the C bus back to the registers (Δz).
MPC = MicroProgram Counter
MIR = MicroInstruction Register

Memory
The example machine has two different ways to communicate with memory:
A 32-bit word-addressable memory port, controlled by two registers: MAR (Memory Address Register) and MDR (Memory Data Register).
An 8-bit byte-addressable memory port, controlled by one register, PC, which reads 1 byte into the low-order 8 bits of MBR.

Memory Operation

Microinstruction Format
Addr: contains the address of a potential next microinstruction.
JAM: determines how the next microinstruction is selected.
ALU: ALU and shifter functions.
C: selects which registers are written from the C bus.
Mem: memory functions.
B: selects the B bus source, encoded as shown.

Sequencer
Which control signals should be enabled on each cycle? This is determined by the sequencer, which must produce two kinds of information each cycle:
The state of every control signal in the system.
The address of the microinstruction that is to be executed next.

Detailed block diagram of the complete microarchitecture of our example machine, which is being called the Mic-1. It has two parts: the data path on the left and the control section on the right.
Control store: holds the microprogram.
High Bit: computes (JAMZ AND Z) OR (JAMN AND N).
MPC takes on either NEXT_ADDRESS (the Addr field) or NEXT_ADDRESS with the high-order bit set to 1.
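The High Bit computation and the resulting MPC value can be written out as a short sketch. The 9-bit width of NEXT_ADDRESS is an assumption (it is not stated on the slide), and the function name `next_mpc` is illustrative. Only the JAMZ and JAMN bits named on the slide are modelled.

```python
ADDR_BITS = 9  # assumed width of NEXT_ADDRESS / MPC


def next_mpc(next_address, jamz, jamn, z, n):
    """Compute the next MPC from the Addr field, the JAM bits, and the ALU flags.

    All flag arguments are 0 or 1. Z and N are the ALU's zero and negative outputs.
    """
    high_bit = (jamz & z) | (jamn & n)                     # (JAMZ AND Z) OR (JAMN AND N)
    return next_address | (high_bit << (ADDR_BITS - 1))   # force the high-order bit to 1 if set


# Conditional branch on zero: taken when Z = 1, not taken when Z = 0
print(hex(next_mpc(0x092, jamz=1, jamn=0, z=1, n=0)))
print(hex(next_mpc(0x092, jamz=1, jamn=0, z=0, n=0)))
```

Because the branch only sets one address bit, the two possible successors of a conditional microinstruction sit exactly 2^(ADDR_BITS-1) words apart in the control store under this assumption.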

Stacks for Local Variable Storage
Procedures need somewhere to store local variables that won’t be interfered with by recursive calls.
Suppose procedure A has three local variables. As shown in (a), register LV points to the start of the local variable frame on the stack, and register SP points to the top of the stack.
Suppose procedure B has four local variables, and A calls B. The result is as shown in (b).
Suppose procedure C has two local variables, and B calls C. The result is shown in (c).
Now suppose C, and then B, return, and then A calls procedure D, which has five local variables. The situation is as shown in (d).

Stacks for Operand Storage
We need someplace to hold operands during an arithmetic operation.
Suppose that procedure A has to compute a1=a2+a3 before calling B. It can do this by pushing a2 onto the stack (a), pushing a3 onto the stack (b), doing the actual computation by popping the stack twice, forming the sum, and pushing the result back on the stack (c), and finally popping the stack and assigning the result to the variable (d).
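The four steps (a) through (d) map directly onto push and pop operations. The following is a minimal sketch using a Python list as the operand stack; the function name `compute_a1` is illustrative.

```python
def compute_a1(a2, a3):
    """Evaluate a1 = a2 + a3 using only an operand stack."""
    stack = []
    stack.append(a2)              # (a) push a2
    stack.append(a3)              # (b) push a3
    right = stack.pop()           # (c) pop twice, form the sum,
    left = stack.pop()            #     and push the result back
    stack.append(left + right)
    a1 = stack.pop()              # (d) pop the result and assign it to a1
    return a1, stack              # the stack is empty again at this point


a1, leftover = compute_a1(2, 3)
print(a1, leftover)
```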

The IJVM Memory Model
The address space can be thought of as 4 GB or 1 GW (gigawords), each word being 4 bytes.
Constant pool: cannot be written by an IJVM program. It consists of constants, strings, and pointers to other areas of memory that can be referenced.
Method area: contains the program.
CPP, SP, and LV are all word registers, and their contents are word addresses.
PC is a byte register, and its content is a byte address.

The IJVM Instruction Set
Operands byte, const, and varnum are 1 byte. Operands disp, index, and offset are 2 bytes.

Calling a Procedure
1. Push onto the stack a reference (pointer) to the object to be called (since our architecture is not object-oriented, this will not be used but is retained for consistency).
2. Push the procedure’s parameters onto the stack.
3. Execute the INVOKEVIRTUAL instruction.

The Stack and INVOKEVIRTUAL
The first four bytes of the called procedure contain special information:
The first two bytes are a 16-bit integer indicating the number of parameters for the procedure.
The second two bytes are a 16-bit integer indicating the size of the local variable area for the procedure (this is needed to allocate space on the stack for them, as in the right-hand picture).
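Reading that four-byte header can be sketched as follows. The big-endian byte order is an assumption (the slide does not state it), and the function name `parse_method_header` and the sample bytes are hypothetical.

```python
def parse_method_header(method_area, method_addr):
    """Decode the 4-byte header at the start of a called procedure.

    Returns (number of parameters, size of local variable area,
    byte address of the first instruction after the header).
    """
    header = method_area[method_addr:method_addr + 4]
    if len(header) < 4:
        raise ValueError("method header is truncated")
    num_params = int.from_bytes(header[0:2], "big")       # assumed big-endian
    local_area_size = int.from_bytes(header[2:4], "big")  # assumed big-endian
    return num_params, local_area_size, method_addr + 4


# Hypothetical method: 3 parameters, 2 local variable words, then one opcode byte
method_area = bytes([0x00, 0x03, 0x00, 0x02, 0x15])
print(parse_method_header(method_area, 0))
```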

The Stack and IRETURN

Compiling Java to IJVM
(a) is a simple Java code fragment.
(b) is the code produced by compiling (a) into IJVM.
(c) is the hex code produced by assembling (b) (we assume i is variable 1, j is variable 2, and k is variable 3).

Microprogram for the Mic-1
Consider how the IADD instruction will work. It adds the top two stack elements and puts the result back on the stack. The TOS is already available, so we need to fetch the next-to-top element of the stack.
iadd1 initiates the fetch of the next-to-top stack element. Note that the memory fetch does not complete in this clock cycle, so TOS remains unaffected.
iadd2 executes in the second clock cycle and stores the TOS in H while the read operation from the first step is in progress.
iadd3 actually does the addition. MDR contains the operand fetched from memory (originally the next-to-top item on the stack).
Note that this is not the complete microprogram for the Mic-1. The rest can be found in your text.
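The three IADD steps can be traced with a small register-level simulation. This is a sketch under assumptions: the register transfers in the comments (including the SP decrement and the final write back to memory) follow the usual textbook form of the microprogram rather than anything printed on the slide, memory is modelled as a list of words, and the names `Mic1State` and `run_iadd` are illustrative.

```python
MASK32 = 0xFFFFFFFF


class Mic1State:
    """Just enough Mic-1 state to trace IADD."""

    def __init__(self, memory, sp):
        self.memory = memory      # list of 32-bit words, indexed by word address
        self.sp = sp              # word address of the top of the stack
        self.tos = memory[sp]     # TOS register caches the top-of-stack word
        self.h = 0
        self.mar = 0
        self.mdr = 0


def run_iadd(state):
    # iadd1 (assumed form: MAR = SP = SP - 1; rd): start reading the next-to-top word
    state.sp -= 1
    state.mar = state.sp
    read_in_flight = state.mar    # the data is not in MDR yet

    # iadd2 (assumed form: H = TOS): runs while the read completes
    state.h = state.tos
    state.mdr = state.memory[read_in_flight]   # read data arrives at the end of this cycle

    # iadd3 (assumed form: MDR = TOS = MDR + H; wr): add and write the new top of stack
    total = (state.mdr + state.h) & MASK32
    state.mdr = total
    state.tos = total
    state.memory[state.mar] = state.mdr
    return state


memory = [0, 7, 5]                # stack holds 7 (next-to-top) and 5 (top)
state = run_iadd(Mic1State(memory, sp=2))
print(state.sp, state.tos, memory)
```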

In-Class Exercise
1. Give two different IJVM translations for the Java statement i=k+n+5; (refer to the instruction set on page 250 of your text).
2. How long does a 2.5 GHz Mic-1 take to execute the Java statement i=j+k;? Give your answer in nanoseconds. (Refer to the code on page 254 of your text, and to the microprogram code on page 262.)
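For the timing question, the only arithmetic involved is converting a count of clock cycles into time at a given clock rate. The helper below is a sketch of that conversion; the cycle count in the usage line is a made-up placeholder, not the answer to the exercise, which depends on counting microinstructions in the text.

```python
def cycle_time_ns(clock_ghz):
    """Length of one clock cycle in nanoseconds (1 GHz corresponds to 1 ns)."""
    return 1.0 / clock_ghz


def execution_time_ns(cycles, clock_ghz):
    """Time to execute a given number of clock cycles, in nanoseconds."""
    return cycles / clock_ghz


# Hypothetical example: 10 cycles on a 2.5 GHz clock
print(cycle_time_ns(2.5), execution_time_ns(10, 2.5))
```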

Increasing Speed of Execution
1. Reduce the number of clock cycles needed to execute an instruction.
2. Simplify the organization so that the clock cycle can be shorter.
3. Overlap the execution of instructions.

Merging the Interpreter Loop with the Microcode
One way of reducing the number of clock cycles is to find some dead ALU time and take advantage of it. This can often be accomplished by merging the interpreter loop with the microcode.
The top of the slide shows the original code for the POP instruction. Note the dead cycle while memory is being read. Use it to fetch the next instruction, as shown at the bottom.

A Three Bus Architecture
A second approach to shortening the execution path is to have two full input buses to the ALU.
Consider the ILOAD instruction. In the top box, our original design, LV is copied into H in iload1; the only reason for doing so is so that it can be added to MBRU in iload2. The original design has no way to add two arbitrary registers, which is what we need to do here.
With a three-bus design, we can skip this first step and thus save a cycle (see the lower box).

An Instruction Fetch Unit
The ALU is used nearly every cycle for a variety of operations having to do with fetching the instruction and assembling the fields within the instruction, in addition to the real “work” of the instruction.
We need to free the ALU from some of these tasks.
Create an independent unit to fetch and process the instructions.

The Mic-2 Datapath
The Mic-2 incorporates the three previous design decisions to speed things up. It has an instruction fetch unit, it has two full buses going into the ALU, and it merges the main loop into each instruction.

Microprogram for the Mic-2

In-Class Exercise
Reconsider the Java statement i=j+k; How long does it take to execute this statement on a 2.5 GHz Mic-2? (Refer to the microprogram for the Mic-2 on page 282 of your text.)

Pipelining – The Mic-3
The data path now takes 3 cycles to operate:
1. Load A and B.
2. Perform the operation and load C.
3. Write the results back to the registers.
Each of these pieces is called a microstep.
There are three latches (registers) on the buses: A, B, and C. The latches are written on every cycle.
We can speed up the clock because the maximum delay is now shorter.
We can use all parts of the data path during every cycle.

The Mic-2 Code for SWAP
We will now look at this code on the Mic-3.

The Implementation of SWAP on the Mic-3
Note that even though the Mic-3 uses more cycles, each cycle on the Mic-3 is shorter than on the Mic-2, so we do get an overall improvement. (The cycle time on the Mic-3 is 1/3 that of the Mic-2.)

Pipeline

Cache Memory
A system with three levels of cache.

Cache Model
Main memory is divided up into fixed-size blocks called cache lines.
A cache line consists of 4 to 64 consecutive bytes.
Lines are numbered consecutively starting at 0.

Direct-Mapped Caches
This cache contains 2048 entries; each entry (row) holds one cache line from main memory. Let’s use a 32-byte cache line size, so this cache can hold 64 KB.
The fields in the cache are:
Valid = one bit indicating whether there is any valid data in this entry.
Tag = a unique 16-bit value identifying the corresponding line of memory from which the data came.
Data = a copy of the data in memory; this holds one cache line of 32 bytes.
A 32-bit virtual address is broken up into four fields:
TAG = corresponds to the Tag bits stored in the cache entry.
LINE = indicates which cache entry holds the corresponding data, if they are present.
WORD = which word within a line is referenced.
BYTE = only used if a single byte is requested; indicates which byte within the word is needed. For a cache supplying only 32-bit words, this field will always be 0.
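The address split follows from the sizes above: a 16-bit TAG, 11 bits of LINE for 2048 entries, and 5 bits for a 32-byte line, which divide into 3 bits of WORD (eight 4-byte words) and 2 bits of BYTE. The sketch below extracts the fields and checks for a hit; the function names and the representation of the cache as a list of (valid, tag) pairs are illustrative assumptions.

```python
TAG_BITS, LINE_BITS, WORD_BITS, BYTE_BITS = 16, 11, 3, 2   # 16 + 11 + 3 + 2 = 32


def split_address(addr):
    """Split a 32-bit address into (TAG, LINE, WORD, BYTE)."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    line = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << LINE_BITS) - 1)
    tag = (addr >> (BYTE_BITS + WORD_BITS + LINE_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, line, word, byte


def is_hit(cache, addr):
    """A hit needs a valid entry at index LINE whose stored tag equals TAG."""
    tag, line, _word, _byte = split_address(addr)
    valid, stored_tag = cache[line]
    return bool(valid and stored_tag == tag)


cache = [(False, 0)] * 2048                 # all entries start invalid
cache[5] = (True, 0xABCD)                   # pretend line 5 was loaded earlier
addr = (0xABCD << 16) | (5 << 5) | (3 << 2) | 1
print(split_address(addr), is_hit(cache, addr))
```

Two addresses that differ only in TAG map to the same entry, which is the collision problem that motivates set-associative caches.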

Set-Associative Caches
Allow multiple entries for each line number, to minimize collisions. This is a 4-way set-associative cache.
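A small model shows how several entries per line number avoid collisions. This is a sketch, not the design on the slide: it tracks only tags, and the least-recently-used replacement policy is an assumption, since the slide does not say how a victim is chosen. The class name and parameters are illustrative.

```python
class SetAssociativeCache:
    """Tag-only model of an n-way set-associative cache with assumed LRU replacement."""

    def __init__(self, num_sets, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of tags, ordered from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, line_number):
        """Access a memory line by number; return True on a hit, False on a miss."""
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.remove(tag)
            entries.append(tag)       # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.pop(0)            # evict the least recently used tag (assumed policy)
        entries.append(tag)
        return False


cache = SetAssociativeCache(num_sets=2, ways=4)
# Lines 0, 2, 4, and 6 all map to set 0, yet all four fit at once.
print([cache.access(n) for n in (0, 2, 4, 6)], cache.access(0))
```

In a direct-mapped cache with the same number of sets, each of those four lines would have evicted the previous one.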