TDC 311 The Microarchitecture

Introduction
As mentioned earlier in the class, one Java statement generates multiple machine code statements.
Then one machine code statement generates one or more microcode statements.

Introduction (continued)
For example, the Java statement

  counter += 1;

might generate the following machine code:

  load reg1,counter
  inc reg1
  store reg1,counter
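To make the expansion concrete, here is a minimal Python sketch that executes those three machine instructions. The tuple format `(opcode, register, address)`, the helper name `run`, and the modeling of memory as a dict keyed by the symbolic name `counter` are all hypothetical simplifications, not the course's actual instruction set.

```python
def run(program, memory):
    """Execute (opcode, register, address) tuples; return the final register file."""
    regs = {}
    for op, reg, addr in program:
        if op == "load":        # reg <- memory[addr]
            regs[reg] = memory[addr]
        elif op == "inc":       # reg <- reg + 1
            regs[reg] += 1
        elif op == "store":     # memory[addr] <- reg
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# counter += 1; expands into three machine instructions.
program = [
    ("load", "reg1", "counter"),
    ("inc", "reg1", None),
    ("store", "reg1", "counter"),
]

memory = {"counter": 41}   # hypothetical starting value
run(program, memory)
print(memory["counter"])   # 42
```

One Java statement became three machine instructions; the slides that follow show how each of those in turn becomes one or more microinstructions.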

[Figure: the data path. Registers, with their bus numbers: PC (1), MAR (2), MDR (3), Reg A (4), Reg B (5), Reg C (6), ..., Reg BB (31). A bus decoder and B bus decoder (assume 31 registers; 0 means no register). C bus (32 individual signals). Address, A, B, and C buses. ALU with ALU control. Memory (machine code instructions; read and write signals). Control store and MIR.]

ALU control codes:
  0  Add
  1  Multiply
  2  Inc A
  3  Inc B
  4  Dec A
  5  Dec B
  6  AND
  7  OR
  8  Pass A
  9  TwosC A
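The figure notes that the A and B buses are driven through decoders while the C bus has 32 individual signals. A minimal Python sketch of why: only one register may drive a bus at a time, so a 5-bit register number plus a decoder is enough, whereas separate C-bus signals let one ALU result be loaded into several registers in the same cycle. Treating bit n of each signal set as register n is an assumption of this sketch, and `decode` and `loaded_registers` are hypothetical helper names.

```python
def decode(field):
    """5-to-32 decoder: turn a 5-bit register number into one-hot enable lines.

    Bit n of the result enables register n; field 0 means "no register",
    so it enables nothing.
    """
    if not 0 <= field <= 31:
        raise ValueError("register field must fit in 5 bits")
    return 0 if field == 0 else 1 << field

def loaded_registers(c_signals):
    """List the register numbers whose C-bus load signal is set (bit n = register n)."""
    return [n for n in range(1, 32) if (c_signals >> n) & 1]

a_enable = decode(6)                  # A=00110: only Reg C drives the A bus
c_signals = decode(2) | decode(4)     # load MAR and Reg A from the same result
print(bin(a_enable), loaded_registers(c_signals))   # 0b1000000 [2, 4]
```

The next slide's microinstructions write the C field as a single 5-bit register number for readability, which corresponds to setting just one of these 32 signals.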

Clock Subcycles
Subcycle 1 – set up signals to drive the data path
Subcycle 2 – drive the A and B buses
Subcycle 3 – ALU operation
Subcycle 4 – drive the C bus
[Figure labels: cycle starts here; registers loaded from C bus; next microinstruction loaded from control store]
Requires 2 complete clock cycles to perform a microinstruction.

Simple Example
Java statement: counter += 1;
What might the microinstructions look like? (Assume the address of counter is currently in Reg C.)

load reg1,counter
  Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (Pass A)
  Rd=1; all else 0 (counter should now be sitting in MDR)
  Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A, i.e. reg1); ALU=1000 (Pass A)
inc reg1
  Rd=0; Wr=0; A=00100 (Reg A); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
store reg1,counter (assume the address of counter is still in MAR)
  Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (Pass A)
  Rd=0; Wr=1; all else 0
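To check that this microprogram really increments counter, here is a small Python simulation of the data path. It is a sketch under stated assumptions: registers are numbered as in the figure, the ALU codes follow its table, `mi`, `step`, and `COUNTER_PLUS_ONE` are hypothetical names, and memory is assumed to complete a read or write at the end of any cycle in which Rd or Wr is asserted (the real design needs the signal held for two cycles, which the microprogram does anyway). The address 0x100 and the starting value 41 are made up.

```python
# ALU function codes from the data path figure (value of the ALU control field).
ADD, MUL, INC_A, INC_B, DEC_A, DEC_B, AND, OR, PASS_A, TWOS_A = range(10)

# Register numbers used by the A, B and C fields (0 means "no register").
PC, MAR, MDR, REG_A, REG_B, REG_C = 1, 2, 3, 4, 5, 6

def alu(op, a, b):
    """Compute the ALU result for one function code, truncated to 32 bits."""
    results = {
        ADD: a + b, MUL: a * b, INC_A: a + 1, INC_B: b + 1,
        DEC_A: a - 1, DEC_B: b - 1, AND: a & b, OR: a | b,
        PASS_A: a, TWOS_A: -a,
    }
    return results[op] & 0xFFFFFFFF

def mi(Rd=0, Wr=0, A=0, B=0, C=0, ALU=ADD):
    """Build a microinstruction; every field defaults to 0, as in 'all else 0'."""
    return {"Rd": Rd, "Wr": Wr, "A": A, "B": B, "C": C, "ALU": ALU}

def step(regs, memory, m):
    """Run one microinstruction: drive A/B, run the ALU, load C, then memory."""
    a = regs[m["A"]] if m["A"] else 0
    b = regs[m["B"]] if m["B"] else 0
    if m["C"]:
        regs[m["C"]] = alu(m["ALU"], a, b)
    # Simplification: memory finishes the operation at the end of this cycle.
    if m["Rd"]:
        regs[MDR] = memory[regs[MAR]]
    if m["Wr"]:
        memory[regs[MAR]] = regs[MDR]

COUNTER_PLUS_ONE = [
    mi(Rd=1, A=REG_C, C=MAR, ALU=PASS_A),   # load reg1,counter: MAR <- Reg C
    mi(Rd=1),                               # read continues; MDR <- counter
    mi(A=MDR, C=REG_A, ALU=PASS_A),         # Reg A <- MDR
    mi(A=REG_A, C=REG_A, ALU=INC_A),        # inc reg1: Reg A <- Reg A + 1
    mi(Wr=1, A=REG_A, C=MDR, ALU=PASS_A),   # store reg1,counter: MDR <- Reg A
    mi(Wr=1),                               # write continues; MAR still holds the address
]

regs = [0] * 32
memory = {0x100: 41}   # hypothetical address and starting value of counter
regs[REG_C] = 0x100    # the slide assumes Reg C holds counter's address
for m in COUNTER_PLUS_ONE:
    step(regs, memory, m)
print(memory[0x100])   # 42
```

The order of operations inside `step` mirrors the clock subcycles: drive the A and B buses, let the ALU compute, then load the result from the C bus.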

Design Issues
Speed vs. cost:
- reduce the number of clock cycles needed to execute an instruction
- simplify the organization so that the clock cycle can be shorter
- overlap the execution of instructions
Any way to improve upon the microarchitecture?
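All three strategies attack the same product: execution time equals instruction count times cycles per instruction times clock period. A tiny Python sketch with hypothetical numbers (the helper name `execution_time_ns` is also made up):

```python
def execution_time_ns(instructions, cycles_per_instruction, cycle_ns):
    """CPU time = instruction count x cycles per instruction x clock period."""
    return instructions * cycles_per_instruction * cycle_ns

# Hypothetical baseline: 1,000,000 instructions, 4 cycles each, 2 ns clock.
base = execution_time_ns(1_000_000, 4, 2.0)
fewer_cycles = execution_time_ns(1_000_000, 3, 2.0)   # fewer cycles per instruction
faster_clock = execution_time_ns(1_000_000, 4, 1.5)   # simpler organization, shorter cycle
print(base, fewer_cycles, faster_clock)   # 8000000.0 6000000.0 6000000.0
```

Overlapping execution (pipelining) works on the middle term: it lowers the average number of cycles per instruction without making any single instruction faster.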

Design Issues
Create independent units that fetch and process the instructions? (Double up on other things? Everything?)
Prefetch one, two, or three instructions?
Perform pipelining?

Pipeline Example
[Figure: pipeline example (not reproduced)]
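Since the slide's figure is not reproduced, here is a hypothetical stand-in: a Python sketch that builds a space-time diagram for a five-stage pipeline. The stage names IF, ID, EX, MEM, WB are the classic textbook five stages, an assumption rather than the slide's own design, and `timing_diagram` is a made-up helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]   # assumed 5-stage pipeline

def timing_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction showing which stage it occupies in each cycle."""
    total = len(stages) + n_instructions - 1   # fill time, then one completion per cycle
    rows = []
    for i in range(n_instructions):
        row = ["."] * total
        for s, name in enumerate(stages):
            row[i + s] = name                  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows

for i, row in enumerate(timing_diagram(4)):
    print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

With 4 instructions and 5 stages the diagram spans 8 cycles (5 to fill the pipeline, then one completion per cycle), versus 20 if each instruction ran through all 5 stages alone.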

Pipeline Problems
Pipe stall: a subsequent instruction must wait before it can proceed.
What causes stalls?
- waiting for memory
- waiting for the result of a previous instruction
- determining the next instruction
What if you encounter a branch instruction?
It also takes time to fill the pipeline.
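A rough cycle count shows why fill time and stalls matter. This Python sketch assumes every stall simply adds whole cycles and that each branch costs a fixed penalty; the 2-cycle penalty, the branch count, and the helper name `pipeline_cycles` are hypothetical.

```python
def pipeline_cycles(n_instructions, n_stages, stall_cycles=0):
    """Cycles to finish n instructions: fill the pipe, then one per cycle, plus stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles

# Hypothetical numbers: 100 instructions, 5-stage pipeline.
ideal = pipeline_cycles(100, 5)                   # no stalls
with_branches = pipeline_cycles(100, 5, 10 * 2)   # 10 branches, 2-cycle penalty each
unpipelined = 100 * 5                             # each instruction runs alone
print(ideal, with_branches, unpipelined)          # 104 124 500
```

The fill cost (the first 5 cycles) is paid once, so it matters less for long instruction streams, while stall cycles grow with every branch or dependence.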

Design Issues
Perform branch prediction?
Perform out-of-order execution. For example:
- add two register contents and store in register
- increment counter by 1
- start a write operation
changed to:
- add two register contents and store in register
- start a write operation
- increment counter by 1
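That reordering is only legal because the moved instructions do not depend on each other. A Python sketch of the check, assuming (hypothetically) that the add writes R3 from R1 and R2 and that the write stores R3 to memory; `Instr` and `independent` are illustrative names, and a real out-of-order core tracks these dependences in hardware rather than like this.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    text: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def independent(first, second):
    """True if the two instructions can be swapped: no RAW, WAR, or WAW conflict."""
    raw = first.writes & second.reads    # second reads what first writes
    war = first.reads & second.writes    # second overwrites what first reads
    waw = first.writes & second.writes   # both write the same location
    return not (raw or war or waw)

add = Instr("add R1, R2 -> R3", reads={"R1", "R2"}, writes={"R3"})
inc = Instr("increment counter", reads={"counter"}, writes={"counter"})
write = Instr("start write of R3", reads={"R3"}, writes={"mem"})

print(independent(inc, write))   # True: the write may start before the increment
print(independent(add, write))   # False: the write needs the sum in R3
```

Starting the slow memory write earlier lets it overlap with the increment, which is the point of the reordering.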

Design Issues
Perform speculative execution?
Reuse registers that are no longer used?
Have a large register set and keep all current values in registers?
Use cache memory?

Cache Memory
Memory references usually cluster near one location (the locality principle). Program code tends to sit in one region (if the programmer is good) and data in another, but grouped together.
Bring the most recently referenced values into a high-speed cache.
How does the CPU know whether something is in the cache or not?

Direct-mapped Cache
The simplest form of cache memory.
Let's consider a cache that has 2048 entries, each entry holding 32 bytes (not bits) of data.
2048 entries times 32 bytes per entry equals 65,536 bytes, or 64 KB.
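The size arithmetic, and the address-field widths it implies for a 32-bit address, can be checked in a few lines of Python (the constant names are illustrative):

```python
ENTRIES = 2048      # cache lines
LINE_BYTES = 32     # bytes of data per line
ADDRESS_BITS = 32

capacity = ENTRIES * LINE_BYTES                     # 65536 bytes = 64 KB
# bit_length() - 1 equals log2 here because both sizes are powers of two.
line_bits = ENTRIES.bit_length() - 1                # log2(2048) = 11
offset_bits = LINE_BYTES.bit_length() - 1           # log2(32) = 5 (3 word + 2 byte)
tag_bits = ADDRESS_BITS - line_bits - offset_bits   # 32 - 11 - 5 = 16

print(capacity // 1024, line_bits, offset_bits, tag_bits)   # 64 11 5 16
```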

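[Figure: direct-mapped cache with 2048 entries, each holding a valid bit (V), a 16-bit tag, and 32 bytes of data. Addresses that use entry 0: 0-31, 65536-65567, 131072-131103, ...; entry 1: 32-63, 65568-65599, 131104-131135, ...; entry 2: 64-95, 65600-65631, 131136-131167, ...]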

Cache Address
When a program generates a 32-bit address, it has the following form:
Tag – 16 bits | Line – 11 bits | Word – 3 bits | Byte – 2 bits
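Extracting the four fields takes only shifts and masks. A Python sketch (the function name `split_address` is hypothetical), using the bit positions implied by the 16/11/3/2 split, with BYTE in the low-order bits:

```python
def split_address(addr):
    """Split a 32-bit address into (tag, line, word, byte) = 16 / 11 / 3 / 2 bits."""
    byte = addr & 0b11            # bits 0-1: byte within the 4-byte word
    word = (addr >> 2) & 0b111    # bits 2-4: word within the 32-byte line
    line = (addr >> 5) & 0x7FF    # bits 5-15: one of 2048 cache entries
    tag = (addr >> 16) & 0xFFFF   # bits 16-31: compared with the stored tag
    return tag, line, word, byte

print(split_address(37))           # (0, 1, 1, 1)
print(split_address(65536 + 37))   # (1, 1, 1, 1)
```

Addresses 37 and 65573 select the same cache line and differ only in their tag, which is why the TAG comparison is needed.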

Cache Hit
To see if a data item is in the cache, use the 11-bit LINE portion of the address to select one of the 2048 cache entries.
Then compare the 16-bit TAG of the address with the 16-bit TAG value stored in that cache entry.
If there is a match (and the entry's valid bit is set), the data is there.

Cache Hit (continued)
If the data is there, use the 3-bit WORD portion of the address to select which of the 8 words (32 bytes) in the cache line to fetch.
If necessary, the 2-bit BYTE portion selects one of the four bytes in that word.

Cache Memory
Note that since this cache holds only 64 KB, it can hold data for addresses 0 – 65535. But the same entries may instead hold data for addresses 65536 – 131071 (or any other 64 KB region). That is why you must compare the TAG fields to see if there is a match.

Cache Miss
If the TAG fields do not match (or the entry is not valid), there is a cache miss.
The CPU goes to main memory, fetches the 32-byte block that contains the requested address, and stores it in the cache entry selected by the LINE field, wiping out the old block in that entry.
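Putting hit and miss handling together: a minimal Python sketch of a read through this 2048-entry, 32-byte-line direct-mapped cache. The class name `DirectMappedCache`, the byte-at-a-time `read_byte` interface, and the use of a `bytearray` as main memory are assumptions for illustration; the WORD and BYTE fields are combined into one 5-bit offset within the line.

```python
LINES, LINE_BYTES = 2048, 32

class DirectMappedCache:
    """Direct-mapped cache: 2048 lines x 32 bytes, a 16-bit tag and valid bit per line."""

    def __init__(self, memory):
        self.memory = memory            # backing store: a bytes-like object
        self.valid = [False] * LINES
        self.tags = [0] * LINES
        self.data = [bytes(LINE_BYTES)] * LINES
        self.hits = self.misses = 0

    def read_byte(self, addr):
        line = (addr >> 5) & 0x7FF      # 11-bit LINE field picks the entry
        tag = (addr >> 16) & 0xFFFF     # 16-bit TAG field
        offset = addr & 0x1F            # WORD (3 bits) + BYTE (2 bits)
        if self.valid[line] and self.tags[line] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole 32-byte block containing addr, replacing the old block.
            self.misses += 1
            start = addr & ~0x1F
            self.data[line] = bytes(self.memory[start:start + LINE_BYTES])
            self.tags[line] = tag
            self.valid[line] = True
        return self.data[line][offset]

memory = bytearray(range(256)) * 1024   # hypothetical 256 KB of test data
cache = DirectMappedCache(memory)
cache.read_byte(37)           # miss: loads bytes 32-63 into line 1
cache.read_byte(40)           # hit: same block, same tag
cache.read_byte(65536 + 37)   # miss: same line, different tag, evicts bytes 32-63
print(cache.hits, cache.misses)   # 1 2
```

The third read shows the cost of direct mapping: two addresses 64 KB apart compete for the same line, so alternating between them would miss every time.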

Cache Example
Consider that the CPU wants to fetch data from location (or in hex)
Tag =
Line =
Word = 001
Byte = 00