Recap: Performance Comparison


Recap: Performance Comparison
- Computer architecture: performance comparison
- Speedup = …; CPU execution time = … = … = …
- Amdahl's Law: Speedup(E) = …
- ISA design

Topics
- Introduction to Y86: Y86 instruction set architecture. Suggested reading: 4.1
- Logic Design & HCL: logic design, Hardware Control Language (HCL). Suggested reading: 4.2

Introduction to Y86

Goal
- Understand the instruction encodings
- Prepare for designing your own assembler, simulator, and computer

Y86 Processor State
- Program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp. Same 8 as with IA32; each is 32 bits.
- Condition codes: single-bit flags set by arithmetic or logical instructions. OF: overflow; ZF: zero; SF: negative.
- Program counter (PC): indicates address of instruction.
- Memory: byte-addressable storage array; words stored in little-endian byte order.

Y86 Instructions
Format (P259)
- 1 to 6 bytes of information read from memory
- Can determine instruction length from first byte
- Not as many instruction types, and simpler encoding than with IA32
- Each accesses and modifies some part(s) of the program state
- Errata: jXX and call are 5 bytes long.

Encoding Registers
- Each register has a 4-bit ID
- Same encoding as in IA32, but IA32 uses only a 3-bit ID
- Register ID F indicates "no register"; we will use this in our hardware design in multiple places

Register   ID
%eax       0
%ecx       1
%edx       2
%ebx       3
%esp       4
%ebp       5
%esi       6
%edi       7

Instruction Example: Addition Instruction
- Generic form: addl rA, rB
- Encoded representation: 6 0 rA rB
- Add value in register rA to that in register rB; store result in register rB
- Note that Y86 only allows addition to be applied to register data
- Set condition codes based on result
- e.g., addl %eax,%esi has encoding 60 06
- Two-byte encoding: the first byte indicates the instruction type; the second gives the source and destination registers
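
The two-byte layout above can be sketched in a few lines of Python. This is only an illustration: `REG_ID` and `encode_addl` are hypothetical names, and the register IDs are the ones listed on the "Encoding Registers" slide.

```python
# Register IDs from the "Encoding Registers" slide.
REG_ID = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
          "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}

def encode_addl(ra: str, rb: str) -> bytes:
    """Hypothetical helper: pack `addl rA, rB` into its two bytes."""
    first = (0x6 << 4) | 0x0                  # instruction code 6, function code 0
    second = (REG_ID[ra] << 4) | REG_ID[rb]   # rA in the high nibble, rB in the low
    return bytes([first, second])

print(encode_addl("%eax", "%esi").hex(" "))   # prints "60 06"
```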

Arithmetic and Logical Operations
- Refer to generically as "OPl"
- Encodings differ only by "function code": the low-order 4 bits of the first instruction byte (the high-order 4 bits are the instruction code)
- Set condition codes as side effect
- Notice: no multiply or divide operation

Operation               Instruction    Encoding
Add                     addl rA, rB    6 0 rA rB
Subtract (rA from rB)   subl rA, rB    6 1 rA rB
And                     andl rA, rB    6 2 rA rB
Exclusive-Or            xorl rA, rB    6 3 rA rB

Move Operations
- Like the IA32 movl instruction
- Simpler format for memory addresses
- Give different names to keep them distinct

Operation                Instruction         Encoding
Register --> Register    rrmovl rA, rB       2 0 rA rB
Immediate --> Register   irmovl V, rB        3 0 F rB V
Register --> Memory      rmmovl rA, D(rB)    4 0 rA rB D
Memory --> Register      mrmovl D(rB), rA    5 0 rA rB D

Move Instruction Examples

IA32                          Y86
movl $0xabcd, %edx            irmovl $0xabcd, %edx
movl %esp, %ebx               rrmovl %esp, %ebx
movl -12(%ebp),%ecx           mrmovl -12(%ebp),%ecx
movl %esp,0x12345(%edx)       rmmovl %esp,0x12345(%edx)
movl $0xabcd, (%eax)          (none)
movl %eax, 12(%eax,%edx)      (none)
movl (%ebp,%eax,4),%ecx       (none)

Move Instruction Examples

Y86                           Encoding
irmovl $0xabcd, %edx          30 F2 cd ab 00 00
rrmovl %esp, %ebx             20 43
mrmovl -12(%ebp),%ecx         50 15 f4 ff ff ff
rmmovl %esp,0x12345(%edx)     40 42 45 23 01 00
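
As a rough check of the encodings above, the following Python sketch packs the four move instructions. The helper `encode_move` and its parameter names are hypothetical; the instruction codes, the register IDs, the use of F for "no register", and the little-endian constant come from the slides.

```python
import struct

REG_ID = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
          "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
NO_REG = 0xF  # register ID F means "no register"
ICODE = {"rrmovl": 0x2, "irmovl": 0x3, "rmmovl": 0x4, "mrmovl": 0x5}

def encode_move(mnemonic, ra=None, rb=None, const=None):
    """Hypothetical helper: encode one Y86 move instruction."""
    a = NO_REG if ra is None else REG_ID[ra]
    b = NO_REG if rb is None else REG_ID[rb]
    encoded = bytes([ICODE[mnemonic] << 4, (a << 4) | b])
    if const is not None:
        # 4-byte immediate or displacement, little-endian, two's complement
        encoded += struct.pack("<I", const & 0xFFFFFFFF)
    return encoded

print(encode_move("irmovl", rb="%edx", const=0xabcd).hex(" "))
print(encode_move("mrmovl", ra="%ecx", rb="%ebp", const=-12).hex(" "))
```

The negative displacement -12 is masked to 32 bits before packing, which yields the bytes f4 ff ff ff shown in the table.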

Conditional Move Operations
- Refer to generically as "cmovXX"
- Share the same "instruction code" with rrmovl
- Encodings differ only by "function code"
- Based on values of condition codes
- Same as IA32 counterparts

Instruction       Encoding
cmovle rA, rB     2 1 rA rB
cmovl rA, rB      2 2 rA rB
cmove rA, rB      2 3 rA rB
cmovne rA, rB     2 4 rA rB
cmovge rA, rB     2 5 rA rB
cmovg rA, rB      2 6 rA rB

Jump Instructions
- Refer to generically as "jXX"
- Encodings differ only by "function code"
- Based on values of condition codes
- Same as IA32 counterparts
- Encode full destination address, unlike the PC-relative addressing seen in IA32

Instruction   Encoding    Meaning
jmp Dest      7 0 Dest    Jump unconditionally
jle Dest      7 1 Dest    Jump when less or equal
jl Dest       7 2 Dest    Jump when less
je Dest       7 3 Dest    Jump when equal
jne Dest      7 4 Dest    Jump when not equal
jge Dest      7 5 Dest    Jump when greater or equal
jg Dest       7 6 Dest    Jump when greater
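
The slide only says that jumps are "based on values of condition codes". A minimal Python sketch of how each function code could test OF, ZF, and SF is given below; the formulas follow the usual IA32 signed-comparison conventions, and `cond_holds` and `flags_after_subl` are hypothetical names introduced for illustration.

```python
MASK = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def flags_after_subl(ra_val: int, rb_val: int):
    """Flags after `subl rA, rB` (computes rB - rA); inputs are signed 32-bit values."""
    exact = rb_val - ra_val
    result = to_signed(exact)
    return result == 0, result < 0, result != exact   # ZF, SF, OF

def cond_holds(ifun: int, zf: bool, sf: bool, of: bool) -> bool:
    """Decide whether jXX with function code ifun is taken."""
    less = sf != of               # signed "less than" after a subtraction
    return {
        0: True,                  # jmp
        1: less or zf,            # jle
        2: less,                  # jl
        3: zf,                    # je
        4: not zf,                # jne
        5: not less,              # jge
        6: not less and not zf,   # jg
    }[ifun]

zf, sf, of = flags_after_subl(5, 3)      # 3 - 5 is negative
print(cond_holds(2, zf, sf, of))         # jl is taken
```

Under the same assumption, the identical test would decide whether a cmovXX instruction performs its move.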

Y86 Program Stack
- Region of memory holding program data
- Used in Y86 (and IA32) for supporting procedure calls
- Stack top indicated by %esp, the address of the top stack element
- Stack grows toward lower addresses: the top element is at the lowest address in the stack
(Figure: stack bottom at the highest address, addresses increasing upward, %esp pointing at the stack top.)

Stack Operations
pushl rA (encoding: a 0 rA F)
- Decrement %esp by 4
- Store word from rA to memory at %esp
- Like IA32 (pushl %esp saves the old %esp)
popl rA (encoding: b 0 rA F)
- Read word from memory at %esp
- Save in rA
- Increment %esp by 4
- Like IA32 (popl %esp behaves like movl (%esp),%esp)
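
The push and pop steps can be sketched as a small Python model. The `Machine` class is hypothetical, and memory is simplified to a dictionary from byte address to a whole 32-bit word. The ordering inside each method is chosen so that `pushl %esp` stores the old %esp and `popl %esp` ends with the loaded value, as noted above.

```python
MASK = 0xFFFFFFFF
REG_NAMES = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]

class Machine:
    """Hypothetical toy model: registers plus a word-granular memory."""

    def __init__(self, stack_top: int = 0x100):
        self.regs = {name: 0 for name in REG_NAMES}
        self.regs["%esp"] = stack_top
        self.mem = {}  # byte address -> 32-bit word (a simplification)

    def pushl(self, ra: str) -> None:
        value = self.regs[ra]                          # read rA first (old %esp if rA is %esp)
        self.regs["%esp"] = (self.regs["%esp"] - 4) & MASK
        self.mem[self.regs["%esp"]] = value

    def popl(self, ra: str) -> None:
        value = self.mem[self.regs["%esp"]]
        self.regs["%esp"] = (self.regs["%esp"] + 4) & MASK
        self.regs[ra] = value                          # written last, so popl %esp keeps the loaded word

m = Machine()
m.regs["%eax"] = 42
m.pushl("%eax")
m.popl("%ebx")
print(hex(m.regs["%esp"]), m.regs["%ebx"])
```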

Subroutine Call and Return
call Dest (encoding: 8 0 Dest)
- Push address of next instruction onto stack
- Start executing instructions at Dest
- Like IA32
ret (encoding: 9 0)
- Pop value from stack
- Use as address for next instruction
- Like IA32

Miscellaneous Instructions
nop (encoding: 1 0)
- Don't do anything
halt (encoding: 0 0)
- Stop executing instructions
- IA32 has a comparable instruction, but it can't be executed in user mode
- We will use it to stop the simulator


Y86 Programs
C code (Figure 4.6):

    int Sum(int *Start, int Count)
    {
        int sum = 0;
        while (Count) {
            sum += *Start;
            Start++;
            Count--;
        }
        return sum;
    }

Y86 Assembly
IA32 code for int Sum(int *Start, int Count):

    Sum:
        pushl %ebp
        movl %esp,%ebp
        movl 8(%ebp),%ecx       # ecx = Start
        movl 12(%ebp),%edx      # edx = Count
        xorl %eax,%eax          # sum = 0
        testl %edx,%edx
        je .L34

Y86 code for int Sum(int *Start, int Count):

    Sum:
        pushl %ebp
        rrmovl %esp,%ebp
        mrmovl 8(%ebp),%ecx
        mrmovl 12(%ebp),%edx
        xorl %eax,%eax
        andl %edx,%edx          # Set cc
        je End

Y86 Instructions
IA32 code (continued):

    .L35:
        addl (%ecx),%eax        # add *Start to sum
        addl $4,%ecx            # Start++
        decl %edx               # Count--
        jnz .L35                # Stop when 0
    .L34:
        movl %ebp,%esp
        popl %ebp
        ret

Y86 code (continued):

    Loop:
        mrmovl (%ecx),%esi
        addl %esi,%eax
        irmovl $4,%ebx
        addl %ebx,%ecx
        irmovl $-1,%ebx
        addl %ebx,%edx
        jne Loop
    End:
        rrmovl %ebp,%esp
        popl %ebp
        ret
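
To see why the Y86 loop needs more instructions than the IA32 one, the sketch below mirrors the Y86 listing one instruction per statement in Python. It is an illustration only: `y86_sum` is a hypothetical name, the stack frame setup is skipped, and memory is modelled as a dictionary from byte address to word. Constants such as 4 and -1 must first be loaded into a register because Y86 addl works only on register data.

```python
MASK = 0xFFFFFFFF

def y86_sum(memory: dict, start: int, count: int) -> int:
    """Follow the Y86 Sum listing step by step (arguments passed directly)."""
    ecx = start                       # mrmovl 8(%ebp),%ecx
    edx = count                       # mrmovl 12(%ebp),%edx
    eax = 0                           # xorl %eax,%eax
    zf = (edx & edx) == 0             # andl %edx,%edx sets the condition codes
    while not zf:                     # je End / jne Loop both test ZF
        esi = memory[ecx]             # mrmovl (%ecx),%esi
        eax = (eax + esi) & MASK      # addl %esi,%eax
        ebx = 4                       # irmovl $4,%ebx
        ecx = (ecx + ebx) & MASK      # addl %ebx,%ecx
        ebx = -1 & MASK               # irmovl $-1,%ebx
        edx = (edx + ebx) & MASK      # addl %ebx,%edx
        zf = edx == 0
    return eax

array = {0x100: 1, 0x104: 2, 0x108: 3}
print(y86_sum(array, 0x100, 3))
```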

Logic Design & HCL

Logic Design
- Digital circuits: What is a digital circuit? What is a CPU built on?
- Hardware Control Language (HCL): a simple, functional language for describing our CPU implementation, with C-like syntax

Category of Circuit: Analog Circuit
- Uses the full range of the signal
- Mostly amplifiers
- Hard to model and to design automatically
- Built from transistors and capacitors
- We will not discuss it here

Category of Circuit: Digital Circuit
- Has only two values, 0 and 1
- Easy to model and design
- Analyzed with truth tables and other tools
- Built from gates
- The voltage that represents 1 differs between kinds of circuit; e.g., TTL circuits use 5 volts for 1

Digital Signals
- Use voltage thresholds to extract discrete values from a continuous signal
- Simplest version: 1-bit signal, either high range (1) or low range (0), with a guard range between them
- Not strongly affected by noise or low-quality circuit elements
- Can make circuits simple, small, and fast
(Figure: voltage plotted against time.)

Overview of Logic Design: Fundamental Hardware Requirements
- Communication: how to get values from one place to another
- Computation
- Storage (memory)
- Clock signal: a periodic signal

Overview of Logic Design: Bits are Our Friends
- Everything expressed in terms of values 0 and 1
- Communication: low or high voltage on wire
- Computation: compute Boolean functions
- Storage
- Clock signal

Category of Digital Circuit: Combinational Circuit
- Has no memory, so the circuit cannot hold state
- The same input always produces the same output
- Needs no clock signal
- Typical application: ALU

Category of Digital Circuit: Sequential Circuit
- Sequential circuit = combinational circuit + memory and clock signal
- Has state: two identical inputs may not generate the same output
- Uses the clock signal to control when the circuit advances
- Typical application: CPU

Computing with Logic Gates
- And: out = a && b
- Or: out = a || b
- Not: out = !a
- Outputs are Boolean functions of inputs
- Not an assignment operation; the expression just gives the circuit a name
- Respond continuously to changes in inputs, with some small delay
(Figure: voltage against time for a, b, and a && b, showing the rising delay and falling delay.)

Combinational Circuits

Combinational Circuits: Acyclic Network of Logic Gates
- Continuously responds to changes on inputs
- Outputs become (after some delay) Boolean functions of inputs
(Figure: inputs feeding an acyclic network that produces the outputs.)

Bit Equality
- HCL expression: bool eq = (a&&b)||(!a&&!b)
- Generate 1 if a and b are equal
Hardware Control Language (HCL)
- Very simple hardware description language
- Boolean operations have syntax similar to C logical operations
- We'll use it to describe control logic for processors

Word Equality
- 32-bit word size
- Bit-level representation: 32 bit-equal circuits compare a0..a31 with b0..b31 and produce eq0..eq31, which combine into Eq
- Word-level representation: a single "=" block with inputs A and B and output Eq
- HCL representation: bool Eq = (A == B)
- Equality operation; generates a Boolean value
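
A short Python sketch of the two equality circuits, assuming the per-bit results are combined with AND. The names `bit_eq` and `word_eq` are hypothetical; the Boolean expression is the HCL one from the Bit Equality slide.

```python
from itertools import product

def bit_eq(a: bool, b: bool) -> bool:
    """Same structure as the HCL expression (a&&b)||(!a&&!b)."""
    return (a and b) or (not a and not b)

def word_eq(a: int, b: int, width: int = 32) -> bool:
    """Compare two words bit by bit and AND the per-bit results together."""
    return all(bit_eq(bool((a >> i) & 1), bool((b >> i) & 1))
               for i in range(width))

# Truth table of the bit-equal circuit
for a, b in product([False, True], repeat=2):
    print(int(a), int(b), int(bit_eq(a, b)))
```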

Bit-Level Multiplexor
- HCL expression: bool out = (s&&a)||(!s&&b)
- Control signal s; data signals a and b
- Output a when s=1, b when s=0
- Called a MUX
- Usage: select one signal from several

Word Multiplexor
- Bit-level representation: 32 bit-level multiplexors share the control signal s; inputs a0..a31 and b0..b31, outputs out0..out31
- Word-level representation: a MUX block with inputs A and B, control s, and output Out
- HCL representation: ?

Word Multiplexor
- Select input word A or B depending on control signal s
- HCL representation:

    int Out = [
        s : A;
        1 : B;    # default case
    ];

- Case expression: a series of test : value pairs (the tests need not be mutually exclusive)
- Output value for first successful test
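
The first-match rule of an HCL case expression can be imitated in Python as follows. This is a sketch rather than an HCL implementation: `hcl_case` and `word_mux` are hypothetical names, and unlike hardware, Python evaluates every test and value before the function is called.

```python
def hcl_case(*pairs):
    """Return the value paired with the first test that is true."""
    for test, value in pairs:
        if test:
            return value
    raise ValueError("no case matched; add a final `1 : value` default")

def word_mux(s: int, a: int, b: int) -> int:
    """int Out = [ s : A; 1 : B; ];"""
    return hcl_case((s, a), (1, b))

print(hex(word_mux(1, 0xAAAA, 0xBBBB)))   # selects A
print(hex(word_mux(0, 0xAAAA, 0xBBBB)))   # falls through to the default, B
```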

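HCL Word-Level Examples: Minimum of 3 Words
- Find minimum of three input words
- HCL representation (MIN3 block with inputs A, B, C and output Min3):

    int Min3 = [
        A <= B && A <= C : A;
        B <= A && B <= C : B;
        1 : C;
    ];

- HCL case expression; the final case guarantees a match
- The comparisons use <= so that ties are handled: with strict <, inputs such as A = B < C would fall through to C
- Note: this Min3 can apparently be simplified; because the tests are tried in order, the condition for selecting B can be just B < C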

HCL Word-Level Examples: 4-Way Multiplexor
- Select one of 4 inputs based on two control bits
- HCL representation (MUX4 block with inputs D0, D1, D2, D3, controls s1 and s0, and output Out4):

    int Out4 = [
        !s1&&!s0 : D0;
        !s1      : D1;
        !s0      : D2;
        1        : D3;
    ];

- HCL case expression
- Simplify tests by assuming sequential matching
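
To illustrate why sequential matching lets the tests be shortened, the Python sketch below compares the simplified tests with fully spelled-out, mutually exclusive ones. Both function names are hypothetical.

```python
from itertools import product

def mux4(s1, s0, d0, d1, d2, d3):
    """Simplified tests: each one relies on the earlier tests having failed."""
    if not s1 and not s0:
        return d0
    if not s1:          # s0 must be 1 here
        return d1
    if not s0:          # s1 must be 1 here
        return d2
    return d3

def mux4_explicit(s1, s0, d0, d1, d2, d3):
    """Mutually exclusive tests, for comparison."""
    if not s1 and not s0:
        return d0
    if not s1 and s0:
        return d1
    if s1 and not s0:
        return d2
    return d3

for s1, s0 in product([0, 1], repeat=2):
    print(s1, s0, mux4(s1, s0, "D0", "D1", "D2", "D3"))
```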

Arithmetic Logic Unit
- Combinational logic: continuously responding to inputs
- Control signal selects function computed, corresponding to the 4 arithmetic/logical operations in Y86:

    Control   Output
    0         X + Y
    1         X - Y
    2         X & Y
    3         X ^ Y

- Also computes values for condition codes (OF, ZF, SF)
- We will use it as a basic component for our CPU
(Figure: four copies of the ALU, one per control value, each with inputs A and B carrying Y and X, and outputs OF, ZF, SF.)
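
A minimal Python sketch of such an ALU on 32-bit words is shown below. The function name `alu` and the overflow formulas are assumptions (standard two's-complement overflow detection, with OF cleared for the logical operations); the control values 0 to 3 are the ones listed above.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000

def alu(fn: int, x: int, y: int):
    """Return (result, OF, ZF, SF) for 32-bit patterns x and y.

    fn: 0 -> X + Y, 1 -> X - Y, 2 -> X & Y, 3 -> X ^ Y
    """
    if fn == 0:
        full = x + y
    elif fn == 1:
        full = x - y
    elif fn == 2:
        full = x & y
    elif fn == 3:
        full = x ^ y
    else:
        raise ValueError("unknown ALU function")
    result = full & MASK
    zf = result == 0
    sf = bool(result & SIGN)
    if fn == 0:
        # overflow: operands share a sign, result has the other sign
        of = bool(~(x ^ y) & (x ^ result) & SIGN)
    elif fn == 1:
        # overflow: operands differ in sign, result sign differs from x
        of = bool((x ^ y) & (x ^ result) & SIGN)
    else:
        of = False   # assumption: logical operations clear OF
    return result, of, zf, sf

print(alu(0, 0x7FFFFFFF, 1))   # largest positive value plus 1 overflows
```

For `subl rA, rB`, x would carry the value of rB and y the value of rA.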

Storage

Storage: Clocked Registers
- e.g., Program Counter (PC), Condition Codes (CC)
- Hold single words or bits
- Loaded as clock rises
- Not "program registers"
Clock
- A periodic signal that determines when new values are to be loaded into the devices
(Figure: a register with input I, output O, and a clock input; a clock waveform with its rising and falling edges marked.)

Register Operation
- Stores data bits
- For most of the time, acts as a barrier between input and output
- As clock rises, loads input
(Figure: before the rising clock edge, state = x, input = y, output = x; after it, state = y, output = y.)

State Machine Example: Accumulator Circuit
- Combinational logic (an ALU and a MUX) feeds a clocked register; inputs In and Load, output Out
- Load or accumulate on each cycle
- In the timing diagram, Load is high in the cycles where X0 and X3 arrive, so Out restarts from the input there; in the other cycles the input is added to the stored value

    Cycle   1     2       3       4     5       6
    In      X0    X1      X2      X3    X4      X5
    Out     X0    X0+X1   X0…X2   X3    X3+X4   X3…X5
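
The accumulator can be modelled in Python as a clocked register plus a line of combinational logic. This is a sketch under simplifying assumptions: `ClockedRegister` and `run_accumulator` are hypothetical names, delays are ignored, and each loop iteration stands for one clock cycle.

```python
class ClockedRegister:
    """Holds one value; the input only reaches the output on a rising edge."""

    def __init__(self, initial: int = 0):
        self.state = initial
        self.input = initial

    @property
    def output(self) -> int:
        return self.state

    def rising_edge(self) -> None:
        self.state = self.input

def run_accumulator(inputs, loads):
    """Return the register output after each clock cycle."""
    reg = ClockedRegister()
    outputs = []
    for value, load in zip(inputs, loads):
        # Combinational logic: the MUX picks the raw input when Load is 1,
        # otherwise the ALU sum of the stored value and the input.
        reg.input = value if load else reg.output + value
        reg.rising_edge()
        outputs.append(reg.output)
    return outputs

print(run_accumulator([1, 2, 3, 10, 20, 30], [1, 0, 0, 1, 0, 0]))
```

With the inputs standing in for X0 to X5, the printed list follows the Out row of the timing table: load, accumulate twice, load again, accumulate twice.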

Storage: Random-Access Memories
- e.g., register file, memory
- Hold multiple words
- Address input specifies which word to read or write
- Possibly multiple read or write ports
- Read word when address input changes
- Write word as clock rises

Register File
- Holds values of program registers: %eax, %esp, etc.
- Register identifier serves as address; ID "F" implies no read or write performed
- Multiple ports: can read and/or write multiple words in one cycle; each port has separate address and data input/output
(Figure: read ports A (srcA, valA) and B (srcB, valB); write port W (dstW, valW); clock input.)

Register File Timing
Reading
- Like combinational logic
- Output data generated based on input address (after some delay)
Writing
- Like register
- Update only as clock rises
(Figure: with srcA = srcB = 2, both read ports output x, the value of register 2; with dstW = 2 and valW = y, register 2 changes from x to y at the rising clock edge.)
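
The difference between reading and writing can be sketched as follows. The `RegisterFile` class and its method names are hypothetical, delays are ignored, and returning 0 when the ID is F is an assumption (the slides only say that no read is performed).

```python
class RegisterFile:
    """Toy model: combinational reads, clocked write."""

    NO_REG = 0xF

    def __init__(self):
        self.regs = [0] * 8
        self.dst_w = self.NO_REG   # write port address
        self.val_w = 0             # write port data

    def read(self, src: int) -> int:
        """Read port: output follows the address immediately."""
        return 0 if src == self.NO_REG else self.regs[src]

    def set_write(self, dst_w: int, val_w: int) -> None:
        """Drive the write port; nothing is stored yet."""
        self.dst_w, self.val_w = dst_w, val_w

    def rising_edge(self) -> None:
        """Only the clock edge commits the pending write."""
        if self.dst_w != self.NO_REG:
            self.regs[self.dst_w] = self.val_w

rf = RegisterFile()
rf.regs[2] = 7              # x
rf.set_write(2, 9)          # y is waiting on the write port
before = rf.read(2)
rf.rising_edge()
print(before, rf.read(2))   # old value, then new value
```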

Memory
- Holds program data and instructions
- Ports: a single address input; a data input for writing, and a data output for reading
- Error signal means invalid address
(Figure: data memory with address, data in, read, write, and clock inputs, and data out and error outputs.)
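
A small Python sketch of a memory with an error signal is given below. The `DataMemory` class is hypothetical; it stores 4-byte words in little-endian order, as stated on the processor state slide, and treats any access that falls outside the array as an invalid address. Clocking is left out, so `write_word` stands for a write taking effect at the clock edge.

```python
MASK = 0xFFFFFFFF

class DataMemory:
    """Toy byte-addressable memory with 4-byte word access."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def _invalid(self, addr: int) -> bool:
        return addr < 0 or addr + 4 > len(self.data)

    def read_word(self, addr: int):
        """Return (data out, error)."""
        if self._invalid(addr):
            return 0, True
        return int.from_bytes(self.data[addr:addr + 4], "little"), False

    def write_word(self, addr: int, value: int) -> bool:
        """Store a word; return the error signal."""
        if self._invalid(addr):
            return True
        self.data[addr:addr + 4] = (value & MASK).to_bytes(4, "little")
        return False

mem = DataMemory(16)
mem.write_word(4, 0x12345)
print(mem.data[4:8].hex(" "))   # bytes appear least significant first
print(mem.read_word(14))        # crosses the end of memory: error
```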

Summary: Storage
Clocked registers
- Hold single words (e.g., PC, CC)
- Loaded as clock rises
Random-access memories
- Hold multiple words (e.g., register file, memory)
- Possibly multiple read or write ports
- Read word when address input changes
- Write word as clock rises