Recap: Performance Comparison Computer Architecture Performance Comparison Speedup= CPU execution time = …=……=…. Amdahl‘s Law Speedup(E)= ISA Design
Topics Introducing to Y86 Logical Design & HCL Y86 instruction set architecture Suggested Reading: 4.1 Logical Design & HCL Logic design Hardware Control Language HCL Suggested Reading: 4.2
Introducing to Y86
Understanding the instruction encodings Preparing for designing your Goal Understanding the instruction encodings Preparing for designing your assembler simulator computer
Y86 Processor State Program Registers Condition Codes Program Counter Same 8 as with IA32. Each 32 bits Condition Codes Single-bit flags set by arithmetic or logical instructions OF: Overflow ZF: Zero SF: Negative Program Counter Indicates address of instruction Memory Byte-addressable storage array Words stored in little-endian byte order Program registers %eax %esi %ecx %edi %edx %esp %ebx %ebp Condition codes OF ZF SF PC Memory
Y86 Instructions Format 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state
Errata: JXX and call are 5 bytes long. Y86 Instructions Format (P259) 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Errata: JXX and call are 5 bytes long.
Errata: JXX and call are 5 bytes long. Format (P259) 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Errata: JXX and call are 5 bytes long.
Each register has 4-bit ID 1 Encoding Registers Each register has 4-bit ID Same encoding as in IA32, but IA32 using only 3-bit ID Register ID F indicates “no register” Will use this in our hardware design in multiple places %eax %ecx %edx %ebx %esi %edi %esp %ebp 1 2 3 6 7 4 5
Instruction Example Addition Instruction Add value in register rA to that in register rB Store result in register rB Note that Y86 only allows addition to be applied to register data Set condition codes based on result Generic Form Encoded Representation addl rA, rB 6 rA rB
Instruction Example Addition Instruction e.g., addl %eax,%esi Encoding: 60 06 Two-byte encoding First indicates instruction type Second gives source and destination registers Generic Form Encoded Representation addl rA, rB 6 rA rB
Arithmetic and Logical Operations Instruction Code Function Code Add Refer to generically as “OPl” Encodings differ only by “function code” Low-order 4 bites in first instruction word Set condition codes as side effect Notice: no multiply or divide operation addl rA, rB 6 rA rB Subtract (rA from rB) subl rA, rB 6 1 rA rB And andl rA, rB 6 2 rA rB Exclusive-Or xorl rA, rB 6 3 rA rB
Move Operations Register --> Register rrmovl rA, rB irmovl V, rB 2 rA rB irmovl V, rB 3 F rB V Immediate --> Register Register --> Memory rmmovl rA, D(rB) 4 rA rB D Memory --> Register mrmovl D(rB), rA 5 rA rB D Like the IA32 movl instruction Simpler format for memory addresses Give different names to keep them distinct
Move Instruction Examples IA32 Y86 irmovl $0xabcd, %edx movl $0xabcd, %edx rrmovl %esp, %ebx movl %esp, %ebx mrmovl -12(%ebp),%ecx movl -12(%ebp),%ecx rmmovl %esp,0x12345c(%edx) movl %esp,0x12345(%edx) — movl $0xabcd, (%eax) movl %eax, 12(%eax,%edx) movl (%ebp,%eax,4),%ecx
Move Instruction Examples Y86 Encoding irmovl $0xabcd, %edx 30 F2 cd ab 00 00 rrmovl %esp, %ebx 20 43 mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff rmmovl %esp,0x12345(%edx) 40 42 45 23 01 00
Conditional Move Operations Refer to generically as “cmovXX” Share the same “instruction code” with rrmovl Encodings differ only by “function code” cmovle rA, rB 2 1 rA rB cmovl rA, rB 2 rA rB cmove rA, rB 2 3 rA rB cmovne rA, rB 2 4 rA rB cmovge rA, rB 2 5 rA rB cmovg rA, rB 2 6 rA rB Based on values of condition codes Same as IA32 counterparts
Jump Instructions Refer to generically as “jXX” jmp Dest 7 Jump Unconditionally Dest Refer to generically as “jXX” Encodings differ only by “function code” Based on values of condition codes Same as IA32 counterparts Encode full destination address Unlike PC-relative addressing seen in IA32 jle Dest 7 1 Jump When Less or Equal Dest jl Dest 7 2 Jump When Less Dest je Dest 7 3 Jump When Equal Dest jne Dest 7 4 Jump When Not Equal Dest jge Dest 7 5 Jump When Greater or Equal Dest jg Dest 7 6 Jump When Greater Dest
Y86 Program Stack Region of memory holding program data Stack Bottom Used in Y86 (and IA32) for supporting procedure calls Stack top indicated by %esp Address of top stack element Stack grows toward lower addresses Top element is at lowest address in the stack • Increasing Addresses %esp Stack Top
Stack Operations Decrement %esp by 4 pushl rA a rA F Decrement %esp by 4 Store word from rA to memory at %esp Like IA32 (pushl %esp save old %esp) Read word from memory at %esp Save in rA Increment %esp by 4 Like IA32 (popl %esp movl (%esp) %esp) popl rA b rA F
Subroutine Call and Return call Dest 8 Dest Push address of next instruction onto stack Start executing instructions at Dest Like IA32 Pop value from stack Use as address for next instruction ret 9
Miscellaneous Instructions nop 1 Don’t do anything Stop executing instructions IA32 has comparable instruction, but can’t execute it in user mode We will use it to stop the simulator halt
Errata: JXX and call are 5 bytes long. Y86 Instructions Format (P259) 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Errata: JXX and call are 5 bytes long.
Errata: JXX and call are 5 bytes long. Format (P259) 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Errata: JXX and call are 5 bytes long.
Y86 Programs int Sum(int *Start, int Count)-----Figure 4.6 { int sum = 0; while (Count) { sum += *Start; Start++; Count--; } return sum;
Y86 Assembly IA32 code int Sum(int *Start, int Count) 1 Sum: 2 pushl %ebp 3 movl %esp,%ebp 4 movl 8(%ebp),%ecx ecx = Start 5 movl 12(%ebp),%edx edx = Count 6 xorl %eax,%eax sum = 0 7 testl %edx,%edx je .L34 Y86 code int Sum(int *Start, int Count) 1 Sum: 2 pushl %ebp 3 rrmovl %esp,%ebp 4 mrmovl 8(%ebp),%ecx 5 mrmovl 12(%ebp),%edx 6 xorl %eax,%eax 7 andl %edx,%edx Set cc 8 je End
Y86 Instructions IA32 code 9 .L35: 10 addl (%ecx),%eax add *Start to sum 11 addl $4,%ecx Start++ 12 decl %edx Count-- 13 jnz .L35 Stop when 0 14 .L34: 15 movl %ebp,%esp 16 popl %ebp ret Y86 code 9 Loop: 10 mrmovl (%ecx),%esi 11 addl %esi,%eax 12 irmovl $4,%ebx 13 addl %ebx,%ecx 14 irmovl $-1,%ebx 15 addl %ebx,%edx 16 jne Loop 17 End: 18 rrmovl %ebp,%esp 19 popl %ebp 20 ret
Logical Design & HCL
Hardware Control Language (HCL) Logic Design Digital circuit What is digital circuit? Know what a CPU will base on? Hardware Control Language (HCL) A simple and functional language to describe our CPU implementation Syntax like C
Category of Circuit Analog Circuit Use all the range of Signal Most part is amplifier Hard to model and automatic design Use transistor and capacitance as basis We will not discuss it here
Category of Circuit Digital Circuit Has only two values, 0 and 1 Easy to model and design Use true table and other tools to analyze Use gate as the basis The voltage of 1 is differ in different kind circuit. E.g. TTL circuit using 5 voltage as 1
Digital Signals 1 Voltage Time 1 Use voltage thresholds to extract discrete values from continuous signal Simplest version: 1-bit signal Either high range (1) or low range (0) With guard range between them Not strongly affected by noise or low quality circuit elements Can make circuits simple, small, and fast
Overview of Logic Design Fundamental Hardware Requirements Communication How to get values from one place to another Computation Storage (Memory) Clock Signal A periodic signal Clock
Overview of Logic Design Bits are Our Friends Everything expressed in terms of values 0 and 1 Communication Low or high voltage on wire Computation Compute Boolean functions Storage Clock Signal
Category of Digital Circuit Combinational Circuit Without memory. So the circuit can’t have state. Any same input will get the same output at any time. Needn’t clock signal Typical application: ALU
Category of Digital Circuit Sequential Circuit = Combinational circuit + memory and clock signal Have state. Two same inputs may not generate the same output. Use clock signal to control the run of circuit. Typical application: CPU
Computing with Logic Gates And Or Not a a out out a out b b out = a && b out = a || b out = !a Outputs are Boolean functions of inputs Not an assignment operation, just give the circuit a name Respond continuously to changes in inputs With some, small delay Rising Delay Falling Delay a && b b Voltage a Time
Combinational Circuits
Combinational Circuits Acyclic Network Inputs Outputs Acyclic Network of Logic Gates Continuously responds to changes on inputs Outputs become (after some delay) Boolean functions of inputs
Hardware Control Language (HCL) Bit Equality Bit equal a b eq HCL Expression bool eq = (a&&b)||(!a&&!b) Generate 1 if a and b are equal Hardware Control Language (HCL) Very simple hardware description language Boolean operations have syntax similar to C logical operations We’ll use it to describe control logic for processors
Word Equality Word-Level Representation B = eq A HCL Representation Bit equal a31 eq31 b30 a30 eq30 b1 a1 eq1 b0 a0 eq0 eq = B A eq HCL Representation bool Eq = (A == B) 32-bit word size HCL representation Equality operation Generates Boolean value
Bit-Level Multiplexor HCL Expression Bit MUX b s a out bool out = (s&&a)||(!s&&b) Control signal s Data signals a and b Output a when s=1, b when s=0 Its name: MUX Usage: Select one signal from a couple of signals
Word Multiplexor Word-Level Representation HCL Representation ? s s b31 s a31 out31 b30 a30 out30 b0 a0 out0 Word-Level Representation s B A Out MUX HCL Representation ?
Word Multiplexor HCL Representation Select input word A or B depending on control signal s HCL representation Case expression Series of test : value pairs (Don’t require mutually) Output value for first successful test int Out = [ s : A; 1 : B; ]; default case
HCL Word-Level Examples Minimum of 3 Words HCL Representation A Min3 MIN3 B C int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; 好像这个Min可以简化的。求B时,条件可以为:B<C Find minimum of three input words HCL case expression Final case guarantees match
HCL Word-Level Examples 4-Way Multiplexor HCL Representation D0 D3 Out4 s0 s1 MUX4 D2 D1 int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ]; 好像这个Min可以简化的。求B时,条件可以为:B<C Select one of 4 inputs based on two control bits HCL case expression Simplify tests by assuming sequential matching
Arithmetic Logic Unit Combinational logic Y X X + Y A L U Y X X - Y 1 A L U Y X X & Y 2 A L U Y X X ^ Y 3 A B A B A B A B OF ZF SF OF ZF SF OF ZF SF OF ZF SF Combinational logic Continuously responding to inputs Control signal selects function computed Corresponding to 4 arithmetic/logical operations in Y86 Also computes values for condition codes We will use it as a basic component for our CPU
Storage
Storage Clocked Registers Clock e.g. Program Counter(PC), Condition Codes(CC) Hold single words or bits Loaded as clock rises Not “program registers” Clock A periodic signal that determines when new values are to be loaded into the devices I O Clock Clock Rising edge falling
For most of time acts as barrier between input and output Register Operation State = x Rising clock State = y Output = y y x Input = y Output = x Stores data bits For most of time acts as barrier between input and output As clock rises, loads input
State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load ? X0 Clock Load In Out X0
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X0 X0+X0 X0 X0 Clock Load In Out X0 X0
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X1 X0+X1 X0 X0 Clock Load In Out X0 X1 X0
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X1 X0+X1+X1 X0+X1 X0+X1 Clock Load In Out X0 X1 X0 X0+X1
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X2 X0+X1+X2 X0+X1 X0+X1 Clock Load In Out X0 X1 X2 X0 X0+X1
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X2 X0…X2+X2 X0…X2 X0…X2 Clock Load In Out X0 X1 X2 X0 X0+X1 X0…X2
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X3 X0…X2 X0…X2 Clock Load In Out X0 X1 X2 X3 X0 X0+X1 X0…X2
State Machine Example Out In Load Clock Comb. Logic Clock Load In Out State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load X3 X3 X3 Clock Load In Out X0 X1 X2 X3 X0 X0+X1 X0…X2 X3
State Machine Example Accumulator circuit State Machine Example Comb. Logic A L U Out MUX 1 Clock In Load Accumulator circuit Load or accumulate on each cycle Clock Load In Out X0 X1 X2 X3 X4 X5 X0 X0+X1 X0…X2 X3 X3+X4 X3…X5
Random-access memories Storage Random-access memories e.g. Register File, Memory Hold multiple words Address input specifies which word to read or write Possible multiple read or write ports Read word when address input changes Write word as clock rises
Register File Register file Multiple Ports A B W dstW srcA valA srcB valB valW Read ports Write port Clock Register file Holds values of program registers %eax, %esp, etc. Register identifier serves as address ID “F” implies no read or write performed Multiple Ports Can read and/or write multiple words in one cycle Each has separate address and data input/output
Register File Timing Reading x Writing x 2 y x Rising clock y 2 Like combinational logic Output data generated based on input address (After some delay) Writing Like register Update only as clock rises valA Register file 2 x A srcA x valB srcB B 2 Register file W dstW valW Clock y 2 Register file W dstW valW Clock 2 x Rising clock y 2
Memory Memory Ports Holds program data and instructions Data Memory address read write Clock data in data out error Memory Holds program data and instructions Ports A single address input A data input for writing, and a data output for reading Error signal means invalid address
Summary Storage Clocked Registers Random-access memories Hold single words (e.g., PC, CC) Loaded as clock rises Random-access memories Hold multiple words (e.g., Register File, Memory) Possible multiple read or write ports Read word when address input changes Write word as clock rises