Microprocessors.

Microprocessors

CMOS transistor on silicon
The basic electrical component in digital systems Acts as an on/off switch Voltage at “gate” controls whether current flows from source to drain Don’t confuse this “gate” with a logic gate gate source drain Conducts if gate=1 source drain oxide gate IC package IC channel Silicon substrate 1

CMOS transistor implementations
Complementary Metal Oxide Semiconductor We refer to logic levels Typically 0 is 0V, 1 is 5V Two basic CMOS types nMOS conducts if gate=1 pMOS conducts if gate=0 Hence “complementary” Basic gates Inverter, NAND, NOR gate source drain nMOS Conducts if gate=1 gate source drain pMOS Conducts if gate=0 x F = x' 1 inverter F = (xy)' x 1 y NAND gate 1 F = (x+y)' x y NOR gate

Basic logic gates x F 1 x y F 1 x y F 1 x y F 1 F = x Driver F = x y
1 F x y x y F 1 F y x x y F 1 x y F x y F 1 F = x Driver F = x y AND F = x + y OR F = x  y XOR x F x F 1 x y F x y F 1 x y F x y F 1 x y F x y F 1 F = x’ Inverter F = (x y)’ NAND F = (x+y)’ NOR F = x y XNOR

Combinational logic design
A) Problem description y is 1 if a is to 1, or b and c are 1. z is 1 if b or c is to 1, but not both, or if all are 1. B) Truth table 1 Inputs a b c Outputs y z C) Output equations y = a'bc + ab'c' + ab'c + abc' + abc z = a'b'c + a'bc' + ab'c + abc' + abc D) Minimized output equations 00 1 01 11 10 a bc y y = a + bc z z = ab + b’c + bc’ E) Logic Gates a b c y z

Combinational components
With enable input e  all O’s are 0 if e=0 With carry-in input Ci sum = A + B + Ci May have status outputs carry, zero, etc. O = I0 if S=0..00 I1 if S=0..01 … I(m-1) if S=1..11 O0 =1 if I=0..00 O1 =1 if I=0..01 O(n-1) =1 if I=1..11 sum = A+B (first n bits) carry = (n+1)’th bit of A+B less = 1 if A<B equal =1 if A=B greater=1 if A>B O = A op B op determined by S. n-bit, m x 1 Multiplexor O S0 S(log m) n I(m-1) I1 I0 log n x n Decoder O1 O0 O(n-1) I(log n -1) n-bit Adder A B sum carry Comparator less equal greater n bit, m function ALU

Sequential components
clear n-bit Register n load I Q shift I Q n-bit Shift register n-bit Counter n Q Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise. Q = lsb - Content shifted - I stored in msb Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.

Gated R-S Latch (clocked S-R flip-flop)
Enb = 1, latch closed (outputs unchanged) Enb = 0, enabled (outputs depend on inputs)

J-K Flip-flop How to eliminate the forbidden state?
Idea: use output feedback to guarantee that R and S are never both one J, K both one yields toggle Characteristic Equation: Q+ = Q K + Q J

Sequential logic design
A) Problem Description You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles C) Implementation Model Combinational logic State register a x I0 I1 Q1 Q0 D) State Table (Moore-type) 1 Inputs Q1 Q0 a Outputs I1 I0 x 1 2 3 x=0 x=1 a=1 a=0 B) State Diagram Given this implementation model Sequential logic design quickly reduces to combinational logic design

Sequential logic design (cont.)
1 Q1Q0 I1 I1 = Q1’Q0a + Q1a’ + Q1Q0’ 00 11 10 a 01 I0 I0 = Q0a’ + Q0’a x = Q1Q0 x E) Minimized Output Equations F) Combinational Logic a Q1 Q0 I0 I1 x

Basic Architecture Control unit and datapath Key differences
Note similarity to single-purpose processor Key differences Datapath is general Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

Datapath Operations Load Read memory location into register
Processor Control unit Datapath ALU ALU operation Input certain registers through ALU, store back in register Controller +1 Control /Status Registers Store Write register to memory location 10 11 PC IR I/O ... Memory 10 11 ...

Control Unit Control unit: configures the datapath operations
Sequence of desired operations (“instructions”) stored in memory – “program” Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.: Fetch: Get next instruction into IR Decode: Determine what the instruction means Fetch operands: Move data from memory to datapath register Execute: Move data through the ALU Store results: Write data from register to memory Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1

Control Unit Sub-Operations
Fetch Get next instruction into IR PC: program counter, always points to next instruction IR: holds the fetched instruction Processor Control unit Datapath ALU Controller Control /Status Registers PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

Decode Determine what the instruction means Processor Control unit Datapath ALU Controller Control /Status Registers PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

Fetch operands Move data from memory to datapath register Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

Execute Move data through the ALU This particular instruction does nothing during this sub-operation Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

Store results Write data from register to memory This particular instruction does nothing during this sub-operation Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

Instruction Cycles PC=100 Fetch ops Store results Fetch Decode Exec.
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 10 Fetch ops Store results Fetch load R0, M[500] Decode Exec. clk 100

Instruction Cycles PC=100 PC=101 Fetch ops Store results Fetch Decode
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 Fetch ops Store results Fetch Decode Exec. clk +1 11 Exec. PC=101 Fetch ops Store results inc R1, R0 Fetch Decode clk 10 101

Instruction Cycles PC=100 PC=101 PC=102 Fetch ops Store results Fetch
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 Fetch ops Store results Fetch Decode Exec. clk PC=101 Fetch ops Store results Fetch Decode Exec. Decode clk 10 11 11 Store results 102 store M[501], R1 Fetch PC=102 Fetch ops Exec. clk

Architectural Considerations
N-bit processor N-bit ALU, registers, buses, memory data interface Embedded: 8-bit, 16-bit, 32-bit common Desktop/servers: 32-bit, even 64 PC size determines address space Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

Architectural Considerations
Clock frequency Inverse of clock period Must be longer than longest register to register delay in entire processor Memory access is often the longest Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

Pipelining: Increasing Instruction Throughput
Wash 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Non-pipelined Pipelined Dry 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 non-pipelined dish cleaning Time pipelined dish cleaning Time Fetch-instr. 1 2 3 4 5 6 7 8 Decode 1 2 3 4 5 6 7 8 Fetch ops. 1 2 3 4 5 6 7 8 Pipelined Execute 1 2 3 4 5 6 7 8 Instruction 1 Store res. 1 2 3 4 5 6 7 8 Time pipelined instruction execution

Superscalar and VLIW Architectures
Performance can be improved by: Faster clock (but there’s a limit) Pipelining: slice up instruction into stages, overlap stages Multiple ALUs to support more than one instruction stream Superscalar Scalar: non-vector operations Fetches instructions in batches, executes as many as possible May require extensive hardware to detect independent instructions VLIW: each word in memory has multiple independent instructions Relies on the compiler to detect and schedule instructions Currently growing in popularity

Two Memory Architectures
Processor Program memory Data memory Memory (program and data) Harvard Princeton Princeton Fewer memory wires Harvard Simultaneous program and data memory access

Cache Memory Memory access may be slow
Cache is small but fast memory close to processor Holds copy of part of memory Hits and misses Processor Memory Cache Fast/expensive technology, usually on the same chip Slower/cheaper technology, usually on a different chip

Programmer’s View Programmer doesn’t need detailed understanding of architecture Instead, needs to know what instructions can be executed Two levels of instructions: Assembly level Structured languages (C, C++, Java, etc.) Most development today done using structured languages But, some assembly level programming may still be necessary Drivers: portion of program that communicates with and/or controls (drives) another device Often have detailed timing considerations, extensive bit manipulation Assembly level may be best for these

Assembly-Level Instructions
opcode operand1 operand2 ... Instruction 1 Instruction 2 Instruction 3 Instruction 4 Instruction Set Defines the legal set of instructions for that processor Data transfer: memory/register, register/register, I/O, etc. Arithmetic/logical: move register through ALU and back Branches: determine next PC value when not just PC+1

A Simple (Trivial) Instruction Set
Assembly instruct. First byte Second byte Operation MOV Rn, direct 0000 Rn direct Rn = M(direct) MOV direct, Rn 0001 Rn direct M(direct) = Rn Rm 0010 Rn Rm M(Rn) = Rm MOV Rn, #immed. 0011 Rn immediate Rn = immediate ADD Rn, Rm 0100 Rn Rm Rn = Rn + Rm SUB Rn, Rm 0101 Rn Rm Rn = Rn - Rm JZ Rn, relative 0110 Rn relative PC = PC+ relative (only if Rn is 0) opcode operands

Addressing Modes Data Immediate Register-direct Register indirect
Operand field Register address Memory address Addressing mode Register-file contents Memory

Equivalent assembly program
Sample Programs int total = 0; for (int i=10; i!=0; i--) total += i; // next instructions... C program MOV R0, #0; // total = 0 MOV R1, #10; // i = 10 JZ R1, Next; // Done if i=0 ADD R0, R1; // total += i MOV R2, #1; // constant 1 JZ R3, Loop; // Jump always Loop: Next: // next instructions... SUB R1, R2; // i-- Equivalent assembly program MOV R3, #0; // constant 0 1 2 3 5 6 7 Try some others Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports). (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.

Application-Specific Instruction-Set Processors (ASIPs)
General-purpose processors Sometimes too general to be effective in demanding application e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP But single-purpose processor has high NRE, not programmable ASIPs – targeted to a particular domain Contain architectural features specific to that domain e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc. Still programmable

A Common ASIP: Microcontroller
For embedded control applications Reading sensors, setting actuators Mostly dealing with events (bits): data is present, but not in huge amounts e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven Microcontroller features On-chip peripherals Timers, analog-digital converters, serial communication, etc. Tightly integrated for programmer, typically part of register space On-chip program and data memory Direct programmer access to many of the chip’s pins Specialized instructions for bit-manipulation and other low-level operations

Another Common ASIP: Digital Signal Processors (DSP)
For signal processing applications Large amounts of digitized data, often streaming Data transformations must be applied fast e.g., cell-phone voice filter, digital TV, music synthesizer DSP features Several instruction execution units Multiple-accumulate single-cycle instruction, other instrs. Efficient vector operations – e.g., add two arrays Vector ALUs, loop buffers, etc.

Trend: Even More Customized ASIPs
In the past, microprocessors were acquired as chips Today, we increasingly acquire a processor as Intellectual Property (IP) e.g., synthesizable VHDL model Opportunity to add a custom datapath hardware and a few custom instructions, or delete a few instructions Can have significant performance, power and size impacts Problem: need compiler/debugger for customized ASIP Remember, most development uses structured languages One solution: automatic compiler/debugger generation e.g., Another solution: retargettable compilers e.g., (customized VLIW architectures)

Programmer Considerations
Program and data memory space Embedded processors often very limited e.g., 64 Kbytes program, 256 bytes of RAM (expandable) Registers: How many are there? Only a direct concern for assembly-level programmers I/O How communicate with external signals? Interrupts

Selecting a Microprocessor
Issues Technical: speed, power, size, cost Other: development environment, prior expertise, licensing, etc. Speed: how evaluate a processor’s speed? Clock speed – but instructions per cycle may differ Instructions per second – but work per instr. may differ Dhrystone: Synthetic benchmark, developed in Dhrystones/sec. MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today. So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second SPEC: set of more realistic benchmarks, but oriented to desktops EEMBC – EDN Embedded Benchmark Consortium, Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications

General Purpose Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Microprocessor Architecture Overview
If you are using a particular microprocessor, now is a good time to review its architecture

Microcontroller catalogue

Microcontroller packaging

Microprocessors.

Similar presentations

Presentation on theme: "Microprocessors."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Microprocessors.

Similar presentations

Presentation on theme: "Microprocessors."— Presentation transcript:

Similar presentations

About project

Feedback