Microprocessors and Embedded Systems

Lecture 1: Introduction
Lecturer: Hui Wu

COMP3221: Microprocessors and Embedded Systems--Lecture 1
COMP 3221 Administration (1/2)
- Lecturer: Hui Wu
  - Office: K17-501D
  - Consultation: Wed 3:00–5:00pm
- Lecturer in charge of the lab: Samir Omar
  - Office: K17-314A
  - For all issues regarding the lab, contact Samir.

COMP 3221 Administration (2/2)
The course homepage contains:
- All lecture slides presented in class.
- All material related to the laboratory exercises.
- Pointers to supplementary material.
- Announcements.
Check it frequently!

Syllabus (1/2)
Main topics:
- Instruction Set Architecture (ISA).
- Number representation and computer arithmetic.
- Assembly and machine language programming.
- Interrupts and I/O interfacing.
- Serial communication.
- Analog input and output.
- Buses and memory system.

Syllabus (2/2)
- Laboratory exercises: AVR assembly programming and I/O interfacing.
  - Tools include AVR Studio and the AVR board designed by David Johnson.
- Assignments:
  - A survey of the ARM microprocessor.
  - A lift controller using AVR.

Pre-requisite (1/2)
Digital Circuits (ELEC 1041, COMP 2021):
- Number representation, coding, registers, state machines.
- Realisation of simple logic circuits.
- Integrated circuit technologies.
- Designing with MSI components.
- Flip-flops and state machines.
- Counters and sequential MSI components.
- Register transfer logic.
- Bus systems.

Pre-requisite (2/2)
Computers and Computing (COMP1011 & COMP1021):
- The von Neumann model: memory/I-O/processing.
- The instruction set and execution cycle.
- Registers and address spaces.
- An instruction set: operations and addressing modes.
- An expanded model of a computer: mass storage and I/O.
- The layered model of a computer: from gate level to user level.
- C language programming.

Textbooks
Main reference for lecture material:
- Fredrick M. Cady: Microcontrollers and Microcomputers: Principles of Software and Hardware Engineering.
Additional references:
- David Patterson and John Hennessy: Computer Organisation & Design: The HW/SW Interface, 2nd Ed. Relevant chapters are 3, 4 and 8.
- Brian Kernighan & Dennis Ritchie: The C Programming Language, 2nd Ed., Prentice Hall, 1988, ISBN:

Laboratory Schedule
- Monday: :00 – 4:00 pm EE233; 5:00 – 7:00 pm EE233
- Wednesday: 1:00 – 3:00 pm EE233
- Thursday: :00 – 2:00 pm EE233
- You will only be allowed to attend the lab session that you are enrolled in. No exceptions.
- Labs start in Week 3.
- Special Open Access labs: TBA. They are not assessed and are only for those who need a bit of extra time.

Enrolment System in Lab Session
- Run the "sirius" booking system from any CSE lab machine.
- Read the instructions on how to run "sirius".
- If you have any problem with "sirius", contact Mei-Cheng Whale.
- If you want to work with a partner, please make sure that both of you enrol in the same lab session.
- You will be paired with a partner randomly if you don't have one.
- Students who DO NOT select their lab sessions will not be allowed into the lab.

Lab Format
- You work in groups of two.
- You choose your partner in the Sign Up Session (Week 3). It CANNOT be changed later.
- You will get a group account.
- There is no formal report to hand in.
- You are assessed based on a system of checkpoints.
- An assessor marks your checkpoints.
- Lab demonstrators help you with the lab.

Laboratory Preparation & Catch Up
- You CAN finish the laboratory exercises in the allocated time only if you do the preparation beforehand.
- You need to prepare for the laboratory outside the laboratory by:
  - Carefully reading the lab-related documentation.
  - Writing your programs and simulating them at home.
- Leaving things to the last minute or walking into the laboratory without preparation may cause you to fail this course.
- Go to one of the OPEN ACCESS sessions if you think you are falling behind.

Laboratory Structure & Specifications
- 5 experiments.
- Each experiment consists of several checkpoints.
- The full mark for each checkpoint is 5.
- Optional checkpoints give you extra marks.
- Each experiment lasts two weeks, except Experiment 2, which takes 3 weeks.
- Lab specifications are available on the course homepage one week before each experiment starts.

Assignments
- Two assignments.
- The first assignment: A Survey of the ARM Microprocessor.
- The second assignment: An AVR-Based Lift Controller.
- Details to be announced.

Course Grading Scheme
- Laboratory mark = 25%
- Assignment mark = 25%
  - Assignment 1: 10%
  - Assignment 2: 15%
- Final exam mark = 50%
- Postgraduate students have a different exam paper (not harder, but with a slightly different scope).

Why Take This Course?
- Embedded systems are a big, fast-growing industry (US$40 billion in 2000).
- Microprocessors/microcontrollers are the core of embedded systems.

What is an Embedded System?
- A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function.
- In some cases, embedded systems are part of a larger system or product, as is the case of an anti-lock braking system in a car.
- Contrast with a general-purpose computer.
- Examples range from washing machines and cellular phones to missiles and space shuttles.

Microprocessors are everywhere in our lives.

Why AVR?
- RISC architecture with load-store memory access.
- Two-stage instruction pipelining.
- Internal program and data memory.
- Wide variety of on-chip peripherals (digital I/O, ADC, EEPROM, UART, pulse width modulator (PWM), etc.).

Microcontrollers vs Microprocessors
- A microprocessor is a CPU on a single chip.
- If a microprocessor, its associated support circuitry, memory and peripheral I/O components are implemented on a single chip, it is a microcontroller.

COMP3221: Microprocessors and Embedded Systems
Lecture 2: Instruction Set Architecture (ISA)
Lecturer: Hui Wu
Session 2, 2005

COMP3221/9221: Microprocessors and Embedded Systems
Instruction Set Architecture (ISA)
- The ISA is the interface between hardware and software.
- For (machine language) programmers (and compiler writers):
  - Don't need to know (much) about the microarchitecture.
  - Just write or generate instructions that match the ISA.
- For hardware (microarchitecture) designers:
  - Don't need to know about the high-level software.
  - Just build a microarchitecture that implements the ISA.
[Figure: on the software side, a C program and a FORTRAN 90 program are each compiled to an ISA program; at the ISA level, the ISA program is executed by the hardware.]

What makes an ISA?
- Memory models
- Registers
- Data types
- Instructions

What makes an ISA? #1: Memory Models
- Memory model: how does memory look to the CPU?
- Issues:
  1. Addressable cell size
  2. Alignment
  3. Address spaces
  4. Endianness

1. Addressable Cell Size
- Memory has cells, each of which has an address.
- The most common cell size is 8 bits (1 byte).
- But not always! AVR instruction memory has 16-bit cells.
- Note: the data bus may be wider, i.e. it may retrieve several cells (addresses) at once.

2. Alignment
- Many architectures require natural alignment, e.g.
  - 4-byte words starting at addresses 0, 4, 8, …
  - 8-byte words starting at addresses 0, 8, 16, …

Alignment (cont.)
- Alignment is often required because it is more efficient.
- Example: Pentium II
  - Fetches 8 bytes at a time from memory (8-byte wide data bus).
  - Addresses have 36 bits, but the address bus has only 33 bits.
  - But alignment is NOT required (for backwards compatibility reasons).
  - A 4-byte word stored at address 6 is OK.
  - The processor must read bytes 0 to 7 (one read) and bytes 8 to 15 (second read), then extract the 4 required bytes from the 16 bytes read.
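
As a rough illustration of the alignment rule and the Pentium II example, the following C sketch checks natural alignment and counts how many bus reads an access needs. The function names are hypothetical and the model (memory read in aligned blocks as wide as the data bus) is a simplifying assumption, not a description of any particular processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` is naturally aligned,
   i.e. the address is a multiple of the access size. */
bool is_naturally_aligned(uint32_t addr, uint32_t size) {
    return size != 0 && addr % size == 0;
}

/* Number of bus reads needed to fetch `size` bytes starting at `addr`
   when memory is read in aligned blocks of `bus_bytes` bytes. */
uint32_t bus_reads_needed(uint32_t addr, uint32_t size, uint32_t bus_bytes) {
    if (size == 0 || bus_bytes == 0) return 0;
    uint32_t first_block = addr / bus_bytes;              /* block holding the first byte */
    uint32_t last_block = (addr + size - 1) / bus_bytes;  /* block holding the last byte */
    return last_block - first_block + 1;
}
```

With an 8-byte bus, a 4-byte word at address 6 spans two blocks, so `bus_reads_needed(6, 4, 8)` gives 2, while the aligned word at address 4 needs a single read.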

3. Address Spaces
- Princeton architecture or von Neumann architecture (most used):
  - A single linear address space for both instructions and data.
  - e.g. \(2^{32}\) bytes numbered from 0 to \(2^{32}-1\) (may not be bytes; depends on the addressable cell size).
- Harvard architecture:
  - Separate address spaces for instructions and data.
  - AVR AT90S8515:
    - Data address space: up to \(2^{16}\) bytes.
    - Instruction address space: 16-bit words.

AVR Address Spaces
Data memory (8 bits wide):
- 0x0000 to 0x1F: 32 general purpose working registers
- 0x20 to 0x5F: 64 input/output registers
- 0x60 to end address: internal SRAM (128 to 4K bytes), followed by external SRAM
Program memory (16 bits wide):
- 0x0000 to end address: program flash memory (1K to 128K bytes)

AVR Address Spaces (cont.)
Data EEPROM memory (8 bits wide):
- 0x0000 to end address: EEPROM memory (64 to 4K bytes)

4. Endianness
- Different machines may support different byte orderings.
- Two orderings:
  - Little endian: the little end (least significant byte) is stored first (at the lowest address). Intel microprocessors (Pentium etc.).
  - Big endian: the big end (most significant byte) is stored first. SPARC, Motorola microprocessors.
- Most CPUs produced since ~1992 are "bi-endian" (support both):
  - some are switchable at boot time,
  - others at run time (i.e. can change dynamically).
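
The two byte orderings can be sketched in C as follows. The function names are hypothetical; the code only illustrates how the same 32-bit value is laid out at increasing addresses under each ordering, plus one common way to probe the ordering of the machine the code runs on.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lowest address (out[0]). */
void store_le32(uint8_t out[4], uint32_t value) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Big endian: most significant byte at the lowest address (out[0]). */
void store_be32(uint8_t out[4], uint32_t value) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * (3 - i)));
}

/* Returns 1 if the host stores the low byte of a 16-bit value first. */
int host_is_little_endian(void) {
    uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For the value 0x12345678, `store_le32` produces the bytes 78 56 34 12 and `store_be32` produces 12 34 56 78.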

What makes an ISA? #2: Registers
Two types:
- General purpose
  - Used for temporary results etc.
- Special purpose, e.g.
  - Program Counter (PC)
  - Stack Pointer (SP)
  - Input/Output Registers
  - Status Register

Registers (cont.)
- Some other registers are part of the microarchitecture, NOT the ISA:
  - Instruction Register (IR)
  - Memory Address Register (MAR)
  - Memory Data Register (MDR)
- i.e. the programmer doesn't need to know about these (and can't directly change or use them).

AVR Registers
- General purpose registers are quite regular.
  - Exception: a few instructions work on only the upper half (registers 16-31).
  - This is due to bit limitations in some instructions (e.g. only 4 bits to specify which register).
- There are many I/O registers.
  - Not to be confused with general purpose registers.
  - Some instructions work with these, others with general purpose registers; don't confuse them.
- When X is needed as an index register, R26 and R27 are not available as general registers.
- In AVR devices without SRAM, the registers are also the only memory, which can be tricky to manage.

What makes an ISA? #3: Data Types
- Numeric
  - Integers of different lengths (8, 16, 32, 64 bits), possibly signed or unsigned.
  - Floating point numbers, e.g. 32 bits (single precision) or 64 bits (double precision).
  - Some machines support BCD (binary coded decimal) numbers.
- Non-numeric
  - Boolean (0 means false, 1 means true), stored in a whole byte or word.
  - Bit-map (collection of booleans, e.g. 8 in a byte).
  - Characters.
  - Pointers (memory addresses).

Data types (cont.)
- Different machines support different data types in hardware.
[Tables: hardware-supported data types (signed integer, unsigned integer, BCD integer, floating point) by width (8, 16, 32, 64 and 128 bits), one table for the Pentium II and one for the Atmel AVR.]

Data types (cont.)
- Other data types can be supported in software, e.g.
  - 16-bit integer operations can be built out of 8-bit operations.
  - Floating point operations can be built out of logical and integer arithmetic operations.

What makes an ISA? #4: Instructions
- This is the main feature of an ISA.
- Instructions include:
  - Load/Store: move data from/to memory
  - Move: move data between registers
  - Arithmetic: addition, subtraction
  - Logical: Boolean operations
  - Branching: for deciding which instruction to perform next

Some AVR Instruction Examples
- Addition: add r2, r1
- Subtraction: sub r13, r12
- Branching: breq 6
- Load: ldi r30, $F0
- Store: st X, r2
- Port read: in r25, $16 ; read port B (input pins)
- Port write: out $18, r17 ; write to port B

ISA vs. Assembly Language
- The ISA defines machine code (or machine language): the 1's and 0's that make up instructions.
- Assembly language is a textual representation of machine language.
- Example (Atmel AVR instruction):
  - 1001 0101 0000 0011 (machine code)
  - inc r16 (assembly language, increment register 16)
- Assembly language also includes assembler directives and macros. Example:
  - .def temp = r16
  - .include "8515def.inc"

Summary: What makes an ISA?
- Memory models
- Registers
- Data types
- Instructions
If you know all these details, you can:
- Write machine code that runs on the CPU.
- Build the CPU.

Backwards Compatibility
- Many modern ISAs are constrained by backwards compatibility.
- The Pentium ISA is backwards compatible to the 8088 (1978).
  - Echoes back to the 8080 (1974).
  - Problem: the Pentium family is a poor target for compilers (register poor, irregular instruction set).
- AMD has defined a 64-bit extension to the Pentium architecture.
  - Implemented by the Hammer family of CPUs.

CISC vs. RISC
- How complex should the instruction set be? Should you do everything in hardware?
- Two "styles" of ISA design:
- CISC = Complex Instruction Set Computer
  - Lots of complex instructions, many of which take many clock cycles to execute.
  - Examples: 8086 to 80386.
  - Classic example: the VAX had a single instruction to evaluate a polynomial.
- RISC = Reduced Instruction Set Computer
  - Fewer, simpler instructions which can execute quickly (often in one clock cycle).
  - Lots of registers.
  - More complex operations are built out of simpler instructions.
  - Examples: SPARC, MIPS, PowerPC.

CISC vs. RISC (cont.)
- Originally (80s):
  - CISC: 200+ instructions
  - RISC: ~50 instructions
- Today:
  - The number of instructions is irrelevant.
  - Many "CISC" processors use RISC techniques, e.g. … Pentium IV.
  - It is better to look at the use of registers/memory:
    - CISC: often few registers; many instructions can access memory.
    - RISC: many registers; only load/store instructions access memory.
- The Atmel AVR is a RISC processor.

ISA vs. Microarchitecture
- An Instruction Set Architecture (ISA) can be implemented by many different microarchitectures.
- Examples:
  - The 8086 ISA is implemented by many processors, in different ways.
  - The Pentium ISA is implemented by:
    - Pentium … Pentium IV (in different ways)
    - Various AMD devices …
    - Other manufacturers also …

Reading Material
- Chap 2, Microcontrollers and Microcomputers.

Lecture 3: Number Systems (I)
Lecturer: Hui Wu
Session 2, 2005

COMP3221: Microprocessors and Embedded Systems-- Lecture 3
Overview
- Positional notation
- Decimal, hexadecimal and binary
- One's complement
- Two's complement

Numbers: positional notation
- Number base B => B symbols per digit:
  - Base 10 (decimal): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  - Base 2 (binary): 0, 1
- Number representation: \(d_p d_{p-1} \ldots d_2 d_1 d_0\) is a (p+1)-digit number with
  value = \(d_p \times B^p + d_{p-1} \times B^{p-1} + \cdots + d_2 \times B^2 + d_1 \times B^1 + d_0 \times B^0\)
- Binary example: 1011010 = \(1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2 + 0 \times 1\) = 64 + 16 + 8 + 2 = 90
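
The positional formula can be evaluated mechanically. The C sketch below (the function name is hypothetical) takes the digits most significant first and accumulates the value with Horner's rule, which is algebraically the same as summing each digit times its power of the base.

```c
#include <stddef.h>

/* Value of a digit sequence in the given base, most significant digit first.
   Each digits[i] is assumed to lie in the range 0..base-1. */
unsigned long positional_value(const unsigned digits[], size_t count, unsigned base) {
    unsigned long value = 0;
    for (size_t i = 0; i < count; i++)
        value = value * base + digits[i];   /* shift left one position, add next digit */
    return value;
}
```

Calling it with the binary digits 1, 0, 1, 1, 0, 1, 0 and base 2 yields 90.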

Hexadecimal Numbers: Base 16 (1/2)
- Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
- Normal digits have their expected values.
- In addition: A = 10, B = 11, C = 12, D = 13, E = 14, F = 15

Hexadecimal Numbers: Base 16 (2/2)
Example (convert hex to decimal):
  B28F0DD
  = \(B \times 16^6 + 2 \times 16^5 + 8 \times 16^4 + F \times 16^3 + 0 \times 16^2 + D \times 16^1 + D \times 16^0\)
  = \(11 \times 16^6 + 2 \times 16^5 + 8 \times 16^4 + 15 \times 16^3 + 0 \times 16^2 + 13 \times 16^1 + 13 \times 16^0\)
  = 187232477 decimal
Notice that a 7-digit hex number turns out to be a 9-digit decimal number.

Decimal vs. Hexadecimal vs. Binary
  Decimal  Hex  Binary
  0        0    0000
  1        1    0001
  2        2    0010
  3        3    0011
  4        4    0100
  5        5    0101
  6        6    0110
  7        7    0111
  8        8    1000
  9        9    1001
  10       A    1010
  11       B    1011
  12       C    1100
  13       D    1101
  14       E    1110
  15       F    1111
Examples:
- 10111 (binary) = 0001 0111 (binary) = ? (hex)
- 3F9 (hex) = ? (binary)

Hex to Binary Conversion
- Hex is a more compact representation of binary!
- Each hex digit represents 16 decimal values.
- Four binary digits represent 16 decimal values.
- Therefore, each hex digit can replace four binary digits.
- Example: 0011 1011 1001 1010 1100 1010 0000 0000 (two) = 3 b 9 a c a 0 0 (hex)
- C uses the notation 0x3b9aca00.
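
Because each hex digit maps to exactly four bits, the conversion can be done digit by digit. The following C sketch is one way to do it; the function name and the error convention (return -1 on a non-hex character) are assumptions made for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Expands a hex string into a string of '0'/'1' characters, four per hex digit.
   `out` must have room for 4 * (number of hex digits) + 1 characters.
   Returns 0 on success, -1 if a non-hex character is found. */
int hex_to_binary(const char *hex, char *out) {
    size_t n = 0;
    for (; *hex != '\0'; hex++) {
        int c = tolower((unsigned char)*hex);
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else return -1;
        for (int bit = 3; bit >= 0; bit--)          /* most significant bit first */
            out[n++] = ((digit >> bit) & 1) ? '1' : '0';
    }
    out[n] = '\0';
    return 0;
}
```

For the input "3F9" the output string is "001111111001".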

Which Base Should We Use?
- Decimal: great for humans; most arithmetic is done with these.
- Binary: this is what computers use, so get used to them. Become familiar with how to do basic arithmetic with them (+, -, *, /).
- Hex: terrible for arithmetic; but if we are looking at long strings of binary numbers, it's much easier to convert them to hex in order to look at four bits at a time.

How Do We Tell the Difference?
- In general, append a subscript at the end of a number stating the base:
  - \(10_{10}\) is decimal
  - \(10_2\) is binary (= \(2_{10}\))
  - \(10_{16}\) is hex (= \(16_{10}\))
- When dealing with AVR microcontrollers:
  - Hex numbers are preceded by "$" or "0x": $10 == 0x10 == \(10_{16}\) == \(16_{10}\)
  - Binary numbers are preceded by "0b".
  - Octal numbers are preceded by "0" (zero).
  - Everything else is decimal by default.

Inside the Computer
- To a computer, numbers are always in binary; all that matters is how they are printed out: binary, decimal, hex, etc.
- As a result, it doesn't matter what base a number in C is in: \(32_{10}\) == 0x20 == \(100000_2\)
- Only the value of the number matters.

Bits Can Represent Everything
- Characters?
  - 26 letters => 5 bits
  - Upper/lower case + punctuation => 7 bits (in 8) (ASCII)
  - Rest of the world's languages => 16 bits (Unicode)
- Logical values? 0 => False, 1 => True
- Colors?
- Locations / addresses? Commands?
- But N bits => only \(2^N\) things

What if too big?
- Numbers really have an infinite number of digits, with almost all being zero except for a few of the rightmost digits, e.g. …000098 == 98. We just don't normally show leading zeros.
- Computers have a fixed number of digits.
- Adding two n-bit numbers may produce an (n+1)-bit result.
- Since the length of registers (8 bits on AVR) is fixed, this is a problem.
- If the result of an addition (or any other arithmetic operation) cannot be represented by a register, overflow is said to have occurred.

An Overflow Example
- Example (using 4-bit numbers): a sum whose true value is 10010 (+18) needs 5 bits.
- But we don't have room for a 5-bit solution, so the stored solution would be 0010, which is +2, which is wrong.

How to avoid overflow? Allow it sometimes?
- Some languages detect overflow (Ada), some don't (C and Java).
- AVR has N, Z, C and V flags to keep track of overflow.
  - Details will be covered later.

Comparison
- How do you tell if X > Y?
- See if X - Y > 0.

How to Represent Negative Numbers?
- So far, unsigned numbers.
- Obvious solution: define the leftmost bit to be the sign! 0 => +, 1 => -
- The rest of the bits can be the numerical value of the number.
- This representation is called sign and magnitude.
- On AVR (8 bits), \(+1_{ten}\) would be: 0000 0001
- And \(-1_{ten}\) in sign and magnitude would be: 1000 0001

Shortcomings of sign and magnitude?
- The arithmetic circuit is more complicated.
  - Special steps are needed depending on whether the signs are the same or not.
- Also, two zeros:
  - 0x00 = \(+0_{ten}\)
  - 0x80 = \(-0_{ten}\) (assuming 8-bit integers).
  - What would it mean for programming?
- Sign and magnitude was abandoned because another solution was better.

Another try: complement the bits
- Examples (8 bits): \(7_{10} = 00000111_2\), \(-7_{10} = 11111000_2\)
- This is called one's complement.
- The one's complement of an integer X is \(2^{p+1} - 1 - X\), where the representation has p+1 bits numbered 0 to p.
- Questions:
  - What is \(-0\)?
  - How many positive numbers in N bits?
  - How many negative numbers in N bits?

Shortcomings of one's complement?
- Arithmetic is not too hard.
- Still two zeros:
  - 0x00 = \(+0_{ten}\)
  - 0xFF = \(-0_{ten}\) (assuming 8-bit integers).
- One's complement was eventually abandoned because another solution is better.

Two's Complement
- The two's complement of an integer X is \(2^{p+1} - X\), where the representation has p+1 bits numbered 0 to p.
- Bit p is the "sign" bit: the number is negative if it is 1, and non-negative otherwise.
- Example (8 bits): \(-7_{10} = 2^8 - 7 = 249 = 11111001_2\)

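Two's Complement Formula
- Given a two's complement representation \(d_p d_{p-1} \ldots d_1 d_0\), its value is
  \(d_p \times (-2^p) + d_{p-1} \times 2^{p-1} + \cdots + d_1 \times 2^1 + d_0 \times 2^0\)
- Example: the two's complement representation 11111001
  = \(1 \times (-2^7) + 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0\)
  = -128 + 64 + 32 + 16 + 8 + 1 = -7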
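
The formula can be checked term by term in code. This C sketch (hypothetical function name) fixes the width at 8 bits, so the sign bit d7 carries the weight -2^7 and bits 6 down to 0 carry the usual positive weights.

```c
#include <stdint.h>

/* Value of an 8-bit two's complement pattern, computed term by term:
   bit 7 has weight -2^7, bits 6..0 have weights 2^6 .. 2^0. */
int twos_complement_value8(uint8_t bits) {
    int value = ((bits >> 7) & 1) * -128;        /* sign bit term */
    for (int i = 6; i >= 0; i--)
        value += ((bits >> i) & 1) * (1 << i);   /* remaining positive terms */
    return value;
}
```

For the pattern 11111001 (0xF9) the function returns -7, and for 10000000 (0x80) it returns -128.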

Property of Two's Complement
- Let P denote the two's complement operation. Given any integer X, the following equation holds: P(P(X)) = X
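
A small C sketch of the two complement operations on 8-bit patterns, which also makes the property P(P(X)) = X easy to check exhaustively. The function names are hypothetical; "invert and add 1" is the usual way to compute the two's complement and is equivalent to subtracting from 2^8 modulo 2^8.

```c
#include <stdint.h>

/* Two's complement of an 8-bit pattern: invert every bit, then add 1
   (the same as 2^8 - x, reduced modulo 2^8). */
uint8_t twos_complement8(uint8_t x) {
    return (uint8_t)(~x + 1u);
}

/* One's complement for comparison: invert every bit (2^8 - 1 - x). */
uint8_t ones_complement8(uint8_t x) {
    return (uint8_t)~x;
}
```

Applying `twos_complement8` twice returns the original pattern for every 8-bit input, including 0x00 and 0x80, which are their own two's complements.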

Lecture 4: Number Systems (II)
Lecturer: Hui Wu
Session 2, 2005
COMP3221: Microprocessors and Embedded Systems--Lecture 4

Overview
- Overflow in 2's complement addition
- Comparison in signed and unsigned numbers
- Condition flags
- Characters and strings

Two's Complement Arithmetic Examples
Example 1: 20 – 4 = 16. Assume an 8-bit architecture.
  20 – 4 = 20 + (–4)
         = 0001 0100 (two) – 0000 0100 (two)
         = 0001 0100 (two) + 1111 1100 (two)
         = 1 0001 0000 (two)
The leading 1 is the carry out of the most significant bit (msb) and is discarded, leaving 0001 0000 (two) = 16. No overflow.

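Two's Complement Arithmetic Examples
Example 2: –127 – 2 = –129?
  –127 – 2 = 1000 0001 (two) – 0000 0010 (two)
           = 1000 0001 (two) + 1111 1110 (two)
           = 1 0111 1111 (two)
The leading 1 is the carry out of the msb. The 8-bit result 0111 1111 (two) = +127 has msb 0 although both operands have msb 1. Overflow.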

Two's Complement Arithmetic Examples
Example 3: 127 + 2 = 129?
  127 + 2 = 0111 1111 (two) + 0000 0010 (two)
          = 1000 0001 (two)
The 8-bit result 1000 0001 (two) = –127 has msb 1 although both operands have msb 0. Overflow.

When Does Overflow Occur?
The "two's complement overflow" occurs when:
- both msb's being added are 0 and the msb of the result is 1, or
- both msb's being added are 1 and the msb of the result is 0.
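
The overflow rule translates directly into a check on the three most significant bits. The C sketch below is an illustration with a hypothetical function name; it works on raw 8-bit patterns so that the wrap-around to 8 bits is explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds two 8-bit two's complement patterns, keeps the low 8 bits, and reports
   overflow: the operands have the same msb and the result's msb differs. */
uint8_t add8(uint8_t a, uint8_t b, bool *overflow) {
    uint8_t sum = (uint8_t)(a + b);      /* carry out of bit 7 is discarded */
    bool msb_a = (a & 0x80) != 0;
    bool msb_b = (b & 0x80) != 0;
    bool msb_r = (sum & 0x80) != 0;
    *overflow = (msb_a == msb_b) && (msb_r != msb_a);
    return sum;
}
```

With 0x14 (20) and 0xFC (-4) the result is 0x10 with no overflow; with 0x81 (-127) and 0xFE (-2) the result is 0x7F and overflow is reported.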

Signed vs. Unsigned Numbers
- C declaration int
  - Declares a signed number.
  - Uses two's complement.
- C declaration unsigned int
  - Declares an unsigned number.
  - Treats a 32-bit number as an unsigned integer, so the most significant bit is part of the number, not a sign bit.
- NOTE: Hardware does all arithmetic in 2's complement. It is up to the programmer to interpret numbers as signed or unsigned.

Signed and Unsigned Numbers in AVR
- AVR microcontrollers support only 8-bit signed and unsigned integers.
- Multi-byte signed and unsigned integers can be implemented in software.
- Question: how do we compute the sum of two four-byte integers on AVR?

Signed and Unsigned Numbers in AVR (Cont.)
- Solution: four-byte integer addition can be done by using four one-byte integer additions, taking carries into account (the lowest bytes are added first).
[Figure: worked four-byte addition showing the carry bits between byte additions and the final result.]
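
The byte-by-byte scheme can be sketched in C as below. The function name and the little-endian byte layout (index 0 is the lowest byte) are assumptions for this illustration; on a real AVR the same idea would be expressed as one add followed by three add-with-carry instructions.

```c
#include <stdint.h>

/* Adds two 32-bit integers stored as four bytes each, least significant byte
   first, using only 8-bit additions plus a carry bit. Returns the final carry. */
unsigned add32_bytewise(const uint8_t a[4], const uint8_t b[4], uint8_t sum[4]) {
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned partial = (unsigned)a[i] + b[i] + carry;  /* at most 0x1FF */
        sum[i] = (uint8_t)partial;                         /* low 8 bits */
        carry = partial >> 8;                              /* carry into next byte */
    }
    return carry;
}
```

Adding 0x0000FFFF and 0x00000001 this way produces 0x00010000: the carries ripple out of the two low bytes into the third.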

Signed v. Unsigned Comparison
- X and Y are two bit patterns in which the msb of X is 1 and the msb of Y is 0.
- Is X > Y?
  - unsigned: YES
  - signed: NO

Signed v. Unsigned Comparison (Hardware Help)
- Is X > Y? Do the subtraction X – Y and check the result.
- The subtraction is performed as an addition: X – Y = X + (two's complement of Y).
- Hardware needs to keep a special bit (the S flag in AVR) which indicates the result of the signed comparison, and a special bit (the C flag in AVR) which indicates the result of the unsigned comparison.
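
A minimal C sketch of this idea: subtract once, then derive one bit for the unsigned comparison and one for the signed comparison. The struct and function names are hypothetical, and the overflow expression is a standard formulation of signed overflow on subtraction rather than something taken from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool c;   /* set if x < y when both are read as unsigned (borrow) */
    bool s;   /* set if x < y when both are read as two's complement */
} CompareFlags;

/* Computes x - y on 8-bit patterns and derives the two comparison bits. */
CompareFlags compare8(uint8_t x, uint8_t y) {
    uint8_t r = (uint8_t)(x - y);
    bool n = (r & 0x80) != 0;                          /* sign of the result */
    bool v = (((x ^ y) & (x ^ r)) & 0x80) != 0;        /* signed overflow of x - y */
    CompareFlags f;
    f.c = x < y;
    f.s = (n != v);                                    /* S = N xor V */
    return f;
}
```

For X = 0xFC and Y = 0x3B, `c` is clear (252 is not below 59 as unsigned numbers) while `s` is set (-4 is below 59 as signed numbers), matching the YES/NO answers above.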

Numbers are stored at addresses
- Memory is a place to store bits.
- A word is a fixed number of bits (e.g. 16 in AVR assembler) at an address.
- Addresses have a fixed number of bits.
- Addresses are naturally represented as unsigned numbers.
- How multi-byte numbers are stored in memory is determined by the endianness.
- On AVR, programmers choose the endianness.
[Figure: memory drawn as a column of cells at addresses 0x0000, 0x0001, 0x0002, …, 0xF…F.]

Status Flags in Program Status Register
H S V N Z C
The Processor Status Register in AVR
- C: its meaning depends on the operation.
  - For addition x + y, it is the carry from the most significant bit. In other words, C = Rd7 • Rr7 + Rr7 • NOT(R7) + NOT(R7) • Rd7, where Rd7 is bit 7 of x, Rr7 is bit 7 of y, R7 is bit 7 of x + y, • is the logical AND and + is the logical OR.
  - For subtraction x – y, where x and y are unsigned integers, it indicates whether x < y (a borrow is needed). If x < y, then C = 1; otherwise, C = 0. In other words, C = NOT(Rd7) • Rr7 + Rr7 • R7 + R7 • NOT(Rd7), where R7 is bit 7 of x – y.

Status Flags in Program Status Register
H S V N Z C
The Processor Status Register in AVR
- Z: 1 indicates a zero result after an arithmetic or logical operation.
- N: the most significant bit of the result.
- V: 1 indicates two's complement oVerflow.
- S: sign flag, the exclusive OR of N and V. 1: negative result. 0: non-negative result.
- H: half carry flag.
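
To see how the flags relate to each other, the following C sketch computes all six for an 8-bit addition. The struct and function names are hypothetical. The C expression follows the carry formula given for addition; the H expression applies the same pattern to bit 3, and the V expression uses the overflow rule (same operand signs, different result sign), both of which are assumptions of this sketch rather than formulas stated in the slides.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool h, s, v, n, z, c; } Flags;

/* Adds rd and rr as 8-bit values, stores the 8-bit result, returns the flags. */
Flags add8_flags(uint8_t rd, uint8_t rr, uint8_t *result) {
    uint8_t r = (uint8_t)(rd + rr);
    /* Bit i of `carries` is Rd_i.Rr_i + Rr_i.NOT(R_i) + NOT(R_i).Rd_i */
    unsigned carries = (rd & rr) | (rr & ~r) | (~r & rd);
    Flags f;
    f.c = (carries & 0x80) != 0;                       /* carry out of bit 7 */
    f.h = (carries & 0x08) != 0;                       /* carry out of bit 3 */
    f.z = (r == 0);
    f.n = (r & 0x80) != 0;
    f.v = ((~(rd ^ rr) & (rd ^ r)) & 0x80) != 0;       /* two's complement overflow */
    f.s = (f.n != f.v);                                /* S = N xor V */
    *result = r;
    return f;
}
```

For 0x7F + 0x02 the result is 0x81 with N = 1, V = 1, S = 0, C = 0 and H = 1; for 0xFF + 0x01 the result is 0x00 with Z = 1 and C = 1.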

Experimentation with Condition Flags (#1/3)
Indicate the changes in the N, Z, C and V flags for the following arithmetic operation (assume 4-bit numbers):
  … = …
  N=1 V=0 Z=0 C=0 S=1 H=1

Experimentation with Condition Flags (#2/3)
Indicate the changes in the N, Z, C and V flags for the following arithmetic operation (assume 4-bit numbers):
  … = …
  N=0 V=1 Z=0 C=1 S=1 H=1

Experimentation with Condition Flags (#3/3)
Indicate the changes in the N, Z, C and V flags for the following arithmetic operation (assume 4-bit numbers):
  … – … = …
  N=1 V=0 Z=0 C=1 S=1 H=0

Beyond Integers (Characters)
- 8-bit bytes represent characters; nearly every computer uses the American Standard Code for Information Interchange (ASCII).
[Table: ASCII code numbers and their characters.]
- Uppercase + 32 = Lowercase (e.g. B + 32 = b)
- tab = 9, carriage return = 13, backspace = 8, Null = 0
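
The "uppercase + 32 = lowercase" relationship can be sketched in C as follows. The function name is hypothetical, and the code assumes the ASCII character set, as the slide does.

```c
/* Converts an ASCII uppercase letter to lowercase by adding 32;
   any other character is returned unchanged. */
char ascii_to_lower(char c) {
    if (c >= 'A' && c <= 'Z')
        return (char)(c + 32);
    return c;
}
```

In ASCII, 'B' is 66 and 'b' is 98, so `ascii_to_lower('B')` gives 'b'.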

Strings
- Characters are normally combined into strings, which have variable length, e.g. "Cal", "M.A.D", "COMP3221".
- How to represent a variable-length string?
  1) The 1st position of the string is reserved for the length of the string (Pascal).
  2) An accompanying variable has the length of the string (as in a structure).
  3) The last position of the string is indicated by a character used to mark the end of the string (C).
- C uses 0 (Null in ASCII) to mark the end of a string.

Example String
- How many bytes are needed to represent the string "Popa"? Five: four characters plus the terminating 0.
- What are the values of the bytes for "Popa"?
  - 80, 111, 112, 97, 0 (DEC)
  - 50, 6F, 70, 61, 00 (HEX)
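
Two of the string representations listed earlier can be compared in a short C sketch: the C convention (terminating 0) and the Pascal-style convention (length in the first position). The function names are hypothetical, and the 255-character limit follows from assuming a one-byte length field.

```c
#include <stddef.h>
#include <stdint.h>

/* C convention: scan for the terminating 0 byte to find the length. */
size_t c_string_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Pascal-style convention: out[0] holds the length, the characters follow.
   `out` needs length + 1 bytes. Returns -1 if the string exceeds 255 characters. */
int to_length_prefixed(const char *s, uint8_t *out) {
    size_t n = c_string_length(s);
    if (n > 255) return -1;
    out[0] = (uint8_t)n;
    for (size_t i = 0; i < n; i++)
        out[i + 1] = (uint8_t)s[i];
    return 0;
}
```

Both representations of "Popa" take five bytes: the C form stores 80, 111, 112, 97, 0 and the length-prefixed form stores 4, 80, 111, 112, 97.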

Strings in C: Example
A string is simply an array of char.

    void strcpy(char x[], char y[])
    {
        int i = 0;                      /* declare and initialize i */
        while ((x[i] = y[i]) != '\0')   /* copy and test byte */
            i = i + 1;
    }

String in AVR Assembly Language
    .db "Hello\n" ; This is equivalent to .db 'H', 'e', 'l', 'l', 'o', '\n'

What does the following instruction do?

    ldi r4, '1'

How to Represent A Machine Instruction?
- Some bits for the operation (addition, subtraction etc.).
- Some bits for each operand (the maximum number of operands in an instruction is determined by the instruction set).
- Example: an instruction with three fields: operation (8 bits), operand 1, operand 2.
- The details will be covered in the next lecture.

Reading Material
- Appendix A in Microcontrollers and Microcomputers.

Lecture 5: Instruction Format
Lecturer: Hui Wu
Session 2, 2005
COMP3221: Microprocessors and Embedded Systems--Lecture 5

Overview
- Instruction format
- AVR instruction format examples
- PowerPC instruction format examples

Instruction Formats
- Instructions typically consist of:
  - Opcode (operation code): defines the operation (e.g. addition).
  - Operands: what's being operated on (e.g. particular registers or a memory address).
- There are many different formats for instructions.

Instruction Formats

    | OpCode |
    | OpCode | Opd |
    | OpCode | Opd1 | Opd2 |
    | OpCode | Opd1 | Opd2 | Opd3 |

- Instructions typically have 0, 1, 2 or 3 operands.
- Operands could be memory addresses, constants, or register addresses (i.e. register numbers).

AVR Instruction Examples
Clear register.
- Syntax: clr Rd
- Operand: 0 ≤ d ≤ 31
- Operation: Rd ← 0
- Instruction format (bit 15 on the left, bit 0 on the right): 0010 01dd dddd dddd
  - The OpCode uses 6 bits (bit 10 to bit 15).
  - The only operand uses the remaining 10 bits (only 5 bits (bit 0 to bit 4) are actually needed).

AVR Instruction Examples
Subtraction with carry.
- Syntax: sbc Rd, Rr
- Operation: Rd ← Rd – Rr – C
  - Rd: destination register, 0 ≤ d ≤ 31
  - Rr: source register, 0 ≤ r ≤ 31
  - C: carry
- Instruction format (bit 15 on the left, bit 0 on the right): 0000 10rd dddd rrrr
  - The OpCode uses 6 bits (bit 10 to bit 15).
  - The two operands share the remaining 10 bits.
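
Packing register numbers into these 16-bit formats can be sketched in C. The function names are hypothetical; the bit layouts assumed here are 0000 10rd dddd rrrr for sbc and 0010 01dd dddd dddd for clr (the latter being the same bits as an exclusive OR of a register with itself), as listed in the AVR instruction set documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes `sbc Rd, Rr` as 0000 10rd dddd rrrr; d and r must be in 0..31. */
uint16_t encode_sbc(unsigned d, unsigned r) {
    assert(d < 32 && r < 32);
    return (uint16_t)(0x0800u            /* opcode 000010 in bits 15..10 */
        | ((r & 0x10u) << 5)             /* r4 goes to bit 9 */
        | (d << 4)                       /* d4..d0 go to bits 8..4 */
        | (r & 0x0Fu));                  /* r3..r0 go to bits 3..0 */
}

/* Encodes `clr Rd` as 0010 01dd dddd dddd: d fills both register fields. */
uint16_t encode_clr(unsigned d) {
    assert(d < 32);
    return (uint16_t)(0x2400u | ((d & 0x10u) << 5) | (d << 4) | (d & 0x0Fu));
}
```

Under these assumptions, `encode_clr(16)` gives 0x2700 and `encode_sbc(16, 17)` gives 0x0B01.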

Instruction Lengths
- On some machines, instructions are all the same length.
- On other machines, instructions can have different lengths.

AVR Instruction Examples
- Almost all instructions are 16 bits long:
  - add Rd, Rr
  - sub Rd, Rr
  - mul Rd, Rr
  - brge k
- A few instructions are 32 bits long:
  - lds Rd, k (0 ≤ k ≤ 65535) loads 1 byte from the SRAM to a register.

Design Criteria for Instruction Formats
1. Backwards compatibility
   - e.g. the Pentium 4 supports various instruction lengths so as to be compatible with the 8086.
2. Instruction length
   - Ideally (if you're starting from scratch), all instructions are the same length.
   - Short instructions are better (less memory is needed to store programs, and instructions can be read in from memory faster).

Instruction Design Criteria (cont.)
3. Room to express operations
   - \(2^n\) operations need at least n bits.
   - It is wise to allow room to add additional opcodes for the next generation of the CPU.
4. Number of operand bits in the instruction
   - Do you address bytes or words?

OpCode vs. Operand Tradeoffs
- Instructions can trade off the number of OpCode bits against the number of operand bits.
- Example:
  - 16-bit instructions.
  - 16 registers (i.e. 4-bit register addresses).
  - Instructions could be formatted like this:

        | OpCode (4 bits) | Operand1 (4 bits) | Operand2 (4 bits) | Operand3 (4 bits) |

- But what if we need more instructions, and some instructions only operate on 0, 1 or 2 registers?

Expanding OpCodes
- Some OpCodes can mean "look elsewhere in the instruction for the real OpCode".
- e.g. if the first 4 bits are 1111, the OpCode is really contained in the next 4 bits (i.e. effectively an 8-bit OpCode), and so on.

Expanding OpCodes
Other combinations are possible.
Exercise (two minutes): for a 16-bit instruction machine with 16 registers, design OpCodes that allow for
- 14 3-operand instructions
- 30 2-operand instructions
- 30 1-operand instructions
- 32 0-operand instructions
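One way to check that the exercise is feasible is to count codes level by level: every 4-bit pattern not used as an instruction at one level becomes an escape prefix for 16 longer opcodes at the next. The Python sketch below does that counting; the function name and its return value (escape prefixes left over after each level) are illustrative choices, not part of the lecture.

```python
def leftover_prefixes(wanted=(14, 30, 30, 32), field_bits=4):
    """Count unused escape prefixes after each opcode level.

    wanted lists how many 3-, 2-, 1- and 0-operand instructions are needed.
    Raises ValueError if a level does not have enough codes.
    """
    prefixes = 1                       # one (empty) prefix before the first field
    leftover = []
    for count in wanted:
        codes = prefixes * (1 << field_bits)
        if count > codes:
            raise ValueError(f"need {count} codes but only {codes} available")
        prefixes = codes - count       # unused codes escape to the next level
        leftover.append(prefixes)
    return leftover


if __name__ == "__main__":
    print(leftover_prefixes())
```

Under these assumptions, 14 of the 16 four-bit opcodes are 3-operand instructions and the remaining two (for example 1110 and 1111) are escapes, which yields 32 eight-bit opcodes for the next level, and so on.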

PowerPC Examples
The PowerPC ISA defines the OpCode as the first six bits.
This specifies the type of instruction (operation).
The OpCode also specifies the format of the rest of the instruction.

PowerPC Machine Instruction Example 1
A 32-bit instruction word, from the most significant bits down:
    OpCode (6 bits) | Destination Register (5 bits) | Source Register (5 bits) | Value (16 bits, 2's complement)
The OpCode (001110 in binary, or 14) tells us that this instruction is an integer addition:
    destination-register = source-register + value
    r5 = r12 + (-1)
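The field layout above can be sketched as a small Python decoder. The function name decode_fields is hypothetical, and the word 0x38ACFFFF was assembled by hand from the slide's field values (OpCode 14, destination 5, source 12, value -1) rather than taken from the lecture.

```python
def decode_fields(word: int):
    """Split a 32-bit word into (opcode, dest, source, signed 16-bit value)."""
    opcode = (word >> 26) & 0x3F       # top 6 bits
    dest = (word >> 21) & 0x1F         # next 5 bits
    source = (word >> 16) & 0x1F       # next 5 bits
    value = word & 0xFFFF              # low 16 bits
    if value & 0x8000:                 # interpret as 2's complement
        value -= 0x10000
    return opcode, dest, source, value


if __name__ == "__main__":
    print(decode_fields(0x38ACFFFF))
```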

PowerPC Machine Instruction Example 2
    OpCode (6 bits) | ... | Secondary OpCode (11 bits)
The OpCode (111111 in binary, or 63) tells us this is a double-precision floating-point instruction.
But it does not tell us what it actually does! We need to look at the Secondary OpCode.

PowerPC Machine Instruction Example 2
    OpCode | Destination Register (5 bits) | Source Registers A and B (5 bits each) | Secondary OpCode
The Secondary OpCode (42) tells us this is a double-precision floating-point addition:
    destination-reg = register-A + register-B
    fr13 = fr26 + fr6

Lecture 6: Addressing Modes Lecturer: Hui Wu Session 2, 2005 COMP3221: Microprocessors and Embedded Systems--Lecture 6

Overview Addressing Modes AVR Instruction Examples COMP3221: Microprocessors and Embedded Systems--Lecture 6

Operands
Instructions need to specify where to get operands from.
Some possibilities:
- Value is in the instruction
- Value is in a register
    - Register number is in the instruction
- Value is in memory
    - Address is in the instruction
    - Address is in a register (the register number is in the instruction)
    - Address is a register value plus some offset (the offset is in the instruction, or in a register)
These are called addressing modes.

Immediate Addressing Not really an addressing mode – since there is no address Instruction doesn’t have address of operand – it has the value itself i.e. the operand is immediately available Limits to the size of the operand you can fit in an instruction (especially in RISC machines which have instruction word size = data word size) COMP3221: Microprocessors and Embedded Systems--Lecture 6

Immediate Addressing Examples
AVR:
    SUBI Rd, K      0101 KKKK dddd KKKK
    ADIW Rd, K      1001 0110 KKdd KKKK
Pentium:
    mov ebx, KKKK   1011 1 011 followed by the immediate KKKK

Direct Addressing Address of the memory operand is in the instruction Useful for global variables (accessible from all subroutines) AVR Datasheet calls this “Data Direct Addressing” and I/O Direct Addressing. COMP3221: Microprocessors and Embedded Systems--Lecture 6

Direct Addressing Examples
AVR:
    STS k, Rd       1001 001d dddd 0000 kkkk kkkk kkkk kkkk
Pentium:
    mov eax, [kk]   opcode followed by the address kk

Register Direct Addressing
Register numbers are contained in the instruction; the data is in the registers.
Fastest mode, and the most common mode in RISC.
Example: ADD
    AVR:      0000 11rd dddd rrrr
    Pentium:  11 rrr ddd (register fields)
    Sparc:    10 | rd | ... | rs1 | - | rs2

Register Indirect Addressing Register number in instruction, as with register addressing However, contents of the register is used to address memory the register is used as a pointer AVR datasheet calls this “Data Indirect” addressing COMP3221: Microprocessors and Embedded Systems--Lecture 6

Register Indirect Addressing Example
AVR: LD Rd, Y       1000 000d dddd 1000
Operation: Rd ← (Y)
Y: r29 : r28

Indexed Addressing Reference memory at a known offset from a register Two main uses: Register holds address of object; fixed offset indexes into the object (structure, array, etc) Address of object is constant; register has index into the object AVR datasheet calls this “Data Indirect with Displacement” addressing (fixed offset, not in register) COMP3221: Microprocessors and Embedded Systems--Lecture 6

Indexed Addressing Examples
AVR (data indirect with displacement): LDD Rd, Y+q
Encoding: 10q0 qq0d dddd 1qqq
Operation: Rd ← (Y + q)
- Only a 6-bit displacement (the q bits)
- Only the Y or Z index registers, selected by bit 3 of the encoding

Auto Increment
Some architectures allow modification of the index register as a side effect of the addressing mode.
- Usually add or subtract a small constant, often equal to the operand size.
- Could happen before or after the index register is used in the address calculation.
- Most common are post-increment and pre-decrement.
AVR supports these:
- "Data Indirect with Pre-decrement"
- "Data Indirect with Post-increment"

Auto Increment Examples
AVR:
    LD Rd, -Y       1001 000d dddd 1010
    Operation: Y ← Y - 1, then Rd ← (Y)
    LD Rd, Y+       1001 000d dddd 1001
    Operation: Rd ← (Y), then Y ← Y + 1
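The difference between the two modes is only the order of the pointer update and the memory access. A minimal Python sketch, assuming a 16-bit pointer and a dictionary standing in for SRAM (both function names are made up for illustration):

```python
def ld_post_increment(mem, y):
    """Model 'LD Rd, Y+': read at Y, then increment Y. Returns (value, new_y)."""
    value = mem[y]
    return value, (y + 1) & 0xFFFF


def ld_pre_decrement(mem, y):
    """Model 'LD Rd, -Y': decrement Y, then read at the new Y. Returns (value, new_y)."""
    y = (y - 1) & 0xFFFF
    return mem[y], y


if __name__ == "__main__":
    sram = {0x0100: 10, 0x0101: 20, 0x0102: 30}
    print(ld_post_increment(sram, 0x0101))
    print(ld_pre_decrement(sram, 0x0102))
```

Post-increment suits walking forwards through an array; pre-decrement walks backwards and pairs naturally with a stack that grows downwards.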

Code Memory Constant Addressing
AVR has separate data and instruction memories.
Sometimes we need to get data constants out of instruction memory (flash memory).
A special instruction is provided to do this (LPM).
    LPM Rd, Z       1001 000d dddd 0100
    Operation: Rd ← (Z)
Loads a byte at the address contained in register Z (r31 : r30).

Branch Instructions Specify where in the program to go next (i.e. change the program counter rather than just increment) program flow is no longer linear Types of Branch Instructions Unconditional – always do it Conditional – do it if some condition is satisfied (e.g. check status register) Jumps – no return Subroutines (function calls) – can return COMP3221: Microprocessors and Embedded Systems--Lecture 6

Branch Instruction Addressing Modes
Direct addressing
- Address to jump to is included in the instruction
- Not every AVR device supports this (JMP and CALL instructions)
Indirect addressing
- Address to jump to is in a register
- AVR examples: IJMP, ICALL
Program Counter Relative addressing
- AVR calls this "Relative Program Memory" addressing
- Add a constant value to the Program Counter
- AVR examples: RJMP, RCALL, conditional branch instructions…

Branch Instruction Addressing Modes: AVR Examples
JMP k
    Operation: PC ← k (0 ≤ k < 4M)
    Encoding: 1001 010k kkkk 110k kkkk kkkk kkkk kkkk
IJMP
    Operation: PC ← Z

Branch Instruction Addressing Modes: AVR Examples
BRGE k
    Operation: If Rd ≥ Rr (signed) then PC ← PC + k + 1, else PC ← PC + 1
    Operand: -64 ≤ k ≤ +63
    Encoding: 1111 01kk kkkk k100
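A small Python sketch of the PC-relative rule for BRGE follows. It assumes, as in the AVR Instruction Set, that BRGE branches when the S flag (N xor V) is 0; the function name brge_next_pc and its arguments are hypothetical.

```python
def brge_next_pc(pc: int, k: int, n: int, v: int) -> int:
    """Return the next PC for 'brge k' given the N and V flags of the prior compare."""
    if not -64 <= k <= 63:
        raise ValueError("k must fit in a signed 7-bit field")
    taken = (n ^ v) == 0               # S = N xor V; S = 0 means Rd >= Rr (signed)
    return pc + k + 1 if taken else pc + 1


if __name__ == "__main__":
    print(brge_next_pc(100, -3, n=0, v=0))   # branch taken, jumps backwards
    print(brge_next_pc(100, -3, n=1, v=0))   # branch not taken, falls through
```

The "+1" appears in both cases because the PC has already advanced past the branch instruction when the offset is applied.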

Reading Material Chapter 4 in Microcontrollers and Microcomputers. COMP3221: Microprocessors and Embedded Systems--Lecture 6

Lecture 7: Arithmetic and logic Instructions Lecturer: Hui Wu Session 2, 2005 COMP3221: Microprocessors and Embedded Systems--Lecture 7

Overview Arithmetic and Logic Instructions in AVR Sample AVR Assembly Programs Using AL instructions COMP3221: Microprocessors and Embedded Systems--Lecture 7

AVR Instruction Overview
Load/store architecture
At most two operands in each instruction
Most instructions are two bytes long
Some instructions are 4 bytes long
Four categories:
- Arithmetic and logic instructions
- Program control instructions
- Data transfer instructions
- Bit and bit test instructions

General-Purpose Registers in AVR
Named r0, r1, …, r31 in AVR assembly language.
Broken into two parts with 16 registers each: r0 to r15 and r16 to r31.
Each register is also assigned a memory address in SRAM space.
Registers r0 and r26 through r31 have additional functions.
- r0 is used in the instruction LPM (load program memory).
- Registers x (r27 : r26), y (r29 : r28) and z (r31 : r30) are used as pointer registers.
Most instructions that operate on the registers have direct, single-cycle access to all general registers.
Some instructions, such as sbci, subi, cpi, andi, ori and ldi, operate only on a subset of registers.

General-Purpose Registers in AVR (Cont.)
Address   Register
0x00      r0
0x01      r1
…
0x1A      r26   x register low byte
0x1B      r27   x register high byte
0x1C      r28   y register low byte
0x1D      r29   y register high byte
0x1E      r30   z register low byte
0x1F      r31   z register high byte

The Status Register in AVR
The Status Register (SREG) contains information about the result of the most recently executed arithmetic instruction. This information can be used for altering program flow in order to perform conditional operations. SREG is updated after all ALU operations. SREG is not automatically stored when entering an interrupt routine and restored when returning from an interrupt. This must be handled by software. COMP3221: Microprocessors and Embedded Systems--Lecture 7

The Status Register in AVR (Cont.)
SREG bits, from bit 7 down to bit 0: I T H S V N Z C
Bit 7 - I: Global Interrupt Enable
Used to enable and disable interrupts. 1: enabled. 0: disabled.
The I-bit is cleared by hardware after an interrupt has occurred, and is set by the RETI instruction to enable subsequent interrupts.

Bit 6 - T: Bit Copy Storage
The Bit Copy instructions BLD (Bit LoaD) and BST (Bit STore) use the T-bit as source or destination for the operated bit.
A bit from a register in the Register File can be copied into T by the BST instruction, and a bit in T can be copied into a bit in a register in the Register File by the BLD instruction.

Bit 5 - H: Half Carry Flag
The Half Carry Flag H indicates a half carry (a carry from bit 3 into bit 4) in some arithmetic operations.
Half Carry is useful in BCD arithmetic.

Bit 4 - S: Sign Bit
Exclusive OR between the Negative Flag N and the Two's Complement Overflow Flag V (S = N ⊕ V).
Bit 3 - V: Two's Complement Overflow Flag
The Two's Complement Overflow Flag V supports two's complement arithmetic.

Bit 2 - N: Negative Flag
N is the most significant bit of the result.
Bit 1 - Z: Zero Flag
Z indicates a zero result in an arithmetic or logic operation. 1: zero. 0: non-zero.

Bit 0 - C: Carry Flag
Its meaning depends on the operation.
For addition x + y, it is the carry from the most significant bit. In other words,
    C = Rd7 • Rr7 + Rr7 • NOT(R7) + NOT(R7) • Rd7
where Rd7 is bit 7 of x, Rr7 is bit 7 of y, R7 is bit 7 of x + y, • is the logical AND and + is the logical OR.
For subtraction x - y, where x and y are unsigned integers, it indicates whether x < y: if x < y then C = 1; otherwise C = 0. In other words,
    C = NOT(Rd7) • Rr7 + Rr7 • R7 + R7 • NOT(Rd7)
where R7 is bit 7 of x - y.
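The two Boolean expressions for C can be checked against ordinary integer arithmetic. The Python sketch below evaluates them for 8-bit operands; the helper names are invented for this illustration.

```python
def bit7(x: int) -> int:
    """Return bit 7 of an 8-bit value."""
    return (x >> 7) & 1


def add_carry(x: int, y: int) -> int:
    """C after x + y: Rd7*Rr7 + Rr7*NOT(R7) + NOT(R7)*Rd7."""
    rd7, rr7, r7 = bit7(x), bit7(y), bit7((x + y) & 0xFF)
    return (rd7 & rr7) | (rr7 & (1 - r7)) | ((1 - r7) & rd7)


def sub_carry(x: int, y: int) -> int:
    """C after x - y: NOT(Rd7)*Rr7 + Rr7*R7 + R7*NOT(Rd7)."""
    rd7, rr7, r7 = bit7(x), bit7(y), bit7((x - y) & 0xFF)
    return ((1 - rd7) & rr7) | (rr7 & r7) | (r7 & (1 - rd7))


if __name__ == "__main__":
    print(add_carry(200, 100), sub_carry(5, 9))
```

Looping over all 256 x 256 operand pairs confirms that the first expression equals the carry out of bit 7 and the second equals 1 exactly when x < y (unsigned).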

Selected Arithmetic and Logic Instructions
add, adc, inc
sub, sbc, dec
mul, muls, mulsu
and, or, eor
clr, cbr, cp, cpc, cpi, tst
com, neg
Refer to the main textbook (pages 63-67) and the AVR Instruction Set for the complete list of AL instructions.

Add without Carry
Syntax: add Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rd + Rr
Flags affected: H, S, V, N, Z, C
Encoding: 0000 11rd dddd rrrr
Words: 1
Cycles: 1
Example:
        add r1, r2      ; Add r2 to r1
        add r28, r28    ; Add r28 to itself

Add with Carry
Syntax: adc Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rd + Rr + C
Flags affected: H, S, V, N, Z, C
Encoding: 0001 11rd dddd rrrr
Words: 1
Cycles: 1
Example: Add r1 : r0 to r3 : r2
        add r2, r0      ; Add low byte
        adc r3, r1      ; Add high byte with carry
Comments: adc is used in multi-byte addition.
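The add/adc pair can be modelled in a few lines of Python to show how the carry links the two byte additions. This is only a sketch; add16 is a hypothetical helper and returns the final carry alongside the two result bytes.

```python
def add16(hi_d: int, lo_d: int, hi_r: int, lo_r: int):
    """Model 'add lo_d, lo_r' followed by 'adc hi_d, hi_r' on 8-bit registers.

    Returns (high byte, low byte, final carry).
    """
    lo_sum = lo_d + lo_r
    carry = lo_sum >> 8                # C flag produced by add
    hi_sum = hi_d + hi_r + carry       # adc adds the carry into the high byte
    return hi_sum & 0xFF, lo_sum & 0xFF, hi_sum >> 8


if __name__ == "__main__":
    print(add16(0x12, 0xFF, 0x00, 0x01))   # 0x12FF + 0x0001
```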

Increment
Syntax: inc Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: Rd ← Rd + 1
Flags affected: S, V, N, Z
Encoding: 1001 010d dddd 0011
Words: 1
Cycles: 1
Example:
        clr r22         ; Clear r22
loop:   inc r22         ; Increment r22
        cpi r22, $4F    ; Compare r22 to $4F
        brne loop       ; Branch to loop if not equal
        nop             ; Continue (do nothing)

Subtract without Carry
Syntax: sub Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rd - Rr
Flags affected: H, S, V, N, Z, C
Encoding: 0001 10rd dddd rrrr
Words: 1
Cycles: 1
Example:
        sub r13, r12    ; Subtract r12 from r13

Subtract with Carry
Syntax: sbc Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rd - Rr - C
Flags affected: H, S, V, N, Z, C
Encoding: 0000 10rd dddd rrrr
Words: 1
Cycles: 1
Example: Subtract r1 : r0 from r3 : r2
        sub r2, r0      ; Subtract low byte
        sbc r3, r1      ; Subtract high byte with carry
Comments: sbc is used in multi-byte subtraction.

Decrement
Syntax: dec Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: Rd ← Rd - 1
Flags affected: S, V, N, Z
Encoding: 1001 010d dddd 1010
Words: 1
Cycles: 1
Example:
        ldi r17, $10    ; Load constant in r17
loop:   add r1, r2      ; Add r2 to r1
        dec r17         ; Decrement r17
        brne loop       ; Branch to loop if r17 ≠ 0
        nop             ; Continue (do nothing)

Multiply Unsigned
Syntax: mul Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: r1 : r0 ← Rr × Rd (unsigned ← unsigned × unsigned)
Flags affected: Z, C
Encoding: 1001 11rd dddd rrrr
Words: 1
Cycles: 2
Example:
        mul r6, r5      ; Multiply r6 and r5
        mov r6, r1
        mov r5, r0      ; Copy result back in r6 : r5

Multiply Signed
Syntax: muls Rd, Rr
Operands: Rd, Rr ∈ {r16, r17, …, r31}
Operation: r1 : r0 ← Rr × Rd (signed ← signed × signed)
Flags affected: Z, C
Encoding: 0000 0010 dddd rrrr
Words: 1
Cycles: 2
Example:
        muls r17, r16           ; Multiply r17 and r16
        movw r17:r16, r1:r0     ; Copy result back to r17 : r16

Multiply Signed with Unsigned
Syntax: mulsu Rd, Rr
Operands: Rd, Rr ∈ {r16, r17, …, r23}
Operation: r1 : r0 ← Rr × Rd (signed ← signed × unsigned)
Flags affected: Z, C
C is set if bit 15 of the result is set; cleared otherwise.
Encoding: 0000 0011 0ddd 0rrr
Words: 1
Cycles: 2

Multiply Signed with Unsigned (Cont.)
Example: signed multiply of two 16-bit numbers stored in r23:r22 and r21:r20, with the 32-bit result stored in r19:r18:r17:r16.
How to do it? Let ah and al be the high byte and low byte, respectively, of the multiplicand, and bh and bl the high byte and low byte, respectively, of the multiplier. Then
    ah:al × bh:bl = (ah × 2^8 + al) × (bh × 2^8 + bl)
                  = ah × bh × 2^16 + al × bh × 2^8 + ah × bl × 2^8 + al × bl

Multiply Signed with Unsigned (Cont.)
Example: signed multiply of two 16-bit numbers stored in r23:r22 and r21:r20, with the 32-bit result stored in r19:r18:r17:r16.
muls16x16_32:
        clr r2
        muls r23, r21           ; (signed) ah * (signed) bh
        movw r19:r18, r1:r0
        mul r22, r20            ; (unsigned) al * (unsigned) bl
        movw r17:r16, r1:r0
        mulsu r23, r20          ; (signed) ah * (unsigned) bl
        sbc r19, r2             ; Trick here (Hint: what does the carry mean here?)
        add r17, r0
        adc r18, r1
        adc r19, r2
        mulsu r21, r22          ; (signed) bh * (unsigned) al
        sbc r19, r2             ; Trick here
        add r17, r0
        adc r18, r1
        adc r19, r2
        ret
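The decomposition behind this routine can be checked with a Python model that forms the same four partial products. The function names are hypothetical, and Python's unbounded signed integers hide the register-level detail: in the assembly, mulsu leaves bit 15 of the partial product in C, so "sbc r19, r2" with r2 = 0 subtracts 1 from the top byte exactly when the partial product is negative, which appears to be how the routine sign-extends it.

```python
def to_signed8(byte: int) -> int:
    """Interpret an 8-bit value as a two's complement number."""
    return byte - 256 if byte & 0x80 else byte


def muls16x16_32(a: int, b: int) -> int:
    """Signed 16 x 16 -> 32-bit multiply built from 8-bit partial products.

    a and b are 16-bit two's complement bit patterns (0..0xFFFF).
    Returns the 32-bit two's complement bit pattern of the product.
    """
    ah, al = (a >> 8) & 0xFF, a & 0xFF
    bh, bl = (b >> 8) & 0xFF, b & 0xFF
    result = (to_signed8(ah) * to_signed8(bh)) << 16   # muls:  signed * signed
    result += al * bl                                   # mul:   unsigned * unsigned
    result += (to_signed8(ah) * bl) << 8                # mulsu: signed * unsigned
    result += (to_signed8(bh) * al) << 8                # mulsu: signed * unsigned
    return result & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(muls16x16_32(-3 & 0xFFFF, 1234)))
```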

Lower-Case to Upper-Case
.include "m64def.inc"
.equ size = 5
.def counter = r17
.dseg
.org 0x100                              ; Set the starting address of data segment to 0x100
Cap_string: .byte 5
.cseg
Low_string: .db "hello"
        ldi zl, low(Low_string<<1)      ; Get the low byte of the address of "h"
        ldi zh, high(Low_string<<1)     ; Get the high byte of the address of "h"
        ldi yh, high(Cap_string)
        ldi yl, low(Cap_string)
        clr counter                     ; counter = 0

Lower-Case to Upper-Case (Cont.)
main:   lpm r20, z+         ; Load a letter from flash memory
        subi r20, 32        ; Convert it to the capital letter
        st y+, r20          ; Store the capital letter in SRAM
        inc counter
        cpi counter, size
        brlt main
loop:   nop
        rjmp loop
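The conversion relies on each lower-case ASCII letter being exactly 32 above its capital. A short Python sketch of the same loop, assuming (as the assembly does) that the input holds only lower-case letters; the function name is illustrative:

```python
def to_upper(lower: bytes) -> bytes:
    """Subtract 32 from every byte, mirroring the lpm / subi / st loop."""
    out = bytearray()
    for letter in lower:               # lpm r20, z+
        out.append(letter - 32)        # subi r20, 32 ; st y+, r20
    return bytes(out)


if __name__ == "__main__":
    print(to_upper(b"hello"))
```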

Bitwise AND
Syntax: and Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rr • Rd (bitwise AND of Rr and Rd)
Flags affected: S, V, N, Z
Encoding: 0010 00rd dddd rrrr
Words: 1
Cycles: 1
Example: ldi r2, 0b ldi r16, 1 and r2, r16 ; r2=0b

Bitwise OR
Syntax: or Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rr v Rd (bitwise OR of Rr and Rd)
Flags affected: S, V, N, Z
Encoding: 0010 10rd dddd rrrr
Words: 1
Cycles: 1
Example: ldi r15, 0b ldi r16, 0b or r15, r16 ; Do bitwise or between registers ; r15=0b

Bitwise Exclusive-OR
Syntax: eor Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rr ⊕ Rd (bitwise exclusive OR of Rr and Rd)
Flags affected: S, V, N, Z
Encoding: 0010 01rd dddd rrrr
Words: 1
Cycles: 1
Example:
        eor r4, r4      ; Clear r4
        eor r0, r22     ; Bitwise exclusive or between r0 and r22
                        ; If r0=0b and r22=0b
                        ; then r0=0b

Clear Bits in Register
Syntax: cbr Rd, k
Operands: Rd ∈ {r16, r17, …, r31} and 0 ≤ k ≤ 255
Operation: Rd ← Rd • ($FF - k) (clear the bits specified by k)
Flags affected: S, V, N, Z
Encoding: 0111 wwww dddd wwww (wwwwwwww = $FF - k)
Words: 1
Cycles: 1
Example:
        cbr r16, 3      ; Clear bits 0 and 1 of r16

Compare
Syntax: cp Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd - Rr (Rd is not changed)
Flags affected: H, S, V, N, Z, C
Encoding: 0001 01rd dddd rrrr
Words: 1
Cycles: 1
Example:
        cp r4, r5       ; Compare r4 with r5
        brne noteq      ; Branch if r4 ≠ r5
        ...
noteq:  nop             ; Branch destination (do nothing)

Compare with Carry
Syntax: cpc Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd - Rr - C (Rd is not changed)
Flags affected: H, S, V, N, Z, C
Encoding: 0000 01rd dddd rrrr
Words: 1
Cycles: 1
Example: Compare r3 : r2 with r1 : r0
        cp r2, r0       ; Compare low byte
        cpc r3, r1      ; Compare high byte
        brne noteq      ; Branch if not equal
        ...
noteq:  nop             ; Branch destination (do nothing)

Compare with Immediate
Syntax: cpi Rd, k
Operands: Rd ∈ {r16, r17, …, r31} and 0 ≤ k ≤ 255
Operation: Rd - k (Rd is not changed)
Flags affected: H, S, V, N, Z, C
Encoding: 0011 kkkk dddd kkkk
Words: 1
Cycles: 1
Example:
        cpi r19, 30     ; Compare r19 with 30
        brne noteq      ; Branch if r19 ≠ 30
        ...
noteq:  nop             ; Branch destination (do nothing)

Test for Zero or Minus
Syntax: tst Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: Rd ← Rd • Rd
Flags affected: S, V, N, Z
Encoding: 0010 00dd dddd dddd
Words: 1
Cycles: 1
Example:
        tst r0          ; Test r0
        breq zero       ; Branch if r0 = 0
        ...
zero:   nop             ; Branch destination (do nothing)

One's Complement
Syntax: com Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: Rd ← $FF - Rd
Flags affected: S, V, N, Z, C
Encoding: 1001 010d dddd 0000
Words: 1
Cycles: 1
Example:
        com r4          ; Take one's complement of r4
        breq zero       ; Branch if zero
        ...
zero:   nop             ; Branch destination (do nothing)

Two's Complement
Syntax: neg Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: Rd ← $00 - Rd (the value $80 is left unchanged)
Flags affected: H, S, V, N, Z, C
H: R3 + Rd3. Set if there is a borrow from bit 3; cleared otherwise.
Encoding: 1001 010d dddd 0001
Words: 1
Cycles: 1
Example:
        sub r11, r0     ; Subtract r0 from r11
        brpl positive   ; Branch if result positive
        neg r11         ; Take two's complement of r11
positive: nop           ; Branch destination (do nothing)

Reading Material AVR Instruction Set. COMP3221: Microprocessors and Embedded Systems--Lecture 7

Lecture 8: Program Control Instructions Lecturer: Hui Wu Session 2, 2005 COMP3221: Microprocessors and Embedded Systems--Lecture 8

Overview Program control instructions in AVR Stacks Sample AVR assembly programs using program control instructions COMP3221: Microprocessors and Embedded Systems--Lecture 8

Motivations Arithmetic and logic instructions cannot change the program control flow. How to implement “ if some condition holds then do task A else do task B”? How to call a subroutine? How to return to the caller from a function (subroutine)? How to return from an interrupt handler? COMP3221: Microprocessors and Embedded Systems--Lecture 8

Selected Program Control Instructions
Unconditional jump: jmp, rjmp, ijmp Subroutine call: rcall, icall, call Subroutine and interrupt return: ret, reti Conditional branching: breq, brne, brsh, brlo, brge, brlt, brvs, brvc, brie, brid Refer to the main textbook and AVR Instruction Set for a complete list. COMP3221: Microprocessors and Embedded Systems--Lecture 8

Jump
Syntax: jmp k
Operands: 0 ≤ k < 4M
Operation: PC ← k
Flag affected: None
Encoding: 1001 010k kkkk 110k kkkk kkkk kkkk kkkk
Words: 2
Cycles: 3
Example:
        mov r1, r0      ; Copy r0 to r1
        jmp farplc      ; Unconditional jump
        ...
farplc: inc r20         ; Jump destination

Relative Jump
Syntax: rjmp k
Operands: -2K ≤ k < 2K
Operation: PC ← PC + k + 1
Flag affected: None
Encoding: 1100 kkkk kkkk kkkk
Words: 1
Cycles: 2
Example:
        cpi r16, $42    ; Compare r16 to $42
        brne error      ; Branch to error if r16 ≠ $42
        rjmp ok         ; Jump to ok
error:  add r16, r17    ; Add r17 to r16
        inc r16         ; Increment r16
ok:     mov r2, r20     ; Jump destination

Indirect Jump
Syntax: ijmp
Operation:
(i) PC ← Z(15:0) for devices with a 16-bit PC (128K bytes program memory maximum).
(ii) PC(15:0) ← Z(15:0) and PC(21:16) ← 0 for devices with a 22-bit PC (8M bytes program memory maximum).
Flag affected: None
Encoding: 1001 0100 0000 1001
Words: 1
Cycles: 2

Indirect Jump (Cont.)
Example:
        clr r10                 ; Clear r10
        ldi r20,                ; Load jump table offset
        ldi r30, low(Lab<<1)    ; Low byte of the starting address (base) of jump table
        ldi r31, high(Lab<<1)   ; High byte of the starting address (base) of jump table
        add r30, r20
        adc r31, r10            ; Base + offset is the address of the jump table entry
        lpm r0, Z+              ; Load low byte of the jump table entry
        lpm r1, Z               ; Load high byte of the jump table entry
        movw r31:r30, r1:r0     ; Set the pointer register Z to point to the target instruction
        ijmp                    ; Jump to the target instruction
        …
Lab:    .dw jt_l0               ; The first entry of the jump table
        .dw jt_l1               ; The second entry of the jump table
jt_l0:  nop
jt_l1:  nop

Stacks A stack is an area of memory that supports two operations push – put something on the top of the stack pop – take something off the top of the stack (LIFO – last in, first out) Every processor has a stack of some kind Used for procedure calls (or subroutines) and interrupts Used to store local variables in C Special register called a Stack Pointer (SP) stores the address of the top of the stack COMP3221: Microprocessors and Embedded Systems--Lecture 8

Stacks (Cont.) A stack will grow after push is executed. A stack will shrink after pop is executed. A stack may grow upwards (from a lower address to a higher address) or downwards (from a higher address to a lower address). The direction in which a stack grows is determined by the hardware. COMP3221: Microprocessors and Embedded Systems--Lecture 8

AVR and Stacks Stacks are part of SRAM space. Stacks grow downwards (from a higher address to a lower address). SP needs to hold addresses (therefore 16 bits wide). Made up of two 8 bit registers SPH (high byte) (IO register $3E) SPL (low byte) (IO register $3D) First thing to do in any program is to initialize the stack pointer. Typically stacks use the top of SRAM space. COMP3221: Microprocessors and Embedded Systems--Lecture 8

AVR Stack Initialization
.include "m64def.inc"
.def temp = r20
.cseg
        ldi temp, low(RAMEND)
        out spl, temp
        ldi temp, high(RAMEND)
        out sph, temp
After initialization, SP points to RAMEND, the highest SRAM address.

AVR Stack Operations
.include "m64def.inc"
.def temp = r20
.cseg
        ldi temp, low(RAMEND)
        out spl, temp
        ldi temp, high(RAMEND)
        out sph, temp
        ldi temp, 0xff
        mov r1, temp            ; ldi works only on r16 to r31, so r1 is loaded via temp
        push r1
After the push, 0xff is stored at address RAMEND and SP points to RAMEND - 1.

AVR Stack Operations (Cont.)
.include "m64def.inc"
.def temp = r20
.cseg
        ldi temp, low(RAMEND)
        out spl, temp
        ldi temp, high(RAMEND)
        out sph, temp
        ldi temp, 0xff
        mov r1, temp            ; ldi works only on r16 to r31, so r1 is loaded via temp
        push r1
        pop r2                  ; r2 = 0xff
After the pop, SP points to RAMEND again; the value 0xff is still in memory at RAMEND.
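The push and pop behaviour shown on these slides can be sketched as a tiny Python model of a downward-growing stack. The class name is hypothetical, and the value 0x10FF used for RAMEND is an assumption about the ATmega64 memory map rather than something stated in the lecture.

```python
class AvrStackModel:
    """Downward-growing byte stack: push stores then decrements SP, pop increments then loads."""

    def __init__(self, ramend: int):
        self.sram = {}
        self.sp = ramend               # SP initialised to the top of SRAM

    def push(self, value: int) -> None:
        self.sram[self.sp] = value & 0xFF
        self.sp -= 1                   # stack grows towards lower addresses

    def pop(self) -> int:
        self.sp += 1
        return self.sram[self.sp]


if __name__ == "__main__":
    RAMEND = 0x10FF                    # assumed value for illustration
    stack = AvrStackModel(RAMEND)
    stack.push(0xFF)
    print(hex(stack.sp), hex(stack.pop()), hex(stack.sp))
```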

Relative Call to Subroutine
Syntax: rcall k
Operands: -2K ≤ k < 2K
Operation:
(i) STACK ← PC + 1 (store return address)
(ii) SP ← SP - 2 (2 bytes, 16 bits) for devices with a 16-bit PC
     SP ← SP - 3 (3 bytes, 22 bits) for devices with a 22-bit PC
(iii) PC ← PC + k + 1
Flag affected: None
Encoding: 1101 kkkk kkkk kkkk
Words: 1
Cycles: 3 (devices with 16-bit PC), 4 (devices with 22-bit PC)

Relative Call to Subroutine (Cont.)
Example:
        rcall routine   ; Call subroutine
        ...
routine: push r14       ; Save r14 on the stack
        push r15        ; Save r15 on the stack
        ...             ; Put the code for the subroutine here
        pop r15         ; Restore r15
        pop r14         ; Restore r14
        ret             ; Return from subroutine

Indirect Call to Subroutine
Syntax: icall
Operation:
(i) STACK ← PC + 1 (store return address)
(ii) SP ← SP - 2 (2 bytes, 16 bits) for devices with a 16-bit PC
     SP ← SP - 3 (3 bytes, 22 bits) for devices with a 22-bit PC
(iii) PC(15:0) ← Z(15:0) for devices with a 16-bit PC
      PC(15:0) ← Z(15:0) and PC(21:16) ← 0 for devices with a 22-bit PC
Flag affected: None
Encoding: 1001 0101 0000 1001
Words: 1
Cycles: 3 (devices with 16-bit PC), 4 (devices with 22-bit PC)

Indirect Call to Subroutine (Cont.)
Example:
        clr r10                 ; Clear r10
        ldi r20,                ; Load call table offset
        ldi r30, low(Lab<<1)    ; Low byte of the starting address (base) of call table
        ldi r31, high(Lab<<1)   ; High byte of the starting address (base) of call table
        add r30, r20
        adc r31, r10            ; Base + offset is the address of the call table entry
        lpm r0, Z+              ; Load low byte of the call table entry
        lpm r1, Z               ; Load high byte of the call table entry
        movw r31:r30, r1:r0     ; Set the pointer register Z to point to the target function
        icall                   ; Call the target function
        …
Lab:    .dw ct_l0               ; The first entry of the call table
        .dw ct_l1               ; The second entry of the call table
ct_l0:  nop
ct_l1:  nop

Long Call to Subroutine
Syntax: call k
Operands: 0 ≤ k < 64K (devices with 16-bit PC), 0 ≤ k < 4M (devices with 22-bit PC)
Operation:
(i) STACK ← PC + 2 (store return address; call is two words long)
(ii) SP ← SP - 2 (2 bytes, 16 bits) for devices with a 16-bit PC
     SP ← SP - 3 (3 bytes, 22 bits) for devices with a 22-bit PC
(iii) PC ← k
Flag affected: None
Encoding: 1001 010k kkkk 111k kkkk kkkk kkkk kkkk
Words: 2
Cycles: 4 (devices with 16-bit PC), 5 (devices with 22-bit PC)

Long Call to Subroutine (Cont.)
Example:
        mov r16, r0     ; Copy r0 to r16
        call check      ; Call subroutine
        nop             ; Continue (do nothing)
        ...
check:  cpi r16, $42    ; Check if r16 has a special value
        breq error      ; Branch if equal
        …
error:  ldi r17, 1      ; ldi works only on r16 to r31
        …               ; Put the code for handling the error here
        ret             ; Return from subroutine

Return from Subroutine
Syntax: ret
Operation:
(i) SP ← SP + 2 (2 bytes, 16 bits) for devices with a 16-bit PC
    SP ← SP + 3 (3 bytes, 22 bits) for devices with a 22-bit PC
(ii) PC(15:0) ← STACK for devices with a 16-bit PC
     PC(21:0) ← STACK for devices with a 22-bit PC
Flag affected: None
Encoding: 1001 0101 0000 1000
Words: 1
Cycles: 4 (devices with 16-bit PC), 5 (devices with 22-bit PC)
Example:
routine: push r14       ; Save r14 on the stack
        ...             ; Put the code for the subroutine here
        pop r14         ; Restore r14
        ret             ; Return from subroutine
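How a call and its matching return use the stack can be sketched in Python for a device with a 16-bit PC. The function names are hypothetical, and the order in which the two return-address bytes are stored is an assumption made for the model; only the net effect (SP moves by 2, PC + 1 is recovered) comes from the slides.

```python
def rcall(pc: int, sp: int, k: int, sram: dict):
    """Model 'rcall k': save PC + 1 on the stack, then jump. Returns (new_pc, new_sp)."""
    return_address = (pc + 1) & 0xFFFF
    sram[sp] = return_address & 0xFF         # assumed: low byte stored first
    sram[sp - 1] = return_address >> 8       # assumed: high byte stored second
    return (pc + k + 1) & 0xFFFF, sp - 2


def ret(sp: int, sram: dict):
    """Model 'ret': restore the saved PC from the stack. Returns (new_pc, new_sp)."""
    high = sram[sp + 1]
    low = sram[sp + 2]
    return (high << 8) | low, sp + 2


if __name__ == "__main__":
    memory = {}
    pc, sp = rcall(0x0100, 0x10FF, 0x20, memory)
    print(hex(pc), hex(sp))
    print(tuple(hex(v) for v in ret(sp, memory)))
```

Because ret blindly pops whatever is on top of the stack, every push inside a subroutine has to be matched by a pop before ret, as in the routine example above.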

Return from Interrupt
Syntax: reti
Operation:
(i) SP ← SP + 2 (2 bytes, 16 bits) for devices with a 16-bit PC
    SP ← SP + 3 (3 bytes, 22 bits) for devices with a 22-bit PC
(ii) Set the global interrupt flag I (bit 7 of the Status Register).
(iii) PC(15:0) ← STACK for devices with a 16-bit PC
      PC(21:0) ← STACK for devices with a 22-bit PC
Flag affected: I ← 1
Encoding: 1001 0101 0001 1000
Words: 1
Cycles: 4 (devices with 16-bit PC), 5 (devices with 22-bit PC)

Return from Interrupt (Cont.)
Example:
        ...
extint: push r0         ; Save r0 on the stack
        pop r0          ; Restore r0
        reti            ; Return and enable interrupts
Details will be covered later.

Branch If Equal
Syntax: breq k
Operands: -64 ≤ k ≤ 63
Operation: If Rd = Rr (Z = 1) then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 00kk kkkk k001
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        cp r1, r0       ; Compare registers r1 and r0
        breq equal      ; Branch if registers equal
        ...
equal:  nop             ; Branch destination (do nothing)

Branch If Same or Higher (Unsigned)
Syntax: brsh k
Operands: -64 ≤ k ≤ 63
Operation: If Rd ≥ Rr (unsigned comparison) then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 01kk kkkk k000
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        subi r26, $56   ; Subtract $56 from r26
        brsh test       ; Branch if r26 ≥ $56
        ...
test:   nop             ; Branch destination
        …

Branch If Lower (Unsigned)
Syntax: brlo k
Operands: -64 ≤ k ≤ 63
Operation: If Rd < Rr (unsigned comparison) then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 00kk kkkk k000
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        eor r19, r19    ; Clear r19
loop:   inc r19         ; Increase r19
        ...
        cpi r19, $10    ; Compare r19 with $10
        brlo loop       ; Branch if r19 < $10 (unsigned)
        nop             ; Exit from loop (do nothing)

Branch If Less Than (Signed)
Syntax: brlt k
Operands: -64 ≤ k ≤ 63
Operation: If Rd < Rr (signed comparison) then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 00kk kkkk k100
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        cp r16, r1      ; Compare r16 to r1
        brlt less       ; Branch if r16 < r1 (signed)
        ...
less:   nop             ; Branch destination (do nothing)

Branch If Greater or Equal (Signed)
Syntax: brge k
Operands: -64 ≤ k ≤ 63
Operation: If Rd ≥ Rr (signed comparison) then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 01kk kkkk k100
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        cp r11, r12     ; Compare registers r11 and r12
        brge greateq    ; Branch if r11 ≥ r12 (signed)
        ...
greateq: nop            ; Branch destination (do nothing)
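The unsigned branches (brlo, brsh) test the C flag while the signed ones (brlt, brge) test S = N xor V, so the same compare can send them different ways. The following Python sketch computes the flags a cp would produce and evaluates the four conditions; all names are illustrative.

```python
def cp_flags(rd: int, rr: int) -> dict:
    """Flags after 'cp Rd, Rr' on 8-bit values (the result itself is discarded)."""
    result = (rd - rr) & 0xFF
    n = result >> 7
    # Overflow: operands have different signs and the result's sign differs from Rd's.
    v = (((rd ^ rr) & (rd ^ result)) >> 7) & 1
    return {"C": int(rd < rr), "Z": int(result == 0), "N": n, "V": v, "S": n ^ v}


def brlo_taken(rd: int, rr: int) -> bool:
    return cp_flags(rd, rr)["C"] == 1      # unsigned lower


def brsh_taken(rd: int, rr: int) -> bool:
    return cp_flags(rd, rr)["C"] == 0      # unsigned same or higher


def brlt_taken(rd: int, rr: int) -> bool:
    return cp_flags(rd, rr)["S"] == 1      # signed less than


def brge_taken(rd: int, rr: int) -> bool:
    return cp_flags(rd, rr)["S"] == 0      # signed greater or equal


def signed8(x: int) -> int:
    """Interpret an 8-bit pattern as a two's complement number."""
    return x - 256 if x & 0x80 else x


if __name__ == "__main__":
    # 0x80 is 128 unsigned but -128 signed, so the two families disagree.
    print(brlo_taken(0x80, 0x01), brlt_taken(0x80, 0x01))
```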

Branch If Overflow Set
Syntax: brvs k
Operands: -64 ≤ k ≤ 63
Operation: If V = 1 then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 00kk kkkk k011
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        add r3, r4      ; Add r4 to r3
        brvs overfl     ; Branch if overflow
        ...
overfl: nop             ; Branch destination (do nothing)

Branch If Overflow Clear
Syntax: brvc k
Operands: -64 ≤ k ≤ 63
Operation: If V = 0 then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 01kk kkkk k011
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        add r3, r4      ; Add r4 to r3
        brvc noover     ; Branch if no overflow
        ...
noover: nop             ; Branch destination (do nothing)

Branch if Global Interrupt is Enabled
Syntax: brie k
Operands: -64 ≤ k ≤ 63
Operation: If I = 1 then PC ← PC + k + 1, else PC ← PC + 1
Flag affected: None
Encoding: 1111 00kk kkkk k111
Words: 1
Cycles: 1 if condition is false, 2 if condition is true
Example:
        brie inten      ; Branch if the global interrupt is enabled
        ...
inten:  nop             ; Branch destination (do nothing)

Branch if Global Interrupt is Disabled
Syntax: brid k
Operands: -64 ≤ k ≤ 63
Operation: If I=0 then PC ← PC + k + 1, else PC ← PC + 1
Flags affected: None
Encoding: 1111 01kk kkkk k111
Words: 1
Cycles: 1 if the condition is false, 2 if the condition is true
Example:
    brid intdis         ; Branch if the global interrupt is disabled
    ...
    intdis: nop         ; Branch destination (do nothing)

Reading Material
AVR Instruction Set.

Lecture 9: Data Transfer Instructions Lecturer: Hui Wu Session 2, 2005 COMP3221: Microprocessors and Embedded Systems--Lecture 9

Overview
Data transfer instructions in AVR
Sample AVR assembly programs using data transfer instructions

Motivations
How to transfer data between two registers?
How to transfer data between memory and a register?
How to transfer a constant to a register?

Selected Data Transfer Instructions
mov, movw
ldi, ld, ldd, lds
st, sts
lpm
in, out
push, pop
Refer to the main textbook and the AVR Instruction Set for a complete list.

Copy Register
Syntax: mov Rd, Rr
Operands: Rd, Rr ∈ {r0, r1, …, r31}
Operation: Rd ← Rr
Flags affected: None
Encoding: 0010 11rd dddd rrrr
Words: 1
Cycles: 1
Example:
    mov r1, r0          ; Copy r0 to r1

Copy Register Pair
Syntax: movw Rd+1:Rd, Rr+1:Rr
Operands: d, r ∈ {0, 2, …, 28, 30}
Operation: Rd+1:Rd ← Rr+1:Rr
Flags affected: None
Encoding: 0000 0001 dddd rrrr
Words: 1
Cycles: 1
Example:
    movw r21:r20, r1:r0 ; Copy r1:r0 to r21:r20

Load Immediate
Syntax: ldi Rd, k
Operands: Rd ∈ {r16, r17, …, r31} and 0 ≤ k ≤ 255
Operation: Rd ← k
Flags affected: None
Encoding: 1110 kkkk dddd kkkk
Words: 1
Cycles: 1
Example:
    ldi r16, $42        ; Load $42 to r16

Load Indirect
Syntax: ld Rd, v
Operands: Rd ∈ {r0, r1, …, r31} and v ∈ {x, x+, -x, y, y+, -y, z, z+, -z}
Operation:
    (i) Rd ← (v) if v ∈ {x, y, z}
    (ii) x ← x-1 and Rd ← (x) if v = -x
         y ← y-1 and Rd ← (y) if v = -y
         z ← z-1 and Rd ← (z) if v = -z
    (iii) Rd ← (x) and x ← x+1 if v = x+
          Rd ← (y) and y ← y+1 if v = y+
          Rd ← (z) and z ← z+1 if v = z+
Flags affected: None
Encoding: Depends on v. Refer to the AVR Instruction Set for details.
Words: 1
Cycles: 2
Comments: Post-increment and pre-decrement are used to load contiguous data.

Load Indirect (Cont.)
Example: 4-byte integer addition
    .def loop_counter = r20
    .equ loop_bound = 4
    .dseg
    int1: .byte 4               ; Allocate 4 bytes to the first integer
    int2: .byte 4               ; Allocate 4 bytes to the second integer
    int3: .byte 4               ; Allocate 4 bytes to store the result
    .cseg
    ldi r26, low(int1)          ; Load low byte of the address of the 1st int
    ldi r27, high(int1)         ; Load high byte of the address of the 1st int
    ldi r28, low(int2)          ; Y points to the 2nd int
    ldi r29, high(int2)
    ldi r30, low(int3)          ; Z points to the result
    ldi r31, high(int3)
    clr loop_counter            ; loop_counter=0

Load Indirect (Cont.)
Example: 4-byte integer addition.
    ldi r21, loop_bound         ; r21 holds the loop bound for cpse
    clc                         ; No carry into the least significant byte
    loop:
    ld r0, x+                   ; Load the next byte of the 1st int
    ld r1, y+                   ; Load the next byte of the 2nd int
    adc r1, r0                  ; Add two bytes with carry
    st z+, r1                   ; Store the result
    inc loop_counter            ; inc leaves C unchanged
    cpse loop_counter, r21      ; End of loop? cpse changes no flags, whereas cpi would overwrite C
    rjmp loop
    ret
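The byte-by-byte addition in this example can be checked against a small model. The Python sketch below is only an illustration (the function name add_multibyte is hypothetical); it assumes the integers are stored least significant byte first, as the example does, and that the carry starts at 0 so that the first byte is a plain add.

```python
def add_multibyte(a, b):
    """Add two little-endian byte strings of equal length, byte by byte."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of bytes")
    carry = 0                       # plays the role of the C flag
    result = bytearray()
    for x, y in zip(a, b):          # walk both operands like x+ and y+
        total = x + y + carry       # what adc computes
        result.append(total & 0xFF) # the byte stored through z+
        carry = total >> 8          # carry into the next byte
    return bytes(result)
```

A carry out of the most significant byte is dropped, so the result wraps around modulo 2^32 for 4-byte operands.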

Load Indirect with Displacement
Syntax: ldd Rd, v
Operands: Rd ∈ {r0, r1, …, r31} and v ∈ {y+q, z+q}
Operation: Rd ← (v)
Flags affected: None
Encoding: Depends on v. Refer to the AVR Instruction Set for details.
Words: 1
Cycles: 2
Example:
    clr r31             ; Clear Z high byte
    ldi r30, $60        ; Set Z low byte to $60
    ld r0, Z+           ; Load r0 with data space loc. $60 (Z post inc)
    ld r1, Z            ; Load r1 with data space loc. $61
    ldi r30, $63        ; Set Z low byte to $63
    ld r2, Z            ; Load r2 with data space loc. $63
    ld r3, -Z           ; Load r3 with data space loc. $62 (Z pre dec)
    ldd r4, Z+2         ; Load r4 with data space loc. $64
Comments: ldd is used to load an element of a structure.

Store Indirect
Syntax: st v, Rr
Operands: Rr ∈ {r0, r1, …, r31} and v ∈ {x, x+, -x, y, y+, -y, z, z+, -z}
Operation:
    (i) (v) ← Rr if v ∈ {x, y, z}
    (ii) x ← x-1 and (x) ← Rr if v = -x
         y ← y-1 and (y) ← Rr if v = -y
         z ← z-1 and (z) ← Rr if v = -z
    (iii) (x) ← Rr and x ← x+1 if v = x+
          (y) ← Rr and y ← y+1 if v = y+
          (z) ← Rr and z ← z+1 if v = z+
Flags affected: None
Encoding: Depends on v. Refer to the AVR Instruction Set for details.
Words: 1
Cycles: 2
Comments: Post-increment and pre-decrement are used to store contiguous data.

Store Indirect with Displacement
Syntax: std v, Rr
Operands: Rr ∈ {r0, r1, …, r31} and v ∈ {y+q, z+q}
Operation: (v) ← Rr
Flags affected: None
Encoding: Depends on v. Refer to the AVR Instruction Set for details.
Words: 1
Cycles: 2
Example:
    clr r29             ; Clear Y high byte
    ldi r28, $60        ; Set Y low byte to $60
    st Y+, r0           ; Store r0 in data space loc. $60 (Y post inc)
    st Y, r1            ; Store r1 in data space loc. $61
    ldi r28, $63        ; Set Y low byte to $63
    st Y, r2            ; Store r2 in data space loc. $63
    st -Y, r3           ; Store r3 in data space loc. $62 (Y pre dec)
    std Y+2, r4         ; Store r4 in data space loc. $64
Comments: std is used to store an element of a structure.

Load Program Memory
          Syntax          Operands            Operation
    (i)   lpm             None, R0 implied    R0 ← (Z)
    (ii)  lpm Rd, Z       0 ≤ d ≤ 31          Rd ← (Z)
    (iii) lpm Rd, Z+      0 ≤ d ≤ 31          Rd ← (Z), Z ← Z + 1
Flags affected: None
Encoding:
    (i)   1001 0101 1100 1000
    (ii)  1001 000d dddd 0100
    (iii) 1001 000d dddd 0101
Words: 1
Cycles: 3
Comments: Z contains a byte address, while the flash memory uses word addressing. Therefore, the word address of a label must be converted into a byte address before the data in flash memory is accessed.

Load Program Memory (Cont.)
Example:
    ldi zh, high(Table_1<<1)    ; Initialize Z pointer
    ldi zl, low(Table_1<<1)
    lpm r16, z+                 ; r16=0x76
    lpm r17, z                  ; r17=0x58
    ...
    Table_1: .dw 0x5876
    …
Comments: Table_1<<1 converts the word address into a byte address.
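The word-to-byte address conversion can be illustrated with a small model of flash memory. This Python sketch is an assumption-laden illustration: the function lpm and the table location (word address 4) are hypothetical, and flash is modelled as a list of 16-bit words whose low byte sits at the even byte address.

```python
def lpm(flash_words, z):
    """Return the byte at byte address z of a word-addressed flash memory."""
    word = flash_words[z >> 1]          # byte address -> word address
    if z & 1:
        return (word >> 8) & 0xFF       # odd byte address: high byte
    return word & 0xFF                  # even byte address: low byte


# Hypothetical layout: Table_1 sits at word address 4 and holds .dw 0x5876
flash = [0x0000] * 4 + [0x5876]
table_1 = 4
z = table_1 << 1                        # the Table_1<<1 conversion
low_byte = lpm(flash, z)                # 0x76, as in the example comment
high_byte = lpm(flash, z + 1)           # 0x58
```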

Load an I/O Location to Register
Syntax: in Rd, A
Operands: Rd ∈ {r0, r1, …, r31} and 0 ≤ A ≤ 63
Operation: Rd ← I/O(A)
Loads one byte from location A in the I/O space (ports, timers, configuration registers etc.) into register Rd in the register file.
Flags affected: None
Encoding: 1011 0AAd dddd AAAA
Words: 1
Cycles: 1
Example:
    in r25, $16         ; Read Port B
    cpi r25, 4          ; Compare read value to constant
    breq exit           ; Branch if r25=4
    ...
    exit: nop           ; Branch destination (do nothing)

Store Register to an I/O Location
Syntax: out A, Rr
Operands: Rr ∈ {r0, r1, …, r31} and 0 ≤ A ≤ 63
Operation: I/O(A) ← Rr
Stores the byte in register Rr to the I/O location (register) A.
Flags affected: None
Encoding: 1011 1AAr rrrr AAAA
Words: 1
Cycles: 1
Example:
    clr r16             ; Clear r16
    ser r17             ; Set r17 to $ff
    out $18, r16        ; Write zeros to Port B
    nop                 ; Wait (do nothing)
    out $18, r17        ; Write ones to Port B

Push Register on Stack
Syntax: push Rr
Operands: Rr ∈ {r0, r1, …, r31}
Operation: (SP) ← Rr, SP ← SP - 1
Flags affected: None
Encoding: 1001 001d dddd 1111
Words: 1
Cycles: 2
Example:
    call routine        ; Call subroutine
    ...
    routine:
    push r14            ; Save r14 on the stack
    push r13            ; Save r13 on the stack
    ...
    pop r13             ; Restore r13
    pop r14             ; Restore r14
    ret                 ; Return from subroutine

Pop Register from Stack
Syntax: pop Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: SP ← SP + 1, Rd ← (SP)
Flags affected: None
Encoding: 1001 000d dddd 1111
Words: 1
Cycles: 2
Example:
    call routine        ; Call subroutine
    ...
    routine:
    push r14            ; Save r14 on the stack
    push r13            ; Save r13 on the stack
    ...
    pop r13             ; Restore r13
    pop r14             ; Restore r14
    ret                 ; Return from subroutine
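Why registers must be popped in the reverse order of the pushes can be seen in a small stack model. The Python sketch below is illustrative only: the class name Stack is made up, the default RAMEND value 0x10FF is an assumption (it depends on the device), and the model follows the AVR convention that SP is decremented after a push and incremented before a pop.

```python
class Stack:
    """Toy model of the AVR stack, which grows towards lower addresses."""

    def __init__(self, ramend=0x10FF):  # assumed RAMEND, device dependent
        self.sram = {}
        self.sp = ramend

    def push(self, value):
        self.sram[self.sp] = value & 0xFF   # (SP) <- Rr
        self.sp -= 1                        # SP <- SP - 1

    def pop(self):
        self.sp += 1                        # SP <- SP + 1
        return self.sram[self.sp]           # Rd <- (SP)


stack = Stack()
top = stack.sp
stack.push(14)          # stands for the content of r14
stack.push(13)          # stands for the content of r13
first = stack.pop()     # value pushed last comes back first (r13)
second = stack.pop()    # then the earlier one (r14)
```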

Lecture 10: Shift and Bit-set Instructions Lecturer: Hui Wu Session 2, 2005 COMP3221: Microprocessors and Embedded Systems--Lecture 10

Overview
Shift and bit-set instructions in AVR
Sample AVR assembly programs using shift and bit-set instructions

Selected Shift and Bit-set Instructions
Shift instructions: lsl, lsr, rol, ror, asr
Bit-set and MCU control instructions: bset, bclr, sbi, cbi, bst, bld, sex, clx (where x is a flag name), nop, sleep, wdr, break
Refer to the AVR Instruction Set for a complete list.

Logical Shift Left
Syntax: lsl Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: C ← Rd7, Rd7 ← Rd6, Rd6 ← Rd5, …, Rd1 ← Rd0, Rd0 ← 0
Flags affected: H, S, V, N, Z, C
    C ← Rd7: set if, before the shift, the MSB of Rd was set; cleared otherwise.
    N ← R7: set if the MSB of the result is set; cleared otherwise.
    V ← N ⊕ C
    S ← N ⊕ V: for signed tests.
Diagram: C ← [Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0] ← 0

Logical Shift Left (Cont.)
Encoding: 0000 11dd dddd dddd
Words: 1
Cycles: 1
Example:
    add r0, r4          ; Add r4 to r0
    lsl r0              ; Multiply r0 by 2
Comments: This operation effectively multiplies a one-byte signed or unsigned integer by two.

Logical Shift Right
Syntax: lsr Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: C ← Rd0, Rd0 ← Rd1, Rd1 ← Rd2, …, Rd6 ← Rd7, Rd7 ← 0
Flags affected: H, S, V, N, Z, C
Encoding: 1001 010d dddd 0110
Words: 1
Cycles: 1
Example:
    add r0, r4          ; Add r4 to r0
    lsr r0              ; Divide r0 by 2
Comments: This instruction effectively divides an unsigned one-byte integer by two. C stores the remainder.
Diagram: 0 → [Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0] → C

Rotate Left Through Carry
Syntax: rol Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: temp ← C, C ← Rd7, Rd7 ← Rd6, Rd6 ← Rd5, …, Rd1 ← Rd0, Rd0 ← temp
Diagram: C ← [Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0] ← old C

Rotate Left Through Carry (Cont.)
Flags affected: H, S, V, N, Z, C
    C ← Rd7: set if, before the shift, the MSB of Rd was set; cleared otherwise.
    N ← R7: set if the MSB of the result is set; cleared otherwise.
    V ← N ⊕ C
    S ← N ⊕ V: for signed tests.
Encoding: 0001 11dd dddd dddd
Words: 1
Cycles: 1

Rotate Left Through Carry (Cont.)
Example: Assume a 32-bit signed or unsigned integer x is stored in registers r13:r12:r11:r10. The following code computes 2*x.
    lsl r10             ; Shift byte 0 (least significant byte) left
    rol r11             ; Shift byte 1 left through carry
    rol r12             ; Shift byte 2 left through carry
    rol r13             ; Shift byte 3 (most significant byte) left through carry
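How the carry flag links the four one-byte shifts can be traced with a small model. The Python functions below (lsl, rol and shift_left_32 are illustrative names, not AVR code) mirror the operations described on the slides, with the register list given least significant byte first.

```python
def lsl(byte):
    """Logical shift left: returns (result, carry out of bit 7)."""
    return (byte << 1) & 0xFF, (byte >> 7) & 1


def rol(byte, carry_in):
    """Rotate left through carry: old carry enters bit 0, bit 7 leaves as carry."""
    return ((byte << 1) | carry_in) & 0xFF, (byte >> 7) & 1


def shift_left_32(regs):
    """Double a 32-bit value held in four bytes, least significant byte first."""
    result = []
    byte, carry = lsl(regs[0])          # byte 0: lsl
    result.append(byte)
    for reg in regs[1:]:                # bytes 1 to 3: rol
        byte, carry = rol(reg, carry)
        result.append(byte)
    return result, carry
```

The final carry holds the bit shifted out of the most significant byte, so a caller can use it to detect unsigned overflow.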

Rotate Right Through Carry
Syntax: ror Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: temp ← Rd0, Rd0 ← Rd1, Rd1 ← Rd2, …, Rd6 ← Rd7, Rd7 ← C, C ← temp
Flags affected: H, S, V, N, Z, C
Encoding: 1001 010d dddd 0111
Words: 1
Cycles: 1
Diagram: old C → [Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0] → C

Rotate Right Through Carry (Cont.)
Example: Assume a 32-bit signed integer x is stored in registers r13:r12:r11:r10. The following code computes x/2. (For an unsigned integer, replace asr with lsr.)
    asr r13             ; Shift byte 3 (most significant byte) right
    ror r12             ; Shift byte 2 right through carry
    ror r11             ; Shift byte 1 right through carry
    ror r10             ; Shift byte 0 (least significant byte) right through carry

Arithmetic Shift Right
Syntax: asr Rd
Operands: Rd ∈ {r0, r1, …, r31}
Operation: C ← Rd0, Rd0 ← Rd1, Rd1 ← Rd2, …, Rd6 ← Rd7; Rd7 is unchanged
Flags affected: H, S, V, N, Z, C
Encoding: 1001 010d dddd 0101
Words: 1
Cycles: 1
Diagram: Bit 7 → [Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0] → C

Arithmetic Shift Right (Cont.)
Example:
    ldi r16, 10         ; r16=10 (ldi accepts only r16 to r31)
    ldi r17, -20        ; r17=-20
    add r16, r17        ; r16=-20+10
    asr r16             ; r16=(-20+10)/2
Comments: This instruction effectively divides a signed value by two. C stores the remainder.
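The claim that asr divides a signed byte by two, with the remainder left in C, can be checked exhaustively with a small model. This Python sketch uses illustrative names (asr, to_signed); note that for negative odd values the quotient is rounded towards minus infinity.

```python
def asr(byte):
    """Arithmetic shift right: bit 7 is kept, bit 0 moves into the carry."""
    carry = byte & 1
    result = (byte >> 1) | (byte & 0x80)
    return result, carry


def to_signed(byte):
    """Interpret an 8-bit pattern as a two's complement number."""
    return byte - 256 if byte & 0x80 else byte


# (-20 + 10) / 2, as in the example above
value = (-20 + 10) & 0xFF
half, remainder = asr(value)
```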

Bit Set in Status Register
Syntax: bset s
Operation: Bit s of SREG (Status Register) ← 1
Operands: 0 ≤ s ≤ 7
Flags affected:
    I: 1 if s = 7; unchanged otherwise.
    T: 1 if s = 6; unchanged otherwise.
    H: 1 if s = 5; unchanged otherwise.
    S: 1 if s = 4; unchanged otherwise.
    V: 1 if s = 3; unchanged otherwise.
    N: 1 if s = 2; unchanged otherwise.
    Z: 1 if s = 1; unchanged otherwise.
    C: 1 if s = 0; unchanged otherwise.

Bit Set in Status Register (Cont.)
Encoding: 1001 0100 0sss 1000
Words: 1
Cycles: 1
Example:
    bset 0              ; Set C
    bset 3              ; Set V
    bset 7              ; Enable interrupt

Bit Clear in Status Register
Syntax: bclr s
Operation: Bit s of SREG (Status Register) ← 0
Operands: 0 ≤ s ≤ 7
Flags affected:
    I: 0 if s = 7; unchanged otherwise.
    T: 0 if s = 6; unchanged otherwise.
    H: 0 if s = 5; unchanged otherwise.
    S: 0 if s = 4; unchanged otherwise.
    V: 0 if s = 3; unchanged otherwise.
    N: 0 if s = 2; unchanged otherwise.
    Z: 0 if s = 1; unchanged otherwise.
    C: 0 if s = 0; unchanged otherwise.

Bit Clear in Status Register (Cont.)
Encoding: 1001 0100 1sss 1000
Words: 1
Cycles: 1
Example:
    bclr 0              ; Clear C
    bclr 3              ; Clear V
    bclr 7              ; Disable interrupt
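The effect of bset and bclr on the status register amounts to setting or clearing a single bit. The following Python sketch models this; the names FLAGS, bset and bclr are chosen for illustration, and the bit numbering (C = 0 up to I = 7) follows the flag list on the slides.

```python
FLAGS = "CZNVSHTI"      # position in the string = bit number in SREG


def bset(sreg, s):
    """Set bit s of an 8-bit status register value."""
    return (sreg | (1 << s)) & 0xFF


def bclr(sreg, s):
    """Clear bit s of an 8-bit status register value."""
    return sreg & ~(1 << s) & 0xFF


sreg = 0x00
sreg = bset(sreg, FLAGS.index("C"))     # like bset 0
sreg = bset(sreg, FLAGS.index("I"))     # like bset 7
sreg = bclr(sreg, FLAGS.index("C"))     # like bclr 0
```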

Set Bit in I/O Register
Syntax: sbi A, b
Operation: Bit b of the I/O register with address A ← 1
Operands: 0 ≤ A ≤ 31 and 0 ≤ b ≤ 7
Flags affected: None
Encoding: 1001 1010 AAAA Abbb
Words: 1
Cycles: 2
Example:
    out $1E, r0         ; Write EEPROM address
    sbi $1C, 0          ; Set read bit in EECR
    in r1, $1D          ; Read EEPROM data

Clear Bit in I/O Register
Syntax: cbi A, b
Operation: Bit b of the I/O register with address A ← 0
Operands: 0 ≤ A ≤ 31 and 0 ≤ b ≤ 7
Flags affected: None
Encoding: 1001 1000 AAAA Abbb
Words: 1
Cycles: 2
Example:
    cbi $12, 7          ; Clear bit 7 in Port D

Set Flags
Syntax: sex, where x ∈ {I, T, H, S, V, N, Z, C}
Operation: Flag x ← 1
Operands: None
Flags affected: Flag x ← 1
Encoding: Depends on x. Refer to the AVR Instruction Set for details of the encoding.
Words: 1
Cycles: 1

Set Flags (Cont.)
Example:
    sec                 ; Set carry flag
    adc r0, r1          ; r0=r0+r1+1
    sbc r0, r1          ; r0=r0-r1-C (C as left by adc)
    sen                 ; Set negative flag
    sei                 ; Enable interrupt
    sev                 ; Set overflow flag
    sez                 ; Set zero flag
    ses                 ; Set sign flag

Clear Flags
Syntax: clx, where x ∈ {I, T, H, S, V, N, Z, C}
Operation: Flag x ← 0
Operands: None
Flags affected: Flag x ← 0
Encoding: Depends on x. Refer to the AVR Instruction Set for details of the encoding.
Words: 1
Cycles: 1

Clear Flags (Cont.)
Example:
    clc                 ; Clear carry flag
    cln                 ; Clear negative flag
    cli                 ; Disable interrupt
    clv                 ; Clear overflow flag
    clz                 ; Clear zero flag
    cls                 ; Clear sign flag

No Operation
Syntax: nop
Operation: No operation
Operands: None
Flags affected: None
Encoding: 0000 0000 0000 0000
Words: 1
Cycles: 1
Example:
    clr r16             ; Clear r16
    ser r17             ; r17=0xff
    out $18, r16        ; Write zeros to Port B
    nop                 ; Wait (do nothing)
    out $18, r17        ; Write ones to Port B

Sleep
Syntax: sleep
Operation: Sets the circuit in the sleep mode defined by the MCU control register. When an interrupt wakes the MCU from the sleep mode, the instructions following the sleep instruction will be executed.
Operands: None
Flags affected: None
Encoding: 1001 0101 1000 1000
Words: 1
Cycles: 1
Example:
    mov r0, r11         ; Copy r11 to r0
    ldi r16, (1<<SE)    ; Enable sleep mode (SE=5)
    out MCUCR, r16
    sleep               ; Put MCU in sleep mode

Watchdog Reset
Syntax: wdr
Operation: Resets the Watchdog Timer. This instruction must be executed within a limited time given by the WD prescaler. See the Watchdog Timer hardware specification.
Operands: None
Flags affected: None
Encoding: 1001 0101 1010 1000
Words: 1
Cycles: 1
Example:
    wdr                 ; Reset watchdog timer

Break
Syntax: break
Operation: The break instruction is used by the On-Chip Debug system and is normally not used in application software. When the break instruction is executed, the AVR CPU is set in the Stopped Mode. This gives the On-Chip Debugger access to internal resources. If any lock bits are set, or either the JTAGEN or OCDEN fuses are unprogrammed, the CPU will treat the break instruction as a nop and will not enter the Stopped Mode.
Operands: None
Flags affected: None
Encoding: 1001 0101 1001 1000
Words: 1
Cycles: 1
Example:
    break               ; Stop here

Lecture 11: Assembly Lecturer: Hui Wu Session 2, 2005

Overview Pseudo Instructions Macro Assembly Process COMP3221/9221: Microprocessors and Embedded Systems

Assembly Language Format
An input line takes one of the following forms:
    [label:] directive [operands] [Comment]
    [label:] instruction [operands] [Comment]
    Comment
    Empty line
A comment has the following form:
    ; [Text]
Items placed in brackets are optional. The text between the comment delimiter (;) and the end of line (EOL) is ignored by the assembler.

Memory Segments
Different types of memory are known as segments to the assembler.
Assembler directives enable code/data to be placed into different segments.
AVR has:
    Data segment (SRAM): can't place data here, just reserve space (for variables).
    Code segment (Flash): can place code or constant data here.
    EEPROM segment: can place constants here.

Pseudo Instructions From AVR Studio Help
    Directive           Description
    BYTE                Reserve byte(s) for a variable
    CSEG                Code segment
    CSEGSIZE            Program memory size
    DB                  Define constant byte(s)
    DEF                 Define a symbolic name on a register
    DEVICE              Define which device to assemble for
    DSEG                Data segment
    DW                  Define constant word(s)
    ENDM, ENDMACRO      End macro
    EQU                 Set a symbol equal to an expression
    ESEG                EEPROM segment
    EXIT                Exit from file
    INCLUDE             Read source from another file
    LIST                Turn listfile generation on
    LISTMAC             Turn macro expansion in list file on
    MACRO               Begin macro
    NOLIST              Turn listfile generation off
    ORG                 Set program origin
    SET                 Set a symbol to an expression
These are for the AVR Studio Assembler.

Pseudo Instructions
.byte: Reserve space; only allowed in dseg
Segment directives .cseg and .dseg allow the text and data segments to be built up in pieces:
    .dseg
    amount: .byte 2
    .cseg
    formula: inc r0
    .dseg
    count: .byte 2
.db: Initialize constant byte(s) in the code or EEPROM segment
.dw: As above but defines 16-bit word(s)

Pseudo Instructions
.def: Make a definition for registers only
    .def ZH=r31
    .def ZL=r30
.device: Specify the exact processor that this program is designed for
    .device AT90S8515
    Prohibits use of non-implemented instructions
.macro, .endm: Begin and end macro definition
.include: Include a file
.exit: Stop processing this file

Expressions
Expressions can consist of operands, operators and functions. All expressions are internally 32 bits.
Example:
    ldi r26, low(label + 0xff0)
Here low is a function, label and 0xff0 are operands, and + is an operator.

Operands
User-defined labels, which are given the value of the location counter at the place they appear.
User-defined variables, defined by the SET directive.
User-defined constants, defined by the EQU directive.
Integer constants, which can be given in several formats, including:
    Decimal (default): 10, 255
    Hexadecimal (two notations): 0x0a, $0a, 0xff, $ff
    Binary: 0b00001010, 0b11111111
    Octal (leading zero): 010, 077
PC: the current value of the program memory location counter.

Operators
    Symbol      Description
    !           Logical Not
    ~           Bitwise Not
    -           Unary Minus
    *           Multiplication
    /           Division
    +           Addition
    -           Subtraction
    <<          Shift left
    >>          Shift right
    <           Less than
    <=          Less than or equal
    >           Greater than
    >=          Greater than or equal
    ==          Equal
    !=          Not equal
    &           Bitwise And
    ^           Bitwise Xor
    |           Bitwise Or
    &&          Logical And
    ||          Logical Or
Same meanings as in C.

Functions
LOW(expression): Returns the low byte of an expression
HIGH(expression): Returns the second byte of an expression
BYTE2(expression): The same function as HIGH
BYTE3(expression): Returns the third byte of an expression
BYTE4(expression): Returns the fourth byte of an expression
LWRD(expression): Returns bits 0-15 of an expression
HWRD(expression): Returns bits 16-31 of an expression
PAGE(expression): Returns bits 16-21 of an expression
EXP2(expression): Returns 2 to the power of expression
LOG2(expression): Returns the integer part of log2(expression)

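Functions (Cont.)
Example:
    ldi r16, low(-13167)    ; cp and cpc compare registers, so load the constant first
    ldi r17, high(-13167)
    cp r0, r16              ; Compare the low bytes
    cpc r1, r17             ; Compare the high bytes with carry
    brlt case1              ; Branch if r1:r0 < -13167 (signed)
    …
    case1: inc r10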
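The byte-selection functions are plain shift-and-mask operations on the 32-bit expression value. The Python sketch below mirrors a few of them under that reading; the lower-case function names are illustrative stand-ins for the assembler functions, and HWRD is assumed to return bits 16-31.

```python
def low(x):
    return x & 0xFF                 # bits 0-7


def high(x):
    return (x >> 8) & 0xFF          # bits 8-15 (same as BYTE2)


def byte3(x):
    return (x >> 16) & 0xFF         # bits 16-23


def byte4(x):
    return (x >> 24) & 0xFF         # bits 24-31


def lwrd(x):
    return x & 0xFFFF               # bits 0-15


def hwrd(x):
    return (x >> 16) & 0xFFFF       # bits 16-31


def exp2(x):
    return 1 << x                   # 2 to the power of x


def log2(x):
    return x.bit_length() - 1       # integer part of log2(x), for x > 0
```

For the constant used in the example, low(-13167) gives 0x91 and high(-13167) gives 0xCC, the two bytes of the 16-bit two's complement pattern 0xCC91.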

Macros
Assembler programmers often need to repeat sequences of instructions several times.
    Could just type them out: tedious.
    Could just copy and paste: the specializations are then often forgotten or wrong.
    Could use a subroutine, but then there is the overhead of the call and return instructions.
Macros solve this problem.
Consider code to swap two bytes in memory:
    lds r2, p
    lds r3, q
    sts q, r2
    sts p, r3

Macros
Swapping p and q twice
With macro:
    .macro myswap
        lds r2, p
        lds r3, q
        sts q, r2
        sts p, r3
    .endmacro
    myswap
    myswap
Without macro:
    lds r2, p
    lds r3, q
    sts q, r2
    sts p, r3
    lds r2, p
    lds r3, q
    sts q, r2
    sts p, r3

AVR Macro Parameters
There are up to 10 parameters.
They are indicated by @0 to @9 in the macro body: @0 is the first parameter, @1 the second, and so on.
Other assemblers let you give meaningful names to parameters.

AVR Parameterised Macro
Without macro:
    lds r2, p
    lds r3, q
    sts q, r2
    sts p, r3
    lds r2, r
    lds r3, s
    sts s, r2
    sts r, r3
With macro:
    .macro change
        lds r2, @0
        lds r3, @1
        sts @1, r2
        sts @0, r3
    .endmacro
    change p, q
    change r, s
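Macro expansion is a textual substitution carried out by the assembler before code generation. The Python sketch below imitates it for the @0 to @9 placeholders; the function expand_macro is hypothetical and much simpler than a real assembler, which would also handle labels and nested macros.

```python
import re


def expand_macro(body, args):
    """Replace each @n in the macro body with the n-th actual argument."""
    def substitute(match):
        return args[int(match.group(1))]

    return [re.sub(r"@(\d)", substitute, line) for line in body]


# Body of a swap macro written with positional parameters
change_body = [
    "lds r2, @0",
    "lds r3, @1",
    "sts @1, r2",
    "sts @0, r3",
]

expanded = expand_macro(change_body, ["p", "q"])
```

Expanding the body with the arguments p and q reproduces the four-instruction swap sequence written out by hand in the version without a macro.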

Another Example
Subtract a 16-bit immediate value from a 16-bit number stored in two registers:
    .MACRO SUBI16               ; Start macro definition
        subi @1, low(@0)        ; Subtract low byte
        sbci @2, high(@0)       ; Subtract high byte
    .ENDMACRO                   ; End macro definition
    .CSEG                       ; Start code segment
    SUBI16 0x1234, r16, r17     ; Subtract 0x1234 from r17:r16
Useful for other 16-bit operations on an 8-bit processor.

Two Pass Assembly Process
We need to process the file twice.
Pass One:
    Lexical and syntax analysis: checking for syntax errors.
    Record all the symbols (labels etc.) in a symbol table.
    Expand macro calls.
Pass Two:
    Use the symbol table to substitute the values for the symbols and evaluate functions.
    Assemble each instruction, i.e. generate machine code.

An Example
    .include "m64def.inc"
    .equ bound = 5
    .def counter = r17
    .dseg
    Cap_word: .byte 5
    .cseg
    rjmp start                  ; Interrupt vector table starts at 0x00
    .org 0x003E                 ; Program starts at 0x003E
    Low_word: .db "hello"
    start:
    ldi zl, low(Low_word<<1)    ; Get the low byte of the address of "h"
    ldi zh, high(Low_word<<1)   ; Get the high byte of the address of "h"
    ldi yh, high(Cap_word)
    ldi yl, low(Cap_word)
    clr counter                 ; counter=0

An Example (Cont.)
    main:
    lpm r20, z+                 ; Load a letter from flash memory
    subi r20, 32                ; Convert it to the capital letter
    st y+, r20                  ; Store the capital letter in SRAM
    inc counter
    cpi counter, bound
    brlt main
    loop:
    nop
    rjmp loop

An Example (Cont.)
Pass 1: Lexical and syntax analysis
Symbol Table
    Symbol      Value
    bound       5
    counter     17
    Cap_word    0x0060
    Low_word    0x003E
    start       0x0041
    main        0x0046
    loop        0x004C
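The label values in the symbol table follow from counting program memory words during pass one. The Python sketch below reproduces them for the code segment of the example under simplifying assumptions: the function pass_one and the tuple format are made up, every instruction in the example is taken to occupy one word, and a .db string is padded to a whole number of words.

```python
def pass_one(program):
    """Assign a word address to every label in a simplified code segment."""
    symbols = {}
    pc = 0                                  # location counter, in words
    for label, kind, operand in program:
        if kind == "org":
            pc = operand                    # .org sets the location counter
            continue
        if label is not None:
            symbols[label] = pc
        if kind == "db":
            pc += (len(operand) + 1) // 2   # bytes packed two per word
        elif kind == "instr":
            pc += operand                   # number of one-word instructions
    return symbols


program = [
    (None, "instr", 1),             # rjmp start
    (None, "org", 0x003E),
    ("Low_word", "db", "hello"),
    ("start", "instr", 5),          # four ldi and one clr
    ("main", "instr", 6),           # lpm, subi, st, inc, cpi, brlt
    ("loop", "instr", 2),           # nop, rjmp
]

symbols = pass_one(program)
```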

An Example (Cont.)
Pass 2: code generation.
    Program address     Machine code    Assembly code
    0x0000:             C040            rjmp start
    …
    0x003E:             6568            "he" ; Little endian
    0x003F:             6C6C            "ll"
    0x0040:             006F            "o"
    0x0041:             E7EC            ldi zl, low(Low_word<<1)
    0x0042:             E0F0            ldi zh, high(Low_word<<1)
    0x0043:             E0D0            ldi yh, high(Cap_word)
    0x0044:             E6C0            ldi yl, low(Cap_word)
    0x0045:             2711            clr counter
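The machine code for an ldi line can be derived by hand from the symbol table and the ldi encoding 1110 kkkk dddd kkkk, where dddd is the register number minus 16. The Python sketch below does this for ldi zl, low(Low_word<<1); the function encode_ldi is an illustrative helper, not part of any assembler.

```python
def encode_ldi(d, k):
    """Encode `ldi Rd, k` as a 16-bit opcode: 1110 kkkk dddd kkkk."""
    if not 16 <= d <= 31:
        raise ValueError("ldi only accepts r16 to r31")
    if not 0 <= k <= 255:
        raise ValueError("immediate must fit in one byte")
    return 0xE000 | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F)


LOW_WORD = 0x003E                       # word address from the symbol table
ZL = 30                                 # zl is r30
immediate = (LOW_WORD << 1) & 0xFF      # low(Low_word<<1) = 0x7C
opcode = encode_ldi(ZL, immediate)
```

The result, E7EC in hexadecimal, matches the machine code listed for that line in the pass 2 output.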

Absolute Assemblers
A single source file contains all the source code of the program.
Programmers use .org to tell the assembler the starting address of a segment (data segment or code segment).
Whenever any change is made in the source program, all code must be reassembled.
A downloader transfers an executable file (machine code) to the target system.

Absolute Assemblers (Cont.)
[Figure: Absolute assembler operation. Source file with location information (NAME.ASM) → Absolute assembler → Executable file (NAME.EXE) → Loader program → Computer memory]

Relocatable Assemblers
The program may be split into multiple source files.
Each source file can be assembled separately.
Each file is assembled into an object file in which some addresses may not be resolved.
A linker program is needed to resolve all unresolved addresses and combine all object files into a single executable file.

Relocatable Assemblers (Cont.)
[Figure: Source file 1 (MODULE1.ASM) → Relocatable assembler → Object file 1 (MODULE1.OBJ); Source file 2 (MODULE2.ASM) → Relocatable assembler → Object file 2 (MODULE2.OBJ)]

Linker
Takes all object files, links them together and locates all addresses.
Works together with the relocatable assembler.

Linker (Cont.)
[Figure: Source file 1 (MODULE1.ASM) and Source file 2 (MODULE2.ASM) each pass through the relocatable assembler to give Object file 1 (MODULE1.OBJ) and Object file 2 (MODULE2.OBJ). The linker program combines the object files, a library of object files (FILE.LIB) and the code and data location information into the executable file (NAME.EXE).]

Loader
Puts an executable file into the memory of the computer.
May take many forms:
    Part of an operating system.
    A downloader program that takes an executable file created on one computer and puts it into the target system.
    A system that burns a programmable read-only memory (ROM).

Reading
Chap. 5, Microcontrollers and Microcomputers.
User's guide to AVR assembler. This guide is part of the on-line documentation that accompanies AVR Studio. Click Help in AVR Studio.

Lecture 12: Functions I Lecturer: Hui Wu Session 2, 2005

Overview
Variable types
Memory sections in C
Parameter passing
Stack frames

Types of Variables in C
1. Global variables: variables that are declared outside a function. They exist during the execution of the program.
2. Local variables: variables that are declared in a function. They exist during the execution of the function only.
3. Static variables: can be either global or local.
    A global static variable is valid only within the file where it is declared.
    A local static variable still exists after the function returns.

Variable Types and Memory Sections
Global variables occupy their memory space during the execution of the program.
    They need static memory, which exists during the program's lifetime.
Static local variables still occupy their memory space after the function returns.
    They also need static memory, which still exists after the function returns.
Local variables occupy their memory space only during the execution of the function.
    They need dynamic memory, which exists only during the execution of the function.
So the entire memory space needs to be partitioned into different sections to be utilized more efficiently.

An Example
    #include <stdio.h>
    int x, y;                   /* Global variables */
    static int b[10];           /* Static global array */
    void auto_static(void)
    {
        int autovar = 1;            /* Local variable */
        static int staticvar = 1;   /* Static local variable */
        printf("autovar = %i, staticvar = %i\n", autovar, staticvar);
        ++autovar;
        ++staticvar;
    }

An Example (Cont.)
    int main(void)
    {
        int i;                  /* Local variable */
        void auto_static(void);
        for (i = 0; i < 5; i++)
            auto_static();
        return 0;
    }

An Example (Cont.)
Program output:
    autovar = 1, staticvar = 1
    autovar = 1, staticvar = 2
    autovar = 1, staticvar = 3
    autovar = 1, staticvar = 4
    autovar = 1, staticvar = 5

Memory Sections in C for General Microprocessors
Heap: used for dynamic memory allocation, e.g. by malloc() and calloc().
Stack: used to store return addresses, actual parameters, conflict registers, local variables and other information.
Uninitialized data section .bss: contains all uninitialized global or static local variables.
Data section .data: contains all initialized global or static local variables.
Text section .text: contains code.

Memory Sections in WINAVR (C for AVR)
Additional EEPROM section .eeprom: contains constants in EEPROM.
The text section .text in WINAVR includes two subsections, .initN and .finiN:
    .initN contains the startup code, which initializes the stack and copies the initialized data section .data from flash to SRAM.
    .finiN is used to define the exit code executed after return from main() or a call to exit().

C Functions
    int mult(int mcand, int mlier);     /* Prototype so that main() can call mult() */
    void main(void)                     /* Caller */
    {
        int i, j, k, m;
        ...
        i = mult(j, k);                 /* j and k are the actual parameters */
        ...
        m = mult(i, i);
        ...
    }
    int mult(int mcand, int mlier)      /* Callee */
    {
        int product = 0;
        while (mlier > 0) {
            product = product + mcand;
            mlier = mlier - 1;
        }
        return product;
    }

Two Parameter Passing Approaches
Pass by value:
    Pass the value of an actual parameter to the callee.
    Not efficient for structures and arrays: the value of each element in the structure or array needs to be passed.
Pass by reference:
    Pass the address of the actual parameter to the callee.
    Efficient for passing structures and arrays.

Parameter Passing in C
Scalar variables such as char, int and float are passed by value.
Arrays are effectively passed by reference: the callee receives the address of the first element.
Structures are passed by value unless the program explicitly passes a pointer to them.

C Functions (Cont.)
Questions:
    How to pass the actual parameters by value to a function?
    How to pass the actual parameters by reference to a function?
    Where to get the return value?
    How to allocate stack memory to local variables?
    How to deallocate stack memory after a function returns?
    How to handle register conflicts?
Rules are needed between caller and callee.

Register Conflicts
If a register is used in both caller and callee, and the caller needs its old value after the return from the callee, then a register conflict occurs.
Compilers or assembly programmers need to check for register conflicts.
Conflict registers need to be saved on the stack.
The caller, the callee, or both can save conflict registers.
In WINAVR, the callee saves conflict registers.

Parameter Passing and Return Value
General registers may be used to store part of the actual parameters, with the rest of the parameters pushed on the stack.
    WINAVR uses general registers up to r24 to store actual parameters.
    Actual parameters are eventually passed to the formal parameters stored on the stack.
The return value needs to be stored in designated registers.
    WINAVR uses r25:r24 to store the return value.

Stack Structure
A stack consists of stack frames.
A stack frame is created whenever a function is called.
A stack frame is freed whenever the function returns.
What's inside a stack frame?

Stack Frame
A typical stack frame consists of the following components:
    Return address: used when the function returns.
    Conflict registers: the old contents of these registers need to be restored when the function returns. One conflict register is the stack frame pointer.
    Parameters (arguments)
    Local variables

Implementation Considerations
Local variables and parameters need to be stored contiguously on the stack for easy access.
In which order are the local variables or parameters stored on the stack?
    In the order that they appear in the program from left to right? Or the reverse order?
    The C compiler uses the reverse order.
A stack frame register is needed to point to either the base (starting address) or the top of the stack frame.
    It points to the top of the stack frame if the stack grows downwards. Otherwise, it points to the base of the stack frame. (Why?)
    WINAVR uses Y (r29:r28) as the stack frame register.

An Sample Stack Frame Structure for AVR
[Figure: stack layout from RAMEND downwards: the stack frame for main(); then the stack frame for foo(), containing the return address, the conflict registers, local variables n … 1 and parameters m … 1; then empty space. Y points to the top of the stack frame for foo().]
    int main(void)
    {
        …
        foo(arg1, arg2, …, argm);
    }
    void foo(arg1, arg2, …, argm)
    {
        int var1, var2, …, varn;
        …
    }

A Template for Caller
Caller:
    Store actual parameters in designated registers and the rest of the parameters on the stack.
    Call the callee.

A Template for Callee
Callee:
    Prologue
    Function body
    Epilogue

A Template for Callee (Cont.)
Prologue:
    Store conflict registers, including the stack frame register Y, on the stack by using push.
    Reserve stack space for the parameters and local variables, and update the stack frame register Y to point to the top of the new stack frame.
    Pass the actual parameters to the formal parameters on the stack.
Function body:
    Does the normal task of the function.

A Template for Callee (Cont.)
Epilogue: 1. Store the return value in designated registers r25:r24. 2. Deallocate local variables and parameters by updating the stack pointer SP: SP = SP + the size of all parameters and local variables. 3. Restore conflict registers from the stack by using pop. The conflict registers must be popped in the reverse order that they are pushed on the stack. The stack frame register of the caller is also restored. Steps 2 and 3 together deallocate the stack frame. 4. Return to the caller by using ret.

An Example
int foo(char a, int b, int c);
int main() {
    int i, j;
    i=0;
    j=300;
    foo(1, i, j);
    return 0;
}
int foo(char a, int b, int c) {
    int x, y, z;
    x=a+b;
    y=c-a;
    z=x+y;
    return z;
}

Stack frames for main() and foo()
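[Figure: stack contents from RAMEND downwards: local variables j and i (the stack frame of main(); the stack frame pointer Y for main() points just below i); then the stack frame of foo(): return address, conflict register Y (r28, r29), local variables z, y, x, and parameters c, b, a; the stack frame pointer Y for foo() points to the empty location just below a.]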

An Example (Cont.)
main:
    ldi r28, low(RAMEND-4)   ; 4 bytes to store local variables i and j
    ldi r29, high(RAMEND-4)  ; The size of each integer is 2 bytes
    out SPH, r29             ; Adjust stack pointer so that it points to
    out SPL, r28             ; the new stack top.
    clr r0                   ; The next three instructions implement i=0
    std Y+1, r0              ; The address of i in the stack is Y+1
    std Y+2, r0
    ldi r24, low(300)        ; The next four instructions implement j=300
    ldi r25, high(300)
    std Y+3, r24
    std Y+4, r25
    ldd r20, Y+3             ; r21:r20 keep the actual parameter j
    ldd r21, Y+4
    ldd r22, Y+1             ; r23:r22 keep the actual parameter i
    ldd r23, Y+2
    ldi r24, low(1)          ; r24 keeps the actual parameter 1
    rcall foo                ; Call foo
    …

An Example (Cont.)
foo:                         ; Prologue: frame size=11 (excluding the stack frame
                             ; space for storing return address and registers)
    push r28                 ; Save r28 and r29 in the stack
    push r29
    in r28, SPL
    in r29, SPH
    sbiw r28, 11             ; Compute the stack frame top for foo
                             ; Notice that 11 bytes are needed to store
                             ; the parameters a, b, c and local
                             ; variables x, y and z
    out SPH, r29             ; Adjust the stack pointer to point to
    out SPL, r28             ; the new stack frame
    std Y+1, r24             ; Pass the actual parameter 1 to a
    std Y+2, r22             ; Pass the actual parameter i to b
    std Y+3, r23
    std Y+4, r20             ; Pass the actual parameter j to c
    std Y+5, r21             ; End of prologue

An Example (Cont.)
foo:
    …                        ; Function body here
                             ; Epilogue starts here
    ldd r24, Y+10            ; The return value of z is stored in r25:r24
    ldd r25, Y+11
    adiw r28, 11             ; Deallocate the stack frame
    out SPH, r29
    out SPL, r28
    pop r29                  ; Restore Y
    pop r28
    ret                      ; Return to main()
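The Y-relative offsets used in the example follow directly from the sizes of the parameters and local variables. The sketch below recomputes them, assuming 1-byte char and 2-byte int as stated in the slides; the Slot type and the function names are hypothetical helpers, not part of WINAVR.

```c
#include <assert.h>

/* Sizes in bytes, as stated in the slides for AVR. */
enum { SIZE_CHAR = 1, SIZE_INT = 2 };

/* Hypothetical description of one entry in foo()'s stack frame. */
typedef struct {
    const char *name;
    int size;                /* bytes occupied on the stack */
} Slot;

/* Entries listed from the lowest address (Y+1) upwards:
   parameters a, b, c first, then local variables x, y, z. */
static const Slot foo_frame[] = {
    {"a", SIZE_CHAR}, {"b", SIZE_INT}, {"c", SIZE_INT},
    {"x", SIZE_INT},  {"y", SIZE_INT}, {"z", SIZE_INT},
};
enum { FOO_SLOTS = sizeof foo_frame / sizeof foo_frame[0] };

/* Offset from Y of the first byte of entry i.
   Y points one byte below the frame, so entry 0 lives at Y+1. */
int slot_offset(int i) {
    assert(i >= 0 && i < FOO_SLOTS);
    int off = 1;
    for (int k = 0; k < i; k++)
        off += foo_frame[k].size;
    return off;
}

/* Total bytes reserved by "sbiw r28, ..." in the prologue. */
int frame_size(void) {
    int total = 0;
    for (int k = 0; k < FOO_SLOTS; k++)
        total += foo_frame[k].size;
    return total;
}
```

With this layout, slot_offset(5) gives the offset of the low byte of z, which is why the epilogue loads the return value from Y+10 and Y+11, and frame_size() gives the 11 bytes allocated and later deallocated.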

Lecture 13: Functions II Lecturer: Hui Wu Session 2, 2005

Overview Recursive Functions Computing the Stack Size for function calls

Recursive Functions A recursive function is both a caller and a callee of itself. Need to check both its source caller (that is not itself) and itself for register conflicts. Can be hard to compute the maximum stack space needed for recursive function calls. Need to know how many times the function is nested (the depth of the calls).

An Example of Recursive Function Calls
int sum(int n);
int main(void) { int n=100; sum(n); return 0; }
int sum(int n) { if (n<=0) return 0; else return (n + sum(n-1)); }
main() is the caller of sum(). sum() is the caller and callee of itself.

Call Trees A call tree is a weighted directed tree G = (V, E, W) where
V = {v1, v2, …, vn} is a set of nodes, each of which denotes an execution of a function; E = {(vi, vj): vi calls vj} is a set of directed edges, each of which denotes the caller-callee relationship; and W = {wi (i = 1, 2, …, n): wi is the frame size of vi} is a set of stack frame sizes. The maximum size of stack space needed for the function calls can be derived from the call tree.

An Example of Call Trees
int main(void) { … func1(); … func2(); }
void func1() { … func3(); … }
void func2() { … func4(); … func5(); }

An Example of Call Trees (Cont.)
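[Figure: call tree with main() at the root, func1() and func2() as its children, func3() below func1(), and func4() and func5() below func2(); the frame sizes shown in the figure are 10, 20, 60, 30, 80 and 10 bytes.] The number in red beside a function is its frame size in bytes.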

Computing the Maximum Stack Size for Function Calls
Step 1: Draw the call tree. Step 2: Find the longest weighted path in the call tree. The total weight of the longest weighted path is the maximum stack size needed for the function calls.

An Example 10 main() 20 60 func1() func2() 30 80 10 func3() func4()
The longest path is main() → func1() → func3() with the total weight of 110. So the maximum stack space needed for this program is 110 bytes.
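The two steps above can be sketched as a recursive walk over the call tree. The Node type and max_stack() are hypothetical names. The flattened figure does not show which weight belongs to which node, so the assignment below is an assumption chosen to be consistent with the stated result of 110 bytes (in particular, which of func4() and func5() has 30 or 10 bytes is a guess).

```c
#include <assert.h>

#define MAX_CHILDREN 4

/* One node of the call tree: a function execution and its callees. */
typedef struct Node {
    const char *name;
    int frame_size;                          /* w_i, in bytes */
    int n_children;
    const struct Node *children[MAX_CHILDREN];
} Node;

/* Weight of the longest weighted path starting at v:
   the frame of v plus the most expensive path below it. */
int max_stack(const Node *v) {
    assert(v != 0);
    int deepest = 0;
    for (int i = 0; i < v->n_children; i++) {
        int d = max_stack(v->children[i]);
        if (d > deepest)
            deepest = d;
    }
    return v->frame_size + deepest;
}

/* Assumed weights (see the note above). */
static const Node func3 = {"func3", 80, 0, {0}};
static const Node func4 = {"func4", 30, 0, {0}};
static const Node func5 = {"func5", 10, 0, {0}};
static const Node func1 = {"func1", 20, 1, {&func3}};
static const Node func2 = {"func2", 60, 2, {&func4, &func5}};
static const Node main_node = {"main", 10, 2, {&func1, &func2}};
```

Calling max_stack(&main_node) walks every root-to-leaf path once, so the cost is linear in the number of nodes of the call tree.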

Fibonacci Rabbits Suppose a newly-born pair of rabbits, one male, one female, are put in a field. Rabbits are able to mate at the age of one month so that at the end of its second month a female can produce another pair of rabbits. Suppose that our rabbits never die and that the female always produces one new pair (one male, one female) every month from the second month on. How many pairs will there be in one year? Fibonacci’s Puzzle: posed by the Italian mathematician Leonardo of Pisa (also known as Fibonacci) in 1202.

Fibonacci Rabbits (Cont.) The number of pairs of rabbits in the field at the start of each month is 1, 1, 2, 3, 5, 8, 13, 21, 34, ... . In general, the number of pairs of rabbits in the field at the start of month n, denoted by F(n), is recursively defined as follows: F(n) = F(n-1) + F(n-2), where F(0) = F(1) = 1. F(n) (n = 0, 1, 2, …) are called Fibonacci numbers.

C Solution of Fibonacci Numbers
int fib(int n);
int month=4;
int main(void) { fib(month); }
int fib(int n) {
    if(n == 0) return 1;
    if(n == 1) return 1;
    return (fib(n - 1) + fib(n - 2));
}

AVR Assembler Solution [Figure: frame structure for fib(). Starting at address X and going downwards: return address (X), r16 (X-2), r17 (X-3), r28 (X-4), r29 (X-5), n (X-6), empty space (X-8); Y points to X-8.] r16, r17, r28 and r29 are conflict registers. An integer is 2 bytes long in WINAVR.

Assembly Code for main()
.cseg
month: .dw 4
main:                            ; Prologue
    ldi r28, low(RAMEND)
    ldi r29, high(RAMEND)
    out SPH, r29                 ; Initialise the stack pointer SP to point to
    out SPL, r28                 ; the highest SRAM address
                                 ; End of prologue
    ldi r30, low(month<<1)       ; Let Z point to month
    ldi r31, high(month<<1)
    lpm r24, Z+                  ; Actual parameter 4 is stored in r25:r24
    lpm r25, Z
    rcall fib                    ; Call fib(4)
                                 ; Epilogue: no return
loopforever:
    rjmp loopforever

Assembly Code for fib()
fib:
    push r16                 ; Prologue
    push r17                 ; Save r16 and r17 on the stack
    push r28                 ; Save Y on the stack
    push r29
    in r28, SPL
    in r29, SPH
    sbiw r29:r28, 2          ; Let Y point to the bottom of the stack frame
    out SPH, r29             ; Update SP so that it points to
    out SPL, r28             ; the new stack top
    std Y+1, r24             ; Pass the actual parameter to the formal parameter
    std Y+2, r25
    clr r0                   ; r0=0 (cleared before cpi because clr changes the Z flag)
    cpi r24, 0               ; Compare n with 0
    cpc r25, r0
    brne L3                  ; If n!=0, go to L3
    ldi r24, 1               ; n==0
    ldi r25, 0               ; Return 1
    rjmp L5                  ; Jump to the epilogue

Assembly Code for fib() (Cont.)
L3:
    clr r0
    cpi r24, 1               ; Compare n with 1
    cpc r25, r0
    brne L4                  ; If n!=1 go to L4
    ldi r24, 1               ; n==1
    ldi r25, 0               ; Return 1
    rjmp L5                  ; Jump to the epilogue
L4:
    ldd r24, Y+1             ; n>=2
    ldd r25, Y+2             ; Load the actual parameter n
    sbiw r24, 1              ; Pass n-1 to the callee
    rcall fib                ; call fib(n-1)
    mov r16, r24             ; Store the return value in r17:r16
    mov r17, r25
    ldd r24, Y+1             ; Load the actual parameter n
    ldd r25, Y+2
    sbiw r24, 2              ; Pass n-2 to the callee
    rcall fib                ; call fib(n-2)
    add r24, r16             ; r25:r24=fib(n-1)+fib(n-2)
    adc r25, r17

Assembly Code for fib() (Cont.)
L5:                          ; Epilogue
    adiw r29:r28, 2          ; Deallocate the stack frame for fib()
    out SPH, r29             ; Restore SP
    out SPL, r28
    pop r29                  ; Restore Y
    pop r28
    pop r17                  ; Restore r17 and r16
    pop r16
    ret

Computing the Maximum Stack Size
Step 1: Draw the call tree. The call tree for n=4 (each fib() node has a frame size of 8 bytes):
main()
  fib(4)
    fib(3)
      fib(2)
        fib(1)
        fib(0)
      fib(1)
    fib(2)
      fib(1)
      fib(0)

Computing the Maximum Stack Size (Cont.)
Step 2: Find the longest weighted path. The longest weighted path is main() → fib(4) → fib(3) → fib(2) → fib(1) with the total weight of 32. So a stack space of 32 bytes is needed for this program.
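For a recursive function the call tree does not have to be drawn explicitly; the longest path can be computed by following the recursion itself. The sketch below does this for fib(), assuming the 8-byte frame from the slides and counting no stack space for main(), as in the example. The names fib_depth() and fib_max_stack() are hypothetical.

```c
#include <assert.h>

enum { FIB_FRAME_BYTES = 8 };   /* frame size of fib() from the slides */

/* Number of fib() frames on the longest path of the call tree
   rooted at fib(n). fib(0) and fib(1) are leaves. */
int fib_depth(int n) {
    assert(n >= 0);
    if (n <= 1)
        return 1;
    int a = fib_depth(n - 1);
    int b = fib_depth(n - 2);
    return 1 + (a > b ? a : b);
}

/* Maximum stack space, in bytes, used by the fib() frames for fib(n). */
int fib_max_stack(int n) {
    return FIB_FRAME_BYTES * fib_depth(n);
}
```

For n=4 the deepest chain has four frames, which reproduces the 32 bytes found above; the depth grows linearly with n even though the number of calls grows much faster.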

Lecture 14: Floating Point Numbers Lecturer: Hui Wu Session 2, 2005

Overview IEEE Floating Point Number Representation Floating Point Number Operations

Scientific Notation Example: 6.02 x 10^23, where 6 is the integer part, "." is the decimal point, 10 is the radix (base) and 23 is the exponent. Normalized form: no leading 0s (exactly one non-zero digit to the left of the decimal point). Alternatives to representing 1/1,000,000,000: Normalized: 1.0 * 10^-9. Not normalized: 0.1 * 10^-8, 10.0 * 10^-10. How to represent 0 in normalized form?

Scientific Notation for Binary Numbers Exponent 1.01 x 2-12 Integer Binary point Radix (base) Computer arithmetic that supports it is called floating point, because it represents numbers where binary point is not fixed, as it is for integers Declare such variables in C as float (single precision floating point number) or double (double precision floating point number). COMP3221/9221: Microprocessors and Embedded Systems

Floating Point Representation
Normal form: +(-) 1.x * 2^y, with a sign bit, the significand 1.x and the exponent y. How many bits for the significand (mantissa) x? How many bits for the exponent y? Is y stored in its original value or in a transformed value? How to represent +infinity and -infinity? How to represent 0?

Overflow and Underflow
What if result is too large? Overflow! Overflow => Positive exponent larger than the value that can be represented in exponent field What if result too small? Underflow! Underflow => Negative exponent smaller than the value that can be represented in Exponent field How to reduce the chance of overflow or underflow?

IEEE 754 FP Standard—Single Precision
Layout: S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF (sign bit, biased exponent, significand). Bit 31 for sign: S=1 for negative numbers, 0 for positive numbers. Bits 23-30 for the biased exponent: the real exponent = E - 127; 127 is called the bias. Bits 0-22 for the significand.

IEEE 754 FP Standard—Single Precision (Cont.)
The value V of a single precision FP number is determined as follows:
If 0 < E < 255, then V = (-1)^S * 2^(E-127) * 1.F, where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E = 255 and F is nonzero, then V = NaN ("Not a number").
If E = 255 and F is zero and S is 1, then V = -Infinity.
If E = 255 and F is zero and S is 0, then V = Infinity.
If E = 0 and F is nonzero, then V = (-1)^S * 2^(-126) * 0.F. These are unnormalized numbers or subnormal numbers.
If E = 0 and F is 0 and S is 1, then V = -0.
If E = 0 and F is 0 and S is 0, then V = 0.
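The case analysis above can be written down almost literally. The following sketch decodes a 32-bit pattern into S, E and F and evaluates V according to these rules; the Fields type and the function names are hypothetical, and the result is returned as a double only so that every single precision value can be represented exactly.

```c
#include <assert.h>
#include <math.h>      /* only for the NAN and INFINITY macros */
#include <stdint.h>

typedef struct {
    unsigned s;        /* sign bit, bit 31 */
    unsigned e;        /* biased exponent, bits 23-30 */
    uint32_t f;        /* fraction, bits 0-22 */
} Fields;

Fields split_fields(uint32_t bits) {
    Fields x;
    x.s = bits >> 31;
    x.e = (bits >> 23) & 0xFFu;
    x.f = bits & 0x7FFFFFu;
    return x;
}

/* Multiply v by 2^k using exact doublings or halvings. */
static double scale2(double v, int k) {
    while (k > 0) { v *= 2.0; k--; }
    while (k < 0) { v /= 2.0; k++; }
    return v;
}

/* Value of a single precision bit pattern, following the slide's rules. */
double decode_single(uint32_t bits) {
    Fields x = split_fields(bits);
    double sign = x.s ? -1.0 : 1.0;
    double frac = (double)x.f / 8388608.0;          /* 0.F = F / 2^23 */
    assert(x.e <= 255);
    if (x.e == 255)
        return x.f ? NAN : sign * INFINITY;         /* NaN or +/-Infinity */
    if (x.e == 0)
        return sign * scale2(frac, -126);           /* subnormal or +/-0 */
    return sign * scale2(1.0 + frac, (int)x.e - 127);  /* normal number */
}
```

For example, decode_single(0x40280000) evaluates the pattern 0 10000000 0101000… as 1.0101 (binary) times 2^1.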

IEEE 754 FP Standard—Single Precision (Cont.)
Subnormal numbers reduce the chance of underflow. Without subnormal numbers, the smallest positive number is 2^(-126) (E = 1, F = 0). With subnormal numbers, the smallest positive number is 2^(-23) * 2^(-126) = 2^(-(126+23)) = 2^(-149).

IEEE 754 FP Standard—Double Precision
Layout: S EEEEEEEEEEE FFFFFFFFFF…FFFFFFFFFFFFF (sign bit, biased exponent, significand). Bit 63 for sign: S=1 for negative numbers, 0 for positive numbers. Bits 52-62 for the biased exponent: the real exponent = E - 1023; 1023 is called the bias. Bits 0-51 for the significand.

IEEE 754 FP Standard—Double Precision (Cont.)
The value V of a double precision FP number is determined as follows:
If 0 < E < 2047, then V = (-1)^S * 2^(E-1023) * 1.F, where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E = 2047 and F is nonzero, then V = NaN ("Not a number").
If E = 2047 and F is zero and S is 1, then V = -Infinity.
If E = 2047 and F is zero and S is 0, then V = Infinity.
If E = 0 and F is nonzero, then V = (-1)^S * 2^(-1022) * 0.F. These are unnormalized numbers or subnormal numbers.
If E = 0 and F is 0 and S is 1, then V = -0.
If E = 0 and F is 0 and S is 0, then V = 0.

Hardware Support for FP Numbers
Typically a coprocessor implements FP. It works under the processor’s supervision and has its own set of registers and instructions. The hardware for FP is quite complicated. Most low-end microprocessors and microcontrollers, such as AVR, do not support FP numbers in hardware. Software is needed to implement FP if necessary.

Implementing FP Addition by Software
How to implement x+y where x and y are two single precision FP numbers? Step 1: Convert x and y into IEEE format. Step 2: Align the two significands if the two exponents are different. Let e1 and e2 be the exponents of x and y, respectively, and assume e1 > e2. Shift the significand (including the implicit 1) of y right by e1-e2 bits to compensate for the change in exponent. Step 3: Add the two (adjusted) significands. Step 4: Normalize the result.

An Example How to implement x+y where x=2.625 and y= – 4.75?
Step 1: Convert x and y into IEEE format. x = 2.625 → 10.101 (Binary) → 1.0101 * 2^1 (Normal form) → 1.0101 * 2^128 (IEEE format, with the biased exponent 1+127=128) → 0 10000000 01010000000000000000000. Comments: The fractional part can be converted by multiplication. (This is the inverse of the division method for integers.) 0.625 × 2 = 1.25 → 1 (the most significant bit in fraction); 0.25 × 2 = 0.5 → 0; 0.5 × 2 = 1.0 → 1 (the least significant bit in fraction).

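An Example (Cont.) y = -4.75 → -100.11 (Binary) → -1.0011 * 2^2 (Normal form) → -1.0011 * 2^129 (IEEE format, with the biased exponent 2+127=129) → 1 10000001 00110000000000000000000. Step 2: Align two significands. The significand of x = 1.0101 → 0.10101 (after shift right 1 bit). Comments: x = 0.10101 * 2^129 and y = -1.0011 * 2^129 after the alignment.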

An Example (Cont.) Step 3: Add two (adjusted) significands.
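The adjusted significand of x: 0.10101. The significand of y: -1.00110. The significand of x+y: 0.10101 - 1.00110 = -0.10001. Step 4: Normalize the result. Result = -0.10001 * 2^129 → -1.0001 * 2^128 (Normal form, with the biased exponent) → 1 10000000 00010000000000000000000, which is -2.125.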
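The four steps can be sketched in C on raw bit patterns, using only integer operations as a microcontroller without FP hardware would. This is a simplified illustration under stated assumptions, not a complete IEEE 754 adder: it handles normalized, finite operands only, truncates instead of rounding, and does not check for exponent overflow or underflow. The function name fp_add() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two single precision numbers given as IEEE bit patterns.
   Assumptions: both operands are normalized (0 < E < 255);
   bits shifted out during alignment are discarded (no rounding). */
uint32_t fp_add(uint32_t x, uint32_t y) {
    /* Step 1: unpack sign, biased exponent and significand. */
    uint32_t sx = x >> 31, sy = y >> 31;
    int ex = (int)((x >> 23) & 0xFF), ey = (int)((y >> 23) & 0xFF);
    assert(ex > 0 && ex < 255 && ey > 0 && ey < 255);
    int64_t mx = (int64_t)((x & 0x7FFFFF) | 0x800000);  /* restore implicit 1 */
    int64_t my = (int64_t)((y & 0x7FFFFF) | 0x800000);

    /* Step 2: align by shifting the operand with the smaller exponent right. */
    int e = ex;
    if (ex >= ey) {
        int sh = ex - ey;
        my = (sh < 32) ? (my >> sh) : 0;
    } else {
        int sh = ey - ex;
        mx = (sh < 32) ? (mx >> sh) : 0;
        e = ey;
    }

    /* Step 3: add the signed significands. */
    int64_t m = (sx ? -mx : mx) + (sy ? -my : my);
    if (m == 0)
        return 0;                       /* exact cancellation gives +0 */
    uint32_t s = (m < 0) ? 1u : 0u;
    if (m < 0)
        m = -m;

    /* Step 4: normalize so that the leading 1 is in bit 23. */
    while (m >= ((int64_t)1 << 24)) { m >>= 1; e++; }
    while (m <  ((int64_t)1 << 23)) { m <<= 1; e--; }

    return (s << 31) | ((uint32_t)e << 23) | ((uint32_t)m & 0x7FFFFF);
}
```

With x = 0x40280000 (2.625) and y = 0xC0980000 (-4.75), the function shifts the significand of x right by one bit, subtracts, and normalizes with one left shift, giving 0xC0080000, the pattern of -2.125 derived in the example.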

Reading

Lecture 15: Interrupts I Lecturer: Hui Wu Session 2, 2005

Overview Interrupt System Specifications Multiple Sources of Interrupts Interrupt Priorities Polling

Five Components of any Computer [Figure: the processor (active) contains the control (“brain”) and the datapath (“brawn”); memory (passive) is where programs and data live when running; devices provide input (keyboard, mouse), output (display, printer) and disk (where programs and data live when not running).]

How CPU Interacts with I/O?
Two Choices: Interrupts. I/O devices generate signals to request services from CPU . Need special hardware to implement interrupts. Efficient. A signal is generated only if the I/O device needs services from CPU. Polling Software queries I/O devices. No hardware needed. Not efficient. CPU may waste processor cycles to query a device even if it does not need any service.

Interrupt System Specifications
Allow for asynchronous events to occur and be recognized. Wait for the current instruction to finish before taking care of any interrupt. Branch to the correct interrupt service routine (interrupt handler) to service the interrupting device. Return to the interrupted program at the point it was interrupted. Allow for a variety of interrupting signals, including levels and edges. Signal the interrupting device with an acknowledge signal when the interrupt has been recognized.

Interrupt System Specifications (Cont.)
Allow programmers to selectively enable and disable all interrupts. Allow programmers to enable and disable selected interrupts. Disable further interrupts while the first is being serviced. Deal with multiple sources of interrupts. Deal with multiple, simultaneous interrupts.

Interrupt Recognition and Ack
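[Figure: interrupt recognition logic. The request from the interrupting device passes through signal conditioning and sets the IRQ-FF, which produces the pending interrupt. The INTE-FF (interrupt enable) gates the pending interrupt into the interrupt signal to the sequence controller in the CPU. The interrupt ack from the sequence controller resets the IRQ-FF; the enable interrupt and return from interrupt instructions set the INTE-FF, and the disable interrupt instruction resets it.]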

Interrupt Recognition and Ack
An Interrupt Request (IRQ) may occur at any time. It may have rising or falling edges or high or low levels. Frequently it is an active-low signal and multiple devices are wire-ORed together. The Signal Conditioning Circuit detects these different types of signals. The Interrupt Request Flip-Flop (IRQ-FF) remembers that an interrupt request has been generated until it is acknowledged. When IRQ-FF is set, it generates a pending interrupt signal that goes towards the Sequence Controller. IRQ-FF is reset when the CPU acknowledges the interrupt with the INTA signal.

Interrupt Recognition and Ack (Cont.)
The programmer has control over the interrupting process by enabling and disabling interrupts with explicit instructions. The hardware that allows this is the Interrupt Enable Flip-Flop (INTE-FF). When the INTE-FF is set, all interrupts are enabled and the pending interrupt is allowed through the AND gate to the sequence controller. The INTE-FF is reset in the following cases: the CPU acknowledges the interrupt; the CPU is reset; a disable interrupt instruction is executed.

Interrupt Recognition and Ack (Cont.)
An interrupt acknowledge signal is generated by the CPU when the current instruction has finished execution and the CPU has detected the IRQ. This resets the IRQ-FF and INTE-FF and signals the interrupting device that the CPU is ready to execute the interrupting device routine. At the end of the interrupt service routine, the CPU executes a return-from-interrupt instruction. Part of this instruction’s job is to set the INTE-FF to reenable interrupts. If the IRQ-FF is set during an interrupt service routine, the pending interrupt will be recognized by the sequence controller immediately after the INTE-FF is set. Setting the INTE-FF inside an interrupt service routine allows nested interrupts, i.e. interrupts interrupting interrupts.
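The interplay of the two flip-flops can be followed in a small software model. This is only an illustration of the behaviour described above, not a description of any particular CPU; the IntLogic type and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the interrupt recognition logic. */
typedef struct {
    int irq_ff;     /* 1 = an interrupt request is being remembered */
    int inte_ff;    /* 1 = interrupts are enabled */
} IntLogic;

/* The interrupting device raises a request: IRQ-FF is set. */
void device_request(IntLogic *c)     { c->irq_ff = 1; }

/* Enable and disable interrupt instructions. */
void enable_interrupts(IntLogic *c)  { c->inte_ff = 1; }
void disable_interrupts(IntLogic *c) { c->inte_ff = 0; }

/* Output of the AND gate seen by the sequence controller. */
int interrupt_signal(const IntLogic *c) {
    return c->irq_ff && c->inte_ff;
}

/* Called when the current instruction has finished.
   Returns 1 if an interrupt acknowledge is issued, which resets
   both IRQ-FF and INTE-FF; returns 0 otherwise. */
int acknowledge(IntLogic *c) {
    assert(c != 0);
    if (!interrupt_signal(c))
        return 0;
    c->irq_ff = 0;
    c->inte_ff = 0;
    return 1;
}

/* The return-from-interrupt instruction sets INTE-FF again. */
void return_from_interrupt(IntLogic *c) { c->inte_ff = 1; }
```

A request that arrives while a service routine is running stays in the IRQ-FF but is not acknowledged until return_from_interrupt() or enable_interrupts() sets the INTE-FF again.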

Multiple Sources of Interrupts
[Figure: Device 1, Device 2, …, Device n share the IRQ and INTA lines to the CPU.] Determine which of the multiple devices has generated the IRQ so that its interrupt service routine can be executed. Two approaches: polled interrupts and vectored interrupts. Resolve simultaneous requests from interrupts with a prioritization scheme.

Polled Interrupts Software, instead of hardware, is responsible for determining the interrupting device. The device must have logic to generate the IRQ signal and to set an “I did it” bit in a status register that is read by the CPU. The bit is reset after the register has been read. IRQ signals the sequence controller to start executing an interrupt service routine that first polls the devices and then branches to the correct service routine.
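A polled interrupt handler can be sketched as a loop over the device status registers. The model below is a simulation on ordinary variables, not code for real I/O registers; the Device type, the DID_IT bit position and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { DID_IT = 0x01 };   /* assumed position of the "I did it" bit */

/* Hypothetical model of one interrupting device. */
typedef struct {
    uint8_t status;       /* status register */
    int serviced;         /* how many times its service routine ran */
} Device;

/* Reading the status register returns its value and resets
   the "I did it" bit, as described on the slide. */
uint8_t read_status(Device *d) {
    uint8_t value = d->status;
    d->status &= (uint8_t)~DID_IT;
    return value;
}

/* Common interrupt service routine entered on IRQ: poll the devices
   in a fixed order and service the first one that raised the request.
   Returns the index of the serviced device, or -1 if none is found. */
int polled_isr(Device devs[], int n) {
    assert(devs != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        if (read_status(&devs[i]) & DID_IT) {
            devs[i].serviced++;   /* stands in for the device's routine */
            return i;
        }
    }
    return -1;
}
```

The polling order also fixes the priority: if two devices have their bits set, the one polled first is serviced first, and the other is found on the next entry to the routine.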

Polled Interrupt Logic
IRQ Logic to generate IRQ Logic to reset IRQ when status register is read Logic to set “I did it” bit Logic to read status register and reset “I did it” bit Status register Data Address Control

Vectored Interrupts (I)
CPU’s response to IRQ is to assert INTA. The interrupting device uses INTA to place information that identifies itself, called a vector, onto the data bus for the CPU to read. A vector is the address of an interrupt service routine. The CPU uses the vector to execute the interrupt service routine.

Vectored Interrupting Device Hardware (I)
INTA IRQ Logic to reset IRQ Logic to generate IRQ Vector Information Three-State Driver Data Address Control

Vectored Interrupts (II)
[Figure: the CPU with separate input lines IRQ 0, IRQ 1, IRQ 2, …, IRQ n.] The CPU has multiple IRQ input pins. CPU designers reserve specific memory locations for a vector associated with each IRQ line. An individual disable/enable bit is assigned to each interrupting source.

Interrupt Priorities When multiple interrupts occur at the same time, which one will be serviced first? Two resolution approaches: Software resolution. Polling software determines which interrupting source is serviced first. Hardware resolution. Daisy chain. Separate IRQ lines. Hierarchical prioritization. Nonmaskable interrupts.

Daisy Chain Priority Resolution
The CPU asserts INTA, which is passed down the chain from device to device. The higher-priority device is closer to the CPU. When the INTA reaches the device that generated the IRQ, that device puts its vector on the data bus and does not pass along the INTA. So lower-priority devices do NOT receive the INTA.

Daisy Chain Priority Resolution (Cont.)
IRQ INTA INTA INTA INTA CPU Device 1 Device 2 • • • Device n Data Address Control
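The daisy chain can be simulated by walking the devices in the order in which INTA reaches them. The ChainDevice type, the vector values and the function name below are hypothetical; index 0 stands for the device closest to the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one device on the daisy chain. */
typedef struct {
    int requesting;    /* 1 = this device has generated the IRQ */
    uint8_t vector;    /* value it places on the data bus */
} ChainDevice;

/* Pass INTA down the chain, starting with the device closest to the CPU.
   The first requesting device puts its vector on the bus (*vector_out)
   and does not pass INTA on. Returns its index, or -1 if no device
   on the chain was requesting. */
int daisy_chain_inta(ChainDevice chain[], int n, uint8_t *vector_out) {
    assert(chain != 0 && vector_out != 0);
    for (int i = 0; i < n; i++) {
        if (chain[i].requesting) {
            *vector_out = chain[i].vector;
            chain[i].requesting = 0;   /* request withdrawn once acknowledged */
            return i;                  /* INTA stops here */
        }
    }
    return -1;
}
```

When two devices request service at the same time, the one nearer the CPU answers first; the other keeps its request asserted and is acknowledged on a later INTA.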

Hardware Priority Resolution Separate IRQ Lines. Each IRQ line is assigned a fixed priority. For example, IRQ0 has higher priority than IRQ1 and IRQ1 has higher priority than IRQ2 and so on. Hierarchical Prioritization. Higher priority interrupts are allowed while lower ones are masked. Nonmaskable Interrupts. Cannot be disabled. Used for important events such as power failure.

Transferring Control to Interrupt Service Routine
Hardware needs to save the return address. Most processors save the return address on the stack. ARM uses a special register, the link register, to store the return address. Hardware may also save some registers such as the program status register. AVR does not save any register. It is the programmer’s responsibility to save the program status register and conflict registers. The delay from the time the IRQ is generated by the interrupting device to the time the Interrupt Service Routine (ISR) starts to execute is called interrupt latency.

Interrupt Service Routine
A sequence of code to be executed when the corresponding interrupt is accepted by the CPU. Consists of three parts: Prologue, Body and Epilogue. Prologue: Code for saving conflict registers on the stack. Body: Code for doing the required task. Epilogue: Code for restoring all saved registers from the stack. The last instruction is the return-from-interrupt instruction (reti in AVR).

Software Interrupt Software interrupt is the interrupt generated by software without a hardware-generated-IRQ. Software interrupt is typically used to implement system calls in OS. Most processors provide a special machine instruction to generate software interrupt. SWI in ARM. AVR does NOT provide a software interrupt instruction. Programmers can use External Interrupts to implement software interrupts.

Exceptions Abnormalities that occur during the normal operation of the processor. Examples are internal bus error, memory access error and attempts to execute illegal instructions. Some processors handle exceptions in the same way as interrupts. AVR does not handle exceptions.

Reset Reset is a type of interrupt in most processors (including AVR). It is a signal asserted on a separate pin. Nonmaskable. It does not do other interrupt processes, such as saving conflict registers. It initializes the system to some initial state.

Reading Chapter 8. Microcontrollers and Microcomputers. Interrupts. Mega64 Data Sheet.

Lecture 16: Interrupts II Lecturer: Hui Wu Session 2, 2005

Overview AVR Interrupts Interrupt Vector Table System Reset Watchdog Timer Timer/Counter0 Interrupt Service Routines

AVR MCU Architecture

Interrupts in AVR The number of interrupts varies with the specific AVR device. Two types of interrupts: Internal interrupts and external interrupts. Internal interrupts: Generated by on-chip I/O devices. External interrupts: Generated by external I/O devices. For most internal interrupts, they don’t have an individual enable/disable bit. Program cannot enable/disable these interrupts. External interrupts have an individual enable/disable bit. Program can enable/disable these interrupts. An external interrupt can be rising edge-triggered, falling edge-triggered or low level-triggered. Special I/O registers (External Interrupt Control Registers EICRA and EICRB in Mega64) specify how each external interrupt is triggered.

Interrupts in AVR (Cont.)
There is a global interrupt enable/disable bit, the I-bit, in Program Status Register SREG. Setting the I-bit will enable all interrupts except those with individual enable/disable bit. Those interrupts are enabled only if both I and their own enable/disable bit are set. The I-bit is cleared after an interrupt has occurred and is set by the instruction RETI. Programmers can use SEI and CLI to set and clear the I-bit. If the I-bit is enabled in the interrupt service routine, nested interrupts are allowed. SREG is not automatically saved by hardware when entering an interrupt service routine. An interrupt service routine needs to save it and other conflict registers on the stack at the beginning and restore them at the end.

Interrupts in AVR (Cont.)
Reset is handled as a nonmaskable interrupt. Each interrupt has a 4-byte interrupt vector, containing an instruction to be executed after MCU has accepted the interrupt. Each interrupt vector has a vector number, an integer from 1 to n, the maximum number of interrupts. The priority of each interrupt is determined by its vector number. The lower the vector number, the higher priority. All interrupt vectors, called Interrupt Vector Table, are stored in a contiguous section in flash memory. Starts from 0 by default. Can be relocated.
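The priority rule and the two levels of enabling can be combined in one small selection function. The sketch below models pending and enabled interrupts as bit masks, which is an assumed encoding chosen for illustration (bit k-1 stands for vector number k); it does not model Reset as a nonmaskable source, and next_interrupt() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the vector number (1..32) of the interrupt to be serviced next,
   or 0 if none can be accepted.
   pending: bit k-1 set means interrupt with vector number k is requested.
   enabled: bit k-1 set means its individual enable bit is set.
   i_bit:   the global interrupt enable bit (I-bit in SREG). */
int next_interrupt(uint32_t pending, uint32_t enabled, int i_bit) {
    if (!i_bit)
        return 0;                      /* all maskable interrupts are off */
    uint32_t ready = pending & enabled;
    for (int k = 1; k <= 32; k++) {    /* lower vector number wins */
        if (ready & ((uint32_t)1 << (k - 1)))
            return k;
    }
    return 0;
}
```

For example, with vectors 3 and 5 both pending and enabled, the function selects 3; if only vector 5 is individually enabled, it selects 5.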

Interrupt Vectors in Mega64

Interrupt Vectors in Mega64 (Cont.)

Initialization of Interrupt Vector Table in Mega64
Typically an interrupt vector contains a branch instruction (JMP or RJMP) that branches to the first instruction of the interrupt service routine. Or simply RETI (return-from-interrupt) if you don’t handle this interrupt.

Example of IVT Initialization in Mega64
.include "m64def.inc"
.cseg
.org 0
    rjmp RESET      ; Jump to the start of Reset interrupt service routine
                    ; Relative jump is used assuming RESET is not far
    jmp IRQ0        ; Long jump is used assuming IRQ0 is very far away
    reti            ; Return to the break point (No handling for this interrupt).
    …
RESET:              ; The interrupt service routine for RESET starts here.
IRQ0:               ; The interrupt service routine for IRQ0 starts here.

RESET in Mega64 The ATmega64 has five sources of reset: • Power-on Reset. The MCU is reset when the supply voltage is below the Power-on Reset threshold (VPOT). • External Reset. The MCU is reset when a low level is present on the RESET pin for longer than the minimum pulse length. • Watchdog Reset. The MCU is reset when the Watchdog Timer period expires and the Watchdog is enabled.

RESET in Mega64 (Cont.) • Brown-out Reset.
The MCU is reset when the supply voltage VCC is below the Brown-out Reset threshold (VBOT) and the Brown-out Detector is enabled. • JTAG AVR Reset. The MCU is reset as long as there is a logic one in the Reset Register, one of the scan chains of the JTAG system. For each reset source, there is a flag (bit) in the MCU Control and Status Register MCUCSR. These bits are used to determine the source of the RESET interrupt.

RESET Logic in Mega64

Watchdog Timer Used to detect software crash.
Can be enabled or disabled by properly updating the WDCE bit and WDE bit in the Watchdog Timer Control Register WDTCR. 8 different periods are determined by the WDP2, WDP1 and WDP0 bits in WDTCR. If enabled, it generates a Watchdog Reset interrupt when its period expires. So the program needs to reset it before its period expires by executing the instruction WDR. When its period expires, the Watchdog Reset Flag WDRF in the MCU Control and Status Register MCUCSR is set. This flag is used to determine if the watchdog timer has generated a RESET interrupt.

Watchdog Timer Logic

Timer Interrupt Timer interrupt has many applications:
Used to schedule (real-time) tasks Round-Robin scheduling All tasks take turn to execute for some fixed period. Real-time scheduling Some tasks must be started at a particular time and finished by a deadline. Some tasks must be periodically executed. Used to implement a clock How much time has passed since the system started?

Timer Interrupt (Cont.)
Used to synchronize tasks. Task A can be started only if a certain amount of time has passed since the completion of task B. Can be coupled with wave-form generator to support Pulse-Width Modulation (PWM). Details to be covered later.

Timer0 in AVR 8-bit timer with the following features:
Clear Timer on Compare Match (Auto Reload) Glitch-free, Phase Correct Pulse Width Modulator (PWM) Frequency Generator 10-bit Clock Prescaler Overflow and Compare Match Interrupt Sources (TOV0 and OCF0) It generates a Timer0 Overflow Interrupt Timer0OVF when it overflows. It generates a Timer/Counter0 Output Match Interrupt Timer0COMP when the timer/counter value matches the value in Output Compare Register OCR0. Timer0OVF and Timer0COMP can be individually enabled/disabled. Allows Clocking from External 32 kHz Watch Crystal Independent of the I/O Clock

Timer0 In AVR–Block Diagram

Prescaler for Timer0

Timer0 Registers Seven I/O registers for Timer0:
Timer/Counter Register TCNT0. Contains the current timer/counter value. Output Compare Register OCR0. Contains an 8-bit value that is continuously compared with the counter value (TCNT0). Timer/Counter Control Register TCCR0. Contains control bits. Timer/Counter Interrupt Mask Register TIMSK (shared with other timers). Contains enable/disable bits.

Timer0 Registers (Cont.) Timer/Counter Interrupt Flag Register TIFR (shared with other timers). Contains interrupt flags. Asynchronous Status Register ASSR. Contains control bits for asynchronous operations. Special Function I/O Register SFIOR. Contains synchronization mode bit and prescaler reset bit.

Timer Control Register

Timer Control Register (Cont.) The mode of operation, i.e., the behavior of the Timer/Counter and the Output Compare pins, is defined by the combination of the Waveform Generation mode (WGM01:0) and Compare Output mode (COM01:0) bits. The simplest mode of operation is the Normal Mode (WGM01:0 = 0). In this mode the counting direction is always up (incrementing), and no counter clear is performed. The counter simply overruns when it passes its maximum 8-bit value (TOP = 0xFF) and then restarts from the bottom (0x00). Refer to Mega64 Data Sheet (pages 96~100) for details.

Timer/Counter Interrupt Mask Register • Bit 1 – OCIE0: Timer/Counter0 Output Compare Match Interrupt Enable. 1: Enabled 0: Disabled • Bit 0 – TOIE0: Timer/Counter0 Overflow Interrupt Enable.

ISR Example
.include "m64def.inc"
; This program implements a second counter
; using Timer0 interrupt.
.def temp = r16
.MACRO Clear
    ldi r28, low(@0)     ; Load the low byte of the address of the counter
    ldi r29, high(@0)    ; Load the high byte of the address of the counter
    clr temp
    st y+, temp
    st y, temp           ; Initialize the two-byte integer to 0
.ENDMACRO

ISR Example
.dseg
SecondCounter: .byte 2   ; Two-byte second counter.
TempCounter: .byte 2     ; Temporary counter. Used to determine
                         ; if one second has passed
.cseg
.org 0
    jmp RESET
    jmp DEFAULT          ; No handling for IRQ0.
    jmp DEFAULT          ; No handling for IRQ1.

ISR Example (Cont.)
    …
    jmp Timer0           ; Jump to the interrupt handler for Timer 0 overflow.
    jmp DEFAULT          ; No handling for all other interrupts.
DEFAULT:
    reti                 ; No handling for this interrupt
RESET:
    ldi temp, high(RAMEND)   ; Initialize stack pointer
    out SPH, temp
    ldi temp, low(RAMEND)
    out SPL, temp
    …                    ; Insert further initialization code here
    rjmp main

Timer0 ISR
Timer0:
    push temp                    ; Prologue starts.
    in temp, SREG                ; SREG is an I/O register, so it is saved through temp.
    push temp
    push r29                     ; Save all conflict registers in the prologue.
    push r28
    push r25
    push r24
    push r31
    push r30                     ; Prologue ends.
    ldi r28, low(TempCounter)    ; Load the address of the temporary
    ldi r29, high(TempCounter)   ; counter.
    ld r24, y+                   ; Load the value of the temporary counter.
    ld r25, y                    ; Y now points to the high byte.
    adiw r25:r24, 1              ; Increase the temporary counter by one.

Timer0 ISR (Cont.)
    cpi r24, low(3597)           ; Check if (r25:r24)=3597
    ldi temp, high(3597)         ; 3597 = 10^6/278
    cpc r25, temp
    brne NotSecond
    clr temp                     ; One second has passed since the last reset
    st y, temp                   ; Reset the temporary counter.
    st -y, temp
    ldi r30, low(SecondCounter)  ; Load the address of the second
    ldi r31, high(SecondCounter) ; counter.
    ld r24, z+                   ; Load the value of the second counter.
    ld r25, z
    adiw r25:r24, 1              ; Increase the second counter by one.

Timer0 ISR (Cont.)
    st z, r25                    ; Store the value of the second counter.
    st -z, r24
    rjmp EndISR                  ; The temporary counter has already been reset.
NotSecond:
    st y, r25                    ; Store the value of the temporary counter.
    st -y, r24
EndISR:
    pop r30                      ; Epilogue starts;
    pop r31                      ; Restore all conflict registers from the stack.
    pop r24
    pop r25
    pop r28
    pop r29
    pop temp
    out SREG, temp
    pop temp
    reti                         ; Return from the interrupt.

ISR Example (Cont.)
main:
    Clear TempCounter     ; Initialize the temporary counter to 0
    Clear SecondCounter   ; Initialize the second counter to 0
    ldi temp, 0b00000010
    out TCCR0, temp       ; Prescaling value = 8; 256*8/7.3728
    ldi temp, 1<<TOIE0    ; = 278 microseconds
    out TIMSK, temp       ; T/C0 interrupt enable
    sei                   ; Enable global interrupt
loop: rjmp loop           ; Loop forever
Comments:
1: The frequency of the Timer0 clock in Mega64 is 7.3728 MHz.
2: The prescaling value is set to 8. Since the maximum value of an 8-bit counter is 255, the Timer0 Overflow Interrupt occurs every 256*8/7.3728 = 278 microseconds.
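The timing arithmetic in the comments can be checked with a short sketch. It is only an illustration: the function names are hypothetical, and the 7.3728 MHz clock and prescaler of 8 are the values quoted on the slide.

```python
# Sketch of the Timer0 overflow arithmetic from the ISR example.
F_CLK_MHZ = 7.3728          # Timer0 clock in MHz (value quoted on the slide)
PRESCALER = 8               # prescaling value selected through TCCR0
COUNTS_PER_OVERFLOW = 256   # an 8-bit counter overflows after 256 counts


def overflow_period_us(f_clk_mhz, prescaler, counts=COUNTS_PER_OVERFLOW):
    """Time between two Timer0 overflow interrupts, in microseconds."""
    return counts * prescaler / f_clk_mhz


def overflows_per_second(period_us):
    """Number of overflow interrupts to count before one second has passed."""
    return round(1_000_000 / period_us)


period = overflow_period_us(F_CLK_MHZ, PRESCALER)
rounded_count = overflows_per_second(278)     # the slide's rounded period
exact_count = overflows_per_second(period)    # the unrounded period
```

Using the rounded 278 microseconds reproduces the constant 3597 from the ISR, while the unrounded period gives 3600 overflows per second, so the rounded constant makes the counter run slightly fast.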

Non-Nested Interrupts
Interrupt Service Routines cannot be interrupted by another interrupt. [Figure: execution timeline of the main program and one interrupt service routine.]

Nested Interrupts Interrupt Service Routines can be interrupted by another interrupt. [Figure: execution timeline of the main program with nested routines ISR1, ISR2 and ISR3.]

Reading Read the following sections in the Mega64 Data Sheet: Overview. AVR CPU Core. System Control and Reset. Watchdog Timer. Interrupts. External Interrupts. 8-bit Timer/Counter0 with PWM and Asynchronous Operation.

Lecture 17: Computer Buses and Parallel Input/Output (I) Lecturer: Hui Wu Session 2, 2005

Overview Buses Memory mapped I/O and Separate I/O I/O Synchronization

Bus Oriented Architecture [Figure: CPU, memory, and an I/O interface with parallel I/O devices, connected by the data bus, address bus, and control bus.]

Computer Buses CPU is connected to memory and I/O devices via data, address and control buses. Data bus is bi-directional and transfers information (memory data and instruction, I/O data) to and from CPU. Address bus may be bi-directional (with more than one source of information) but is most often unidirectional because CPU is the only source of the addresses. Control bus carries all other signals required to control the operation of the system.

Levels of Buses Component level bus is defined by the signals on the microprocessor chip, such as READ/WRITE. Component level signals are different for different manufacturers and are used when designing single-board computers or dedicated application systems. System level bus is defined by more generic signals such as MEMRD and IORD. Often designed for use as a backplane into which printed circuit boards are plugged. Intersystem bus is used to connect different systems.

Computer Buses (Cont.) Each line of a bus may have multiple sources and destinations. [Figure: a data bus line shared by the CPU, multiple sources, and multiple destinations.]

Information Sources – The Input Interface
The input interface provides three-state buffers between the source and the data bus. For example, a parallel, eight-bit input interface can be constructed with eight three-state gates whose enable lines are tied together. The open-collector gate is often used for control signals such as interrupt requests.

Typical Bus Interface Gates
[Figure: (a) Three-state gate with input A, enable 1G, and output Y, which is high impedance when the gate is disabled. (b) Typical open-collector gate with an external pull-up resistor to Vcc.]

Information Destinations – The Output Interface
The output interface between the data bus and a destination or output device is a latch. [Figure: data bus line DBn drives the D input of a 74116 dual 4-bit latch with clear (clock inputs C1 and C2, CLR); the Q output drives the destination or output device.]

Address Decoding The interface must provide the ability for CPU to select one of many sources and destinations. Addressing and address decoding can select one out of many sources and destinations.

Address Decoding for Input Devices
[Figure: a 74LS139 1-of-4 decoder takes A1 and A0 from the CPU, with the read control on its enable E. Outputs O0 to O3 each select one of four information sources, which drive the CPU data bus.]

Address Decoding for Output Devices
[Figure: a 74LS139 1-of-4 decoder takes A1 and A0 from the CPU, with the write control on its enable E. Outputs O0 to O3 each select one of four 74116 dual 4-bit latches, which are loaded from the CPU data bus.]

CPU Timing Signals CPU must provide timing and synchronization so that the transfer of information occurs at the right time. CPU has its own clock. I/O devices may have a separate I/O clock. Typical timing signals include READ and WRITE.

Typical CPU Read Cycle [Figure: timing diagram of the CPU clock, the address bus (address from CPU valid, point A), the READ control signal (point B), and the data bus (data from device valid, point C).]

Typical CPU Read Cycle CPU places the address on the address bus at point A. The control signal READ is asserted at point B to signal the external device that CPU is ready to take the data from the data bus. CPU reads the data bus at point C whether or not the input device has placed valid data on it. If the device may not be ready in time, some form of synchronization is required.

Typical CPU Write Cycle [Figure: timing diagram of the CPU clock, the address bus (address from CPU valid, point A), the data bus (data from CPU valid, point B), and the WRITE control signal (points C and D).]

Typical CPU Write Cycle CPU places the address on the address bus at point A. The data bits are supplied by CPU at point B. The control signal WRITE is asserted by CPU at point C to signal the external device that valid data are available on the data bus. This signal is used to create the clock to latch the data at the correct time. Depending on the type of latch and when WRITE is asserted, the data may be captured on the falling edge or rising edge.

Complete I/O Interface
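[Figure: two 74LS139 1-of-4 decoders on address lines A1 and A0. The decoder enabled by READ generates SOURCE_ADR_OK, which enables a 74LS244 octal buffer between the source and the data bus. The decoder enabled by WRITE generates DES_ADR_OK, which clocks a 74116 dual 4-bit latch between the data bus and the destination.]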

Complete I/O Interface (Cont.)
READ and WRITE control the decoder enable (E). The three-state enables and the latch clock signals are therefore not asserted until the correct address is on the address bus AND the correct time in the read or write cycle has arrived.
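The gating of address and timing can be modelled with a small sketch of one half of a 2-to-4 decoder. It assumes active-low enable and outputs, as on the 74LS139; the function name is hypothetical and the model ignores propagation delay.

```python
def decode_1_of_4(a1, a0, enable_n):
    """Model a 1-of-4 decoder with active-low enable and active-low outputs.

    Returns (O0, O1, O2, O3). The selected output is 0 only while the
    enable input is asserted (0); otherwise every output stays at 1.
    """
    outputs = [1, 1, 1, 1]
    if enable_n == 0:                    # READ or WRITE strobe asserted
        outputs[(a1 << 1) | a0] = 0      # address selects exactly one device
    return tuple(outputs)


# Address 10 with the strobe asserted selects device 2 only.
selected = decode_1_of_4(1, 0, enable_n=0)
# The same address without the strobe selects nothing.
idle = decode_1_of_4(1, 0, enable_n=1)
```

Tying the strobe to the enable input is what keeps a device off the bus until both the address and the timing are right.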

I/O Addressing If the same address bus is used for both memory and I/O, how does hardware distinguish between memory reads and writes and I/O reads and writes? Two approaches: Memory-mapped I/O. Separate I/O. AVR supports both.

Memory Mapped I/O The entire address space is divided into a memory space and an I/O space. Example address map: 0x0000 to 0xFBFF memory; 0xFC00 to 0xFFFF I/O.

Memory Mapped I/O (Cont.)
Advantages: Simpler CPU design. No special instructions for I/O accesses. Disadvantages: I/O devices reduce the amount of memory available for application programs. The address decoder needs to decode the full address bus to avoid conflict with memory addresses.

I/O Interface for Memory-Mapped I/O
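[Figure: a decoder on the address bus generates ADR_OK. ADR_OK together with READ enables the information source onto the data bus; ADR_OK together with WRITE clocks (CL) the D-Q latch that feeds the information destination.]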

Separate I/O Two separate spaces for memory and I/O.
Less expensive address decoders than those needed for memory-mapped I/O. Additional control signal, called IO/M, is required to prevent both memory and I/O trying to place data on the bus simultaneously. IO/M is high for I/O use and low for memory use. Special I/O instructions such as in and out are required.

I/O Interface for Separate I/O
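[Figure: a decoder on the reduced address bus generates ADR_OK. READ, IO/M and ADR_OK form IO_READ, which enables the information source onto the data bus; WRITE, IO/M and ADR_OK form IO_WRITE, which clocks a 74LS373 octal latch feeding the information destination.]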

I/O Synchronization CPU is typically much faster than I/O devices.
I/O devices need to transfer data at unpredictable intervals. Therefore, synchronization between CPU and I/O devices is required. Two synchronization approaches: Software synchronization. Hardware synchronization.

Software Synchronization
Two software synchronization approaches: Real-time synchronization. Uses a software delay to match CPU to the timing requirements of the I/O device. Sensitive to CPU clock frequency. Wastes CPU time. Polled I/O. A status register, with a DATA_READY bit, is added to the device. The software keeps reading the status register until the DATA_READY bit is set. Not sensitive to CPU clock frequency. Still wastes CPU time, but CPU can do other tasks.
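Polled I/O can be sketched with a simulated device. Everything here is hypothetical: the `SimulatedDevice` class, the position of the DATA_READY bit, and the number of polls stand in for a real status register.

```python
class SimulatedDevice:
    """Hypothetical input device whose DATA_READY bit sets after some polls."""

    DATA_READY = 0x01  # assumed bit position in the status register

    def __init__(self, data, polls_until_ready):
        self._data = data
        self._remaining = polls_until_ready

    def read_status(self):
        """Return the status register; DATA_READY is 0 until data arrives."""
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return self.DATA_READY

    def read_data(self):
        return self._data


def polled_read(device):
    """Keep reading the status register until DATA_READY is set."""
    wasted_polls = 0
    while not (device.read_status() & SimulatedDevice.DATA_READY):
        wasted_polls += 1   # CPU time spent waiting; other work could go here
    return device.read_data(), wasted_polls


value, wasted = polled_read(SimulatedDevice(0x5A, polls_until_ready=3))
```

The loop does not depend on the CPU clock frequency, but each unsuccessful status read is still CPU time spent waiting.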

Handshaking I/O This hardware synchronization approach needs a control signal READY or WAIT. For an input device, when CPU is asking for input data, the input device will assert WAIT if the input data is NOT available. When the input data is available, it will deassert WAIT. While WAIT is asserted, CPU must wait until this control signal is deasserted. For an output device, when CPU is sending output data via the data bus, the output device will assert WAIT if it is not ready to take the data. When it is ready, it will deassert WAIT. While WAIT is asserted, CPU must wait until this control signal is deasserted.

Input Handshaking Hardware
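[Figure: input handshaking hardware. INFO_ADD_OK (decoded from the address bus) and READ generate DATA_REQUEST to the input device; wait state logic returns WAIT or READY to the CPU; the device's data register drives the data bus.]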

Reading Chapter 7: Computer Buses and Parallel Input and Output. Microcontrollers and Microcomputers by Fredrick M. Cady.

Lecture 18: Computer Buses and Parallel Input/Output (II) Lecturer: Hui Wu Session 2, 2005

Overview Bus Arbitration Switches

Bus Masters and Slaves A bus master is either a CPU or a hardware component (DMA Controller for instance) that controls the buses. A bus slave is a device that takes its orders from the bus master. What if two or more bus masters want to control the buses simultaneously? Bus arbitration is required.

Bus Arbitration Daisy chain bus arbitration.
Hardware priority bus arbitration.

Daisy Chain Bus Arbitration
[Figure: Master 1 to Master n share the HOLD line to an Intel 8086 CPU; the HOLDA line from the CPU is chained through the masters in turn.]

Daisy Chain Bus Arbitration (Cont.)
Whenever a device wants to control the bus, it asserts HOLD and opens the switch in the HOLDA (HOLD_ACKNOWLEDGE) line. When HOLDA is asserted by the CPU, it passes through each of the inactive masters. If a bus master farther right on the chain asserts HOLD before another master is finished, HOLDA is not passed along until the higher priority device (closer to CPU) is finished with its tasks and closes its switch.

Hardware Priority Bus Arbitration
[Figure: Master 1 to Master n send HOLD1 to HOLDn to priority resolution hardware containing a priority encoder, which asserts HOLD to the Intel 8086 CPU and returns the CPU's HOLDA as one of HOLDA1 to HOLDAn.]

Hardware Priority Bus Arbitration
Each device is pre-assigned a priority. Simultaneous HOLD signals are encoded so that only the highest priority device receives HOLDA.
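The priority resolution can be sketched as a simple priority encoder. The function name is hypothetical, and the sketch assumes that Master 1 (index 0) has the highest priority; the slides do not state the ordering.

```python
def resolve_holda(hold_requests):
    """Return the HOLDA lines for a list of simultaneous HOLD requests.

    hold_requests[i] is True when master i+1 asserts HOLD. Index 0 is
    assumed to be the highest priority. Only the highest-priority
    requester receives HOLDA.
    """
    holda = [False] * len(hold_requests)
    for index, requesting in enumerate(hold_requests):
        if requesting:
            holda[index] = True
            break               # lower-priority requests must wait
    return holda


# Masters 2 and 3 request together; only master 2 is acknowledged.
granted = resolve_holda([False, True, True])
```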

Input Switches Most basic of all binary input devices.
The switch output is high or low depending on the switch position. Pull-up resistors are necessary in each switch to provide a high logic level when the switch is open. Problem with switches: Switch bounce. When a switch makes contact, its mechanical springiness will cause the contact to bounce, or make and break, for a few milliseconds (typically 5 to 10 ms).

Input Switches (Cont.)
[Figure: (a) Single-pole, single-throw (SPST) logic switch with a pull-up resistor R (typically 1K Ohm) to Vcc, read onto the data bus through ½ of a 74LS244 octal buffer; logic high with switch open, logic low with switch closed. (b) Multiple pole switch connected to the data bus.]

Software Debouncing Two software debouncing approaches: Wait and see:
If the software detects a low logic level, indicating that the switch has closed, it simply waits for longer than 10 ms, say 20 to 100 ms, and then tests whether the switch is still low. Counter-based approach: Initialize a counter to 10. Poll the switch every millisecond until the counter is either 0 or 20. If the switch output is low, decrement the counter; otherwise, increment the counter. If the counter is 0, we know that the switch output has been low (switch closed) for at least 10 ms. If, on the other hand, the counter reaches 20, we know that the switch output has been high (switch open) for at least 10 ms.
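The counter-based approach can be sketched as follows. The function name and the list of samples are hypothetical; the sketch assumes one sample per millisecond and, as in the pull-up circuit, that a low output means the switch is closed.

```python
def debounce(samples, start=10, low_limit=0, high_limit=20):
    """Counter-based debouncing over switch outputs sampled 1 ms apart.

    Each sample is 0 (low, switch closed) or 1 (high, switch open).
    Returns "closed" when the counter reaches 0, "open" when it reaches
    20, or None if the samples run out before either limit is reached.
    """
    counter = start
    for level in samples:
        counter += -1 if level == 0 else 1
        if counter == low_limit:
            return "closed"
        if counter == high_limit:
            return "open"
    return None


# A few bounces followed by a steady low level settle on "closed".
bouncy_close = debounce([1, 0, 1, 0] + [0] * 10)
```

A bouncing contact moves the counter up and down without reaching either limit, so only a level that persists produces a decision.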

NAND Latch Debouncer
[Figure: NAND latch debouncer with pull-up resistors R (typically 1K Ohm). Logic high with switch up, logic low with switch down.]

NOR Latch Debouncer
[Figure: NOR latch debouncer with the switch connected to Vcc. Logic high with switch up, logic low with switch down.]

Integrating Debouncer with Schmitt Trigger
[Figure: integrating debouncer followed by a 74LS14 Schmitt trigger. Logic high with switch up, logic low with switch down.]

One-Dimensional Array of Switches
[Figure: eight switches with pull-ups to Vcc (point A) feed inputs I0 to I7 of a 74LS151 8-to-1 multiplexer. The select inputs S2, S1, S0 come from an output port; output Z carries the scanned switch data to an input port.]

One-Dimensional Array of Switches
The switch bounce problem must be solved. The array of switches must be scanned to find out which switches are closed or open. The output of the switch array could be interfaced directly to an eight-bit port at point A. To save I/O lines, a 74LS151 8-input multiplexer can be used. Software is required to scan the array. As the software outputs a 3-bit sequence from 000 to 111, the multiplexer selects each of the switch inputs. The software scanner then reads one bit at an input port.

Keyboard Matrix of Switches
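[Figure: 8×8 matrix of switches 00 to 77 with pull-ups to Vcc. One set of lines feeds inputs I0 to I7 of a 74LS151 8-input multiplexer, whose select inputs S2, S1, S0 come from an output port and whose output Z carries the scanned switch data to an input port. The other set of lines is driven by outputs O0 to O7 of a 74LS138 decoder, whose inputs A2, A1, A0 carry the scan input from an output port. Points A and B mark where ports could be connected directly.]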

Keyboard Matrix of Switches (Cont.)
A keyboard is an array of switches arranged in a two-dimensional matrix. A switch is connected at each intersection of vertical and horizontal lines. Closing the switch connects the horizontal line to the vertical line. An 8×8 keyboard can be interfaced directly to 8-bit output and input ports at points A and B. Some input and output lines can be saved by using a 74LS138 3-to-8 decoder and a 74LS151 8-input multiplexer.

Keyboard Matrix of Switches (Cont.)
Software can scan the keyboard by outputting a three-bit code to the 74LS138 and then scanning the 74LS151 multiplexer to find the closed switch. The combination of the two 3-bit scan codes identifies which switch is closed. For example, the code 000 000 scans switch 00 in the upper left-hand corner. The diode prevents a problem called ghosting.
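The two-level scan can be sketched with a simulated matrix. The helper `read_row` is a hypothetical stand-in for the decoder, multiplexer and ports, and the sketch assumes that the decoder drives one column low while the multiplexer reads one row, with diodes in place so that ghosting does not occur.

```python
def read_row(closed_switches, column, row):
    """Simulated hardware: a row reads 0 when a closed switch ties it to
    the column that the decoder is currently driving low."""
    return 0 if (row, column) in closed_switches else 1


def scan_keyboard(closed_switches):
    """Return (row, column) for every closed switch found by the scan."""
    found = []
    for column in range(8):         # 3-bit code sent to the decoder
        for row in range(8):        # 3-bit code sent to the multiplexer
            if read_row(closed_switches, column, row) == 0:
                found.append((row, column))
    return found


# Switch 00 (row 0, column 0) is found with both scan codes equal to 000.
pressed = scan_keyboard({(0, 0)})
```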

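Ghosting
[Figure: 3×3 switch matrix (switches 00 to 22, columns Col 0 to Col 2) with pull-up resistors R1, R2 and R3 to Vcc on the rows. Column 0 is the scanned column and is driven low. Row 0 is pulled low (error), Row 1 is pulled low (OK), Row 2 stays high (OK).]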

Ghosting (Cont.) Ghosting occurs when several keys are pushed at once.
Consider the case shown in the figure where three switches 01, 10 and 11 are all closed. Column 0 is selected with a logic low and assume that the circuit does not contain the diodes. As the rows are scanned, a low is sensed on Row 1, which is acceptable because switch 10 is closed. In addition, Row 0 is seen to be low, indicating switch 00 is closed, which is NOT true. The diodes in the switches eliminate this problem by preventing current flow from R1 through switches 01 and 11. Thus Row 0 will not be low when it is scanned.

Reading Chapter 7: Computer Buses and Parallel Input and Output. Microcontrollers and Microcomputers by Fredrick M. Cady.

Lecture 19: Analog Input Lecturer: Hui Wu Session 2, 2005

Overview Analog-to-Digital (A/D) Conversion Shannon’s Theorem A/D Converter Types A/D Converter Specifications

Analog Signals versus Digital Signals
Analog signals: Continuous in both time and amplitude. Noise sensitive. Cannot be manipulated by the computer. Digital signals: Discrete in both time and amplitude. Generally free from noise. Can be manipulated by the computer. Cannot exactly represent or reconstruct analog signals.

Data Acquisition and Conversion
Procedure of data acquisition and conversion: A transducer converts physical processes to electrical signals, either voltages or currents. Signal conditioner performs the following tasks: Isolation and buffering: The input to the A/D may need to be protected from dangerous voltages such as static charges or reversed polarity voltages. Amplification: Rarely does the transducer produce the voltage or current needed by the A/D. The amplifier is designed so that the full-scale signal from the transducer results in a full-scale signal to the A/D. Bandwidth limiting: The signal conditioning provides a low-pass filter to limit the range of frequencies that can be digitized.

Data Acquisition and Conversion (Cont.)
In applications where several analog inputs must be digitized, an analog multiplexer follows the signal conditioning. It allows multiple analog inputs, each with its own signal conditioning for different transducers. The sample-and-hold circuit samples the signal and holds it steady while the A/D converts it. The A/D converter converts the sampled signal to digital values. The three-state gates hold the digital values generated by the A/D converter.

Data Acquisition System
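[Figure: the analog input from the transducer passes through signal conditioning (analog amplifier) to an analog multiplexer, which also accepts other analog inputs; then a sample-and-hold, an N-bit analog-to-digital converter, and three-state gates that deliver digital data to the CPU. Control signals: START_OF_CONVERT, END_OF_CONVERT and THREE-STATE ENABLE.]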

Analog Signal Multiplexer
The multiplexer is selected by the CPU generating an address on the multiplexer select lines. [Figure: analog multiplexer with several analog inputs, one analog output, and multiplexer select lines S1 and S2.]

Shannon’s Sampling Theorem and Aliasing
Claude Shannon’s Theorem: When a signal, $f(t) = X \sin(2\pi f_{sig} t)$, is to be sampled (digitized), the minimum sampling frequency must be twice the signal frequency.

Shannon’s Sampling Theorem and Aliasing (Cont.)
[Figure: sinusoidal waveform $f(t) = X \sin(2\pi f_{sig} t)$ sampled at twice the signal frequency, at points A and B.]

[Figure: sampled waveform, with sample points A and B.]

[Figure: undersampled waveform $f(t) = Y \sin(2\pi g_{sig} t)$.]

To preserve the full information in the signal, it is necessary to sample at twice the maximum frequency of the signal. This is known as the Nyquist rate. A signal can be exactly reproduced if it is sampled at a frequency F, where F is greater than twice the maximum frequency in the signal. If the sampling frequency is less than Nyquist rate, the waveform is said to be undersampled.

Undersampled signal, when converted back into a continuous time signal, will exhibit a phenomenon called aliasing. Aliasing is the presence of unwanted components in the reconstructed signal. These components were not present when the original signal was sampled. In addition, some of the frequencies in the original signal may be lost in the reconstructed signal.
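The effect of undersampling on a single sinusoid can be sketched numerically. The function names are hypothetical, and the folding formula used for the apparent frequency is the standard result for an ideal sampler rather than something stated on the slides.

```python
def is_undersampled(f_max, f_sample):
    """True when the sampling frequency is below twice the maximum
    signal frequency."""
    return f_sample < 2 * f_max


def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sinusoid of f_signal Hz after ideal
    sampling at f_sample Hz (folded into the range 0 to f_sample/2)."""
    return abs(f_signal - f_sample * round(f_signal / f_sample))


# A 900 Hz tone sampled at 1 kHz is undersampled and appears at 100 Hz.
undersampled = is_undersampled(900, 1000)
apparent = alias_frequency(900, 1000)
```

A 400 Hz tone sampled at the same rate is returned unchanged, because it lies below half the sampling frequency.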

A/D Converter Types Successive approximation A/D.
Tracking A/D converter. Dual-slope A/D converter. Parallel A/D converter. Two-stage parallel A/D converter.

Successive Approximation A/D Converter
Each bit in the successive approximation register is tested, starting at the most significant bit and working toward the least significant bit. As each bit is set, the output of the D/A converter is compared with the input. If the D/A output is lower than the input signal, the bit remains set and the next bit is tried. Bits that make the D/A output higher than the analog input are reset. N bit-times are required to set and test each bit in the successive approximation register.
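The bit-by-bit search can be sketched with an ideal D/A converter. The function name and the ideal D/A model are assumptions for illustration, and the sketch keeps a bit when the D/A output equals the input, a case the slide does not specify.

```python
def sar_convert(v_in, v_ref, n_bits):
    """Successive approximation with an ideal D/A converter.

    Bits are tried from the most significant to the least significant.
    A bit stays set when the D/A output does not exceed the input.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                  # set the bit under test
        dac_out = v_ref * trial / (1 << n_bits)    # ideal D/A output
        if dac_out <= v_in:
            code = trial                           # bit remains set
        # otherwise the bit is reset (code is left unchanged)
    return code


# Half of a 5 V reference converts to code 128 in an 8-bit converter.
half_scale = sar_convert(2.5, 5.0, 8)
```

The loop body runs exactly `n_bits` times, which matches the N bit-times stated above.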

Successive Approximation Converter
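[Figure: a comparator compares the analog input with the output of a D/A converter (reference Ref); the clocked successive approximation register drives the D/A converter and provides the digital outputs, MSB to LSB.]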

Tracking A/D Converter
Close cousin of the successive approximation converter. Has an up/down counter controlled by the comparator. If the input signal is higher or lower than the output of the D/A converter, the counter counts up or down, respectively. May quickly converge to the correct digital value when the signal is not changing rapidly. May have to count through its full range before reaching the final value if large, rapid input changes are seen.

Tracking A/D Converter
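[Figure: a comparator compares the analog input with the D/A converter output (reference Ref) and drives the UP and DOWN inputs of a clocked up/down counter with a Track/HOLD control; the counter drives the D/A converter and provides the digital outputs.]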

Dual Slope A/D Converter
Also called integrating A/D converter. Integrates the input signal for a fixed time, T1, with higher input signals integrating to higher values. During the second period, T2, the switch is changed to the negative reference voltage and the integrator discharges to zero at a constant rate. The time it takes to discharge, T2, gives the digital value. It is remarkably efficient at recovering signals from periodic noise.

Dual Slope A/D Converter (Cont.)
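[Figure: a switch selects either the analog input or -Ref as the input to the integrator (through R); a comparator on the integrator output feeds the control logic, which drives the switch and a clocked counter providing the digital outputs.]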

Dual Slope A/D Converter (Cont.)
[Figure: integrator output for dual-slope A/D. The output integrates for the fixed time T1 and discharges for the measured time T2; curves are shown for full-scale, half-scale and quarter-scale conversions.]

Parallel A/D Converter
Uses an array of $2^N-1$ comparators and produces an output code in the propagation time of the comparators and the output decoder. Fast but more costly in comparison to other designs. Also called flash A/D converter.

Parallel A/D Converter
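[Figure: a resistor ladder from Ref (3R/2, R, ..., R, R/2) sets the thresholds of $2^N-1$ comparators that all see the analog input; a decoder converts the comparator outputs to the digital outputs.]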

Two-Stage Parallel A/D Converter
The input signal is converted in two pieces. First, a coarse estimate is found by the first parallel A/D converter. This digital value is sent to the D/A and summer, where it is subtracted from the original signal. The difference is converted by the second parallel converter and the result combined with the first A/D to give the digitized value. It has nearly the performance of the parallel converter but without the complexity of $2^N-1$ comparators. It offers high resolution and high-speed conversion for applications like video signal processing.

Two-Stage Parallel A/D Converter (Cont.)
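[Figure: the analog input feeds the first N/2-bit flash A/D. Its output drives an N/2-bit D/A, whose output is subtracted from the analog input in a summer; the difference feeds the second N/2-bit flash A/D. Both results are combined in an N-bit register to give the digital outputs.]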

A/D Converter Specifications
Conversion time. The time required to complete a conversion of the input signal. Establishes the upper signal frequency limit that can be sampled without aliasing: $f_{MAX} = 1/(2 \times \text{conversion time})$ (1). Resolution. The number of bits in the converter gives the resolution and thus the smallest analog input signal for which the converter will produce a digital code. It may be given in terms of the full-scale input signal: $\text{Resolution} = \text{full-scale signal}/2^n$ (2). It is often given as the number of bits, n, or stated as one part in $2^n$. Sometimes it is given as a percent of maximum.

A/D Converter Specifications (Cont.)
Accuracy. Relates the smallest signal (or noise) to the measured signal. Given as a percent and describes how close the measurement is to the actual value. The signal is accurate to within $100\% \times V_{RESOLUTION}/V_{SIGNAL}$ (3). Linearity. The deviation in output codes from a straight line drawn through zero and full-scale. The best that can be achieved is ±½ of the least significant bit (±½ LSB).

Missing codes. A missing code could be caused by an internal error, especially by the D/A converter in a successive approximation converter. Aperture time. The time that the A/D converter is “looking” at the input signal. It is usually equal to the conversion time.

[Figure: two plots of output code (00 to 11) against input voltage up to full-scale. Left: A/D linearity, with a ±½ LSB band around the straight line. Right: A/D missing codes, with one missing code marked.]

Example 1 An A/D converter has a conversion time of 100 µs. What is the maximum frequency that can be converted without aliasing? Solution: The maximum sampling frequency is the reciprocal of the conversion time = 10 kHz. The maximum signal frequency that can be converted is 5 kHz.

Example 2 An 8-bit A/D converter is to digitize a five-volt, full scale signal. What is the resolution? Solution: The resolution is 5/256=19.5 mV. Another way of stating the resolution is 1 part in 256 or 0.4% of the full-scale value.

Example 3 An 8-bit A/D converter is to digitize a five-volt, full-scale signal. What is the accuracy with which the A/D converter can digitize the following signals? 50 mV, 1 V, 2.5 V, 4.9 V Solution: The resolution is 5 V/256 = 19.5 mV. The measurement will be accurate to within the following:
50 mV: 19.5 mV/50 mV = 39%
1 V: 19.5 mV/1 V = 1.9%
2.5 V: 19.5 mV/2.5 V = 0.8%
4.9 V: 19.5 mV/4.9 V = 0.4%
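Examples 1 to 3 apply Equations (1) to (3) directly, which can be sketched as three small functions. The function names are hypothetical; the formulas are the ones given in the specification slides.

```python
def max_signal_frequency(conversion_time_s):
    """Equation (1): highest signal frequency sampled without aliasing."""
    return 1 / (2 * conversion_time_s)


def resolution(full_scale_v, n_bits):
    """Equation (2): smallest input step, in volts."""
    return full_scale_v / 2 ** n_bits


def accuracy_percent(v_signal, full_scale_v, n_bits):
    """Equation (3): accuracy of a measurement as a percent of the signal."""
    return 100 * resolution(full_scale_v, n_bits) / v_signal


f_max = max_signal_frequency(100e-6)        # Example 1
step = resolution(5.0, 8)                   # Example 2
small_signal = accuracy_percent(0.05, 5.0, 8)   # Example 3, 50 mV input
```

The accuracy figure gets worse as the signal gets smaller, which is why the amplifier is designed to use the full-scale range of the A/D.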

A/D Errors Three sources of errors in A/D conversion: Noise.
All signals have noise. Need to reduce noise or choose the converter resolution appropriately to control the peak-to-peak noise. Aliasing. The errors due to aliasing are difficult to quantify. They depend on the relative amplitude of the signals at frequencies below and above the Nyquist frequency. The system design should include a low-pass filter to attenuate frequencies above the Nyquist frequency.

A/D Errors (Cont.) Aperture.
A significant error in a digitizing system is due to signal variation during the aperture time. A good design will attempt to have the uncertainty, ΔV, be less than one least significant bit. A design equation for the aperture time, $t_{AP}$, in terms of the maximum signal frequency, $f_{MAX}$, and the number of bits, n, in the A/D converter is $t_{AP} = 1/(2\pi f_{MAX} 2^n)$ (4). The aperture time needed to reduce the error to ±½ LSB is surprisingly short.

A/D Errors (Cont.) Example 4 A 1 kHz sinusoidal signal is to be digitized to eight bits. Find the maximum conversion time that can be used and still avoid aliasing, and the aperture time so that the aperture error is less than ±½ LSB. Solution: There must be at least two samples per period, so the maximum conversion time is 0.5 ms. The aperture time is given by Equation 4 and is $t_{AP} = 1/(2\pi \times 10^3 \times 256) = 0.62\ \mu\text{s}$.
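Example 4 can be reproduced with a short sketch of Equation 4. The function names are hypothetical; the factor of 2π comes from the maximum slope of a sinusoid.

```python
import math


def aperture_time(f_max_hz, n_bits):
    """Equation 4: aperture time that keeps the error within +/- 1/2 LSB
    for a full-scale sinusoid of frequency f_max_hz."""
    return 1 / (2 * math.pi * f_max_hz * 2 ** n_bits)


def max_conversion_time(f_max_hz):
    """At least two samples per period are needed to avoid aliasing."""
    return 1 / (2 * f_max_hz)


t_ap = aperture_time(1000, 8)            # about 0.62 microseconds
t_conv = max_conversion_time(1000)       # 0.5 milliseconds
```

The two results differ by almost three orders of magnitude, which motivates the sample-and-hold circuit.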

A/D Errors (Cont.)
[Figure: aperture time error. The analog input changes by ΔV (±½ LSB) during the A/D aperture time tAP.]

Sample-and-Hold The sample-and-hold (S/H) circuit can achieve the short aperture time while allowing a less expensive converter to satisfy the conversion time. It is a high-quality capacitor and a high-speed semiconductor switch. The sample command closes the switch for a very short time, and the capacitor charges or discharges to the input voltage. When the switch is open, the voltage is held for the A/D during its conversion time.

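Sample-and-Hold
[Figure: sample-and-hold circuit. The analog input passes through a unity-gain (+1) buffer and a switch controlled by SAMPLE onto the hold capacitor C; a second unity-gain (+1) buffer outputs the held analog signal.]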

Reading Chapter 11: Analog Input and Output. Microcontrollers and Microcomputers by Fredrick M. Cady.

Lecture 20: Analog Output Lecturer: Hui Wu Session 2, 2005

Overview Digital-to-Analog Conversion D/A Converter Types D/A Converter Specifications Pulse-Width-Modulated (PWM) Analog Output

Digital-to-Analog Converter
[Figure: N-bit digital data from the CPU is held in a latch (controlled by LATCH ENABLE), converted by the digital-to-analog converter, and passed through signal conditioning to the analog output.]

Digital-to-Analog Converter (Cont.)
A parallel output interface connects the D/A to the CPU. The latches may be part of the D/A converter or the output interface. The analog output signal from the D/A is quantized. A signal conditioning block may be used as a filter to smooth the quantized nature of the output. The signal conditioning block also provides isolation, buffering and voltage amplification if needed.

Quantized D/A Output
[Figure: the desired sinusoid and the stepped D/A output, on an amplitude axis from -1.0 to 1.0.]

D/A Converter Types Binary-weighted resistor D/A.
As the switches for the bits are closed, a weighted current is supplied to the summing junction of the amplifier. For high-resolution D/A converters, the binary-weighted type must have a wide range of resistors. This may lead to temperature stability and switching problems. [Figure: four-bit binary-weighted D/A. Switches B0, B1, B2 and B3 connect resistors of 100K, 50K, 25K and 12.5K to the summing junction of an amplifier with a 6.25K resistor, which produces the analog output.]
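The weighted-current summing can be sketched with the resistor values from the figure. The sketch assumes an ideal inverting summing amplifier with the 6.25K resistor as feedback and a reference of 1 V; neither assumption is stated on the slide, and the names are hypothetical.

```python
# Input resistor for each bit, taken from the figure (ohms).
BIT_RESISTORS = {0: 100e3, 1: 50e3, 2: 25e3, 3: 12.5e3}
R_FEEDBACK = 6.25e3   # assumed to be the amplifier feedback resistor


def dac_output(code, v_ref=1.0):
    """Output of an ideal inverting summing amplifier for a 4-bit code.

    Each closed switch adds a current v_ref / R_bit to the summing
    junction; the amplifier converts the total current to a voltage.
    """
    current = sum(v_ref / r for bit, r in BIT_RESISTORS.items()
                  if code & (1 << bit))
    return -current * R_FEEDBACK


msb_only = dac_output(0b1000)    # B3 alone contributes half of v_ref
full_code = dac_output(0b1111)   # all four bits closed
```

Each extra bit of resolution would need a resistor twice as large as the previous largest, which is the wide resistor range mentioned above.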

D/A Converter Types (Cont.)
R-2R Ladder D/A. As the switches are changed from the grounded to the reference position, a binary-weighted current is supplied to the summing junction. For high-resolution D/A converters, a wide range of resistors is not required. However, single-pole double-throw switches are. [Figure: R-2R ladder network of R and 2R resistors with switches B0 to B3, reference VREF, and an amplifier producing the analog output.]

D/A Converter Types (Cont.)
Multiplying D/A. The R-2R ladder D/A can be used as a multiplying D/A by using reference voltage as an input. The reference voltage can vary over the maximum voltage range of the amplifier and is multiplied by the digital code.

D/A Converter Specifications
Resolution and linearity. The resolution is determined by the number of bits and is given as the output voltage corresponding to the smallest digital step, i.e. 1 LSB. The linearity shows how closely the output voltage follows a straight line drawn through zero and full-scale. Settling Time. The time taken for the output voltage to settle to within a specified error band, usually ±½ LSB.

D/A Converter Specifications (Cont.)
Glitches. A glitch is caused by asymmetrical switching in the D/A switches. If a switch changes from a one to a zero faster than from a zero to a one, a glitch may occur. Consider changing the output code of an 8-bit D/A from 10000000 to 01111111. These codes are adjacent, and we expect the output to go from one-half full-scale to one resolution value less than that. However, if the switches can switch faster from a one to a zero, the output code will go through the transitory state sequence 10000000 to 00000000 to 01111111. This results in a short but sometimes noticeable glitch in the output signal. Glitches are especially noticeable in video displays. D/A converter glitch can be eliminated by using a sample-and-hold. The S/H is strobed to sample the data after the glitch has occurred and after the D/A settling time.
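The transitory code can be sketched under a simplifying assumption: every one-to-zero switch completes before any zero-to-one switch begins. Real switches overlap in time, so this is only an illustration, and the function names are hypothetical.

```python
def transitory_code(old_code, new_code):
    """Code seen while only the 1 -> 0 switches have completed.

    A bit is 1 in the transitory state only if it is 1 both before and
    after the change, which is the bitwise AND of the two codes.
    """
    return old_code & new_code


def code_sequence(old_code, new_code):
    """Old code, transitory code, and new code as 8-bit binary strings."""
    states = (old_code, transitory_code(old_code, new_code), new_code)
    return [format(state, "08b") for state in states]


glitch_path = code_sequence(0b10000000, 0b01111111)
```

For this pair of adjacent codes the transitory state is zero, so the output briefly heads toward zero instead of moving by one step.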

D/A Output Glitch
[Figure: output voltage as the digital input code changes from 10000000 to 01111111; the glitch dips toward the level of code 00000000.]

Deglitched D/A
[Figure: the N-bit digital input drives the digital-to-analog converter, followed by a sample-and-hold strobed by SAMPLE, which gives the deglitched analog output.]

PWM Analog Output PWM (Pulse Width Modulation) is a way of digitally encoding analog signal levels. Through the use of high-resolution counters, the duty cycle (pulse width/period) of a square wave is modulated to encode a specific analog signal level. The PWM signal is still digital because, at any given instant of time, the full DC supply is either fully on or fully off. The voltage or current source is supplied to the analog load by means of a repeating series of on and off pulses. Given a sufficient bandwidth, any analog value can be encoded with PWM.

PWM Analog Output (Cont.)
PWM is a powerful technique for controlling analog circuits with a processor's digital outputs. It is employed in a wide variety of applications, ranging from measurement and communications to motor speed control.

[Figure: the pulse-width-modulated output from the CPU (amplitude A, pulse width t, period T) passes through a low-pass filter; D/C analog output = A*t/T.]

A low-pass filter is required to eliminate the inherent noise components in the PWM signal. PWM signals contain strong noise components at the PWM frequency and at odd harmonics of that frequency. The output voltage is directly proportional to the pulse width. By changing the pulse width of the PWM waveform, we can control the output voltage.
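The relation between pulse width and filtered output, analog output = A*t/T, can be sketched as two small functions. The names are hypothetical, and the sketch assumes an ideal low-pass filter that passes only the average value.

```python
def pwm_average(amplitude, pulse_width, period):
    """Average (D/C) output of a PWM waveform: A * t / T."""
    return amplitude * pulse_width / period


def pulse_width_for(target_voltage, amplitude, period):
    """Pulse width needed to obtain target_voltage after filtering."""
    return period * target_voltage / amplitude


low = pwm_average(5.0, 1, 10)              # 10% duty cycle
high = pwm_average(5.0, 9, 10)             # 90% duty cycle
width = pulse_width_for(2.5, 5.0, 10)      # half of the supply voltage
```

The pulse width and period only need to share a unit, such as timer counts, since the result depends on their ratio.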

Examples of PWM Signals
Duty cycle=10% Duty cycle=50% Duty cycle=90%

Reading Chapter 11: Analog Input and Output. Microcontrollers and Microcomputers by Fredrick M. Cady. Timers/Counters. AVR Mega64 Data Sheet.

Lecture 21: Serial Input/Output (I) Lecturer: Hui Wu Session 2, 2005

Overview The Components of an Asynchronous Communication System Standards for the Serial I/O Interface RS232-C and Other Standards

Why Serial I/O? Problems with Parallel I/O: Needs a wire for each bit. When the source and destination are more than a few feet apart, the parallel cable can be bulky and expensive. Susceptible to reflections and induced noise for long distance communication. Serial I/O provides a solution to these problems.

The Components of an Asynchronous Communication System [Figure: TRANSMITTER: 8-bit data from the source enters the transmit data buffer and then a parallel in/serial out shift register clocked by Tclock. RECEIVER: the serial data enters a serial in/parallel out shift register clocked by Rclock and then the 8-bit received data buffer.]

The Components of an Asynchronous Communication System (Cont.) At the communication source: The parallel interface transfers data from the source to the transmit data buffer. These data are shifted into the parallel in/serial out shift register and Tclock shifts the data out to the receiver.

The Components of an Asynchronous Communication System (Cont.) At the communication destination: Rclock shifts each bit received into the serial in/parallel out shift register. After all data bits have been shifted, they are transferred to the received data buffer. The data in the received data buffer are transferred to the input operation via the parallel interface.

UART The device that implements both transmitter and receiver in a single integrated circuit is called a UART (Universal Asynchronous Receiver/Transmitter). UART is the basis for most serial communication hardware. Details of UART will be covered in the next lecture.

UART (Cont.) Figure: two UARTs, each attached to a data bus. The transmitter of one UART (Tclock1) drives the receiver of the other (Rclock2), and the second UART's transmitter (Tclock2) drives the first UART's receiver (Rclock1).

Design Considerations of the Serial Communication System How are data to be encoded? If the data are sent in serial, which bit is sent first? How is the receiver synchronised with the transmitter? What is the data rate? How are the electrical signals for logic values defined? How does the system provide for handshaking?

Data Encoding and Transmission
Several codes are used for alphanumeric information. The most common is ASCII (American Standard Code for Information Interchange), which uses 7 bits to encode 96 printable characters and 32 control characters. There are two choices for the order of data transmission: least significant bit first or most significant bit first. The UART uses least-significant-bit-first order. Data are transmitted asynchronously, so the receiver must be synchronised with the transmitter. The UART provides a way to synchronise the receiver shift register with the transmitter shift register.

Data Encoding and Transmission (Cont.)
The data bits are framed by two extra bits, called the start bit and the stop bit. Mark and space: the logic one and zero levels are called mark and space. When the transmitter is not sending anything, it holds the line at the mark level, also called the idle level. Figure: one frame on the line, switching between the mark and space levels: start bit, 5 to 8 data bits (least significant bit first), optional parity bit, stop bit.

Data Encoding and Transmission (Cont.) Typical bits in data transmission: Start bit: When the transmitter has data to send, it first changes the line from the mark to the space level for one bit time. This synchronises the receiver with the transmitter. When the receiver detects the start bit, it knows to start clocking in the serial data bits. Data bits: Almost any number of data bits can be sent between the start and stop bits. Typically, between 5 and 8 bits are used. Parity bit: The parity bit, used to detect errors in the data, is added to the data to make the total number of ones odd (odd parity) or even (even parity). Stop bit: The stop bit is added at the end of the data bits. It gives one bit-time between successive characters. Some systems require more than one stop bit.
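The framing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: mark is represented as 1 and space as 0, and the name `build_frame` and its parameters are hypothetical, not part of any UART API.

```python
def build_frame(data, data_bits=8, parity=None, stop_bits=1):
    """Return the line levels for one character, in transmission order.

    Assumption: 1 stands for mark and 0 for space.
    parity is None, "even" or "odd".
    """
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit in data_bits")
    bits = [(data >> i) & 1 for i in range(data_bits)]  # least significant bit first
    frame = [0] + bits                                   # start bit is a space
    if parity is not None:
        ones = sum(bits)
        if parity == "even":
            frame.append(ones % 2)        # total number of ones becomes even
        elif parity == "odd":
            frame.append((ones + 1) % 2)  # total number of ones becomes odd
        else:
            raise ValueError("parity must be None, 'even' or 'odd'")
    frame += [1] * stop_bits                             # stop bits are marks
    return frame


if __name__ == "__main__":
    # ASCII 'A' (0x41) as 7 data bits with even parity and one stop bit.
    print(build_frame(0x41, data_bits=7, parity="even"))
```

The start bit is always a space and the stop bit always a mark, so every frame begins with a mark-to-space transition that the receiver can detect.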

Data Transmission Rate The rate at which bits are transmitted is called the baud rate. It is given in bits per second. Standard data rates (baud): 110, 150, 300, 600, 900, 1200, 2400, 4800, 9600, 14400, 19200, 38400, 57600
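To make the rate concrete, the following sketch converts a baud rate into a bit time and a maximum character rate. It assumes, as the text does, that one baud equals one bit per second, and that frames are sent back to back with one start bit; the function names are illustrative.

```python
def bit_time_seconds(baud):
    """Duration of one bit on the line, assuming 1 baud = 1 bit per second."""
    if baud <= 0:
        raise ValueError("baud must be positive")
    return 1.0 / baud


def characters_per_second(baud, data_bits=8, parity=False, stop_bits=1):
    """Maximum character rate when frames follow each other without gaps."""
    frame_bits = 1 + data_bits + (1 if parity else 0) + stop_bits  # 1 start bit
    return baud / frame_bits


if __name__ == "__main__":
    print(bit_time_seconds(9600) * 1e6, "microseconds per bit")
    print(characters_per_second(9600), "characters per second")
```

Because the start, parity and stop bits carry no payload, the character rate is always lower than the baud rate divided by the number of data bits.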

Standards for the Serial I/O Interface Interface standards are needed to allow different manufacturers’ equipment to be interconnected and must define the following elements: Handshaking signals. Direction of signal flow. Types of communication devices. Connectors and interface mechanical considerations. Electrical signal levels.

Standards for the Serial I/O Interface (Cont.) The existing standards include RS-232-C, RS-422, RS-423 and RS-485. The RS-232-C standard is used in most serial interfaces. If the signals must be transmitted farther than 50 feet or faster than 20 Kbits/second, another electrical interface standard such as RS-422, RS-423 or RS-485 should be chosen. For RS-422, RS-423 and RS-485, handshaking, direction of signal flow, and the types of communication devices are based on the RS-232-C standard.

Communication System Types There are three ways that data can be sent in a serial communication system: Simplex system: Data are sent in one direction only, say, to a serial printer. If the computer does not send data faster than the printer can accept it, no handshaking signals are required. Two signal wires are needed for this system. Figure: a computer sending to a printer.

Communication System Types (Cont.) Full-duplex (FDX) system: Data are transmitted in two directions. It is called a four-wire system, although two signal wires and a common ground are sufficient. Figure: a terminal and a computer exchanging data in both directions.

Communication System Types (Cont.) Half-duplex (HDX) system: Data are transmitted in two directions with only one pair of signal lines. Additional hardware and handshaking signals must be added to an HDX system. Figure: two computers sharing one pair of signal lines.

Half-Duplex Handshaking Signals
Figure 1 shows a half-duplex system with additional interface circuitry and handshaking signals defined for the RS-232-C interface standard. Figure 1: Half-duplex system with handshaking. A terminal and a computer each connect to an interface block over a full-duplex interface carrying RTS and CTS; the two interface blocks are joined by a half-duplex interface.

Half-Duplex Handshaking Signals (Cont.)
The interface blocks have three roles: They give a full-duplex channel between themselves and the terminal or computer. They decide whether they or their opposite interface is sending or receiving data. They use and control the request to send (RTS) and clear to send (CTS) handshaking signals. The RTS signal is asserted by the terminal or computer when data are to be sent. When the interface finds that the other system is not sending data, it asserts the CTS signal. The sending station must wait until it is clear to send before transmitting. Half-duplex systems are not often used these days, although the RTS/CTS handshaking signals have been retained to control the flow of data.

Data Terminal Equipment and Data Communication Equipment The blocks labelled “Interface” in Figure 1 are, in practice, modems, and the two-wire half-duplex line is a telephone line. Modems are called data communication equipment (DCE). The terminals or computers to which modems are attached are called data terminal equipment (DTE). A modem is a modulator/demodulator. It converts logic levels into tones to be sent over a telephone line. At the other end of the telephone line, a demodulator converts the tones back to logic levels. In a half-duplex system, a single set of tones is defined, one for space and one for mark.

Data Terminal Equipment and Data Communication Equipment (Cont.)
Half-duplex modems are no longer used because modems have been developed to allow full-duplex data transmission over a telephone line. A full-duplex system has two types of modems, called originate and answer modems, and two sets of tones.
Originate and answer modem tone definitions for Bell 212A
Originate modem to answer modem: the originate modulator sends 1070 Hz (space) and 1270 Hz (mark) to the answer demodulator.
Answer modem to originate modem: the answer modulator sends [value missing] Hz (space) and 2225 Hz (mark) to the originate demodulator.

Modem Handshaking Signals Ring Indicator (RI): The telephone company transmits a special tone that rings the phone. The modem can detect this and assert the RI signal. The terminal or computer can use RI to start some special process such as notifying the user that the other end is calling or to answer the telephone in an answer modem. Data Set Ready (DSR): This signal tells the DTE that the modem (also called the data set) has established a connection over the telephone line to the far end.

Modem Handshaking Signals (Cont.)
Data Terminal Ready (DTR): This signal comes from the DTE and informs the modem that the DTE is ready to operate. This is usually just an indication that the power is turned on in the terminal but could be controlled by a computer. An intelligent answer modem can use it to answer a call automatically only when the computer or terminal is ready. Data Carrier Detect (DCD): DCD is asserted when the carrier, or tone defined for a mark, is being generated by the modem on the other end. It was originally used in half-duplex systems. When one end wanted to transmit, it first asserted the RTS line. The modem then checked the DCD bit. If it found it asserted, it knew the other end was sending. When DCD was deasserted, CTS was asserted, allowing transmission from the requesting terminal.

Modem Handshaking Signals (Cont.)
Figure: a DTE (terminal or computer) connected to a DCE (modem) by the TxD, RxD, Gnd, RI, DSR, DTR and DCD lines; the modem connects to the telephone line.

RS-232-C Signal Definitions
DE9 DB25 Signal Purpose
PG Protective ground: This is actually the shield in a shielded cable. It is designed to be connected to the equipment frame and may be connected to external grounds.
TxD Transmitted data: Sourced by DTE and received by DCE. Data terminal equipment cannot send unless RTS, CTS, DSR and DTR are asserted.
RxD Received data: Received by DTE, sourced by DCE.
RTS Request to send: Sourced by DTE, received by DCE. RTS is asserted by the DTE when it wants to send data. The DCE responds by asserting CTS.

RS-232-C Signal Definitions (Cont.)
DE9 DB25 Signal Purpose
CTS Clear to send: Sourced by DCE, received by DTE. CTS must be asserted before the DTE can transmit data.
DSR Data set ready: Sourced by DCE and received by DTE. Indicates that the DCE has made a connection on the telephone line and is ready to receive data from the terminal. The DTE must see this asserted before it can transmit data.
SG Signal ground: Ground reference for this signal is separate from pin 1, protective ground.

RS-232-C Signal Definitions (Cont.)
DE9 DB25 Signal Purpose
DCD Data carrier detect: Sourced by DCE, received by DTE. Indicates that a DCE has detected the carrier on the telephone line. Originally it was used in half-duplex systems but can be used in full-duplex systems, too.
DTR Data terminal ready: Sourced by DTE and received by DCE. Indicates that DTE is ready to send or receive data.
RI Ring indicator: Sourced by DCE and received by DTE. Indicates that a ringing signal is detected.

RS-232-C Interconnections When two serial ports are connected, the data rate, the number of data bits, whether parity is used, the type of parity, and the number of stop bits must be set properly and identically on each UART. Proper cables must be used. There are three kinds of cables from which to choose, depending on the types of devices to be interconnected. The full DTE – DCE cable. The DTE – DTE null modem cable. The minimal DTE – DCE cable.

RS-232-C Interconnections (Cont.)
Figure: Full DTE – DCE cable. Signals shown on the DE9 and DB25 connectors: TxD, RxD, SG, RTS, CTS, DCD, DSR, DTR.

Figure: DTE – DTE null modem cable (same signals and connectors).

Figure: Minimal three-wire cable (same signals and connectors).

Figure: Minimal null modem cable (same signals and connectors).

RS-232-C Interface RS-232-C logic levels: Mark: [value missing] to –3 volts. Space: [value missing] to +3 volts. Figure: a driver (D) converts TTL logic levels to RS-232-C logic levels, and a receiver (R) converts them back to TTL logic levels.

RS-423 Standard Also a single ended system.
Allows longer distance and higher data rates than RS-232-C. Allows a driver to broadcast data to 10 receivers. Figure: RS-423 interface, one driver (D) feeding up to 10 receivers (R).

RS-422 Standard RS-422 line drivers and receivers operate with differential amplifiers. These drivers eliminate much of the common-mode noise experienced with long transmission lines, thus allowing longer distances and higher data rates. Figure: RS-422 interface, one driver (D) feeding up to 10 receivers (R).

RS-485 Standard Similar to RS-422 in that it uses differential line drivers and receivers. Unlike RS-422, RS-485 provides for multiple drivers and receivers in a bussed environment. Up to 32 driver/receiver pairs can be used together. Figure: RS-485 interface, a shared line with up to 32 drivers (D) and up to 32 receivers (R).

Line Lengths and Data Rates
Tables: line length (ft) against data rate for RS-423 (three rows, data rates in Kbits/s), RS-422 (three rows, data rates in Mbits/s, Mbits/s and Kbits/s) and RS-485 (three rows, data rates in Mbits/s, Mbits/s and Kbits/s). The numeric entries are missing.

Summary of Standards
Specification                RS-232-C       RS-423         RS-422         RS-485
Receiver input voltage       3 to 15V       200mV to 12V   200mV to 7V    200mV to +12V
Driver output signal         5 to 15V       3.6 to 6V      2 to 5V        1.5 to 5V
Maximum data rate            Kb/s           Kb/s           Mb/s           Mb/s
Maximum cable length         ft             ft             ft             ft
Mode                         Single-ended   Single-ended   Differential   Differential
The table also has rows for driver source impedance, receiver input resistance (minimum), and the number of drivers and receivers allowed on one line; their numeric entries, like those for data rate and cable length, are missing.

Reading Chapter 10: Serial Input/Output. Microcontrollers and Microcomputers by Fredrick M. Cady.

Lecture 22: Serial Input/Output (II) Lecturer: Hui Wu Session 2, 2005

Overview USART (Universal Synchronous and Asynchronous serial Receiver and Transmitter) in AVR

Main Features of USART in AVR
Full duplex operation (independent serial receive and transmit registers). Asynchronous or synchronous operation. Master or slave clocked synchronous operation. High resolution baud rate generator. Supports serial frames with 5, 6, 7, 8, or 9 data bits and 1 or 2 stop bits. Odd or even parity generation and parity check supported by hardware.

Main Features of USART in AVR (Cont.)
Framing error detection. Noise filtering includes false start bit detection and digital low pass filter. Three separate interrupts on TX Complete, TX Data Register Empty and RX Complete. Multi-processor communication mode. Double speed asynchronous communication mode.

The Block Diagram of USART

The Main Components of USART
Three main components: Clock generator The Clock Generation logic consists of synchronization logic for external clock input used by synchronous slave operation, and the baud rate generator. Transmitter The Transmitter consists of a single write buffer, a serial Shift Register, Parity Generator and Control Logic for handling different serial frame formats. The write buffer allows a continuous transfer of data without any delay between frames.

The Main Components of USART (Cont.)
Receiver The Receiver is the most complex part of the USART module due to its clock and data recovery units. The recovery units are used for asynchronous data reception. In addition to the recovery units, the Receiver includes a Parity Checker, Control Logic, a Shift Register and a Two Level Receive Buffer (UDR). The Receiver supports the same frame formats as the Transmitter, and can detect Frame Error, Data OverRun and Parity Errors.

Clock Generation The Clock Generation logic generates the base clock for the Transmitter and Receiver. The USART supports four modes of clock operation: Normal asynchronous, Double Speed asynchronous, Master synchronous and Slave synchronous mode. The UMSEL bit in USART Control and Status Register C (UCSRC) selects between asynchronous and synchronous operation. Double Speed (asynchronous mode only) is controlled by the U2X bit found in the UCSRA Register. When using synchronous mode (UMSEL = 1), the Data Direction Register for the XCK pin (DDR_XCK) controls whether the clock source is internal (Master mode) or external (Slave mode). The XCK pin is only active when using synchronous mode.

Clock Generation (Cont.)

Clock Generation (Cont.)
Signal description: txclk: Transmitter clock (internal signal). rxclk: Receiver base clock (internal signal). xcki: Input from XCK pin (internal signal). Used for synchronous slave operation. xcko: Clock output to XCK pin (internal signal). Used for synchronous master operation. fosc: XTAL pin frequency (system clock).

The Baud Rate Generator
The USART Baud Rate Register (UBRR) and the down-counter connected to it function as a programmable prescaler or baud rate generator. The down-counter, running at system clock (fOSC), is loaded with the UBRR value each time the counter has counted down to zero or when the UBRRL Register is written. A clock is generated each time the counter reaches zero. This clock is the baud rate generator clock output (=fOSC/(UBRR+1)). The transmitter divides the baud rate generator clock output by 2, 8, or 16 depending on mode. The baud rate generator output is used directly by the receiver’s clock and data recovery units. However, the recovery units use a state machine that uses 2, 8 or 16 states depending on mode set by the state of the UMSEL, U2X and DDR_XCK bits.
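The divider chain described above implies baud = fOSC / (D * (UBRR + 1)), where D is 16, 8 or 2 depending on mode. The sketch below works that relation backwards to pick a UBRR value. It assumes D = 16 for normal asynchronous mode, 8 for double speed and 2 for synchronous master; the function names and the 7.3728 MHz and 8 MHz clock values are illustrative choices, not taken from the lecture.

```python
def ubrr_for(fosc_hz, baud, divisor=16):
    """UBRR value whose resulting rate is closest to the requested baud rate."""
    if divisor not in (2, 8, 16):
        raise ValueError("divisor must be 2, 8 or 16")
    ubrr = round(fosc_hz / (divisor * baud)) - 1
    if ubrr < 0:
        raise ValueError("baud rate too high for this clock")
    return ubrr


def actual_baud(fosc_hz, ubrr, divisor=16):
    """Rate actually produced: fosc / (divisor * (UBRR + 1))."""
    return fosc_hz / (divisor * (ubrr + 1))


def baud_error_percent(fosc_hz, baud, divisor=16):
    """Relative deviation of the achievable rate from the requested one."""
    got = actual_baud(fosc_hz, ubrr_for(fosc_hz, baud, divisor), divisor)
    return (got / baud - 1.0) * 100.0


if __name__ == "__main__":
    for fosc in (7_372_800, 8_000_000):
        print(fosc, ubrr_for(fosc, 9600), round(baud_error_percent(fosc, 9600), 2))
```

A clock that is an exact multiple of the baud rate times the divisor gives zero error; other clocks give a small mismatch, which matters because the receiver tolerates only a limited rate difference.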

Frame Formats A serial frame is defined to be one character of data bits with synchronization bits (start and stop bits), and optionally a parity bit for error checking. The USART accepts all 30 combinations of the following as valid frame formats: 1 start bit 5, 6, 7, 8, or 9 data bits no, even or odd parity bit 1 or 2 stop bits
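The count of 30 follows from 5 data-bit choices, 3 parity choices and 2 stop-bit choices. A short sketch that enumerates them (the names are illustrative only):

```python
from itertools import product

DATA_BITS = (5, 6, 7, 8, 9)
PARITY = ("none", "even", "odd")
STOP_BITS = (1, 2)


def frame_formats():
    """List every combination together with its total frame length in bits."""
    return [
        {
            "data_bits": d,
            "parity": p,
            "stop_bits": s,
            # one start bit + data bits + optional parity bit + stop bits
            "frame_bits": 1 + d + (0 if p == "none" else 1) + s,
        }
        for d, p, s in product(DATA_BITS, PARITY, STOP_BITS)
    ]


if __name__ == "__main__":
    formats = frame_formats()
    print(len(formats), "frame formats")
```

The shortest frame (5 data bits, no parity, 1 stop bit) is 7 bits long and the longest (9 data bits, parity, 2 stop bits) is 13 bits long.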

Frame Formats (Cont.) A frame starts with the start bit followed by the least significant data bit. Then the next data bits, up to a total of nine, are succeeding, ending with the most significant bit. If enabled, the parity bit is inserted after the data bits, before the stop bits. When a complete frame is transmitted, it can be directly followed by a new frame, or the communication line can be set to an idle (high) state.

Frame Formats (Cont.) St Start bit, always low.
(n) Data bits (0 to 8). P Parity bit. Can be odd or even. Sp Stop bit, always high. IDLE No transfers on the communication line (RxD or TxD). An IDLE line must be high.

Parity Bit Calculation
The parity bit is calculated by doing an exclusive-or of all the data bits. If odd parity is used, the result of the exclusive-or is inverted. The relation between the parity bit and data bits is as follows: $P_{even} = d_n \oplus d_{n-1} \oplus \dots \oplus d_1 \oplus d_0 \oplus 0$ and $P_{odd} = d_n \oplus d_{n-1} \oplus \dots \oplus d_1 \oplus d_0 \oplus 1$, where $P_{even}$ is the parity bit using even parity, $P_{odd}$ is the parity bit using odd parity, and $d_n$ is data bit n of the character. If used, the parity bit is located between the last data bit and first stop bit of a serial frame.
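The exclusive-or rule can be checked with a small sketch. It mirrors the two formulas directly: XOR all data bits, then XOR with 0 for even parity or 1 for odd parity. The function name and arguments are illustrative.

```python
from functools import reduce
from operator import xor


def parity_bit(data, data_bits=8, odd=False):
    """Parity bit for the low `data_bits` bits of `data`."""
    bits = [(data >> i) & 1 for i in range(data_bits)]
    p_even = reduce(xor, bits, 0)       # XOR of all data bits (and 0)
    return p_even ^ 1 if odd else p_even  # odd parity inverts the result


if __name__ == "__main__":
    value = 0b10110010  # four ones
    print(parity_bit(value), parity_bit(value, odd=True))
```

With even parity the data bits plus the parity bit always contain an even number of ones; with odd parity, an odd number.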

USART Initialisation USART has to be initialised before any communication can take place. The initialisation process normally consists of setting the baud rate, setting frame format and enabling the Transmitter or the Receiver depending on the usage. For interrupt driven USART operation, the Global Interrupt Flag should be cleared when doing the initialisation.

USART Initialisation (Cont.)
Before doing a re-initialisation with changed baud rate or frame format, be sure that there are no ongoing transmissions during the period the registers are changed. The TXC flag can be used to check that the Transmitter has completed all transfers, and the RXC flag can be used to check that there are no unread data in the receive buffer. Note that the TXC flag must be cleared before each transmission (before UDR is written) if it is used for this purpose.

USART Initialisation (Cont.)
Assembly Code Example:
USART_Init:
    ; Set baud rate
    out UBRRH, r17
    out UBRRL, r16
    ; Enable receiver and transmitter
    ldi r16, (1<<RXEN)|(1<<TXEN)
    out UCSRB, r16
    ; Set frame format: 8 data bits, 2 stop bits
    ldi r16, (1<<USBS)|(3<<UCSZ0)
    out UCSRC, r16
    ret

The USART Transmitter The USART Transmitter is enabled by setting the Transmit Enable (TXEN) bit in the UCSRB Register. When the Transmitter is enabled, the normal port operation of the TxD pin is overridden by the USART and given the function as the transmitter’s serial output. The baud rate, mode of operation and frame format must be set up once before doing any transmissions. If synchronous operation is used, the clock on the XCK pin will be overridden and used as transmission clock.

The USART Transmitter (Cont.)
A data transmission is initiated by loading the transmit buffer with the data to be transmitted. The CPU can load the transmit buffer by writing to the UDR I/O location. The buffered data in the transmit buffer will be moved to the Shift Register when the Shift Register is ready to send a new frame. The Shift Register is loaded with new data if it is in idle state (no ongoing transmission) or immediately after the last stop bit of the previous frame is transmitted. When the Shift Register is loaded with new data, it will transfer one complete frame at the rate given by the baud register, U2X bit or by XCK depending on mode of operation.

The USART Transmitter (Cont.)
Assembly Code Example:
USART_Transmit:
    ; Wait for empty transmit buffer
    sbis UCSRA, UDRE
    rjmp USART_Transmit
    ; Put data (r16) into buffer, sends the data
    out UDR, r16
    ret

Transmitter Flags and Interrupts
The USART Transmitter has two flags that indicate its state: USART Data Register Empty (UDRE) and Transmit Complete (TXC). Both flags can be used for generating interrupts. The Data Register Empty (UDRE) flag indicates whether the transmit buffer is ready to receive new data. This bit is set when the transmit buffer is empty, and cleared when the transmit buffer contains data to be transmitted that has not yet been moved into the Shift Register. For compatibility with future devices, always write this bit to zero when writing the UCSRA Register.

Transmitter Flags and Interrupts (Cont.)
When the Data Register empty Interrupt Enable (UDRIE) bit in UCSRB is written to one, the USART Data Register Empty Interrupt will be executed as long as UDRE is set (provided that global interrupts are enabled). UDRE is cleared by writing UDR. When interrupt-driven data transmission is used, the Data Register Empty Interrupt routine must either write new data to UDR in order to clear UDRE or disable the Data Register Empty Interrupt, otherwise a new interrupt will occur once the interrupt routine terminates.

Transmitter Flags and Interrupts (Cont.)
The Transmit Complete (TXC) flag bit is set to one when the entire frame in the Transmit Shift Register has been shifted out and there are no new data currently present in the transmit buffer. The TXC flag bit is automatically cleared when a transmit complete interrupt is executed, or it can be cleared by writing a one to its bit location. The TXC flag is useful in half-duplex communication interfaces (like the RS-485 standard), where a transmitting application must enter Receive mode and free the communication bus immediately after completing the transmission.

The USART Receiver The USART Receiver is enabled by writing the Receive Enable (RXEN) bit in the UCSRB Register to one. When the Receiver is enabled, the normal pin operation of the RxD pin is overridden by the USART and given the function as the receiver’s serial input. The baud rate, mode of operation and frame format must be set up once before any serial reception can be done. If synchronous operation is used, the clock on the XCK pin will be used as transfer clock.

The USART Receiver (Cont.)
The Receiver starts data reception when it detects a valid start bit. Each bit that follows the start bit will be sampled at the baud rate or XCK clock, and shifted into the Receive Shift Register until the first stop bit of a frame is received. A second stop bit will be ignored by the Receiver. When the first stop bit is received, i.e., a complete serial frame is present in the Receive Shift Register, the contents of the Shift Register will be moved into the receive buffer. The receive buffer can then be read by reading the UDR I/O location.

The USART Receiver (Cont.)
Assembly Code Example:
USART_Receive:
    ; Wait for data to be received
    sbis UCSRA, RXC
    rjmp USART_Receive
    ; Get and return received data from buffer
    in r16, UDR
    ret

Receive Complete Flag and Interrupt
The Receive Complete (RXC) flag indicates if there are unread data present in the receive buffer. This flag is one when unread data exist in the receive buffer, and zero when the receive buffer is empty (i.e. does not contain any unread data). If the receiver is disabled (RXEN = 0), the receive buffer will be flushed and consequently the RXC bit will become zero. When the Receive Complete Interrupt Enable (RXCIE) in UCSRB is set, the USART Receive Complete Interrupt will be executed as long as the RXC flag is set (provided that global interrupts are enabled). When interrupt-driven data reception is used, the receive complete routine must read the received data from UDR in order to clear the RXC flag; otherwise a new interrupt will occur once the interrupt routine terminates.

Receiver Error Flags The USART Receiver has three error flags: Frame Error (FE), Data OverRun (DOR) and USART Parity Error (UPE). The Frame Error (FE) flag indicates the state of the first stop bit of the next readable frame stored in the receive buffer. The FE flag is zero when the stop bit was correctly read (as one), and the FE flag will be one when the stop bit was incorrect (zero). This flag can be used for detecting out-of-sync conditions, detecting break conditions and protocol handling.

Receiver Error Flags (Cont.)
The Data OverRun (DOR) flag indicates data loss due to a receiver buffer full condition. A Data OverRun occurs when the receive buffer is full (two characters), there is a new character waiting in the Receive Shift Register, and a new start bit is detected. If the DOR flag is set, one or more serial frames were lost between the frame last read from UDR and the next frame read from UDR. The USART Parity Error (UPE) flag indicates that the next frame in the receive buffer had a Parity Error when received.

Asynchronous Data Reception
The USART includes a clock recovery and a data recovery unit for handling asynchronous data reception. The clock recovery logic is used for synchronizing the internally generated baud rate clock to the incoming asynchronous serial frames at the RxD pin. The data recovery logic samples and low pass filters each incoming bit, thereby improving the noise immunity of the Receiver. The asynchronous reception operational range depends on the accuracy of the internal baud rate clock, the rate of the incoming frames, and the frame size in number of bits.

Asynchronous Clock Recovery
The Clock Recovery logic synchronizes the internal clock to the incoming serial frames. The following figure illustrates the sampling process of the start bit of an incoming frame. The sample rate is 16 times the baud rate for Normal mode, and eight times the baud rate for Double Speed mode.

Asynchronous Clock Recovery (Cont.)
The horizontal arrows illustrate the synchronization variation due to the sampling process. Note the larger time variation when using the Double Speed mode (U2X = 1) of operation. Samples denoted by zero are samples done when the RxD line is idle (i.e., no communication activity). When the Clock Recovery logic detects a high (idle) to low (start) transition on the RxD line, the start bit detection sequence is initiated.

Asynchronous Clock Recovery (Cont.)
Let sample 1 denote the first zero-sample as shown in the figure. The Clock Recovery logic then uses samples 8, 9 and 10 for Normal mode, and samples 4, 5 and 6 for Double Speed mode (indicated with sample numbers inside boxes on the figure), to decide if a valid start bit is received. If two or more of these three samples have logical high levels (the majority wins), the start bit is rejected as a noise spike and the Receiver starts looking for the next high to low-transition. If however, a valid start bit is detected, the clock recovery logic is synchronized and the data recovery can begin. The synchronization process is repeated for each start bit.
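The start-bit check can be sketched as follows. The sample numbers (8, 9, 10 for Normal mode; 4, 5, 6 for Double Speed mode) come from the text; the function name, the mode labels and the list representation of samples (1 = high, 0 = low, with index 0 holding sample 1) are assumptions made for illustration.

```python
START_SAMPLES = {"normal": (8, 9, 10), "double": (4, 5, 6)}


def valid_start_bit(samples, mode="normal"):
    """Decide whether a high-to-low transition was a real start bit.

    samples[k - 1] holds sample k, where sample 1 is the first zero-sample.
    """
    picked = [samples[k - 1] for k in START_SAMPLES[mode]]
    highs = sum(picked)
    # Two or more high samples: treated as a noise spike and rejected.
    return highs < 2


if __name__ == "__main__":
    clean = [0] * 16
    spike = [0, 0, 0] + [1] * 13  # line returned high after a short glitch
    print(valid_start_bit(clean), valid_start_bit(spike))
```

A single disturbed sample among the three does not reject the start bit, which is what gives the receiver its tolerance to short noise pulses.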

Asynchronous Data Recovery
When the receiver clock is synchronized to the start bit, the data recovery can begin. The data recovery unit uses a state machine that has 16 states for each bit in Normal mode and eight states for each bit in Double Speed mode. The following figure shows the sampling of the data bits and the parity bit. Each of the samples is given a number that is equal to the state of the recovery unit.

Asynchronous Data Recovery (Cont.)
The logic level of the received bit is decided by majority voting on the logic values of the three samples in the centre of the received bit. The centre samples are emphasized on the figure by having the sample number inside boxes. The majority voting process is done as follows: If two or all three samples have high levels, the received bit is registered to be a logic 1. If two or all three samples have low levels, the received bit is registered to be a logic 0. This majority voting process acts as a low pass filter for the incoming signal on the RxD pin. The recovery process is then repeated until a complete frame is received.
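A minimal sketch of the majority vote on one received bit. The centre sample numbers are an assumption here: the sketch reuses 8, 9, 10 (Normal mode) and 4, 5, 6 (Double Speed mode), the values quoted for start-bit detection, since the figure itself is not reproduced. Names are illustrative.

```python
def majority(a, b, c):
    """Return 1 if at least two of the three samples are high, else 0."""
    return 1 if a + b + c >= 2 else 0


def recover_bit(samples, centre=(8, 9, 10)):
    """Recover one bit from its samples; samples[k - 1] holds sample k.

    Assumption: centre=(8, 9, 10) for 16 samples per bit, (4, 5, 6) for 8.
    """
    a, b, c = (samples[k - 1] for k in centre)
    return majority(a, b, c)


if __name__ == "__main__":
    noisy_one = [1] * 7 + [0, 1, 1] + [1] * 6  # one centre sample disturbed
    print(recover_bit(noisy_one))
```

One corrupted centre sample is outvoted by the other two, which is the low-pass filtering effect the text refers to.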

Reading USART. Mega64 Data Sheet.

Lecture 23: Memory Systems (I) Lecturer: Hui Wu Session 2, 2005

Overview Memory System Hierarchy. RAM, ROM, EPROM, EEPROM and FLASH.

Memory System Hierarchy
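Figure: the memory hierarchy. Inside the processor (control and datapath): registers (fastest but most expensive), then cache or on-chip memory. Outside the processor: off-chip memory (RAM, ROM), then auxiliary storage (hard disk, floppy disk, CDROM; slowest and cheapest). Speed and cost decrease, and size increases, moving away from the registers.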

Memory System Hierarchy
Registers: Fastest but most expensive.
Cache: Slower than registers but bigger. Managed by hardware and therefore typically invisible to programmers.
On-chip memory (RAM and ROM): An alternative to cache. Managed by software.
External memory (RAM and ROM): Slower than on-chip memory but bigger.
Auxiliary storage (hard disk, floppy disk, CDROM): Much slower than external memory but much bigger. Non-volatile, i.e. data remain after the power is switched off.

Computer Types and Memory Maps
General purpose computer systems: Programs and data (including the OS) are stored on the auxiliary storage (typically hard disk). A large amount of RAM stores programs and data. When executed, programs are loaded from disk to RAM by the OS. ROM stores boot-up code and low-level system I/O drivers.
Embedded systems: Typically no auxiliary storage. Program and constants are stored in ROM or non-volatile memory such as flash. Data are stored in RAM. The program could be copied from ROM to RAM to increase the execution speed. On-chip memory is preferred to cache due to its low power consumption.

Computer Types and Memory Maps (Cont.)
Figure: a general-purpose computer memory map. From low to high memory addresses: RAM (an area reserved for the OS, then application programs), then ROM (BIOS and boot-up code).

Computer Types and Memory Maps (Cont.)
Figure: an embedded system memory map. From low to high memory addresses: RAM (enough for variables and stack), an unused region (none), then ROM (enough for application programs and possibly an OS).

Semiconductor RAM Memories are semiconductor integrated circuits.
A RAM (Random Access Memory) chip consists of an array of memory cells, a decoder for addressing a particular cell, and signals to control the direction of data flow.

An Example Semiconductor RAM
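Figure: a 32×32-bit RAM array. A 5-bit row address drives a 5-to-32 row address decoder that selects one of the 32 rows. A 5-bit column address drives a 32-1 multiplexer and a 32-1 demultiplexer on the 32 column lines. The control and data pins are CE, R/W, Di and Do.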

An Example Semiconductor RAM
The CE signal (chip enable, or sometimes CS, chip select) is derived by decoding the rest of the address bus. R/W controls whether the memory cell is being read from or written to. Di and Do are separate data-in and data-out pins. Some chips have a single data I/O pin.
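The addressing of the example 32×32 array can be modelled in a few lines. This is a behavioural sketch only: it assumes the upper 5 address bits select the row and the lower 5 the column (the slide does not say which is which), and it replaces the R/W pin with an explicit `write` flag because the pin's polarity is not given. The class and function names are hypothetical.

```python
ROW_BITS = 5
COL_BITS = 5


def split_address(addr):
    """Split a 10-bit cell address into (row, column)."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # assumption: upper bits select the row
    col = addr & ((1 << COL_BITS) - 1)   # lower bits select the column
    return row, col


class Ram32x32:
    """1024 one-bit cells with separate data-in and data-out."""

    def __init__(self):
        self.cells = [[0] * 32 for _ in range(32)]

    def access(self, addr, ce, write, di=0):
        """Return Do on a read; return None on a write or when CE is inactive."""
        if not ce:
            return None                  # chip not selected: no access
        row, col = split_address(addr)
        if write:
            self.cells[row][col] = di & 1
            return None
        return self.cells[row][col]


if __name__ == "__main__":
    ram = Ram32x32()
    ram.access(34, ce=True, write=True, di=1)
    print(split_address(34), ram.access(34, ce=True, write=False))
```

Splitting the address this way keeps the decoder at 32 outputs instead of 1024, which is why memory arrays are laid out as rows and columns.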

Static Memory Cells A static memory cell is a flip-flop.
The transistors could be bipolar or MOS devices. The following figure shows a typical static memory cell. C 3.5 V C´ R1 R1´ D1 D1´ A A´ R2 R2´ 2.5 V 0 V Q1 Q1´ ROW_SELECT COLUMN LINES

Static Memory Cells The bipolar flip-flop works as follows
Assume that ROW_SELECT is high (2.5 volts) and that transistor Q1 is on. Current flows through R1 and R2, which are chosen to make the voltage at point A higher than the column line C. Q1´ is off, making the voltage at point A´ higher than the column line C´. Thus, when the row is not selected, the diodes D1 and D1´ isolate the cell from the column lines C and C´. When the cell is read, ROW_SELECT is asserted (it is now 0 volts), point A becomes lower than C, and current flows in diode D1 from column line C. This current could signify a logic one stored in the cell. A logic zero is stored by turning Q1 off and Q1´ on. Now, when ROW_SELECT is asserted, the C column line will not have current flow while C´ will. Writing into the cell involves asserting ROW_SELECT and driving either C or C´ to set Q1 or Q1´, depending on whether a zero or a one is to be stored.

Dynamic Memory Cells A dynamic cell is a capacitor where absence or presence of charge denotes a stored one or zero. The following figure shows a typical dynamic memory cell. The MOS capacitor can be written to by activating the row, or word, line to turn the MOS transistor on and charge the capacitor through the column, or bit, line. The cell can be read by turning the transistor on and sensing a voltage on the column. Row or Word Line MOS Transistor MOS Capacitor Column or Bit Line

Dynamic Memory Cells (Cont.)
A problem with the dynamic cell is that the charge stored on the capacitor leaks away through the substrate. Thus, the dynamic memory must be refreshed at periodic intervals by activating the ROW_SELECT line while holding all column lines at a particular voltage level. All cells in the row can have the capacitor’s charge (or lack of charge) refreshed at once.

Static Memory Static RAM, or SRAM, consists of an array of flip-flops.
Has lower density and thus lower storage capacity than dynamic RAM. Simpler to use than dynamic RAM. Does not need to be refreshed like DRAM.

Static Memory (Cont.) Logic block diagram of a typical SRAM

Dynamic Memory
[Figure: block diagram of the Intel 1M×1-bit DRAM. Signals: RAS, CAS, W, data in, Q (data out), Vcc, Vss. Blocks: control and clocks, refresh control and address counter, data-in buffer, data-out buffer, column decoder, sense amps and I/O gating, memory array of 1,048,576 cells.]

Dynamic Memory (Cont.) The DRAM chip has 10 address bits, separate data-in and data-out pins, a write enable (W), and two other control signals: RAS (row address strobe) and CAS (column address strobe). RAS and CAS control the multiplexing of the two 10-bit address fields that make up the full 20-bit address required for 1M bits.
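The address multiplexing can be sketched as follows. This is a minimal illustration, assuming the upper 10 bits of the 20-bit address form the row field (strobed with RAS) and the lower 10 bits the column field (strobed with CAS); the function name is hypothetical.

```python
ADDR_PINS = 10  # the chip exposes 10 multiplexed address bits

def multiplex_address(addr):
    """Return (row_field, col_field) for a 20-bit cell address.

    Assumption: the row field is the upper 10 bits, the column field
    the lower 10 bits.
    """
    if not 0 <= addr < (1 << (2 * ADDR_PINS)):
        raise ValueError("address does not fit in 20 bits")
    mask = (1 << ADDR_PINS) - 1
    row_field = (addr >> ADDR_PINS) & mask  # latched when RAS is asserted
    col_field = addr & mask                 # latched when CAS is asserted
    return row_field, col_field

total_cells = 1 << (2 * ADDR_PINS)  # two 10-bit fields address 2^20 cells
```

Two 10-bit fields give 2^20 = 1,048,576 addressable cells, which is the 1M bits of the chip.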

DRAM Refresh The DRAM memory cell requires a periodic refresh operation. There are several refresh methods. RAS-only refresh: This is the most common method of refresh. The row addresses are strobed by asserting RAS while CAS is held high. The cycle must be repeated for every row address. CAS-before-RAS refresh: CAS-before-RAS eliminates the need for external refresh addresses. If CAS is held low for a specified time before RAS is asserted, on-chip refresh circuitry automatically furnishes the refresh address. This method takes slightly longer than the RAS-only method. Hidden refresh: This refresh is done while maintaining the latest valid data at the output, by extending CAS and cycling RAS.

Pseudostatic RAM A memory that combines the high storage capacity of DRAM with the ease of use of SRAM is pseudostatic RAM. It uses DRAM cells and includes on-chip refresh circuitry so that it appears to the user as SRAM. Some care must be taken to avoid a conflict when the system attempts to access the memory while an internal refresh is being done. Two approaches may be included in the design of the chip to solve this problem: In the first approach, a separate pin may be included to tell the RAM when it can execute a refresh cycle without conflicting with an external access request. External logic can pulse this input to refresh the chip. In the second approach, a “ready” or “wait” output from the RAM may be used for handshaking in a system where “wait states” can be generated.

ROM Memory There are various types of ROM memory chips.
Mask-programmable ROMs are programmed during the manufacturing stage and cannot be programmed by the user. Other ROM devices are field programmable and may be programmed by the user. These are called programmable read-only memories (PROMs), and include UV-erasable PROMs (EPROMs), one-time programmable (OTP) EPROMs, and fusible-link PROMs. EPROMs are electrically programmable and are erased by irradiating the chip through a quartz window with ultraviolet (UV) light. An OTP EPROM is an EPROM without the window so that, once programmed, it cannot be erased. Another type of programmable read-only memory is the electrically erasable PROM (EEPROM), which can be programmed and erased while in use.

ROM Memory Cells Bit Line Bit Line Bit Line Word Line Gate No Gate
ROM cell

ROM Memory Cells (Cont.)
The ROM memory cell is simply a wire or connection made or not made in the programming process. The binary information is represented by the presence or absence of the gate on the MOS transistor. Activating the word line puts a one or zero on the bit line.

EPROM Memory Cells
[Figure: cross-section of an EPROM cell. Labels: UV light to erase, quartz window, drain, source, SiO2, field oxide, n+ regions, Si floating gate (electrons injected to program), p-substrate.]

EPROM Memory Cells (Cont.)
The EPROM cell is a MOS transistor without a connection to the gate. To program the EPROM, the chip is placed into a PROM programmer. During the programming cycle, the address and data are sent to the chip and the programming voltage is applied. To change the state of the gate, electrons are either injected by an avalanche mechanism into the silicon floating gate or not. Thus, after programming, the channel between the source and the drain either conducts or does not. If the chip needs to be erased, it must be placed into a PROM eraser. The ultraviolet light from the PROM eraser disperses any charge stored in the floating gate back into the substrate and erases the memory.

EEPROM Memory Cells
[Figure: cross-section of an EEPROM cell. Labels: control gate, drain, source, SiO2, field oxide, n+ regions, Si floating gate (electrons injected to program), p-substrate.]

EEPROM Memory The EEPROM is a further development of the EPROM.
A second polysilicon gate, called the control gate, is added above the floating gate. A control voltage may be applied to this gate to program and erase the cell by injecting electrons into, or dispersing them from, the floating gate. EEPROM can be programmed and erased without removing the chip from the circuit in use. The time required to write is longer than for a comparable RAM chip. There is a maximum number of times it can be programmed (the industry standard as of 1993 is 10,000 program/erase cycles).

FLASH Memory Similar to the EEPROM.
Its drawback is that the entire memory or page must be erased at once, whereas single locations can be erased and reprogrammed in EEPROM devices.

Reading Chapter 9: Computer Memories. Microcontrollers and Microcomputers by Fredrick M. Cady COMP3221/9221: Microprocessors and Embedded Systems

Lecture 23: Memory Systems (II) Lecturer: Hui Wu Session 2, 2005

Overview Memory Timing Requirements

Memory Timing Requirements There are two components of timing requirements. Timing requirements from the CPU: the CPU generates control signals such as READ/WRITE and, in the absence of handshaking signals such as WAIT or READY, takes data from or puts data on the bus at specific times. Timing requirements from the memory: we will use SRAM as an example to illustrate the timing requirements from the point of view of the memory.

CPU Read and Write Cycles The CPU controls all reading and writing of information. tCYC Cycle time: The total time to complete a write or read cycle. tAD Address delay: The delay from the start of the write or read cycle until the address appears on the external address bus. This delay accounts for multiplexing and other CPU-generated delays. tAV Address valid: The time the address is valid on the external address bus. The CPU takes it away or changes it at the end of the read or write cycle.

CPU Read Cycle tRED Read enable delay: The delay from the start of the read cycle until the read enable signal is asserted. This is found in CPUs that have separate READ and WRITE control signals. tRE Read enable pulse length: The duration of the READ signal. tRDD Read data delay: The CPU waits for this time before it reads the data from the data bus. tRDS Read data setup: The time the data must be valid before they are read by the CPU. tRDH Read data hold: The CPU may require the data to be held after it reads them.

CPU Read Cycle
[Figure: CPU read-cycle timing diagram. Signals: CPU clock, ADDRESS (address valid), READ, DATA (data valid). Intervals marked: tCYC, tAD, tAV, tRED, tRE, tRDD, tRDS, tRDH.]

CPU Write Cycle tWDD Write data delay: The CPU waits for this time before it places the data to be written to memory on the data bus. tWDV Write data valid: The time the CPU keeps the data on the data bus. tWED Write enable delay: The CPU waits for this time before it asserts the write enable signal. tWE Write enable pulse length: The duration of the WRITE signal. tWDH Write data hold: The time the CPU holds the data on the data bus after deasserting the write enable signal.

CPU Write Cycle
[Figure: CPU write-cycle timing diagram. Signals: CPU clock, ADDRESS (address valid), WRITE, DATA (data valid). Intervals marked: tCYC, tAD, tAV, tWED, tWE, tWDD, tWDV, tWDH.]

Memory Read Cycle tRC Read cycle: This is the total time for the read cycle. tACS Chip select access: The maximum time required by the memory for the CS to be asserted before the data are available. tAA Address access: This is the maximum time required by the memory for the address to be present before the data are available. tRDHA Read data hold after address: The time the memory may hold the data at the output after the address is changed. tRDHC Read data hold after chip select: The minimum time the chip will hold the data after being deselected.

Memory Read Cycle (Cont.)
tOE Output enable access: On chips that have an output enable, this parameter gives the maximum time for the chip to respond with the data. tOHZ Output enable to output high Z: On chips that have an output enable, this parameter specifies the time the data will remain valid before going into three-state (high impedance). Two read timings are important to memory designers. The read cycle time, tRC, is the minimum time that the address must be stable (unchanging) at the chip. The address access time, tAA, is the maximum time required by the memory before the data are available.
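One way to combine the CPU and memory parameters is a simple read-timing check, sketched below. It rests on assumptions not stated in the slides: that tRDD is measured from the start of the read cycle, that the CPU needs the data valid tRDS before that point, and that chip-select decoding delay is ignored. All numbers in the usage lines are made up for illustration.

```python
def read_timing_ok(t_ad, t_rdd, t_rds, t_aa):
    """Check whether a memory is fast enough for a CPU read (times in ns).

    Assumptions: the address appears t_ad after the cycle starts, the CPU
    samples data at t_rdd after the cycle starts and needs it valid t_rds
    earlier; t_aa is the memory's address access time.
    """
    available = t_rdd - t_ad - t_rds   # time the memory has to respond
    return t_aa <= available

# Hypothetical CPU: address after 20 ns, samples at 150 ns, 30 ns setup.
fast_enough = read_timing_ok(t_ad=20, t_rdd=150, t_rds=30, t_aa=70)
too_slow = read_timing_ok(t_ad=20, t_rdd=150, t_rds=30, t_aa=120)
```

With these made-up numbers the memory has 100 ns to respond, so a 70 ns part passes and a 120 ns part would need wait states or a slower clock.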

Memory Read Cycle (Cont.)
[Figure: memory read-cycle timing diagram. Signals: ADDRESS (address valid), CS, OUTPUT ENABLE, DATA (data valid). Intervals marked: tRC, tACS, tAA, tOE, tOHZ, tRDHA, tRDHC.]

Memory Write Cycle tWC Write cycle: This is the minimum total time required by the memory to complete a write cycle. This may or may not be the same as tRC. tCW Chip select to end of write: The minimum time the CS signal must be asserted. tAS Address setup: The minimum time the address must be valid before the WRITE signal is asserted. tMWE Write enable: The minimum time the WRITE signal must be asserted. tAW Address valid to end of write: The minimum time the address must be valid.

Memory Write Cycle (Cont.)
tWDS Write data setup: The minimum time the data must be valid before the end of write enable. tMWDHE Write data hold after enable: The minimum time the data must remain valid after the WRITE signal is deasserted.

Memory Write Cycle (Cont.)
[Figure: memory write-cycle timing diagram. Signals: ADDRESS (address valid), CS, WRITE, DATA (data valid). Intervals marked: tWC, tCW, tAS, tMWE, tAW, tWDS, tWDHE.]

Reading Chapter 9: Computer Memories. Microcontrollers and Microcomputers by Fredrick M. Cady

Lecture 25: Cache - I Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Outline Memory Hierarchy On-Chip SRAM Direct-Mapped Cache

Memory Hierarchy (#1/5)
Processor: executes programs; runs on the order of nanoseconds to picoseconds; needs to access code and data for programs: where are these? Disk: HUGE capacity (virtually limitless); VERY slow: runs on the order of milliseconds. So how do we account for this gap?

Memory Hierarchy (#2/5) Memory (DRAM)
smaller than disk (not limitless capacity); contains a subset of the data on disk: basically portions of programs that are currently being run; much faster than disk: memory accesses don’t slow down the processor quite as much. Problem: memory is still too slow (hundreds of nanoseconds). Solution: add more layers: On-chip Memory, On-chip Caches.

Memory Hierarchy (#3/5)
[Figure: levels in the memory hierarchy drawn as a pyramid below the processor, from Level 1 (higher) through Level 2, Level 3, ... to Level n (lower). Going down, distance from the processor increases and cost per MB decreases; the width of each level shows the size of memory at that level.]

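Memory Hierarchy (#4/5)
[Figure: processor (control, datapath, registers) followed by successive memories: L1 cache (SRAM), L2 cache (SRAM), RAM/ROM (DRAM, EEPROM), hard disk. Speed: fastest at the registers, slowest at the hard disk. Size: smallest at the registers, biggest at the hard disk. Cost: highest at the registers, lowest at the hard disk.]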

Memory Hierarchy (#5/5) If a level is closer to the Processor, it must be:
smaller; faster; a subset of all lower levels (it contains the most recently used data). Lower levels contain at least all the data in all higher levels. The lowest level (usually disk) contains all available data.

Memory Hierarchy Purpose: Faster access to large memory from processor
[Figure: components of a computer. Processor (active): Control (“brain”) and Datapath (“brawn”). Memory (passive): where programs and data live when running. Devices: Input (keyboard, mouse), Output (display, printer), and disk and network.]

Memory Hierarchy Analogy: Library (#1/2)
You’re writing an assignment paper (Processor) at a table in the Library Library is equivalent to disk essentially limitless capacity very slow to retrieve a book Table is memory smaller capacity: means you must return book when table fills up easier and faster to find a book there once you’ve already retrieved it

Memory Hierarchy Analogy: Library (#2/2)
Open books on table are on-chip memory/cache smaller capacity: can have very few open books fit on table; again, when table fills up, you must close a book much, much faster to retrieve data Illusion created: whole library open on the tabletop Keep as many recently used books open on table as possible since likely to use again Also keep as many books on table as possible, since faster than going to library shelves

Memory Hierarchy Basis
Disk contains everything. When the Processor needs something, bring it into all higher levels of memory. On-chip Memory/Cache contains copies of data in memory that are being used. Memory contains copies of data on disk that are being used. The entire idea is based on Temporal Locality: if we use it now, we’ll want to use it again soon (a Big Idea).

On Chip SRAM Memory Provides fast (zero wait state) access to program and data. It occupies a portion of the address space. Requires explicit management by the programmer: part of the program has to copy itself from slow external memory (e.g. flash ROM) into the internal on-chip RAM and start executing from there. Works well for a limited number of programs where the program behaviour and space requirement are well defined.

A Case for Cache On-Chip SRAM requires explicit management by the programmer. This is possible for an embedded system with a small number of well-defined programs. It is not possible for a general-purpose processor with many programs, where the application mix cannot be determined in advance: explicit memory management becomes difficult. We need a mechanism where the copying from the slow external RAM to internal memory is automated by hardware (Cache!).

Cache Design How do we organize cache?
Where does each memory address map to? (Remember that cache is subset of memory, so multiple memory addresses map to the same cache location.) (Books from many shelves are on the same table) How do we know which elements are in cache? How do we quickly locate them?

Direct-Mapped Cache (#1/2)
In a direct-mapped cache, each memory address is associated with one possible block within the cache Therefore, we only need to look in a single location in the cache for the data to see if it exists in the cache Block is the unit of transfer between cache and memory

Direct-Mapped Cache (#2/2)
4 Byte Direct Mapped Cache [Figure: memory addresses 0 to F mapped onto a 4-byte direct-mapped cache with cache indices 0 to 3.] Block size = 1 byte. Cache Location 0 can be occupied by data from memory locations 0, 4, 8, ... In general: any memory location that is a multiple of 4. Let’s look at the simplest cache one can build: a direct-mapped cache that has only 4 bytes. In this cache, location 0 can be occupied by data from memory locations 0, 4, 8, C, and so on, while location 1 can be occupied by data from memory locations 1, 5, 9, and so on. In general, the cache location that a memory location maps to is uniquely determined by the 2 least significant bits of the address (the Cache Index). For example, any memory location whose two least significant address bits are 0s can go to cache location 0. With so many memory locations to choose from, which one should we place in the cache? The one we have read or written most recently, because by the principle of temporal locality, the one we just touched is the one we are most likely to need again soon. Of all the possible memory locations that can be placed in cache location 0, how can we tell which one is in the cache?
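The mapping for the 4-byte cache with 1-byte blocks can be checked with a short sketch. The names below are illustrative only; the 16-location memory matches the addresses 0 to F in the figure.

```python
CACHE_BYTES = 4  # four 1-byte blocks, cache indices 0 to 3

def cache_index(addr):
    """Cache index = the 2 least significant bits of the address."""
    return addr & (CACHE_BYTES - 1)

# For each cache location, list the memory locations 0x0..0xF that map to it.
candidates = {
    i: [a for a in range(16) if cache_index(a) == i]
    for i in range(CACHE_BYTES)
}
```

Cache location 0 collects addresses 0, 4, 8 and C, and location 1 collects 1, 5, 9 and D, which is why the index alone cannot tell us which address is currently stored.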

Issues with Direct-Mapped
Since multiple memory addresses map to the same cache index, how do we tell which one is in there? Store the address information along with the data in the cache. The address from the processor is split into two fields: ttttttttttttttttttttttttttttttii, where the t bits are the address tag (to check for the correct block) and the i bits are the index (to select the block). Compare the address tag with the tag stored at the indexed location to check for a match. [Figure: 4-byte direct-mapped cache, cache indices 0 to 3, each entry storing a tag tttttttttttttttttttttttttttttt alongside its data.]

Direct-Mapped with 1 Byte Blocks Example
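[Figure: direct-mapped cache organisation with 1-byte blocks. The address is split into tag and block index; the index drives a decoder that selects one entry in the tag RAM and the data RAM; a comparator checks the stored tag against the address tag to produce hit; the data RAM supplies the data.]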

Issues with Direct-Mapped with Larger Blocks
Since multiple memory blocks map to the same cache index, how do we tell which one is in there? How do we select the bytes in the block? Result: divide the memory address into three fields: tttttttttttttttttttttttttttttioo, where t = tag (to check for the correct block), i = index (to select the block), o = byte offset (within the block).

Direct-Mapped with Larger Blocks Example
Address tag block index Byte offset

Direct-Mapped Cache Terminology
All fields are read as unsigned integers. Index: specifies the cache index (which “row” or “line” of the cache we should look in) Offset: once we’ve found correct block, specifies which byte within the block we want Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
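The three fields can be extracted with shifts and masks. The sketch below is a generic helper with hypothetical names; the field widths are parameters, so it applies to any direct-mapped layout.

```python
def split_address(addr, index_bits, offset_bits):
    """Return (tag, index, offset), each read as an unsigned integer."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)                 # remaining bits
    return tag, index, offset

# Example layout: 10 index bits and 4 offset bits.
fields = split_address(0x00008014, index_bits=10, offset_bits=4)
```

With 10 index bits and 4 offset bits, address 0x00008014 splits into tag 2, index 1 and offset 4.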

Direct-Mapped Cache Example (#1/3)
Suppose we have 16KB of data in a direct-mapped cache with 4-word blocks. Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture (i.e. 32 address lines). Offset: need to specify the correct byte within a block; a block contains 4 words = 16 bytes = $2^4$ bytes; need 4 bits to specify the correct byte.

Index: (~index into an “array of blocks”) need to specify the correct row in the cache; the cache contains 16 KB = $2^{14}$ bytes; a block contains $2^4$ bytes (4 words); # rows/cache = # blocks/cache (since there’s one block/row) = (bytes/cache) / (bytes/row) = $2^{14} / 2^4$ = $2^{10}$ rows/cache; need 10 bits to specify this many rows.

Tag: use the remaining bits as the tag; tag length = mem addr length - offset - index = $32 - 4 - 10$ bits = 18 bits, so the tag is the leftmost 18 bits of the memory address. Why not use the full 32-bit address as the tag? All bytes within a block share the same block address, so the offset bits are not needed (-4 bits). The index must be the same for every address within a block, so it’s redundant in the tag check and can be left off to save memory (-10 bits in this example).
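The field-size calculation in this example can be written as a small helper. This is a sketch with a hypothetical function name; it assumes the cache size and block size are powers of two, as they are here.

```python
def field_widths(cache_bytes, block_bytes, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    for n in (cache_bytes, block_bytes):
        if n <= 0 or n & (n - 1):
            raise ValueError("sizes must be powers of two")
    offset_bits = block_bytes.bit_length() - 1                 # log2(bytes/block)
    index_bits = (cache_bytes // block_bytes).bit_length() - 1 # log2(rows)
    tag_bits = addr_bits - index_bits - offset_bits            # what is left
    return tag_bits, index_bits, offset_bits

# 16 KB of data, 4-word (16-byte) blocks, 32-bit addresses.
widths = field_widths(16 * 1024, 16)
```

For the 16 KB cache with 16-byte blocks this gives 18 tag bits, 10 index bits and 4 offset bits, matching the hand calculation.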

Things to Remember We would like to have the capacity of disk at the speed of the processor: unfortunately this is not feasible. So we create a memory hierarchy: each successively higher level contains “most used” data from next lower level exploits temporal locality Locality of reference is a Big Idea

Reading Material Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000, ISBN: Chapter 10.

COMP3221: Microprocessors and Embedded Systems Lecture 26: Cache - II Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Outline Direct-Mapped Cache Types of Cache Misses
A (long) detailed example Peer - to - peer education example Block Size Tradeoff

Review: Memory Hierarchy
Hard Disk RAM/ROM DRAM EEPROM Processor Control Memory Memory Memory Datapath Memory Memory Registers Slowest Speed: Fastest Biggest Size: Smallest Cache L1 SRAM Cache L2 SRAM Lowest Cost: Highest

Review: Direct-Mapped Cache (#1/2)
In a direct-mapped cache, each memory address is associated with one possible block within the cache Therefore, we only need to look in a single location in the cache for the data if it exists in the cache Block is the unit of transfer between cache and memory

Review: Direct-Mapped Cache 1 Word Block
[Figure: memory addresses 0 to F mapped onto an 8-byte direct-mapped cache with cache indices 0 and 1.] Block size = 4 bytes. Cache Location 0 can be occupied by data from memory locations 0 - 3, 8 - B, ... In general: any 4 memory locations starting at address 8*n (n=0,1,2,..). Cache Location 1 can be occupied by data from memory locations 4 - 7, C - F, ... In general: any 4 memory locations starting at address 8*n + 4 (n=0,1,2,..).

Direct-Mapped Cache Address tag block index Byte offset

Review: Direct-Mapped Cache Terminology
All fields are read as unsigned integers. Index: specifies the cache index (which “row” or “line” of the cache we should look in) Offset: once we’ve found correct block, specifies which byte within the block we want Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Accessing Data in a Direct Mapped Cache (#1/3)
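Ex.: 16KB of data, direct-mapped, 4 word blocks. Read 4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014. Only the cache/memory level of the hierarchy is considered. Memory contents (address in hex: value of word): 00000010: a, 00000014: b, 00000018: c, 0000001C: d, ..., 00000030: e, 00000034: f, 00000038: g, 0000003C: h, ..., 00008010: i, 00008014: j, 00008018: k, 0000801C: l.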

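4 Addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014. The 4 addresses divided (for convenience) into Tag, Index, Byte Offset fields: tttttttttttttttttt iiiiiiiiii oooo (tag to check if we have the correct block, index to select the block, byte offset within the block).
Tag Index Offset
000000000000000000 0000000001 0100
000000000000000000 0000000001 1100
000000000000000000 0000000011 0100
000000000000000010 0000000001 0100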

So let’s go through accessing some data in this cache: 16KB data, direct-mapped, 4 word blocks. We will see 3 types of events. Cache miss: nothing in the cache in the appropriate block, so fetch from memory. Cache hit: the cache block is valid and contains the proper address, so read the desired word. Cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.

16 KB Direct Mapped Cache, 16B blocks
Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid). [Figure: empty cache table with columns Valid, Tag, and the four data words at byte offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f; rows indexed 0 to 1023.]

Read 0x00000014 = 0…00 0…001 0100
Tag field: 000000000000000000. Index field: 0000000001. Offset: 0100. [Cache state: all rows 0 to 1023 invalid.]

So we read block 1 (0000000001)
[Cache state: all rows invalid.]

No valid data
Tag field: 000000000000000000. Index field: 0000000001. Offset: 0100. [Cache state: row 1 has Valid = 0.]

So load that data into cache, setting tag, valid
[Cache state: row 1 has Valid = 1, Tag = 0, and words a b c d at offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f.]

Read from cache at offset, return word b
[Cache state: row 1 has Valid = 1, Tag = 0, words a b c d; offset 0100 selects word b.]

Read 0x0000001C = 0…00 0…001 1100
Tag field: 000000000000000000. Index field: 0000000001. Offset: 1100. [Cache state: row 1 has Valid = 1, Tag = 0, words a b c d.]

Data valid, tag OK, so read offset return word d
[Cache state: unchanged; offset 1100 selects word d.]

Read 0x00000034 = 0…00 0…011 0100
Tag field: 000000000000000000. Index field: 0000000011. Offset: 0100. [Cache state: row 1 has Valid = 1, Tag = 0, words a b c d.]

So read block 3
[Cache state: unchanged.]

No valid data
Tag field: 000000000000000000. Index field: 0000000011. Offset: 0100. [Cache state: row 3 has Valid = 0.]

Load that cache block, return word f
[Cache state: row 1 has Valid = 1, Tag = 0, words a b c d; row 3 has Valid = 1, Tag = 0, words e f g h.]

Read 0x00008014 = 0…10 0…001 0100
Tag field: 000000000000000010. Index field: 0000000001. Offset: 0100. [Cache state: row 1 has Valid = 1, Tag = 0, words a b c d; row 3 has Valid = 1, Tag = 0, words e f g h.]

So read Cache Block 1, Data is Valid

Cache Block 1 Tag does not match (0 != 2)

Miss, so replace block 1 with new data & tag
[Cache state: row 1 has Valid = 1, Tag = 2, words i j k l; row 3 has Valid = 1, Tag = 0, words e f g h.]

And return word j
Tag field: 000000000000000010. Index field: 0000000001. Offset: 0100. [Cache state: unchanged; offset 0100 selects word j.]
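The walk-through can be replayed with a small simulation. This sketch tracks only valid bits and tags (not the data words), and the class name is hypothetical. The four addresses are taken from the Tag/Index/Offset bit patterns shown in the slides; for example, 000000000000000010 0000000001 0100 is 0x00008014.

```python
OFFSET_BITS = 4    # 16-byte blocks
INDEX_BITS = 10    # 1024 rows in a 16 KB cache

class DirectMappedCache:
    """Tracks only the valid bit and tag of each row, not the data."""

    def __init__(self):
        rows = 1 << INDEX_BITS
        self.valid = [False] * rows   # all entries invalid at power-on
        self.tags = [0] * rows

    def access(self, addr):
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # A valid row with the wrong tag means an existing block is evicted.
        event = "miss, block replacement" if self.valid[index] else "miss"
        self.valid[index] = True
        self.tags[index] = tag
        return event

cache = DirectMappedCache()
reads = (0x00000014, 0x0000001C, 0x00000034, 0x00008014)
events = [cache.access(a) for a in reads]
```

The four reads produce a miss, a hit, a miss, and a miss with block replacement, in that order, which is the sequence of events traced on the slides.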

Do an example yourself. What happens?
Choose from: Cache: Hit, Miss, Miss w. replace. Values returned: a, b, c, d, e, ..., k, l. Read address 0x00000030? Read address 0x0000001c? [Cache state: row 1 has Valid = 1, Tag = 2, words i j k l; row 3 has Valid = 1, Tag = 0, words e f g h; all other rows invalid.]

Answers
0x00000030: a hit. Index = 3, Tag matches, Offset = 0, value = e. 0x0000001c: a miss with replacement. Index = 1, Tag mismatch, so replace from memory, Offset = 0xc, value = d. The values read from the cache must equal the memory values whether or not they are cached: 0x00000030 = e, 0x0000001c = d. Memory contents (address in hex: value of word): 00000010: a, 00000014: b, 00000018: c, 0000001c: d, ..., 00000030: e, 00000034: f, 00000038: g, 0000003c: h, ..., 00008010: i, 00008014: j, 00008018: k, 0000801c: l.

Block Size Tradeoff (#1/3)
Benefits of Larger Block Size. Spatial Locality: if we access a given word, we’re likely to access other nearby words soon (Another Big Idea). Very applicable with the Stored-Program Concept: if we execute a given instruction, it’s likely that we’ll execute the next few as well. Works nicely in sequential array accesses too. Block size is a tradeoff. In general, a larger block size reduces the miss rate because it takes advantage of spatial locality. But the miss rate is NOT the only cache performance metric; you also have to worry about the miss penalty. As the block size increases, the miss penalty goes up, because a larger block takes longer to fill. Even the miss rate by itself, which should NOT be considered in isolation, does not always favour a bigger block. With the cache size held constant, the miss rate drops rapidly at first as the block size increases, due to spatial locality; past a certain point, however, the miss rate actually goes up. As a result of these two curves, the Average Access Time, which is a more important performance metric than the miss rate, goes down initially because the miss rate is dropping much faster than the miss penalty is increasing. Eventually, as the block size keeps increasing, the average access time can go up rapidly, because both the miss penalty and the miss rate are increasing. An extreme example shows why the miss rate may go up as the block size increases.

Drawbacks of Larger Block Size. Larger block size means larger miss penalty: on a miss, it takes longer to load a new block from the next level. If the block size is too big relative to the cache size, then there are too few blocks. Result: miss rate goes up. In general, minimize Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate

Hit Time = time to find and retrieve data from current level cache Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy) Hit Rate = % of requests that are found in current level cache Miss Rate = 1 - Hit Rate
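The average access time can be computed directly from these definitions, using the form Hit Time x Hit Rate + Miss Penalty x Miss Rate. The numbers in the sketch are hypothetical, chosen only to show how the miss penalty dominates, and the function name is illustrative.

```python
def average_access_time(hit_time, miss_penalty, hit_rate):
    """Hit Time x Hit Rate + Miss Penalty x Miss Rate (same time unit throughout)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_time * hit_rate + miss_penalty * miss_rate

# Hypothetical cache: 1-cycle hit, 50-cycle miss penalty.
small_blocks = average_access_time(hit_time=1, miss_penalty=50, hit_rate=0.95)
# Hypothetical larger blocks: better hit rate but a longer miss penalty.
large_blocks = average_access_time(hit_time=1, miss_penalty=80, hit_rate=0.97)
```

With these made-up values the first configuration averages about 3.45 cycles and the second about 3.37 cycles, so the larger block wins only because the hit-rate gain outweighs the extra miss penalty.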

Extreme Example: One Big Block
[Figure: a cache with a single entry: Valid Bit, Tag, and Cache Data bytes B 0 to B 3.] Cache Size = 4 bytes, Block Size = 4 bytes: only ONE entry in the cache! If an item is accessed, it is likely to be accessed again soon, but unlikely to be accessed again immediately! The next access will likely be a miss again. We continually load data into the cache but discard it (force it out) before we use it again. Nightmare for the cache designer: the Ping Pong Effect.
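The ping pong effect can be demonstrated by counting misses for the same access trace on two 4-byte direct-mapped caches. This is a sketch: the trace is made up, and the helper name is hypothetical.

```python
def count_misses(addresses, num_blocks, block_bytes):
    """Count misses in a direct-mapped cache (block numbers tracked, not data)."""
    stored = [None] * num_blocks       # block number held by each entry
    misses = 0
    for addr in addresses:
        block = addr // block_bytes    # which memory block holds this byte
        index = block % num_blocks     # the single entry it can occupy
        if stored[index] != block:
            misses += 1
            stored[index] = block      # load the block, evicting the old one
    return misses

trace = [0, 5, 0, 5, 0, 5]             # alternate between two addresses

four_small_blocks = count_misses(trace, num_blocks=4, block_bytes=1)
one_big_block = count_misses(trace, num_blocks=1, block_bytes=4)
```

With four 1-byte blocks only the first two accesses miss; with one 4-byte block every access misses, because addresses 0 and 5 fall in different memory blocks that compete for the single entry.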

Block Size Tradeoff Conclusions
[Figure: three plots against Block Size. Miss Penalty rises with block size. Miss Rate first falls (exploits spatial locality), then rises (fewer blocks: compromises temporal locality). Average Access Time first falls, then rises (increased miss penalty and miss rate).]

Things to Remember Cache Access involves 3 types of events:
cache miss: nothing in cache in appropriate block, so fetch from memory cache hit: cache block is valid and contains proper address, so read desired word cache miss, block replacement: wrong data is in cache at appropriate block, so discard it and fetch desired data from memory

COMP 3221: Microprocessors and Embedded Systems Lectures 27: Cache Memory - III Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Outline Fully Associative Cache N-Way Associative Cache
Block Replacement Policy Multilevel Caches (if time) Cache write policy (if time)

Review We would like to have the capacity of disk at the speed of the processor: unfortunately this is not feasible. So we create a memory hierarchy: each successively higher level contains “most used” data from next lower level exploits temporal locality Locality of reference is a Big Idea

Big Idea Review Mechanism for transparent movement of data among levels of a storage hierarchy set of address/value bindings address => index to set of candidates compare desired address with tag service hit or miss load new block and binding on miss Valid Tag 0x0-3 0x4-7 0x8-b 0xc-f 1 2 3 ... a b c d address: tag index offset

Types of Cache Misses (#1/2)
Compulsory Misses occur when a program is first started cache does not contain any of that program’s data yet, so misses are bound to occur can’t be avoided easily, so won’t focus on these in this course

Types of Cache Misses (#2/2)
Conflict Misses: a miss that occurs because two distinct memory addresses map to the same cache location. Two blocks (which happen to map to the same location) can keep overwriting each other. This is a big problem in direct-mapped caches. How do we lessen the effect of these? [Figure: memory addresses 0 to F mapped onto a direct-mapped cache with cache indices 0 to 3.]

Dealing with Conflict Misses
Solution 1: Make the cache size bigger; this fails at some point. Solution 2: Multiple distinct blocks can fit in the same Cache Index? [Figure: memory addresses mapped onto cache indices.] Let’s go back to our 4-byte direct-mapped cache and increase its block size to 4 bytes. Now we end up with one cache entry instead of 4. What will this do to the miss rate? It will probably become very bad. It is true that if an item is accessed, it is likely to be accessed again soon, but probably NOT as soon as the very next access, so the next access will cause a miss again. We end up loading data into the cache only to have it forced out by another cache miss before we have a chance to use it again. This is called the ping pong effect: the data acts like a ping pong ball bouncing in and out of the cache. It is one of the nightmare scenarios that cache designers hope never happens. A cache miss caused by different memory locations mapping to the same cache index is called a conflict miss. There are two ways to reduce conflict misses. The first is to increase the cache size. The second is to increase the number of cache entries per cache index.

Fully Associative Cache (#1/4)
Memory address fields: Tag: same as before Offset: same as before Index: non-existent What does this mean? no “rows”: any block can go anywhere in the cache must compare with all tags in entire cache to see if data is there

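Fully Associative Cache (e.g., 32 B block)
compare tags in parallel
(Figure: 32-bit address split into a Cache Tag (27 bits long, bits 31 to 5) and a Byte Offset (bits 4 to 0); each entry holds a Valid bit, a Cache Tag with its own comparator (=), and Cache Data bytes B 0 to B 31.)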

CAM (Content Addressable memory): A RAM Cell with in-built comparator

Benefit of Fully Assoc Cache no Conflict Misses (since data can go anywhere) Drawbacks of Fully Assoc Cache need hardware comparator for every single entry: if we have a 64KB of data in cache with 4B entries, we need 16K comparators: infeasible

Third Type of Cache Miss
Capacity Misses
miss that occurs because the cache has a limited size
miss that would not occur if we increase the size of the cache
sketchy definition, so just get the general idea
This is the primary type of miss for Fully Associative caches.

N-Way Set Associative Cache (#1/5)
Memory address fields: Tag: same as before Offset: same as before Index: points to the correct “row” (called a set in this case) So what’s the difference? each set contains multiple blocks once we’ve found correct set, must compare with all tags in that set to find our data

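2-Way Set Associative Cache Organisation
(Figure: the address index passes through a decoder to select a set in the tag RAM and data RAM of each way; the stored tags are compared with the address tag to produce hit, and a mux selects the data from the matching way.)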

Summary: cache is direct-mapped with respect to sets each set is fully associative basically N direct-mapped caches working in parallel: each has its own valid bit and data

Given memory address: Find correct set using Index value. Compare Tag with all Tag values in the determined set. If a match occurs, it’s a hit, otherwise a miss. Finally, use the offset field as usual to find the desired data within the desired block.
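The lookup steps above can be sketched in a few lines of Python. This is only an illustration: the field widths (16-byte blocks, 4 sets, 2 ways) and the names `split_address` and `lookup` are assumptions chosen for the example, not values from the lecture.

```python
# Assumed geometry for illustration only: 2-way, 4 sets, 16-byte blocks.
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 2    # 4 sets
WAYS = 2


def split_address(addr):
    """Split an address into its (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(cache, addr):
    """Return the byte at addr on a hit, or None on a miss."""
    tag, index, offset = split_address(addr)
    for way in cache[index]:               # compare with every tag in the set
        if way["valid"] and way["tag"] == tag:
            return way["data"][offset]     # hit: offset selects the byte
    return None                            # miss


# Each set holds WAYS entries, each with its own valid bit, tag and data block.
cache = [[{"valid": False, "tag": 0, "data": bytes(16)} for _ in range(WAYS)]
         for _ in range(1 << INDEX_BITS)]
cache[1][0] = {"valid": True, "tag": 0x5, "data": bytes(range(16))}
```

With this setup, address 0x153 splits into tag 5, index 1, offset 3 and hits in set 1, while 0x143 selects set 0, which is empty, and misses.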

What’s so great about this? even a 2-way set assoc cache avoids a lot of conflict misses hardware cost isn’t that bad: only need N comparators In fact, for a cache with M blocks, it’s Direct-Mapped if it’s 1-way set assoc it’s Fully Assoc if it’s M-way set assoc so these two are just special cases of the more general set associative design

Cache Organisation Comparison

Degree of Associativity on 4KB Cache

ARM3 Cache Organisation

Block Replacement Policy (#1/2)
Direct-Mapped Cache: index completely specifies which position a block can go in on a miss N-Way Set Assoc (N > 1): index specifies a set, but block can occupy any position within the set on a miss Fully Associative: block can be written into any position Question: if we have the choice, where should we write an incoming block?

Block Replacement Policy (#2/2)
Solution: If there are any locations with valid bit off (empty), then usually write the new block into the first one. If all possible locations already have a valid block, we must pick a replacement policy: rule by which we determine which block gets “cached out” on a miss.

Block Replacement Policy: LRU
LRU (Least Recently Used) Idea: cache out block which has been accessed (read or write) least recently Pro: temporal locality => recent past use implies likely future use: in fact, this is a very effective policy Con: with 2-way set assoc, easy to keep track (one LRU bit); with 4-way or greater, requires complicated hardware and much time to keep track of this

Block Replacement Example
We have a 2-way set associative cache with a four-word total capacity and one-word blocks. We perform the following word accesses (ignore bytes for this problem): 0, 2, 0, 1, 4, 0, 2, 3, 5, 4. How many hits and how many misses will there be for the LRU block replacement policy?

Block Replacement Example: LRU
Addresses 0, 2, 0, 1, 4, 0, ...
0: miss, bring into set 0 (loc 0)
2: miss, bring into set 0 (loc 1)
0: hit
1: miss, bring into set 1 (loc 0)
4: miss, bring into set 0 (loc 1, replace 2)
0: hit
(Figure: contents of set 0 and set 1 after each access, with the lru marker on the least recently used location of each set.)
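The trace above can be checked with a small simulation. This is a sketch under the example's own assumptions (2 sets, 2 ways, one-word blocks, set chosen as address mod number of sets); the function name `simulate_lru` is made up for illustration.

```python
def simulate_lru(addresses, num_sets=2, ways=2):
    """Count hits and misses for word addresses in an LRU set-associative cache."""
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU -> MRU
    hits = misses = 0
    for addr in addresses:
        blocks = sets[addr % num_sets]     # index selects the set
        if addr in blocks:
            hits += 1
            blocks.remove(addr)            # will be re-appended as most recent
        else:
            misses += 1
            if len(blocks) == ways:
                blocks.pop(0)              # evict the least recently used block
        blocks.append(addr)
    return hits, misses


hits, misses = simulate_lru([0, 2, 0, 1, 4, 0, 2, 3, 5, 4])
```

For the full access sequence of the example this gives 2 hits and 8 misses: after the six accesses traced above, the remaining accesses 2, 3, 5 and 4 all miss.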

Ways to Reduce Miss Rate
Larger cache
  limited by cost and technology
  hit time of first level cache < cycle time
More places in the cache to put each block of memory (associativity)
  fully associative: any block can go in any line
  k-way set associative: k places for each block
  direct mapped: k = 1

Big Idea
How to choose between options of associativity, block size, replacement policy?
Design against a performance model
Minimize: Average Access Time = Hit Time + Miss Rate x Miss Penalty
influenced by technology and program behavior
Create the illusion of a memory that is large, cheap, and fast - on average

Example
Assume Hit Time = 1 cycle, Miss rate = 5%, Miss penalty = 20 cycles
Avg mem access time = 1 + 0.05 x 20 = 2 cycles

Improving Miss Penalty
When caches first became popular, Miss Penalty ~ 10 processor clock cycles
Today: 1000 MHz Processor (1 ns per clock cycle) and 100 ns to go to DRAM => 100 processor clock cycles!
Solution: another cache between memory and the processor cache: Second Level (L2) Cache
(Figure: Proc, L1, L2, DRAM in sequence.)

Analyzing Multi-level Cache Hierarchy
(Figure: Proc, L1, L2, DRAM, annotated with L1 hit time, L1 Miss Rate, L1 Miss Penalty, L2 hit time, L2 Miss Rate, L2 Miss Penalty.)
Avg Mem Access Time = L1 Hit Time + L1 Miss Rate * L1 Miss Penalty
L1 Miss Penalty = L2 Hit Time + L2 Miss Rate * L2 Miss Penalty
Avg Mem Access Time = L1 Hit Time + L1 Miss Rate * (L2 Hit Time + L2 Miss Rate * L2 Miss Penalty)

Typical Scale L1: size: tens of KB hit time: complete in one clock cycle miss rates: 1-5% L2: size: hundreds of KB hit time: few clock cycles miss rates: 10-20% L2 miss rate is fraction of L1 misses that also miss in L2 why so high?

Example
Assume L1 Hit Time = 1 cycle, L1 Miss rate = 5%, L2 Hit Time = 5 cycles, L2 Miss rate = 15% (% L1 misses that miss), L2 Miss Penalty = 100 cycles
L1 miss penalty = 5 + 0.15 * 100 = 20
Avg mem access time = 1 + 0.05 x 20 = 2 cycles

Example: Without L2 Cache
Assume L1 Hit Time = 1 cycle, L1 Miss rate = 5%, L1 Miss Penalty = 100 cycles
Avg mem access time = 1 + 0.05 x 100 = 6 cycles
3x faster with L2 cache
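The two examples can be reproduced with a short calculation. This sketch simply applies the formula Avg Mem Access Time = Hit Time + Miss Rate * Miss Penalty at each level; the function name `amat` is an assumption made for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles for one cache level."""
    return hit_time + miss_rate * miss_penalty


# With an L2 cache: the L1 miss penalty is the average access time of L2.
l1_miss_penalty = amat(hit_time=5, miss_rate=0.15, miss_penalty=100)
with_l2 = amat(hit_time=1, miss_rate=0.05, miss_penalty=l1_miss_penalty)

# Without an L2 cache: every L1 miss pays the full 100-cycle trip to DRAM.
without_l2 = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

speedup = without_l2 / with_l2
```

Up to floating-point rounding, the values are 20 cycles for the L1 miss penalty, 2 cycles with L2 and 6 cycles without, which is the 3x difference quoted in the example.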

What to Do on a Write Hit?
Write-through
  update the word in cache block and corresponding word in memory
Write-back
  update word in cache block
  allow memory word to be "stale"
  => add 'dirty' bit to each line indicating that memory needs to be updated when block is replaced
  => OS flushes cache before I/O !!! So that cache values become the same as memory values changed by I/O
Performance trade-offs?

Things to Remember (#1/2)
Caches are NOT mandatory:
  Processor performs arithmetic
  Memory stores data
  Caches simply make data transfers go faster
Each level of memory hierarchy is just a subset of next lower level
Caches speed up due to temporal locality: store data used recently
Block size > 1 word speeds up due to spatial locality: store words adjacent to the ones used recently

Cache design choices: size of cache: speed v. capacity direct-mapped v. associative for N-way set assoc: choice of N block replacement policy 2nd level cache? Write through v. write back? Use performance model to pick between choices, depending on programs, technology, budget, ...

Lecture 28: Virtual Memory-I Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Overview Virtual Memory Page Table

Cache Review (#1/2) Caches are NOT mandatory:
Processor performs arithmetic Memory stores instructions & data Caches simply make things go faster Each level of memory hierarchy is just a subset of next lower level Caches speed up due to Temporal Locality: store data used recently Block size > 1 word speeds up due to Spatial Locality: store words adjacent to the ones used recently

Cache Review (#2/2) Cache design choices:
size of cache: speed vs. capacity direct-mapped vs. associative for N-way set assoc: choice of N block replacement policy 2nd level cache? Write through vs. write back? Use performance model to pick between choices, depending on programs, technology, budget, ...

Another View of the Memory Hierarchy
(Figure: memory hierarchy from Upper Level (Faster) to Lower Level (Larger): Regs, Cache, L2 Cache, Memory, Disk, Tape. Units moved between adjacent levels: Instr. Operands, Blocks, Blocks, Pages, Files. Thus far: caches. Next: Virtual Memory.)

Virtual Memory If Principle of Locality allows caches to offer (usually) speed of cache memory with size of DRAM memory, then why not, recursively, use at next level to give speed of DRAM memory, size of Disk memory? Called “Virtual Memory” Also allows OS to share memory, protect programs from each other Today, more important for protection vs. just another level of memory hierarchy Historically, it predates caches

Problems Leading to Virtual Memory (#1/2)
A program's address space is larger than the physical memory. Need to swap code and data back and forth between memory and hard disk (using Virtual Memory).
(Figure: a program's Code, Static, Heap and Stack regions totalling >> 64 MB, next to a Physical Memory of 64 MB.)

Problems Leading to Virtual Memory (#2/2)
Many Processes (programs) active at the same time. (Single Processor - many Processes)
Processor appears to run multiple programs all at once by rapidly switching between active programs.
The rapid switching is managed by the Memory Management Unit (MMU) using the Virtual Memory concept.
Each program sees the entire address space as its own.
How to avoid multiple programs overwriting each other?
(Figure: three processes, each with its own Code, Static, Heap and Stack regions.)

Segmentation Solution
Segmentation provides simple MMU Program views its memory as set of segments. Code segment, Data Segment, Stack segment, etc. Each program has its own set of private segments. Each access to memory is via a segment selector and offset within the segment. It allows a program to have its own private view of memory and to coexist transparently with other programs in the same memory space.

Segmentation Memory Management Unit
Virtual Address to memory: segment selector | logical address
Segment selector indexes the Segment Descriptor Table (SDT), a look up table held by OS in mem, giving base and bound
physical address = base + logical address
logical address > bound? => access fault
Base: The base address of the segment
Logical address: an offset within a segment
Bound: Segment limit
SDT: Holds Access and other information about the segment
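The base-and-bound check performed by the segmentation MMU can be sketched as follows. The table contents and the names `SDT`, `AccessFault` and `translate` are hypothetical, and the fault condition (offset greater than bound) follows the ">?" comparison shown on the slide; real hardware may define the limit slightly differently.

```python
class AccessFault(Exception):
    """Raised when the offset lies outside the segment."""


# Hypothetical Segment Descriptor Table: selector -> (base, bound).
SDT = {
    0: (0x1000, 0x0400),   # e.g. a code segment
    1: (0x8000, 0x0100),   # e.g. a stack segment
}


def translate(selector, offset):
    """Map (segment selector, offset within segment) to a physical address."""
    base, bound = SDT[selector]        # table look-up by segment selector
    if offset > bound:                 # the ">?" comparison against the bound
        raise AccessFault(f"offset {offset:#x} exceeds bound {bound:#x}")
    return base + offset               # the "+" adder: base plus offset
```

For example, `translate(0, 0x10)` yields 0x1010, while `translate(1, 0x200)` raises `AccessFault` because the offset is beyond the segment limit.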

Virtual to Physical Addr. Translation
Program operates in its virtual address space Physical memory (incl. caches) HW mapping virtual address (inst. fetch load, store) physical address (inst. fetch load, store) Each program operates in its own virtual address space; Each is protected from the other OS can decide where each goes in memory Hardware (HW) provides virtual -> physical mapping

Simple Example: Base and Bound Reg
(Figure: physical memory holding OS, User A, User B and User C, with the base and base+bound registers marking one user's region.)
Enough space for User D, but discontinuous ("fragmentation problem")
Want discontinuous mapping
Process size >> mem
Addition not enough!

Mapping Virtual Memory to Physical Memory
Divide into equal sized chunks (about 4KB)
Any chunk of Virtual Memory assigned to any chunk of Physical Memory ("page")
(Figure: Code, Static, Heap and Stack regions of virtual memory mapped onto a 64 MB Physical Memory.)

Paging Organization (assume 1 KB pages)
Page is unit of mapping
Page also unit of transfer from disk to physical memory
Addr Trans MAP is organised by OS
(Figure: Virtual Memory with 32 pages of 1K each (page 0 at virtual address 0, page 1 at 1024, page 2 at 2048, ..., page 31 at 31744) mapped through Addr Trans MAP onto Physical Memory with 8 pages of 1K each (page 0 at physical address 0, page 1 at 1024, ..., page 7 at 7168).)

Virtual Memory Mapping Function
Cannot have simple function to predict arbitrary mapping
Use table lookup ("Page Table") for mappings: Page number is index
Virtual Address: Page Number | Offset
Virtual Memory Mapping Function:
  Physical Offset = Virtual Offset
  Physical Page Number = PageTable[Virtual Page Number]
  (P.P.N. also called "Page Frame")

Address Mapping: Page Table
Virtual Address: page no. | offset
Page Table Base Reg (Reg #2 in CP #15 in ARM) plus page no. gives the index into the page table
Page Table entry: Valid (V) | Access Rights (A.R.) | Physical Page Address (P.P.A.)
Physical Memory Address = Physical Page Address + offset (actually, concatenation)
Page Table located in physical memory

Page Table A page table is an operating system structure which contains the mapping of virtual addresses to physical locations There are several different ways, all up to the operating system, to keep this data around Each process running in the operating system has its own page table “State” of process is PC, all registers, plus page table OS changes page tables by changing contents of Page Table Base Register

Paging/Virtual Memory for Multiple Processes
User A: Virtual Memory User B: Virtual Memory Physical Memory Stack Stack 64 MB A Page Table B Page Table Heap Heap Static Static Code Code

Page Table Entry (PTE) Format
Contains either Physical Page Number or indication not in Main Memory
OS maps to disk if Not Valid (V = 0)
Page Table Entry (P.T.E.): Valid (V) | Access Rights (A.R.) | Physical Page Number (P.P.N.)
If valid, also check if have permission to use page: Access Rights (A.R.) may be Read Only, Read/Write, Executable
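A page table lookup with the valid and access-rights checks might look like the sketch below. The 4 KB page size, the table contents, the string encoding of access rights ("r", "rw") and the exception names are all assumptions for illustration; they are not the ARM page table entry format.

```python
OFFSET_BITS = 12                     # assumed 4 KB pages
PAGE_SIZE = 1 << OFFSET_BITS


class PageFault(Exception):
    """Page is not in main memory (V = 0); the OS must fetch it from disk."""


class ProtectionFault(Exception):
    """Page is valid but the access rights forbid this access."""


# Hypothetical page table: virtual page number -> (valid, access rights, PPN).
page_table = {
    0: (True, "rw", 7),
    1: (True, "r", 3),
    2: (False, "rw", None),          # not valid: page currently on disk
}


def translate(vaddr, access):
    """Translate a virtual address for an access of type 'r' or 'w'."""
    vpn = vaddr >> OFFSET_BITS                   # page number is the table index
    offset = vaddr & (PAGE_SIZE - 1)             # physical offset = virtual offset
    valid, rights, ppn = page_table.get(vpn, (False, "", None))
    if not valid:
        raise PageFault(vpn)
    if access not in rights:
        raise ProtectionFault(vpn)
    return (ppn << OFFSET_BITS) | offset         # concatenate PPN and offset
```

Here a write to 0x0ABC translates to 0x7ABC, a write to the read-only page at 0x1004 raises `ProtectionFault`, and any access to 0x2000 raises `PageFault`.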

Things to Remember Apply Principle of Locality Recursively
Manage memory to disk? Treat as cache
  Included protection as bonus, now critical
  Use Page Table of mappings vs. tag/data in cache
Virtual Memory allows protected sharing of memory between processes with less swapping to disk, less fragmentation than always swap or base/bound in Segmentation

COMP 3221: Microprocessors and Embedded Systems Lectures 27: Virtual Memory - II Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Overview Page Table Translation Lookaside Buffer (TLB)

Review: Memory Hierarchy
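(Figure: memory hierarchy from Upper Level (Faster) to Lower Level (Larger): Regs, Cache, L2 Cache, Memory, Disk, Tape. Units moved between adjacent levels: Instr. Operands, Blocks, Blocks, Pages, Files. Cache covers the levels down to Memory; Virtual Memory covers Memory and Disk.)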

Review: Address Mapping: Page Table
Virtual Address: page no. | offset
Page Table Base Reg (Reg #2 in CP #15 in ARM) plus page no. gives the index into the page table
Page Table entry: Valid (V) | Access Rights (A.R.) | Physical Page Address (P.P.A.)
Physical Memory Address = Physical Page Address + offset (actually, concatenation)
Page Table located in physical memory

Paging/Virtual Memory for Multiple Processes
User A: Virtual Memory User B: Virtual Memory Physical Memory Stack Stack 64 MB A Page Table B Page Table Heap Heap Static Static Code Code

Analogy
Book title like virtual address (ARM System On-Chip)
Library of Congress call number like physical address (QA76.5.F )
Card (or online-page) catalogue like page table, indicating mapping from book title to call number
On card (or online-page) info for book, indicating in local library vs. in another branch like valid bit indicating in main memory vs. on disk
On card (or online-page), available for 2-hour in library use (vs. 2-week checkout) like access rights

Address Map, Mathematically Speaking
V = {0, 1, ..., n - 1} virtual page address space (n > m)
M = {0, 1, ..., m - 1} physical page address space
MAP: V --> M U {q} page address mapping function
MAP(a) = a' if data at virtual address a is present in physical address a' and a' in M
MAP(a) = q if data at virtual address a is not present in M (a page fault)
(Figure: Processor issues address a from Name Space V to the Addr Trans Mechanism, which sends physical address a' to Main Memory; on a page fault the OS fault handler transfers the page from Disk to Main Memory. OS performs this transfer.)

Comparing the 2 Levels of Hierarchy
Cache version vs. Virtual Memory version
Block or Line vs. Page
Miss vs. Page Fault
Block Size: 32-64B vs. Page Size: 4K-8KB
Placement: Direct Mapped, N-way Set Associative vs. Fully Associative
Replacement: LRU or Random vs. Least Recently Used (LRU)
Write Thru or Back vs. Write Back

Notes on Page Table Solves Fragmentation problem: all chunks same size, so all holes can be used OS must reserve “Swap Space” on disk for each process To grow a process, ask Operating System If unused pages, OS uses them first If not, OS swaps some old pages to disk (Least Recently Used to pick pages to swap) Each process has its own Page Table Will add details, but Page Table is essence of Virtual Memory

Virtual Memory Problem #1
Not enough physical memory! Only, say, 64 MB of physical memory N processes, each 4GB of virtual memory! Could have 64 virtual pages/physical page! Spatial Locality to the rescue Each page is 4 KB, lots of nearby references No matter how big program is, at any time only accessing a few pages “Working Set”: recently used pages

Virtual Address and a Cache (#1/2)
Cache operates on Virtual addresses (ARM strategy).
Advantage: on a cache hit, translation is not required.
Disadvantage: several copies of the same physical memory location may be present in several cache blocks (the synonyms problem). Gives rise to some complications!
(Figure: Processor sends VA to the Cache; a hit returns data; a miss goes through Translation to obtain the PA for Main Memory.)

Virtual Address and a Cache (#2/2)
Cache typically operates on physical addresses on most other systems.
Address Translation (Page Table access) is another memory access for each program memory access!
Accessing memory for the Page Table to get the Physical address is a slow operation.
Need to fix this!
(Figure: Processor sends VA to Translation, which produces the PA for the Cache; a hit returns data; a miss goes to Main Memory.)

Map every address => 1 extra memory access for every memory access
Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages
Why not use a cache of virtual to physical address translations to make translation fast? (small is fast)
For historical reasons, this cache is called a Translation Lookaside Buffer, or TLB

Typical TLB Format
Fields: Virtual Address | Physical Address | Dirty | Ref | Valid | Access Rights
TLB just a cache on the page table mappings
TLB access time comparable to cache (much less than main memory access time)
Ref: Used to help calculate LRU on replacement
Dirty: since use write back, need to know whether or not to write page to disk when replaced

Apply Principle of Locality Recursively Reduce Miss Penalty? add a (L2) cache Manage memory to disk? Treat as cache Included protection as bonus, now critical Use Page Table of mappings vs. tag/data in cache Virtual memory to Physical Memory Translation too slow? Add a cache of Virtual to Physical Address Translations, called a TLB

Virtual Memory allows protected sharing of memory between processes with less swapping to disk, less fragmentation than always swap or base/bound Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well

Things to Remember Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well Virtual memory to Physical Memory Translation too slow? Add a cache of Virtual to Physical Address Translations, called a TLB TLB to reduce performance cost of VM

COMP 3221: Microprocessors and Embedded Systems Lectures 27: Virtual Memory - III Lecturer: Hui Wu Session 2, 2005 Modified from notes by Saeid Nooshabadi

Overview Translation Lookaside Buffer (TLB) Mechanism
Two level page Table

Three Advantages of Virtual Memory (#1/2)
1) Translation: Program can be given consistent view of memory, even though physical memory is scrambled Makes multiple processes reasonable Only the most important part of program (“Working Set”) must be in physical memory Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later

Three Advantages of Virtual Memory (#2/2)
2) Protection: Different processes protected from each other
  Different pages can be given special behavior (Read Only, Invisible to user programs, etc).
  Privileged data protected from User programs
  Very important for protection from malicious programs => Far more "viruses" under Microsoft Windows
3) Sharing: Can map same physical page to multiple users ("Shared memory")

Why Translation Lookaside Buffer (TLB)?
Every paged virtual memory access must be checked against an Entry of the Page Table in memory to provide VA -> PA translation and protection
A cache of Page Table Entries makes address translation possible without a memory access in the common case, which makes it fast

Recall: Typical TLB Format
Fields: Virtual Address | Physical Address | Dirty | Ref | Valid | Access Rights
TLB just a cache on the page table mappings
TLB access time comparable to cache (much less than main memory access time)
Ref: Used to help calculate LRU on replacement
Dirty: since use write back, need to know whether or not to write page to disk when replaced

What if Not in TLB? Option 1: Hardware checks page table and loads new Page Table Entry into TLB Option 2: Hardware traps to OS, up to OS to decide what to do ARM follows Option 1: Hardware does the loading of new Page Table Entry

TLB Miss
If the address is not in the TLB, ARM's Translation Table Walk Hardware is invoked to retrieve the relevant entry from the translation table held in main memory.
There are two possibilities
(Figure: TLB with valid, virtual and physical columns.)

TLB Miss (If the Data is in Memory)
Translation Table Walk Hardware simply adds the entry to the TLB, evicting an old entry from the TLB if no empty slot
Fetch Translation once on TLB, Send PA to memory
(Figure: TLB with valid, virtual and physical columns, showing the newly added entry.)

TLB Miss (if the Data is on Disk)
A Page Fault (Abort exception) is issued to the processor
The OS loads the page off the disk into a free block of memory, using a DMA transfer
Meantime the OS switches to some other process waiting to be run
When the DMA is complete, the Processor gets an interrupt and the OS updates the process's page table and TLB
So when the OS switches back to the task, the desired data will be in memory

What if We Don't Have Enough Memory?
OS chooses some other page belonging to a program and transfers it onto the disk if it is dirty
If clean (other copy is up-to-date), just overwrite that data in memory
OS chooses the page to evict based on replacement policy (e.g., LRU)
And updates that program's page table to reflect the fact that its memory moved somewhere else on disk

Translation Look-Aside Buffers
TLBs usually small, typically entries
Like any other cache, the TLB can be fully associative, set associative, or direct mapped
(Figure: Processor sends VA to TLB Lookup; a TLB hit gives the PA, a TLB miss goes to Translation; the PA goes to the Cache; a cache hit returns data, a cache miss goes to Main Memory.)

Virtual Memory Review Summary
Let's say we're fetching some data:
Check TLB (input: VPN, output: PPN)
  hit: fetch translation
  miss: check pagetable (in memory)
    pagetable hit: fetch translation, return translation to TLB
    pagetable miss: page fault, fetch page from disk to memory, return translation to TLB
Check cache (input: PPN, output: data)
  hit: return value
  miss: fetch value from memory
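The translation half of this flow can be sketched as a toy model. Everything here is an assumption made for illustration: the 4 KB page size, the two-entry LRU TLB, the table contents, and the stand-in `load_page_from_disk`, which simply hands out the next unused frame number instead of performing a real transfer.

```python
from collections import OrderedDict

OFFSET_BITS = 12                      # assumed 4 KB pages
TLB_ENTRIES = 2                       # deliberately tiny

page_table = {0: 5, 1: 9, 2: 4}       # VPN -> PPN for pages resident in memory
tlb = OrderedDict()                   # VPN -> PPN, ordered LRU -> MRU
stats = {"tlb_hit": 0, "tlb_miss": 0, "page_fault": 0}


def load_page_from_disk(vpn):
    """Stand-in for the OS page-fault handler: pick the next unused frame."""
    return max(page_table.values(), default=-1) + 1


def translate(vaddr):
    """Check the TLB, then the page table, then fall back to a page fault."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                                # TLB hit: fetch translation
        stats["tlb_hit"] += 1
        tlb.move_to_end(vpn)                      # mark as most recently used
    else:                                         # TLB miss: check page table
        stats["tlb_miss"] += 1
        if vpn not in page_table:                 # page table miss: page fault
            stats["page_fault"] += 1
            page_table[vpn] = load_page_from_disk(vpn)
        if len(tlb) == TLB_ENTRIES:
            tlb.popitem(last=False)               # evict the LRU translation
        tlb[vpn] = page_table[vpn]                # return translation to TLB
    return (tlb[vpn] << OFFSET_BITS) | offset
```

Translating 0x0010 misses in the TLB and returns 0x5010; translating 0x0020 next hits in the TLB; translating 0x3000 misses in both the TLB and the page table, so the stand-in handler assigns frame 10 and the result is 0xA000. The resulting physical address would then be presented to the cache.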

Page Table too big!
4GB Virtual Memory ÷ 4 KB page => ~ 1 million Page Table Entries => 4 MB just for Page Table for 1 process, 25 processes => 100 MB for Page Tables!
A variety of solutions trade a smaller mapping structure for slower handling of TLB misses
Make TLB large enough, highly associative so rarely miss on address translation
COMP3231: Operating Systems, will go over more options and in greater depth
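The arithmetic behind these figures is short enough to check directly. The 4-byte page table entry size is an assumption here, although it is the size implied by the slide's step from about 1 million entries to 4 MB; the function name is made up for illustration.

```python
def page_table_bytes(va_bits=32, page_bytes=4096, pte_bytes=4):
    """Size of a single-level page table covering the whole virtual space."""
    entries = 2 ** va_bits // page_bytes    # one entry per virtual page
    return entries * pte_bytes


MB = 2 ** 20
per_process = page_table_bytes()            # 2**20 entries * 4 bytes
for_25_processes = 25 * per_process
```

With these assumptions `per_process` is 4 MB and `for_25_processes` is 100 MB, matching the numbers above.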

2-level Page Table
(Figure: Virtual Memory regions (Code, Static, Heap, Stack) mapped through a Super Page Table and 2nd Level Page Tables onto a 64 MB Physical Memory.)
ARM MMU uses 2-level page Table

Page Table Shrink:
Single Page Table: Page Number (20 bits) | Offset (12 bits)
  2^20 entries = 4MB 1st level page Table per process!
Multilevel Page Table: Super Page No. (10 bits) | Page Number (10 bits) | Offset (12 bits)
  2^10 x 2^10 = 2^20 entries = 4MB of 2nd level page Tables per process!
But: Only have second level page table for valid entries of super level page table

Space Savings for Multi-Level Page Table
If only 10% of entries of Super Page Table have valid entries, then total mapping size is roughly 1/10-th of single level page table
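A rough calculation illustrates the saving. This sketch assumes the 10-bit/10-bit split shown earlier (1024 entries per table) and 4-byte entries, and it also counts the super page table itself, which is why the ratio comes out slightly above one tenth; the function name is hypothetical.

```python
def two_level_bytes(valid_fraction, super_entries=1024,
                    second_entries=1024, pte_bytes=4):
    """Total bytes of page tables when only some super entries are valid."""
    super_table = super_entries * pte_bytes
    valid = round(valid_fraction * super_entries)   # valid super entries
    second_level = valid * second_entries * pte_bytes
    return super_table + second_level


single_level = 2 ** 20 * 4                          # 4 MB single-level table
ratio = two_level_bytes(0.10) / single_level
```

With 10% of the super page table entries valid, `ratio` comes out close to 0.10, in line with the "roughly 1/10-th" estimate.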

Address Translation & 3 Exercises
Virtual Address: VPN-tag | INDEX | Offset
VPN = VPN-tag + Index
INDEX selects a TLB entry; the stored TLB-tag is compared (=) with the VPN-tag to signal a Hit
Physical Address: PPN (Physical Page Number from the TLB) | Offset

Address Translation Exercise 1 (#1/2)
40-bit VA, 16 KB pages, 36-bit PA
Number of bits in Virtual Page Number? a) 18; b) 20; c) 22; d) 24; e) 26; f) 28
  Answer: e) 26 (40 - 14 = 26)
Number of bits in Page Offset? a) 8; b) 10; c) 12; d) 14; e) 16; f) 18
  Answer: d) 14 (16KB => 14 bits)
Number of bits in Physical Page Number?
  Answer: c) 22 (36 - 14 = 22)

40-bit virtual address, 16 KB (2^14 B) pages: Virtual Page Number (26 bits) | Page Offset (14 bits)
36-bit physical address, 16 KB (2^14 B) pages: Physical Page Number (22 bits) | Page Offset (14 bits)

40-bit VA, 16 KB pages, 36-bit PA
2-way set-assoc TLB: 256 "slots", 2 per slot
Number of bits in TLB Index? a) 8; b) 10; c) 12; d) 14; e) 16; f) 18
  Answer: a) 8
Number of bits in TLB Tag? a) 18; b) 20; c) 22; d) 24; e) 26; f) 28
  Answer: a) 18
Approximate Number of bits in TLB Entry? a) 32; b) 36; c) 40; d) 42; e) 44; f) 46
  Answer: f) 46

Address Translation 2 (#2/2)
2-way set-assoc TLB, 256 (2^8) "slots", 2 TLB entries per slot => 8 bit index
TLB Entry: Valid bit, Dirty bit, Access Control (2-3 bits?), Virtual Page Number, Physical Page Number
Virtual Address: TLB Tag (18 bits) | TLB Index (8 bits) | Page Offset (14 bits), where TLB Tag and TLB Index together form the Virtual Page Number (26 bits)
TLB Entry layout: V | D | Access (3 bits) | TLB Tag (18 bits) | Physical Page No. (22 bits)

40-bit VA, 16 KB pages, 36-bit PA
2-way set-assoc TLB: 256 "slots", 2 per slot
64 KB data cache, 64 Byte blocks, 2 way S.A.
Number of bits in Cache Offset? a) 6; b) 8; c) 10; d) 12; e) 14; f) 16
  Answer: a) 6
Number of bits in Cache Index? a) 6; b) 9; c) 10; d) 12; e) 14; f) 16
  Answer: b) 9
Number of bits in Cache Tag? a) 18; b) 20; c) 21; d) 24; e) 26; f) 28
  Answer: c) 21 (36 - 9 - 6 = 21)
Approximate No. of bits in Cache Entry?

Address Translation 3 (#2/2)
2-way set-assoc data cache, 64K/64 = 1K (2^10) blocks, 2 entries per slot => 512 slots => 9 bit index
Data Cache Entry: Valid bit, Dirty bit, Cache tag + 64 Bytes of Data
Physical Page Address (36 bits): Cache Tag (21 bits) | Cache Index (9 bits) | Block Offset (6 bits)
Data Cache Entry layout: V | D | Cache Tag (21 bits) | Cache Data (64 Bytes)
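The field widths in the three exercises follow from a handful of subtractions, which the sketch below reproduces. The helper name `log2_exact` and the variable names are made up for illustration; the parameters are the ones given in the exercises.

```python
def log2_exact(n):
    """Number of address bits needed to select one of n items (n a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return n.bit_length() - 1


VA_BITS, PA_BITS = 40, 36
PAGE_BYTES = 16 * 1024

# Exercise 1: page-level fields.
page_offset = log2_exact(PAGE_BYTES)          # 14
vpn_bits = VA_BITS - page_offset              # 26
ppn_bits = PA_BITS - page_offset              # 22

# Exercise 2: 2-way TLB with 256 slots (sets).
tlb_index = log2_exact(256)                   # 8
tlb_tag = vpn_bits - tlb_index                # 18
tlb_entry = 1 + 1 + 3 + tlb_tag + ppn_bits    # V + D + access bits + tag + PPN

# Exercise 3: 64 KB, 2-way data cache with 64-byte blocks.
cache_offset = log2_exact(64)                 # 6
cache_sets = (64 * 1024 // 64) // 2           # 1K blocks, 2 per set -> 512 sets
cache_index = log2_exact(cache_sets)          # 9
cache_tag = PA_BITS - cache_index - cache_offset   # 21
```

Counting 3 access bits, `tlb_entry` comes to 45 bits, which is why the closest listed answer for the TLB entry size is 46.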

Things to Remember
Spatial Locality means Working Set of Pages is all that must be in memory for process to run fairly well
TLB to reduce performance cost of VM
Need more compact representation to reduce memory size cost of simple 1-level page table (especially 32 -> 64-bit address)

Lecture 31: Embedded Systems Lecturer: Hui Wu Session 2, 2005 COMP3221/9221: Microprocessors and Embedded Systems

Overview What is an embedded system? Characteristics of embedded systems Embedded system requirements

An embedded system is a combination of hardware and software to perform a specific function; is part of a larger system; works in a reactive and time-constrained environment.

Characteristics of Embedded Systems
Application specific An embedded system performs a single or fixed set of functions; All functions are known a priori before the system design begins. The fixed functionality provides opportunities for design optimization. Application specific processor design can be a significant component of some embedded systems Advantages Customization yields lower area, power, cost and higher performance. Disadvantages Higher hardware/software development overhead, resulting in longer time-to-market. Strict design constraints performance, timing, power, area, cost, reliability etc.

Characteristics of Embedded Systems (Cont.)
Multiple heterogeneous processing units General processor, ASIC (Application Specific Integrated Circuit) , ASIP (Application Specific Instruction set Processor), DSP (Digital Signal Processing processor) etc. Reactive Embedded systems constantly interact with their environment, taking in data from sensors and/or other input devices and making appropriate responses. Real-time Embedded systems interact with their environments in a timely manner. Parallel and distributed computing Many embedded systems use parallel/distributed architecture where multiple processing units are tightly or loosely coupled.

Examples of Embedded Systems Consumer electronics, e.g., cellular phones, personal digital assistants, interactive game boxes, cameras, camcorders, .... Consumer products, e.g., washers, microwave ovens, ... Automobiles (anti-lock braking, engine control, ...) Industrial process controllers & avionics/defence applications Computer/Communication products, e.g., printers, FAX machines, ...

Traditional Embedded Systems Design: Major Procedures Modelling Specifying the behaviours of the target embedded system. Hardware-software partitioning Partitioning the specifications into either hardware components or software components. Hardware components are implemented in co-processors. Software components run on custom hardware or a general microprocessor. Hardware design and software design Hardware design includes co-processor design, interfaces etc. Software design includes interrupt handlers, task scheduler etc.

Traditional Embedded Systems Design: Major Procedures (Cont.) Modelling Hardware-software partitioning Hardware design Software design

Problems with Traditional Embedded Systems Design Precise information (execution time etc.) about each task is not available at the partitioning stage. Designers have to use estimated values in partitioning, which can lead to bad partitioning and therefore bad design. How to solve this problem? Use hardware-software co-design.

What Is Hardware-Software Co-design? The hardware and software designs proceed in parallel, with feedback and interaction occurring between the two as the design progresses. A multi-objective function of cost, area, power etc. is used to find an optimal design.

Goals of Embedded System Design Reduce time-to-market. Produce an optimal design that minimizes the multi-objective function of cost, area, power etc. New design methodology and CAD tools for automating embedded system design are needed. CAD today addresses synthesis problems at a purely hardware level: efficient techniques for data-path and control synthesis down to silicon.

Disciplines Involved in Embedded System Design Application domain (Signal processing, process control, machine control, robot, ...). Software engineering How to build a correct and reliable embedded system? Software reuse? Programming Languages and Compilers How to reduce the execution time of each task? How to reduce the power consumption of processors and memory?

Disciplines Involved in Embedded System Design (Cont.) Operating Systems How to schedule tasks such that all timing constraints are satisfied? How to schedule tasks such that the processor power consumption is minimized? VLSI (computer aided) design How to minimize the area and maximize the performance for a co-processor? How to minimize the power consumption of a co-processor?

Disciplines Involved in Embedded System Design (Cont.) Parallel/Distributed systems Many embedded systems use a parallel/distributed architecture where the multiple processors are tightly coupled or loosely coupled. Many issues exist: task scheduling, resource sharing etc. Real-time systems (Hard & soft real time systems) How to specify and satisfy timing requirements? How to share resources such that timing constraints are still satisfied?

Embedded System Requirements Functional requirements Timing requirements Dependability requirements

Functional Requirements
Data collection Sensors AD converters Signal conditioning etc Direct digital control Actuators Man-machine interface Informs the operator of the current state of the controlled object Assists the operator in controlling the system.

Timing Requirements Tasks Release times and deadlines Minimal task distance Maximal task distance Task Periods Minimal error detection latency Minimal latency jitter etc.

Timing Requirements (Cont.)
Timing constraints are often imposed on tasks. Typical timing constraints include:
Release time: A task cannot be executed before its release time.
Deadline: A task is required to finish by its deadline.
Minimal distance: The distance between two tasks is required to be greater than a specified value. The distance is defined as the difference between the start time of the task completed later and the completion time of the task completed earlier.
Maximal distance: The distance between two tasks is required to be less than a specified value.

Timing Requirements (Cont.)
Period: A periodic task must be executed periodically. For example, if the period of a task is 5, it must be executed and completed every 5 time units.
Figure 1: A periodic task T1 has a period of 5 and a worst-case execution time of 2.

Timing Requirements (Cont.)
Hard timing constraints: Missing any hard timing constraint may cause a catastrophe, e.g., in control systems for aircraft, space probes or nuclear reactors.
Soft timing constraints: Violating a soft timing constraint only degrades performance, e.g., in a game console.
An embedded system may contain both hard and soft timing constraints.
The task scheduler is responsible for satisfying all timing constraints.

Timing Requirements (Cont.)
Consider an embedded system with a single processor and a set of 3 tasks T1, T2 and T3 with the following attributes:
T1 is a periodic task with a period of 4 and a worst-case execution time of 2;
T2 is a periodic task with a period of 5 and a worst-case execution time of 2;
T3 is a non-periodic task with a release time of 0, a deadline of 20 and a worst-case execution time of 2.
Schedule shown in the figure, in execution order: T1 T2 T1 T2 T1 T2 T1 T3 T1 T2
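The execution order listed above can be checked against the task attributes with a short script. This is a sketch under stated assumptions, not the lecture's method: each entry in the order is assumed to run for 2 time units without preemption starting at time 0, and job k of a periodic task is assumed to be released at k times its period and due at the end of that period. The function and variable names are hypothetical.

```python
# Task parameters from the example above.
PERIODIC = {"T1": (4, 2), "T2": (5, 2)}  # name -> (period, worst-case execution time)
APERIODIC = {"T3": (0, 20, 2)}           # name -> (release time, deadline, worst-case execution time)

# Execution order from the example.
# Assumption: every entry occupies one slot of 2 time units, starting at time 0.
ORDER = ["T1", "T2", "T1", "T2", "T1", "T2", "T1", "T3", "T1", "T2"]
SLOT = 2


def check_schedule(order, slot=SLOT):
    """Return a list of violated constraints; an empty list means the schedule is valid."""
    problems = []
    runs = {}  # task name -> list of (start, finish) intervals
    for i, name in enumerate(order):
        runs.setdefault(name, []).append((i * slot, (i + 1) * slot))
    horizon = len(order) * slot

    for name, (period, wcet) in PERIODIC.items():
        jobs = runs.get(name, [])
        expected = horizon // period
        if len(jobs) != expected:
            problems.append(f"{name}: expected {expected} jobs, found {len(jobs)}")
        for k, (start, finish) in enumerate(jobs):
            # Assumption: job k is released at k*period and must finish by (k+1)*period.
            if start < k * period or finish > (k + 1) * period or finish - start < wcet:
                problems.append(f"{name} job {k}: runs in [{start}, {finish}]")

    for name, (release, deadline, wcet) in APERIODIC.items():
        jobs = runs.get(name, [])
        if len(jobs) != 1:
            problems.append(f"{name}: expected 1 job, found {len(jobs)}")
        for start, finish in jobs:
            if start < release or finish > deadline or finish - start < wcet:
                problems.append(f"{name}: runs in [{start}, {finish}]")

    return problems


print(check_schedule(ORDER))  # []
```

Under these assumptions the ten slots fill the interval from 0 to 20 exactly: five jobs of T1, four jobs of T2 and one job of T3 need 10 + 8 + 2 = 20 time units, so the processor is never idle and no slot can be wasted.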

Dependability Requirements
Reliability
  Number of failures per hour, or Mean-Time-To-Failure (MTTF) in hours.
Safety
  Critical failure modes
  Certification
Maintainability
  Mean-Time-To-Repair (MTTR).
Availability
  A = MTTF / (MTTF + MTTR)
Security
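As a worked example of the availability formula above, the following sketch evaluates A = MTTF / (MTTF + MTTR). The function name and the sample figures (999 hours between failures, 1 hour to repair) are illustrative assumptions, not values from the lecture.

```python
def availability(mttf_hours, mttr_hours):
    """Availability A = MTTF / (MTTF + MTTR), with both times in hours."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails on average every 999 hours and takes 1 hour to repair.
print(availability(999.0, 1.0))  # 0.999
```

The formula shows that availability can be raised either by increasing MTTF (better reliability) or by reducing MTTR (better maintainability).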

Major Components in Embedded Systems
Microprocessors/microcontrollers, co-processors, DSP cores, ASICs, ASIPs, FPGAs (Field Programmable Gate Arrays), memory (RAM, ROM, FLASH, EEPROM) and buses.
Data acquisition and processing
Communication
System logic and control
Interfaces
Auxiliary units
  Display
  Storage
  Monitoring and protection
  Test and diagnosis

Example Embedded System (I): DVD
Source: LSI Logic web page. Courtesy: R. Gupta, UC Irvine.

Example Embedded System (II): Dryer
Source: Siemens web page. Courtesy: R. Gupta, UC Irvine.

Reading Material
Chapter 1 in Embedded Systems Design: An Introduction to Processes, Tools, and Techniques, by Arnold S. Berger.
S. Edwards, L. Lavagno, E. Lee, A. Sangiovanni-Vincentelli. Design of Embedded Systems: Formal Methods, Validation and Synthesis. Proceedings of the IEEE, vol. 85, no. 3, March 1997, p
