Introduction to Microcontrollers

Introduction to Microcontrollers
Lecture 1 Introduction to Microcontrollers Dr. Konstantinos Tatas 1

Components of a microprocessor/controller
CPU: Central Processing Unit I/O: Input /Output Bus: Address bus & Data bus Memory: RAM & ROM Timer Interrupt Serial Port Parallel Port ACOE343 - Real-Time Embedded Processor Systems - Frederick University 2

General-purpose microprocessor:
CPU for Computers Commonly no RAM, ROM, I/O on CPU chip itself Many chips on motherboard Data Bus CPU General- Purpose Micro- processor Serial COM Port I/O Port Intel’s x86: 8086,8088,80386,80486, Pentium Motorola’s 680x0: 68000, 68010, 68020,68030,6040 RAM ROM Timer Address Bus ACOE343 - Real-Time Embedded Processor Systems - Frederick University 3 3

ACOE343 - Real-Time Embedded Processor Systems - Frederick University
Microcontroller : A single-chip computer On-chip RAM, ROM, I/O ports... Example：Motorola’s 6811, Intel’s 8051, Zilog’s Z8 and PIC 16X CPU RAM ROM A single chip Serial COM Port I/O Port Timer Microcontroller ACOE343 - Real-Time Embedded Processor Systems - Frederick University 4

Microprocessor vs. Microcontroller Microcontroller CPU, RAM, ROM, I/O and timer are all on a single chip fixed amount of on-chip ROM, RAM, I/O ports for applications in which cost, power and space are critical single-purpose (control-oriented) Low processing power Low power consumption Bit-level operations Instruction sets focus on control and bit-level operations Typically 8/16 bit Typically single-cycle/two-stage pipeline Microprocessor CPU is stand-alone, RAM, ROM, I/O, timer are separate designer can decide on the amount of ROM, RAM and I/O ports. expensive versatility general-purpose High processing power High power consumption Instruction sets focus on processing-intensive operations Typically 32/64 – bit Typically deep pipeline (5-20 stages) versatility 多用途的: any number of applications for PC ACOE343 - Real-Time Embedded Processor Systems - Frederick University 5 5

Some Popular Microcontrollers…
8051 Microchip Technology PIC Atmel AVR Texas Instruments MSP430 (16-bit) ACOE343 - Real-Time Embedded Processor Systems - Frederick University 6

Review questions What are the main differences between a microprocessor and a microcontroller in terms of Architecture Applications Instruction set ACOE343 - Real-Time Embedded Processor Systems - Frederick University 7

Example A uP running at 600 MHz has an average CPI of 1.2 and a average power consumption of 400 mW, while a uC running at 12 MHz with a two cycle datapath has a power consumption of 24 mW. Calculate their respective MIPS Which one is more efficient in MIPS/mW? ACOE343 - Real-Time Embedded Processor Systems - Frederick University 8

Example 2 The previous uP costs 100$, while the respective uC costs 0.96 $ Which is more efficient in MIPS/$? ACOE343 - Real-Time Embedded Processor Systems - Frederick University 9

The 8051 Microcontroller architecture
Lecture 2 The 8051 Microcontroller architecture

Contents: Introduction Block Diagram and Pin Description of the 8051
Registers Some Simple Instructions Structure of Assembly language and Running an 8051 program Memory mapping in 8051 8051 Flag bits and the PSW register Addressing Modes 16-bit, BCD and Signed Arithmetic in 8051 Stack in the 8051 LOOP and JUMP Instructions CALL Instructions I/O Port Programming 11

Three criteria in Choosing a Microcontroller
meeting the computing needs of the task efficiently and cost effectively speed, the amount of ROM and RAM, the number of I/O ports and timers, size, packaging, power consumption easy to upgrade cost per unit availability of software development tools assemblers, debuggers, C compilers, emulator, simulator, technical support wide availability and reliable sources of the microcontrollers.

The 8051 microcontroller a Harvard architecture (separate instruction/data memories) single chip microcontroller (µC) developed by Intel in 1980 for use in embedded systems. today largely superseded by a vast range of faster and/or functionally enhanced compatible devices manufactured by more than 20 independent manufacturers

Block Diagram External interrupts On-chip ROM for program code
Timer/Counter Interrupt Control On-chip RAM Timer 1 Counter Inputs Timer 0 CPU Serial Port Bus Control 4 I/O Ports OSC P0 P1 P2 P3 TxD RxD Address/Data 14

Comparison of the 8051 Family Members
Feature ROM (program space in bytes) 4K K K RAM (bytes) Timers I/O pins Serial port Interrupt sources

Pin Description of the 8051 8051 (8031)  1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 P1.0 P1.1 P1.2 P1.3 P1.4 P1.5 P1.6 P1.7 RST (RXD)P3.0 (TXD)P3.1 (T0)P3.4 (T1)P3.5 XTAL2 XTAL1 GND (INT0)P3.2 (INT1)P3.3 (RD)P3.7 (WR)P3.6 Vcc P0.0(AD0 ) P0.1(AD1) P0.2(AD2 ) P0.3(AD3) P0.4(AD4) P0.5(AD5) P0.6(AD6) P0.7(AD7) EA/VPP ALE/PROG PSEN P2.7(A15) P2.6(A14 ) P2.5(A13 ) P2.4(A12 ) P2.3(A11) P2.2(A10) P2.1(A9) P2.0(A8) 8051 (8031) 

Vcc provides supply voltage to the chip. The voltage source is +5V.
Pins of 8051（1/4） Vcc（pin 40）： Vcc provides supply voltage to the chip. The voltage source is +5V. GND（pin 20）：ground XTAL1 and XTAL2（pins 19,18）： These 2 pins provide external clock. Way 1：using a quartz crystal oscillator Way 2：using a TTL oscillator Example 4-1 shows the relationship between XTAL and the machine cycle.

Pins of 8051（2/4） RST（pin 9）：reset
It is an input pin and is active high（normally low）. The high pulse must be high at least 2 machine cycles. It is a power-on reset. Upon applying a high pulse to RST, the microcontroller will reset and all values in registers will be lost. Reset values of some 8051 registers Way 1：Power-on reset circuit Way 2：Power-on reset with debounce

Pins of 8051（3/4） /EA（pin 31）：external access
There is no on-chip ROM in 8031 and The /EA pin is connected to GND to indicate the code is stored externally. /PSEN ＆ ALE are used for external ROM. For 8051, /EA pin is connected to Vcc. “/” means active low. /PSEN（pin 29）：program store enable This is an output pin and is connected to the OE pin of the ROM. See Chapter 14.

Pins of 8051（4/4） ALE（pin 30）：address latch enable
It is an output pin and is active high. 8051 port 0 provides both address and data. The ALE pin is used for de-multiplexing the address and data by connecting to the G pin of the 74LS373 latch. I/O port pins The four ports P0, P1, P2, and P3. Each port uses 8 pins. All I/O pins are bi-directional.

Figure 4-2 (a). XTAL Connection to 8051
3 0 p F C 1 X T A L 2 X T A L 1 G N D Figure 4-2 (a). XTAL Connection to 8051 Using a quartz crystal oscillator We can observe the frequency on the XTAL2 pin. 

Figure 4-2 (b). XTAL Connection to an External Clock Source
Using a TTL oscillator XTAL2 is unconnected. N C EXTERNAL OSCILLATOR SIGNAL XTAL2 XTAL1 GND 

RESET Value of Some 8051 Registers:
PC 0000 ACC 0000 B 0000 PSW 0000 SP 0007 DPTR 0000 RAM are all zero. 

Figure 4-3 (a). Power-On RESET Circuit
Vcc + 10 uF 31 EA/VPP X1 30 pF 19 MHz 8.2 K X2 18 30 pF RST 9 

Figure 4-3 (b). Power-On RESET with Debounce
Vcc 31 EA/VPP X1 10 uF 30 pF X2 RST 9 8.2 K 

Pins of I/O Port The 8051 has four I/O ports
Port 0 （pins 32-39）：P0（P0.0～P0.7） Port 1（pins 1-8）：P1（P1.0～P1.7） Port 2（pins 21-28）：P2（P2.0～P2.7） Port 3（pins 10-17）：P3（P3.0～P3.7） Each port has 8 pins. Named P0.X （X=0,1,...,7）, P1.X, P2.X, P3.X Ex：P0.0 is the bit 0（LSB）of P0 Ex：P0.7 is the bit 7（MSB）of P0 These 8 bits form a byte. Each port can be used as input or output (bi-direction). Program is to read data from P0 and then send data to P1  27

Some 8-bitt Registers of the 8051
A B R0 R1 R3 R4 R2 R5 R7 R6 DPH DPL PC DPTR Some bit Register Some 8-bitt Registers of the 8051

Memory Map (RAM)

CPU timing Most 8051 instructions are executed in one cycle.
MUL (multiply) and DIV (divide) are the only instructions that take more than two cycles to complete (four cycles) Normally two code bytes are fetched from the program memory during every machine cycle. The only exception to this is when a MOVX instruction is executed. MOVX is a one-byte, 2-cycle instruction that accesses external data memory. During a MOVX, the two fetches in the second cycle are skipped while the external data memory is being addressed and strobed.

8051 machine cycle

 Example : Find the machine cycle for (a) XTAL = 11.0592 MHz
(b) XTAL = 16 MHz. Solution: (a) MHz / 12 = kHz; machine cycle = 1 / kHz = s (b) 16 MHz / 12 = MHz; machine cycle = 1 / MHz = 0.75 s 

Edsim51 emulator diagram

KitCON-515 schematic

Timers 8051 has two 16-bit on-chip timers that can be used for timing durations or for counting external events The high byte for timer 1 (TH1) is at address 8DH while the low byte (TL1) is at 8BH The high byte for timer 0 (TH0) is at 8CH while the low byte (TL0) is at 8AH. Timer Mode Register (TMOD) is at address 88H

Timer Mode Register Bit 7: Gate bit; when set, timer only runs while \INT high. (T0) Bit 6: Counter/timer select bit; when set timer is an event counter when cleared timer is an interval timer (T0) Bit 5: Mode bit 1 (T0) Bit 4: Mode bit 0 (T0) Bit 3: Gate bit; when set, timer only runs while \INT high. (T1) Bit 2: Counter/timer select bit; when set timer is an event counter when cleared timer is an interval timer (T1) Bit 1: Mode bit 1 (T1) Bit 0: Mode bit 0 (T1)

Timer Modes M1-M0: 00 (Mode 0) – 13-bit mode (not commonly used)
M1-M0: 01 (Mode 1) - 16-bit timer mode M1-M0: 10 (Mode 2) - 8-bit auto-reload mode M1-M0: 11 (Mode 3) – Split timer mode

8051 Interrupt Vector Table

The Stack and Stack Pointer
The Stack Pointer, like all registers except DPTR and PC, may hold an 8-bit (1-byte) value. The Stack Pointer is used to indicate where the next value to be removed from the stack should be taken from. When you push a value onto the stack, the 8051 first increments the value of SP and then stores the value at the resulting memory location. When you pop a value off the stack, the 8051 returns the value from the memory location indicated by SP, and then decrements the value of SP. This order of operation is important. When the 8051 is initialized SP will be initialized to 07h. If you immediately push a value onto the stack, the value will be stored in Internal RAM address 08h. This makes sense taking into account what was mentioned two paragraphs above: First the 8051 will increment the value of SP (from 07h to 08h) and then will store the pushed value at that memory address (08h). SP is modified directly by the 8051 by six instructions: PUSH, POP, ACALL, LCALL, RET, and RETI. It is also used intrinsically whenever an interrupt is triggered

Programming the 8051 Microcontroller
Lecture 3 Programming the 8051 Microcontroller Dr. Konstantinos Tatas

A B R0 R1 R3 R4 R2 R5 R7 R6 DPH DPL PC DPTR Some bit Register Some 8-bit Registers of the 8051 A: Accumulator B: Used specially in MUL/DIV R0-R7: GPRs Registers ACOE343 - Embedded Real-Time Processor Systems - Frederick University 41

8051 Programming using Assembly

The MOV Instruction – Addressing Modes
MOV dest,source ; dest = source MOV A,#72H ;A=72H MOV A, #’r’ ;A=‘r’ OR 72H MOV R4,#62H ;R4=62H MOV B,0F9H ;B=the content of F9’th byte of RAM MOV DPTR,#7634H MOV DPL,#34H MOV DPH,#76H MOV P1,A ;mov A to port 1 Note 1: MOV A,#72H ≠ MOV A,72H After instruction “MOV A,72H ” the content of 72’th byte of RAM will replace in Accumulator. MOV AL,72H MOV A,#72H MOV AL,’r’ MOV A,#’r’ MOV BX,72H MOV AL,[BX] MOV A,72H Note 2: MOV A,R3 ≡ MOV A,3 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 43

ACOE343 - Embedded Real-Time Processor Systems - Frederick University
Arithmetic Instructions ADD A, Source ;A=A+SOURCE ADD A,#6 ;A=A+6 ADD A,R6 ;A=A+R6 ADD A,6 ;A=A+[6] or A=A+R6 ADD A,0F3H ;A=A+[0F3H] ACOE343 - Embedded Real-Time Processor Systems - Frederick University 44

Set and Clear Instructions
SETB bit ; bit=1 CLR bit ; bit=0 SETB C ; CY=1 SETB P0.0 ;bit 0 from port 0 =1 SETB P3.7 ;bit 7 from port 3 =1 SETB ACC.2 ;bit 2 from ACCUMULATOR =1 SETB 05 ;set high D5 of RAM loc. 20h Note: CLR instruction is as same as SETB i.e: CLR C ;CY=0 But following instruction is only for CLR: CLR A ;A=0 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 45

SUBB A,source ;A=A-source-CY SETB C ;CY=1 SUBB A,R5 ;A=A-R5-1 ADC A,source ;A=A+source+CY ADC A,R5 ;A=A+R5+1 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 46

DEC byte ;byte=byte-1 INC byte ;byte=byte+1 INC R7 DEC A DEC 40H ; [40]=[40]-1 CPL A ;1’s complement Example: MOV A,#55H ;A= B L01: CPL A MOV P1,A ACALL DELAY SJMP L01 NOP & RET & RETI All are like 8086 instructions.  CALL ACOE343 - Embedded Real-Time Processor Systems - Frederick University 47

Logic Instructions ANL byte/bit ORL byte/bit XRL byte EXAMPLE: MOV R5,#89H ANL R5,#08H ACOE343 - Embedded Real-Time Processor Systems - Frederick University 48

Rotate Instructions RR A Accumulator rotate right RL A Accumulator Rotate left RRC A Accumulator Rotate right through the carry. RLC A Accumulator Rotate left through the carry. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 49

Structure of Assembly language and Running an 8051 program
EDITOR PROGRAM ASSEMBLER LINKER OH Myfile.asm Myfile.obj Other obj file Myfile.lst Myfile.abs Myfile.hex ORG 0H MOV R5,#25H MOV R7,#34H MOV A,#0 ADD A,R5 ADD A,#12H HERE: SJMP HERE END ACOE343 - Embedded Real-Time Processor Systems - Frederick University 50

Memory mapping in 8051 ROM memory map in 8051 family 0000H 0FFFH 1FFFH 7FFFH 8751 AT89C51 8752 AT89C52 4k 8k 32k DS from Atmel Corporation from Dallas Semiconductor ACOE343 - Embedded Real-Time Processor Systems - Frederick University 51

RAM memory space allocation in the 8051 7FH 30H 2FH 20H 1FH 17H 10H 0FH 07H 08H 18H 00H Register Bank 0 (Stack) Register Bank 1 Register Bank 2 Register Bank 3 Bit-Addressable RAM Scratch pad RAM ACOE343 - Embedded Real-Time Processor Systems - Frederick University 52

8051 Flag bits and the PSW register
CY AC F0 RS1 OV RS0 P -- CY PSW.7 Carry flag AC PSW.6 Auxiliary carry flag -- PSW.5 Available to the user for general purpose RS1 PSW.4 Register Bank selector bit 1 RS0 PSW.3 Register Bank selector bit 0 OV PSW.2 Overflow flag -- PSW.1 User define bit P PSW.0 Parity flag Set/Reset odd/even parity RS1 RS0 Register Bank Address H-07H H-0FH H-17H H-1FH ACOE343 - Embedded Real-Time Processor Systems - Frederick University 53

Instructions that Affect Flag Bits:
Note: X can be 0 or 1 54 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example: MOV A,#88H ADD A,#93H 11B CY=1 AC=0 P=0 Example: MOV A,#9CH ADD A,#64H 9C CY=1 AC=1 P=0 Example: MOV A,#38H ADD A,#2FH +2F CY=0 AC=1 P=1 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 55

Addressing Modes Immediate Register Direct Register Indirect Indexed ACOE343 - Embedded Real-Time Processor Systems - Frederick University 56

Immediate Addressing Mode
MOV A,#65H MOV A,#’A’ MOV R6,#65H MOV DPTR,#2343H MOV P1,#65H Example : Num EQU 30 … MOV R0,Num MOV DPTR,#data1 ORG 100H data1: db “Example” ACOE343 - Embedded Real-Time Processor Systems - Frederick University 57

Example Write the decimal value 4 on the SSD in the following figure. Switch the decimal point off. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 58

Register Addressing Mode
MOV Rn, A ;n=0,..,7 ADD A, Rn MOV DPL, R6 MOV DPTR, A MOV Rm, Rn ACOE343 - Embedded Real-Time Processor Systems - Frederick University 59

Direct Addressing Mode
Although the entire of 128 bytes of RAM can be accessed using direct addressing mode, it is most often used to access RAM loc. 30 – 7FH. MOV R0, 40H MOV 56H, A MOV A, 4 ; ≡ MOV A, R4 MOV 6, 2 ; copy R2 to R6 ; MOV R6,R2 is invalid ! SFR register and their address MOV 0E0H, #66H ; ≡ MOV A,#66H MOV 0F0H, R2 ; ≡ MOV B, R2 MOV 80H,A ; ≡ MOV P1,A ACOE343 - Embedded Real-Time Processor Systems - Frederick University 60

Register Indirect Addressing Mode
In this mode, register is used as a pointer to the data. MOV ; move content of RAM loc.Where address is held by Ri into A ( i=0 or 1 ) MOV @R1,B In other word, the content of register R0 or R1 is sources or target in MOV, ADD and SUBB insructions. Example: Write a program to copy a block of 10 bytes from RAM location sterting at 37h to RAM location starting at 59h. Solution: MOV R0,37h ; source pointer MOV R1,59h ; dest pointer MOV R2,10 ; counter L1: MOV MOV @R1,A INC R0 INC R1 DJNZ R2,L1 jump ACOE343 - Embedded Real-Time Processor Systems - Frederick University 61

Indexed Addressing Mode And On-Chip ROM Access
This mode is widely used in accessing data elements of look-up table entries located in the program (code) space ROM at the 8051 MOVC A= content of address A +DPTR from ROM Note: Because the data elements are stored in the program (code ) space ROM of the 8051, it uses the instruction MOVC instead of MOV. The “C” means code. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 62

Example: Assuming that ROM space starting at 250h contains “Hello.”, write a program to transfer the bytes into RAM locations starting at 40h. Solution: ORG 0 MOV DPTR,#MYDATA MOV R0,#40H L1: CLR A MOVC JZ L2 MOV @R0,A INC DPTR INC R0 SJMP L1 L2: SJMP L2 ; ORG 250H MYDATA: DB “Hello”,0 END Notice the NULL character ,0, as end of string and how we use the JZ instruction to detect that. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 63

Example: Write a program to get the x value from P1 and send x2 to P2, continuously . Solution: ORG 0 ;code segment MOV DPTR, #TAB1 ;moving data segment to data pointer MOV A,#0FFH ;configuring P1 as input port MOV P1,A L01: MOV A,P1 ;reading value from P1 MOVC MOV P2,A SJMP L01 ; ORG 300H ;data segment TAB1: DB 0,1,4,9,16,25,36,49,64,81 END ACOE343 - Embedded Real-Time Processor Systems - Frederick University 64

External Memory Addressing
MOVX ; A [R1] (in external memory) MOVX A ACOE343 - Embedded Real-Time Processor Systems - Frederick University 65

16-bit, BCD and Signed Arithmetic in 8051
Exercise: Write a program to add n 16-bit number. Get n from port 1. And sent Sum to SSD a) in hex b) in decimal Write a program to subtract P1 from P0 and send result to LCD (Assume that “ACAL DISP” display A to SSD ) ACOE343 - Embedded Real-Time Processor Systems - Frederick University 66

MUL & DIV MUL AB ;B|A = A*B MOV A,#25H MOV B,#65H MUL AB ;25H*65H=0E99 ;B=0EH, A=99H MUL AB ;A = A/B, B = A mod B MOV A,#25 MOV B,#10 MUL AB ;A=2, B=5 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 67

Stack in the 8051 The register used to access the stack is called SP (stack pointer) register. The stack pointer in the is only 8 bits wide, which means that it can take value 00 to FFH. When powered up, the SP register contains value 07. 7FH 30H 2FH 20H 1FH 17H 10H 0FH 07H 08H 18H 00H Register Bank 0 (Stack) Register Bank 1 Register Bank 2 Register Bank 3 Bit-Addressable RAM Scratch pad RAM ACOE343 - Embedded Real-Time Processor Systems - Frederick University 68

Example: MOV R6,#25H MOV R1,#12H MOV R4,#0F3H PUSH 6 PUSH 1 PUSH 4 0BH 0AH 09H 08H Start SP=07H 25 0BH 0AH 09H 08H SP=08H 12 25 0BH 0AH 09H 08H SP=09H F3 12 25 0BH 0AH 09H 08H SP=10H ACOE343 - Embedded Real-Time Processor Systems - Frederick University 69

Example (cont.) POP 4 POP 1 POP 6 F3 12 25 0BH 0AH 09H 08H SP=10H 12 25 0BH 0AH 09H 08H SP=09H 25 0BH 0AH 09H 08H SP=08H 0BH 0AH 09H 08H Start SP=07H ACOE343 - Embedded Real-Time Processor Systems - Frederick University 70

How to use the stack You can use the stack as temporary storage for variables when calling functions RLC A ;you can only rotate A Call function DIV AB ; A has the wrong value!!!!! … function: MOV A, #5 ;values are for example sake MOV B, #10 MUL AB ;you can only multiply on A RET ACOE343 - Embedded Real-Time Processor Systems - Frederick University 71

Example (correct) RLC A ;you can only rotate A PUSH A ;saving A and B on the stack before PUSH B ;calling function Call function POP B ;restoring B POP A ;and A (POP in reverse order) DIV AB ; A has the wrong value!!!!! … function: MOV A, #5 ;values are for example sake MOV B, #10 MUL AB ;you can only multiply on A RET ACOE343 - Embedded Real-Time Processor Systems - Frederick University 72

Saving PSW The Program Status Word registers contains flags that are often important for correct program flow You can push PSW on the stack before calling a function ADD A, R0 PUSH PSW PUSH A ;saving A and R0 on the stack before PUSH R0 ;calling function Call function POP R0 ;restoring R0 POP A ;and A (POP in reverse order) POP PSW JC loop ;If this means the carry from the ;function then don’t push PSW … function: MOV A, #5 ;values are for example sake ADD A, R2 ;the flags are set according to ADD result RET ACOE343 - Embedded Real-Time Processor Systems - Frederick University 73

LOOP and JUMP Instructions
DJNZ: Write a program to clear ACC, then add 3 to the accumulator ten times Solution: MOV A,#0; MOV R2,#10 AGAIN: ADD A,#03 DJNZ R2,AGAING ;repeat until R2=0 (10 times) MOV R5,A ACOE343 - Embedded Real-Time Processor Systems - Frederick University 74

Other conditional jumps :
JZ Jump if A=0 JNZ Jump if A/=0 DJNZ Decrement and jump if A/=0 CJNE A,byte Jump if A/=byte CJNE reg,#data Jump if byte/=#data JC Jump if CY=1 JNC Jump if CY=0 JB Jump if bit=1 JNB Jump if bit=0 JBC Jump if bit=1 and clear bit 75 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

SJMP and LJMP: LJMP(long jump) LJMP is an unconditional jump. It is a 3-byte instruction in which the first byte is the opcode, and the second and third bytes represent the 16-bit address of the target location. The 20byte target address allows a jump to any memory location from 0000 to FFFFH. SJMP(short jump) In this 2-byte instruction. The first byte is the opcode and the second byte is the relative address of the target location. The relative address range of 00-FFH is divided into forward and backward jumps, that is , within -128 to +127 bytes of memory relative to the address of the current PC. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 76

CJNE , JNC Exercise: Write a program that compare R0,R1. If R0>R1 then send 1 to port 2, else if R0<R1 then send 0FFh to port 2, else send 0 to port 2. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 77

CALL Instructions Another control transfer instruction is the CALL instruction, which is used to call a subroutine. LCALL(long call) In this 3-byte instruction, the first byte is the opcode an the second and third bytes are used for the address of target subroutine. Therefore, LCALL can be used to call subroutines located anywhere within the 64K byte address space of the 8051. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 78

ACALL (absolute call) ACALL is 2-byte instruction in contrast to LCALL, which is 13 bytes. Since ACALL is a 2-byte instruction, the target address of the subroutine must be within 2K bytes address because only 11 bits of the 2 bytes are used for the address. There is no difference between ACALL and LCALL in terms of saving the program counter on the stack or the function of the RET instruction. The only difference is that the target address for LCALL can be anywhere within the 64K byte address space of the while the target address of ACALL must be within a 2K- byte range. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 79

Example A B R5 R7 Address Data ORG 0H VAL1 EQU 05H MOV R5,#25H
LOOP: MOV R7,#VAL1 MOV A,#0 ADD A,R5 ADD A,#12H RRC A DJNZ A, LOOP SETB ACC.3 CLR A CJNE A, #0, LOOP HERE: SJMP HERE END 80 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

I/O Port Programming Port 1（pins 1-8）  Port 1 is denoted by P1. P1.0 ~ P1.7 We use P1 as examples to show the operations on ports. P1 as an output port (i.e., write CPU data to the external pin) P1 as an input port (i.e., read pin data into CPU bus) ACOE343 - Embedded Real-Time Processor Systems - Frederick University 81

A Pin of Port 1 D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X TB1 TB2 P0.x 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 82

Hardware Structure of I/O Pin
Each pin of I/O ports Internal CPU bus：communicate with CPU A D latch store the value of this pin D latch is controlled by “Write to latch” Write to latch＝1：write data into the D latch 2 Tri-state buffer： TB1: controlled by “Read pin” Read pin＝1：really read the data present at the pin TB2: controlled by “Read latch” Read latch＝1：read value from internal latch A transistor M1 gate Gate=0: open Gate=1: close ACOE343 - Embedded Real-Time Processor Systems - Frederick University 83

Tri-state Buffer  Output Input Tri-state control (active high) L L H
Low Highimpedance (open-circuit) H H  ACOE343 - Embedded Real-Time Processor Systems - Frederick University 84

Writing “1” to Output Pin P1.X
D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X TB2 2. output pin is Vcc 1. write a 1 to the pin 1 output 1 TB1 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 85

Writing “0” to Output Pin P1.X
D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X TB2 2. output pin is ground 1. write a 0 to the pin output 0 1 TB1 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 86

Port 1 as Output（Write to a Port）
Send data to Port 1： MOV A,#55H BACK: MOV P1,A ACALL DELAY CPL A SJMP BACK Let P1 toggle. You can write to P1 directly. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 87 87

Reading Input v.s. Port Latch
When reading ports, there are two possibilities： Read the status of the input pin. （from external pin value） MOV A, PX JNB P2.1, TARGET ; jump if P2.1 is not set JB P2.1, TARGET ; jump if P2.1 is set Figures C-11, C-12 Read the internal latch of the output port. ANL P1, A ; P1 ← P1 AND A ORL P1, A ; P1 ← P1 OR A INC P ; increase P1 Figure C-17 Table C-6 Read-Modify-Write Instruction (or Table 8-5) See Section 8.3 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 88

Reading “High” at Input Pin
D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X 2. MOV A,P1 external pin=High TB2 write a 1 to the pin MOV P1,#0FFH 1 1 TB1 3. Read pin=1 Read latch=0 Write to latch=1 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 89

Reading “Low” at Input Pin
D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X 2. MOV A,P1 external pin=Low TB2 write a 1 to the pin MOV P1,#0FFH 1 TB1 3. Read pin=1 Read latch=0 Write to latch=1 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 90

Port 1 as Input（Read from Port）
In order to make P1 an input, the port must be programmed by writing 1 to all the bit. MOV A,#0FFH ;A= B MOV P1,A ;make P1 an input port BACK: MOV A,P ;get data from P0 MOV P2,A ;send data to P2 SJMP BACK To be an input port, P0, P1, P2 and P3 have similar methods. Program is to read data from P0 and then send data to P1 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 91 91

Instructions For Reading an Input Port
Following are instructions for reading external pins of ports: Mnemonics Examples Description MOV A,PX MOV A,P2 Bring into A the data at P2 pins JNB PX.Y,.. JNB P2.1,TARGET Jump if pin P2.1 is low JB PX.Y,.. JB P1.3,TARGET Jump if pin P1.3 is high MOV C,PX.Y MOV C,P2.4 Copy status of pin P2.4 to CY 92 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Reading Latch Exclusive-or the Port 1： MOV P1,#55H ;P1= ORL P1,#0F0H ;P1= 1. The read latch activates TB2 and bring the data from the Q latch into CPU. Read P1.0=0 2. CPU performs an operation. This data is ORed with bit 1 of register A. Get 1. 3. The latch is modified. D latch of P1.0 has value 1. 4. The result is written to the external pin. External pin (pin 1: P1.0) has value 1. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 93

Reading the Latch D Q Clk Q
1. Read pin=0 Read latch=1 Write to latch=0 (Assume P1.X=0 initially) D Q Clk Q Vcc Load(L1) Read latch Read pin Write to latch Internal CPU bus M1 P1.X pin P1.X TB2 2. CPU compute P1.X OR 1 4. P1.X=1 1 1 3. write result to latch Read pin= Read latch= Write to latch=1 TB1 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 94

Read-modify-write Feature
Read-modify-write Instructions Table C-6 This features combines 3 actions in a single instruction： 1. CPU reads the latch of the port 2. CPU perform the operation 3. Modifying the latch 4. Writing to the pin Note that 8 pins of P1 work independently. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 95

Port 1 as Input（Read from latch）
Exclusive-or the Port 1： MOV P1,#55H ;P1= AGAIN: XOR P1,#0FFH ;complement ACALL DELAY SJMP AGAIN Note that the XOR of 55H and FFH gives AAH. XOR of AAH and FFH gives 55H. The instruction read the data in the latch (not from the pin). The instruction result will put into the latch and the pin. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 96 96

Read-Modify-Write Instructions
Mnemonics Example SETB P1.4 SETB PX.Y CLR P1.3 CLR PX.Y MOV P1.2,C MOV PX.Y,C DJNZ P1,TARGET DJNZ PX, TARGET INC P1 INC CPL P1.2 CPL JBC P1.1, TARGET JBC PX.Y, TARGET XRL P1,A XRL ORL P1,A ORL ANL P1,A ANL DEC P1 DEC ANL: Latch data AND with A , then save back to latch and write to the external pin ORL: OR XRL: XOR JBC: jump to TARGET if bit set and clear bit CPL: complement INC: increase DEC: decrease DJNZ: decrease P1 and jump if P1 not zero MOV the latch value to carry CLR: clear bit, SETB: set bit ACOE343 - Embedded Real-Time Processor Systems - Frederick University 97 97

You are able to answer this Questions:
How to write the data to a pin？ How to read the data from the pin？ Read the value present at the external pin. Why we need to set the pin first？ Read the value come from the latch（not from the external pin）. Why the instruction is called read-modify write? ACOE343 - Embedded Real-Time Processor Systems - Frederick University 98

Other Pins P1, P2, and P3 have internal pull-up resisters. P1, P2, and P3 are not open drain. P0 has no internal pull-up resistors and does not connects to Vcc inside the 8051. P0 is open drain. Compare the figures of P1.X and P0.X.  However, for a programmer, it is the same to program P0, P1, P2 and P3. All the ports upon RESET are configured as output. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 99

A Pin of Port 0 D Q Clk Q Read latch Read pin Write to latch Internal CPU bus M1 P0.X pin P1.X TB1 TB2 P1.x 8051 IC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 100

Port 0（pins 32-39） P0 is an open drain. Open drain is a term used for MOS chips in the same way that open collector is used for TTL chips.  When P0 is used for simple data I/O we must connect it to external pull-up resistors. Each pin of P0 must be connected externally to a 10K ohm pull-up resistor. With external pull-up resistors connected upon reset, port 0 is configured as an output port. Open drain is a term used for MOS chips in the same way that open collector is used for TTL chips. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 101 101

Port 0 with Pull-Up Resistors
DS5000 8751 8951 Vcc 10 K Port 0 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 102

Dual Role of Port 0 When connecting an 8051/8031 to an external memory, the uses ports to send addresses and read instructions. 8031 is capable of accessing 64K bytes of external memory. 16-bit address：P0 provides both address A0-A7, P2 provides address A8-A15. Also, P0 provides data lines D0-D7. When P0 is used for address/data multiplexing, it is connected to the 74LS373 to latch the address. There is no need for external pull-up resistors as shown in Chapter 14. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 103

74LS373 D 74LS373 ALE P0.0 P0.7 PSEN A0 A7 D0 D7 P2.0 P2.7 A8 A15 OE OC EA G 8051 ROM ACOE343 - Embedded Real-Time Processor Systems - Frederick University 104

Reading ROM (1/2) latches the address and send to ROM 1. Send address to ROM D 74LS373 ALE P0.0 P0.7 PSEN A0 A7 D0 D7 P2.0 P2.7 A8 A12 OE OC EA G 8051 ROM Address ACOE343 - Embedded Real-Time Processor Systems - Frederick University 105

Reading ROM (2/2) latches the address and send to ROM D 74LS373 ALE P0.0 P0.7 PSEN A0 A7 D0 D7 P2.0 P2.7 A8 A12 OE OC EA G 8051 ROM Address 3. ROM send the instruction back ACOE343 - Embedded Real-Time Processor Systems - Frederick University 106

ALE Pin The ALE pin is used for de-multiplexing the address and data by connecting to the G pin of the 74LS373 latch. When ALE=0, P0 provides data D0-D7. When ALE=1, P0 provides address A0-A7. The reason is to allow P0 to multiplex address and data. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 107

Port 2（pins 21-28） Port 2 does not need any pull-up resistors since it already has pull-up resistors internally. In an 8031-based system, P2 are used to provide address A8-A15. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 108

Port 3（pins 10-17） Port 3 does not need any pull-up resistors since it already has pull-up resistors internally. Although port 3 is configured as an output port upon reset, this is not the way it is most commonly used. Port 3 has the additional function of providing signals. Serial communications signal：RxD, TxD（Chapter 10） External interrupt：/INT0, /INT1（Chapter 11） Timer/counter：T0, T1（Chapter 9） External memory accesses in 8031-based system：/WR, /RD（Chapter 14） ACOE343 - Embedded Real-Time Processor Systems - Frederick University 109 109

Port 3 Alternate Functions
17 RD P3.7 16 WR P3.6 15 T1 P3.5 14 T0 P3.4 13 INT1 P3.3 12 INT0 P3.2 11 TxD P3.1 10 RxD P3.0 Pin Function P3 Bit  ACOE343 - Embedded Real-Time Processor Systems - Frederick University 110

Generating Delays You can generate short delays using a register and incrementing or decrementing its value Example: mov r1, #0ah loop: djnz r1, loop How much delay is that? Djnz is a 2-byte instruction it takes two machine cycles One machine cycle is 1/12 of the system clock period For a 12 MHz system clock that is: Machine cycle = 12/12 = 1 MHz Machine period = 1/(1 MHz) = 10^(-6) s = 1 μs Loop time = 10*2*1 μs = 20 μs ACOE343 - Embedded Real-Time Processor Systems - Frederick University 111

Generating longer delays
Each register is 8 bits long, so it can increment 256 times before overflowing For larger delays, or when interrupts are required 8051 uses two timers ACOE343 - Embedded Real-Time Processor Systems - Frederick University 112

113

TMOD Register: Gate : When set, timer only runs while INT(0,1) is high. C/T : Counter/Timer select bit. M1 : Mode bit 1. M0 : Mode bit 0. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 114

TCON Register: TF1: Timer 1 overflow flag. TR1: Timer 1 run control bit. TF0: Timer 0 overflag. TR0: Timer 0 run control bit. IE1: External interrupt 1 edge flag. IT1: External interrupt 1 type flag. IE0: External interrupt 0 edge flag. IT0: External interrupt 0 type flag. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 115

Timer Mode Register Bit 7: Gate bit; when set, timer only runs while \INT high. (T0) Bit 6: Counter/timer select bit; when set timer is an event counter when cleared timer is an interval timer (T0) Bit 5: Mode bit 1 (T0) Bit 4: Mode bit 0 (T0) Bit 3: Gate bit; when set, timer only runs while \INT high. (T1) Bit 2: Counter/timer select bit; when set timer is an event counter when cleared timer is an interval timer (T1) Bit 1: Mode bit 1 (T1) Bit 0: Mode bit 0 (T1) ACOE343 - Embedded Real-Time Processor Systems - Frederick University 116

Timer Modes M1-M0: 00 (Mode 0) – 13-bit mode (not commonly used) M1-M0: 01 (Mode 1) - 16-bit timer mode M1-M0: 10 (Mode 2) - 8-bit auto-reload mode M1-M0: 11 (Mode 3) – Split timer mode ACOE343 - Embedded Real-Time Processor Systems - Frederick University 117

Timer Control Register (TCON)
Bit 7 (TF1) 8FH : Timer 1 overflow flag; set by hardware upon overflow, cleared by software Bit 6 (TR1) 8EH: Timer 1 run-control bit; manipulated by software - setting starts timer 1, resetting stops timer 1 Bit 5 (TF0) 8DH: Timer 0 overflow flag; set by hardware upon overflow, cleared by software. Bit 4 (TR0) 8CH: Timer 0 run-control bit; manipulated by software - setting starts timer 0, resetting stops timer 0 Bit 3 (IE1) 8BH: External 1 Interrupt flag bit Bit 2 (IT1) 8AH: Bit 1 (IE0) 89H: External 0 Interrupt flag bit Bit 0 (IT0) 88H: ACOE343 - Embedded Real-Time Processor Systems - Frederick University 118

Initializing and stopping timers
MOV TMOD, #16H ;initialization SETB TR0 ;starting timers SETB TR1 CLR TR0 ; stop timer 0 CLR TR1 ; stop timer 1 MOV R7, TH0 ; reading timers MOV R6, TL0 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 119

Reading timers on the fly
Reading the Timers on the Fly To read the contents of a timer while it is running (ie; on the fly) poses a problem. MOV A, TH0 ;Let TH0 = 07 and TL0 = FF MOV R6, TL0 ;Now TH0=08 and TL0=00 but A=07 and R6=00! The solution is to read the high byte, then read the low byte, then read the high byte again. If the two readings of the low-byte are not the same repeat the procedure. The code for this method is detailed below. tryAgain:MOV A, TH0 MOV R6, TL0 CJNE A, TH0, tryAgain if the first reading of the hi-byte (in A) is not equal to current reading in the hi-byte (TH0) try againMOV R7, A; if both readings of the hi- byte are the same move the first reading into R7 - the overall reading is now in R7R6 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 120

Generating delays using the timers
To generate a 50 ms (or 50,000 us) delay we start the timer counting from 15,536. Then, 50,000 steps later it will overflow. Since each step is 1 us (the timer's clock is 1/12 the system frequency) the delay is 50,000 us. 0 MOV TMOD, #10H; set up timer 1 as 16-bit interval timer CLR TR1 ; stop timer 1 (in case it was started in some other subroutine) MOV TH1, #3CH MOV TL1, #0B0H ; load 15,536 (3CB0H) into timer 1 SETB TR1 ; start timer 1 JNB TF1, $; repeat this line while timer 1 overflow flag is not set CLR TF1; timer 1 overflow flag is set by hardware on transition from FFFFH - the flag must be reset by software CLR TR1 ; stop timer 1 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 121

Generating long delays
If the microcontroller has a system clock frequency of 12 MHz then the longest delay we can get from either of the timers is 65,536 usec. To generate delays longer than this, we need to write a subroutine to generate a delay of (for example) 50 ms and then call that subroutine a specific number of times. ... MOV TMOD, #10H; set up timer 1 as 16-bit interval timer fiftyMsDelay:CLR TR1 ; stop timer 1 (in case it was started in some other subroutine) MOV TH1, #3CH MOV TL1, #0B0H ; load 15,536 (3CB0H) into timer 1 SETB TR1 ; start timer 1 JNB TF1, $; repeat this line while timer 1 overflow flag is not set CLR TF1; timer 1 overflow flag is set by hardware on transition from FFFFH - the flag must be reset by software CLR TR1 ; stop timer 1 RET oneSecDelay:PUSH PSW PUSH AR0 ; save processor status MOV R0, #20 ; move 20 (in decimal) into R0 loop:CALL fiftyMsDelay ; call the 50 ms delay DJNZ R0, loop ; 20 times - resulting in a 1 second delay POP AR0 POP PSW ; retrieve processor status RET ACOE343 - Embedded Real-Time Processor Systems - Frederick University 122

Using timers to measure execution time
Timers are often used to measure the execution time of a program ORG 0H MOV TMOD, #16H ;initialization SETB TR0 ;starting timer 0 … ;main … ;program CLR TR0 ; stop timer 0 MOV R7, TH0 ; reading timer 0 MOV R6, TL0 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 123

Interrupt : ACOE343 - Embedded Real-Time Processor Systems - Frederick University 124

Interrupt Enable Register :
EA : Global enable/disable. : Undefined. ET2 :Enable Timer 2 interrupt. ES :Enable Serial port interrupt. ET1 :Enable Timer 1 interrupt. EX1 :Enable External 1 interrupt. ET0 : Enable Timer 0 interrupt. EX0 : Enable External 0 interrupt. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 125

Interrupt handling 8051 Interrupt Vector Table ACOE343 - Embedded Real-Time Processor Systems - Frederick University 126

Interrupt Service Routines
ORG 0 JMP main ORG 0003H ; external interrupt 0 vector …. ; interrupt handler code for external interrupt 0 RETI ORG 0013H ; external interrupt 1 vector …. ;interrupt handler code for external interrupt 1 RETI ORG 0030H ; main program main:SETB IT0 ; set external interrupt 0 as edge activated SETB IT1 ; set external interrupt 1 as edge activated SETB EX0 ; enable external interrupt 0 SETB EX1 ; enable external interrupt 1 SETB EA ; global interrupt enable … ACOE343 - Embedded Real-Time Processor Systems - Frederick University 127

Examples Write a 8051 assembly program that matches 8 switches with 8 LEDs Write a 8051 assembly program that uses a two-digit SSD to display the temperature as input from an ADC. Assume that the 0- 5V range corresponds to 0-50 °C. The ADC uses RD, WR and INT pins. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 128

8051 Programming Using C

Programming microcontrollers using high-level languages
Most programs can be written exclusively using high-level code like ANSI C Extensions To achieve low-level (Assembly) efficiency, extensions to high-level languages are required Restrictions Depending on the compiler, some restrictions to the high-level language may apply ACOE343 - Embedded Real-Time Processor Systems - Frederick University 130

Keil C keywords data/idata: Description: The variable will be stored in internal data memory of controller. example: unsigned char data x; //or unsigned char idata y; bdata: Description: The variable will be stored in bit addressable memory of controller. example: unsigned char bdata x; //each bit of the variable x can be accessed as follows x ^ 1 = 1; //1st bit of variable x is set x ^ 0 = 0; //0th bit of variable x is cleared xdata: Description: The variable will be stored in external RAM memory of controller. example: unsigned char xdata x; ACOE343 - Embedded Real-Time Processor Systems - Frederick University 131

Keil C keywords code: Description: This keyword is used to store a constant variable in code and not data memory. example: unsigned char code str="this is a constant string"; _at_: Description: This keyword is used to store a variable on a defined location in ram. example: CODE: unsigned char idata x _at_ 0x30; // variable x will be stored at location 0x30 // in internal data memory sbit: Description: This keyword is used to define a special bit from SFR (special function register) memory. example: sbit Port0_0 = 0x80; // Special bit with name Port0_0 is defined at address 0x80 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 132

Keil C keywords sfr: Description: sfr is used to define an 8-bit special function register from sfr memory. example: sfr Port1 = 0x90; // Special function register with name Port1 defined at addrress 0x90 sfr16: Description: This keyword is used to define a two sequential 8-bit registers in SFR memory. example: sfr16 DPTR = 0x82; // 16-bit special function register starting at 0x82 // DPL at 0x82, DPH at 0x83 using: Description: This keyword is used to define register bank for a function. User can specify register bank 0 to example: void function () using 2{ // code } // Funtion named "function" uses register bank 2 while executing its code Interrupt: Description: defines interrupt service routine void External_Int0() interrupt 0{ //code } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 133

Pointers //Generic Pointer char * idata ptr; //character pointer stored in data memory int * xdata ptr1; //Integer pointer stored in external data memory //Memory Specific pointer char idata * xdata ptr2; //Pointer to character stored in Internal Data memory //and pointer is going to be stored in External data memory int xdata * data ptr3; //Pointer to character stored in External Data memory //and pointer is going to be stored in data memory ACOE343 - Embedded Real-Time Processor Systems - Frederick University 134

Writing hardware-specific code
#include <REGx51.h> //header file for 89C51 void main(){ //main function starts unsigned int i; //Initializing Port1 pin1 P1_1 = 0; //Make Pin1 o/p while(1){ //Infinite loop main application //comes here for(i=0;i<1000;i++) ; //delay loop P1_1 = ~P1_1; //complement Port1.1 //this will blink LED connected on Port1.1 } } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 135

C and Assembly together
extern unsigned long add(unsigned long, unsigned long); void main(){ unsigned long a; a = add(10,30); //calling Assembly function while(1); } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 136

C and Assembly together
name asm_test ?PR?_add?asm_test segment code ?DT?_add?asm_test segment data ;let other function use this data space for passing variables public ?_add?BYTE ;make function public or accessible to everyone public _add ;define the data segment for function add rseg ?DT?_add?asm_test ?_add?BYTE: parm1: DS 4 ;First Parameter parm2: ds 4 ;Second Parameter ;either you can use parm1 for reading passed value as shown below ;or directly use registers used to pass the value. rseg ?PR?_add?asm_test _add: ;reading first argument mov parm1+3,r7 mov parm1+2,r6 mov parm1+1,r5 mov parm1,r4 ;param2 is stored in fixed location given by param2 ;now adding two variables mov a,parm2+3 add a,parm1+3 ;after addition of LSB, move it to r7(LSB return register for Long) mov r7,a mov a,parm2+2 addc a,parm1+2 ;store second LSB mov r6,a mov a,parm2+1 addc a,parm1+1 ;store second MSB mov r5,a mov a,parm2 addc a,parm1 mov r4,a ret end ACOE343 - Embedded Real-Time Processor Systems - Frederick University 137

The infinite loop A loop with no termination condition or one that will never be met may be unwanted in computer systems, but common in embedded systems. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 138

Example 1 Generate a 5V peek-to-peek 200μs period square waveform on the DAC output ACOE343 - Embedded Real-Time Processor Systems - Frederick University 139

Example 2 Generate a 5V peek-to-peek 200μs period sawtooth waveform on the DAC output ACOE343 - Embedded Real-Time Processor Systems - Frederick University 140

Example 3 Generate a 5V peek-to-peek 2ms period sine waveform on the DAC output code unsigned char Sine[180] = { /* Sine values */ 127,131,136,140,145,149,153,158,162,166,170,175, 179,183,187,191,194,198,202,205,209,212,215,218, 221,224,227,230,232,235,237,239,241,243,245,246, 248,249,250,251,252,253,253,254,254,254,254,254, 253,253,252,251,250,249,248,246,245,243,241,239, 237,235,232,230,227,224,221,218,215,212,209,205, 202,198,194,191,187,183,179,175,170,166,162,158, 153,149,145,140,136,131,127,123,118,114,109,105, 101, 96, 92, 88, 84, 79, 75, 71, 67, 64, 60, 56, 52, 49, 45, 42, 39, 36, 33, 30, 27, 24, 22, 19, 17, 15, 13, 11, 9, 8, 6, 5, 4, 3, 2, 1, 1, 0, 0, 0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 15, 17, 19, 22, 24, 27, 30, 33, 36, 39, 42, 45, 49, 52, 56, 60, 63, 67, 71, 75, 79, 84, 88, 92, 96, 101,105,109,114,118 }; /************************************************************ * START of the PROGRAM * ************************************************************ / void main (void) { unsigned char i; * Enable the D/A Converter * ENDAC0 = 1; /* Enable DAC0 */ * Create the waveforms on DAC0 * while(1){ /* Run for ever */ for(i = 0; i < 179; i++) DAC0 = Sine[i]; } * while(1) */ } /* main() */ } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 141

Mixed C/Assembly code (μVision Version 2.06)
Parameter passing in registers Examples: ACOE343 - Embedded Real-Time Processor Systems - Frederick University 142

Function return values
ACOE343 - Embedded Real-Time Processor Systems - Frederick University 143

Example extern unsigned char add2_func(unsigned char, unsigned char); void main(){ unsigned char a; a = add2_func(10,30); //a will have 40 after execution while(1); } ;assembly file “add2.asm” NAME _Add2_func ?PR?add2_func?Add2 SEGMENT CODE PUBLIC add2_func RSEG ?PR?Add2_func?Add2 add2_func: mov a, r7 ;first parameter passed to r7 add a, r5 ;second parameter passed to r5 mov r7, a ;return parameter must be in r7 RET END ACOE343 - Embedded Real-Time Processor Systems - Frederick University 144

Calling C from Assembly
NAME A_FUNC ?PR?a_func?A_FUNC SEGMENT CODE EXTRN CODE (c_func) PUBLIC a_func RSEG ?PR?a_func?A_FUNC a_func: USING 0 LCALL c_func RET END void c_func (void) { } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 145

Data Converters Analog to Digital Converters (ADC)
Convert an analog quantity (voltage, current) into a digital code Digital to Analog Converters (DAC) Convert a digital code into an analog quantity (voltage, current) Dr. Konstantinos Tatas and Dr. Costas Kyriacou 146

Video (Analog - Digital)
Modulator Amplifier Filters Analog Pre- amplifier A/D Digital Image enhancement and coding ACOE343 - Embedded Real-Time Processor Systems - Frederick University 147 147

Temperature Recording by a Digital System
Time Temperature (ºC) Time Sampling & quantization Coding ACOE343 - Embedded Real-Time Processor Systems - Frederick University 148 148

Need for Data Converters
Digital processing and storage of physical quantities (sound, temperature, pressure etc) exploits the advantages of digital electronics Better and cheaper technology compared to the analog More reliable in terms of storage, transfer and processing Not affected by noise Processing using programs (software) Easy to change or upgrade the system (e.g. Media Player 7  Media Player 8 ή Real Player) Integration of different functions (π.χ. Mobile = phone + watch + camera + games + + ACOE343 - Embedded Real-Time Processor Systems - Frederick University 149 149

Signals (Analog - Digital)
2 4 6 8 10 12 14 16 u(V) 1 7 3 5 9 t (S) Analog Signal can take infinity values can change at any time 1010 1110 1111 1100 1000 Digital Signal can take one of 2 values (0 or 1) can change only at distinct times Reconstruction of an analog signal from a digital one (Can take only predefined values) 1001 0110 0101 0100 2 4 6 8 10 12 14 16 u(V) 1 7 3 5 9 t (S) ADC 1001 0110 0101 1010 1111 1110 1000 1100 0100 D3 D2 D1 D0 1 1 1 1 1 DAC ACOE343 - Embedded Real-Time Processor Systems - Frederick University 150 150

QUANTIZATION ERROR The difference between the true and quantized value of the analog signal Inevitable occurrence due to the finite resolution of the ADC The magnitude of the quantization error at each sampling instant is between zero and half of one LSB. Quantization error is modeled as noise (quantization noise) 151 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

SAMPLING FREQUENCY (RATE)
The frequency at which digital values are sampled from the analog input of an ADC A low sampling rate (undersampling) may be insufficient to represent the analog signal in digital form A high sampling rate (oversampling) requires high bitrate and therefore storage space and processing time A signal can be reproduced from digital samples if the sampling rate is higher than twice the highest frequency component of the signal (Nyquist-Shannon theorem) Examples of sampling rates Telephone: 4 KHz (only adequate for speech, ess sounds like eff) Audio CD: 44.1 KHz Recording studio: 88.2 KHz ACOE343 - Embedded Real-Time Processor Systems - Frederick University 152

Digital to Analog Converters
Vout (mV) 1 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 The analog signal at the output of a D/A converter is linearly proportional to the binary code at the input of the converter. If the binary code at the input is 0001 and the output voltage is 5mV, then If the binary code at the input becomes 1001, the output voltage will become 45mV If a D/A converter has 4 digital inputs then the analog signal at the output can have one out of …… values. 16 If a D/A converter has N digital inputs then the analog signal at the output can have one out of ……. values. 2Ν ACOE343 - Embedded Real-Time Processor Systems - Frederick University 153 153

Characteristics of Data Converters
Number of digital lines The number bits at the input of a D/A (or output of an A/D) converter. Typical values: 8-bit, 10-bit, 12-bit and 16-bit Can be parallel or serial Microprocessor Compatibility Microprocessor compatible converters can be connected directly on the microprocessor bus as standard I/O devices They must have signals like CS, RD, and WR Activating the WR signal on an A/D converter starts the conversion process. Polarity Polar: the analog signals can have only positive values Bipolar: the analog signals can have either a positive or a negative value Full-scale output The maximum analog signal (voltage or current) Corresponds to a binary code with all bits set to 1 (for polar converters) Set externally by adjusting a variable resistor that sets the Reference Voltage (or current) ACOE343 - Embedded Real-Time Processor Systems - Frederick University 154 154

Characteristics of Data Converters (Cont…)
Resolution The analog voltage (or current) that corresponds to a change of 1LSB in the binary code It is affected by the number of bits of the converter and the Full Scale voltage (VFS) For example if the full-scale voltage of an 8-bit D/A converter is 2.55V the the resolution is: VFS/(2N-1) = 2.55 /(28-1) 2.55/255 = 0.01 V/LSB = 10mV/LSB Conversion Time The time from the moment that a “Start of Conversion” signal is applied to an A/D converter until the corresponding digital value appears on the data lines of the converter. For some types of A/D converters this time is predefined, while for others this time can vary according to the value of the analog signal. Settling Time The time needed by the analog signal at the output of a D/A converter to be within 10% of the nominal value. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 155 155

ADC RESPONSE TYPES Linear Most common Non-linear Used in telecommunications, since human voice carries more energy in the low frequencies than the high. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 156

ADC TYPES Direct Conversion Fast Low resolution Successive approximation Low-cost Slow Not constant conversion delay Sigma-delta High resolution, low-cost, high accuracy ACOE343 - Embedded Real-Time Processor Systems - Frederick University 157

Interfacing with Data Converters
Microprocessor compatible data converters are attached on the microprocessor’s bus as standard I/O devices. ACOE343 - Embedded Real-Time Processor Systems - Frederick University 158 158

Programming Example 1 Write a program to generate a positive ramp at the output of an 8-bit D/A converter with a 2V amplitude and a 1KHz frequency. Assume that the full scale voltage of the D/A converter is 2.55V. The D/A converter is in P0 and the WR signal is in P1.1 main() { do { for (i=0;i<200;i++) P1_0=1; P0=i; delayu(5); } } while (1) ACOE343 - Embedded Real-Time Processor Systems - Frederick University 159 159

D/A Converters example
Write a program to generate the waveform, shown below, at the output of an 8-bit digital to analog converter. The period of the waveform should be approximately 8 ms. Assume that a time delay function with a 1 μs resolution is available. The full scale output of the converter is 5.12 V and the address of the DAC is P0, while the WR signal is in P1.1. Assuming that an 8-bit A/D converter is used to interface a temperature sensor measuring temperature values in the temperature range , specify: The resolution in of the system in The digital output word for a temperature of 32.5 The temperature corresponding to a digital output word of ACOE343 - Embedded Real-Time Processor Systems - Frederick University 160 160

Introduction to Digital Signal Processors (DSPs)
Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas

Outline/objectives Identify the most important DSP processor architecture features and how they relate to DSP applications Understand the types of code appropriate for DSP implementation ACOE343 - Embedded Real-Time Processor Systems - Frederick University 162

What is a DSP? A specialized microprocessor for real- time DSP applications Digital filtering (FIR and IIR) FFT Convolution, Matrix Multiplication etc 163 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Hardware used in DSP ASIC FPGA GPP DSP Performance Very High High
Medium Medium High Flexibility Very low Power consumption low Low Medium Development Time Long Short 164 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Common DSP features Harvard architecture Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC units) Single-Instruction Multiple Data (SIMD) Very Large Instruction Word (VLIW) architecture Pipelining Saturation arithmetic Zero overhead looping Hardware circular addressing Cache DMA ACOE343 - Embedded Real-Time Processor Systems - Frederick University 165

Harvard Architecture Physically separate memories and paths for instruction and data 166 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Single-Cycle MAC unit Can compute a sum of n- products in n cycles ACOE343 - Embedded Real-Time Processor Systems - Frederick University 167

Single Instruction - Multiple Data (SIMD)
A technique for data-level parallelism by employing a number of processing elements working in parallel ACOE343 - Embedded Real-Time Processor Systems - Frederick University 168

Very Long Instruction Word (VLIW)
A technique for instruction-level parallelism by executing instructions without dependencies (known at compile-time) in parallel Example of a single VLIW instruction: F=a+b; c=e/g; d=x&y; w=z*h; 169 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

CISC vs. RISC vs. VLIW ACOE343 - Embedded Real-Time Processor Systems - Frederick University 170

Pipelining DSPs commonly feature deep pipelines TMS320C6x processors have 3 pipeline stages with a number of phases (cycles): Fetch Program Address Generate (PG) Program Address Send (PS) Program ready wait (PW) Program receive (PR) Decode Dispatch (DP) Decode (DC) Execute 6 to 10 phases ACOE343 - Embedded Real-Time Processor Systems - Frederick University 171

Saturation Arithmetic
fixed range for operations like addition and multiplication normal overflow and underflow produce the maximum and minimum allowed value, respectively Associativity and distributivity no longer apply 1 signed byte saturation arithmetic examples: = 127 -127 – 5 = -128 ( ) – 25 = 122 ≠ 64 + (70 -25) = 109 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 172

Examples Perform the following operations using one-byte saturation arithmetic 0x77 + 0x99 = 0x4*0x42= 0x3*0x51= ACOE343 - Embedded Real-Time Processor Systems - Frederick University 173

Zero Overhead Looping Hardware support for loops with a constant number of iterations using hardware loop counters and loop buffers No branching No loop overhead No pipeline stalls or branch prediction No need for loop unrolling ACOE343 - Embedded Real-Time Processor Systems - Frederick University 174

Hardware Circular Addressing
A data structure implementing a fixed length queue of fixed size objects where objects are added to the head of the queue while items are removed from the tail of the queue. Requires at least 2 pointers (head and tail) Extensively used in digital filtering y[n] = a0x[n]+a1x[n-1]+…+akx[n-k] 175 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Direct Memory Access (DMA)
The feature that allows peripherals to access main memory without the intervention of the CPU Typically, the CPU initiates DMA transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete. Can create cache coherency problems (the data in the cache may be different from the data in the external memory after DMA) Requires a DMA controller ACOE343 - Embedded Real-Time Processor Systems - Frederick University 176

Cache memory Separate instruction and data L1 caches (Harvard architecture) Cache coherence protocols required, since most systems use DMA ACOE343 - Embedded Real-Time Processor Systems - Frederick University 177

DSP vs. Microcontroller
Harvard Architecture VLIW/SIMD (parallel execution units) No bit level operations Hardware MACs DSP applications Microcontroller Mostly von Neumann Architecture Single execution unit Flexible bit-level operations No hardware MACs Control applications ACOE343 - Embedded Real-Time Processor Systems - Frederick University 178

Examples Estimate how long will the following code fragment take to execute on A general purpose processor with 1 GHz operating frequency, five-stage pipelining and 5 cycles required for multiplication, 1 cycle for addition A DSP running at 500 MHz, zero overhead looping and 6 independent ALUs and 2 independent single- cycle MAC units? for (i=0; i<8; i++) { a[i] = 2*i + 3; b[i] = 3*i + 5; } ACOE343 - Embedded Real-Time Processor Systems - Frederick University 179

Review Questions Which of the following code fragments is appropriate for SIMD implementation? a[0]=b[0]+c[0]; a[0]=b[0]&c[0]; a[2]=b[2]+c[2]; a[0]=b[0]%c[0]; a[4]=b[4]+c[4]; a[0]=b[0]+c[0]; a[6]=b[6]+c[6]; a[0]=b[0]/c[0]; Can the following instructions be merged into one VLIW instruction? If not in how many? a=b+c; d=c/e; f=d&a; g=b%c; ACOE343 - Embedded Real-Time Processor Systems - Frederick University 180

Review Questions Which of the following is not a typical DSP feature? Dedicated multiplier/MAC Von Neumann memory architecture Pipelining Saturation arithmetic Which implementation would you choose for lowest power consumption? ASIC FPGA General-Purpose Processor DSP ACOE343 - Embedded Real-Time Processor Systems - Frederick University 181

Examples How many VLIW instructions does the following program fragment require if there two independent data paths (a,b), with 3 ALUs and 1 MAC available in each and 8 instructions/word? How many cycles will it take to execute if they are the first instructions in the program and all instructions require 1 cycle, assuming the pipelining architecture of slide 10 with 6 phases of execution? ADD a1,a2,a3 ;a3 = a1+a2 SUB b1,b3,b4 ;b4 = b1-b3 MUL a2,a3,a5 ;a5 = a2-a3 MUL b3,b4,b2 ;b2 = b3*b4 AND a7,a0,a1 ;a1 = a7 AND a0 MUL a3,a4,a5 ;a5 = a3*a4 OR a6,a3,a2 ;a2 = a6 OR a3 ACOE343 - Embedded Real-Time Processor Systems - Frederick University 182

References DR. Chassaing, “DSP Applications using C and the TMS320C6x DSK”, Wiley, 2002 Texas Instruments, TMS320C64x datasheets Analog Devices, ADSP-21xx Processors ACOE343 - Embedded Real-Time Processor Systems - Frederick University 183

The TMS320C6x Family of DSPs
Lecture 5 The TMS320C6x Family of DSPs

Features High-Performance Fixed-Point Digital Signal Processor (TMS320C6413/C6410) − TMS320C6413 2-ns Instruction Cycle Time 500-MHz Clock Rate 4000 MIPS − TMS320C6410 2.5-ns Instruction Cycle Time 400-MHz Clock Rate 3200 MIPS Eight 32-Bit Instructions/Cycle ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Features Eight Highly Independent Functional Units Six ALUs (32-/40-Bit), Each Supports Single 32- Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle Load-Store Architecture 64 32-Bit General-Purpose Registers Instruction Packing Reduces Code Size All Instructions Conditional ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Features L1/L2 Memory Architecture − 128K-Bit (16K-Byte) L1P Program Cache (Direct Mapped) − 128K-Bit (16K-Byte) L1D Data Cache (2-Way Set- Associative) − 2M-Bit (256K-Byte) L2 Unified Mapped RAM/Cache [C6413] − 1M-Bit (128K-Byte) L2 Unified Mapped RAM/Cache [C6410] Endianess: Little Endian, Big Endian − 512M-Byte Total Addressable External Memory Space Enhanced Direct-Memory-Access (EDMA) Controller (64 Independent Channels) 16 prioritized interrupts ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Block Diagram ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Cache ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Timers Two 32-bit timers used as timers/event/counters/interrupt sources Configuring a timer requires four basic steps: If the timer is not currently in the hold state, place the timer in hold (HLD = 0). Note that after device reset, the timer is already in the hold state Write the desired value to the timer period register (PRD). Write the desired value to the timer control register (CTL). Do not change the GO and HLD bits in CTL. Start the timer by setting the GO and HLD bits in CTL to 1. ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Interrupts 16 interrupt sources 2 timer interrupts 4 external interrupts 4 McBSP interrupts 4 DMA interrupts ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Interrupt control registers
CSR: control status register IER: Interrupt enable register IFR: interrupt flag register ISR: interrupt set register ICR: interrupt clear register ISTP: interrupt service table pointer IRP: interrupt return pointer NRP: NMI return pointer ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Interrupt Priority ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Interrupt Service Table (single fetch packet)
ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Interrupt Service Table (branch to additional code)
ACOE343 - Real-Time Embedded Processor Systems - Frederick University

DMA ACOE343 - Real-Time Embedded Processor Systems - Frederick University

Programming the TMS320C6x Family of DSPs
Lecture 6 Programming the TMS320C6x Family of DSPs

Programming the TMS320C6x Family of DSPs
Programming model Assembly language Assembly code structure Assembly instructions C/C++ Intrinsic functions Optimizations Software Pipelining Inline Assembly Calling Assembly functions Using Interrupts Using DMA ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Programming model Two register files: A and B 16 registers in each register file (A0-A15), (B0-B15) A0, A1, B0, B1 used in conditions A4-A7, B4-B7 used for circular addressing ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Assembly language structure
A TMS320C6x assembly instruction includes up to seven items: Label Parallel bars Conditions Instruction Functional unit Operands Comment Format of assembly instruction: Label: parallel bars [condition] instruction unit operands ;comment ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Parallel bars || : indicates that current instruction executes in parallel with previous instruction, otherwise left blank ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Condition All assembly instructions are conditional If no condition is specified, the instruction executes always If a condition is specified, the instruction executes only if the condition is valid Registers used in conditions are A1, A2, B0, B1, and B2 Examples: [A] ;executes if A ≠ 0 [!A] ;executes if A = 0 [B0] ADD .L1 A1,A2,A3 || [!B0] ADD .L2 B1,B2,B3 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Instruction Either directive or mnemonic Directives must begin with a period (.) Mnemonics should be in column 2 or higher Examples: .sect data ;creates a code section .word value ;one word of data ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Functional units (optional)
L units: 32/40 bit arithmetic/compare and 32 bit logic operations S units: 32-bit arithmetic operations, 32/40-bit shifts and 32-bit bit-field operations, 32-bit logical operations, Branches, Constant generation, Register transfers to/from control register file (.S2 only) M units: 16 x 16 multiply operations D units: 32-bit add, subtract, linear and circular address calculation, Loads and stores with 5-bit constant offset, Loads and stores with 15-bit constant, offset (.D2 only) ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Operands All instructions require a destination operand. Most instructions require one or two source operands. The destination operand must be in the same register file as one source operand. One source operand from each register file per execute packet can come from the register file opposite that of the other source operand. Example: ADD .L1 A0,A1,A3 ADD .L1 A0,B1,A2 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Instruction format Fetch packet The same functional unit cannot be used in the same fetch packet ADD .S1 A0, A1, A2 ;.S1 is used for || SHR .S1 A3, 15, A4 ;...both instructions ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Arithmetic instructions
Add/subtract/multiply: ADD .L1 A3,A2,A1 ;A1←A2+A3 SUB .S1 A1,1,A1 ;decrement A1 MPY .M2 A7,B7,B6 ;multiply LSBs || MPYH .M1 A7,B7,A6 ;multiply MSBs ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Move and Load/store Instructions- Addressing Modes
Loading constants: MVK .S1 val1, A4 ;move low halfword MVKH .S1 val1, A4 ;move high halfword Indirect Addressing Mode: LDH .D2 *B2++, B7 ;load halfword B7←[B2], increment B2 || LDH .D1 *A2++, A7 ; load halfword A7←[A2], increment A2 STW .D2 A1, *+A4[20] ;store [A4]+20 words ← A2, ;preincrement/don’t modify A4 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example Calculate the values of register and memory for the following instructions: A2= 0x , MEM[0x ] = 0x0, MEM[0x ] = 0x1, MEM[0x ] = 0x2, MEM[0x C] = 0x3, LDH .D1 *++A2, A7 A2= ? A7= ? LDH .D1 *A2--[2], A7 A2= ? A7= ? LDH .D1 *-A2, A7 A2= ? A7= ? LDH .D1 *++A2[2], A7 A2= ? A7= ? ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Branch and Loop Instructions
Loop example: MVK .S1 count, A1 ;loop counter || MVKH .S2 count, A1 LOOP MVK .S1 val1, A4 ;loop MVKH .S1 val1, A4 ;body SUB .S1 A1,1,A1 ;decrement counter [A1] B .S2 Loop ;branch if A1 ≠ 0 NOP 5 ;5 NOPs for branch ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Assembler Directives .short : initiates 16-bit integer .int (.word .long) : initiates 32-bit integer .float : 32-bit single-precision floating-point .double : 64-bit double-precision floating-point .trip : .bss .far .stack ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Programming Using C Data types Intrinsic functions Inline assembly Linear assembly Calling assembly functions Code optimizations Software pipelining ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Data types char, signed char 8 bits ASCII unsigned char Short 16 bits 2's complement unsigned short 16 bits binary int, signed int 32 bits 2's complement unsigned int 32 bits binary long, signed long 40 bits 2's complement unsigned long 40 bits binary Enum Float 32 bits IEEE 32-bit Double 64 bits IEEE 64-bit long double Pointers 3 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Intrinsic functions Available C functions used to increase efficiency int_mpy(): MPY instruction, multiplies 16 LSBs int_mpyh(): MPYH instruction, multiplies 16 MSBs int_mpylh(): MPYHL instruction, multiplies 16 LSBs with 16 MSBs int_mpyhl(): MPYHL instruction, multiplies 16 MSBs with 16 LSBs ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Inline Assembly Assembly instructions and directives can be incorporated within a C program using the asm statement asm (“assembly code”); ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Calling Assembly Functions
An external declaration of an assembly function can be called from a C program extern int func(); ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example Program that calculates S=n+(n-1)+…+1 by calling assembly function #include <stdio.h> main() { short n=6; short result; result = sumfunc(n); printf(“sum = %d”, result); } ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example (continued) Assembly function: .def _sumfunc _sumfunc: MV .L1 A4,A1 ;n is loop counter SUB .S1 A1,1,A1 ;decrement n LOOP: ADD .L1 A4,A1,A4 ;A4 is accumulator [A1] B .S2 LOOP ;branch if A1 ≠ 0 NOP 5 ;branch delay nops B .S2 B3 ;return from calling NOP 5 ;five NOPS for delay .end ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example Write a program that calculates the first 6 Fibonacci numbers by calling an assembly function ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Linear Assembly enables writing assembly-like programs without worrying about register usage, pipelining, delay slots, etc. The assembler optimizer program reads the linear assembly code to figure out the algorithm, and then it produces an optimized list of assembly code to perform the operations. Source file extension is .sa The linear assembly programming lets you: use symbolic names forget pipeline issues ignore putting NOPs, parallel bars, functional units, register names more efficiently use CPU resources than C. ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Linear Assembly Example
_sumfunc: .cproc np ;.cproc directive starts a C callable procedure .reg y ;.reg directive use descriptive names for values that will be stored in registers MVK np,cnt loop: .trip 6 ; trip count indicates how many times a loop will iterate SUB cnt,1,cnt ADD y,cnt,y [cnt] B loop .return y .endproc ; .endproc to end a C procedure Equivalent assembly function .def _sumfunc _sumfunc: MV .L1 A4,A1 ;n is loop counter LOOP: SUB .S1 A1,1,A1 ;decrement n ADD .L1 A4,A1,A4 ;A4 is accumulator [A1] B .S2 LOOP ;branch if A1 ≠ 0 NOP 5 ;branch delay nops B .S2 B3 ;return from calling NOP 5 ;five NOPS for delay .end ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Software Pipelining A loop optimization technique so that all functional units are utilized within one cycle. Similar to hardware pipelining, but done by the programmer or the compiler, not the processor Three stages: Prolog (warm-up): instructions needed to build up the loop kernel (cycle) Loop kernel (cycle): all instructions executed in parallel. Entire kernel executed in one cycle. Epilog (cool-off): Instructions necessary to complete all iterations ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Software pipelining procedure
Draw a dependency graph Draw nodes and paths Write number of cycles for each instruction Assign functional units Set up a scheduling table Obtain code from scheduling table ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Software pipelining example
for (i=0; i<16; i++) sum = sum + a[i]*b[i]; ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Dependency Graph LDH: 5 cycles MPY: 2 cycles ADD: 1 cycle SUB: 1 cycle
LOOP: 6 cycles ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Scheduling Table Unit C1, C9.. C2, C10… C3, C11.. C4, C12… C5, C13… C6, C14… C7, C15… C8, C16… .D1 LDH .D2 .M1 MPY .L1 ADD .L2 SUB .S2 B Unit C1, C9.. C2, C10… C3, C11.. C4, C12… C5, C13… C6, C14… C7, C15… C8, C16… Prolog Kernel .D1 LDH .D2 .M1 MPY .L1 ADD .L2 SUB .S2 B ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Assembly Code ;cycle 1 MVK .L2 16,B1 ;loop count || ZERO .L1 A7 ;sum || LDH .D1 *A4++,A2 ;input in A2 || LDH .D2 *B4++,B2 ;input in B2 ;cycle 2 LDH .D1 *A4++,A2 ;input in A2 || [B1] SUB .L2 B1,1,B1 ;decrement count ;cycle 3 || [B1] SUB .L2 B1,1,B1 ;decrement || [B1] B .S2 LOOP ;cycle 4 ;cycle 5 ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Assembly code ;cycle 6 LDH .D1 *A4++,A2 ;input in A2 || LDH .D2 *B4++,B2 ;input in B2 || [B1] SUB .L2 B1,1,B1 ;decrement || [B1] B .S2 LOOP || MPY .M1x A2,B2,A6 ;cycle 7 ;cycles 8-21(loop kernel) LOOP: LDH .D1 *A4++,A2 ;input in A2 || MPY .M1x A2,B2,A6 ;multiplication || ADD .L1 A6,A7,A7 ;cycle 22 (epilog) ADD .L1 A6,A7,A7 ;final sum ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Example Use software pipelining in the following example: for (i=0; i<16; i++) sum = sum + a[i]*b[i]; ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Loop unrolling A technique for reducing the loop overhead The overhead decreases as the unrolling factor increases at the expense of code size Doesn’t work with zero overhead looping hardware DSPs for (i=0; i<64; i++) { sum +=*(data++); } for (i=0; i<64/4; i++) { sum +=*(data++); } ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Loop Unrolling example
Unroll the following loop by a factor of 2, 4, and eight for (i=0; i<64; i++) { a[i] = b[i] + c[i+1]; } ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Code optimization steps
When code performance is not satisfactory the following steps can be taken: Use intrinsic functions Use compiler optimization levels Use profiling then convert functions that need optimization to linear ASM Optimize code in ASM ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Profiling using profiling tool
ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Profiling using clock function
#include <time.h> /* in order to call clock()*/ main() { … clock_t start, stop, overhead; start = clock(); /* Calculate overhead of calling clock*/ stop = clock(); /* and subtract this value from The results*/ overhead = stop − start; start = clock(); /* code to be profiled */ stop = clock(); printf(”cycles: %d\n”, stop − start − overhead); } ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Code optimization Use instructions in parallel Eliminate NOPs Unroll loops Use software pipelining ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Using Interrupts 16 interrupt sources 2 timer interrupts 4 external interrupts 4 McBSP interrupts 4 DMA interrupts ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Loop program with interrupt
interrupt void c_int11 //ISR { int sample_data; sample_data = input_sample(); //input data output_sample(sample_data); //output data } void main() comm_intr(); //init DSK, codec, McBSP //enable INT11 and GIE while(1); //infinite loop ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Using DMA ACOE343 - Embedded Real-Time Processor Systems - Frederick University

Real-time Digital Signal Processing with the TMS320C6x
Lecture 7 Real-time Digital Signal Processing with the TMS320C6x Dr. Konstantinos Tatas

Outline Digital Signal Processing Basics
ADC and sampling Aliasing Discrete-time signals Digital Filtering FFT Effects of finite fixed-point wordlength Efficient DSP application implementation on TMS320C6x processors FIR filter implementation IIR filter implementation

Digital Signal Processing Basics
A basic DSP system is composed of: An ADC providing digital samples of an analog input A Digital Processing system (μP/ASIC/FPGA) A DAC converting processed samples to analog output Real-time signal processing: All processing operation must be complete between two consecutive samples

ADC and Sampling An ADC performs the following:
Quantization Binary Coding Sampling rate must be at least twice as much as the highest frequency component of the analog input signal

Aliasing When sampling at a rate of fs samples/s, if k is any positive or negative integer, it’s impossible to distinguish between the sampled values of a sinewave of f0 Hz and a sinewave of (f0+kfs) Hz.

Discrete-time signals
A continuous signal input is denoted x(t) A discrete-time signal is denoted x(n), where n = 0, 1, 2, … Therefore a discrete time signal is just a collection of samples obtained at regular intervals (sampling frequency)

Common Digital Sequences
Unit-step sequence: Unit-impulse sequence:

The z transform Discrete equivalent of the Laplace transform

z-transform properties
Linear Shift theorem Note:

Transfer Function z-transform of the output/z transfer of the input
Pole-zero form

Pole-zero plot

System Stability Position of the poles affects system stability
The position of zeroes does not

Example 1 A system is described by the following equation:
y(n)=0.5x(n) + 0.2x(n-1) + 0.1y(n-1) Plot the system’s transfer function on the z plane Is the system stable? Plot the system’s unit step response Plot the system’s unit impulse response

The Discrete Fourier Transform (DFT)
Discrete equivalent of the continuous Fourier Transform A mathematical procedure used to determine the harmonic, or frequency, content of a discrete signal sequence

The Fast Fourier Transform (FFT)
FFT is not an approximation of the DFT, it gives precisely the same result

Digital Filtering In signal processing, the function of a filter is to remove unwanted parts of the signal, such as random noise, or to extract useful parts of the signal, such as the components lying within a certain frequency range Analog Filter: Input: electrical voltage or current which is the direct analogue of a physical quantity (sensor output) Components: resistors, capacitors and op amps Output: Filtered electrical voltage or current Applications: noise reduction, video signal enhancement, graphic equalisers Digital Filter: Input: Digitized samples of analog input (requires ADC) Components: Digital processor (PC/DSP/ASIC/FPGA) Output: Filtered samples (requires DAC)

Averaging Filter

Ideal Filter Frequency Response

Realistic vs. Ideal Filter Response

FIR filtering Finite Impulse Response (FIR) filters use past input samples only Example: y(n)=0.1x(n)+0.25x(n-1)+0.2x(n-2) Z-transform: Y(z)=0.1X(z)+0.25X(z)z^(- 1)+0.2X(z)(z^-2) Transfer function: H(z)=Y(z)/X(z)= z^(-1)+0.2(z^-2) No poles, just zeroes. FIR is stable

FIR filter design Inverse DFT of H(m)

FIR Filter Implementation
y(n)=h(0)x(n)+h(1)x(n-1)+h(2)x(n-2)+h(3)x(n-3)

Example 2 A filter is described by the following equation:
y(n)=0.5x(n) + 1x(n-1) + 0.5x(n-2), with initial condition y(-1) = 0 What kind of filter is it? Plot the filter’s transfer function on the z plane Is the filter stable? Plot the filter’s unit step response Plot the filter’s unit impulse response

IIR Filtering Infinite Impulse Response (IIR) filters use past outputs together with past inputs

IIR Filter Implementation
y(n)=b(0)x(n)+b(1)x(n-1)+b(2)x(n-2)+b(3)x(n-3) + a(0)y(n)+a(1)y(n-1)+a(2)y(n-2)+a(3)y(n-3)

FIR - IIR filter comparison
Simpler to design Inherently stable Can be designed to have linear phase Require lower bit precision IIR Need less taps (memory, multiplications) Can simulate analog filters

Example 3 A filter is described by the following equation:
y(n)=0.5x(n) + 0.2x(n-1) + 0.5y(n-1) + 0.2y(n-2), with initial condition y(-1)=y(-2) = 0 What kind of filter is it? Plot the filter’s transfer function on the z plane Is the filter stable? Plot the filter’s unit step response Plot the filter’s unit impulse response

Software and Hardware Implementation of FIR filters

Fixed-Point Binary Representation
Representation of a number with integer and fractional part: This is denoted as Qnm representation The binary point is implied It will affect the accuracy (dynamic range and precision) of the number Purely a programmer’s convention and has no relationship with the hardware.

Examples x = b Q0.15 => x= 2^(-1) + 2^(-4) + 2^(-11)+2^(-12) Q1.14 => x= 2^0 + 2^(−3) + 2^(−10) + 2^(−11) Q2.13 => x = 2^1 + 2^(−2) + 2^(−9) + 2^(−10) Q7.8 => x = ? Q12.3 => x = ?

EFFECTS OF FINITE FIXED-POINT BINARY WORD LENGTH
Quantization Errors ADC Coefficients Truncation Rounding Data Overflow

ADC Quantization Error
ADC converts an analog signal x(t) into a digital signal x(n), through sampling, quantization and encoding Assuming x(n) is interpreted as the Q15 fractional number such that −1 ≤ x(n) < 1 dynamic range of fractional numbers is 2. Since the quantizer employs B bits, the number of quantization levels available is 2B The spacing between two successive quantization levels is Δ = 2/2^B = 2^(1-B) Therefore the quantization error is |e(n)|≤Δ/2

Coefficient Quantization Error
Effects on FIR filters Location of zeroes changes Therefore, frequency response changes Effects on IIR filters Location of poles and zeroes change Could move poles outside of unit circle, leading to unstable implementations

Roundoff error

Overflow error signals and coefficients normalized in the range of −1 to 1 for fixed-point arithmetic, the sum of two B-bit numbers may fall outside the range of −1 to 1. Severely distorts the signal Overflow handling Saturation arithmetic “Clips” the signal, although better than overflow Should only be used to guarantee no overflow, but should not be the only solution Scaling of signals and coefficients

Coefficient representation
Fractional 2’s complement (Q) representation is used To avoid overflow, often scaling down by a power of two factor (S) (right shift) is used. The scaling factor is given by the equation: S=Imax(|h(0)|+|h(1)|+|h(2)|+…) Furthermore, filter coefficient larger than 1, cause overflow and are scaled down further by a factor B, in order to be less than 1

Example 1 Given the FIR filter Solution:
y(n)=0.1x(n)+0.25x(n-1)+0.2x(n-2) Assuming the input range occupies ¼ of the full range Develop the DSP implementation equations in Q-15 format. What is the coefficient quantization error? Solution: S=1/4((|h(0)|+|h(1)|+|h(2)|) = ¼( )=3.25/4 Overflow cannot occur, no input (S) scaling required No coefficents > 1, no coefficient (B) scaling required

Example 2 Given the FIR filter Solution:
y(n)=0.8x(n)+3x(n-1)+0.6x(n-2) Assuming the input range occupies ¼ of the full range Develop the DSP implementation equations in Q-15 format. What is the coefficient quantization error? Solution: S=1/4((|h(0)|+|h(1)|+|h(2)|) = ¼( )=4.4/4 = 1.1 Therefore: S=2 Largest coefficient: h(1) = 3, therefore B=4 ys(n)=0.2xs(n)+0.75xs(n-1)+0.15xs(n-2)

Simple FIR Hardware implementation
VHDL 4-tap filter entity ENTITY my_fir is Port (clk, rst: in std_logic; sample_in: in std_logic_vector(length-1 downto 0); sample_out: out std_logic_vector(length-1 downto 0) ); END ENTITY my_fir;

Example (continued) VHDL 4-tap filter architecture
ARCHITECTURE rtl of my_fir is type taps is array 0 to 3 of std_logic_vector(length-1 downto 0); Signal h: taps_type; Signal x_p1, x_p2, x_p3: std_logic_vector(length-1 downto 0); --past samples Signal y: std_logic_vector(2*length-1 downto 0); Begin Process (clk, rst) If rst=‘1’ then x_p1 <= (others => ‘0’); x_p2 <= (others => ‘0’); x_p3 <= (others => ‘0’); Elsif rising_edge(clk) then x_p1 <= sample_in; --delay registers x_p2 <= x_p1; x_p3 <= x_p2; End if; End process; y <= sample_in*h(0)+x_p1*h(1)+x_p2*h(2)+x_p3*h(3); Sample_out <= y(2*length-1 downto length); End;

FIR Software Implementation
int yn=0; //filter output initialization short xdly[N+1]; //input delay samples array interrupt void c_int11() //ISR { short i; yn=0; short h[N] = { //coefficients }; xdly[0]=input_sample(); for (i=0; i<N; i++) yn += (h[i]*xdly[i]); for (i=N-1; i>0; i--) xdly[i] = xdly[i-1]; output_sample(yn >> 15); //filter output return; //return from ISR }

IIR C implementation int yn=0; //filter output initialization
short xdly[N+1]; //input delay samples array Short ydly[M]; //output delay array interrupt void c_int11() //ISR { short i; yn=0; short a[N] = { //coefficients }; short b[M] = { //coefficients }; xdly[0]=input_sample(); for (i=0; i<N; i++) yn += (a[i]*xdly[i]); for (i=0; i<M; i++) yn += (b[i]*ydly[i]); for (i=N-1; i>0; i--) xdly[i] = xdly[i-1]; ydly[0] = yn >> 15; for (i=M-1; i>0; i--) ydly[i] = ydly[i-1]; output_sample(yn >> 15); //filter output return; //return from ISR }

Introduction to Microcontrollers

Similar presentations

Presentation on theme: "Introduction to Microcontrollers"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Introduction to Microcontrollers

Similar presentations

Presentation on theme: "Introduction to Microcontrollers"— Presentation transcript:

Similar presentations

About project

Feedback