Embedded Systems Design

Slides:



Advertisements
Similar presentations
CPU Structure and Function
Advertisements

Computer Architecture
DSPs Vs General Purpose Microprocessors
Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.
INSTRUCTION SET ARCHITECTURES
MICROPROCESSORS TWO TYPES OF MODELS ARE USED :  PROGRAMMER’S MODEL :- THIS MODEL SHOWS FEATURES, SUCH AS INTERNAL REGISTERS, ADDRESS,DATA & CONTROL BUSES.
Real time DSP Professors: Eng. Julian S. Bruno Eng. Jerónimo F. Atencio Sr. Lucio Martinez Garbino.
Computer Organization and Architecture
Processor System Architecture
MICRO PROCESSER The micro processer is a multipurpose programmable, clock driven, register based, electronic integrated device that has computing and decision.
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
Computer Organization and Architecture
Computer Organization and Architecture
TigerSHARC and Blackfin Different Applications. Introduction Quick overview of TigerSHARC Quick overview of Blackfin low power processor Case Study: Blackfin.
COMP3221: Microprocessors and Embedded Systems Lecture 2: Instruction Set Architecture (ISA) Lecturer: Hui Wu Session.
1 Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software.
1 SHARC ‘S’uper ‘H’arvard ‘ARC’hitecture Nagendra Doddapaneni ER hit HAR ect VARD ure SUP Arc.
Introduction to ARM Architecture, Programmer’s Model and Assembler Embedded Systems Programming.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.
Microprocessor Systems Design I Instructor: Dr. Michael Geiger Spring 2012 Lecture 2: 80386DX Internal Architecture & Data Organization.
Unit-1 PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE Advance Processor.
Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
Engineering 1040: Mechanisms & Electric Circuits Fall 2011 Introduction to Embedded Systems.
Ehsan Shams Saeed Sharifi Tehrani. What is DSP ? Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good.
Real time DSP Professors: Eng. Julian Bruno Eng. Mariano Llamedo Soria.
RICE UNIVERSITY Implementing the Viterbi algorithm on programmable processors Sridhar Rajagopal Elec 696
1 Computer System Overview Chapter 1. 2 n An Operating System makes the computing power available to users by controlling the hardware n Let us review.
CHAPTER 3 TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION
Introduction of Intel Processors
Advanced Computer Architecture 0 Lecture # 1 Introduction by Husnain Sherazi.
Computer Organization - 1. INPUT PROCESS OUTPUT List different input devices Compare the use of voice recognition as opposed to the entry of data via.
Module : Algorithmic state machines. Machine language Machine language is built up from discrete statements or instructions. On the processing architecture,
Introduction to Microprocessors
Computer Hardware A computer is made of internal components Central Processor Unit Internal External and external components.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
DIGITAL SIGNAL PROCESSORS. Von Neumann Architecture Computers to be programmed by codes residing in memory. Single Memory to store data and program.
80386DX functional Block Diagram PIN Description Register set Flags Physical address space Data types.
Stored Program A stored-program digital computer is one that keeps its programmed instructions, as well as its data, in read-write,
Computer and Information Sciences College / Computer Science Department CS 206 D Computer Organization and Assembly Language.
Instruction Sets: Addressing modes and Formats Group #4  Eloy Reyes  Rafael Arevalo  Julio Hernandez  Humood Aljassar Computer Design EEL 4709c Prof:
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
What is a Microprocessor ? A microprocessor consists of an ALU to perform arithmetic and logic manipulations, registers, and a control unit Its has some.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
Overview von Neumann Architecture Computer component Computer function
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
Fundamentals of Programming Languages-II
بسم الله الرحمن الرحيم MEMORY AND I/O.
Instruction Sets. Instruction set It is a list of all instructions that a processor can execute. It is a list of all instructions that a processor can.
1 Basic Processor Architecture. 2 Building Blocks of Processor Systems CPU.
1 Device Controller I/O units typically consist of A mechanical component: the device itself An electronic component: the device controller or adapter.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Control Unit.
BASIC COMPUTER ARCHITECTURE HOW COMPUTER SYSTEMS WORK.
William Stallings Computer Organization and Architecture 8th Edition
Embedded Systems Design
Processor Organization and Architecture
Digital Signal Processors
Subject Name: Digital Signal Processing Algorithms & Architecture
Subject Name: Digital Signal Processing Algorithms & Architecture
Introduction to Microprocessor Programming
Digital Signal Processors-1
Introduction to 5685x Series
EECE.3170 Microprocessor Systems Design I
Chapter 11 Processor Structure and function
ADSP 21065L.
Presentation transcript:

Embedded Systems Design DSP Processors Engr. Naveed Khan Baloch

Digital Signal Processing Processing of digitally represented signals Signals represented digitally via sequence of samples Digital signals obtained from physical signals via Transducers and Analog to digital convertors (ADC) Digital Signal Processor Electronic system that processes digital signals

Definition A digital signal processor (DSP) is a specialized microprocessor with optimized architecture for fast operational needs of Digital Signal Processing.

DSP Applications Audio Communication Control Medical Defense Coding, Decoding, Surround-sound Communication Scrambling, Cellular phones, software radios Control Robotics, Disk drive control, motor control Medical Diagnostics equipment, hearing aids Defense Radar and sonar processing, missile guidance

Why DSP Processors ? Reprogrammable Cost effective Fast computation Energy Efficiency Fast Multipliers Multiple Execution Units Efficient Memory Accesses Circular Buffering Data Format Zero-Overhead Looping Streamlined I/O Specialized Instruction Sets SIMD

Reprogrammable

Cost Effective There is no need for a separate signal processing unit Signal processing and control functions can be performed on a single silicon chip

Faster computation Because of specialized Hardware for DSP application computation becomes vary fast Separate MAC units and fast multipliers are used for many DSP algorithms for faster execution i.e. FIR Filter IIR Filter DCT FFT

Fast Multipliers Originally, microprocessors implemented multiplications by a series of shift and add operations, each of which consumed one or more clock cycles. Most DSP processors can only take one clock cycle for the multiplication operation. modern DSP processors include at least one dedicated single- cycle multiplier or combined multiply-accumulate (MAC) unit

Multiple Execution Units DSP applications typically have very high computational requirements in comparison to other types of computing tasks, since they often must execute DSP algorithms (such as FIR filtering) in real time on lengthy segments of signals sampled at 10-100 KHz or higher. Hence, DSP processors often include several independent execution units that are capable of operating in parallel—for example, in addition to the MAC unit, they typically contain an arithmetic- logic unit (ALU) and a shifter.

Efficient Memory Accesses Small bank of RAM near the processor core that is used as an instruction cache Many DSP processors also support “circular addressing,” which allows the processor to access a block of data sequentially and then automatically wrap around to the beginning address

Circular Buffering The process by which the Data Address Generator (DAG) “wraps around” or repeatedly steps through a range of registers. Instructions Accommodate 3 elements Buffer Address Buffer Size Increment

Data Format Fixed point and floating point processors. Use of Accumulator to reduce the overflow.

Assignment # 2 Highlight the difference between the Architecture of Fixed point and Floating point DSP processors with at least 2 examples from TI and Blackfin processors. www.TI.com www.analog.com

Zero-Overhead Looping Special loop or repeat instruction is provided which allows the programmer to implement a for-next loop without expending any clock cycles for updating and testing the loop counter or branching back to the top of the loop. This feature is often referred to as “zero-overhead looping.”

Streamlined I/O To allow low-cost, high-performance input and output, most DSP processors incorporate one or more specialized serial or parallel I/O interfaces, and streamlined I/O handling mechanisms, such as low-overhead interrupts and direct memory access (DMA), to allow data transfers to proceed with little or no intervention from the processor's computational units.

Specialized Instruction Sets DSP processor instruction sets have traditionally been designed with two goals in mind Maximum use of the processor's underlying hardware Minimize the amount of memory space required to store DSP programs Highly Specialized Complicated Irregular Use Assembly instead of C for maximum benefit

SIMD SIMD, or single-instruction, multiple-data, is not a class of architecture itself, but is instead an architectural technique that can be used within any of the classes of architectures Improves performance on some algorithms by allowing the processor to execute multiple instances of the same operation For example, a SIMD multiplication instruction could perform two or more multiplications on different sets of input operands in parallel in a single clock cycle.

Outline Blackfin Family Overview The Blackfin Core Arithmetic operations Data fetching Sequencing The Blackfin Bus Architecture and Memory Modified Harvard architecture Hierarchical memory structure Flexible memory management Additional Blackfin Core Features DMA Dynamic power management On-chip debug support

Blackfin Family Overview The Blackfin family consists of: A broad range of Blackfin processors Software development tools Hardware evaluation and debug tools Extensive third-party support Development tools Operating systems TCP/IP stacks Hardware building blocks Software solutions

Blackfin Processors All Blackfin processors combine extensive DSP capability with high end MCU functions on the same core. Creates a highly efficient and cost-effective solution. A single software development tool chain All Blackfin processors are based on the same core architecture. Once you understand one Blackfin processor, you can easily migrate from one family member to another. Code compatible across family members. Processors vary in clock speed, amount of on-chip memory, peripheral suite, package types and sizes, power, and price. Large selection lets you optimize your choice of a Blackfin processor for your application.

Blackfin Family Peripherals The Blackfin family supports a wide variety of I/O: EBIU (External Bus Interface Unit) Parallel peripheral interface (PPI) Serial ports (SPORTS) GPIO Timers UARTS SPI® Ethernet USB CAN® Two Wire Interface (TWI) Pixel compositor Lockbox™secure technology Host DMA ATAPI SDIO

Blackfin Processors Perform Signal Processing and Microcontroller Functions

Blackfin Architecture What does it mean for the developer? Combining controller and DSP capabilities into a single core, along with rich I/O, enables development of efficient, low cost embedded media applications. For example, multimedia over IP, digital cameras, telematics, software radio From a development perspective, a single core means there is only one tool chain. An embedded application consisting of both control and signal processing modules is built using the same compiler. The result is dense control code and high performance DSP code.

Features Controller L1 memory space for stack and heap Dedicated stack and frame pointers Byte addressability Simple bit-level manipulation DSP Fast, flexible arithmetic computational units Unconstrained data flow to/from computational units Extended precision and dynamic range Efficient sequencing Efficient I/O processing The DSP aspect of the Blackfin core is optimized to perform FFTsand convolutions

Blackfin Core (e.g., ADSP-BF54x)

The Blackfin Core

The Blackfin CoreThe core consists of: Arithmetic unit Supports SIMD operation Load/store architectureAddressing Addressing unit Supports dual data fetch Sequencer Efficient program flow control Register files Data Addressing

The Arithmetic Unit The Arithmetic UnitPerforms arithmetic operations Dual 40-bit ALU (Arithmetic/Logic Unit) Performs 16-/32-/40-bit arithmetic and logical operations Dual 16 x 16 multiplier Performs dual MACs(multiply-accumulates) when used with ALUs Barrel shifter Performs shifts, rotates, bit operations

Data Registers There are 8x 32-bit registers in the data register file. Used to hold 32-bit vales or packed 16-bit There are also 2x 40-bit accumulators. Typically used for MAC operations

16-Bit ALU Operations—Examples The Algebraic Assembly syntax is intuitive and makes it easy to understand what the instruction is doing.

32-Bit ALU Operations—Examples

Dual MAC Operations—Example

Barrel Shifter Enable shifting or rotating any number of bits within a 16-/32-/40-bit register in a single cycle Perform individual bit operations on 32-bit data register contents BITSET, BITCLR, BITTGL, BITTST Field Extract and Deposit instructions Extract or insert a field of bits out of or into a 32-bit data register

8-Bit ALU Operations Four 8-bit ALUsprovide parallel computational power targeted mainly for video operations. Quad 8-bit add/subtract Quad 8-bit average SAA (Subtract-Absolute-Accumulate) instruction A quad 8-bit ALU instruction takes one cycle to complete.

Additional Arithmetic Instructions There are a number of specialized instructions that are used to speed up the inner loop on various algorithms. Bitwise XOR Enable creating LFSR (Linear Feedback Shift Registers) for use in CRC calculations or the generation of PRN sequences Bit stream multiplexing, add on sign, compare select Convolutionalencoder and Viterbidecoder support Add/Subtract with prescaleup/down IEEE 1180–compliant 2D 8 x 8 DCTs(Discrete Cosine Transforms) Vector search Enable search a vector a pair at a time for greatest or least value

The Addressing Unit UnitThe addressing unit generates addresses for data fetches. Two DAG (Data Address Generator) arithmetic units enable generation of independent 32-bit wide addresses that can reach anywhere within the Blackfin memory space. Up to two fetches can occur at the same time.

Address Registers There are 6x general-purpose Pointer Registers. Used for GP 8-/16-/32-bit fetches from memory There are four sets of registers used for DSP-style data accesses. Used for 16-/32-bit DSP data fetches such as dual data fetch, circular buffer addressing, and bit reversal There are also dedicated stack (SP) and frame (FP) pointers. These are used for 32-bit accesses to stack frames.

Addressing Addressing Unit supports: Addressing only With specified Pointer or Index Register Provide address and post modify Add an offset after the fetch is done Circular buffering supported with this method Provide address with an offset Add an offset before the fetch, but no pointer update Update address only Modify address with reverse carry add All addressing is Register Indirect.

Addressing Index Registers I0-I3 (32-/16-bit accesses) Pointer Registers P0–P5 (32-/16-/8-bit accesses) Stack and Frame Pointer Registers (32-bit accesses) All addresses are Byte addresses. Ordering is Little Endian. Addresses must be aligned for the word size being fetched. i.e., 32-bit fetches from addresses that are a multiple of four

Circular Buffer Example ExampleBase address (B) and Starting address (I) = 0 Buffer length L = 44(There are 11 data elements and each data element is 4-bytes) Modify value M = 16 (4 elements *4-bytes/element) Example memory access:R1 = [I0 ++ M2]; The Addressing Unit supports Circular Buffer pointer addressing. The process of boundary checking and pointer wrapping to stay inbounds happens in hardware with no overhead. Buffers can be placed anywhere in memory without restriction dueto the Base address registers.

The Sequencer The sequencer’s function is to generate addresses for fetching instructions. Uses a variety of registers to select the next address Aligns instructions as they are fetched Always reads 64 bits from memory Realigns what is fetched into individual 16-/32-/64-bit opcodes before sending to the execution pipeline Handles events Interrupts and exceptions Conditional execution