© 2007 Elsevier Lecture 6: Embedded Processors Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte Based on slides and textbook from Wayne.

1 Lecture 6: Embedded Processors
Embedded Computing Systems
Mikko Lipasti, adapted from M. Schulte
Based on slides and textbook from Wayne Wolf
© 2007 Elsevier

2 High Performance Embedded Computing © 2007 Elsevier
Topics
- Embedded microprocessor market.
- Categories of CPUs.
- RISC, DSP, and multimedia processors.
- CPU mechanisms.

3 Demand for Embedded Processors
Embedded processors account for:
- Over 97% of total processors sold
- Over 60% of total sales from processors
Sales are expected to increase by roughly 15% each year.

4 Flynn's taxonomy of processors
- Single-instruction single-data (SISD)
- Single-instruction multiple-data (SIMD)
- Multiple-instruction multiple-data (MIMD)
- Multiple-instruction single-data (MISD)
What is an example of each? Which would you expect to see in embedded systems?

5 Other axes of comparison
- RISC vs. CISC: instruction set style.
- Instruction issue width.
- Static vs. dynamic scheduling for multiple-issue machines.
- Scalar vs. vector processing.
- Single-threaded vs. multithreading.
A single CPU can fit into multiple categories.

6 Embedded vs. general-purpose processors
Embedded processors may be customized for a category of applications.
- Customization may be narrow or broad.
We may judge embedded processors using different metrics:
- Code size.
- Energy efficiency.
- Memory system performance.
- Predictability.

7 Embedded RISC processors
RISC processors often have simple, highly pipelinable instructions.
Pipelines of embedded RISC processors have grown over time:
- ARM7 has a 3-stage pipeline.
- ARM9 has a 5-stage pipeline.
- ARM11 has an 8-stage pipeline.
[Figure: ARM11 pipeline [ARM05].]

8 RISC processor families
ARM:
- ARM7 has in-order execution and no memory management or branch prediction.
- ARM9 and ARM11 have out-of-order execution, memory management, and branch prediction.
MIPS:
- MIPS32 4K has a 5-stage pipeline.
- 4KE family has a DSP extension.
- 4KS is designed for security.
PowerPC:
- PowerPC 400 series includes several embedded processors.
- Motorola and IBM offer superscalar versions of the PowerPC.

9 Embedded DSP Processors
DSP processors feature:
- Deterministic execution times
- Fast multiply-accumulate instructions
- Multiple data accesses per cycle
- Specialized addressing modes
- Efficient support for loops and interrupts
- Efficient processing of "streaming" data
Embedded DSP processors are optimized to perform DSP algorithms such as speech coding, filtering, convolution, fast Fourier transforms, and discrete cosine transforms.
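To make the multiply-accumulate point concrete, here is a minimal sketch of one output sample of a FIR filter in Q15 fixed point. It is not taken from the slides: the function name `fir_q15`, the parameter names, and the Q15 format are assumptions chosen for illustration. The inner loop is the pattern a DSP's MAC instruction, zero-overhead loop hardware, and dual data accesses are meant to speed up.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sample of an ntaps-tap FIR filter in Q15 fixed point.
 * Hypothetical helper: coeff[] holds the taps, x[] the most recent
 * input samples, both in Q15 (value = integer / 32768). */
int16_t fir_q15(const int16_t *coeff, const int16_t *x, size_t ntaps)
{
    int64_t acc = 0; /* wide accumulator so intermediate sums cannot overflow */

    for (size_t k = 0; k < ntaps; k++) {
        /* Multiply-accumulate: one coefficient fetch, one sample fetch,
         * one multiply and one add per tap. */
        acc += (int32_t)coeff[k] * x[k];
    }

    /* Q15 * Q15 gives Q30; shift back to Q15. Right-shifting a negative
     * value is implementation-defined in C; this sketch assumes the
     * usual arithmetic shift. */
    acc >>= 15;

    /* Saturate to the 16-bit range instead of wrapping around. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With two taps of 0.5 (16384 in Q15) and inputs 1000 and 3000, the function returns their average, 2000.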

10 Example: TI C55x/C54x DSPs
- 40-bit arithmetic (32-bit values + 8 guard bits).
- Barrel shifter.
- 17 x 17 multiplier.
- Two address generators.
- Lots of special-purpose registers and addressing modes.
- Coprocessors for compute-intensive functions including pixel interpolation, motion estimation, and DCT/IDCT computations.
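The purpose of the guard bits can be sketched in C by emulating a 40-bit accumulator inside a 64-bit integer. This is an illustration only, not TI code: the names `acc40_add` and `acc40_fits_in_32` are hypothetical, and saturating at the 40-bit limits is an assumption about how overflow is handled. The point is that 8 extra bits let on the order of 2^8 full-scale 32-bit values be summed before the accumulator itself can overflow.

```c
#include <stdint.h>

/* Limits of a signed 40-bit accumulator (32-bit value + 8 guard bits). */
#define ACC40_MAX ((INT64_C(1) << 39) - 1)
#define ACC40_MIN (-(INT64_C(1) << 39))

/* Add a 32-bit value to an emulated 40-bit accumulator.
 * Saturation at the 40-bit limits is an assumption of this sketch. */
int64_t acc40_add(int64_t acc, int32_t value)
{
    int64_t sum = acc + value; /* safe in int64_t while acc is within 40 bits */
    if (sum > ACC40_MAX) return ACC40_MAX;
    if (sum < ACC40_MIN) return ACC40_MIN;
    return sum;
}

/* Returns 1 if the accumulator could be stored back as a 32-bit value
 * without losing information, 0 otherwise. */
int acc40_fits_in_32(int64_t acc)
{
    return acc >= INT32_MIN && acc <= INT32_MAX;
}
```

Adding `INT32_MAX` 256 times stays below `ACC40_MAX`, so the running sum is exact even though it no longer fits in 32 bits; only the final result needs scaling or saturation.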

11 TI C55x microarchitecture

12 Parallelism extraction
Static:
- Use compiler to analyze program.
- Simpler CPU.
- Can't depend on data values.
- VLIW
Dynamic:
- Use hardware to identify opportunities.
- More complex CPU.
- Can make use of data values.
- Superscalar

13 VLIW architectures
- Each very long instruction word (VLIW) performs multiple operations in parallel.
- Needs a good compiler that understands the architecture.
- Allows deterministic execution times.
- Code growth can be reduced by allowing:
  - Operations within an instruction to be performed sequentially
  - A given field to specify different types of operations
[Figure: two instruction word layouts. Fields of the first: Branch, Memory, Arithmetic, Logic, Vector. Fields of the second: Branch/Mem, Mem/Arith, Vector, Arith/Logic, Seq.]

14 Simple VLIW architecture
Large register file feeds multiple function units.
[Figure: a register file feeding an E box that contains ALU, load/store, and other function units. Example instruction word: Add r1,r2,r3; Sub r4,r5,r6; Ld r7,foo; St r8,baz; NOP]

15 Clustered VLIW architecture
Register file and function units are divided into clusters.
What are the advantages/disadvantages of having clusters in VLIW architectures?
[Figure: two clusters, each with its own execution units and register file, connected by a cluster bus.]

16 TI C62x/C67x DSPs
- VLIW with up to 8 instructions/cycle.
- 32 32-bit registers.
- Function units:
  - Two multipliers.
  - Six ALUs.
- All instructions execute conditionally.
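Conditional (predicated) execution lets a compiler replace short branches with straight-line code, which is easier to pack into VLIW issue slots. The C sketch below shows the idea as if-conversion of a clamp operation; it is not C6x code, and the function names are hypothetical. On a predicated machine the second form would typically become a compare that sets a predicate followed by a conditionally executed move.

```c
#include <stdint.h>

/* Branching form: control flow depends on the comparison. */
int32_t clamp_branch(int32_t x, int32_t limit)
{
    if (x > limit) {
        x = limit;
    }
    return x;
}

/* If-converted form: the comparison produces a predicate (0 or 1) and
 * the result is selected with data operations only, with no branch in
 * the source. */
int32_t clamp_predicated(int32_t x, int32_t limit)
{
    int32_t p = (x > limit);   /* predicate: 1 if the clamp applies */
    int32_t mask = -p;         /* all ones when p == 1, zero otherwise */
    return (limit & mask) | (x & ~mask);
}
```

Both functions return the same value for every input; the difference is only in how the decision is expressed.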

17 TI C6x data operations
- 8/16/32-bit arithmetic.
- 40-bit operations.
- Bit manipulation operations.
- C67x processors add floating-point arithmetic.

18 C6x block diagram
[Figure: block diagram with these blocks: Execute (Data path 1/Reg file 1, Data path 2/Reg file 2), Program RAM/cache (512K bits), Data RAM (512K bits), DMA, timers, Serial, JTAG, PLL, bus.]

19 Texas Instruments C62x
N. Seshan, "High VelociTI processing [Texas Instruments VLIW DSP architecture]," IEEE Signal Processing Magazine, vol. 15, no. 2, pp. 86-101, 117, 1998.

20 Emerging DSP Architectures
Parallelism at multiple levels:
- Multiple processors: system-on-a-chip designs
- Multiple simultaneous tasks: multithreaded processors
- Multiple instructions per cycle: very long instruction word (VLIW) architectures
- Multiple operations per instruction: single instruction multiple data (SIMD) instructions
Architecture/compiler pairs improve performance and help manage application complexity.

21 Superscalar processors
Instructions are dynamically scheduled.
- Dependencies are checked at run time in hardware.
Used to some extent in embedded processors.
- Embedded Pentium is two-issue in-order.
- Some PowerPCs are superscalar.
What advantages/disadvantages do VLIW processors have compared to superscalar processors?

22 SIMD and subword parallelism
Many special-purpose SIMD machines exist.
- All processors perform the same operation on different data.
Subword parallelism is widely used for video.
- ALU is divided into subwords for independent operations on small operands.
Vector processing is another form of SIMD processing.
These terms are often used interchangeably.

23 SIMD Instructions
- Recent multimedia processors commonly support single instruction multiple data (SIMD) instructions.
- The same operation is performed on multiple data operands using a single instruction.
- Exploits low precision and high data parallelism of multimedia applications.

    A3    | A2    | A1    | A0
  + B3    | B2    | B1    | B0
  = A3+B3 | A2+B2 | A1+B1 | A0+B0
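The lane-wise addition in the figure can be emulated in portable C on an ordinary 32-bit word, which shows what the hardware has to do: keep carries from crossing subword boundaries. This is a sketch under stated assumptions (four unsigned 8-bit lanes, wraparound on overflow, hypothetical function name `add_u8x4`), not the instruction of any particular processor.

```c
#include <stdint.h>

/* Add four packed unsigned 8-bit lanes (A3..A0 and B3..B0) held in
 * 32-bit words. Each lane wraps around independently on overflow. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u; /* top bit of every lane */

    /* Add the low 7 bits of each lane. The largest per-lane sum is
     * 0x7F + 0x7F = 0xFE, so no carry can leave a lane. */
    uint32_t low_sum = (a & ~high) + (b & ~high);

    /* Fold in the top bits with XOR, which adds them without
     * generating a carry into the neighbouring lane. */
    return low_sum ^ ((a ^ b) & high);
}
```

For example, `add_u8x4(0x01FF7F10, 0x01010120)` yields `0x02008030`: the lane holding `0xFF + 0x01` wraps to `0x00` without disturbing the lane above it.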

24 Operand characteristics in MediaBench

25 Dynamic behavior of loops in MediaBench
The loops of media applications in many cases are not very deep.
Path ratio = (instructions executed per iteration) / (total number of loop instructions).
What does the path ratio reveal?
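The path ratio definition above translates directly into a small helper. The function name and the idea of passing raw instruction counts are assumptions for illustration; the slide does not say how the counts were gathered.

```c
/* Path ratio as defined on the slide:
 * (instructions executed per iteration) / (total loop instructions).
 * Returns 0.0 for an empty loop body to avoid dividing by zero. */
double path_ratio(unsigned long executed_per_iteration,
                  unsigned long total_loop_instructions)
{
    if (total_loop_instructions == 0) {
        return 0.0;
    }
    return (double)executed_per_iteration / (double)total_loop_instructions;
}
```

A loop body of 40 instructions in which each iteration executes 30 of them has a path ratio of 0.75. A ratio of 1.0 means every instruction in the loop runs on every iteration, that is, the body is straight-line code; lower values indicate that conditionals inside the loop skip part of the body.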

26 TriMedia TM-1 characteristics
- Floating-point support
- Subword parallelism support
- VLIW
- Additional custom operations

27 TriMedia TM-1
[Figure: TM-1 block diagram with these blocks: memory interface, video in, video out, audio in, audio out, I2C, serial, timers, image coprocessor, VLD coprocessor, PCI, VLIW CPU.]

28 TM-1 VLIW CPU
[Figure: register file connected through a read/write crossbar to function units FU1 ... FU27; instruction word with slot 1, slot 2, slot 3, slot 4, slot 5, ...]

29 Multithreading
Low-level parallelism mechanism.
Interleaved multithreading (IMT) alternately fetches instructions from separate threads.
- Often used with VLIW and vector processors.
Simultaneous multithreading (SMT) fetches instructions from several threads on each cycle.
- Often used with superscalar processors.
What advantages/disadvantages does IMT have relative to SMT?

30 Dynamic voltage scaling (DVS)
- Power scales with V² while performance scales roughly as V.
- Reduce operating voltage, add parallel operating units to make up for lower clock speed.
- DVS doesn't work well in processors with high leakage power.
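The trade-off can be worked through with a small model. The sketch below assumes the common first-order expression for dynamic power, C·V²·f, with the clock frequency f of each unit scaling linearly with V as the slide's performance claim suggests. It ignores leakage, threshold-voltage effects, and the overhead of parallelizing the work, and the function names are hypothetical.

```c
/* Number of parallel units needed to keep throughput constant when the
 * supply voltage is scaled by v_scale (0 < v_scale <= 1), assuming the
 * throughput of each unit is proportional to V. */
double dvs_units_needed(double v_scale)
{
    return 1.0 / v_scale;
}

/* Total dynamic power relative to one unit at nominal voltage.
 * Each unit dissipates C * V^2 * f with f proportional to V, so the
 * per-unit power scales as v_scale cubed. */
double dvs_relative_power(double v_scale, double units)
{
    return units * v_scale * v_scale * v_scale;
}
```

Halving the voltage halves each unit's speed, so two units are needed to hold throughput; together they draw `dvs_relative_power(0.5, 2.0)`, which is 0.25 of the original dynamic power. At equal throughput the total therefore scales as V², matching the slide, as long as leakage is negligible.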

31 Dynamic voltage and frequency scaling (DVFS)
- Scale both voltage and clock frequency.
- Can use control algorithms to match performance to the application's needs and reduce power.
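One very simple control policy is to pick the slowest operating point that still finishes the pending work by its deadline. The sketch below is an illustration of that idea only: the `op_point` structure, the function name, and the assumption that the table is sorted by increasing frequency are all hypothetical and do not come from the slides or from any particular processor.

```c
#include <stddef.h>

/* One voltage/frequency operating point (hypothetical layout). */
struct op_point {
    unsigned freq_mhz;   /* clock frequency in MHz */
    unsigned millivolts; /* supply voltage required at that frequency */
};

/* Choose the lowest-frequency operating point that can execute
 * `cycles` clock cycles within `deadline_us` microseconds.
 * `table` must be sorted by increasing frequency. If no point is fast
 * enough, the fastest one is returned; an empty table yields NULL. */
const struct op_point *pick_op_point(const struct op_point *table, size_t n,
                                     unsigned long cycles,
                                     unsigned long deadline_us)
{
    for (size_t i = 0; i < n; i++) {
        /* A clock of f MHz delivers f cycles per microsecond. */
        unsigned long long capacity =
            (unsigned long long)table[i].freq_mhz * deadline_us;
        if (capacity >= cycles) {
            return &table[i];
        }
    }
    return n ? &table[n - 1] : NULL;
}
```

With operating points at 100, 200, and 400 MHz, a task of 1,500,000 cycles due in 10,000 microseconds needs 150 MHz, so the 200 MHz point is selected and the processor can run at that point's lower voltage instead of the maximum.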

32 Razor architecture
- Razor runs the clock faster than the worst case allows.
- Uses a specialized latch to detect errors.
- Recovers only on errors, gaining average-case performance.
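Why occasional recovery can still be a net win follows from a simple expected-value model. The sketch below is an assumption-laden illustration, not the Razor design's published analysis: the function name, the per-operation error rate, and the recovery penalty measured in extra cycles are hypothetical parameters.

```c
/* Speedup of an error-tolerant fast clock over a worst-case clock.
 * t_worst:        clock period that is safe for the worst case
 * t_fast:         shorter clock period actually used
 * error_rate:     fraction of operations that trigger a timing error
 * penalty_cycles: extra cycles spent recovering from each error */
double razor_speedup(double t_worst, double t_fast,
                     double error_rate, double penalty_cycles)
{
    /* Average time per operation at the fast clock, including the
     * expected recovery cost. */
    double avg_time = t_fast * (1.0 + error_rate * penalty_cycles);
    return t_worst / avg_time;
}
```

With a fast period of 0.8 times the worst-case period and no errors, the model gives a speedup of 1.25. If one operation in four needed a one-cycle recovery, the average time would rise back to the worst-case period and the gain would vanish, so the approach pays off only while errors stay rare.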

