Published by Derrick John McCoy. Modified over 9 years ago.
2
Motivation
Mobile embedded systems are present in:
  –Cell phones
  –PDAs
  –MP3 players
  –GPS units
3
Mobile Computing Design Considerations
Low power
Real-time data processing
Small size
Low cost
Quick time to market
4
Metric Introduction
Processor specialization
Instruction set
Interconnect
Memory specialization
Functional & data path units
Power specialization
5
Metric: Processor Specialization
Central controlling point of embedded system
Examples:
  –VLIW to perform multiple instructions in parallel
  –RISC architecture
6
Metric: Instruction Set Specialization
Introduction of new instructions to extract optimal performance from the processor
Examples:
  –Multiply-accumulate
  –Vector operations
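To make the two example instruction types concrete, here is a minimal Python sketch. It only models the arithmetic: a multiply-accumulate step computes acc = acc + x * c, and a vector operation applies one operation across all elements. The function names are hypothetical and do not come from any of the processors discussed.

```python
def dot_product_mac(samples, coeffs):
    """Dot product expressed as a chain of multiply-accumulate steps."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c  # one multiply-accumulate: acc = acc + x * c
    return acc


def vector_add(a, b):
    """Element-wise add; a vector instruction would do this in one issue."""
    return [x + y for x, y in zip(a, b)]
```

On hardware with a dedicated multiply-accumulate instruction, each loop iteration above maps to a single instruction instead of a separate multiply and add.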
7
Metric: Interconnect
Provides means for different modules to communicate
Optimizations can lead to reduced complexity, cost, and power consumption
8
Metric: Memory Specialization
Specialization is achieved by optimizing the number and size of memory banks and the number and size of access ports
Optimizations can improve performance, power consumption, and chip area
9
Metric: Functional & Data Path Units
Functional units are often specialized hardware units implementing a frequently used software algorithm
Examples:
  –DSP co-processors, interrupt priority co-processors, memory access modules, and timer modules
10
Metric: Power Specialization
Major concern in mobile systems
Kept under control by:
  –Using low voltage
  –Using a slow clock speed
  –Using custom circuit solutions
11
Architectures to be discussed
M*CORE
D30V/MPEG
SuperENC
1.3-GOPS Parallel DSP
IA-32 w/ Enhanced Data Streaming
12
M*CORE
Low power embedded applications
Wireless mobile devices
Cellular phones
13
M*CORE Processor Specialization
Simple RISC architecture
4 stage pipeline
16-bit instruction word length
Compiler designed in parallel with architecture
Barrel shifter built into ALU
14
M*CORE Instruction Set Specialization
Multimedia instructions
  –Multiple data transfers from memory to register and register to memory
  –Fast register saves
FF1 – Find First 1
  –Finding highest priority interrupt in hardware
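The following Python sketch shows how a find-first-one operation can pick the highest priority pending interrupt without a software loop. It is an illustration only: the 32-bit register width, the scan direction (from the most significant bit), and the rule that a higher bit number means higher priority are all assumptions, and the function names are hypothetical.

```python
def ff1(word, width=32):
    """Count the zero bits above the most significant 1 bit.

    Returns `width` when no bit is set. The scan direction and the
    register width are assumptions made for this sketch.
    """
    word &= (1 << width) - 1  # keep only the low `width` bits
    return width - word.bit_length()


def highest_priority_pending(pending, width=32):
    """Return the bit number of the highest set bit, or None if nothing is pending.

    Assumes a higher bit number means a higher priority.
    """
    offset = ff1(pending, width)
    if offset == width:
        return None
    return width - 1 - offset
```

With pending = 0b10100 (interrupts 2 and 4 raised), highest_priority_pending returns 4 after a single ff1 call, which is the work the hardware instruction replaces.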
15
M*CORE Interconnect Specialization
16-bit data bus to match 16-bit word length
  –Reduces memory bandwidth, complexity, chip area layout, and power consumption
MDI – MCU-to-DSP Interface
  –Dual access memory messaging unit
General I/O bus for peripherals
16
M*CORE Memory Specialization
Alternate register bank
  –Fast register saves for context switches
17
M*CORE Functional & Data Path Units
32 channel programmable interrupt controller
Protocol timer
DSP core
18
M*CORE Power Specialization
1.8 Volts
Uses 0.5 Watts
Power aware pipeline
Programmable power states
  –Stop
  –Wait
  –Doze
  –Normal
19
M*CORE Summary
Low power and programmable power states make it ideal for mobile devices
Interface to built-in DSP core makes it ideal for cell phone applications
20
650 MHz IA-32
Microprocessor designed to accelerate data-streaming applications
Three-dimensional graphics
Video encode/decode
21
650 MHz IA-32 Processor Specialization
IA-32 architecture
70 new instructions
SIMD floating point data type
Improvements in regard to circuit implementation
22
650 MHz IA-32 Instruction Set Specialization
70 new instructions
  –SIMD FP operations
  –Control for new 8-entry register file
  –Multimedia extension: 12 new integer instructions
23
650 MHz IA-32 Interconnect Specialization
Front Side Bus of 66, 100, 133 MHz
Back Side Bus
  –Half the clock frequency for mobile and desktop applications
  –Full clock frequency for server/workstation applications
24
650 MHz IA-32 Memory Specialization
3 new non-temporal store instructions with write combining buffers
  –Burst write protocol
  –Write data throughput of 1.066 Gbytes/sec on a 133 MHz bus
4 new data pre-fetch instructions
  –Overlap memory fetches with computation, which reduces cache miss penalties
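The quoted write throughput can be checked with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data bus, one full-width transfer per bus clock, and a nominal clock of 133.33 MHz; none of these three values is stated on the slide, and the function name is hypothetical.

```python
def peak_bus_throughput(bus_hz, bus_width_bits=64):
    """Peak bytes per second if one full-width transfer completes every bus clock.

    The 64-bit default width is an assumption, not a figure from the slide.
    """
    return bus_hz * bus_width_bits / 8


# Assumed nominal clock for the "133 MHz" bus.
throughput = peak_bus_throughput(133.33e6)
print(throughput / 1e9, "Gbytes/sec")
```

Under those assumptions the result is 133.33 MHz × 8 bytes, which agrees with the 1.066 Gbytes/sec figure to within rounding of the clock frequency.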
25
650 MHz IA-32 Functional Specialization
8 entry register file
  –Reduces register starvation for SIMD unit
  –128 bits wide: four independent single-precision elements packed in parallel
Dedicated table-based lookup unit for reciprocal operations
  –Completes reciprocal operations in one clock cycle
  –Error of 1.5 * 2^-12
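As a rough illustration of a table-based reciprocal, the Python sketch below splits each input into mantissa and exponent, looks up the reciprocal of the mantissa in a precomputed table, and rescales. The table size (2^11 entries), the midpoint indexing scheme, and all names are hypothetical choices made for this sketch; the slides do not describe how the actual lookup unit is built. Python floats are double precision, so this models the idea, not the single-precision hardware format.

```python
import math

TABLE_BITS = 11  # hypothetical table size, not taken from the slides
TABLE_SIZE = 1 << TABLE_BITS
# Entry i holds the reciprocal of the midpoint of the i-th slice of [0.5, 1).
RECIP_TABLE = [1.0 / (0.5 + (i + 0.5) / (2 * TABLE_SIZE)) for i in range(TABLE_SIZE)]


def approx_reciprocal(x):
    """Approximate 1/x for positive x with a single table lookup."""
    if x <= 0.0:
        raise ValueError("this sketch only handles positive inputs")
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, mantissa in [0.5, 1)
    index = int((mantissa - 0.5) * 2 * TABLE_SIZE)
    return math.ldexp(RECIP_TABLE[index], -exponent)  # (1/mantissa) * 2**-exponent


def packed_reciprocal(register):
    """Apply the lookup to each of four packed elements independently."""
    if len(register) != 4:
        raise ValueError("a packed register holds exactly four elements")
    return [approx_reciprocal(v) for v in register]
```

With this table size the relative error of each result stays below 2^-12, which is inside the 1.5 * 2^-12 bound quoted above; a smaller table would trade accuracy for area.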
26
650 MHz IA-32 Low Power Usage
1.4 V to 2.2 V at 650 MHz, close to room temperature
27
650 MHz IA-32 Performance
1.5X to 2.0X performance boost for 3-D transform and lighting kernels
Real-time MPEG-2 video/audio encoding at 30 frames per second
  –Achieved through improvement to SIMD unit, at a cost of only a 2% increase in unit area
28
D30V/MPEG
Multimedia applications
  –Decoding MPEG-2
29
D30V/MPEG Processor Specialization
2 way VLIW
Dual issue RISC pipeline
2 way assigned SIMD module
Pipeline has ability to re-route data through execution path
30
D30V/MPEG Instruction Set Specialization
Saturate and Add
DSP instructions built in
  –Modulo addressing
  –Block repeat
  –Multiply accumulate
Half word instructions
  –Effectively double number of usable registers
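A saturating add clamps the result to the representable range instead of wrapping around, which avoids visible artifacts when pixel or audio sums overflow. The Python sketch below assumes signed 16-bit operands; the operand width and the function name are illustrative assumptions, not details from the slides.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def saturating_add16(a, b):
    """Add two signed 16-bit values, clamping instead of wrapping on overflow.

    The 16-bit width is an assumption made for this sketch.
    """
    total = a + b
    return max(INT16_MIN, min(INT16_MAX, total))
```

For example, saturating_add16(30000, 10000) yields 32767, where a wrapping 16-bit add would produce a large negative value.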
31
D30V/MPEG Interconnect Specialization
Chip layout specialized for decoding streaming MPEG data
32
D30V/MPEG Memory Specialization
32 Kbyte data RAM
64 Kbyte instruction RAM
4 Kbyte RAM for Variable Length Encoder/Decoder (VLC/VLD) tables
Special Registers
  –MOD_S & MOD_E for modulo addressing
  –RPT_S, RPT_E, and RPT_C for looping
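Modulo addressing with a start and end register can be pictured as a pointer that wraps back to the start of a buffer once it steps past the end. The Python sketch below models that behavior. It assumes MOD_S and MOD_E hold the first and last addresses of the buffer (end inclusive); the slides do not define the exact semantics, and the function name is hypothetical.

```python
def next_address(addr, step, mod_s, mod_e):
    """Advance addr by step, wrapping inside the buffer [mod_s, mod_e].

    Treating mod_e as an inclusive end address is an assumption.
    """
    size = mod_e - mod_s + 1
    return mod_s + (addr - mod_s + step) % size
```

With an 8-entry buffer from 0x100 to 0x107, stepping by 1 from 0x107 returns 0x100, so a filter loop can walk a circular delay line without any compare-and-reset instructions.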
33
D30V/MPEG Functional Specialization
VLC/VLD Variable Length Encoding/Decoding units
34
D30V/MPEG Low Power Usage
2.5 Volts at 243 MHz
Uses 2.0 Watts
35
D30V/MPEG Performance
12% speedup from inter-pipe bypasses
Special VLC/VLD functional blocks speed up MPEG decoding
36
1.3 GOPS Parallel DSP
Achieve real-time image processing capability
Employ data parallelism to achieve goal
  –High-level algorithms, non-parallelizable
    Arithmetic encoding
  –Medium-level algorithms, moderately parallelizable
    Contour tracking of binary images
  –Low-level algorithms, highly parallelizable
    Filters and transforms
    Data-independent control and data flow
    80% of MPEG-2, 60% of MPEG-4
37
1.3 GOPS Parallel DSP Processor Specialization
Central control unit
  –RISC based
  –Controls multiple SIMD units
38
1.3 GOPS Parallel DSP Instruction Set Specialization
VLIW instructions
  –3 instructions per issue
    1 load/store of 16-bit data
    2 arithmetic operations on 16/32-bit data
39
1.3 GOPS Parallel DSP Interconnect Specialization
DMA/MCU (Direct Memory Access/Memory Control Unit)
  –Handles cache misses
  –Performs prefetch operations from matrix memory
  –Interfaces with external 64-bit data bus and 32-bit address bus for SRAM and DRAM modules
40
1.3 GOPS Parallel DSP Memory Specialization
Memory tailored to image processing needs
  –Provides parallel high bandwidth access to shared data with matrix shaped access patterns
Individual Cache Memory
  –Services irregular memory requests
41
1.3 GOPS Parallel DSP Functional Specialization
Multiple SIMD units
  –Currently 4 units for prototype
  –16 units planned for future versions
  –SIMD approach has been extended with ASIMD, autonomous instruction selection capability
    Improves handling of conditional branches
42
1.3 GOPS Parallel DSP Low Power Usage
3.3 Volts
Uses 650 milliwatts
43
1.3 GOPS Summary
Sustained performance of 380 MIPS
  –Around 90% utilization
44
SuperENC
MPEG-2 video encoder
45
SuperENC Processor Specialization
Software implemented RISC architecture
  –5 stage pipeline
  –81 MHz, 32-bit wide data/instruction path
Software implemented SIMD/SDIF (SDRAM Interface) modules
46
SuperENC Instruction Set Specialization
There is no instruction set specialization mentioned in the paper.
47
SuperENC Interconnect Specialization
SDIF
  –All memory access goes through SDIF
  –Relays data without going to external memory
    Reduces memory bandwidth and power consumption
48
SuperENC Memory Specialization
Uses external RAM
  –Can access two 16 Mbit SDRAMs or one 64 Mbit SDRAM
49
SuperENC Functional Specialization
MPEG algorithm is broken up into hardware functional blocks
  –Examples:
    DCT, Discrete Cosine Transform
    IDCT, Inverse Discrete Cosine Transform
    ME, Motion Estimation
    MC, Motion Compensation
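To show what the DCT and IDCT blocks compute, here is a plain Python sketch of the one-dimensional orthonormal DCT-II and its inverse. This is the textbook definition evaluated directly; it is not how SuperENC implements the blocks in hardware, and the function names are illustrative. MPEG-2 applies the transform to 8×8 blocks, which amounts to running a 1-D transform like this over rows and then columns.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(samples):
    """1-D orthonormal DCT-II, computed directly from the definition."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct(): reconstructs the original samples."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]
```

A constant input puts all of its energy in coefficient 0, which is why smooth image regions compress well after the transform. The direct form costs O(n²) operations per block, one reason encoders move it into a dedicated functional unit.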
50
SuperENC Low Power Usage
2.5 Volts internal
3.3 Volts I/O
1.5 Watts
51
SuperENC Summary
SuperENC makes use of many hardware functional blocks to implement the MPEG encoding algorithm
52
Metric Results
D30V/MPEG highest rated