By : Majid Namaki Custom Implementation of DSP Systems, Spring 2010 Instructor: Dr S. M. Fakhraei May 2010 1 Nathan J. Ickes, “A Micropower DSP for Sensor.


1 By: Majid Namaki. Custom Implementation of DSP Systems, Spring 2010. Instructor: Dr S. M. Fakhraei. May 2010. [1] Nathan J. Ickes, “A Micropower DSP for Sensor Applications,” PhD thesis, MIT, 2008.

2 Introduction: Why low power? Heat dissipation limits and battery lifetime concerns. Wireless microsensor networks and implanted medical devices are two examples of such applications.

3 Microsensor Applications: Microsensor networks may consist of many (perhaps hundreds or thousands of) miniature sensor nodes scattered throughout an area of interest and linked by a wireless network. The network of sensors collaborates as a whole, combining measurements made by each individual node and delivering high-quality observations to a central base station. The large number of nodes in a microsensor network => high-resolution, multi-dimensional observations and fault tolerance superior to more traditional sensing systems.

4 Microsensor Applications (cont.): Applications: inventory tracking, environmental monitoring, machine-mounted sensing, medical monitoring, and building climate control. Primary advantage of microsensor networks: the spatial diversity of the data collected by the network as a whole. Alternatively, the sensor network may be used to imitate a single very large sensor, one that might be impractically large to build or deploy.

5 Microsensor Applications (cont.): An extremely small, yet long-lived sensor => power efficiency is the central issue in the design of microsensors. Self-powered node: scavenges energy from ambient solar, thermal, or mechanical sources, but such a node is physically large and limited to outdoor applications.

6 Common Characteristics: Low duty cycle: nodes can be idle over 99% of the time => standby power must be minimized. Event driven: typical events handled by nodes include sending or receiving radio data and collecting measurement data => events must be handled quickly and efficiently to maximize node lifetime.
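
The link between a low duty cycle and standby power can be made concrete with a time-weighted average. The sketch below is only illustrative: the function name and the power figures (100 uW active, 1 uW standby) are assumptions chosen for the example, not values from the slides.

```python
def average_power_w(active_power_w, standby_power_w, duty_cycle):
    """Time-averaged power of a node that is active a fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_power_w + (1.0 - duty_cycle) * standby_power_w


# Hypothetical figures, for illustration only.
ACTIVE_W = 100e-6   # assumed power while handling an event
STANDBY_W = 1e-6    # assumed power while idle
DUTY = 0.01         # node is active 1% of the time (idle 99%)

avg = average_power_w(ACTIVE_W, STANDBY_W, DUTY)
standby_share = (1.0 - DUTY) * STANDBY_W / avg
print(f"average power: {avg * 1e6:.2f} uW")
print(f"share of energy spent in standby: {standby_share:.0%}")
```

With these assumed numbers, standby accounts for roughly half of the total energy even though it draws a hundredth of the active power, which is why minimizing standby power matters at such duty cycles.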

7 Common Characteristics (cont.): Localized data processing: preliminary signal processing and data analysis occur within the network. E.g., to save energy, nearby nodes might aggregate their data, reducing the amount of data that must be sent to the network base station => this increases the peak processing capability required on each node. Unpredictable performance requirements: performance demands on any given node are variable and unpredictable before deployment => variations in the node's required radio transmission power and in the amount and type of signal processing required.

8 Acoustic Tracking Application

9 The µAMPS DSP: MIT µAMPS (micro, adaptive, multi-domain, power-aware sensors) project. µAMPS microsensors are designed for acoustic tracking and other applications requiring sensor sampling rates of 1-100 kS/s and significant post-acquisition signal processing, such as filtering, compression, or spectral analysis. A 4 MIPS, 10 pJ-per-instruction DSP is designed to form the core of a µAMPS sensor node. The DSP is implemented in 90 nm low-power CMOS and contains 6.3 million transistors (6 million of which are in the on-chip memory).
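
The two headline figures can be combined into a rough power estimate. This is a back-of-the-envelope sketch that assumes both figures apply at the same operating point and that the energy per instruction covers the whole core; the slides do not state either, and the function name is made up for the example.

```python
def core_power_w(instructions_per_second, energy_per_instruction_j):
    """Power = instruction rate x energy per instruction."""
    return instructions_per_second * energy_per_instruction_j


# Figures quoted on the slide: 4 MIPS and 10 pJ per instruction.
power = core_power_w(4e6, 10e-12)
print(f"estimated active power: {power * 1e6:.0f} uW")
```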

10 µAMPS Sensor Node Architecture: The node consists of three primary components: the DSP, a custom 12-bit 100 kS/s ADC, and a commercial ZigBee radio (the ChipCon CC2420).

11 DSP Block Diagram

12 Performance

13 Main Contributions: memory power optimization; instruction cache design; modeling of power gating; hardware accelerators.

14 Miniature Instruction Cache: The cache is direct-mapped and organized as sixteen lines of four words. The cache memory is implemented using flip-flops (rather than SRAM), allowing it to operate at the lower logic power supply voltage. The tag comparison and valid-flag logic is asynchronous, so that in the event of a cache miss, a main memory access can be initiated on the same cycle. An instruction can therefore be fetched on every clock cycle, regardless of whether a cache hit or miss occurs.
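
A direct-mapped organization of sixteen four-word lines implies a fixed split of each instruction address into tag, line index, and word offset. The sketch below models only that lookup. It assumes word addressing and that a miss fills the whole line, neither of which is stated on the slide, and the class and function names are invented for the example.

```python
LINE_WORDS = 4   # words per cache line (from the slide)
NUM_LINES = 16   # number of cache lines (from the slide)


def split_address(word_addr):
    """Split a word address into (tag, line index, word offset)."""
    offset = word_addr % LINE_WORDS
    index = (word_addr // LINE_WORDS) % NUM_LINES
    tag = word_addr // (LINE_WORDS * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    """Tag/valid bookkeeping only; the cached instruction words are not modeled."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES

    def access(self, word_addr):
        """Return True on a hit; on a miss, claim the line (assumed full-line fill)."""
        tag, index, _ = split_address(word_addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


# An 8-instruction loop: the first pass misses once per line, later passes always hit.
cache = DirectMappedCache()
first_pass = [cache.access(a) for a in range(8)]
second_pass = [cache.access(a) for a in range(8)]
print(first_pass)
print(second_pass)
```

The whole cache holds 64 words, so any loop that fits in 64 consecutive words runs entirely from the flip-flop array after its first iteration, while two addresses 64 words apart compete for the same line.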

15 Power Gating: Clock gating => reduces dynamic power consumption in idle logic. Power gating => reduces idle-mode leakage power, particularly for deep-sleep states and modern sub-100 nm process technologies. Power gating is complicated: power cannot be turned on and off on a cycle-by-cycle basis, as it can with clock gating. Some planning ahead is required before powering off a logic block, to ensure that power can be restored before the logic is needed again. Two power-switch options are listed: I. a higher-threshold-voltage device for the power switch; II. boosting the gate voltage of the power switch.
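
The need to plan ahead can be illustrated with a first-order break-even calculation: gating a block only pays off if the leakage energy saved during the idle period exceeds the energy spent switching the block off and on again. This is a generic textbook-style sketch, not the power-gating model developed in the thesis; all names and numbers are assumptions.

```python
def break_even_idle_s(transition_energy_j, leakage_power_w, gated_leakage_power_w=0.0):
    """Idle time at which the leakage saved equals the off/on transition energy."""
    saved_power_w = leakage_power_w - gated_leakage_power_w
    if saved_power_w <= 0:
        raise ValueError("gating must reduce leakage power")
    return transition_energy_j / saved_power_w


def should_power_gate(expected_idle_s, transition_energy_j, leakage_power_w,
                      gated_leakage_power_w=0.0):
    """Gate only if the block is expected to stay idle longer than break-even."""
    threshold = break_even_idle_s(transition_energy_j, leakage_power_w,
                                  gated_leakage_power_w)
    return expected_idle_s > threshold


# Hypothetical block: 1 nJ to power down and back up, 1 uW of leakage when left on.
print(break_even_idle_s(1e-9, 1e-6))        # break-even idle time in seconds
print(should_power_gate(10e-3, 1e-9, 1e-6)) # long idle period
print(should_power_gate(0.1e-3, 1e-9, 1e-6))  # short idle period
```

A real decision would also have to account for the wake-up latency, since the block must be powered again before it is next needed.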

16 Power Gating (cont.): There are 12 independent power domains: nine memory banks, the FFT and FIR accelerator cores, and the CPU.

17 µAMPS CPU Architecture: The primary design strategy was to minimize the complexity of the control logic in the processor => all instructions execute in one clock cycle (CPI = 1) and all instructions have the same 16-bit length. A second design goal was to minimize the number of data memory accesses. The processor contains three functional units: an ALU implementing add, subtract, and bitwise logical operations (AND, OR, XOR, NOT), a barrel shifter, and a multiply-accumulate (MAC) unit. The MAC consists of a 16 x 16-bit single-cycle multiplier and a 48-bit accumulator register. The accumulator is readable and writable as special-purpose registers r8, r9, and r10. The pipeline has 3 stages (fetch, execute, and write back).
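
The arithmetic of a 16 x 16-bit multiply feeding a 48-bit accumulator can be sketched as follows. The slide does not say whether operands are signed, how overflow is handled, or which of r8, r9, and r10 holds which part of the accumulator, so this sketch assumes signed two's-complement operands, wrap-around on overflow, and an arbitrary low-to-high word order. The function names are illustrative.

```python
ACC_BITS = 48
ACC_MASK = (1 << ACC_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(acc, a, b):
    """acc + a*b with 16-bit signed operands and a 48-bit wrapping accumulator."""
    product = to_signed(a, 16) * to_signed(b, 16)
    return to_signed(acc + product, ACC_BITS)


def acc_words(acc):
    """Split the 48-bit accumulator into three 16-bit words, low word first (assumed order)."""
    raw = acc & ACC_MASK
    return [(raw >> shift) & 0xFFFF for shift in (0, 16, 32)]


# Dot product of two short vectors, one multiply-accumulate per element.
acc = 0
for x, h in zip([1000, -2000, 3000], [7, 8, 9]):
    acc = mac(acc, x, h)
print(acc, [hex(w) for w in acc_words(acc)])
```

Under these assumptions the 16 extra accumulator bits beyond the 32-bit product give enough headroom to sum 65,536 worst-case products without overflow, which avoids intermediate scaling or spills to data memory in long filters.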

18 Accelerator Cores: The µAMPS DSP, being designed for acoustic sensing applications, incorporates accelerators for both FIR filtering and FFTs. The accelerators are implemented as memory-mapped devices. Energy savings obtained by using a hardware accelerator: (1) intrinsic savings in performing the actual computation (e.g., reduced cycle count and control logic overhead); (2) extrinsic savings from reduced utilization of global resources (e.g., reducing the number of main memory accesses).

19 FIR Accelerator: The FIR filter accelerator implements symmetric filters of up to 16 taps. The accelerator consists of a register file holding up to eight 16-bit tap coefficients, a 16×16 circular buffer for holding the input samples, a single multiply-accumulate unit, an adder/subtracter, and a control state machine. Due to their small size, the sample and coefficient memories are implemented using flip-flops, rather than SRAM macros.
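
Storing only eight coefficients for a 16-tap filter works because a symmetric filter uses each coefficient twice. A common way to exploit this, and a plausible reading of why an adder/subtracter sits next to the single MAC, is to combine the two samples that share a coefficient before multiplying. The sketch below shows that pre-add structure as an assumption about the datapath, not a description taken from the thesis; function names are illustrative.

```python
def symmetric_fir_output(window, half_coeffs, antisymmetric=False):
    """One output sample of a symmetric (or antisymmetric) FIR filter.

    window      -- the most recent 2*len(half_coeffs) samples, newest first
    half_coeffs -- the first half of the tap coefficients
    """
    n = len(half_coeffs)
    if len(window) != 2 * n:
        raise ValueError("window must hold exactly twice as many samples as coefficients")
    acc = 0
    for k, coeff in enumerate(half_coeffs):
        newer, older = window[k], window[2 * n - 1 - k]
        # Pre-add (or pre-subtract) the pair of samples that share this coefficient.
        paired = newer - older if antisymmetric else newer + older
        acc += coeff * paired  # one multiply-accumulate per coefficient pair
    return acc


def direct_fir_output(window, coeffs):
    """Reference: one multiply-accumulate per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


half = [1, 2, 3, 4, 5, 6, 7, 8]     # eight stored coefficients (made-up values)
full = half + half[::-1]            # the equivalent 16-tap symmetric filter
samples = list(range(16))           # made-up input window
print(symmetric_fir_output(samples, half), direct_fir_output(samples, full))
```

The paired form needs eight multiplications per output sample instead of sixteen, and the subtract path covers antisymmetric filters with the same hardware.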

20 FFT Accelerator: The FFT core computes transforms on 128-, 256-, 512-, or 1024-point real-valued inputs, with 16-bit precision. The accelerator performs a complete butterfly in one clock cycle, compared to ~95 cycles per butterfly required for a software implementation. The local memory for the accelerator is split into four banks, based on the MSB and parity of each address. Each butterfly computation operates on values from two different banks, allowing both values to be fetched at the same time. The butterfly operations are specifically ordered so that sequential butterflies involve disjoint sets of memory banks. This allows processing one butterfly per clock cycle, with the results from one butterfly being written back to two memory banks while the inputs to the next butterfly are read from the other two banks. A small number of hazards are unavoidable and result in stalling the datapath for one cycle.
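
The claim that every butterfly reads from two different banks can be checked with a small model. The sketch assumes that "parity" means the XOR of all address bits and that butterflies follow the usual in-place radix-2 pattern, where the two operand addresses differ in exactly one bit. The actual core's addressing and its butterfly ordering are not given on the slide and are not reproduced here; the function names are invented for the example.

```python
def bank_of(addr, addr_bits):
    """Bank number 0..3 built from the address MSB and the parity of all address bits."""
    msb = (addr >> (addr_bits - 1)) & 1
    parity = bin(addr).count("1") & 1
    return (msb << 1) | parity


def butterfly_pairs(n):
    """Yield the operand address pairs of every stage of an in-place radix-2 FFT."""
    addr_bits = n.bit_length() - 1
    for stage in range(addr_bits):
        span = 1 << stage
        for addr in range(n):
            if addr & span == 0:
                yield addr, addr | span


def count_bank_conflicts(n):
    """Number of butterflies whose two operands fall in the same bank."""
    addr_bits = n.bit_length() - 1
    return sum(1 for a, b in butterfly_pairs(n)
               if bank_of(a, addr_bits) == bank_of(b, addr_bits))


for size in (128, 256, 512, 1024):
    print(size, count_bank_conflicts(size))
```

Because the two operands of a radix-2 butterfly differ in a single address bit, their parities always differ, so under these assumptions no butterfly ever needs two reads from the same bank. The MSB then splits each parity class in two, which gives the four banks needed to overlap one butterfly's write-back with the next butterfly's reads.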

21 Comparison of the µAMPS DSP with other micropower processors
