1
Design Exploration of a Human-Machine Interface (HMI) Application
Francis Li, Sam Madden
2
The Application
Data glove interface
– Wired, bulky
SmartDust scenario
– A mote on each fingertip
Investigate implementations
Explore design alternatives
3
Proof-of-Concept Prototype
By SmartDust group
– Atmel AVR microprocessor
– RFM TR1000 radio
– 6 accelerometers
– Host PC performs processing
Analysis
– Power: 45 mW measured
– Continuous operation of processor, accelerometers, communication with host
4
Application Analysis
Processing (on PC)
– Do 20 times per second, for each accelerometer:
    Read in X and Y samples (10 bits each)
    Compute rolling average to smooth input data
    Convert averages to polar coordinates
– Dominant cost: sqrt, acos, atan
– Secondary cost: floating-point operations
– Periodically, calculate gesture via simple template matching (static hand positions)
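The per-sample processing listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the window size, the 512 zero-offset, the class and function names, and the squared-error metric are all assumptions, and `math.hypot`/`math.atan2` stand in for the sqrt/acos/atan calls the slide mentions. Angle wrap-around is ignored for brevity.

```python
import math
from collections import deque

# Assumed values: the slides give neither a window size nor a zero-g offset.
WINDOW = 8
MIDPOINT = 512.0  # centre of a 10-bit reading (0..1023)


class AxisSmoother:
    """Rolling average over the most recent samples of one accelerometer axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)

    def update(self, raw):
        # Old samples fall out automatically once the deque is full.
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(x_avg, y_avg, midpoint=MIDPOINT):
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    x = x_avg - midpoint
    y = y_avg - midpoint
    return math.hypot(x, y), math.atan2(y, x)


def match_gesture(angles, templates):
    """Return the name of the template with the smallest squared angle error."""
    def error(name):
        return sum((a - t) ** 2 for a, t in zip(angles, templates[name]))
    return min(templates, key=error)
```

In use, one `AxisSmoother` per axis would be updated 20 times per second, and `match_gesture` would be called periodically with one angle per accelerometer.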
5
Application Analysis (cont.)
Communication (from Atmel to PC)
– 20 samples/sec × 6 accelerometers × 4 bytes/sample = 480 bytes/sec
– 115.6 kb/sec RF link
– Radio = 12 mA @ 3 V when transmitting → 1.2 mW for radio alone
    Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
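The slide does not show how the 1.2 mW figure is derived. One reading that reproduces it, offered here as an assumption, is that the radio draws its full transmit power only for the fraction of each second needed to send the payload:

```python
# Figures from the slide.
SAMPLE_RATE_HZ = 20
NUM_ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115_600      # "115.6 kb/sec" as written on the slide
TX_CURRENT_A = 0.012
SUPPLY_V = 3.0

payload_bytes_per_sec = SAMPLE_RATE_HZ * NUM_ACCELEROMETERS * BYTES_PER_SAMPLE
payload_bits_per_sec = payload_bytes_per_sec * 8

# Power while the transmitter is on.
tx_power_mw = TX_CURRENT_A * SUPPLY_V * 1000

# Assumption: the radio is only on while payload bits are being sent.
duty_cycle = payload_bits_per_sec / LINK_BITS_PER_SEC
avg_radio_power_mw = tx_power_mw * duty_cycle

print(payload_bytes_per_sec, round(duty_cycle, 4), round(avg_radio_power_mw, 2))
```

Under this reading the transmitter is active about 3.3% of the time, which is why the ideal radio cost is so far below the measured system power.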
12
Optimization Process
Match Application to HW
– Local computation to reduce communication
– Floating point → Fixed point
Match Hardware to Application
– Distributed vs. Centralized
– TI vs. Atmel
– DSP
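As an illustration of the floating-point to fixed-point step, the sketch below shows the general technique in Python. The Q-format (8 fractional bits), the 8-sample window, and all function names are assumptions; the slides do not say which representation was used, and on the microcontroller this would be integer arithmetic in C.

```python
FRAC_BITS = 8                 # assumed Q-format: 8 fractional bits
ONE = 1 << FRAC_BITS          # fixed-point representation of 1.0
WINDOW_SHIFT = 3              # assumed window of 2**3 = 8 samples


def to_fixed(value):
    """Convert a float to fixed point (rounded to the nearest step)."""
    return int(round(value * ONE))


def to_float(fixed):
    """Convert a fixed-point value back to a float, for inspection only."""
    return fixed / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point values; the shift restores the scale."""
    return (a * b) >> FRAC_BITS


def fixed_mean(samples):
    """Mean of exactly 2**WINDOW_SHIFT integer samples, as fixed point.

    A power-of-two window lets the division become a right shift.
    """
    if len(samples) != 1 << WINDOW_SHIFT:
        raise ValueError("expected exactly 2**WINDOW_SHIFT samples")
    return (sum(samples) << FRAC_BITS) >> WINDOW_SHIFT
```

The point of the design is that every operation is an integer add, multiply, or shift, which a small microcontroller executes far more cheaply than emulated floating point.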
14
Communication vs. Computation
Estimates of local processing cost on Atmel (via simulation of GCC program)
– Average: 2223 instr. × 2
– CalcPolar: 19017 instr.
– Loop 6 × 20 / sec → 2.83×10^6 instructions
Report gesture once per second
– FindGestureError: 5444 instr.
– 10 gestures, 6 accelerometers: 5444 × 60 → 3.26×10^5 instr.
Memory operations are 2 cycles/instruction
Total cycles ≈ 3.7M → 4 MHz → 13.5 mW
Communication = 8 bits/sec → negligible cost
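The instruction estimate can be re-derived approximately from the per-routine counts. The sketch assumes that the "× 2" applies to the averaging routine (once each for X and Y) and that the loop runs 6 × 20 = 120 times per second; both are readings of the slide rather than stated facts.

```python
# Per-call instruction counts from the slide.
AVERAGE_INSTR = 2223
CALC_POLAR_INSTR = 19017
FIND_GESTURE_ERROR_INSTR = 5444

NUM_ACCELEROMETERS = 6
SAMPLE_RATE_HZ = 20
NUM_GESTURES = 10

# Assumption: Average runs twice per loop (X and Y), CalcPolar once.
instr_per_loop = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
loops_per_sec = NUM_ACCELEROMETERS * SAMPLE_RATE_HZ

sampling_instr_per_sec = instr_per_loop * loops_per_sec

# Gesture matching runs once per second over every gesture/accelerometer pair.
gesture_instr_per_sec = FIND_GESTURE_ERROR_INSTR * NUM_GESTURES * NUM_ACCELEROMETERS

total_instr_per_sec = sampling_instr_per_sec + gesture_instr_per_sec
print(sampling_instr_per_sec, gesture_instr_per_sec, total_instr_per_sec)
```

This gives about 2.82×10^6 and 3.27×10^5 instructions per second, each within 0.01 of the slide's rounded figures; the source does not explain the small differences. The step from instructions to roughly 3.7M cycles depends on the share of two-cycle memory operations, which the slide does not give.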
15
Communication vs. Computation 2
Cost of communication to host PC (measured)
– 4317 nJ/bit
– From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
– 4317 nJ/bit × 480 bytes/sec × 8 bits/byte = 16.57 mW
Processor still sucks power
– Current implementation requires 13.5 mW
– Using sleep, only 1.17 mW
17.74 mW total
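The communication figure follows directly from the measured energy per bit. The short check below uses only numbers from the slide; the function name is hypothetical.

```python
ENERGY_PER_BIT_NJ = 4317      # measured cost quoted on the slide


def radio_power_mw(bits_per_sec, nj_per_bit=ENERGY_PER_BIT_NJ):
    """Average radio power in mW for a given bit rate.

    nJ/bit * bits/sec = nJ/sec = nW, so divide by 1e6 to get mW.
    """
    return nj_per_bit * bits_per_sec / 1e6


comm_mw = radio_power_mw(480 * 8)        # 480 bytes/sec of raw samples
cpu_sleep_mw = 1.17                      # processor using sleep mode (slide)
total_mw = comm_mw + cpu_sleep_mw

print(round(comm_mw, 2), round(total_mw, 2))
```

The computed values are about 16.58 mW and 17.75 mW; the slide's 16.57 and 17.74 appear to be the same numbers truncated.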
17
Distributed vs. Centralized
Move some processing to each sensor
– 6 processors
    Each computing average, polar transform
    Transmitting 4 × 8 = 32 bits once/second
Using Atmel processor on each mote
– Computation: ~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW
– Communication: very small, 4317 nJ/bit × 32 bits/sec = 0.13 mW
– 5.53 mW/mote × 6 = 33.2 mW total (Bad idea!)
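A quick check of the distributed Atmel numbers, using only values from the slides (variable names are illustrative):

```python
NUM_MOTES = 6
ENERGY_PER_BIT_NJ = 4317          # measured radio cost per bit
BITS_PER_SEC_PER_MOTE = 4 * 8     # 4 bytes reported once per second

# Processor: 2 mA at 2.7 V.
cpu_mw = 0.002 * 2.7 * 1000

# Radio: nJ/bit * bits/sec = nW; divide by 1e6 for mW.
comm_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC_PER_MOTE / 1e6

per_mote_mw = cpu_mw + comm_mw
total_mw = per_mote_mw * NUM_MOTES

CENTRALIZED_MW = 17.74            # centralized total with sleep mode
print(round(per_mote_mw, 2), round(total_mw, 1), total_mw > CENTRALIZED_MW)
```

Communication becomes almost free, but six always-on Atmel processors cost roughly twice the centralized design, which is why the slide labels this option a bad idea.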
19
TI Microcontroller Evaluation
A microcontroller with better specs
– MSP430P112
    330 µA/MHz active mode
    1.5 µA standby (6 ns wakeup)
Used IAR Systems compiler, profiler, development environment
Analysis
– Centralized, 3.3 V, 4 MHz: 3.8 mW
– Distributed, 2.5 V, 1 MHz: 0.48 mW per mote
    Six processors: 2.9 mW
21
TI DSP Evaluation
TMS320C54x
Used TI Code Composer Studio, compiler, simulator
Power
– Active mode, 3.3 V, 10 MHz: 33 mW
– IDLE1: 0.36 mW
Analysis
– Centralized: 7.8 mW
– Distributed: 1.6 mW per mote
    Six processors = 9.6 mW total
22
TI DSP Evaluation Part 2
TMS320C55x (two parallel MACs)
Same tools, with C55x compiler, simulator
Power: No details available...
– Advertised: 0.9 V, 0.05 mW/MHz
Analysis
– Centralized: 1170240 cycles (vs. 2290440 on C54x) → 2 MHz: 0.1 mW
– Distributed: 195040 cycles (vs. 381740 on C54x) → 1 MHz: 0.05 mW
    Six processors: 0.3 mW total
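The slides do not state how the DSP power figures were computed. A simple duty-cycle model, assumed here, reproduces the C54x numbers from the cycle counts above: the processor runs at active power for the cycles it needs each second and sits in IDLE1 for the rest. For the C55x, the advertised mW/MHz rate is simply multiplied by the chosen clock. Function and variable names are hypothetical.

```python
def duty_cycled_power_mw(cycles_per_sec, clock_hz, active_mw, idle_mw):
    """Average power if the CPU is active only while executing its workload."""
    duty = cycles_per_sec / clock_hz
    return duty * active_mw + (1.0 - duty) * idle_mw


# C54x figures from the slides: 33 mW active at 10 MHz, 0.36 mW in IDLE1.
C54X = dict(clock_hz=10_000_000, active_mw=33.0, idle_mw=0.36)

c54x_centralized_mw = duty_cycled_power_mw(2_290_440, **C54X)
c54x_distributed_mw = duty_cycled_power_mw(381_740, **C54X)

# C55x: advertised 0.05 mW/MHz, clocked just fast enough for the workload.
C55X_MW_PER_MHZ = 0.05
c55x_centralized_mw = C55X_MW_PER_MHZ * 2       # 2 MHz
c55x_distributed_total_mw = C55X_MW_PER_MHZ * 1 * 6   # six motes at 1 MHz

print(round(c54x_centralized_mw, 2), round(c54x_distributed_mw, 2))
print(round(c55x_centralized_mw, 2), round(c55x_distributed_total_mw, 2))
```

The model yields about 7.84 mW and 1.61 mW for the C54x, matching the rounded slide values; that agreement suggests, but does not confirm, that the original analysis used a similar calculation.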
23
Other Explorations
Hand-optimized code
– Possible to massively reduce computation cost
– FP/transcendentals conspicuously painful
– Outside scope of our exploration
Radio hardware
– Bluetooth ~100 times more efficient
Reconfigurable computing
Other circuitry (e.g. accelerometers)
24
Results Summary
Cost, in mW, of various implementations
17.74 using sleep mode, 28 without
31% / 104% improvement with same hardware
170x improvement with new hardware
25
Conclusions
By finding better mappings among SW, HW, and application, big performance gains are possible.
Effective use of local processor resources can reduce communication overheads, which are significant.
DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.