MAP ART
Mapping Architectural Properties to an Algorithm for Redundant Triangulation
Chris Savarese, Yashesh Shroff, Greg Lawrence
Advisor: Dr. Jan Rabaey
April 27, 2000
CS252
Outline
- Introduction
- Background
- Time and Energy Profiling
- Parallel Architectures
- Conclusions: Our Dream Architecture
- Future Work
Introduction
Goal: Given a basic localization algorithm, explore architectural alternatives to minimize energy consumption.
- The concept of localization
- Energy-saving techniques
- What we did…
Background
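The Localization Algorithm
[Figure: unknown node U(x, y, z) and reference nodes N1(x1, y1, z1), N2(x2, y2, z2), N3(x3, y3, z3).]
Linear system:
$$
\begin{bmatrix}
x_1 - x_n & y_1 - y_n & z_1 - z_n \\
\vdots & \vdots & \vdots \\
x_{n-1} - x_n & y_{n-1} - y_n & z_{n-1} - z_n
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}
$$
In compact form: $A_{m \times 3}\, U_{3 \times 1} = b_{(n-1) \times 1}$ (so $m = n - 1$).
QR decomposition, QRdcmp(): $A_{m \times 3} = Q_{m \times 3} \cdot R_{3 \times 3}$
Solve: $U = R^{-1} Q^{T} b$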
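The solve step can be sketched in a few lines of Python. This is an illustrative sketch, not the code that was profiled: the slide does not define the entries of b, so `build_system` assumes the standard range linearization against the last reference node, and the names `build_system`, `qr_solve`, and `locate` are hypothetical. The QR step uses modified Gram-Schmidt, which may differ from what QRdcmp() does internally.

```python
import math

def build_system(anchors, dists):
    """Build A and b, using the last anchor as reference node n.

    ASSUMPTION: b_i = 0.5 * (d_n^2 - d_i^2 + |N_i|^2 - |N_n|^2), the usual
    linearization of the range equations; the slides do not give b.
    """
    xn, yn, zn = anchors[-1]
    dn = dists[-1]
    kn = xn * xn + yn * yn + zn * zn
    A, b = [], []
    for (xi, yi, zi), di in zip(anchors[:-1], dists[:-1]):
        A.append([xi - xn, yi - yn, zi - zn])
        ki = xi * xi + yi * yi + zi * zi
        b.append(0.5 * (dn * dn - di * di + ki - kn))
    return A, b

def qr_solve(A, b):
    """Least-squares solve of A u = b via thin QR: u = R^-1 Q^T b."""
    m, k = len(A), len(A[0])
    q = [[A[r][c] for r in range(m)] for c in range(k)]  # q[c] is column c of A
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        # Modified Gram-Schmidt: remove components along earlier columns.
        for i in range(j):
            R[i][j] = sum(q[i][r] * q[j][r] for r in range(m))
            q[j] = [q[j][r] - R[i][j] * q[i][r] for r in range(m)]
        R[j][j] = math.sqrt(sum(v * v for v in q[j]))
        q[j] = [v / R[j][j] for v in q[j]]
    y = [sum(q[j][r] * b[r] for r in range(m)) for j in range(k)]  # Q^T b
    u = [0.0] * k
    for j in range(k - 1, -1, -1):  # back substitution with upper-triangular R
        s = sum(R[j][c] * u[c] for c in range(j + 1, k))
        u[j] = (y[j] - s) / R[j][j]
    return u

def locate(anchors, dists):
    A, b = build_system(anchors, dists)
    return qr_solve(A, b)

# Made-up test geometry: five reference nodes, exact ranges to a known point.
anchors = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 10)]
true_pos = (3.0, 4.0, 5.0)
dists = [math.dist(p, true_pos) for p in anchors]
estimate = locate(anchors, dists)
```

With more than four reference nodes the system is overdetermined, which is where the redundancy in the triangulation comes from: the QR solve returns the least-squares position.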
The StrongARM Architecture
- Power: 200 mW, 0.25 µm, 1.5 V
- Clock speed: 200 MHz
- Cache:
  - 16 KB I-cache
  - 8 KB D-cache
  - 32-way set-associative, round-robin replacement
  - 512 B, 2-way minicache
- 31/16 GPRs (32-bit)
- Auto-increment addressing
- No FP processor
- MAC
The Tensilica Xtensa Architecture
Processor configuration:
- Power: 200 mW, 0.25 µm, 1.5 V
- Clock speed: 170 MHz
- Cache:
  - 16 KB I-cache
  - 16 KB D-cache
  - Direct mapped
- 32 registers (32-bit)
- Xtensibility
  - Use of TIE instructions
- No FP processor
- Zero-overhead loops
Time and Energy Profiling
Profiling Results
Profiler output (floating point):
Name      cum%    self%   desc%   calls
-----------------------------------------------
_fmul     18.21%  18.21%   0.00%  188000
lubksb    15.27%   5.17%  10.10%   10000
  _fneq            0.37%   0.00%   14000
  _fdiv            4.23%   0.00%   30000
  _fmul            5.03%   0.00%   52000
  _frsb            0.46%   0.00%   52000
Floating point energy:
- StrongARM Processor: 68J
- Xtensa Processor: 144J
Energy = nominal core power × #cycles × clock period
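The energy model on this slide is a single product, sketched below. The function name `energy_joules` and the cycle count in the example are hypothetical; only the formula and the StrongARM power and clock figures come from the slides.

```python
def energy_joules(core_power_w, cycles, clock_hz):
    """Energy = nominal core power x number of cycles x clock period."""
    clock_period_s = 1.0 / clock_hz
    return core_power_w * cycles * clock_period_s

# StrongARM figures from the architecture slide: 200 mW at 200 MHz.
# The cycle count is a made-up value for illustration only.
example = energy_joules(core_power_w=0.200, cycles=1_000_000, clock_hz=200e6)
```

Because the core power is treated as a nominal constant, this model makes energy proportional to cycle count on a given processor, so any cycle reduction reported by the profiler translates directly into an energy reduction.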
Fixed Point Arithmetic
Floating point vs. fixed point:
- Add / Sub are straightforward
- Multiply / Divide require shifting
Why can we use it for localization?
- Low accuracy requirements
- Limited range in measurements (< 10 m)
- Small matrices → small error propagation
[Figure: fixed-point format 0000.0000 with a 16-bit integer part and a 16-bit fractional part; floating-point format with S (1 bit), E (8 bits), Mantissa (23 bits).]
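A minimal sketch of the 16.16 format shows why add and subtract are straightforward while multiply and divide need shifting. The helper names are hypothetical, and the sketch relies on Python's arbitrary-precision integers; a C version on a 32-bit core would need a 64-bit intermediate for the product and the pre-shifted dividend.

```python
FRAC_BITS = 16          # 16 fractional bits, as in the 16.16 format
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to 16.16 fixed point (rounded to nearest)."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a 16.16 fixed-point value back to a float."""
    return q / ONE

def fx_add(a, b):
    # Both operands share the same scale, so plain integer add works.
    return a + b

def fx_sub(a, b):
    return a - b

def fx_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS

def fx_div(a, b):
    # Pre-shift the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b

product = to_float(fx_mul(to_fixed(9.75), to_fixed(0.5)))
quotient = to_float(fx_div(to_fixed(3.0), to_fixed(2.0)))
```

With 16 integer bits the representable range comfortably covers measurements under 10 m, and the resolution is 2^-16 of a unit.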
Fixed Point Profiling Results
Processor    Floating point   Fixed point   Change
StrongARM    68J              43J           37% less
Xtensa       144J             69J           52% less
Energy = nominal core power × #cycles × clock period
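The quoted savings follow directly from the energy figures, as this short check shows. The helper name `percent_less` is hypothetical; the four energy values are the ones reported on the slide.

```python
def percent_less(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

strongarm_saving = percent_less(68, 43)   # floating point -> fixed point
xtensa_saving = percent_less(144, 69)     # floating point -> fixed point
```

Rounded to whole percentages, these reproduce the 37% and 52% figures.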
Parallel Architectures
- Write sequential code in Matlab
- Extract data dependencies
- Workload analysis
[Figure labels: CP1, CP2, CP3, P]
Our Dream Architecture
- Floating point hardware
- MAC hardware
- Zero-overhead loops
- Auto-increment
- Register file size
- Cache: direct mapped
Future Work
- FPGA implementation
- Xtensa customizations
  - TIE instructions
  - Floating Point Coprocessor
- Realistic algorithm for PicoRadio
Many Thanks To…
- Dr. Bart Kienhuis, EECS Post Doc
  - Ptolemy and other tools: Parallel issues
- Fred Burghardt, BWRC Technical Staff
  - PicoRadio Testbed
- Marlene Wan, BWRC Student
  - StrongARM Energy Profiling
- Vandana Prabhu, BWRC Student
  - Tensilica Tools
- The Berkeley Wireless Research Center