1
Ants in the Ocean: System Design Techniques for Underwater Sensing Applications
Ryan Kastner
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara
Computer Engineering Seminar, Northwestern University
May 24, 2004
2
Ecological Research Programs
Santa Barbara Channel Long Term Ecological Research (SBC LTER)
Partnership for Interdisciplinary Studies of Coastal Oceans (PISCO)
Goals
  Focus on understanding the nearshore ecosystems of the west coast
  Time/space variation of individual organisms, populations, and ecological communities
3
Ecological Studies
Importance of land vs. ocean processes in giant kelp forests
  How do different nutrients affect the ecosystem?
  How/when are nutrients delivered?
  Runoff during storms is a very important time
  Rough conditions, high surf, undertow: how to measure?
Quantify larval transport to nearshore habitats
  Red tide
  Conditions for larval transport: temperature, salinity, chemicals?
Marine management tools
  When/where to make protected zones?
  What is the effect of environmental factors on marine life?
4
Enabling Ecological Research
Many studies done over limited time frames
  Drop sensors, sit on boat
  Storm comes, run out to boats and throw sensors into water
Alternatively, leave sensors unattended
  Malfunction loses months of data
  Sensors get lost/stolen
Ideal situation: real-time, adaptive sampling techniques
5
CoastalNet
[Figure: network diagram. Recoverable labels: temp/depth sensors; conductivity sensor; low rate/FSK acoustic link; OFDM/DS acoustic link; acoustic modem/array signal processor; directional antenna ("Pringles can"); 802.11 access point; Wi-Fi link up to 7 miles; to Internet]
6
CoastalNet Challenges
Underwater communication
  Water is a more complex medium than air: severe multipath problems, Doppler shifts and long latencies
  Commercial modems run at ~2400 baud, which is fine for simple sensors. What about higher data rates?
Sensors
  Variety of different types and sizes: video, salinity, pressure, temperature, …
System design issues: equip with batteries, antennas, waterproofing
7
Applications
Multiuser Detection [Figure: block diagram with sample, Kalman filter, z^-1, weight update, MMSE detector]
Underwater Acoustic Receivers (EKF)
Radiolocation (GSIC)
Filters (FIR, ARF, EWF)
8
System Design
Goal: Map application specification to system architecture
Subject to ever-increasing constraints: power, energy, latency, cost, size, …
[Figure © Sangiovanni-Vincentelli]
9
System Design and Architecture
Problem: take application code and map it to some system platform (e.g. reconfigurable device)
System platforms are extremely (and increasingly) complicated multiprocessing computing systems
  Mix of hardware and software components
  Microprocessors: RISC, DSP, network, …
  Logic level (FPGA)
  Reconfigurable logic
Specs for current high performance FPGA (Xilinx Virtex II)
  3K to 125K logic cells, four PowerPC processor cores
  Complex memory hierarchy: 1,738 KB block RAM, external memory, local memory in CLBs
  Possibility of soft core processors (DSP)
  Custom hardware: embedded multipliers, fast carry chain logic, etc.
Large amount of performance improvement possible, IF there is a good mapping
How do we best represent the application for mapping?
10
Obligatory Design Flow Slide
[Figure: design flow diagram. Recoverable labels: .c program; Syntactic/Semantic Analysis; AST; Parallelizing compiler transforms; SUIF; Function Level SSA CFG Generation; SSA CFG; Machine SUIF; Proc Backend (x86, RISC); x86 Code; RISC Code; Profiler; sample inputs; PDG+SSA Generation; Coarse-grain Optimizations; System Partitioning; HDL Generation; Fine-grain Optimizations; device architecture description; System Compiler; Synthesizable HDL; Behavioral, Logic and Physical Synthesis; Platform Programming; Software; bitstream; Functional Embedded System; Backend]
11
Design Flow
Application specification
  Can be written in C, SystemC, SystemVerilog, linear systems, signal flow graph, CDFGs
  Must have a front end to task graphs
  Focusing first on C to task graph
[Figure panels: Signal Flow Graph; C code; Linear Systems]
C code panel:
  if(x < y) i = 10; else i = 255;
  while(i) x = y++;
12
Intermediate Representation
  val = pred;
  for(i = 0; i<len; i++)
    val += diff;
  if(val > 32767) val = 32767;
  else if(val < -32768) val = -32768;
Must exploit fine AND coarse-grain parallelism
Ideally want automatic mapping
Need a form that supports synthesis to both hardware and software
13
PDG+SSA Representation
Input Application (in C):
  val = pred;
  for(i = 0; i<len; i++)
    val += diff;
  if(val > 32767) val = 32767;
  else if(val < -32768) val = -32768;
[Figure: CDFG Form of the input application]
14
PDG+SSA Representation
[Figure panels: CDFG Form; PDG+SSA Form]
15
Advantages of PDG+SSA
Looks complicated! What does it buy us?
Exploits parallelism
  Explicitly shows control and data dependences
  Control structures do not limit data parallelism
  Regions are hyperblocks, which allows aggressive optimizations
Synthesis to hardware and software
16
Comparing CDFG, PDG
Benchmarks: a set of MediaBench functions
PDG and CDFG are 2-3 times faster than sequential execution
PDG is about 7% faster than CDFG
PDG and CDFG have approximately the same area
17
Comparing Different Predicated Forms
Comparison with PSSA and sequential execution
PSSA: predicated static single assignment
  Used by several other projects: CASH, Sea Cucumber
PDG+SSA is on average 8% faster than PSSA
18
Map Application to HW/SW Cores
Dependence analysis to exploit fine/coarse grain parallelism
  Interprocedural dependencies: selective inlining
  Control dependencies: loop optimizations, hoisting, if conversion
  Data dependencies: arrays, aliasing, liveness
System partitioning
  Cluster into coarser grained tasks
  Decide how to divide application onto platform
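As a small illustration of if conversion, the saturation step from the C snippet on the Intermediate Representation slide can be rewritten so that the branch conditions become predicates and the result is chosen by selects. This is a minimal Python sketch; the function names are illustrative and do not come from the talk.

```python
def clamp_branching(val: int) -> int:
    """Control-flow form: mirrors the if / else-if chain of the C snippet."""
    if val > 32767:
        val = 32767
    elif val < -32768:
        val = -32768
    return val


def clamp_if_converted(val: int) -> int:
    """If-converted form: predicates are computed first, then used as selects.

    Both comparisons depend only on the incoming value, so hardware could
    evaluate them in parallel before the selects pick the result.
    """
    p_hi = val > 32767                     # predicate of the first branch
    p_lo = (not p_hi) and (val < -32768)   # predicate of the else-if branch
    sel_lo = -32768 if p_lo else val       # select guarded by p_lo
    return 32767 if p_hi else sel_lo       # select guarded by p_hi
```

Both functions return the same value for every input; the second simply has no control dependence between the two comparisons.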
19
System Partitioning
How do you decide where to map different parts of the application?
  Hardware or software: which processor, which memory, exact location, etc.
Extremely hard set of problems (NP-hard)
Must be flexible: different applications/systems have a wide variety of models
Fundamental problem: many different heuristic methods have been developed
  Simulated annealing
  Genetic algorithms
  Tabu search
  Kernighan/Lin
  …
20
Task Graph Model
Application synthesis model
  Directed Acyclic Graph
  Each node is an amount of computation
    Coarse grained: loops, function calls, summations, filtering
    Fine grained: addition, multiplication, comparison, shifting
System Partitioning
  Map coarse grain task nodes onto a set of computational cores
  Cores: RISC processor, CLBs, Digital Signal Processors, IPs, etc.
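A task graph of this kind can be held as a plain successor map, and a topologically sorted order can be computed with Kahn's algorithm. The sketch below is only illustrative: the graph `TASKS` and its node names are made up, and every node is assumed to appear as a key.

```python
from collections import deque


def topological_order(succ):
    """Kahn's algorithm on a DAG given as {node: [successors]}."""
    indeg = {n: 0 for n in succ}
    for n in succ:
        for m in succ[n]:
            indeg[m] += 1                  # every node must be a key of succ
    ready = deque(n for n in succ if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:              # all predecessors already emitted
                ready.append(m)
    if len(order) != len(succ):
        raise ValueError("graph has a cycle, so it is not a task DAG")
    return order


# Hypothetical task graph with virtual start/end nodes t0 and tn.
TASKS = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": ["tn"], "tn": []}
```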
21
Our approach: Ant System Heuristic
Inspired by an ethological study of the behavior of ants [Goss et al. 1989]
A metaheuristic
A multi-agent cooperative search method
A new way of combining global/local heuristics
Extensible and flexible
22
Ant System Heuristic
31
Autocatalytic Effect
32
Formulating Problems Using Ant Search
Problem model: define search space, create decision variables
Pheromone model: used as a global heuristic; distribution of pheromones, evaporation and strengthening strategies
Ant search strategy: local heuristics and solution space traversal
Solution construction: method of creating an answer from decision variables
Feedback: provide assessment of solution quality and adjust pheromones accordingly
33
Ant System Algorithm
34
System Partitioning Problem Model
Each task node is assigned a color
One color for each computational core
Example:
  Tasks 1, 2, 7 and 8 are assigned to the GPP
  Tasks 3, 4, and 6 are assigned to the configurable logic
  The inbound edges are colored accordingly
  The coloring of the virtual nodes t_0 and t_n does not matter
  The coloring of edge e_8n does not matter
35
Pheromone Model
Each computing resource is assigned a color c_k
Each edge e_ij is associated with a set of global heuristics (pheromone trails) τ_ij(k) indicating the favorableness for t_j to be colored with c_k
A coherent coloring is defined as:
  Each task node in the DAG is colored
  All the inbound edges of a task node have the same coloring as the task node itself
36
Ant Search Strategy
Each ant traverses the graph in topologically sorted order
  Guarantees that each inbound edge to the current node has already been examined
At each node, the ant will:
  Make guesses for the coloring of the successor nodes
  Make a decision on the coloring of the current node
37
Ant Search Strategy
At task node t_i, the ant guesses the coloring of each successor node t_j:
  τ_ij(k): global heuristic on coloring t_j with c_k
  η_j(k): local heuristic on coloring t_j with c_k
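The formula on this slide did not survive extraction. For orientation, the usual Ant System selection rule has the form below, where \tau_{ij}(k) is the pheromone (global heuristic) on edge e_ij for color c_k and \eta_j(k) is the local heuristic for coloring t_j with c_k. This is the standard rule from the ant colony optimization literature and only an assumption about what the slide showed; \alpha and \beta weight the global and local heuristics, and K is the number of colors.

```latex
p_{ij}(k) \;=\; \frac{\tau_{ij}(k)^{\alpha}\,\eta_{j}(k)^{\beta}}
                     {\sum_{l=1}^{K} \tau_{ij}(l)^{\alpha}\,\eta_{j}(l)^{\beta}}
```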
38
Solution Construction
Upon entering a new task node t_i, the ant decides on the coloring of t_i probabilistically, based on the guesses made by all the immediate predecessors of t_i
Inbound edges are colored correspondingly once this decision is made
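The guess-then-decide traversal can be sketched in a few lines of Python. This is a simplified sketch, not the authors' implementation: it assumes a guess is sampled with weight pheromone^alpha × local heuristic^beta, that a node picks uniformly among the guesses it received, and that a node without predecessors picks any color uniformly. All names are illustrative.

```python
import random


def ant_walk(succ, order, colors, tau, eta, alpha=1.0, beta=1.0, rng=random):
    """One ant colors every task node of a DAG.

    succ   : {node: [successors]}
    order  : nodes in topologically sorted order
    colors : list of color (core) identifiers
    tau    : {(i, j): {color: pheromone}} for each edge
    eta    : {j: {color: local heuristic}} for each node
    """
    guesses = {n: [] for n in succ}        # guesses received from predecessors
    coloring = {}
    for i in order:
        # Decision: choose among the predecessors' guesses; a node with no
        # predecessors (assumption of this sketch) chooses uniformly.
        coloring[i] = rng.choice(guesses[i] or colors)
        # Guess a color for each successor, weighted by tau^alpha * eta^beta.
        for j in succ[i]:
            weights = [tau[(i, j)][k] ** alpha * eta[j][k] ** beta
                       for k in colors]
            guesses[j].append(rng.choices(colors, weights=weights)[0])
    return coloring
```

Because each node is visited after all of its predecessors, every guess for a node is available by the time its color is decided, and coloring each inbound edge with the node's color yields a coherent coloring.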
39-46
[Figure, animation frames: task graph with virtual nodes t0 and tn and task nodes t1 to t8]
47
Feedback
Find the best solution and update the pheromone trails based on its quality
Next iteration
[Figure: task graph with virtual nodes t0 and tn and task nodes t1 to t8]
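One common way to realize this feedback step is to evaporate every trail and then reinforce the trails that agree with the best solution of the iteration. The sketch below assumes that scheme; the evaporation rate `rho`, the reward scale `q`, and the reward `q / best_cost` are illustrative choices, not values from the talk.

```python
def update_pheromones(tau, best_coloring, best_cost, rho=0.1, q=1.0):
    """Evaporate all trails, then reinforce those used by the best solution.

    tau           : {(i, j): {color: pheromone}}, modified in place
    best_coloring : {node: color} of the best ant in this iteration
    best_cost     : its cost (lower is better, must be positive)
    rho, q        : evaporation rate and reward scale (illustrative values)
    """
    for (i, j), trails in tau.items():
        for k in trails:
            trails[k] *= (1.0 - rho)                # evaporation
        trails[best_coloring[j]] += q / best_cost   # strengthening
    return tau
```

Trails that keep being chosen by good solutions grow faster than they evaporate, which is the autocatalytic effect that steers later ants.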
48
Experimental Setup
Testing benchmarks:
  DAGs of different sizes are generated randomly with an average branching factor of 5
  Real functions (in C/C++) extracted from the MediaBench suite are mapped onto the task nodes
  Tasks are analyzed using the SUIF and Machine SUIF tools to obtain a detailed CDFG level description
  Simplified communication interface between tasks
Major problem: real application task graphs
Goal: Find the optimal resource partition that achieves the best worst case execution time under an FPGA area constraint
49
Definitive Quality Assessment
Comparing AS results with brute force search
  Offers a definitive measurement of quality
  Gives the full solution space (can filter out EASY cases)
Results of AS
  63.5% of the results are within the top 0.1%
  77% of the results are within the top 2%
  91.7% of the results are within the top 3%
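A brute force reference of this kind can be sketched as an exhaustive enumeration over all core assignments. The cost model here is a deliberate simplification and an assumption of the sketch: execution time is the critical-path length, a task starts as soon as its predecessors finish, contention for a shared core and communication cost are ignored, and only tasks placed on a core labeled `"fpga"` consume area.

```python
from itertools import product


def execution_time(succ, order, assign, time):
    """Critical-path length when task n runs for time[n][assign[n]]."""
    start = {n: 0.0 for n in succ}
    finish = {}
    for n in order:                        # topologically sorted order
        finish[n] = start[n] + time[n][assign[n]]
        for m in succ[n]:
            start[m] = max(start[m], finish[n])
    return max(finish.values())


def brute_force(succ, order, cores, time, area, area_limit):
    """Try all len(cores)**len(order) assignments; keep the best feasible one."""
    best = None
    for combo in product(cores, repeat=len(order)):
        assign = dict(zip(order, combo))
        used = sum(area[n] for n in order if assign[n] == "fpga")
        if used > area_limit:              # violates the FPGA area constraint
            continue
        t = execution_time(succ, order, assign, time)
        if best is None or t < best[0]:
            best = (t, assign)
    return best
```

The enumeration grows exponentially with the number of tasks, which is why it is only usable as a reference on small graphs.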
50
Result Quality Assessment
Larger test cases: too big for brute force search
  33 difficult test cases
  3^25 possible partitions
Comparison with Simulated Annealing
  SA-50 has a run time comparable to AS
  SA-500 and SA-1000 take 10 and 20 times as long as SA-50
51
Code Generation
Once the task graph is partitioned
  Generate code for each task
  Create communication protocols
    Bus creation & arbitration
    Memory hierarchy
    Currently assume simple direct communication
Need code generation from every input specification to every computational core
  Software: use conventional compiler flow
  Hardware: need flow from task to HDL
Scheduling is a fundamental problem for both hardware and software synthesis
52
Instruction Scheduling
Given: a set of instructions and a collection of computational units
Instructions modeled using a data flow graph (DFG)
  Directed acyclic graph
  Each node is an instruction
  Each edge is a data dependence
Find a schedule for the instructions that minimizes some function (latency, area, power, …)
[Figure: Auto Regressive Filter]
53
Instruction Scheduling
NP-hard
Fundamental problem: many different heuristic methods have been developed
  ILP
  Force directed
  Genetic algorithm
  Path based
  Graph theoretic
  Computational geometry
  List scheduling
[Figure: data flow graph with nodes v1 to v11 and vn, operations +, <, -, NOP, scheduled over steps 1 to 4]
54
List Scheduling
Simple and effective
  Greedy strategy
  Operation selection decided by criticality
  O(n) time complexity
Make a priority list of the instructions based on some measure (mobility, instruction depth, number of successors, etc.)
No single priority function works well over all applications
  Highly dependent on problem instance
  Priority function quality highly varied
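The greedy strategy can be illustrated with a small resource-constrained list scheduler. This sketch makes assumptions that are not in the talk: all units are identical, every instruction has a positive integer latency, and the priority list is simply given as input. It is written for clarity rather than for the best asymptotic running time.

```python
def list_schedule(pred, latency, priority, units):
    """Resource-constrained list scheduling on `units` identical units.

    pred     : {op: [predecessor ops]} (data dependences of the DFG)
    latency  : {op: cycles}, each at least 1
    priority : all ops, ordered from most to least urgent
    Returns ({op: start cycle}, total latency).
    """
    start, finish = {}, {}
    busy_until = [0] * units               # cycle at which each unit frees up
    cycle = 0
    while len(start) < len(pred):
        for op in priority:                # scan in priority order
            if op in start:
                continue
            ready = all(p in finish and finish[p] <= cycle for p in pred[op])
            free = [u for u in range(units) if busy_until[u] <= cycle]
            if ready and free:
                start[op] = cycle
                finish[op] = cycle + latency[op]
                busy_until[free[0]] = finish[op]
        cycle += 1
    return start, max(finish.values())
```

Changing only the order of `priority` can change the resulting latency, which is why the choice of priority function matters so much.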
55
Combining Ants and Lists
Ants determine the priority list
List scheduling framework evaluates the "goodness" of the list
[Figure: data flow graph with nodes v1 to v11 and vn, operations +, <, -, NOP, scheduled over steps 1 to 4]
56
Ant Search Strategy
Every iteration, each ant creates a priority list
  Fills one instruction at a time
  Keeps memory of the instructions already selected
  At step j the ant has already selected j-1 instructions
  The jth instruction is selected probabilistically
57
Ant Search Strategy
τ_ij(k): global heuristic (pheromone) for selecting instruction i at position j
η_j(k): local heuristic; can use different properties
  Instruction mobility (IM)
  Instruction depth (ID)
  Latency weighted instruction depth (LWID)
  Successor number (SN)
α, β control the influence of the global and local heuristics
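A single ant's construction of a priority list can be sketched as repeated weighted sampling without replacement. The weighting pheromone^alpha × local heuristic^beta is the usual Ant System form and is assumed here; the data layout (`tau[i][j]` for instruction i at position j, `eta[i]` as a single local score per instruction) is an illustrative simplification.

```python
import random


def build_priority_list(ops, tau, eta, alpha=1.0, beta=1.0, rng=random):
    """One ant fills the priority list one position at a time.

    tau[i][j] : pheromone for placing instruction i at position j
    eta[i]    : local heuristic for instruction i (e.g. mobility or depth)
    """
    remaining = list(ops)                  # instructions not yet selected
    plist = []
    for j in range(len(ops)):
        weights = [tau[i][j] ** alpha * eta[i] ** beta for i in remaining]
        chosen = rng.choices(remaining, weights=weights)[0]
        plist.append(chosen)
        remaining.remove(chosen)           # remember it has been used
    return plist
```

The resulting list can be handed to a list scheduler, and the latency it achieves serves as the quality measure fed back into the pheromones.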
58
Pheromone Model
Each instruction op_i ∈ I is associated with n pheromone trails τ_ij, where j = 1, …, n
τ_ij(k) indicates the favorableness for op_i to be placed at the jth position in the priority list
Initially all set to a fixed value τ_0
Evaporation rate ρ
59
ARF Pheromones
60
Experimental Results
ILP (optimal) using CPLEX
List scheduling
  Instruction mobility (IM), instruction depth (ID), latency weighted instruction depth (LWID), successor number (SN)
Ant scheduling results using different local heuristics (averaged over 100 runs)
61
Other Topics of Research
Data & computation layout: simultaneous distribution of data/computation, utilizing on-chip block RAM, exploiting parallelism
DSP synthesis: optimization of polynomial expressions, synthesizing multiple constant multiplications to hardware
Low power microarchitecture techniques: cache design to minimize leakage current, configurable memory hierarchy to minimize power, prefetching to minimize power
62
ExPRESS Group
ExPRESS: EXtensible, Programmable, Reconfigurable Embedded SystemS
  Extensible: customized processors, configurable instruction sets
  Programmable: post-manufacturing customization
  Reconfigurable: rapid configuration changes
[Figure: System On Chip (SOC) with ASIC, RISC, RAM, FPGA, ARM, DSP blocks]
63
More ExPRESS Information…
Members
  PhD Students: Wenrui Gong, Anup Hosangadi, Yan Meng, Gang Wang
  Undergrads: Daniel Grund, Willis Hoang
Webpage: http://express.ece.ucsb.edu/
64
Extra Slides
65
Radiolocation (GSIC)
66
Multiuser Detection
[Figure: block diagram with sample, Kalman filter, z^-1, weight update, MMSE detector]
67
Underwater Acoustic Receiver EKF
68
Filters Finite Impulse Response
69
Auto Regressive Filter