Ants in the Ocean: System Design Techniques for Underwater Sensing Applications Ryan Kastner Dept. of Electrical and Computer Engineering University of.


1 Ants in the Ocean: System Design Techniques for Underwater Sensing Applications Ryan Kastner Dept. of Electrical and Computer Engineering University of California, Santa Barbara Computer Engineering Seminar Northwestern University May 24, 2004

2 Ecological Research Programs
• Santa Barbara Channel Long Term Ecological Research (SBC LTER)
• Partnership for Interdisciplinary Studies of Coastal Oceans (PISCO)
• Goals
  • Focuses on understanding the nearshore ecosystems of the west coast
  • Time/space variation of individual organisms, populations, and ecological communities

3 Ecological Studies
• Importance of land vs. ocean processes in giant kelp forests
  • How do different nutrients affect the ecosystem?
  • How/when are nutrients delivered?
  • Runoff during storms is a very important time
  • Rough conditions, high surf, undertow
  • How to measure?
• Quantify larval transport to nearshore habitats
  • Red tide
  • Conditions for larval transport - temperature, salinity, chemicals?
• Marine management tools
  • When/where to make protected zones?
  • What is the effect of environmental factors on marine life?

4 Enabling Ecological Research
• Many studies done over limited time frames
  • Drop sensors, sit on boat
  • Storm comes, run out to boats and throw sensors into water
• Alternatively, leave sensors unattended
  • Malfunction loses months of data
  • Sensors get lost/stolen
Ideal situation - real-time, adaptive sampling techniques

5 CoastalNet
[Figure: CoastalNet architecture. Labels: temp/depth sensors; conductivity sensor; low rate/FSK acoustic link; OFDM/DS acoustic link; acoustic modem/array signal processor; 802.11 access point; directional antenna (“Pringles can”); Wi-Fi link up to 7 miles; to Internet]

6 CoastalNet Challenges
• Underwater communication
  • Water is a more complex medium than air - severe multipath problems, Doppler shifts and long latencies
  • Commercial modems ~2400 baud - fine for simple sensors. What about higher data rates?
• Sensors
  • Variety of different types, sizes - video, salinity, pressure, temperature, …
  • System design issues - equip with batteries, antennas, waterproof

7 Applications
[Figure: example applications. Labels: multiuser detection (sample, Kalman filter, z^-1, weight update, MMSE detector); underwater acoustic receivers (EKF); radiolocation (GSIC); filters (FIR, ARF, EWF)]

8 System Design
• Goal: Map application specification to system architecture
• Subject to always increasing constraints - power, energy, latency, cost, size, …
© Sangiovanni-Vincentelli

9 System Design and Architecture
• Problem - take application code and map it to some system platform (e.g. reconfigurable device)
• System platforms are extremely (and increasingly) complicated, multiprocessing computing systems
• Mix of hardware and software components
  • Microprocessors - RISC, DSP, network, …
  • Logic level (FPGA) reconfigurable logic
• Specs for current high performance FPGA (Xilinx Virtex II)
  • 3K to 125K logic cells
  • Four PowerPC processor cores
  • Complex memory hierarchy - 1,738 KB block RAM, external memory, local memory in CLBs
  • Possibility of soft core processors - DSP
  • Custom hardware - embedded multipliers, fast carry chain logic, etc.
• Large amount of performance improvement possible, IF there is a good mapping
How do we best represent the application for mapping?

10 Obligatory Design Flow Slide
[Figure: design flow diagram. Labels include: .c program; syntactic/semantic analysis; AST; parallelizing compiler transforms (SUIF); function-level SSA CFG generation (Machine SUIF); x86 and RISC backends and code; profiler with sample inputs; PDG+SSA generation; coarse-grain optimizations; system partitioning; device architecture description; HDL generation; fine-grain optimizations; synthesizable HDL; behavioral, logic and physical synthesis; bitstream; software; platform programming; functional embedded system]

11 Design Flow
• Application specification
  • Can be written in C, SystemC, SystemVerilog, linear systems, signal flow graph, CDFGs
  • Must have front end to task graphs
  • Focusing first on a C to task graph
[Figure: three input forms - signal flow graph, linear systems, C code]
C code:
    if (x < y) i = 10; else i = 255;
    while (i) x = y++;

12 Intermediate Representation
    val = pred;
    for (i = 0; i < len; i++)
        val += diff;
    if (val > 32767) val = 32767;
    else if (val < -32768) val = -32768;
• Must exploit fine AND coarse-grain parallelism
• Ideally want automatic mapping
• Need a form that can do synthesis to both hardware/software

13 PDG+SSA Representation
Input application (in C):
    val = pred;
    for (i = 0; i < len; i++)
        val += diff;
    if (val > 32767) val = 32767;
    else if (val < -32768) val = -32768;
[Figure: CDFG form of the input application]

14 PDG+SSA Representation
[Figure: CDFG form vs. PDG+SSA form]

15 Advantages of PDG+SSA
• Exploits parallelism
  • Explicitly shows control and data dependences
  • Control structures do not limit data parallelism
  • Regions are hyperblocks - allows aggressive optimizations
• Synthesis to hardware and software
Looks complicated! What does it buy us?

16 Comparing CDFG, PDG
• Benchmarks - bunch of MediaBench functions
• PDG, CDFG 2-3 times faster than sequential execution
• PDG about 7% faster than CDFG
• PDG, CDFG approx. same area

17 Comparing Different Predicated Forms
• Comparison with PSSA, sequential execution
• PSSA - predicated static single assignment
  • Used by several other projects - CASH, Sea Cucumber
• PDG+SSA on average 8% faster than PSSA

18 Map Application to HW/SW Cores
• Dependence analysis to exploit fine/coarse grain parallelism
  • Interprocedural dependencies - selective inlining
  • Control dependencies - loop optimizations, hoisting, if conversion
  • Data dependencies - arrays, aliasing, liveness
• System partitioning
  • Cluster into coarser grained tasks
  • Decide how to divide application onto platform

19 System Partitioning
• How do you decide where to map different parts of the application?
  • Hardware or software - which processor, which memory, exact location, etc.
• Extremely hard set of problems (NP-hard)
• Must be flexible - different applications/systems have wide variety of models
• Fundamental problem - many different heuristic methods have been developed
  • Simulated annealing
  • Genetic algorithms
  • Tabu search
  • Kernighan/Lin
  • …

20 Task Graph Model
• Application synthesis model
  • Directed acyclic graph
  • Each node is an amount of computation
  • Coarse grained - loops, function calls, summations, filtering
  • Fine grained - addition, multiplication, comparison, shifting
• System partitioning
  • Map coarse grain task nodes onto a set of computational cores
  • Cores - RISC processor, CLBs, digital signal processors, IPs, etc.
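As a rough illustration of the task graph model above, the sketch below stores a small DAG of coarse-grained tasks and evaluates one mapping of tasks onto cores. The task names, core names, execution-time numbers and the `finish_time` helper are all hypothetical, and the cost model (one task at a time per core, no communication cost) is a simplifying assumption, not the model used in the talk.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> set of tasks it depends on.
predecessors = {
    "t1": set(),
    "t2": {"t1"},
    "t3": {"t1"},
    "t4": {"t2", "t3"},
}

# Hypothetical execution-time estimates (arbitrary units) per core.
exec_time = {
    "t1": {"RISC": 4, "FPGA": 2},
    "t2": {"RISC": 3, "FPGA": 1},
    "t3": {"RISC": 5, "FPGA": 2},
    "t4": {"RISC": 2, "FPGA": 1},
}


def finish_time(mapping):
    """Completion time of the whole graph for a task -> core mapping.

    Assumption: each core runs one task at a time and communication
    between cores is free.
    """
    core_free = {}  # time at which each core becomes idle
    done = {}       # completion time of each task
    for task in TopologicalSorter(predecessors).static_order():
        core = mapping[task]
        ready = max((done[p] for p in predecessors[task]), default=0)
        start = max(ready, core_free.get(core, 0))
        done[task] = start + exec_time[task][core]
        core_free[core] = done[task]
    return max(done.values())


all_risc = {t: "RISC" for t in predecessors}
mixed = {"t1": "RISC", "t2": "RISC", "t3": "FPGA", "t4": "RISC"}
print(finish_time(all_risc), finish_time(mixed))
```

In this toy instance, moving `t3` to the second core shortens the schedule because `t2` and `t3` can then overlap; a partitioner searches over such mappings.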

21 Our approach - Ant System Heuristic
• Inspired by ethological study on the behavior of ants [Goss et al. 1989]
• A metaheuristic
• A multi-agent cooperative searching method
• A new way of combining global/local heuristics
• Extensible and flexible

22 Ant System Heuristic

31 Autocatalytic Effect

32 Formulating Problems Using Ant Search
• Problem model - define search space, create decision variables
• Pheromone model - used as a global heuristic, distribution of pheromones, evaporation and strengthening strategies
• Ant search strategy - local heuristics and solution space traversal
• Solution construction - method of creating an answer from decision variables
• Feedback - provide assessment of solution quality and adjust pheromones accordingly

33 Ant System Algorithm

34 System Partitioning Problem Model
• Each task node is assigned a color
• One color for each computational core
Example:
• Tasks 1, 2, 7 and 8 are assigned to the GPP
• Tasks 3, 4, and 6 are assigned to the configurable logic
• The inbound edges are colored accordingly
• We don't care about the coloring of virtual nodes t_0 and t_n
• We don't care about the coloring of edge e_8n

35 Pheromone Model
• Each computing resource is assigned a color c_k
• Each edge e_ij is associated with a set of global heuristics (pheromone trails)
  • τ_ij(k) indicates the favorableness for t_j to be colored with c_k
• A coherent coloring is defined as:
  • Each task node in the DAG is colored
  • All the inbound edges of a task node have the same coloring as that of the corresponding task node
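The coherent-coloring definition above can be written as a small check. This is only a sketch: the function name `is_coherent`, the dictionary layout, and the choice to skip the virtual nodes (following the "don't care" remark on the problem-model slide) are assumptions made for illustration.

```python
def is_coherent(task_color, edge_color, edges, virtual=("t0", "tn")):
    """Check the two conditions of a coherent coloring.

    task_color: dict task -> color
    edge_color: dict (src, dst) -> color
    edges:      list of (src, dst) pairs of the DAG
    virtual:    nodes whose coloring we do not care about (assumption)
    """
    real_tasks = {t for edge in edges for t in edge} - set(virtual)
    # Condition 1: every (non-virtual) task node is colored.
    if any(t not in task_color for t in real_tasks):
        return False
    # Condition 2: every inbound edge has the color of its target task.
    return all(
        edge_color.get((u, v)) == task_color[v]
        for (u, v) in edges
        if v not in virtual
    )


# Hypothetical three-edge example.
edges = [("t0", "t1"), ("t1", "t2"), ("t2", "tn")]
tasks = {"t1": "GPP", "t2": "FPGA"}
good = {("t0", "t1"): "GPP", ("t1", "t2"): "FPGA"}
bad = {("t0", "t1"): "GPP", ("t1", "t2"): "GPP"}
print(is_coherent(tasks, good, edges), is_coherent(tasks, bad, edges))
```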

36 Ant Search Strategy
• Each ant traverses the graph in topologically sorted order
  • Guarantees that each inbound edge to the current node has already been examined
• At each node, the ant will:
  • Make guesses for the coloring of the successor nodes
  • Make a decision on the coloring of the current node

37 Ant Search Strategy
• At task node t_i, the ant guesses the coloring for each of the successor nodes t_j:
  • τ_ij(k): global heuristic on coloring t_j with c_k
  • η_j(k): local heuristic on coloring t_j with c_k

38 Solution Construction
• Upon entering a new task node t_i, the ant makes a decision on the coloring of t_i:
  • probabilistically, based on the guesses made by all the immediate predecessors of t_i
• Inbound edges are correspondingly colored once this decision is made
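One possible reading of the guess-then-decide procedure is sketched below. The transcript does not preserve the probability formula from the slides, so the weighting used here (pheromone raised to `alpha` times local heuristic raised to `beta`, the conventional ant-system form) and the decision rule (pick uniformly among the predecessors' guesses) are assumptions, as are all names in the code.

```python
import random
from graphlib import TopologicalSorter


def ant_walk(succ, colors, tau, eta, rng, alpha=1.0, beta=1.0):
    """One ant's pass over the task graph; returns dict task -> color.

    succ:   dict task -> list of successors (every task must be a key)
    tau:    tau[(i, j)][k], pheromone for coloring t_j with color k
    eta:    eta[j][k], local heuristic for coloring t_j with color k
    """
    preds = {t: set() for t in succ}
    for i, successors in succ.items():
        for j in successors:
            preds[j].add(i)

    guesses = {t: [] for t in succ}  # guesses received from predecessors
    coloring = {}
    for t in TopologicalSorter(preds).static_order():
        # Decision: probabilistic choice among the predecessors' guesses
        # (a source node has none, so it picks any color at random).
        coloring[t] = rng.choice(guesses[t] or colors)
        # Guesses: one per successor, weighted by global and local heuristics.
        for j in succ[t]:
            weights = [tau[(t, j)][k] ** alpha * eta[j][k] ** beta
                       for k in colors]
            guesses[j].append(rng.choices(colors, weights=weights)[0])
    return coloring


# Hypothetical diamond-shaped graph with uniform heuristics.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
colors = ["GPP", "FPGA"]
tau = {(i, j): {k: 1.0 for k in colors} for i in succ for j in succ[i]}
eta = {t: {k: 1.0 for k in colors} for t in succ}
print(ant_walk(succ, colors, tau, eta, random.Random(0)))
```

Because the traversal is topological, every predecessor has already left its guess by the time the ant reaches a node, which is the guarantee the search-strategy slide relies on.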

39-46 [Figure sequence: example task graph with nodes t_0, t_1 to t_8, and t_n]

47 Find the best solution and update the pheromone trails based on the solution's quality
[Figure: example task graph with nodes t_0, t_1 to t_8, and t_n; labels: Next iteration, Feedback]
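The feedback step can be sketched as an evaporate-then-reinforce update. The transcript does not give the update rule, so the form below (multiply every trail by `1 - rho`, then add a `quality` bonus on the colors used by the best solution) is the conventional ant-system pattern, used here as an assumption; `update_pheromones`, `rho` and `quality` are illustrative names.

```python
def update_pheromones(tau, best_coloring, quality, rho=0.1):
    """Evaporate all trails, then strengthen those used by the best ant.

    tau:           tau[(i, j)][k], pheromone for coloring t_j with color k
    best_coloring: dict task -> color of the best solution this iteration
    quality:       positive reward, larger for better solutions
    rho:           evaporation rate in (0, 1)
    """
    for (i, j), trails in tau.items():
        for k in trails:
            trails[k] *= 1.0 - rho          # evaporation
        if j in best_coloring:
            trails[best_coloring[j]] += quality  # reinforcement
    return tau


# Hypothetical single-edge example.
tau = {("a", "b"): {"GPP": 1.0, "FPGA": 1.0}}
update_pheromones(tau, {"a": "GPP", "b": "FPGA"}, quality=0.5)
print(tau)
```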

48 Experimental Setup
• Testing benchmarks:
  • DAGs of different sizes are generated randomly with an average branching factor of 5
  • Real functions (in C/C++) extracted from the MediaBench suite are mapped onto the task nodes
  • Tasks are analyzed using SUIF and Machine SUIF tools to obtain a detailed CDFG-level description
  • Simplified communication interface between tasks
• Major problem: real application task graphs
• Goal: find the optimal resource partition that achieves the best worst-case execution time under an FPGA area constraint

49 Definitive Quality Assessment
• Comparing AS results with brute force search
  • Offers definitive measurement for the quality
  • Gives full solution space (can filter out EASY cases)
• 63.5% of the results are within the top 0.1%
• 77% of the results of AS are within the top 2%
• 91.7% of the results are within the top 3%

50 Result Quality Assessment
• 33 difficult testing cases
  • 3^25 possible partitions (about 8.5 × 10^11)
• Larger test cases - too big for brute force search
• Comparison with simulated annealing
  • SA-50 has a run time comparable to AS
  • SA-500 and SA-1000 take 10 and 20 times the run time of SA-50

51 Code Generation
• Once task graph is partitioned
  • Generate code for each task
  • Create communication protocols
  • Bus creation & arbitration
  • Memory hierarchy
  • Currently assume simple direct communication
• Need code generation from every input specification to every computational core
  • Software - use conventional compiler flow
  • Hardware - need flow from task to HDL
• Scheduling is a fundamental problem for both hardware and software synthesis

52 Instruction Scheduling
• Given: set of instructions and collection of computational units
• Instruction modeled using data flow graph (DFG)
  • Directed acyclic graph
  • Each node is an instruction
  • Each edge is a data dependence
• Find schedule for instructions to minimize some function (latency, area, power, …)
[Figure: Auto Regressive Filter]

53 Instruction Scheduling
• NP-hard
• Fundamental problem - many different heuristic methods have been developed
  • ILP
  • Force directed
  • Genetic algorithm
  • Path based
  • Graph theoretic
  • Computational geometry
  • List scheduling
[Figure: example data flow graph with nodes v1 to v11 and vn, including NOP, +, < and - operations, over time steps 1 to 4]

54 List Scheduling
• Simple and effective
• Greedy strategy
• Operation selection decided by criticality
• O(n) time complexity
• Make a priority list of the instructions based on some measure (mobility, instruction depth, number of successors, etc.)
• No single priority function works well over all applications
  • Highly dependent on problem instance
  • Priority function quality highly varied
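A minimal list scheduler along the lines described above might look as follows. It assumes unit-latency operations and `num_units` identical functional units, which is a simplification chosen for illustration; the function name, the example graph and the priority values are hypothetical. The sketch favors clarity over speed and makes no complexity claim.

```python
def list_schedule(preds, priority, num_units):
    """Greedy list scheduling; returns dict op -> start cycle.

    preds:     dict op -> set of ops it depends on (must form a DAG)
    priority:  dict op -> number, higher means more critical
    num_units: identical functional units available per cycle
    Assumption: every operation takes exactly one cycle.
    """
    remaining = set(preds)
    finished = set()
    schedule = {}
    cycle = 0
    while remaining:
        # Ready ops: all predecessors finished in an earlier cycle.
        ready = [op for op in remaining if preds[op] <= finished]
        # Highest priority first; ties broken by name for determinism.
        ready.sort(key=lambda op: (-priority[op], op))
        issued = ready[:num_units]
        for op in issued:
            schedule[op] = cycle
        remaining -= set(issued)
        finished |= set(issued)
        cycle += 1
    return schedule


# Hypothetical data flow graph; priority = distance to the sink.
preds = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}
priority = {"a": 3, "b": 3, "c": 2, "d": 1, "e": 1}
print(list_schedule(preds, priority, num_units=2))
```

Swapping in a different `priority` dictionary changes the schedule without touching the scheduler, which is the hook the ant-based approach later uses.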

55 Combining Ants and Lists
• Ants determine priority list
• List scheduling framework evaluates the “goodness” of the list
[Figure: example data flow graph with nodes v1 to v11 and vn, including NOP, +, < and - operations, over time steps 1 to 4]

56 Ant Search Strategy
• Every iteration each ant creates a priority list
• Fill one instruction at a time
• Memory about instructions already selected
  • At step j the ant has already selected j-1 instructions
  • jth instruction selected probabilistically

57 Ant Search Strategy
• τ_ij(k): global heuristic (pheromone) for selecting instruction i at position j
• η_j(k): local heuristic - can use different properties
  • Instruction mobility (IM)
  • Instruction depth (ID)
  • Latency weighted instruction depth (LWID)
  • Successor number (SN)
• α, β control the influence of the global and local heuristics
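A sketch of how one ant might fill a priority list position by position is given below. The selection formula itself is not preserved in the transcript; the code assumes the conventional ant-system rule, where the weight of a candidate is its pheromone raised to `alpha` times its local heuristic raised to `beta`. Choosing among all not-yet-selected instructions, and the names `build_priority_list`, `tau` and `eta`, are likewise assumptions.

```python
import random


def build_priority_list(ops, tau, eta, rng, alpha=1.0, beta=1.0):
    """Construct one priority list probabilistically.

    ops: list of instructions
    tau: tau[op][j], pheromone for placing op at position j
    eta: eta[op], local heuristic value (e.g. instruction depth)
    """
    remaining = list(ops)  # memory of what is still unselected
    order = []
    for j in range(len(ops)):
        weights = [tau[op][j] ** alpha * eta[op] ** beta for op in remaining]
        chosen = rng.choices(remaining, weights=weights)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order


# Hypothetical instance: uniform pheromones, depth-like local heuristic.
ops = ["v1", "v2", "v3", "v4"]
tau = {op: [1.0] * len(ops) for op in ops}
eta = {"v1": 4.0, "v2": 3.0, "v3": 2.0, "v4": 1.0}
print(build_priority_list(ops, tau, eta, random.Random(0)))
```

With `alpha = 0` the pheromones are ignored and the ant follows only the local heuristic; with `beta = 0` it follows only what earlier iterations have learned.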

58 Pheromone Model
• Each instruction op_i ∈ I is associated with n pheromone trails τ_ij, where j = 1, …, n
• τ_ij(k) indicates the favorableness for op_i to be positioned at the jth position in the priority list
• Initially all τ_ij are set to a fixed value τ_0
• Evaporation rate ρ

59 ARF Pheromones

60 Experimental Results
• ILP (optimal) using CPLEX
• List scheduling
  • Instruction mobility (IM), instruction depth (ID), latency weighted instruction depth (LWID), successor number (SN)
• Ant scheduling results using different local heuristics (averaged over 100 runs)

61 Other Topics of Research
• Data & computation layout - simultaneous distribution of data/computation, utilizing on-chip block RAM, exploiting parallelism
• DSP synthesis - optimization of polynomial expressions, synthesizing multiple constant multiplications to hardware
• Low power microarchitecture techniques - cache design to minimize leakage current, configurable memory hierarchy to minimize power, prefetching to minimize power

62 ExPRESS Group
• ExPRESS - EXtensible, Programmable, Reconfigurable Embedded SystemS
  • Extensible - customized processors, configurable instruction sets
  • Programmable - post-manufacturing customization
  • Reconfigurable - rapid configuration changes
[Figure: System On Chip (SOC) with ASIC, RISC, RAM, FPGA, ARM and DSP blocks]

63 More ExPRESS Information…
• Members
  • PhD students
    • Wenrui Gong
    • Anup Hosangadi
    • Yan Meng
    • Gang Wang
  • Undergrads
    • Daniel Grund
    • Willis Hoang
• Webpage - http://express.ece.ucsb.edu/

64 Extra Slides

65 Radiolocation (GSIC)

66 Multiuser Detection
[Figure: block diagram with sample, Kalman filter, z^-1, weight update and MMSE detector blocks]

67 Underwater Acoustic Receiver EKF

68 Filters Finite Impulse Response

69 Auto Regressive Filter
