Published by Gerald Nash. Modified over 6 years ago.
1
High Speed Implementation of Particle Swarm Optimization for Neural Networks
Paul D. Reynolds, Russell W. Duren, Matthew L. Trumbo, Robert J. Marks II
2
Original Problem
Determine the optimal sonar setup to maximize the ensonification of a grid of water. Influences on ensonification: environmental conditions (temperature, wind speed); bathymetry (bottom type, shape of bottom); sonar system. A total of 27 different factors are accounted for.
3
Ensonification Example
15 by 80 pixel grid. Red: high signal to interference ratio. Blue: low signal to interference ratio. Bottom: no signal.
4
Original Solution
Take current conditions. Match to previous optimum sonar setups with similar conditions. Run the acoustic model using current conditions and previous optimum setups. Use the sonar setup with the highest signal to interference ratio.
5
New Problem
Problem: One acoustic model run took tens of seconds. Solution: Train a neural network on the acoustic model (APL & University of Washington).
6
Neural Network Overview
Inspired by the human ability to recognize patterns. A mathematical structure able to mimic a pattern. Trained using known data: show the network several examples and identify each example, and the network learns the pattern. Then show the network a new case and let the network identify it.
7
Neural Network Structure
Each neuron is the squashed sum of the inputs to that neuron. A squash is a non-linear function that restricts outputs to between 0 and 1. Each arrow is a weight times a neuron output. (Diagram labels: inputs, weight layer, neuron, outputs.)
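The structure described on this slide can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the squash is the logistic sigmoid (the deck names the sigmoid later), and the function names and the `weights[j][i]` layout are hypothetical, not taken from the presentation.

```python
import math

def squash(z):
    """Logistic sigmoid: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights):
    """Each neuron is the squashed sum of weight * input over its incoming arrows.

    weights[j][i] is the (hypothetical) weight on the arrow from input i to neuron j.
    """
    return [squash(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def network_forward(inputs, layers):
    """Feed the inputs through each weight layer in turn (layers are serial)."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations
```

For example, `network_forward([1.0, -2.0], [[[0.5, 0.25], [0.0, 0.0]], [[1.0, 1.0]]])` passes two inputs through a two-neuron layer and then a one-neuron output layer.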
8
Ensonification Neural Network
Taught using examples from the acoustical model. Recognizes a pattern between the 27 given inputs and the 15 by 80 grid output. Architecture Squash =
9
Did the neural network solve the problem?
Yes: the neural network approximation of the acoustic model runs in 5 ms. However, the method of locating the best setup is the same: run many possible setups through the neural network and choose the best. Problem: better, but still not real time.
10
How to find a good setup solution: Particle Swarm Optimization
Idea: several particles wandering over a fitness surface. Math: x_{k+1} = x_k + v_k; v_{k+1} = v_k + rand*w1*(G_b - x_k) + rand*w2*(P_b - x_k). Theory: momentum pushes particles around the surface; each particle is pulled towards its personal best (P_b) and towards the global best (G_b); eventually the particles oscillate around the global best.
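The two update equations can be written out directly. The sketch below applies them to one particle, one dimension at a time; the function name, argument names, and the use of Python's `random` module are illustrative assumptions, not details from the presentation.

```python
import random

def pso_step(x, v, personal_best, global_best, w1, w2, rng=random):
    """One particle update following the slide's equations.

    x_{k+1} = x_k + v_k
    v_{k+1} = v_k + rand*w1*(G_b - x_k) + rand*w2*(P_b - x_k)
    Both right-hand sides use the old position x_k.
    """
    new_x = [xi + vi for xi, vi in zip(x, v)]
    new_v = [
        vi + rng.random() * w1 * (g - xi) + rng.random() * w2 * (p - xi)
        for xi, vi, p, g in zip(x, v, personal_best, global_best)
    ]
    return new_x, new_v
```

A particle that already sits on both its personal best and the global best keeps its velocity unchanged, which is the "momentum" the slide refers to.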
11
Particle Swarm in Operation
12
Particle Swarm Optimization
Search space: the 27 inputs to the neural network (sonar system setup). The fitness surface is calculated from the neural network output. Two options: (1) match a desired output, where the fitness is the sum of the difference from the desired output and the goal is to minimize the difference; (2) maximize the signal to interference ratio in an area, ignoring output in undesired locations.
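The two fitness options might look as follows when the 15 by 80 output grid is flattened to a list. This is a sketch under assumptions: the slide says only "sum of the difference", so the absolute difference used here is a guess, and the function names and the 0/1 `mask` are hypothetical.

```python
def match_fitness(output, desired):
    """Option 1: sum of absolute differences from a desired output (lower is better).

    The absolute value is an assumption; the slide does not say how the
    difference is measured.
    """
    return sum(abs(o - d) for o, d in zip(output, desired))

def area_fitness(output, mask):
    """Option 2: total signal to interference ratio inside the area of interest
    (higher is better). Cells where mask is 0 are ignored."""
    return sum(o for o, m in zip(output, mask) if m)
```

With option 1 the swarm minimizes the returned value; with option 2 it maximizes it.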
13
Particle Swarm in Operation
14
New Problem Enters
Time for a 100k step particle swarm using a 2.2 GHz Pentium: nearly 6 minutes. A real time version is desired. Solution: implement the neural network and particle swarm optimization in parallel on reconfigurable hardware.
15
Implementation Hardware
SRC-6e Reconfigurable Computing Environment: 2 Intel microprocessors; 2 Xilinx Virtex II Pro 6000 FPGAs. FPGA figures: 100 MHz; 76,032 logic gates; x 18 multipliers.
16
Three Design Stages
Activation function design: the sigmoid is not efficient to calculate. Neural network design: parallel design. Particle swarm optimization: hardware implementation.
17
Activation Function Design
Fixed point design. Sigmoid accuracy level. Weight accuracy level.
18
Fixed Point Design
Data range of -50 to 85: 2's complement, 7 integer bits, 1 sign bit. Fractional portion: sigmoid outputs are less than 1, so some number of fractional bits is needed.
19
Sigmoid Accuracy Level
20
Weight Accuracy Level
21
Total Accuracy
22
Fixed Point Results
16-bit number: 1 sign bit, 7 integer bits, 8 fractional bits. Advantages: 18 x 18 multipliers; 64-bit input banks.
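The 16-bit format (1 sign bit, 7 integer bits, 8 fractional bits) can be illustrated with ordinary integers. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and saturating on overflow is one possible choice that the slides do not specify.

```python
FRAC_BITS = 8                      # 8 fractional bits
SCALE = 1 << FRAC_BITS             # 256 steps per unit, resolution 1/256
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def to_fixed(value):
    """Round a real value to signed 16-bit fixed point, saturating at the limits."""
    raw = int(round(value * SCALE))
    return max(INT16_MIN, min(INT16_MAX, raw))

def to_float(raw):
    """Convert a fixed-point integer back to a real value."""
    return raw / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point values; the wide product is shifted back down
    to 8 fractional bits and saturated to 16 bits."""
    return max(INT16_MIN, min(INT16_MAX, (a * b) >> FRAC_BITS))
```

The representable range is -128 to just under 128, which covers the -50 to 85 data range quoted earlier, and two 16-bit operands fit within one 18 x 18 multiplier.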
23
Activation Function Approximation
Compared 4 designs: look-up table, shift and add, CORDIC, Taylor series.
24
Look-up Table
Advantages: unlimited accuracy; short latency of 3. Disadvantages: an entirely on-chip design is desired, and the LUT will not fit on chip with 92,000 weights.
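A look-up table for the sigmoid can be sketched as below. The covered input range of -8 to 8 and the one-entry-per-1/256 spacing are illustrative assumptions chosen to match 8 fractional bits; the presentation does not state the table size it evaluated.

```python
import math

FRAC_BITS = 8
TABLE_MIN, TABLE_MAX = -8.0, 8.0   # hypothetical input range covered by the table
STEP = 1.0 / (1 << FRAC_BITS)      # one entry per representable input

# Precompute the sigmoid once; accuracy is limited only by the number of entries.
TABLE = [1.0 / (1.0 + math.exp(-(TABLE_MIN + i * STEP)))
         for i in range(int((TABLE_MAX - TABLE_MIN) / STEP) + 1)]

def sigmoid_lut(x):
    """Look up the sigmoid, clamping inputs outside the table range."""
    x = max(TABLE_MIN, min(TABLE_MAX, x))
    return TABLE[int(round((x - TABLE_MIN) / STEP))]
```

Even this modest range needs 4097 entries, which hints at why the table competes with the weights for on-chip memory.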
25
Look-up Table
26
Shift and Add
Y(x) = 2^{-n}*x + b. Advantages: small design; short latency of 5. Disadvantages: piecewise outputs; limited accuracy.
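A shift-and-add approximation replaces each multiply by a right shift, so every segment has the form Y(x) = 2^{-n}*x + b. The sketch below uses three segments on 8-fractional-bit integers; the segment boundaries, shifts, and offsets are illustrative assumptions, not the values used in the presentation.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS   # fixed-point representation of 1.0

# (upper bound of |x|, shift n, offset b), all in 8-fractional-bit fixed point.
SEGMENTS = [
    (1 * ONE,          2, ONE // 2),          # y = x/4  + 0.5
    (int(2.375 * ONE), 3, (5 * ONE) // 8),    # y = x/8  + 0.625
    (5 * ONE,          5, (27 * ONE) // 32),  # y = x/32 + 0.84375
]

def sigmoid_shift_add(x_fixed):
    """Piecewise-linear sigmoid using only a shift and an add per segment."""
    mag = abs(x_fixed)
    y = ONE                                   # saturate to 1.0 beyond the last segment
    for upper, shift, offset in SEGMENTS:
        if mag < upper:
            y = (mag >> shift) + offset
            break
    # Symmetry of the sigmoid: s(-x) = 1 - s(x).
    return y if x_fixed >= 0 else ONE - y
```

The output is visibly piecewise and its error stays on the order of a couple of percent, which matches the "limited accuracy" disadvantage listed on the slide.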
27
Shift and Add
28
CORDIC Computation
Divide the argument by 2. Series of rotations to obtain sinh(x) and cosh(x). Division for tanh(x). Shift and add for the result.
29
CORDIC
Advantages: unlimited accuracy; real calculation. Disadvantages: long latency of 50; large design.
30
CORDIC
31
Taylor Series
Y(x) = a + b(x - x_0) + c(x - x_0)^2. Advantages: unlimited accuracy. Average: latency of 10; medium size design. Disadvantages: 3 multipliers.
32
Taylor Series
33
Neural Network Design
Architecture. Desired: maximum parallel design; entirely on-chip design. Limitations: 92, bit weights in 144 RAMB16s; layers are serial; 144 18x18 multipliers.
34
Neural Network Design: Initial Test Design
Serial pipeline: one multiply per clock; 92,000 clocks; 1 ms = PC equivalent.
35
Test Output
Figure: FPGA output compared with real output.
36
Test Output
Figure: FPGA output compared with real output.
37
Test Output
Figure: FPGA output compared with real output.
38
Neural Network Design: Maximum Parallel Version
71 multiplies in parallel. Zero weight padding: treat all layers as the same length (71). 25 clock wait for the pipeline. Total: 1475 clocks per network evaluation, or 15 microseconds. 60,000 network evaluations per second.
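The timing figures can be checked with a little arithmetic. The sketch below assumes the design runs at the 100 MHz clock quoted on the hardware slide and ignores any transfer or control overhead; the function and variable names are illustrative.

```python
CLOCK_HZ = 100e6  # assumed from the 100 MHz figure on the hardware slide

def eval_time_seconds(clocks, clock_hz=CLOCK_HZ):
    """Time for one network evaluation given its clock count."""
    return clocks / clock_hz

serial_time = eval_time_seconds(92_000)     # serial pipeline, one multiply per clock
parallel_time = eval_time_seconds(1475)     # maximum parallel version
evals_per_second = 1.0 / parallel_time
speedup = serial_time / parallel_time
```

Under these assumptions the serial design takes about 0.92 ms per evaluation and the parallel design 14.75 microseconds, roughly 62 times faster. That works out to about 67,800 evaluations per second before overhead, consistent with the rounded 15 microseconds and 60,000 per second on the slide.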
39
Neural Network Design
40
Particle Swarm Optimization
2 chips in the SRC. Particle swarm chip: controls inputs, sends them to the fitness chip, receives a fitness back. Fitness function chip: calculates the network, compares to the desired output.
41
Particle Swarm Implementation
Problem: randomness. v_{k+1} = v_k + rand*w1*(G_b - x_k) + rand*w2*(P_b - x_k). Solution: remove randomness? v_{k+1} = v_k + w1*(G_b - x_k) + w2*(P_b - x_k). Does it work? Yes, but not as well: optimization takes more fitness evaluations.
42
Random vs. Deterministic
Deterministic: blue. Random: green/red.
43
Particle Swarm Chip
10 agents with preset starting points and velocities: 8 from previous data, with random velocities; 1 at maximum range, aimed down; 1 at minimum range, aimed up. Restrictions: maximum velocity; range.
44
Update Equation Implementation
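Block diagram of the per-dimension update. Inputs: Xmaxk, Xmink, XnDimk, VnDimk, PnDimk, Gk, Vmaxk. The datapath computes X+V, P-X and G-X, forms V + 1/8(P-X) + 1/16(G-X), and compares the results against Xmaxk, Xmink and Vmaxk to produce the new XnDimk and the new VnDimk. Equations: x_{k+1} = x_k + v_k; v_{k+1} = v_k + w1*(G_b - x_k) + w2*(P_b - x_k).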
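With weights of 1/8 and 1/16, the deterministic update needs no multipliers: each weight is a right shift. The sketch below models one dimension on integer fixed-point values. The function names are hypothetical, and clamping the velocity symmetrically to plus or minus `v_max` is an assumption about what the "Compare" blocks do.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))

def update_dimension(x, v, p, g, x_min, x_max, v_max):
    """Deterministic PSO update for one dimension using shifts for the weights.

    new_x = x + v, limited to the allowed range.
    new_v = v + (p - x)/8 + (g - x)/16, limited to +/- v_max.
    Python's >> on negative integers is an arithmetic shift (rounds toward
    negative infinity), like a hardware arithmetic right shift.
    """
    new_x = clamp(x + v, x_min, x_max)
    new_v = v + ((p - x) >> 3) + ((g - x) >> 4)
    new_v = clamp(new_v, -v_max, v_max)
    return new_x, new_v
```

For example, `update_dimension(100, 10, 180, 260, 0, 105, 20)` hits both limits: the position is held at the range boundary and the velocity is capped at `v_max`.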
45
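Results: Output Matching
100k iteration PSO -> 1.76 s. Figure: swarm result compared with real output.
46
Results: Output Matching
100k iteration PSO -> 1.76 s. Figure: swarm result compared with real output.
47
Results: Output Matching
100k iteration PSO -> 1.76 s. Figure: swarm result compared with real output.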
48
Particle Swarm: Area Specific
100k iteration PSO -> 1.76 s
49
Particle Swarm: Area Specific
100k iteration PSO -> 1.76 s
50
Particle Swarm: Area Specific
100k iteration PSO -> 1.76 s
51
ANY QUESTIONS?