Published by Charlotte Waters. Modified over 9 years ago.
1
University of Veszprém, Department of Image Processing and Neurocomputing
Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs
Zoltán Nagy, Péter Szolgay
2
Nagy 2 MAPLD 2005/153
Introduction
  Cellular Neural/Nonlinear Networks Universal Machine (CNN-UM)
  Ocean modeling
  Results
  Conclusions
3
Cellular Neural/Nonlinear Networks (CNN)
  2 or N dimensional grid
  Locally connected
  Analog processing elements
  State value is continuous in time
4
Structure of a CNN cell
  u_ij: input
  x_ij: state
  y_ij: output
  z_ij: constant bias
  A_ij,kl: feedback template
  B_ij,kl: feed-forward template
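The slide's cell diagram is not preserved in this transcript. For orientation, the symbols listed above fit the standard Chua-Yang cell model, written out below. The neighbourhood set S_r(ij) and the unit time constant are assumptions taken from the usual textbook form, not from the slide.

```latex
\dot{x}_{ij}(t) = -x_{ij}(t)
  + \sum_{kl \in S_r(ij)} A_{ij,kl}\, y_{kl}(t)
  + \sum_{kl \in S_r(ij)} B_{ij,kl}\, u_{kl}
  + z_{ij},
\qquad
y_{ij} = \tfrac{1}{2}\left( |x_{ij} + 1| - |x_{ij} - 1| \right)
```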
5
CNN-UM implementations
  Software simulation
    Easy to implement
    Slow, even when using processor-specific instructions
  Emulated digital VLSI
    Specialized digital architecture
    Selectable computing precision (Castle architecture: 1, 6, 12 bit)
    Orders of magnitude faster than software simulation
    Long design time
  Analog VLSI
    Huge computing power (~TeraOP/s)
    Low accuracy (7-8 bit)
    Noise and temperature sensitivity
6
Structure of the Falcon emulated digital CNN-UM
  Mixer
    Contains cell values for the next updates
  Memory unit
    Contains a belt of the cell array
  Template memory
  Arithmetic unit
  Processors can be connected on a grid
    Linear speedup
7
Structure of the arithmetic unit
  Cell update in row-wise order
  Cycle time depends on template size
  Fully pipelined
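A minimal software sketch of the row-wise cell update that such an arithmetic unit performs. It assumes a forward-Euler discretization of the standard CNN state equation with 3x3 templates. The function name `cnn_euler_step`, the step size `h`, and the edge-replicating boundary are illustrative assumptions, not details taken from the Falcon design.

```python
import numpy as np

def cnn_euler_step(x, u, A, B, z, h=0.1):
    """One forward-Euler update of every cell, visiting cells row by row."""
    rows, cols = x.shape
    # Standard CNN output nonlinearity (piecewise-linear saturation).
    y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))
    # Edge replication at the border is an assumption; the slides do not state the boundary rule.
    yp = np.pad(y, 1, mode="edge")
    up = np.pad(u, 1, mode="edge")
    x_new = np.empty_like(x)
    for i in range(rows):            # row-wise order, as on the slide
        for j in range(cols):
            feedback = np.sum(A * yp[i:i + 3, j:j + 3])
            feedforward = np.sum(B * up[i:i + 3, j:j + 3])
            x_new[i, j] = x[i, j] + h * (-x[i, j] + feedback + feedforward + z)
    return x_new
```

The two inner sums each cost nine multiply-accumulates for a 3x3 template, which illustrates why the cycle time of a hardware unit would scale with template size.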
8
Configurable parameters
  State, template and constant width from 2 to 64 bits
  Number of templates
  Size of the templates
  Width of the cell array slice
  Number of layers
  Number and arrangement of the processor cores
9
Example: Solution of a simple PDE on CNN
  The wave equation
  Spatial discretization
  2-layer CNN
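The slide's equations are not preserved in this transcript. As a hedged illustration of the idea, the sketch below splits the wave equation u_tt = c^2 ∇²u into two coupled layers (displacement `u` and velocity `v`) and applies the usual 5-point Laplacian as a 3x3 template. The update order, the zero boundary, and all names are assumptions made for the example.

```python
import numpy as np

# 5-point discrete Laplacian written as a 3x3 template.
LAPLACE = np.array([[0.0, 1.0, 0.0],
                    [1.0, -4.0, 1.0],
                    [0.0, 1.0, 0.0]])

def apply_template(t, x):
    """Apply a 3x3 template to every cell; cells outside the grid are held at zero."""
    xp = np.pad(x, 1, mode="constant")
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            out += t[di, dj] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def wave_step(u, v, c=1.0, dx=1.0, dt=0.1):
    """Advance both layers by one time step."""
    v_new = v + dt * (c / dx) ** 2 * apply_template(LAPLACE, u)  # layer 2: velocity
    u_new = u + dt * v_new                                       # layer 1: displacement
    return u_new, v_new
```

Updating the velocity layer first and then using the new velocity for the displacement is a choice made here for numerical stability in the sketch; the original presentation may use a different scheme.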
10
Ocean models
  Barotropic model
  Baroclinic models
    z-coordinate model
    σ-coordinate model
    isopycnal
  Fine resolution models
    Real-time forecast
    Fishing industry
    Search and rescue
  Coarse resolution models
    Long term predictions
    Climate modeling
11
The Princeton Ocean Model (POM)
  Sigma coordinate model
    Vertical coordinate is scaled on the water column depth
  Second moment turbulence closure sub-model
    Provides vertical mixing coefficients
  Solution technique: Mode splitting
    Internal mode (3D)
      o Vertical structure equations
      o Implicit solution
    External mode (2D)
      o Vertically integrated equations
      o Explicit solution (Leapfrog method)
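The leapfrog method named for the external mode is a three-level explicit scheme: the next value is the previous value plus twice the time step times the current tendency. The sketch below shows it on a generic ODE dy/dt = f(y) rather than on the POM equations. The function name, the forward-Euler start-up step, and the test problem are assumptions chosen for illustration.

```python
import math

def leapfrog(f, y0, dt, steps):
    """Integrate dy/dt = f(y) over `steps` time steps with the leapfrog scheme."""
    y_prev = y0
    y_curr = y0 + dt * f(y0)          # forward-Euler start-up step (an assumption)
    for _ in range(steps - 1):
        # y^{n+1} = y^{n-1} + 2*dt*f(y^n)
        y_prev, y_curr = y_curr, y_prev + 2.0 * dt * f(y_curr)
    return y_curr

# Example: pure oscillation dy/dt = i*y, whose exact solution is exp(i*t).
approx = leapfrog(lambda y: 1j * y, 1.0 + 0.0j, dt=0.001, steps=1000)
exact = complex(math.cos(1.0), math.sin(1.0))
```

Leapfrog suits oscillatory, wave-like problems because it does not damp them, at the price of storing two time levels per variable.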
12
Governing equations of the external (2D) mode
  u_x, u_y: mass transport
  η: free surface elevation
  Ω: angular rotation of the Earth
  Θ: latitude
  H: depth of the ocean
  g: gravitational acceleration
  τ_w, τ_b: wind and bottom stress
  A: lateral viscosity
13
Solution on CNN
  Spatial discretization on a uniform grid
  3-layer CNN structure
  Non-linear template required for advection term
    Cannot be solved on analog VLSI CNN chips
    Solvable on the modified Falcon architecture
      o Support of non-linearity
      o Specialized cell model
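To see why the advection term needs a non-linear template, consider a generic advection term of the form u * du/dx. It multiplies a state value by a difference of neighbouring state values, so the effective template weights depend on the state itself. The sketch below uses central differences with unit grid spacing. The function name and the edge-replicating boundary are assumptions, and the discretization actually used in the presentation is not given in this transcript.

```python
import numpy as np

def advection_term(u):
    """Central-difference estimate of u * du/dx along axis 1 (unit grid spacing)."""
    up = np.pad(u, ((0, 0), (1, 1)), mode="edge")
    dudx = 0.5 * (up[:, 2:] - up[:, :-2])
    # Product of two state-dependent values: not expressible as a fixed linear template.
    return u * dudx
```

Doubling the input quadruples the result, which a fixed linear template (whose output scales linearly with the state) cannot reproduce.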
14
The modified arithmetic unit of the Falcon architecture
15
Implementation on FPGA
  Complicated arithmetic unit
  Fixed-point number representation
  Configurable precision
  High level hardware description language required (e.g. Handel-C)
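A small software model of fixed-point arithmetic with configurable precision, to make the trade-off concrete. The helper names, the saturation behaviour, and the truncating multiply are assumptions for illustration; they do not describe the arithmetic unit of the actual design.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real number to a signed fixed-point integer, saturating on overflow."""
    scaled = int(round(value * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers and rescale with an arithmetic right shift."""
    return (a * b) >> frac_bits
```

For example, with 8 fractional bits the value 0.5 is stored as the integer 128, and the smallest representable step is 1/256. Widening `frac_bits` reduces rounding error at the cost of wider multipliers.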
16
Performance
17
The Seamount problem
18
Results after 72 hours
  Circulation pattern
  Elevation
19
Error of the solution
20
Error of the solution
21
Memory requirements of the internal (3D) equations
  Extended memory hierarchy
    New level stores 3 cross sectional slices from the 3D array
      o Large memory required (e.g. 512x512x64 sized grid, 3x512x64 elements per state variable)
      o Cannot be stored on-chip
      o Off-chip storage requires huge I/O bandwidth
  Processor array should be used
    The 3D array is divided between the processors
    Optimal data set for on-chip storage: 2048 elements per cross sectional slice (512x32x64 sized grid per processor)
    Each processor located on a separate FPGA
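The figures quoted on this slide can be checked with a few lines of arithmetic. The variable names are illustrative. The processor count is an inference from dividing the 512-wide dimension into 32-wide strips and is not stated on the slide.

```python
nx, ny, nz = 512, 512, 64      # full grid size quoted on the slide
slices_kept = 3                # cross-sectional slices held in the new memory level

# Elements per state variable when one processor holds full-width slices.
elements_full = slices_kept * ny * nz        # 3 x 512 x 64

sub_ny = 32                                  # per-processor strip width (512x32x64 sub-grid)
elements_per_slice = sub_ny * nz             # one cross-sectional slice on one processor
processors = ny // sub_ny                    # inferred: strips needed to cover the grid

print(elements_full, elements_per_slice, processors)
```

Splitting the grid into strips cuts the per-slice storage from 32768 to 2048 elements per state variable, which is the on-chip figure given above.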
22
Solution of the internal (3D) equations
  Implicit solution
    Fixed-point solution
      o Requires large precision to avoid rounding errors
      o Seems to be impractical
    Floating-point solution
      o Requires large area (especially add/sub)
  Explicit solution
    Smaller timestep
    Simpler arithmetic unit
23
Conclusions
  Ocean modeling using emulated digital CNN is very promising
  Moderate precision is required in 2D mode
    1% accuracy using 24 bits
  Expected speedup (compared to an Athlon64 2GHz microprocessor)
    80 times on our RC200 prototyping board
    3700 times on the largest available FPGA