Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board

Presentation transcript:

1 Aerospace Data Storage and Processing Systems
Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board
Presented by Damon Van Buren, SEAKR Engineering
MAPLD 2004, Submission 133

2 The Sensor Bandwidth Problem
o Commercial satellite imaging systems are experiencing growth in imaging capability...
    - Higher resolution: < 1 m
    - Larger images: >10k image width and height
    - More spectral components
        – Panchromatic
        – Red/Green/Blue
        – Multi-spectral
o Improved capabilities are leading to high sensor data rates
    - Data output rates > 2 Gbps for some systems
o Providing storage and downlink bandwidth for the data is becoming a significant challenge for system designers
    - The largest data recorders can store less than 20 minutes of data at 2 Gbps
    - Downlinks must be several hundred Mbps to downlink 15 minutes of data in under an hour
    - Data storage and high-bandwidth downlinks require lots of power
o By reducing the amount of image data, compression provides a solution to the bandwidth problem!
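
The storage and downlink figures on this slide can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a constant 2 Gbps sensor rate with no protocol overhead.

```python
# Back-of-the-envelope check of the recorder and downlink figures (assumed: constant rate, no overhead).
SENSOR_RATE_GBPS = 2.0  # sensor output rate quoted on the slide


def recorded_gbits(rate_gbps: float, minutes: float) -> float:
    """Data volume in gigabits produced at rate_gbps over the given number of minutes."""
    return rate_gbps * minutes * 60.0


def required_downlink_mbps(volume_gbits: float, window_minutes: float) -> float:
    """Average downlink rate in Mbps needed to move volume_gbits within the window."""
    return volume_gbits * 1000.0 / (window_minutes * 60.0)


# 20 minutes at 2 Gbps is 2400 Gbit (300 GB), the upper bound the slide gives for a recorder.
recorder_gbits = recorded_gbits(SENSOR_RATE_GBPS, 20)

# 15 minutes of data sent down in one hour needs an average of 500 Mbps.
downlink_mbps = required_downlink_mbps(recorded_gbits(SENSOR_RATE_GBPS, 15), 60)

print(recorder_gbits, downlink_mbps)
```

The 500 Mbps result is consistent with the slide's "several hundred Mbps" for a one-hour window.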

3 Desired Compressor Features
o Real Time
    - Compression must be performed in real time, prior to storage.
    - High throughput (> 2 Gbps)
o Excellent Performance in Lossy and Lossless Modes
    - Purchasers of satellite imagery are sensitive to reductions in image quality caused by lossy compression.
    - Scientific users prefer undistorted data (bit true).
o Space-Qualified
    - Must survive hazards of launch and space operation, including radiation.
o Low Risk
    - Satellite imaging companies seek high reliability solutions.
o Low Cost
    - Commercial customers require cost effective solutions.
o Flexible
    - The ability to support varying compression ratios and contents would allow more effective use of available storage and bandwidth.

4 JPEG2000 Algorithm
o JPEG2000 is an excellent choice for satellite image compression.
    - Latest still image compression standard from the JPEG committee
o Meets two key requirements for satellite image compression:
    - Excellent performance in both lossy and lossless modes.
        – ~1.7 to 1 lossless compression for typical satellite imagery - 70% improvement!
        – Visually lossless compression > 2 to % improvement in storage and downlink performance.
    - Very flexible:
        – Many options for compressed images.
o Other advantages:
    - International Standard
    - Wavelet based
        – High quality lossy images with comp. ratios > 100:1
    - Packet oriented
        – Allows random access to the compressed code stream.
        – Makes compressed data more robust in the presence of bit errors.
        – Allows selection of image quality, spatial region, resolution, and color component after compression.
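
The "70% improvement" quoted for ~1.7 to 1 lossless compression follows directly from the ratio: a recorder of fixed size holds 1.7 times as much imagery. A minimal sketch of that relationship, with hypothetical function names:

```python
def capacity_gain_percent(compression_ratio: float) -> float:
    """Extra image data (in percent) that fits in the same storage or downlink budget."""
    return (compression_ratio - 1.0) * 100.0


def compressed_rate_gbps(sensor_rate_gbps: float, compression_ratio: float) -> float:
    """Average data rate after compression, assuming the ratio holds for the whole stream."""
    return sensor_rate_gbps / compression_ratio


# ~1.7:1 lossless gives roughly 70% more imagery per recorder.
print(round(capacity_gain_percent(1.7)))
# A 2 Gbps sensor stream would average a little under 1.2 Gbps after 1.7:1 compression.
print(round(compressed_rate_gbps(2.0, 1.7), 3))
```

Actual ratios depend on image content, so these numbers are averages rather than guarantees.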

5 JPEG2000 Implementation Challenges
o JPEG2000 is a very complex algorithm.
    - More Features = More Complexity.
o Operation intensive
    - Several hundred operations per pixel, because each bit must be processed many times, for the wavelet transform, entropy coding, MQ coding, packet generation, etc.
o Complex
    - Many different stages to produce compressed output.
        – Wavelet transform.
        – Quantization.
        – Context generation.
        – Arithmetic coding.
        – Packet generation.
    - Many parameters must be tracked individually for each code block (64x64).
o Memory intensive
    - Each pixel must be accessed many times, so many small buffers are needed to get good throughput.
o Few processors are capable of implementing JPEG2000 at high rates!

6 High-Performance Processing Using Xilinx FPGAs
o Xilinx FPGAs have many advantages for fast parallel processing:
    - Millions of gates.
    - System clocks of several hundred MHz.
    - High speed I/O
        – 622 Mbps LVDS
        – Multi-Gigabit serial I/O
    - Hundreds of internal block RAMs.
    - Hundreds of internal 18 bit multipliers.
o Xilinx FPGAs are available in space-qualified versions:
    - Radiation testing is complete on the Virtex and Virtex-II devices.
        – ~200 kRad total dose, latchup immune.
    - Radiation testing to begin on the Virtex-II Pro devices soon.
o Xilinx FPGAs are very flexible, reducing risk:
    - May be re-programmed an infinite number of times.
    - Configurations may be uploaded at any time during the mission to fix errors or add new capability.
o Xilinx FPGAs are the best solution for fast compression in space!

7 Challenges for Xilinx Use in Space
o The effects of radiation in spacecraft electronics are well known.
    - Caused primarily by charged particles.
    - May cause permanent damage over time by ionizing SiO2 (total dose).
    - May also cause errors in digital logic by upsetting registers (single event effects).
    - Mitigation techniques are used to reduce or eliminate the effect of radiation upsets.
        – Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of the design.
o Mitigation of radiation effects in SRAM-based FPGAs presents an additional challenge:
    - As with other digital electronics, the functional logic of the device is susceptible to upset, however...
    - Another layer of logic (configuration logic) controls the routing of the part, giving the device its capability to be reprogrammed to perform different functions.
    - Configuration logic is also susceptible to radiation upsets.
o Xilinx FPGAs require system level mitigation strategies in addition to the device level mitigation techniques (such as TMR) that are commonly used for space electronics.
    - Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
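
The TMR voting described on this slide can be illustrated with a bitwise majority function. This is a software sketch of the idea, not the hardware voter used on any SEAKR board; the name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of the same output word.

    Each result bit takes the value held by at least two of the three inputs,
    so an upset confined to one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
upset = good ^ 0b00001000  # one copy suffers a single-bit upset

# The voter still returns the correct word when any one copy is corrupted.
assert tmr_vote(good, good, upset) == good
assert tmr_vote(upset, good, good) == good
```

Voting masks an upset in one copy but does not repair it, which is why the slide pairs TMR with configuration scrubbing for SRAM-based FPGAs.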

8 SEAKR’s RCC Board Processing Solutions
o SEAKR has developed a line of Reconfigurable Computing (RCC) products based on Xilinx FPGAs.
    - RCC 1 – 4x Virtex 1000s
    - RCC 2 – 4x Virtex II 6000s
    - RCC 3 (NTRCC) – 4x Virtex II Pro 70/100s
o Boards include system-level upset mitigation (scrub) for the Xilinx devices.
    - Configuration data is continuously read and checked for errors.
    - Errors are corrected by overwriting the corrupted frames, without interrupting the operation of the device.
o Other devices on board employ radiation mitigation strategies as well:
    - Radiation hardened EDAC
o Boards also have dedicated resources to support high-performance processing:
    - High speed I/O.
    - External memories.
o Industry standard form-factor: 6U Compact PCI.
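
The scrub cycle described above (read each configuration frame, check it, overwrite only corrupted frames) can be sketched as follows. The slides do not say how errors are detected, so this sketch assumes a CRC32 comparison against a known-good "golden" copy. The frame contents, frame size, and the name `scrub_pass` are all hypothetical.

```python
import zlib
from typing import List


def scrub_pass(device_frames: List[bytearray], golden_frames: List[bytes]) -> List[int]:
    """Run one read-and-correct pass over all configuration frames.

    Each frame read back from the device is checked against the golden copy
    (assumed here: by CRC32). Only frames that fail the check are rewritten,
    so untouched frames are never disturbed. Returns the corrected frame indices.
    """
    corrected = []
    for index, golden in enumerate(golden_frames):
        if zlib.crc32(bytes(device_frames[index])) != zlib.crc32(golden):
            device_frames[index][:] = golden  # overwrite just the corrupted frame
            corrected.append(index)
    return corrected


# Toy example: four 8-byte frames, with a single bit flipped in frame 2.
golden = [bytes([i] * 8) for i in range(4)]
device = [bytearray(frame) for frame in golden]
device[2][3] ^= 0x10

print(scrub_pass(device, golden))  # frame 2 is detected and rewritten
print(scrub_pass(device, golden))  # a second pass finds nothing to fix
```

In a real system this loop would run continuously in the background, as the slide describes.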

9 Network RCC (NTRCC)
o Four Xilinx XC2VP70-6FF1704 FPGA Co-Processors (COPs)
    - Design compatible with XC2VP100-6FF1706 and V2P-X
o (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP
o 512MB of DDRII Shared SDRAM memory for prototype
    - 1GB of 128M x 64 EDAC (R-S) Protected DDRII SDRAM shared memory using 1Gbit memory
o Network IF
    - (2) parallel 16bit RapidIO ports to front panel (8 Gbps)
    - (1) 4x3.125 Gbps serial port to front panel (>10Gbps)
    - 4x3.125 Gbps ports from NIC to each COP (>10Gbps)
    - 4x3.125 Gbps ports from each COP to each neighbor COP (>10Gbps)
o Shared Data Buses
    - COP Interconnect Bus (~4.224 Gbps)
    - cPCI 32bit 33 MHz
o Read and write COP configurations via cPCI
o Extended 6U form factor
o Configuration RAM SEU detection and correction
    - DDRII SDRAM on configuration controller for shadow config program storage
o Non-Volatile memory for 16 different configurations (1 Gbit Flash)
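
Several of the bandwidth figures on this slide can be reproduced with width-times-clock arithmetic. The sketch below rests on assumptions the slides do not state: that the serial links use 8b/10b line coding (80% efficiency), and that the ~4.224 Gbps COP interconnect figure corresponds to a 64-bit bus at 66 MHz, which is one combination that happens to match. Function names are illustrative.

```python
def parallel_bus_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel bus moving width_bits per clock cycle."""
    return width_bits * clock_mhz / 1000.0


def serial_link_gbps(lanes: int, line_rate_gbps: float, coding_efficiency: float = 1.0) -> float:
    """Aggregate rate of a multi-lane serial link, optionally derated for line coding."""
    return lanes * line_rate_gbps * coding_efficiency


# 4 lanes x 3.125 Gbps: 12.5 Gbps raw, 10 Gbps of payload if 8b/10b coding is assumed.
print(serial_link_gbps(4, 3.125))
print(serial_link_gbps(4, 3.125, coding_efficiency=0.8))

# cPCI at 32 bits and 33 MHz peaks at about 1.06 Gbps.
print(parallel_bus_gbps(32, 33))

# A 64-bit bus at 66 MHz (assumed) gives 4.224 Gbps, matching the COP interconnect figure.
print(parallel_bus_gbps(64, 66))
```

These are peak figures; sustained throughput on shared buses such as cPCI is typically lower.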

10 Network RCC Block Diagram

11 NTRCC Layout
o 24 Layer board
o MicroVias, blind vias, via-in-pad
o High speed Gbps Serial links
o 82 pages of schematic capture
o 10 weeks of PCB layout time

12 Implementation of the JPEG2000 Algorithm
o The JPEG2000 core has been in development for over a year.
    - Eventual target data rate 600 Mbps/device.
    - Written in VHDL.
    - Simulations performed in Modelsim.
    - Synthesis in Synplify_Pro.
o Targeted to the NTRCC-R summer '04.
    - Targeted to a reduced version of the NTRCC with a single coprocessor.
    - Take advantage of improved external memory throughput.
    - Ultimately use the high-speed serial I/O to move image information on the board.
o Designed for high throughput.
    - Cycle efficient coding style.
    - Highly parallel design.
    - Pipelined architecture.
    - Rolling wavelet transform.
o Designed for flexible output file format.
    - Output is divided into quality layers for easy selection of compression ratio.

13 JPEG2000 Block Diagram

14 JPEG2000 Coding Steps
o Image is broken into tiles
o Tiles are wavelet transformed
    - 5/3 reversible or 9/7 irreversible, also user defined.
    - Selectable number of transform levels.
o Each subband from the transform is further broken up into code blocks (typically 32x32 or 64x64) for entropy coding.
o Each code block is entropy coded, starting from the top bit plane and working down.
    - The current bit of each pixel is passed to an arithmetic coder, along with context information.
    - The MQ encoder takes advantage of any skewing of the probability for each context, and adapts contexts as the coding progresses.
o Packets are formed by combining the entropy coder outputs from a single resolution.
o Tile parts are formed from all the packets in a given bit plane.
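
The reversible 5/3 transform named above is commonly implemented as two integer lifting steps (predict, then update). The sketch below is a one-dimensional, single-level software illustration for even-length integer signals with symmetric boundary extension. It is not the presenter's VHDL design, and the function names `fwd53` and `inv53` are hypothetical.

```python
from typing import List, Tuple


def fwd53(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible 5/3 lifting transform on an even-length signal.

    Returns (low_pass, high_pass). Uses integer arithmetic only, so the
    transform can be inverted exactly.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even-length signals only"
    half = n // 2
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = []
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]  # symmetric extension
        d.append(x[2 * k + 1] - ((x[2 * k] + right) >> 1))
    # Update step: each even sample plus a rounded quarter of the neighbouring details.
    s = []
    for k in range(half):
        d_prev = d[k - 1] if k > 0 else d[0]  # symmetric extension
        s.append(x[2 * k] + ((d_prev + d[k] + 2) >> 2))
    return s, d


def inv53(s: List[int], d: List[int]) -> List[int]:
    """Exact inverse of fwd53: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for k in range(half):
        d_prev = d[k - 1] if k > 0 else d[0]
        x[2 * k] = s[k] - ((d_prev + d[k] + 2) >> 2)
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + ((x[2 * k] + right) >> 1)
    return x


samples = [3, -7, 12, 0, 5, 5, -2, 9]
low, high = fwd53(samples)
assert inv53(low, high) == samples  # lossless round trip
```

Because every step uses only integer adds and shifts, the transform suits lossless coding and maps naturally onto FPGA logic. A 2D transform applies the same steps along rows and then columns.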

15 JPEG2000 Architecture Drivers
o To achieve high data rates, the processing must be parallelized as much as possible.
o The “tall pole in the tent” is the arithmetic coding, because the coding of a single data bit with its context can take several clock cycles.
o Significance propagation coding is also a challenge, because each coefficient must be accessed many times, as each bit plane is processed.
o Other operations, such as wavelet transform, code block loading, and packet generation are much more efficient, and require fewer parallel paths.
o A pipelined architecture with many entropy coders in parallel was used to achieve the required throughput.

16 Architecture Description
o Processes 256x256 tiles.
o Pipelined architecture, using separate external memories for image, tile, and compressed data storage.
o 19 Entropy coders working in parallel to improve throughput, one for each code block.
    - 64x64 code blocks.
o FIFO buffering between the stages improves data flow efficiency.
o A rolling wavelet transform is used to reduce memory accesses and improve efficiency.
o Entropy coder outputs are formed into layers, giving each tile a progressive output format.
o Tile parts are interleaved as the image tiles are processed.
o Performs lossy or lossless compression.
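
The figure of 19 entropy coders, one per code block, can be related to the tile and code block sizes by counting code blocks per subband. The slides do not state the number of wavelet decomposition levels. The sketch below shows that three levels reproduces the count of 19 for a 256x256 tile with 64x64 code blocks; that level count is an inference, and the function name is hypothetical.

```python
import math


def code_blocks_per_tile(tile_size: int = 256, block_size: int = 64, levels: int = 3) -> int:
    """Count code blocks in one square tile after a dyadic wavelet decomposition.

    Each level produces three detail subbands (HL, LH, HH) at half the previous
    side length; the final LL subband is added at the end. A subband smaller
    than a code block still needs one code block.
    """
    total = 0
    side = tile_size
    for _ in range(levels):
        side = math.ceil(side / 2)
        blocks_per_subband = math.ceil(side / block_size) ** 2
        total += 3 * blocks_per_subband
    total += math.ceil(side / block_size) ** 2  # remaining LL subband
    return total


# Level 1: 3 subbands of 128x128 -> 12 blocks; levels 2 and 3: 3 blocks each; LL: 1 block.
print(code_blocks_per_tile(256, 64, 3))
```

Under the same assumptions, two levels would give 16 code blocks and four levels 22, so only three levels matches the coder count on the slide.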

17 NTRCC-R Implementation Results
o The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R.
    - Lossless or Lossy compression.
    - Data precision up to 13 bits.
o Simulation and Routing Results:
    - Slices: out of 33088, 90%
    - Block RAMs: 148 out of 328, 45%
    - Max system clock ~43 MHz without optimization.
o Hardware Throughput:
    - ~140 Mbps w/ 33 MHz clock (depending on image).
    - ~180 Mbps w/ 43 MHz clock.
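
The two measured throughput points imply a roughly constant number of image bits processed per clock cycle. That number can be used to project what clock scaling alone would deliver at the 66 MHz goal mentioned elsewhere in the presentation. The linear scaling is an assumption for illustration, and the function names are hypothetical.

```python
def bits_per_clock(throughput_mbps: float, clock_mhz: float) -> float:
    """Average image bits processed per clock cycle."""
    return throughput_mbps / clock_mhz


def projected_mbps(bits_per_cycle: float, clock_mhz: float) -> float:
    """Throughput if the per-cycle efficiency stays constant at a new clock rate."""
    return bits_per_cycle * clock_mhz


at_33 = bits_per_clock(140, 33)  # about 4.2 bits per clock
at_43 = bits_per_clock(180, 43)  # about 4.2 bits per clock

# Clock scaling alone to 66 MHz would give roughly 280 Mbps.
print(round(projected_mbps(at_33, 66)))

# Reaching 600 Mbps at 66 MHz would need about 9.1 bits per clock.
print(round(bits_per_clock(600, 66), 1))
```

On these assumptions, the 600 Mbps per device target needs roughly twice the per-cycle efficiency as well as the higher clock, which is consistent with the pipelining improvements the presentation plans.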

18 JPEG2000 Floorplan
o The Pro 70 Device is quite full!

19 Planned Improvements
o Optimize design to hit 66 MHz.
    - Un-optimized design will operate at up to 43 MHz.
    - Use of asynchronous FIFOs will allow optimal clocking of various parts of the design.
o Improve pipelining of code block loader and wavelet transform.
    - Allow “autonomous” operation of each stage, so that operations take place as soon as input data and output buffers are ready.
o Make use of additional QDR SRAMs available to each coprocessor by creating separate buffers for wavelet transform and packetizer output.
    - NTRCC has 4 QDR memories for each coprocessor.
o Arithmetic coder bypass.
    - Arithmetic coder requires > 2 cycles per bit coded, on average.
o 9/7 wavelet transform with quantization.
    - Use of the 9/7 wavelet results in better SNR and max error performance for lossy compression.
o Add RapidIO serial interface to Network Interface Chip (NIC).

20 Conclusions
o The JPEG2000 core is expected to provide a valuable option for satellite imagery systems.
    - Compression will result in a dramatic improvement in system performance.
    - Lossless compression will allow ~70% more image data to be stored and downlinked by a system.
    - Lossy compression will allow even greater improvements.
o NTRCC hardware is an excellent platform for the compressor.
    - High bandwidth interconnect and I/O (several Gbps).
    - High bandwidth external memories.
    - Excellent processing capability with the Virtex-II Pro devices.
o The sky’s the limit!
    - Target rate of 600 Mbps per device appears to be a realistic goal.
    - Some improvements are left to be made to the clock rate and pipelining of the design.