Video manipulation algorithm on ZYNQ, Part B. Supervisor: Ina Rivkin.

Presentation transcript:

Video manipulation algorithm on ZYNQ, Part B. Supervisor: Ina Rivkin.

Motivation. Our goal is to build an embedded system that can receive video, process it in hardware and in software, and finally send it out to a monitor. The system is based on the Xilinx ZYNQ device.

Project goal. Add the ability to receive and send video signals to and from the embedded system (the ZedBoard with an FMC module: HDMI IN, HDMI OUT). Add the ability to process the video signals in hardware, in software, and in both combined.

Background: the ZYNQ component. The board we are working on is called the ZedBoard. The main device on the board is the ZYNQ, which consists of two main parts: the FPGA (programmable logic) and the dual-core ARM processor. Together we consider this an embedded system.

The HDMI Input/Output FMC Module. The FMC module enables us to receive and send the HDMI video data. It connects to the FMC carrier on the ZedBoard and provides the following interfaces: 1) HDMI input; 2) HDMI output; 3) the interface to the ZedBoard.

Work environment. Hardware design: PlanAhead 14.4 (for Xilinx FPGA design) and Xilinx Platform Studio (XPS). Software design and debugging: Xilinx Software Development Kit (SDK).

The previous project, Part A:  We built the embedded system, based on the ZYNQ device and the FMC module.  The system was able to receive and send video signals.  Input video signals passed through the FPGA components and through the ARM processor.  Video processing was done by both hardware and software.  The video manipulation was simple and done in software.

The full system, Part A (block diagram): a hardware block containing the FMC interface (HDMI in/out, video in/out), AXI4S_in with VTC_0, the VDMA with a frame buffer in DDR, and AXI4S_out with VTC_1; and a software block containing the video detector, video resolution, frame buffer, and video generator.

Project Part B:  Use the embedded system we built in the previous project.  Perform complex processing in software.  Perform complex processing in hardware.  Combine the two kinds of processing into a single project.

Video color space.  Our system works with the “luma” representation, YCbCr, instead of RGB, in order to be more efficient.  RGB pixel -> 32 bits.  YCbCr pixel -> 16 bits.

RGB video format.  The RGB format uses 32 bits per pixel. Each pixel is composed of four 8-bit channels; the first three are the color channels, RED, GREEN and BLUE.  The fourth channel is the intensity (transparency) channel (0 = transparent pixel, 1 = opaque).  The RGB format has two main representations, alpha and beta: alpha is as described above, while beta stores the RGB channels already multiplied by the intensity channel (premultiplied alpha) to achieve higher efficiency.
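To make the difference concrete, here is a small C sketch, a hedged illustration only, converting a straight-alpha pixel to the premultiplied (beta) form; the type and function names are ours, not the project's:

```c
#include <stdint.h>

/* Straight-alpha RGBA pixel: 8 bits per channel, alpha 0 = transparent,
 * 255 = opaque (the slide's 0/1 scaled to 8 bits). */
typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Premultiply: the color channels are multiplied by alpha once up front,
 * so later blending needs fewer multiplications per pixel. */
rgba_t premultiply(rgba_t p)
{
    rgba_t q = {
        (uint8_t)(p.r * p.a / 255),
        (uint8_t)(p.g * p.a / 255),
        (uint8_t)(p.b * p.a / 255),
        p.a
    };
    return q;
}
```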

YCbCr video format.  This format is an encoding of the RGB format; the final pixel we see on screen depends on the interpretation this format is given (by the hardware video-out component).  The 8 LSBs are the Y component, the intensity (luminance).  The 8 MSBs are the Cb and Cr components, 4 bits each; these are the color (chroma) components.
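To make the layout concrete, a minimal C sketch for unpacking and packing this 16-bit pixel; note that the slide does not say which nibble holds Cb and which holds Cr, so Cb in bits [11:8] and Cr in bits [15:12] is our assumption, used in all the sketches below:

```c
#include <stdint.h>

/* Assumed layout: Y in bits [7:0], Cb in bits [11:8], Cr in bits [15:12]. */
static inline uint8_t pix_y (uint16_t px) { return  px        & 0xFF; }
static inline uint8_t pix_cb(uint16_t px) { return (px >>  8) & 0x0F; }
static inline uint8_t pix_cr(uint16_t px) { return (px >> 12) & 0x0F; }

static inline uint16_t pack_pix(uint8_t y, uint8_t cb, uint8_t cr)
{
    return (uint16_t)(((cr & 0x0F) << 12) | ((cb & 0x0F) << 8) | y);
}
```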

How does it work?  We build an (x, y) grid on the standard color map; the x and y axes are controlled by the Cb and Cr components, so given Cb and Cr we know the color.  We then add a z axis, controlled by the Y (luminance) component, to set the brightness of the pixel.

How it looks.  Cb and Cr are the X and Y axes (respectively), as shown in the figure below.  The Z axis (the luminance component) is directed into the page.

In conclusion.  The way our eye perceives color allows us to modify and manipulate pixels in the much smaller luma format (16 bits) instead of the larger 32-bit RGB format.  This yields higher efficiency when manipulating pixels (or just streaming video frames), which is why many displays use the luma format.  For more accurate manipulation it is possible to use the RGB format, but then the hardware components have to be changed. Note: Xilinx offers suitable IP cores; as an example, we will show the hdmi_out_rgb core.

RGB hardware components (block diagram): the timing control; a block that converts data from the AXI4-Stream video protocol interface to the video-domain interface; a block used to convert between different video formats (this is the rgb -> luma interpretation); and the interpretation from signals to pixels.

SOFTWARE

Software manipulation. Flipping a switch freezes a video frame, and flipping it back returns to streaming. While frames are frozen we can manipulate them as much as we want; we chose to manipulate the colors. A sketch of reading the switches follows.
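As a rough sketch of how the switches could be polled from the ARM using Xilinx's standard AXI GPIO driver (assuming the switches are wired to an AXI GPIO instance; XPAR_SWITCHES_DEVICE_ID is a placeholder name, not the project's actual identifier):

```c
#include "xgpio.h"
#include "xparameters.h"

static XGpio switches;

/* Initialize the AXI GPIO instance the switches are assumed to sit on. */
int switches_init(void)
{
    return XGpio_Initialize(&switches, XPAR_SWITCHES_DEVICE_ID);
}

/* Read the current switch state; one bit per switch on channel 1. */
u32 switches_read(void)
{
    return XGpio_DiscreteRead(&switches, 1);
}
```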

Microcontroller manipulation

Software block diagram: input video signals -> video detector -> video resolution -> frame buffer -> manipulation block -> video generator -> output video signals.

Software processing results. Each switch is responsible for one manipulation.

How it looks. Frames enter and leave the frame buffer: frames in, frames out.

Freeze frame and process it. The frames in the frame buffer are being processed, newly arriving frames are thrown away, and the output stays frozen.

Processed frames sent to display. The processed frames leave the frozen frame buffer to the display, arriving frames are still thrown away, and afterwards no frames remain in the frame buffer.

Software architecture.  Inside the frame buffer we have up to 7 frames; each frame consists of 1920×1080 pixels.  We iterate over the frames, and then over each pixel, to manipulate its data (see the loop sketch below).  We built four manipulations (one per switch): zeroing each chroma stream (Cr, Cb), swapping the two chroma streams, and swapping the intensity and color channels (Y, CbCr).
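A minimal sketch of that loop structure, assuming the frame buffer is visible to the CPU as a flat array of 16-bit YCbCr pixels (identifiers are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_W 1920
#define FRAME_H 1080

/* Apply one pixel manipulation to every pixel of every buffered frame.
 * manip is one of the four switch-selected operations shown later. */
void process_buffer(volatile uint16_t *frame_buf, int num_frames,
                    uint16_t (*manip)(uint16_t))
{
    for (int f = 0; f < num_frames; f++) {
        volatile uint16_t *frame = frame_buf + (size_t)f * FRAME_W * FRAME_H;
        for (size_t i = 0; i < (size_t)FRAME_W * FRAME_H; i++)
            frame[i] = manip(frame[i]);
    }
}
```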

How it looks. We have our frames inside the frame buffer; we iterate over the frames in series, and when a frame is selected it is sent to the manipulation process.

Manipulation process. We iterate over each pixel in the manipulation process. Note: the picture is not to scale.

Pixel manipulation. Our 16-bit pixel representation: an 8-bit Y intensity channel, a 4-bit Cb color channel, and a 4-bit Cr color channel. At this point we have full access to the pixel information, and we can manipulate the pixel by changing the information represented in these bits.

1st manipulation (on the 16-bit pixel): the Cb channel is set to all zeroes; the Y and Cr channels are untouched.

2nd manipulation (on the 16-bit pixel): the Cr channel is set to all zeroes; the Y and Cb channels are untouched.

3rd manipulation (on the 16-bit pixel): the Cb and Cr channels are swapped; the Y channel is untouched.

4th manipulation (on the 16-bit pixel): the Cb and Cr channels are swapped with the Y channel (see the bit-level sketch below).
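All four manipulations reduce to simple bit operations on the 16-bit pixel. A hedged C sketch, using the nibble layout assumed earlier (Y in [7:0], Cb in [11:8], Cr in [15:12]):

```c
#include <stdint.h>

uint16_t manip1(uint16_t px) { return px & 0xF0FF; }  /* zero Cb         */
uint16_t manip2(uint16_t px) { return px & 0x0FFF; }  /* zero Cr         */

uint16_t manip3(uint16_t px)                          /* swap Cb <-> Cr  */
{
    return (uint16_t)((px & 0x00FF) |
                      ((px >> 4) & 0x0F00) |          /* old Cr -> Cb    */
                      ((px << 4) & 0xF000));          /* old Cb -> Cr    */
}

uint16_t manip4(uint16_t px)                          /* swap Y <-> CbCr */
{
    return (uint16_t)(((px >> 8) & 0x00FF) | ((px << 8) & 0xFF00));
}
```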

Processing speed.  Our system clock: … MHz.  AXI4 bandwidth: 32-bit I/O.  A pixel in the luma (YCbCr) representation is 16 bits.  So in total, two pixels enter and leave the CPU every clock cycle.  Our frame size is 1920×1080.  The resulting frame frequency is …
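The clock value was lost from the transcript; assuming 150 MHz purely for illustration (it is consistent with the 1/150 s per-frame figure quoted two slides later), the rates work out as follows:

```c
#include <stdio.h>

int main(void)
{
    const double clk_hz         = 150e6;           /* assumed, not from slide */
    const double pixels_per_clk = 2.0;             /* 32-bit AXI4 / 16-bit px */
    const double frame_pixels   = 1920.0 * 1080.0;

    double pixel_rate = clk_hz * pixels_per_clk;   /* pixels per second */
    double frame_rate = pixel_rate / frame_pixels; /* frames per second */
    printf("pixel rate %.0f px/s, frame rate %.1f fps\n",
           pixel_rate, frame_rate);                /* ~144.7 fps */
    return 0;
}
```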

Impossible to manipulate video on the fly.  Trying to do so amounts to implementing a single-core GPU.  A more elaborate explanation: each frame has 1920×1080 pixels, and the buffer holds up to 15 frames with a minimum of 3, so the total number of pixels to manipulate is about 6e6.  Software processing iterates over every pixel, so even ignoring architecture penalties such as a branch misprediction at the end of each loop (about 6e3 outer loops), and even accounting for a superscalar pipeline, we are still left with on the order of 1e6 iterations to perform in a very short time.

Impossible to manipulate video on the fly (cont.).  The time between every two frames is (1/150) s.  Our CPU runs at ~1 GHz, so each cycle takes ~1 ns.  Each iteration takes at least 100 cycles (again assuming the best case, e.g. all data and instructions resident in the L1 cache), so manipulating the whole buffer would take almost 1 s, while we have only 1/150 s.
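Putting the slide's own numbers together gives the budget check explicitly:

```c
#include <stdio.h>

int main(void)
{
    const double pixels        = 6e6;    /* ~3 frames of 1920x1080   */
    const double cycles_per_px = 100.0;  /* best-case per-pixel cost */
    const double sec_per_cycle = 1e-9;   /* ~1 GHz CPU               */

    double work_s   = pixels * cycles_per_px * sec_per_cycle; /* ~0.6 s  */
    double budget_s = 1.0 / 150.0;                            /* ~6.7 ms */
    printf("needed %.2f s, available %.4f s: %.0fx too slow\n",
           work_s, budget_s, work_s / budget_s);
    return 0;
}
```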

Problems & solutions.  Software can manipulate only a single frame at a time.  It needs to stop the streaming.  This may still be useful for some applications.  For processing HDMI video on the fly there is only one solution: use hardware, the FPGA.

HARDWARE

Hardware manipulation. The hardware achieves real-time manipulation of the video stream. The manipulation we chose to implement is color manipulation, based on our software manipulation.

Hardware manipulation.  As we saw, it is impossible to process video in real time with software alone running on our microcontroller, so to achieve our goal we have to use hardware.  We would like to perform in hardware a manipulation similar to the one we attempted in software.  The place in the embedded system we chose for the hardware manipulation is the HDMI_OUT block.

HDMI_OUT block manipulation

HDMI_OUT block manipulation.  As in software, in hardware we also have access to the whole pixel information (its bits).  Our manipulation is done by changing the wiring of each pixel, using three video-processing algorithms: Luma2RGB, RGB2RGB grey scale, and RGB2Luma.  Because the manipulation is pure rewiring, we can achieve this kind of color processing in real time.

How it looks. Pixels arrive at the HDMI_OUT buffer block: the video_data signal [0-15] arriving from the AXI block carries the luma data and the chroma data (signals shown in hexadecimal), and the video_data_d signal [0-15] leaves the hdmi_out block.

Adding the manipulation: the video_data signal [0-15] arriving from the AXI block now passes through a manipulation block in front of the HDMI_OUT buffer block, and the video_data_d signal [0-15] leaves the hdmi_out block as before.

Manipulation block: a colored pixel passes through luma2rgb, then rgb2rgb grey scale, then rgb2luma, and comes out as a greyscale pixel.

Luma 2 RGB. We use the standard (BT.601) equations to perform the transformation:  R' = 1.164(Y - 16) + 1.596(Cr - 128)  G' = 1.164(Y - 16) - 0.813(Cr - 128) - 0.391(Cb - 128)  B' = 1.164(Y - 16) + 2.018(Cb - 128)

RGB 2 RGB grey scale. We use the standard equations to perform the transformation:  R = 0.3*R' + 0.59*G' + 0.11*B'  G = 0.3*R' + 0.59*G' + 0.11*B'  B = 0.3*R' + 0.59*G' + 0.11*B' (all three channels get the same weighted luminance, which produces grey)

RGB 2 Luma. We use the standard equations to perform the transformation:  Y = 0.299*R + 0.587*G + 0.114*B  Cb = (B - Y)*0.564  Cr = (R - Y)*0.713
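For reference, a floating-point C model of the three conversions above; the FPGA implements them as fixed-point wiring, and the +128 chroma offset in rgb2luma is our assumption, since the slide omits it:

```c
#include <stdint.h>

static uint8_t clamp8(double v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

void luma2rgb(uint8_t y, uint8_t cb, uint8_t cr,
              uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = clamp8(1.164 * (y - 16) + 1.596 * (cr - 128));
    *g = clamp8(1.164 * (y - 16) - 0.813 * (cr - 128) - 0.391 * (cb - 128));
    *b = clamp8(1.164 * (y - 16) + 2.018 * (cb - 128));
}

void rgb2grey(uint8_t r, uint8_t g, uint8_t b, uint8_t *grey)
{
    *grey = clamp8(0.3 * r + 0.59 * g + 0.11 * b);  /* R = G = B = grey */
}

void rgb2luma(uint8_t r, uint8_t g, uint8_t b,
              uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    double yy = 0.299 * r + 0.587 * g + 0.114 * b;
    *y  = clamp8(yy);
    *cb = clamp8((b - yy) * 0.564 + 128.0);  /* offset assumed */
    *cr = clamp8((r - yy) * 0.713 + 128.0);
}
```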

Hardware processing results

Software vs. hardware.  Software: can manipulate only a single frame at a time; needs to stop the streaming; not real-time processing.  Hardware: manipulates the arriving frames; does not need to stop the streaming; real-time processing.

In conclusion. In our project we learned the right way to work with embedded systems and the stronger and weaker sides of each component. For example, we learned that to achieve real-time processing of the data itself we must use the hardware components of the system, while for stream processing or data parking we can use the microcontroller and software processing.