Graphics on Key, by Eyal Sarfati and Eran Gilat. Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk.



Overview
- Motivation
- Algorithm
- Improvements
- Software simulation
- GPU VLSI design
- GoK system design
- Challenges and contributions
- Summary
- Demo

Motivation
- The GPU (Graphics Processing Unit) is the key to high performance in graphics applications (games, flight simulations, virtual worlds, etc.).
- Mobile systems (e.g. cellphones, handheld devices) lack a suitable GPU.
- GoK: an external GPU with a standard interface can significantly enhance the graphics performance of systems with limited computing resources.

Project Goal
Develop a low-cost prototype which performs 3D animation and displays it on a 2D RGB screen.
[Diagram: Host, USB, GoK, VGA]
- Standard interface for data input/output
- Provides real-time graphics processing to systems with limited computing resources

Project Stages
- Software design
  - Implementing the algorithm in Matlab
  - Simulation and analysis
  - Adaptation of the algorithm to hardware
- ASIC design
  - Architectural design
  - Implementation in VHDL
  - Synthesis and layout
- System design
  - Implementation of system blocks, including SW and HW interfaces
  - System integration
  - System performance enhancement

Graphic Animation
Elementary operations:
- Translation
- Rotation
- Scaling

3D Data Representation
- Series of triangles
- Each triangle is represented by:
  - 3 vertices
  - 3 RGB vectors
  - 1 normal vector
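
The three elementary operations are commonly written as 4x4 homogeneous matrices, so that any sequence of them collapses into one matrix product. The sketch below illustrates that standard formulation only; the function names are hypothetical, rotation is shown about the z axis alone, and nothing here is taken from the project's Matlab or VHDL code.

```python
import math

def translation(tx, ty, tz):
    """4x4 homogeneous matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    """4x4 homogeneous matrix that scales each axis independently."""
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rotation_z(angle):
    """4x4 homogeneous matrix for a rotation of `angle` radians about z."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def compose(a, b):
    """Matrix product a*b: applying the result performs b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def apply(m, p):
    """Apply matrix m to the 3D point p and return the transformed point."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return (out[0], out[1], out[2])
```

For example, `apply(compose(translation(1, 0, 0), scaling(2, 2, 2)), (1, 1, 1))` scales the point first and then translates it.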

Rendering Algorithm Stages [Wimer]
- Elementary transformations. Four transformations are executed for every triangle:
  - Three matrix multiplications for vertex coordinates
  - One matrix multiplication for the normal vector
- Projection of triangles on the viewing plane, composed of 2 stages:
  - Transformation from 3D to 2D (projection)
  - Transformation from real coordinates to screen coordinates
- Determine potential triangle visibility
  - Hidden triangles are discarded on the basis of their normal direction
  - This detection reduces the processed data by 50%
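
The visibility test and the two projection stages can be sketched as follows. Everything specific here is an assumption made for illustration: the viewer looks along +z, the projection is a simple perspective divide with focal distance `d`, and the viewing window spans [-1, 1] on both axes. The slides do not state which conventions the project used.

```python
def is_potentially_visible(normal, view_dir=(0.0, 0.0, 1.0)):
    """Back-face test: keep a triangle only if its normal points toward the viewer.

    Assumes the viewer looks along view_dir, so a front-facing normal has a
    negative dot product with it.
    """
    return sum(n * v for n, v in zip(normal, view_dir)) < 0

def project(p, d=1.0):
    """Stage 1: perspective projection from 3D to the 2D viewing plane (z > 0)."""
    x, y, z = p
    return (d * x / z, d * y / z)

def to_screen(p, width, height):
    """Stage 2: map real coordinates in [-1, 1] to integer pixel coordinates."""
    x, y = p
    sx = int((x + 1.0) / 2.0 * (width - 1) + 0.5)
    sy = int((1.0 - y) / 2.0 * (height - 1) + 0.5)  # screen y grows downward
    return (sx, sy)
```

For a closed object, roughly half of the triangles face away from the viewer at any time, which is consistent with the 50% reduction quoted above.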

Algorithm Details
- Determine the projected triangle's visibility
  - Scan all points and compare their depth with the depth of previously saved points
  - Scan in 3D space using the inverse transformation
- Color of visible points
  - Compute the pixel color from the RGB vector and the current lighting vector
  - Use a mathematical average for all the pixels inside a triangle rather than linear interpolation
- To increase efficiency:
  - Split triangles
  - Increase parallelism
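
One way to read the averaged color computation is as flat shading: a single color per triangle, obtained by averaging the three vertex RGB vectors and scaling by a lighting term. The sketch below assumes a simple diffuse (Lambert-style) term with unit-length normal and light vectors; the lighting model and the function name are assumptions, not details given in the slides.

```python
def flat_color(vertex_rgbs, normal, light_dir):
    """Return one RGB color for the whole triangle.

    vertex_rgbs: three (r, g, b) tuples with components in 0..255.
    normal, light_dir: unit vectors; light_dir points from the light
    toward the surface (an assumed convention).
    """
    # Average the three vertex colors instead of interpolating per pixel.
    avg = [sum(rgb[i] for rgb in vertex_rgbs) / 3 for i in range(3)]
    # Diffuse intensity: 1 when the surface faces the light, 0 when it faces away.
    intensity = max(0.0, -sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(min(255, int(round(a * intensity))) for a in avg)
```

Because the result is constant across the triangle, the per-pixel work reduces to the depth test and a store.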

MATLAB Simulation
- Matlab implementation of the rendering algorithm [Wimer]
- Run time on ARM-based processor: 16 seconds
- Run time on Matlab-based software: 1 hour

System Overview
[Diagrams: GoK Concept and GoK Prototype (Host, USB, GoK, VGA)]

GPU Architecture Design Principles
- Design goal: maximize throughput
- Use a parallel architecture to overcome bottlenecks
- Minimize expensive memory accesses
- Optimize accuracy for fast calculations

GPU Architecture
[Block diagram: Prefetch & Visibility Detection Unit; 3D Transformation Unit; Triangle pre-processor; FIFO task queue; Scheduler Unit; Rasterization units 0, 1, ..., 10; Z-Buffer Arbiter Snooping Cache; RGB Arbiter Snooping Cache; memories: Triangles, RGB Frame, Z-Buffer]

Triangle Transformation and Pre-processor
[Block diagram of the 3D Transformation Unit and the Triangle pre-processor: Vertex / Normal Transform; Project; RGB Color Set; Sort coordinates according to y axis; Triangle slopes calculation; Create 2 half triangles; D calculation; -1 / C; FIFO]
Note: Early elimination of invisible triangles reduces load by 50%!
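
The pre-processor steps "sort by y", "slopes calculation" and "create 2 half triangles" match the common technique of cutting a triangle at its middle vertex, so that each half has one horizontal edge and can be scanned line by line. The sketch below works on 2D screen vertices `(x, y)`; the function names and the tuple layout are hypothetical and only illustrate the general technique.

```python
def edge_slope(a, b):
    """Change in x per unit step in y along the edge a -> b (a[1] != b[1])."""
    return (b[0] - a[0]) / (b[1] - a[1])

def split_triangle(v0, v1, v2):
    """Split a triangle into at most two halves that share a horizontal edge.

    Returns a list of 3-vertex tuples; a degenerate (zero-height) triangle
    yields an empty list.
    """
    top, mid, bot = sorted((v0, v1, v2), key=lambda v: v[1])  # sort by y
    if top[1] == bot[1]:
        return []
    # Point on the long edge top -> bot at the same height as the middle vertex.
    t = (mid[1] - top[1]) / (bot[1] - top[1])
    cut = (top[0] + t * (bot[0] - top[0]), mid[1])
    halves = []
    if mid[1] != top[1]:
        halves.append((top, mid, cut))   # half whose flat edge is mid-cut, apex at top
    if mid[1] != bot[1]:
        halves.append((mid, cut, bot))   # half whose flat edge is mid-cut, apex at bot
    return halves
```

Each half can then be handed to a separate rasterization unit, which is where the extra parallelism comes from.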


FIFO Task Queue
- Stalls the input stream to prevent overflow by means of a backward communication protocol
- The backward communication propagates back to the Prefetch and Visibility Detection Unit

Scheduler Unit
- Target: maximize throughput
  - Minimize idle time of the rasterization units
  - Immediately issue the next half triangle to a unit when it completes processing the previous one
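
The interplay between the bounded queue and the scheduler can be modelled in software as below. This is a behavioural sketch only: the class and function names are hypothetical, the `ready()` flag stands in for the backward (stall) signal, and the real design is hardware whose protocol details are not given in the slides.

```python
from collections import deque

class TaskQueue:
    """Bounded FIFO of half-triangle tasks with a backward 'ready' signal."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def ready(self):
        """Backward signal: False tells the producer to stall."""
        return len(self.items) < self.capacity

    def push(self, task):
        """Accept a task if there is room; return False if the producer must stall."""
        if not self.ready():
            return False
        self.items.append(task)
        return True

    def pop(self):
        """Return the oldest task, or None when the queue is empty."""
        return self.items.popleft() if self.items else None

def schedule(queue, unit_busy):
    """Issue queued tasks to idle rasterization units.

    unit_busy is a list of booleans, one per unit; it is updated in place.
    Returns the list of (unit_index, task) pairs issued in this step.
    """
    issued = []
    for unit, busy in enumerate(unit_busy):
        if busy:
            continue
        task = queue.pop()
        if task is None:
            break
        unit_busy[unit] = True
        issued.append((unit, task))
    return issued
```

Calling `schedule` every time a unit reports completion keeps idle time low, and draining the queue re-asserts `ready()` so the upstream stages can resume.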


Rasterization Units
For each point of each half triangle:
1. Calculate the new Z value
2. Read the stored Z value and compare it with the calculated one
3. Update both the Z-Buffer and the RGB Frame Buffer accordingly
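
The three steps amount to a standard Z-buffer test. The sketch below assumes the depth of a point is obtained from the triangle's plane equation A*x + B*y + C*z + D = 0, which is one plausible reading of the pre-processor's "D calculation" and "-1 / C" blocks, and assumes that a smaller z means closer to the viewer. All names are hypothetical.

```python
def plane_d(normal, vertex):
    """D term of the plane A*x + B*y + C*z + D = 0 through `vertex`."""
    return -sum(n * v for n, v in zip(normal, vertex))

def plane_depth(normal, d, x, y):
    """Depth z of the plane at (x, y); requires C != 0."""
    a, b, c = normal
    inv = -1.0 / c                      # can be computed once per triangle
    return (a * x + b * y + d) * inv

def rasterize_points(points, normal, d, color, zbuf, frame):
    """Run the three-step depth test for every (x, y) in `points`."""
    for x, y in points:
        z = plane_depth(normal, d, x, y)   # 1. calculate the new Z value
        if z < zbuf[y][x]:                 # 2. compare with the stored Z value
            zbuf[y][x] = z                 # 3. update Z-Buffer and frame buffer
            frame[y][x] = color
```

Precomputing `-1 / C` once per triangle replaces a division per pixel with a multiplication, which matters in hardware.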

Multi Core Architecture Problem
A multi-core architecture with shared memory must cope with:
1. Efficient management of multiple requests to the shared memory
2. Guaranteeing data coherency
Solution: Arbiter Snooping Multi Cache
[Diagram: Rasterization units 0, 1, ..., 10 sharing the RGB Frame and Z-Buffer memories]

Arbiter Snooping Multi Cache (ASMC)
- Reduce memory access time -> cache memory
- Simultaneous multiple memory access requests -> arbiter for efficient memory access management
- Data coherency -> add a snooping mechanism to the cache to guarantee data coherency
- Deadlock:
  - Using the snooping mechanism
  - Using a watchdog mechanism
[Diagram: Rasterization units 0, 1, ..., 10; Snooping Multi-Cache; Arbiter; Shared Memory]
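
A small software model can show how the three ingredients fit together. The policies chosen here are assumptions for illustration, since the slides do not name them: the caches are write-through, snooping is modelled as write-invalidate, and the arbiter grants requests round-robin. The watchdog is not modelled.

```python
class SnoopingCache:
    """Per-unit cache in front of a shared memory (a dict of addr -> value)."""

    def __init__(self, memory, peers):
        self.memory = memory
        self.lines = {}          # locally cached addresses
        self.peers = peers       # shared list of all caches on the bus
        peers.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value                # write-through (assumed policy)
        for peer in self.peers:                  # snoop: invalidate stale copies
            if peer is not self:
                peer.lines.pop(addr, None)

def round_robin_grant(requests, last):
    """Pick the next requester after index `last`; None if nobody requests."""
    n = len(requests)
    for step in range(1, n + 1):
        idx = (last + step) % n
        if requests[idx]:
            return idx
    return None
```

Without the invalidation loop in `write`, a unit that had cached an old Z value could wrongly overwrite a closer pixel written by another unit.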

GPU ASIC Implementation
- Technology: 65 nm CMOS, 8LM
- Clock frequency: 300 MHz
- Core area: 2.25 mm^2
- Power consumption: approx. … at 300 MHz
- USB host can supply up to 400 mW

GoK System Requirements
Input: the host sends the data to the GoK in two stages:
- Initialization: a list of triangles is sent to the GoK
- Animation: a transformation for all triangles is sent to the GoK every 40 msec (25 FPS)
Output: real-time object animation at:
- …x120 pixels resolution
- …,000 triangles/sec
- … frames/sec

System Overview - SoPC
[Block diagram: Host; USB; FPGA; System Controller; Communication Bus; USB Controller; Memory Controller; VGA Controller; Processor; ASMC; GPU]

Summary

Challenges
- Matlab implementation and simulation for detailed investigation and evaluation of the algorithm
- VLSI design and implementation of an efficient architecture (with maximum parallelism) for the GPU algorithm
- Real-time embedded system design on FPGA
  - NIOS II, USB 1.1, DDR2, VGA, Avalon Bus, software drivers & code
  - GPU integration in the system
  - Modification of the USB 1.1 driver for acceptable reliability of data transfer
  - Modification of the standard VGA interface core to enable the 100 MHz GPU core to interface with the 50 MHz VGA unit

Main Contributions
- Enhancement of the algorithm for increased performance
  - Early elimination of invisible triangles: 50% computation reduction
  - Splitting of triangles to reduce computational complexity and increase parallelism
  - Simplification of pixel color computation
  - Pre-processing of the triangle data for fast rasterization computation
- Efficient scheduling of half triangles to rasterization units
- Design and implementation of the arbiter snooping multi cache
  - Shared memory management, cache memory, data coherency
- Double memory buffer for continuous motion of animation
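
The double memory buffer in the last item follows the usual double-buffering pattern: the display reads one frame while the next is drawn into the other, and the two swap roles once the new frame is complete. The class below is a minimal software sketch with hypothetical names; how the project triggers the swap is not described in the slides.

```python
class DoubleBuffer:
    """Two frame buffers: one being displayed, one being drawn into."""

    def __init__(self, width, height, fill=(0, 0, 0)):
        self.buffers = [
            [[fill] * width for _ in range(height)] for _ in range(2)
        ]
        self.front = 0                    # index of the buffer on screen

    @property
    def display(self):
        """Buffer currently read by the display controller."""
        return self.buffers[self.front]

    @property
    def back(self):
        """Buffer the renderer writes the next frame into."""
        return self.buffers[1 - self.front]

    def swap(self):
        """Make the freshly drawn frame visible; call once per finished frame."""
        self.front = 1 - self.front
```

Because the viewer never sees a partially drawn frame, the animation appears continuous even when drawing takes most of the frame period.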

The Bottom Line
- Implementation of a "Graphics on Key" that enhances the graphics performance of low-power, low-cost gadgets
- The device performs the required computations and displays the animation on screen
- Project required specifications: 120,… …x120 resolution.
- Achieved performance: 1,000,… …x480 resolution. Approx. 50 MHz

Demonstration