Published by Collin Walters. Modified over 9 years ago.
1
Graphics on Key
By Eyal Sarfati and Eran Gilat
Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk
2
Overview
Motivation
Algorithm
Improvements
Software simulation
GPU VLSI design
GoK system design
Challenges and contributions
Summary
Demo
3
Motivation
The GPU (Graphics Processing Unit) is the key to high performance in graphics applications (games, flight simulations, virtual worlds, etc.).
Mobile systems (e.g. cellphones, handheld devices) lack a suitable GPU.
GoK: an external GPU with a standard interface can significantly enhance the graphics performance of systems with limited computing resources.
4
Project Goal
Develop a low-cost prototype that performs 3D animation and displays it on a 2D RGB screen.
Standard interface for data input/output.
Provides real-time graphics processing to systems with limited computing resources.
[Diagram labels: Host, USB, GoK, VGA]
5
Project Stages
Software design: implementing the algorithm in Matlab; simulation and analysis; adaptation of the algorithm to hardware.
ASIC design: architectural design; implementation in VHDL; synthesis and layout.
System design: implementation of system blocks, including SW and HW interfaces; system integration; system performance enhancement.
6
Graphic Animation
Elementary operations: translation, rotation, scaling.
3D data representation: a series of triangles.
Each triangle is represented by 3 vertices, 3 RGB vectors and 1 normal vector.
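The triangle record and the three elementary operations can be sketched as follows. This is only an illustration, not the project's Matlab or VHDL code: the names `Triangle`, `translation`, `scaling`, `rotation_z`, `apply` and `transform_triangle` are hypothetical, the use of 4x4 homogeneous matrices is an assumption, and so is the reading of "3 RGB vectors" as one color per vertex.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat4 = list[list[float]]


@dataclass
class Triangle:
    vertices: tuple[Vec3, Vec3, Vec3]  # 3 vertices
    colors: tuple[Vec3, Vec3, Vec3]    # 3 RGB vectors (assumed: one per vertex)
    normal: Vec3                       # 1 normal vector


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx: float, sy: float, sz: float) -> Mat4:
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotation_z(angle: float) -> Mat4:
    """Rotation about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def apply(m: Mat4, p: Vec3, w: float = 1.0) -> Vec3:
    """Multiply a 4x4 matrix by (x, y, z, w) and return the first three rows."""
    v = (p[0], p[1], p[2], w)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))


def transform_triangle(m: Mat4, t: Triangle) -> Triangle:
    """Four matrix-vector products: three vertices (w=1) and one normal (w=0)."""
    vertices = tuple(apply(m, v) for v in t.vertices)
    # w=0 keeps the normal unaffected by translation; this is only valid
    # for rotations and uniform scaling (a simplifying assumption).
    normal = apply(m, t.normal, w=0.0)
    return Triangle(vertices, t.colors, normal)
```

Because all three operations are plain 4x4 matrices, a sequence such as scale, rotate, translate can be multiplied into a single matrix once per frame and then applied to every triangle.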
7
Rendering Algorithm Stages [Wimer]
Elementary transformations. Four transformations are executed for every triangle: three matrix multiplications for the vertex coordinates and one matrix multiplication for the normal vector.
Projection of triangles onto the viewing plane, composed of 2 stages: transformation from 3D to 2D (projection), then transformation from real coordinates to screen coordinates.
Determine potential triangle visibility. Hidden triangles are discarded on the basis of their normal direction. This detection reduces the processed data by 50%.
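A minimal sketch of the visibility test and the two projection stages, under stated assumptions. The slides do not give the projection type or the camera setup, so the perspective projection, the camera at the origin looking down the negative z axis, the `focal` and `half_extent` parameters, and all function names here are hypothetical.

```python
def is_potentially_visible(normal, view_dir=(0.0, 0.0, -1.0)):
    """Keep a triangle only if its normal points against the viewing direction.

    For a closed object roughly half of the triangles face away from the
    viewer, which is consistent with the 50% reduction quoted on the slide.
    """
    return sum(n * v for n, v in zip(normal, view_dir)) < 0.0


def project(point, focal=1.0):
    """Stage 1: 3D to 2D. Assumes a perspective camera and z < 0 (in front)."""
    x, y, z = point
    return (focal * x / -z, focal * y / -z)


def to_screen(q, width=160, height=120, half_extent=1.0):
    """Stage 2: real coordinates in [-half_extent, half_extent] to pixel indices.

    Pixel (0, 0) is the top-left corner, so the y axis is flipped.
    """
    u, v = q
    sx = (u / half_extent + 1.0) * 0.5 * (width - 1)
    sy = (1.0 - (v / half_extent + 1.0) * 0.5) * (height - 1)
    return (round(sx), round(sy))
```

The default 160x120 size matches the required output resolution stated later in the deck; everything else is an illustrative choice.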
8
Algorithm Details
Determine the projected triangle's visibility: scan all points and compare their depth with the depth of previously saved points; scan in 3D space using the inverse transformation.
Color of visible points: compute the pixel color from the RGB vector and the current lighting vector, using the mathematical average for all the pixels inside a triangle rather than linear interpolation.
To increase efficiency: split triangles; increase parallelism.
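The simplified color computation might look like the sketch below: one color per triangle, obtained by averaging the three RGB vectors and scaling by the lighting. The Lambertian (cosine) lighting term and the name `flat_color` are assumptions; the slides only say that the color depends on the RGB vector and the current lighting vector.

```python
import math


def flat_color(colors, normal, light):
    """Return one RGB color for every pixel of a triangle.

    colors: three (r, g, b) vectors, components in 0..255
    normal: triangle normal vector
    light:  vector pointing from the surface toward the light (assumed)
    """
    # Mathematical average of the three RGB vectors (no per-pixel interpolation).
    avg = tuple(sum(c[i] for c in colors) / 3.0 for i in range(3))

    # Assumed lighting model: cosine of the angle between normal and light,
    # clamped at zero so that surfaces facing away from the light are black.
    dot = sum(n * l for n, l in zip(normal, light))
    n_len = math.sqrt(sum(n * n for n in normal))
    l_len = math.sqrt(sum(l * l for l in light))
    intensity = max(0.0, dot / (n_len * l_len))

    return tuple(min(255, round(a * intensity)) for a in avg)
```

Computing the color once per triangle removes the per-pixel interpolation work, which is the saving the slide attributes to using an average.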
9
MATLAB Simulation
Matlab implementation of the rendering algorithm [Wimer].
Run time on an Arm-based processor: 16 seconds.
Run time on Matlab-based software: 1 hour.
10
System Overview
[Diagram panels: GoK Concept, Prototype. Labels: Host, USB, GoK, VGA]
11
GPU Architecture Design Principles
Design goal: maximize throughput.
Use a parallel architecture to overcome bottlenecks.
Minimize expensive memory accesses.
Optimize accuracy for fast calculations.
12
GPU Architecture
[Block diagram: Prefetch & Visibility Detection Unit; 3D Transformation Unit; Triangle pre-processor; FIFO task queue; Scheduler Unit; Rasterization 0 to Rasterization 10; Z-Buffer Arbiter Snooping Cache; RGB Arbiter Snooping Cache; memories: Triangles, Z-Buffer, RGB Frame]
13
Triangle Transformation and Pre-processor
[Block diagram of the 3D Transformation Unit and the Triangle pre-processor: Set Vertex / Normal; Transform; Project; RGB Color; Sort coordinates according to y axis; Triangle slopes calculation; Create 2 half triangles; D calculation; -1/C; FIFO]
Note: early elimination of invisible triangles reduces the load by 50%!
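The pre-processing steps named in the diagram can be sketched as below. Reading "D calculation" and "-1/C" as the coefficients of the triangle's plane equation A*x + B*y + C*z + D = 0 is an interpretation, not something the slides state; the function names are hypothetical.

```python
def plane_coefficients(v0, v1, v2):
    """Return (A, B, C, D) of the plane A*x + B*y + C*z + D = 0 through 3 vertices."""
    ux, uy, uz = (v1[i] - v0[i] for i in range(3))
    wx, wy, wz = (v2[i] - v0[i] for i in range(3))
    a = uy * wz - uz * wy          # (A, B, C) is the cross product u x w
    b = uz * wx - ux * wz
    c = ux * wy - uy * wx
    d = -(a * v0[0] + b * v0[1] + c * v0[2])
    return a, b, c, d


def depth_at(x, y, plane):
    """Depth of the plane at (x, y); -1/C is precomputed once per triangle.

    C == 0 means the triangle is seen edge-on and should be skipped by the caller.
    """
    a, b, c, d = plane
    inv_neg_c = -1.0 / c
    return (a * x + b * y + d) * inv_neg_c


def edge_slope(p, q):
    """Change in x per unit step in y along the edge p-q (requires p.y != q.y)."""
    return (q[0] - p[0]) / (q[1] - p[1])


def split_triangle(v0, v1, v2):
    """Sort vertices by y and split at the middle vertex into up to 2 half triangles.

    Each half is (apex, mid, cut): one apex plus a horizontal edge mid-cut.
    """
    top, mid, bot = sorted((v0, v1, v2), key=lambda v: v[1])
    if bot[1] == top[1]:
        return []                  # degenerate: all vertices on one scan line
    t = (mid[1] - top[1]) / (bot[1] - top[1])
    cut = tuple(top[i] + t * (bot[i] - top[i]) for i in range(3))
    halves = []
    if mid[1] > top[1]:
        halves.append((top, mid, cut))
    if bot[1] > mid[1]:
        halves.append((bot, mid, cut))
    return halves
```

Each half triangle has one horizontal edge, so a rasterization unit only needs two edge slopes and a start point to walk its scan lines, and the two halves can be handled by different units.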
15
FIFO Task Queue
Stalls the input stream to prevent overflow by means of a backward communication protocol.
The backward communication propagates back to the Prefetch and Visibility Detection Unit.
Scheduler Unit
Target: maximize throughput and minimize the idle time of the rasterization units.
Immediately issues the next half triangle to a rasterization unit as soon as that unit finishes its previous one.
[Diagram: Triangle pre-processor; FIFO task queue; Scheduler Unit; Rasterization 0 to Rasterization 10]
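A small cycle-based simulation illustrates how a bounded FIFO with back-pressure and a greedy scheduler interact. It is a behavioral sketch only: the queue depth, the one-push-per-cycle producer, and the names `TaskQueue` and `simulate` are assumptions, while the 11 units follow the Rasterization 0 to 10 labels in the diagram.

```python
from collections import deque


class TaskQueue:
    """Bounded FIFO; a rejected push tells the producer to stall."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def push(self, task):
        if len(self.items) >= self.depth:
            return False           # back-pressure: input stream must wait
        self.items.append(task)
        return True

    def pop(self):
        return self.items.popleft()


def simulate(task_cycles, num_units=11, depth=4):
    """Run half-triangle tasks through the queue and the rasterization units.

    task_cycles: processing time in cycles for each half triangle, in order.
    Returns (total cycles, cycles in which the producer was stalled).
    """
    queue = TaskQueue(depth)
    pending = deque(task_cycles)
    remaining = [0] * num_units    # cycles left on each rasterization unit
    cycles = stalls = 0

    while pending or queue.items or any(remaining):
        # Producer side: try to push one task per cycle.
        if pending:
            if queue.push(pending[0]):
                pending.popleft()
            else:
                stalls += 1
        # Scheduler: issue the next task to every unit that is idle.
        for unit in range(num_units):
            if remaining[unit] == 0 and queue.items:
                remaining[unit] = queue.pop()
        # Advance time by one cycle.
        remaining = [max(0, r - 1) for r in remaining]
        cycles += 1

    return cycles, stalls
```

For example, `simulate([3, 3, 3, 3], num_units=2, depth=1)` returns `(7, 1)`: with only two units and a one-entry queue, the producer is stalled for one cycle until a unit frees up.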
17
Rasterization Units
For each point of each half triangle:
1. Calculate the new Z value.
2. Read the stored Z value and compare it with the calculated one.
3. Update both the Z-Buffer and the RGB Frame Buffer accordingly.
[Diagram: Scheduler Unit; Rasterization 0 to Rasterization 10; Z-Buffer Arbiter Snooping Cache; RGB Arbiter Snooping Cache]
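The three steps above can be sketched as a classic depth-buffer update. The convention that a smaller Z means closer to the viewer, and the names `make_buffers` and `rasterize_point`, are assumptions for illustration; the new Z value (step 1) is taken as an input here.

```python
import math


def make_buffers(width, height, background=(0, 0, 0)):
    """Create a Z-buffer initialised to 'infinitely far' and an RGB frame buffer."""
    z_buffer = [[math.inf] * width for _ in range(height)]
    frame = [[background] * width for _ in range(height)]
    return z_buffer, frame


def rasterize_point(x, y, z, color, z_buffer, frame):
    """Apply the depth test for one point; return True if the pixel was updated.

    z is the newly calculated depth of the point (step 1).
    """
    stored = z_buffer[y][x]        # step 2: read the stored Z value
    if z < stored:                 # compare (assumed: smaller Z is closer)
        z_buffer[y][x] = z         # step 3: update the Z-buffer ...
        frame[y][x] = color        # ... and the RGB frame buffer
        return True
    return False
```

In hardware the read and the two writes go through the shared memories, which is why several units working on overlapping pixels need arbitration and coherent caches.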
18
Multi Core Architecture Problem
A multi-core architecture with shared memory must cope with:
1. Efficient management of multiple requests to the shared memory.
2. Guaranteeing data coherency.
Solution: Arbiter Snooping Multi Cache.
[Diagram: Rasterization 0 to Rasterization 10; Z-Buffer; RGB Frame]
19
Arbiter Snooping Multi Cache (ASMC)
Reduce memory access time: cache memory.
Simultaneous multiple memory access requests: arbiter for efficient memory access management.
Data coherency: add a snooping mechanism to the cache to guarantee data coherency.
Deadlock: using the snooping mechanism; using a watchdog mechanism.
[Diagram: Rasterization 0 to Rasterization 10; Snooping Multi-Cache; Arbiter; Shared Memory]
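The slides do not describe the ASMC protocol in detail, so the following is a generic behavioral model of the three ingredients they name: per-unit caches, an arbiter, and snooping for coherency. The round-robin arbitration, the write-through policy, the invalidate-on-write snooping and the class name `SnoopingCaches` are all assumptions; the watchdog is not modeled.

```python
class SnoopingCaches:
    """Per-unit caches in front of one shared memory (behavioral model only)."""

    def __init__(self, num_units, memory):
        self.memory = memory                      # shared memory: address -> value
        self.caches = [dict() for _ in range(num_units)]
        self.next_grant = 0                       # round-robin pointer

    def arbitrate(self, requesting):
        """Grant the bus to one of the requesting unit ids, round-robin."""
        n = len(self.caches)
        for offset in range(n):
            unit = (self.next_grant + offset) % n
            if unit in requesting:
                self.next_grant = (unit + 1) % n
                return unit
        return None                               # nobody is requesting

    def read(self, unit, addr):
        cache = self.caches[unit]
        if addr not in cache:                     # miss: fetch from shared memory
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, unit, addr, value):
        self.memory[addr] = value                 # write-through (assumed policy)
        self.caches[unit][addr] = value
        for other, cache in enumerate(self.caches):
            if other != unit:
                cache.pop(addr, None)             # snoop: drop stale copies
```

With invalidation on every write, a unit that cached a Z value before another unit overwrote it will miss on its next read and fetch the fresh value, which is the coherency property the depth test depends on.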
20
GPU ASIC Implementation
Technology: 65 nm CMOS, 8LM.
Clock frequency: 300 MHz.
Core area: 2.25 mm².
Power consumption: approx. 130 mW @ 300 MHz.
A USB host can supply up to 400 mW.
21
GoK System Requirements
Input: the host sends the data to the GoK in two stages:
1. Initialization: a list of triangles is sent to the GoK.
2. Animation: a transformation for all triangles is sent to the GoK every 40 msec (25 FPS).
Output: real-time object animation at:
1. 160x120 pixels resolution
2. 120,000 triangles/sec
3. 25 frames/sec
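The per-frame budget implied by these requirements follows from simple arithmetic. The variable names below are illustrative; the numbers are the ones on the slide.

```python
FRAME_PERIOD_MS = 40            # one transformation is sent every 40 msec
TRIANGLES_PER_SEC = 120_000     # required triangle throughput
WIDTH, HEIGHT = 160, 120        # required resolution

frames_per_sec = 1000 // FRAME_PERIOD_MS                  # 25 frames/sec
triangles_per_frame = TRIANGLES_PER_SEC // frames_per_sec  # 4,800 triangles
pixels_per_frame = WIDTH * HEIGHT                          # 19,200 pixels
pixels_per_sec = pixels_per_frame * frames_per_sec         # 480,000 pixels/sec

print(frames_per_sec, triangles_per_frame, pixels_per_frame, pixels_per_sec)
```

So the GoK has 40 msec to transform, cull and rasterize up to 4,800 triangles into a 19,200-pixel frame.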
22
System Overview - SoPC
[Block diagram: Host; USB; FPGA containing System Controller, Communication Bus, USB Controller, Memory Controller, VGA Controller, Processor, ASMC, GPU]
23
Summary
24
Challenges
Matlab implementation and simulation for detailed investigation and evaluation of the algorithm.
VLSI design and implementation of an efficient architecture (with maximum parallelism) for the GPU algorithm.
Real-time embedded system design on FPGA: NIOS II, USB 1.1, DDR2, VGA, Avalon bus, software drivers and code.
GPU integration in the system.
Modification of the USB 1.1 driver for acceptable reliability of data transfer.
Modification of the standard VGA interface core to enable the 100 MHz GPU core to interface with the 50 MHz VGA unit.
25
Main Contributions
Enhancement of the algorithm for increased performance: early elimination of invisible triangles (50% computation reduction); splitting of triangles to reduce computational complexity and increase parallelism; simplification of the pixel color computation.
Pre-processing of the triangle data for fast rasterization computation.
Efficient scheduling of half triangles to rasterization units.
Design and implementation of the arbiter snooping multi cache: shared memory management, cache memory, data coherency.
Double memory buffer for continuous motion of the animation.
26
The Bottom Line
Implementation of a “Graphics on Key” that enhances the graphics performance of low-power, low-cost gadgets.
The device performs the required computations and displays the animation on screen.
Required specification: 120,000 triangles/sec @ 160x120 resolution.
Achieved performance: 1,000,000 triangles/sec @ 640x480 resolution, approx. 25 mW @ 50 MHz.
27
Demonstration