Download presentation
Presentation is loading. Please wait.
Published byStephen Charles Modified over 9 years ago
1
COOL Chips IV A High Performance 3D Graphics Rasterizer with Effective Memory Structure Woo-Chan Park, Kil-Whan Lee*, Seung-Gi Lee, Moon-Hee Choi, Won-Jong Lee, Cheol-Ho Jeong, Byung-Uck Kim, Woo-Nam Jung, Il-San Kim, Won-Ho Chun, Won-Suk Kim, Tack-Don Han, Moon-Key Lee, Sung-Bong Yang, and Shin-Dug Kim Media System Lab. Yonsei University Seoul, Korea E-mail : kiwh@kurene.yonsei.ac.kr
2
COOL Chips IV - 2 - Outline Introduction David Simulator High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer) Performance Analysis Conclusions
3
COOL Chips IV Introduction
4
COOL Chips IV 5th Year 1st Year 2nd Year 3rd Year ∼ Yearly Research Plan Study Simulation Environment 3D GA Simulator development Survey of Related Works Propose an Effective Architecture Performance Evaluation IP Co-Development High Performance Architecture Building international core IP Implementation of Prototype System Parallel Rendering Architecture Basic Environment & Research Technology Prevalent 3D Graphic Accelerator High Performance 3D Graphic Accelerator NRL Project Properties Technology Prevalent (15million textured polygons) Technology Prevalent (15million textured polygons) High Performance (25million textured polygons) High Performance (25million textured polygons) Institution for bringing up an excellent lab. with a core technology A Government-initiated Project Necessities Leadership of an advanced research area Synergy effect through research interchanges Title The Design of High Performance 3D Graphics Accelerator for Realistic Image
5
COOL Chips IV - 5 - Architecture Research The Design of A High Performance 3D Graphics Accelerator for Realistic Image & Building Core IPs The Design of A High Performance 3D Graphics Accelerator for Realistic Image & Building Core IPs Geometry Processing Unit, Rendering Unit Realization Mapping Unit P-M Architecture, Memory Architecture for 3D GA Cache & Memory Hierarchy Parallel 3D Rendering System Technology Prevalent 3D Graphics Accelerator High Performance 3D Graphics Accelerator Design Research Execution Model(VLIW, SIMD, RISC etc.), Control & Interface Appliance to other system library by implementing VHDL SW Research API & Rendering Algorithm Geometry Compression / Modeling Verification /Integration Construction of Simulation Environment & Verification by Simulation Prototype System
6
COOL Chips IV - 6 - Current Research Work Related Works & Basic Research Texture cache simulation for various cache architectures Texture Cache Sharing Memory bandwidth saving scheme for texture data Texture cache simulation for various cache architectures Texture Cache Sharing Memory bandwidth saving scheme for texture data Perspective texture mapping Efficient bump mapping Modified anti-aliasing execution model Perspective texture mapping Efficient bump mapping Modified anti-aliasing execution model Object-oriented rendering using the analytic model Effective reuse buffer for triangle mesh Order-independent transparency Object-oriented rendering using the analytic model Effective reuse buffer for triangle mesh Order-independent transparency Overlapped light geometry processing(OLGP) New method for topological compression Overlapped light geometry processing(OLGP) New method for topological compression Geometry Unit Rendering Unit Realization Unit Memory Unit High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider High performance floating point adder/subtractor High performance floating point multiplier High performance floating point divider Arithmetic Unit Major Research David Simulator Vertex data(x,y,z,w) Model-view Transform ( Lighting ) Clipping Viewport Transform Triangle Setup Projection Divide by w Texture Mapping Z-buffering Scan-conversion Bump map. Perspective Fog / Alpha blending Anti-aliasing Pixel data
7
COOL Chips IV David Simulator
8
COOL Chips IV - 8 - David Simulator Simulation Work Flow Model Data Parser Mesa Library Call Geometry Simulator Call FPU Instructions Vertex Buffer Performance Result Rasterizer Inst. & Data transfer & Store to Local memory Instruction Fetch Decode Execute #1 Write Back Execute #2 Execute #3 Write data to local memory Geometry Engine OpenGL format Setup pipeline Edge work pipeline Span processing MC for frame buffer access Texture cache Mapping unit (texture, bump, environment, displacement) MC for image map access Z Compare Color Blend Rasterizer Simulator Call Obj format Performance Result
9
COOL Chips IV - 9 - David Rasterizer Block Diagram Z-test pipelineMapping pipeline
10
COOL Chips IV High Performance 3D Graphics Rasterizer with Effective Memory Structure (David Rasterizer)
11
COOL Chips IV - 11 - Architectural features of David rasterizer Performing z-test pipeline before TBE(Texture, Bump, and Environment) mapping completion Saving memory bandwidth Solve the incosistency problem with tagging scheme for pixel cache Texture cache sharing with BE(Bump and Environment) mapping Efficient structure
12
COOL Chips IV - 12 - Rasterizer model : Neon, S3 Texture cache Texture read / filter Texture blend Pixel information memory Z read Z test Z write Alpha test Destination read Alpha blend Destination write memory Texture cache Z read Z test Z write Texture read / filter Texture blend Alpha test Destination read Alpha blend Destination write Pixel information Neon S3
13
COOL Chips IV - 13 - David Rasterizer memory Texture cache Tag test and Z read Z test Texture read / filter Texture blend Alpha test Destination read Alpha blend Destination write Pixel information Pixel cache Tag update and Z write wide separate
14
COOL Chips IV - 14 - Architecture Comparison Neon When is texture mapping performed? before Z test OpenGL semantics for perfectly transparent texture Support S3 after Z test Not support David after Z test Support Advantages Support OpenGL semantics No wasting bandwidth No fetching texture data that are obscured Simple scheme No wasting bandwidth No fetching texture data that are obscured Support OpenGL semantics Disadvantages Wasting bandwidth Unable to support OpenGL semantics Wide separation Inconsistency problem Solve it using additional flag bits in a pixel cache
15
COOL Chips IV - 15 - Texture Cache Sharing Texture Mapping #2 Texture Mapping #1 Cache DRAM Bump Mapping Environment Mapping Current 3D Architecture Texture Mapping #1 DRAM David 3D Architecture Shared H/W Bump Mapping Texture Mapping #2 Shared H/W Environment Mapping Shared Cache DRAM Current ArchitectureDavid Architecture Mapping Hardware Cache Size # of Read Port in Cache Independent H/W Shared H/W Reduce H/W cost (about 30%) Same 22 Throughput 1 Cycle Texture Mapping : 1 Cycle BE Mapping : 2 Cycles (infrequent operations) Features BE Mapping : DRAM Access BE Mapping : Cache Access Remove Pipeline Stalls due to DRAM Access
16
COOL Chips IV Performance Analysis
17
COOL Chips IV - 17 - Z Depth Complexity, # of Z Test Fails Environments for Performance Evaluation Model Data Mesa Library Call Miss Ratio, Performance Rasterizer OpenGL format Setup pipeline Edge work pipeline Span processing MC for frame buffer access Texture cache Mapping unit (texture, bump, environment, displacement) MC for image map access Z Compare Color Blend Rasterizer Simulator Call Trace Generation Texture Cache Simulator Pixel Cache Simulator
18
COOL Chips IV - 18 - Model Data Blocks Flarge LightscapeProCDRS Crystal Space (Game engine) SPECviewperf
19
COOL Chips IV - 19 - Bandwidth Saving in Texture Data(ProCDRS) Depth Complexity Bandwidth Saving Average 2.2221.55%
20
COOL Chips IV - 20 - Bandwidth Saving in Texture Data(Light) Depth Complexity Bandwidth Saving Average 2.3129.16%
21
COOL Chips IV - 21 - Bandwidth Saving in Texture Data(Blocks) Depth Complexity Bandwidth Saving Average 1.434.76%
22
COOL Chips IV - 22 - Bandwidth Saving in Texture Data(Flarge) Depth Complexity Bandwidth Saving Average 1.314.93%
23
COOL Chips IV Conclusions
24
COOL Chips IV - 24 - Conclusions Simulation Environment (David Simulator) Evaluation for 3D graphics accelerator architecture Performance comparison David Architecture Performing z-test before TBE mapping completion 5%~29% bandwidth savings for texture data in Scenes with 1~3 depth complexity As the depth complexity grows, the amount of bandwidth savings become large –Recently, a 3D graphic application shows high depth complexity Texture cache sharing with BE mapping Hardware saving from sharing and hardware reduction for BE mappings
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.