Petri Nordlund Chief Architect Bitboys Oy

Presentation transcript:

Glaze3D™
Petri Nordlund, Chief Architect, Bitboys Oy (pnord@bitboys.fi)

Introduction
- Glaze3D™ is a new consumer-level 2D/3D graphics accelerator chip
- Fill rate: 1200 million texels/second
- Designed and developed by Bitboys Oy, a Finnish 3D graphics hardware company
- Uses Infineon Technologies' 0.20 µm eDRAM process
- 9 MB of embedded framebuffer memory, up to 128 MB of external video memory

Design goals
- Traditional, proven rendering architecture
- PC'99, Microsoft Windows, Direct3D and OpenGL compatibility
- Multi-chip support: two- and four-chip configurations
- Support for an additional geometry processor, also in multi-chip configurations
- Takes full advantage of embedded DRAM
- Small and efficient rendering core required: the embedded DRAM in Glaze3D takes most of the available silicon

Performance
- Quad-pixel pipeline @ 150 MHz
  - 600 million pixels/second (dual textured)
  - 1200 million texels/second
- 4.5 million fully featured triangles/second (sustained)
- Cycle-accurate, bit-accurate simulator, together with the in-house developed PCIBuilder, allows performance tuning with real-world applications (Quake III Arena, Viewperf)
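The fill-rate figures on this slide follow from the pipeline width and clock. The sketch below is only a back-of-the-envelope check; it assumes one pixel per pipeline per clock and two texels per dual-textured pixel, and the function and constant names are illustrative rather than anything from the Glaze3D design.

```python
# Figures quoted on the slide.
PIXELS_PER_CLOCK = 4            # quad-pixel pipeline
CORE_CLOCK_HZ = 150_000_000     # 150 MHz
TEXTURES_PER_PIXEL = 2          # dual texturing


def pixel_rate(pixels_per_clock: int, clock_hz: int) -> int:
    """Peak pixels/second, assuming one pixel per pipeline per clock."""
    return pixels_per_clock * clock_hz


def texel_rate(pixels_per_clock: int, clock_hz: int, textures_per_pixel: int) -> int:
    """Peak texels/second, assuming every pixel samples all of its textures."""
    return pixel_rate(pixels_per_clock, clock_hz) * textures_per_pixel


print(pixel_rate(PIXELS_PER_CLOCK, CORE_CLOCK_HZ) / 1e6, "MPIX/s")
print(texel_rate(PIXELS_PER_CLOCK, CORE_CLOCK_HZ, TEXTURES_PER_PIXEL) / 1e6, "Mtexels/s")
```

These are peak rates; the sustained figures measured with the simulator would depend on the workload.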

Performance
- Texture cache: 16 KB cache for even mipmap levels and surface textures, 8 KB cache for odd mipmap levels and lightmaps; both caches are two-way set associative
- Block coverage issue: 4-pixel horizontal blocks, expect 90% coverage with average-size triangles
- Quake III Arena: 200 FPS with all features on @ 800x600x32
  - 400 MPIX/s
  - 350,000 drawn triangles/s
  - 3.5 MB of textures/frame, 670 MB/s of texture bandwidth
  - Depth complexity of 4
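The Quake III Arena figures can be roughly cross-checked from the resolution, frame rate and depth complexity. This is a hedged sketch, not the simulator's method: it assumes every pixel is drawn exactly "depth complexity" times per frame and that the per-frame texture traffic is constant. The simple products come out near, but not exactly at, the quoted 400 MPIX/s and 670 MB/s, presumably because the slide's numbers are measured or rounded.

```python
# Workload figures quoted on the slide.
WIDTH, HEIGHT = 800, 600
FPS = 200
DEPTH_COMPLEXITY = 4
TEXTURE_MB_PER_FRAME = 3.5
TRIANGLES_PER_SECOND = 350_000


def drawn_pixels_per_second(width: int, height: int, fps: int, depth_complexity: int) -> int:
    """Pixels drawn per second if each screen pixel is touched depth_complexity times."""
    return width * height * fps * depth_complexity


def texture_mb_per_second(mb_per_frame: float, fps: int) -> float:
    """Texture traffic per second for a constant per-frame texture footprint."""
    return mb_per_frame * fps


def triangles_per_frame(triangles_per_second: int, fps: int) -> float:
    """Average drawn triangles in one frame."""
    return triangles_per_second / fps


print(drawn_pixels_per_second(WIDTH, HEIGHT, FPS, DEPTH_COMPLEXITY) / 1e6, "MPIX/s")
print(texture_mb_per_second(TEXTURE_MB_PER_FRAME, FPS), "MB/s of textures")
print(triangles_per_frame(TRIANGLES_PER_SECOND, FPS), "triangles/frame")
```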

Features
- 4 simultaneous textures with trilinear filtering
- DXTC texture compression
- Full-scene, order-independent anti-aliasing
- Environment bump mapping
- GDI+ features
- Multiple scaled transparent video overlays
- Digital flat-panel and TV-out support

The Glaze3D™ chip
- 304-pin BGA
- 1.5M logic gates
- 130 mm² die size
- External SDR SDRAM interface
  - depth and/or color buffer stored here at higher resolutions
  - max 128 MB
  - 64- or 128-bit interface
- PCI and 2X/4X AGP interfaces; the AGP interface supports direct AGP texturing

Glaze3D architecture

Triangle setup engine

Pixel pipeline

Why embedded DRAM?
- A graphics accelerator needs GB/s of memory bandwidth: rendering 600 MPIX/s at true color with 32-bit Z requires 7.2 GB/s
- External memory can no longer provide enough bandwidth for future graphics accelerators
- Cost-efficient: fewer chips on board
- Reduced power consumption
- Customized size: we needed exactly 9 MB (= 72 Mbit)
- Customized organization in terms of bus width, banks, etc.
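The 7.2 GB/s requirement is consistent with 12 bytes of framebuffer traffic per pixel. The slide does not give the breakdown, so the sketch below assumes one 32-bit color write, one 32-bit Z read and one 32-bit Z write per pixel, and ignores texture fetches and blending reads; the function name is illustrative.

```python
def required_bandwidth(pixel_rate: int, color_bytes: int = 4, z_bytes: int = 4) -> int:
    """Framebuffer bytes/second under the assumed per-pixel traffic.

    Assumption (not stated on the slide): each pixel costs one color write
    plus one Z read and one Z write.
    """
    bytes_per_pixel = color_bytes + 2 * z_bytes
    return pixel_rate * bytes_per_pixel


# 600 MPIX/s at true color (4 bytes) with 32-bit Z (4 bytes).
print(required_bandwidth(600_000_000) / 1e9, "GB/s")
```

Any additional reads, such as alpha blending, would push the requirement higher than this estimate.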

Cell concepts: trench versus stack
- Competitor's HSG block stacked cell: hard to add multi-level metallization
- Infineon's trench capacitor cell: ideally suited for adding multi-level metallization
[Figure: cross-sections of the two cells. Labels: metal 1 bitline with BL contact, bitline, Si surface, trench capacitor]
The trench technology combined with CMP (chemical mechanical polishing) techniques gives the advantage of being able to deposit the logic metallization onto a globally planar surface.

Embedded DRAM
- 72 Mbit (9 MB) of eDRAM
- 150 MHz core/memory clock
- 9.6 GB/s memory bandwidth
- 512-bit interface, divided into four 18 Mbit modules of 3 banks each
- Stores framebuffer and Z buffer: enough for 1024x768x32 bit
- Wide internal buses need lots of metal layers!
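Both the bandwidth and the capacity claims on this slide can be checked with simple arithmetic. The sketch assumes one 512-bit transfer per 150 MHz clock, binary megabits (1 Mbit = 2^20 bits), and a framebuffer holding one 32-bit color value and one 32-bit Z value per pixel; the helper names are illustrative.

```python
MBIT = 1024 * 1024  # bits, assuming binary megabits


def peak_bandwidth(bus_bits: int, clock_hz: int) -> int:
    """Peak bytes/second, assuming one full-width transfer per clock."""
    return bus_bits // 8 * clock_hz


def framebuffer_bytes(width: int, height: int, color_bytes: int = 4, z_bytes: int = 4) -> int:
    """Bytes needed for a single color buffer plus a Z buffer."""
    return width * height * (color_bytes + z_bytes)


# Four 18 Mbit modules, as listed on the slide.
EDRAM_BYTES = 4 * 18 * MBIT // 8

print(peak_bandwidth(512, 150_000_000) / 1e9, "GB/s")
print(EDRAM_BYTES / 2**20, "MB of eDRAM")
print(framebuffer_bytes(1024, 768) / 2**20, "MB for 1024x768 color + Z")
```

Under these assumptions 1024x768 with 32-bit color and Z takes 6 MB, which leaves headroom within the 9 MB of eDRAM.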

Multichip configurations
- Custom bus interface built into Glaze3D™: a cost-effective multi-chip solution
- Thor is a geometry processor
- The monster configuration is capable of 2400 MPIX/s (4.8 gigatexels/s) and 10M triangles/s sustained
- Target markets: high-end PC desktop, arcade systems
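The monster configuration's fill rates match four times the single-chip figures. The sketch below assumes the monster configuration uses four rendering chips and that fill rate scales linearly with chip count. Triangle throughput is deliberately left out, since the quoted 10M triangles/s is not four times the single-chip 4.5M. All names are illustrative.

```python
# Single-chip peak rates from the earlier performance slide.
SINGLE_CHIP = {"mpix_per_s": 600, "mtexels_per_s": 1200}


def scaled_fill_rates(rates: dict, num_chips: int) -> dict:
    """Fill rates for num_chips rendering chips, assuming linear scaling."""
    return {name: value * num_chips for name, value in rates.items()}


monster = scaled_fill_rates(SINGLE_CHIP, 4)
print(monster)
```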

Tiled rendering order
- Full linear framebuffer in video memory, but primitives are rendered as tiles instead of scanlines
- Framebuffer is divided into tiles (16x16, 32x32, 64x64)
- SLI is not sufficient: it thrashes the texture caches!
- In a four-chip configuration, each chip renders one quarter of the tiles
- A Glaze3D™ rendering chip ignores a primitive if it doesn't fall into one of the tiles that chip renders
- Framebuffer is split between the rendering chips: the monster configuration has a 36 MB embedded framebuffer
- Note: traditional architecture, not like PowerVR or Talisman
- Maximum resolution in the monster configuration is 2048x1536 with true color and a 32-bit IEEE floating-point depth buffer!
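The tile-based work split can be sketched as follows. The slide does not say how tiles are assigned to chips, so the interleaved 2x2 pattern (and checkerboard for two chips) is purely an assumption, as is the use of a primitive's bounding box as the rejection test; all function names are hypothetical.

```python
from collections import Counter


def tile_owner(tile_x: int, tile_y: int, num_chips: int = 4) -> int:
    """Chip that renders a tile (assumed interleaving, not from the slide)."""
    if num_chips == 4:
        return (tile_y % 2) * 2 + (tile_x % 2)   # repeating 2x2 pattern
    if num_chips == 2:
        return (tile_x + tile_y) % 2             # checkerboard
    if num_chips == 1:
        return 0
    raise ValueError("only 1, 2 or 4 chips are modeled")


def tiles_covered(bbox, tile_size: int):
    """Yield (tile_x, tile_y) for tiles overlapped by an inclusive pixel bbox."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile_size, y1 // tile_size + 1):
        for tx in range(x0 // tile_size, x1 // tile_size + 1):
            yield tx, ty


def chip_renders(chip_id: int, bbox, tile_size: int = 32, num_chips: int = 4) -> bool:
    """True if the primitive's bbox touches at least one tile owned by chip_id.

    A bounding box is conservative: a chip may accept a primitive that
    ends up covering no pixels in its tiles.
    """
    return any(tile_owner(tx, ty, num_chips) == chip_id
               for tx, ty in tiles_covered(bbox, tile_size))


def tile_share(width: int, height: int, tile_size: int = 32, num_chips: int = 4) -> Counter:
    """Number of tiles each chip owns for a given framebuffer size."""
    counts = Counter()
    for ty in range((height + tile_size - 1) // tile_size):
        for tx in range((width + tile_size - 1) // tile_size):
            counts[tile_owner(tx, ty, num_chips)] += 1
    return counts


print(tile_share(1024, 768))
print([chip for chip in range(4) if chip_renders(chip, (0, 0, 40, 10))])
```

Interleaving small tiles keeps each chip's work spatially coherent, which fits the slide's point that scanline interleaving (SLI) thrashes the texture caches.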

Key parameters for next technologies

Technology                   C9DD1                  C10DD0               C10DD1
Feature size                 0.20 µm                0.17 µm              0.15 µm
1 Mb block size              0.64 mm²               0.38 mm²             0.30 mm²
Raw gate density             45 Kgates/mm²          90 Kgates/mm²        ~115 Kgates/mm²
Max. clock rate              200 MHz                250 MHz              300 MHz
Bus width                    512 bit                1024 bit             1024 bit
Max. bandwidth               12 GByte/s             32 GByte/s           37 GByte/s
Memory / logic on 150 mm²    100 Mbit / 2.5 Mgates  140 Mbit / 5 Mgates  180 Mbit / 6.4 Mgates
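The maximum-bandwidth row can be compared against bus width times clock rate. This sketch assumes one full-width transfer per clock and decimal gigabytes; under that assumption the computed peaks are 12.8, 32.0 and 38.4 GByte/s, so the quoted 12 and 37 GByte/s appear to be rounded or slightly derated. The dictionary layout and function name are illustrative.

```python
# (bus width in bits, max clock in MHz, quoted bandwidth in GByte/s) per technology.
TECHNOLOGIES = {
    "C9DD1": (512, 200, 12),
    "C10DD0": (1024, 250, 32),
    "C10DD1": (1024, 300, 37),
}


def peak_gbytes_per_s(bus_bits: int, clock_mhz: int) -> float:
    """Peak bandwidth in GByte/s, assuming one full-width transfer per clock."""
    return bus_bits / 8 * clock_mhz / 1000


for name, (bus_bits, clock_mhz, quoted) in TECHNOLOGIES.items():
    print(name, peak_gbytes_per_s(bus_bits, clock_mhz), "computed vs", quoted, "quoted")
```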

Future
- Pump more and more triangles through the pipeline
  - Critical: CPU to 3D-hardware interface, drivers
  - Geometry processors, advanced geometry processing
- More pixels and texels
  - Expect 8 gigatexels/s in 2001
  - 48 GB/s of memory bandwidth: embedded DRAM is the only solution!
- More features per pixel
  - better texture filtering (anisotropic for 2D only)
  - programmability (procedural textures)
  - realistic materials and surface properties

Thank you!