Image Compositing Hardware The Metabuffer: A Scalable Multiresolution Multidisplay 3-D Graphics System Using Commodity Rendering Engines Lightning-2: A.

Slides:



Advertisements
Similar presentations
COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Advertisements

Graphics Pipeline.
Status – Week 257 Victor Moya. Summary GPU interface. GPU interface. GPU state. GPU state. API/Driver State. API/Driver State. Driver/CPU Proxy. Driver/CPU.
RealityEngine Graphics Kurt Akeley Silicon Graphics Computer Systems.
Computer Graphic Creator: Mohsen Asghari Session 2 Fall 2014.
Lecture 1 Computer Graphics Hardware Basic graphics hardware –Display devices –Video controller –Memory –CPU –System bus Graphics Hardware # 1 CG show.
Ray-casting in VolumePro™ 1000
Graphics Hardware and Software Architectures
Copyright© 2000 OPNET Technologies, Inc. R.W. Dobinson, S. Haas, K. Korcyl, M.J. LeVine, J. Lokier, B. Martin, C. Meirosu, F. Saka, K. Vella Testing and.
Router Architecture : Building high-performance routers Ian Pratt
Rasterization and Ray Tracing in Real-Time Applications (Games) Andrew Graff.
What's inside a router? We have yet to consider the switching function of a router - the actual transfer of datagrams from a router's incoming links to.
Super Fast Camera System Performed by: Tokman Niv Levenbroun Guy Supervised by: Leonid Boudniak.
Computer Graphics Hardware Acceleration for Embedded Level Systems Brian Murray
3D Graphics Processor Architecture Victor Moya. PhD Project Research on architecture improvements for future Graphic Processor Units (GPUs). Research.
OUTLINE WHAT ? HOW ? WHY ? BLUEPOST Poster and Message Content Specified by the User Displaying the Poster Content on a Monitor Sending Messages to.
Parallel Rendering Ed Angel
1 Angel: Interactive Computer Graphics 4E © Addison-Wesley 2005 Models and Architectures Ed Angel Professor of Computer Science, Electrical and Computer.
Ch 1 Intro to Graphics page 1CS 367 First Day Agenda Best course you have ever had (survey) Info Cards Name, , Nickname C / C++ experience, EOS experience.
Remote Surveillance Vehicle Design Review By: Bill Burgdorf Tom Fisher Eleni Binopolus-Rumayor.
Parallel Graphics Rendering Matthew Campbell Senior, Computer Science
Router Architectures An overview of router architectures.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
Presentation by David Fong
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
Raghu Machiraju Slides: Courtesy - Prof. Huamin Wang, CSE, OSU
University of Texas at Austin CS 378 – Game Technology Don Fussell CS 378: Computer Game Technology Beyond Meshes Spring 2012.
Router Architectures An overview of router architectures.
Sort-Last Parallel Rendering for Viewing Extremely Large Data Sets on Tile Displays Paper by Kenneth Moreland, Brian Wylie, and Constantine Pavlakos Presented.
Under the Hood: 3D Pipeline. Motherboard & Chipset PCI Express x16.
Computer Organization CSC 405 Bus Structure. System Bus Functions and Features A bus is a common pathway across which data can travel within a computer.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
1.  Project Goals.  Project System Overview.  System Architecture.  Data Flow.  System Inputs.  System Outputs.  Rates.  Real Time Performance.
Parallel Rendering 1. 2 Introduction In many situations, standard rendering pipeline not sufficient ­Need higher resolution display ­More primitives than.
Elad Hadar Omer Norkin Supervisor: Mike Sumszyk Winter 2010/11, Single semester project. Date:22/4/12 Technion – Israel Institute of Technology Faculty.
Computer Graphics An Introduction. What’s this course all about? 06/10/2015 Lecture 1 2 We will cover… Graphics programming and algorithms Graphics data.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
A Sorting Classification of Parallel Rendering Molnar et al., 1994.
CS 450: COMPUTER GRAPHICS REVIEW: INTRODUCTION TO COMPUTER GRAPHICS – PART 2 SPRING 2015 DR. MICHAEL J. REALE.
Seminar II: Rendering Architectures Yan Cui Love Joy Mendoza Oscar Kozlowski John Tang.
LECC2003 AmsterdamMatthias Müller A RobIn Prototype for a PCI-Bus based Atlas Readout-System B. Gorini, M. Joos, J. Petersen (CERN, Geneva) A. Kugel, R.
CSC 461: Lecture 3 1 CSC461 Lecture 3: Models and Architectures  Objectives –Learn the basic design of a graphics system –Introduce pipeline architecture.
OpenGL Conclusions OpenGL Programming and Reference Guides, other sources CSCI 6360/4360.
Introduction to Parallel Rendering Jian Huang, CS 594, Spring 2002.
Simply Gaming Final Project Project Leader: PJ Acevedo Fall 2009.
1 Introduction to Computer Graphics with WebGL Ed Angel Professor Emeritus of Computer Science Founding Director, Arts, Research, Technology and Science.
Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department.
Parallel Rendering. 2 Introduction In many situations, a standard rendering pipeline might not be sufficient ­Need higher resolution display ­More primitives.
Computer Graphics The Rendering Pipeline - Review CO2409 Computer Graphics Week 15.
CDR- Digital Audio Recorder/Player Brian Cowdrey Mike Ingoldby Gaurav Raje Jeff Swetnam.
Jump to first page One-gigabit Router Oskar E. Bruening and Cemal Akcaba Advisor: Prof. Agarwal.
1 The Rendering Pipeline. CS788 Topic of HCI 2 Outline  Introduction  The Graphics Rendering Pipeline  Three functional stages  Example  Bottleneck.
Computer Graphics Chapter 6 Andreas Savva. 2 Interactive Graphics Graphics provides one of the most natural means of communicating with a computer. Interactive.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
PROJECT - ZYNQ Yakir Peretz Idan Homri Semester - winter 2014 Duration - one semester.
OpenGL Architecture Display List Polynomial Evaluator Per Vertex Operations & Primitive Assembly Rasterization Per Fragment Operations Frame Buffer Texture.
Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.
CS 4396 Computer Networks Lab Router Architectures.
Parallel Rendering Ed Angel Professor Emeritus of Computer Science University of New Mexico 1 E. Angel and D. Shreiner: Interactive Computer Graphics 6E.
Subject Name: Computer Graphics Subject Code: Textbook: “Computer Graphics”, C Version By Hearn and Baker Credits: 6 1.
A SEMINAR ON 1 CONTENT 2  The Stream Programming Model  The Stream Programming Model-II  Advantage of Stream Processor  Imagine’s.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
1cs426-winter-2008 Notes. 2 Atop operation  Image 1 “atop” image 2  Assume independence of sub-pixel structure So for each final pixel, a fraction alpha.
Chapter 1 Graphics Systems and Models Models and Architectures.
Electronic visualization laboratory, university of illinois at chicago Sort Last Parallel Rendering for Viewing Extremely Large Data Sets on Tile Displays.
GPU Architecture and Its Application
Graphics Processing Unit
Graphics Hardware: Specialty Memories, Simple Framebuffers
Graphics Processing Unit
Presentation transcript:

Image Compositing Hardware The Metabuffer: A Scalable Multiresolution Multidisplay 3-D Graphics System Using Commodity Rendering Engines Lightning-2: A High-Performance Display Subsystem for PC Clusters Scalable Interactive Volume Rendering Using Off-the-Shelf Components (Another exciting presentation) © 2002 Brenden Schubert Figure 1: The Title Slide

The Metabuffer Figure 2: The Metabuffer Overview Slide Sort-last parallel realtime rendering using COTS hardware Independently scalable in renderers and displays Any renderer viewport can map to any area of final composited image

System Architecture Figure 3: The Metabuffer System Architecture Slide [A] = COTS renderer [B] = onboard framebuffer {C} = compositing unit Not shown: compositing framebuffers used to drive displays

Data Flow Figure 4: The Metabuffer Data Flow Slide Viewport configuration kept track of on rendering nodes’ CPUs -Routing/Compositing info encoded in image Total data is proportional to input size

Bus Scheduling Figure 5: The Bus Scheduling Slide IRSA round robin Idle recovery slot allocation Allows for best overall use of bus among “composers” Composer vs. Compositor?

Metabuffer Operation Figure 6: The Metabuffer Operation Slide Frame Transition Waiting for PIPEREADY Buffers are filled Output framebuffers signal completion Compositor relays finish signal Master compositor signals start of frame Compositors in pipe begin frame Input framebuffer streams out data

Do we have interns at a major hardware manufacturer? Figure 7: The Slide with the humorous title Nope. Guess we’ll just have to simulate/emulate this Simulator Tries to accurately implement the architecture C++, multithreaded by object Emulator Tries to produce the same output that the Metabuffer would as fast as possible 128 node (PIII-800 GeForce2) Beowulf cluster

Other Things Figure 8: The Other Things Slide Software simulation uses raytracer to generate images and depthmap Antialiasing can be done with supersampling and filtering at the output framebuffers Foveated vision: multiple user view-tracking? Stanford views pi meeting High speed active network switch

F-22 Lightning-2 by Figure 9: The Joke Lightning-2 Overview Slide

Lightning-2 Figure 10: The Lightning-2 Overview Slide Hardware compositing switch for clusters DVI-to-DVI interface Scales independently in inputs and outputs Actually fabricated in hardware Costs a lot

Lightning-2 Architecture Figure 11: The First Lightning-2 Architecture Slide Connects to DVI-out from commodity graphics cards in cluster DVI captured by first column of modules and propogated across rows Compositing for each display output occurs down each column

L-2 Module Architecture Figure 12: The Second Lightning-2 Architecture Slide DVI inputs RS-232 back-channel DVI Compositing in/out Memory controller Double-buffered framebuffer (SDRAM)

Pixel Reorganization Figure 13: The Pixel Reorganization Slide Input frames arrive in input-space raster order Compositors operate in output-space raster order Must map input order to output order Forward-Mapping: -Pixels reordered on write -Pixels are stored in output-space raster order Reverse-Mapping: -Pixels reordered on read -Pixels are stored in input-space raster order

Input Figure 14: The Input Slide Three ways to get data rendered by cluster video cards: Read pixels back into system memory Use scan-converters and capturer Use DVI interface We’ll take #3 Thus need to transfer pixel mapping information via DVI Specified by encoding strip header in first two pixels of each scanline

Frame/Depth Transfer Figure 15: The Frame Transfer Slide Buffer swap on horizontal blanking decreases latency All information encoded in scanline header – no frame- specific dependencies Serial back-channel used to tell rendering nodes when previous frame has been received Introduces additional frame of latency since rendering nodes could be up to 1 frame out of sync Depth information must be read back to main memory and then copied to color buffer to send via DVI Introduces another frame of latency

Results Figure 16: The Lightning-2 Results Slide Frame Transfer Protocol Rendering node operates at 70Hz, Lightning-2 operates at 60Hz Allows 2.3ms for back-channel frame transfer signal Achieved >90% of the time (pesky os interference!) Depth Compositing Depth copy has constant cost of appx. 17ms At >4 nodes this takes up more than half of the frame time and begins to drop speedup below ideal

Sepia-2 Figure 17: The Sepia-2 Title Slide Custom PCI-card for parallel interactive volume rendering on a cluster Single-stage crossbar Supports blending and non- commutative compositing operations

Sepia Architecture Figure 18: The Sepia Architecture Slide Rendering accelerators (VolumePro 500 raycasters) Display device (standard OpenGL renderer) PCI Card - 1 for each rendering node and display device 3 FPGAs 2 ServerNet-2 gigabit network ports RAM buffers High-speed network switches

Sepia PCI Card Figure 19: The Sepia PCI Card Slide

Comparison Figure 20: The Sepia Comparison Slide Sepia-2 uses parallel blending operations Lightning-2 uses scan-line based pixel mapping MetaBuffer uses viewport transformations and depth compositing Sepia-2 scales linearly (inputs + outputs) using Clos topology networking Lightning-2 and MetaBuffer scale ~quadratically (inputs x outputs) using mesh topology

TeraVoxel Operation Figure 21: The TeraVoxel Operation Slide VolumePro generates viewpoint-dependent base plane (sub- volume image) Writes it to main memory BP is copied to larger-resolution SBP Sepia-2 cards each read SBP from local main memories Transmit over network to display node Display node performs compositing operations Display node textures result onto polygon to display using OpenGL

Sepia Firmware Figure 22: The Sepia Firmware Slide Need an associative blending operator to do raycasting in parallel “F” is proven to be associative and equivalent to the basic subvoxel blending operation Proof is elementary

Clos Topology Figure 23: The Clos Topology Slide Sepia-2 depends on full crossbar networking with linear scaling Recursive Clos topology is proven to provide this Read the 1953 Clos paper if you want to know more

Sepia Results Figure 24: The Sepia Results Slide Time spent in VolumePro calculation: 36-42ms Rest of calculation pipelined with no stage >36ms So optimal refresh rates are 24-28fps Copying subimage to memory: 5ms Transfer over network and blending calculation: 34ms Maximum wait for vertical refresh to display new frame: 17ms Worst-case combined latency: 102ms