1
Brook for GPUs
Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan
February 10th, 2003
2
Brook: general purpose streaming language
February 11th, 2004
developed for PCA Program/Merrimac
–compiler: RStream (Reservoir Labs)
–DARPA PCA Program
  Stanford: SmartMemories
  UT Austin: TRIPS
  MIT: RAW
–Brook version 0.2 spec: http://merrimac.stanford.edu
–Brook for GPUs: http://brook.sourceforge.net
[Diagram labels: Stream Execution Unit, Stream Register File, Memory System, Network Interface, Scalar Execution Unit, DRDRAM, Network]
3
Brook: general purpose streaming language
stream programming model
–enforce data parallel computing
  streams
–encourage arithmetic intensity
  kernels
C with streams
4
Brook for GPUs
demonstrate GPU streaming coprocessor
–make programming GPUs easier
  hide texture/pbuffer data management
  hide graphics based constructs in Cg/HLSL
  hide rendering passes
  virtualize resources
–performance!
  … on applications that matter
–highlight GPU areas for improvement
  features required for general purpose stream computing
5
system outline
.br: Brook source files
brcc: source to source compiler
brt: Brook run-time library
6
Brook language: streams
streams
–collection of records requiring similar computation
  particle positions, voxels, FEM cell, …
  float3 positions ; float3 velocityfield ;
–encourage data parallelism
7
Brook language: kernels
kernels
–functions applied to streams
  similar to for_all construct

  kernel void foo (float a<>, float b<>, out float result<>) {
    result = a + b;
  }

  float a<100>;
  float b<100>;
  float c<100>;
  foo(a,b,c);

  for (i=0; i<100; i++)
    c[i] = a[i]+b[i];

–no dependencies between stream elements
  encourage high arithmetic intensity
8
Brook language: kernels
Ray Triangle Intersection

  kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                   RayState oldraystate<>,
                                   GridTrilist trilist[],
                                   out Hit candidatehit<>) {
    float idx, det, inv_det;
    float3 edge1, edge2, pvec, tvec, qvec;
    if(oldraystate.state.y > 0) {
      idx = trilist[oldraystate.state.w].trinum;
      edge1 = tris[idx].v1 - tris[idx].v0;
      edge2 = tris[idx].v2 - tris[idx].v0;
      pvec = cross(ray.d, edge2);
      det = dot(edge1, pvec);
      inv_det = 1.0f/det;
      tvec = ray.o - tris[idx].v0;
      candidatehit.data.y = dot( tvec, pvec ) * inv_det;
      qvec = cross( tvec, edge1 );
      candidatehit.data.z = dot( ray.d, qvec ) * inv_det;
      candidatehit.data.x = dot( edge2, qvec ) * inv_det;
      candidatehit.data.w = idx;
    } else {
      candidatehit.data = float4(0,0,0,-1);
    }
  }
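For checking results on the CPU, the arithmetic in the kernel above can be sketched in plain C for a single ray and a single triangle. This is only a reference sketch, not part of Brook: the struct layouts, the helper names `sub`, `cross`, `dot`, and the function `intersect_triangle` are hypothetical. The names `t`, `u`, `v` for the three outputs (the kernel's `data.x`, `data.y`, `data.z`) follow the usual Möller-Trumbore convention and are an assumption. The near-zero determinant guard and the final inside test do not appear in the kernel on the slide.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { vec3 o, d; } Ray;            /* origin, direction */
typedef struct { vec3 v0, v1, v2; } Triangle;
typedef struct { float t, u, v; } Hit;        /* distance and barycentric coordinates */

static vec3 sub(vec3 a, vec3 b) {
    vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static vec3 cross(vec3 a, vec3 b) {
    vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float dot(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 and fills *hit if the ray hits the triangle, else returns 0. */
int intersect_triangle(Ray ray, Triangle tri, Hit *hit) {
    assert(hit != NULL);
    vec3 edge1 = sub(tri.v1, tri.v0);
    vec3 edge2 = sub(tri.v2, tri.v0);
    vec3 pvec = cross(ray.d, edge2);
    float det = dot(edge1, pvec);
    /* Assumption: reject rays (nearly) parallel to the triangle plane. */
    if (det > -1e-8f && det < 1e-8f) return 0;
    float inv_det = 1.0f / det;
    vec3 tvec = sub(ray.o, tri.v0);
    hit->u = dot(tvec, pvec) * inv_det;       /* kernel: candidatehit.data.y */
    vec3 qvec = cross(tvec, edge1);
    hit->v = dot(ray.d, qvec) * inv_det;      /* kernel: candidatehit.data.z */
    hit->t = dot(edge2, qvec) * inv_det;      /* kernel: candidatehit.data.x */
    /* Assumption: inside test added here; the slide's kernel only stores values. */
    return hit->u >= 0.0f && hit->v >= 0.0f &&
           hit->u + hit->v <= 1.0f && hit->t >= 0.0f;
}
```

In the Brook version the same arithmetic runs once per ray in the stream, with `tris[]` and `trilist[]` acting as gather arrays indexed from the ray state.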
9
Brook language: additional features
reductions
–scalar
–stream
stride & repeat
GatherOp & ScatterOp
–a[i] += p
–p = a[i]++
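The meaning of a scalar reduction and of the two indexed expressions on this slide can be sketched sequentially in plain C. This is not Brook's API or its GPU implementation: the function names `reduce_sum`, `scatter_add`, and `gather_inc` are hypothetical, and the sketch assumes that `a[i] += p` is the scatter form and `p = a[i]++` the gather form, with each applied once per stream element.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: collapse a whole stream into one value (here, a sum). */
float reduce_sum(const float *s, size_t n) {
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++)
        acc += s[k];
    return acc;
}

/* Scatter with a combining operator: a[idx[k]] += p[k] for every element k.
   Repeated indices accumulate instead of overwriting each other. */
void scatter_add(float *a, size_t a_len,
                 const size_t *idx, const float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < a_len);
        a[idx[k]] += p[k];
    }
}

/* Gather with a side effect: p[k] = a[idx[k]]++ for every element k. */
void gather_inc(float *a, size_t a_len,
                const size_t *idx, float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < a_len);
        p[k] = a[idx[k]];    /* read the old value */
        a[idx[k]] += 1.0f;   /* then increment in place */
    }
}
```

The sequential loops fix an order for repeated indices; a parallel implementation would have to choose or leave unspecified how such collisions are ordered.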
10
brcc compiler infrastructure
based on ctool
–http://ctool.sourceforge.net
parser
–build code tree
–extend C grammar to accept Brook
convert
–tree transformations
codegen
–generate Cg & HLSL code
–call cgc, fxc
–generate stub function
11
Applications
Ray-tracer
FFT
Segmentation
Linear Algebra:
–BLAS, LINPACK, LAPACK
12
Brook Performance
13
GPU Gotchas
[Chart axes: Time, Registers Used]
14
GPU Gotchas
NVIDIA NV3x: Register usage vs. Time
[Chart axes: Time, Registers Used]
15
GPU Gotchas
NVIDIA: Register Penalty
Render to Texture Limitation
–Requires explicit copy or heavy pbuffer solution
–Superbuffer extension needed
  http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf
16
GPU Gotchas
ATI Radeon 9800 Pro
Limited dependent texture lookup
96 instructions
24-bit floating point
–s16e7
  Integers up to 131,072 (s23e8: 16,777,216)
[Diagram: Memory Refs and Math Ops stages, repeated four times and numbered 1 to 4]
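The two integer limits quoted above follow from the mantissa width. A float format with m stored mantissa bits and an implicit leading 1 represents every integer up to 2^(m+1) exactly. A small C sketch of that arithmetic, where the function name `max_contiguous_int` is hypothetical and the implicit leading bit is an assumption about both formats:

```c
#include <assert.h>
#include <stdint.h>

/* Largest N such that every integer in [0, N] is exactly representable in a
   float format with `mantissa_bits` stored mantissa bits.
   Assumption: the format has an implicit leading 1, so the significand
   holds mantissa_bits + 1 bits. */
uint64_t max_contiguous_int(unsigned mantissa_bits) {
    assert(mantissa_bits < 63);  /* keep the shift within 64 bits */
    return (uint64_t)1 << (mantissa_bits + 1);
}
```

With this, `max_contiguous_int(16)` gives 131072 for s16e7 and `max_contiguous_int(23)` gives 16777216 for s23e8, matching the figures on the slide. In practice this bounds how large an array index can be when indices are carried in 24-bit floats.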
17
GPU Catch-Up!
Integer & Bit Ops & Double Precision
Memory Addressing
CGC/FXC Performance
–Hand code performance critical code
No native reduction support
No native scatter support
–p[i] = a (indirect write)
No programmable blend
–GatherOp / ScatterOp
Limited 4x4 output
–Brook virtualized kernel outputs
Readback still slow
–NV35 OpenGL: 600 MB/sec Download, 170 MB/sec Readback
–ATI DirectX: 550 MB/sec Download, 50 MB/sec Readback
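To get a feel for what the readback figures mean, the transfer time for one stream can be estimated as bytes divided by bandwidth. The sketch below is only back-of-the-envelope arithmetic: the function name `transfer_seconds` is hypothetical, 1 MB is assumed to mean 10^6 bytes, and the example stream size (1024 x 1024 elements of four 32-bit floats, 16,777,216 bytes) is an illustration, not a measurement from the talk.

```c
#include <assert.h>

/* Estimated time in seconds to move `bytes` at `mb_per_sec`.
   Assumption: 1 MB = 1,000,000 bytes; latency and setup costs are ignored. */
double transfer_seconds(double bytes, double mb_per_sec) {
    assert(mb_per_sec > 0.0);
    return bytes / (mb_per_sec * 1e6);
}
```

Under these assumptions, reading back the example stream at the 50 MB/sec figure takes roughly a third of a second, while downloading it at 550 MB/sec takes about 30 ms, which is why readback dominates when results must return to the CPU every pass.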
18
GPUs of the future (we hope)
Complete Instruction Sets
–Integers, Bit Ops, Doubles, Mem Access
Integration
–Streaming coprocessor not just a rendering device
Streaming architectures
[Diagram labels: SDRAM, Stream Register File, ALU Cluster]
19
Brook for GPUs
Release v0.3 available on SourceForge
Project Page
–http://graphics.stanford.edu/projects/brook
Source
–http://www.sourceforge.net/projects/brook
Over 4K downloads!
Questions?
Fly-fishing fly images from The English Fly Fishing Shop