1
Status – Week 240 Victor Moya
2
Summary
- Post Geometry Pipeline.
- Rasterization.
  - Triangle Setup.
  - Triangle Traversal.
  - Interpolation.
- Current status.
3
Post Geometry Pipeline
- Divide by w?
- Clipping?
- NVidia doesn’t seem to have geometric clipping.
- Alpha kill in NV2x for user clip planes.
- ATI seems to have geometric clipping.
- Proper user clipping.
- No support for transformed and lit vertex clipping.
- What do we do?
4
Post Geometry Pipeline
- Clipping:
  - 6 frustum clip planes.
  - At least 6 user clip planes.
- Hardware requirements:
  - Plane-edge intersection (?).
  - Generates new vertices (1 or 2 for triangles).
    - Interpolate output attributes at the new vertex.
  - Can generate new triangles (1 for triangles).
    - Affects primitive assembly.
- At least frustum clipping should be fast.
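The plane-edge intersection and attribute interpolation listed above can be sketched as one Sutherland-Hodgman step. This is only an illustration, not the hardware design: the function names, the vertex layout (a position tuple plus an attribute tuple) and the "inside when the plane equation is >= 0" convention are all assumptions.

```python
def clip_polygon(vertices, plane):
    """Clip a convex polygon against one plane (one Sutherland-Hodgman step).

    vertices: list of (position, attributes); position is (x, y, z, w) and
    attributes is a tuple of floats (hypothetical layout).
    plane: (a, b, c, d); a vertex is inside when a*x + b*y + c*z + d*w >= 0.
    """
    def distance(vertex):
        return sum(p * c for p, c in zip(vertex[0], plane))

    def lerp(u, v, t):
        return tuple(a + t * (b - a) for a, b in zip(u, v))

    out = []
    for i, cur in enumerate(vertices):
        nxt = vertices[(i + 1) % len(vertices)]
        dc, dn = distance(cur), distance(nxt)
        if dc >= 0:
            out.append(cur)
        if (dc >= 0) != (dn >= 0):
            # Plane-edge intersection: new vertex with interpolated attributes.
            t = dc / (dc - dn)
            out.append((lerp(cur[0], nxt[0], t), lerp(cur[1], nxt[1], t)))
    return out


def fan(polygon):
    """Re-triangulate the clipped polygon as a fan (affects primitive assembly)."""
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]
```

Clipping a triangle that has one vertex outside the plane produces two new vertices and two output triangles, which is why clipping feeds back into primitive assembly.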
5
Post Geometry Pipeline
- Viewport Transformation
  - Delay to the end of rasterization (at the conversion of fragment attributes from fixed point to floating point).
  - Use fixed point device coordinates [-1, 1] for rasterization.
- Rasterization.
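A minimal sketch of the delayed viewport transformation: positions stay in fixed point device coordinates in [-1, 1] during rasterization and are mapped to window coordinates only when converted back to floating point. The 16 fractional bits, the function names and the viewport parameters are assumptions, not values from the slides.

```python
FRAC_BITS = 16  # assumed fixed point precision; the slides do not state it


def to_fixed(value):
    """Convert a device coordinate in [-1, 1] to fixed point."""
    return int(round(value * (1 << FRAC_BITS)))


def fixed_to_window(fx, fy, vx, vy, width, height):
    """Convert fixed point device coordinates to floating point window
    coordinates, applying the viewport transformation at the same time."""
    x = fx / (1 << FRAC_BITS)
    y = fy / (1 << FRAC_BITS)
    return (vx + (x + 1.0) * 0.5 * width,
            vy + (y + 1.0) * 0.5 * height)
```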
6
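[Figure: pipeline block diagram with the boxes MC, StF, StL, Shader, StOC, StC, PA, TS, TT and Int. The connection labels (1, 2, A*TL+L) cannot be placed from the extracted text.]
Legend:
- MC: Memory Controller
- StF: Streamer Fetch
- StL: Streamer Loader
- StOC: Streamer Output Cache
- StC: Streamer Commit
- Shader: Vertex Shader
- PA: Primitive Assembly
- TS: Triangle Setup
- TT: Triangle Traversal
- Int: Interpolation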
7
Rasterization
- It can be divided into three phases:
  - Setup.
    - Calculate linear equation coefficients, start values and slopes.
    - Perform area and face culling.
  - Traversal.
    - Traverse the triangle, generating the fragments inside it.
    - Clip fragments against the frustum and the user clip planes.
  - Interpolation.
    - Interpolate all fragment attributes for the generated fragment.
9
Triangle Setup
- Use 2DH rasterization setup.
- Create the matrix (inverse or just adjoint matrix?) from the 2DH positions of the three vertices.
- Calculate the determinant.
- Cull on sign (face culling) and on zero (zero area).
- Send the edge equation coefficients and/or the start and slope values to Triangle Traversal.
- Optional: send other equations (1/w, clip planes, interpolators ...).
10
Triangle Setup
- Adjoint rasterization matrix adj(M):
  - First level: 18 muls.
  - Second level: 9 adds.
- a_0 = y_1 w_2 - y_2 w_1
- a_1 = y_2 w_0 - y_0 w_2
- a_2 = y_0 w_1 - y_1 w_0
- b_0 = x_2 w_1 - x_1 w_2
- b_1 = x_0 w_2 - x_2 w_0
- b_2 = x_1 w_0 - x_0 w_1
- c_0 = x_1 y_2 - x_2 y_1
- c_1 = x_2 y_0 - x_0 y_2
- c_2 = x_0 y_1 - x_1 y_0
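The nine coefficients above can be written out directly. A minimal sketch, assuming each vertex is passed as an (x, y, w) tuple; the function name is hypothetical. Each term is a product pair followed by one subtraction, which gives the 18 muls and 9 adds counted on the slide.

```python
def adjoint_rows(v0, v1, v2):
    """Return the three rows [a_i, b_i, c_i] built from the 2DH positions
    (x, y, w) of the triangle vertices."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    a = (y1 * w2 - y2 * w1, y2 * w0 - y0 * w2, y0 * w1 - y1 * w0)
    b = (x2 * w1 - x1 * w2, x0 * w2 - x2 * w0, x1 * w0 - x0 * w1)
    c = (x1 * y2 - x2 * y1, x2 * y0 - x0 * y2, x0 * y1 - x1 * y0)
    return [(a[i], b[i], c[i]) for i in range(3)]
```

For the triangle (0,0), (1,0), (0,1) with w = 1 the rows are [-1, -1, 1], [1, 0, 0] and [0, 1, 0]: row i evaluates to det(M) = 1 at vertex i and to 0 at the other two vertices.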
11
Triangle Setup
- Matrix determinant det(M):
  - 1 DP3: {w_0, w_1, w_2} · {c_0, c_1, c_2}
- Inverse matrix M^{-1} (not needed?):
  - First level: 1 reciprocal: 1/det(M).
  - Second level: 9 muls.
- Edge equations:
  - M^{-1} rows.
  - E_0 = [a_0, b_0, c_0]
  - E_1 = [a_1, b_1, c_1]
  - E_2 = [a_2, b_2, c_2]
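A sketch of the determinant and of the culling decision that depends on it. The function names are hypothetical, and which determinant sign counts as back-facing depends on the winding convention, so the `cull_negative` flag is an assumption.

```python
def determinant(v0, v1, v2):
    """det(M) as one DP3: {w_0, w_1, w_2} . {c_0, c_1, c_2}."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    c0 = x1 * y2 - x2 * y1
    c1 = x2 * y0 - x0 * y2
    c2 = x0 * y1 - x1 * y0
    return w0 * c0 + w1 * c1 + w2 * c2


def cull_reason(v0, v1, v2, cull_negative=True):
    """Return why the triangle is culled, or None if it is kept.

    cull_negative: assumed convention that a negative determinant means
    a back-facing triangle.
    """
    det = determinant(v0, v1, v2)
    if det == 0:
        return "zero area"
    if cull_negative and det < 0:
        return "back face"
    return None
```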
12
Triangle Setup
- 1/w equation:
  - Sum of rows (param vector {1, 1, 1}).
  - Can be calculated as the sum of the edge equations.
- Additional equations:
  - param vector {u_0, u_1, u_2} × M^{-1}: 3 DP3.
- Frustum/Viewport clip (here x_0, y_0 appear to be the viewport origin and w, h its width and height, not vertex coordinates):
  - D_0 = [1, 0, -x_0]
  - D_1 = [-1, 0, x_0 + w]
  - D_2 = [0, 1, -y_0]
  - D_3 = [0, -1, y_0 + h]
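A sketch of how a parameter vector turns into equation coefficients, and of the four viewport clip equations. The function names are hypothetical, and `rows` is assumed to hold the three rows [a_i, b_i, c_i]. Using the adjoint rows instead of M^{-1} scales every equation by det(M), which cancels when an interpolated attribute is later divided by the interpolated 1/w.

```python
def param_equation(u, rows):
    """Vector-matrix product {u_0, u_1, u_2} x rows: 3 DP3, one per column."""
    return tuple(sum(u[i] * rows[i][j] for i in range(3)) for j in range(3))


def one_over_w_equation(rows):
    """1/w equation: param vector {1, 1, 1}, i.e. the sum of the rows."""
    return param_equation((1, 1, 1), rows)


def viewport_clip_equations(vx, vy, width, height):
    """D_0..D_3 for a viewport at (vx, vy); a fragment (x, y) is inside
    when every a*x + b*y + c is >= 0."""
    return [(1, 0, -vx), (-1, 0, vx + width), (0, 1, -vy), (0, -1, vy + height)]
```

For the triangle (0,0), (1,0), (0,1) with w = 1 and u = {0, 10, 20}, the result is [10, 20, 0], i.e. u(x, y) = 10x + 20y, which reproduces the three vertex values.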
13
[Figure: DP3 datapath built from multipliers and adders.]
14
Triangle Traversal
- Different algorithms:
  - I don’t know which is better.
  - Scanline.
  - Centerline (PixelVision).
  - Tiled (Neon, McCormack).
  - Incremental and Hierarchical Hilbert Order (McCool).
  - Others?
15
Triangle Traversal
- Traversal algorithm effects:
  - Can improve the texture access pattern (Neon, Hilbert).
  - Can improve framebuffer memory access (Neon).
- Traversal algorithm requirements:
  - Must produce at least 2x2 fragments per cycle, or a multiple (two 2x2, three 2x2, etc.).
  - Must be efficient and generate as few fragments outside the triangle as possible.
  - Antialiasing?
16
Triangle Traversal
- Uses the edge equation coefficients and/or the start and slope values calculated from them to walk the triangle.
- One ‘step’ per cycle.
- Fixed point arithmetic: integer addition.
- Requires saving state (2 to 3 saved states) or walking back (which spends cycles).
- Tests the sign of the edge equation values at n positions per cycle.
- May test frustum and znear/zfar clip at the same time.
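A minimal sketch of incremental traversal, assuming a plain bounding box scan rather than any of the algorithms named earlier. Each step only adds the x or y coefficient to the current edge values, and the start-of-row values act as the saved state. The function name and the inclusive `>= 0` test (a real rasterizer would apply a fill rule on shared edges) are assumptions.

```python
def traverse(edges, x_min, y_min, x_max, y_max):
    """Walk a bounding box and return the (x, y) positions inside the triangle.

    edges: three (a, b, c) triples with integer (fixed point) coefficients;
    a position is inside when all three a*x + b*y + c values are >= 0.
    """
    # Start values at the first position of the bounding box.
    row_start = [a * x_min + b * y_min + c for a, b, c in edges]
    fragments = []
    for y in range(y_min, y_max + 1):
        current = list(row_start)  # restore the saved row state
        for x in range(x_min, x_max + 1):
            if all(value >= 0 for value in current):  # sign test
                fragments.append((x, y))
            # Step right: one integer addition per edge equation.
            current = [value + a for value, (a, _, _) in zip(current, edges)]
        # Step down: one integer addition per edge equation.
        row_start = [value + b for value, (_, b, _) in zip(row_start, edges)]
    return fragments
```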
17
Triangle Traversal
- Hardware requirements:
  - Multiple fixed point adders.
  - Multiple sign testers.
  - Registers for the current state (at least 3 for each edge equation) and for the saved states.
  - Registers for the edge slopes/increments (as many as fragments generated per cycle times edge equations?).
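To illustrate why the number of increment registers grows with the fragments generated per cycle, here is a sketch that tests one 2x2 quad from a single start value per edge equation. Each edge needs the increments 0, a, b and a + b, one per fragment in the quad. The function name and the fragment order are assumptions.

```python
def quad_coverage(edges, x, y):
    """Return the coverage of the 2x2 quad whose first fragment is (x, y).

    The result lists the fragments in the order (x, y), (x+1, y),
    (x, y+1), (x+1, y+1); a fragment is covered when all three edge
    values are >= 0.
    """
    covered = [True, True, True, True]
    for a, b, c in edges:
        start = a * x + b * y + c        # current state register
        increments = (0, a, b, a + b)    # increment registers for this edge
        for i, inc in enumerate(increments):
            if start + inc < 0:          # adder followed by a sign test
                covered[i] = False
    return covered
```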
18
Traversal Algorithm
[Figure: three adders feeding a sign TEST stage.]
19
Interpolation
- Using barycentric method:
  - Use the edge equation result (McCool):
    - F_0(x,y) = E_0
    - F_1(x,y) = E_1
    - F_2(x,y) = E_2
  - Calculate the sum of the edge equations at the fragment:
    - R'(x,y) = F_0(x,y) + F_1(x,y) + F_2(x,y)
  - Calculate the reciprocal:
    - r = 1/R'(x,y)
  - Interpolate the attribute at the fragment:
    - p_k(x,y) = p_{k0} r F_0(x,y) + p_{k1} r F_1(x,y) + p_{k2} r F_2(x,y)
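The barycentric steps above can be sketched as follows. Floating point is used for brevity, and the function name and argument layout are assumptions: `edges` holds the three (a, b, c) edge equations and `attributes` holds one (p_k0, p_k1, p_k2) triple of vertex values per attribute.

```python
def interpolate_barycentric(edges, attributes, x, y):
    """Interpolate every attribute at the fragment (x, y)."""
    # F_i(x, y): edge equation values at the fragment.
    f = [a * x + b * y + c for a, b, c in edges]
    # r = 1 / R'(x, y): fixed cost, one addition and one reciprocal.
    r = 1.0 / (f[0] + f[1] + f[2])
    # r * F_i: fixed cost, three muls.
    rf = [r * value for value in f]
    # Per attribute: one DP3 against the three vertex values.
    return [p[0] * rf[0] + p[1] * rf[1] + p[2] * rf[2] for p in attributes]
```

Because the result is divided by the sum of the edge values, any common scale factor in the edge equations (such as det(M) when the adjoint is used instead of the inverse) cancels out.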
20
Interpolation
- Alternative (Olano & Greer):
  - At setup:
    - Use the 2DH method and calculate coefficients for all the attributes.
    - Calculate the 1/w (sum of rows) coefficients.
    - Requires a vector-matrix mul per attribute.
  - At traversal/interpolation:
    - Interpolate 1/w and the attributes using fixed point incremental arithmetic.
    - Calculate the reciprocal of 1/w.
    - Multiply the interpolated attribute by the reciprocal of 1/w.
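A sketch of the setup and per-fragment work for this alternative, under the assumption that `rows` holds the three [a_i, b_i, c_i] rows and that fragments are visited along one row of pixels. Floating point stands in for the fixed point incremental arithmetic, and the function names are hypothetical.

```python
def setup_attribute(rows, u):
    """Setup: coefficients for one attribute (vector-matrix mul, 3 DP3)
    and for 1/w (sum of rows)."""
    coef_u = tuple(sum(u[i] * rows[i][j] for i in range(3)) for j in range(3))
    coef_w = tuple(sum(rows[i][j] for i in range(3)) for j in range(3))
    return coef_u, coef_w


def interpolate_span(coef_u, coef_w, x_start, y, count):
    """Interpolate the attribute at `count` fragments starting at (x_start, y)."""
    a_u, b_u, c_u = coef_u
    a_w, b_w, c_w = coef_w
    u_over_w = a_u * x_start + b_u * y + c_u      # start value of u/w
    one_over_w = a_w * x_start + b_w * y + c_w    # start value of 1/w
    values = []
    for _ in range(count):
        # Fixed cost: one reciprocal. Per attribute: one mul.
        values.append(u_over_w * (1.0 / one_over_w))
        # Incremental step in x: one addition for 1/w, one per attribute.
        u_over_w += a_u
        one_over_w += a_w
    return values
```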
21
Interpolation
- Barycentric coordinates (McCool):
  - No cost at setup.
  - Store the parameter values at the three triangle vertices.
  - Fixed: 1 addition, 1 reciprocal and 3 muls.
  - Per parameter: 1 DP3.
- Interpolation using Olano & Greer:
  - Vector-matrix mul at setup per parameter and for 1/w: 3 DP3.
  - Store the current state and the slope increment for all the parameters and 1/w.
  - Fixed: 1 addition, 1 reciprocal.
  - Per parameter: 1 addition, 1 mul.
22
Interpolation
- How many attributes/parameters can be interpolated per cycle?
- XBOX:
  - 5 interpolators?
  - General interpolator: diffuse color + specular color (shared).
  - Texture interpolators: 4?
- Note: each of those interpolators is for a 4D vector.
23
[Figure: interpolator datapath from VERTEX ATTRIBUTES to FRAGMENT ATTRIBUTES, built from an adder, a reciprocal (1/x) unit and multipliers.]
24
Current status
- Implemented the Primitive Assembly box (with trivial degenerate triangle rejection).
- Added the GPU_VERTEX_OUTPUT_ATTRIBUTE register.
  - Boolean vector of MAX_VERTEX_ATTRIBUTES entries that stores whether a vertex output register is written in the shader (and therefore must be transmitted).
- The transmission latency for a vertex between the Shader and Streamer Commit, and between Streamer Commit and Primitive Assembly, is now determined by the number of output attributes.
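A hypothetical sketch of how the boolean vector could drive the transmission latency. The slides only say that the latency depends on the number of output attributes; the linear form below, the parameter names and the value of MAX_VERTEX_ATTRIBUTES are all assumptions made for illustration.

```python
MAX_VERTEX_ATTRIBUTES = 16  # assumed value; the slides do not state it


def transmission_latency(output_mask, cycles_per_attribute, base_latency):
    """Assumed model: latency = written attributes * cycles per attribute
    + a fixed base latency.

    output_mask: list of MAX_VERTEX_ATTRIBUTES booleans, True when the
    shader writes that vertex output register.
    """
    written = sum(1 for flag in output_mask if flag)
    return written * cycles_per_attribute + base_latency
```

With three written attributes, 2 cycles per attribute and a base latency of 1, this model gives 7 cycles.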
25
Current Status
- Started the Triangle Setup box and support classes.
26
Current Status
- Comments:
  - Should the Streamer Loader to Shader transmission also have a transmission latency penalty?
  - Where are the vertex output attributes stored?
  - How many times must we pay the vertex transmission penalty?
27
Current Status
- Signal Analyzer:
  - Already works with large traces.
28
References
- Triangle Scan Conversion using 2D Homogeneous Coordinates, Marc Olano, Trey Greer.
- Tiled Polygon Traversal Using Half-Plane Edge Functions, Joel McCormack, Robert McNamara.
- Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization, Michael D. McCool, Chris Wales, Kevin Moule.
29
References
- A Parallel Algorithm for Polygon Rasterization, Juan Pineda.