1
Parallel Solution to the Radiative Transport
EG PGV 2009. Szirmay-Kalos László, Liktor Gábor, Umenhoffer Tamás, Tóth Balázs (TU Budapest); Glenn Lupton, Kumar Shree (HP). The title of this paper is … It presents work carried out jointly by TU Budapest and HP.
2
Overview
- Radiative transport
- Challenges of parallel iteration
- Our approach: initial estimation, modified iteration, FCC grid, CUDA
- Results

The simulation of radiative transport means solving the global illumination problem in participating media, which is important not only in graphics but also in engineering and medical simulation. In this talk, radiative transport is first reviewed and the challenges of the iterative solution are presented. Then our solution is described, which uses a quick initial estimation and a modified iteration scheme to provide good scalability.
3
Radiative transport: out-scattering, in-scattering, emission, absorption
[Figure: a ray traveling toward the camera and screen through the medium; incident radiance L(s) enters a path segment of length ds and leaves as outgoing radiance L(s+ds), modified by out-scattering, in-scattering, emission, and absorption.]
Radiative transport examines the change of radiation in participating media. This change is due to several factors: negative terms, such as absorption, when photons are absorbed, and out-scattering, when photons are scattered away from the direction of the pixel; and positive terms, such as in-scattering, when photons are scattered into the direction of the pixel, and emission. Putting these changes together, we obtain an integro-differential equation for the radiance of a ray.
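For reference, this integro-differential equation can be written in a standard form that collects the four terms named above (the notation, with absorption and scattering coefficients σ_a and σ_s and phase function P, is conventional rather than taken from the slide):

$$
\frac{\mathrm{d}L(s,\vec\omega)}{\mathrm{d}s}
= -\sigma_a(s)\,L(s,\vec\omega)
  -\sigma_s(s)\,L(s,\vec\omega)
  +\sigma_s(s)\int_{\Omega} P(\vec\omega',\vec\omega)\,L(s,\vec\omega')\,\mathrm{d}\omega'
  +L_e(s,\vec\omega),
$$

where the four right-hand-side terms are absorption, out-scattering, in-scattering, and emission, respectively.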
4
Solution methods
- Monte-Carlo simulation: error is O(m^{-0.5}) in the number of samples m; parallelization is trivial
- Iteration: converges geometrically; parallelization is a challenge

There are basically two alternatives to solve this differential equation: Monte Carlo simulation and iteration. The Monte Carlo approach samples light paths connecting the source to the eye via scattering points and obtains the final solution as the average of their contributions. As this approach generates sample paths independently, its parallelization is trivial and the solution scales up linearly. However, because the sample paths are generated independently, the method cannot reuse information gathered with previous paths, so convergence is slow. In fact, as for all Monte Carlo methods, the error is inversely proportional to the square root of the number of samples: reducing the error to one tenth needs a hundred times more samples, which is not very appealing. Iteration, on the other hand, represents the current radiance estimate using finite element techniques and refines it in each step. Thus it can reuse previous information and consequently converges with the speed of a geometric series. However, as iteration reuses the complete estimate of the previous step, its parallelization is non-trivial, and obtaining a scalable algorithm is a challenge.
5
Iteration
- Finite-element approaches (grid)
- Iteratively refines the estimation
- Error depends on the initial guess
- L = TL + Q,  L_n = T L_{n-1} + Q,  ||L_n − L|| ≤ λ^n ||L_0 − L||

We like challenges and prefer algorithms with good convergence speed, so we selected the iterative approach. Integrating the differential equation and representing the unknown radiance at only a finite number of sample points and directions turns the integro-differential equation into a large system of linear equations L = TL + Q, where L holds the unknown radiance values at the grid points, T is the matrix of the radiance transport, and Q is the source term. Iteration solves this linear equation by substituting a guess into the right side and obtaining a new guess on the left. If matrix T is a contraction, this process converges to the solution from an arbitrary initial guess. However, the initial guess has a great impact on the error: as can be shown, the error at step n is proportional to the nth power of the contraction factor and to the error of the initial guess. Thus, with a good initial guess, the number of iteration steps can be reduced.
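The error bound on the slide follows in one line: subtracting the fixed-point equation L = TL + Q from the iteration step L_n = T L_{n-1} + Q gives

$$
L_n - L = T\,(L_{n-1} - L)
\quad\Longrightarrow\quad
\|L_n - L\| \le \lambda\,\|L_{n-1} - L\| \le \dots \le \lambda^n\,\|L_0 - L\|,
$$

where λ < 1 denotes the contraction factor (a bound on the norm of T), so a better initial guess L_0 directly reduces the number of steps needed for a given accuracy.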
6
Parallel Iteration
- Boundary affecting multiple blocks: costly data exchanges, hence less frequent data exchanges
- Few iteration steps require a good initial guess: unscattered component, homogeneous solution, or our approximate inhomogeneous solution
[Figure: the volume decomposed into blocks assigned to node1…node4, with shared boundary layers between neighboring blocks.]

In a parallel implementation the volume is decomposed into blocks, which are assigned to compute nodes that solve the transfer locally. As iteration refines the previous guess in each step, the compute nodes should exchange their boundaries after each iteration step to get a globally correct result. These data exchanges may quickly become the bottleneck of the parallel solution, since current GPUs are so fast that inter-GPU or inter-computer communication has become relatively slow. To attack this bottleneck, we propose to exchange boundary conditions less frequently and let nodes work alone longer before communication is needed; a sketch of this reduced-exchange loop follows below. To make this idea feasible, we had to solve several problems. First, we must guarantee that the nodes have something to refine, i.e. that each block is not far from the final solution even at the beginning of the iteration. In previous work, this problem is usually solved by initializing the radiance with the direct, unscattered component or with the solution obtained by assuming that the volume is homogeneous. However, the direct component fades very quickly, so far from the sources this approximation is poor; the homogeneous solution, on the other hand, is obviously bad when the volume is heterogeneous. Thus, we use a novel approach that approximates both the direct and the indirect components in a heterogeneous volume and initializes the iteration with this approximation. The second problem that needs consideration is the analysis of the effects of not exchanging boundary conditions and the optimal setting of the exchange frequency.
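The following toy program is a minimal sketch of the reduced-exchange idea, not the paper's solver; all names and the 1D two-block Jacobi setting are illustrative. Each block refreshes its copy of the neighbor's boundary sample only every exchangePeriod steps and otherwise iterates on stale data.

```
// Two blocks of a 1D Jacobi-style iteration; the halo (boundary) value seen
// by the other block is refreshed only every 'exchangePeriod' steps.
#include <cstdio>
#include <vector>

int main() {
    const int N = 16;                 // samples per block
    const int steps = 100;            // iteration steps
    const int exchangePeriod = 5;     // exchange halos every 5th step only
    std::vector<double> a(N, 0.0), b(N, 0.0);   // the two blocks
    double haloA = 0.0, haloB = 0.0;  // each block's (possibly stale) copy of
                                      // the neighbor's boundary sample
    const double leftBC = 1.0, rightBC = 0.0;   // fixed outer boundaries

    for (int n = 0; n < steps; ++n) {
        if (n % exchangePeriod == 0) {  // infrequent "communication"
            haloA = b[0];               // block A reads B's left boundary
            haloB = a[N - 1];           // block B reads A's right boundary
        }
        std::vector<double> a2(a), b2(b);
        for (int i = 0; i < N; ++i) {
            double al = (i == 0)     ? leftBC  : a[i - 1];
            double ar = (i == N - 1) ? haloA   : a[i + 1];  // possibly stale
            a2[i] = 0.5 * (al + ar);
            double bl = (i == 0)     ? haloB   : b[i - 1];  // possibly stale
            double br = (i == N - 1) ? rightBC : b[i + 1];
            b2[i] = 0.5 * (bl + br);
        }
        a.swap(a2); b.swap(b2);
    }
    // With exchangePeriod = 1 this is plain synchronized Jacobi; with a larger
    // period each step is slightly noisier but the iteration still converges.
    printf("a[N-1] = %f, b[0] = %f\n", a[N - 1], b[0]);
    return 0;
}
```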
7
Initial approximation
Direct term → Direct + Indirect: solve the differential equation for each ray assuming spherical symmetry.
In order to get a good initial guess for both the direct and the indirect terms, we trace rays from the point source and process spherical fronts. Unlike previous methods, while marching on a ray we approximate not only the direct term but also the indirect component. Unfortunately, the indirect term depends on the whole volume, so rays cannot be traced independently. In our approximation, we assume for each ray that all other rays face exactly the same material properties, or in other words, that the volume has spherical symmetry; in this case the differential equation can be solved along the ray, which also provides the indirect term. Note that this does not mean that the solution will be spherically symmetric, since each ray is computed according to the properties of the voxels it actually intersects; the assumption concerns only the rest of the volume. The method is exact if the volume really is spherically symmetric and an approximation otherwise; due to coherence, the approximation is quite good even for inhomogeneous volumes. When the initial ray casting is over, the direct and indirect terms are approximated at each voxel using tri-linear interpolation. A sketch of the direct-term part of this marching follows below.
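As a concrete illustration of the marching, here is only the standard direct (unscattered) term, accumulated along one ray from the point source; the coupled equation that also yields the indirect term under the spherical-symmetry assumption is specific to the paper and is not reproduced. The function names and the constant extinction are placeholders.

```
// Direct term of a point source of power 'phi' at distance r:
// phi / (4*pi*r^2) * exp(-tau(r)), where tau is the optical depth
// accumulated along the ray.
#include <cmath>
#include <cstdio>

// Hypothetical extinction lookup; a real implementation samples the volume.
double sigma_t(double x, double y, double z) { return 0.1; }

double directTerm(double phi, const double dir[3], double rMax, double dr) {
    double tau = 0.0;
    for (double r = dr; r < rMax; r += dr)   // march outward from the source
        tau += sigma_t(r * dir[0], r * dir[1], r * dir[2]) * dr;
    const double pi = 3.14159265358979;
    return phi / (4.0 * pi * rMax * rMax) * std::exp(-tau);
}

int main() {
    const double dir[3] = {1.0, 0.0, 0.0};
    printf("L_dir = %g\n", directTerm(100.0, dir, 10.0, 0.01));
    return 0;
}
```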
8
Iteration refinement - Finite element: FCC grid
[Figure: sample points of the BCC and FCC grids.]
The initial radiance estimate is refined by parallel iteration. Iteration updates a finite element representation of the radiance, that is, it stores the radiance only at a finite set of points and directions. As the incoming radiance at a sample point in a particular direction is the outgoing radiance of another sample point in the same direction, sample points and directions must be selected carefully. An obvious solution would store the radiance of the voxel corners in the directions of the neighboring voxel corners. However, in a conventional Cartesian cubic grid a sample point has just 6 nearest neighbors, which seems too few to represent the directional variation of the radiance. The body-centered cubic (BCC) grid, where sample points include voxel corners and voxel centers, is better, since here a sample point has 8 nearest neighbors. The best solution is the face-centered cubic (FCC) grid, which includes voxel corners and face centers, giving 12 nearest neighbors for each sample point. Our implementation therefore uses the FCC grid; the 12 neighbor directions are listed in the sketch below.
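For concreteness, the 12 nearest neighbors of an FCC lattice point are the permutations of (±1, ±1, 0) in lattice units. One common way to encode them (the paper's actual indexing may differ) is a constant table; here it is ordered so that direction 11−d is the opposite of direction d, which simplifies the gathering step shown later.

```
// Every pair of coordinate axes contributes four diagonal offsets,
// giving the 12 FCC nearest-neighbor directions.
struct Offset { int x, y, z; };

const Offset FCC_NEIGHBORS[12] = {
    { 1, 1, 0}, { 1,-1, 0}, { 1, 0, 1}, { 1, 0,-1}, { 0, 1, 1}, { 0, 1,-1},
    { 0,-1, 1}, { 0,-1,-1}, {-1, 0, 1}, {-1, 0,-1}, {-1, 1, 0}, {-1,-1, 0},
};  // FCC_NEIGHBORS[11 - d] == -FCC_NEIGHBORS[d] for every d
```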
9
Reduced data exchanges
[Figure: the iteration L_n = T L_{n-1} + Q distributed over node1…node4; in block form, L_n^1 = T_1 L_{n-1}^1 + T_12 L_{n-1}^2 + Q^1 and L_n^2 = T_21 L_{n-1}^1 + T_2 L_{n-1}^2 + Q^2, where the off-diagonal blocks T_12 and T_21 couple neighboring blocks through the boundary layer, and a node that skips an exchange substitutes an older estimate such as L_{n-2}^2.]
As noted, the iterative solution updates the vector of voxel radiance values by a matrix multiplication. Sample values of different blocks affect each other through the boundary layer, which prevents the compute nodes, each calculating a part of the vector, from running independently without communication. To reduce the communication overhead, we do not exchange boundary conditions after each iteration step, which means that a node may use older estimates from other nodes.
10
Reduced data exchanges
[Figure: with a skipped exchange, the update is the exact step T L_{n-1} + Q plus a noise term built from the off-diagonal coupling block, of the form [T_12](L_{n-3} − L_{n-2}).]
As can be shown, the effect of not exchanging boundary conditions after each step is the addition of some noise, or error, in each step, where the error depends on the difference between older and newer estimates. When the iteration converges, this noise goes to zero, so the final result is not corrupted. Only the speed of convergence is decreased, while the speed of a single iteration cycle increases, so a flexible compromise can be found depending on the relative performance of computation and communication. Noise converges to zero!
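In the block notation of the previous slide (the exact bookkeeping in the paper may differ): if node 1 uses the stale boundary estimate L_{n-2}^2 instead of L_{n-1}^2, its update is the exact step plus a perturbation,

$$
L_n^1 = T_1 L_{n-1}^1 + T_{12} L_{n-2}^2 + Q^1
      = \underbrace{T_1 L_{n-1}^1 + T_{12} L_{n-1}^2 + Q^1}_{\text{exact step}}
      + \underbrace{T_{12}\left(L_{n-2}^2 - L_{n-1}^2\right)}_{\text{noise}},
$$

and since consecutive iterates of a convergent process approach each other, the noise term tends to zero.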
11
Iteration solution: CUDA
Sampling. Summarizing, our proposed approach starts by setting up the sampling points in the examined volume.
12
Iteration solution: CUDA
Sampling → Illumination network. The sample points and the FCC grid establish an illumination network.
13
Iteration solution: CUDA
Sampling → Illumination network → Initial radiance distribution. An initial guess of both the direct and the indirect radiance is made by independently processing rays starting from the source.
14
Iteration solution: CUDA
Sampling → Illumination network → Initial radiance distribution → Iteration. The initial guess is refined by iterating the radiance over the FCC grid; a minimal CUDA sketch of such a step follows below.
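This is an illustrative CUDA sketch of one gathering step, not the paper's kernel. It assumes isotropic scattering; flat arrays Lold/Lnew holding, for every sample point, the radiance leaving it toward each of its 12 FCC neighbors; a precomputed neighbor-index table (−1 outside the volume) using the direction ordering introduced earlier (direction 11−d is the opposite of d); and hypothetical per-point emission and albedo inputs.

```
#include <cstdio>

__global__ void iterateRadiance(const float* Lold, float* Lnew,
                                const int* neighbor,   // numPoints * 12
                                const float* emission, // numPoints
                                const float* albedo,   // numPoints
                                int numPoints)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPoints) return;

    // Radiance arriving at p from its neighbor in direction d left that
    // neighbor traveling toward p, i.e. it sits in the neighbor's slot 11-d.
    float incoming = 0.0f;
    for (int d = 0; d < 12; ++d) {
        int q = neighbor[p * 12 + d];
        if (q >= 0) incoming += Lold[q * 12 + (11 - d)];
    }
    // Isotropic scattering: the same value leaves toward every neighbor.
    float out = emission[p] + albedo[p] * incoming / 12.0f;
    for (int d = 0; d < 12; ++d)
        Lnew[p * 12 + d] = out;
}

int main() {
    // Smoke test: one sample point with no neighbors; expect Lnew = emission.
    int   hN[12]; for (int d = 0; d < 12; ++d) hN[d] = -1;
    float hE = 1.0f, hA = 0.5f, hL[12] = {0};
    float *dLold, *dLnew, *dE, *dA; int *dN;
    cudaMalloc(&dLold, 12 * sizeof(float));
    cudaMalloc(&dLnew, 12 * sizeof(float));
    cudaMalloc(&dN, 12 * sizeof(int));
    cudaMalloc(&dE, sizeof(float));
    cudaMalloc(&dA, sizeof(float));
    cudaMemcpy(dLold, hL, 12 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dN, hN, 12 * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dE, &hE, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dA, &hA, sizeof(float), cudaMemcpyHostToDevice);
    iterateRadiance<<<1, 32>>>(dLold, dLnew, dN, dE, dA, 1);
    cudaMemcpy(hL, dLnew, 12 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("L[0] = %f (expected 1.0)\n", hL[0]);
    return 0;
}
```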
15
Iteration solution: CUDA
Sampling → Illumination network → Initial radiance distribution → Iteration → Visualization. Finally, the radiance arriving at the eye is obtained.
16
Visualization: 5-node HP SVA
We implemented the system on a 5-node HP Scalable Visualization Array operating as a GPU cluster. The slave nodes executed the simulation and the rendering of their own blocks, and the partial images were sent over the InfiniBand network. The alpha-blending compositing operation of the ParaComp library was also executed in parallel; a sketch of such a compositing step is shown below.
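For illustration, here is the standard premultiplied-alpha "over" blend that such a pipeline applies to partial images (ParaComp's actual API is not shown); the operator is associative, which is what permits compositing the partial images in parallel across nodes.

```
// Composite the 'front' partial image over the 'back' one, pixel by pixel.
// Pixels are premultiplied RGBA, ordered front to back along the view ray.
struct Pixel { float r, g, b, a; };

void compositeOver(const Pixel* front, const Pixel* back, Pixel* out, int n) {
    for (int i = 0; i < n; ++i) {
        float t = 1.0f - front[i].a;   // transmittance of the front image
        out[i].r = front[i].r + t * back[i].r;
        out[i].g = front[i].g + t * back[i].g;
        out[i].b = front[i].b + t * back[i].b;
        out[i].a = front[i].a + t * back[i].a;
    }
}
```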
17
Error analysis for the initial distribution
These error plots show the effectiveness of the initial radiance distribution. The red curve was obtained by distributing both the direct and the indirect estimations, while the green curve was obtained after distributing only the direct term. As the computational times of the direct-only distribution and the direct/indirect distribution are similar, the proposed approach has no overhead in this respect. As we can observe, after distributing the indirect term as well, the error gets much smaller, and we can reach the same accuracy using just half of the iterations.
18
Scalability
[Plot: time to reach 2% error, and single-iteration time broken down into compute + communication, versus the number of nodes.]
Concerning the scalability data: although we have not implemented the initial distribution in parallel, so each ray is traced in all blocks, it still scales quite well. The explanation of this seeming contradiction is that the blocks get smaller, so tracing a particular ray gets faster. The iteration time consists of the computation time and the average time of the boundary-condition exchanges. As expected, the communication time becomes critical very soon, but exchanging the boundary conditions only in every fifth iteration cycle keeps the communication time under control. The number of extra iterations needed to compensate for the increased noise is only about 5 percent, so frequency control seems to be an efficient way of attacking the communication overhead.
19
Results: direct term, direct+indirect estimation, 25 iterations, 100 iterations
These figures show the results after different numbers of iterations when only the direct term is distributed, and also with the direct+indirect initialization. Note that this estimate is not far from the final solution, so it is not surprising that it greatly improves the speed. I also show an application for interactive radiation source placement.
20
Conclusions
- Interactive solution of the radiative transport problem
- Scalable iteration scheme
- Current limitations: no specular reflections; point sources