Accelerating the Optimization in SEE++
Presentation at RISC, Hagenberg
Johannes Watzl
04/27/2006
A cooperation project by RISC and UAR
Contents
- The project survey
- The problem – a short overview
- Optimization
- Thesis contribution
- Accelerating the sequential program
- Parallelization
- Interpolation of the torque function
- Conclusion
The Project Survey 1
SEE-Kid, SEE-Grid software: biomechanical simulation of the human eye
(UAR: Michael Buchberger, Thomas Kaltofen; RISC: Wolfgang Schreiner, Karoly Bosa)
For choosing optimal surgery techniques for the treatment of certain eye motility disorders
Simulation of the Hess-Lancaster test (an examination by which the pathology of the patient can be estimated)
The Project Survey 2
Superior oblique muscle: the upper diagonal eye muscle (responsible for downward and inward motions)
Example: Hess-Lancaster chart for right superior oblique palsy
The Problem – a Short Overview 1
Stable eye position: minimum of a specific function (the torque function)
Computation: Levenberg-Marquardt optimization (as used in SEE++)
The Problem – a Short Overview 2
Minimization of the torque function: f(x, e) → min, where
f … the torque function
x … a vector of six elements (representing the muscle force, length, …)
e … describes the eye position (ab-/adduction, elevation/depression)
The Problem – a Short Overview 3 Example 1: Torque function of a healthy eye
The Problem – a Short Overview 4
Example 2: Torque function of a pathological eye (some muscle data, in this case the muscle force, has been changed)
Optimization 1
General structure of an optimization algorithm:

  Input: function f, starting value x_1
  begin
    k := 1;
    while not (convergence criterion) do
    begin
      compute search direction p_k;
      compute step size alpha_k, with x_{k+1} := x_k + alpha_k * p_k;
      k := k + 1;
    end
  end
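To make the generic scheme concrete, here is a minimal C++ sketch for a one-dimensional function; the example objective, the steepest-descent search direction, and the fixed step size are illustrative assumptions, not the SEE++ code.

  #include <cmath>
  #include <cstdio>

  int main() {
      // derivative of the example objective f(x) = (x - 2)^2
      auto f_prime = [](double x) { return 2.0 * (x - 2.0); };

      double x = 10.0;          // starting value x_1
      const double alpha = 0.1; // step size (fixed here for simplicity)
      int k = 1;

      while (std::fabs(f_prime(x)) > 1e-8 && k < 1000) { // convergence criterion
          const double p = -f_prime(x); // search direction: steepest descent
          x += alpha * p;               // x_{k+1} = x_k + alpha_k * p_k
          ++k;
      }
      std::printf("minimum near x = %g after %d steps\n", x, k);
  }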
Optimization 2
Newton method
Iteration: x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k), i.e. it uses the Hessian matrix H
Quadratic convergence (the number of correct decimal places doubles in every iteration step)
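The Newton step follows from the second-order Taylor expansion of f; in LaTeX (standard derivation, notation as above):

\[
f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T H(x_k)\, p
\quad\Longrightarrow\quad
H(x_k)\, p_k = -\nabla f(x_k), \qquad x_{k+1} = x_k + p_k .
\]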
Optimization 3
Gauß-Newton method
Instead of computing the Hessian matrix, it is approximated via the Jacobian matrix J: H ≈ J^T J
(J^T J is always symmetric and positive semidefinite, and positive definite if J has full rank)
Quadratic convergence for zero-residual problems, nearly quadratic for small residuals
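For a least-squares objective the approximation simply drops the second-order residual term; in LaTeX (standard derivation, with residual vector r and Jacobian J of r):

\[
f(x) = \tfrac{1}{2}\,\lVert r(x)\rVert^2, \qquad
\nabla f = J^T r, \qquad
H = J^T J + \sum_i r_i \nabla^2 r_i \approx J^T J,
\qquad
(J^T J)\, p_k = -J^T r(x_k) .
\]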
Optimization 4
In our case: minimize the torque function f(x, e) over the eye position e
Problem: these methods converge only if the starting value is near the minimum!
Optimization 5
Levenberg-Marquardt algorithm (LM)
The search direction p_k of the Newton method can be too big (only local convergence)
→ construct a trust region in every step and compute the search direction inside this trust region, subject to certain conditions
Inside this trust region we do the "normal" iteration step.
Combination of Gauß-Newton and a trust-region method
→ converges nearly quadratically
The starting value does not have to be near the minimum for finding the solution.
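The trust region is typically realized through a damping parameter λ_k added to the Gauß-Newton normal equations; in LaTeX (standard LM formulation, not spelled out on the slide):

\[
(J^T J + \lambda_k I)\, p_k = -J^T r(x_k) .
\]

A large λ_k shrinks the step toward steepest descent (small trust region); λ_k → 0 recovers the Gauß-Newton step.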
Optimization 6
Levenberg-Marquardt is used in SEE++
Based on a Matlab implementation called EyeLab
The SEE-Kid model differs from the EyeLab model, but the optimization routine is the same
The Matlab code was converted to C++ code
Accelerating the sequential program
The computation of the Jacobian matrix J in every step is very costly.
→ compute the new Jacobian matrix by updating the previous one
Accelerating the sequential program
Broyden rank-1 update:

  J_{k+1} = J_k + ((Δf_k - J_k Δx_k) Δx_k^T) / (Δx_k^T Δx_k)

with Δx_k = x_{k+1} - x_k and Δf_k = f(x_{k+1}) - f(x_k)
Accelerating the sequential program
In every step of the Broyden method we have to:
1. Solve the equation J_k p_k = -f(x_k) for the step p_k
2. Compute x_{k+1} = x_k + p_k
3. Compute the rank-1 update J_{k+1} from J_k, Δx_k, and Δf_k
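A minimal C++ sketch of the rank-1 update itself (step 3); the fixed 6x6 dimension matches the slides, while the array-based types and the function name are illustrative assumptions:

  #include <array>

  constexpr int N = 6; // dimension from the slides (vector of six elements)
  using Vec = std::array<double, N>;
  using Mat = std::array<std::array<double, N>, N>;

  // Broyden rank-1 update: J += ((df - J*dx) * dx^T) / (dx^T * dx)
  void broyden_update(Mat& J, const Vec& dx, const Vec& df) {
      Vec Jdx{}; // J * dx
      for (int i = 0; i < N; ++i)
          for (int j = 0; j < N; ++j)
              Jdx[i] += J[i][j] * dx[j];

      double dxdx = 0.0; // dx^T * dx
      for (int j = 0; j < N; ++j) dxdx += dx[j] * dx[j];
      if (dxdx == 0.0) return; // no step taken, nothing to update

      for (int i = 0; i < N; ++i) {
          const double scale = (df[i] - Jdx[i]) / dxdx;
          for (int j = 0; j < N; ++j)
              J[i][j] += scale * dx[j]; // rank-1 correction, row by row
      }
  }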
Accelerating the sequential program
Implementation:
- Prototype implementation in Matlab, based on the EyeLab source code
- For experiments and testing (the functionality must match the version without Broyden for every pathological case)
- If successful: convert the Matlab code into C++
Parallelizing the existing implementation 1
Decomposition of the domain of eye positions
Problem: most of the steps are taken near the minimum, so one of the processors does the main part of the work
(→ not really parallel, because after some time only one processor computes the main part)
Parallelizing the existing implementation 2
Approximation of the Hessian matrix: H ≈ J^T J
Divide J into its columns and compute in parallel (parallel matrix multiplication):
each vector-vector product J_i^T J_j can be run as a separate parallel process
(n … number of processors)
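A minimal C++ sketch of this scheme with one thread per row of J^T J; the thread layout and the dense storage are illustrative assumptions (the slides do not fix an implementation):

  #include <cstddef>
  #include <thread>
  #include <vector>

  // Compute H = J^T * J in parallel: thread i fills row i of H,
  // i.e. the inner products of column i with every column j.
  std::vector<std::vector<double>>
  parallel_jtj(const std::vector<std::vector<double>>& J) {
      const std::size_t m = J.size();    // rows of J
      const std::size_t n = J[0].size(); // columns of J (n = 6 in SEE++)
      std::vector<std::vector<double>> H(n, std::vector<double>(n, 0.0));

      std::vector<std::thread> workers;
      for (std::size_t i = 0; i < n; ++i)
          workers.emplace_back([&J, &H, m, n, i] {
              for (std::size_t j = 0; j < n; ++j) {
                  double s = 0.0;
                  for (std::size_t r = 0; r < m; ++r)
                      s += J[r][i] * J[r][j]; // column i dot column j
                  H[i][j] = s;
              }
          });
      for (auto& t : workers) t.join();
      return H;
  }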
Parallelizing the existing implementation 3
Problem: small dimension of the matrices (n = 6)
Absolute computation time: ~7 s (P4, 3.4 GHz)
→ use shared-memory systems to reduce the communication overhead; speedup: ~2 (will be attempted later)
For distributed-memory systems or the Grid we have to look for alternative approaches.
Interpolation of the Torque Function 1
Why interpolation?
During optimization: lots of function evaluations (~8500 evaluations in ~4 s)
More than half of the computation time is spent on function evaluation!
→ Interpolation can be done in parallel using domain decomposition (on distributed-memory systems and the Grid too!)
Interpolation of the Torque Function 2
Triangulated terrain (input: a set of points, the vertices of the triangles)
Interpolation of the Torque Function 3
Delaunay triangulation
Approximation of a terrain (not a plane) with given points
We need a certain number of function evaluations at the beginning to build up the triangles.
To run this in parallel, we divide the domain into several parts and do the Delaunay triangulation in every subdomain.
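Once the triangulation exists, a query point can be interpolated linearly from the three vertices of its triangle; a minimal barycentric-coordinate sketch in C++ (the linear interpolant is an assumption, the slides do not specify the scheme):

  struct Pt { double x, y, f; }; // sample point with its function value f

  // Linear interpolation of f at (px, py) inside triangle (a, b, c)
  // via barycentric coordinates (assumes the point lies in the triangle).
  double interp_triangle(const Pt& a, const Pt& b, const Pt& c,
                         double px, double py) {
      const double det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
      const double w1  = ((b.y - c.y) * (px - c.x) + (c.x - b.x) * (py - c.y)) / det;
      const double w2  = ((c.y - a.y) * (px - c.x) + (a.x - c.x) * (py - c.y)) / det;
      const double w3  = 1.0 - w1 - w2;
      return w1 * a.f + w2 * b.f + w3 * c.f; // weighted average of vertex values
  }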
Conclusions
Timeline:
- Until the end of May: implementation of the Broyden update; shared-memory parallelization (multithreaded) of the basis (Levenberg-Marquardt); if successful in the basis → in the Broyden update too
- Until the middle of July: implementation of the interpolation
- Until the end of November: writing the thesis