Accelerating the Optimization in SEE++
Presentation at RISC, Hagenberg
Johannes Watzl
04/27/2006
A cooperation project by RISC and UAR
Contents
- The project survey
- The problem – a short overview
- Optimization
- Thesis contribution
- Accelerating the sequential program
- Parallelization
- Interpolation of the torque function
- Conclusion
The Project Survey 1
SEE-Kid, SEE-Grid software: biomechanical simulation of the human eye
(UAR: Michael Buchberger, Thomas Kaltofen; RISC: Wolfgang Schreiner, Karoly Bosa)
For choosing optimal surgery techniques for the treatment of certain eye motility disorders
Simulation of the Hess-Lancaster test (an examination by which the pathology of the patient can be estimated)
The Project Survey 2
Superior oblique muscle: the upper diagonal eye muscle (responsible for downward and inward motions)
Example: Hess-Lancaster chart for right superior oblique palsy
The Problem – a Short Overview 1
Stable eye position: minimum of a specific function (the torque function)
Computation: Levenberg-Marquardt optimization (as used in SEE++)
The Problem – a Short Overview 2
Minimization of the torque function: f(x, e) → min, where
f … the torque function
x … a vector of six elements (representing the muscle force, length, …)
e … describes the eye position (ab-/adduction, elevation/depression)
The Problem – a Short Overview 3 Example 1: Torque function of a healthy eye
The Problem – a Short Overview 4
Example 2: Torque function of a pathological eye (some muscle data, in this case the muscle force, has been changed)
Optimization 1
General structure of an optimization algorithm:

  Input: function f, starting value x_1
  begin
    k := 1;
    while not (convergence criterion) do
    begin
      compute search direction p_k;
      compute step size alpha_k, with x_{k+1} := x_k + alpha_k * p_k;
      k := k + 1;
    end
  end
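To make the generic scheme concrete, here is a minimal C++ sketch for a one-dimensional function; the example objective, the steepest-descent search direction, and the fixed step size are illustrative assumptions, not the SEE++ code.

  #include <cmath>
  #include <cstdio>

  int main() {
      // derivative of the example objective f(x) = (x - 2)^2
      auto f_prime = [](double x) { return 2.0 * (x - 2.0); };

      double x = 10.0;          // starting value x_1
      const double alpha = 0.1; // step size (fixed here for simplicity)
      int k = 1;

      while (std::fabs(f_prime(x)) > 1e-8 && k < 1000) { // convergence criterion
          const double p = -f_prime(x); // search direction: steepest descent
          x += alpha * p;               // x_{k+1} = x_k + alpha_k * p_k
          ++k;
      }
      std::printf("minimum near x = %g after %d steps\n", x, k);
  }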
Optimization 2
Newton method
Iteration: x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k), i.e. it uses the Hessian matrix H
Quadratic convergence (the number of correct decimal places doubles in every iteration step)
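The Newton step follows from the second-order Taylor expansion of f; in LaTeX (standard derivation, notation as above):

\[
f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2}\, p^T H(x_k)\, p
\quad\Longrightarrow\quad
H(x_k)\, p_k = -\nabla f(x_k), \qquad x_{k+1} = x_k + p_k .
\]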
Optimization 3
Gauß-Newton method
Instead of computing the Hessian matrix, it is approximated via the Jacobian matrix J: H ≈ J^T J
(J^T J is always symmetric and positive semidefinite, and positive definite if J has full rank)
Quadratic convergence for zero-residual problems, nearly quadratic for small residuals
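For a least-squares objective the approximation simply drops the second-order residual term; in LaTeX (standard derivation, with residual vector r and Jacobian J of r):

\[
f(x) = \tfrac{1}{2}\,\lVert r(x)\rVert^2, \qquad
\nabla f = J^T r, \qquad
H = J^T J + \sum_i r_i \nabla^2 r_i \approx J^T J,
\qquad
(J^T J)\, p_k = -J^T r(x_k) .
\]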
Optimization 4
In our case: minimize the torque function f(x, e) over the eye position e
Problem: these methods converge only if the starting value is near the minimum!
Optimization 5
Levenberg-Marquardt algorithm (LM)
The search direction p_k of the Newton method can be too big (only local convergence)
→ construct a trust region in every step and compute the search direction inside this trust region, subject to certain conditions
Inside this trust region we do the "normal" iteration step.
Combination of Gauß-Newton and a trust-region method
→ converges nearly quadratically
The starting value does not have to be near the minimum for finding the solution.
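The trust region is typically realized through a damping parameter λ_k added to the Gauß-Newton normal equations; in LaTeX (standard LM formulation, not spelled out on the slide):

\[
(J^T J + \lambda_k I)\, p_k = -J^T r(x_k) .
\]

A large λ_k shrinks the step toward steepest descent (small trust region); λ_k → 0 recovers the Gauß-Newton step.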
Optimization 6
Levenberg-Marquardt is used in SEE++
Based on a Matlab implementation called EyeLab
The SEE-Kid model differs from the EyeLab model, but the optimization routine is the same
The Matlab code was converted to C++ code
Accelerating the sequential program
The computation of the Jacobian matrix J in every step is very costly.
→ compute the new Jacobian matrix by updating the previous one
Accelerating the sequential program
Broyden rank-1 update:

  J_{k+1} = J_k + ((Δf_k - J_k Δx_k) Δx_k^T) / (Δx_k^T Δx_k)

with Δx_k = x_{k+1} - x_k and Δf_k = f(x_{k+1}) - f(x_k)
Accelerating the sequential program
In every step of the Broyden method we have to:
1. Solve the equation J_k p_k = -f(x_k) for the step p_k
2. Compute x_{k+1} = x_k + p_k
3. Compute the rank-1 update J_{k+1} from J_k, Δx_k, and Δf_k
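A minimal C++ sketch of the rank-1 update itself (step 3); the fixed 6x6 dimension matches the slides, while the array-based types and the function name are illustrative assumptions:

  #include <array>

  constexpr int N = 6; // dimension from the slides (vector of six elements)
  using Vec = std::array<double, N>;
  using Mat = std::array<std::array<double, N>, N>;

  // Broyden rank-1 update: J += ((df - J*dx) * dx^T) / (dx^T * dx)
  void broyden_update(Mat& J, const Vec& dx, const Vec& df) {
      Vec Jdx{}; // J * dx
      for (int i = 0; i < N; ++i)
          for (int j = 0; j < N; ++j)
              Jdx[i] += J[i][j] * dx[j];

      double dxdx = 0.0; // dx^T * dx
      for (int j = 0; j < N; ++j) dxdx += dx[j] * dx[j];
      if (dxdx == 0.0) return; // no step taken, nothing to update

      for (int i = 0; i < N; ++i) {
          const double scale = (df[i] - Jdx[i]) / dxdx;
          for (int j = 0; j < N; ++j)
              J[i][j] += scale * dx[j]; // rank-1 correction, row by row
      }
  }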
Accelerating the sequential program
Implementation:
- Prototype implementation in Matlab, based on the EyeLab source code
- For experiments and testing (the functionality must match the version without Broyden for every pathological case)
- If successful: convert the Matlab code into C++
Parallelizing the existing implementation 1
Decomposition of the domain of eye positions
Problem: most of the steps are taken near the minimum, so one of the processors does the main part of the work
(→ not really parallel, because after some time only one processor computes the main part)
Parallelizing the existing implementation 2
Approximation of the Hessian matrix: H ≈ J^T J
Divide J into its columns and compute in parallel (parallel matrix multiplication):
each vector-vector product J_i^T J_j can be run as a separate parallel process
(n … number of processors)
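A minimal C++ sketch of this scheme with one thread per row of J^T J; the thread layout and the dense storage are illustrative assumptions (the slides do not fix an implementation):

  #include <cstddef>
  #include <thread>
  #include <vector>

  // Compute H = J^T * J in parallel: thread i fills row i of H,
  // i.e. the inner products of column i with every column j.
  std::vector<std::vector<double>>
  parallel_jtj(const std::vector<std::vector<double>>& J) {
      const std::size_t m = J.size();    // rows of J
      const std::size_t n = J[0].size(); // columns of J (n = 6 in SEE++)
      std::vector<std::vector<double>> H(n, std::vector<double>(n, 0.0));

      std::vector<std::thread> workers;
      for (std::size_t i = 0; i < n; ++i)
          workers.emplace_back([&J, &H, m, n, i] {
              for (std::size_t j = 0; j < n; ++j) {
                  double s = 0.0;
                  for (std::size_t r = 0; r < m; ++r)
                      s += J[r][i] * J[r][j]; // column i dot column j
                  H[i][j] = s;
              }
          });
      for (auto& t : workers) t.join();
      return H;
  }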
Parallelizing the existing implementation 3
Problem: small dimension of the matrices (n = 6)
Absolute computation time: ~7 s (P4, 3.4 GHz)
→ use shared-memory systems to reduce the communication overhead; speedup: ~2 (will be attempted later)
For distributed-memory systems or the Grid we have to look for alternative approaches.
Interpolation of the Torque Function 1
Why interpolation?
During optimization: lots of function evaluations (~8500 evaluations in ~4 s)
More than half of the computation time is spent on function evaluation!
→ Interpolation can be done in parallel using domain decomposition (on distributed-memory systems and the Grid too!)
Interpolation of the Torque Function 2
Triangulated terrain (input: a set of points, the vertices of the triangles)
Interpolation of the Torque Function 3
Delaunay triangulation
Approximation of a terrain (not a plane) with given points
We need a certain number of function evaluations at the beginning to build up the triangles.
To run this in parallel, we divide the domain into several parts and do the Delaunay triangulation in every subdomain.
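Once the triangulation exists, a query point can be interpolated linearly from the three vertices of its triangle; a minimal barycentric-coordinate sketch in C++ (the linear interpolant is an assumption, the slides do not specify the scheme):

  struct Pt { double x, y, f; }; // sample point with its function value f

  // Linear interpolation of f at (px, py) inside triangle (a, b, c)
  // via barycentric coordinates (assumes the point lies in the triangle).
  double interp_triangle(const Pt& a, const Pt& b, const Pt& c,
                         double px, double py) {
      const double det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
      const double w1  = ((b.y - c.y) * (px - c.x) + (c.x - b.x) * (py - c.y)) / det;
      const double w2  = ((c.y - a.y) * (px - c.x) + (a.x - c.x) * (py - c.y)) / det;
      const double w3  = 1.0 - w1 - w2;
      return w1 * a.f + w2 * b.f + w3 * c.f; // weighted average of vertex values
  }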
Conclusions
Timeline:
- Until the end of May: implementation of the Broyden update; shared-memory parallelization (multithreaded) of the basis (Levenberg-Marquardt); if successful in the basis → in the Broyden update too
- Until the middle of July: implementation of the interpolation
- Until the end of November: writing the thesis