Toward an automatic parallel tool for solving systems of nonlinear equations
Antonio M. Vidal, Jesús Peinado
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia
Solving Systems of Nonlinear Equations
Newton's iteration (Newton's algorithm): solve J(xk) sk = -F(xk), then set xk+1 = xk + sk.
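The iteration above can be sketched in a few lines. This is a minimal illustration (function and variable names are my own, not from the tool described here): at each step a linear system with the Jacobian is solved by a direct method.

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Basic Newton iteration: solve J(x_k) s_k = -F(x_k), set x_{k+1} = x_k + s_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = np.linalg.solve(J(x), -F(x))  # direct solve (LU factorization)
        x = x + s
        if np.linalg.norm(F(x)) < tol:
            return x
    raise RuntimeError("Newton did not converge")

# Toy system: x0^2 + x1^2 = 1, x0 = x1; root at (sqrt(2)/2, sqrt(2)/2)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
root = newton(F, J, [1.0, 0.5])
```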
Methods to solve Nonlinear Systems
Newton's methods: solve the linear system by a direct method (LU, Cholesky, ...). Several variants: Newton, Shamanskii, Chord, ...
Quasi-Newton methods: approximate the Jacobian matrix (Broyden's method, BFGS, ...): B(xc) ≈ J(xc), B(x+) = B(xc) + uvT
Inexact Newton methods: solve the linear system by an iterative method (GMRES, Conjugate Gradient, ...): ||J(xk)sk + F(xk)||2 ≤ ηk ||F(xk)||2
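The quasi-Newton rank-one update B+ = B + uvT can be made concrete with Broyden's choice u = (y - B s)/(s.T s), v = s. A minimal sketch (names are illustrative, not from the tool):

```python
import numpy as np

def broyden_step(B, x, F):
    """One Broyden iteration: s = -B^{-1} F(x), then the rank-one update
    B+ = B + ((y - B s) s^T) / (s^T s), with y = F(x + s) - F(x)."""
    Fx = F(x)
    s = np.linalg.solve(B, -Fx)
    x_new = x + s
    y = F(x_new) - Fx
    B_new = B + np.outer(y - B @ s, s) / (s @ s)  # secant condition: B+ s = y
    return x_new, B_new

# Same toy system as before; initial B taken as the true Jacobian at x0
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
x = np.array([1.0, 0.5])
B = np.array([[2.0, 1.0], [1.0, -1.0]])
for _ in range(50):
    x, B = broyden_step(B, x, F)
    if np.linalg.norm(F(x)) < 1e-12:
        break
```

After the first factorization, each step costs only a solve and a rank-one update, which is why Broyden appears below as the cheap-per-iteration option.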
Difficulties in the solution of nonlinear systems by a non-expert scientist
Several methods to choose from. Slow convergence. Many trials are needed to find the optimal algorithm. If parallelization is attempted the possibilities increase dramatically: shared memory, distributed memory, message-passing environments, computational kernels, several parallel numerical libraries, ... No library provides help for solving a nonlinear system.
Objective
To build a software tool that, for every problem and transparently to the user, automatically obtains the best performance from a sequential or parallel machine when solving a nonlinear system.
Work done
A set of parallel algorithms has been implemented: Newton, Quasi-Newton, and Inexact Newton algorithms for symmetric and nonsymmetric Jacobian matrices. The implementations are problem independent. They have been tested on several problems of different kinds. They were developed using the support and philosophy of ScaLAPACK. They can be seen as part of a more general software environment for message-passing machines.
ScaLAPACK
Example of data distribution for solving a linear system with Jacobian matrix J and problem function F. Programming model: SPMD. Interconnection network: logical mesh. Two-dimensional data distribution: block cyclic.
Software environment
USER
Automatic Parallel Tool
Numerical Parallel Algorithms
ScaLAPACK (Scalable Linear Algebra Package), MINPACK (Global Minimization), CERFACS CG/GMRES Iterative Solvers, other local packages
PBLAS (Parallel BLAS), LAPACK (Linear Algebra Package)
BLACS (Basic Linear Algebra Communication Subroutines), BLAS (Basic Linear Algebra Subroutines)
Message-passing primitives (MPI, PVM, ...)
Developing a systematic approach
How to choose the best method? Specification of the problem data: Starting point. Function F. Jacobian matrix J. Structure of the Jacobian matrix (dense, sparse, banded, ...). Required precision. Use of chaotic techniques. Parallelization possibilities (function, Jacobian matrix, ...). Sometimes only the function is known: prospecting with a minimal simple algorithm (Newton + finite differences + sequential approach) can be useful.
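When only F is known, the Jacobian can be approximated column by column with forward differences at the cost of n extra function evaluations. A minimal sketch (names and step size are illustrative):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Forward-difference Jacobian: column j ~ (F(x + h e_j) - F(x)) / h."""
    Fx = np.asarray(F(x))
    n = len(x)
    J = np.empty((len(Fx), n))
    for j in range(n):
        xp = np.array(x, dtype=float)
        xp[j] += h
        J[:, j] = (np.asarray(F(xp)) - Fx) / h
    return J

# Check against the analytic Jacobian of the toy system
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
Jfd = fd_jacobian(F, np.array([1.0, 0.5]))
```

Plugging `fd_jacobian` into a plain Newton loop gives exactly the "Newton + finite differences" prospecting run mentioned above.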
The methodology (1). General scheme
Developing a systematic approach
Method — flops:
Newton: kN (CE + CJ + 2n³/3)
Shamanskii: kS (CE + (CJ + 2n³/3)/m)
Chord: kC CE + CJ + 2n³/3
Newton-Cholesky: kNCH (CE + CJ + n³/3)
Broyden: kB (CE + O(n²)) + CJ + 2n³/3
BFGS: kBF (CE + O(n²)) + CJ
Newton-GMRES: kNG (CE + CJ + O(n²m))
Newton-CG: kNCG (CE + CJ + O(n²))
CE = function evaluation cost; CJ = Jacobian matrix evaluation cost
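A rough way to use such a cost table is to plug in estimates for n, CE, CJ and the expected iteration counts and compare totals. This sketch assumes only the dominant flop terms (the function name, the selection of methods, and the exact constants are my own simplification, not the tool's model):

```python
def costs(n, CE, CJ, k, m=10):
    """Rough sequential flop totals per method, dominant terms only.
    CE/CJ: function/Jacobian evaluation cost; k: expected iterations;
    m: Krylov iterations per Newton-GMRES step."""
    return {
        "Newton":       k * (CE + CJ + 2 * n**3 / 3),   # factorize J each step
        "Chord":        k * CE + CJ + 2 * n**3 / 3,      # one factorization total
        "Broyden":      k * (CE + 2 * n**2) + CJ + 2 * n**3 / 3,
        "Newton-GMRES": k * (CE + CJ + 2 * m * n**2),    # matrix-vector products
    }

c = costs(n=1000, CE=1e6, CJ=1e9, k=20)
```

For this example an expensive Jacobian makes Chord far cheaper than Newton, matching the advice in the decision tables below.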
Developing a systematic approach
The function and the Jacobian matrix characterize the nonlinear system. It is important to know the features of both: sparse or dense, how they are computed (sequentially or in parallel), structure, ... It is interesting to classify problems according to their cost, especially to identify the best method, to avoid the worst method, and to decide what must be parallelized.
Problem classes Pij by evaluation cost (rows: cost of F; columns: cost of J), for O(n), O(n²), O(n³), O(n⁴) and beyond:
P11 P12 P13 P14 P1+
P21 P22 P23 P24 P2+
P31 P32 P33 P34 P3+
P41 P42 P43 P44 P4+
P+1 P+2 P+3 P+4 P++
Developing a systematic approach
Once the best sequential option has been selected, the process can be finalized. If the best parallel algorithm is required, the following items must be analyzed: Computer architecture: (tf, t, b). Programming environments: PVM/MPI, ... Data distribution to obtain the best parallelization. Cost of the parallel algorithms.
Developing a systematic approach
Data distribution: it depends on the parallel environment. In the case of ScaLAPACK: block-cyclic distribution; optimize the block size and the mesh size. Parallelization opportunities: function evaluation and/or computation of the Jacobian matrix. Parallelize the most expensive operation! Cost of the parallel algorithms: use the parallel cost table with the parameters of the parallel machine (tf, t, b).
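The ScaLAPACK block-cyclic layout can be summarized by the standard owner computation: global entry (i, j) with block size nb on a P x Q logical mesh lives on process row (i // nb) mod P and process column (j // nb) mod Q. A minimal sketch (function name is illustrative):

```python
def block_cyclic_owner(i, j, nb, P, Q):
    """ScaLAPACK-style 2-D block-cyclic mapping: returns the
    (process row, process column) owning global matrix entry (i, j)
    for block size nb on a P x Q logical process mesh."""
    return ((i // nb) % P, (j // nb) % Q)
```

Tuning nb and the P x Q shape trades load balance against communication, which is exactly the "optimize the block size and the mesh size" step above.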
Developing a systematic approach
Final decision for choosing the method. Encoding: cost < O(n³) => 0; cost >= O(n³) => 1.
CE | CJ | Advisable
0 | 0 | Choose according to the speed of convergence. If it is slow, choose Newton or Newton-GMRES.
0 | 1 | Avoid computing the Jacobian matrix. Choose Broyden or use finite differences.
1 | 0 | Newton or Newton-GMRES is adequate. Avoid computing the function.
1 | 1 | Try to keep the number of iterations small. Use Broyden to avoid computing the Jacobian matrix.
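The decision table above is small enough to encode directly; this sketch (function and argument names are my own) shows how an automatic tool could dispatch on the two cost flags:

```python
def advise(ce_high, cj_high):
    """Method advice from the decision table: ce_high / cj_high are True
    when evaluating F / J costs O(n^3) flops or more."""
    if not ce_high and not cj_high:
        return "Choose by convergence speed; if slow, Newton or Newton-GMRES"
    if not ce_high and cj_high:
        return "Avoid the Jacobian: Broyden or finite differences"
    if ce_high and not cj_high:
        return "Newton or Newton-GMRES; minimize function evaluations"
    return "Keep iterations few; Broyden to avoid Jacobian evaluations"
```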
Developing a systematic approach
Final decision for parallelization. Encoding: no chance of parallelization => 0; chance of parallelization => 1.
Fun | Jac | Advisable
0 | 0 | Try to perform few iterations. Use Broyden or Chord to avoid computing the Jacobian matrix.
0 | 1 | Newton or Newton-GMRES is adequate. Perform few iterations and avoid computing the function.
1 | 0 | Compute the Jacobian matrix few times. Use Broyden or Chord if possible.
1 | 1 | Choose according to the speed of convergence. Newton or Newton-GMRES is adequate.
Developing a systematic approach
Finish or feedback: IF the selected method is satisfactory THEN finish ELSE feedback. Sometimes bad results are obtained due to: No convergence. High computational cost. Unsatisfactory parallelization.
The methodology (12). Diagram of the guided process
How does it work? Inverse Toeplitz Symmetric Eigenvalue Problem
A well-known problem: starting point, function, analytical Jacobian matrix or finite-difference approach, ... Kind of problem: analytical Jacobian or finite-difference Jacobian. The cost of the Jacobian matrix is high: avoid computing it; use Chord or Broyden. High chance of parallelization, even if finite differences are used. If the speed of convergence is slow, use Broyden but insert some Newton iterations.
How does it work? Leakage minimization in a water distribution network
A well-known problem: starting point, function, analytical Jacobian matrix or finite-difference approach, ... Jacobian matrix: symmetric, positive definite. Avoid methods with a high cost per iteration, like Newton-Cholesky. The computation of F and J can be parallelized. Use Newton-CG (to speed up convergence) or BFGS.
Conclusions
Part of this work was done in the Ph.D. thesis of J. Peinado: "Resolución Paralela de Sistemas de Ecuaciones no Lineales", Universidad Politécnica de Valencia, Sept. 2003. All specifications and parallel algorithms have been developed. The implementation stage of the automatic parallel tool starts in January 2004 within the framework of a CICYT project: "Desarrollo y optimización de código paralelo para sistemas de Audio 3D", TIC2003-08230-C02-02.