Toward an automatic parallel tool for solving systems of nonlinear equations
Antonio M. Vidal, Jesús Peinado
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia

Solving Systems of Nonlinear Equations
Given F: R^n -> R^n, find x* such that F(x*) = 0.
Newton's iteration: solve J(x_k) s_k = -F(x_k), then set x_{k+1} = x_k + s_k, where J(x_k) is the Jacobian matrix of F at x_k.
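A minimal sequential sketch of this iteration in Python/NumPy (names are illustrative; the algorithms discussed in these slides are parallel, ScaLAPACK-based implementations):

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0: solve J(x_k) s_k = -F(x_k), x_{k+1} = x_k + s_k."""
    x = x0.astype(float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        s = np.linalg.solve(J(x), -Fx)  # direct linear solve (LU)
        x = x + s
    return x

# Toy example: circle/line intersection, root near (0.7071, 0.7071)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton(F, J, np.array([1.0, 0.5])))
```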

Methods to solve Nonlinear Systems
Newton's methods: solve the linear system with a direct method (LU, Cholesky, ...). Several approaches: Newton, Shamanskii, Chord, ...
Quasi-Newton methods: approximate the Jacobian matrix (Broyden's method, BFGS, ...): B(x_c) ≈ J(x_c), updated by a rank-one correction B(x_+) = B(x_c) + u v^T.
Inexact Newton methods: solve the linear system with an iterative method (GMRES, conjugate gradient, ...), stopping when ||J(x_k) s_k + F(x_k)||_2 <= η_k ||F(x_k)||_2.
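As an illustration of the quasi-Newton family, a dense sketch of Broyden's rank-one update (illustrative names; production codes update a factorization of B rather than B itself):

```python
import numpy as np

def broyden(F, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method: B(x+) = B(xc) + ((y - B s) s^T) / (s^T s)."""
    x, B = x0.astype(float), B0.astype(float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            return x
        s = np.linalg.solve(B, -Fx)               # quasi-Newton step
        x_next = x + s
        F_next = F(x_next)
        y = F_next - Fx                           # change in F along the step
        B = B + np.outer(y - B @ s, s) / (s @ s)  # rank-one update u v^T
        x, Fx = x_next, F_next
    return x
```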

Difficulties in the solution of Nonlinear Systems by a non-expert scientist
Several methods to choose from.
Slow convergence.
Many trials are needed to find the best algorithm.
If parallelization is attempted, the possibilities increase dramatically: shared memory, distributed memory, message-passing environments, computational kernels, several parallel numerical libraries, ...
No help is provided by libraries for solving a nonlinear system.

Objective
To build a software tool that automatically obtains the best performance from a sequential or parallel machine when solving a nonlinear system, for every problem and transparently to the user.

Work done
A set of parallel algorithms has been implemented: Newton, quasi-Newton and inexact Newton algorithms for symmetric and nonsymmetric Jacobian matrices.
The implementations are independent of the problem.
They have been tested on several problems of different kinds.
They have been developed using the support and the philosophy of ScaLAPACK.
They can be seen as part of a more general environment for software on message-passing machines.

ScaLAPACK
Example of distribution for solving a linear system with the Jacobian matrix J and the problem function F.
Programming model: SPMD. Interconnection network: logical mesh. Two-dimensional data distribution: block cyclic.
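For reference, a sketch of the standard 2D block-cyclic mapping ScaLAPACK uses (0-based indices; function names are illustrative):

```python
def owner(g, nb, P):
    """Process coordinate that owns global index g, with block size nb on P processes."""
    return (g // nb) % P

def to_local(g, nb, P):
    """Local index of global index g on its owning process."""
    return (g // (nb * P)) * nb + g % nb

# Matrix entry (i, j) on a Pr x Pc process mesh lives on process
# (owner(i, nb, Pr), owner(j, nb, Pc)),
# at local position (to_local(i, nb, Pr), to_local(j, nb, Pc)).
```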

Software environment (layered, top to bottom)
USER
Automatic Parallel Tool
Numerical parallel algorithms
ScaLAPACK (Scalable Linear Algebra Package), MINPACK (nonlinear systems and least squares), CERFACS CG/GMRES iterative solvers, other local packages
PBLAS (Parallel BLAS), LAPACK (Linear Algebra Package)
BLACS (Basic Linear Algebra Communication Subprograms), BLAS (Basic Linear Algebra Subprograms)
Message-passing primitives (MPI, PVM, ...)

Developing a systematic approach
How to choose the best method? Specification of the problem data:
Starting point. Function F. Jacobian matrix J. Structure of the Jacobian matrix (dense, sparse, banded, ...). Required precision. Use of chaotic techniques. Possibilities of parallelization (function, Jacobian matrix, ...).
Sometimes only the function is known: prospecting with a minimal, simple algorithm (Newton + finite differences + sequential approach, sketched below) can be worthwhile.
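A sketch of that prospecting ingredient, a forward-difference Jacobian approximation (costs n extra function evaluations per Newton step; the step size h is an assumption):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Forward differences: J[:, j] ≈ (F(x + h e_j) - F(x)) / h."""
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h          # perturb one coordinate at a time
        J[:, j] = (F(xh) - Fx) / h
    return J
```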

The methodology (1). General scheme.

Developing a systematic approach
Approximate cost in flops of each method (k_X = number of iterations of method X; m = Shamanskii period / number of inner Krylov iterations):
  Newton:           k_N (C_E + C_J + (2/3) n^3)
  Shamanskii:       k_S (m C_E + C_J + (2/3) n^3)
  Chord:            C_J + (2/3) n^3 + k_C C_E
  Newton-Cholesky:  k_NCH (C_E + C_J + (1/3) n^3)
  Broyden:          C_E + C_J + (4/3) n^3 + k_B O(n^2)
  BFGS:             C_E + C_J + O(n^3) + k_BF O(n^2)
  Newton-GMRES:     k_NG (C_E + C_J + 2 m n^2)
  Newton-CG:        k_NCG (C_E + C_J + m n^2)
C_E = function evaluation cost; C_J = Jacobian matrix evaluation cost.
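A hypothetical sketch of how a tool could rank methods with these formulas (the iteration counts k_N, k_S, k_C and the period m are problem-dependent guesses, not measured values):

```python
def method_costs(n, C_E, C_J, k_N=10, k_S=4, m=3, k_C=30):
    """Approximate flop counts for the direct-solver methods in the table above."""
    lu = (2.0 / 3.0) * n**3                     # cost of one LU factorization
    return {
        "Newton":     k_N * (C_E + C_J + lu),
        "Shamanskii": k_S * (m * C_E + C_J + lu),
        "Chord":      C_J + lu + k_C * C_E,     # Jacobian and factorization once
    }

costs = method_costs(n=1000, C_E=1e6, C_J=1e9)
print(min(costs, key=costs.get))  # cheapest method under these assumptions
```

With an expensive Jacobian, as here, the ranking favors Chord, matching the advice in the decision tables below.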

Developing a systematic approach
The function and the Jacobian matrix characterize the nonlinear system. It is important to know the features of both: sparse or dense, how they are computed (sequentially or in parallel), structure, ...
It is interesting to classify problems according to their cost, especially to identify the best method (or to rule out the worst one) and to decide what must be parallelized.
Problems are classified on a grid by the pair (cost of F, cost of J), where each cost falls in one of the classes O(n), O(n^2), O(n^3), O(n^4) or higher; cell ij of the grid denotes cost class i for one and class j for the other.

Developing a systematic approach
Once the best sequential option has been selected, the process can stop here.
If the best parallel algorithm is required, the following items must be analyzed:
Computer architecture: parameters (t_f, τ, β) (arithmetic time per flop, message start-up time, per-word transfer time).
Programming environments: PVM/MPI, ...
Data distribution that yields the best parallelization.
Cost of the parallel algorithms.

Developing a systematic approach
Data distribution: depends on the parallel environment. In the case of ScaLAPACK, block-cyclic distribution: optimize the block size and the mesh dimensions.
Parallelization opportunities: function evaluation and/or computation of the Jacobian matrix. Parallelize the most expensive operation!
Cost of the parallel algorithms: use the parallel-cost table with the machine parameters (t_f, τ, β).

Developing a systematic approach
Final decision for choosing the method. Encode each cost as 0 if it is below O(n^3) and 1 otherwise:
  C_E = 0, C_J = 0: choose according to the speed of convergence; if it is slow, choose Newton or Newton-GMRES.
  C_E = 0, C_J = 1: avoid computing the Jacobian matrix; choose Broyden or use finite differences.
  C_E = 1, C_J = 0: Newton or Newton-GMRES are adequate; avoid computing the function.
  C_E = 1, C_J = 1: try to do a small number of iterations; use Broyden to avoid computing the Jacobian matrix.
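A hypothetical encoding of this rule table (names and the O(n^3) threshold as stated in the slide; everything else is an assumption):

```python
ADVICE = {
    (0, 0): "Choose by convergence speed; if slow, Newton or Newton-GMRES.",
    (0, 1): "Avoid computing the Jacobian: Broyden or finite differences.",
    (1, 0): "Newton or Newton-GMRES; avoid computing the function.",
    (1, 1): "Do few iterations; use Broyden to avoid the Jacobian.",
}

def advise(cost_F, cost_J, n):
    """Map evaluation costs (in flops) to the 0/1 classes and look up the advice."""
    return ADVICE[(int(cost_F >= n**3), int(cost_J >= n**3))]

print(advise(cost_F=1e6, cost_J=1e12, n=1000))
```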

Developing a systematic approach
Final decision for parallelization. Encode each part as 0 if it offers no chance of parallelization and 1 otherwise:
  Function = 0, Jacobian = 0: try to do few iterations; use Broyden or Chord to avoid computing the Jacobian matrix.
  Function = 0, Jacobian = 1: Newton or Newton-GMRES are adequate; do few iterations and avoid computing the function.
  Function = 1, Jacobian = 0: compute the Jacobian matrix few times; use Broyden or Chord if possible.
  Function = 1, Jacobian = 1: choose according to the speed of convergence; Newton or Newton-GMRES are adequate.

Developing a systematic approach
Finish or feed back: IF the selected method is satisfactory THEN finish ELSE feed back into the process.
Sometimes bad results are obtained due to: no convergence; high computational cost; unsatisfactory parallelization.

The methodology (12). Scheme of the guided process.

How does it work? Inverse Toeplitz symmetric eigenvalue problem
A well-known problem: starting point, function, analytical Jacobian matrix or finite-difference approach, ...
Kind of problem: analytical Jacobian or finite-difference Jacobian.
The cost of the Jacobian matrix is high: avoid computing it. Use Chord or Broyden.
High chance of parallelization, even if finite differences are used.
If the speed of convergence is slow, use Broyden but insert some Newton iterations.

How does it work? Leakage minimization in a water distribution network
A well-known problem: starting point, function, analytical Jacobian matrix or finite-difference approach, ...
The Jacobian matrix is symmetric and positive definite.
Avoid methods with a high cost per iteration, like Newton-Cholesky.
The computation of F and J can be parallelized. Use Newton-CG (to speed up convergence) or BFGS.
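For large systems of this kind, an off-the-shelf inexact Newton (Newton-Krylov) solver gives a feel for the approach; a sketch with SciPy on a toy symmetric problem (the residual below is an illustrative stand-in, not the actual water-network equations):

```python
import numpy as np
from scipy.optimize import newton_krylov

n = 100
def residual(u):
    """Toy discretization of -u'' + u^3 - 1 = 0 with zero Dirichlet boundaries."""
    up = np.concatenate(([0.0], u, [0.0]))
    lap = -(up[2:] - 2.0 * up[1:-1] + up[:-2]) * (n + 1) ** 2
    return lap + u**3 - 1.0

u = newton_krylov(residual, np.zeros(n), f_tol=1e-8)  # Krylov-based inner solves
print(np.linalg.norm(residual(u)))
```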

Conclusions
Part of this work was done in the Ph.D. thesis of J. Peinado: "Resolución Paralela de Sistemas de Ecuaciones no Lineales", Universidad Politécnica de Valencia, September 2003.
All specifications and parallel algorithms have been developed.
The implementation stage of the automatic parallel tool starts in January 2004 within the frame of a CICYT project: "Desarrollo y optimización de código paralelo para sistemas de Audio 3D", TIC2003-08230-C02-02.