Accelerating generalized Cholesky decomposition using multiple processors
C.C.Tscherning & M.Veicherts, University of Copenhagen, Jan
Application in Least-Squares Collocation
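For reference, this is the standard least-squares collocation prediction that the factorization serves. The symbols are chosen here, not taken from the slides: x is the vector of observations, C_xx its covariance matrix (signal plus noise), s the quantities to be predicted and C_sx their covariances with the observations.

```latex
\hat{s} = C_{sx}\, C_{xx}^{-1}\, x
```

In practice C_xx^{-1} is not formed explicitly; one solves C_xx z = x using a Cholesky factorization of C_xx and then computes C_sx z.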
Error-covariance estimation
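As a worked step, using notation chosen here (x observations with covariance C_xx, predicted quantities s with covariance C_ss and cross-covariance C_sx = C_xs^T), the standard collocation error covariance can be evaluated from the Cholesky factor C_xx = L L^T without inverting C_xx:

```latex
\begin{aligned}
E_{ss} &= C_{ss} - C_{sx}\, C_{xx}^{-1}\, C_{xs}, \qquad C_{xx} = L L^{\mathsf{T}},\\
C_{sx}\, C_{xx}^{-1}\, C_{xs}
  &= C_{xs}^{\mathsf{T}}\, L^{-\mathsf{T}} L^{-1}\, C_{xs}
   = \bigl(L^{-1} C_{xs}\bigr)^{\mathsf{T}} \bigl(L^{-1} C_{xs}\bigr),\\
E_{ss} &= C_{ss} - W^{\mathsf{T}} W, \qquad W = L^{-1} C_{xs}.
\end{aligned}
```

So once L is known, each column of W costs one forward substitution, and the columns can be handled independently.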
Cholesky factorization
C = L L^T, where L is a lower-triangular matrix.
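A minimal sketch of the plain (unblocked) factorization C = L L^T, computing L row by row in the Cholesky-Banachiewicz order. The helper name cholesky and the list-of-lists matrix representation are assumptions made for illustration only.

```python
import math


def cholesky(c):
    """Return lower-triangular L with C = L L^T (row-by-row ordering).

    c: symmetric positive-definite matrix given as a list of lists.
    """
    n = len(c)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the already computed columns 0..j-1.
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = c[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


C = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(C)
```

For this small example L works out to [[2, 0, 0], [1, 2, 0], [1, 1, 2]].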
Generalized Cholesky
More on generalized Cholesky
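One way to read "generalized" Cholesky, assumed here rather than taken from the slides: the observation vector x and the cross-covariances C_xs are appended to C_xx as extra columns, and the same row reduction that factorizes C_xx = R^T R (R upper triangular) also turns them into y = R^-T x and W = R^-T C_xs. The prediction and its error covariance then follow as W^T y and C_ss - W^T W, without forming C_xx^-1. The function name generalized_cholesky is hypothetical.

```python
import math


def generalized_cholesky(cxx, x, cxs, css):
    """Sketch: reduce [C_xx | x | C_xs] in one pass; return (s_hat, E).

    Rows of C_xx are factorized as C_xx = R^T R; the same row operations,
    applied to the appended columns, give y = R^-T x and W = R^-T C_xs.
    """
    n, m = len(cxx), len(css)
    # Augmented rows: n columns of C_xx, one column for x, m columns of C_xs.
    a = [list(cxx[i]) + [x[i]] + list(cxs[i]) for i in range(n)]
    width = n + 1 + m
    r = [[0.0] * width for _ in range(n)]
    for k in range(n):
        d = a[k][k] - sum(r[i][k] ** 2 for i in range(k))
        if d <= 0.0:
            raise ValueError("C_xx is not positive definite")
        r[k][k] = math.sqrt(d)
        # Every element to the right of the diagonal, including the
        # appended columns, is reduced with the same recurrence.
        for j in range(k + 1, width):
            r[k][j] = (a[k][j] - sum(r[i][k] * r[i][j] for i in range(k))) / r[k][k]
    y = [r[k][n] for k in range(n)]
    w = [[r[k][n + 1 + p] for p in range(m)] for k in range(n)]
    s_hat = [sum(w[k][p] * y[k] for k in range(n)) for p in range(m)]
    e = [[css[p][q] - sum(w[k][p] * w[k][q] for k in range(n)) for q in range(m)]
         for p in range(m)]
    return s_hat, e


# Small example: two observations, one predicted quantity.
s_hat, E = generalized_cholesky([[4.0, 2.0], [2.0, 5.0]], [4.0, 7.0],
                                [[2.0], [1.0]], [[3.0]])
```

Here C_xx^-1 x = [0.375, 1.25], so the prediction is 2*0.375 + 1*1.25 = 2.0, and the error variance is 3 - 1 = 2.0, which the single reduction pass reproduces.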
Parallelization
When the diagonal element has been computed, each element in the row may be reduced separately; hence each processor may take care of one column.
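A sketch of this row-wise scheme, assuming the upper-triangular form C = R^T R: once r_kk is known, each r_kj (j > k) depends only on rows above k and on r_kk, so the columns of row k can be handed to different workers. The name parallel_cholesky is hypothetical, and CPython threads do not run pure-Python arithmetic concurrently, so this shows the dependency structure rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def parallel_cholesky(c, workers=4):
    """Row-oriented Cholesky C = R^T R with one task per column of each row."""
    n = len(c)
    r = [[0.0] * n for _ in range(n)]

    def reduce_element(k, j):
        # r_kj = (c_kj - sum_{i<k} r_ik r_ij) / r_kk; reads only finished rows.
        r[k][j] = (c[k][j] - sum(r[i][k] * r[i][j] for i in range(k))) / r[k][k]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(n):
            d = c[k][k] - sum(r[i][k] ** 2 for i in range(k))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            r[k][k] = math.sqrt(d)
            # Fan out the rest of row k; list() waits for every column
            # before row k + 1 is started.
            list(pool.map(partial(reduce_element, k), range(k + 1, n)))
    return r


C = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
R = parallel_cholesky(C, workers=3)
```

Each task writes a distinct element r[k][j], so no locking is needed; the only synchronization point is the end of each row.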
Blockwise factorization
Should one row be factorized at a time, or should the factorization work on blocks of elements? Out-of-core factorization is needed for large matrices, so let the processors work on blocked matrices.
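A minimal in-memory sketch of blockwise (right-looking) factorization with block width b. The name blocked_cholesky is hypothetical, and the out-of-core aspect is only indicated in the comments: a real implementation would read and write the blocks from disk and could give the panel and trailing-update blocks to different processors.

```python
import math


def blocked_cholesky(c, b):
    """Right-looking block Cholesky: return lower-triangular L with C = L L^T.

    Each step touches only (1) the diagonal block, (2) the panel below it and
    (3) the trailing submatrix, which is what makes out-of-core work possible.
    """
    n = len(c)
    a = [list(row) for row in c]  # work on a copy; only the lower triangle is used
    for kb in range(0, n, b):
        ke = min(kb + b, n)
        # (1) + (2): factor the diagonal block and solve for the panel below it.
        for j in range(kb, ke):
            d = a[j][j] - sum(a[j][p] ** 2 for p in range(kb, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            a[j][j] = math.sqrt(d)
            for i in range(j + 1, n):
                a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(kb, j))) / a[j][j]
        # (3): subtract the panel's contribution from the trailing submatrix.
        for i in range(ke, n):
            for j in range(ke, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(kb, ke))
    # Return L with the unused strict upper triangle cleared.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


C = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = blocked_cholesky(C, 2)
```

Changing b changes only the order of the arithmetic, not the result (up to rounding); the block size trades the number of steps against the memory each step needs.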
Block division: column-wise and rectangular
[Figure: the elements c11 … c66 of a 6 × 6 matrix divided into blocks 1, 2 and 3, shown for "column-wise" 1-dim. blocks of size 9 and for rectangular 2-dim. blocks of size 3*3.]
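A small sketch of the two division schemes for the 6 × 6 example. It assumes (our reading of the figure labels, not something the slide states) that a 1-dim. column-wise block collects consecutive columns of the upper triangle until it holds at most 9 elements, and that a 2-dim. block is a 3 × 3 tile. The helper names column_blocks and rectangular_blocks are hypothetical.

```python
def column_blocks(n, max_size):
    """Group consecutive columns of an n x n upper triangle into 1-dim. blocks.

    Column j (1-based) holds j elements c_1j ... c_jj; columns are added to the
    current block until the next one would push it past max_size elements.
    """
    blocks, current, size = [], [], 0
    for j in range(1, n + 1):
        if current and size + j > max_size:
            blocks.append(current)
            current, size = [], 0
        current.append(j)
        size += j
    if current:
        blocks.append(current)
    return blocks


def rectangular_blocks(n, b):
    """List the b x b tiles (block row, block column) covering the upper triangle."""
    nb = (n + b - 1) // b
    return [(bi, bj) for bi in range(nb) for bj in range(bi, nb)]


cols = column_blocks(6, 9)
tiles = rectangular_blocks(6, 3)
```

Under these assumptions both schemes give 3 blocks: columns {1, 2, 3} (6 elements), {4, 5} (9 elements) and {6} (6 elements), or the tiles (0, 0), (0, 1) and (1, 1).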
Block-size tests
[Figure, two panels: NEQ = 10000, Nproc = 4; NEQ = 20000, Nproc = 2.]
Parallelization
[Figure: flowchart of the Cholesky factorisation with NES_MP and related subroutine(s).]
Parallelization results
Performance test on two PCs, compiler PGF90: GOCE (4x3GHz, 2GB) and IKOS (4x2.66GHz, 4GB).
[Table: columns PROC and NEQ, followed by NES and NES_MP for GOCE and NES and NES_MP for IKOS.]
Integration in GEOCOL18
[Table: Geocol integration tests, timing (in s) for equation solving only; columns Server, NEQ, Geocol17a and Geocol18zr with 1, 2 and 4 processors, on GOCE and IKOS.]
Performance Increase
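Performance increase from parallelization is commonly summarized by speedup and parallel efficiency. These are the standard definitions, given here for reference; T_p denotes the wall-clock time of the equation solving on p processors, and no measured values are implied.

```latex
S_p = \frac{T_1}{T_p}, \qquad \varepsilon_p = \frac{S_p}{p}
```

An ideal run gives S_p = p and efficiency 1; other load on the machine and the choice of block size both push the measured values below that.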
Conclusion
Generalized Cholesky factorization enables the use of parallelization for both the solution and the error-covariance computation. The time gained by parallelization depends on the number of processors, the block size, and how busy the computer is with other tasks.
Note: further use of multiprocessing
Evaluation of spherical harmonic series (N.Pavlis et al.).
Establishing the normal-equation matrix or computing a column of covariances: the factorisation may start as soon as a row of blocks has been established.
This gives realistic run times for LSC applications (minutes instead of days).
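A sketch of overlapping matrix set-up with factorisation, using a producer/consumer queue. It assumes the row-oriented form C = R^T R, where row k of R needs only row k of C and the rows of R above it, so each row of blocks can be reduced as soon as it arrives. The function pipelined_factorization and the covariance function example_cov are hypothetical stand-ins for the real set-up code.

```python
import math
import queue
import threading


def pipelined_factorization(n, cov, block=2):
    """Factorize C = R^T R while the rows of C are still being established."""
    q = queue.Queue(maxsize=2)  # bounded, so the producer stays only a little ahead

    def producer():
        for start in range(0, n, block):
            # Establish one row of blocks (here: evaluate the covariance function).
            q.put({k: [cov(k, j) for j in range(n)]
                   for k in range(start, min(start + block, n))})
        q.put(None)  # end-of-matrix marker

    r = [[0.0] * n for _ in range(n)]
    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    # Consumer: reduce each row of blocks as soon as it is available.
    while (rows := q.get()) is not None:
        for k in sorted(rows):
            c_k = rows[k]
            r[k][k] = math.sqrt(c_k[k] - sum(r[i][k] ** 2 for i in range(k)))
            for j in range(k + 1, n):
                r[k][j] = (c_k[j] - sum(r[i][k] * r[i][j] for i in range(k))) / r[k][k]
    worker.join()
    return r


def example_cov(i, j):
    """Hypothetical covariance: diagonally dominant, hence positive definite."""
    return 1.0 / (1 + abs(i - j)) + (5.0 if i == j else 0.0)


R = pipelined_factorization(5, example_cov, block=2)
```

The bounded queue keeps memory use to a couple of block rows, which is the same property an out-of-core version would rely on.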