Xiu-Lin Wang, Xiao-Feng Gong, Qiu-Hua Lin

Slides:

Advertisements

Similar presentations

Parallel Jacobi Algorithm Steven Dong Applied Mathematics.

Advertisements

Introduction to Neural Networks Computing

Component Analysis (Review)

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

1 Maarten De Vos SISTA – SCD - BIOMED K.U.Leuven On the combination of ICA and CPA Maarten De Vos Dimitri Nion Sabine Van Huffel Lieven De Lathauwer.

B.Macukow 1 Lecture 12 Neural Networks. B.Macukow 2 Neural Networks for Matrix Algebra Problems.

SOLVING SYSTEMS OF LINEAR EQUATIONS. Overview A matrix consists of a rectangular array of elements represented by a single symbol (example: [A]). An individual.

Rayan Alsemmeri Amseena Mansoor. LINEAR SYSTEMS Jacobi method is used to solve linear systems of the form Ax=b, where A is the square and invertible.

10/11/2001Random walks and spectral segmentation1 CSE 291 Fall 2001 Marina Meila and Jianbo Shi: Learning Segmentation by Random Walks/A Random Walks View.

Lecture 5 A Priori Information and Weighted Least Squared.

Linear Algebraic Equations

Sampling algorithms for l 2 regression and applications Michael W. Mahoney Yahoo Research (Joint work with P. Drineas.

Special Matrices and Gauss-Siedel

Monica Garika Chandana Guduru. METHODS TO SOLVE LINEAR SYSTEMS Direct methods Gaussian elimination method LU method for factorization Simplex method of.

1 Multiple Kernel Learning Naouel Baili MRL Seminar, Fall 2009.

Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.

Introduction The central problems of Linear Algebra are to study the properties of matrices and to investigate the solutions of systems of linear equations.

Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.

LINEAR PROGRAMMING SIMPLEX METHOD.

Particle Filter-Assisted Positioning Method for Identifying RFID-Tag Implanted in the Organism Gen Imai*, Katsushi Matsuda*, Hiromi Takahata and Minoru.

LU Decomposition 1. Introduction Another way of solving a system of equations is by using a factorization technique for matrices called LU decomposition.

Exercise problems for students taking the Programming Parallel Computers course. Janusz Kowalik Piotr Arlukowicz Tadeusz Puzniakowski Informatics Institute.

Independent Components Analysis with the JADE algorithm

AN ITERATIVE METHOD FOR MODEL PARAMETER IDENTIFICATION 4. DIFFERENTIAL EQUATION MODELS E.Dimitrova, Chr. Boyadjiev E.Dimitrova, Chr. Boyadjiev BULGARIAN.

Hongyan Li, Huakui Wang, Baojin Xiao College of Information Engineering of Taiyuan University of Technology 8th International Conference on Signal Processing.

Mathematical foundationsModern Seismology – Data processing and inversion 1 Some basic maths for seismic data processing and inverse problems (Refreshement.

Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.

ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.

JAVA AND MATRIX COMPUTATION

2010/12/11 Frequency Domain Blind Source Separation Based Noise Suppression to Hearing Aids (Part 2) Presenter: Cian-Bei Hong Advisor: Dr. Yeou-Jiunn Chen.

Learning Spectral Clustering, With Application to Speech Separation F. R. Bach and M. I. Jordan, JMLR 2006.

Review of Matrix Operations Vector: a sequence of elements (the order is important) e.g., x = (2, 1) denotes a vector length = sqrt(2*2+1*1) orientation.

Chapter 2-OPTIMIZATION G.Anuradha. Contents Derivative-based Optimization –Descent Methods –The Method of Steepest Descent –Classical Newton’s Method.

1. Systems of Linear Equations and Matrices (8 Lectures) 1.1 Introduction to Systems of Linear Equations 1.2 Gaussian Elimination 1.3 Matrices and Matrix.

Monte Carlo Linear Algebra Techniques and Their Parallelization Ashok Srinivasan Computer Science Florida State University

Massive Support Vector Regression (via Row and Column Chunking) David R. Musicant and O.L. Mangasarian NIPS 99 Workshop on Learning With Support Vectors.

Engineering Analysis ENG 3420 Fall 2009 Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 11:00-12:00.

CSE 554 Lecture 8: Alignment

Part 3 Chapter 12 Iterative Methods

Compressive Coded Aperture Video Reconstruction

Introduction The central problems of Linear Algebra are to study the properties of matrices and to investigate the solutions of systems of linear equations.

Introduction The central problems of Linear Algebra are to study the properties of matrices and to investigate the solutions of systems of linear equations.

LECTURE 11: Advanced Discriminant Analysis

Monte Carlo simulation

Non-additive Security Games

Introduction to Matrices

Gauss-Siedel Method.

Data Mining K-means Algorithm

Outlier Processing via L1-Principal Subspaces

NAME: OLUWATOSIN UTHMAN ZUBAIR (145919) COURSE: NETWORK FLOW

Lecture 22: Parallel Algorithms

Autar Kaw Benjamin Rigsby

CSE Social Media & Text Analytics

Singular Value Decomposition

Unfolding Problem: A Machine Learning Approach

RECORD. RECORD Gaussian Elimination: derived system back-substitution.

Metode Eliminasi Pertemuan – 4, 5, 6 Mata Kuliah : Analisis Numerik

Numerical Analysis Lecture 16.

Numerical Analysis Lecture 17.

Whitening-Rotation Based MIMO Channel Estimation

Shared Memory Accesses

Neuro-Computing Lecture 2 Single-Layer Perceptrons

Approximation Algorithms for the Selection of Robust Tag SNPs

Unfolding with system identification

Numerical Modeling Ramaz Botchorishvili

Ax = b Methods for Solution of the System of Equations:

Ax = b Methods for Solution of the System of Equations (ReCap):

Presentation transcript:

A Study on Parallelization of Successive Rotation Based Joint Diagonalization Xiu-Lin Wang, Xiao-Feng Gong, Qiu-Hua Lin Dalian University of Technology, China 19th DSP, August, 2014

Content 1. Introduction 2. The successive rotation 3. The proposed algorithm 4. Simulation results 5. Conclusion 1

1. Introduction Joint diagonalization …… JD seeks unloading matrix so that are diagonal Joint diagonalization Target matrices …… 2

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Slicewise form: Fig.1 Visualization of models for CPD 3

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Slicewise form: Fig.1 Visualization of models for CPD 3

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Slicewise form: Fig.1 Visualization of models for CPD 3

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Slicewise form: Fig.1 Visualization of models for CPD 3

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Slicewise form: Fig.1 Visualization of models for CPD 3

1. Introduction Joint Diagonalization(JD) has been widely applied Array processing (X. F. Gong, 2012) Tensor decomposition (L. Delathauwer, 2008; X. F. Gong, 2013) Speech signal processing (D. T. Pham, 2003) Blind source separation (J. F. Cardoso, 1993; A. Mesloub, 2014) Mixed signals Construct target matrices JD Unloading matrix Separated signals Fig.2 Blind source separation algorithm’s block diagram based JD 4

1. Introduction Joint diagonalization Orthogonality of unloading matrix Orthogonal JD (OJD): Pre-whitening Non-orthogonal JD(NOJD): No pre-whitening JD Several criteria applied to JD problems minimization of off-norm weighted least squares information theory Optimization strategies some specific optimization like as Newton, Gauss iteration… algebraic strategies like as Jacobi-type or successive rotation 5

1. Introduction Joint diagonalization Orthogonality of unloading matrix Orthogonal JD (OJD): Pre-whitening Non-orthogonal JD(NOJD): No pre-whitening JD Several criteria applied to JD problems minimization of off-norm weighted least squares information theory Optimization strategies some specific optimization like as Newton, Gauss iteration… algebraic strategies like as Jacobi-type or successive rotation 5

1. Introduction Joint diagonalization Orthogonality of unloading matrix Orthogonal JD (OJD): Pre-whitening Non-orthogonal JD(NOJD): No pre-whitening JD Several criteria applied to JD problems minimization of off-norm weighted least squares information theory Optimization strategies some specific optimization like as Newton, Gauss iteration… algebraic strategies like as Jacobi-type or successive rotation 5

1. Introduction Joint diagonalization Orthogonality of unloading matrix Orthogonal JD (OJD): Pre-whitening Non-orthogonal JD(NOJD): No pre-whitening JD Several criteria applied to JD problems minimization of off-norm weighted least squares information theory Optimization strategies some specific optimization like as Newton, Gauss iteration… algebraic strategies like as Jacobi-type or successive rotation 5

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ 6

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ 6

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ 6

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ 6

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ 6

2. The Successive Rotation Cost function: while k<Niter && err>Tol end for i = 1:N-1 Sweep for j = i +1:N k = k + 1 Niter: Maximal sweep number Tol: Stopping threshold Obtain , Rotation to minimize ρ Problem: The time consumed in the above process is in quadratic relationship with the dimensionality of target matrices N Solution: So we consider the parallelization of these successive rotations to address this problem. 6

2. The Successive Rotation In general, the elementary rotation matrix: only impacts the ith and jth row and column of JADE: Joint Approximate Diagonalization of Eigenmatrices by J.-F. Cardoso,1993 CJDi: Complex Joint Diagonalization by A. Mesloub, 2014 7

2. The Successive Rotation In general, the elementary rotation matrix: only impacts the ith and jth row and column of JADE: Joint Approximate Diagonalization of Eigenmatrices by J.-F. Cardoso,1993 CJDi: Complex Joint Diagonalization by A. Mesloub, 2014 7

2. The Successive Rotation In general, the elementary rotation matrix: only impacts the ith and jth row and column of JADE: Joint Approximate Diagonalization of Eigenmatrices by J.-F. Cardoso,1993 CJDi: Complex Joint Diagonalization by A. Mesloub, 2014 7

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation 8

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation 8

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation 8

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation 8

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation 8

2. The Successive Rotation But in LUCJD, the elementary rotation matrix: only impacts the ith row and column of LUCJD： LU decomposition for Complex Joint Diagonalization by K. Wang, LVA/ICA2012 To solve the complex non-orthogonal joint diagonalization problem via LU decomposition and successive rotation In this paper, we consider the parellelization of LUCJD 8

3. The proposed algorithm Parallelization of LUCJD The key to an efficient parallelization is the segmentation of entire index pairs into multiple subsets Then those optimal elementary rotations could be calculated at one shot Noting that the index pairs in one subset are non-conflicting We develop the following 3 parallelization schemes: Row-wise parallelization Column-wise parallelization Diagonal-wise parallelization 9

3. The proposed algorithm Parallelization of LUCJD The key to an efficient parallelization is the segmentation of entire index pairs into multiple subsets Then those optimal elementary rotations could be calculated at one shot Noting that the index pairs in one subset are non-conflicting We develop the following 3 parallelization schemes: Row-wise parallelization Column-wise parallelization Diagonal-wise parallelization 9

3. The proposed algorithm Parallelization of LUCJD The key to an efficient parallelization is the segmentation of entire index pairs into multiple subsets Then those optimal elementary rotations could be calculated at one shot Noting that the index pairs in one subset are non-conflicting We develop the following 3 parallelization schemes: Row-wise parallelization Column-wise parallelization Diagonal-wise parallelization 9

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Rotation (a) Column-wise (b) Row-wise (c) Diagonal-wise Fig.3 Subset segmentation for parallelization of LUCJD 10

3. The proposed algorithm Column/diagonal-wise i i i i 12

3. The proposed algorithm Column/diagonal-wise i i i i 12

3. The proposed algorithm Row-wise i i i i 12

3. The proposed algorithm Row-wise i i i i The subset elements of row-wise scheme are not non-conflicting comparing with other two schemes which might result in performance loss, this will be shown later. 12

3. The proposed algorithm Table 1 summarization of row-wise of LUCJD while k<Niter && err>Tol end Niter: Maximal sweep number Tol: Stopping threshold for i = 1:N-1 Sweep end k = k + 1 Obtain , Rotation to minimize ρ 13

4. Simulation results The target matrices are generated as: Signal-to-noise ratio(SNR): Performance index(PI): to evaluate the JD quality We also consider the TPO parallelized strategy (Tournament Player’s Ordering, by A. Holobar, EUROCON2003) Simulation 1-Convergence Pattern Simulation 2-Execution Time Simulation 3-Joint Diagonalization Quality 14

4. Simulation results The target matrices are generated as: Signal-to-noise ratio(SNR): Performance index(PI): to evaluate the JD quality We also consider the TPO parallelized strategy (Tournament Player’s Ordering, by A. Holobar, EUROCON2003) Simulation 1-Convergence Pattern Simulation 2-Execution Time Simulation 3-Joint Diagonalization Quality 14

4. Simulation results The target matrices are generated as: Signal-to-noise ratio(SNR): Performance index(PI): to evaluate the JD quality We also consider the TPO parallelized strategy (Tournament Player’s Ordering, by A. Holobar, EUROCON2003) Simulation 1-Convergence Pattern Simulation 2-Execution Time Simulation 3-Joint Diagonalization Quality 14

4. Simulation results The target matrices are generated as: Signal-to-noise ratio(SNR): Performance index(PI): to evaluate the JD quality We also consider the TPO parallelized strategy (Tournament Player’s Ordering, by A. Holobar, EUROCON2003) Simulation 1-Convergence Pattern Simulation 2-Execution Time Simulation 3-Joint Diagonalization Quality 14

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Signal to Noise Ratio: SNR = 20dB Fig.4 PI versus number of iterations 15

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Signal to Noise Ratio: SNR = 20dB Fig.4 PI versus number of iterations 15

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Signal to Noise Ratio: SNR = 20dB Fig.4 PI versus number of iterations 15

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Signal to Noise Ratio: SNR = 20dB Fig.4 PI versus number of iterations 15

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Signal to Noise Ratio: SNR = 20dB Similar iteration number Little impact on the convergence Fig.4 PI versus number of iterations 15

4. Simulation Results Simulation 1-Convergence Pattern Matrix numbers: K = 10 Matrix dimensionality: N = 10 5 independent runs Noise free Fig.5 PI versus number of iterations 16

4. Simulation Results Simulation 2-Execution Time Matrix numbers: K = 20 Signal to Noise Ratio: SNR = 20dB 100 Monte Carlo runs Largely reduce execution time More significant as N increases Row>Column>TPO>Diagonal >sequential Fig.6 Average running time versus dimensionality 17

4. Simulation Results Simulation 3-Joint Diagonalization Quality Matrix numbers: K = 20 Matrix dimensionality: N = 30 100 Monte Carlo runs equal performance for low SNR Row-wise’s performance loss Parallelization schemes perform well Fig.7 Performance index versus SNR 18

5. Conclusion Considers the parallelization of JD problems - Joint diagonalization has been widely applied. - This paper addresses the parallelization of JD with successive rotation. - Introduces 3 parallelized schemes. Behavior of the parallelized schemes - Largely reduce the running time without losing the JD quality - Row-wise scheme has slight performance loss 19

5. Conclusion Considers the parallelization of JD problems - Joint diagonalization has been widely applied. - This paper addresses the parallelization of JD with successive rotation. - Introduces 3 parallelized schemes. Behavior of the parallelized schemes - Largely reduce the running time without losing the JD quality - Row-wise scheme has slight performance loss 19

5. Conclusion Considers the parallelization of JD problems - Joint diagonalization has been widely applied. - This paper addresses the parallelization of JD with successive rotation. - Introduces 3 parallelized schemes. Behavior of the parallelized schemes - Largely reduce the running time without losing the JD quality - Row-wise scheme has slight performance loss 19

谢谢！ Thank you！ 20