Introduction to Parallel Computing Intel Math Kernel Library Huan-Ting Yen, Department of Mathematics, National Taiwan University 2011/07/22.


Parallel Computing

What is parallel computing?
- Traditionally, software has been written for serial computation.
- In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.

Resource
The compute resource can be:
- A single computer with multiple processors;
- An arbitrary number of computers connected by a network;
- A combination of both.
[Figures: Core 1 to Core 4 each running one of thread 1 to thread 4; core 1 to core 4 running several threads]

Why use parallel computing?
- The primary reasons for using parallel computing:
    - Save time (wall-clock time)
    - Solve larger problems
    - Provide concurrency (do many things at the same time)
- Other reasons might include:
    - Taking advantage of non-local resources
    - Cost savings
    - Overcoming memory constraints

Amdahl’s Law
- The speedup of a parallel program is limited by the amount of serial work.

Flynn’s Taxonomy
Classification for parallel computers and programs:

                 Single Instruction             Multiple Instruction
Single Data      SISD (single core CPU)         MISD (very rare)
Multiple Data    SIMD (GPU/vector processor)    MIMD (multiple core CPU)

[Figures: SISD, SIMD, MISD, and MIMD diagrams]

Intel Math Kernel Library

Overview
- The Intel® Math Kernel Library (Intel® MKL) provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices, including sparse matrices. The library also includes fast Fourier transform (FFT) functions, as well as vector mathematical and vector statistical functions with Fortran and C interfaces.
- The versions of Intel MKL intended for Windows* and Linux* operating systems also include ScaLAPACK software and Cluster FFT software for solving the respective computational problems on distributed-memory parallel computers.

Intel MKL: Intel Math Kernel Library
Functionality:
- BLAS and Sparse BLAS Routines
- LAPACK Routines: Linear Equations
- LAPACK Routines: Eigenvalue Problems
- ScaLAPACK
- Sparse Solver Routines
- Fast Fourier Transforms
- Cluster Fast Fourier Transforms

System Requirements (Hardware)
- Hardware:
    - Intel® Core™ processor family
    - Intel® Xeon® processor family
    - Intel® Pentium® 4 processor family
    - Intel® Pentium® III processor
    - Intel® Pentium® processor (300 MHz or faster)
    - Intel® Celeron® processor
    - AMD Athlon* and Opteron* processors
- How do you find information about the CPUs?
    $ cat /proc/cpuinfo

System Requirements (Software)
- Supported operating systems:
    - Red Hat* Enterprise Linux* 3, 4, 5
    - Red Hat* Fedora* 9
    - Debian* GNU/Linux 4.0
    - Ubuntu* 8.04
- How do you find information about the operating system?
    $ cat /etc/*release
- Supported C/C++ and Fortran compilers:
    - Intel® Fortran Compiler 10.1 for Linux*
    - Intel® C++ Compiler 10.1 for Linux*
    - GNU Compiler Collection (gcc, g77, gfortran)

Installing Intel MKL on a Linux* System
- Tools & Downloads (google “intel software”)
[Screenshots]
- Download and unpack the package:
    $ wget "URL"
    $ ll
    $ tar -zxvf l_mkl_p_10.2.x.yyy.tar.gz
- Run the installer:
    $ cd l_mkl_p_10.2.x.yyy
    $ ./install.sh
[Screenshots]

Some Examples

Example
Brief examples of:
- BLAS Level 1 Routines (vector-vector operations)
- BLAS Level 2 Routines (matrix-vector operations)
- BLAS Level 3 Routines (matrix-matrix operations)
- Compute the LU factorization of a matrix (LAPACK)
- Solve linear system (LAPACK)
- Solve eigen system (LAPACK)
- Fast Fourier Transforms

Ex1. The complex dot product (res = sum of conj(x_i) * y_i)

    #include <stdio.h>
    #include "mkl_blas.h"

    #define N 5

    int main()
    {
        /* MKL_Complex16 (fields real, imag) is the double-complex type
           used by the zdotc prototype in mkl_blas.h. */
        MKL_Complex16 x[N], y[N], res;
        int n = N, incx = 1, incy = 1, i;

        for( i = 0; i < n; i++ ){
            x[i].real = (double)i;
            x[i].imag = (double)i * 2.0;
            y[i].real = (double)(n - i);
            y[i].imag = (double)i * 2.0;
        }

        /* res = conj(x[0])*y[0] + ... + conj(x[n-1])*y[n-1] */
        zdotc( &res, &n, x, &incx, y, &incy );
        printf( "The complex dot product is: ( %6.2f, %6.2f )\n", res.real, res.imag );
        return 0;
    }

?dotc
Computes a dot product of a conjugated vector with another vector.
- Description: The routine is declared in
    Fortran77: mkl_blas.fi
    Fortran95: blas.f90
    C: mkl_blas.h
- Input Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
    n: The length of the two vectors.
    incx: Specifies the increment for the elements of x.
    incy: Specifies the increment for the elements of y.
- Output Parameters ( zdotc(&res,&n,x,&incx,y,&incy) )
    res: The final result.

Makefile (Sequential)

Test : blas_c

CC          = icc
MKL_HOME    = /home/opt/intel/mkl/
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH    = $(MKL_HOME)/lib/em64t
EXE         = blas_c.exe

blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Makefile (Parallel)

Test = blas_c

CC          = icc
MKL_HOME    = /home/opt/intel/mkl/
MKL_INCLUDE = $(MKL_HOME)/include
MKL_PATH    = $(MKL_HOME)/lib/em64t
EXE         = blas_c.exe

blas_c:
	$(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) -Wl,--start-group -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -Wl,--end-group -liomp5 -lpthread


BLAS Routines
Routine Naming Conventions
- BLAS routine names have the following structure: <character> <name> <mod> ( )
- The <character> field indicates the data type:
    s    real, single precision
    c    complex, single precision
    d    real, double precision
    z    complex, double precision
- The <mod> field, used in BLAS level 1, gives additional details of the operation:
    c    conjugated vector
    u    unconjugated vector
    g    Givens rotation
- In BLAS level 2 and 3, the <name> field indicates the matrix type:
    ge   general matrix
    gb   general band matrix
    sy   symmetric matrix
    sb   symmetric band matrix
    he   Hermitian matrix
    hb   Hermitian band matrix
    tr   triangular matrix
    tb   triangular band matrix

BLAS Level 1 Routines

Routine   Data Types           Description
?asum     s, d, sc, dz         Sum of vector magnitudes
?axpy     s, d, c, z           Scalar-vector product
?copy     s, d, c, z           Copy vector
?dot      s, d                 Dot product
?dotc     c, z                 Dot product conjugated
?nrm2     s, d, sc, dz         Vector 2-norm (Euclidean norm)
?rotg     s, d, c, z           Givens rotation of points
?rot      s, d, cs, zd         Plane rotation of points
?scal     s, d, c, z, cs, zd   Vector-scalar product
?swap     s, d, c, z           Vector-vector swap
i?amax    s, d, c, z           Index of the maximum absolute value element of a vector


Ex2. Matrix-vector product

    #include <stdlib.h>
    #include "mkl_blas.h"

    int main()
    {
        int m, n, incx, incy, lda, idxi, idxj;
        double alpha, beta, *x, *y, *A;
        char trans;

        m = 3; n = 3;
        incx = 1; incy = 1;
        lda = m;
        alpha = 1.0; beta = 1.0;
        trans = 'n';

        x = (double*)malloc(sizeof(double)*n);
        y = (double*)malloc(sizeof(double)*m);   /* y has m entries when trans = 'n' */
        A = (double*)malloc(sizeof(double)*m*n);

        for( idxi = 0; idxi < n; idxi++ ) *(x+idxi) = 1.0;
        for( idxi = 0; idxi < m; idxi++ ) *(y+idxi) = 1.0;

        /* BLAS expects column-major storage: element (i,j) is at A[i + j*lda]. */
        for( idxj = 0; idxj < n; idxj++ )
            for( idxi = 0; idxi < m; idxi++ )
                *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;

        /* y := alpha*A*x + beta*y */
        dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy);

        free(x); free(y); free(A);
        return 0;
    }

?gemv
Computes a matrix-vector product using a general matrix.
- Description: The routine is declared in
    Fortran77: mkl_blas.fi
    Fortran95: blas.f90
    C: mkl_blas.h
- Input Parameters ( dgemv(&trans,&m,&n,&alpha,A,&lda,x,&incx,&beta,y,&incy) )
    trans: if trans = 'N' or 'n', then y := alpha*A*x + beta*y
           if trans = 'T' or 't', then y := alpha*A'*x + beta*y
           if trans = 'C' or 'c', then y := alpha*conjg(A')*x + beta*y
    m: The number of rows of the matrix A.
    n: The number of columns of the matrix A.
    lda: The first dimension of A, lda ≥ max(1,m).
    incx: Specifies the increment for the elements of x.
    incy: Specifies the increment for the elements of y.
- Output Parameters
    y: Updated vector y.

Ex2. Result
[Figure: program output]

BLAS Level 2 Routines

Routine   Data Types   Description
?gemv     s, d, c, z   Matrix-vector product using a general matrix
?gbmv     s, d, c, z   Matrix-vector product using a general band matrix
?symv     s, d         Matrix-vector product using a symmetric matrix
?sbmv     s, d         Matrix-vector product using a symmetric band matrix
?hemv     c, z         Matrix-vector product using a Hermitian matrix
?hbmv     c, z         Matrix-vector product using a Hermitian band matrix
?trmv     s, d, c, z   Matrix-vector product using a triangular matrix
?tbmv     s, d, c, z   Matrix-vector product using a triangular band matrix


Ex3. Matrix-matrix product

    #include <stdlib.h>
    #include "mkl_blas.h"

    int main()
    {
        int m, n, k, lda, ldb, ldc, idxi, idxj;
        double alpha, beta, *A, *B, *C;
        char transa, transb;

        m = 3; n = 3; k = 3;
        lda = m; ldb = k; ldc = m;
        alpha = 1.0; beta = 1.0;
        transa = 'n'; transb = 'n';

        A = (double*)malloc(sizeof(double)*m*k);   /* A is m-by-k */
        B = (double*)malloc(sizeof(double)*k*n);   /* B is k-by-n */
        C = (double*)malloc(sizeof(double)*m*n);   /* C is m-by-n */

        /* m = n = k here, so one loop fills all three (column-major). */
        for( idxj = 0; idxj < n; idxj++ )
            for( idxi = 0; idxi < m; idxi++ ) {
                *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;
                *(B+idxi+idxj*ldb) = (double)(idxi+1) + idxj;
                *(C+idxi+idxj*ldc) = (double)(idxi+1) + idxj;
            }

        /* C := alpha*A*B + beta*C */
        dgemm(&transa, &transb, &m, &n, &k, &alpha, A, &lda, B, &ldb, &beta, C, &ldc);

        free(A); free(B); free(C);
        return 0;
    }

?gemm
- Input Parameters
    k: The number of columns of the matrix op(A) and the number of rows of the matrix op(B).
    lda: When transa = 'N' or 'n', lda ≥ max(1,m); otherwise lda ≥ max(1,k).
    ldb: When transb = 'N' or 'n', ldb ≥ max(1,k); otherwise ldb ≥ max(1,n).
    ldc: The first dimension of C, ldc ≥ max(1,m).
- Output Parameters
    C: Overwritten by the m-by-n matrix alpha*op(A)*op(B) + beta*C.

Ex3. Result
[Figure: program output]

BLAS Level 3 Routines

Routine   Data Types   Description
?gemm     s, d, c, z   Matrix-matrix product of general matrices
?hemm     c, z         Matrix-matrix product of Hermitian matrices
?symm     s, d, c, z   Matrix-matrix product of symmetric matrices
?trmm     s, d, c, z   Matrix-matrix product of triangular matrices


Ex4. LU Factorization

    #include <stdio.h>
    #include <stdlib.h>
    #include "mkl_lapack.h"

    int main()
    {
        int m, n, lda, info, *ipiv;
        double *A;

        m = 3; n = 3; lda = m;
        ipiv = (int*)malloc(sizeof(int)*m);
        A = (double*)malloc(sizeof(double)*m*n);

        /* Column-major: A = [ 1 -2 4 ; 2 3 8 ; 6 5 1 ] */
        *(A+0)=1;  *(A+1)=2; *(A+2)=6;
        *(A+3)=-2; *(A+4)=3; *(A+5)=5;
        *(A+6)=4;  *(A+7)=8; *(A+8)=1;

        dgetrf(&m, &n, A, &lda, ipiv, &info);
        if( info != 0 ) printf("dgetrf returned info = %d\n", info);

        free(ipiv); free(A);
        return 0;
    }

?getrf
Computes the LU factorization of a general m-by-n matrix.
- Description: The routine is declared in
    Fortran77: mkl_lapack.fi
    Fortran95: lapack.f90
    C: mkl_lapack.h
- Input Parameters
    m: The number of rows of the matrix.
    n: The number of columns of the matrix.
    lda: The first dimension of the matrix.
    A: Array, REAL for sgetrf, DOUBLE PRECISION for dgetrf, COMPLEX for cgetrf, DOUBLE COMPLEX for zgetrf.
- Output Parameters
    A: Overwritten by L and U. The unit diagonal elements of L are not stored.
    ipiv: An integer array, dimension at least max(1,min(m,n)). The pivot indices; row i is interchanged with row ipiv(i).
    info: Integer. If info=0, the execution is successful. If info=-i, the i-th parameter had an illegal value. If info=i, the factorization has been completed, but U is exactly singular (u(i,i) is zero).

Ex4. Result
[Figures: program output]

LAPACK Computational Routines

                                                     general   symmetric    symmetric            triangular
                                                     matrix    indefinite   positive-definite    matrix
Factorize matrix                                     ?getrf    ?sytrf       ?potrf
Solve linear system with a factored matrix           ?getrs    ?sytrs       ?potrs               ?trtrs
Condition number                                     ?gecon    ?sycon       ?pocon               ?trcon
Compute the inverse matrix using the factorization   ?getri    ?sytri       ?potri               ?trtri

LAPACK Routines: Linear Equations
To solve a particular problem, you can call two or more computational routines, or call a corresponding driver routine that combines several tasks in one call. For example, to solve a system of linear equations with a general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Alternatively, use the driver routine ?gesv, which performs all these tasks in one call.


Ex5. Solve the Linear Equation

    #include <stdio.h>
    #include <stdlib.h>
    #include "mkl_lapack.h"

    int main()
    {
        int n, nrhs, lda, ldb, info, idxi, idxj, *ipiv;
        double *A, *b;

        n = 3; nrhs = 1; lda = n; ldb = n;
        ipiv = (int*)malloc(sizeof(int)*n);
        A = (double*)malloc(sizeof(double)*n*n);
        b = (double*)malloc(sizeof(double)*n);

        for( idxi = 0; idxi < n; idxi++ )
            for( idxj = 0; idxj < n; idxj++ )
                *(A+idxi*n+idxj) = (double)(idxi+1) + idxj;

        *(b+0) = 6; *(b+1) = 9; *(b+2) = 12;

        dgesv(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);

        /* This A is singular (row 1 - 2*row 2 + row 3 = 0), so check info
           before trusting the solution stored in b. */
        if( info != 0 ) printf("dgesv returned info = %d\n", info);

        free(ipiv); free(A); free(b);
        return 0;
    }

?gesv
- Input Parameters
    nrhs: The number of right-hand sides, that is, the number of columns of the matrix b.
- Output Parameters
    A: Overwritten by the factors L and U from the factorization A = P*L*U.
    b: Overwritten by the solution matrix.

Ex5. Result
[Figure: program output]


Ex6. Solve the Eigen Equation

    #include <stdio.h>
    #include <stdlib.h>
    #include "mkl_lapack.h"

    int main()
    {
        int n, lda, lwork, ldvl, ldvr, info;
        double *wr, *wi, *A, *work, *vl, *vr;
        char jobvl, jobvr;

        n = 3; lda = n; ldvl = 1; ldvr = n;
        lwork = 4*n;   // not 3*n
        jobvl = 'N'; jobvr = 'V';

        A    = (double*)malloc(sizeof(double)*n*n);
        wr   = (double*)malloc(sizeof(double)*n);
        wi   = (double*)malloc(sizeof(double)*n);
        vl   = (double*)malloc(sizeof(double)*ldvl*n);
        vr   = (double*)malloc(sizeof(double)*ldvr*n);
        work = (double*)malloc(sizeof(double)*lwork);

        *(A+0) = 2;  *(A+1) = -1; *(A+2) = 0;
        *(A+3) = -1; *(A+4) = 2;  *(A+5) = -1;
        *(A+6) = 0;  *(A+7) = -1; *(A+8) = 2;

        /* wr and wi are already pointers, so they are passed as is. */
        dgeev(&jobvl, &jobvr, &n, A, &lda, wr, wi, vl, &ldvl, vr, &ldvr, work, &lwork, &info);
        if( info != 0 ) printf("dgeev returned info = %d\n", info);

        free(A); free(wr); free(wi); free(vl); free(vr); free(work);
        return 0;
    }

?geev
- Input Parameters
    jobvl: If jobvl='N', the left eigenvectors of A are not computed. If jobvl='V', the left eigenvectors of A are computed.
    jobvr: If jobvr='N', the right eigenvectors of A are not computed. If jobvr='V', the right eigenvectors of A are computed.
    work: A workspace array, its dimension max(1, lwork).
    lwork: The dimension of the array work. lwork ≥ max(1,3n); if eigenvectors are requested, lwork ≥ max(1,4n) (for real flavors).
    ldvl, ldvr: The leading dimensions of the output arrays vl and vr, respectively.
- Output Parameters
    wr, wi: Contain the real and imaginary parts, respectively, of the computed eigenvalues.
    vl, vr: If jobvl = 'V', the left eigenvectors u(j) are stored one after another in the columns of vl, in the same order as their eigenvalues. If jobvl = 'N', vl is not referenced. If the j-th eigenvalue is real, then u(j) = vl(:,j), the j-th column of vl. The right eigenvectors are stored in vr in the same way when jobvr = 'V'.
    info: If info=0, the execution is successful. If info=-i, the i-th parameter had an illegal value. If info=i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been computed.

Ex6. Result
[Figure: program output]

LAPACK Computational Routines
- Orthogonal Factorizations (QR, QZ)
- Singular Value Decomposition
- Symmetric Eigenvalue Problems
- Generalized Symmetric-Definite Eigenvalue Problems
- Nonsymmetric Eigenvalue Problems
- Generalized Nonsymmetric Eigenvalue Problems
- Generalized Singular Value Decomposition

LAPACK Driver Routines
- Linear Least Squares (LLS) Problems
- Generalized LLS Problems
- Symmetric Eigenproblems
- Nonsymmetric Eigenproblems
- Singular Value Decomposition
- Generalized Symmetric Definite Eigenproblems
- Generalized Nonsymmetric Eigenproblems


Five Stage Usage Model for Computing FFT
1. Allocate a fresh descriptor for the problem with a call to the DftiCreateDescriptor function (precision, rank, sizes, scaling factor, ...).
2. Optionally adjust the descriptor configuration with a call to the DftiSetValue function.
3. Commit the descriptor with a call to the DftiCommitDescriptor function.
4. Compute the transform with a call to the DftiComputeForward / DftiComputeBackward function.
5. Deallocate the descriptor with a call to the DftiFreeDescriptor function.

Ex7. Three-Dimensional Complex FFT

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "mkl_dfti.h"

    #define m 1000
    #define n 1000
    #define k 1000

    typedef struct { double re; double im; } mkl_complex;

    int main()
    {
        int idxi, idxj, idxk;
        double backward_scale;
        MKL_LONG status, length[3];
        mkl_complex *vec_src, *vec_tmp, *vec_dst;
        DFTI_DESCRIPTOR_HANDLE handle = 0;
        size_t total = (size_t)m*n*k;

        /* Each array holds m*n*k complex doubles (about 16 GB at 1000^3);
           reduce m, n, k if memory is limited. */
        vec_src = (mkl_complex*)malloc(sizeof(mkl_complex)*total);
        vec_tmp = (mkl_complex*)malloc(sizeof(mkl_complex)*total);
        vec_dst = (mkl_complex*)malloc(sizeof(mkl_complex)*total);
        if( !vec_src || !vec_tmp || !vec_dst ) { printf("Out of memory\n"); return 1; }

        length[0] = m; length[1] = n; length[2] = k;
        memset(vec_src, 0, sizeof(mkl_complex)*total);
        memset(vec_tmp, 0, sizeof(mkl_complex)*total);
        memset(vec_dst, 0, sizeof(mkl_complex)*total);

        for(idxi=0; idxi<m; idxi++)
            for(idxj=0; idxj<n; idxj++)
                for(idxk=0; idxk<k; idxk++) {
                    /* offset of element (idxi, idxj, idxk) for lengths {m, n, k} */
                    mkl_complex *p = vec_src + ((size_t)idxi*n + idxj)*k + idxk;
                    p->re = 1.0;
                    p->im = 0.0;
                }

        status = DftiCreateDescriptor( &handle, DFTI_DOUBLE, DFTI_COMPLEX, 3, length );
        if(status && !DftiErrorClass(status, DFTI_NO_ERROR)) {
            printf("Error : %s\n", DftiErrorMessage(status));
            printf("TEST FAILED : DftiCreateDescriptor(&handle,...)\n");
            return 1;
        }
        status = DftiSetValue( handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE );
        status = DftiCommitDescriptor( handle );
        status = DftiComputeForward( handle, vec_src, vec_tmp );

        backward_scale = 1.0/((double)m*n*k);
        status = DftiSetValue( handle, DFTI_BACKWARD_SCALE, backward_scale );
        status = DftiCommitDescriptor( handle );
        status = DftiComputeBackward( handle, vec_tmp, vec_dst );
        status = DftiFreeDescriptor( &handle );

        free(vec_src); free(vec_tmp); free(vec_dst);
        return 0;
    }

FFT Functions

Function Name          Operation
DftiCreateDescriptor   Allocates memory for the descriptor data structure and preliminarily initializes it.
DftiCommitDescriptor   Performs all initialization for the actual FFT computation.
DftiCopyDescriptor     Copies an existing descriptor.
DftiFreeDescriptor     Frees memory allocated for a descriptor.
DftiComputeForward     Computes the forward FFT.
DftiComputeBackward    Computes the backward FFT.
DftiSetValue           Sets one particular configuration parameter with the specified configuration value.
DftiGetValue           Gets the value of one particular configuration parameter.

Reference
- Web site from LLNL tutorials
- Intel® Math Kernel Library Reference Manual (mklman.pdf)
- Intel® Math Kernel Library for the Linux OS User’s Guide (userguide.pdf)
