Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College
Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda
- Introduction
  - Purpose of Library
  - Intel® Math Kernel Library (Intel® MKL) Contents
- Performance Features
  - Resource Limited Optimization
  - Threading
- Using the Library
- The Library Sections
  - BLAS
  - LAPACK*
  - DFTs
  - VML
  - VSL

Math Kernel Library Purpose
Addresses:
- Solvers (BLAS, LAPACK)
- Eigenvector/eigenvalue solvers (BLAS, LAPACK)
- Some quantum chemistry needs (dgemm)
- PDEs, signal processing, seismic, solid-state physics (FFTs)
- General scientific and financial computing: vector transcendental functions (VML) and vector random number generators (VSL)

Math Kernel Library Purpose: Don'ts
But don't use Intel® Math Kernel Library (Intel® MKL) on:
- "Small" counts.
- Vector math functions with small n.
- Example: a geometric transformation, [X' Y' Z' W'] = (4x4 transformation matrix) x [X Y Z W].
For such cases you could use Intel® Performance Primitives instead.

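To make the "small count" case concrete, here is a minimal sketch of the 4x4 geometric transformation written directly in C. The function name `transform4` and the flat row-major layout are assumptions made for illustration; nothing here comes from Intel MKL or Intel Performance Primitives.

```c
#include <assert.h>
#include <stddef.h>

/* Apply a 4x4 transformation matrix (row-major, 16 doubles) to a
   homogeneous point [X Y Z W]. Hypothetical helper, not a library call.
   out must not alias v. */
void transform4(const double m[16], const double v[4], double out[4])
{
    assert(out != v);
    for (size_t r = 0; r < 4; r++) {
        out[r] = m[4 * r + 0] * v[0] + m[4 * r + 1] * v[1]
               + m[4 * r + 2] * v[2] + m[4 * r + 3] * v[3];
    }
}
```

With only 16 multiply-adds per point, the work is comparable to the cost of a library call itself, which is one plausible reading of why the slide advises against using Intel MKL here.
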
Math Kernel Library Contents
BLAS (Basic Linear Algebra Subprograms)
- Level 1 BLAS: vector-vector operations (15 function types, 48 functions)
- Level 2 BLAS: matrix-vector operations (26 function types, 66 functions)
- Level 3 BLAS: matrix-matrix operations (9 function types, 30 functions)
- Extended BLAS: level 1 BLAS for sparse vectors (8 function types, 24 functions)

Math Kernel Library Contents
- LAPACK (linear algebra package)
  - Solvers and eigensolvers. Many hundreds of routines total!
  - There are more than 1000 total user-callable and support routines.
- DFTs (discrete Fourier transforms)
  - Mixed radix, multi-dimensional transforms
  - Multithreaded
- VML (Vector Math Library)
  - Set of vectorized transcendental functions
  - Most of the libm functions, but faster
- VSL (Vector Statistical Library)
  - Set of vectorized random number generators

Math Kernel Library Contents
- BLAS and LAPACK* are both Fortran: a legacy of high performance computation.
- VSL and VML have Fortran and C interfaces.
- DFTs have Fortran 95 and C interfaces.
- CBLAS interface: a more convenient way for a C/C++ programmer to call BLAS.

Math Kernel Library (Intel® MKL) Environment
- Supports 32-bit and 64-bit Intel® processors
- Large set of examples and tests
- Extensive documentation

             Windows*                 Linux*
Compilers    Intel, CVF, Microsoft    Intel, GNU
Libraries    .dll, .lib               .a, .so

Resource Limited Optimization
The goal of all optimization is maximum speed.
Resource limited optimization: exhaust one or more system resources.
- CPU: Register use, FP units.
- Cache: Keep data in cache as long as possible; deal with cache interleaving.
- TLBs: Maximally use data on each page.
- Memory bandwidth: Minimally access memory.
- Computer: Use all the processors available using threading.
- System: Use all the nodes available (cluster software).

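The cache item above can be illustrated with a blocked (tiled) matrix multiply. This is a minimal sketch under stated assumptions: the name `matmul_blocked`, the row-major layout, and the tile edge `BLOCK = 32` are all chosen for illustration and do not describe how Intel MKL is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 32  /* assumed tile edge; would be tuned to the cache size */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* c (n x m) += a (n x kk) * b (kk x m), all row-major.
   The three outer loops walk over tiles so that each tile of a, b and c
   is reused many times while it is still resident in cache. */
void matmul_blocked(size_t n, size_t m, size_t kk,
                    const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i0 = 0; i0 < n; i0 += BLOCK) {
        size_t i1 = min_sz(i0 + BLOCK, n);
        for (size_t k0 = 0; k0 < kk; k0 += BLOCK) {
            size_t k1 = min_sz(k0 + BLOCK, kk);
            for (size_t j0 = 0; j0 < m; j0 += BLOCK) {
                size_t j1 = min_sz(j0 + BLOCK, m);
                for (size_t i = i0; i < i1; i++) {
                    for (size_t k = k0; k < k1; k++) {
                        double aik = a[i * kk + k];
                        for (size_t j = j0; j < j1; j++)
                            c[i * m + j] += aik * b[k * m + j];
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `c` first if a plain product is wanted, since the routine accumulates into it.
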
Threading
Most of Intel® Math Kernel Library (Intel® MKL) could be threaded, but:
- The limited resource is memory bandwidth.
- Threading level 1 and level 2 BLAS is mostly ineffective (O(n)).
There are numerous opportunities for threading:
- Level 3 BLAS (O(n^3))
- LAPACK* (O(n^3))
- FFTs (O(n log n))
- VML, VSL: depends on processor and function
All threading is via OpenMP*.
All of Intel MKL is designed and compiled for thread safety.

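One way to see why memory bandwidth limits level 1 and level 2 BLAS but not level 3 is to compare floating-point operations against the number of elements touched. The sketch below uses idealized textbook counts for square problems of order n (ddot, dgemv, dgemm); the function name `flops_per_element` and the counts are illustrative assumptions, not measurements of Intel MKL.

```c
#include <assert.h>

/* Idealized flops per element touched for a square problem of order n.
   Level 1 (ddot):  2n flops   over 2n elements
   Level 2 (dgemv): 2n^2 flops over n^2 + 2n elements
   Level 3 (dgemm): 2n^3 flops over 3n^2 elements
   Hypothetical helper for illustration only. */
double flops_per_element(int blas_level, double n)
{
    assert(blas_level >= 1 && blas_level <= 3 && n > 0.0);
    switch (blas_level) {
    case 1:
        return (2.0 * n) / (2.0 * n);
    case 2:
        return (2.0 * n * n) / (n * n + 2.0 * n);
    default:
        return (2.0 * n * n * n) / (3.0 * n * n);
    }
}
```

Under this model the ratio stays at or below 2 for levels 1 and 2 regardless of n, while for level 3 it grows as 2n/3, so extra threads have real arithmetic to share instead of waiting on memory.
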
Linking with Intel® Math Kernel Library (Intel® MKL)
- Scenario 1: ifort, BLAS, IA-32 processor:
    ifort myprog.f mkl_c.lib
- Scenario 2: CVF, LAPACK, IA-32 processor:
    f77 myprog.f mkl_s.lib
- Scenario 3: C program linked against the import library, with the DLL loaded at run time:
    link myprog.obj mkl_c_dll.lib
Note: The optimal binary code for the processor is selected at run time.

Matrix Multiplication: Roll Your Own/Dot Product
Here c is n x m, a is n x kk, and b is kk x m.

Roll Your Own:
    for( i = 0; i < n; i++ ){
        for( j = 0; j < m; j++ ){
            for( k = 0; k < kk; k++ )
                c[i][j] += a[i][k] * b[k][j];
        }
    }

ddot:
    /* incx = 1 (along a row of a); incy = row length of b (down a column of b) */
    for( i = 0; i < n; i++ ){
        for( j = 0; j < m; j++ )
            c[i][j] = cblas_ddot( kk, &a[i][0], incx, &b[0][j], incy );
    }

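The role of the two stride arguments in the ddot variant can be shown without the library. The sketch below is a portable stand-in: `ddot_ref` mirrors the argument order of `cblas_ddot` but is a hypothetical helper written for illustration, as is `matmul_ddot`; flat row-major arrays are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Strided dot product: sum of x[k*incx] * y[k*incy] for k = 0..n-1.
   Illustrative stand-in, not the MKL routine. */
double ddot_ref(size_t n, const double *x, size_t incx,
                const double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += x[k * incx] * y[k * incy];
    return sum;
}

/* c (n x m) = a (n x kk) * b (kk x m), row-major.
   Row i of a is contiguous (stride 1); column j of b is reached
   with stride m, the row length of b. */
void matmul_ddot(size_t n, size_t m, size_t kk,
                 const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            c[i * m + j] = ddot_ref(kk, &a[i * kk], 1, &b[j], m);
}
```

Note that the dot-product length is the shared dimension kk, and that walking a column of `b` with a large stride is exactly the kind of access pattern that uses each cache line poorly.
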
Matrix Multiplication: DGEMV/DGEMM
Dimensions as on the previous slide: c is n x m, a is n x kk, b is kk x m.

dgemv (one matrix-vector product per column of c):
    for( j = 0; j < m; j++ )
        cblas_dgemv( CblasRowMajor, CblasNoTrans, n, kk, alpha, a, lda,
                     &b[0][j], ldb, beta, &c[0][j], ldc );

dgemm (row-major arrays passed to a column-major call, so the operands b and a are swapped):
    cblas_dgemm( CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, kk,
                 alpha, b, ldb, a, lda, beta, c, ldc );

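The dgemm call above passes b before a under a column-major flag. A small library-free sketch may help show why that yields the row-major product: a row-major buffer read as column-major is the transpose, and c^T = b^T * a^T. The names `gemm_colmajor` and `matmul_rowmajor` are hypothetical and the kernel is a naive reference, not Intel MKL's.

```c
#include <assert.h>
#include <stddef.h>

/* Naive column-major GEMM: C (M x N) = A (M x K) * B (K x N).
   Element (r, c) of a column-major matrix with leading dimension ld
   lives at p[r + c * ld]. */
void gemm_colmajor(size_t M, size_t N, size_t K,
                   const double *A, size_t lda,
                   const double *B, size_t ldb,
                   double *C, size_t ldc)
{
    assert(lda >= M && ldb >= K && ldc >= M);
    for (size_t c = 0; c < N; c++) {
        for (size_t r = 0; r < M; r++) {
            double sum = 0.0;
            for (size_t k = 0; k < K; k++)
                sum += A[r + k * lda] * B[k + c * ldb];
            C[r + c * ldc] = sum;
        }
    }
}

/* Row-major c (n x m) = a (n x kk) * b (kk x m), using only the
   column-major kernel: pass b first, then a, with sizes m, n, kk. */
void matmul_rowmajor(size_t n, size_t m, size_t kk,
                     const double *a, const double *b, double *c)
{
    gemm_colmajor(m, n, kk, b, m, a, kk, c, m);
}
```

The leading dimensions follow the same pattern as in the slide: each row-major array passes its row length.
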
Activity 1: DGEMM
- Compare the performance of matrix multiply as implemented by C source code, DDOT, DGEMV and DGEMM.
- Exercise control of the threading capabilities in MKL/BLAS.

Intel® Math Kernel Library Optimizations in LAPACK*
Most important LAPACK optimizations:
- Threading: effectively uses multiple CPUs
- Recursive factorization
  - Reduces scalar time (Amdahl's law: t = t_scalar + t_parallel / p)
  - Extends blocking further into the code
- No runtime library support required

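The Amdahl's law formula on this slide, t = t_scalar + t_parallel / p, is easy to turn into a small calculator that shows why reducing scalar time matters. The function names below are assumptions for illustration.

```c
#include <assert.h>

/* Run time on p processors: t = t_scalar + t_parallel / p. */
double amdahl_time(double t_scalar, double t_parallel, int p)
{
    assert(p >= 1);
    return t_scalar + t_parallel / (double)p;
}

/* Speedup relative to one processor. */
double amdahl_speedup(double t_scalar, double t_parallel, int p)
{
    return amdahl_time(t_scalar, t_parallel, 1)
         / amdahl_time(t_scalar, t_parallel, p);
}
```

For example, with t_scalar = 1 and t_parallel = 8 on 4 processors the time drops from 9 to 3. Halving the scalar part to 0.5 raises the speedup from 3 to 3.4, an improvement that adding processors alone cannot provide once the scalar term dominates.
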
Discrete Fourier Transforms
- One-dimensional, two-dimensional, three-dimensional, ...
- Multithreaded
- Mixed radix
- User-specified scaling, transform sign
- Transforms on embedded matrices
- Multiple one-dimensional transforms in a single call
- Strides
- C and F90 interfaces

Using the Intel® Math Kernel Library DFTs
Basically a 3-step process (MDH: MyDescriptorHandle):
1. Create a descriptor.
     Status = DftiCreateDescriptor(MDH, …)
2. Commit the descriptor (instantiates it).
     Status = DftiCommitDescriptor(MDH)
3. Perform the transform.
     Status = DftiComputeForward(MDH, X)
Optionally, free the descriptor.
     Status = DftiFreeDescriptor(MDH)

Vector Math Library (VML) Features/Issues
- Vector Math Library: vectorized transcendental functions, like libm but faster
- Interface: both Fortran and C interfaces
- Multiple accuracies
  - High accuracy (< 1 ulp)
  - Lower accuracy, faster (< 4 ulps)
- Special value handling: √(-a), sin(0), and so on
- Error handling: cannot duplicate libm here

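The accuracy figures above are stated in ulps (units in the last place). As a rough illustration of the unit, the sketch below counts how many representable doubles lie between two values. It assumes IEEE 754 binary64 and finite inputs; `ulp_distance` is a hypothetical helper, not part of VML, and a true error measurement would compare against a higher-precision reference rather than another double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer so that adjacent doubles differ by 1
   and the ordering of values is preserved (+0.0 and -0.0 both map to 0). */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (0 if equal). */
uint64_t ulp_distance(double a, double b)
{
    assert(a == a && b == b);  /* reject NaN */
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}
```

A result that is within 4 of the correctly rounded value by this measure is in the spirit of the "lower accuracy" mode, though the slide's bounds refer to error relative to the exact mathematical result.
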
VML: Why Does It Matter?
- It is important for financial codes (Monte Carlo simulations): exponentials, logarithms.
- Other scientific codes depend on transcendental functions.
- Error functions can be big time sinks in some codes.
- And so on.

Vector Statistical Library (VSL)
- Set of random number generators (RNGs)
- Numerous non-uniform distributions
- VML used extensively for transformations
- Parallel computation support (some functions)
- User can supply own BRNG or transformations
- Five basic RNGs (BRNGs): bits, integer, FP
  - MCG31, R250, MRG32, MCG59, WH

Non-Uniform RNGs
- Gaussian (two methods)
- Exponential
- Laplace
- Weibull
- Cauchy
- Rayleigh
- Lognormal
- Gumbel

Using VSL
Basically a 3-step process:
1. Create a stream pointer.
     VSLStreamStatePtr stream;
2. Create a stream.
     vslNewStream( &stream, VSL_BRNG_MCG31, seed );
3. Generate a set of random numbers.
     vsRngUniform( 0, stream, size, out, start, end );
Optionally, delete the stream.
     vslDeleteStream( &stream );

Activity: Calculating Pi Using a Monte Carlo Method
- Compare the performance of C source code (RAND function) and VSL.
- Exercise control of the threading capabilities in MKL/VSL.

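For reference, the plain C side of this comparison might look like the sketch below, which uses the standard `rand` function. The function name `estimate_pi` and the sampling scheme (points in the unit square, counted when inside the quarter circle) are assumptions about the activity, which the slide does not spell out.

```c
#include <assert.h>
#include <stdlib.h>

/* Estimate pi as 4 times the fraction of uniformly random points in the
   unit square that fall inside the quarter circle of radius 1. */
double estimate_pi(long samples)
{
    assert(samples > 0);
    long inside = 0;
    for (long i = 0; i < samples; i++) {
        /* Scale rand() into [0, 1). */
        double x = (double)rand() / ((double)RAND_MAX + 1.0);
        double y = (double)rand() / ((double)RAND_MAX + 1.0);
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

Seed the generator with `srand` before calling. The exact estimate depends on the platform's `rand` implementation; a VSL version would replace the two `rand` calls with a single bulk request for uniform numbers, which is where a vectorized generator has room to pay off.
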
Performance Libraries: Intel® MKL
What's Been Covered
- Intel® Math Kernel Library is a broad scientific/engineering math library.
- It is optimized for Intel® processors.
- It is threaded for effective use on SMP machines.