2 Upon completion of this module, you will be able to:
• Define the purpose of MKL
• Identify and discuss MKL contents
• Describe the MKL environment
• Discuss MKL and LAPACK
• Describe VML, its features and use

• Introduction
• The Library Sections
• Performance Features
• Using the Library

MKL addresses:
• Solvers (BLAS, LAPACK)
• Eigenvector/eigenvalue solvers (BLAS, LAPACK)
• Some quantum chemistry needs (dgemm)
• PDEs, signal processing, seismic, solid-state physics (FFTs)
• General scientific and financial computing (vector transcendental functions (VML) and vector random number generators (VSL))

[Figure: Software Construction; Geometric Transformation]
Don’t use Intel® Math Kernel Library (Intel® MKL) on …
• Don’t use Intel® MKL on “small” counts.
• Don’t call vector math functions on small n.
  ◦ But you could use Intel® Integrated Performance Primitives (Intel® IPP) instead.

6 BLAS (Basic Linear Algebra Subroutines)
• Level 1 BLAS – vector-vector operations
  ◦ 15 function types
  ◦ 48 functions
• Level 2 BLAS – matrix-vector operations
  ◦ 26 function types
  ◦ 66 functions
• Level 3 BLAS – matrix-matrix operations
  ◦ 9 function types
  ◦ 30 functions
• Extended BLAS – level 1 BLAS for sparse vectors
  ◦ 8 function types
  ◦ 24 functions

7
• LAPACK (Linear Algebra Package)
  ◦ Solvers and eigensolvers; many hundreds of routines in total
  ◦ More than 1000 user-callable and support routines in total
• Discrete Fourier Transforms (DFT)
  ◦ Mixed-radix, multi-dimensional transforms
  ◦ Multithreaded
• VML (Vector Math Library)
  ◦ Set of vectorized transcendental functions
  ◦ Most of the libm functions, but faster
• VSL (Vector Statistics Library)
  ◦ Set of vectorized random number generators

8
• BLAS and LAPACK* are both Fortran
  ◦ Legacy of high-performance computation
• VSL and VML have Fortran and C interfaces
• DFTs have Fortran 95 and C interfaces
• cblas interface: it is more convenient for a C/C++ programmer to call BLAS through cblas

9
• Support for 32-bit and 64-bit Intel processors
• Large set of examples and tests
• Extensive documentation

11/28/2015
10 The goal of all optimization is maximum speed.
Resource-limited optimization – exhaust one or more resources of the system:
• CPU: register use, FP units
• Cache: keep data in cache as long as possible; deal with cache interleaving
• TLBs: maximally use the data on each page
• Memory bandwidth: access memory as little as possible
• Computer: use all the available processors, via threading
• System: use all the available nodes (cluster software)

11 Most of Intel MKL could be threaded, but:
• The limiting resource is memory bandwidth
• Threading level 1 and level 2 BLAS is mostly ineffective: they perform O(1) operations per data element moved (O(n) work on O(n) data, O(n²) work on O(n²) data)
• There are numerous opportunities for threading:
  ◦ Level 3 BLAS (O(n³) work on O(n²) data)
  ◦ LAPACK* (O(n³))
  ◦ FFTs (O(n log n))
  ◦ VML, VSL? Depends on processor and function
• All threading is via OpenMP*
• All of Intel MKL is designed and compiled for thread safety

12
• Scenario 1: ifort, BLAS, IA-32 processor:
  ifort myprog.f mkl_c.lib
• Scenario 2: CVF, LAPACK, IA-32 processor:
  f77 myprog.f mkl_s.lib
• Scenario 3: statically link a C program with the DLL linked at run time:
  link myprog.obj mkl_c_dll.lib
Note: at run time, the binary code path optimized for the detected processor is the one that executes.

15 Most important LAPACK optimizations:
• Threading – effectively uses multiple CPUs
• Recursive factorization
  ◦ Reduces scalar time (Amdahl’s law: $t = t_{scalar} + t_{parallel}/p$ for p processors)
  ◦ Extends blocking further into the code
• No runtime library support required

16
• One-dimensional, two-dimensional, three-dimensional
• Multithreaded
• Mixed radix
• User-specified scaling and transform sign
• Transforms on embedded matrices
• Multiple one-dimensional transforms in a single call
• Strides
• C and F90 interfaces

17 Using the DFT functions is basically a three-step process:
• Create a descriptor:
  Status = DftiCreateDescriptor(MDH, …)
• Commit the descriptor (instantiates it):
  Status = DftiCommitDescriptor(MDH)
• Perform the transform:
  Status = DftiComputeForward(MDH, X)
• Optionally, free the descriptor:
  Status = DftiFreeDescriptor(MDH)

18 Vector Math Library (VML)
• Vectorized transcendental functions: like libm, but better (faster)
• Interface: both Fortran and C interfaces
• Multiple accuracies
  ◦ High accuracy (< 1 ulp)
  ◦ Lower accuracy, faster (< 4 ulps)
• Special-value handling: √(-a), sin(0), and so on
• Error handling: cannot exactly duplicate libm’s behavior here

19
• VML is important for financial codes (Monte Carlo simulations)
  ◦ Exponentials, logarithms
• Other scientific codes also depend on transcendental functions
• Error functions can be big time sinks in some codes

20 Vector Statistical Library (VSL)
• Set of random number generators (RNGs)
  ◦ Numerous non-uniform distributions
  ◦ VML used extensively for transformations
  ◦ Parallel computation support for some functions
  ◦ User can supply own BRNG or transformations
• Five basic RNGs (BRNGs), producing bits, integers, or floating-point numbers:
  ◦ MCG31, R250, MRG32k3a, MCG59, WH

21 Non-uniform RNGs
• Gaussian (two methods)
• Exponential
• Laplace
• Weibull
• Cauchy
• Rayleigh
• Lognormal
• Gumbel

22 Using VSL is basically a three-step process:
• Create a stream pointer:
  VSLStreamStatePtr stream;
• Create a stream:
  vslNewStream(&stream, VSL_BRNG_MCG31, seed);
• Generate a set of random numbers (the vs prefix means single precision, so out is a float array):
  vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, size, out, start, end);
• Delete the stream (optional):
  vslDeleteStream(&stream);

23 Activity: calculating Pi using a Monte Carlo method
• Compare the performance of C source code (the rand() function) and VSL.
• Exercise control of the threading capabilities in MKL/VSL.

24 Performance Libraries: What’s Been Covered
• Intel® Math Kernel Library is a broad scientific/engineering math library.
• It is optimized for Intel® processors.
• It is threaded for effective use on SMP machines.
