SI2-SSI: Sustaining Innovation in the Linear Algebra Software Stack for Computational Chemistry and other Sciences

Don Batory, Victor Eijkhout, Devin Matthews, Maggie Myers, John Stanton, Robert van de Geijn, Field Van Zee

Department of Computer Science, Department of Statistics and Data Sciences, Institute for Computational Engineering and Sciences, Department of Chemistry and Biochemistry, Texas Advanced Computing Center, The University of Texas at Austin

The Science of High-Performance Computing Group, shpc.ices.utexas.edu

Scientific applications such as quantum chemistry are enabled by efficient computational kernels, especially those from linear algebra. The BLAS-like Library Instantiation Software (BLIS) framework and the libflame library, which provides LAPACK functionality, were developed as part of this grant. They not only rival the efficiency of proprietary vendor libraries but also have the potential to support scientific applications in novel ways. The project incorporates innovative pedagogical outreach, ranging from undergraduate research involvement to the development of Massive Open Online Courses offered through edX.

The Dense Linear Algebra Software Stack
Open Source Software

BLAS-like Library Instantiation Software (BLIS)
This portable framework for rapidly instantiating BLAS-like libraries is a refactoring of Kazushige Goto's approach to implementing the level-3 BLAS. Importantly, BLIS casts virtually all level-3 computation in terms of a single "micro-kernel." BLIS also serves as a productivity lever when supporting higher-level dense linear algebra (DLA) functionality. The level-2 operations were likewise restructured, exposing key kernels: the so-called level-1v (vector) and level-1f (fused) operations. Software developed by this project is being distributed by AMD, ARM, Texas Instruments, and Movidius, and it is at the core of research collaborations with Intel and HP.

[Figure: BLIS vs. MKL on the Intel Xeon Phi (60 cores)]
[Figure: BLIS vs. ESSL on the IBM PowerPC A2 (16 cores)]

Recent publications.
F. G. Van Zee, R. A. van de Geijn. BLIS: A Framework for Rapidly Instantiating BLAS Functionality. ACM TOMS, 2015.
F. G. Van Zee, T. Smith, B. Marker, T. M. Low, R. A. van de Geijn, F. D. Igual, M. Smelyanskiy, X. Zhang, M. Kistler, V. Austel, J. Gunnels, L. Killough. The BLIS Framework: Experiments in Portability. ACM TOMS, 2016.
T. M. Smith, R. A. van de Geijn, M. Smelyanskiy, J. R. Hammond, F. G. Van Zee. Anatomy of High-Performance Many-Threaded Matrix Multiplication. IPDPS, 2014.
T. M. Low, F. D. Igual, T. M. Smith, E. S. Quintana-Orti. Analytical Modeling is Enough for High Performance BLIS. ACM TOMS, 2016.

libflame
This general-purpose DLA library is built atop existing BLAS APIs and provides an object-based environment in which to build custom linear algebra operations and application solutions. It natively provides much of the functionality commonly used in LAPACK and is fully backward compatible with existing LAPACK libraries, even where native functionality is not yet present.

Using BLIS Building Blocks

Machine Learning Primitives
Recent publications.
C. D. Yu*, W. March, G. Biros. An N log N Parallel Fast Direct Solver for Kernel Matrices. IPDPS, 2017.
C. D. Yu*, W. March, B. Xiao, G. Biros. Inv-Askit: A Parallel Fast Direct Solver for Kernel Matrices. IPDPS, 2016.
C. D. Yu*, J. Huang, W. Austin, B. Xiao, G. Biros. Performance Optimization for the K-Nearest Neighbors Kernel on x86 Architectures. SC, 2015.

Strassen's Algorithm and Fast MM
Recent publications.
J. Huang, T. M. Smith, G. M. Henry, R. A. van de Geijn. Strassen's Algorithm Reloaded. SC, 2016.
J. Huang, L. Rice, D. A. Matthews, R. A. van de Geijn. Generating Families of Practical Fast Matrix Multiplication Algorithms. IPDPS, 2017 (accepted).
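To make the BLIS description concrete (virtually all level-3 computation cast in terms of a single micro-kernel), here is a minimal pure-Python sketch of Goto-style blocking: three cache-blocking loops and two register-blocking loops around one routine that updates a small MR x NR tile of C. The block sizes (MR, NR, KC, MC, NC), the packing helpers, and every function name below are hypothetical illustrations chosen for readability, not the BLIS API; a real BLIS micro-kernel is written in C or assembly and tuned per architecture.

```python
# Toy block sizes (hypothetical); BLIS chooses these per architecture.
MR, NR = 4, 4            # micro-tile (register block) of C
KC, MC, NC = 8, 8, 16    # cache block sizes


def pack_a(A, ic, pc, mc, kc):
    """Copy the mc x kc block of A at (ic, pc) into micro-panels of MR rows,
    stored so that panel[p*MR + i] = A(ic+ir+i, pc+p) for the panel at ir.
    Rows past the edge of the block are padded with zeros."""
    panels = []
    for ir in range(0, mc, MR):
        panel = []
        for p in range(kc):
            for i in range(MR):
                inside = ir + i < mc
                panel.append(A[ic + ir + i][pc + p] if inside else 0.0)
        panels.append(panel)
    return panels


def pack_b(B, pc, jc, kc, nc):
    """Copy the kc x nc block of B at (pc, jc) into micro-panels of NR columns,
    stored so that panel[p*NR + j] = B(pc+p, jc+jr+j) for the panel at jr."""
    panels = []
    for jr in range(0, nc, NR):
        panel = []
        for p in range(kc):
            for j in range(NR):
                inside = jr + j < nc
                panel.append(B[pc + p][jc + jr + j] if inside else 0.0)
        panels.append(panel)
    return panels


def micro_kernel(kc, a_panel, b_panel, C, i0, j0, m, n):
    """The only routine that does arithmetic: accumulate an MR x NR tile as
    kc rank-1 updates, then add its valid m x n part into C at (i0, j0)."""
    acc = [[0.0] * NR for _ in range(MR)]
    for p in range(kc):
        for i in range(MR):
            a = a_panel[p * MR + i]
            for j in range(NR):
                acc[i][j] += a * b_panel[p * NR + j]
    for i in range(m):
        for j in range(n):
            C[i0 + i][j0 + j] += acc[i][j]


def gemm(A, B, C):
    """C += A * B for non-empty lists of lists, via five loops around micro_kernel."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, NC):                   # loop 5: NC columns of B and C
        nc = min(NC, n - jc)
        for pc in range(0, k, KC):               # loop 4: KC-deep slices of A and B
            kc = min(KC, k - pc)
            b_panels = pack_b(B, pc, jc, kc, nc)
            for ic in range(0, m, MC):           # loop 3: MC rows of A and C
                mc = min(MC, m - ic)
                a_panels = pack_a(A, ic, pc, mc, kc)
                for jr in range(0, nc, NR):      # loop 2: NR-wide micro-panels of B
                    for ir in range(0, mc, MR):  # loop 1: MR-tall micro-panels of A
                        micro_kernel(kc, a_panels[ir // MR], b_panels[jr // NR],
                                     C, ic + ir, jc + jr,
                                     min(MR, mc - ir), min(NR, nc - jr))


if __name__ == "__main__":
    C = [[0.0, 0.0], [0.0, 0.0]]
    gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], C)
    print(C)  # [[19.0, 22.0], [43.0, 50.0]]
```

In this structure everything except micro_kernel is portable bookkeeping (partitioning and packing), so retargeting it to new hardware mostly means rewriting the innermost routine and choosing block sizes.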
Training the Next Generation of Computational Scientists
Stand-alone materials based on the edX course: introductory linear algebra for computer science and computational science students, using notation and APIs from the FLAME project.
~900 pages of notes
~270 videos
Web-based activities
IPython notebooks (Spring 2014)
MATLAB (Spring 2015, Summer 2015)
"Pick-your-price" (or get it free)
Offered Spring 2014, Spring 2015, Summer 2015, Fall 2016, and Spring 2017, with 100,000+ registered participants.
Starts April 2017

TBLIS: Dense and Hierarchical Tensor Contractions for Quantum Chemistry
[Figure: "Square" matrix multiplication (MM) and tensor contraction (TC) on the Xeon Phi 7210]
Recent publications.
D. A. Matthews. High-Performance Tensor Contraction without Transposition. SIAM SISC, in review.
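The TBLIS publication above is titled "without transposition." One way to illustrate that idea, sketched here under simplifying assumptions, is to view each tensor as a matrix through precomputed offset vectors, one per index group, so that a GEMM-style loop nest reads elements in place instead of first permuting the tensor into matrix form. Everything below is hypothetical: row-major flat lists, string index labels, the helpers strides_for, scatter, and contract, and naive (unblocked) inner loops. It is not TBLIS's implementation, and it assumes every index appears in exactly two of the three tensors (no batch indices).

```python
from itertools import product


def strides_for(shape):
    """Row-major strides for a dense tensor of the given shape."""
    strides, s = [], 1
    for n in reversed(shape):
        strides.append(s)
        s *= n
    return list(reversed(strides))


def scatter(lengths, strides):
    """Flat offsets for every combination of a group of indices
    (first index varies slowest). An empty group yields [0]."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in lengths))]


def contract(A, a_shape, a_idx, B, b_shape, b_idx, C, c_shape, c_idx):
    """C[c_idx] += sum over shared indices of A[a_idx] * B[b_idx].

    Tensors are flat row-major lists; labels are strings such as "adc".
    No operand is permuted or copied: each is read as a matrix through
    offset vectors for its row-index and column-index groups."""
    if set(a_idx) & set(b_idx) & set(c_idx):
        raise ValueError("batch indices are not supported in this sketch")
    a_str, b_str, c_str = strides_for(a_shape), strides_for(b_shape), strides_for(c_shape)
    dim = {**dict(zip(a_idx, a_shape)), **dict(zip(b_idx, b_shape))}
    m_idx = [x for x in c_idx if x in a_idx]   # free indices of A: "rows"
    n_idx = [x for x in c_idx if x in b_idx]   # free indices of B: "columns"
    k_idx = [x for x in a_idx if x in b_idx]   # contracted indices

    def offs(group, labels, strides):
        return scatter([dim[x] for x in group],
                       [strides[labels.index(x)] for x in group])

    rA, kA = offs(m_idx, a_idx, a_str), offs(k_idx, a_idx, a_str)
    kB, cB = offs(k_idx, b_idx, b_str), offs(n_idx, b_idx, b_str)
    rC, cC = offs(m_idx, c_idx, c_str), offs(n_idx, c_idx, c_str)
    for i in range(len(rA)):                   # GEMM-style loops over matrix views
        for j in range(len(cB)):
            acc = 0
            for p in range(len(kA)):
                acc += A[rA[i] + kA[p]] * B[kB[p] + cB[j]]
            C[rC[i] + cC[j]] += acc
```

For example, C[a,b,c] = sum over d of A[a,d,c] * B[d,b] would be computed with contract(A, (2, 3, 4), "adc", B, (3, 5), "db", C, (2, 5, 4), "abc"); no permuted copy of A is formed. In a high-performance setting the same offset vectors would drive the packing step of a blocked GEMM rather than these naive loops.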