Published by Buddy Sydney Dickerson. Modified over 9 years ago.
1
High Performance Computing: The GotoBLAS Library
2
HPC: numerical libraries
Many numerically intensive applications use specialty libraries to perform common operations:
- Linear algebra operators (e.g., dot products, matrix-vector multiplies)
- Fast Fourier transforms
- Linear solvers
To maximize application performance (and throughput), we want these libraries to be highly optimized for each computer architecture.
One commonly used numerical library is BLAS (Basic Linear Algebra Subprograms):
- Contains routines that provide standard building blocks for basic vector and matrix operations
- Commonly used in scientific and engineering software and in graphics processing
- "High-profile" because it is used by the Linpack benchmark, which ranks the fastest supercomputers in the world (Top 500 list)
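The sketch below makes the first two building blocks concrete. It shows what a Level-1 dot product and a Level-2 matrix-vector multiply compute. The names `ref_ddot` and `ref_dgemv` are hypothetical. They mimic the semantics of BLAS `DDOT` and `DGEMV`, but they are not the BLAS interface. Real BLAS uses column-major arrays and strides, and it updates `y` in place. This sketch uses row-major nested lists and returns a new list. Python is used only to show the arithmetic, not the performance.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major nested lists: A[i][j] (real BLAS is column-major)


def ref_ddot(x: Vector, y: Vector) -> float:
    """Level-1 style dot product: returns sum_i x[i] * y[i]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


def ref_dgemv(alpha: float, a: Matrix, x: Vector, beta: float, y: Vector) -> Vector:
    """Level-2 style matrix-vector multiply: returns alpha*A*x + beta*y as a new list."""
    if len(a) != len(y) or any(len(row) != len(x) for row in a):
        raise ValueError("dimension mismatch")
    # Each output element is one dot product of a row of A with x, then scaled and added.
    return [alpha * ref_ddot(row, x) + beta * yi for row, yi in zip(a, y)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(ref_ddot(x, [4.0, 5.0, 6.0]))           # 32.0
    a = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    print(ref_dgemv(2.0, a, x, 1.0, [1.0, 1.0]))  # [9.0, 9.0]
```

A tuned library produces the same results. What it changes is how the loops are ordered and blocked to suit a particular processor's caches and registers.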
3
HPC: GotoBLAS
GotoBLAS is an implementation of the BLAS library developed by TACC researcher Kazushige Goto. Kazushige has been called “the Michael Jordan of high-performance linear algebra kernels.”
The software supports all common processor architectures, including:
- POWER4, POWER5
- Opteron
- Blue Gene/L
- Pentium 4/Xeon (32-bit and 64-bit)
- Itanium 2
4
HPC: GotoBLAS
Most vendors provide their own BLAS implementation:
- Significant development overhead is incurred for each new architecture
- Large code base with many branches that switch on input size
Kazushige's approach uses a simplified model:
- No major context switching
- Functions are separated based on performance impact:
  - Non-performance-critical parts are written in C
  - Performance-critical kernels are written in assembly
- GotoBLAS tries to minimize the amount of assembly code:
  - The actual assembly code is very small
  - This makes it easy to improve and debug
Benefit: it takes only 3 to 7 days to develop a tuned BLAS for a new architecture.
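A blocked matrix multiply can illustrate the split between a small performance-critical kernel and plain bookkeeping code. This is a sketch under stated assumptions, not GotoBLAS source. `blocked_gemm`, `pack`, and `micro_kernel` are hypothetical names, Python stands in for both the C and the assembly layers, and the block sizes are arbitrary. The blocking and packing scheme is a common textbook simplification, not GotoBLAS's actual data layout.

```python
from typing import List

Matrix = List[List[float]]


def micro_kernel(a_panel: Matrix, b_panel: Matrix, c: Matrix, i0: int, j0: int) -> None:
    """Performance-critical inner loop (the part that would be hand-written
    assembly). Adds a_panel * b_panel into the block of C at (i0, j0)."""
    kc = len(b_panel)
    for i, a_row in enumerate(a_panel):
        c_row = c[i0 + i]
        for j in range(len(b_panel[0])):
            s = 0.0
            for p in range(kc):
                s += a_row[p] * b_panel[p][j]
            c_row[j0 + j] += s


def pack(m: Matrix, r0: int, r1: int, c0: int, c1: int) -> Matrix:
    """Bookkeeping (the part that would be plain C): copy rows r0..r1-1 and
    columns c0..c1-1 into a compact buffer for the kernel."""
    return [row[c0:c1] for row in m[r0:r1]]


def blocked_gemm(a: Matrix, b: Matrix, mc: int = 4, kc: int = 4, nc: int = 4) -> Matrix:
    """Driver: walks over blocks of A (m x k) and B (k x n), packs them, and
    calls the kernel. mc/kc/nc are illustrative; a tuned library would
    choose them to fit each architecture's caches."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):
        for pc in range(0, k, kc):
            b_panel = pack(b, pc, min(pc + kc, k), jc, min(jc + nc, n))
            for ic in range(0, m, mc):
                a_panel = pack(a, ic, min(ic + mc, m), pc, min(pc + kc, k))
                micro_kernel(a_panel, b_panel, c, ic, jc)
    return c
```

In this arrangement, porting to a new processor would mostly mean rewriting `micro_kernel` and choosing new block sizes. The driver and packing code would stay the same. That is consistent with the small assembly footprint and short porting time the slide describes.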
5
GotoBLAS DGEMM performance

Architecture    Efficiency
Itanium2        98.9%
PPC440 FP2      98.2%
Alpha 21264     96.5%
POWER5          96.2%
Pentium4        95.7%
Opteron         92.8%
PPC970MP        92.0%
SPARC IV        92.0%

Efficiency is the ratio of observed performance to theoretical peak performance. DGEMM is one of the most widely used BLAS functions; it performs double-precision general matrix-matrix multiplication (C = αAB + βC).
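The efficiency figures follow from a short calculation. Theoretical peak is clock rate × floating-point operations per cycle × cores, and efficiency is observed performance divided by that peak. The sketch assumes a single 1.9 GHz POWER5 core issuing 4 flops per cycle (two fused multiply-add units), which gives a peak of 7600 MFlops. The observed rate of 7311.2 MFlops is hypothetical and only illustrates the arithmetic. `peak_mflops` and `efficiency` are illustrative helper names.

```python
def peak_mflops(clock_ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak in MFlops: clock (GHz -> MHz) x flops per cycle x cores."""
    return clock_ghz * 1000.0 * flops_per_cycle * cores


def efficiency(observed_mflops: float, peak: float) -> float:
    """Ratio of observed performance to theoretical peak."""
    return observed_mflops / peak


if __name__ == "__main__":
    # Assumption: one 1.9 GHz POWER5 core, 4 flops/cycle (two FMA units).
    peak = peak_mflops(1.9, 4)
    print(peak)                              # 7600.0
    # Hypothetical observed rate, chosen only to illustrate the ratio.
    print(f"{efficiency(7311.2, peak):.1%}")  # 96.2%
```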
6
Example GotoBLAS comparisons
[Figure: DGEMM on POWER5 1.9 GHz; MFlops (y-axis, 0 to 7600) vs. matrix size (x-axis, 0 to 2000) for GotoBLAS (GOTO), IBM ESSL, and ATLAS]
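Curves like this are typically produced by timing a multiply at each matrix size. The time is converted to MFlops using the standard DGEMM operation count of 2mnk, one multiply and one add per inner-product term; the lower-order α and β scaling work is ignored. The sketch below shows that conversion. `naive_matmul`, `dgemm_flops`, `mflops`, and `benchmark` are hypothetical helpers. An unoptimized Python loop will land far below the tuned-library curves.

```python
import random
import time
from typing import List

Matrix = List[List[float]]


def dgemm_flops(m: int, n: int, k: int) -> int:
    """Flop count for A (m x k) times B (k x n): 2*m*n*k."""
    return 2 * m * n * k


def mflops(flops: int, seconds: float) -> float:
    """Convert an operation count and elapsed time to MFlops."""
    return flops / seconds / 1e6


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Unoptimized triple loop, used only as something to time."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def benchmark(size: int) -> float:
    """Time one size x size multiply with random inputs and report MFlops."""
    a = [[random.random() for _ in range(size)] for _ in range(size)]
    b = [[random.random() for _ in range(size)] for _ in range(size)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    return mflops(dgemm_flops(size, size, size), elapsed)


if __name__ == "__main__":
    for size in (50, 100, 200):
        print(size, f"{benchmark(size):.1f} MFlops")
```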
7
HPC: GotoBLAS
In April 2006, TACC released the latest version of GotoBLAS:
- Free to use for academic and research purposes
- Supports a wide range of Fortran compiler interfaces
- Available to commercial users through UT's Office of Technology Commercialization
- Source code for the library is now available
- Redistribution rights are also available
8
Thanks for your time!
Karl W. Schulz, karl@tacc.utexas.edu
Kazushige Goto, kgoto@tacc.utexas.edu