Atlas Graphics Group Meeting, Dec

The Colt Distribution
Open Source Libraries for High Performance Scientific and Technical Computing in Java

Wolfgang Hoschek
CERN IT/PDP

Overview
- Technology Tracking
- Motivation & Goals
- Colt distribution
- Features
- Status & Future plans
- Conclusions

Technology Tracking
- Scientific and technical computing
  - demanding problem sizes
  - need for high performance at a reasonably small memory footprint
- Technology Tracking
  - Don’t be religious about Java, C++ or whatever
  - Gain enough experience to be able to take well-founded strategic decisions when the time comes…
- Increased adoption in the field
  - Performance gap steadily closing
  - ease of use
  - cross-platform nature (no compiler/architecture/linker issues)
  - built-in support for multi-threading, network friendly APIs,...
- IBM Watson's Ninja project
  - BLAS matrix computations up to 90% as fast as optimized Fortran

Motivation & Goals
- Users need libraries to get their job done
- Java lacks the foundation toolkits that are broadly available and conveniently accessible in C/C++ and Fortran
- Build an infrastructure for scalable scientific and technical computing in Java
  - à la CLHEP
- Don’t reinvent the wheel: share resources in common efforts
  - Open source
- User convenience
  - Document, package and distribute a loosely coupled set of libraries under one single uniform umbrella
  - Avoid compiler/linker/architecture headaches
  - Point a single env. variable at the cross-platform shared library and run a program no matter where you are

Colt
- Efficient High Level Data structures & algorithms for
  - On-line & Off-line Data Analysis
  - Histogramming
  - NTuple-like manipulations, multi-dim. arrays, matrices
  - Random Numbers, Monte Carlo Simulation
  - Concurrent & Parallel Programming
- Approach
  - Summon some of the best designs and implementations thought up over time by the community
  - Port or improve them; introduce new approaches where the need arises
- Results so far
  - In overlapping areas, competitive with or superior to toolkits such as STL, Root, HTL, CLHEP, TNT, GSL, C-RAND / WIN-RAND (all C/C++) as well as IBM Array, JDK 1.2 Collections framework, JGL (all Java), in terms of performance (!), functionality and (re)usability
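
To make the histogramming and random-number use case above concrete, here is a minimal sketch in plain Java. It is not code from the Colt distribution: the class `FixedBinHistogram` and all of its methods are hypothetical names invented for illustration, and `java.util.Random` merely stands in for whatever generator a real analysis would use.

```java
import java.util.Random;

/** Hypothetical fixed-width 1D histogram; NOT a class from the Colt distribution. */
final class FixedBinHistogram {
    private final double min;
    private final double max;
    private final long[] bins;
    private long underflow;
    private long overflow;

    FixedBinHistogram(int binCount, double min, double max) {
        if (binCount <= 0 || !(min < max)) {
            throw new IllegalArgumentException("need binCount > 0 and min < max");
        }
        this.min = min;
        this.max = max;
        this.bins = new long[binCount];
    }

    /** Adds one entry; values outside [min, max) go to the under/overflow counters. */
    void fill(double x) {
        if (x < min) { underflow++; return; }
        // NaN is counted as overflow so that it never lands in a regular bin.
        if (Double.isNaN(x) || x >= max) { overflow++; return; }
        int index = (int) ((x - min) / (max - min) * bins.length);
        if (index == bins.length) index--; // guard against floating-point rounding
        bins[index]++;
    }

    long binContent(int index) { return bins[index]; }
    long underflow() { return underflow; }
    long overflow() { return overflow; }
    int binCount() { return bins.length; }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed for repeatable runs
        FixedBinHistogram h = new FixedBinHistogram(16, -4.0, 4.0);
        for (int i = 0; i < 1_000_000; i++) {
            h.fill(rng.nextGaussian());
        }
        for (int i = 0; i < h.binCount(); i++) {
            System.out.println("bin " + i + ": " + h.binContent(i));
        }
        System.out.println("underflow=" + h.underflow() + " overflow=" + h.overflow());
    }
}
```

Filling is a constant-time index computation plus one increment, which is why fill rates are usually quoted in numbers per second.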

Features (1)
- Several free libraries
  - For user convenience documented, packaged and bundled under one single uniform umbrella
- Colt library
  - Fundamental general-purpose data structures optimized for numerical data, e.g.
    - Dense and sparse matrices (multi-dimensional arrays), Linear Algebra, resizable arrays, associative containers, buffer management
- Jet library
  - Mathematical and statistical tools for data analysis,
  - Histogramming functionality,
  - Random Number Generators and Distributions for simulations
- more
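
The distinction between dense and sparse matrices mentioned above can be sketched as follows. This is an illustration in plain Java under stated assumptions, not the Colt matrix API: `MatrixStorageSketch`, `Matrix2D`, `DenseMatrix`, `SparseMatrix` and `multiply` are hypothetical names. The dense variant stores every cell in one flat array; the sparse variant stores only non-zero cells in a hash map.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; these types are hypothetical, NOT the Colt matrix classes. */
final class MatrixStorageSketch {

    /** Minimal common view of a 2D matrix of doubles. */
    interface Matrix2D {
        int rows();
        int columns();
        double get(int row, int column);
        void set(int row, int column, double value);
    }

    /** Dense storage: every cell lives in one flat row-major array. */
    static final class DenseMatrix implements Matrix2D {
        private final int rows, columns;
        private final double[] cells;

        DenseMatrix(int rows, int columns) {
            checkShape(rows, columns);
            this.rows = rows;
            this.columns = columns;
            this.cells = new double[rows * columns];
        }
        public int rows() { return rows; }
        public int columns() { return columns; }
        public double get(int row, int column) {
            checkIndex(this, row, column);
            return cells[row * columns + column];
        }
        public void set(int row, int column, double value) {
            checkIndex(this, row, column);
            cells[row * columns + column] = value;
        }
    }

    /** Sparse storage: only non-zero cells are kept, keyed by their row-major index. */
    static final class SparseMatrix implements Matrix2D {
        private final int rows, columns;
        private final Map<Long, Double> nonZeros = new HashMap<>();

        SparseMatrix(int rows, int columns) {
            checkShape(rows, columns);
            this.rows = rows;
            this.columns = columns;
        }
        public int rows() { return rows; }
        public int columns() { return columns; }
        public double get(int row, int column) {
            checkIndex(this, row, column);
            Double value = nonZeros.get(key(row, column));
            return value == null ? 0.0 : value;
        }
        public void set(int row, int column, double value) {
            checkIndex(this, row, column);
            long k = key(row, column);
            if (value == 0.0) nonZeros.remove(k); // storing a zero frees the cell
            else nonZeros.put(k, value);
        }
        int storedCells() { return nonZeros.size(); }
        private long key(int row, int column) { return (long) row * columns + column; }
    }

    /** Matrix-vector product written once against the interface, so it works for both layouts. */
    static double[] multiply(Matrix2D a, double[] x) {
        if (x.length != a.columns()) throw new IllegalArgumentException("size mismatch");
        double[] y = new double[a.rows()];
        for (int r = 0; r < a.rows(); r++) {
            for (int c = 0; c < a.columns(); c++) {
                y[r] += a.get(r, c) * x[c];
            }
        }
        return y;
    }

    private static void checkShape(int rows, int columns) {
        if (rows < 0 || columns < 0 || (long) rows * columns > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid shape " + rows + " x " + columns);
        }
    }

    private static void checkIndex(Matrix2D m, int row, int column) {
        if (row < 0 || row >= m.rows() || column < 0 || column >= m.columns()) {
            throw new IndexOutOfBoundsException("(" + row + ", " + column + ")");
        }
    }
}
```

The trade-off is the usual one: the dense layout gives the fastest element access at a memory cost proportional to rows × columns, while the sparse layout pays a hash lookup per access but uses memory proportional to the number of non-zero cells.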

Features (2)
- JAL library
  - a partial port of the C++ Standard Template Library
  - developed by Silicon Graphics
  - contains a wide range of efficiently coded general-purpose algorithms on arrays
- Random library
  - A complete port of CLHEP’s random number library
- Concurrent library
- VNI library
  - special math functions, complex numbers
- Contributions from
  - Sun, SGI, Visual Numerics, Univ. New York
- Your package or library ?

Download Contents
- Documentation
  - Executive summary, installation details, FAQs, news, feedback
- HTML API documentation
  - Extensive doc for each package, class, and method. Examples, Tutorials
  - Built by javadoc
  - High quality, starting from a single top entry point, easy navigation, browsing, exploration of features
- Source code for all libraries,
  - and everything else needed to build the entire distribution from scratch
- One single cross-platform shared lib

Benchmarks
- Matrix Computations
  - 2D Assignment: 320 MB/sec, Element-wise Mult: 10 Mflops/sec
  - Linear Equation Solving: ~ 15 Mflops/sec
  - 2D matrix-matrix mult: 25+ Mflops/sec
  - [Chart: Mflops/sec by matrix size and density; type=dense, MHz, Solaris, SunJDK1.2.2, Classic VM]
- Random Numbers ~ 3*10^6 numbers/sec
- Histogram filling ~ 10^6 numbers/sec
- JDK1.2 on Solaris, Linux, NT, AIX, SGI, HP, …
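
The slide does not say how the Mflops figures were obtained. As a rough sketch of how such a figure is commonly derived for matrix-matrix multiplication, assume the conventional operation count of 2·n³ floating-point operations for an n × n product and divide by the measured wall-clock time. The class `MflopsSketch` and its methods are hypothetical, and the naive triple loop is only a stand-in for a tuned kernel.

```java
/** Illustrative sketch only; NOT the benchmark code behind the figures on the slide. */
final class MflopsSketch {

    /** Naive triple-loop product of an (n x k) matrix and a (k x m) matrix. */
    static double[][] multiply(double[][] a, double[][] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("empty matrix");
        }
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            if (a[i].length != k) {
                throw new IllegalArgumentException("inner dimensions differ");
            }
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                for (int j = 0; j < m; j++) {
                    c[i][j] += aip * b[p][j];
                }
            }
        }
        return c;
    }

    /** Conventional count for an n x n product: n^3 multiplications plus n^3 additions. */
    static long flopsForSquareMultiply(int n) {
        return 2L * n * n * n;
    }

    /** Converts an operation count and an elapsed time into millions of operations per second. */
    static double mflops(long flops, long elapsedNanos) {
        double seconds = elapsedNanos / 1e9;
        return flops / seconds / 1e6;
    }

    public static void main(String[] args) {
        int n = 200;
        double[][] a = new double[n][n];
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i + j;
                b[i][j] = i - j;
            }
        }
        // Warm-up runs so that JIT compilation does not dominate the timed run.
        for (int w = 0; w < 5; w++) {
            multiply(a, b);
        }
        long start = System.nanoTime();
        double[][] c = multiply(a, b);
        long elapsed = System.nanoTime() - start;
        System.out.printf("n=%d: %.1f Mflops (c[0][0]=%.1f)%n",
                n, mflops(flopsForSquareMultiply(n), elapsed), c[0][0]);
    }
}
```

Results from such a micro-benchmark depend heavily on the VM, warm-up, matrix size and cache behaviour, so they are only comparable when those conditions are reported alongside the number.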

Related Work
- JAS (www-sldnt.slac.stanford.edu/jas)
  - Histogram package
- Java Grande Forum (math.nist.gov/javanumerics)
  - Working group on numerical computing in Java
  - Jama Linear Algebra package + many more
- IBM Watson
  - Design similar to the Colt matrix classes
- CLHEP (wwwinfo.cern.ch/asd/lhc++/clhep)
  - Random Numbers
- TNT (math.nist.gov/tnt)
  - Linear Algebra
- Colt (nicewww.cern.ch/~hoschek/colt/index.htm)
  - Beta 1.3 under ASIS, Beta 1.4 under ASIS starting next week

Status & Future Plans
- Currently V1.0 Beta 4
- Open Source
- V1.0 Final mid Feb. 99
- CVS access ?
- Under construction
  - Histogram package
  - Transparent Parallel matrix computations for SMPs
- Contributions welcome

Conclusions
- Technology Tracking
  - At LHC time-scale change is inevitable
  - Java may soon be a major player in performance-sensitive scientific and technical computing
  - Ease of use, Portability, Productivity, Fun
- Colt distribution
  - Users need libraries to get their job done
  - Java lacks the foundation toolkits that are broadly available and conveniently accessible in C/C++ and Fortran
  - Build an infrastructure for scalable scientific and technical computing in Java
  - Don’t reinvent the wheel: share resources in Open Source efforts
  - Document, package and distribute a loosely coupled set of libraries under one single uniform umbrella
  - Performance is good and improving; it is only a question of time until Java is faster than C++