Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su
Motivation I’m interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping – Open source – Extensibility to new/add’l sequences with better sensitivity to certain parameters, e.g. B0 and MWF – Better parallelization But: – Large-scale code development in Matlab is cumbersome – Matlab is slow – C is hard (to write, read, debug) Creates large barrier for others to contribute
Matlab Pros Ubiquitous, code is cross- platform Can be fast with vectorized code Data visualization Quick development time Great IDE for general research – Poor for large projects Many useful native libraries/toolboxes Built-in profiling tools Cons Requires license, not free (though there is Octave) Vectorized code is often non-intuitive to write and hard to read Slow for general computations Limited parallel computing and GPU support
C/C++ Pros Fast Great IDEs for large coding projects – Not as great for general science work Strong parallel computer support and CUDA Community libraries for scientific computing Profiling dependent on IDE Cons High learning curve and development time No data visualization Compiled code is platform specific Compiler is not generally installed with OSX and Windows
Python Pros Preinstalled with OSX and Linux-based systems Readability is a core tenet (“pythonic”) Quick development time Native parallel computing support and community GPU modules Extensive community support – Including neuroimaging- specific: NiPype, NiBabel Built-in profiling module and some IDE tools Cons Slow for general computation Mixed bag of IDEs, some are great for coding, others for research Out of the box it’s a poor alternative: no linear algebra or data visualization
Python & Friends Cons Slow for general computation Mixed bag of IDEs, some are great for coding, others for research Out of the box it’s a poor alternative: no linear algebra or data visualization Solutions Cython, JIT compilers like PyPy There are a few good options out there that I’ve found: – Eclipse + PyDev, NetBeanz – Spyder – closest to MATLAB – Sage Math Notebook, IPython – like Mathematica – It may come down to preference. NumPy + SciPy + Matplotlib = PyLab – Sage Math includes these as well as other capabilities like symbolic math and graph theory
Pythonic? A term of praise used by the community to refer to clean code that is readable, intuitive, explicit, and takes advantage of coding idioms Python people = [‘John Doe’, ’Jane Doe’, ’John Smith’] smith_family = [] for name in people: if ‘Smith’ in name: smith_family.append(name) smith_family = [name for name in people if ‘Smith’ in name] Matlab people = {‘John Doe’, ’Jane Doe’, ’John Smith’}; smith_family = {} for name = people if strfind(name{1},’Smith’) smith_family = [smith_family name]; end
Installation On any OS: – Sage Math ( easy unzip installation but many “extraneous” packages (500MB) Some issues on OSX with matplotlib On OSX: – Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython Requires gcc compiler available through Apple Developer
NumPy + SciPy vs Matlab Same core libraries: LAPACK Equivalent syntax but not trying to be similar NumPy_for_Matlab_Users Key differences: – Python uses 0 (zero) based indexing. The initial element of a sequence is found using [0]. – In NumPy arrays have pass-by-reference semantics. Slice operations are views into an array.
Syntax Matlab a\b max(a(:)) a(end-4:end) [0:9] NumPy linalg.lstsq(a,b) a.max() a[-5:] arange(10.) or r_[:10.]
Cython Requires a C compiler Cython is Python with C data types. – Dynamic typing of Python has overhead, slow for computation Allows seamless coding of Python and embedded C-speed routines Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible – This means for debugging C-level code, we can use all the plotting tools available in Python Process is sort of like EPIC 1.Write a.pyx source file 2.Run the Cython compiler to generate a C file 3.Run a C compiler to generate a compiled library 4.Run the Python interpreter and ask it to import the module
Code Comparison – Matlab Let’s try a really basic speed comparison test s = 0 tic for i = 1:1e8 s = s + i; end toc tic x = 1:1e8; sum(x) toc
Code Comparison – C #include int main() { long long unsigned int sum = 0; long long unsigned int i = 0; long long unsigned int max = ; clock_t tic = clock(); for (i = 0; i <= max; i++) { sum = sum + i; } clock_t toc = clock(); printf("%15lld, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC); return 0; }
Code Comparison – Python import time from numpy import * s = 0 t = time.time() for i in xrange( ): s += i print time.time() - t t = time.time() x = arange( ) sum(x) print time.time() - t
Code Comparison – Cython addCy.pyx import time cdef long long int n = cdef long long int s = 0 cdef long long int i = 0 t = time.time() for i in xrange(n+1): s += i print time.time() – t runCy.py import pyximport; pyximport.install() import addCy
Speed Comparison Language/ImplementationTime (sec) Matlab/For loop0.547 Matlab/Vector sum0.817 (0.036 for sum only!) Python/For loop Python/NumPy sum0.648 (0.135 for sum only) C/For loop0.222 Cython/For loop0.068 (!)
Summary Python – Full featured programming language with an emphasis on “pythonic” readability NumPy/SciPy – Core libraries for linear algebra and computation (fft, optimization) Cython – Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C – Profile, don’t over optimize too early!