Image Stitching for Optical Microscopy
Timothy Blattner, Bertrand Stivalet, Walid Keyrouz
Shujia Zhou
IAB 2013-6-13

Image Stitching for Optical Microscopy
Objectives
- Stitching of optical microscopy images at interactive rates
- General-purpose library, ImageJ/Fiji plug-in, etc.
Success criterion
- Transformative impact
- Run sample problem in < 1 min
- Speed improvement of more than 10x

Credits
- Joe Chalfoun, Mary Brady
- NIST

Image Stitching Problem
- Optical microscopes scan a plate and take overlapping partial images (tiles)
- Need to assemble the image tiles into one large image
- Modern microscopy is automated: scientists are acquiring and processing large sets of images

Image Stitching Problem…
- Two phases:
  - Phase I: compute the X & Y translations for all tiles
  - Phase II: apply the translations & compose the stitched image
- Main focus is on phase I

Image Stitching Algorithm
- Loop over all images:
  - Read an image tile
  - Compute its FFT-2D
  - Compute correlation coefficients with west and north neighbors
    - Depends on FFT-2D for each tile
- Major compute portions:
  - FFT-2D of tiles
  - Compute and normalize phase correlation
  - Inverse FFT-2D
  - Max reduction and normalization

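The per-pair computation listed above (normalize the phase correlation, inverse FFT, max reduction) can be sketched as follows. This is a minimal illustration only, not the presented implementation: it uses a naive pure-Python DFT in place of FFTW/CUFFT, assumes a purely circular shift between two tiny tiles, and the names `dft2` and `phase_correlation_peak` are hypothetical. Real tiles overlap only partially, so the peak index would still have to be interpreted and checked (the slide's correlation coefficients).

```python
import cmath
import random


def dft2(a, inverse=False):
    """Naive row-column 2-D DFT of a list of lists (tiny tiles only)."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1

    def dft1(v):
        n = len(v)
        return [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * u * k / n)
                    for k in range(n))
                for u in range(n)]

    by_row = [dft1(row) for row in a]                       # transform each row
    by_col = [dft1([by_row[r][c] for r in range(rows)])     # then each column
              for c in range(cols)]
    out = [[by_col[c][r] for c in range(cols)] for r in range(rows)]
    if inverse:
        out = [[x / (rows * cols) for x in row] for row in out]
    return out


def phase_correlation_peak(fft_a, fft_b):
    """Return (row, col) of the phase-correlation peak for two tile FFTs."""
    rows, cols = len(fft_a), len(fft_a[0])
    cross = []
    for r in range(rows):
        line = []
        for c in range(cols):
            p = fft_b[r][c] * fft_a[r][c].conjugate()   # cross-power spectrum
            mag = abs(p)
            line.append(p / mag if mag > 1e-12 else 0j)  # normalize to unit magnitude
        cross.append(line)
    corr = dft2(cross, inverse=True)                     # inverse FFT-2D
    # Max reduction: location of the largest correlation magnitude.
    _, r_best, c_best = max((abs(corr[r][c]), r, c)
                            for r in range(rows) for c in range(cols))
    return r_best, c_best


if __name__ == "__main__":
    rng = random.Random(0)
    size = 8
    tile_a = [[rng.random() for _ in range(size)] for _ in range(size)]
    # tile_b is tile_a circularly shifted by 2 rows and 3 columns (assumed test data).
    tile_b = [[tile_a[(r - 2) % size][(c - 3) % size] for c in range(size)]
              for r in range(size)]
    print(phase_correlation_peak(dft2(tile_a), dft2(tile_b)))
```

Because each tile's forward transform is computed once and passed in, the same FFT can be reused for both the west and the north pairing, which is the dependency the slide points out.
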
Algorithm’s Parallel Characteristics
- Almost embarrassingly parallel
  - Large number of independent computations
- For an n x m grid:
  - FFT for all images: nm
  - NCC for all image pairs: 2nm - n - m
  - Inverse FFT (FFT^-1) for the NCCs of all image pairs: 2nm - n - m
  - …
- Caveats
  - Data dependencies on the tiles' FFTs
  - Limited memory

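The operation counts above follow from counting neighbor pairs: a grid with n tiles along one side and m along the other has n(m - 1) pairs in one direction and m(n - 1) in the other, which sums to 2nm - n - m. A small sketch (the function name `stitching_work_counts` is hypothetical):

```python
def stitching_work_counts(n, m):
    """Count the independent computations for an n x m grid of tiles."""
    tiles = n * m
    # Each tile is paired with its west and north neighbor where one exists.
    pairs = n * (m - 1) + m * (n - 1)   # equals 2nm - n - m
    return {"fft": tiles, "ncc": pairs, "inverse_fft": pairs}


if __name__ == "__main__":
    # Grid size taken from the data set slide (59 x 42).
    print(stitching_work_counts(59, 42))
```

For the 59x42 data set this gives 2478 forward FFTs and 4855 NCC and inverse-FFT computations.
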
Data Set
- Grid of 59x42 images (2478 tiles)
- 1392x1040 16-bit grayscale images (2.8 MB per image)
- ~7 GB in total
- Source: Kiran Bhadriraju (NIST)

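The sizes quoted above can be checked with a short calculation. This sketch counts raw pixel data only and assumes no compression or file-header overhead; the function name `dataset_bytes` is hypothetical.

```python
def dataset_bytes(grid_cols, grid_rows, width_px, height_px, bits_per_px):
    """Return (bytes per tile, bytes for the whole grid) of raw pixel data."""
    tile = width_px * height_px * bits_per_px // 8
    return tile, tile * grid_cols * grid_rows


if __name__ == "__main__":
    tile, total = dataset_bytes(59, 42, 1392, 1040, 16)
    print(f"per tile: {tile / 2**20:.2f} MiB, total: {total / 1e9:.2f} GB")
```

Each tile holds 2,895,360 bytes of pixel data and the full grid 7,174,702,080 bytes, consistent with the ~7 GB figure.
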
Evaluation Platform
- Hardware
  - Dual Intel® Xeon® E5620 CPUs (quad-core, 2.4 GHz, hyper-threading)
  - 24 GB RAM
  - Dual NVIDIA® Tesla™ C2070 cards
- Reference implementations
  - Fiji™ Stitching plugin: > 3.6 hours
  - MATLAB® prototype: ~17.5 minutes on a similar machine
- Software
  - Ubuntu Linux 12.04/x86_64, kernel 3.2.0
  - libc6 2.15, libstdc++6 4.6
  - Boost 1.48, FFTW 3.3, libTIFF4
  - NVIDIA CUDA & CUFFT 5.0

Implementations & Results
FFTW Exhaustive, CUDA 5.0

Implementation              Time            Speedup   CPU Threads   GPUs
C++ Sequential              10 min 37 sec             1
Simple Multi-Threaded       1 min 48 sec    5.8x      8
Pipelined Multi-Threaded    1 min 22 sec    7.7x      19
Simple GPU                  9 min 47 sec    1.08x
Pipelined-Hybrid            25 sec          25.5x     13            2

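The speedup column is the sequential C++ time divided by each implementation's time. The sketch below recomputes it from the reported times, which are rounded to whole seconds, so the results should only be expected to match the table approximately. The helper names are hypothetical.

```python
def seconds(minutes, secs):
    """Convert a 'min sec' timing to seconds."""
    return 60 * minutes + secs


# Times as reported on the slide.
TIMINGS = {
    "C++ Sequential": seconds(10, 37),
    "Simple Multi-Threaded": seconds(1, 48),
    "Pipelined Multi-Threaded": seconds(1, 22),
    "Simple GPU": seconds(9, 47),
    "Pipelined-Hybrid": seconds(0, 25),
}


def speedups(timings, baseline="C++ Sequential"):
    """Speedup of every entry relative to the baseline entry."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items()}


if __name__ == "__main__":
    for name, s in speedups(TIMINGS).items():
        print(f"{name:26s} {s:6.2f}x")
```

The recomputed values agree with the table to within about 0.1x.
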
Java Implementation
- Allows easy integration into Fiji
  - Tool used by many biologists for image stitching
- Pure Java code is extremely slow
  - FFT computations
  - Cross correlation
- Use JNI with FFTW and C code
  - Java Native Interface
  - Allows calling functions outside the virtual machine
  - Requires compilation (gcc)

Java Implementation Runtimes

Implementation         42x59 Tiles     Threads
Sequential             > 4 hours       1
Sequential with JNI    ~30 minutes
Pipelined with JNI     3 min 42 sec    16

Closure - General
- 25x speedup compared to sequential C++ code
- 518x speedup compared to Fiji stitching plugin
- Representative data set: 42x59 grid in ~25 sec
- Can budget compute time to:
  - Generate stitched image
  - Carry out additional analysis
- Enables computationally steerable experiments

Closure - Java Implementation
- Single-threaded: executes full grid in ~45 minutes using FFTW native interface
- Multi-threaded: executes in ~4 minutes
- Optimized version uses native intrinsics for computing cross correlation
- Provides simple integration into the Fiji application
- Need to provide JNI for GPU functions
  - JCUDA

Thank You
Questions?