Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development

Olcay Sertel 1,2, Antonio Ruiz 3, Umit Catalyurek 1,2, Manuel Ujaldon 3, Joel Saltz 1, Metin Gurcan 1

1 Dept. of Biomedical Informatics, The Ohio State University
2 Dept. of Electrical & Computer Engineering, The Ohio State University
3 Dept. of Computer Architecture, The University of Malaga
Dept. of Pathology, The Ohio State University
Why do we need high-performance tools?
- The size of a single whole-slide image is extremely large! A typical uncompressed whole-slide image digitized at 40x is more than 40 GB.
- A spatial resolution of 120K x 120K: 120K x 120K x 3 bytes (RGB) per pixel ≈ 43.2 GB.
- Complicated and time-consuming image analysis algorithms.
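The arithmetic above can be checked, and extended to a tile count, with a few lines of host code; the 1024-pixel tile size below is an assumption, chosen to match the 1Kx1K tiles timed later in the deck.

```cuda
// Back-of-the-envelope check of the storage figure above (host-only code).
#include <cstdio>

int main() {
    const double side_px   = 120000.0;                 // 120K x 120K pixels at 40x
    const double bytes_rgb = side_px * side_px * 3.0;  // 3 bytes (RGB) per pixel
    const double tile_px   = 1024.0;                   // assumed tile size ("1K")

    std::printf("uncompressed size: %.1f GB\n", bytes_rgb / 1e9);   // 43.2 GB
    std::printf("1Kx1K tiles to classify: %.0f\n",
                (side_px / tile_px) * (side_px / tile_px));         // ~13,733 tiles
    return 0;
}
```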
Parallel processing infrastructure
[Figure: the whole-slide image is divided into image tiles (40X magnification); Processor 1 ... Processor N classify the tiles in parallel and assign classification labels (Label 1, Label 2, Label 3, Background), which together form the classification map.]
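A minimal sketch of the tile-level parallelism in the figure, assuming a static round-robin assignment of tiles to processors; Tile, load_tile, and classify_tile are hypothetical placeholders rather than the actual pipeline code.

```cuda
// Sketch of the parallel classification step (host-side, hypothetical names).
#include <vector>

struct Tile { int row, col; std::vector<unsigned char> rgb; };

// Placeholders standing in for the real I/O and classification routines.
Tile load_tile(int row, int col) { return Tile{row, col, {}}; }
int  classify_tile(const Tile&)  { return 0; /* e.g., 0 = background */ }

// Processor `rank` (0..n_procs-1) handles every n_procs-th tile; the partial
// label maps are merged afterwards into the whole-slide classification map.
std::vector<int> classify_my_tiles(int tile_rows, int tile_cols,
                                   int n_procs, int rank) {
    std::vector<int> labels(tile_rows * tile_cols, -1);
    for (int t = rank; t < tile_rows * tile_cols; t += n_procs) {
        labels[t] = classify_tile(load_tile(t / tile_cols, t % tile_cols));
    }
    return labels;
}

int main() { classify_my_tiles(4, 4, 2, 0); return 0; }
```

The same decomposition applies whether the processors are CPU cores, cluster nodes, or GPUs driven by separate hosts.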
What is GPGPU?
- GPGPU stands for General-Purpose computation on Graphics Processing Units.
- GPUs were initially designed for gaming applications.
- Fast GPUs are used to implement complex shader and rendering operations for real-time effects.
- Screenshots: Doom 3, © id Software; Call of Duty, © Infinity Ward.
Applications
- Physically-based simulation: particle systems, molecular dynamics, fluid models
- Signal and image processing: segmentation, volume rendering
- Visualization: photon mapping, ray tracing
- Medical image analysis
- Databases & data mining: database queries, stream mining
GPU resources
- Processor clock: CPU 2.13 GHz vs. GPU 575 MHz
- Raw computational power: CPU 10 GFLOPS vs. GPU 520 GFLOPS
- Memory bus width: CPU 64 bits vs. GPU 384 bits
- Memory clock: CPU 2x333 MHz vs. GPU 2x900 MHz
- Memory bandwidth: CPU 10.8 GB/s vs. GPU 86.4 GB/s
- Memory size and type: CPU 2 GB DDR2 vs. GPU 768 MB GDDR3
GPUs:
- Speed increasing at a cubed Moore's law rate!
- Ubiquitous and inexpensive
- Functional units for specific graphics-based operations (vertex & pixel shaders)
- Small memory, but high raw computational power
- Memory bandwidth & clock provide superior performance
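As a sanity check, the GPU bandwidth entry follows directly from the bus width and the double-data-rate memory clock listed above (ignoring protocol overhead):

\[ \frac{384\,\text{bits}}{8\,\text{bits/byte}} \times (2 \times 900\,\text{MHz}) = 48\,\text{bytes} \times 1800\,\text{MT/s} = 86.4\,\text{GB/s} \]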
GPU implementation
- The implementation is crucial.
- The programming model is unusual:
  - Programming idioms tied to computer graphics.
  - Programming environment tightly constrained.
- Can't simply port CPU code:
  - Poorly suited to sequential, "pointer-chasing" code.
  - Missing support for some basic functionality (e.g., integers, bitwise operations).
- Underlying architectures are:
  - Inherently parallel
  - Rapidly evolving (even in basic feature set!)
  - Largely secret
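As a concrete example of the data-parallel style, the sketch below shows a one-thread-per-pixel CUDA kernel for the RGB to L*a*b* conversion used in this pipeline. It is an illustration only: the work described in this deck targeted shader-era GPUs rather than CUDA, the sRGB gamma decoding step is omitted for brevity, and all names are ours.

```cuda
// Sketch: per-pixel RGB -> CIE L*a*b* conversion as a CUDA kernel.
#include <cuda_runtime.h>
#include <math.h>

__device__ float f_lab(float t) {
    // CIE L*a*b* helper: cube root above a small threshold, linear below it.
    return (t > 0.008856f) ? cbrtf(t) : (7.787f * t + 16.0f / 116.0f);
}

__global__ void rgb_to_lab(const unsigned char* rgb, float* lab, int n_pixels) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pixels) return;

    // Normalize 8-bit channels to [0,1]; gamma decoding omitted for brevity.
    float r = rgb[3 * i + 0] / 255.0f;
    float g = rgb[3 * i + 1] / 255.0f;
    float b = rgb[3 * i + 2] / 255.0f;

    // Linear RGB -> XYZ (D65), normalized by the reference white point.
    float x = (0.4124f * r + 0.3576f * g + 0.1805f * b) / 0.9505f;
    float y = (0.2126f * r + 0.7152f * g + 0.0722f * b);
    float z = (0.0193f * r + 0.1192f * g + 0.9505f * b) / 1.0890f;

    float fx = f_lab(x), fy = f_lab(y), fz = f_lab(z);
    lab[3 * i + 0] = 116.0f * fy - 16.0f;   // L*
    lab[3 * i + 1] = 500.0f * (fx - fy);    // a*
    lab[3 * i + 2] = 200.0f * (fz - fy);    // b*
}
```

A launch such as rgb_to_lab<<<(n_pixels + 255) / 256, 256>>>(d_rgb, d_lab, n_pixels) processes every pixel independently, which is exactly the access pattern GPUs favour.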
Computational savings on GPUs
[Table: execution times (in msec) for a 1K x 1K image tile on CPU (Matlab), CPU (C++), and GPU, for the L*a*b* conversion, statistical features, LBP, and the total.]
Processing a relatively small whole-slide image of 50K x 50K pixels takes:
- 47 sec on the GPU
- 35 min on the CPU
Speedups:
- RGB to L*a*b* conversion: C++ vs. Matlab 5.9x - 5.2x; GPU vs. C++ 69.2x - …; GPU vs. Matlab 406.1x - …
- Statistical features: C++ vs. Matlab 122.2x - …; GPU vs. C++ 0.2x - 2.1x; GPU vs. Matlab 21.8x - …
- LBP operator: C++ vs. Matlab 8.3x - 3.9x; GPU vs. C++ 4.2x - …; GPU vs. Matlab 34.6x - …
- Total: C++ vs. Matlab 13.3x - 7.6x; GPU vs. C++ 2.6x - …; GPU vs. Matlab 33.4x - …
Performance gain depends on image resolution, varying from 128x128 to 1024x1024.
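The LBP operator benchmarked above is also naturally data-parallel; below is a sketch of a basic 8-neighbour, 3x3 LBP kernel with one thread per interior pixel. The neighbour ordering and grayscale input are our own assumptions, not the implementation that was timed here.

```cuda
// Sketch: 3x3 local binary pattern (LBP) codes, one thread per interior pixel.
#include <cuda_runtime.h>

__global__ void lbp8(const unsigned char* gray, unsigned char* codes,
                     int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    unsigned char c = gray[y * width + x];
    // Fixed clockwise neighbour order starting at the top-left pixel.
    int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};

    unsigned char code = 0;
    for (int k = 0; k < 8; ++k) {
        unsigned char n = gray[(y + dy[k]) * width + (x + dx[k])];
        code |= (n >= c) << k;   // set bit k if the neighbour is >= the centre
    }
    codes[y * width + x] = code;
}
```

Because each output depends only on a 3x3 neighbourhood, the operator maps directly onto independent threads, which is what makes the GPU speedups in the table possible.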
Verification of the output values
Differences in the feature values across hardware platforms (mean and standard deviation), computed from 500 training images:
- CPU (Matlab) / CPU (C++): 1.4
- CPU (C++) / GPU: 6.5
- CPU (Matlab) / GPU: 1.5
There is no variation in the classification accuracy when using the feature values computed on the GPU.
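The comparison summarized above can be reproduced in a few lines: take the feature vectors produced by two platforms, compute the element-wise differences, and report their mean and standard deviation. A minimal host-side sketch, with hypothetical inputs:

```cuda
// Sketch: mean and standard deviation of element-wise differences between
// feature values computed on two platforms (e.g., CPU vs. GPU).
#include <cmath>
#include <cstdio>
#include <vector>

void report_difference(const std::vector<float>& cpu_vals,
                       const std::vector<float>& gpu_vals) {
    const size_t n = cpu_vals.size();
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = std::fabs(cpu_vals[i] - gpu_vals[i]);
        sum    += d;
        sum_sq += d * d;
    }
    double mean = sum / n;
    double var  = sum_sq / n - mean * mean;   // population variance
    std::printf("mean diff = %g, std dev = %g\n",
                mean, std::sqrt(var > 0.0 ? var : 0.0));
}
```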
Future directions & Conclusions
- Processing of whole-slide images is essential to overcome the sampling bias problem.
- We need HPC tools because of the huge sizes of whole-slide images and the sophisticated image analysis algorithms.
- The processing time can be reduced drastically using different infrastructures.
- We are investigating novel ways of processing whole-slide images over various computational infrastructures, e.g., clusters of GPUs.
- One drawback of GPUs is their low-level programmability:
  - Requires good knowledge of the architecture.
  - The architecture changes rapidly.
  - However, higher-level development tools are emerging (e.g., CUDA by NVIDIA).
Thanks for your attention.
Any questions?