K. Sándor, M. Kozlovszky, V. Kamarás, L. Ficsór, S. V. Varga, B. Molnár
HPCS 2008, April 14, 2008, Ottawa, Canada
Budapest Tech (established 2000): integration of 3 polytechnics
- Kandó Polytechnic of Electrical Engineering (1970)
- Department of Computing (1970)
- John von Neumann Faculty of Informatics (NIK) (2000)
Total number of students in the faculty: ~1,000
April 14, 2008, Budapest Tech
The goal is to speed up a linear image registration code by using the Cell architecture.
- Histological, cytological and fluorescent slides: … MB for each slide; 1 object consists of … slides.
- "Registration": the process of transforming the input images into one coordinate system (2D → 3D image reconstruction).
- Input slides (tissue slices) appear in the images at different positions and angles, may be significantly strained in random regions, and may be disordered during digital acquisition.
- Already implemented algorithm: Coarse Mutual Adjustment (Windows platform, sequential task).
Input: bitmap images
- Calculation of center-of-mass
- Image pre-processing (mask creation): Threshold, Open, Median filter
- First search: approximate slew
- Second search: slew (Rotation, Comparison)
Output: center-of-mass coordinates, slew
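A minimal C++ sketch of these steps follows, assuming a rigid model in which the moving mask is rotated about its own center of mass and shifted onto the reference mask's center of mass. All names (`Mask`, `threshold`, `median3x3`, `centerOfMass`, `mismatch`, `findSlew`) are hypothetical, the morphological Open step is omitted for brevity, and the threshold direction (dark tissue on a bright background), the comparison metric (number of differing pixels) and the coarse/fine step sizes are illustrative assumptions rather than the original Coarse Mutual Adjustment code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical binary mask: row-major, one byte per pixel (0 = background, 1 = tissue).
struct Mask {
    int w, h;
    std::vector<std::uint8_t> px;
    Mask(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0) {}
    // Pixels outside the image read as background.
    std::uint8_t at(int x, int y) const {
        return (x < 0 || y < 0 || x >= w || y >= h) ? 0 : px[static_cast<std::size_t>(y) * w + x];
    }
    void set(int x, int y, std::uint8_t v) { px[static_cast<std::size_t>(y) * w + x] = v; }
};

// Threshold: assumes dark tissue on a bright background.
Mask threshold(const std::vector<std::uint8_t>& gray, int w, int h, std::uint8_t t) {
    Mask m(w, h);
    for (std::size_t i = 0; i < m.px.size(); ++i) m.px[i] = gray[i] < t ? 1 : 0;
    return m;
}

// 3x3 median filter; on a binary mask the median is a majority vote (>= 5 of 9).
Mask median3x3(const Mask& in) {
    Mask out(in.w, in.h);
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) n += in.at(x + dx, y + dy);
            out.set(x, y, n >= 5 ? 1 : 0);
        }
    return out;
}

// Center of mass (mean x, mean y) of the foreground pixels.
std::pair<double, double> centerOfMass(const Mask& m) {
    double sx = 0, sy = 0;
    long n = 0;
    for (int y = 0; y < m.h; ++y)
        for (int x = 0; x < m.w; ++x)
            if (m.at(x, y)) { sx += x; sy += y; ++n; }
    if (n == 0) return {0.0, 0.0};
    return {sx / n, sy / n};
}

// Comparison: pixels of `ref` that differ from `mov` rotated by `deg` about mov's
// center of mass and shifted onto ref's center of mass (nearest-neighbor sampling).
long mismatch(const Mask& ref, const Mask& mov, double deg) {
    const double kPi = 3.14159265358979323846;
    const std::pair<double, double> rc = centerOfMass(ref), mc = centerOfMass(mov);
    const double rad = deg * kPi / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    long diff = 0;
    for (int y = 0; y < ref.h; ++y)
        for (int x = 0; x < ref.w; ++x) {
            const double dx = x - rc.first, dy = y - rc.second;
            // Inverse rotation R(-deg): which pixel of `mov` lands on (x, y) of `ref`.
            const int sx = static_cast<int>(std::lround(mc.first + c * dx + s * dy));
            const int sy = static_cast<int>(std::lround(mc.second - s * dx + c * dy));
            if (ref.at(x, y) != mov.at(sx, sy)) ++diff;
        }
    return diff;
}

// Two searches: a coarse pass over the full circle (approximate slew),
// then a fine pass within one coarse step of the best coarse angle (slew).
double findSlew(const Mask& ref, const Mask& mov, double coarseStep, double fineStep) {
    auto scan = [&](double from, double to, double step) {
        double best = from;
        long bestDiff = mismatch(ref, mov, from);
        for (double a = from + step; a <= to; a += step) {
            const long d = mismatch(ref, mov, a);
            if (d < bestDiff) { bestDiff = d; best = a; }
        }
        return best;
    };
    const double approx = scan(-180.0, 180.0 - coarseStep, coarseStep);
    return scan(approx - coarseStep, approx + coarseStep, fineStep);
}
```

For example, `findSlew(ref, mov, 10.0, 1.0)` evaluates 36 coarse angles and then 21 fine angles around the best one, instead of 360 one-degree steps. Each `mismatch` call is independent of the others, so the angle loop is the kind of work that can be split across threads.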
- Code adaptation to the Cell SDK 2.1 (following the original source code as much as possible)
- Code parallelization to the dual-threaded PPE (identifying concurrent tasks)
- Offloading concurrent tasks to SPEs (utilizing parallelization)
Code adaptation to the Cell SDK
… % of total time to adapt the code to the Cell SDK:
- analysis of the original software code
- analysis and search for appropriate substitute libraries (IPL -> OpenCV)
- implementation of missing functions (image I/O, 1bpp image operations)
- re-design of class structures
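The 1bpp image operations mentioned above can be sketched as a small packed-bit class. This is a hypothetical illustration, not the project's code: the class name `Bitmap1bpp` and its methods are invented here, and the layout assumes the BMP convention for 1-bit images (most significant bit is the leftmost pixel, each row padded to a multiple of 4 bytes).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed 1-bpp image using a BMP-style row layout:
// MSB of each byte is the leftmost pixel, rows padded to a 4-byte multiple.
class Bitmap1bpp {
public:
    Bitmap1bpp(int width, int height)
        : w_(width), h_(height), stride_(((width + 31) / 32) * 4),
          data_(static_cast<std::size_t>(stride_) * height, 0) {}

    int width() const { return w_; }
    int height() const { return h_; }
    int stride() const { return stride_; }  // bytes per row, including padding

    bool get(int x, int y) const {
        return ((data_[index(x, y)] >> (7 - (x & 7))) & 1u) != 0;
    }
    void set(int x, int y, bool on) {
        const std::uint8_t bit = static_cast<std::uint8_t>(0x80u >> (x & 7));
        if (on) data_[index(x, y)] |= bit;
        else    data_[index(x, y)] &= static_cast<std::uint8_t>(~bit);
    }
    // Number of foreground pixels; padding bits stay 0 because set() never touches them.
    long count() const {
        long n = 0;
        for (std::uint8_t b : data_)
            for (; b != 0; b &= static_cast<std::uint8_t>(b - 1)) ++n;  // clear lowest set bit
        return n;
    }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * stride_ + static_cast<std::size_t>(x >> 3);
    }
    int w_, h_, stride_;
    std::vector<std::uint8_t> data_;
};
```

Packing 8 pixels per byte keeps a mask 8 times smaller than a byte-per-pixel buffer, at the cost of bit shifting on every access.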
Code parallelization to the dual-threaded PPE
10% of total time to parallelize the code to the dual-threaded PPE:
- strongly modular source code
- standard C++ functions supported
- almost no additional data-transfer-related implementation
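A minimal sketch of splitting work across the PPE's two hardware threads, assuming the two images of a mask pair are processed independently. The names `prepareMask` and `prepareMaskPair` are hypothetical, the pre-processing is reduced to a threshold, and `std::thread` (C++11) is used as a modern stand-in; the 2008 port on Cell SDK 2.1 would not have had it and may well have used POSIX threads instead.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-image pre-processing (threshold, open, median).
// Only a threshold is applied here to keep the sketch short.
void prepareMask(const std::vector<int>& gray, std::vector<int>& mask, int threshold) {
    mask.assign(gray.size(), 0);
    for (std::size_t i = 0; i < gray.size(); ++i) mask[i] = gray[i] < threshold ? 1 : 0;
}

// The two images share no data, so one is processed on a second thread while the
// calling thread handles the other; join() waits until both masks are ready.
void prepareMaskPair(const std::vector<int>& grayA, const std::vector<int>& grayB,
                     std::vector<int>& maskA, std::vector<int>& maskB, int threshold) {
    std::thread worker(prepareMask, std::cref(grayA), std::ref(maskA), threshold);
    prepareMask(grayB, maskB, threshold);
    worker.join();
}
```

Because each thread writes only its own output vector, no locking is needed, which matches the slide's point that almost no additional data-transfer code was required at this stage.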
Offloading concurrent tasks to SPEs
50% of total time to offload concurrent tasks to the SPEs:
- offload strategy and design
- SPE-specific instructions ("intrinsics")
- further development of substitute functions
- implementation of the data transfer mechanism
- debugging
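The data transfer mechanism has to respect the SPE's DMA rules: a single MFC DMA transfer moves at most 16 KB, sizes must be 1, 2, 4, 8 bytes or a multiple of 16 bytes, and addresses must be at least 16-byte aligned. The sketch below only plans such transfers on the host side; it does not call libspe2 or any MFC intrinsic, and the names `DmaChunk`, `paddedStride` and `planTransfers`, as well as the row-band partitioning, are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One planned transfer: byte offset into the image buffer, its length, and the target SPE.
struct DmaChunk {
    std::size_t offset;
    std::size_t size;
    int spe;
};

constexpr std::size_t kMaxDma = 16 * 1024;  // maximum bytes per MFC DMA request
constexpr std::size_t kAlign = 16;          // minimum (quadword) alignment

// Round a row length up to a multiple of 16 bytes so every row starts aligned.
std::size_t paddedStride(std::size_t rowBytes) {
    return (rowBytes + kAlign - 1) / kAlign * kAlign;
}

// Split `rows` rows of `stride` bytes into contiguous bands, one per SPE,
// then cut each band into chunks of at most kMaxDma bytes.
std::vector<DmaChunk> planTransfers(std::size_t rows, std::size_t stride, int numSpes) {
    std::vector<DmaChunk> plan;
    const std::size_t spes = static_cast<std::size_t>(numSpes);
    const std::size_t perSpe = (rows + spes - 1) / spes;  // ceiling division
    for (std::size_t s = 0; s < spes; ++s) {
        const std::size_t first = std::min(rows, s * perSpe);
        const std::size_t last = std::min(rows, first + perSpe);
        std::size_t off = first * stride;
        const std::size_t end = last * stride;
        while (off < end) {
            const std::size_t n = std::min(kMaxDma, end - off);  // stays a multiple of 16
            plan.push_back({off, n, static_cast<int>(s)});
            off += n;
        }
    }
    return plan;
}
```

With a 16-byte-padded stride, every band boundary and every chunk size is a multiple of 16, so each planned request satisfies the alignment and size rules without special cases.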
Overall runtime results per mask pair:
- ORIG: original code using IPL (~3.35 s; sequential procedures, utilizing SIMD instructions)
- LIN: sequential code ported to SDK 2.1, Linux (~6.85 s)
- DT: dual-threaded parallelized code (~4.1 s)
- FT: fully threaded code on the Cell Broadband Engine (~1 s)
Marked ratios: ~2x, ~3x, >2x
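The extracted text does not say which pair of variants each marked ratio refers to, so the ratios implied by the reported runtimes are computed directly below. The struct and function names are hypothetical; the times are the approximate values listed above.

```cpp
#include <string>

// Reported per-mask-pair runtimes in seconds (approximate values from the slide).
struct Variant {
    std::string name;
    double seconds;
};

const Variant kOrig{"ORIG", 3.35}, kLin{"LIN", 6.85}, kDt{"DT", 4.1}, kFt{"FT", 1.0};

// How many times faster `to` runs than `from` (> 1 means `to` is faster).
double speedup(const Variant& from, const Variant& to) {
    return from.seconds / to.seconds;
}
```

From these values: FT is about 3.35 times faster than ORIG, LIN is about 2.04 times slower than ORIG, DT is about 1.67 times faster than LIN, and FT is about 4.1 times faster than DT.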
A Cell Blade QS20 hosts the project's website:
◦ Off-line demo illustrating the outputs of the ported application
◦ On-line demo that gives results on the fly
◦ Animated demo illustrating the infrastructure of the application being developed
IBM US Faculty Award: "Development of a microscopy application for fast 3D image modeling and reconstruction"
- Dr. Dezső Sima DSc, John von Neumann Faculty of Informatics
- László Kiss Kollár, IBM Global Engineering Solutions
- Balázs Molnár, John von Neumann Faculty of Informatics, Biotech Group
Thank you!