Published by Collin Pierce. Modified over 8 years ago.
1
K. Sándor, M. Kozlovszky, V. Kamarás, L. Ficsór, S. V. Varga, B. Molnár
HPCS 2008, April 14, 2008, Ottawa, Canada
2
- 1970: Kandó Polytechnic of Electrical Engineering
- 1970: Department of Computing
- Budapest Tech (established 2000): integration of 3 polytechnics
- John von Neumann Faculty of Informatics (NIK) (2000)
- Total number of students in the faculty: ~1,000
April 14, 2008, Budapest Tech
3
The goal is to speed up a linear image registration code by using the Cell architecture.
- Input data: histological, cytological and fluorescent slides, 100-150 MB per slide; one object consists of 100-300 slides.
- “Registration”: the process of transforming input images into one coordinate system (2D to 3D image reconstruction).
- Input slides (tissue slices) are situated at different positions and angles in the picture, are significantly strained at random parts, and might be disordered during digital acquisition.
- Already implemented algorithm: Coarse Mutual Adjustment (Windows platform, sequential task).
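As an illustration of what "transforming into one coordinate system" can mean for a single point, here is a minimal sketch of a rigid transform (rotation about a center, then translation). The function name and parameters are hypothetical; the slides do not say which transform model the Coarse Mutual Adjustment algorithm actually uses.

```python
import math

def rigid_transform(point, angle_deg, center, shift):
    """Rotate `point` about `center` by `angle_deg`, then translate by `shift`.

    Uses the standard rotation matrix (counter-clockwise when the y axis
    points up). Hypothetical helper, not taken from the original code.
    """
    a = math.radians(angle_deg)
    x, y = point[0] - center[0], point[1] - center[1]
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return (xr + center[0] + shift[0], yr + center[1] + shift[1])
```

Registering a slide then amounts to finding the angle and shift that bring it into the coordinate system of its neighbour.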
4
- Input: bitmap images
- Calculation of center-of-mass
- Image pre-processing (mask creation)
- First search: approximate slew
- Second search: slew
- Output: center-of-mass coordinates, slew
Operations labelled in the diagram: threshold, open, median filter, rotation, comparison
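The center-of-mass and two-stage search steps can be sketched as follows. This is an assumption-laden illustration, not the authors' code: it treats "slew" as a rotation angle, uses pixel overlap as the comparison metric, and rotates with nearest-neighbour sampling. All function names and the step sizes are hypothetical.

```python
import math

def center_of_mass(mask):
    """Return the (x, y) centroid of the nonzero pixels of a 2D 0/1 mask."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        raise ValueError("empty mask")
    return sum_x / total, sum_y / total

def rotate_mask(mask, angle_deg, center):
    """Rotate a mask about `center` using nearest-neighbour inverse mapping."""
    h, w = len(mask), len(mask[0])
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - center[0], y - center[1]
            # Look up the source pixel that lands on (x, y).
            src_x = round(c * dx + s * dy + center[0])
            src_y = round(-s * dx + c * dy + center[1])
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = mask[src_y][src_x]
    return out

def overlap(a, b):
    """Comparison metric (assumed): number of pixels set in both masks."""
    return sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa and pb)

def best_angle(fixed, moving, center, angles):
    """Return the first candidate angle with the highest overlap."""
    return max(angles, key=lambda ang: overlap(fixed, rotate_mask(moving, ang, center)))

def two_stage_search(fixed, moving, center, coarse_step=10, fine_step=1):
    """First search: coarse angle grid. Second search: fine grid around the coarse hit."""
    coarse = best_angle(fixed, moving, center, range(0, 360, coarse_step))
    fine = range(coarse - coarse_step, coarse + coarse_step + 1, fine_step)
    return best_angle(fixed, moving, center, fine) % 360
```

The coarse pass evaluates 36 candidate angles and the fine pass 21, instead of 360 in a single exhaustive pass at 1-degree resolution, which is the usual motivation for an approximate search followed by a refined one.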
5
- Code adaptation to the Cell SDK 2.1 (following the original source code as much as possible)
- Code parallelization to the dual-threaded PPE (identifying concurrent tasks)
- Offloading concurrent tasks to SPEs (utilizing parallelization)
6
Code adaptation to the Cell SDK 2.1
40% of total time was spent adapting the code to the Cell SDK 2.1:
- analysis of the original software code
- analysis and search for appropriate substantial libraries (IPL -> OpenCV)
- implementation of missing functions (image I/O, 1bpp image operations)
- re-design of class structures
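To give a feel for what "1bpp image operations" involve, here is a minimal sketch of packing a mask row into bits and operating on the packed form. The helper names are hypothetical, and the most-significant-bit-first layout is an assumption (it matches the 1bpp BMP convention); the slides do not describe the functions that were actually implemented.

```python
def pack_row(bits):
    """Pack a sequence of 0/1 pixels into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

def unpack_row(packed, width):
    """Recover `width` 0/1 pixels from a packed row."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(width)]

def and_rows(a, b):
    """Pixel-wise AND of two packed rows of equal length."""
    return bytes(x & y for x, y in zip(a, b))

def count_set(packed):
    """Number of set pixels in a packed row (padding bits are zero)."""
    return sum(bin(byte).count("1") for byte in packed)
```

Working on packed rows means one byte-wide operation handles eight mask pixels, which is why 1bpp masks are attractive for comparison steps.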
7
Code parallelization to the dual-threaded PPE
10% of total time was spent parallelizing the code for the dual-threaded PPE:
- strongly modular source code
- standard C++ functions supported
- almost no additional data-transfer-related implementation
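The structure of a two-thread split can be sketched as below: two independent tasks (here, thresholding the two images of a pair) are submitted to a pool of two workers. Which tasks the authors actually ran concurrently is not stated, so the task choice, function names and the threshold level are assumptions. The sketch is in Python to show the structure only; CPython threads will not reproduce the speedup of native threads on the PPE.

```python
from concurrent.futures import ThreadPoolExecutor

def threshold(image, level):
    """Turn a grayscale image (list of rows) into a 0/1 mask."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in image]

def preprocess_pair(image_a, image_b, level=128):
    """Run the two independent pre-processing tasks on two worker threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(threshold, image_a, level)
        future_b = pool.submit(threshold, image_b, level)
        return future_a.result(), future_b.result()
```

Because both tasks read their own input and share nothing, no extra data-transfer code is needed, which is consistent with the small share of effort reported for this step.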
8
Offloading concurrent tasks to SPEs
50% of total time was spent offloading concurrent tasks to the SPEs:
- offload strategy, design
- SPE-specific instructions (‘intrinsics’)
- further substantial function development
- implementation of data transfer mechanism
- debugging
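One reason the data transfer mechanism takes effort is that each SPE works out of a 256 KB local store, so a 100-150 MB slide has to be processed in pieces. The following is a hedged sketch of one possible partitioning scheme, written in Python rather than with SPE intrinsics: the image is cut into row blocks that fit a byte budget, and the blocks are dealt out round-robin. The function names, the budget and the worker count are hypothetical and do not describe the authors' offload strategy.

```python
def row_chunks(height, row_bytes, budget_bytes):
    """Yield (first_row, row_count) blocks whose size fits in `budget_bytes`."""
    rows_per_chunk = budget_bytes // row_bytes
    if rows_per_chunk == 0:
        raise ValueError("a single row does not fit in the budget")
    for start in range(0, height, rows_per_chunk):
        yield start, min(rows_per_chunk, height - start)

def assign_round_robin(chunks, workers):
    """Distribute chunks over `workers` work lists in round-robin order."""
    plan = [[] for _ in range(workers)]
    for index, chunk in enumerate(chunks):
        plan[index % workers].append(chunk)
    return plan
```

For example, `assign_round_robin(row_chunks(10, 1024, 4096), 2)` splits ten 1 KB rows into blocks of at most four rows and alternates them between two workers.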
9
10
Overall runtime results per mask pair
- ORIG: original code using IPL (~3.35 s); sequential procedures, utilizing SIMD instructions
- LIN: sequential code ported to the SDK 2.1, Linux (~6.85 s)
- DT: dual-threaded parallelized code (~4.1 s)
- FT: fully threaded code on the Cell Broadband Engine (~1 s)
Ratios annotated on the chart: ~2x, ~3x, >2x
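The ratios between the reported runtimes can be recomputed directly. The runtimes below are the approximate values from the slide; which pair each chart annotation refers to is not recoverable from the transcript, so the pairs chosen here are only illustrative.

```python
# Approximate runtimes per mask pair, in seconds, as reported on the slide.
runtimes = {"ORIG": 3.35, "LIN": 6.85, "DT": 4.1, "FT": 1.0}

def speedup(slow, fast):
    """Ratio of the runtime of variant `slow` to that of variant `fast`."""
    return runtimes[slow] / runtimes[fast]

# Illustrative pairs; the chart's own pairing is an open question.
for slow, fast in [("LIN", "DT"), ("DT", "FT"), ("LIN", "FT"), ("ORIG", "FT")]:
    print(f"{slow} -> {fast}: {speedup(slow, fast):.2f}x")
```

With these inputs the fully threaded version is roughly 3.4 times faster than the original Windows code and roughly 6.9 times faster than the sequential Linux port.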
11
CELL Blade QS20 hosts the projects’ website (http://cell.nik.bmf.hu/)
◦ Off-line Demo illustrating the outputs of the ported application
◦ On-line Demo that gives results on the fly
◦ Animated Demo illustrating the infrastructure of the application being developed
12
IBM US Faculty Award: ‘Development of a microscopy application for fast 3D image modeling and reconstruction’
- Dr. Dezső Sima, DSc, John von Neumann Faculty of Informatics
- László Kiss Kollár, IBM Global Engineering Solutions
- Balázs Molnár, John von Neumann Faculty of Informatics, Biotech Group
13
Thank you!