K. Sándor, M. Kozlovszky, V. Kamarás, L. Ficsór, S. V. Varga, B. Molnár HPCS 2008 April 14, 2008, Ottawa, Canada.

Presentation transcript:


1970 Kandó Polytechnic of Electrical Engineering
1970 Department of Computing
Budapest Tech (established 2000)
Integration of 3 Polytechnics
John von Neumann Faculty of Informatics (NIK) (2000)
Total number of students in the faculty ~1,000
April 14, 2008 Budapest Tech

• The goal is to speed up a linear image registration code by using the Cell architecture.
• Histological, cytological and fluorescent slides MB for each slide.
• 1 object consists of slides.
• "Registration": the process of transforming input images into one coordinate system.
• 2D → 3D image reconstruction
• Input slides (tissue slices): situated in the picture at different positions and different angles, significantly strained at random parts, and possibly disordered during the digital acquisition
• Already implemented algorithm: Coarse Mutual Adjustment
• Windows platform, sequential task
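To make "transforming input images into one coordinate system" concrete, here is a minimal sketch of mapping one point with a rotation about a center followed by a shift. It assumes a rigid transform (rotation plus translation), which is only one kind of linear registration; the function name `rigid_transform` and its parameters are illustrative and do not come from the presented code.

```python
import math

def rigid_transform(point, center, angle_deg, shift):
    """Rotate `point` about `center` by `angle_deg`, then translate by `shift`.

    Hypothetical helper: the slides do not show the actual transform code.
    """
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    x = center[0] + dx * math.cos(theta) - dy * math.sin(theta) + shift[0]
    y = center[1] + dx * math.sin(theta) + dy * math.cos(theta) + shift[1]
    return (x, y)
```

Applying the same transform to every pixel coordinate of a slice places that slice in the reference slice's coordinate system.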

• Input: bitmap images
• Calculation of center-of-mass
• Image pre-processing (mask creation)
• First search – approximate slew
• Second search – slew
• Output: center-of-mass coordinates, slew
[Diagram labels: Threshold, Open, Median filter, Rotation, Comparison]
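The steps above can be sketched roughly as follows. This is a simplified illustration, not the presented implementation: it assumes tissue is darker than the background, skips the Open and Median filter steps, assumes the two masks already share a center of mass, and uses a pixel-mismatch count as the comparison. All function names and the step sizes are hypothetical.

```python
import math

def threshold(image, level):
    """Binary mask: 1 where a pixel is darker than `level` (assumed tissue)."""
    return [[1 if px < level else 0 for px in row] for row in image]

def center_of_mass(mask):
    """Mean (x, y) position of the set pixels in a binary mask."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        raise ValueError("empty mask")
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def rotate_mask(mask, center, angle_deg):
    """Rotate a mask about `center` with nearest-neighbour inverse mapping."""
    h, w = len(mask), len(mask[0])
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - center[0], y - center[1]
            # Inverse rotation: find the source pixel for this output pixel.
            sx = round(center[0] + dx * c + dy * s)
            sy = round(center[1] - dx * s + dy * c)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = mask[sy][sx]
    return out

def mismatch(a, b):
    """Number of pixels at which two equally sized masks differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def best_angle(reference, moving, angles):
    """Angle from `angles` that makes the rotated `moving` mask fit best."""
    center = center_of_mass(moving)
    return min(angles,
               key=lambda a: mismatch(reference, rotate_mask(moving, center, a)))

def coarse_to_fine(reference, moving, coarse_step=10, fine_step=1):
    """First search on a coarse angle grid, second search around its result."""
    coarse = best_angle(reference, moving, range(0, 360, coarse_step))
    candidates = sorted(
        range(coarse - coarse_step + 1, coarse + coarse_step, fine_step),
        key=lambda a: abs(a - coarse))  # ties resolve toward the coarse angle
    return best_angle(reference, moving, candidates) % 360
```

The two-stage search keeps the number of rotate-and-compare evaluations small: with the default steps it tries 36 coarse angles and 19 fine ones instead of 360.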

• Code adaptation to the Cell SDK 2.1 (following the original source code as much as possible)
• Code parallelization to the dual-threaded PPE (identifying concurrent tasks)
• Offloading concurrent tasks to SPEs (utilizing parallelization)

• Code adaptation to the Cell SDK
% of total time to adapt the code to the Cell SDK
- analysis of original software code
- analysis and search for appropriate substitute libraries (IPL -> OpenCV)
- implementation of missing functions (image I/O, 1bpp image operations)
- re-design of class structures

• Code parallelization to the dual-threaded PPE
10% of total time to parallelize the code to the dual-threaded PPE
- strongly modular source code
- standard C++ functions supported
- almost no additional data transfer related implementation
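As an illustration of running two concurrent tasks on a dual-threaded core, the sketch below builds the two masks of a pair in parallel. Treating the two masks as the independent tasks is an assumption; the slides do not say which tasks were identified as concurrent. `make_mask` and `preprocess_pair` are hypothetical names, and Python threads are used only to show the structure, since they do not speed up CPU-bound work the way native threads on the PPE would.

```python
from concurrent.futures import ThreadPoolExecutor

def make_mask(image, level=128):
    """Binary mask: 1 where a pixel is darker than `level`."""
    return [[1 if px < level else 0 for px in row] for row in image]

def preprocess_pair(image_a, image_b):
    """Build both masks of a pair concurrently, one task per hardware thread."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(make_mask, image_a)
        future_b = pool.submit(make_mask, image_b)
        # Both tasks share the same address space, so no explicit
        # data transfer is needed to collect the results.
        return future_a.result(), future_b.result()
```

Because both threads see the same memory, this step needs almost no data-transfer code, which matches the low effort reported for it.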

• Offloading concurrent tasks to SPEs
50% of total time to offload concurrent tasks to the SPEs
- offload strategy, design
- SPE-specific instructions ('intrinsics')
- further substantial function development
- implementation of data transfer mechanism
- debugging
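One way to picture the offload strategy and the data-transfer mechanism is to cut an image into row blocks small enough for one transfer into an SPE's local store and to process the blocks independently. The sketch below is a host-side analogy only: worker threads stand in for SPEs, the 16 KB default budget mirrors the maximum size of a single Cell DMA transfer, one byte per pixel is assumed, and `row_blocks` and `offload` are hypothetical names rather than functions from the presented code.

```python
from concurrent.futures import ThreadPoolExecutor

def row_blocks(height, row_bytes, budget_bytes):
    """Yield (start, stop) row ranges whose pixel data fits in `budget_bytes`."""
    rows_per_block = budget_bytes // row_bytes
    if rows_per_block < 1:
        raise ValueError("a single row exceeds the budget")
    for start in range(0, height, rows_per_block):
        yield (start, min(start + rows_per_block, height))

def offload(image, kernel, budget_bytes=16 * 1024, workers=8):
    """Apply `kernel` to independent row blocks and merge the results in order.

    `kernel` takes a list of rows and returns a list of processed rows.
    """
    row_bytes = len(image[0])  # assumption: 1 byte per pixel
    blocks = [image[a:b]
              for a, b in row_blocks(len(image), row_bytes, budget_bytes)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(kernel, blocks)  # map preserves block order
        return [row for block in results for row in block]
```

This only works as written for per-pixel kernels such as thresholding. Neighbourhood operations such as a median filter would also need the rows bordering each block.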


• Overall runtime results per mask pair
- ORIG – original code using IPL (~3.35 s) (sequential procedures, utilizing SIMD instructions)
- LIN – sequential code ported to the SDK 2.1, Linux (~6.85 s)
- DT – dual-threaded parallelized code (~4.1 s)
- FT – fully threaded code on the Cell Broadband Engine (~1 s)
[Chart annotations: ~2x, ~3x, >2x]
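The ratios between the reported runtimes can be worked out directly. The sketch below uses only the approximate figures listed above, so the results are approximate as well; the `speedup` helper is illustrative.

```python
# Approximate runtimes per mask pair, in seconds, as reported on the slide.
runtimes = {"ORIG": 3.35, "LIN": 6.85, "DT": 4.1, "FT": 1.0}

def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return runtimes[before] / runtimes[after]

if __name__ == "__main__":
    for before, after in [("LIN", "DT"), ("DT", "FT"),
                          ("LIN", "FT"), ("ORIG", "FT")]:
        print(f"{before} -> {after}: {speedup(before, after):.2f}x")
```

With these figures, the fully threaded version is about 3.35 times faster than the original IPL-based code and about 6.85 times faster than the plain sequential port.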

• CELL Blade QS20 hosts the project's website (
◦ Off-line Demo illustrating the outputs of the ported application
◦ On-line Demo that gives results on the fly
◦ Animated Demo illustrating the infrastructure of the application being developed

IBM US Faculty Award: 'Development of a microscopy application for fast 3D image modeling and reconstruction'
Dr. Dezső Sima DSc, John von Neumann Faculty of Informatics
László Kiss Kollár, IBM Global Engineering Solutions
Balázs Molnár, John von Neumann Faculty of Informatics, Biotech Group

Thank you!