Image Processing Using Cilk 1 Parallel Processing – Final Project Image Processing Using Cilk Tomer Y & Tuval A (pp25)



Project Goals
• Global motion estimation using the Full Search Block Matching Algorithm for motion vector detection
• Multithreaded parallel programming with Cilk

Cilk Description
Cilk is a language for multithreaded parallel programming based on ANSI C. It is designed for general-purpose parallel programming, but it is especially effective for exploiting dynamic, highly asynchronous parallelism, which can be difficult to write in data-parallel or message-passing style. Unlike many other multithreaded programming systems, Cilk is algorithmic, in that the runtime system employs a scheduler that allows the performance of programs to be estimated accurately based on abstract complexity measures.

Introduction to Cilk
The philosophy behind Cilk is that a programmer should concentrate on structuring the program to expose parallelism and exploit locality, leaving Cilk's runtime system with the responsibility of scheduling the computation to run efficiently on a given platform. Thus, the Cilk runtime system takes care of details like load balancing, paging, and communication protocols. Unlike other multithreaded languages, however, Cilk is algorithmic in that the runtime system guarantees efficient and predictable performance.

Figure: a serial C program to compute the nth Fibonacci number.
Figure: a parallel Cilk program to compute the nth Fibonacci number.
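The two listings named on this slide are not preserved in the transcript. As a stand-in, here is a minimal sketch of the Fibonacci example written in Cilk-5 style, assuming the keywords `cilk`, `spawn`, and `sync`. The three `#define` lines are an addition for illustration only: they erase the keywords so the text compiles with an ordinary C compiler as the serial version of the same program. The function name `fib_serial` is hypothetical.

```c
/* Assumption for illustration: erase the Cilk-5 keywords so this file
   compiles as plain serial C. Remove these three lines when building
   with the Cilk compiler. */
#define cilk
#define spawn
#define sync

/* Serial C version: ordinary recursion. */
int fib_serial(int n)
{
    if (n < 2)
        return n;
    return fib_serial(n - 1) + fib_serial(n - 2);
}

/* Cilk-style version: each recursive call is spawned and may run in
   parallel with its caller; sync waits for both spawned children
   before their results are added. */
cilk int fib(int n)
{
    if (n < 2)
        return n;
    else {
        int x, y;
        x = spawn fib(n - 1);
        y = spawn fib(n - 2);
        sync;
        return x + y;
    }
}
```

With the keywords erased, both functions compute the same values; for example, `fib(30)` and `fib_serial(30)` both return 832040.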

Compiling and running Cilk programs
To produce the fib executable, type the command:
> cilk -O2 fib.cilk -o fib
To run the program, type:
> fib --nproc 4 30
This starts fib on 4 processors to compute the 30th Fibonacci number. At the end of the execution, you should see a printout similar to the following:
Result:

Compiling and running Cilk programs: collecting performance information
The Cilk runtime system collects performance information when a program is compiled with the flags -cilk-profile and -cilk-critical-path:
> cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib
A Cilk program compiled with profiling support can be instructed to print performance information by using the --stats option.

Compiling and running Cilk programs: collecting performance information (cont.)
The command line
> fib --nproc 4 --stats 1 30
yields an output similar to the following:
Result:
RUNTIME SYSTEM STATISTICS:
Wall-clock running time on 4 processors: s
Total work = s
Total work (accumulated) = s
Critical path = us
Parallelism =
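As a reading aid for these statistics: in Cilk's performance model the work is the running time on one processor and the critical path is the running time on an unlimited number of processors. Using the symbols $T_1$, $T_\infty$, and $T_P$ for the work, the critical path, and the running time on $P$ processors (notation assumed here, not taken from the slides), the parallelism figure and the usual bounds can be written as:

```latex
\text{Parallelism} = \frac{T_1}{T_\infty},
\qquad
T_P \ge \max\!\left(\frac{T_1}{P},\; T_\infty\right),
\qquad
T_P \le \frac{T_1}{P} + O(T_\infty) \quad \text{(in expectation)}
```

The last bound is the one usually quoted for Cilk's work-stealing scheduler. It suggests near-linear speedup as long as the number of processors is much smaller than the parallelism.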

Motion Estimation
Importance of motion estimation:
• Effective and quick video signal transmission and storage depend on the video compression process.
• To get a high compression ratio while preserving high image quality, motion vectors are transmitted instead of the image itself.

Local Motion Estimation: FSA (Full Search Block Matching Algorithm)
The two pictures (256*256 pixels), the previous frame and the current frame, show a movement of the camera, in this case 30 pixels right and 18 pixels down.
The motion vector is (30,18).

Our algorithm
The goal of the program is to detect the movement and return the motion vector.
The steps of the program are:
1. Read the two BMP files into two matrices containing the value of every pixel in the frame.
2. Divide the 256*256 image into macroblocks (MBs), each containing 16*16 pixels.

Dividing the image into 16*16 MBs
Each MB = 16*16 pixels.
Pixel position of MB (i, j): (i*16, j*16)
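The mapping between MB indices and pixel positions can be sketched as follows. This is a minimal illustration, assuming that (i*16, j*16) is the first pixel of MB (i, j); the type `PixelPos` and the function names are hypothetical.

```c
#define FRAME_SIZE  256                      /* frame is 256*256 pixels   */
#define MB_SIZE     16                       /* each MB is 16*16 pixels   */
#define MB_PER_AXIS (FRAME_SIZE / MB_SIZE)   /* 16 MBs along each axis    */

/* Hypothetical helper type: a pixel coordinate in the frame. */
typedef struct {
    int x;
    int y;
} PixelPos;

/* First pixel of MB (i, j), assuming the (i*16, j*16) convention. */
PixelPos mb_origin(int i, int j)
{
    PixelPos p;
    p.x = i * MB_SIZE;
    p.y = j * MB_SIZE;
    return p;
}

/* Index of the MB that contains pixel coordinate c along one axis. */
int mb_index(int c)
{
    return c / MB_SIZE;
}

/* Total number of MBs in the frame: 16 * 16 = 256. */
int mb_count(void)
{
    return MB_PER_AXIS * MB_PER_AXIS;
}
```

For example, MB (15, 15) starts at pixel (240, 240), and pixel coordinate 255 falls in MB index 15.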

The next steps of the program are:
3. Send each MB (i = 0,1,…,15, j = 0,1,…,15) for local motion vector estimation, generating 16*16 = 256 processes.
4. Local motion estimation: assume movements in the x and y directions (-15 <= x_move, y_move <= 15) and calculate the Mean Absolute Error (MAE) between S, the previous frame, and R, the current frame, for each of these movements.
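The MAE formula itself is not preserved in the transcript. The sketch below uses the common block-matching definition: the mean of |R - S| over the 16*16 pixels of the MB, with the block in S displaced by (x_move, y_move). Several details are assumptions rather than the authors' choices: the frames are stored row by row as one byte per pixel, the displacement is applied to the previous frame S, and candidates that would leave the frame are rejected by the hypothetical helper `candidate_in_frame`.

```c
#include <stdlib.h>   /* abs */

#define FRAME_SIZE 256   /* frame is 256*256 pixels */
#define MB_SIZE    16    /* each MB is 16*16 pixels */

/* Returns 1 if the MB at (bx, by), displaced by (x_move, y_move),
   lies completely inside the frame, 0 otherwise. */
int candidate_in_frame(int bx, int by, int x_move, int y_move)
{
    int x0 = bx + x_move;
    int y0 = by + y_move;
    return x0 >= 0 && y0 >= 0 &&
           x0 + MB_SIZE <= FRAME_SIZE && y0 + MB_SIZE <= FRAME_SIZE;
}

/* Mean Absolute Error between the MB at (bx, by) in the current frame R
   and the block displaced by (x_move, y_move) in the previous frame S.
   The caller must first check candidate_in_frame(). */
double mae(const unsigned char *R, const unsigned char *S,
           int bx, int by, int x_move, int y_move)
{
    long sum = 0;
    int dx, dy;

    for (dy = 0; dy < MB_SIZE; dy++) {
        for (dx = 0; dx < MB_SIZE; dx++) {
            int r = R[(by + dy) * FRAME_SIZE + (bx + dx)];
            int s = S[(by + dy + y_move) * FRAME_SIZE + (bx + dx + x_move)];
            sum += abs(r - s);
        }
    }
    return (double)sum / (MB_SIZE * MB_SIZE);
}
```

A displacement that lines the two blocks up exactly gives an MAE of 0, so lower values indicate a better match.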

5. Find the lowest MAE and choose its movement offsets (x_move and y_move) as the local motion vector.
6. Calculate the global motion vector by summing all the local motion vectors and dividing the result by the number of MBs.
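Steps 4 to 6 can be sketched together as a serial reference version. This is an illustration under stated assumptions, not the authors' code: frames are stored row by row as one byte per pixel, the displacement is applied to the previous frame S, candidates that leave the frame are skipped, and ties keep the first candidate found. The names `MotionVector`, `block_mae`, `local_motion`, and `global_motion` are hypothetical.

```c
#include <stdlib.h>   /* abs */
#include <float.h>    /* DBL_MAX */

#define FRAME_SIZE  256
#define MB_SIZE     16
#define MB_PER_AXIS (FRAME_SIZE / MB_SIZE)
#define MAX_MOVE    15   /* -15 <= x_move, y_move <= 15 */

typedef struct {
    int x;
    int y;
} MotionVector;

/* MAE between the MB at (bx, by) in R and the displaced block in S. */
static double block_mae(const unsigned char *R, const unsigned char *S,
                        int bx, int by, int x_move, int y_move)
{
    long sum = 0;
    int dx, dy;
    for (dy = 0; dy < MB_SIZE; dy++)
        for (dx = 0; dx < MB_SIZE; dx++)
            sum += abs((int)R[(by + dy) * FRAME_SIZE + bx + dx] -
                       (int)S[(by + dy + y_move) * FRAME_SIZE + bx + dx + x_move]);
    return (double)sum / (MB_SIZE * MB_SIZE);
}

/* Step 5: full search over all candidates, keeping the lowest MAE. */
MotionVector local_motion(const unsigned char *R, const unsigned char *S,
                          int bx, int by)
{
    MotionVector best = {0, 0};
    double best_mae = DBL_MAX;
    int x_move, y_move;

    for (y_move = -MAX_MOVE; y_move <= MAX_MOVE; y_move++) {
        for (x_move = -MAX_MOVE; x_move <= MAX_MOVE; x_move++) {
            int x0 = bx + x_move, y0 = by + y_move;
            double m;
            /* Assumption: skip candidates that leave the frame. */
            if (x0 < 0 || y0 < 0 ||
                x0 + MB_SIZE > FRAME_SIZE || y0 + MB_SIZE > FRAME_SIZE)
                continue;
            m = block_mae(R, S, bx, by, x_move, y_move);
            if (m < best_mae) {
                best_mae = m;
                best.x = x_move;
                best.y = y_move;
            }
        }
    }
    return best;
}

/* Step 6: average the local motion vectors of all MBs. */
void global_motion(const unsigned char *R, const unsigned char *S,
                   double *gx, double *gy)
{
    long sum_x = 0, sum_y = 0;
    int i, j;

    for (j = 0; j < MB_PER_AXIS; j++) {
        for (i = 0; i < MB_PER_AXIS; i++) {
            MotionVector v = local_motion(R, S, i * MB_SIZE, j * MB_SIZE);
            sum_x += v.x;
            sum_y += v.y;
        }
    }
    *gx = (double)sum_x / (MB_PER_AXIS * MB_PER_AXIS);
    *gy = (double)sum_y / (MB_PER_AXIS * MB_PER_AXIS);
}
```

Because each call to `local_motion` reads the two frames and writes only its own result, the two loops in `global_motion` are the natural place to hand out one MB per process.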

Parallel computing
The parallel algorithm is based on a master-slaves method. Dividing the frame into 16*16 MBs lets us assign one MB to each process. The processes are independent, and no communication between them is needed. Given these facts, we should achieve a significant speedup.
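To put a rough number on the expected speedup, the sketch below computes an idealized upper bound. It assumes that all 256 MB tasks cost the same and that scheduling and master overhead are zero, neither of which is stated on the slides; the function names are hypothetical. Under these assumptions, P processors need ceil(256 / P) rounds of one MB each.

```c
#define MB_COUNT 256   /* 16*16 independent MB tasks */

/* Rounds needed when each of `procs` processors takes one MB per round. */
int rounds_needed(int procs)
{
    return (MB_COUNT + procs - 1) / procs;   /* ceil(MB_COUNT / procs) */
}

/* Idealized speedup over one processor: 256 rounds versus rounds_needed. */
double ideal_speedup(int procs)
{
    return (double)MB_COUNT / rounds_needed(procs);
}
```

With 4 processors the bound is 4, since 256 divides evenly into 64 rounds. With 6 processors 43 rounds are needed, so the bound drops slightly below 6. Beyond 256 processors there is nothing left to distribute, and the bound stays at 256.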