SPMD: Single Program Multiple Data Streams


SPMD: Single Program Multiple Data Streams
Hui Ren, Electrical & Computer Engineering, Tufts University

Background Previously, computing performance was increased through clock speed scaling. However, this also increased power consumption and caused heat dissipation problems at high clock speeds. Parallel computing allows more instructions to complete in a given time through parallel execution. Nowadays, parallel computing has entered mainstream use, following the introduction of multi-core processors.

What is SPMD? SPMD is a mode of parallel computing in which all processors run the same program but operate on different data. SPMD can achieve better computing performance by increasing the number of processors.

SPMD operation mechanism The same program code is loaded onto all of the processors. The data is distributed so that each processor works on its own portion. A barrier is a control signal generated collectively by the processors; it synchronizes their execution at a given point, so that no processor proceeds until all of them have reached the barrier.
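
As a concrete sketch of this mechanism, the following hedged MPI/C fragment loads the same program onto every process, derives rank-dependent data, and synchronizes at a barrier. The array size CHUNK, the file name, and the rank-dependent initialization are made up for illustration; the MPI calls themselves are standard.

/* spmd_mechanism.c - a minimal sketch of the SPMD mechanism using MPI.
   The same executable is launched on every processor; each process
   selects its own portion of the data based on its rank. */
#include <mpi.h>
#include <stdio.h>

#define CHUNK 4                                 /* hypothetical per-process data size */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* how many processes in total? */

    /* Each process works on its own slice of the (conceptual) global data. */
    double local[CHUNK];
    for (int i = 0; i < CHUNK; i++)
        local[i] = rank * CHUNK + i;            /* rank-dependent data */

    /* ... compute on local[] ... */

    /* Barrier: no process continues past this point until all have reached it. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("All %d processes passed the barrier.\n", size);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with, for example, mpirun -np 4, four copies of the same executable run, each operating on its own slice of the data.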

The first example of SPMD: Titanium
Titanium is a Java-based language for writing high-performance scientific applications on large-scale multiprocessors.

public static void main(String[] args) {
    System.out.println("Hello from thread " + Ti.thisProc());
    Ti.barrier();
    if (Ti.thisProc() == 0)
        System.out.println("Done.");
}
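
When run on, say, four processors, the output would look something like the lines below (a hypothetical run: the ordering of the "Hello" lines is nondeterministic, but the barrier guarantees that thread 0 prints "Done." only after every thread has printed its greeting):

Hello from thread 2
Hello from thread 0
Hello from thread 3
Hello from thread 1
Done.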

The first example of SPMD: Titanium Data locality: in this hello-world example each thread touches only its own data, so apart from the barrier there is no communication between processors.

The second example of SPMD: MPI MPI is a standard interface for message-passing parallel programs written in C, C++, or Fortran.

begin program
    x = 0
    z = 2
    b = 7
    if (rank == 0) then
        x = x + 1
        b = x * 3
        send(x)
    else
        receive(y)
        z = b * y
    endif
    f = reduce(SUM, z)
end program
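
A hedged translation of this pseudocode into real MPI calls in C might look like the sketch below. The message tag, the loop that sends x to every other rank, and reducing the sum onto rank 0 are assumptions; the pseudocode does not specify them.

/* spmd_mpi_example.c - a sketch of the slide's pseudocode in MPI/C.
   Rank 0 increments x and sends it; every other rank receives it into y. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    int x = 0, z = 2, b = 7, y = 0, f = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        x = x + 1;                                   /* x becomes 1 */
        b = x * 3;
        for (int dest = 1; dest < size; dest++)      /* send x to the other ranks */
            MPI_Send(&x, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        z = b * y;                                   /* y holds the received value 1 */
    }

    /* Sum the per-process z values; the result lands in f on rank 0. */
    MPI_Reduce(&z, &f, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("f = %d\n", f);

    MPI_Finalize();
    return 0;
}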

The second example of SPMD: MPI From this code we can see that the variable y will be assigned the constant value 1, because of the send of x on rank 0 and the corresponding receive into y on the other ranks. SPMD has a local view of execution: each process sees only its own variables and follows its own branch of the program.

Advantages of SPMD Locality. Data locality is essential to achieving good performance on large-scale machines, where communication across the network is very expensive. Structured parallelism. The set of threads is fixed throughout the computation, which makes it easier for compilers to reason about SPMD code, resulting in more efficient program analyses than in other models. Simple runtime implementation. Because SPMD belongs to the MIMD class, has a local view of execution, and exposes parallelism directly to the user, compilers and runtime systems require less effort to implement than for many other MIMD models.

Disadvantages of SPMD SPMD is a flat model, which makes it difficult to write hierarchical code, such as divide-and-conquer algorithms, as well as programs optimized for hierarchical machines. A second disadvantage is that it can be hard to obtain the desired speedup using SPMD.

Expectation The advantages of SPMD are clear, and SPMD is still in common use on many large-scale machines. Many researchers have worked to improve SPMD, for example recursive SPMD, which provides hierarchical teams. SPMD will therefore remain a good method for parallel computing in the future.

References
1. Kamil, A. A. Single Program, Multiple Data Programming for Hierarchical Computations. Ph.D. dissertation, University of California, Berkeley, 2012.
2. Wierman, A., Andrew, L. L. H., and Tang, A. Power-aware speed scaling in processor sharing systems: Optimality and robustness. Performance Evaluation, 2012, 69(12): 601-622.
3. Pais, M. S., Yamanaka, K., and Pinto, E. R. Rigorous Experimental Performance Analysis of Parallel Evolutionary Algorithms on Multicore Platforms. IEEE Latin America Transactions, 2014, 12(4): 805-811.
4. Cremonesi, P. and Gennaro, C. Integrated performance models for SPMD applications and MIMD architectures. IEEE Transactions on Parallel and Distributed Systems, 2002, 13(12): 1320-1332.
5. Cao, J.-J., Fan, S.-S., and Yang, X. SPMD Performance Analysis with Parallel Computing of Matlab. In Proceedings of the Fifth International Conference on Intelligent Networks and Intelligent Systems (ICINIS), 2012.
6. Lusk, E. MPI in 2002: has it been ten years already? In Proceedings of the 2002 IEEE International Conference on Cluster Computing, 2002.
7. Strout, M. M., Kreaseck, B., and Hovland, P. D. Data-Flow Analysis for MPI Programs. In Proceedings of the 2006 International Conference on Parallel Processing (ICPP), 2006.
8. Numrich, R. W. and Reid, J. Co-array Fortran for parallel programming. ACM SIGPLAN Fortran Forum, 1998, 17(2): 1-31.

Thank you!