Parallel Computing Multiprocessor Systems on Chip: Adv. Computer Arch. for Embedded Systems By Jason Agron.

Laboratory Times? Available lab times… Monday, Wednesday-Friday 8:00 AM to 1:00 PM. We will post the lab times on the WIKI.

What is parallel computing? Parallel Computing (PC) is… computing with multiple, simultaneously-executing resources, usually realized through a computing platform that contains multiple CPUs. Oftentimes implemented as either a centralized parallel computer (multiple CPUs with a local interconnect or bus) or a distributed parallel computer (multiple computers networked together).

Why Parallel Computing? You can save time (execution time): parallel tasks can run concurrently instead of sequentially. You can solve larger problems: more computational resources let you tackle bigger problems. It makes sense: many problem domains are naturally parallelizable. Example - control systems for automobiles, which consist of many independent tasks that require little communication. Serialization of those tasks would cause the system to break down. What if the engine management system waited to execute while you tuned the radio?

Typical Systems Traditionally, parallel computing systems are composed of the following: Individual computers with multiple CPUs. Networks of computers. Combinations of both.

Parallel Computing Systems on Programmable Chips Traditionally, multiprocessor systems were expensive: every processor was an atomic unit that had to be purchased, and the bus structure and interconnect were not flexible. Today… soft-core processors and interconnect can be used, so multiprocessor systems can be “built” from a program. Buy a single FPGA, and X processors can be instantiated, where X is any number of processors that can fit on the target FPGA.

Parallel Programming How does one program a parallel computing system? Traditionally, programs are defined serially: step-by-step, one instruction per step, with no explicitly defined parallelism. Parallel programming involves separating independent sections of code into tasks that are capable of running concurrently; the granularity of tasks is user-definable. GOAL - parallel portions of code can execute concurrently so overall execution time is reduced.

How to describe parallelism?
Data-level (SIMD): Lightweight - programmer/compiler handle this, no OS support needed. EXAMPLE = forAll()
Thread/Task-level (MIMD): Fairly lightweight - little OS support. EXAMPLE = thread_create()
Process-level (MIMD): Heavyweight - a lot of OS support. EXAMPLE = fork()
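As a concrete illustration of the two MIMD levels above, here is a minimal C sketch (not from the original slides) that spawns one process with fork() and one thread with pthread_create(); the program structure and printed messages are illustrative only, and the data-level (SIMD) case is normally handled by the compiler or a library such as OpenMP rather than by hand.

/* Minimal sketch of process-level vs. thread-level parallelism.
   Compile with: gcc levels.c -pthread */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>

/* Thread-level (MIMD): the new thread shares the parent's address space. */
void *thread_body(void *arg) {
    printf("thread-level task running\n");
    return NULL;
}

int main(void) {
    /* Process-level (MIMD): fork() duplicates the whole process image. */
    pid_t pid = fork();
    if (pid == 0) {
        printf("process-level task running\n");
        exit(0);
    }
    waitpid(pid, NULL, 0);

    /* Thread-level: much lighter weight than creating a full process. */
    pthread_t tid;
    pthread_create(&tid, NULL, thread_body, NULL);
    pthread_join(tid, NULL);
    return 0;
}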

Serial Programs A program is decomposed into a series of tasks. Tasks can be fine-grained or coarse-grained. Tasks are made up of instructions. Tasks must be executed sequentially! Total execution time = ∑(Execution Time(Task)). What if the tasks are independent? Why don’t we execute them in parallel?

Parallel Programs Total execution time can be reduced if tasks run in parallel. Problem: the user is responsible for defining the tasks - dividing a program into tasks, deciding what each task must do, and deciding how each task communicates and synchronizes.

Parallel Programming Models Serial programs can be hard to design and debug; parallel programs are even harder. Models are needed so programmers can create and understand parallel programs. A model is needed that allows: (a) a single application to be defined; (b) the application to take advantage of parallel computing resources; (c) the programmer to reason about how the parallel program will execute, communicate, and synchronize; and (d) the application to be portable to different architectures and platforms.

Parallel Programming Paradigms What is a “Programming Paradigm”? AKA a programming model. It defines the abstractions that a programmer can use when defining a solution to a problem. Parallel programming implies that there are concurrent operations, so what are the typical concurrency abstractions? Tasks: threads and processes. Communication: shared memory and message passing.

Shared-Memory Model Global address space for all tasks. A variable, X, is shared by multiple tasks. Synchronization is needed in order to keep data consistent. Example - Task A gives Task B some data through X. Task B shouldn’t read X until Task A has put valid data in X. NOTE: Task B and Task A operate on the exact same piece of data, so their operations must be in sync. Synchronization is done with semaphores, mutexes, and condition variables.
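A minimal pthreads sketch of the scenario above, assuming illustrative names (X, x_valid, task_a, task_b): Task A writes X and signals a condition variable, and Task B waits on that condition so it never reads X before valid data is there.

/* Compile with: gcc shared.c -pthread */
#include <stdio.h>
#include <pthread.h>

static int X;                        /* shared data */
static int x_valid = 0;              /* flag protected by the mutex */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

void *task_a(void *arg) {            /* producer */
    pthread_mutex_lock(&m);
    X = 42;                          /* put valid data in X */
    x_valid = 1;
    pthread_cond_signal(&c);         /* tell Task B it may proceed */
    pthread_mutex_unlock(&m);
    return NULL;
}

void *task_b(void *arg) {            /* consumer */
    pthread_mutex_lock(&m);
    while (!x_valid)                 /* wait until A has written X */
        pthread_cond_wait(&c, &m);
    printf("Task B read X = %d\n", X);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&b, NULL, task_b, NULL);
    pthread_create(&a, NULL, task_a, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}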

Message-Passing Model Tasks have their own address spaces. Communication must be done through the passing of messages, which copies data from one task to another. Synchronization is handled automatically for the programmer. Example - Task A gives Task B some data. Task B listens for a message from Task A, then operates on the data once it receives the message. NOTE - After receiving the message, Task B and Task A have independent copies of the data.

Comparing the Models Shared-Memory (global address space): Inter-task communication is IMPLICIT! Every task communicates through shared data, and copying of data is not required. The user is responsible for correctly using synchronization operations. Message-Passing (independent address spaces): Inter-task communication is EXPLICIT! Messages require that data is copied, and copying data is slow --> overhead! The user is not responsible for synchronization operations, just for sending data to and from tasks.

Shared-Memory Example Communicating through shared data; protection of critical regions. Interference can occur if protection is done incorrectly, because the tasks are looking at the same data.
Task A:
    Mutex_lock(mutex1)
    Do Task A’s Job - modify data protected by mutex1
    Mutex_unlock(mutex1)
Task B:
    Mutex_lock(mutex1)
    Do Task B’s Job - modify data protected by mutex1
    Mutex_unlock(mutex1)
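A runnable pthreads rendering of the pseudocode above; the shared counter, the iteration count, and the name mutex1 are illustrative.

/* Compile with: gcc critical.c -pthread */
#include <stdio.h>
#include <pthread.h>

static int shared_data = 0;                      /* data protected by mutex1 */
static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

void *task(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&mutex1);             /* enter critical region */
        shared_data++;                           /* modify data protected by mutex1 */
        pthread_mutex_unlock(&mutex1);           /* leave critical region */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, task, NULL);        /* Task A */
    pthread_create(&b, NULL, task, NULL);        /* Task B */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("shared_data = %d\n", shared_data);   /* 200000 if protection is correct */
    return 0;
}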

Shared-Memory Diagram

Message-Passing Example Communication through messages. Interference cannot occur because each task has its own copy of the data.
Task A:
    Receive_message(TaskB, dataInput)
    Do Task A’s Job - dataOutput = f_A(dataInput)
    Send_message(TaskB, dataOutput)
Task B:
    Receive_message(TaskA, dataInput)
    Do Task B’s Job - dataOutput = f_B(dataInput)
    Send_message(TaskA, dataOutput)
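A simplified MPI sketch of this exchange, with Task A as rank 0 and Task B as rank 1; the function f_b(), the data values, and the message tag are illustrative, and rank 0 sends before it receives so the two ranks do not block waiting on each other.

/* Compile with: mpicc msgpass.c ; run with: mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

static int f_b(int x) { return x * 2; }           /* Task B's job (illustrative) */

int main(int argc, char **argv) {
    int rank, data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                              /* Task A */
        data = 21;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Task A received result %d\n", data);
    } else if (rank == 1) {                       /* Task B */
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        data = f_b(data);                         /* operates on its own copy */
        MPI_Send(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}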

Message-Passing Diagram

Comparing the Models (Again) Shared-Memory: The idea of data “ownership” is not explicit. (+) Program development is simplified and can be done more quickly, since interfaces do not have to be clearly defined. (-) Lack of specification (and lack of data locality) may lead to code that is difficult to manage and maintain. (-) It may be hard to figure out what the code is actually doing. Shared memory doesn’t require copying: (+) very lightweight = less overhead and more concurrency. (-) May be hard to scale - contention for a single memory.

Comparing the Models (Again, 2) Message-Passing: Passing of data is explicit; interfaces must be clearly defined. (+) Allows a programmer to reason about which tasks communicate and when. (+) Provides a specification of communication needs. (-) Specifications take time to develop. Message passing requires copying of data: (+) each task “owns” its own copy of the data. (+) Scales fairly well - separate memories = less contention and more concurrency. (-) Message passing may be too “heavyweight” for some apps.

Which Model Is Better? Neither model has a significant advantage over the other; however, one implementation can be better than another. Implementations of each model can also use underlying hardware of a different model: a shared-memory interface on a machine with distributed memory, or a message-passing interface on a machine that uses a shared-memory model.

Using a Programming Model Most implementations of programming models are in the form of libraries. Why? C is popular, but has no built-in support for parallelism. Application Programmer Interfaces (APIs) are the interface to the functionality of the library: they enforce policy while holding mechanisms abstract, and they allow applications to be portable by hiding details of the system from the programmer - just as an HLL and a compiler hide the ISA of a CPU. A parallel programming library should hide the architecture, interconnect, memories, etc.

Popular Libraries Shared-Memory: POSIX Threads (Pthreads) and OpenMP. Message-Passing: MPI.
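A small OpenMP sketch of the shared-memory style listed above: a single pragma parallelizes the loop across threads. The array size and the computation are illustrative.

/* Compile with: gcc -fopenmp saxpy.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];        /* shared arrays (zero-initialized) */
    double a = 2.0;

    #pragma omp parallel for         /* data-level parallelism, forAll-style */
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("done; %d threads available\n", omp_get_max_threads());
    return 0;
}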

Popular Operating Systems (OSes)
Linux: “normal” Linux, embedded Linux, ucLinux.
eCos: maps POSIX calls to native eCos threads.
HybridThreads (Hthreads) - soon to be popular? OS components are implemented in hardware for super low-overhead system services; POSIX calls are mapped to OS components in HW (SWTI), and a POSIX-compliant wrapper is provided for computations in hardware (HWTI).

Threads are Lightweight…

POSIX Thread API Classes
Thread Management: work directly with threads - creating, joining, attributes, etc.
Mutexes: used for synchronization, to “MUTually EXclude” threads.
Condition Variables: used for communication between threads that use a common mutex, and for signaling several threads on a user-specified condition.
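A short sketch of the thread-management class above, assuming illustrative choices (a 1 MB stack and a joinable thread): an attribute object is initialized, applied at creation, and destroyed, and the parent joins the worker.

/* Compile with: gcc attr.c -pthread */
#include <stdio.h>
#include <pthread.h>

void *worker(void *arg) {
    printf("worker running\n");
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);                          /* start from default attributes */
    pthread_attr_setstacksize(&attr, 1024 * 1024);     /* request a 1 MB stack */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);                           /* wait for the worker to finish */

    pthread_attr_destroy(&attr);
    return 0;
}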

References/Sources: Introduction to Parallel Computing (LLNL); POSIX Thread Programming (LLNL).