MULTICORE, PARALLELISM, AND MULTITHREADING By: Eric Boren, Charles Noneman, and Kristen Janick

MULTICORE PROCESSING Why we care

What is it?  A processor with more than one core on a single chip  Core: An independent system capable of processing instructions and modifying registers and memory

Motivation  Advances in component technology and single-core optimization now yield diminishing returns in processor speed  Many CPU applications attempt to do multiple things at once:  Video editing  Multi-agent simulation  So, use multiple cores to get the work done faster

Hurdles  Instruction assignment (who does what?)  Mostly delegated to the operating system  Can be done to a small degree through dependency analysis on the chip  Cores must still communicate at times – how?  Shared-memory  Message passing

Advantages  Multiple programs:  Can be separated between cores  Other programs don’t suffer when one hogs the CPU  Multi-threaded applications:  Independent threads don’t have to wait as long for each other – faster overall execution  vs. multiple discrete processors:  Cores on one chip are closer together – faster communication allows a higher maximum clock rate  Less expensive due to smaller overall chip area and shared components (caches, etc.)

Disadvantages  OS and programs must be optimized for multiple cores, or no gain will be seen  A single-threaded application sees little to no improvement  Overhead in assigning tasks to cores  The real bottleneck is typically memory and disk access time – independent of the number of cores

Amdahl’s Law  Potential performance increase on a parallel computing platform is given by Amdahl’s law.  Large problems are made up of parallelizable parts and non-parallelizable parts.  With N cores: S = 1 / ((1 - P) + P/N)  Maximum speed-up (unlimited cores): S = 1 / (1 - P)  S = speed-up of program  P = fraction of the program that is parallelizable
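As a quick check of the formula: with P = 0.9, even unlimited cores cap the speed-up at 1/(1 - 0.9) = 10×. A minimal C sketch of the N-core form (the function name is ours, not from the slides):

#include <stdio.h>

/* Amdahl's law: speed-up with n cores when fraction p is parallelizable */
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* p = 0.9: 4 cores give ~3.08x, 1000 cores only ~9.91x */
    printf("%f\n", amdahl_speedup(0.9, 4));
    printf("%f\n", amdahl_speedup(0.9, 1000));
    return 0;
}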

Current State of the Art  Commercial processors:  Most have at least 2 cores  Quad-core chips are highly popular for desktop applications  6-core processors have recently appeared on the market (Intel’s Core i7-980X)  8-core chips exist but are less common  Academic and research:  MIT: RAW – 16-core  Intel: Polaris – 80-core  UC Davis: AsAP – 36- and 167-core, individually clocked

PARALLELISM

What is Parallel Computing?  Form of computation in which many calculations are carried out simultaneously.  Operating on the principle that large problems can often be divided into smaller ones, which are solved concurrently.

Types of Parallelism  Bit-level parallelism  Increase the processor word size so more bits are processed per instruction  Instruction-level parallelism  Independent instructions grouped and executed simultaneously (e.g., pipelining, superscalar execution)  Data parallelism  Distribute data over different computing environments  Task parallelism  Distribute threads across different computing environments

Flynn’s Taxonomy

Single Instruction, Single Data (SISD)  Provides no parallelism in hardware  One instruction stream operates on one data stream at a time  Instructions executed in serial fashion

Multiple Instruction, Single Data (MISD)  Process a single data stream using multiple instruction streams simultaneously  More a theoretical model than a practical one

Single Instruction, Multiple Data (SIMD)  A single instruction stream can process multiple data streams in 1 clock cycle  Takes the operation specified in one instruction and applies it to more than 1 set of data elements at a time  Suitable for graphics and image processing
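As a concrete illustration, x86 SSE intrinsics apply one instruction to four floats at once. A minimal sketch, assuming an SSE-capable x86 compiler (the function and variable names are ours):

#include <xmmintrin.h>  /* SSE intrinsics */

void add4(const float *a, const float *b, float *out) {
    __m128 va   = _mm_loadu_ps(a);      /* load 4 floats from a */
    __m128 vb   = _mm_loadu_ps(b);      /* load 4 floats from b */
    __m128 vsum = _mm_add_ps(va, vb);   /* one instruction adds all 4 lanes */
    _mm_storeu_ps(out, vsum);           /* store 4 results */
}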

Multiple Instruction, Multiple Data (MIMD)  Different processors can execute different instructions on different pieces of data  Each processor can run an independent task

Automatic parallelization  The goal is to relieve programmers from the tedious and error-prone manual parallelization process.  A parallelizing compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently  Identify dependences between references – independent actions can operate in parallel
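For example, the first loop below has no cross-iteration dependences, so its iterations can run concurrently; the second cannot be parallelized this way, because each iteration reads the previous iteration’s result (a hand-written illustration, not compiler output):

/* Parallelizable: no iteration depends on another */
void independent(int n, float *a, float *b, float *c) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Not parallelizable as-is: iteration i reads iteration i-1's result */
void dependent(int n, float *a, float *b) {
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];
}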

Parallel Programming Languages  Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers.  Parallel languages make it easier to write parallel algorithms  The resulting code can run more efficiently because the compiler has more information to work with  It is easier to identify data dependencies, so the runtime system can implicitly schedule independent work

MULTITHREADING TECHNIQUES

fork()  Make a (nearly) exact duplicate of the process  Good when there is no or almost no need to communicate between processes  Often used for servers

fork() [diagram] Each child process receives its own independent copy of the parent’s globals, heap, and stack.

fork()

#include <unistd.h>
#include <sys/wait.h>

pid_t pID = fork();
if (pID == 0) {
    /* child: continues here with its own copy of memory */
} else if (pID > 0) {
    /* parent: pID holds the child's process ID */
    wait(NULL);  /* reap the child when it exits */
}

POSIX Threads  C library for threading  Available in Linux, OS X  Shared Memory  Threads are created and destroyed manually  Has mechanisms for locking memory

POSIX Threads [diagram] All threads in a process share the globals and the heap; each thread has its own stack.

POSIX Threads

#include <pthread.h>

void *function_to_call(void *data);  /* thread entry point */

pthread_t thread;
pthread_create(&thread, NULL, function_to_call, (void *)data);
/* Do stuff concurrently with the new thread */
pthread_join(thread, NULL);  /* wait for the thread to finish */

POSIX Threads

int total = 0;

void do_work() {
    /* Do stuff to create “result” */
    total = total + result;  /* not atomic: a race condition */
}

One possible interleaving (each thread’s result is 1):
Thread 1 reads total (0)
Thread 2 reads total (0)
Thread 1 does add and saves total (1)
Thread 2 does add and saves total (1) – one update is lost

POSIX Threads

int total = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void do_work() {
    /* Do stuff to create “result” */
    pthread_mutex_lock(&mutex);
    total = total + result;  /* only one thread updates total at a time */
    pthread_mutex_unlock(&mutex);
}

OpenMP  Library and compiler directives for multi-threading  Support in Visual C++, gcc  Code compiles even if compiler doesn't support OpenMP  Popular in high performance communities  Easy to add parallelism to existing code

OpenMP Initialize an Array

/* compile with an OpenMP flag, e.g. gcc -fopenmp */
const int array_size = 1024;  /* value lost in the transcript; 1024 is arbitrary */
int i, a[array_size];
#pragma omp parallel for
for (i = 0; i < array_size; i++) {
    a[i] = 2 * i;
}

OpenMP Reduction

int total = 0;  /* each thread sums a private copy; copies are combined at the end */
#pragma omp parallel for reduction(+:total)
for (i = 0; i < array_size; i++) {
    total = total + a[i];
}

Grand Central Dispatch  Apple technology for multi-threading  The programmer puts work into queues  A central system process determines the number of threads to give to each queue  Code is added to queues using a closure (a “block”)  Currently Mac-only, but open source  Easy to add parallelism to existing code

Grand Central Dispatch Initialize an Array

dispatch_apply(array_size, dispatch_get_global_queue(0, 0), ^(size_t i) {
    a[i] = 2 * i;
});

Grand Central Dispatch GUI Example

void analyzeDocument(Document *doc) {  /* "Document" type is illustrative */
    do_analysis(doc);    /* may take a very long time – blocks the UI */
    update_display();
}

Grand Central Dispatch GUI Example

void analyzeDocument(Document *doc) {
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        do_analysis(doc);  /* now runs off the main thread */
        update_display();  /* real apps dispatch UI updates back to the main queue */
    });
}

Other Technologies  Threading in Java, Python, etc.  MPI – for clusters
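To give a flavor of MPI, here is a minimal hello-world in C using standard MPI calls (typically compiled with mpicc and launched with mpirun; exact commands vary by installation):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}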

QUESTIONS?

Supplemental Reading  Introduction to Parallel Computing  Introduction to Multi-Core Architecture  CPU History: A timeline of microprocessors