
1 MULTICORE, PARALLELISM, AND MULTITHREADING By: Eric Boren, Charles Noneman, and Kristen Janick

2 MULTICORE PROCESSING Why we care

3 What is it?
- A processor with more than one core on a single chip
- Core: an independent unit capable of processing instructions and modifying registers and memory

4 Motivation
- Advances in component technology and optimization now contribute only limited gains to processor speed
- Many CPU-bound applications try to do multiple things at once:
  - Video editing
  - Multi-agent simulation
- So, use multiple cores to get the work done faster

5 Hurdles
- Instruction assignment (who does what?)
  - Mostly delegated to the operating system
  - Can be done to a small degree on the chip through dependency analysis
- Cores must still communicate at times – how?
  - Shared memory
  - Message passing

6 Advantages
- Multiple programs:
  - Can be separated between cores
  - Other programs don't suffer when one hogs the CPU
- Multi-threaded applications:
  - Independent threads spend less time waiting for each other, so overall execution is faster
- Versus multiple separate processors:
  - Shorter distances between cores than between separate chips: faster communication and a higher maximum clock rate
  - Less expensive due to smaller overall chip area and shared components (caches, etc.)

7 Disadvantages
- The OS and programs must be optimized for multiple cores, or no gain will be seen
  - A single-threaded application sees little to no improvement
- Overhead in assigning tasks to cores
- The real bottleneck is typically memory and disk access time, which is independent of the number of cores

8 Amdahl's Law
- The potential performance increase on a parallel computing platform is given by Amdahl's law
- Large problems are made up of several parallelizable parts and non-parallelizable parts
- S = 1 / (1 - P), where S is the speed-up of the program and P is the fraction of the program that is parallelizable
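A quick worked example (the formula above is the limiting case for an unbounded number of cores): if P = 0.95, the best possible speed-up is S = 1 / (1 - 0.95) = 20, no matter how many cores are added. With a finite number of cores N, the more general form of Amdahl's law is S = 1 / ((1 - P) + P/N); for P = 0.95 and N = 8 this gives S = 1 / (0.05 + 0.11875) ≈ 5.9.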

9 Current State of the Art
- Commercial processors:
  - Most have at least 2 cores
  - Quad-core chips are highly popular for desktop applications
  - 6-core processors have recently appeared on the market (Intel's Core i7-980X)
  - 8-core chips exist but are less common
- Academic and research:
  - MIT RAW: 16 cores
  - Intel Polaris: 80 cores
  - UC Davis AsAP: 36- and 167-core, individually clocked

10 PARALLELISM

11 What is Parallel Computing?
- A form of computation in which many calculations are carried out simultaneously
- Operates on the principle that large problems can often be divided into smaller ones, which are then solved concurrently

12 Types of Parallelism
- Bit-level parallelism: increase the processor word size
- Instruction-level parallelism: instructions are combined into groups
- Data parallelism: distribute data over different computing environments
- Task parallelism: distribute threads across different computing environments

13 Flynn’s Taxonomy
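The original slide shows only a diagram; as a brief reconstruction (an addition, not from the transcript), the taxonomy classifies architectures by the number of concurrent instruction streams and data streams, giving the four classes covered on the following slides:

                        Single data stream    Multiple data streams
Single instruction      SISD                  SIMD
Multiple instructions   MISD                  MIMD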

14 Single Instruction, Single Data (SISD)
- Provides no parallelism in hardware
- One data stream is processed by the CPU in one clock cycle
- Instructions are executed in a serial fashion

15 Multiple Instruction, Single Data (MISD)
- A single data stream is processed by multiple instruction streams simultaneously
- More a theoretical model than a practical one

16 Single Instruction, Multiple Data (SIMD)
- A single instruction stream can process multiple data streams in one clock cycle
- Takes the operation specified in one instruction and applies it to more than one set of data elements at a time
- Well suited to graphics and image processing (see the sketch below)
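As a concrete illustration (an addition, not from the slides), here is a minimal C sketch using x86 SSE intrinsics, assuming a compiler and CPU with SSE support; each _mm_add_ps call performs four floating-point additions with a single SIMD instruction:

#include <xmmintrin.h>   /* x86 SSE intrinsics */

/* Add two float arrays, four elements per SIMD instruction. */
void add_arrays(const float *a, const float *b, float *out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);              /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(&b[i]);              /* load 4 floats from b */
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));   /* 4 additions in one instruction */
    }
    for (; i < n; i++)                                /* scalar cleanup for the remainder */
        out[i] = a[i] + b[i];
}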

17 Multiple Instruction, Multiple Data (MIMD)
- Different processors can execute different instructions on different pieces of data
- Each processor can run an independent task

18 Automatic Parallelization
- The goal is to relieve programmers from the tedious and error-prone manual parallelization process
- A parallelizing compiler tries to split up a loop so that its iterations can be executed concurrently on separate processors
- It identifies dependences between references: independent operations can run in parallel (see the example below)
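A hypothetical example of what such a compiler looks for (the array names are made up for illustration):

/* No cross-iteration dependence: every iteration writes a different a[i],
   so a parallelizing compiler can distribute iterations across cores. */
for (int i = 0; i < n; i++)
    a[i] = 2 * b[i];

/* Cross-iteration dependence: iteration i reads the value written by
   iteration i-1, so this loop cannot be parallelized as written. */
for (int i = 1; i < n; i++)
    a[i] = a[i-1] + b[i];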

19 Parallel Programming Languages
- Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers
- Parallel languages make it easier to write parallel algorithms
- The resulting code can run more efficiently because the compiler has more information to work with
- Data dependencies are easier to identify, so the runtime system can implicitly schedule independent work

20 MULTITHREADING TECHNIQUES

21 fork()
- Makes a (nearly) exact duplicate of the calling process
- Good when there is little or no need to communicate between processes
- Often used for servers

22 fork()
[Diagram: a parent process (globals, heap, stack) is duplicated into several child processes, each with its own independent copy of the globals, heap, and stack.]

23 fork()

pid_t pID = fork();
if (pID == 0) {
    // child
} else {
    // parent
}
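A more complete, runnable sketch of the same pattern (the printf messages and the use of waitpid are additions for illustration, not from the slide):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pID = fork();                 /* duplicate the current process */
    if (pID == 0) {
        printf("child: pid %d\n", getpid());
        _exit(0);                       /* child does its work and exits */
    } else if (pID > 0) {
        waitpid(pID, NULL, 0);          /* parent waits for the child to finish */
        printf("parent: child finished\n");
    } else {
        perror("fork");                 /* fork failed */
    }
    return 0;
}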

24 POSIX Threads
- C library for threading
- Available on Linux, OS X
- Shared memory
- Threads are created and destroyed manually
- Has mechanisms for locking memory

25 POSIX Threads
[Diagram: a single process with shared globals and heap; each thread gets its own stack.]

26 POSIX Threads

pthread_t thread;
pthread_create(&thread, NULL, function_to_call, (void*) data);
// Do stuff
pthread_join(thread, NULL);
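Filled out into a small self-contained program (the worker function body and the integer payload are illustrative assumptions); with gcc, compile with -pthread:

#include <stdio.h>
#include <pthread.h>

/* Thread entry point: receives the (void*) argument passed to pthread_create. */
void *function_to_call(void *data) {
    int value = *(int *)data;
    printf("worker thread received %d\n", value);
    return NULL;
}

int main(void) {
    int data = 42;
    pthread_t thread;
    pthread_create(&thread, NULL, function_to_call, (void *)&data);
    /* Do stuff in the main thread while the worker runs... */
    pthread_join(thread, NULL);   /* wait for the worker to finish */
    return 0;
}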

27 POSIX Threads

int total = 0;

void do_work() {
    // Do stuff to create "result"
    total = total + result;
}

Without synchronization, one update can be lost:
Thread 1 reads total (0)
Thread 2 reads total (0)
Thread 1 does the add and saves total (1)
Thread 2 does the add and saves total (1) – thread 1's update is overwritten, even though the expected result is 2

28 POSIX Threads

int total = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void do_work() {
    // Do stuff to create "result"
    pthread_mutex_lock(&mutex);
    total = total + result;
    pthread_mutex_unlock(&mutex);
}

29 OpenMP
- Library and compiler directives for multi-threading
- Supported in Visual C++ and gcc
- Code still compiles even if the compiler doesn't support OpenMP (the pragmas are simply ignored)
- Popular in the high-performance computing community
- Easy to add parallelism to existing code

30 OpenMP: Initialize an Array

const int array_size = 100000;
int i, a[array_size];

#pragma omp parallel for
for (i = 0; i < array_size; i++) {
    a[i] = 2 * i;
}

31 OpenMP: Reduction

#pragma omp parallel for reduction(+:total)
for (i = 0; i < array_size; i++) {
    total = total + a[i];
}
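Both fragments combined into a small complete program (the printf and the long long type for total are additions to avoid overflow; with gcc, compile with -fopenmp, otherwise the pragmas are ignored and the loops run serially):

#include <stdio.h>

#define ARRAY_SIZE 100000

int a[ARRAY_SIZE];

int main(void) {
    int i;
    long long total = 0;

    #pragma omp parallel for
    for (i = 0; i < ARRAY_SIZE; i++)
        a[i] = 2 * i;                        /* each thread fills a disjoint chunk */

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < ARRAY_SIZE; i++)
        total = total + a[i];                /* per-thread partial sums, combined at the end */

    printf("total = %lld\n", total);         /* 2 * (0 + 1 + ... + 99999) = 9999900000 */
    return 0;
}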

32 Grand Central Dispatch
- Apple technology for multi-threading
- The programmer puts work into queues
- A central system process determines the number of threads to give to each queue
- Code is added to queues using a closure (a "block")
- Currently Mac-only, but open source
- Easy to add parallelism to existing code

33 Grand Central Dispatch: Initialize an Array

dispatch_apply(array_size, dispatch_get_global_queue(0, 0), ^(size_t i) {
    a[i] = 2 * i;
});

34 Grand Central Dispatch: GUI Example

void analyzeDocument(doc) {
    do_analysis(doc);    // May take a very long time
    update_display();
}

35 Grand Central Dispatch: GUI Example

void analyzeDocument(doc) {
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        do_analysis(doc);
        dispatch_async(dispatch_get_main_queue(), ^{
            update_display();   // UI updates belong on the main queue
        });
    });
}

36 Other Technologies
- Threading in Java, Python, etc.
- MPI – for clusters (see the sketch below)
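For a flavor of MPI (only mentioned in passing on the slide), a minimal C program; it is typically compiled with mpicc and launched with mpirun, and each process prints its own rank:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                    /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes */

    printf("process %d of %d\n", rank, size);

    MPI_Finalize();                            /* shut the runtime down */
    return 0;
}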

37 QUESTIONS?

38 Supplemental Reading
- Introduction to Parallel Computing
  https://computing.llnl.gov/tutorials/parallel_comp/#Abstract
- Introduction to Multi-Core Architecture
  http://www.intel.com/intelpress/samples/mcp_samplech01.pdf
- CPU History: A timeline of microprocessors
  http://everything2.com/title/CPU+history%253A+A+timeline+of+microprocessors

