1 Introduction to Parallel Processing with Multi-core Part I Jie Liu, Ph.D. Professor Department of Computer Science Western Oregon University USA

2 Now the question – why parallel?
Three things are certain: taxes, death, and parallelism.
How long does it take a single person to build I-5?
Suppose we want to solve a very computationally intensive problem, such as modeling a protein interacting with the water surrounding it. The problem could take a very long time: in 1990, the protein simulation problem would have taken a Cray X/MP 31,688 years to simulate 1 second of interaction. Even if today's supercomputers are 100 times faster than the Cray X/MP, we would still need more than 300 years! The only solution → parallel processing.

3 Why parallel (2)
Moore's Law: the logic density of silicon-based ICs (integrated circuits) has closely followed the curve of doubling every year (until 1970; every 18 months thereafter).
Why is density related to a processor's speed? Because, during the process of "computing," electrons need to carry signals from one end of a circuit to the other.
For a 2 GHz computer, a signal can travel at most about 0.15 meters per clock cycle (0.5 ns), since light covers roughly 0.3 meters per nanosecond.
That is, the speed of light places a physical limit on how fast a single-processor computer can run.
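The distance bound follows directly from dividing the speed of light by the clock rate (a quick sanity check, not on the original slide):

```latex
d = \frac{c}{f} = \frac{3\times 10^{8}\,\text{m/s}}{2\times 10^{9}\,\text{Hz}} = 0.15\,\text{m per clock cycle}
```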

4 Why parallel (3)
There are problems that require much more computational power than today's fastest single-CPU computers can provide.
The speed of light limits how fast a single-CPU computer can run.
If we want to solve some computationally intensive problems in a reasonable amount of time, we have to resort to parallel computers!

5 Some Definitions
Parallel processing: information processing that emphasizes the concurrent manipulation of data belonging to many processes solving a single problem.
Example: having 100 processors sort an array of 1,400,000,000 elements – this is parallel processing.
Example: printing homework while reading e-mail – this is concurrent, but not parallel processing, because the processes are not solving the same problem.
A parallel computer is a multi-processor computer capable of parallel processing. Computers with just co-processors for math and image processing are not considered parallel computers (some people disagree with this notion).

6 Two forms of parallelism
Control parallelism: concurrency is achieved by applying different operations to different data elements of a single problem. A pipeline is a special form of control parallelism; an assembly line is an example of a pipeline.
Data parallelism: concurrency is achieved by applying the same operation to different data elements of a single problem. Taking a class is an example of data parallelism (if we assume you are all learning at the same speed). The marching of an army brigade can also be considered data parallelism.
Note the granularity of the above examples.

7 Control vs. Data Parallelism
Look at the following statement:

    if a[i] > b[i]
        a[i] = a[i]*b[i]
    else
        b[i] = a[i]-b[i]

In control-parallel fashion, some processors may execute the statement a[i] = a[i]*b[i] while others execute b[i] = a[i]-b[i] during the same clock cycle.
In data-parallel fashion, especially on a SIMD machine, this if statement is executed in two clock cycles: during the first cycle, all processors satisfying the condition a[i] > b[i] execute a[i] = a[i]*b[i]; during the second cycle, the processors not satisfying the condition execute b[i] = a[i]-b[i].
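A minimal C sketch of the two-phase, data-parallel execution described above; the mask array and the sample data are illustrative assumptions, not part of the original slide. On a real SIMD machine both phases run in lockstep with inactive lanes masked off.

```c
#include <stdio.h>

#define N 8

int main(void) {
    int a[N] = {5, 1, 7, 2, 9, 3, 8, 4};
    int b[N] = {2, 6, 3, 8, 1, 7, 4, 9};
    int mask[N];

    for (int i = 0; i < N; i++)     /* evaluate the condition once per "processor" */
        mask[i] = (a[i] > b[i]);

    for (int i = 0; i < N; i++)     /* cycle 1: lanes satisfying the condition */
        if (mask[i]) a[i] = a[i] * b[i];

    for (int i = 0; i < N; i++)     /* cycle 2: the remaining lanes */
        if (!mask[i]) b[i] = a[i] - b[i];

    for (int i = 0; i < N; i++)
        printf("a[%d]=%d b[%d]=%d\n", i, a[i], i, b[i]);
    return 0;
}
```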

8 Speedup – Take I
Speedup is a measurement of how well, or how effectively, a parallel algorithm performs.
It is defined as the ratio between the time needed by the most efficient sequential algorithm to perform a computation and the time needed to perform the same computation on a parallel computer with a parallel algorithm. That is, Speedup = t_sequential / t_parallel.
Example: suppose we developed a parallel bubble sort that sorts n elements in O(log n) time using n processors. The speedup is O(n), because the most efficient sequential sorting algorithms have a complexity of O(n log n).
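In formula form (a restatement of the slide's definition plus the bubble-sort arithmetic; the slide's original formula figure did not survive):

```latex
S = \frac{t_{\text{best sequential}}}{t_{\text{parallel}}},
\qquad
S_{\text{bubble}} = \frac{\Theta(n\log n)}{\Theta(\log n)} = \Theta(n)
```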

9 Brain Exercise
Six equally skilled students need to make 210 special cookies. Making each cookie consists of the following tasks (time units in parentheses):
1. Break the dough into small pieces of equal size (1)
2. Hand roll the small dough pieces into balls (1)
3. Press the balls flat for rolling (1)
4. Roll the flat dough into wrappers (1)
5. Place a suitable amount of filling onto each wrapper (1)
6. Fold the wrapper to enclose the filling completely, finishing the cookie (1)
How would you do this in a pipeline fashion? In a control-parallel fashion other than a pipeline? In a data-parallel fashion?

10 Approach #1 (diagram: six students S1–S6 working over time steps T1–T12 on dough batches D1~D6 and D7~D12)

11 Approach #2 (diagram: six students S1–S6 working over time steps T1–T12 on dough pieces D1–D7)

12 Analysis
Sequential cost: (1+1+1+1+1+1) × 210 = 1260 time units.
Maximum speedup for Approach #1? Maximum speedup for Approach #2? (a worked sketch follows below)
Other questions to consider:
If I have 1260 students, can I get the task done in 1 time unit?
What if step 3 takes 3 time units and step 6 takes 2 time units?
If I add more "skilled" students to the different approaches, what would the effect be?
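A worked sketch of the two maximum speedups, assuming Approach #1 is the six-stage pipeline (one cookie completes per time unit once the pipe fills) and Approach #2 has each student making 35 complete cookies independently; these readings of the two diagrams are my assumptions:

```latex
t_{1} = (6-1) + 210 = 215 \;\Rightarrow\; S_{1} = \frac{1260}{215} \approx 5.86,
\qquad
t_{2} = 35 \times 6 = 210 \;\Rightarrow\; S_{2} = \frac{1260}{210} = 6
```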

13 Grand Challenges
A list of problems that are very computationally intensive but whose solutions can benefit human beings greatly; heavily funded by the US government. What follows is just the list of problem categories.

14 Parallel Computers & Companies

15 One of the Fastest Computers
Per http://abcnews.go.com/Technology/WireStory?id= &page=2
By: IBM and Los Alamos National Laboratory
Name: Roadrunner (named after New Mexico's state bird)
Twice as fast as IBM's Blue Gene, which is three times faster than the next fastest computer in the world
Cost: $100,000,000 – very cheap
Speed: 1,000,000,000,000,000 floating-point operations per second (one petaflop)
Usage: primarily nuclear weapons work, including simulating nuclear explosions
Related to gaming: in some ways, it's "a very souped-up Sony PlayStation 3"
Some facts: the interconnecting system occupies 6,000 square feet with 57 miles of fiber optics and weighs 500,000 pounds. Although made from commercial parts, the computer consists of 6,948 dual-core chips and 12,960 Cell engines, and it has 80 terabytes of memory housed in 288 connected refrigerator-sized racks.
Two years earlier, the fastest computer in the world could perform 100,000,000,000,000 floating-point operations per second (100 teraflops).

16 Parallel Computers and Programming – the Trend
Hardware:
Supercomputers – multiprocessor/multicomputer machines – the fastest computers of their time
Beowulf – a cluster of off-the-shelf computers linked by a switch
Other distributed systems, such as NOW (Network of Workstations)
Multi-core – many cores (each a CPU in itself) within a single CPU; soon more than 60 cores per CPU
Programming:
MPI for message-passing architectures
Vendor-specific add-ons to well-known programming languages
New languages such as Microsoft's F#
Multi-core programming (add-ons to well-known programming languages):
Intel's Threading Building Blocks (TBB)
Microsoft's Task Parallel Library – supports Parallel.For, PLINQ, etc.; one to keep an eye on
Third-party libraries such as Jibu – may merge with MS
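As a taste of the message-passing style mentioned above, here is a minimal MPI sketch in C that sums one value per process; the variable names and the rank-as-data choice are illustrative assumptions.

```c
#include <stdio.h>
#include <mpi.h>

/* Each process contributes its rank; MPI_Reduce sums the contributions
 * onto process 0. Compile with mpicc, launch with mpirun -np <p>. */
int main(int argc, char *argv[]) {
    int rank, size, local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = rank;   /* this process's contribution */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```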

17 Multi-Core Programming
Sequential version → parallel version (the slide's code examples were figures; a sketch of the idea follows below)
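A minimal C sketch of what such a sequential-to-parallel transformation typically looks like, here using OpenMP rather than the slide's original (lost) code; the loop and array are illustrative assumptions.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* Sequential sum vs. the same loop parallelized across cores with
 * OpenMP; the reduction clause combines each thread's partial sum. */
int main(void) {
    static double x[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)     /* sequential initialization */
        x[i] = 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)     /* each core handles a chunk of iterations */
        sum += x[i];

    printf("sum = %f\n", sum);
    return 0;
}
```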

18 Why Study Parallel Processing/Programming
Making your code run more efficiently
Utilizing existing resources (the other cores)
… …
A good coding class for CS students:
To learn something new
To improve your skill set
To improve your problem-solving skills
To exercise your brain
To review many Computer Science subject areas
To relax a constraint our professors embedded in our thinking process in our early years of study (what is the PC in a CPU?)

19 PRAM (Parallel Random Access Machine)
A theoretical parallel computer.
Consists of a control unit, global memory, and an unbounded set of processors, each with its own private memory.
In addition, each processor has a unique id. At each step, an active processor can read/write memory (global or private), perform the same instruction as all other active processors, idle, or activate another processor.
How many steps does it take to activate n processors?
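The closing question has a classic answer: ceil(log2 n) steps, because each active processor can activate one more processor per step, doubling the active count. A small C sketch of the counting argument (illustrative, not from the slide):

```c
#include <stdio.h>

/* Doubling argument: starting with 1 active processor, every active
 * processor activates one more per step, so the active count doubles
 * each step and n processors need ceil(log2(n)) steps. */
int steps_to_activate(long n) {
    long active = 1;
    int steps = 0;
    while (active < n) {
        active *= 2;    /* every active processor wakes one more */
        steps++;
    }
    return steps;
}

int main(void) {
    printf("n=1000    -> %d steps\n", steps_to_activate(1000));    /* 10 */
    printf("n=1048576 -> %d steps\n", steps_to_activate(1048576)); /* 20 */
    return 0;
}
```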

20 PRAM

21 Important Terms
computationally intensive problem, Moore's Law, parallel processing, parallel computer, control parallelism, data parallelism, speedup, grand challenges, massively parallel computer, Roadrunner, petaflop, supercomputers, Beowulf, NOW, MPI, multi-core, PRAM