Saman Amarasinghe. Let's stick with current sequential languages. Parallel programming is hard! Billions of LOC written in sequential languages. Let the compiler.


1 Saman Amarasinghe

2 Let's stick with current sequential languages
Parallel programming is hard!
Billions of LOC written in sequential languages
Let the compiler do all the work
Maintain the current strong machine abstraction
SUIF Parallelizing Compiler (Monica Lam and the Stanford SUIF team, 1993–1997)
Automatically extract parallelism from sequential programs
Heroic analysis
  Interprocedural analysis
  Array and scalar data-flow analysis
  Reduction and recurrence recognition
C to FORTRAN
Achieved best SPEC results of the day
  Vector processor        Cray C90         540
  Uniprocessor            Digital 21164    508
  SUIF on 8 processors    Digital 8400     1,016
But… techniques were not robust for general use
[Figure: MFLOPS (axis 0 to 1200) against number of processors (1 to 8); benchmarks: spice2g6, doduc, fpppp, ora, mdljdp2, wave5, mdljsp2, alvinn, nasa7, ear, hydro2d, su2cor, tomcatv, swm256]

3 Composition is key to building large systems
Implemented naturally via time-multiplexing
The framework for parallelizing sequential programs
  Sequential parts at outermost
  Global barriers

4 Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))
[Plot: utilization against number of cores and expected year]

5 Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))
[Plot: utilization against number of cores and expected year, by % parallel]

6 Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))
[Plot: utilization against number of cores and expected year, by % parallel]

7 Speedup = 1/(1 – p + p/N)
Utilization = 1/(p + N*(1 – p))
[Plot: utilization against number of cores and expected year, by % parallel]
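The two formulas on these slides are Amdahl's law and the per-core utilization that follows from it (speedup divided by N). A minimal Python sketch, assuming p is the fraction of the work that can run in parallel and N is the number of cores; the function names and the sample values of p and N are illustrative choices, not taken from the slides:

```python
def speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def utilization(p: float, n: int) -> float:
    """Share of the n cores' capacity doing useful work, i.e. speedup(p, n) / n."""
    return 1.0 / (p + n * (1.0 - p))


if __name__ == "__main__":
    # Sample values only (assumed for illustration): utilization falls quickly
    # as the core count grows unless p is very close to 1.
    for n in (2, 4, 16, 64, 256):
        cells = ", ".join(
            f"p={p:.2f}: {utilization(p, n):6.1%}" for p in (0.50, 0.90, 0.99)
        )
        print(f"N={n:3d}  {cells}")
```

Running the script prints one row per core count, which is the same relationship the utilization plots chart against the number of cores.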

8 Currently…
Theory, algorithms, languages, and tools are all centered around the sequential paradigm
A well-enforced machine abstraction
The move to multicore is a fundamental shift
Akin to the shift from analog to digital design

9 Need a new abstraction where parallelism is the primary form of expression
Parallelism is simple
Parallelism is natural
Communication is intuitive
Parallel composition of sequential segments
With possible space-multiplexed execution

10 Parallel programming still in the dark ages
  Elite community of practitioners
  Active open research, little stable consensus
  Assumption: we don't know how to teach parallel programming!
Aim for a "Mead and Conway" type revolution
Develop simple, cookbook approaches
  If we can't teach them, they're too complex!
Make them accessible
  Carefully thought-out courseware, tools, texts, courses
Focus on the educational community
  Exporting, proselytizing, workshops, conferences, journals, …

11 1. Move to a truly parallel world (long term)
  Natural world is extremely parallel: learn to emulate it
  Can we make sequential programs a special case of parallel programming?
2. Rejoice when parallelism is natural (medium term)
  Switch to parallel languages if using them is easier than sequential languages
3. Help migrate legacy applications (short term)
  Existing large body of code: cannot ignore!
  Written in sequential languages: need to work with them

12 Some domains are inherently parallel
Coding them using a sequential language is…
  Harder than using the right parallel abstraction
  All information on inherent parallelism is lost
There are win-win situations
  Increasing the programmer productivity while extracting parallel performance
Streaming domain and the StreamIt experience

13 MPEG-2 Decoder
Structured block level diagram describes computation and flow of data
Conceptually easy to understand
Clean abstraction of functionality
Mapping to C (sequentialization) destroys this simple view
[Figure: MPEG-2 decoder block diagram. Blocks: VLD, splitter, ZigZag, IQuantization, IDCT, Saturation, Motion Vector Decode, Repeat, joiner, Motion Compensation (Y, Cb, Cr), Channel Upsample, Picture Reorder, Color Space Conversion. Data labels: MPEG bit stream; macroblocks, motion vectors; frequency encoded macroblocks; differentially coded motion vectors; spatially encoded macroblocks; quantization coefficients; picture type; reference picture; recovered picture]

14 MPEG-2 Decoder
    add VLD(QC, PT1, PT2);
    // Block decoding and motion-vector decoding run as parallel branches.
    add splitjoin {
      split roundrobin(N*B, V);
      add pipeline {
        add ZigZag(B);
        add IQuantization(B) to QC;
        add IDCT(B);
        add Saturation(B);
      }
      add pipeline {
        add MotionVectorDecode();
        add Repeat(V, N);
      }
      join roundrobin(B, V);
    }
    // One motion-compensation branch per color channel (Y, Cb, Cr).
    add splitjoin {
      split roundrobin(4*(B+V), B+V, B+V);
      add MotionCompensation(4*(B+V)) to PT1;
      for (int i = 0; i < 2; i++) {
        add pipeline {
          add MotionCompensation(B+V) to PT1;
          add ChannelUpsample(B);
        }
      }
      join roundrobin(1, 1, 1);
    }
    add PictureReorder(3*W*H) to PT2;
    add ColorSpaceConversion(3*W*H);
[Figure: MPEG-2 decoder block diagram]
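The roundrobin splitters and joiners in this slide deal items to the parallel branches by weight and collect them back in the same pattern. A small Python sketch of that behavior, under stated assumptions: the function names are hypothetical, the streams are finite lists rather than StreamIt's unbounded streams, and all weights are required to be positive. It illustrates the idea only and is not the StreamIt runtime.

```python
from typing import List, Sequence


def roundrobin_split(items: Sequence, weights: Sequence[int]) -> List[list]:
    """Deal items to branches: weights[0] items to branch 0, then weights[1]
    items to branch 1, and so on, repeating until the input is exhausted."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes positive weights")
    branches: List[list] = [[] for _ in weights]
    pos = 0
    while pos < len(items):
        for branch, w in zip(branches, weights):
            branch.extend(items[pos:pos + w])
            pos += w
    return branches


def roundrobin_join(branches: Sequence[Sequence], weights: Sequence[int]) -> list:
    """Collect weights[i] items from branch i in turn until all are drained."""
    if len(branches) != len(weights) or any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes one positive weight per branch")
    out: list = []
    cursors = [0] * len(branches)
    while any(c < len(b) for c, b in zip(cursors, branches)):
        for i, w in enumerate(weights):
            out.extend(branches[i][cursors[i]:cursors[i] + w])
            cursors[i] += w
    return out


if __name__ == "__main__":
    stream = list(range(10))
    left, right = roundrobin_split(stream, (2, 1))
    print(left, right)
    # Each branch could now be processed independently before joining.
    print(roundrobin_join([left, right], (2, 1)))
```

Because the split pattern is fixed by the weights, the branches have no hidden dependence on each other, which is what lets a compiler map them to different cores.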

15 Task Parallelism
  Thread (fork/join) parallelism
  Parallelism explicit in algorithm
  Between filters without producer/consumer relationship
Data Parallelism
  Data parallel loop (forall)
  Between iterations of a stateless filter
  Can't parallelize filters with state
Pipeline Parallelism
  Usually exploited in hardware
  Between producers and consumers
  Stateful filters can be parallelized
[Figure: MPEG-2 decoder block diagram]
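The distinction between data parallelism (only for stateless filters) and pipeline parallelism (also valid for stateful filters) can be sketched in Python. Everything named here is an assumption made for illustration: `clamp` with its 0 to 255 bounds, `RunningSum`, and the two helper functions are hypothetical, and Python threads show the structure rather than a real speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def clamp(x: int) -> int:
    """Stateless filter: the output depends only on the current input."""
    return max(0, min(255, x))  # bounds are illustrative


class RunningSum:
    """Stateful filter: each output depends on every earlier input."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, x: int) -> int:
        self.total += x
        return self.total


def data_parallel(stateless_filter, items, workers: int = 4) -> list:
    """Apply a stateless filter to many items at once; map keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(stateless_filter, items))


_DONE = object()  # sentinel marking the end of the stream


def pipeline_parallel(stages, items) -> list:
    """Run each stage in its own thread, linked producer-to-consumer by FIFO
    queues. Each stage sees items in order, so stateful stages stay correct."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run(stage, inbox, outbox) -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=run, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    data = [-5, 10, 300]
    print(data_parallel(clamp, data))
    print(pipeline_parallel([clamp, RunningSum()], data))
```

Passing `RunningSum()` to `data_parallel` would be unsafe, since concurrent workers would race on `total`; in the pipeline version only one thread ever touches that state.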

16 On a 16 core MIT Raw Processor (http://cag.csail.mit.edu/raw)

17 Don't modify a code segment if…
  The performance impact is insignificant and is isolated from the rest
  Automatic parallelizer works perfectly
Modify and annotate a segment if…
  Automatic parallelizer needs a little help
Otherwise rewrite the segment
  Program Reincarnation
  A new body with the same old soul
Still in Existing Sequential Languages
Use a Parallel Language

18 Static analysis
Dynamic analysis
  Managed program execution
  Program invariant inference
Application knowledge database
Assisted parallelization GUI tool
Correctness in reincarnated
  Test generation
  Divergence analysis
Static analysis
  Automatic parallelization info for program understanding
Learn about the domain
  Flag domain-specific issues
  Generate domain-specific hints
Bring programs to modern age
  Block diagram
  Refactoring identification
[Figure: Assisted Application Reincarnation Tool. Inputs: legacy program source file (.c), original compiler, original binary (.exe). Components: Automatic Parallelization, Static Analysis, Compiler & Instrumenter, Instrumenter and Binary Interpreter, Managed Program Execution (.log), Program Invariant Inference Engine, Application Knowledge (program representation & invariants), Known Idiom Identification & Domain Hint Generation, Domain Knowledge Database, Domain Knowledge Extraction, Test Generation, Divergence Analysis, Refactoring Identification, Block Diagram Representation. Outputs: reincarnated .c and .exe]

19 Multicore menace will impact all of us in a big way
Parallelism needs to keep up with Moore's curve
Will definitely need new parallel languages where parallelism is the primary form of composition
Low-hanging fruit when parallelism is the natural form of expression
However, cannot ignore the past investments

20 © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

