Multiprocessing & The .Net Parallel Extensions. Guy Ben Haim, Senior Application Engineer, Intel; Asaf Shelly.

Presentation transcript:

Multiprocessing & The .Net Parallel Extensions. Guy Ben Haim, Senior Application Engineer, Intel. Asaf Shelly, Senior Consultant, Pacific Software.

Session Objectives and Agenda: Multicore; Parallel Software; .Net Parallel Extensions; Q&A; Summary.

What is Multicore? [Diagram labels: Pentium Processor, Dual Core, Quad Core.]

Moore’s Law – GHz to Multicore. [Chart labels: Performance; 2006; Performance Through Frequency; Performance Through Multi-Core. Intel MC Assistance: Threading, Multi-tasking, Training, Tools.]

Intel Processor Advancement Multiple execution cores ramping across Intel platforms

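Why Multi Core? [Chart: power and performance at 2 GHz, the 100% baseline.]

CPU that is 20% Faster: 2 GHz = 100% power, 100% performance; 2.4 GHz = 174% power, 113% performance.

CPU that is 20% Slower: 1.6 GHz = 50% power, 87% performance; 2 GHz = 100% power, 100% performance; 2.4 GHz = 174% power, 113% performance.

Multi Core: Energy Efficient Performance: multi-core at 1.6 GHz = 100% power, 174% performance; single core at 2 GHz = 100% power, 100% performance; single core at 2.4 GHz = 174% power, 113% performance.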
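
The power percentages on these slides are consistent with a common rule of thumb, which is an assumption here and not something the slides state: power grows roughly with the cube of the clock ratio, because voltage is scaled together with frequency. A minimal sketch in C# (top-level statements, C# 9 or later) that reproduces the figures under that assumption. The helper name RelativePower, the two-core count, and the perfect-scaling step are all hypothetical; the 87% performance figure is taken from the slide.

```csharp
using System;

// Assumption (not from the slides): power scales with the cube of the
// clock ratio, and every extra core adds the same amount of power.
double RelativePower(double clockRatio, int cores) =>
    cores * Math.Pow(clockRatio, 3);

double faster = RelativePower(2.4 / 2.0, 1);   // about 1.73; the slide shows 174%
double slower = RelativePower(1.6 / 2.0, 1);   // about 0.51; the slide shows 50%
double twoSlow = RelativePower(1.6 / 2.0, 2);  // about 1.02; the slide shows 100%

// Two 1.6 GHz cores, assuming the 87% single-core figure scales perfectly.
double twoSlowPerformance = 2 * 0.87;          // 1.74; the slide shows 174%

Console.WriteLine($"{faster:F2} {slower:F2} {twoSlow:F2} {twoSlowPerformance:F2}");
```

Under this model, two slower cores deliver about 1.7 times the throughput for roughly the power of one 2 GHz core, but only if the software can actually use both cores.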

What Does Multi Core Mean? Performance.

Start Thinking Parallel Software Today. Instructions – Assembly – Making it work (thinking like a CPU). Functions – C, Pascal, Basic – Faster code (procedural thinking). Objects – C++, Java, C#, Delphi, VB – Manage code (OOD, OOP: thinking in objects). Tasks – ? – Optimize runtime (thinking parallel).

Situation Today: Experts, freelance specialists, skilled groups. API is not intuitive. Hard to understand execution flow. Problematic design patterns. Little awareness of tools. Hidden problems. Hard to test and debug.

Understanding Parallel Computing: Resources Ownership; Global data / Shared data; Collisions and Race Conditions; Task Design; Conjunction Points.

Task Oriented Design. [Diagram labels: Modify, Write, Open, Modify, Scan.]

Simple For:
for ( int y = 0; y < bmp.Height; y++ )
{
    for ( int x = 0; x < bmp.Width; x++ )
    {
        Pixels[ x, y ] = bmp.GetPixel( x, y );
    }
}

Parallel For:
Parallel.For( 0, bmp.Height, y =>
{
    for ( int x = 0; x < bmp.Width; x++ )
    {
        Pixels[ x, y ] = bmp.GetPixel( x, y );
    }
});
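
A minimal runnable sketch of the same row-parallel loop, assuming the Parallel.For that shipped in System.Threading.Tasks with .NET 4. The bitmap is replaced by a plain array (the names source, pixels, width and height are hypothetical), since GDI+ bitmap objects are generally not safe to read from several threads at once.

```csharp
using System;
using System.Threading.Tasks;

const int width = 64, height = 48;

// Stand-in for the bitmap on the slide: a plain array can be read
// from several threads at the same time.
int[,] source = new int[width, height];
for (int row = 0; row < height; row++)
    for (int col = 0; col < width; col++)
        source[col, row] = col + row * width;

int[,] pixels = new int[width, height];

// Each row y is an independent unit of work. No two iterations write
// to the same element, so the loop body needs no locking.
Parallel.For(0, height, y =>
{
    for (int x = 0; x < width; x++)
    {
        pixels[x, y] = source[x, y];
    }
});

Console.WriteLine(pixels[width - 1, height - 1]);
```

Only the outer loop is parallelized; a row is usually a large enough chunk of work to outweigh the scheduling cost, which a single pixel would not be.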

.Net Parallel Extensions - Performance

Parallel Class: Parallel.For, Parallel.Do, Parallel.ForEach. Inplace code / Function. Object Type.

Parallel Do. Parallel Quick Sort:
void QuicksortParallel(,, )
{
    int pivot = Partition(arr, left, right);
    Parallel.Do(
        () => QuicksortParallel(arr, left, pivot - 1),
        () => QuicksortParallel(arr, pivot + 1, right));
}
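
In the released .NET 4 library the preview's Parallel.Do became Parallel.Invoke. The slide omits the parameter list and the body of Partition, so the sketch below fills them with assumptions: an int[] element type, a Lomuto-style Partition, a base case, and a serial cutoff whose value (2048) is a guess and not a tuned number.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical Partition (Lomuto scheme): moves the last element of the
// range to its final sorted position and returns that position.
int Partition(int[] arr, int left, int right)
{
    int pivotValue = arr[right];
    int store = left;
    for (int i = left; i < right; i++)
    {
        if (arr[i] < pivotValue)
        {
            (arr[i], arr[store]) = (arr[store], arr[i]);
            store++;
        }
    }
    (arr[store], arr[right]) = (arr[right], arr[store]);
    return store;
}

void QuicksortParallel(int[] arr, int left, int right)
{
    if (left >= right) return;             // zero or one element: nothing to do

    if (right - left < 2048)               // small range: sort serially
    {
        Array.Sort(arr, left, right - left + 1);
        return;
    }

    int pivot = Partition(arr, left, right);

    // The two halves cover disjoint index ranges, so they may run concurrently.
    Parallel.Invoke(
        () => QuicksortParallel(arr, left, pivot - 1),
        () => QuicksortParallel(arr, pivot + 1, right));
}

var random = new Random(42);
int[] data = new int[100_000];
for (int k = 0; k < data.Length; k++) data[k] = random.Next();

QuicksortParallel(data, 0, data.Length - 1);
Console.WriteLine($"{data[0]} <= {data[data.Length - 1]}");
```

The serial cutoff keeps the number of tiny tasks down; without it, the cost of scheduling each two-element range can exceed the cost of sorting it.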

PLINQ

.Net Parallel Extensions – PLINQ
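
A minimal PLINQ sketch, assuming the AsParallel extension method from System.Linq as shipped in .NET 4. The prime-counting workload and the helper IsPrime are hypothetical stand-ins for whatever the demo ran.

```csharp
using System;
using System.Linq;

// Hypothetical CPU-bound test applied to every element.
bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// AsParallel() opts the query into PLINQ; the rest is ordinary LINQ.
int primeCount = Enumerable.Range(1, 100_000)
    .AsParallel()
    .Where(n => IsPrime(n))
    .Count();

Console.WriteLine(primeCount);
```

The query stays declarative: the runtime decides how to split the range across cores, and removing AsParallel() gives back the sequential version with the same result.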

Task Parallel Library: Parallel For, Do, ForEach; PLINQ; Tasks over Threads; Tasks over Cores; TaskManager; Conjunction Points.

.Net Parallel Extensions – Task Parallel Library
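
A small sketch of tasks that run independently and then meet, assuming Task.Run and Task.WaitAll from the released library (Task.Run is available from .NET 4.5). Reading "conjunction points" as the places where tasks join is an interpretation, and the names lowerHalf and upperHalf are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] data = Enumerable.Range(1, 1000).ToArray();

// Two independent tasks. The scheduler, not the caller, decides which
// threads and cores execute them.
Task<long> lowerHalf = Task.Run(() => data.Take(500).Sum(v => (long)v));
Task<long> upperHalf = Task.Run(() => data.Skip(500).Sum(v => (long)v));

// Join point: nothing below this line runs until both tasks have finished.
Task.WaitAll(lowerHalf, upperHalf);
long total = lowerHalf.Result + upperHalf.Result;

Console.WriteLine(total);
```

Each task only reads the shared array and returns its own result, so the only coordination needed is the single wait at the end.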

.Net Parallel Extensions – RayTracer

Tips: Shared data are globals. Parallel loops are not loops. Define data as loop-internal. Race conditions are still here. Don’t use locks! Don’t use mutexes.

Threading Tools: Intel® Thread Checker, used to create correct multi-threaded code. Intel® Thread Profiler, used to analyze performance. Intel Software Solutions Group.

Data Race Example: Serial program. S1: x = 1.0; y = 2.0; A1 = 0; S2: A1 = x * y; S3: A_SUM = 2 * A1; What is the value of A_SUM? A_SUM = 4.

Data Race Example (cont.): Initially x = 1.0; y = 2.0; A1 = 0. Thread1: A1 = x * y. Thread2: A_SUM = 2 * A1. What is the value of A_SUM if Thread1 runs before Thread2? A_SUM = 4. If Thread2 runs before Thread1? A_SUM = 0. Execution order is not guaranteed.
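
One way to follow the "loop-internal data" tip is the Parallel.For overload with per-worker state that shipped in .NET 4. A minimal sketch, assuming that overload; the array values and the summing workload are hypothetical. Each worker accumulates into a private subtotal, and the shared total is touched once per worker with an atomic add, not with a lock around every iteration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int[] values = new int[1_000_000];
for (int k = 0; k < values.Length; k++) values[k] = k % 10;

long total = 0;   // shared by all workers: treat it like a global

Parallel.For<long>(
    0, values.Length,
    () => 0L,                                          // private subtotal per worker
    (i, loopState, subtotal) => subtotal + values[i],  // touches only private data
    subtotal => Interlocked.Add(ref total, subtotal)); // one atomic merge per worker

Console.WriteLine(total);
```

Writing `total += values[i]` directly in the loop body would be a race condition; the per-worker subtotal removes the sharing instead of guarding it.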
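
The two outcomes can be reproduced deterministically by running the two statements in a chosen order, and the race can be removed by stating the dependency explicitly. A minimal sketch: RunInOrder and RunWithDependency are hypothetical names, and the continuation-based fix is one option among several, not something the slides prescribe.

```csharp
using System;
using System.Threading.Tasks;

// Runs the two statements from the slide in a chosen order. No real
// threads are involved, so each ordering gives a repeatable result.
double RunInOrder(bool thread1First)
{
    double x = 1.0, y = 2.0, a1 = 0, aSum = 0;
    Action thread1 = () => a1 = x * y;
    Action thread2 = () => aSum = 2 * a1;

    if (thread1First) { thread1(); thread2(); }
    else              { thread2(); thread1(); }
    return aSum;
}

// Making the second step a continuation of the first guarantees that
// A1 has been computed before A_SUM reads it.
double RunWithDependency()
{
    double x = 1.0, y = 2.0;
    Task<double> sumTask = Task.Run(() => x * y)
                               .ContinueWith(a1 => 2 * a1.Result);
    return sumTask.Result;
}

Console.WriteLine($"{RunInOrder(true)} {RunInOrder(false)} {RunWithDependency()}");
```

With real threads either ordering can occur from one run to the next, which is why this kind of bug tends to pass testing and surface later.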

Intel® Thread Checker Diagnostics

Source Code Viewer

Performance Profile. [Chart: speedup versus number of threads.] Possible causes for this scalability profile: 1. Insufficient parallel work; 2. Load imbalance; 3. Synchronization overhead; 4. Memory bandwidth limitations.

Finding Serial and Parallel Time

Load Imbalance: Multithreading should be managed. Programming should consider load imbalance.

Load Imbalance: Unequal workloads lead to idle threads and wasted time. [Timeline: busy and idle time for Thread 0 to Thread 3, from "Start threads" to "Join threads".]

Synchronization: Programming should consider synchronization issues.

Synchronization: By definition, synchronization serializes execution. Lock contention means more idle time for threads. [Timeline: busy, idle, and in-critical-section time for Thread 0 to Thread 3.]

Real example, before fix. [Chart labels: Serial, Parallel, Switching Overhead.]

Real example, after fix. [Chart labels: Serial, Parallel.] 2x speedup.

Summary: Parallelize or Perish!
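
The effect can be illustrated with a small simulation; it does not measure real threads. It assumes 16 work items where item i costs i time units, and 4 workers. A static split gives each worker one contiguous block, while a dynamic split hands the next item to whichever worker becomes idle first. All names and numbers are hypothetical.

```csharp
using System;
using System.Linq;

int[] costs = Enumerable.Range(1, 16).ToArray();  // item i takes i time units
const int workers = 4;

// Static split: each worker gets one contiguous block of items, so the
// worker holding the most expensive block finishes last.
int StaticFinishTime()
{
    int blockSize = costs.Length / workers;
    return Enumerable.Range(0, workers)
        .Max(w => costs.Skip(w * blockSize).Take(blockSize).Sum());
}

// Dynamic split: the worker that becomes idle first takes the next item.
int DynamicFinishTime()
{
    int[] busyUntil = new int[workers];
    foreach (int cost in costs)
    {
        int next = Array.IndexOf(busyUntil, busyUntil.Min());
        busyUntil[next] += cost;
    }
    return busyUntil.Max();
}

Console.WriteLine($"static: {StaticFinishTime()}, dynamic: {DynamicFinishTime()}");
```

The total work is 136 units in both cases, so a perfect split would finish at 34. The static split finishes at 58 because three workers sit idle while the fourth works through the expensive tail; the dynamic split finishes at 40.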

Do we really want Parallel Code? Do users even care?

Change in Mindset: Everything is stopped, waiting for the photographer. Everyone is working independently.

Developers are writing functions. Developers are managing tasks.

Doing things the way we always have. Things are going to be different.

Keep yourself in the loop: Public event by Pacific Software. Register for the User Group. The Asynchronous Operations Web Site has all the online resources that you need... and more. Register for my five-day course titled Multiprocessing Traps and Pitfalls. Use our poster to let people know that you know.

Resources: Download the Microsoft .Net Parallel Extensions: bc7f180ba&displaylang=en. Asynchronous Operations Web Site. Intel’s Multicore. Pacificsoft Training and Consulting. Microsoft Forum for Parallel Computing.

Make a difference: Let us know what you think. Feedback for the .Net Parallel Extensions Dev team. Video blog about parallel computing. Fill in the feedback form …

It pays to fill in feedback! How? Through the email sent at the end of each day, at the Business Center in the HP area, or at the internet stations in the Hilton and Dan hotels. Filled in feedback? You get a Live It! shirt. Filled in feedback on all three days of the conference? You have a chance to win a flight ticket to Thailand courtesy of the BTC agency, a BlackJack device courtesy of Samsung, an HTC device courtesy of Newpan, a media center courtesy of DataSafe, and more...

© 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.