Recognizing Potential Parallelism: Introduction to Parallel Programming, Part 1


What Is Parallel Computing?
An attempt to speed the solution of a particular task by:
1. Dividing the task into sub-tasks
2. Executing the sub-tasks simultaneously on multiple processors
A successful attempt requires both:
1. An understanding of where parallelism can be effective
2. Knowledge of how to design and implement good solutions
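To make the two requirements concrete, here is a minimal C sketch (my illustration, not part of the original deck) that divides an array summation into sub-tasks and executes them simultaneously with POSIX threads; the array contents and thread count are arbitrary choices for the example.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NUM_THREADS 2

    static double data[N];

    typedef struct { int start, end; double partial; } task_t;

    /* Sub-task: sum one slice of the array. */
    static void *sum_slice(void *arg) {
        task_t *t = (task_t *)arg;
        t->partial = 0.0;
        for (int i = t->start; i < t->end; i++)
            t->partial += data[i];
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        task_t tasks[NUM_THREADS];

        for (int i = 0; i < N; i++) data[i] = 1.0;   /* sample data */

        /* Divide the task into sub-tasks and run them simultaneously. */
        int chunk = N / NUM_THREADS;
        for (int i = 0; i < NUM_THREADS; i++) {
            tasks[i].start = i * chunk;
            tasks[i].end   = (i == NUM_THREADS - 1) ? N : (i + 1) * chunk;
            pthread_create(&threads[i], NULL, sum_slice, &tasks[i]);
        }

        /* Combine the sub-task results. */
        double total = 0.0;
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_join(threads[i], NULL);
            total += tasks[i].partial;
        }
        printf("sum = %f\n", total);
        return 0;
    }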

Clock Speeds Have Flattened Out
Problems caused by higher clock speeds:
- Excessive power consumption
- Heat dissipation
- Current leakage
Power consumption is critical for mobile devices, and mobile computing platforms are increasingly important: retail laptop sales now exceed desktop sales, and laptops may be 35% of the PC market in 2007.

Multi-core Architectures
Potential performance = CPU speed × number of CPUs
Strategy: limit CPU speed and sophistication, and put multiple CPUs ("cores") on a single chip; the potential performance stays the same. For example, two cores each running at half the original clock speed have the same potential performance as one core at full speed.

Concurrency vs. Parallelism
Concurrency: two or more threads are in progress at the same time.
Parallelism: two or more threads are executing at the same time; multiple cores are needed.
(Figure: Thread 1 and Thread 2 interleaved over time to show concurrency, and Thread 1 and Thread 2 running simultaneously to show parallelism.)
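To make the distinction concrete, here is a small C sketch of my own (not code from the slides): it starts two threads, which are concurrent by construction; whether they also run in parallel depends on the machine having more than one core available.

    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg) {
        /* Each thread prints its own label a few times. */
        const char *name = (const char *)arg;
        for (int i = 0; i < 3; i++)
            printf("%s: step %d\n", name, i);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;

        /* Two threads are now "in progress" at the same time: concurrency.
           If the OS schedules them on different cores, their instructions
           execute at the same instant: parallelism. */
        pthread_create(&t1, NULL, worker, "Thread 1");
        pthread_create(&t2, NULL, worker, "Thread 2");

        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }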

Improving Performance
Use parallelism to improve turnaround or throughput.
Examples:
- Automobile assembly line: each worker does an assigned function
- Searching for pieces of Skylab: divide up the area to be searched
- US Postal Service: post office branches, mail sorters, delivery

Turnaround
Complete a single task in the smallest amount of time.
Example: setting a dinner table
- One person puts down plates
- One folds and places napkins
- One places utensils
- One places glasses

Throughput
Complete more tasks in a fixed amount of time.
Example: setting up banquet tables
- Multiple waiters, each doing separate tables
- Specialized waiters for plates, glasses, utensils, etc.

Methodology
- Study the problem, sequential program, or code segment
- Look for opportunities for parallelism
- Try to keep all processors busy doing useful work

Ways of Exploiting Parallelism
- Domain decomposition
- Task decomposition

Domain Decomposition
First, decide how data elements should be divided among processors.
Second, decide which tasks each processor should be doing.
Example: vector addition
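A minimal C/OpenMP sketch of the vector-addition example (my illustration; the vector length and values are placeholders): the data elements are divided among the threads, and every thread performs the same task, adding its share of elements.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Domain decomposition: the iteration space (the data) is split
           among the threads; each thread computes c = a + b on its own
           portion of the vectors. Compile with an OpenMP flag, e.g. -fopenmp. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }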

Domain Decomposition
Works well for large data sets whose elements can be computed independently: divide the data and the associated computation among threads.
Example: grading test papers
- Multiple graders, each with the same answer key
- What if different keys are needed?

Domain Decomposition
Example: find the largest element of an array.
(Animated slides: the array is divided into four contiguous chunks, one per core; Core 0, Core 1, Core 2, and Core 3 each scan their own chunk, and the per-core results are combined into the overall maximum.)
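A hedged C/OpenMP sketch of what those animation frames depict (the array size and sample values are my own choices): each thread finds the maximum of its chunk, and a reduction combines the partial maxima.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static int a[N];

        for (int i = 0; i < N; i++)
            a[i] = (i * 2654435761u) % 1000000;   /* arbitrary sample values */

        int largest = a[0];

        /* Each thread scans its share of the array for a local maximum;
           the reduction clause combines the local maxima into one result.
           Requires an OpenMP 3.1+ compiler; build with e.g. -fopenmp. */
        #pragma omp parallel for reduction(max:largest)
        for (int i = 1; i < N; i++)
            if (a[i] > largest)
                largest = a[i];

        printf("largest = %d\n", largest);
        return 0;
    }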

Task (Functional) Decomposition
First, divide the tasks among processors.
Second, decide which data elements are going to be accessed (read and/or written) by which processors.
Example: an event handler for a GUI

Task Decomposition
Divide the computation based on a natural set of independent tasks, and assign the data each task needs.
Example: paint-by-numbers
- Painting a single color is a single task
- Number of tasks = number of colors
- With two artists, one does the even-numbered colors and the other the odd-numbered ones

Task Decomposition
(Animated slides: the functions f(), g(), h(), q(), r(), and s() are divided among Core 0, Core 1, and Core 2, with each core executing the functions assigned to it.)
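As an illustrative sketch only (the slides do not show code), independent functions like these can be assigned to different threads with OpenMP sections; the function bodies below are stand-ins I invented for the example.

    #include <omp.h>
    #include <stdio.h>

    /* Placeholder tasks standing in for the f(), g(), h(), ... of the slides. */
    static void f(void) { printf("f on thread %d\n", omp_get_thread_num()); }
    static void g(void) { printf("g on thread %d\n", omp_get_thread_num()); }
    static void h(void) { printf("h on thread %d\n", omp_get_thread_num()); }

    int main(void) {
        /* Task decomposition: each section is an independent task that the
           runtime hands to one of the available threads (cores). */
        #pragma omp parallel sections
        {
            #pragma omp section
            f();
            #pragma omp section
            g();
            #pragma omp section
            h();
        }
        return 0;
    }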

Recognizing Sequential Processes
Time is inherently sequential:
- Dynamics and real-time, event-driven applications are often difficult to parallelize effectively; many games fall into this category
Iterative processes:
- The results of one iteration depend on the preceding iteration; audio encoders fall into this category
Pregnancy is inherently sequential: adding more people will not shorten the gestation.

Summary
- Clock speeds will not increase dramatically
- Parallelism takes full advantage of multi-core processors, improving application turnaround or throughput
- Two methods for implementing parallelism: domain decomposition and task decomposition