Parallel Software Development with Intel Threading Analysis Tools

Slides:



Advertisements
Similar presentations
Parallelism Lecture notes from MKP and S. Yalamanchili.
Advertisements

Threads Relation to processes Threads exist as subsets of processes Threads share memory and state information within a process Switching between threads.
Starting Parallel Algorithm Design David Monismith Based on notes from Introduction to Parallel Programming 2 nd Edition by Grama, Gupta, Karypis, and.
INSTITUTE OF COMPUTING TECHNOLOGY An Adaptive Task Creation Strategy for Work-Stealing Scheduling Lei Wang, Huimin Cui, Yuelu Duan, Fang Lu, Xiaobing Feng,
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
1 An Evolution of Pattern Matching within Network Intrusion Detection Systems Erik Anderson 9 November 2006.
Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.
CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware
Submitters:Vitaly Panor Tal Joffe Instructors:Zvika Guz Koby Gottlieb Software Laboratory Electrical Engineering Faculty Technion, Israel.
Multi-core processors. History In the early 1970’s the first Microprocessor was developed by Intel. It was a 4 bit machine that was named the 4004 The.
Rechen- und Kommunikationszentrum (RZ) Parallelization at a Glance Christian Terboven / Aachen, Germany Stand: Version 2.3.
Multi-core Programming Thread Profiler. 2 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Topics Look at Intel® Thread Profiler features.
Multi-core Programming for Academia Intel Software College.
Multi-core Programming Threading Concepts. 2 Basics of VTune™ Performance Analyzer Topics A Generic Development Cycle Case Study: Prime Number Generation.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Threads by Dr. Amin Danial Asham. References Operating System Concepts ABRAHAM SILBERSCHATZ, PETER BAER GALVIN, and GREG GAGNE.
Introduction, background, jargon Jakub Yaghob. Literature T.G.Mattson, B.A.Sanders, B.L.Massingill: Patterns for Parallel Programming, Addison- Wesley,
Multi-core Programming Threading Methodology. 2 Topics A Generic Development Cycle.
Parallel Processing Sharing the load. Inside a Processor Chip in Package Circuits Primarily Crystalline Silicon 1 mm – 25 mm on a side 100 million to.
Computer Network Lab. Korea University Computer Networks Labs Se-Hee Whang.
Data Parallelism Task Parallel Library (TPL) The use of lambdas Map-Reduce Pattern FEN 20141UCN Teknologi/act2learn.
Single Node Optimization Computational Astrophysics.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
1 How to do Multithreading First step: Sampling and Hotspot hunting Myongji University Sugwon Hong 1.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 4: Threads.
Parallel Computing Presented by Justin Reschke
Tuning Threaded Code with Intel® Parallel Amplifier.
Lecture 5. Example for periority The average waiting time : = 41/5= 8.2.
Chapter 4 – Thread Concepts
Chapter 4: Threads Modified by Dr. Neerja Mhaskar for CS 3SH3.
Introduction to threads
Using the VTune Analyzer on Multithreaded Applications
These slides are based on the book:
Chapter 4: Multithreaded Programming
PARALLEL COMPUTING Submitted By : P. Nagalakshmi
Welcome: Intel Multicore Research Conference
Multi-core processors
Chapter 4 – Thread Concepts
Pattern Parallel Programming
A task-based implementation for GeantV
Task Scheduling for Multicore CPUs and NUMA Systems
Multi-core processors
Parallel Algorithm Design
Core i7 micro-processor
Morgan Kaufmann Publishers
Challenges in Concurrent Computing
EE 193: Parallel Computing
Multi-Processing in High Performance Computer Architecture:
Chapter 3: Principles of Scalable Performance
Intel® Parallel Studio and Advisor
Chapter 4: Threads.
Chapter 4: Threads.
Parallel Processing Sharing the load.
Compiler Back End Panel
Summary Background Introduction in algorithms and applications
Compiler Back End Panel
Hybrid Programming with OpenMP and MPI
CHAPTER 4:THreads Bashair Al-harthi OPERATING SYSTEM
Multithreaded Programming
Parallelism and Amdahl's Law
Multithreading Why & How.
Chapter 4: Threads & Concurrency
Chapter 4: Threads.
Lecture 2 The Art of Concurrency
Introduction, background, jargon
Mattan Erez The University of Texas at Austin
Author: Xianghui Hu, Xinan Tang, Bei Hua Lecturer: Bo Xu
Parallel Implementation of Adaptive Spacetime Simulations A
Presentation transcript:

Parallel Software Development with Intel Threading Analysis Tools Author: Lihui Wang, Xianchao Xu Publisher: Intel Technology Journal Presenter: 歐子揚, 劉楷陽 Date: 2013/01/17

Outline Introduction Principles of Parallel Application Design Challenges of Parallel Programming Multiple Pattern Matching Algorithm Parallelization with the Win32 Threading Library Parallel Computing with Intel TBB Results

Introduction As multi-core processors become mainstream in the market place, software needs to be parallel to take advantage of multiple cores. In order to make life easier for developers, Intel provides a set of threading tools targeting various phases of the development cycle. In a generic development cycle, program development can be divided into four phases : Analysis phase Design/implementation phase Debug phase Testing/tuning phase

Intel’s threading tools provide aids for developers from performance analysis to implementation and debugging: Intel VTune Performance Analyzer Intel Thread Profiler Intel Thread Checker Intel Threading Building Blocks (Intel TBB)

Principles of Parallel Application Design Decomposition Techniques Functional decomposition Data decomposition Parallel Models Data parallel model Task parallel model Hybrid models

Amdahl’s Law

Challenges of Parallel Programming Parallel Overhead Synchronization Load Balance Granularity

Multiple Pattern Matching Algorithm Aho-Corasick Boyer-Moore Algorithm

Parallelization with the Win32 Threading Library

383.353ms -> 339.553ms

339.553ms -> 255.500ms

255.500ms -> 252.643ms 252.643ms -> 248.702ms The scalability compared to serial processing is 383.353/248.702 = 1.54 (the ideal is 1.867)

Parallel Computing with Intel TBB While the Win32 threading library gives programmers great flexibility by giving them detailed control over threads, the library brings challenges for them too. To make it easier for programmers to realize parallelism, Intel TBB provides a high-level generic implementation of parallel patterns and concurrent data structures.

Results

The average Scalability tbb=1. 655, and the average Scalabilitywin32=1 Considering the ideal scalability is 1.867, Intel TBB demonstrates good scaling performance on a dual core system.