CS4961 Parallel Programming. Lecture 10: Thread Building Blocks. Mary Hall, September 24, 2010.

Administrative
- Programming assignment 2 is posted (after class); due Thursday, October 8 before class
- Use the "handin" program on the CADE machines
- Use the following command: "handin cs4961 prog2 "
- Mailing list set up:
- Remote access to Windows machines: term.coe.utah.edu

Today's Lecture
- Project 2
- Thread Building Blocks
Sources for this lecture:
- Intel Academic Partners program (see other slides)

Project 2, Part I: OpenMP

Problem 1 (Data Parallelism): The code from the last assignment models a sparse matrix-vector multiply (updated in sparse_matvec.c). The matrix is sparse in that many of its elements are zero. Rather than representing all of these zeros, which wastes storage, the code uses a representation called Compressed Row Storage (CRS), which stores only the nonzeros, with auxiliary data structures to track their locations in the full array. Given youwrite.c, develop an OpenMP implementation of this code for 4 threads. You will also need to modify the initialization code as described below and add timer functions. Evaluate the three scheduling mechanisms, static, dynamic, and guided, each for two different chunk sizes of your choosing. I have provided three input matrices, sm1.txt, sm2.txt, and sm3.txt, which were generated from the MatrixMarket (see the MatrixMarket web site). Their format is a sorted coordinate representation (row, col, value), which will need to be converted to CRS. Measure the execution time of the sequential code and all three parallel versions, for all three data set sizes and both chunk sizes. You will turn in the code and a brief README file with the 21 different timings and an explanation of which strategies performed best and why.
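For reference, here is a minimal sketch of what a CRS matrix-vector multiply with an OpenMP schedule clause can look like. The array names (val, col, rowptr) and the specific schedule shown are illustrative assumptions, not the names or settings used in sparse_matvec.c:

    #include <omp.h>

    // CRS sparse matrix-vector multiply: y = A*x.
    //   val[k]    = k-th nonzero value
    //   col[k]    = column index of the k-th nonzero
    //   rowptr[i] = index in val/col where row i begins (rowptr[n] = nnz)
    void spmv_crs(int n, const double *val, const int *col,
                  const int *rowptr, const double *x, double *y)
    {
        // Rows have different nonzero counts, so load balance depends on
        // the schedule; the assignment compares static, dynamic, and
        // guided, each with two chunk sizes (dynamic with chunk 8 shown).
        #pragma omp parallel for schedule(dynamic, 8)
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                sum += val[k] * x[col[k]];
            y[i] = sum;
        }
    }

Only the schedule(...) clause needs to change between the six parallel variants; everything else stays the same across the timed runs.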

Project 2, cont.

Part I: OpenMP, cont.

Problem 2 (Task Parallelism): Producer-consumer codes represent a common form of task parallelism, in which one task "produces" values that another task "consumes". It is often used with a stream of data to implement pipeline parallelism. The program prodcons.c implements a sequential producer/consumer application in which the producer generates array elements and the consumer sums their values. You should use OpenMP parallel sections to implement this producer-consumer model. You will also need a shared queue between the producer and consumer tasks to hold partial data, and synchronization to control access to the queue. Create two parallel versions: one producing/consuming one value at a time, and one producing/consuming 128 values at a time. Measure the performance of the sequential code and the two parallel implementations and include these measurements in your README file.
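A minimal sketch of the sections-based structure this problem asks for, assuming a queue guarded by critical sections; the variable names, the problem size N, and the busy-wait loop are illustrative, not the prodcons.c code:

    #include <omp.h>
    #include <stdio.h>
    #include <queue>

    int main()
    {
        std::queue<int> q;      // shared queue of partial data
        bool done = false;      // producer sets this when finished
        long sum = 0;
        const int N = 1 << 20;  // illustrative problem size

        #pragma omp parallel sections
        {
            #pragma omp section  // producer: generates the values
            {
                for (int i = 0; i < N; i++) {
                    #pragma omp critical
                    q.push(i);
                }
                #pragma omp critical
                done = true;
            }

            #pragma omp section  // consumer: sums the values
            {
                bool finished = false;
                while (!finished) {
                    int v = 0;
                    bool got = false;
                    #pragma omp critical
                    {
                        if (!q.empty()) { v = q.front(); q.pop(); got = true; }
                        else if (done)  { finished = true; }
                    }
                    if (got) sum += v;
                }
            }
        }
        printf("sum = %ld\n", sum);
        return 0;
    }

The 128-values-at-a-time version would push and pop blocks of elements under a single critical section, amortizing the synchronization cost; comparing the two is the point of the measurements.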

Project 2, cont.

Part II: Thread Building Blocks

As an Academic Alliance member, we have access to Intel assignments for Thread Building Blocks. We will use the assignments from Intel, with provided code that needs to be modified to use TBB constructs. You will turn in just your solution code.
- Problem 3 (Problem 1 in TBB.doc, using parallel_for). Summary: parallelize "mxm_serial.cpp".
- Problem 4 (Problem 3 in TBB.doc, using recursive tasks). Summary: modify the implementation in rec_main.cpp. All relevant files are prepended with rec_ to avoid conflicts.
- Problem 5 (Problem 4 in TBB.doc, using the concurrent_hash_map container). Summary: modify the implementation in chm_main.cpp. All relevant files are prepended with chm_ to avoid conflicts.
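As orientation for Problem 3, here is a minimal sketch of the parallel_for pattern applied to a square matrix multiply. The function name, row-major layout, and use of a C++11 lambda are assumptions; the Intel materials may use a functor class instead, and depending on the TBB version you may first need to construct a tbb::task_scheduler_init:

    #include "tbb/parallel_for.h"
    #include "tbb/blocked_range.h"

    // n x n matrix multiply, c = a * b, parallelized over rows.
    // TBB splits the [0, n) row range into subranges and hands each
    // subrange to a worker thread.
    void mxm_tbb(int n, const double *a, const double *b, double *c)
    {
        tbb::parallel_for(
            tbb::blocked_range<int>(0, n),
            [=](const tbb::blocked_range<int> &rows) {
                for (int i = rows.begin(); i != rows.end(); i++)
                    for (int j = 0; j < n; j++) {
                        double sum = 0.0;
                        for (int k = 0; k < n; k++)
                            sum += a[i * n + k] * b[k * n + j];
                        c[i * n + j] = sum;
                    }
            });
    }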

Project 2, cont.

Using OpenMP: You can do your development on any machines, using compilers available to you. However, the final measurements should be obtained on the quad-core systems in lab5. Here is how to enable OpenMP for gcc and icc:
- gcc: gcc -fopenmp prodcons.c
- icc: icc -openmp prodcons.c
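Since every problem asks for execution-time measurements, one portable option is OpenMP's wall-clock timer, omp_get_wtime(). A minimal sketch; where you place the timer calls around your own code is up to you:

    #include <omp.h>
    #include <stdio.h>

    int main()
    {
        double t0 = omp_get_wtime();   // start of timed region

        /* ... code under test (e.g., the sparse matvec) goes here ... */

        double t1 = omp_get_wtime();   // end of timed region
        printf("elapsed: %f seconds\n", t1 - t0);
        return 0;
    }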