The PFunc Implementation of NAS Parallel Benchmarks. Presenter: Shashi Kumar Nanjaiah Advisor: Dr. Chung E Wang Department of Computer Science California.


1 The PFunc Implementation of NAS Parallel Benchmarks. Presenter: Shashi Kumar Nanjaiah. Advisor: Dr. Chung E Wang. Department of Computer Science, California State University, Sacramento.

2 Overview. The goal of this project is to demonstrate the efficacy of task parallelism in PFunc for parallelizing industry-standard benchmark computation kernels and applications on shared-memory machines.
- Introduce PFunc, a new tool for task parallelism.
  - New features and extensions.
  - Fibonacci example.
- Introduce NAS parallel benchmarks.
  - Briefly explain the 7 benchmarks.

3 Background. PFunc: a new tool for task parallelism.
- Extends the existing task-parallel feature set.
  - Cilk, Threading Building Blocks, Fortran M, etc.
- Portable.
  - Linux, OS X, AIX and Windows.
- Customizable.
  - Generic and generative programming techniques.
  - No runtime penalty.
- C and C++ APIs.
- Released under Eclipse Public License v1.0.
- http://coin-or.org/projects/PFunc.xml

4 Example: Parallelizing Fibonacci numbers.

    typedef struct { int n; int fib_n; } fib_t;

    void fibonacci (void* arg) {
      fib_t* fib_arg = (fib_t*) arg;
      if (0 == fib_arg->n || 1 == fib_arg->n) {
        /* Base case: fib(0) = 0, fib(1) = 1. */
        fib_arg->fib_n = fib_arg->n;
      } else {
        pfunc_cilk_task_t fib_task;
        fib_t fib_n_1 = {(fib_arg->n) - 1, 0};
        fib_t fib_n_2 = {(fib_arg->n) - 2, 0};
        pfunc_cilk_task_init (&fib_task);
        pfunc_cilk_spawn_c (fib_task,   /* Handle to the task */
                            NULL,       /* Attribute -- use default */
                            NULL,       /* Group -- use default */
                            fibonacci,  /* Function to execute */
                            &fib_n_1);  /* Argument */
        fibonacci (&fib_n_2);           /* Compute fib(n-2) in the current task */
        pfunc_cilk_wait (fib_task);     /* Wait for the spawned fib(n-1) */
        pfunc_cilk_task_clear (&fib_task);
        fib_arg->fib_n = fib_n_1.fib_n + fib_n_2.fib_n;
      }
    }

5 Fibonacci: task creation overhead. Fibonacci number 37 (2^36 ≈ 69 billion tasks).
- 2x faster than TBB!
- Only 2x slower than Cilk.
  - But provides more flexibility!
- Fibonacci is the worst-case behavior.
- Library-based rather than a custom compiler.

Threads   Cilk Time (secs)   PFunc/Cilk   TBB/Cilk   PFunc/TBB
1         2.17               2.2178       4.4310     0.5004
2         1.15               2.1135       4.1924     0.5041
4         0.55               2.2131       4.4183     0.5009
8         0.28               2.2114       4.9839     0.4437
16        0.15               2.4944       5.9370     0.4201

6 NAS Parallel Benchmarks.
- NAS stands for NASA Advanced Supercomputing.
- Help to evaluate the performance of parallel tools and machines.
- Consist of 5 kernels and 3 pseudo-applications.
- Taken mostly from Computational Fluid Dynamics (CFD).
- Originally written in Fortran, but C versions are available.
  - http://www.nas.nasa.gov/Resources/Software/npb.html
- NPB OpenMP-C v2.3.
  - Base code taken from Omni group's implementation.

7 NAS Parallel Benchmarks.

Benchmark                       Explanation
Embarrassingly Parallel (EP)    Gaussian random variates. Marsaglia polar method.
Multigrid (MG)                  3-dimensional discrete Poisson equation.
Conjugate Gradient (CG)         Iterative solver for linear systems. Symmetric positive-definite matrices.
Integer Sort (IS)               Bucket sort.
LU Solver (LU)                  Lower-upper symmetric Gauss-Seidel. System of nonlinear equations.
Pentadiagonal solver (SP)       System of nonlinear equations.
Block tridiagonal solver (BT)   System of nonlinear equations.

8 Conclusion.
- Modify the data-parallel NPB OpenMP-C version into a task-parallel version.
- Compare against the original NPB OpenMP-C version.
  - For problem sizes in classes A, B and C.

