Frequent Itemset Mining on Graphics Processors, Fang et al., DaMoN 2009. Turbo-charging Vertical Mining of Large Databases, Shenoy et al., MOD 2000. NVIDIA Tesla: A Unified Graphics and Computing Architecture, Lindholm et al., IEEE Micro 2008. An Efficient Compression Scheme for Bitmap Indices, Wu et al., TR LBNL-49626.

Presentation transcript:

References:
- Frequent Itemset Mining on Graphics Processors, Fang et al., DaMoN 2009.
- Turbo-charging Vertical Mining of Large Databases, Shenoy et al., MOD 2000.
- NVIDIA Tesla: A Unified Graphics and Computing Architecture, Lindholm et al., IEEE Micro 2008.
- An Efficient Compression Scheme for Bitmap Indices, Wu et al., TR LBNL-49626.

Task Parallelism

    struct fibonacci {
      fibonacci (const int& n) : answer(0), n(n) {}

      void operator()() {
        if (0 == n || 1 == n) answer = n;
        else {
          task fib_task;
          fibonacci fib_n_1 (n-1), fib_n_2 (n-2);
          pfunc::spawn (fib_task, fib_n_1);  // run fib(n-1) as a spawned task
          fib_n_2 ();                        // run fib(n-2) on this thread
          pfunc::wait (fib_task);            // join the spawned task
          answer = fib_n_1.answer + fib_n_2.answer;
        }
      }

      int answer;
      const int n;
    };

PFunc-specific Features for Task Parallelism
- Spawn tasks on specific queues.
- Select between work-stealing and work-sharing at runtime.
- Tasks can have multiple parents; execute DAGs seamlessly.
- 2x faster than TBB on Fibonacci (n = 37). [Performance chart omitted.]

References:
- PFunc: Modern Task Parallelism For Modern High Performance Computing, Kambadur et al., SC.
- Demand-driven Execution of Static Directed Acyclic Graphs Using Task Parallelism, Kambadur et al., HiPC.
- Extending Task Parallelism for Frequent Pattern Mining, Kambadur et al., ParCO 2009.

Task Scheduling
[Flowchart: at each scheduling point, the thread first checks whether its own queue is empty. If not, the victim queue is its own and the "own" predicate is used; otherwise, the victim queue is another thread's and the "steal" predicate is used. The thread then gets candidate tasks from the victim queue and selects the best one using the supplied predicate; if no task is found, it returns to the scheduling point.]

Scheduling Policy
[Diagram: a scheduling policy combines a task queue set with regular, waiting, and group predicate pairs.]

Customizing at Compile-time

    struct fibonacci;
    typedef pfunc::generator <cilkS,              /* scheduling policy */
                              pfunc::use_default, /* compare */
                              fibonacci>          /* functor */
            my_pfunc;
    typedef my_pfunc::taskmgr   taskmgr;
    typedef my_pfunc::attribute attribute;
    typedef my_pfunc::task      task;

    Feature            Built-in                    Default
    Scheduling policy  cilkS, prioS, fifoS, lifoS  cilkS
    Compare            N/A                         std::less
    Functor            N/A                         virtual_functor

Now Available: PFunc 1.0.
PFunc: A Tool for Teaching and Implementing Task Parallelism
Prabhanjan Kambadur (1), Anshul Gupta (1), Andrew Lumsdaine (2)
(1) IBM T.J. Watson Research Center. (2) Indiana University, Bloomington.

Salient Features
- New loop-parallelism constructs.
- New examples, including matmult, scale, and accumulate.
- Updated, easy-to-use interface.
- Updated atomics: compare-and-swap, fetch-and-add, etc.

Supported Platforms

    Operating System  Processors
    Windows XP        x86_32
    Linux             ppc32, ppc64, x86_32, x86_64
    AIX               ppc32, ppc64
    OS X              x86_32, x86_64

Software Architecture
[Diagram: user applications and libraries sit on top of PFunc (thread manager, task scheduler, and performance profiler), which in turn builds on threads, PAPI, the operating system, and the hardware.]

Loop Parallelism
Loop parallelism is simple yet powerful, and in PFunc it is realized completely using task parallelism:
- pfunc::parallel_for
- pfunc::parallel_reduce
- pfunc::parallel_while

SPMD-style Parallelism
- Mix task parallelism with SPMD-style programming.
- Create groups of tasks; a task can be in only one group.
- Tasks can communicate and synchronize using their group rank.
- A barrier primitive on a group allows collective synchronization.

Pedagogical and Research Aids
- Portable, easy to install and use.
- Thorough documentation and tutorials.
- Industry-strength exception handling.
- PAPI integration for performance profiling.
- Growing list of sample applications.
- Online user groups and support.