MCSTL: The Multi-Core Standard Template Library Xiaofan Liu 29.01.2008.

Introduction
STL: Standard Template Library
MCSTL:
- Shared memory systems
- Small inputs
- Dynamic load balancing
- The level of parallelism

Algorithms

- Embarrassingly parallel
- Find
- Partition
- Sort

Algorithms
Embarrassingly Parallel:
- “Work stealing”
- A user-tunable granularity
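The granularity idea can be sketched with a plain OpenMP loop. This is only an illustration, not the MCSTL implementation: the function name `chunked_for_each` and the parameter `grain` are hypothetical, and `schedule(dynamic, grain)` hands out chunks from a shared queue rather than performing real work stealing. Without OpenMP enabled, the pragma is ignored and the loop simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies `op` to every element of `data`, distributing the work in chunks
// of `grain` iterations. `grain` plays the role of the user-tunable
// granularity (hypothetical name, not an MCSTL parameter).
// Note: a dynamic schedule is a shared work queue, not true work stealing.
template <typename T, typename Op>
void chunked_for_each(std::vector<T>& data, long grain, Op op) {
    assert(grain > 0);
    const long n = static_cast<long>(data.size());
    #pragma omp parallel for schedule(dynamic, grain)
    for (long i = 0; i < n; ++i) {
        op(data[i]);  // each element is touched by exactly one thread
    }
}
```

A small `grain` balances load better when per-element cost varies, while a large `grain` reduces scheduling overhead; that trade-off is why such a knob is left to the user.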

Algorithms
Find:
- m: the first matching element’s position
- Size of blocks
- m_0, m, M
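One way to read the block-size bullets is as a search that scans a short prefix sequentially and then processes blocks of growing size in parallel, so that little work is wasted when the match position m is small. The sketch below works under that assumption. The name `blocked_find`, the parameter names `m0`, `m_init`, and `m_max`, and the doubling growth rule are all hypothetical and are not taken from the MCSTL source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to `value`, or data.size()
// if there is none.
// m0:     length of the sequentially scanned prefix (assumed meaning of m_0)
// m_init: size of the first parallel block
// m_max:  upper bound on the block size
// Doubling the block size each round is an assumption made for this sketch.
template <typename T>
std::size_t blocked_find(const std::vector<T>& data, const T& value,
                         long m0, long m_init, long m_max) {
    assert(m0 >= 0 && m_init > 0 && m_max >= m_init);
    const long n = static_cast<long>(data.size());

    // Phase 1: sequential scan, cheap when the match is near the front.
    const long prefix = std::min(m0, n);
    for (long i = 0; i < prefix; ++i)
        if (data[i] == value) return static_cast<std::size_t>(i);

    // Phase 2: one block per round, scanned in parallel.
    long block = m_init;
    for (long begin = prefix; begin < n; ) {
        const long end = std::min(begin + block, n);
        long first = n;  // smallest matching index seen in this block
        #pragma omp parallel for
        for (long i = begin; i < end; ++i) {
            if (data[i] == value) {
                #pragma omp critical
                {
                    if (i < first) first = i;
                }
            }
        }
        if (first < n) return static_cast<std::size_t>(first);
        begin = end;
        block = std::min(block * 2, m_max);
    }
    return static_cast<std::size_t>(n);
}
```

Because every earlier block has already been searched without a hit, the smallest index found in the current block is the first match overall.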

Algorithms
Partition:
- A blocked strategy
- O(n/p + p)
Sort:
- Multiway Mergesort
- Load-Balanced Quicksort
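The multiway mergesort idea can be illustrated roughly as follows: split the input into p runs, sort each run independently, then combine all runs in a single k-way merge. This is a simplified sketch, not the MCSTL algorithm. The name `multiway_mergesort` and the parameter `parts` are hypothetical, and the merge step here is sequential for brevity, whereas a full parallel version would also split the merge across threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sorts `data` by sorting `parts` contiguous runs independently and then
// merging them with one k-way merge. `parts` stands in for the thread
// count p. Assumption: the merge is done sequentially in this sketch.
template <typename T>
void multiway_mergesort(std::vector<T>& data, int parts) {
    assert(parts > 0);
    const std::size_t n = data.size();

    // Run r covers the half-open range [bounds[r], bounds[r + 1]).
    std::vector<std::size_t> bounds(parts + 1);
    for (int r = 0; r <= parts; ++r) bounds[r] = n * r / parts;

    // Step 1: sort every run; the runs are disjoint, so this is race-free.
    #pragma omp parallel for
    for (int r = 0; r < parts; ++r)
        std::sort(data.begin() + bounds[r], data.begin() + bounds[r + 1]);

    // Step 2: k-way merge using a min-heap of (value, run index) pairs.
    typedef std::pair<T, int> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
    std::vector<std::size_t> cursor(bounds.begin(), bounds.end() - 1);
    for (int r = 0; r < parts; ++r)
        if (cursor[r] < bounds[r + 1]) heap.push(Entry(data[cursor[r]++], r));

    std::vector<T> out;
    out.reserve(n);
    while (!heap.empty()) {
        const Entry e = heap.top();
        heap.pop();
        out.push_back(e.first);
        const int r = e.second;  // refill the heap from the run just consumed
        if (cursor[r] < bounds[r + 1]) heap.push(Entry(data[cursor[r]++], r));
    }
    data.swap(out);
}
```

Merging all runs at once needs only one pass over the data after the local sorts, which is the main appeal over repeated two-way merging.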

Software Engineering
- OpenMP 2.5
- Using the MCSTL in a program
- Website:

Experimental Results

Conclusion
Already done
Future work:
- Implementation of worthwhile functions
- Add dynamic load balancing to more functions
- Configure it with the right tuning parameters
- Integrate it with the external memory library STXXL

Thank you~~