Parallel Algorithms in STAPL: Implementation and Evaluation
Jeremy Vu
Faculty Mentor: Dr. Nancy Amato
Supervisor: Dr. Mauro Bianco
Parasol Lab, Texas A&M University

Presentation Outline
- Parallelism: What is parallelism? What are some motivations/problems? What are some solutions?
- STAPL: What is it?
- My work for the summer: Example: p_unique; Performance testing

What is parallelism? In computing, parallelism is the simultaneous execution of multiple computations. The general idea is that we can divide a problem into smaller pieces that can be solved at the same time, and (ideally) solve the whole problem faster this way. Example: if the Girl Scouts sent only one scout to sell cookies, they almost certainly would not raise money fast enough to finance all of their activities. In reality, many scouts sell cookies at the same time, and the money comes in much faster.

Okay? Parallel computing has been practiced for a long time in High-Performance Computing (HPC). Examples: weather forecasting models, simulation of nuclear weapon detonations, and hydrodynamics (the study of liquids in motion). It is becoming more widespread: large-scale parallel machines are getting larger, and small-scale parallel machines are now mainstream (e.g., many-core processors). As more people move into parallel computing, the issues associated with it become more prevalent, and new ones arise.

Challenges and Solutions
Productivity challenges: scalability and efficiency, portable efficiency, programmability.
Hardware challenges: memory organization (shared, distributed, distributed-shared) and communication implementation (interprocessor, processor-memory).
Generalized solutions include:
- programming languages (e.g., Chapel by Cray)
- libraries (e.g., TBB by Intel, various in Microsoft .NET)
- APIs (OpenMP, MPI)
STAPL, the project I am a part of, is one such solution.

STAPL: Standard Template Adaptive Parallel Library
STAPL is a library of parallel, generic constructs based on the C++ Standard Template Library (STL). It provides parallel data structures and generic parallel algorithms, and helps programmers address the difficulties of parallel programming by:
- allowing programmers to work at a high level of abstraction
- hiding many of the details of parallel programming
- providing a high degree of productivity, performance, and portability.

Example: p_unique (STL unique -> p_unique)
Input: a sequence of elements and a binary relation. Output: a sequence consisting of only the first element of each group of consecutive duplicate elements; the binary relation determines which elements are duplicates.
Example, with '=': {1, 1, 2, 2, 3, 3, 4, 4} ---> {1, 2, 3, 4}
Sequentially, there is only one way to do this; in parallel, there are multiple cases.
Case 1, symmetric + transitive (e.g., '='): compare, in parallel, each element to its neighbor in the sequence using the relation, and keep or remove it based on the result. If we have enough processors, all of the comparisons can happen simultaneously. This is the ideal case; a sketch of the idea follows below.
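The case-1 idea can be sketched in plain C++. This is my own illustrative sketch, not STAPL source: unique_by_adjacent_flags is a hypothetical helper, written sequentially, and the parallelism is only indicated by the comment on the flag loop, which has no loop-carried dependence.

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <vector>

    // Sketch of case 1 (symmetric + transitive relation such as equality):
    // each "keep" flag depends only on one adjacent pair of elements.
    template <typename T, typename Rel>
    std::vector<T> unique_by_adjacent_flags(const std::vector<T>& in, Rel rel) {
        std::vector<char> keep(in.size(), 1);
        // No loop-carried dependence: iteration i touches only in[i-1] and in[i],
        // so a parallel runtime could assign the iterations to different processors.
        for (std::size_t i = 1; i < in.size(); ++i)
            keep[i] = !rel(in[i - 1], in[i]);

        std::vector<T> out;                        // compact the survivors
        for (std::size_t i = 0; i < in.size(); ++i)
            if (keep[i]) out.push_back(in[i]);
        return out;
    }

    int main() {
        std::vector<int> v{1, 1, 2, 2, 3, 3, 4, 4};
        for (int x : unique_by_adjacent_flags(v, std::equal_to<int>()))
            std::cout << x << ' ';                 // prints: 1 2 3 4
        std::cout << '\n';
    }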

Example: p_unique (continued)
Case 2, transitive only (e.g., '<'): requires a parallel prefix algorithm. The parallel prefix allows us to "prefix" each chunk of data with the appropriate initial element; we then finish in the same manner as for case 1.
Example, with the '<' relation: {100, 80, 70, 20, 40, 30, 15, 10} ---> {100, 80, 70, 20, 15, 10} (see the sketch below).
Case 3, neither symmetric nor transitive: unfortunately sequential; the comparisons must be done in order, one at a time. Examples: 'is in love with', 'is the father/mother of'.
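For case 2, here is my own illustrative reading of the slide using the '<' example, not STAPL's actual p_unique implementation: with a transitive relation, an element survives exactly when no earlier element is related to it, and for '<' that test reduces to comparing against a prefix minimum, which is a classic parallel prefix (scan) computation.

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <limits>
    #include <vector>

    int main() {
        std::vector<int> x{100, 80, 70, 20, 40, 30, 15, 10};

        // Exclusive prefix minimum: prefix_min[i] is the smallest of x[0..i-1].
        // Written sequentially here, but a scan like this is exactly what a
        // parallel prefix algorithm computes chunk-by-chunk and then combines.
        std::vector<int> prefix_min(x.size(), std::numeric_limits<int>::max());
        for (std::size_t i = 1; i < x.size(); ++i)
            prefix_min[i] = std::min(prefix_min[i - 1], x[i - 1]);

        std::vector<int> out;
        for (std::size_t i = 0; i < x.size(); ++i)
            if (!(prefix_min[i] < x[i]))   // keep x[i] unless some earlier element is '<' it
                out.push_back(x[i]);

        for (int v : out) std::cout << v << ' ';
        std::cout << '\n';                 // prints: 100 80 70 20 15 10
    }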

Example: p_unique

    void stapl_main(int, char**) {
      ...
      p_unique(p_vector);
      ...
      p_unique_copy(p_vector, p_array, '<');        // '<' stands for the less-than relation
      ...
      p_unique_copy(p_list, p_vector, my_relation);
    }

What I Do in STAPL (cont.)
Some of the things I test:
- Parallel speedup (scalability): how does adding more processors affect performance? Ideally, speedup is linear.
- Performance vs. a theoretical model and vs. the sequential counterpart(s).
- Performance in different situations, e.g., data distributed poorly in a distributed-memory system (all data in a single location).
- Different algorithms for the same problem, such as how I choose to schedule activities.
A sketch of how the scalability numbers are derived from measured runtimes follows below.
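As a concrete illustration of the scalability metrics mentioned above, this small sketch uses hypothetical timings (the numbers are made up, and this is not a STAPL utility) to show how speedup and efficiency are typically computed.

    #include <iostream>

    int main() {
        const double t_seq   = 8.0;                      // hypothetical 1-processor time (seconds)
        const double t_par[] = {8.0, 4.2, 2.3, 1.4};     // hypothetical times on 1, 2, 4, 8 processors
        const int    procs[] = {1, 2, 4, 8};

        for (int i = 0; i < 4; ++i) {
            double speedup    = t_seq / t_par[i];        // ideal (linear) speedup equals procs[i]
            double efficiency = speedup / procs[i];      // ideal efficiency is 1.0
            std::cout << procs[i] << " procs: speedup " << speedup
                      << ", efficiency " << efficiency << '\n';
        }
    }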

More information… Questions? Comments? Concerns? {cvu, amato,