Key to Scalable Parallelism - Regularity and Locality

Presentation transcript:

Key to Scalable Parallelism - Regularity and Locality
GPU Computing Forum

Eight Algorithm Optimization Techniques (so far)

1. Scatter to gather transformation
2. Privatization
3. Work granularity coarsening
4. Data tiling/reuse
5. Data layout and traversal ordering
6. Input data binning
7. Input compaction
8. Input extraction and regularization

Currently taught as a graduate-level practical algorithms course: http://courses.engr.illinois.edu/ece598/hk/
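The first technique in the list, the scatter to gather transformation, can be illustrated with a small sketch. The slide gives no code, so everything below is an assumption made for illustration: a hypothetical 1-D problem in which each input `(pos, val)` contributes `val` to every output cell within `radius` of `pos`, and the hypothetical function names `scatter` and `gather`. The scatter form loops over inputs and writes to shared outputs; the gather form loops over outputs so that each output has a single owner.

```python
def scatter(inputs, n_out, radius):
    """Input-centric form: each input adds its value to nearby output cells.

    Run in parallel over inputs, two inputs can update the same cell,
    so the writes would need atomic operations.
    """
    out = [0] * n_out
    for pos, val in inputs:
        lo = max(0, pos - radius)
        hi = min(n_out, pos + radius + 1)
        for cell in range(lo, hi):
            out[cell] += val
    return out


def gather(inputs, n_out, radius):
    """Output-centric form: each output cell sums the inputs within its radius.

    Run in parallel over cells, each cell is written by exactly one worker,
    so no atomic updates are needed.
    """
    out = [0] * n_out
    for cell in range(n_out):
        total = 0
        for pos, val in inputs:
            if abs(pos - cell) <= radius:
                total += val
        out[cell] = total
    return out


if __name__ == "__main__":
    points = [(1, 2), (3, 5), (6, 1)]
    print(scatter(points, 8, 1))
    print(gather(points, 8, 1))
```

Both functions compute the same result. The gather form as written scans every input for every cell, which is why it is commonly paired with input data binning so that each cell only inspects nearby inputs.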

“Orthogonal” to Traditional Parallel Algorithms for Teaching

Techniques (table columns): Tiling, Privatization, Regularization, Compaction, Binning, Data Layout, Granularity Coarsening, Scatter to Gather

Benchmarks (table rows): MRI-Gridding ✓, CutCP, Histo, Stencil, LBM, BFS, DMM, MRI-Q, SpMV, SAD, Tpacf, FFT
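As one concrete pairing of a technique with a benchmark named on this slide, here is a minimal sketch of privatization applied to a histogram. The pairing itself is an assumption, since the slide text does not say which techniques apply to which benchmark, and the function names `private_histogram` and `histogram_privatized` are hypothetical. Each worker fills its own private copy of the histogram, and the copies are merged at the end, so no worker ever contends for a shared bin.

```python
from concurrent.futures import ThreadPoolExecutor


def private_histogram(chunk, n_bins):
    """Count the values of one chunk into a histogram owned by one worker."""
    hist = [0] * n_bins
    for v in chunk:
        hist[v] += 1
    return hist


def histogram_privatized(data, n_bins, n_workers=4):
    """Build a histogram from per-worker private copies, then merge them."""
    # Ceiling division, so the data splits into at most n_workers chunks.
    chunk_size = max(1, -(-len(data) // n_workers))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: private_histogram(c, n_bins), chunks))

    # Merge step: the only place where results from different workers meet.
    total = [0] * n_bins
    for hist in partials:
        for b, count in enumerate(hist):
            total[b] += count
    return total


if __name__ == "__main__":
    print(histogram_privatized([0, 1, 1, 3, 2, 1, 0, 3, 3, 3], n_bins=4))
```

The sketch assumes every value is an integer in `range(n_bins)`. On a GPU the private copies would typically live in per-block shared memory, with the merge done by atomic adds into the global histogram.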