My view of challenges faced by Open64 – Xiaoming Li, University of Delaware

Presentation transcript:

My view of challenges faced by Open64
Xiaoming Li, University of Delaware

Some new challenges
- Collective optimization
  – Multi-core and SIMD-like architectures call for optimizing a group of threads collectively, not one thread at a time
  – New optimization goals
- Optimization for bandwidth
  – Describing resource conflicts among threads (see the sketch below)
- Explicitly managed hardware resources
  – Understanding, and internally representing, the programmer's intention
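The bandwidth bullet is easiest to make concrete in code. Below is a minimal CUDA sketch of my own (not from the slides; the kernel names and the stride parameter are assumptions, and launch code is omitted) contrasting a copy whose warp-level accesses coalesce with a strided copy whose accesses scatter. The per-thread work is identical, so the difference only appears when the optimizer reasons about the group of threads and the memory transactions they share, which is the kind of resource-conflict description the slide asks for.

    // Coalesced copy: adjacent threads in a warp touch adjacent words, so the
    // warp's loads and stores merge into a few wide memory transactions.
    __global__ void copy_coalesced(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    // Strided copy: each thread does the same amount of work, but neighboring
    // threads now touch addresses 'stride' elements apart, so the warp's
    // accesses scatter across many transactions and effective bandwidth drops.
    __global__ void copy_strided(const float *in, float *out, int n, int stride) {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n) out[i] = in[i];
    }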

My two cents
- Occupancy-oriented optimization on GPU
  – Achieve higher occupancy on the GPU by "duplicating" code (see the sketch below)
- Description of the program execution context
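One way to read the "duplicating" bullet (my interpretation, not the speaker's implementation) is to emit several source-level variants of a kernel with different resource budgets and let the runtime pick the one that reaches better occupancy. The sketch below assumes that approach: a saxpy kernel is duplicated, the copy is constrained with __launch_bounds__ so the compiler trades registers for more resident blocks, and cudaOccupancyMaxActiveBlocksPerMultiprocessor reports how many blocks of each variant fit per SM. The kernel bodies, names, and the 256-thread block size are illustrative assumptions.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Variant A: unconstrained; the compiler may spend many registers per
    // thread, which can limit how many blocks are resident on an SM.
    __global__ void saxpy_default(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // Variant B: the same code duplicated, but __launch_bounds__(256, 4) asks
    // the compiler to keep resource usage low enough for at least 4 blocks per
    // SM, trading possible register spills for higher occupancy.
    __global__ void __launch_bounds__(256, 4)
    saxpy_capped(float a, const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int block = 256;
        int occDefault = 0, occCapped = 0;

        // Ask the runtime how many blocks of each variant fit on one SM.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&occDefault, saxpy_default, block, 0);
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&occCapped, saxpy_capped, block, 0);

        printf("resident blocks per SM: default=%d, capped=%d\n", occDefault, occCapped);
        // A real optimizer would launch whichever variant it predicts is
        // faster; higher occupancy helps hide latency but is not always a win.
        return 0;
    }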