Organizers Adwait Jog EJ Kim

Slides:

Advertisements

Similar presentations

© 2011 IBM Corporation Karthick Rajamani and John Carter Welcome To The Fourth Workshop on Energy Efficient Design Portland, Oregon.

Advertisements

© 2009 IBM Corporation John Carter and Karthick Rajamani Welcome To The Second Workshop on Energy Efficient Design Saint Malo, France.

Revisiting Co-Processing for Hash Joins on the Coupled CPU- GPU Architecture School of Computer Engineering Nanyang Technological University 27 th Aug.

Orchestrated Scheduling and Prefetching for GPGPUs Adwait Jog, Onur Kayiran, Asit Mishra, Mahmut Kandemir, Onur Mutlu, Ravi Iyer, Chita Das.

Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture Seongbeom Kim, Dhruba Chandra, and Yan Solihin Dept. of Electrical and Computer.

Computer Science, University of Oklahoma Reconfigurable Versus Fixed Versus Hybrid Architectures John K. Antonio Oklahoma Supercomputing Symposium 2008.

System Simulation Of 1000-cores Heterogeneous SoCs Shivani Raghav Embedded System Laboratory (ESL) Ecole Polytechnique Federale de Lausanne (EPFL)

OWL: Cooperative Thread Array (CTA) Aware Scheduling Techniques for Improving GPGPU Performance Adwait Jog, Onur Kayiran, Nachiappan CN, Asit Mishra, Mahmut.

†The Pennsylvania State University

Memory Network: Enabling Technology for Scalable Near-Data Computing Gwangsun Kim, John Kim Korea Advanced Institute of Science and Technology Jung Ho.

Managing GPU Concurrency in Heterogeneous Architect ures Shared Resources Network LLC Memory.

IEEE TCCA Business Meeting HPCA-21 San Francisco, CA

Define Embedded Systems Small (?) Application Specific Computer Systems.

University of Kansas Research Interests David Andrews Rm. 324 Nichols

COMP25212 SYSTEM ARCHITECTURE Antoniu Pop Jan/Feb 2015COMP25212 Lecture 1.

Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access Niladrish Chatterjee Manjunath Shevgoor Rajeev Balasubramonian Al Davis.

ERD and Memory Architectures Paul Franzon Department of Electrical and Computer Engineering

Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.

1 Hardware Support for Collective Memory Transfers in Stencil Computations George Michelogiannakis, John Shalf Computer Architecture Laboratory Lawrence.

1 Evaluation and Optimization of Multicore Performance Bottlenecks in Supercomputing Applications Jeff Diamond 1, Martin Burtscher 2, John D. McCalpin.

Memory  Main memory consists of a number of storage locations, each of which is identified by a unique address  The ability of the CPU to identify each.

(1) ECE 3056: Architecture, Concurrency and Energy in Computation Lecture Notes by MKP and Sudhakar Yalamanchili Sudhakar Yalamanchili (Some small modifications.

COMP25212: System Architecture Lecturers Alasdair Rawsthorne Daniel Goodman

March 9, 2015 San Jose Compute Engineering Workshop.

1 COMS 161 Introduction to Computing Title: Computing Basics Date: September 15, 2004 Lecture Number: 10.

Trading Cache Hit Rate for Memory Performance Wei Ding, Mahmut Kandemir, Diana Guttman, Adwait Jog, Chita R. Das, Praveen Yedlapalli The Pennsylvania State.

CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara) Dimitris Kaseridis and Lizy K. John The University of Texas at Austin Laboratory for Computer.

ROM AND RAM By Georgia Harris. WHAT DOES IT MEAN?  RAM: random access memory  ROM: read only memory.

Understanding Parallel Computers Parallel Processing EE 613.

33 rd IEEE International Conference on Computer Design ICCD rd IEEE International Conference on Computer Design ICCD 2015 Improving Memristor Memory.

Parallel Computers Today Oak Ridge / Cray Jaguar > 1.75 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating.

Processor Level Parallelism 2. How We Got Here Developments in PC CPUs.

Compiler Research How I spent my last 22 summer vacations Philip Sweany.

Computer Hardware. 11 th Class Computer Office Application-1 Computer Hardware Md. Jamirul Islam Assistant Professor Computer Monwara Azmat Ali College,

Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.

Parallel Computers Today LANL / IBM Roadrunner > 1 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating point.

New Rules: Scaling Performance for Extreme Scale Computing

Persistent Memory (PM)

0. Course Introduction Rocky K. C. Chang, 25 August 2017.

Computer Organization and Architecture Lecture 1 : Introduction

What is it and why do you need it?

A Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps Session 2A: Today, 10:20 AM Nandita Vijaykumar.

Assistant Professor, EECS Department

Managing GPU Concurrency in Heterogeneous Architectures

Describe the central processing unit including its role

Describe the central processing unit including its role

Computer Architecture and Organization

Bank-aware Dynamic Cache Partitioning for Multicore Architectures

The Basic Organization of Computers T.Jeya M.Sc., M.Phil Assistant Professor, Department of CS, SAC Women’s College. Cumbum. Tamilnadu.

Gwangsun Kim Niladrish Chatterjee Arm, Inc. NVIDIA Mike O’Connor

Parallel Computers Today

Untrodden Paths for Near Data Processing

IRDS Emerging Research Devices and Architectures NanoCrossbar Workshop

Discussion Lead: Pen-Chung (Pen) Yew

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks

Organizers Adwait Jog Assistant Professor (William & Mary)

Virtualization Techniques

Advanced Operating Systems – Fall 2009

Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1 Zülal.

Interconnect with Cache Coherency Manager

Discovering Computers 2014: Chapter6

The Memory-Processor Gap

Architectural Support for Efficient Large-Scale Automata Processing

ECE 463/563 Fall `18 Memory Hierarchies, Cache Memories H&P: Appendix B and Chapter 2 Prof. Eric Rotenberg Fall 2018 ECE 463/563, Microprocessor Architecture,

A Case for Interconnect-Aware Architectures

Graphics Processing Unit

A451 GCSE Computing | Hardware| Required knowledge

Stream-Lined Data Management

Haonan Wang, Adwait Jog College of William & Mary

Presentation transcript:

Min-Move 2017 1st Workshop on Hardware/Software Techniques for Minimizing Data Movement

Organizers Adwait Jog EJ Kim Assistant Professor, College of William and Mary Computer Architecture, GPUs, Memory Systems www.cs.wm.edu/~adwait EJ Kim Associate Professor, Texas A&M University Computer Architecture, Interconnects, Performance Evaluation http://faculty.cse.tamu.edu/ejkim/

Scope Any idea/technique that can help in reducing the data movement is appropriate for this workshop. Some topics (but not limited to) are: Near Data Processing (e.g., near caches or memory or storage devices) In-Memory Computing (e.g., in caches or memory or storage devices) Approximate Computing (e.g., load value approximations) Cache/DRAM Locality Optimizations Data Compression Techniques Emerging Memory Technologies (e.g., STT-RAM, Memristor) Non Von-Neumann Architectures (e.g., Quantum Architectures, Automata Processor) Interconnection Architectures (e.g., on-chip, off-chip, Ethernet, interposer system, flexible interconnects for FPGA) Programming and Language Support for Minimizing Data Movement

Program Committee Murali Annavaram (USC) Rajeev Balasubramonian (Utah) Reetuparna Das (UMich) Lizy Kurian John (UT Austin) David Kaeli (Northeastern) Onur Kayiran (AMD) Hyesoon Kim (Gatech) Asit Mishra (Intel) Lawrence Rauchwerger (Texas A&M) Sudhakar Yalamanchalli (Gatech)

Highlights of the Workshop Keynotes from Lizy Kurian John (UT Austin) Yan Solihin (NSF/NCSU) Jun Yang (UPitt) Peer-Reviewed Workshop Papers (most of the papers received three reviews) Exploring Non-Uniform Processing In-Memory Architectures (UT Austin) Evaluating a Trade-Off between DRAM and Persistent Memory for Persistent-Data Placement on Hybrid Main Memory (Fujitsu Laboratories) Opportunities for Processing Near Non-Volatile Memory in Heterogeneous Memory Systems (UIUC and AMD)

Workshop Schedule http://insight-archlab.github.io/minmove.html