Min-Move 2018: 2nd Workshop on Hardware/Software Techniques for Minimizing Data Movement

Organizers
Adwait Jog, Assistant Professor, William & Mary. Computer Architecture, GPUs. www.cs.wm.edu/~adwait (also local arrangements co-chair for ASPLOS)
EJ Kim, Associate Professor, Texas A&M University. Computer Architecture, Interconnects, Performance Evaluation. http://faculty.cse.tamu.edu/ejkim/

Scope
Any idea or technique that helps reduce data movement is appropriate for this workshop. Topics include, but are not limited to:
- Near-Data Processing (e.g., near caches, memory, or storage devices)
- In-Memory Computing (e.g., in caches, memory, or storage devices)
- Approximate Computing (e.g., load value approximation)
- Cache/DRAM Locality Optimizations
- Data Compression Techniques
- Emerging Memory Technologies (e.g., STT-RAM, memristor)
- Non-von Neumann Architectures (e.g., quantum architectures, the Automata Processor)
- Interconnection Architectures (e.g., on-chip, off-chip, Ethernet, interposer systems, flexible interconnects for FPGAs)
- Programming and Language Support for Minimizing Data Movement
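As one illustration of the data-compression theme above: hardware schemes such as Base-Delta-Immediate compression exploit the observation that values within a cache line often lie close together, so the line can be stored as one base value plus narrow deltas. The sketch below is a toy software model of that idea, not any paper's actual implementation; the function names and the choice of one-byte deltas are illustrative assumptions.

```python
def base_delta_compress(words, delta_bytes=1):
    """Try to represent a cache line (a list of ints) as base + narrow deltas.

    Returns (base, deltas) if every signed delta fits in delta_bytes bytes,
    otherwise None (the line is left uncompressed).
    """
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)   # signed range: [-limit, limit)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None


def base_delta_decompress(base, deltas):
    """Reconstruct the original words from base + deltas."""
    return [base + d for d in deltas]


if __name__ == "__main__":
    # Eight 8-byte words with nearby values: 64 bytes of data shrink to
    # one 8-byte base plus eight 1-byte deltas (16 bytes) in this model.
    line = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
    compressed = base_delta_compress(line)
    assert compressed is not None
    assert base_delta_decompress(*compressed) == line

    # Widely spread values do not fit in 1-byte deltas, so no compression.
    assert base_delta_compress([0, 10**9]) is None
```

Moving 16 bytes instead of 64 across the memory channel is exactly the kind of data-movement reduction the workshop targets; real designs add multiple base/delta-width combinations and handle metadata, which this sketch omits.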

Program Committee
Niladrish Chatterjee (NVIDIA)
Reetuparna Das (UMich)
Myoungsoo Jung (Yonsei)
David Kaeli (Northeastern)
Onur Kayiran (AMD Research)
Aasheesh Kolli (VMware and Penn State)
Asit Mishra (Intel)
Yoonho Park (IBM)
Gennady Pekhimenko (UToronto)
Lawrence Rauchwerger (Texas A&M)

Highlights of the Workshop
Keynotes:
- Rajeev Balasubramonian (Utah)
- Reetuparna Das (Michigan)
- Alessandro Morari (IBM)
Peer-Reviewed Workshop Paper (received three reviews):
- An Extensible Scheduler for the OpenLambda FaaS Platform. Authors: Gustavo Totoy, Edwin Boza, and Cristina Abad (ESPOL, Ecuador)