Rachata Ausavarungnirun, Joshua Landgraf, Vance Miller

Presentation transcript:

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes
Rachata Ausavarungnirun, Joshua Landgraf, Vance Miller, Saugata Ghose, Jayneel Gandhi, Christopher J. Rossbach, Onur Mutlu
Session 2-A, 2PM-4PM

Bottlenecks of GPU Virtual Memory
[Slide diagram: each GPU core has a private TLB; all cores share a shared TLB and a set of page table walkers; the page table and data reside in GPU-side main memory, and demand paging moves data in from CPU-side memory.]
Three bottlenecks: the shared TLB has limited reach, page table walks have high latency, and adding demand paging introduces high-latency I/O between CPU-side and GPU-side memory.

Key Page Size Tradeoffs
Larger pages: better TLB reach, but high demand paging latency.
Smaller pages: lower demand paging latency, but limited TLB reach.
Mosaic enables application-transparent use of both page sizes.
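The TLB-reach side of this tradeoff is simple arithmetic: reach is the number of TLB entries times the page size. A minimal sketch, assuming illustrative sizes (a 512-entry TLB, 4KB and 2MB pages) that are not taken from the talk:

```python
# Back-of-the-envelope TLB reach: reach = TLB entries * page size.
# The entry count and page sizes below are assumed, illustrative values.

def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Total memory a TLB can map without taking a miss."""
    return entries * page_size_bytes

ENTRIES = 512                  # hypothetical shared-TLB size
SMALL_PAGE = 4 * 1024          # 4 KB base pages
LARGE_PAGE = 2 * 1024 * 1024   # 2 MB large pages

small_reach = tlb_reach_bytes(ENTRIES, SMALL_PAGE)
large_reach = tlb_reach_bytes(ENTRIES, LARGE_PAGE)

print(small_reach // 2**20, "MiB reach with 4KB pages")   # 2 MiB
print(large_reach // 2**30, "GiB reach with 2MB pages")   # 1 GiB
```

With these numbers, large pages multiply TLB reach by 512x, which is why they matter so much for GPU workloads with large footprints.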

Key Challenge with Multiple Page Sizes
[Slide diagram: under state-of-the-art allocation, Large Page Frame 1 and Large Page Frame 2 each contain a mix of small pages from App 1, App 2, and unallocated pages.]
Because each large page frame mixes pages from multiple applications, its small pages cannot be coalesced into a large page without moving data.

Key Idea of Mosaic
[Slide diagram: state-of-the-art vs. Mosaic. State-of-the-art: Large Page Frame 1 and Large Page Frame 2 each mix pages from App 1 and App 2, so pages cannot be coalesced. With Mosaic: each large page frame holds pages from only one application, enabling in-place coalescing.]
By conserving contiguity at allocation time, Mosaic can coalesce a frame's small pages into a large page in place, without any data movement.
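One way to picture the coalescing condition is as a per-frame check: a large page frame can be promoted in place only if it is fully allocated to a single application with a contiguous virtual range. This is an illustrative sketch with an assumed frame geometry, not the talk's actual hardware logic:

```python
# Illustrative sketch (assumed behavior, not Mosaic's implementation) of
# the check behind in-place coalescing: a large page frame is promotable
# to one large page only if every small page in it belongs to the same
# application and maps a contiguous virtual range.

PAGES_PER_FRAME = 512  # assumed: 2MB frame holding 4KB pages

def can_coalesce(frame):
    """frame: list of (app_id, virtual_page_number), or None if unallocated."""
    if any(slot is None for slot in frame):
        return False                      # frame not fully allocated
    apps = {app for app, _ in frame}
    if len(apps) != 1:
        return False                      # pages from multiple apps mixed in
    base = frame[0][1]
    # virtual pages must be contiguous, matching the frame's layout
    return all(vpn == base + i for i, (_, vpn) in enumerate(frame))

# One app, contiguous virtual pages -> coalescible in place
good = [(1, 1000 + i) for i in range(PAGES_PER_FRAME)]
# Two apps sharing a frame (the state-of-the-art picture) -> cannot coalesce
bad = [(1, 0)] * 256 + [(2, 0)] * 256

print(can_coalesce(good))   # True
print(can_coalesce(bad))    # False
```

The point of Mosaic's design is that its allocator makes the `good` case the common case, so promotion never has to copy pages.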

Mosaic: Three Components
[Slide diagram: Mosaic spans the GPU runtime and hardware. The GPU runtime provides Contiguity-Conserving Allocation and Contiguity-Aware Compaction; the hardware provides the In-Place Coalescer.]
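Contiguity-conserving allocation can be sketched as a simple policy: serve each application's small-page allocations from large page frames reserved for that application alone, so no frame ever mixes applications. The class and frame geometry below are assumptions for illustration, not the talk's code:

```python
# Illustrative sketch (assumed design, not Mosaic's code) of
# contiguity-conserving allocation: each app's small pages are packed
# into large page frames dedicated to that app, preserving the
# single-app, contiguous layout that in-place coalescing needs.

PAGES_PER_FRAME = 512  # assumed: 2MB frames of 4KB pages

class ContiguityConservingAllocator:
    def __init__(self):
        self.next_frame = 0
        self.partial = {}   # app_id -> (frame_id, pages_used)

    def alloc_page(self, app_id):
        """Return (frame_id, slot) for one small page of app_id."""
        frame, used = self.partial.get(app_id, (None, PAGES_PER_FRAME))
        if used == PAGES_PER_FRAME:       # app needs a fresh frame
            frame, used = self.next_frame, 0
            self.next_frame += 1
        self.partial[app_id] = (frame, used + 1)
        return frame, used

alloc = ContiguityConservingAllocator()
a = [alloc.alloc_page(1) for _ in range(4)]
b = [alloc.alloc_page(2) for _ in range(4)]
# App 1's pages all land in frame 0 and app 2's in frame 1:
# no large page frame ever mixes applications.
print(a)   # [(0, 0), (0, 1), (0, 2), (0, 3)]
print(b)   # [(1, 0), (1, 1), (1, 2), (1, 3)]
```

Contiguity-Aware Compaction plays the complementary role: when memory gets fragmented, it migrates pages so that frames return to this single-application layout.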

Benefits of Mosaic
High TLB reach
Low demand paging latency
Application-transparent
55% higher average performance

Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes
Rachata Ausavarungnirun, Joshua Landgraf, Vance Miller, Saugata Ghose, Jayneel Gandhi, Christopher J. Rossbach, Onur Mutlu
Session 2-A, 2PM-4PM