Reducing Code Management Overhead in Software-Managed Multicores

Slides:

Advertisements

Similar presentations

Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories Muthu Baskaran 1 Uday Bondhugula.

Advertisements

OpenMP Optimization National Supercomputing Service Swiss National Supercomputing Center.

An OpenCL Framework for Heterogeneous Multicores with Local Memory PACT 2010 Jaejin Lee, Jungwon Kim, Sangmin Seo, Seungkyun Kim, Jungho Park, Honggyu.

IMPACT Second Generation EPIC Architecture Wen-mei Hwu IMPACT Second Generation EPIC Architecture Wen-mei Hwu Department of Electrical and Computer Engineering.

CPE 731 Advanced Computer Architecture Instruction Level Parallelism Part I Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.

Compiler techniques for exposing ILP

High Performing Cache Hierarchies for Server Workloads

CML Efficient & Effective Code Management for Software Managed Multicores CODES+ISSS 2013, Montreal, Canada Ke Bai, Jing Lu, Aviral Shrivastava, and Bryce.

Zhiguo Ge, Weng-Fai Wong, and Hock-Beng Lim Proceedings of the Design, Automation, and Test in Europe Conference, 2007 (DATE’07) April /4/17.

5th International Conference, HiPEAC 2010 MEMORY-AWARE APPLICATION MAPPING ON COARSE-GRAINED RECONFIGURABLE ARRAYS Yongjoo Kim, Jongeun Lee *, Aviral Shrivastava.

Presented By Srinivas Sundaravaradan. MACH µ-Kernel system based on message passing Over 5000 cycles to transfer a short message Buffering IPC L3 Similar.

Using one level of Cache:

Computer ArchitectureFall 2007 © November 14th, 2007 Majd F. Sakr CS-447– Computer Architecture.

Memory Problems Prof. Sin-Min Lee Department of Mathematics and Computer Sciences.

LCTES 2010, Stockholm Sweden OPERATION AND DATA MAPPING FOR CGRA’S WITH MULTI-BANK MEMORY Yongjoo Kim, Jongeun Lee *, Aviral Shrivastava ** and Yunheung.

© ACES Labs, CECS, ICS, UCI. Energy Efficient Code Generation Using rISA * Aviral Shrivastava, Nikil Dutt

Data Partitioning Techniques for Partially Protected Caches to Reduce Soft Error Induced Failures (DIPES 08) Kyoungwoo Lee.

Computer Organization Cs 147 Prof. Lee Azita Keshmiri.

Spring 2008 CSE 591 Compilers for Embedded Systems Aviral Shrivastava Department of Computer Science and Engineering Arizona State University.

Efficient Software-Based Fault Isolation—sandboxing Presented by Carl Yao.

CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA

Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.

PMaC Performance Modeling and Characterization Efficient HPC Data Motion via Scratchpad Memory Kayla Seager, Ananta Tiwari, Michael Laurenzano, Joshua.

1 Reducing DRAM Latencies with an Integrated Memory Hierarchy Design Authors Wei-fen Lin and Steven K. Reinhardt, University of Michigan Doug Burger, University.

Copyright © 2008 UCI ACES Laboratory Kyoungwoo Lee 1, Aviral Shrivastava 2, Nikil Dutt 1, and Nalini Venkatasubramanian 1.

A Dynamic Code Mapping Technique for Scratchpad Memories in Embedded Systems Amit Pabalkar Compiler and Micro-architecture Lab School of Computing and.

Computer Architecture and Operating Systems CS 3230: Operating System Section Lecture OS-8 Memory Management (2) Department of Computer Science and Software.

Computer Architecture Lecture 26 Fasih ur Rehman.

1 Virtual Memory and Address Translation. 2 Review Program addresses are virtual addresses.  Relative offset of program regions can not change during.

Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.

Zheng Wu. Background Motivation Analysis Framework Intra-Core Cache Analysis Cache Conflict Analysis Optimization Techniques WCRT Analysis Experiment.

Computer Organization and Architecture Tutorial 1 Kenneth Lee.

Improving I/O with Compiler-Supported Parallelism Why Should We Care About I/O? Disk access speeds are much slower than processor and memory access speeds.

Spring 2008 CSE 591 Compilers for Embedded Systems Aviral Shrivastava Department of Computer Science and Engineering Arizona State University.

LLMGuard: Compiler and Runtime Support for Memory Management on Limited Local Memory (LLM) Multi-Core Architectures Ke Bai and Aviral Shrivastava Compiler.

CML SSDM: Smart Stack Data Management for Software Managed Multicores Jing Lu Ke Bai, and Aviral Shrivastava Compiler Microarchitecture Lab Arizona State.

Power Analysis of Embedded Software : A Fast Step Towards Software Power Minimization 指導教授 : 陳少傑教授組員 : R 張馨怡 R 林秀萍.

Computer Organization CS224 Fall 2012 Lessons 41 & 42.

WCET-Aware Dynamic Code Management on Scratchpads for Software-Managed Multicores Yooseong Kim 1,2, David Broman 2,3, Jian Cai 1, Aviral Shrivastava 1,2.

Varun Mathur Mingwei Liu Sanghyun Park, Aviral Shrivastava and Yunheung Paek.

CML CML A Software Solution for Dynamic Stack Management on Scratch Pad Memory Arun Kannan, Aviral Shrivastava, Amit Pabalkar, Jong-eun Lee Compiler Microarchitecture.

CML Branch Penalty Reduction by Software Branch Hinting Jing Lu Yooseong Kim, Aviral Shrivastava, and Chuan Huang Compiler Microarchitecture Lab Arizona.

1 Compiler Managed Dynamic Instruction Placement In A Low-Power Code Cache Rajiv Ravindran, Pracheeti Nagarkar, Ganesh Dasika, Robert Senger, Eric Marsman,

TI Information – Selective Disclosure

Software Coherence Management on Non-Coherent-Cache Multicores

High Performance Computing (HIPC)

Computer Architecture Principles Dr. Mike Frank

GPU Memory Details Martin Kruliš by Martin Kruliš (v1.1)

CSC 4250 Computer Architectures

Morgan Kaufmann Publishers Large and Fast: Exploiting Memory Hierarchy

Cache Memory Presentation I

UnSync: A Soft Error Resilient Redundant Multicore Architecture

Ke Bai and Aviral Shrivastava Presented by Bryce Holton

Splitting Functions in Code Management on Scratchpad Memories

Hwisoo So. , Moslem Didehban#, Yohan Ko

Department of Computer Science University of California, Santa Barbara

Jian Cai, Aviral Shrivastava Presenter: Yohan Ko

URECA: A Compiler Solution to Manage Unified Register File for CGRAs

Dynamic Code Mapping Techniques for Limited Local Memory Systems

Jianbo Dong, Lei Zhang, Yinhe Han, Ying Wang, and Xiaowei Li

Compiler Code Optimizations

Partially Protected Caches to Reduce Failures Due to Soft Errors in Multimedia Applications Kyoungwoo Lee, Aviral Shrivastava, Ilya Issenin, Nikil Dutt,

Reiley Jeyapaul and Aviral Shrivastava Compiler-Microarchitecture Lab

Optimizing Heap Data Management on Software Managed Many-core Architectures By: Jinn-Pean Lin.

Spring 2008 CSE 591 Compilers for Embedded Systems

ECE 463/563, Microprocessor Architecture, Prof. Eric Rotenberg

Mapping DSP algorithms to a general purpose out-of-order processor

Code Transformation for TLB Power Reduction

Reiley Jeyapaul and Aviral Shrivastava Compiler-Microarchitecture Lab

Department of Computer Science University of California, Santa Barbara

Presentation transcript:

Reducing Code Management Overhead in Software-Managed Multicores Jian Cai, Yooseong Kim, Youngbin Kim, Aviral Shrivastava, Kyoungwoo Lee Compiler Microarchitecture Lab Arizona State University

Motivation Software managed multicore (SMM) architectures Have scratchpad memories (SPMs) instead of caches as on-chip memory Consumes significant less area and power but require explicit data movement Management overhead and must be minimized Code management divides SPM space into memory regions, and map functions into the regions Previous works focus on finding better mapping to reduce data transfers caused by conflicts among functions in the same region, but blindly insert management functions around every function call This paper reduces computation by avoiding unnecessary management calls by an always-hit analysis to identify functions that are always present in SPM a first-miss analysis to identify functions that are always present in SPM within loops _get(main) is not necessary since main is the only function in region 0 _get(F1) is hoisted since F1 is the only function in region 1 called within the loop We have developped and are developping management techniques for other data segments, but this paper focuses on code management. The always-hit and first-miss analyses are both iterative data-flow analysis

Result 14% improvement on average 9% improvement on average