CSE-591 Compilers for Embedded Systems: Code transformations and compile-time data management techniques for application mapping onto SIMD-style Coarse-grained Reconfigurable Architectures

Presentation transcript:

CSE-591 Compilers for Embedded Systems
Code transformations and compile-time data management techniques for application mapping onto SIMD-style Coarse-grained Reconfigurable Architectures
By Sandeep Marathe

Introduction
- CGRAs: performance of hardware and flexibility of software
- What is reconfigurable? Reconfigurable functionality and data routing
- SIMD architecture example: MorphoSys

Problem Outline
- Objective:
  - Use the PEs efficiently to achieve the maximum parallelism possible under SIMD constraints
  - Reduce data/context transfers
- Mapping process:
  - Mapping of data onto the Frame buffer: data partitioning and arrangement
  - Mapping of operations onto PEs: map similar operations onto a single column/row (SIMD), and decide on the interconnections
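The operation-mapping step can be illustrated with a small Python sketch. It is only an illustration of the SIMD constraint (every PE in a column executes the same operation), not the method proposed in the slides. The function name `map_ops_to_columns`, the parameters `num_columns` and `column_height`, and the tuple format for operations are all assumptions made for this example. Data dependencies and interconnect decisions are ignored.

```python
from collections import defaultdict

def map_ops_to_columns(ops, num_columns, column_height):
    """Group operations by opcode and pack each group into PE columns.

    ops: list of (name, opcode) tuples (hypothetical format).
    Returns a list of time steps; each step is a list of
    (opcode, [operation names]) entries, one entry per occupied column.
    Assumption: dependencies between operations are ignored.
    """
    # Collect operations that share an opcode; only these may share a column.
    groups = defaultdict(list)
    for name, opcode in ops:
        groups[opcode].append(name)

    # Split every opcode group into chunks of at most column_height PEs.
    columns = []
    for opcode, names in groups.items():
        for i in range(0, len(names), column_height):
            columns.append((opcode, names[i:i + column_height]))

    # Fill the array column by column; overflow goes to the next time step.
    return [columns[i:i + num_columns]
            for i in range(0, len(columns), num_columns)]

ops = [("a", "add"), ("b", "mul"), ("c", "add"), ("d", "add")]
schedule = map_ops_to_columns(ops, num_columns=2, column_height=2)
```

With two columns of two PEs each, the three additions occupy both columns in the first step and the multiplication is pushed to a second step, which shows how the SIMD constraint can leave PEs idle.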

Related Work
Girish Venkataramani, Fadi Kurdahi, Wim Bohm, "A Compiler Framework for Mapping Applications to a Coarse-grained Reconfigurable Architecture", CASES 2001
- Loop synthesis (generating an execution schedule)
- Uses the SA-C framework (HDFG)
  - Element generator
  - Window generator
- Uses SA-C programs, image processing kernels

Rafael Maestre et al., "A Framework for Reconfigurable Computing: Task Scheduling and Context Management", IEEE Transactions on VLSI Systems, Dec 2001
- Kernel schedule:
  - Partition the application into a set of independent kernels
  - Scheduling within a partition
- Context schedule: context selection and allocation

Possible Approaches
- Perform loop transformations so that the data accesses
  - fall within an array space that fits into the Frame buffer
  - are temporally close
- Rearrange the data in each array space based on the DFG so that the PEs operate on it efficiently
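The first approach resembles loop tiling (strip-mining). The Python sketch below shows the idea under stated assumptions: the loop over two input arrays is split into tiles sized so that both tiles fit into the Frame buffer together, and each tile is fully consumed before the next one is loaded. The function name `tiled_dot_product`, the parameter `frame_buffer_words`, and the transfer counter are hypothetical and serve only to illustrate the transformation. They do not come from the slides.

```python
def tiled_dot_product(a, b, frame_buffer_words):
    """Compute sum(a[i] * b[i]) one Frame-buffer-sized tile at a time.

    frame_buffer_words: assumed capacity of the Frame buffer in array
    elements; the two input arrays share it equally.
    Returns (result, number of tile loads).
    """
    tile = frame_buffer_words // 2  # each array gets half of the buffer
    if tile < 1:
        raise ValueError("Frame buffer too small to hold one element per array")

    total = 0
    tile_loads = 0
    # Outer loop: one iteration per tile brought into the Frame buffer.
    for start in range(0, len(a), tile):
        fb_a = a[start:start + tile]  # stands in for a transfer of a's tile
        fb_b = b[start:start + tile]  # stands in for a transfer of b's tile
        tile_loads += 1
        # Inner loop: all accesses stay inside the resident tile.
        for x, y in zip(fb_a, fb_b):
            total += x * y
    return total, tile_loads

result, loads = tiled_dot_product([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 4)
```

A larger assumed buffer gives fewer tile loads for the same result, which is the data-transfer reduction named in the objective.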