Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA Jincheng Yu, Yiming Hu, Xuefei Ning, Jiantao Qiu, Kaiyuan Guo, Yu.

Presentation transcript:

Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA
Jincheng Yu, Yiming Hu, Xuefei Ning, Jiantao Qiu, Kaiyuan Guo, Yu Wang, Huazhong Yang
Dept. E.E., Tsinghua University, Beijing, China

Key problems of CNN accelerators on FPGA: Memory Access, Flexibility, Peak Performance.
Solutions for each problem: Cross-Layer Scheduling, Instruction Set, Winograd Transformation.

FPGAs are adopted to accelerate CNNs because of their high performance, high energy efficiency, and flexibility.

Memory access, rather than the computation units, dominates the energy consumption of CNN accelerators. A cross-layer scheduling policy can minimize intermediate data transfer by using on-chip memory instead of off-chip memory to cache the intermediate data between different layers.

Flexibility is important for a hardware accelerator because the great variety of topologies in state-of-the-art CNNs poses a challenge to hardware. An instruction set can drive different CNNs on the same hardware.

The Winograd transformation can improve peak performance on the FPGA, since several times as many multiplications can effectively be done with the same hardware resources.

In our work, a CNN is divided into several layer blobs to minimize data transfer. The compiler for our instruction set translates each layer blob into instructions that use cross-layer scheduling and the Winograd transformation. We also design hardware to run the instructions.

Workflow:
1. Divide the network to minimize data transfer.
2. Translate the CNN into instructions.
3. Run the instructions on the FPGA.
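
The multiplication savings behind the Winograd transformation can be illustrated with a small sketch. The transcript does not say which tile size the accelerator uses, so the example below assumes the standard one-dimensional F(2,3) minimal filtering case: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The function names `direct_f23` and `winograd_f23` are hypothetical and are not taken from the authors' design.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 data multiplications.

    The filter-side terms (u0..u3) depend only on the weights, so in a
    CNN they can be precomputed once per filter and reused.
    """
    # Filter transform (weights only, precomputable).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform uses additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The only four multiplications on the data path.
    m1 = v0 * u0
    m2 = v1 * u1
    m3 = v2 * u2
    m4 = v3 * u3

    # Output transform uses additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    weights = [2.0, 4.0, 6.0]
    print(direct_f23(data, weights))
    print(winograd_f23(data, weights))
```

Both functions return the same values for the same inputs; the difference is that the Winograd form trades two multiplications for a few extra additions, which is the kind of trade that lets the same multiplier resources produce more convolution outputs. A two-dimensional version nests this transform over rows and columns.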