Richard Dorrance Literature Review: 1/11/13


FPGA Coprocessor for Simulation of Neural Networks Using Compressed Matrix Storage. Chapter 11 in Systems and Circuit Design for Biologically-Inspired Intelligent Learning.

Overview. Binary neuron model for unsupervised learning, arranged into minicolumns and macrocolumns. Sparse matrix-vector multiplication (SpMxV): compressed row format, software optimizations. FPGA coprocessor: exploiting matrix characteristics for a “novel” architecture; benchmarks and scalability (or lack thereof).

Neural Network Example

Binary Neuron Model. Modeled as McCulloch-Pitts neurons (i.e. binary): only 2 states, firing (1) and not firing (0); fixed time step t; refractory period of one time step.
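The neuron model above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the threshold rule, the dense `weights` layout (`weights[i][j]` is the link from neuron `j` to neuron `i`), and the function name `step` are assumptions made for the example. Only the binary states, the fixed time step, and the one-step refractory period come from the slide.

```python
def step(state, weights, threshold):
    """Advance a network of binary neurons by one fixed time step.

    state      -- list of 0/1 values (1 = firing, 0 = not firing)
    weights    -- weights[i][j] is the (assumed) link strength from j to i
    threshold  -- (assumed) firing threshold shared by all neurons
    """
    next_state = []
    for i, row in enumerate(weights):
        # Summed input from all neurons that are firing right now.
        drive = sum(w * s for w, s in zip(row, state))
        # Refractory period of one time step: a neuron that fired at
        # time t cannot fire again at time t + 1.
        fires = drive >= threshold and state[i] == 0
        next_state.append(1 if fires else 0)
    return next_state


# Two mutually connected neurons pass activity back and forth.
weights = [[0, 1],
           [1, 0]]
state = step([1, 0], weights, threshold=1)
```

With these hypothetical weights, activity alternates between the two neurons, and a neuron with a self-connection still falls silent for one step after it fires.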

Macrocolumn Dynamics

Cortical Macrocolumn Connection Matrix

Feature Extraction: Bars Test. Introduces learning with “Hebbian plasticity” (i.e. updating neuronal links).
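As a rough illustration of what "updating neuronal links" can mean, the sketch below applies the textbook Hebbian rule, where a link grows when the neurons on both ends fire together. It is an assumption-laden example: the slide does not give the chapter's actual update rule, and the real-valued `weights`, the learning `rate`, and the name `hebbian_update` are all hypothetical.

```python
def hebbian_update(weights, pre, post, rate=0.1):
    """Return new weights after one textbook Hebbian update.

    weights[i][j] is the link from presynaptic neuron j to postsynaptic
    neuron i; pre and post are 0/1 activity vectors. A link is
    strengthened only when both of its endpoints are active.
    """
    return [
        [w + rate * post[i] * pre[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


weights = [[0.0, 0.0],
           [0.0, 0.0]]
# Neuron 0 fires as input, neuron 1 fires as output: only link 0 -> 1 grows.
updated = hebbian_update(weights, pre=[1, 0], post=[0, 1], rate=0.5)
```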

Compressed Row Format. Sparse matrix representation (with 3 vectors): value, column index, row pointer.
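The three vectors can be illustrated with a short pure-Python sketch. The function names `to_csr` and `spmxv` and the dense list-of-lists input are choices made for this example; the layout itself is the standard compressed sparse row scheme named on the slide.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to compressed row format."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)       # nonzero entries, row by row
                col_index.append(j)    # column of each stored entry
        row_ptr.append(len(values))    # where the next row starts
    return values, col_index, row_ptr


def spmxv(values, col_index, row_ptr, x):
    """Compute y = A * x for a matrix A stored in compressed row format."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read at positions given by col_index: this indirect
            # access is the irregular memory pattern of SpMxV.
            acc += values[k] * x[col_index[k]]
        y.append(acc)
    return y


dense = [[5, 0, 0],
         [0, 0, 3],
         [2, 0, 4]]
values, col_index, row_ptr = to_csr(dense)
y = spmxv(values, col_index, row_ptr, [1, 2, 3])
```

For this matrix the three vectors are `[5, 3, 2, 4]`, `[0, 2, 0, 2]`, and `[0, 1, 2, 4]`; row `i` owns the entries from `row_ptr[i]` up to, but not including, `row_ptr[i + 1]`.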

SpMxV is the Bottleneck. In theory, SpMxV should be memory bound. In reality, there is a lot of stalling for data due to irregular memory access patterns. Coding strategies: cache prefetching; matrix reordering; register blocking (i.e. N smaller, dense matrices). CPU: 100 GFLOPS (theoretical), 300 MFLOPS (reality). GPU: 1 TFLOPS (theoretical), 10 GFLOPS (reality).
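To put the quoted throughput figures in perspective, the achieved fraction of theoretical peak works out as follows (plain arithmetic on the numbers from the slide, with no additional measurements assumed):

```latex
\[
\text{CPU: } \frac{300\ \text{MFLOPS}}{100\ \text{GFLOPS}}
  = \frac{3 \times 10^{8}}{1 \times 10^{11}} = 0.3\%,
\qquad
\text{GPU: } \frac{10\ \text{GFLOPS}}{1\ \text{TFLOPS}}
  = \frac{1 \times 10^{10}}{1 \times 10^{12}} = 1\%.
\]
```

Both processors therefore deliver about 1% of their peak or less on SpMxV.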

Simplifications and Optimizations. Matrix elements are binary: the value vector is dropped. Matrices have a strong block-like structure: the column index vector is compressed.
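A small sketch may help show why both simplifications save storage. The slide does not say how the column index vector is compressed, so the run-length scheme below, which stores each block of consecutive ones as a `(start, length)` pair, is purely an assumption for illustration. The names `to_binary_runs` and `binary_spmxv` are likewise hypothetical.

```python
def to_binary_runs(dense):
    """Store a 0/1 matrix as runs of consecutive ones per row.

    No value vector is needed because every stored entry is 1. Each run
    is an (assumed) (start_column, length) pair replacing a stretch of
    consecutive column indices.
    """
    runs, row_ptr = [], [0]
    for row in dense:
        j, n = 0, len(row)
        while j < n:
            if row[j]:
                start = j
                while j < n and row[j]:
                    j += 1
                runs.append((start, j - start))
            else:
                j += 1
        row_ptr.append(len(runs))  # where the next row's runs begin
    return runs, row_ptr


def binary_spmxv(runs, row_ptr, x):
    """Compute y = A * x; with binary A this is additions only."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for start, length in runs[row_ptr[i]:row_ptr[i + 1]]:
            acc += sum(x[start:start + length])
        y.append(acc)
    return y


dense = [[1, 1, 1, 0],
         [0, 0, 1, 1],
         [0, 0, 0, 0]]
runs, row_ptr = to_binary_runs(dense)
y = binary_spmxv(runs, row_ptr, [1, 0, 1, 1])
```

Here five nonzeros are described by two runs, and the multiply needs no multiplications at all. When the input vector is also binary, each output is simply a count of active inputs.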

FPGA Coprocessor Architecture

Resource Usage

Scalability

Conclusions. SpMxV is the major bottleneck to simulating neural networks. Architectures of CPUs and GPUs limit performance. FPGAs can increase performance efficiency. Specialized formulation of SpMxV leads to limited scalability.