Mahdi Nazemi, Shahin Nazarian, and Massoud Pedram July 10, 2017

Presentation transcript:

High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis. Mahdi Nazemi, Shahin Nazarian, and Massoud Pedram. July 10, 2017

Outline
Independent Component Analysis (ICA)
Motivations for Using ICA
Equivariant Adaptive Separation via Independence (EASI) Algorithm
Proposed Algorithm and Hardware Implementation
Results

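Independent Component Analysis (ICA)
ICA can be defined as the estimation of a generative model
$\mathbf{x}_{m \times 1} = \mathbf{A}_{m \times n}\,\mathbf{s}_{n \times 1}, \quad m \ge n$
where $\mathbf{x}$ holds the observed random variables, $\mathbf{A}$ is the mixing matrix, and $\mathbf{s}$ holds the independent components (ICs).
Objective: estimate both the mixing coefficients and the ICs.
Another variation:
$\mathbf{y}_{n \times 1} = \mathbf{B}_{n \times m}\,\mathbf{x}_{m \times 1}$
where $\mathbf{B}$ is the separation matrix and $\mathbf{y}$ holds the estimates of the ICs.
ICA allows feature extraction, i.e., keeping the features that explain the essential structure of the data.
Example: the cocktail party problem.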
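The two forms of the model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sizes, the random mixing matrix, and the uniform sources are made up, and the pseudo-inverse is used as $\mathbf{B}$ purely because $\mathbf{A}$ is known here. In a real ICA problem $\mathbf{A}$ is unknown and $\mathbf{B}$ has to be estimated.

```python
import numpy as np

def mix(A, s):
    """Observations x = A s; A is m x n, s is n x (number of samples)."""
    return A @ s

def separate(B, x):
    """IC estimates y = B x; B is n x m."""
    return B @ x

rng = np.random.default_rng(0)
m, n, num_samples = 4, 2, 1000                 # illustrative sizes (assumed)
A = rng.standard_normal((m, n))                # hypothetical mixing matrix
s = rng.uniform(-1.0, 1.0, (n, num_samples))   # independent sources
x = mix(A, s)

# Idealized case only: with A known, its pseudo-inverse is a valid B.
B = np.linalg.pinv(A)
y = separate(B, x)
print(np.allclose(y, s))
```

The helper names `mix` and `separate` are hypothetical and exist only to mirror the two equations.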

Bayesian Neural Networks (BNNs)
Inputs, weights, or outputs follow a probability distribution function.
Sampling dependent features is complicated and computationally expensive.
ICA finds independent components, which simplifies sampling.
RNGs for sampling: Wallace, Ziggurat.

ICA for Dimensionality Reduction
ICA serves as a preprocessing step that transforms the original problem into a smaller one suitable for hardware implementation.
MNIST dataset: input features reduced from 784 to 32 (~25x fewer), with a 0.23% drop in accuracy.
ICA not only reduces redundancy but also yields a problem better suited to hardware.

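EASI Algorithm
Equivariant and adaptive.
Combines whitening and separation.
Stochastic gradient descent (SGD) optimization.
Only requires addition and multiplication.
Notation: $\mathbf{x}_k$: observations ($m$); $\mathbf{B}_k$: separation matrix; $\mathbf{y}_k$: output features ($n$).
For each observation $\mathbf{x}_k$, repeated until convergence:
$\mathbf{y}_k = \mathbf{B}_k \mathbf{x}_k$
$\mathbf{g}(\mathbf{y}_k) = \mathbf{y}_k^3$ (nonlinearity, applied elementwise)
$\mathbf{H} = \mathbf{I} - \mathbf{y}_k \mathbf{y}_k^T + \mathbf{g}(\mathbf{y}_k)\,\mathbf{y}_k^T - \mathbf{y}_k\,\mathbf{g}(\mathbf{y}_k)^T$
$\mathbf{B}_{k+1} = \mathbf{B}_k - \mu \mathbf{H} \mathbf{B}_k$
The flowchart splits these steps into stages S1 to S4. Because $\mathbf{B}_{k+1}$ depends on $\mathbf{B}_k$, the loop has a loop-carried dependency.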
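A minimal NumPy sketch of a per-sample EASI loop follows. It is not the authors' implementation. It uses the textbook form of the update from Cardoso and Laheld, $\mathbf{H} = \mathbf{y}\mathbf{y}^T - \mathbf{I} + \mathbf{g}(\mathbf{y})\mathbf{y}^T - \mathbf{y}\,\mathbf{g}(\mathbf{y})^T$, whose identity and $\mathbf{y}\mathbf{y}^T$ terms carry the opposite sign from the way the slide writes $\mathbf{H}$; that choice, the step size, the initialization scale, and the test signals are all assumptions.

```python
import numpy as np

def g(y):
    """Cubic nonlinearity, applied elementwise."""
    return y ** 3

def easi_h(y):
    """Textbook EASI term: y y^T - I + g(y) y^T - y g(y)^T (assumed form)."""
    gy = g(y)
    return (np.outer(y, y) - np.eye(y.shape[0])
            + np.outer(gy, y) - np.outer(y, gy))

def easi_sgd(X, n, mu=0.002, seed=0):
    """One pass of per-sample updates; X is m x (number of samples)."""
    rng = np.random.default_rng(seed)
    B = 0.1 * rng.standard_normal((n, X.shape[0]))  # small random start (assumed)
    for x in X.T:
        y = B @ x                       # separation
        B = B - mu * easi_h(y) @ B      # needs the B from the previous sample
    return B

# Hypothetical demo: two uniform sources mixed into four observations.
rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, (2, 5000))
X = rng.standard_normal((4, 2)) @ S
B_est = easi_sgd(X, n=2)
print(B_est.shape)
```

The line that updates `B` inside the loop is the loop-carried dependency: sample $k+1$ cannot start until the update from sample $k$ has finished.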

Hardware Implementation

Shortcomings of Existing Implementations
Clock frequency and throughput are low: each training sample has to wait for the immediately preceding sample to update the model parameters.
Clock frequency or throughput decreases as $m$ and $n$ increase [Meyer-Baese, SPIE ’15].

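Proposed Algorithm
Notation: $\mathbf{x}_k$: observations ($m$); $\mathbf{B}_k$: separating matrix; $\mathbf{y}_k$: output features ($n$); $k$: index of mini-batch; $p$: index of training sample within a mini-batch; $P$: mini-batch size.
Initialization: initialize $\mathbf{B}_0^0$ randomly. At the beginning of each mini-batch, initialize $p$ to 0 and initialize $\hat{\mathbf{H}}_k^p$ to a zero matrix.
For each training sample $\mathbf{x}_k^p$ (flowchart stages S1 to S5), followed by incrementing $p$:
$\mathbf{y}_k^p = \mathbf{B}_k^p \mathbf{x}_k^p$
$\mathbf{g}(\mathbf{y}_k^p) = (\mathbf{y}_k^p)^3$ (nonlinearity)
$\mathbf{H}_k^p = \mathbf{I} - \mathbf{y}_k^p (\mathbf{y}_k^p)^T + \mathbf{g}(\mathbf{y}_k^p)\,(\mathbf{y}_k^p)^T - \mathbf{y}_k^p\,\mathbf{g}(\mathbf{y}_k^p)^T$
$\hat{\mathbf{H}}_k^p = \begin{cases} \gamma\,\hat{\mathbf{H}}_{k-1}^{P} + \mu\,\mathbf{H}_k^p, & p = 0 \\ \beta\,\hat{\mathbf{H}}_k^{p-1} + \mu\,\mathbf{H}_k^p, & 0 < p < P \end{cases}$
$\mathbf{B}_k^{p+1} = \mathbf{B}_k^p - \hat{\mathbf{H}}_k^p \mathbf{B}_k^p$
Here $\mathbf{H}_k^p$ is the per-sample term and $\hat{\mathbf{H}}_k^p$ is the accumulated term used in the update.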
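The piecewise accumulation across a mini-batch can be sketched on its own, independent of how each per-sample matrix is computed. The function below is a hypothetical helper, not code from the presentation: it takes the per-sample matrices of one mini-batch plus the accumulated matrix carried over from the previous mini-batch, and applies the $\gamma$ rule for $p = 0$ and the $\beta$ rule for $0 < p < P$. The slide does not give values for $\gamma$, $\beta$, or $\mu$, so the numbers in the usage lines are invented.

```python
import numpy as np

def accumulate(H_list, H_hat_prev, gamma, beta, mu):
    """Accumulated terms for one mini-batch.

    p = 0:      H_hat = gamma * H_hat_prev + mu * H_list[0]
    0 < p < P:  H_hat = beta  * H_hat      + mu * H_list[p]
    """
    H_hat = None
    accumulated = []
    for p, H in enumerate(H_list):
        carry = gamma * H_hat_prev if p == 0 else beta * H_hat
        H_hat = carry + mu * H
        accumulated.append(H_hat)
    return accumulated

# Invented values, chosen so the arithmetic is easy to follow by hand.
I2 = np.eye(2)
acc = accumulate([I2, 2 * I2], H_hat_prev=4 * I2, gamma=0.5, beta=0.25, mu=0.5)
print(acc[0][0, 0], acc[1][0, 0])
```

With these inputs the first accumulated matrix is $0.5 \cdot 4\mathbf{I} + 0.5 \cdot \mathbf{I}$ and the second is $0.25$ times the first plus $0.5 \cdot 2\mathbf{I}$.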

Hardware Implementation: equations for critical path delay and throughput

FPGA Implementation
32-bit floating-point variables and operations; $m = 4$ and $n = 2$.
Relative to EASI with SGD, EASI with SMBGD shows a ~11x increase in clock frequency, a ~149x increase in throughput, and a ~23x increase in the number of registers.
Clock frequency is independent of $m$ and $n$.
Results (EASI with SGD vs. EASI with SMBGD):
Clock frequency (MHz): 4.81 vs. 55.17
Throughput (MIPS): 717.21
Adaptive Logic Modules (ALMs): 12731 vs. 10350
DSPs (multipliers): 42
Registers (bits): 160 vs. 3648
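The headline ratios can be checked against the table entries that list a value for both designs. The short script below does only that arithmetic; the dictionary keys are made-up names, and the rows with a single value (throughput and DSPs) are left out because the transcript does not say which column they belong to.

```python
# Values copied from the results table; key names are hypothetical.
sgd = {"clock_mhz": 4.81, "alms": 12731, "registers_bits": 160}
smbgd = {"clock_mhz": 55.17, "alms": 10350, "registers_bits": 3648}

def ratio(key):
    """SMBGD value divided by SGD value for one table row."""
    return smbgd[key] / sgd[key]

clock_ratio = ratio("clock_mhz")          # compare with the "~11x" claim
register_ratio = ratio("registers_bits")  # compare with the "~23x" claim
alm_ratio = ratio("alms")                 # below 1: fewer ALMs with SMBGD

for name, value in [("clock", clock_ratio),
                    ("registers", register_ratio),
                    ("ALMs", alm_ratio)]:
    print(f"{name}: {value:.2f}x")
```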

Q&A: during the poster session later today