Mahdi Nazemi, Shahin Nazarian, and Massoud Pedram July 10, 2017


High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis

Outline
- Independent Component Analysis (ICA)
- Motivations for Using ICA
- Equivariant Adaptive Separation via Independence (EASI) Algorithm
- Proposed Algorithm and Hardware Implementation
- Results

Independent Component Analysis (ICA)
ICA can be defined as the estimation of a generative model x_{m×1} = A_{m×n} s_{n×1}, with m ≥ n
- x: observed random variables
- A: mixing matrix
- s: independent components (ICs)
Objective: estimate both the mixing coefficients and the ICs.
Another formulation: y_{n×1} = B_{n×m} x_{m×1}
- B: separation matrix
- y: estimates of the ICs
ICA enables feature extraction, i.e., keeping only the features that explain the essential structure of the data (e.g., the cocktail party problem).
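The generative and separating models above can be illustrated numerically. A minimal sketch assuming NumPy, with hypothetical Laplace-distributed sources; here the true A is known, so B is simply its pseudo-inverse, whereas an ICA algorithm must estimate B without knowing A:

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 2 independent non-Gaussian sources s, T samples each
n, m, T = 2, 3, 1000
s = rng.laplace(size=(n, T))

# Generative model: x = A s, with m = 3 observed variables (m >= n)
A = rng.normal(size=(m, n))
x = A @ s

# Separating model: y = B x gives estimates of the ICs.
# With A known, B = pinv(A) recovers the sources exactly.
B = np.linalg.pinv(A)
y = B @ x

print(np.allclose(y, s))  # prints True when A has full column rank
```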


Bayesian Neural Networks (BNNs)
- Inputs, weights, or outputs follow a probability distribution function
- Sampling dependent features is complicated and computationally expensive
- ICA finds independent components ⇒ simplifies sampling
- RNGs for sampling: Wallace, Ziggurat

ICA for Dimensionality Reduction
- Preprocessing step that transforms the original problem into a smaller one suitable for hardware implementation
- MNIST dataset: input features reduced from 784 to 32 (~25x fewer), with only a 0.23% drop in accuracy
- ICA not only reduces redundancy, but also yields a problem size better suited for hardware
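As an illustration of this preprocessing step, a sketch using scikit-learn's FastICA (an off-the-shelf ICA implementation, not the paper's EASI hardware pipeline) on synthetic stand-in data; the 784 → 32 MNIST numbers above come from the paper itself:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Stand-in for flattened images: 200 samples x 64 features
# (the paper reduces MNIST's 784 features to 32 components).
X = rng.laplace(size=(200, 64))

# Reduce 64 input features to 8 independent components
ica = FastICA(n_components=8, random_state=0, max_iter=1000)
X_reduced = ica.fit_transform(X)

print(X_reduced.shape)  # (200, 8)
```

A downstream classifier would then train on `X_reduced` instead of `X`.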


EASI Algorithm
- Equivariant and adaptive
- Combines whitening and separation
- Stochastic Gradient Descent (SGD) optimization
- Only requires addition and multiplication

x_k: observations (m), B_k: separation matrix, y_k: output features (n)

S1: y_k = B_k x_k
S2: nonlinearity g(y_k) = y_k^3 (element-wise)
S3: H = I − y_k y_k^T + g(y_k) y_k^T − y_k g(y_k)^T
S4: B_{k+1} = B_k − μ H B_k
Repeat until convergence.

The update of B in S4 creates a loop-carried dependency back into S1: each sample must use the B produced by the previous sample.
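The per-sample update can be sketched in a few lines of NumPy. A note on signs: this sketch uses the classical Cardoso–Laheld form H = y y^T − I + g(y) y^T − y g(y)^T (the slide writes the symmetric part as I − y y^T); μ, the source distribution, and the initialization are illustrative assumptions:

```python
import numpy as np

def easi_step(B, x, mu=0.001):
    """One EASI iteration: only additions and multiplications are needed."""
    y = B @ x                                # S1: y_k = B_k x_k
    g = y ** 3                               # S2: cubic nonlinearity
    H = (np.outer(y, y) - np.eye(len(y))     # second-order (whitening) part
         + np.outer(g, y) - np.outer(y, g))  # higher-order (separation) part
    return B - mu * H @ B                    # S4: loop-carried dependency on B

rng = np.random.default_rng(0)
m, n, T = 4, 2, 3000
s = np.sqrt(3) * rng.uniform(-1, 1, size=(n, T))  # bounded non-Gaussian sources
A = rng.normal(size=(m, n))                       # unknown mixing matrix
x = A @ s

B = 0.1 * rng.normal(size=(n, m))
for t in range(T):              # each sample waits for the updated B
    B = easi_step(B, x[:, t])
```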

Hardware Implementation

Shortcomings of Existing Implementations
- Clock frequency/throughput is low: each training sample has to wait for the immediately preceding sample to update the model parameters
- Clock frequency or throughput decreases as m and n increase [Meyer-Baese, SPIE '15]


Proposed Algorithm
x_k^p: observations (m), B_k^p: separating matrix, y_k^p: output features (n)
k: index of mini-batch, p: index of training sample within a mini-batch, P: mini-batch size

S1: y_k^p = B_k^p x_k^p
S2: nonlinearity g(y_k^p) = (y_k^p)^3
S3: Ĥ_k^p = I − y_k^p (y_k^p)^T + g(y_k^p) (y_k^p)^T − y_k^p g(y_k^p)^T
S4: H_k^p = γ H_{k−1}^P + μ Ĥ_k^p if p = 0;  H_k^p = β H_k^{p−1} + μ Ĥ_k^p if 0 < p < P
S5: B_k^{p+1} = B_k^p − H_k^p B_k^p; increment p

Initialize B_0^0 randomly and H to a zero matrix; at the beginning of each mini-batch, reset p to 0 (the accumulated H carries over between mini-batches through the γ term).
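A NumPy sketch of one way to read the proposed update; the values of γ, β, μ, and P are illustrative assumptions, and (as in the classical EASI literature) the instantaneous term is written with the y y^T − I sign:

```python
import numpy as np

def smbgd_easi(x, n, P=8, mu=0.001, gamma=0.5, beta=0.5, seed=0):
    """EASI with mini-batch-style accumulation of H: within mini-batch k the
    term H is a decayed running sum over samples p = 0..P-1 (factor beta),
    and the first sample of each mini-batch reuses the previous batch's
    final H (factor gamma), relaxing the per-sample dependency chain."""
    m, T = x.shape
    rng = np.random.default_rng(seed)
    B = 0.1 * rng.normal(size=(n, m))    # B_0^0 initialized randomly
    H = np.zeros((n, n))                 # accumulated H, initialized to zero
    for k in range(T // P):              # k: mini-batch index
        for p in range(P):               # p: sample index within the batch
            xk = x[:, k * P + p]
            y = B @ xk                   # S1
            g = y ** 3                   # S2: cubic nonlinearity
            Hinst = (np.outer(y, y) - np.eye(n)
                     + np.outer(g, y) - np.outer(y, g))  # S3
            decay = gamma if p == 0 else beta
            H = decay * H + mu * Hinst   # S4: accumulate H
            B = B - H @ B                # S5: update separating matrix
    return B

rng = np.random.default_rng(1)
s = np.sqrt(3) * rng.uniform(-1, 1, size=(2, 4000))  # bounded sources
A = rng.normal(size=(4, 2))
B = smbgd_easi(A @ s, n=2)
```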

Hardware Implementation
[Slide: equations for critical path delay and throughput]


FPGA Implementation
- 32-bit floating-point variables and operations
- m = 4 and n = 2
- ~11x increase in clock frequency
- ~149x increase in throughput
- ~23x increase in number of registers
- Clock frequency is independent of m and n

                                EASI with SGD    EASI with SMBGD
  Clock frequency (MHz)         4.81             55.17
  Throughput (MIPS)                              717.21
  Adaptive Logic Modules (ALMs) 12731            10350
  DSPs (Multipliers)            42
  Registers (bits)              160              3648

Q&A
During Poster Session Later Today