Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003

The Idea
Networks with excessive weights "over-train" on the data and, as a result, generalize poorly. The goal is a technique that can effectively shrink the network without reducing validation accuracy. Ideally, by reducing complexity, network pruning increases the generalization ability of the net.

History
Removing a weight means setting it to zero and freezing it there. The first attempts at network pruning removed the weights of least magnitude. A later refinement minimizes a cost function composed of both the training error and a measure of network complexity.
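As a point of reference for the comparisons later in the talk, the magnitude-based baseline is only a few lines. This is my own sketch, not code from any of the papers:

```python
import numpy as np

def magnitude_prune(weights, frac=0.1):
    """Baseline pruning: zero out (and freeze) the fraction `frac`
    of weights with the smallest magnitude."""
    k = int(frac * weights.size)
    cut = np.argsort(np.abs(weights), axis=None)[:k]  # indices of smallest |w|
    mask = np.ones_like(weights)
    mask.flat[cut] = 0          # 0 marks a pruned (frozen-at-zero) weight
    return weights * mask, mask
```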

LeCun’s Take
Derive a more theoretically sound ordering for weight removal from a second-order Taylor expansion of the error function:

\[ \delta E = \sum_i g_i\,\delta w_i + \frac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta w_i\,\delta w_j + O(\|\delta \mathbf{w}\|^3), \]

where \(g_i = \partial E / \partial w_i\) and \(h_{ij} = \partial^2 E / \partial w_i\,\partial w_j\). At a local minimum the gradient term vanishes; assuming a diagonal Hessian, the saliency of weight \(w_i\) is \(s_i = h_{ii} w_i^2 / 2\), the predicted increase in error from deleting it.

Computing the 2nd Derivatives
The network is expressed as \(x_i = f(a_i)\), \(a_i = \sum_j w_{ij} x_j\). The diagonal Hessian entry for weight \(w_{ij}\) is \(h = (\partial^2 E / \partial a_i^2)\, x_j^2\). The second derivatives propagate backward through the network, analogously to back-propagation:

\[ \frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\, \frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\, \frac{\partial E}{\partial x_i}, \]

with boundary condition \(\partial^2 E / \partial a_i^2 = 2 f'(a_i)^2 - 2 (d_i - x_i) f''(a_i)\) at the output units for the squared-error cost \(E = \sum_i (d_i - x_i)^2\).
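To make the recurrence concrete, here is a minimal NumPy sketch for a single-hidden-layer sigmoid network with the squared-error cost above; the function and variable names are mine, not LeCun's:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def obd_saliencies(W1, W2, x, d):
    """OBD saliencies for a 1-hidden-layer sigmoid net, E = sum((d - out)**2).

    W1: (hidden, in) weights; W2: (out, hidden) weights;
    x: input vector; d: target vector.
    Returns saliencies s = h_kk * w**2 / 2, same shapes as W1 and W2.
    """
    # Forward pass: a = pre-activations, z = activations.
    a1 = W1 @ x
    z1 = sigmoid(a1)
    a2 = W2 @ z1
    z2 = sigmoid(a2)

    f1p = z1 * (1 - z1)          # f'(a1) for the sigmoid
    f2p = z2 * (1 - z2)          # f'(a2)
    f1pp = f1p * (1 - 2 * z1)    # f''(a1)
    f2pp = f2p * (1 - 2 * z2)    # f''(a2)

    # Output boundary condition: d2E/da2^2 = 2 f'(a2)^2 - 2 (d - z2) f''(a2).
    h_a2 = 2 * f2p**2 - 2 * (d - z2) * f2pp
    dE_da2 = -2 * (d - z2) * f2p

    # Hidden-layer recurrence under the diagonal approximation:
    # dE/dz1 = W2^T dE/da2,  d2E/dz1^2 ~= (W2**2)^T d2E/da2^2.
    dE_dz1 = W2.T @ dE_da2
    h_a1 = f1p**2 * ((W2**2).T @ h_a2) + f1pp * dE_dz1

    # h_kk for weight w_ij is (d2E/da_i^2) * x_j^2; saliency = h_kk * w^2 / 2.
    return (np.outer(h_a1, x**2) * W1**2 / 2,
            np.outer(h_a2, z1**2) * W2**2 / 2)
```

Because the cross terms of the Hessian are dropped, the cost of this computation is of the same order as one ordinary backward pass.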

The Recipe
1. Train the network until a local minimum of the error is reached.
2. Compute the second derivatives \(h_{kk}\) for each parameter.
3. Compute the saliencies \(s_k = h_{kk} w_k^2 / 2\).
4. Delete the low-saliency parameters.
5. Iterate from step 1.
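One possible shape for this loop follows; `train_to_minimum` and `saliency_fn` are hypothetical placeholders for the training procedure and the saliency computation (the previous sketch could serve as `saliency_fn` for a two-layer net):

```python
import numpy as np

def obd_prune(train_to_minimum, saliency_fn, weights, prune_frac=0.1, rounds=5):
    """Sketch of the OBD train/prune/retrain loop.

    train_to_minimum(weights, mask) -> weights trained to a local minimum,
    saliency_fn(weights) -> per-weight saliencies (same shape as weights).
    Pruned weights are held at zero by the mask.
    """
    mask = np.ones_like(weights)                           # 1 = active, 0 = pruned
    for _ in range(rounds):
        weights = train_to_minimum(weights * mask, mask)   # step 1
        s = saliency_fn(weights)                           # steps 2-3
        s[mask == 0] = np.inf                              # skip already-pruned weights
        k = int(prune_frac * mask.sum())                   # number to delete this round
        cut = np.argsort(s, axis=None)[:k]                 # lowest-saliency parameters
        mask.flat[cut] = 0                                 # step 4: delete and freeze
    return weights * mask, mask
```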

Results
(Figure: results of OBD compared to magnitude-based damage.)

Results Continued
(Figure: comparison of MSE with retraining versus without retraining.)

LeCun’s Conclusions
Optimal Brain Damage reduced the number of parameters by up to a factor of four, and overall recognition accuracy increased. OBD can be used either as an automatic pruning tool or as an interactive one.

Babak Hassibi: Return of LeCun
Several problems arise from LeCun’s simplifying assumptions. For smaller networks, OBD can choose the incorrect parameter to delete. It is possible to calculate the inverse Hessian recursively, yielding a more accurate approximation without the diagonal assumption.

The Math
Optimal Brain Surgeon (OBS) minimizes the increase in error under the full quadratic model, \(\delta E = \tfrac{1}{2}\,\delta \mathbf{w}^{\top} H\, \delta \mathbf{w}\), subject to the constraint that weight \(q\) is deleted: \(\mathbf{e}_q^{\top}\delta\mathbf{w} + w_q = 0\). Solving with a Lagrange multiplier gives

\[ \delta \mathbf{w} = -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} \mathbf{e}_q, \qquad L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \]

where \(L_q\) is the saliency of weight \(q\). Unlike OBD, deleting \(w_q\) simultaneously adjusts all of the remaining weights.
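Here is a sketch of a single OBS step, plus the recursive inverse-Hessian estimate Hassibi and Stork build from per-sample output gradients using the matrix-inversion lemma; the helper names and the regularizer `alpha` are my own choices:

```python
import numpy as np

def obs_step(w, H_inv):
    """One Optimal Brain Surgeon step.

    w: flat weight vector; H_inv: inverse Hessian of the error at w.
    Picks the weight q with smallest saliency L_q = w_q^2 / (2 [H^-1]_qq)
    and updates all weights by dw = -(w_q / [H^-1]_qq) * H_inv[:, q].
    """
    saliency = w**2 / (2 * np.diag(H_inv))
    q = int(np.argmin(saliency))
    dw = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    return w + dw, q, saliency[q]

def inverse_hessian(grads, alpha=1e-4):
    """Recursive inverse of H ~= (1/P) sum_k g_k g_k^T + alpha*I, where g_k is
    the gradient of the network output on sample k. Each rank-one term is
    folded in with the Sherman-Morrison (matrix-inversion) lemma."""
    P = len(grads)
    H_inv = np.eye(grads[0].size) / alpha    # inverse of H_0 = alpha * I
    for g in grads:
        Hg = H_inv @ g
        H_inv -= np.outer(Hg, Hg) / (P + g @ Hg)
    return H_inv
```

Here `alpha` is a small constant that keeps the initial estimate invertible; the recursion avoids ever forming or inverting the full Hessian directly.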

The MONK’s Problems
A set of problems involving classifying artificial robots based on six discrete-valued attributes. The tasks are binary decision problems, e.g. (head_shape = body_shape). The study was performed in 1991; back-propagation with weight decay was found to be the most accurate solution at the time.

Results: Hassibi Wins
(Table: training accuracy and number of weights for back-propagation with weight decay (BPWD) versus OBS on MONK1, MONK2, and MONK3; the numeric entries did not survive transcription.)

References
LeCun, Y., Denker, J. S., and Solla, S. A. “Optimal Brain Damage.” AT&T Bell Laboratories.
Hassibi, B., and Stork, D. G. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center.
Thrun, S. B. “The MONK’s Problems.” CMU.

Questions? (Brain Background Courtesy Brainburst.com)