Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003

The Idea
Networks with excessive weights "over-train" on the data and, as a result, generalize poorly. The goal is a technique that can effectively shrink the network without reducing validation accuracy. Ideally, by reducing complexity, network pruning increases the generalization ability of the net.

History
Removing a weight means setting it to zero and freezing it there. The first attempts at network pruning removed the weights of least magnitude. A later refinement minimizes a cost function composed of both the training error and a measure of network complexity.
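As a point of reference for the comparisons later in the talk, the magnitude-based baseline is only a few lines. This is my own sketch, not code from any of the papers:

```python
import numpy as np

def magnitude_prune(weights, frac=0.1):
    """Baseline pruning: zero out (and freeze) the fraction `frac`
    of weights with the smallest magnitude."""
    k = int(frac * weights.size)
    cut = np.argsort(np.abs(weights), axis=None)[:k]  # indices of smallest |w|
    mask = np.ones_like(weights)
    mask.flat[cut] = 0          # 0 marks a pruned (frozen-at-zero) weight
    return weights * mask, mask
```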

LeCun’s Take
Derive a more theoretically sound ordering for weight removal from a second-order Taylor expansion of the error function:

\[ \delta E = \sum_i g_i\,\delta w_i + \frac{1}{2}\sum_i h_{ii}\,\delta w_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta w_i\,\delta w_j + O(\|\delta \mathbf{w}\|^3), \]

where \(g_i = \partial E / \partial w_i\) and \(h_{ij} = \partial^2 E / \partial w_i\,\partial w_j\). At a local minimum the gradient term vanishes; assuming a diagonal Hessian, the saliency of weight \(w_i\) is \(s_i = h_{ii} w_i^2 / 2\), the predicted increase in error from deleting it.

Computing the 2nd Derivatives
The network is expressed as \(x_i = f(a_i)\), \(a_i = \sum_j w_{ij} x_j\). The diagonal Hessian entry for weight \(w_{ij}\) is \(h = (\partial^2 E / \partial a_i^2)\, x_j^2\). The second derivatives propagate backward through the network, analogously to back-propagation:

\[ \frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\, \frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\, \frac{\partial E}{\partial x_i}, \]

with boundary condition \(\partial^2 E / \partial a_i^2 = 2 f'(a_i)^2 - 2 (d_i - x_i) f''(a_i)\) at the output units for the squared-error cost \(E = \sum_i (d_i - x_i)^2\).
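To make the recurrence concrete, here is a minimal NumPy sketch for a single-hidden-layer sigmoid network with the squared-error cost above; the function and variable names are mine, not LeCun's:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def obd_saliencies(W1, W2, x, d):
    """OBD saliencies for a 1-hidden-layer sigmoid net, E = sum((d - out)**2).

    W1: (hidden, in) weights; W2: (out, hidden) weights;
    x: input vector; d: target vector.
    Returns saliencies s = h_kk * w**2 / 2, same shapes as W1 and W2.
    """
    # Forward pass: a = pre-activations, z = activations.
    a1 = W1 @ x
    z1 = sigmoid(a1)
    a2 = W2 @ z1
    z2 = sigmoid(a2)

    f1p = z1 * (1 - z1)          # f'(a1) for the sigmoid
    f2p = z2 * (1 - z2)          # f'(a2)
    f1pp = f1p * (1 - 2 * z1)    # f''(a1)
    f2pp = f2p * (1 - 2 * z2)    # f''(a2)

    # Output boundary condition: d2E/da2^2 = 2 f'(a2)^2 - 2 (d - z2) f''(a2).
    h_a2 = 2 * f2p**2 - 2 * (d - z2) * f2pp
    dE_da2 = -2 * (d - z2) * f2p

    # Hidden-layer recurrence under the diagonal approximation:
    # dE/dz1 = W2^T dE/da2,  d2E/dz1^2 ~= (W2**2)^T d2E/da2^2.
    dE_dz1 = W2.T @ dE_da2
    h_a1 = f1p**2 * ((W2**2).T @ h_a2) + f1pp * dE_dz1

    # h_kk for weight w_ij is (d2E/da_i^2) * x_j^2; saliency = h_kk * w^2 / 2.
    return (np.outer(h_a1, x**2) * W1**2 / 2,
            np.outer(h_a2, z1**2) * W2**2 / 2)
```

Because the cross terms of the Hessian are dropped, the cost of this computation is of the same order as one ordinary backward pass.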

The Recipe
1. Train the network until a local minimum of the error is reached.
2. Compute the second derivatives \(h_{kk}\) for each parameter.
3. Compute the saliencies \(s_k = h_{kk} w_k^2 / 2\).
4. Delete the low-saliency parameters.
5. Iterate from step 1.
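One possible shape for this loop follows; `train_to_minimum` and `saliency_fn` are hypothetical placeholders for the training procedure and the saliency computation (the previous sketch could serve as `saliency_fn` for a two-layer net):

```python
import numpy as np

def obd_prune(train_to_minimum, saliency_fn, weights, prune_frac=0.1, rounds=5):
    """Sketch of the OBD train/prune/retrain loop.

    train_to_minimum(weights, mask) -> weights trained to a local minimum,
    saliency_fn(weights) -> per-weight saliencies (same shape as weights).
    Pruned weights are held at zero by the mask.
    """
    mask = np.ones_like(weights)                           # 1 = active, 0 = pruned
    for _ in range(rounds):
        weights = train_to_minimum(weights * mask, mask)   # step 1
        s = saliency_fn(weights)                           # steps 2-3
        s[mask == 0] = np.inf                              # skip already-pruned weights
        k = int(prune_frac * mask.sum())                   # number to delete this round
        cut = np.argsort(s, axis=None)[:k]                 # lowest-saliency parameters
        mask.flat[cut] = 0                                 # step 4: delete and freeze
    return weights * mask, mask
```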

Results
(Figure: results of OBD compared to magnitude-based damage.)

Results Continued
(Figure: comparison of MSE with retraining versus without retraining.)

LeCun’s Conclusions
Optimal Brain Damage reduced the number of parameters by up to a factor of four, and overall recognition accuracy increased. OBD can be used either as an automatic pruning tool or as an interactive one.

Babak Hassibi: Return of LeCun
Several problems arise from LeCun’s simplifying assumptions. For smaller networks, OBD can choose the incorrect parameter to delete. It is possible to calculate the inverse Hessian recursively, yielding a more accurate approximation without the diagonal assumption.

The Math
Optimal Brain Surgeon (OBS) minimizes the increase in error under the full quadratic model, \(\delta E = \tfrac{1}{2}\,\delta \mathbf{w}^{\top} H\, \delta \mathbf{w}\), subject to the constraint that weight \(q\) is deleted: \(\mathbf{e}_q^{\top}\delta\mathbf{w} + w_q = 0\). Solving with a Lagrange multiplier gives

\[ \delta \mathbf{w} = -\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} \mathbf{e}_q, \qquad L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \]

where \(L_q\) is the saliency of weight \(q\). Unlike OBD, deleting \(w_q\) simultaneously adjusts all of the remaining weights.
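Here is a sketch of a single OBS step, plus the recursive inverse-Hessian estimate Hassibi and Stork build from per-sample output gradients using the matrix-inversion lemma; the helper names and the regularizer `alpha` are my own choices:

```python
import numpy as np

def obs_step(w, H_inv):
    """One Optimal Brain Surgeon step.

    w: flat weight vector; H_inv: inverse Hessian of the error at w.
    Picks the weight q with smallest saliency L_q = w_q^2 / (2 [H^-1]_qq)
    and updates all weights by dw = -(w_q / [H^-1]_qq) * H_inv[:, q].
    """
    saliency = w**2 / (2 * np.diag(H_inv))
    q = int(np.argmin(saliency))
    dw = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    return w + dw, q, saliency[q]

def inverse_hessian(grads, alpha=1e-4):
    """Recursive inverse of H ~= (1/P) sum_k g_k g_k^T + alpha*I, where g_k is
    the gradient of the network output on sample k. Each rank-one term is
    folded in with the Sherman-Morrison (matrix-inversion) lemma."""
    P = len(grads)
    H_inv = np.eye(grads[0].size) / alpha    # inverse of H_0 = alpha * I
    for g in grads:
        Hg = H_inv @ g
        H_inv -= np.outer(Hg, Hg) / (P + g @ Hg)
    return H_inv
```

Here `alpha` is a small constant that keeps the initial estimate invertible; the recursion avoids ever forming or inverting the full Hessian directly.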

The MONK’s Problems
A set of problems involving classifying artificial robots based on six discrete-valued attributes. The tasks are binary decision problems, e.g. (head_shape = body_shape). The study was performed in 1991; back-propagation with weight decay was found to be the most accurate solution at the time.

Results: Hassibi Wins
(Table: training accuracy and number of weights for back-propagation with weight decay (BPWD) versus OBS on MONK1, MONK2, and MONK3; the numeric entries did not survive transcription.)

References
LeCun, Y., Denker, J. S., and Solla, S. A. “Optimal Brain Damage.” AT&T Bell Laboratories.
Hassibi, B., and Stork, D. G. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center.
Thrun, S. B. “The MONK’s Problems.” CMU.

Questions? (Brain Background Courtesy Brainburst.com)