Analysis of Classification-based Error Functions
Mike Rimer, Dr. Tony Martinez
BYU Computer Science Dept.
18 March 2006

Overview
- Machine learning
- Teaching artificial neural networks with an error function
- Problems with conventional error functions
- CB algorithms
- Experimental results
- Conclusion and future work

Machine Learning
- Goal: automating the learning of problem domains
- Given a training sample from a problem domain, induce a correct solution hypothesis over the entire problem population
- The learning model is often used as a black box: input → f(x) → output

Teaching ANNs with an Error Function
- The error function guides the gradient descent learning procedure when training a multi-layer perceptron (MLP) toward an optimal state
- Conventional error metrics are sum-squared error (SSE) and cross entropy (CE); see the sketch below
- SSE is suited to function approximation; CE is aimed at classification problems
- CB error functions [Rimer & Martinez 06] work better for classification
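To make the two conventional metrics concrete, here is a minimal NumPy sketch of per-pattern SSE and CE against 0-1 targets; the function names and the sigmoid-output assumption are illustrative, not taken from the slides.

```python
import numpy as np

def sse(outputs, targets):
    """Sum-squared error between network outputs and 0-1 targets."""
    return 0.5 * np.sum((targets - outputs) ** 2)

def cross_entropy(outputs, targets, eps=1e-12):
    """Cross-entropy error for sigmoid outputs against 0-1 targets."""
    o = np.clip(outputs, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(targets * np.log(o) + (1.0 - targets) * np.log(1.0 - o))

# Example: a pattern labeled as class 2 (0-1 targets) with two outputs
outputs = np.array([0.3, 0.6])
targets = np.array([0.0, 1.0])
print(sse(outputs, targets), cross_entropy(outputs, targets))
```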

SSE, CE
- Attempt to approximate 0-1 targets in order to represent making a decision
- [Figure: outputs O1 and O2 on a 0-1 scale with ERROR1 and ERROR2 to their targets, for a pattern labeled as class 2]

Issues with approximating hard targets
- Requires weights to be large to achieve optimality (a small illustration follows)
  - Leads to premature weight saturation
  - Weight decay, etc., can improve the situation
- Learns areas of the problem space unevenly and at different times during training
  - Makes global learning problematic
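A quick illustration of why hard 0-1 targets push weights to become large; the 0.99 target value here is an arbitrary choice for the sketch. For a sigmoid unit to output 0.99 its net input must be about 4.6, so the incoming weights must grow, and the sigmoid's gradient there is already close to zero (saturation).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Net input a sigmoid unit needs to output 0.99 (the logit of the target)
target = 0.99
net_needed = np.log(target / (1.0 - target))              # ~4.6

# Sigmoid gradient at that point: nearly zero, so further learning is slow
grad = sigmoid(net_needed) * (1.0 - sigmoid(net_needed))  # ~0.0099

print(net_needed, grad)
```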

Classification-based Error Functions
- Designed to more closely match the goal of learning a classification task (i.e. correct classifications, not low error on 0-1 targets), avoiding premature weight saturation and discouraging overfitting
- CB1 [Rimer & Martinez 02, 06]
- CB2 [Rimer & Martinez 04]
- CB3 (submitted to ICML '06)

CB1
- Only backpropagates error on misclassified training patterns (a sketch follows)
- [Figure: 0-1 output scale showing the target output T and the highest non-target output ~T; a correctly classified pattern receives no error, a misclassified pattern receives error]
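A minimal sketch of the CB1 idea as stated on this slide, error only for misclassified patterns; using the gap between the best non-target output and the target output as the error term is my simplification, not necessarily the published formulation.

```python
import numpy as np

def cb1_error(outputs, target_idx):
    """CB1-style error: zero when the pattern is already classified correctly,
    otherwise the gap between the best non-target output and the target
    output (simplified reading of the slide)."""
    o = np.asarray(outputs, dtype=float)
    non_target_max = np.delete(o, target_idx).max()
    return max(0.0, non_target_max - o[target_idx])

print(cb1_error([0.2, 0.7, 0.1], target_idx=1))  # correct -> 0.0
print(cb1_error([0.6, 0.3, 0.1], target_idx=1))  # misclassified -> ~0.3
```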

CB2
- Adds a confidence margin, μ, that is increased globally as training progresses (a sketch follows)
- [Figure: 0-1 output scale; error is applied when a pattern is misclassified, or when it is correct but the target output T does not exceed the highest non-target output ~T by the margin μ; no error when correct and the margin is satisfied]
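Extending the previous sketch with a global margin, in the spirit of this slide; the linear margin schedule and the hinge-style error term are illustrative assumptions.

```python
import numpy as np

def cb2_error(outputs, target_idx, mu):
    """CB2-style error: penalize unless the target output beats the best
    non-target output by at least the margin mu (simplified sketch)."""
    o = np.asarray(outputs, dtype=float)
    gap = o[target_idx] - np.delete(o, target_idx).max()
    return max(0.0, mu - gap)   # zero only when correct with margin satisfied

# mu is grown globally as training progresses, e.g. per epoch
for epoch, mu in enumerate(np.linspace(0.0, 0.3, 4)):
    print(epoch, round(mu, 2), cb2_error([0.25, 0.45, 0.10], target_idx=1, mu=mu))
```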

CB3
- Learns a confidence C_i for each training pattern i as training progresses (a sketch follows)
- Patterns often misclassified have low confidence
- Patterns consistently classified correctly gain confidence
- [Figure: 0-1 output scale; a misclassified pattern receives error, and the error on correctly classified patterns depends on the learned confidence C_i (low vs. high)]
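A rough sketch of the per-pattern confidence bookkeeping described here; the slide does not spell out how C_i enters the error term, so treating it as a per-pattern margin is an assumption of this sketch.

```python
import numpy as np

class CB3Confidence:
    """Track a confidence C_i per training pattern: often-misclassified
    patterns keep low confidence, consistently correct ones gain it.
    Using C_i as a per-pattern margin is an illustrative assumption."""

    def __init__(self, n_patterns, step=0.05):
        self.c = np.zeros(n_patterns)   # C_i starts at 0 for every pattern
        self.step = step

    def update(self, i, correct):
        delta = self.step if correct else -self.step
        self.c[i] = float(np.clip(self.c[i] + delta, 0.0, 1.0))

    def error(self, outputs, target_idx, i):
        o = np.asarray(outputs, dtype=float)
        gap = o[target_idx] - np.delete(o, target_idx).max()
        if gap <= 0.0:                      # misclassified: full error
            return -gap
        return max(0.0, self.c[i] - gap)    # correct: error only below C_i

conf = CB3Confidence(n_patterns=100)
conf.update(0, correct=True)
print(conf.c[0], conf.error([0.3, 0.4], target_idx=1, i=0))
```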

Neural Network Training
- Influenced by:
  - Initial parameter (weight) settings
  - Pattern presentation order (stochastic training)
  - Learning rate
  - Number of hidden nodes
- Goal of training: high generalization, low bias and variance

Experiments
- Empirical comparison of six error functions: SSE, CE, CE w/ WD (weight decay), and CB1-3
- Used eleven benchmark problems from the UC Irvine Machine Learning Repository: ann, balance, bcw, derm, ecoli, iono, iris, musk2, pima, sonar, wine
- Testing performed using stratified 10-fold cross-validation (see the sketch below)
- Model selection by hold-out set
- Results were averaged over ten tests
- LR = 0.1, M = 0.7
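A minimal sketch of the evaluation protocol named on this slide, stratified 10-fold cross-validation with a hold-out set for model selection, written with scikit-learn splitters; the `make_classifier` factory and the 10% hold-out fraction are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

def evaluate(X, y, make_classifier, n_repeats=10, seed=0):
    """Stratified 10-fold CV, repeated; within each fold a hold-out set is
    split off the training data for model selection (placeholder protocol)."""
    accs = []
    for rep in range(n_repeats):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed + rep)
        for train_idx, test_idx in skf.split(X, y):
            X_tr, y_tr = X[train_idx], y[train_idx]
            # hold-out split used to pick the stopping point / best model
            X_fit, X_val, y_fit, y_val = train_test_split(
                X_tr, y_tr, test_size=0.1, stratify=y_tr, random_state=rep)
            clf = make_classifier()
            clf.fit(X_fit, y_fit)   # model selection would monitor (X_val, y_val)
            accs.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(accs))
```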

Classifier output difference (COD)
- Evaluation of the behavioral difference of two hypotheses (e.g. classifiers)
- T is the test set; I is the identity (characteristic) function
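The COD formula itself did not survive in this transcript; given the definitions on the slide, a natural reading is the fraction of test patterns on which the two classifiers disagree, COD(f, g) = (1/|T|) Σ_{x∈T} I[f(x) ≠ g(x)], which is what this sketch computes (a reconstruction, so check the cited work for the exact definition).

```python
import numpy as np

def cod(preds_f, preds_g):
    """Classifier output difference: fraction of test-set patterns on which
    two classifiers disagree (reconstructed from the slide's definitions)."""
    preds_f = np.asarray(preds_f)
    preds_g = np.asarray(preds_g)
    return float(np.mean(preds_f != preds_g))

print(cod([0, 1, 1, 2], [0, 1, 2, 2]))   # 0.25
```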

Robustness to initial network weights
- Averaged over 30 random runs over all datasets
- [Table: % test accuracy, standard deviation, and epochs for CB1-3, CE, CE w/ WD, and SSE; numeric values not preserved in this transcript]

Robustness to initial network weights
- Averaged over all tests
- [Table: test error and COD for CB1-3, CE, CE w/ WD, and SSE; values not preserved]

Robustness to pattern presentation order
- Averaged over 30 random runs over all datasets
- [Table: % test accuracy, standard deviation, and epochs for CB1-3, CE, CE w/ WD, and SSE; values not preserved]

Robustness to pattern presentation order
- Averaged over all tests
- [Table: test error and COD for CB1-3, CE, CE w/ WD, and SSE; values not preserved]

Robustness to learning rate
- Average over learning rates varied from 0.01 to 0.3
- [Table: test accuracy, standard deviation, and epochs for the CB variants, SSE, CE, and CE w/ WD; values not preserved]

Robustness to learning rate [figure slide; plot not preserved]

Robustness to number of hidden nodes
- Average of varying the number of nodes in the hidden layer (range not preserved in this transcript)
- [Table: test accuracy, standard deviation, and epochs for CB1-3, SSE, CE, and CE w/ WD; values not preserved]

Robustness to number of hidden nodes [figure slide; plot not preserved]

Conclusion
- CB1-3 are generally more robust than SSE, CE, and CE w/ WD with respect to:
  - Initial weight settings
  - Pattern presentation order
  - Pattern variance
  - Learning rate
  - Number of hidden nodes
- CB3 is superior: the most robust, with the most consistent results

Questions?