Minimal Neural Networks Support vector machines and Bayesian learning for neural networks Peter Andras

Bayesian neural networks I. The Bayes rule: Let's consider a model of a system and an observation of the system, an event. The a posteriori probability of correctness of the model, after observing the event, is proportional to the product of the a priori probability of correctness of the model and the probability of the event conditioned on the correctness of the model. Mathematically: P(H_θ | D) ∝ P(D | H_θ) · P(H_θ), where θ is the parameter of the model H_θ and D is the observed event.
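A small numerical sketch of the rule (the two candidate models, their priors, and the likelihoods below are made up purely for illustration):

```python
import numpy as np

# Hypothetical example: two candidate models H[0], H[1] with prior beliefs,
# and the probability each assigns to the observed event D.
prior = np.array([0.5, 0.5])        # P(H_theta), assumed values
likelihood = np.array([0.2, 0.05])  # P(D | H_theta), assumed values

# Bayes rule: posterior is proportional to likelihood * prior.
posterior = likelihood * prior
posterior /= posterior.sum()        # normalise over the candidate models

print(posterior)  # the first model is now considered more probable
```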

Bayesian neural networks II. Best model: the model with the highest a posteriori probability of correctness. Model selection by optimizing the formula: θ* = argmax_θ P(D | H_θ) · P(H_θ).

Bayesian neural networks III. Application to neural networks: g_θ is the function represented by the neural network and D = {(x_1, y_1), ..., (x_n, y_n)} is the observed event, where θ is the vector of all parameters of the network. We suppose a normal distribution for the data conditioned on the validity of a model, i.e., the observed values y_i are normally distributed around g_θ(x_i) if θ is the correct parameter vector.

Bayesian neural networks IV. Carrying out the calculation we get P(D | H_θ) ∝ exp( −(1 / 2σ²) Σ_i (y_i − g_θ(x_i))² ), and the new formula for optimization is: θ* = argmin_θ [ Σ_i (y_i − g_θ(x_i))² − 2σ² · log P(H_θ) ].

Bayesian neural networks V. The equivalence of regularization and Bayesian model selection. Regularization formula: minimize Σ_i (y_i − g_θ(x_i))² + λ · ||Tg_θ||². Bayesian optimization formula: minimize Σ_i (y_i − g_θ(x_i))² − 2σ² · log P(H_θ). Equivalence: the regularization term plays the role of the negative log prior; both represent a priori information about the correct solution.
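A rough illustration of this equivalence, assuming (as a hypothetical choice) a Gaussian noise model for the data and a Gaussian prior over the parameters: the negative log posterior then reduces to a sum-of-squares error plus a weight-decay penalty.

```python
import numpy as np

def neg_log_posterior(theta, g, x, y, sigma2=1.0, tau2=1.0):
    """-log P(D|H_theta) - log P(H_theta), dropping theta-independent constants.

    A Gaussian likelihood around g_theta(x_i) and a Gaussian prior on theta
    are assumed here purely for illustration.
    """
    sse = np.sum((y - g(x, theta)) ** 2) / (2.0 * sigma2)   # data misfit term
    penalty = np.sum(theta ** 2) / (2.0 * tau2)             # a priori term
    return sse + penalty                                    # regularised error

# Toy "network": a linear model g_theta(x) = theta[0]*x + theta[1]
g = lambda x, theta: theta[0] * x + theta[1]
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 0.9, 2.1])
print(neg_log_posterior(np.array([1.0, 0.0]), g, x, y))
```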

Bayesian neural networks VI. Bayesian pruning by regularization. Gauss pruning: penalty term Σ_{j=1..N} θ_j². Laplace pruning: penalty term Σ_{j=1..N} |θ_j|. Cauchy pruning: penalty term Σ_{j=1..N} log(1 + θ_j²). N is the number of components of the vector θ.
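A minimal sketch of the three penalty terms (regularisation coefficients omitted; the exact normalisations are not shown on the slide, so the forms below are the commonly used ones):

```python
import numpy as np

def gauss_penalty(theta):
    return np.sum(theta ** 2)                 # favours many small weights

def laplace_penalty(theta):
    return np.sum(np.abs(theta))              # drives some weights exactly to zero

def cauchy_penalty(theta):
    return np.sum(np.log(1.0 + theta ** 2))   # tolerates a few large weights

theta = np.array([0.01, -0.5, 2.0])
for penalty in (gauss_penalty, laplace_penalty, cauchy_penalty):
    print(penalty.__name__, penalty(theta))
```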

Support vector machines - SVM I. Linearly separable classes: - many separators - there is an optimal separator

Support vector machines - SVM II. How to find the optimal separator? - support vectors - overspecification Property: one less support vector → new optimal separator

Support vector machines - SVM III. We look for minimal and robust separators. These are minimal and robust models of the data. The full data set is equivalent to the set of support vectors with respect to the specification of the minimal robust model.

Support vector machines - SVM IV. Mathematical problem formulation I. We represent the separator as a pair (w, b), where w is a vector and b is a scalar. We look for w and b such that they satisfy: y_i (w^T x_i + b) ≥ 1 for all data points x_i. The support vectors are those x_i for which this inequality is in fact an equality.

Support vector machines - SVM V. Mathematical problem formulation II. The distances from the origin of the hyper-planes of the support vectors (w^T x + b = 1 and w^T x + b = −1) are |1 − b| / ||w|| and |1 + b| / ||w||. The distance between the two planes is 2 / ||w||.

Support vector machines - SVM VI. Mathematical problem formulation III. Optimal separator: the distance between the two hyper-planes is maximal. Optimization: minimize (1/2) ||w||², with the restrictions that w^T x_i + b ≥ 1 if y_i = +1 and w^T x_i + b ≤ −1 if y_i = −1, or in other form y_i (w^T x_i + b) ≥ 1.
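A small numeric check of the constraints and the resulting margin for a candidate separator (the toy points and the choice of w and b are assumed values, not the optimum):

```python
import numpy as np

# Toy 2D data: labels +1 / -1, linearly separable by construction.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])   # candidate separator (assumed, not optimised)
b = -1.0

constraints = y * (X @ w + b)        # all values must be >= 1 for feasibility
margin = 2.0 / np.linalg.norm(w)     # distance between the two hyper-planes

print(constraints, margin)
```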

Support vector machines - SVM VII. Mathematical problem formulation IV. Complete optimization formula, using Lagrange multipliers: minimize L(w, b, α) = (1/2) ||w||² − Σ_i α_i [ y_i (w^T x_i + b) − 1 ], with α_i ≥ 0.

Support vector machines - SVM VIII. Mathematical problem formulation V. Writing the optimality conditions for w and b we get: w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0. The dual problem is: maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j, subject to α_i ≥ 0 and Σ_i α_i y_i = 0. The support vectors are those x_i for which α_i is strictly positive.

Support vector machines - SVM IX. Graphical interpretation: we search for the tangent point of a hyper-ellipsoid (a level surface of the quadratic dual objective) with the positive orthant of the α-space.

Support vector machines - SVM X. How to solve the support vector problem? Optimization with respect to the α's: - gradient method - Newton and quasi-Newton methods As a result we get: - the support vectors - the optimal linear separator
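A minimal gradient-based sketch of the dual optimization on toy data (plain projected gradient ascent on the α's; a real solver would enforce the equality constraint Σ_i α_i y_i = 0 exactly, which is only handled approximately here):

```python
import numpy as np

def svm_dual_gradient(X, y, lr=0.01, steps=2000):
    """Maximise sum(a) - 0.5 * sum_ij a_i a_j y_i y_j <x_i, x_j>, a_i >= 0.

    Toy projected-gradient sketch: alternately steps along the gradient,
    projects toward sum_i a_i y_i = 0, and clips the multipliers at zero.
    """
    n = len(y)
    Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j x_i . x_j
    a = np.zeros(n)
    for _ in range(steps):
        grad = 1.0 - Q @ a                      # gradient of the dual objective
        a = a + lr * grad
        a = a - y * (a @ y) / n                 # project toward sum a_i y_i = 0
        a = np.maximum(a, 0.0)                  # keep multipliers non-negative
    w = (a * y) @ X                             # w = sum_i a_i y_i x_i
    sv = a > 1e-6                               # support vectors: a_i > 0
    b = np.mean(y[sv] - X[sv] @ w)              # from y_i (w . x_i + b) = 1
    return a, w, b, sv

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
a, w, b, sv = svm_dual_gradient(X, y)
print(np.sign(X @ w + b))                       # ideally reproduces the labels
```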

Support vector machines - SVM XI. Implications for artificial neural networks: - robust perceptron (low sensitivity to noise) - minimal linear classification neural network

Support vector machines - SVM XII. What can we do if the boundary is nonlinear? Idea: transform the data vectors to a space where the separator is linear.

Support vector machines - SVM XIII. The transformation is often made to an infinite-dimensional space, usually a function space. Example: x → cos(u^T x).
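A tiny illustration of the idea with one-dimensional data that no single threshold can separate; the quadratic feature map below is just one convenient choice for the sketch, not the cos(u^T x) transform mentioned on the slide:

```python
import numpy as np

# 1D points: the inner pair is one class, the outer pair the other,
# so no single threshold on x separates them.
x = np.array([-2.0, -0.5, 0.5, 2.0])
y = np.array([-1, 1, 1, -1])

# Feature map phi(x) = (x, x^2): in this 2D space a horizontal line
# (second coordinate around 1) separates the two classes linearly.
phi = np.stack([x, x ** 2], axis=1)
print(phi)
```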

Support vector machines - SVM XIV. The new optimization formulas are: maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j Φ(x_i)^T Φ(x_j), subject to α_i ≥ 0 and Σ_i α_i y_i = 0, where Φ is the transformation.

Support vector machines - SVM XIV. How to handle the products of the transformed vectors? Idea: use a transformation that fits the Mercer theorem. Mercer theorem: let K(u, v) be a symmetric, continuous function; then K has a decomposition K(u, v) = <Φ(u), Φ(v)>_H, where Φ maps the data into H and H is a function space, if and only if ∫∫ K(u, v) g(u) g(v) du dv ≥ 0 for each g with ∫ g(u)² du < ∞.
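One way to check the Mercer condition numerically on a finite sample is to verify that the kernel's Gram matrix is positive semidefinite, which is a necessary finite-sample consequence of the theorem (the sample points and the Gaussian kernel width below are arbitrary):

```python
import numpy as np

def rbf_kernel(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))                  # arbitrary sample points

K = np.array([[rbf_kernel(u, v) for v in pts] for u in pts])
eigvals = np.linalg.eigvalsh(K)                 # K is symmetric, use eigvalsh
print(eigvals.min() >= -1e-10)                  # non-negative up to rounding
```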

Support vector machines - SVM XV. Optimization formula with a transformation that fits the Mercer theorem: maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), subject to α_i ≥ 0 and Σ_i α_i y_i = 0. The form of the solution: f(x) = sign( Σ_i α_i y_i K(x_i, x) + b ); the b is determined from an equation valid for a support vector, e.g., y_j ( Σ_i α_i y_i K(x_i, x_j) + b ) = 1.

Support vector machines - SVM XVI. Examples of transformations and kernels a. b. c.

Support vector machines - SVM XVII. Other typical kernels
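The slide lists these kernels as formulas; as a sketch, the polynomial, Gaussian (RBF) and sigmoid kernels below are the choices most commonly quoted in this context (the parameter values are arbitrary):

```python
import numpy as np

def linear_kernel(u, v):
    return u @ v

def polynomial_kernel(u, v, degree=3, c=1.0):
    return (u @ v + c) ** degree

def gaussian_kernel(u, v, sigma=1.0):
    return np.exp(-np.linalg.norm(u - v) ** 2 / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, kappa=1.0, c=-1.0):
    # Only satisfies the Mercer condition for some parameter values.
    return np.tanh(kappa * (u @ v) + c)

u, v = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(u, v), gaussian_kernel(u, v))
```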

Support vector machines - SVM XVIII. Summary of main ideas: - look for minimal complexity classification - transform the data to another space where the class boundaries are linear - use Mercer kernels

Support vector machines - SVM XIX. Practical issues: - the global optimization doesn't work with large amounts of data → sequential optimization with chunks of the data - the resulting models are minimal complexity models; they are insensitive to noise and keep the generalization ability of more complex models - applications: character recognition, economic forecasting
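In practice the chunked optimization is rarely implemented by hand; as a usage sketch, a library implementation such as scikit-learn's SVC (which internally uses a sequential, working-set based solver) can be applied directly. The data here is synthetic:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),
               rng.normal(2, 1, size=(50, 2))])   # two synthetic clusters
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf", C=1.0)    # Gaussian kernel, soft-margin SVM
clf.fit(X, y)
print(len(clf.support_))          # number of support vectors found
print(clf.score(X, y))            # training accuracy
```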

Regularization neural networks General optimization vs. optimization over the grid. The regularization operator T specifies the grid: - we look for functions that satisfy ||Tg||² = 0 - in the relaxed case the regularization operator is incorporated into the error function: E(g) = Σ_i (y_i − g(x_i))² + λ · ||Tg||²
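A sketch of the relaxed formulation for a simple case, where the regularization operator T is taken, as an assumption (the slide does not specify it), to be a discrete second derivative, so ||Tg||² penalises curvature of the fitted values:

```python
import numpy as np

def regularized_error(g_vals, y, lam=0.1):
    """Sum-of-squares data term plus lam * ||T g||^2, T = discrete 2nd derivative."""
    data_term = np.sum((y - g_vals) ** 2)
    Tg = g_vals[:-2] - 2.0 * g_vals[1:-1] + g_vals[2:]   # second differences
    return data_term + lam * np.sum(Tg ** 2)

y = np.array([0.0, 0.9, 2.1, 2.9, 4.2])
print(regularized_error(np.array([0.0, 1.0, 2.0, 3.0, 4.0]), y))
```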