Machine Learning Neural Networks.


1 Machine Learning Neural Networks

2 Learning Theory
Theorems that characterize classes of learning problems or specific algorithms in terms of computational complexity or sample complexity, i.e. the number of training examples necessary or sufficient to learn hypotheses of a given accuracy. The complexity of a learning problem depends on:
- The size or expressiveness of the hypothesis space.
- The accuracy to which the target concept must be approximated.
- The probability with which the learner must produce a successful hypothesis.
- The manner in which training examples are presented, e.g. randomly or by query to an oracle.

3 Types of Results
- Learning in the limit: Is the learner guaranteed to converge to the correct hypothesis as the number of training examples increases indefinitely?
- Sample complexity: How many training examples are needed for a learner to construct (with high probability) a highly accurate concept?
- Computational complexity: What computational resources (time and space) does a learner need to construct (with high probability) a highly accurate concept? High sample complexity implies high computational complexity, since the learner at least needs to read the input data.
- Mistake bound: Learning incrementally, how many training examples will the learner misclassify before constructing a highly accurate concept?
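As a concrete illustration of sample complexity (an addition, not from the slide): for a finite hypothesis space H, the standard PAC bound for a consistent learner is

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```

i.e., m training examples suffice to output, with probability at least 1 - δ, a hypothesis whose error is at most ε.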

4 Cannot Learn Exact Concepts from Limited Data, Only Approximations
[Figure: a learner induces a classifier from positive and negative training examples; the classifier labels some regions of the input space correctly ("Right!") and others incorrectly ("Wrong!").]

5 The machine learning framework
Apply a prediction function to a feature representation of the image to get the desired output:
f([image]) = "apple"
f([image]) = "tomato"
f([image]) = "cow"

6 The machine learning framework
y = f(x), where f is the prediction function, x is the image feature, and y is the output.
Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
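A minimal sketch of this training/testing loop in Python; the feature vectors and labels here are made up, and scikit-learn's LogisticRegression stands in for a generic prediction function f:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set {(x1, y1), ..., (xN, yN)}: 2-D feature vectors with labels.
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array(["apple", "apple", "tomato", "tomato"])

# Training: estimate f by minimizing prediction error on the training set.
f = LogisticRegression().fit(X_train, y_train)

# Testing: apply f to a never-before-seen example x and output y = f(x).
x_test = np.array([[7.5, 9.5]])
print(f.predict(x_test))  # -> ['tomato']
```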

7 Classification Steps
[Figure: the two-stage pipeline. Training: training images -> image features, combined with training labels -> learned model. Testing: test image -> image features -> learned model -> prediction.]

8 Classifiers: Nearest neighbor
[Figure: training examples from class 1 and class 2, with a test example between them.]
f(x) = label of the training example nearest to x. All we need is a distance function for our inputs; no training is required!
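A sketch of this classifier, assuming Euclidean distance as the distance function:

```python
import numpy as np

def nearest_neighbor(x, X_train, y_train):
    # Return the label of the training example nearest to x.
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every example
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([1, 1, 2])  # class labels
print(nearest_neighbor(np.array([4.0, 4.5]), X_train, y_train))  # -> 2
```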

9 Classifiers: Linear
Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
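In code (a sketch; in practice w and b would come from training, here they are fixed for illustration):

```python
import numpy as np

def linear_classify(x, w, b):
    # f(x) = sgn(w . x + b); here sgn(0) is taken as +1.
    return 1 if np.dot(w, x) + b >= 0 else -1

print(linear_classify(np.array([2.0, 1.0]), w=np.array([1.0, -1.0]), b=0.0))  # -> 1
```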

10 Many classifiers to choose from
SVM, neural networks, naïve Bayes, Bayesian networks, logistic regression, randomized forests, boosted decision trees, k-nearest neighbor, RBMs, etc. Which is the best one?

11 Recognition task and supervision
Images in the training set must be annotated with the "correct answer" that the model is expected to produce, e.g. "contains a motorbike".

12 Generalization
[Figure: training set (labels known) and test set (labels unknown).]
How well does a learned model generalize from the data it was trained on to a new test set?

13 Classification
Assign an input vector to one of two or more classes. Any decision rule divides the input space into decision regions separated by decision boundaries.
Slide credit: L. Lazebnik

14 Classifiers: Linear SVM
[Figure: two classes, marked x and o, in the (x1, x2) plane.]
Find a linear function to separate the classes: f(x) = sgn(w · x + b)

15 Neural Networks
An artificial neural network (ANN) is modeled on a biological nervous system such as the brain. It is composed of interconnected computing units called neurons. Like humans, ANNs learn by example.

16 Why Artificial Neural Networks?
There are two basic reasons why we are interested in building artificial neural networks (ANNs):
- Technical viewpoint: some problems, such as character recognition or the prediction of future states of a system, require massively parallel and adaptive processing.
- Biological viewpoint: ANNs can be used to replicate and simulate components of the human (or animal) brain, thereby giving us insight into natural information processing.

17 Science: Model how biological neural systems, like the human brain, work
How do we see? How is information stored in and retrieved from memory? How do you learn not to touch fire? How do your eyes adapt to the amount of light in the environment?
Related fields: neuroscience, computational neuroscience, psychology, psychophysiology, cognitive science, medicine, math, physics.

18 Brief History
Old Ages:
- Association (William James; 1890)
- McCulloch-Pitts neuron (1943, 1947)
- Perceptrons (Rosenblatt; 1958, 1962)
- Adaline/LMS (Widrow and Hoff; 1960)
- Perceptrons book (Minsky and Papert; 1969)
Dark Ages:
- Self-organization in visual cortex (von der Malsburg; 1973)
- Backpropagation (Werbos; 1974)
- Foundations of Adaptive Resonance Theory (Grossberg; 1976)
- Neural Theory of Association (Amari; 1977)

19 History
Modern Ages:
- Adaptive Resonance Theory (Grossberg; 1980)
- Hopfield model (Hopfield; 1982, 1984)
- Self-organizing maps (Kohonen; 1982)
- Reinforcement learning (Sutton and Barto; 1983)
- Simulated annealing (Kirkpatrick et al.; 1983)
- Boltzmann machines (Ackley, Hinton, Sejnowski; 1985)
- Backpropagation (Rumelhart, Hinton, Williams; 1986)
- ART networks (Carpenter, Grossberg; 1992)
- Support vector machines (Cortes and Vapnik; 1995)

20 Hebb's Learning Law
In 1949, Donald Hebb formulated William James' principle of association into mathematical form. If the activations of two neurons y1 and y2 are both on (+1), then the weight between them grows; otherwise (off: 0) the weight remains the same. However, when the bipolar activation scheme {-1, +1} is used, the weight can also decrease when the activations of the two neurons do not match.
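A sketch of the rule for a single weight; the learning rate eta is an assumption, as the slide does not give one:

```python
def hebb_update(w, y1, y2, eta=1.0):
    # Hebbian update: the weight grows when both activations are on.
    # Binary {0, 1}: the weight grows or stays the same.
    # Bipolar {-1, +1}: the weight decreases when the activations differ.
    return w + eta * y1 * y2

print(hebb_update(0.5, +1, +1))  # both on: grows to 1.5
print(hebb_update(0.5, 0, +1))   # binary, one off: stays 0.5
print(hebb_update(0.5, -1, +1))  # bipolar mismatch: decreases to -0.5
```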

21 Biological Neurons
The human brain contains on the order of tens of billions of neurons, and each neuron is connected to thousands of other neurons. A neuron is made of:
- The soma: the body of the neuron
- Dendrites: filaments that provide input to the neuron
- The axon: sends the output signal
- Synapses: connections with other neurons, which release certain quantities of chemicals called neurotransmitters to other neurons

22 Modeling of Brain Functions

23 The biological neuron
The pulses generated by a neuron travel along the axon as an electrical wave. Once these pulses reach the synapses at the end of the axon, they open chemical vesicles, exciting the next neuron.

24 How do NNs and ANNs work?
Information is transmitted as a series of electric impulses, so-called spikes. The frequency and phase of these spikes encode the information. In biological systems, one neuron can be connected to as many as 10,000 other neurons. Usually, a neuron receives its information from other neurons in a confined area.

25 Computers vs. Neural Networks
"Standard" computers          Neural networks
one CPU                       highly parallel processing
fast processing units         slow processing units
reliable units                unreliable units
static infrastructure         dynamic infrastructure

26 Neural Network

27 Neural Network Application
Pattern recognition can be implemented using a NN. Given a figure that may be the character T or H, the network should identify which class, T or H, it belongs to.

28

29

30

31 Simple Neuron
[Figure: inputs X1, X2, ..., Xn and a bias b feeding a single neuron that produces the output.]

32 An Artificial Neuron
[Figure: neuron i receives inputs x1, ..., xn through synapses with weights Wi,1, ..., Wi,n; the weighted inputs form the net input, and the activation applied to it produces the output signal.]
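The figure corresponds to the following computation; the step activation is one common choice, and the slides introduce others later:

```python
import numpy as np

def neuron_output(x, w, theta=0.0):
    net = np.dot(w, x)               # net input: sum of W_i,j * x_j
    return 1 if net >= theta else 0  # output signal: activation of net input

print(neuron_output(np.array([1.0, 0.5]), np.array([0.8, -0.2])))  # -> 1
```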

33 Neural Network
[Figure: a layered network with an input layer, hidden layers 1 and 2, and an output layer.]

34 Network Layers
The common type of ANN consists of three layers of neurons: a layer of input neurons connected to a layer of hidden neurons, which is connected to a layer of output neurons.
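A sketch of a forward pass through such a three-layer network; the layer sizes and random weights are made up, and the sigmoid activation is an assumption (it is introduced later in these slides):

```python
import numpy as np

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))  # 3 input neurons -> 4 hidden neurons
W_output = rng.normal(size=(2, 4))  # 4 hidden neurons -> 2 output neurons

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W_hidden @ x)    # input layer -> hidden layer
    return sigmoid(W_output @ h) # hidden layer -> output layer

print(forward(np.array([1.0, 0.0, -1.0])))  # two output activations in (0, 1)
```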

35 Architecture of ANN
- Feed-forward networks: signals travel one way, from input to output.
- Feedback networks: signals travel in loops; the output is connected back to the input of the network.

36

37

38

39 How do NNs and ANNs Learn?
NNs are able to learn by adapting their connectivity patterns so that the organism improves its behavior in terms of reaching certain (evolutionary) goals. The NN achieves learning by appropriately adapting the states of its synapses.

40 Neural Network Learning
A learning approach based on modeling adaptation in biological neural systems:
- Perceptron: the initial algorithm for learning simple (single-layer) neural networks, developed in the 1950s.
- Backpropagation: a more complex algorithm for learning multi-layer neural networks, developed in the 1980s.

41 Learning Rule
The learning rule modifies the weights of the connections. The learning process is divided into supervised and unsupervised learning.

42 Supervised Network
In supervised learning there is an external teacher. The target is to minimize the error between the desired and computed outputs.

43

44

45

46 Unsupervised Network Uses no external teacher and is based upon only local information.

47

48

49 Perceptron
The perceptron is a type of artificial neural network (ANN).

50 Perceptron
It is a network of one neuron with a hard-limit transfer function.
[Figure: inputs X1, X2, ..., Xn with weights W1, W2, ..., Wn feed a summation Σ followed by the transfer function f, producing the output.]

51

52 Perceptron
The perceptron is first given a random weight vector. It is then given chosen data pairs (input and desired output). The perceptron learning rule changes the weights according to the error in the output.

53 Perceptron - Operation
It takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and -1 otherwise.

54 Perceptron Learning Rule
W_new = W_old + (t - a) X
where W_new is the new weight, W_old is the old value of the weight, X is the input value, t is the desired output, and a is the actual output.
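One update step of this rule in Python; hardlim stands for the hard-limit transfer function from slide 50, and the bias b is trained with the same rule by treating its input as a constant 1:

```python
import numpy as np

def hardlim(net):
    return 1 if net >= 0 else 0

def perceptron_update(W, b, x, t):
    a = hardlim(np.dot(W, x) + b)        # actual output
    return W + (t - a) * x, b + (t - a)  # W_new = W_old + (t - a) X

# The example on the next slide: W = [2 2], b = -3, X1 = [0 0], t = 0.
W, b = perceptron_update(np.array([2.0, 2.0]), -3.0, np.array([0.0, 0.0]), 0)
print(W, b)  # a = hardlim(-3) = 0 = t, so nothing changes: [2. 2.] -3.0
```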

55 Example
Let W = [2 2] and b = -3. For X1 = [0 0] with t = 0: net = W · X1 + b = -3, so a = hardlim(-3) = 0 = t, and the weights stay unchanged.

56 AND Network
In this example we construct a network for the AND operation. The network draws a line to separate the two classes; this is called classification.

57 Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space consisting of real-valued m-dimensional vectors. The plane splits the input space into two regions, each of them describing one class.
[Figure: in the (x1, x2) plane, the line w1x1 + w2x2 + w0 = 0 is the decision boundary; the decision region for class C1 is w1x1 + w2x2 + w0 >= 0, with class C2 on the other side.]

58 Perceptron - Decision Surface
[Figure: in 2-dimensional space, the weights w0, w1, w2 define a decision surface (a line) in the (x1, x2) plane, with output o = +1 on one side and o = -1 on the other.]

59 Perceptron - Representation Power
[Figure: sixteen numbered points in the (x1, x2) plane; the task is to separate the elliptical blobs (objects) from the rest.]

60 Problems
Four one-dimensional data points belonging to two classes are:
X = [ ]  T = [ ]  W = [ ]

61 Boolean Functions
Boolean functions take in two inputs (-1 or +1) and produce one output (-1 or +1); in other contexts, 0 and 1 are used.
- Example: the AND function produces +1 only if both inputs are +1.
- Example: the OR function produces +1 if either input is +1.
These functions are related to the logical connectives from first-order logic (F.O.L.).

62 The First Neural Networks
AND function. [Figure: inputs X1 and X2 each connect to Y with weight 1; Threshold(Y) = 2.]
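Reading the figure as code: both connections have weight 1, so the net input to Y is X1 + X2, and Y fires only when that sum reaches the threshold 2 (a sketch using 0/1 activations):

```python
def mcculloch_pitts_and(x1, x2, threshold=2):
    return 1 if x1 + x2 >= threshold else 0  # both weights are 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts_and(x1, x2))  # fires only for 1, 1
```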

63 Simple Networks
[Figure: a simple network with input x, weights W = 1.5 and W = 1, a constant -1 input, output y, and threshold t = 0.0.]

64 Exercises
Design a neural network to recognize the following problem:
X1 = [2 2], t1 = 0; X2 = [1 -2], t2 = 1; X3 = [-2 2], t3 = 0; X4 = [-1 1], t4 = 1.
Start with initial weights w = [0 0] and bias b = 0. (A sketch of one approach follows below.)
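A sketch of one way to work the exercise, cycling the slide-54 rule over the four points until a full pass makes no change; the loop structure is an assumption, as the slide only gives the data and the starting weights:

```python
import numpy as np

X = np.array([[2, 2], [1, -2], [-2, 2], [-1, 1]], dtype=float)
T = np.array([0, 1, 0, 1])
w, b = np.zeros(2), 0.0

for epoch in range(100):                         # cap on passes over the data
    changed = False
    for x, t in zip(X, T):
        a = 1 if np.dot(w, x) + b >= 0 else 0    # hard-limit output
        if a != t:
            w, b = w + (t - a) * x, b + (t - a)  # slide-54 learning rule
            changed = True
    if not changed:                              # every point classified correctly
        break

print(w, b)  # converges in a few epochs, e.g. w = [-2. -3.], b = 1.0
```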

65 Perceptron: Limitations
The perceptron can only model linearly separable classes, like those described by the following Boolean functions: AND, OR, COMPLEMENT. It cannot model XOR. You can experiment with these functions in the Matlab practical lessons.
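The XOR limitation can be checked empirically: the same training loop that converges for AND never converges for XOR, because no line separates XOR's classes (a sketch with 0/1 inputs):

```python
import numpy as np

def train(X, T, epochs=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        changed = False
        for x, t in zip(X, T):
            a = 1 if np.dot(w, x) + b >= 0 else 0
            if a != t:
                w, b, changed = w + (t - a) * x, b + (t - a), True
        if not changed:
            return w, b  # converged: classes separated
    return None          # never converged

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(train(X, np.array([0, 0, 0, 1])))  # AND: returns a separating (w, b)
print(train(X, np.array([0, 1, 1, 0])))  # XOR: returns None
```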

66 Types of decision regions
[Figure: left, a network with a single node (inputs 1, x1, x2; weights w0, w1, w2) realizes a half-plane decision region; right, a one-hidden-layer network combining lines L1, L2, L3, L4 (shown with a -3.5 bias) realizes a convex region.]

67 Gaussian Neurons
Another type of neuron overcomes this problem by using a Gaussian activation function.
[Figure: the bell-shaped Gaussian activation fi(neti(t)) plotted against neti(t) between -1 and 1.]

68 Gaussian Neurons Gaussian neurons are able to realize non-linear functions. Therefore, networks of Gaussian units are in principle unrestricted with regard to the functions that they can realize. The drawback of Gaussian neurons is that we have to make sure that their net input does not exceed 1. This adds some difficulty to the learning in Gaussian networks.

69 Sigmoidal Neurons
Sigmoidal neurons accept any vector of real numbers as input, and they output a real number between 0 and 1. Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks. A network of sigmoidal units with m input neurons and n output neurons realizes a network function f: R^m -> (0,1)^n.

70 Sigmoidal Neurons
[Figure: sigmoid curves of fi(neti(t)) against neti(t), shown for τ = 1.]
A common form is fi(neti(t)) = 1 / (1 + e^(-(neti(t) - θ)/τ)). The parameter τ controls the slope of the sigmoid function, while the parameter θ controls the horizontal offset of the function in a way similar to the threshold neurons.

71 Sigmoidal Neurons
This leads to a simplified form of the sigmoid function: S(net) = 1 / (1 + e^(-net)). We do not need a modifiable threshold θ, because we will use "dummy" inputs as we did for the perceptron. The choice τ = 1 works well in most situations and results in a very simple derivative of S(net).
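The simplified sigmoid and its very simple derivative, S'(net) = S(net)(1 - S(net)), in code:

```python
import numpy as np

def S(net):
    # Simplified sigmoid: tau = 1, no threshold theta.
    return 1.0 / (1.0 + np.exp(-net))

def S_prime(net):
    # S'(net) = S(net) * (1 - S(net))
    s = S(net)
    return s * (1.0 - s)

print(S(0.0), S_prime(0.0))  # -> 0.5 0.25
```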

72 Sigmoidal Neurons
S'(net) = S(net) · (1 - S(net)). This result will be very useful when we develop the backpropagation algorithm.

