
0 Regression, Artificial Neural Networks 16/03/2016

1 Regression

2 Regression Supervised learning: based on training examples, learn a model which works well on previously unseen examples. Regression: predicting real values.

3 Regression Training dataset: {xi, ri}, ri ∈ ℝ. Evaluation metric: least squared error.

4 Linear regression

5 Linear regression g(x) = w1x + w0. The squared error is minimal where its gradient with respect to (w0, w1) is 0, which gives the closed-form least-squares solution.
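
For illustration, a minimal numpy sketch of this closed-form least-squares fit (the toy data and variable names below are made up):

```python
import numpy as np

def fit_linear(x, r):
    """Fit g(x) = w1*x + w0 by minimizing the sum of squared errors.

    Setting the gradient of sum_i (r_i - w1*x_i - w0)^2 to zero yields
    the standard closed-form least-squares solution.
    """
    x, r = np.asarray(x, float), np.asarray(r, float)
    w1 = ((x - x.mean()) * (r - r.mean())).sum() / ((x - x.mean()) ** 2).sum()  # slope
    w0 = r.mean() - w1 * x.mean()                                               # intercept
    return w1, w0

# toy usage
x = np.array([0.0, 1.0, 2.0, 3.0])
r = np.array([1.0, 2.9, 5.1, 7.0])
w1, w0 = fit_linear(x, r)
print(w1, w0)  # roughly 2.02 and 0.97
```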

6 Regression variants: +MLE → Bayes; k nearest neighbours' mean or distance-weighted average; decision tree, or various linear models on the leaves.

7 Regression SVM

8 Artificial Neural Networks

9 Artificial neural networks
Motivation: simulation of the information-processing mechanisms of the nervous system (human brain). Structure: a huge number of densely connected, mutually interacting processing units (neurons). It learns from experience (training instances).

10 Some neurobiology… Neurons have many inputs and a single output
The output is either excited or not. The inputs from other neurons determine whether the neuron fires. Each input synapse has a weight. Inputs: dendrites. Processing: soma. Outputs: axons. Synapses: electrochemical contacts between neurons. Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result.

11 A neuron in maths Weighted average of inputs. If the average is above a threshold T, it fires (outputs 1); otherwise its output is 0 or -1. The basic unit of neural networks, the artificial neuron, simulates the four basic functions of natural neurons. Artificial neurons are much simpler than biological neurons; the figure below shows the basics of an artificial neuron.
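
A minimal sketch of such a threshold unit in Python (the weights, threshold, and ±1 output convention below are illustrative assumptions):

```python
import numpy as np

def neuron(x, w, T):
    """Threshold unit: fire (+1) if the weighted sum of inputs exceeds T, else -1."""
    return 1 if np.dot(w, x) > T else -1

# toy usage: two inputs with equal weights and threshold 0.5
print(neuron(np.array([1.0, 0.0]), np.array([0.6, 0.6]), T=0.5))  # +1, fires
print(neuron(np.array([0.0, 0.0]), np.array([0.6, 0.6]), T=0.5))  # -1, does not fire
```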

12 Statistics about the human brain
#neurons: ~ 10^11 Avg. #connections per neuron: 10^4 Signal sending time: 10^-3 sec Face recognition: 10^-1 sec

13

14 Motivation (machine learning point of view)
Goal: non-linear classification. Linear machines are not satisfactory in many real-world situations. Which non-linear function family should we choose? Neural networks: latent non-linear patterns are learnt automatically.

15 Perceptron

16 Multilayer perceptron = Neural Network
Different representations at the various layers. Biologically, neural networks are constructed in a three-dimensional way from microscopic components, and these neurons seem capable of nearly unrestricted interconnections; this is not true of any man-made network. Artificial neural networks are simple clusterings of primitive artificial neurons. This clustering occurs by creating layers, which are then connected to one another; how these layers connect may also vary. Basically, all artificial neural networks have a similar topology: some of the neurons interface with the real world to receive its inputs, other neurons provide the real world with the network's outputs, and all the rest of the neurons are hidden from view. As the figure above shows, the neurons are grouped into layers. The input layer consists of neurons that receive input from the external environment. The output layer consists of neurons that communicate the output of the system to the user or external environment. There are usually a number of hidden layers between these two layers; the figure above shows a simple structure with only one hidden layer. When the input layer receives the input, its neurons produce output, which becomes input to the other layers of the system. The process continues until a certain condition is satisfied or until the output layer is invoked and fires its output to the external environment.
Inter-layer connections (the types of connections used between layers):
Fully connected: each neuron of the first layer is connected to every neuron of the second layer.
Partially connected: a neuron of the first layer does not have to be connected to all neurons of the second layer.
Feed forward: the neurons of the first layer send their output to the neurons of the second layer, but they do not receive any input back from the neurons of the second layer.
Bi-directional: there is another set of connections carrying the output of the neurons of the second layer into the neurons of the first layer. Feed-forward and bi-directional connections can be fully or partially connected.
Hierarchical: the neurons of a lower layer may only communicate with neurons of the next layer.
Resonance: the layers have bi-directional connections, and they can continue sending messages across the connections a number of times until a certain condition is achieved.

17 Multilayer perceptron

18 Feedforward neural networks
Connections only to the next layer. The weights of the connections (between two layers) can be changed. Activation functions are used to calculate whether the neuron fires. Three-layer network: input layer, hidden layer, output layer.
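
A minimal numpy sketch of one forward pass through such a three-layer network (the layer sizes, the tanh hidden activation, and the linear output are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative sizes: 3 inputs, 4 hidden neurons, 2 outputs
W_hidden = rng.normal(size=(4, 3))   # weights input -> hidden
b_hidden = np.zeros(4)               # hidden biases
W_out    = rng.normal(size=(2, 4))   # weights hidden -> output
b_out    = np.zeros(2)               # output biases

def forward(x):
    """Propagate an input vector through the network, layer by layer."""
    y = np.tanh(W_hidden @ x + b_hidden)   # hidden activations
    z = W_out @ y + b_out                  # linear output layer (regression-style)
    return z

print(forward(np.array([0.5, -1.0, 2.0])))
```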

19

20

21 Network function The network function of neuron j: netj = Σi wji xi + wj0,
where i is the index of the input neurons, wji is the weight between neurons i and j, and wj0 is the bias.

22 Activation function The activation function is a non-linear function of the net value: yj = f(netj) (if it were linear, the whole network would be linear). The sign activation function: a step from -1 to 1 at the threshold Tj (figure: output oi vs. netj).

23 Differentiable activation functions
Enables gradient descent-based learning. The sigmoid function: a smooth, differentiable version of the step function (figure: output vs. netj, with threshold Tj).
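
A minimal sketch assuming the standard logistic sigmoid; its simple derivative is what the gradient-descent updates on the following slides rely on:

```python
import numpy as np

def sigmoid(net):
    """Logistic activation: smooth, differentiable counterpart of the sign function."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    """f'(net) = f(net) * (1 - f(net)); used by the backpropagation updates later."""
    s = sigmoid(net)
    return s * (1.0 - s)

net = np.linspace(-4, 4, 5)
print(sigmoid(net))
print(sigmoid_derivative(net))
```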

24 Output layer netk = Σj=1..nH wkj yj + wk0, where k is the index on the output layer and nH is the number of hidden neurons. Binary classification: sign function. Multi-class classification: a neuron for each of the classes, the argmax is predicted (discriminant function). Regression: linear transformation.
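
A minimal sketch of the three output conventions on made-up net values:

```python
import numpy as np

# binary classification: a single output neuron, predict the sign of its net value
net_single = 0.3
print(1 if net_single >= 0 else -1)       # +1

# multi-class classification: one output neuron per class, predict the argmax
net_multi = np.array([0.3, -1.2, 0.9])
print(int(np.argmax(net_multi)))          # class index 2

# regression: the linear output itself is the prediction (no thresholding)
print(net_single)                         # 0.3
```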

25

26 The y1 hidden unit calculates a weighted sum of x1 and x2 representing x1 OR x2: if x1 OR x2 holds, the sum is ≥ 0 and y1 = +1, otherwise y1 = -1. The y2 hidden unit represents x1 AND x2: if x1 AND x2 holds, the sum is ≥ 0 and y2 = +1, otherwise y2 = -1. The output neuron: z1 = 0.7y1 - 0.4y2 - 1; sgn(z1) is 1 iff y1 = 1 and y2 = -1, i.e. (x1 OR x2) AND NOT(x1 AND x2).

27 General (three-layer) feedforward network (c output units)
The hidden units with their activation functions can express non-linear functions. The activation functions can differ across neurons (but in practice the same one is used).

28 Universal approximation theorem
The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function. But the theorem does not give any hint on how to design the activation functions for particular problems/datasets.

29

30

31 Training of neural networks (backpropagation)

32 Training of neural networks
The network topology is given. The same activation function is used at each hidden neuron, and it is given. Training = calibration of the weights; on-line learning (epochs). The brain basically learns from experience. Neural networks are sometimes called machine learning algorithms, because changing their connection weights (training) causes the network to learn the solution to a problem. The strength of the connection between neurons is stored as a weight value for the specific connection. The system learns new knowledge by adjusting these connection weights. The learning ability of a neural network is determined by its architecture and by the algorithmic method chosen for training.

33 Training of neural networks
1. Forward propagation: an input vector propagates through the network. 2. Weight update (backpropagation): the weights of the network are changed in order to decrease the difference between the predicted and gold-standard values.

34 Training of neural networks
We can calculate (propagate back) the error signal for each hidden neuron.

35

36 tk is the target (gold standard) value of output neuron k, zk is the prediction at output neuron k (k = 1, …, c), and w are the weights. Error: J(w) = ½ Σk (tk - zk)^2. Backpropagation is a gradient descent algorithm: the initial weights are random, then each weight is changed by Δw = -η ∂J/∂w.

37 Backpropagation The error of the weights between the hidden and output layers: ∂J/∂wkj = (∂J/∂netk)(∂netk/∂wkj); the error signal for output neuron k: δk = -∂J/∂netk = (tk - zk) f'(netk).

38 Because netk = wk^T y: ∂netk/∂wkj = yj, and so the change of the weights between the hidden and output layers is: Δwkj = η δk yj = η (tk - zk) f'(netk) yj.

39 The gradient of the hidden units: ∂J/∂yj = -Σk (tk - zk) f'(netk) wkj.

40 The error signal of the hidden units: δj = f'(netj) Σk wkj δk. The weight change between the input and hidden layers: Δwji = η δj xi.
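
A minimal numpy sketch of these two update rules for a single training instance, assuming a sigmoid activation, a made-up learning rate η = 0.1, and random weights:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

eta = 0.1                                    # assumed learning rate
rng = np.random.default_rng(0)
W_hid = rng.normal(scale=0.5, size=(3, 2))   # input -> hidden weights (3 hidden, 2 inputs)
W_out = rng.normal(scale=0.5, size=(2, 3))   # hidden -> output weights (2 outputs)

x = np.array([1.0, -1.0])                    # one training instance
t = np.array([1.0, 0.0])                     # its target vector

# forward pass
y = sigmoid(W_hid @ x)                       # hidden activations y_j
z = sigmoid(W_out @ y)                       # output predictions z_k

# error signals
delta_k = (t - z) * z * (1 - z)              # delta_k = (t_k - z_k) f'(net_k)
delta_j = (W_out.T @ delta_k) * y * (1 - y)  # delta_j = f'(net_j) * sum_k w_kj delta_k

# weight changes
W_out += eta * np.outer(delta_k, y)          # delta_w_kj = eta * delta_k * y_j
W_hid += eta * np.outer(delta_j, x)          # delta_w_ji = eta * delta_j * x_i
```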

41 Backpropagation Step 1: calculate the error signal for the output neurons and update the weights between the output and hidden layers (figure: output – hidden – input layers; updating the weights into output neuron k).

42 Backpropagation Step 2: calculate the error signal for the hidden neurons (figure: output – hidden – input layers).

43 Backpropagation Step 3: update the weights between the input and hidden neurons (figure: output – hidden – input layers; updating the weights into hidden neuron j).

44 Training of neural networks
Begin
  init: nH; w initialised randomly; stopping threshold θ; learning rate η; m ← 0
  do
    m ← m + 1
    xm ← a sampled training instance
    wji ← wji + η δj xi;  wkj ← wkj + η δk yj
  until ||∇J(w)|| < θ
  return w
End
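
A minimal runnable sketch of this training loop (the XOR-style toy task, the bias handling, the learning rate η, and the threshold θ are illustrative assumptions; the stopping test is only a crude single-instance stand-in for ||∇J(w)|| < θ, which the next slide flags as a problem):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(1)
# toy dataset: XOR targets; a constant 1 is appended to each input as a bias term
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
T = np.array([0., 1., 1., 0.])

n_H, eta, theta = 4, 0.5, 1e-6                  # hidden size, learning rate, threshold
W_hid = rng.normal(scale=0.5, size=(n_H, 3))    # w initialised randomly (input -> hidden)
W_out = rng.normal(scale=0.5, size=(n_H + 1,))  # hidden (+ bias unit) -> single output

for m in range(1, 200001):
    i = rng.integers(len(X))                     # x_m <- a sampled training instance
    x, t = X[i], T[i]
    y = np.append(sigmoid(W_hid @ x), 1.0)       # hidden activations plus bias unit
    z = sigmoid(W_out @ y)                       # output prediction
    delta_k = (t - z) * z * (1 - z)              # output error signal
    delta_j = (W_out[:-1] * delta_k) * y[:-1] * (1 - y[:-1])  # hidden error signals
    W_out += eta * delta_k * y                   # w_kj <- w_kj + eta * delta_k * y_j
    W_hid += eta * np.outer(delta_j, x)          # w_ji <- w_ji + eta * delta_j * x_i
    if abs(delta_k) + np.abs(delta_j).sum() < theta:  # crude single-instance stopping test
        break

# predictions for the four inputs; should approach [0, 1, 1, 0] if training succeeded
print([round(float(sigmoid(W_out @ np.append(sigmoid(W_hid @ x), 1.0))), 2) for x in X])
```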

45 Stopping criteria Stop if the change in J(w) is smaller than a threshold θ. Problem: estimating the change from a single training instance; use bigger batches for change estimation.

46 Stopping based on the performance on a validation dataset
Use held-out instances (not used for training) to estimate the performance of the supervised learner (to avoid overfitting). Stop at the minimum error on the validation set.
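
A minimal sketch of this early-stopping rule; train_one_epoch and validation_error are hypothetical helper functions standing in for one backpropagation pass and an error measure on the held-out data:

```python
import copy

def train_with_early_stopping(model, train_data, valid_data, max_epochs=100, patience=5):
    """Keep the weights that gave the lowest error on the validation set."""
    best_error, best_model, epochs_without_improvement = float('inf'), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)           # hypothetical: one pass of backprop
        error = validation_error(model, valid_data)  # hypothetical: error on unseen data
        if error < best_error:
            best_error, best_model = error, copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:   # stop near the validation minimum
            break
    return best_model
```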

47

48 Notes on backpropagation
It can get stuck at local minima. In practice, the local minima found are usually close to the global one. Multiple training runs starting from various randomly initialized weights might help: we can take the trained network with the minimal error (on a validation set), or use voting schemes over the networks.

49 Questions of network design
How many hidden neurons? Too few neurons cannot learn complex patterns; too many neurons can easily overfit (use a validation set). Learning rate?!

50 Outlook

51 History of neural networks
Perceptron: one of the first machine learners, ~1950s. Backpropagation: multilayer perceptrons, 1975–. Deep learning: popular again, 2006–.

52 Deep learning (auto-encoder pretraining)

53 Recurrent neural networks
Short-term memory.

