Presentation on theme: "Computer Science and Engineering"— Presentation transcript:

1 Computer Science and Engineering
Neural Networks. Pabitra Mitra, Computer Science and Engineering, IIT Kharagpur

2

3 The Neuron The neuron is the basic information processing unit of a NN. It consists of: a set of synapses or connecting links, each characterized by a weight w1, w2, …, wm; an adder function (linear combiner) that computes the weighted sum of the inputs; and an activation function (squashing function) for limiting the amplitude of the neuron's output.
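Putting the three components together, in the notation used on later slides (weights w1, …, wm, bias b, activation function g); writing it out this way is my summary, not a formula taken from the slides:

$$u = \sum_{j=1}^{m} w_j x_j, \qquad v = u + b, \qquad y = g(v).$$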

4 Computation at Units Compute a 0-1 or a graded function of the weighted sum of the inputs; the function applied to the weighted sum is called the activation function.

5 The Neuron
[Block diagram: input signals x1, x2, …, xm with synaptic weights w1, w2, …, wm feed a summing function; together with the bias b this gives the induced local field v, which passes through the activation function to produce the output y.]

6 Common Activation Functions
Step function: g(x) = 1 if x >= t (t is a threshold), g(x) = 0 if x < t. Sign function: g(x) = +1 if x >= t, g(x) = -1 if x < t. Sigmoid function: g(x) = 1/(1+exp(-x)).
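A minimal Python sketch of these three activations (the default threshold t = 0 is an assumption; the slide leaves t unspecified):

```python
import math

def step(x, t=0.0):
    # Step function: 1 if x >= t, else 0 (t is the threshold).
    return 1.0 if x >= t else 0.0

def sign(x, t=0.0):
    # Sign function: +1 if x >= t, else -1.
    return 1.0 if x >= t else -1.0

def sigmoid(x):
    # Sigmoid (logistic) function: 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + math.exp(-x))
```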

7 Bias of a Neuron Bias b has the effect of applying an affine transformation to u: v = u + b, where v is the induced field of the neuron.

8 Bias as extra input
Bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b. [Block diagram: inputs x0 = +1, x1, x2, …, xm with synaptic weights w0, w1, w2, …, wm feed the summing function, producing the local field v and, through the activation function, the output y.]
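Absorbing the bias as an extra input gives, in the same notation (a standard rewriting, consistent with the diagram's labels):

$$v = \sum_{j=0}^{m} w_j x_j, \qquad x_0 = +1, \; w_0 = b.$$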

9 Face Recognition 90% accurate at learning head pose and recognizing 1-of-20 faces.

10 Handwritten digit recognition

11 Computing with spaces
[Diagram: a unit with perceptual-feature inputs x1, x2 and output y (+1 = cat, -1 = dog), trained from an error signal, alongside the corresponding (x1, x2) feature space with cat and dog examples.]

12 Can Implement Boolean Functions
A unit can implement And, Or, and Not. Need to map True and False to numbers, e.g. True = 1.0, False = 0.0. (Exercise) Use a step function and show how to implement various simple Boolean functions. Combining the units, we can get any Boolean function of n variables. Logical circuits are obtained as a special case.
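A minimal sketch of And, Or, and Not as single step-function units, with True = 1.0 and False = 0.0; the particular weights and thresholds (expressed here as a bias) are my choices, not taken from the slides:

```python
def step_unit(weights, bias, inputs):
    # Linear threshold unit: output 1.0 if the weighted sum plus bias is >= 0.
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if s >= 0 else 0.0

def AND(x1, x2):
    return step_unit([1.0, 1.0], -1.5, [x1, x2])   # fires only when both inputs are 1

def OR(x1, x2):
    return step_unit([1.0, 1.0], -0.5, [x1, x2])   # fires when at least one input is 1

def NOT(x1):
    return step_unit([-1.0], 0.5, [x1])            # fires only when the input is 0
```

Composing such units, e.g. XOR(x1, x2) = AND(OR(x1, x2), NOT(AND(x1, x2))), yields any Boolean function of n variables, matching the slide's claim.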

13 Network Structures Feedforward (no cycles): less power, more easily understood. Units are organized into input units, hidden layers, and output units. Perceptron: no hidden layer, so it basically corresponds to one unit and to a linear threshold function (LTF). An LTF is defined by weights w1, …, wn and a threshold t; its value is 1 iff w1x1 + … + wnxn >= t, and 0 otherwise.

14 Single Layer Feed-forward
Input layer of source nodes; output layer of neurons.

15 Multi layer feed-forward
A 3-4-2 network: an input layer with 3 source nodes, a hidden layer with 4 neurons, and an output layer with 2 neurons.
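A hedged sketch of a forward pass through such a 3-4-2 network; the sigmoid activation and the random weights are assumptions, since the slide only shows the layer sizes:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One fully connected layer: each neuron applies the sigmoid to its
    # weighted sum of the layer's inputs plus its bias.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 hidden neurons, 3 inputs each
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 2 output neurons, 4 hidden inputs each
b2 = [0.0] * 2

x = [0.5, -1.0, 2.0]              # one 3-dimensional input
hidden = layer(x, W1, b1)         # 4 hidden activations
output = layer(hidden, W2, b2)    # 2 network outputs
print(output)
```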

16 Network Structures Recurrent (cycles exist): more powerful, as they can implement state, but harder to analyze. Examples: Hopfield networks (symmetric connections, interesting properties, useful for implementing associative memory); Boltzmann machines (more general, with applications in constraint satisfaction and combinatorial optimization).

17 Simple recurrent networks
[Diagram (Elman, 1990): inputs x1, x2 together with context units (a copy of the previous hidden layer) feed hidden units z1, z2, which feed the output layer.]
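A minimal sketch of one step of such a simple recurrent network: the context units hold a copy of the previous hidden state and are fed back as extra inputs. The sigmoid activations and random weights are assumptions for illustration.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elman_step(x, context, W_in, W_ctx, W_out):
    # Hidden units see the current input and the context (previous hidden state).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(W_in[j], x)) +
                      sum(w * ci for w, ci in zip(W_ctx[j], context)))
              for j in range(len(W_in))]
    output = [sigmoid(sum(w * h for w, h in zip(W_out[k], hidden)))
              for k in range(len(W_out))]
    return output, hidden          # the new hidden state becomes the next context

random.seed(0)
n_in, n_hid, n_out = 2, 2, 1
W_in  = [[random.uniform(-1, 1) for _ in range(n_in)]  for _ in range(n_hid)]
W_ctx = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_hid)]
W_out = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

context = [0.0] * n_hid                           # context starts empty
for x in [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]:    # a short input sequence
    y, context = elman_step(x, context, W_in, W_ctx, W_out)
    print(y)
```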

18 Perceptron Capabilities
Quite expressive: many, but not all, Boolean functions can be expressed. Examples: conjunctions and disjunctions; more generally, functions that are true if and only if at least k of the inputs are true. Can't represent XOR.
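For the "at least k of n inputs true" family, a standard construction (not spelled out in the transcript) sets every weight to 1 and the threshold to k:

$$g(x_1, \dots, x_n) = 1 \iff \sum_{i=1}^{n} x_i \ge k, \qquad \text{i.e. } w_1 = \dots = w_n = 1, \; t = k.$$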

19 Representable Functions
Perceptrons have a monotonicity property: if a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of other input values). They can't represent functions where input interactions can cancel one another's effect (e.g. XOR).

20 Representable Functions
Can represent only linearly separable functions. Geometrically: only if there is a line (plane) separating the positives from the negatives. The good news: such functions are PAC-learnable and learning algorithms exist.

21 Linearly Separable [Plot: positive and negative examples separated by a single line.]

22 NOT linearly Separable
[Plot: positive and negative examples arranged so that no single line can separate the two classes.]

23 Problems with simple networks
Some kinds of data are not linearly separable. [Diagrams: the (x1, x2) input spaces and outputs y for AND, OR, and XOR; AND and OR can be separated by a line, XOR cannot.]

24 A solution: multiple layers
[Diagram: input layer x1, x2, hidden layer z1, z2, and output layer y.]

25 The Perceptron Learning Algorithm
An example of current-best-hypothesis (CBH) search (so it is incremental, etc.): begin with a hypothesis (a perceptron); repeat over all examples several times, adjusting the weights as examples are seen, until all examples are correctly classified or a stopping criterion is reached.

26 Method for Adjusting Weights
One weight update possibility: if the classification is correct, don't change the weights. Otherwise: if the example is a false negative, add the input vector to the weights; if it is a false positive, subtract the input vector from the weights. Intuition: for instance, if the example is positive, strengthen/increase the weights corresponding to the positive attributes of the example.
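A hedged sketch of this update rule as a training loop. The bias is treated as an extra always-on input (as on the earlier "Bias as extra input" slide); the 0/1 encoding, the example data, and the stopping criterion are my choices:

```python
def predict(weights, bias, x):
    # Linear threshold unit: 1 if the weighted sum plus bias is >= 0, else 0.
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def train_perceptron(examples, n_inputs, max_passes=100):
    # examples: list of (input_vector, label) pairs with labels 0 or 1.
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_passes):              # repeat over all examples several times
        errors = 0
        for x, y in examples:
            y_hat = predict(w, b, x)
            if y_hat == y:
                continue                     # correct: don't change the weights
            if y == 1 and y_hat == 0:        # false negative: add the input
                w = [wi + xi for wi, xi in zip(w, x)]
                b += 1.0
            else:                            # false positive: subtract the input
                w = [wi - xi for wi, xi in zip(w, x)]
                b -= 1.0
            errors += 1
        if errors == 0:                      # all examples correctly classified
            break
    return w, b

# Example: learn the OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(data, 2))
```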

27 Properties of the Algorithm
In general, a learning rate is also applied. The adjustment is in the direction of minimizing error on the example. If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator.
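With a learning rate α, the same update is often written in the standard form (the notation here is assumed, not from the slide):

$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \, (y - \hat{y}) \, \mathbf{x},$$

where y is the true label and ŷ the perceptron's output; for 0/1 outputs the factor (y - ŷ) is +1 on a false negative, -1 on a false positive, and 0 when the classification is correct.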

28 Another Algorithm (least-sum-squares algorithm)
Define and minimize an error function. S is the set of examples; the error compares the ideal function with the linear function corresponding to the current perceptron, summed over all examples.
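The slide's formulas are not in the transcript; a standard form of this least-sum-squares error, with symbol names assumed here (f for the ideal function, h_w for the current perceptron's linear function), is:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{e \in S} \big( f(e) - h_{\mathbf{w}}(e) \big)^2 .$$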

29 The Delta Rule
The rule applies to any activation function g with derivative g′. [Diagram: a unit with perceptual-feature inputs x1, x2 and output y (+1 = cat, -1 = dog); the weight update combines the output error with the influence of each input.]
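A standard statement of the delta rule for a single unit with differentiable activation g (the symbols η for the learning rate, t for the target, and net for the weighted input are assumptions, since the slide's formula is not in the transcript):

$$\Delta w_i = \eta \, (t - y) \, g'(\mathrm{net}) \, x_i, \qquad \mathrm{net} = \sum_j w_j x_j,$$

which combines the output error (t - y) with the influence of input x_i, as the diagram's labels suggest.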

30 Derivative of Error
Compute the gradient (derivative) of E and take a step in the steepest-descent direction: the change in each weight is proportional to the gradient along that weight, scaled by the learning rate.
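In symbols (a standard form; the notation is assumed here):

$$\Delta w_i = -\alpha \, \frac{\partial E}{\partial w_i},$$

where ∂E/∂w_i is the gradient along w_i and α is the learning rate.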

31 Gradient Descent The algorithm: pick an initial random perceptron and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient). [Plot: the error E over weight space, with the gradient direction and the opposite descent direction marked.]
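A minimal sketch of this loop for a single linear unit under squared error; the data, learning rate, and number of steps are my choices:

```python
import random

def gradient_descent(examples, n_inputs, alpha=0.05, steps=1000):
    # examples: list of (input_vector, target) pairs.
    random.seed(0)
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]   # w[0] is the bias weight
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in examples:
            xe = [1.0] + list(x)                        # bias input x0 = +1
            h = sum(wi * xi for wi, xi in zip(w, xe))   # linear unit output
            for i in range(len(w)):
                grad[i] += -(t - h) * xe[i]             # dE/dw_i for E = 1/2 * sum (t - h)^2
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]  # step against the gradient
    return w

# Example: recover the linear target t = 1 + 2*x1 - 3*x2 from 9 noiseless points.
data = [([x1, x2], 1 + 2 * x1 - 3 * x2)
        for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
print(gradient_descent(data, 2))   # approaches [1.0, 2.0, -3.0]
```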

32 General-purpose learning mechanisms
E (error) ( is learning rate) wij

33 Gradient Calculation

34 Derivation (cont.)

35 Properties of the algorithm
The error function has no local minima (it is quadratic). The algorithm is a gradient descent method to the global minimum and will asymptotically converge. Even if the data are not linearly separable, it can find a good (minimum-error) linear classifier. Incremental?

36 Multilayer Feed-Forward Networks
Multiple perceptrons, layered. Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units). [Diagram: input layer, hidden layer, output layer.]
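In formulas (a standard composition; the layer superscripts are my notation and biases are omitted for brevity), such a two-layer network computes

$$y = g\Big( \sum_{k} w^{(2)}_{k} \, g\big( \sum_{j} w^{(1)}_{kj} \, x_j \big) \Big),$$

with the inner sums over the inputs and the outer sum over the hidden units.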

37 Power/Expressiveness
Can represent interactions among inputs (unlike perceptrons). Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used. Learning algorithms exist, but with weaker guarantees than perceptron learning algorithms.

38 Back-Propagation Similar to the perceptron learning algorithm and gradient descent for perceptrons. Problem to overcome: how to adjust internal links (how to distribute the "blame", or the error). Assumption: internal units use differentiable functions; nonlinear sigmoid functions are convenient.

39 Recurrent Network A recurrent network with hidden neuron(s): the unit delay operator z-1 implies a dynamic system. [Diagram: input, hidden, and output units connected through z-1 feedback loops.]

40 Back-Propagation (cont.)
Start with a network with random weights. Repeat until a stopping criterion is met: for each example, compute the network output and, for each unit i, its error term; then update each weight (the weight of the link going from node i to node j) using the error term of node j and the output of unit i.
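A hedged sketch of this procedure for a small sigmoid network with one hidden layer and one output; the architecture, squared-error loss, learning rate, and XOR training data are my assumptions for illustration:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, W1, W2, eta=0.5):
    # Forward pass: hidden activations (plus a constant 1.0 serving as the
    # output unit's bias input), then the single sigmoid output o.
    a = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x))) for j in range(len(W1))] + [1.0]
    o = sigmoid(sum(w * aj for w, aj in zip(W2, a)))
    # Error terms: delta_o for the output unit, then distribute the "blame"
    # back to each hidden unit through its outgoing weight; sigmoid'(v) = out * (1 - out).
    delta_o = (target - o) * o * (1.0 - o)
    delta_h = [a[j] * (1.0 - a[j]) * W2[j] * delta_o for j in range(len(W1))]
    # Update each weight w_ij by eta * (error term of unit j) * (output of unit i).
    W2 = [W2[j] + eta * delta_o * a[j] for j in range(len(W2))]
    W1 = [[W1[j][i] + eta * delta_h[j] * x[i] for i in range(len(x))]
          for j in range(len(W1))]
    return W1, W2, o

# Example: train on XOR (each input carries a constant 1.0 as a bias input).
random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]   # 3 hidden units
W2 = [random.uniform(-1, 1) for _ in range(4)]                       # 3 hidden + bias
data = [([0, 0, 1.0], 0), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], 0)]
for _ in range(10000):
    for x, t in data:
        W1, W2, _ = backprop_step(x, t, W1, W2)
for x, t in data:
    _, _, o = backprop_step(x, t, W1, W2, eta=0.0)   # eta = 0: forward pass only
    print(x[:2], t, round(o, 2))
```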

41 The Error Term

42 Derivation Write the error for a single training example; as before, use the sum of squared errors (it is convenient for differentiation, etc.). Differentiate with respect to each weight. For example, consider the weight connecting node j to output i.
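For that weight the differentiation yields the standard result (the notation, with a_j the output of node j, o_i the output of unit i, t_i its target, and in_i its weighted input, is assumed here):

$$\frac{\partial E}{\partial w_{ji}} = -(t_i - o_i)\, g'(\mathrm{in}_i)\, a_j, \qquad \text{so} \qquad \Delta w_{ji} = \eta \, \delta_i \, a_j \quad \text{with} \quad \delta_i = (t_i - o_i)\, g'(\mathrm{in}_i).$$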

43 Properties Converges to a minimum, but could be a local minimum
Could be slow to converge. (Note: training a three-node network is NP-complete!) Must watch for over-fitting, just as in decision trees (use validation sets, etc.). Network structure? Often two layers suffice; start with relatively few hidden units.

44 Properties (cont.) Many variations on basic back-propagation exist, e.g. using momentum: the learning rate can be reduced with time (this applies to perceptrons as well), and the nth update amount can include a constant multiple of the previous update.
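The momentum variant is usually written as (a standard form; the symbols are assumed here):

$$\Delta w^{(n)} = -\eta \, \nabla E\big(w^{(n)}\big) + \mu \, \Delta w^{(n-1)},$$

where μ ∈ [0, 1) is the constant momentum coefficient and the superscript indexes the n-th update.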

45 Networks, features, and spaces
Artificial neural networks can represent any continuous function… Simple algorithms for learning from data. Fuzzy boundaries. Effects of typicality.

46 NN properties
Can handle domains with continuous and discrete attributes, many attributes, and noisy data. Could be slow at training but fast at evaluation time. Human understanding of what the network does could be limited.

47 Networks, features, and spaces
Artificial neural networks can represent any continuous function… Simple algorithms for learning from data. Fuzzy boundaries. Effects of typicality. A way to explain how people could learn things that look like rules and symbols…

48 Networks, features, and spaces
Artificial neural networks can represent any continuous function… Simple algorithms for learning from data. Fuzzy boundaries. Effects of typicality. A way to explain how people could learn things that look like rules and symbols… Big question: how much of cognition can be explained by the input data?

49 Challenges for neural networks
Being able to learn anything can make it harder to learn specific things; this is the "bias-variance tradeoff".

