Presentation transcript:

Neural Networks
Pabitra Mitra
Computer Science and Engineering, IIT Kharagpur
pabitra@gmail.com
(Several slides adapted from Elena Marchiori.)

The Neuron
The neuron is the basic information processing unit of a neural network. It consists of:
- A set of synapses or connecting links, each characterized by a weight: w1, w2, …, wm.
- An adder function (linear combiner) that computes the weighted sum of the inputs: $u = \sum_{j=1}^{m} w_j x_j$.
- An activation function (squashing function) that limits the amplitude of the neuron's output.

Computation at Units
Each unit computes a 0-1 or a graded function of the weighted sum of its inputs: $a = g\big(\sum_i w_i x_i\big)$, where $g$ is the activation function.

The Neuron
[Diagram: input signals x1, x2, …, xm; synaptic weights w1, w2, …, wm; summing function; bias b; activation function producing the local field v; output y.]

Common Activation Functions
- Step function: g(x) = 1 if x >= t, g(x) = 0 if x < t (t is a threshold).
- Sign function: g(x) = +1 if x >= t, g(x) = -1 if x < t.
- Sigmoid function: g(x) = 1 / (1 + exp(-x)).
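
The following Python sketch, not part of the original slides, implements these activation functions and a single unit that applies one of them to the weighted sum of its inputs; the example weights are made up:

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x >= t (t is the threshold), else 0."""
    return 1.0 if x >= t else 0.0

def sign(x, t=0.0):
    """Sign function: +1 if x >= t, else -1."""
    return 1.0 if x >= t else -1.0

def sigmoid(x):
    """Sigmoid function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, activation=sigmoid):
    """A single unit: apply the activation to the weighted sum of the inputs."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return activation(u)

# Made-up example: a 2-input unit with weights 0.5 and -1.0.
print(unit_output([0.5, -1.0], [1.0, 1.0]))  # sigmoid(-0.5) ≈ 0.38
```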

Bias of a Neuron
The bias b has the effect of applying an affine transformation to the weighted sum u: v = u + b, where v is the induced field of the neuron.

Bias as Extra Input
The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0.
[Diagram: inputs x0 = +1, x1, …, xm with synaptic weights w0, w1, …, wm feeding the summing function, the activation, the local field v, and the output y.]
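
A small sketch (illustrative only) showing that folding the bias in as a weight w0 on a constant extra input x0 = +1 gives the same output as keeping an explicit bias term:

```python
def unit_with_bias(weights, inputs, bias, activation):
    """Explicit bias: v = sum_i w_i * x_i + b."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(v)

def unit_bias_as_input(weights, inputs, activation):
    """Bias folded in: weight w0 on a constant extra input x0 = +1."""
    v = sum(w * x for w, x in zip(weights, [1.0] + list(inputs)))
    return activation(v)

step = lambda x: 1.0 if x >= 0 else 0.0
# The two formulations agree when the first weight plays the role of the bias.
print(unit_with_bias([0.5, -1.0], [1.0, 1.0], bias=0.3, activation=step))  # 0.0
print(unit_bias_as_input([0.3, 0.5, -1.0], [1.0, 1.0], activation=step))   # 0.0
```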

Face Recognition
90% accurate at learning head pose and at recognizing 1-of-20 faces.

Handwritten Digit Recognition

Computing with Spaces
[Diagram: a unit with inputs x1, x2 (perceptual features) and output y, where +1 = cat and -1 = dog; cat and dog examples plotted as points in the (x1, x2) feature space, with the output error driving learning.]

Can Implement Boolean Functions
- A unit can implement And, Or, and Not.
- Need a mapping from True and False to numbers: e.g. True = 1.0, False = 0.0.
- (Exercise) Use a step function and show how to implement various simple Boolean functions.
- Combining units, we can get any Boolean function of n variables.
- Logical circuits are obtained as a special case.
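
One possible assignment of weights and thresholds for And, Or, and Not with step units; the slide leaves this as an exercise, so these particular values are just an illustration:

```python
def ltu(weights, threshold):
    """Linear threshold unit: output 1 if the weighted sum reaches the threshold."""
    def unit(*inputs):
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 if s >= threshold else 0.0
    return unit

# One possible assignment of weights and thresholds (True = 1.0, False = 0.0).
AND = ltu([1.0, 1.0], threshold=1.5)   # fires only when both inputs are 1
OR  = ltu([1.0, 1.0], threshold=0.5)   # fires when at least one input is 1
NOT = ltu([-1.0],     threshold=-0.5)  # fires only when the single input is 0

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0.0), NOT(1.0))
```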

Network Structures
- Feedforward (no cycles): less powerful, but easier to understand.
  - Input units, hidden layers, output units.
- Perceptron: no hidden layer, so it basically corresponds to one unit, and is essentially a linear threshold function (ltf).
- Ltf: defined by weights $w_1, \dots, w_n$ and a threshold $t$; its value is 1 iff $\sum_i w_i x_i \ge t$, otherwise 0.

Single-Layer Feed-forward
[Diagram: an input layer of source nodes fully connected to an output layer of neurons.]

Multi-layer Feed-forward
[Diagram: a 3-4-2 network with an input layer, one hidden layer, and an output layer.]

Network Structures
- Recurrent (cycles exist): more powerful, since such networks can implement state, but harder to analyze. Examples:
  - Hopfield networks: symmetric connections, interesting properties, useful for implementing associative memory.
  - Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization.

Simple Recurrent Networks (Elman, 1990)
[Diagram: inputs x1, x2 feed a hidden layer z1, z2, which feeds the output layer; the hidden activations are also copied into context units that serve as extra inputs at the next time step.]
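
An illustrative sketch of one step of an Elman-style simple recurrent network; the weights are hypothetical, and the point is only that the hidden activations are copied into the context units for the next time step:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def srn_step(x, context, W_in, W_ctx, W_out):
    """One step of an Elman-style simple recurrent network.

    The hidden activations are returned as the new context, i.e. they are
    'copied back' to serve as extra inputs at the next time step.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_in, x)) +
                      sum(w * ci for w, ci in zip(w_ctx, context)))
              for w_in, w_ctx in zip(W_in, W_ctx)]
    output = [sigmoid(sum(w * h for w, h in zip(w_out, hidden))) for w_out in W_out]
    return output, hidden  # hidden becomes the next step's context

# Hypothetical weights: 2 inputs, 2 hidden/context units, 1 output.
W_in  = [[0.5, -0.3], [0.2, 0.8]]
W_ctx = [[0.1, 0.4], [-0.2, 0.3]]
W_out = [[1.0, -1.0]]
context = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    y, context = srn_step(x, context, W_in, W_ctx, W_out)
    print(y)
```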

Perceptron Capabilities
- Quite expressive: many, but not all, Boolean functions can be expressed.
- Examples: conjunctions and disjunctions; more generally, a perceptron can represent any function that is true if and only if at least k of the inputs are true (e.g. set all weights to 1 and the threshold to k).
- Can't represent XOR.

Representable Functions
- Perceptrons have a monotonicity property: if a link has positive weight, the activation can only increase as the corresponding input value increases (irrespective of the other input values).
- They can't represent functions where input interactions can cancel one another's effect (e.g. XOR).

Representable Functions
- Perceptrons can represent only linearly separable functions.
- Geometrically: only when there is a line (plane) separating the positives from the negatives.
- The good news: such functions are PAC-learnable and learning algorithms exist.

Linearly Separable
[Figure: positive (+) and negative (-) points in the plane that can be separated by a straight line.]

NOT Linearly Separable
[Figure: positive (+) and negative (-) points arranged so that no straight line separates them.]

Problems with Simple Networks
Some kinds of data are not linearly separable.
[Figure: in the (x1, x2) plane, AND and OR are linearly separable, but XOR is not.]

A Solution: Multiple Layers
[Diagram: inputs x1, x2 feed a hidden layer z1, z2, which feeds the output y.]
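
For concreteness, one possible choice of weights, not taken from the slides, under which such a two-layer network of step units computes XOR: the hidden units compute OR and NAND and the output unit ANDs them:

```python
def step_unit(weights, bias):
    """Linear threshold unit with a bias term."""
    def f(*inputs):
        return 1.0 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0.0
    return f

# Hidden layer: z1 = OR(x1, x2), z2 = NAND(x1, x2); output: y = AND(z1, z2) = XOR(x1, x2).
z1 = step_unit([1.0, 1.0], bias=-0.5)    # OR
z2 = step_unit([-1.0, -1.0], bias=1.5)   # NAND
y  = step_unit([1.0, 1.0], bias=-1.5)    # AND

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(x1, x2, y(z1(x1, x2), z2(x1, x2)))  # prints the XOR truth table
```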

The Perceptron Learning Algorithm
An example of current-best-hypothesis (CBH) search (so incremental, etc.):
- Begin with a hypothesis (a perceptron).
- Repeat over all examples several times, adjusting the weights as examples are seen,
- Until all examples are correctly classified or a stopping criterion is reached.

Method for Adjusting Weights
One weight update possibility:
- If the classification is correct, don't change the weights.
- Otherwise:
  - If false negative, add the input: $w \leftarrow w + x$.
  - If false positive, subtract the input: $w \leftarrow w - x$.
- Intuition: for instance, if the example is positive, strengthen/increase the weights corresponding to the positive attributes of the example.
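
A minimal Python sketch of this update rule (illustrative only: the training set is made up and the learning rate is fixed at 1):

```python
def predict(weights, x):
    """Perceptron with the bias folded in as weights[0] on a constant input of 1."""
    return 1 if sum(w * xi for w, xi in zip(weights, [1.0] + list(x))) >= 0 else 0

def train_perceptron(examples, n_features, epochs=20):
    """Cycle over the examples, adding or subtracting misclassified inputs."""
    w = [0.0] * (n_features + 1)                     # +1 for the bias weight
    for _ in range(epochs):
        for x, label in examples:
            if predict(w, x) == label:
                continue                              # correct: leave the weights alone
            sign = 1.0 if label == 1 else -1.0        # +x on false negative, -x on false positive
            w = [wi + sign * xi for wi, xi in zip(w, [1.0] + list(x))]
    return w

# Hypothetical linearly separable data: learn the OR function.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(data, n_features=2)
print(w, [predict(w, x) for x, _ in data])  # should reproduce the labels 0, 1, 1, 1
```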

Properties of the Algorithm
- In general, a learning rate is also applied to each adjustment.
- The adjustment is in the direction of minimizing the error on the example.
- If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator.

Another Algorithm (Least-Sum-Squares)
Define and minimize an error function:
$E(w) = \frac{1}{2} \sum_{x \in S} \big(f(x) - h_w(x)\big)^2$
where S is the set of examples, $f$ is the ideal (target) function, and $h_w$ is the linear function corresponding to the current perceptron. This is the error of the perceptron over all examples.
Note: $h_w(x) = \sum_i w_i x_i$.

The Delta Rule
[Diagram: the cat/dog unit again (inputs x1, x2 as perceptual features; output y with +1 = cat, -1 = dog), annotated with the delta rule: for any activation function g with derivative g', each weight change is proportional to the output error times the influence of the corresponding input.]

Derivative of Error
Gradient (derivative) of E:
$\nabla E = \Big(\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\Big)$
Take the steepest descent direction:
$w_i \leftarrow w_i - \alpha \frac{\partial E}{\partial w_i}$
where $\frac{\partial E}{\partial w_i}$ is the gradient along $w_i$ and $\alpha$ is the learning rate.

Gradient Descent
The algorithm: pick an initial random perceptron and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient).
[Figure: the error surface E over the weights; the gradient direction points uphill, the descent direction is its negative.]
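
A minimal sketch of this procedure for a single linear unit minimizing the sum-of-squares error defined earlier; the training data, learning rate, and bias handling are assumptions of the example:

```python
def lms_gradient_descent(examples, n_features, alpha=0.01, epochs=200):
    """Batch gradient descent on E(w) = 1/2 * sum_x (f(x) - w.x)^2 for a linear unit.

    A constant input of 1 is prepended to each x so that w[0] acts as a bias
    (a convention of this sketch, not something the slides fix).
    """
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in examples:
            xb = [1.0] + list(x)
            h = sum(wi * xi for wi, xi in zip(w, xb))      # linear output w.x
            for i, xi in enumerate(xb):
                grad[i] += -(target - h) * xi              # dE/dw_i, summed over examples
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]   # step against the gradient
    return w

# Hypothetical noise-free data generated by f(x1, x2) = 1 + 2*x1 - 3*x2.
data = [((x1, x2), 1 + 2 * x1 - 3 * x2) for x1 in range(-2, 3) for x2 in range(-2, 3)]
print(lms_gradient_descent(data, n_features=2))  # should approach [1, 2, -3]
```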

General-Purpose Learning Mechanisms
[Figure: the error E plotted as a function of a weight w_ij, with gradient-descent steps whose size is set by the learning rate.]

Gradient Calculation
For the error defined above, the partial derivative with respect to each weight is
$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{x \in S}\big(f(x) - h_w(x)\big)^2 = -\sum_{x \in S}\big(f(x) - h_w(x)\big)\,\frac{\partial h_w(x)}{\partial w_i}$

Derivation (cont.)
Since $h_w(x) = \sum_i w_i x_i$, we have $\frac{\partial h_w(x)}{\partial w_i} = x_i$, and therefore
$\frac{\partial E}{\partial w_i} = -\sum_{x \in S}\big(f(x) - h_w(x)\big)\,x_i$

Properties of the Algorithm
- The error function has no local minima (it is quadratic).
- The algorithm is a gradient descent method toward the global minimum, and will asymptotically converge.
- Even if the data are not linearly separable, it can find a good (minimum-error) linear classifier.
- Incremental?

Multilayer Feed-Forward Networks
Multiple perceptrons, layered. Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units).
[Diagram: input layer, hidden layer, output layer.]
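
As an illustration (weights invented for this sketch), the forward pass of such a 3-2-1 network with sigmoid units:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, biases, inputs):
    """One fully connected layer of sigmoid units."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    """3 inputs -> 2 hidden units -> 1 output (hypothetical weights)."""
    hidden = layer([[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]], [0.0, 0.1], x)
    output = layer([[1.5, -2.0]], [0.3], hidden)
    return output[0]

print(forward([1.0, 0.0, 1.0]))
```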

Power / Expressiveness
- Can represent interactions among inputs (unlike perceptrons).
- Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used.
- Learning algorithms exist, but with weaker guarantees than the perceptron learning algorithm.

Back-Propagation
- Similar to the perceptron learning algorithm and to gradient descent for perceptrons.
- Problem to overcome: how to adjust the internal links (how to distribute the "blame", i.e. the error).
- Assumption: internal units use differentiable functions; nonlinear sigmoid functions are convenient.

Recurrent Network
Recurrent network with hidden neuron(s): the unit-delay operator z⁻¹ implies a dynamic system.
[Diagram: input, hidden, and output units with feedback connections through unit delays z⁻¹.]

Back-Propagation (cont.)
- Start with a network with random weights.
- Repeat until a stopping criterion is met:
  - For each example, compute the network output and, for each unit i, its error term $\Delta_i$.
  - Update each weight (the weight of the link going from node i to node j): $w_{ij} \leftarrow w_{ij} + \alpha\, a_i\, \Delta_j$, where $a_i$ is the output of unit i.

The Error Term
For an output unit i: $\Delta_i = g'(in_i)\,(t_i - a_i)$.
For a hidden unit j: $\Delta_j = g'(in_j)\sum_k w_{jk}\,\Delta_k$, i.e. a hidden unit receives a share of the blame of the units it feeds into, weighted by the strengths $w_{jk}$ of its outgoing links.

Derivation
Write the error for a single training example; as before, use the sum of squared errors (convenient for differentiation, etc.):
$E = \frac{1}{2}\sum_i (t_i - a_i)^2$
Differentiate with respect to each weight. For example, for the weight $w_{ji}$ connecting node j to output i we get
$\frac{\partial E}{\partial w_{ji}} = -(t_i - a_i)\,g'(in_i)\,a_j = -a_j\,\Delta_i$
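
Putting the pieces together, a minimal sketch of back-propagation for a single-hidden-layer sigmoid network trained online on squared error. The data set (XOR), layer sizes, learning rate, and the bias-as-extra-input convention are all choices made for this illustration:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(examples, n_in, n_hidden, alpha=0.5, epochs=10000, seed=0):
    """Backpropagation for an n_in -> n_hidden -> 1 sigmoid network.

    Each weight row includes a bias weight fed by a constant input of 1
    (a convention of this sketch). Online updates on squared error.
    """
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in examples:
            xb = [1.0] + list(x)
            hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in W1]
            hb = [1.0] + hidden
            y = sigmoid(sum(w * h for w, h in zip(W2, hb)))
            # Error term of the output unit: (t - y) * g'(net), with g' = y(1 - y).
            delta_out = (target - y) * y * (1.0 - y)
            # Error terms of hidden units: blame propagated back through W2.
            delta_hidden = [h * (1.0 - h) * W2[j + 1] * delta_out
                            for j, h in enumerate(hidden)]
            # Weight updates: w_ij <- w_ij + alpha * a_i * delta_j.
            W2 = [w + alpha * a * delta_out for w, a in zip(W2, hb)]
            for j, ws in enumerate(W1):
                W1[j] = [w + alpha * xi * delta_hidden[j] for w, xi in zip(ws, xb)]
    def predict(x):
        xb = [1.0] + list(x)
        hb = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in W1]
        return sigmoid(sum(w * h for w, h in zip(W2, hb)))
    return predict

# XOR: not linearly separable, but learnable with one hidden layer.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = train_backprop(xor_data, n_in=2, n_hidden=3)
# Outputs should be close to 0, 1, 1, 0 (may need more epochs or another seed).
print([round(net(x), 2) for x, _ in xor_data])
```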

Properties
- Converges to a minimum, but it could be a local minimum.
- Could be slow to converge. (Note: training even a three-node network is NP-complete!)
- Must watch for over-fitting, just as with decision trees (use validation sets, etc.).
- Network structure? Often two layers suffice; start with relatively few hidden units.

Properties (cont.)
Many variations on basic back-propagation exist, e.g.:
- Use momentum: the n-th update amount adds a constant multiple of the (n-1)-th update amount.
- Reduce the learning rate with time (applies to perceptrons as well).
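
One common formulation of the momentum update, as a sketch (alpha is the learning rate, mu the momentum constant; these names are this example's choice):

```python
def momentum_step(w, grad, velocity, alpha=0.1, mu=0.9):
    """One weight update with momentum: v <- mu*v - alpha*grad; w <- w + v.

    The new step is the usual gradient step plus a constant multiple (mu)
    of the previous update amount.
    """
    velocity = [mu * v - alpha * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Keep a velocity vector alongside the weights, initially all zeros.
w, v = [0.0, 0.0], [0.0, 0.0]
for grad in ([1.0, -2.0], [1.0, -2.0]):   # two updates with the same gradient
    w, v = momentum_step(w, grad, v)
print(w, v)  # the second step is larger than the first because of the momentum term
```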

Networks, Features, and Spaces
- Artificial neural networks can represent any continuous function…
- Simple algorithms for learning from data:
  - fuzzy boundaries
  - effects of typicality

NN Properties
- Can handle domains with:
  - continuous and discrete attributes
  - many attributes
  - noisy data
- Could be slow at training but fast at evaluation time.
- Human understanding of what the network does can be limited.

Networks, Features, and Spaces
- Artificial neural networks can represent any continuous function…
- Simple algorithms for learning from data:
  - fuzzy boundaries
  - effects of typicality
- A way to explain how people could learn things that look like rules and symbols…
- Big question: how much of cognition can be explained by the input data?

Challenges for Neural Networks
Being able to learn anything can make it harder to learn specific things: this is the "bias-variance tradeoff".