Learning via Neural Networks L. Manevitz All rights reserved.


Natural versus Artificial Neuron [Figure: a natural neuron alongside the McCulloch-Pitts neuron]

Definitions and History McCulloch-Pitts Neuron; Perceptron; Adaline; Linear Separability; Multi-Level Neurons; Neurons with Loops

Sample Feed-Forward Network (no loops) [Figure: an input layer feeding a hidden layer through weights Wji, then an output layer through weights Vik; each unit computes F(Σj wji xj)]
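
As a concrete illustration of the computation in this figure (not from the original slides; the activation F, the weight shapes and the example numbers below are my own choices), a minimal sketch of the forward pass:

```python
import numpy as np

def forward(x, W, V, F=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """Forward pass of a two-weight-layer feed-forward network.

    x : input vector
    W : input-to-hidden weight matrix (rows of w_ji)
    V : hidden-to-output weight matrix (rows of v_ik)
    F : activation applied at each unit (sigmoid by default)
    """
    hidden = F(W @ x)       # each hidden unit computes F(sum_j w_ji * x_j)
    output = F(V @ hidden)  # each output unit does the same over the hidden values
    return output

# Illustrative run with arbitrary small weights
x = np.array([1.0, 0.5])
W = np.array([[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]])  # 3 hidden units
V = np.array([[0.5, -0.2, 0.9]])                      # 1 output unit
print(forward(x, W, V))
```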

Pattern Identification (Note: the neuron is trained) [Figure: a perceptron with weighted inputs]

Feed-Forward Network [Figure: two layers of weights between input, hidden, and output units]

Reason for Explosion of Interest Two coincident effects (around 1985 – 87): –(Re-)discovery of mathematical tools and algorithms for handling large networks –Availability (hurray for Intel and company!) of sufficient computing power to make experiments practical.

Some Properties of NNs Universal: can represent and accomplish any task. Uniform: “programming” is changing weights. Automatic: algorithms for automatic programming; learning.

Networks are Universal All logical functions can be represented by a three-level (loop-free) network (McCulloch-Pitts). All continuous (and more) functions can be represented by three-level feed-forward networks (Cybenko et al.). Networks can self-organize (without a teacher). Networks serve as associative memories.

Universality McCulloch-Pitts: adaptive logic gates; can represent any logic function. Cybenko: any continuous function is representable by a three-level NN.

Networks can “LEARN” and Generalize (Algorithms) One neuron (Perceptron and Adaline): very popular in the 1960s – early 70s –Limited by representability (only linearly separable functions) Feed-forward networks (Back-Propagation) –Currently the most popular network (1987 – now) Kohonen Self-Organizing Network (1980s – now) (loops) Attractor Networks (loops)

Learnability (Automatic Programming) One neuron: Perceptron and Adaline algorithms (Rosenblatt and Widrow-Hoff) (1960s – now) Feed-forward networks: Back-Propagation (1987 – now) Associative memories and looped networks (“Attractors”) (1990s – now)

Generalizability Typically a network is trained on a sample set of examples and then used on the general class. Training can be slow, but execution is fast.

Feed-Forward Network [Figure: two layers of weights between input, hidden, and output units]

Classical Applications (1986 – 1997) “NETtalk”: text to speech. ZIP codes: handwriting analysis. Glove-Talk: sign language to speech. Data and picture compression: “bottleneck” networks. Steering of an automobile (up to 55 m.p.h.). Market predictions. Associative memories. Cognitive modeling (especially reading, …). Phonetic typewriter (Finnish).

Neural Network Once the architecture is fixed, the only free parameters are the weights. Thus: Uniform Programming; Potentially Automatic Programming; Search for Learning Algorithms.

Programming: Just find the weights! AUTOMATIC PROGRAMMING One neuron: Perceptron or Adaline. Multi-level: gradient descent on a continuous neuron (sigmoid instead of step function).
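
A minimal sketch of the continuous-neuron idea for a single sigmoid unit, assuming gradient descent on squared error; the data, learning rate and epoch count are illustrative choices, not from the slides:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_sigmoid_neuron(X, t, lr=0.5, epochs=1000):
    """Gradient descent on squared error for one sigmoid neuron.
    X: examples as rows (with a bias input of 1 appended); t: targets in [0, 1]."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = sigmoid(w @ x)
            # Gradient of E = 0.5*(target - y)**2, using sigmoid'(s) = y*(1 - y)
            w += lr * (target - y) * y * (1 - y) * x
    return w

# Illustrative run on the OR function (inputs augmented with a bias of 1)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_neuron(X, t)
print(np.round(sigmoid(X @ w)))   # should round to [0, 1, 1, 1] after training
```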

Prediction [Figure: an input/output signal, a delay element, the NN, and a comparison of the NN's output with the actual signal]

Abstracting Note: Size may not be crucial (an Aplysia or a crab does many things). Look at simple structures first.

Real and Artificial Neurons

One Neuron: the McCulloch-Pitts Neuron This is very complicated. But abstracting away the details, we have the integrate-and-fire neuron [Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn are summed (integrate) and the neuron fires if the sum reaches the threshold θ].
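
A minimal sketch of this abstraction (function and variable names are mine): the unit integrates a weighted sum of its inputs and fires exactly when that sum reaches the threshold θ.

```python
def mcculloch_pitts(x, w, theta):
    """Integrate-and-fire abstraction: output 1 iff sum_i w_i * x_i >= theta."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # integrate
    return 1 if s >= theta else 0             # fire if the threshold is reached

print(mcculloch_pitts([1, 0, 1], [0.5, 0.5, 0.5], theta=1.0))  # prints 1
```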

Representability What functions can be represented by a network of McCulloch-Pitts neurons? Theorem: Every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.

Proof Show the simple functions: AND, OR, NOT, implies. Then recall the representability of any logic function in DNF (disjunctive normal form).

AND, OR, NOT [Figures: the integrate-and-fire unit of the previous slide with particular weights and thresholds chosen to realize AND, OR and NOT; the threshold values 0.9, 1.0 and 0.5 survive from the original diagrams]
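
The original diagrams are only partly recoverable, so the weights and thresholds below are one standard choice (an assumption, broadly consistent with the 0.9 and 0.5 values above) realizing the three gates with such a unit:

```python
def unit(x, w, theta):
    """Threshold unit: fires (1) iff the weighted sum reaches the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

AND = lambda a, b: unit([a, b], [0.5, 0.5], theta=0.9)   # fires only if both inputs are 1
OR  = lambda a, b: unit([a, b], [1.0, 1.0], theta=0.9)   # fires if at least one input is 1
NOT = lambda a:    unit([a],    [-1.0],     theta=-0.5)  # fires exactly when the input is 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))  # 1 0
```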

DNF and All Functions
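
The DNF argument can be made concrete with a small sketch (a hypothetical construction of mine, not taken from the slides): one threshold unit per satisfying row of the truth table, each acting as an AND of literals, feeding a final OR unit, so any Boolean function is computed by a three-level network.

```python
from itertools import product

def threshold_unit(x, w, theta):
    """Fires (1) iff the weighted sum reaches the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def dnf_network(true_rows):
    """Two layers of threshold units computing an arbitrary Boolean function.

    true_rows: the set of input tuples on which the function should output 1.
    """
    def net(x):
        # Hidden layer: one unit per satisfying row, acting as an AND of literals;
        # with weights +1/-1 and threshold = number of ones, it fires exactly on that row.
        hidden = [threshold_unit(x,
                                 [1 if bit else -1 for bit in row],
                                 sum(row))
                  for row in true_rows]
        # Output layer: an OR over the hidden units.
        return threshold_unit(hidden, [1] * len(hidden), 1)
    return net

# Example: XOR, which (as the later slides show) a single unit cannot represent.
xor = dnf_network({(0, 1), (1, 0)})
for x in product((0, 1), repeat=2):
    print(x, xor(x))   # prints 1 exactly on (0, 1) and (1, 0)
```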

Learnability and Generalizability The previous theorem tells us that neural networks are potentially powerful, but doesn't tell us how to use them. We desire simple networks with uniform training rules.

One Neuron (Perceptron) What can be represented by one neuron? Is there an automatic way to learn a function from examples?

Perceptron Training Rule Loop: Take an example and apply it to the network. If the answer is correct, return to Loop. If incorrect, go to FIX. FIX: Adjust the network weights by the input example. Go to Loop.

Example of Perceptron Learning x1 = 1 (+), x2 = -0.5 (-), x3 = 3 (+), x4 = -2 (-). Expanded vectors: –y1 = (1, 1) (+), y2 = (-0.5, 1) (-) –y3 = (3, 1) (+), y4 = (-2, 1) (-) Random initial weight: W1 = (-2.5, 1.75)

Trace of Perceptron –W1·y1 = (-2.5, 1.75)·(1, 1) < 0: wrong, so W2 = W1 + y1 = (-1.5, 2.75) –W2·y2 = (-1.5, 2.75)·(-0.5, 1) > 0: wrong, so W3 = W2 - y2 = (-1, 1.75) –W3·y3 = (-1, 1.75)·(3, 1) < 0: wrong, so W4 = W3 + y3 = (2, 2.75)

Perceptron Convergence Theorem If the concept is representable by a perceptron, then the perceptron learning rule will converge in a finite amount of time. (Made precise and proved below.)

Percentage of Boolean Functions Representable by a Perceptron
Inputs   Boolean functions   Representable by a perceptron
4        65,536              1,882
5        ~10**9              94,572
6        ~10**19             15,028,134
7        ~10**38             8,378,070,864
8        ~10**77             17,561,539,552,946

What is a Neural Network? What is an abstract Neuron? What is a Neural Network? How are they computed? What are the advantages? Where can they be used? –Agenda –What to expect

Perceptron Algorithm Start: Choose an arbitrary value for the weights W. Test: Choose an arbitrary example X. If X is positive and W·X > 0, or X is negative and W·X <= 0, go to Test; otherwise go to Fix. Fix: –If X is positive: W := W + X –If X is negative: W := W - X –Go to Test
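
A sketch of this procedure in Python (my transcription; picking among misclassified examples and capping the number of FIX steps are additions so the loop always terminates, for instance on non-separable data):

```python
import random
import numpy as np

def perceptron(pos, neg, max_fixes=10000):
    """Perceptron training: find W with W.x > 0 on pos and W.x <= 0 on neg.

    pos, neg: lists of example vectors (already augmented with a bias
    component if a threshold is wanted).
    """
    examples = [(np.asarray(x, float), +1) for x in pos] + \
               [(np.asarray(x, float), -1) for x in neg]
    W = np.zeros(len(examples[0][0]))          # Start: arbitrary weights
    fixes = 0
    while fixes < max_fixes:
        # Test: collect the examples the current W gets wrong.
        misclassified = [(x, s) for x, s in examples
                         if (s > 0 and W @ x <= 0) or (s < 0 and W @ x > 0)]
        if not misclassified:
            return W                            # every example is correct: done
        x, s = random.choice(misclassified)
        W = W + s * x                           # Fix: add a positive, subtract a negative
        fixes += 1
    raise RuntimeError("no separating weights found within the fix budget")
```

On separable data, such as the “And” examples on the following slides, it returns a weight vector with W·X > 0 on the positives and W·X <= 0 on the negatives.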

Perceptron Conv. Thm. Let F be a set of unit-length vectors. If there is a unit vector V* and a value δ > 0 such that V*·X > δ for all X in F, then the perceptron program goes to FIX only a finite number of times.

Applying the Algorithm to “And” W0 = (0, 0, 1) (or random). Examples (inputs with a bias component 1 appended): X1 = (0, 0, 1), desired result 0; X2 = (0, 1, 1), result 0; X3 = (1, 0, 1), result 0; X4 = (1, 1, 1), result 1.

“And” continued W0·X1 = 1 > 0: wrong; W1 = W0 - X1 = (0, 0, 0) W1·X2 = 0: OK (boundary) W1·X3 = 0: OK W1·X4 = 0: wrong; W2 = W1 + X4 = (1, 1, 1) W2·X1 = 1: wrong; W3 = W2 - X1 = (1, 1, 0) W3·X2 = 1: wrong; W4 = W3 - X2 = (1, 0, -1) W4·X3 = 0: OK W4·X4 = 0: wrong; W5 = W4 + X4 = (2, 1, 0) W5·X1 = 0: OK W5·X2 = 1: wrong; W6 = W5 - X2 = (2, 0, -1)

“And” page 3 W6·X3 = 1: wrong; W7 = W6 - X3 = (1, 0, -2) W7·X4 = -1: wrong; W8 = W7 + X4 = (2, 1, -1) W8·X1 = -1: OK W8·X2 = 0: OK W8·X3 = 1: wrong; W9 = W8 - X3 = (1, 1, -2) W9·X4 = 0: wrong; W10 = W9 + X4 = (2, 2, -1) W10·X1 = -1: OK W10·X2 = 1: wrong; W11 = W10 - X2 = (2, 1, -2) W11·X3 = 0: OK W11·X4 = 1: OK W11 now classifies all four examples correctly, so the run converges with W11 = (2, 1, -2).
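
A short self-contained check of this run (my sketch, cycling through the four examples in order and fixing whenever a test fails, exactly as in the hand trace above):

```python
import numpy as np

X = [np.array(v, float) for v in [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
pos = [False, False, False, True]        # AND: only (1, 1, 1) is a positive example
W = np.array([0.0, 0.0, 1.0])            # W0 from the slide
fixed = True
while fixed:                              # repeat passes until a full pass needs no fix
    fixed = False
    for x, p in zip(X, pos):
        wrong = (W @ x <= 0) if p else (W @ x > 0)
        if wrong:
            W = W + x if p else W - x     # the FIX step
            print("fix ->", W)
            fixed = True
print("final", W)                         # ends at (2, 1, -2), matching the trace
```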

Proof of Conv. Theorem Note: By hypothesis there is a δ such that V*·X > δ for all X in F. 1. We can eliminate the threshold (add an additional dimension to the input): W·(x, y, z) > threshold if and only if W*·(x, y, z, 1) > 0. 2. We can assume all examples are positive ones (replace each negative example by its negated vector): W·(x, y, z) < 0 if and only if W·(-x, -y, -z) > 0.
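
These two reductions can be written as a small preprocessing step (a sketch; the function and variable names are mine):

```python
import numpy as np

def preprocess(pos, neg):
    """Reduce to the 'all positive, no threshold' setting used in the proof.

    1. Append a constant 1 to every example, so the threshold becomes one more weight.
    2. Negate every negative example, so W.x > 0 is the single condition to satisfy.
    """
    pos_aug = [np.append(np.asarray(x, float), 1.0) for x in pos]
    neg_aug = [-np.append(np.asarray(x, float), 1.0) for x in neg]
    return pos_aug + neg_aug
```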

Proof (cont.) Consider the quotient V*·W / |W|. (Note: this is the multidimensional cosine of the angle between V* and W.) Recall that V* is a unit vector, so the quotient is <= 1.

Proof (cont.) Each time FIX is visited, W changes by adding the current example X: V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X >= V*·W(n) + δ. Hence after n visits to FIX, V*·W(n) >= nδ. (*)

Proof (cont.) Now consider the denominator: |W(n+1)|**2 = W(n+1)·W(n+1) = (W(n) + X)·(W(n) + X) = |W(n)|**2 + 2·W(n)·X + 1 (recall |X| = 1). Since FIX is entered only when W(n)·X <= 0, this gives |W(n+1)|**2 <= |W(n)|**2 + 1. So after n visits to FIX, |W(n)|**2 <= n. (**)

Proof (cont.) Putting (*) and (**) together: Quotient = V*·W(n) / |W(n)| >= nδ / sqrt(n) = δ·sqrt(n). Since the quotient is <= 1, this means n <= 1/δ**2, so we enter FIX only a bounded number of times. Q.E.D.
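
A quick numerical sanity check of the bound (a sketch with illustrative data of my own): for unit-length positive examples separated with margin δ by a unit vector V*, the number of visits to FIX should never exceed 1/δ**2.

```python
import numpy as np

rng = np.random.default_rng(0)
v_star = np.array([1.0, 0.0])                 # separating unit vector V*
delta = 0.2                                   # required margin

# Build unit-length examples with V*.x > delta
# (negatives are assumed already negated, as in the proof).
examples = []
while len(examples) < 200:
    x = rng.normal(size=2)
    x /= np.linalg.norm(x)
    if v_star @ x > delta:
        examples.append(x)

W = np.zeros(2)
fixes = 0
changed = True
while changed:                                 # cycle until a full pass needs no FIX
    changed = False
    for x in examples:
        if W @ x <= 0:
            W = W + x
            fixes += 1
            changed = True

print("fixes:", fixes, "bound 1/delta**2 =", 1 / delta**2)
assert fixes <= 1 / delta**2
```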

Geometric Proof See hand slides.

Additional Facts Note: If the X's are presented in a systematic way, then a solution W is always found. Note: It is not necessarily the same as V*. Note: If F is not finite, we may not obtain a solution in finite time. The algorithm can be modified in minor ways and remains valid (e.g. bounded rather than unit-length examples; changes to the update of W(n)).

What won't work? Example: connectedness cannot be computed by a bounded-diameter perceptron. Compare with convexity, which can be (using sensors of order three).

What won't work? Try XOR.

Limitations of Perceptron Representability –Only concepts that are linearly separable. –Compare: convex versus connected. –Examples: XOR vs. OR.
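
To see the XOR limitation concretely, here is a self-contained sketch (my own, not from the slides); the comment spells out why no weight vector can satisfy all four constraints, and the capped run shows the perceptron rule never settling:

```python
import numpy as np

# XOR with a bias input appended: no weight vector W satisfies
#   W.(0,0,1) <= 0,  W.(0,1,1) > 0,  W.(1,0,1) > 0,  W.(1,1,1) <= 0,
# since adding the two middle inequalities gives w1 + w2 + 2*w3 > 0 while
# adding the outer two gives w1 + w2 + 2*w3 <= 0.  The perceptron rule
# therefore keeps entering FIX forever; we cap the run to observe this.
X = [np.array(v, float) for v in [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
pos = [False, True, True, False]
W = np.zeros(3)
for _ in range(1000):                          # capped number of passes
    for x, p in zip(X, pos):
        if (W @ x <= 0) if p else (W @ x > 0):
            W = W + x if p else W - x
errors = sum(((W @ x <= 0) if p else (W @ x > 0)) for x, p in zip(X, pos))
print(W, "still misclassifies", int(errors), "of 4 examples")
```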