ECE – Pattern Recognition Lecture 15: NN - Perceptron


ECE471-571 – Pattern Recognition
Lecture 15: NN - Perceptron
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science, University of Tennessee, Knoxville
http://www.eecs.utk.edu/faculty/qi
Email: hqi@utk.edu

Pattern Classification
- Statistical Approach
  - Supervised
    - Basic concepts: Bayesian decision rule (MPP, LR, Discri.)
    - Parameter estimation (ML, BL)
    - Non-parametric learning (kNN)
    - LDF (Perceptron)
    - NN (BP)
    - Support Vector Machine
    - Deep Learning (DL)
  - Unsupervised
    - Basic concepts: Distance
    - Agglomerative method
    - k-means
    - Winner-takes-all
    - Kohonen maps
    - Mean-shift
- Non-Statistical Approach
  - Decision-tree
  - Syntactic approach
- Dimensionality Reduction: FLD, PCA
- Performance Evaluation: ROC curve (TP, TN, FN, FP), cross validation
- Optimization Methods: local opt (GD), global opt (SA, GA)
- Classifier Fusion: majority voting, NB, BKS

Review - Bayes Decision Rule
- Maximum Posterior Probability: for a given x, if P(w1|x) > P(w2|x), then x belongs to class 1, otherwise class 2.
- Likelihood Ratio: if p(x|w1)/p(x|w2) > P(w2)/P(w1), then x belongs to class 1, otherwise class 2.
- Discriminant Function
  - Case 1: Minimum Euclidean Distance (Linear Machine), Σi = σ²I
  - Case 2: Minimum Mahalanobis Distance (Linear Machine), Σi = Σ
  - Case 3: Quadratic classifier, Σi = arbitrary
- Non-parametric: kNN - for a given x, if k1/k > k2/k, then x belongs to class 1, otherwise class 2.
- Dimensionality reduction
- Estimate Gaussian (Maximum Likelihood Estimation, MLE), two-modal Gaussian
- Performance evaluation: ROC curve
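A minimal MATLAB sketch of the MPP rule for two 1-D Gaussian classes (the means, variance, priors, and test sample below are assumed example values, not from the lecture):

% Minimal sketch of the MPP rule, comparing p(x|wi)P(wi) for two 1-D Gaussian classes.
% mu1, mu2, s2, P1, P2, and x are assumed example values.
mu1 = 0; mu2 = 2; s2 = 1;                          % class-conditional means and variance
P1 = 0.5; P2 = 0.5;                                % prior probabilities
x = 0.8;                                           % test sample
g1 = exp(-(x-mu1)^2/(2*s2)) / sqrt(2*pi*s2) * P1;  % p(x|w1)P(w1)
g2 = exp(-(x-mu2)^2/(2*s2)) / sqrt(2*pi*s2) * P2;  % p(x|w2)P(w2)
if g1 > g2
    disp('x belongs to class 1')
else
    disp('x belongs to class 2')
end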

Artificial Intelligence (AI): a program that can sense, reason, act, and adapt.
Machine Learning (ML): algorithms whose performance improves as they are exposed to more data over time, e.g., Naive Bayes, kNN, Decision Tree, SVM, K-Means, Dimensionality Reduction.
Deep Learning (DL): multi-layer neural networks that learn from vast amounts of data.

A Bit of History
1956-1976
- 1956, The Dartmouth Summer Research Project on Artificial Intelligence, organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon
- The rise of symbolic methods, systems focused on limited domains, deductive vs. inductive systems
- 1973, the Lighthill report by James Lighthill, "Artificial Intelligence: A General Survey" - automata, robotics, neural network
- 1976, the AI Winter
1976-2006
- 1986, BP algorithm
- ~1995, The Fifth Generation Computer
2006-???
- 2006, Hinton (U. of Toronto), Bengio (U. of Montreal), LeCun (NYU)
- 2012, ImageNet by Fei-Fei Li (2010-2017) and AlexNet

From the Dartmouth proposal: "We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College ... The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer." Essentially a brainstorming session.

Hamming, Dijkstra, Knuth (1974, The Art of Computer Programming), Tim Berners-Lee (2016, WWW), Diffie-Hellman PKC (2015)

https://en.wikipedia.org/wiki/Dartmouth_workshop
https://en.wikipedia.org/wiki/Lighthill_report

A Wrong Direction
One argument: instead of trying to understand the human brain, we should understand the computer. As a result, NN research died out in the 1970s.
In the 1980s, Japan started "the fifth generation computer research project", namely the "knowledge information processing computer system". The project aimed to make logical reasoning as fast as numerical calculation. The project ultimately failed, but it brought another climax to AI research and NN research.

Biological Neuron
- Dendrites: tiny fibers that carry signals to the neuron cell body
- Cell body: integrates the inputs from the dendrites
- Axon: the single output of the cell; axons may be very long (over a foot)
- Synaptic junction: where an axon impinges on a dendrite, causing input/output signal transitions

Synapse
Communication of information between neurons is accomplished by the movement of chemicals across the synapse. The chemicals are called neurotransmitters (generated in the cell body). The neurotransmitters are released from one neuron (the presynaptic nerve terminal), cross the synapse, and are accepted by the next neuron at a specialized site (the postsynaptic receptor).
http://faculty.washington.edu/chudler/chnt1.html

The Discovery of Neurotransmitters
Otto Loewi's Experiment (1921)
- Heart 1 is connected to the vagus nerve and placed in a chamber filled with saline
- Electrical stimulation of the vagus nerve causes heart 1 to slow down; after a delay, heart 2 slows down too
- The chemical responsible is acetylcholine

Back in 1921, an Austrian scientist named Otto Loewi discovered the first neurotransmitter. In his experiment (which came to him in a dream), he used two frog hearts. One heart (heart #1) was still connected to the vagus nerve. Heart #1 was placed in a chamber that was filled with saline. This chamber was connected to a second chamber that contained heart #2, so fluid from chamber #1 was allowed to flow into chamber #2. Electrical stimulation of the vagus nerve (which was attached to heart #1) caused heart #1 to slow down. Loewi also observed that, after a delay, heart #2 slowed down as well. From this experiment, Loewi hypothesized that electrical stimulation of the vagus nerve released a chemical into the fluid of chamber #1 that flowed into chamber #2. He called this chemical "Vagusstoff". We now know this chemical as the neurotransmitter acetylcholine.

Action Potential
When a neurotransmitter binds to a receptor on the postsynaptic side of the synapse, it changes the postsynaptic cell's excitability: it makes the postsynaptic cell either more or less likely to fire an action potential. If the number of excitatory postsynaptic events is large enough, they add up to cause an action potential in the postsynaptic cell and a continuation of the "message."
Many psychoactive drugs and neurotoxins can change the properties of neurotransmitter release, neurotransmitter reuptake, and the availability of receptor binding sites.

Perceptron
[Diagram: inputs x1, x2, ..., xd enter with weights w1, w2, ..., wd; a summation unit S combines them with the bias/threshold term -w0 (applied to a constant input of 1); the thresholded sum is the output z.]
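In other words, the unit in the diagram outputs z = 1 if w1*x1 + ... + wd*xd - w0 > 0 and z = 0 otherwise. A minimal MATLAB sketch of this forward pass (the weights, threshold, and input below are assumed example values, not from the lecture):

% Minimal sketch of the perceptron forward pass in the diagram.
% w, w0, and x are assumed example values.
w  = [0.5 -0.3];               % weights w1..wd
w0 = 0.1;                      % bias / threshold
x  = [1 0];                    % one input sample
z  = double(w * x' - w0 > 0)   % hard-limit (step) output: 1 if the sum exceeds the threshold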

Perceptron
- A program that learns "concepts" based on examples and correct answers
- It can only respond with "true" or "false"
- A single-layer neural network
- Through training, the weights and bias of the network are adjusted so that it classifies the training set with 100% accuracy (possible only when the training set is linearly separable)

Perceptron Criterion Function
Using the augmented weight vector a = [w1 w2 ... wd -w0] (augmentation) and the augmented sample y = [x1 x2 ... xd 1], the criterion is
Jp(a) = sum over y in Y of (-a'y),
where Y is the set of samples misclassified by a. Gradient descent learning minimizes Jp(a) with the update
a(k+1) = a(k) + eta(k) * (sum over y in Y of y).
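A minimal MATLAB sketch of this batch gradient-descent rule, assuming the AND data used later in the lecture, a fixed learning rate eta = 0.5, and sign-normalized augmented samples (these specific choices are assumptions, not from the slides):

% Sketch of gradient descent on the perceptron criterion Jp(a).
% Data, labels, learning rate, and iteration cap are assumed example values.
X   = [0 0; 1 0; 0 1; 1 1];         % samples (one per row)
t   = [-1; -1; -1; 1];              % +1 for class 1, -1 for class 2 (AND gate)
Y   = diag(t) * [X ones(4,1)];      % augment, then flip the sign of class-2 samples
a   = zeros(1, 3);                  % a = [w1 w2 -w0]
eta = 0.5;                          % fixed learning rate
for k = 1:100
    mis = (Y * a' <= 0);            % samples misclassified by a (a'y <= 0)
    if ~any(mis), break, end        % Jp(a) = 0: every sample correctly classified
    a = a + eta * sum(Y(mis,:), 1); % a(k+1) = a(k) + eta * sum of misclassified y
end
disp(a)                             % converges to [1 1 -1.5], i.e., x1 + x2 > 1.5 realizes AND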

Perceptron Learning Rule
[Same perceptron diagram: inputs x1 ... xd with weights w1 ... wd, summation S, threshold -w0, output z]
w(k+1) = w(k) + (T - z) x
-w0(k+1) = -w0(k) + (T - z) * 1
where T is the expected (target) output and z is the actual output.
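A quick worked step (with assumed starting values, not from the slides): suppose w(k) = [0 0], -w0(k) = 0, and the sample x = [1 1] with target T = 1 is presented. The current output is z = 0, since the weighted sum 0 is not greater than the threshold, so T - z = 1 and the update gives w(k+1) = [1 1] and -w0(k+1) = 1. The next time x = [1 1] is presented, the weighted sum plus bias is 1 + 1 + 1 = 3 > 0, so z = 1 as desired.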

Training
Step 1: Samples are presented to the network.
Step 2: If the output is correct, no change is made; otherwise, the weights and bias are updated based on the perceptron learning rule.
Step 3: An entire pass through the whole training set is called an "epoch". If no change has been made during the epoch, stop; otherwise, go back to Step 1.

Exercise (AND Logic)
[Two-input perceptron: inputs x1, x2 with weights w1, w2, threshold -w0 (constant input 1), output z]

x1  x2  T
0   0   0
1   0   0
0   1   0
1   1   1

Work through the training by hand, recording w1, w2, and -w0 after each update.

% demonstration on perceptron
% AND gate
%
% Hairong Qi

input = 2;                   % number of inputs
tr = 4;                      % number of training samples
w = rand(1, input+1);        % generate the initial weight vector [w1 w2 w0]
x = [0 0; 1 0; 0 1; 1 1];    % the training set (or the inputs)
T = [0; 0; 0; 1];            % the expected output

% learning process (training process)
finish = 0;
while ~finish
    disp(w)
    for i = 1:tr
        z(i) = w(1:input) * x(i,:)' > w(input+1);        % fire if the weighted sum exceeds the threshold w0
        w(1:input) = w(1:input) + (T(i)-z(i)) * x(i,:);  % perceptron rule: w <- w + (T - z) x
        w(input+1) = w(input+1) - (T(i)-z(i));           % threshold update: w0 <- w0 - (T - z)
    end
    if sum(abs(z - T')) == 0     % the whole epoch was classified correctly
        finish = 1;
        disp(z)
    end
    pause                        % pause after each epoch so the weights can be inspected
end

Visualizing the Decision Rule
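The original slide shows a plot; as a rough stand-in, a short MATLAB sketch that draws the four AND inputs together with the decision line w1*x1 + w2*x2 = w0 (the weights below are assumed example values that realize AND):

% Sketch: visualize the AND decision rule in the (x1, x2) plane.
% w and w0 are assumed example values that realize the AND gate.
w = [1 1]; w0 = 1.5;
figure; hold on
plot([0 1 0], [0 0 1], 'bo')             % the three class-0 samples
plot(1, 1, 'r*')                         % the single class-1 sample
x1 = -0.5:0.01:1.5;
plot(x1, (w0 - w(1)*x1) / w(2), 'k-')    % decision boundary: w1*x1 + w2*x2 = w0
axis([-0.5 1.5 -0.5 1.5])
xlabel('x_1'); ylabel('x_2')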

Limitations
- The output only has two values (1 or 0)
- Can only classify samples that are linearly separable (i.e., separable by a straight line or plane)
- Cannot be trained to realize functions like XOR (see the argument below)
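A short argument (not on the original slide) for why a single perceptron cannot realize XOR: the output for (0,0) must be 0, so 0 <= w0, i.e., w0 >= 0; the outputs for (1,0) and (0,1) must be 1, so w1 > w0 and w2 > w0; but then w1 + w2 > 2*w0 >= w0, so the input (1,1) would also produce output 1, contradicting the XOR target of 0. Hence no straight line separates the two XOR classes.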

Examples

Examples