CS540 - Fall 2016 (Shavlik©), Lecture 17, Week 10


Today's Topics
- HW4 out (due in two weeks; some Java)
- Artificial Neural Networks (ANNs)
  - Perceptrons (1950s)
  - Hidden Units and Backpropagation (1980s)
  - Deep Neural Networks (2010s)
  - ??? (2040s [note the pattern])
- This lecture: the big picture and forward propagation
- Next lecture: learning network weights

Should You? (a slide used in CS 760 for 20+ years)
'Fenwick here is biding his time waiting for neural networks'

Recall: Supervised ML Systems Differ in How They Represent Concepts
[Figure: the same training examples fed to different learners yield different representations: a neural network (Backpropagation), decision trees (ID3, CART), and logical rules (FOIL, ILP)]

Advantages of Artificial Neural Networks
- Provide the best predictive accuracy for many problems
- Can represent a rich class of concepts ('universal approximators')
[Figure: time-series data with regions labeled Positive, Negative, Positive]

A Brief Overview of ANNs
[Figure: a layered network of input units, hidden units, and output units, with a weight on each link, error measured at the output units, and a recurrent link shown]

Recurrent ANNs (advanced topic: LSTM models, Schmidhuber's group)
- State units (ie, memory) carry values forward from the previous time step

Representing Features in ANNs (and SVMs): we need NUMERIC values
- Nominal feature, f ∈ {a, b, c}: use a '1 of N' rep, with one input unit per possible value (f=a, f=b, f=c); the unit for the observed value is set to 1, the rest to 0
- Hierarchical feature: likewise one input unit per value in the hierarchy (eg, the f=e unit is set to 1 when f=e)
- Linear/ordered feature with f ∈ [a, b], typical approaches (others possible):
  - Approach I (use one input unit): f = (value - a) / (b - a)
  - Approach II: thermometer rep (next slide)
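A minimal Python sketch of these two encodings (the function names and example values are illustrative, not from the lecture):

```python
# Sketch of the encodings above: '1 of N' for a nominal feature and
# linear scaling for an ordered feature in [a, b].

def one_of_n(value, categories):
    """One input unit per category; 1 for the observed value, 0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]

def scale_to_unit(value, a, b):
    """Approach I: f = (value - a) / (b - a) for an ordered feature in [a, b]."""
    return (value - a) / (b - a)

print(one_of_n('b', ['a', 'b', 'c']))  # [0.0, 1.0, 0.0]
print(scale_to_unit(7.0, 5.0, 10.0))   # 0.4
```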

More on Encoding Datasets
Thermometer representation: f ∈ {a, b, c}, ie f is ordered
- f = a → 1 0 0
- f = b → 1 1 0
- f = c → 1 1 1
(could also discretize continuous features this way)
Output representation: for N categories use a 1-of-N representation
- Category 1 → 1 0 0
- Category 2 → 0 1 0
- Category 3 → 0 0 1
For Boolean functions use either 1 or 2 output units
Normalize real-valued outputs to [0, 1]
Could also use an error-correcting code (but we won't cover that)
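A minimal sketch of the thermometer rep and the 1-of-N output rep (function names are my own, for illustration):

```python
# Thermometer rep for an ordered feature and 1-of-N rep for the output category.

def thermometer(value, ordered_values):
    """E.g. for ['a', 'b', 'c']:  a -> 100,  b -> 110,  c -> 111."""
    k = ordered_values.index(value)
    return [1.0 if i <= k else 0.0 for i in range(len(ordered_values))]

def one_of_n_output(category_index, n_categories):
    """Category i -> a vector with a single 1 in position i."""
    return [1.0 if i == category_index else 0.0 for i in range(n_categories)]

print(thermometer('b', ['a', 'b', 'c']))  # [1.0, 1.0, 0.0]
print(one_of_n_output(1, 3))              # [0.0, 1.0, 0.0]
```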

Connectionism History
PERCEPTRONS (Rosenblatt, 1957)
- No hidden units
- Among the earliest work in machine learning; died out in the 1960s (due to the Minsky & Papert book)
[Figure: output unit i connected to units j, k, and l by weights w_ij, w_ik, and w_il]
Output_i = F(W_ij * output_j + W_ik * output_k + W_il * output_l)
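A minimal sketch of that output formula for a perceptron with a step activation F; the weights, inputs, and threshold below are made-up values for illustration:

```python
# Output_i = F(sum_j W_ij * output_j), with F a simple step (threshold) function.
def perceptron_output(weights, inputs, threshold=0.0):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if weighted_sum > threshold else 0.0

# Hypothetical weights for the three incoming links w_ij, w_ik, w_il.
print(perceptron_output([0.5, -1.0, 2.0], [1.0, 1.0, 1.0]))  # 1.0 (sum = 1.5 > 0)
```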

Connectionism (cont.)
Backpropagation algorithm
- Overcame the perceptron's weakness
- Major reason for the renewed excitement in the 1980s
'Hidden units' are important
- A fundamental extension to perceptrons
- Can generate new features ('constructive induction', 'predicate invention', 'learning representations', 'derived features')

Deep Neural Networks
- Old view: the backprop algorithm does not work well for more than one layer of hidden units ('the gradient gets too diffuse')
- New view: with a lot of training data, deep neural networks (several layers of hidden units) exceed prior state-of-the-art results
- Unassigned, but FYI: http://www.idsia.ch/~juergen/deep-learning-overview.html

Sample Deep Neural Network
[Figure: an example deep network]

A Deeper Network
- Old design: fully connect each input node to each HU (only one HU layer), then fully connect each HU to each output node
- We'll cover CONVOLUTION and POOLING later

Digit Recognition: Influential ANN Testbed
From http://people.idsia.ch/~juergen/handwriting.html
Deep networks (Schmidhuber, 2012)
- One YEAR of training on a single CPU, or one WEEK on a single GPU performing 10^9 weight updates/sec
- 0.2% error rate (the old record was 0.4%)
Error rates for other methods (more info on the datasets and results at http://yann.lecun.com/exdb/mnist/):
- Perceptron: 12% error (7.6% with feature engineering)
- k-NN: 2.8% (0.63%)
- Ensemble of decision trees: 1.5%
- SVMs: 1.4% (0.56%)
- One layer of HUs: 1.6% (0.4% with feature engineering + an ensemble of 25 ANNs)

Activation Units: Map a Weighted Sum to a Scalar
Individual unit's computation:
  output_i = F( Σ_j weight_ij * output_j )
Typically
  F(input_i) = 1 / (1 + e^-(input_i - bias_i))
[Figure: plot of output vs input for this function, shifted by the bias]
Called the 'sigmoid' or 'logistic' (the hyperbolic tangent is also used)
Piecewise-linear (and Gaussian) nodes can also be used
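A minimal sketch of this sigmoid in Python (the bias and example inputs are illustrative):

```python
import math

def sigmoid(weighted_sum, bias=0.0):
    """F(input_i) = 1 / (1 + e^-(input_i - bias_i))."""
    return 1.0 / (1.0 + math.exp(-(weighted_sum - bias)))

print(sigmoid(0.0))   # 0.5 at the bias point
print(sigmoid(4.0))   # ~0.98, saturating toward 1
print(sigmoid(-4.0))  # ~0.02, saturating toward 0
```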

Rectified Linear Units (ReLUs) (Nair & Hinton, 2010)
- Used for HUs; use 'pure' linear units for output units, ie F(wgt'edSum) = wgt'edSum
- For HUs: F(wgt'edSum) = max(0, wgt'edSum)
- Argued to be more biologically plausible
- Used in 'deep networks'
[Figure: plot of the ReLU, 0 up to the bias and then linear in the input]
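A short sketch of the two activations named above (ReLU for HUs, pure linear for output units); the test values are arbitrary:

```python
def relu(weighted_sum):
    """Hidden units: F(wgt'edSum) = max(0, wgt'edSum)."""
    return max(0.0, weighted_sum)

def linear(weighted_sum):
    """Output units: F(wgt'edSum) = wgt'edSum (the identity)."""
    return weighted_sum

print(relu(-2.5), relu(3.0), linear(-2.5))  # 0.0 3.0 -2.5
```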

Sample ANN Calculation ('forward propagation', ie, reasoning with the weights learned by backprop)
Assume bias = 0 for all nodes for simplicity, and use ReLUs
[Figure: a small network worked by hand from the INPUT values, through the weighted links, to the OUTPUT]
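The individual weights in the slide's figure do not survive this transcript, so here is a hedged sketch of the same kind of calculation on a made-up 2-input, 3-hidden-unit, 1-output network (ReLU HUs, linear output, all biases 0):

```python
# Hypothetical weights; these are NOT the weights from the lecture's figure.
W_hidden = [[3.0, -2.0],     # weights into hidden unit 1
            [4.0,  1.0],     # weights into hidden unit 2
            [-1.0, 5.0]]     # weights into hidden unit 3
W_output = [2.0, -3.0, 1.0]  # weights from the hidden units to the output

def relu(s):
    return max(0.0, s)

def forward(x):
    """Forward propagation: weighted sum -> ReLU at each HU -> linear output (bias = 0)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    return sum(w * h for w, h in zip(W_output, hidden))

# Hidden activations for input (3, 2): relu(5), relu(14), relu(7); output = 10 - 42 + 7.
print(forward([3.0, 2.0]))  # -25.0
```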

Perceptron Convergence Theorem (Rosenblatt, 1957)
- Perceptron = no hidden units
- If a set of examples is learnable, the DELTA rule will eventually find the necessary weights
- However, a perceptron can only learn/represent linearly separable datasets
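Learning the weights is next lecture's topic, but as a hedged preview, a minimal sketch of a delta-rule style update for a perceptron (the learning rate, weights, and single example are illustrative):

```python
# One mistake-driven update:  w_j <- w_j + eta * (target - output) * x_j
eta = 0.1

def predict(weights, x, threshold=0.0):
    return 1.0 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0.0

def delta_update(weights, x, target):
    out = predict(weights, x)
    return [w + eta * (target - out) * xi for w, xi in zip(weights, x)]

w = [0.0, 0.0, 0.0]   # the last weight multiplies a constant-1 input (a learned threshold)
w = delta_update(w, [1.0, 2.0, 1.0], target=1.0)
print(w)              # [0.1, 0.2, 0.1] after one update on a misclassified positive example
```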

Linear Separability
[Figure: a 2-D feature space (X1, X2) containing positive (+) and negative (-) examples on either side of a line]
Consider a perceptron; its output is
  1 if W1*X1 + W2*X2 + ... + Wn*Xn > Θ
  0 otherwise
In terms of feature space (2 features only), the decision boundary is
  W1*X1 + W2*X2 = Θ
  X2 = (-W1/W2)*X1 + Θ/W2    (compare y = mx + b)
Hence, a perceptron can only classify examples if a 'line' (hyperplane) can separate them
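A small numeric check of that boundary equation (the weights and threshold are made up for illustration):

```python
# Illustrative weights/threshold; the boundary is X2 = (-W1/W2)*X1 + theta/W2.
W1, W2, theta = 2.0, 1.0, 4.0

def perceptron(x1, x2):
    return 1 if W1 * x1 + W2 * x2 > theta else 0

m, b = -W1 / W2, theta / W2   # slope and intercept of the separating line
x1 = 1.0
boundary_x2 = m * x1 + b      # 2.0 when x1 = 1.0
print(perceptron(x1, boundary_x2 + 0.1))  # 1 (just above the line)
print(perceptron(x1, boundary_x2 - 0.1))  # 0 (just below the line)
```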

The (Infamous) XOR Problem: Not Linearly Separable
Exclusive OR (XOR), with the four examples labeled a) through d):
  X1 X2 | Output
   0  0 |   0
   0  1 |   1
   1  0 |   1
   1  1 |   0
[Figure: the four points plotted in (X1, X2) space; no single line separates the 1s from the 0s]
A solution with (sigmoidal) hidden units
[Figure: a two-hidden-unit network with weights of 10 and -10 from X1 and X2 into the hidden units and weights of 10 from the hidden units to the output; let Θ = 5 for all nodes]
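The exact wiring of the figure is hard to read from this transcript, but one classic two-hidden-unit solution that is consistent with the quoted weights of ±10 and Θ = 5 is sketched below, using step units in place of sigmoids for clarity:

```python
# Hedged sketch: h1 detects (1,0), h2 detects (0,1), and the output ORs them.
def step(s, theta=5.0):
    return 1.0 if s > theta else 0.0

def xor_net(x1, x2):
    h1 = step(10 * x1 - 10 * x2)    # fires only for (1, 0)
    h2 = step(-10 * x1 + 10 * x2)   # fires only for (0, 1)
    return step(10 * h1 + 10 * h2)  # fires if either hidden unit fires

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_net(x1, x2))  # reproduces the XOR truth table
```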

The Need for Hidden Units
- If there is one layer of enough hidden units (possibly 2^N for Boolean functions), the input can be recoded (N = number of input units)
- This recoding allows any mapping to be represented (known by Minsky & Papert); see the sketch below
- Question: how do we provide an error signal to the interior units? (backprop is the answer from the 1980s)
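A hedged sketch of that recoding idea: give each of the 2^N input patterns its own hidden (step) unit, then OR together the hidden units whose patterns should map to 1. The helper names and the XOR example are my own choices, not from the lecture:

```python
from itertools import product

def step(s, theta):
    return 1 if s > theta else 0

def recode_and_classify(x, positive_patterns):
    """2^N hidden units, one per input pattern; the output unit ORs the chosen ones."""
    n = len(x)
    hidden = []
    for p in product((0, 1), repeat=n):
        weights = [1 if bit else -1 for bit in p]      # +1 where p has a 1, -1 where it has a 0
        s = sum(w * xi for w, xi in zip(weights, x))
        hidden.append(step(s, sum(p) - 0.5))           # fires only when x matches p exactly
    chosen = [h for p, h in zip(product((0, 1), repeat=n), hidden) if p in positive_patterns]
    return step(sum(chosen), 0.5)                      # OR of the selected hidden units

# Any Boolean mapping is just a set of positive patterns; XOR is {(0,1), (1,0)}.
for x in product((0, 1), repeat=2):
    print(x, '->', recode_and_classify(x, {(0, 1), (1, 0)}))
```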

Hidden Units: One View
Allow a system to create its own internal representation, for which problem solving is easy
[Figure: a perceptron]

Reformulating XOR
[Figure: the original inputs X1 and X2 plus a derived feature X3 = X1  X2; alternatively, a network whose inputs are X1, X2, and X3]
So, if a hidden unit can learn to represent X1  X2, the solution is easy
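The operator in "X3 = X1  X2" did not survive extraction; one common choice that makes the slide's point is X3 = X1 AND X2, since XOR then becomes linearly separable over (X1, X2, X3). A hedged sketch under that assumption:

```python
# Assumption (not confirmed by the slide): the derived feature is X3 = X1 AND X2.
# Then XOR(X1, X2) = step(X1 + X2 - 2*X3 - 0.5), a single linear-threshold unit.

def step(s):
    return 1 if s > 0 else 0

def xor_with_derived_feature(x1, x2):
    x3 = x1 and x2                       # the hidden unit's job: learn X1 AND X2
    return step(x1 + x2 - 2 * x3 - 0.5)  # linear in (x1, x2, x3)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_with_derived_feature(x1, x2))  # reproduces XOR
```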

The Need for Non-Linear Activation Functions
- Claim: for every ANN of depth k that uses only linear activation functions, there is an equivalent perceptron, ie a neural network with no hidden units
- So an ANN using only linear activation units, no matter how 'deep', can only learn a separating 'line'
- Note that ReLUs are non-linear ('piecewise' linear)
- Can show this using linear algebra (but we won't in cs540)
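The linear-algebra argument is that stacked linear layers compose into a single linear map; below is a small numeric check of that claim (the matrix sizes and random values are arbitrary):

```python
import numpy as np

# Two stacked linear layers, W2 (W1 x), equal the single layer (W2 W1) x,
# so extra depth buys nothing without a non-linearity between the layers.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))   # first linear layer: 4 inputs -> 3 hidden units
W2 = rng.standard_normal((2, 3))   # second linear layer: 3 hidden units -> 2 outputs
x = rng.standard_normal(4)

deep_linear = W2 @ (W1 @ x)        # 'deep' network with only linear activations
collapsed = (W2 @ W1) @ x          # the equivalent no-hidden-unit (perceptron-like) map
print(np.allclose(deep_linear, collapsed))  # True
```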

A Famous Early Application (http://cnl.salk.edu/Media/nettalk.mp3)
NETtalk (Sejnowski & Rosenberg, 1987)
- Maps character strings into phonemes via a 'sliding window' approach
- Train: 1,000 most common English words, 88.5% correct
- Test: 20,000-word dictionary, 72% / 63% correct
[Figure: a window of letters such as 'A T C _' sliding over the text, producing phoneme outputs such as Ă and Ō, like the phonemes in a dictionary]

An Empirical Comparison of Symbolic and Neural Learning [Shavlik, Mooney, & Towell, IJCAI 1989 & ML journal 1991]
Perceptron works quite well!

ANN Wrapup on Non-Learning Aspects
Geoff Hinton, 1947- (great-great-grandson of George Boole!)
- Perceptrons can do well, but can only create linear separators in feature space
- The backprop algorithm (next lecture) can successfully train hidden units; historically only one HU layer was used
- Deep neural networks (several HU layers) are highly successful given large amounts of training data, especially for images & text