Lecture 4: Neural Networks. ICS 273A, UC Irvine. Instructor: Max Welling. Read chapter 4.


Neurons
Neurons communicate by receiving signals on their dendrites, adding these signals, and firing off a new signal along the axon if the total input exceeds a threshold. The axon connects to new dendrites through synapses, which can learn how much signal is transmitted. McCulloch and Pitts (1943) built a first abstract model of a neuron.
[Figure: model neuron with inputs, weights, a bias b attached to a constant input of 1, an activation function, and an output.]
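
The abstract neuron described above can be sketched in a few lines. This is a minimal illustration, assuming a hard threshold at zero as the activation function; the function name and the AND weights are made up for the example and are not taken from the lecture.

```python
def threshold_neuron(inputs, weights, bias):
    """Return 1 if the weighted sum of inputs plus bias exceeds 0, else 0."""
    total_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total_input > 0 else 0

# Illustrative weights (an assumption, not from the lecture) that make the unit compute logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, "->", threshold_neuron(inputs, and_weights, and_bias))
```

The bias plays the role of a (negative) firing threshold: the unit fires only when the summed input is large enough to overcome it.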

Neurons
We have about $10^{11}$ neurons, each one connected to about $10^{4}$ other neurons on average. Each neuron needs at least $10^{-3}$ seconds to transmit the signal. So we have many, slow neurons. Yet we recognize our grandmother in about $10^{-1}$ sec. Computers have much faster switching times: about $10^{-10}$ sec. (These are the orders of magnitude commonly quoted, e.g. in Mitchell's Machine Learning, ch. 4.) Conclusion: brains compute in parallel! In fact, neurons are unreliable and noisy as well, but since things are encoded redundantly by many of them, their population can compute reliably and fast.

Classification / Regression
Neural nets are parameterized functions Y = f(X; W) from inputs (X) to outputs (Y). If Y is continuous we are doing regression; if Y is discrete, classification. We adapt the weights W so as to minimize the error between the data and the model predictions or, in other words, to maximize the conditional probability of the data (outputs given attributes). For 2-class classification (y = 0/1), the resulting error function should look familiar.
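
For the 2-class case, the error function in question is most likely the negative log-likelihood under a Bernoulli model. The following is a reconstruction under that assumption, reading $f(x_n; W)$ as the predicted probability that $y_n = 1$ and letting $n$ index the data-items (both are notational assumptions, not taken from the slides):

```latex
E(W) = -\sum_{n} \Big[ y_n \log f(x_n; W) + (1 - y_n) \log\big(1 - f(x_n; W)\big) \Big]
```

Minimizing this $E(W)$ is the same as maximizing $\prod_n p(y_n \mid x_n; W)$, which is the "maximize the conditional probability of the data" view.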

Logistic Regression
If we use a sigmoid of a linear function of the inputs as the model, $f(x) = \sigma(w^\top x + b)$ with $\sigma(a) = 1/(1 + e^{-a})$, we obtain logistic regression! This is called a “perceptron” in neural networks jargon. Perceptrons can only separate classes linearly. Homework: show the equivalence by rewriting the error function.

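Regression
The probability of the output given the input attributes is Normally distributed. The error is the negative log-probability, which (up to an additive constant and a scale factor) is the squared loss function, with $n$ running over the data-items:
$E(W) = \sum_n \big(y_n - f(x_n; W)\big)^2$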

Optimization
We use stochastic gradient descent: pick a single data-item, compute the contribution of that data-point to the overall gradient, and update the weights by a small step in the opposite direction.
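
As a rough sketch of this procedure, the code below applies one-data-item-at-a-time updates to a logistic-regression unit. The toy data, the step size, the number of updates, and all function names are assumptions chosen for illustration; the lecture does not specify them.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sgd_logistic(data, step_size=0.5, num_updates=2000, seed=0):
    """Fit weights w and bias b with one-data-item-at-a-time gradient steps."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(num_updates):
        x, y = rng.choice(data)              # pick a single data-item
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y                          # gradient of the log-loss w.r.t. the pre-activation
        w = [wi - step_size * err * xi for wi, xi in zip(w, x)]
        b -= step_size * err
    return w, b

# Toy, linearly separable data (made up for illustration): label is 1 when x1 + x2 > 1.
toy_data = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.4, 0.1), 0),
            ((0.9, 0.8), 1), ((1.0, 0.6), 1), ((0.7, 1.0), 1)]

w, b = sgd_logistic(toy_data)
predictions = [int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5)
               for x, _ in toy_data]
print(predictions)
```

Each update touches only one data-item, so its cost does not depend on the size of the data set.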

Stochastic Gradient Descent
[Figure: stochastic updates compared with full updates (averaged over all data-items).]
Stochastic gradient descent does not converge to the minimum, but “dances” around it. To get to the minimum, one needs to decrease the step-size as one gets closer to the minimum. Alternatively, one can collect a few samples of the weights along the way and average the predictions over them (similar to bagging).
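
The effect of the step-size can be seen on a deliberately tiny problem: estimating the mean of a few numbers by minimizing squared error one item at a time. This is only a sketch under simplifying assumptions: the data values are made up, the items are visited in a fixed order rather than at random so that the result is reproducible, and the 1/(t+1) schedule is one common choice rather than the lecture's.

```python
def sgd_mean(data, step_size_fn, num_passes=1):
    """Minimize sum_n (w - x_n)^2 / 2 with one data-item per update."""
    w = 0.0
    t = 0
    for _ in range(num_passes):
        for x in data:
            w -= step_size_fn(t) * (w - x)   # gradient of (w - x)^2 / 2 is (w - x)
            t += 1
    return w

data = [1.0, 3.0, 2.0, 6.0]                  # made-up values; their mean is 3.0

w_constant = sgd_mean(data, lambda t: 0.5, num_passes=50)
w_decaying = sgd_mean(data, lambda t: 1.0 / (t + 1), num_passes=50)
print(w_constant, w_decaying)
```

With the constant step-size the estimate keeps jumping towards whichever item it saw last and ends each pass near 4.2 instead of the minimum at 3.0. With the decaying step-size the iterate is exactly the running mean of the items seen so far, so it settles at the minimum.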

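Multi-Layer Nets
Single layers can only do linear things. If we want to learn non-linear decision surfaces, or non-linear regression curves, we need more than one layer. In fact, a NN with 1 hidden layer can approximate any boolean function and any continuous function, given enough hidden units.
[Figure: network with layers x, h1, h2, y from bottom to top, with parameters (W1,b1), (W2,b2), (W3,b3) on the successive connections.]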
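
XOR is a standard example of a boolean function that no single threshold unit can represent but that one hidden layer handles easily. The sketch below sets the weights by hand purely for illustration (nothing is learned, and the weights are not from the lecture).

```python
def step(a):
    """Hard threshold activation."""
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1: fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)       # hidden unit 2: fires only if both inputs are on
    return step(h_or - h_and - 0.5)   # output unit: "or, but not and"

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```

Each hidden unit on its own is still a linear separator; the non-linearity comes from combining their outputs in the next layer.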

Back-propagation
How do we learn the weights of a multi-layer network? Answer: stochastic gradient descent. But now the gradients are harder to compute!
[Figure: example network x -> h1 -> h2 -> y with parameters (W1,b1), (W2,b2), (W3,b3).]

Back Propagation
[Figure: the network x -> h1 -> h2 -> y with parameters (W1,b1), (W2,b2), (W3,b3), shown twice: once for the upward pass and once for the downward pass.]
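
The two passes can be sketched for a smaller network than the one in the figure. The code below assumes a single sigmoid hidden layer, a linear output, and squared error on one data-item; these choices, the parameter values, and the function names are all assumptions made for illustration. A finite-difference check on one weight is included as a sanity test of the downward pass.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, b1, w2, b2):
    """Upward pass: input -> sigmoid hidden layer -> linear output."""
    h = [sigmoid(sum(W1[j][k] * x[k] for k in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y_hat = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return h, y_hat

def backward(x, y, h, y_hat, w2):
    """Downward pass: propagate the error signal from the output back to each layer."""
    delta_out = y_hat - y                                   # dE/dy_hat for E = (y_hat - y)^2 / 2
    grad_w2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    delta_h = [delta_out * w2[j] * h[j] * (1.0 - h[j])      # chain rule through the sigmoid
               for j in range(len(h))]
    grad_W1 = [[dj * xk for xk in x] for dj in delta_h]
    grad_b1 = list(delta_h)
    return grad_W1, grad_b1, grad_w2, grad_b2

# Arbitrary illustrative parameters and a single data-item.
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
w2 = [0.5, -0.6]
b2 = 0.2
x, y = [1.0, 2.0], 1.0

h, y_hat = forward(x, W1, b1, w2, b2)
grad_W1, grad_b1, grad_w2, grad_b2 = backward(x, y, h, y_hat, w2)

# Finite-difference check of one entry, dE/dW1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
e_plus = 0.5 * (forward(x, W1_plus, b1, w2, b2)[1] - y) ** 2
e_minus = 0.5 * (forward(x, W1_minus, b1, w2, b2)[1] - y) ** 2
numeric = (e_plus - e_minus) / (2 * eps)
print(grad_W1[0][1], numeric)
```

The downward pass reuses the hidden activations stored during the upward pass, which is why the two passes are done in that order.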

Back Propagation
[Figure: the network x -> h1 -> h2 -> y with parameters (W1,b1), (W2,b2), (W3,b3).]
What is the update rule for W1,b1 and W3,b3? You don’t need to derive all the rules; you may simply generalize the rules shown here. Please do not look up the answer in some book.

ALVINN: Learning to drive a car
This hidden unit detects a mildly left-sloping road and advises steering left. What would another hidden unit look like?

Weight Decay
NNs can also overfit (of course). We can try to avoid this by initializing all weight and bias terms to very small random values and letting them grow during learning. One can then check performance on a validation set and stop early. Or one can change the update rule to discourage large weights, by adding a term that pulls every weight towards zero at each step. The strength of that term then has to be set using cross-validation. This is called “weight-decay” in NN jargon.
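
One common form of such an update, shown here as an assumption rather than as the lecture's exact rule, adds a quadratic penalty on the weights to the error, which contributes an extra `decay * w` term to each gradient. The names `weight_decay_step` and `decay` are hypothetical.

```python
def weight_decay_step(weights, gradients, step_size, decay):
    """One gradient step with an extra term that shrinks each weight toward zero."""
    return [w - step_size * (g + decay * w) for w, g in zip(weights, gradients)]

# With a zero data gradient, the decay term alone shrinks the weights geometrically.
weights = [2.0, -1.0]
for _ in range(3):
    weights = weight_decay_step(weights, [0.0, 0.0], step_size=0.1, decay=0.5)
print(weights)
```

In this sketch every step multiplies a weight by (1 - step_size * decay) before the data gradient is applied, so weights only stay large if the data keeps pushing them there.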

Momentum
At the beginning of learning it is likely that the weights are changed in a consistent manner. Like a ball rolling down a hill, we should gain speed if we make consistent changes. It’s like an adaptive step-size. This idea is easily implemented by replacing the gradient in the update with a combination of the current gradient and a fraction of the previous update (and similarly for the biases).
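
A minimal sketch of a momentum update follows, assuming the usual "velocity" formulation in which a fraction of the previous update is carried over. The names `momentum_step`, `velocity`, and `momentum`, and the numeric settings, are illustrative assumptions.

```python
def momentum_step(weights, velocity, gradients, step_size, momentum):
    """Blend the previous update into the new one, then apply it to the weights."""
    new_velocity = [momentum * v - step_size * g for v, g in zip(velocity, gradients)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# A constant gradient stands in for "consistent changes": the per-step move keeps growing.
weights, velocity = [0.0], [0.0]
moves = []
for _ in range(4):
    weights, velocity = momentum_step(weights, velocity, [1.0],
                                      step_size=0.1, momentum=0.9)
    moves.append(velocity[0])
print(moves)
```

When successive gradients agree, the moves grow towards step_size / (1 - momentum) times the gradient; when they alternate in sign, the carried-over part cancels and the effective step shrinks.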

Conclusion
- NNs are a flexible way to model input/output functions.
- They can be given a probabilistic interpretation.
- They are robust against noisy data.
- The results are hard to interpret (unlike DTs).
- Learning is fast on large datasets when using stochastic gradient descent plus momentum.
- Overfitting can be avoided using weight decay or early stopping.
- There are also NNs which feed information back (recurrent NNs).
- Many more interesting NNs exist: Boltzmann machines, self-organizing maps, ...