ICS 273A UC Irvine Instructor: Max Welling Neural Networks.


ICS 273A UC Irvine Instructor: Max Welling Neural Networks

Neurons Neurons communicate by receiving signals on their dendrites, adding these signals, and firing off a new signal along the axon if the total input exceeds a threshold. The axon connects to new dendrites through synapses, which can learn how much signal is transmitted. McCulloch and Pitts (1943) built the first abstract model of a neuron. [Diagram: inputs with weights, a constant input 1 weighted by the bias b, an activation function, and the output.]
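
As a concrete illustration, here is a minimal sketch in Python of the abstract neuron described above: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and the example numbers are our own choices, not part of the original slides.

    import numpy as np

    def neuron(x, w, b):
        """One abstract neuron: weighted sum of inputs plus bias,
        pushed through a sigmoid (a smooth version of a hard threshold)."""
        a = np.dot(w, x) + b                # total input to the unit
        return 1.0 / (1.0 + np.exp(-a))     # activation in (0, 1)

    # Example: two inputs with hand-picked weights and bias.
    x = np.array([0.5, -1.0])
    w = np.array([2.0, 1.0])
    b = -0.5
    print(neuron(x, w, b))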

Neurons We have roughly 10^11 neurons, each one connected to roughly 10^4 other neurons on average. Each neuron needs at least about 10^-3 seconds to transmit a signal. So we have many, slow neurons. Yet we recognize our grandmother in about 0.1 seconds. Computers have much faster switching times: around 10^-10 seconds. Conclusion: brains compute in parallel! In fact, neurons are unreliable/noisy as well. But since things are encoded redundantly by many of them, their population can do computation reliably and fast.

Classification & Regression Neural nets are a parameterized function Y=f(X;W) from inputs (X) to outputs (Y). If Y is continuous we do regression; if Y is discrete, classification. We adapt the weights so as to minimize the error between the data and the model predictions. This is just a perceptron with a quadratic cost function.
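
The quadratic cost can be written out explicitly. A minimal sketch, assuming a single linear unit f(x; W, b) = W·x + b and the conventional factor of 1/2, neither of which is spelled out in the transcript:

    import numpy as np

    def quadratic_cost(W, b, X, Y):
        """Sum-of-squares error of a linear unit over a dataset (X, Y);
        this is the quantity minimized by adjusting the weights."""
        predictions = X @ W + b                 # one prediction per data point
        return 0.5 * np.sum((Y - predictions) ** 2)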

Optimization We use stochastic gradient descent: pick a single data point, compute the contribution of that data point to the overall gradient, and update the weights.
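
A sketch of one such update for the single linear unit with quadratic cost; the learning rate eta and its default value are illustrative assumptions.

    import numpy as np

    def sgd_step(W, b, x, y, eta=0.01):
        """One stochastic gradient descent step on the quadratic cost,
        driven by a single data point (x, y)."""
        error = y - (W @ x + b)      # residual for this one example
        W = W + eta * error * x      # -dE/dW for E = 0.5 * error**2
        b = b + eta * error          # -dE/db
        return W, b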

Stochastic Gradient Descent [Figure: trajectories of stochastic updates vs. full updates (averaged over all data items).] Stochastic gradient descent does not converge to the minimum, but "dances" around it. To get to the minimum, one needs to decrease the step-size as one gets closer to the minimum. Alternatively, one can obtain a few samples and average predictions over them (similar to bagging).
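
One common way to shrink the step size over time is a schedule like the one below; the specific 1/(1 + t/tau) form is a standard choice we assume, not something stated on the slide.

    def step_size(eta0, t, tau=100.0):
        """Decaying learning rate: starts at eta0 and shrinks as the
        iteration counter t grows, so SGD settles instead of dancing."""
        return eta0 / (1.0 + t / tau)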

Multi-Layer Nets A single layer can only compute linear functions. If we want to learn non-linear decision surfaces, or non-linear regression curves, we need more than one layer. In fact, a NN with one hidden layer can approximate any boolean function and, given enough hidden units, any continuous function. [Diagram: network with input x, hidden layers h1 and h2, output y, and parameters (W1,b1), (W2,b2), (W3,b3).]
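
A minimal forward pass for the two-hidden-layer network in the diagram, assuming sigmoid hidden units and a linear output; both choices are assumptions made for the sake of a runnable example.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, params):
        """Forward pass x -> h1 -> h2 -> y with parameters
        (W1,b1), (W2,b2), (W3,b3)."""
        W1, b1, W2, b2, W3, b3 = params
        h1 = sigmoid(W1 @ x + b1)
        h2 = sigmoid(W2 @ h1 + b2)
        y  = W3 @ h2 + b3            # linear output, e.g. for regression
        return y, (h1, h2)

    # Example: 2 inputs, hidden layers of size 3 and 2, one output.
    rng = np.random.default_rng(0)
    params = (rng.normal(size=(3, 2)), np.zeros(3),
              rng.normal(size=(2, 3)), np.zeros(2),
              rng.normal(size=(1, 2)), np.zeros(1))
    y, _ = forward(np.array([0.5, -1.0]), params)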

Back-propagation How do we learn the weights of a multi-layer network? Answer: stochastic gradient descent. But now the gradients are harder!

Back Propagation [Diagrams: the same network shown twice, once for the upward (forward) pass and once for the downward (backward) pass.]

Back Propagation [Diagram: the network with input x, hidden layers h1 and h2, output y, and parameters (W1,b1), (W2,b2), (W3,b3).]
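
Putting the upward and downward passes together, here is a sketch of one back-propagation update for the network above, under the same assumptions as before (sigmoid hidden layers, linear output, quadratic cost); the variable names are ours, not the slides'.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, params, eta=0.01):
        """One stochastic gradient step for the net x -> h1 -> h2 -> y
        with cost 0.5*||t - y||^2 for target t."""
        W1, b1, W2, b2, W3, b3 = params

        # Upward (forward) pass.
        h1 = sigmoid(W1 @ x + b1)
        h2 = sigmoid(W2 @ h1 + b2)
        y  = W3 @ h2 + b3

        # Downward (backward) pass: propagate errors layer by layer.
        d3 = y - t                             # dE/dy for the quadratic cost
        d2 = (W3.T @ d3) * h2 * (1.0 - h2)     # error at h2 (sigmoid derivative)
        d1 = (W2.T @ d2) * h1 * (1.0 - h1)     # error at h1

        # Gradient descent updates for all weights and biases.
        W3 -= eta * np.outer(d3, h2); b3 -= eta * d3
        W2 -= eta * np.outer(d2, h1); b2 -= eta * d2
        W1 -= eta * np.outer(d1, x);  b1 -= eta * d1
        return (W1, b1, W2, b2, W3, b3)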

ALVINN This hidden unit detects a mildly left-sloping road and advises steering left. What would another hidden unit look like? Learning to drive a car.

Weight Decay NN can also overfit (of course). We can try to avoid this by initializing all weight/bias terms to very small random values and letting them grow during learning. One can then check performance on a validation set and stop early. Or one can change the update rule to discourage large weights. The strength of this penalty now needs to be set using cross-validation. This is called "weight decay" in NN jargon.
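
The modified update rule amounts to pulling every weight slightly toward zero on each step. A sketch, where the decay constant lam is the quantity to be set by cross-validation:

    def weight_decay_step(W, grad_W, eta=0.01, lam=1e-4):
        """Gradient step with weight decay: besides following the error
        gradient grad_W, every weight is shrunk a little toward zero."""
        return W - eta * (grad_W + lam * W)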

Momentum At the beginning of learning it is likely that the weights are changed in a consistent direction. Like a ball rolling down a hill, we should gain speed if we keep making consistent changes. It acts like an adaptive step-size. This idea is easily implemented by adding a fraction of the previous update to the current one (and similarly for the biases).
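
This can be written as keeping a "velocity" that accumulates past gradients; the coefficient mu = 0.9 below is a typical value we assume, not one given on the slide.

    def momentum_step(W, grad_W, velocity, eta=0.01, mu=0.9):
        """Momentum update: the applied change is a decaying sum of past
        gradients, so consistent gradients build up speed like a rolling ball."""
        velocity = mu * velocity - eta * grad_W
        return W + velocity, velocity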

Conclusion NN are a flexible way to model input/output functions. They are robust against noisy data. The results are hard to interpret (unlike decision trees). Learning is fast on large datasets when using stochastic gradient descent plus momentum. Local minima in the optimization are a problem. Overfitting can be avoided using weight decay or early stopping. There are also NNs that feed information back (recurrent NNs). Many more interesting NNs exist: Boltzmann machines, self-organizing maps, ...