Neural Networks ICS 273A UC Irvine Instructor: Max Welling

Slides:



Advertisements
Similar presentations
Artificial Neural Networks
Advertisements

Slides from: Doug Gray, David Poole
NEURAL NETWORKS Backpropagation Algorithm
Learning in Neural and Belief Networks - Feed Forward Neural Network 2001 년 3 월 28 일 안순길.
Neural networks Introduction Fitting neural networks
1 Machine Learning: Lecture 4 Artificial Neural Networks (Based on Chapter 4 of Mitchell T.., Machine Learning, 1997)
Artificial Intelligence 13. Multi-Layer ANNs Course V231 Department of Computing Imperial College © Simon Colton.
Neural Network I Week 7 1. Team Homework Assignment #9 Read pp. 327 – 334 and the Week 7 slide. Design a neural network for XOR (Exclusive OR) Explore.
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Tuomas Sandholm Carnegie Mellon University Computer Science Department
Artificial Neural Networks
Classification Neural Networks 1
Rutgers CS440, Fall 2003 Neural networks Reading: Ch. 20, Sec. 5, AIMA 2 nd Ed.
Neural Networks Marco Loog.
Artificial Neural Networks Artificial Neural Networks are (among other things) another technique for supervised learning k-Nearest Neighbor Decision Tree.
Prénom Nom Document Analysis: Artificial Neural Networks Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Lecture 4 Neural Networks ICS 273A UC Irvine Instructor: Max Welling Read chapter 4.
Artificial Neural Networks
LOGO Classification III Lecturer: Dr. Bo Yuan
ICS 273A UC Irvine Instructor: Max Welling Neural Networks.
CS 484 – Artificial Intelligence
Neural Networks. Background - Neural Networks can be : Biological - Biological models Artificial - Artificial models - Desire to produce artificial systems.
Artificial Neural Networks
Computer Science and Engineering
Neural Networks Ellen Walker Hiram College. Connectionist Architectures Characterized by (Rich & Knight) –Large number of very simple neuron-like processing.
Machine Learning Chapter 4. Artificial Neural Networks
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 16: NEURAL NETWORKS Objectives: Feedforward.
Classification / Regression Neural Networks 2
LINEAR CLASSIFICATION. Biological inspirations  Some numbers…  The human brain contains about 10 billion nerve cells ( neurons )  Each neuron is connected.
Artificial Neural Networks. The Brain How do brains work? How do human brains differ from that of other animals? Can we base models of artificial intelligence.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 9: Ways of speeding up the learning and preventing overfitting Geoffrey Hinton.
Neural Networks and Backpropagation Sebastian Thrun , Fall 2000.
Back-Propagation Algorithm AN INTRODUCTION TO LEARNING INTERNAL REPRESENTATIONS BY ERROR PROPAGATION Presented by: Kunal Parmar UHID:
Neural Networks Presented by M. Abbasi Course lecturer: Dr.Tohidkhah.
Artificial Neural Network
1 Perceptron as one Type of Linear Discriminants IntroductionIntroduction Design of Primitive UnitsDesign of Primitive Units PerceptronsPerceptrons.
Artificial Intelligence CIS 342 The College of Saint Rose David Goldschmidt, Ph.D.
BACKPROPAGATION (CONTINUED) Hidden unit transfer function usually sigmoid (s-shaped), a smooth curve. Limits the output (activation) unit between 0..1.
Neural Networks The Elements of Statistical Learning, Chapter 12 Presented by Nick Rizzolo.
Learning with Neural Networks Artificial Intelligence CMSC February 19, 2002.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Machine Learning Supervised Learning Classification and Regression
Neural networks and support vector machines
Artificial Neural Networks
Learning with Perceptrons and Neural Networks
Advanced information retreival
Artificial Intelligence (CS 370D)
Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)
Artificial Neural Networks
with Daniel L. Silver, Ph.D. Christian Frey, BBA April 11-12, 2017
Artificial Neural Networks
CSE P573 Applications of Artificial Intelligence Neural Networks
Classification / Regression Neural Networks 2
Machine Learning Today: Reading: Maria Florina Balcan
ECE 471/571 - Lecture 17 Back Propagation.
Classification Neural Networks 1
Perceptron as one Type of Linear Discriminants
CSE 573 Introduction to Artificial Intelligence Neural Networks
Artificial Neural Networks
Neural Networks Geoff Hulten.
Lecture Notes for Chapter 4 Artificial Neural Networks
Machine Learning: Lecture 4
Machine Learning: UNIT-2 CHAPTER-1
Neural networks (1) Traditional multi-layer perceptrons
Chapter - 3 Single Layer Percetron
COSC 4335: Part2: Other Classification Techniques
Seminar on Machine Learning Rada Mihalcea
David Kauchak CS158 – Spring 2019
Introduction to Neural Networks
Outline Announcement Neural networks Perceptrons - continued
Presentation transcript:

Neural Networks ICS 273A UC Irvine Instructor: Max Welling Lecture 4 Neural Networks ICS 273A UC Irvine Instructor: Max Welling

Neurons Neurons communicate by receiving signals on their dendrites. Adding these signals and firing off a new signal along the axon if the total input exceeds a threshold. The axon connects to new dendrites through synapses which can learn how much signal is transmitted. McCulloch and Pitt (’43) built a first abstract model of a neuron. 1 b input output bias activation function weights

Neurons We have about neurons, each one connected to other neurons on average. Each neuron needs at least seconds to transmit the signal. So we have many, slow neurons. Yet we recognize our grandmother in sec. Computers have much faster switching times: sec. Conclusion: brains compute in parallel ! In fact, neurons are unreliable/noisy as well. But since things are encoded redundantly by many of them, their population can do computation reliably and fast.

Classification / Regression Neural nets are a parameterized function Y=f(X;W) from inputs (X) to outputs (Y). If Y is continuous: regression, if Y is discrete: classification. We adapt the weights so as to minimize the error between the data and the model predictions. Or, in other words: maximize the conditional probability of the data (output, given attributes) E.g. 2-class classification (y=0/1) looks familiar ?

Logistic Regression If we use the following model for f(x), we obtain logistic regression! This is called a “perceptron” in neural networks jargon. Perceptrons can only separate classes linearly.

Regression Probability of output given input attributes is Normally distributed. Error is negative log-probability = squared loss function:

Optimization We use stochastic gradient descent: pick a single data-item, compute the contribution of that data-point to the overall gradient and update the weights.

Stochastic Gradient Descent stochastic updates full updates (averaged over all data-items) Stochastic gradient descent does not converge to the minimum, but “dances” around it. To get to the minimum, one needs to decrease the step-size as one get closer to the minimum. Alternatively, one can obtain a few samples and average predictions over them (similar to bagging).

Multi-Layer Nets y W3,b3 h2 W2,b2 h1 W1,b1 x Single layers can only do linear things. If we want to learn non-linear decision surfaces, or non-linear regression curves, we need more than one layer. In fact, NN with 1 hidden layer can approximate any boolean and cont. functions

Back-propagation How do we learn the weights of a multi-layer network? Answer: Stochastic gradient descent. But now the gradients are harder! example: i W3,b3 x h1 h2 y W2,b2 W1,b1

Back Propagation y i y i W3,b3 W3,b3 h2 h2 W2,b2 W2,b2 h1 h1 W1,b1 x h1 h2 y W2,b2 W1,b1 i W3,b3 x h1 h2 y W2,b2 W1,b1 downward pass Upward pass

Back Propagation i W3,b3 x h1 h2 y W2,b2 W1,b1

ALVINN Learning to drive a car This hidden unit detects a mildly left sloping road and advices to steer left. How would another hidden unit look like?

Weight Decay NN can also overfit (of course). We can try to avoid this by initializing all weights/biases terms to very small random values and grow them during learning. One can now check performance on a validation set and stop early. Or one can change the update rule to discourage large weights: Now we need to set using X-validation. This is called “weight-decay” in NN jargon.

Momentum In the beginning of learning it is likely that the weights are changed in a consistent manner. Like a ball rolling down a hill, we should gain speed if we make consistent changes. It’s like an adaptive stepsize. This idea is easily implemented by changing the gradient as follows: (and similar to biases)

Conclusion NN are a flexible way to model input/output functions They can be given a probabilistic interpretation They are robust against noisy data Hard to interpret the results (unlike DTs) Learning is fast on large datasets when using stochastic gradient descent plus momentum. Overfitting can be avoided using weight decay or early stopping There are also NN which feed information back (recurrent NN) Many more interesting NNs: Boltzman machines, self-organizing maps,...