ICS 273A UC Irvine Instructor: Max Welling Neural Networks.


ICS 273A UC Irvine Instructor: Max Welling Neural Networks

Neurons Neurons communicate by receiving signals on their dendrites, adding these signals, and firing off a new signal along the axon if the total input exceeds a threshold. The axon connects to new dendrites through synapses, which can learn how much signal is transmitted. McCulloch and Pitts (1943) built the first abstract model of a neuron. [Diagram: inputs with weights, a constant input 1 weighted by the bias b, an activation function, and the output.]
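
As a concrete illustration, here is a minimal sketch in Python of the abstract neuron described above: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and the example numbers are our own choices, not part of the original slides.

    import numpy as np

    def neuron(x, w, b):
        """One abstract neuron: weighted sum of inputs plus bias,
        pushed through a sigmoid (a smooth version of a hard threshold)."""
        a = np.dot(w, x) + b                # total input to the unit
        return 1.0 / (1.0 + np.exp(-a))     # activation in (0, 1)

    # Example: two inputs with hand-picked weights and bias.
    x = np.array([0.5, -1.0])
    w = np.array([2.0, 1.0])
    b = -0.5
    print(neuron(x, w, b))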

Neurons We have roughly 10^11 neurons, each one connected to roughly 10^4 other neurons on average. Each neuron needs at least about 10^-3 seconds to transmit a signal. So we have many, slow neurons. Yet we recognize our grandmother in about 0.1 seconds. Computers have much faster switching times: around 10^-10 seconds. Conclusion: brains compute in parallel! In fact, neurons are unreliable/noisy as well. But since things are encoded redundantly by many of them, their population can do computation reliably and fast.

Classification & Regression Neural nets are a parameterized function Y=f(X;W) from inputs (X) to outputs (Y). If Y is continuous we do regression; if Y is discrete, classification. We adapt the weights so as to minimize the error between the data and the model predictions. This is just a perceptron with a quadratic cost function.
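
The quadratic cost can be written out explicitly. A minimal sketch, assuming a single linear unit f(x; W, b) = W·x + b and the conventional factor of 1/2, neither of which is spelled out in the transcript:

    import numpy as np

    def quadratic_cost(W, b, X, Y):
        """Sum-of-squares error of a linear unit over a dataset (X, Y);
        this is the quantity minimized by adjusting the weights."""
        predictions = X @ W + b                 # one prediction per data point
        return 0.5 * np.sum((Y - predictions) ** 2)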

Optimization We use stochastic gradient descent: pick a single data point, compute the contribution of that data point to the overall gradient, and update the weights.
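
A sketch of one such update for the single linear unit with quadratic cost; the learning rate eta and its default value are illustrative assumptions.

    import numpy as np

    def sgd_step(W, b, x, y, eta=0.01):
        """One stochastic gradient descent step on the quadratic cost,
        driven by a single data point (x, y)."""
        error = y - (W @ x + b)      # residual for this one example
        W = W + eta * error * x      # -dE/dW for E = 0.5 * error**2
        b = b + eta * error          # -dE/db
        return W, b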

Stochastic Gradient Descent [Figure: trajectories of stochastic updates vs. full updates (averaged over all data items).] Stochastic gradient descent does not converge to the minimum, but "dances" around it. To get to the minimum, one needs to decrease the step-size as one gets closer to the minimum. Alternatively, one can obtain a few samples and average predictions over them (similar to bagging).
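
One common way to shrink the step size over time is a schedule like the one below; the specific 1/(1 + t/tau) form is a standard choice we assume, not something stated on the slide.

    def step_size(eta0, t, tau=100.0):
        """Decaying learning rate: starts at eta0 and shrinks as the
        iteration counter t grows, so SGD settles instead of dancing."""
        return eta0 / (1.0 + t / tau)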

Multi-Layer Nets A single layer can only compute linear functions. If we want to learn non-linear decision surfaces, or non-linear regression curves, we need more than one layer. In fact, a NN with one hidden layer can approximate any boolean function and, given enough hidden units, any continuous function. [Diagram: network with input x, hidden layers h1 and h2, output y, and parameters (W1,b1), (W2,b2), (W3,b3).]
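
A minimal forward pass for the two-hidden-layer network in the diagram, assuming sigmoid hidden units and a linear output; both choices are assumptions made for the sake of a runnable example.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, params):
        """Forward pass x -> h1 -> h2 -> y with parameters
        (W1,b1), (W2,b2), (W3,b3)."""
        W1, b1, W2, b2, W3, b3 = params
        h1 = sigmoid(W1 @ x + b1)
        h2 = sigmoid(W2 @ h1 + b2)
        y  = W3 @ h2 + b3            # linear output, e.g. for regression
        return y, (h1, h2)

    # Example: 2 inputs, hidden layers of size 3 and 2, one output.
    rng = np.random.default_rng(0)
    params = (rng.normal(size=(3, 2)), np.zeros(3),
              rng.normal(size=(2, 3)), np.zeros(2),
              rng.normal(size=(1, 2)), np.zeros(1))
    y, _ = forward(np.array([0.5, -1.0]), params)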

Back-propagation How do we learn the weights of a multi-layer network? Answer: stochastic gradient descent. But now the gradients are harder!

Back Propagation [Diagrams: the same network shown twice, once for the upward (forward) pass and once for the downward (backward) pass.]

Back Propagation [Diagram: the network with input x, hidden layers h1 and h2, output y, and parameters (W1,b1), (W2,b2), (W3,b3).]
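
Putting the upward and downward passes together, here is a sketch of one back-propagation update for the network above, under the same assumptions as before (sigmoid hidden layers, linear output, quadratic cost); the variable names are ours, not the slides'.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, t, params, eta=0.01):
        """One stochastic gradient step for the net x -> h1 -> h2 -> y
        with cost 0.5*||t - y||^2 for target t."""
        W1, b1, W2, b2, W3, b3 = params

        # Upward (forward) pass.
        h1 = sigmoid(W1 @ x + b1)
        h2 = sigmoid(W2 @ h1 + b2)
        y  = W3 @ h2 + b3

        # Downward (backward) pass: propagate errors layer by layer.
        d3 = y - t                             # dE/dy for the quadratic cost
        d2 = (W3.T @ d3) * h2 * (1.0 - h2)     # error at h2 (sigmoid derivative)
        d1 = (W2.T @ d2) * h1 * (1.0 - h1)     # error at h1

        # Gradient descent updates for all weights and biases.
        W3 -= eta * np.outer(d3, h2); b3 -= eta * d3
        W2 -= eta * np.outer(d2, h1); b2 -= eta * d2
        W1 -= eta * np.outer(d1, x);  b1 -= eta * d1
        return (W1, b1, W2, b2, W3, b3)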

ALVINN This hidden unit detects a mildly left-sloping road and advises steering left. What would another hidden unit look like? Learning to drive a car.

Weight Decay NN can also overfit (of course). We can try to avoid this by initializing all weight/bias terms to very small random values and letting them grow during learning. One can then check performance on a validation set and stop early. Or one can change the update rule to discourage large weights. The strength of this penalty now needs to be set using cross-validation. This is called "weight decay" in NN jargon.
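
The modified update rule amounts to pulling every weight slightly toward zero on each step. A sketch, where the decay constant lam is the quantity to be set by cross-validation:

    def weight_decay_step(W, grad_W, eta=0.01, lam=1e-4):
        """Gradient step with weight decay: besides following the error
        gradient grad_W, every weight is shrunk a little toward zero."""
        return W - eta * (grad_W + lam * W)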

Momentum At the beginning of learning it is likely that the weights are changed in a consistent direction. Like a ball rolling down a hill, we should gain speed if we keep making consistent changes. It acts like an adaptive step-size. This idea is easily implemented by adding a fraction of the previous update to the current one (and similarly for the biases).
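
This can be written as keeping a "velocity" that accumulates past gradients; the coefficient mu = 0.9 below is a typical value we assume, not one given on the slide.

    def momentum_step(W, grad_W, velocity, eta=0.01, mu=0.9):
        """Momentum update: the applied change is a decaying sum of past
        gradients, so consistent gradients build up speed like a rolling ball."""
        velocity = mu * velocity - eta * grad_W
        return W + velocity, velocity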

Conclusion NN are a flexible way to model input/output functions. They are robust against noisy data. The results are hard to interpret (unlike decision trees). Learning is fast on large datasets when using stochastic gradient descent plus momentum. Local minima in the optimization are a problem. Overfitting can be avoided using weight decay or early stopping. There are also NNs that feed information back (recurrent NNs). Many more interesting NNs exist: Boltzmann machines, self-organizing maps, ...