CS 188: Artificial Intelligence
Learning II: Linear Classification and Neural Networks
Instructors: Stuart Russell and Pat Virtue
University of California, Berkeley

Regression vs Classification
[Figure: two plots with x and y axes, one illustrating regression and one illustrating classification]

Threshold perceptron as linear classifier

Binary Decision Rule
- A threshold perceptron is a single unit that outputs
  y = h_w(x) = 1 when w·x ≥ 0
             = 0 when w·x < 0
- In the input vector space
  - Examples are points x
  - The equation w·x = 0 defines a hyperplane
  - One side corresponds to y=1, the other to y=0
[Figure: spam example with weights w_0 = -3, w_free = 4, w_money = 2; axes "free" and "money"; the line w·x = 0 separates y=1 (SPAM) from y=0 (HAM)]

Example
"Dear Stuart, I'm leaving Macrosoft to return to academia. The money is great here but I prefer to be free to do my own research, and I really love teaching undergrads! Do I need to finish my BA first before applying? Best wishes, Bill"
Weights: w_0 = -3, w_free = 4, w_money = 2
Features: x_0 = 1, x_free = 1, x_money = 1
w·x = -3·1 + 4·1 + 2·1 = 3 ≥ 0, so the message is classified y=1 (SPAM)
[Figure: decision boundary w·x = 0 in the free/money plane, with y=1 (SPAM) on one side and y=0 (HAM) on the other]
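A minimal Python sketch of this decision rule and computation (the dictionary representation and feature names are illustrative, not from the slides):

```python
# Threshold perceptron decision rule, reproducing the slide's spam example.
def h_w(w, x):
    """Return 1 (SPAM) if w.x >= 0, else 0 (HAM)."""
    score = sum(w[f] * x.get(f, 0) for f in w)
    return 1 if score >= 0 else 0

w = {"bias": -3, "free": 4, "money": 2}   # w_0, w_free, w_money
x = {"bias": 1, "free": 1, "money": 1}    # the message contains "free" and "money"
print(h_w(w, x))   # w.x = -3*1 + 4*1 + 2*1 = 3 >= 0, so output 1 (SPAM)
```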

Weight Updates

Perceptron learning rule
- If the true y ≠ h_w(x) (an error), adjust the weights
  - If w·x < 0 but the output should be y=1, this is called a false negative
    - Should increase weights on positive inputs
    - Should decrease weights on negative inputs
  - If w·x ≥ 0 but the output should be y=0, this is called a false positive
    - Should decrease weights on positive inputs
    - Should increase weights on negative inputs
- The perceptron learning rule does this:
  w ← w + α (y − h_w(x)) x
  where α is the learning rate and (y − h_w(x)) is +1, −1, or 0 (no error)

Example
"Dear Stuart, I wanted to let you know that I have decided to leave Macrosoft and return to academia. The money is great here but I prefer to be free to pursue more interesting research, and I really love teaching undergraduates! Do I need to finish my BA first before applying? Best wishes, Bill"
Weights: w_0 = -3, w_free = 4, w_money = 2; features: x_0 = 1, x_free = 1, x_money = 1
w·x = -3·1 + 4·1 + 2·1 = 3 ≥ 0, so the perceptron predicts y=1 (SPAM), but this message is actually y=0 (HAM)
Update with learning rate α = 0.5:
w ← w + α (y − h_w(x)) x = (-3, 4, 2) + 0.5 · (0 − 1) · (1, 1, 1) = (-3.5, 3.5, 1.5)
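A short sketch of this update step that reproduces the numbers above (illustrative only, not the course's reference code):

```python
# Perceptron weight update: w <- w + alpha * (y - h_w(x)) * x.
def h_w(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def perceptron_update(w, x, y, alpha):
    error = y - h_w(w, x)          # +1, -1, or 0 (no error)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]

w = [-3.0, 4.0, 2.0]               # (w_0, w_free, w_money)
x = [1.0, 1.0, 1.0]                # (x_0, x_free, x_money)
print(perceptron_update(w, x, y=0, alpha=0.5))   # [-3.5, 3.5, 1.5]
```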

Perceptron convergence theorem
- A learning problem is linearly separable iff there is some hyperplane exactly separating positive from negative examples
- Convergence: if the training data are separable, perceptron learning applied repeatedly to the training set will eventually converge to a perfect separator
[Figure: two scatter plots, one separable and one non-separable]

Example: Earthquakes vs. nuclear explosions (63 examples, 657 updates required)

Perceptron convergence theorem
- A learning problem is linearly separable iff there is some hyperplane exactly separating positive from negative examples
- Convergence: if the training data are separable, perceptron learning applied repeatedly to the training set will eventually converge to a perfect separator
- Convergence: if the training data are non-separable, perceptron learning will converge to a minimum-error solution provided the learning rate α is decayed appropriately (e.g., α = 1/t)
[Figure: two scatter plots, one separable and one non-separable]
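A sketch of the full training loop with the learning rate decayed as α = 1/t, run on a made-up toy dataset (the earthquake data and the 657-update count above come from the slides, not from this code):

```python
# Perceptron training with decaying learning rate alpha = 1/t.
def train_perceptron(data, n_features, epochs=100):
    """data: list of (x, y) pairs, where x includes the bias feature x_0 = 1."""
    w = [0.0] * n_features
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            alpha = 1.0 / t        # decaying learning rate
            error = y - (1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0)
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w

# Tiny separable toy set: label is 1 iff x_1 > x_2 (x_0 = 1 is the bias feature).
data = [([1, 2, 1], 1), ([1, 0, 3], 0), ([1, 3, 0], 1), ([1, 1, 2], 0)]
print(train_perceptron(data, n_features=3))
```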

Perceptron learning with fixed α

Perceptron learning with decaying α

Other Linear Classifiers
- Support Vector Machines (SVM)
  - Maximize the margin between the boundary and the nearest points
[Figure: two plots with x and y axes comparing decision boundaries]
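For comparison, a maximum-margin linear classifier can be fit with scikit-learn's LinearSVC; this sketch assumes scikit-learn is available and uses made-up toy data (the slides do not prescribe any particular library):

```python
# Fit a max-margin linear classifier on a tiny separable toy dataset.
from sklearn.svm import LinearSVC

X = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]   # toy 2-D points
y = [0, 1, 1, 0]                                        # toy labels
clf = LinearSVC(C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)      # learned weights and bias of the hyperplane
print(clf.predict([[1.5, 0.5]]))
```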

Neural Networks

Very Loose Inspiration: Human Neurons

Simple Model of a Neuron (McCulloch & Pitts, 1943)
- Inputs a_i come from the output of node i to this node j (or from "outside")
- Each input link has a weight w_i,j
- There is an additional fixed input a_0 with bias weight w_0,j
- The total input is in_j = Σ_i w_i,j a_i
- The output is a_j = g(in_j) = g(Σ_i w_i,j a_i) = g(w·a)
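A sketch of this neuron model as code, using a sigmoid for g (the choice of activation is discussed later; the weights are just the earlier spam-example values, reused for illustration):

```python
import math

# Simple neuron: in_j = sum_i w_ij * a_i, then a_j = g(in_j).
def neuron_output(weights, inputs, g=lambda z: 1.0 / (1.0 + math.exp(-z))):
    """weights and inputs are parallel lists; inputs[0] is the fixed bias input a_0 = 1."""
    in_j = sum(w * a for w, a in zip(weights, inputs))
    return g(in_j)

print(neuron_output([-3.0, 4.0, 2.0], [1.0, 1.0, 1.0]))   # sigmoid(3) ~= 0.95
```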

Single Neuron

Minimize Single Neuron Loss

Choice of Activation Function
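The slide text here does not list the candidates; commonly considered activations include the hard threshold, the sigmoid, and the ReLU, sketched below for illustration:

```python
import math

# A few common activation functions g(in_j) a unit might use.
def threshold(z):
    return 1.0 if z >= 0 else 0.0          # hard threshold (perceptron)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))      # smooth, differentiable, output in (0, 1)

def relu(z):
    return max(0.0, z)                     # rectified linear unit

for g in (threshold, sigmoid, relu):
    print(g.__name__, [round(g(z), 3) for z in (-2.0, 0.0, 2.0)])
```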

Multiclass Classification

softmax function
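Only the name appears in the slide text; as a reminder, softmax maps a vector of scores z to a probability distribution via softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A minimal, numerically stable sketch:

```python
import math

# Numerically stable softmax: subtracting max(z) leaves the result unchanged
# but avoids overflow in exp for large scores.
def softmax(z):
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))   # roughly [0.66, 0.24, 0.10]
```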

Multilayer Perceptrons  A multilayer perceptron (MLP) is a feedforward neural network with at least one hidden layer (nodes that are neither inputs nor outputs)  With enough hidden nodes, MLPs can represent (approximate to arbitrary accuracy) any continuous function
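A sketch of a forward pass through an MLP with one hidden layer, assuming sigmoid activations (the layer sizes and weights are made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, inputs):
    """One dense layer: each row of weights is [bias, w_1, ..., w_n] for one unit."""
    return [sigmoid(row[0] + sum(w * a for w, a in zip(row[1:], inputs)))
            for row in weights]

def mlp_forward(hidden_w, output_w, x):
    hidden = layer(hidden_w, x)       # hidden layer activations
    return layer(output_w, hidden)    # output layer activations

# 2 inputs -> 3 hidden units -> 1 output, with arbitrary illustrative weights
hidden_w = [[0.1, 0.5, -0.3], [-0.2, 0.8, 0.4], [0.0, -0.6, 0.9]]
output_w = [[0.05, 1.0, -1.0, 0.5]]
print(mlp_forward(hidden_w, output_w, [1.0, 2.0]))
```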

Neural Network Equations
[Figure: network diagram with labeled weights]

Minimize Neural Network Loss

Error Backpropagation
[Figure: network diagram with labeled activations]

Deep Learning: Convolutional Neural Networks
- LeNet5 (LeCun et al., 1998)
- Convnets for digit recognition
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE (1998).

Convolutional Neural Networks
- AlexNet (Alex Krizhevsky, Geoffrey Hinton, et al., 2012)
- Convnets for image classification
- More data & more compute power
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems (2012).

Deep Learning: GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." CVPR (2015).

Neural Nets
- Incredible success in the last three years
  - Data (ImageNet)
  - Compute power
- Optimization
- Activation functions (ReLU)
- Regularization
  - Reducing overfitting (dropout)
- Software packages
  - Caffe (UC Berkeley)
  - Theano (Université de Montréal)
  - Torch (Facebook, Yann LeCun)
  - TensorFlow (Google)

Practical Issues