ECE 5424: Introduction to Machine Learning

Slides:



Advertisements
Similar presentations
Introduction to Support Vector Machines (SVM)
Advertisements

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Support Vector Machines
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Lecture 14 – Neural Networks
Support Vector Machines (and Kernel Methods in general)
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –SVM –Multi-class SVMs –Neural Networks –Multi-layer Perceptron Readings:
Artificial Intelligence Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)
CS 4700: Foundations of Artificial Intelligence
Classification III Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –Neural Networks –Backprop –Modular Design.
An Introduction to Support Vector Machine (SVM) Presenter : Ahey Date : 2007/07/20 The slides are based on lecture notes of Prof. 林智仁 and Daniel Yeung.
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
An Introduction to Support Vector Machines (M. Law)
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Nearest Neighbour Readings: Barber 14 (kNN)
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Classification: Logistic Regression –NB & LR connections Readings: Barber.
CSSE463: Image Recognition Day 14 Lab due Weds, 3:25. Lab due Weds, 3:25. My solutions assume that you don't threshold the shapes.ppt image. My solutions.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Regression Readings: Barber 17.1, 17.2.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Decision/Classification Trees Readings: Murphy ; Hastie 9.2.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Model selection –Error decomposition –Bias-Variance Tradeoff –Classification:
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Neural networks and support vector machines
CSSE463: Image Recognition Day 14
PREDICT 422: Practical Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
Other Classification Models: Neural Network
Support Vector Machines and Kernels
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
An Introduction to Support Vector Machines
An Introduction to Support Vector Machines
ECE 5424: Introduction to Machine Learning
CS621: Artificial Intelligence
Support Vector Machines Introduction to Data Mining, 2nd Edition by
Machine Learning Today: Reading: Maria Florina Balcan
Support Vector Machines
CS 2750: Machine Learning Support Vector Machines
ECE 5424: Introduction to Machine Learning
CSSE463: Image Recognition Day 14
Neural Network - 2 Mayank Vatsa
CSSE463: Image Recognition Day 14
Neural Networks and Deep Learning
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 14
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 14
Support Vector Machines and Kernels
Support Vector Machine I
CSSE463: Image Recognition Day 15
Neural Networks II Chen Gao Virginia Tech ECE-5424G / CS-5824
Introduction to Radial Basis Function Networks
CSSE463: Image Recognition Day 14
CSSE463: Image Recognition Day 15
COSC 4368 Machine Learning Organization
Machine Learning Support Vector Machine Supervised Learning
Neural Networks II Chen Gao Virginia Tech ECE-5424G / CS-5824
Linear Discrimination
Prof. Pushpak Bhattacharyya, IIT Bombay
Support Vector Machines
Outline Announcement Neural networks Perceptrons - continued
Presentation transcript:

ECE 5424: Introduction to Machine Learning Topics: SVM Multi-class SVMs Neural Networks Multi-layer Perceptron Readings: Barber 17.5, Murphy 16.5 Stefan Lee Virginia Tech

HW2 Graded Mean 63/61 = 103% Max: 76 Min: 20 (C) Dhruv Batra

Administrativia HW3 Due: Nov 7th 11:55PM You will implement primal & dual SVMs Kaggle competition: Higgs Boson Signal vs Background classification (C) Dhruv Batra

Administrativia (C) Dhruv Batra

Recap of Last Time (C) Dhruv Batra

Linear classifiers – Which line is better? w.x = j w(j) x(j)

Dual SVM derivation (1) – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM derivation (1) – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM formulation – the linearly separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM formulation – the non-separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM formulation – the non-separable case (C) Dhruv Batra Slide Credit: Carlos Guestrin

Why did we learn about the dual SVM? Builds character! Exposes structure about the problem There are some quadratic programming algorithms that can solve the dual faster than the primal The “kernel trick”!!! (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual SVM interpretation: Sparsity w.x + b = +1 w.x + b = -1 w.x + b = 0 margin 2 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Dual formulation only depends on dot-products, not on w! (C) Dhruv Batra

Common kernels Polynomials of degree d Polynomials of degree up to d Gaussian kernel / Radial Basis Function Sigmoid 2 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Plan for Today SVMs Multi-class Neural Networks (C) Dhruv Batra

What about multiple classes? (C) Dhruv Batra Slide Credit: Carlos Guestrin

One against All (Rest) Learn N classifiers: y2 Not y2 y1 Not y1 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Learn N-choose-2 classifiers: One against One Learn N-choose-2 classifiers: y2 y1 y1 y3 y2 y3 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Problems (C) Dhruv Batra Image Credit: Kevin Murphy

Learn 1 classifier: Multiclass SVM Simultaneously learn 3 sets of weights (C) Dhruv Batra Slide Credit: Carlos Guestrin

Learn 1 classifier: Multiclass SVM (C) Dhruv Batra Slide Credit: Carlos Guestrin

Addressing non-linearly separable data – Option 1, non-linear features Choose non-linear features, e.g., Typical linear features: w0 + i wi xi Example of non-linear features: Degree 2 polynomials, w0 + i wi xi + ij wij xi xj Classifier hw(x) still linear in parameters w As easy to learn Data is linearly separable in higher dimensional spaces Express via kernels (C) Dhruv Batra Slide Credit: Carlos Guestrin

Addressing non-linearly separable data – Option 2, non-linear classifier Choose a classifier hw(x) that is non-linear in parameters w, e.g., Decision trees, neural networks,… More general than linear classifiers But, can often be harder to learn (non-convex optimization required) Often very useful (outperforms linear classifiers) In a way, both ideas are related (C) Dhruv Batra Slide Credit: Carlos Guestrin

New Topic: Neural Networks (C) Dhruv Batra

Synonyms Neural Networks Artificial Neural Network (ANN) Feed-forward Networks Multilayer Perceptrons (MLP) Types of ANN Convolutional Nets Autoencoders Recurrent Neural Nets [Back with a new name]: Deep Nets / Deep Learning (C) Dhruv Batra

Biological Neuron (C) Dhruv Batra

Artificial “Neuron” Perceptron (with step function) Logistic Regression (with sigmoid) (C) Dhruv Batra

Sigmoid w0=2, w1=1 w0=0, w1=1 w0=0, w1=0.5 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Many possible response functions Linear Sigmoid Exponential Gaussian …

Limitation A single “neuron” is still a linear decision boundary What to do? (C) Dhruv Batra

(C) Dhruv Batra

Limitation A single “neuron” is still a linear decision boundary What to do? Idea: Stack a bunch of them together! (C) Dhruv Batra

Hidden layer 1-hidden layer feed-forward network: On board (C) Dhruv Batra

Neural Nets Best performers on OCR NetTalk Rick Rashid speaks Mandarin http://yann.lecun.com/exdb/lenet/index.html NetTalk Text to Speech system from 1987 http://youtu.be/tXMaFhO6dIY?t=45m15s Rick Rashid speaks Mandarin http://youtu.be/Nu-nlQqFCKg?t=7m30s (C) Dhruv Batra

Universal Function Approximators Theorem 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi ’89] (C) Dhruv Batra

Neural Networks Demo http://playground.tensorflow.org/ (C) Dhruv Batra