ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –SVM –Multi-class SVMs –Neural Networks –Multi-layer Perceptron Readings:

Slides:



Advertisements
Similar presentations
Introduction to Support Vector Machines (SVM)
Advertisements

ECG Signal processing (2)
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Classification / Regression Support Vector Machines

Crash Course on Machine Learning Part III
An Introduction of Support Vector Machine
Support Vector Machines
Machine learning continued Image source:
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
SVMs Reprised. Administrivia I’m out of town Mar 1-3 May have guest lecturer May cancel class Will let you know more when I do...
Supervised Learning Recap
Lecture 17: Supervised Learning Recap Machine Learning April 6, 2010.
Lecture 14 – Neural Networks
Support Vector Machines (and Kernel Methods in general)
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Artificial Intelligence Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)
Margins, support vectors, and linear programming Thanks to Terran Lane and S. Dreiseitl.
CS 4700: Foundations of Artificial Intelligence
SVMs Finalized. Where we are Last time Support vector machines in grungy detail The SVM objective function and QP Today Last details on SVMs Putting it.
Lecture 10: Support Vector Machines
Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.
SVMs, cont’d Intro to Bayesian learning. Quadratic programming Problems of the form Minimize: Subject to: are called “quadratic programming” problems.
An Introduction to Support Vector Machines Martin Law.
Classification III Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Machine Learning Queens College Lecture 13: SVM Again.
This week: overview on pattern recognition (related to machine learning)
ECE 5984: Introduction to Machine Learning
CS 8751 ML & KDDSupport Vector Machines1 Support Vector Machines (SVMs) Learning mechanism based on linear programming Chooses a separating plane based.
Matlab Matlab Sigmoid Sigmoid Perceptron Perceptron Linear Linear Training Training Small, Round Blue-Cell Tumor Classification Example Small, Round Blue-Cell.
ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –Neural Networks –Backprop –Modular Design.
An Introduction to Support Vector Machine (SVM) Presenter : Ahey Date : 2007/07/20 The slides are based on lecture notes of Prof. 林智仁 and Daniel Yeung.
Machine Learning Using Support Vector Machines (Paper Review) Presented to: Prof. Dr. Mohamed Batouche Prepared By: Asma B. Al-Saleh Amani A. Al-Ajlan.
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
An Introduction to Support Vector Machines (M. Law)
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Nearest Neighbour Readings: Barber 14 (kNN)
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Classification: Logistic Regression –NB & LR connections Readings: Barber.
CSSE463: Image Recognition Day 14 Lab due Weds, 3:25. Lab due Weds, 3:25. My solutions assume that you don't threshold the shapes.ppt image. My solutions.
CS621 : Artificial Intelligence
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Regression Readings: Barber 17.1, 17.2.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16.
CSSE463: Image Recognition Day 15 Announcements: Announcements: Lab 5 posted, due Weds, Jan 13. Lab 5 posted, due Weds, Jan 13. Sunset detector posted,
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Decision/Classification Trees Readings: Murphy ; Hastie 9.2.
CSSE463: Image Recognition Day 14 Lab due Weds. Lab due Weds. These solutions assume that you don't threshold the shapes.ppt image: Shape1: elongation.
Support Vector Machines Optimization objective Machine Learning.
CSSE463: Image Recognition Day 15 Today: Today: Your feedback: Your feedback: Projects/labs reinforce theory; interesting examples, topics, presentation;
A Brief Introduction to Support Vector Machine (SVM) Most slides were from Prof. A. W. Moore, School of Computer Science, Carnegie Mellon University.
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Model selection –Error decomposition –Bias-Variance Tradeoff –Classification:
Neural networks and support vector machines
CSSE463: Image Recognition Day 14
PREDICT 422: Practical Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning
CSSE463: Image Recognition Day 14
CSSE463: Image Recognition Day 14
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 14
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 15
CSSE463: Image Recognition Day 15
Linear Discrimination
Presentation transcript:

ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –SVM –Multi-class SVMs –Neural Networks –Multi-layer Perceptron Readings: Barber 17.5, Murphy 16.5

HW2 Graded Mean 66/61 = 108% –Min: 47 –Max: 75 (C) Dhruv Batra2

Administrativia HW3 –Due: in 2 weeks –You will implement primal & dual SVMs –Kaggle competition: Higgs Boson Signal vs Background classification – learning-hw3https://inclass.kaggle.com/c/2015-Spring-vt-ece-machine- learning-hw3 – (C) Dhruv Batra3

Administrativia (C) Dhruv Batra4

Administrativia Project Mid-Sem Spotlight Presentations –Friday: 5-7pm, Whittemore 654 –5 slides (recommended) –4 minute time (STRICT) min Q&A –Tell the class what you’re working on –Any results yet? –Problems faced? –Upload slides on Scholar (C) Dhruv Batra5

Recap of Last Time (C) Dhruv Batra6

Linear classifiers – Which line is better? w.x =  j w (j) x (j) 7

Slide Credit: Carlos Guestrin8(C) Dhruv Batra Dual SVM derivation (1) – the linearly separable case

Slide Credit: Carlos Guestrin9(C) Dhruv Batra Dual SVM derivation (1) – the linearly separable case

Slide Credit: Carlos Guestrin10(C) Dhruv Batra Dual SVM formulation – the linearly separable case

Slide Credit: Carlos Guestrin11(C) Dhruv Batra Dual SVM formulation – the non-separable case

Slide Credit: Carlos Guestrin12(C) Dhruv Batra Dual SVM formulation – the non-separable case

Builds character! Exposes structure about the problem There are some quadratic programming algorithms that can solve the dual faster than the primal The “kernel trick”!!! Slide Credit: Carlos Guestrin13(C) Dhruv Batra Why did we learn about the dual SVM?

Dual SVM interpretation: Sparsity Slide Credit: Carlos Guestrin14(C) Dhruv Batra w.x + b = +1 w.x + b = -1 w.x + b = 0 margin 2 

15(C) Dhruv Batra Dual formulation only depends on dot-products, not on w!

Common kernels Polynomials of degree d Polynomials of degree up to d Gaussian kernel / Radial Basis Function Sigmoid Slide Credit: Carlos Guestrin16(C) Dhruv Batra 2

Plan for Today SVMs –Multi-class Neural Networks (C) Dhruv Batra17

What about multiple classes? Slide Credit: Carlos Guestrin18(C) Dhruv Batra

One against All (Rest) Learn N classifiers: Slide Credit: Carlos Guestrin19(C) Dhruv Batra y1 Not y1 y2 Not y2 Noty3 y3

One against One Learn N-choose-2 classifiers: Slide Credit: Carlos Guestrin20(C) Dhruv Batra y2 y1 y2 y3 y1 y3

Problems (C) Dhruv Batra21Image Credit: Kevin Murphy

Learn 1 classifier: Multiclass SVM Simultaneously learn 3 sets of weights Slide Credit: Carlos Guestrin22(C) Dhruv Batra

Learn 1 classifier: Multiclass SVM Slide Credit: Carlos Guestrin23(C) Dhruv Batra

Not linearly separable data Some datasets are not linearly separable! – M.htmlhttp:// M.html

Addressing non-linearly separable data – Option 1, non-linear features Slide Credit: Carlos Guestrin(C) Dhruv Batra25 Choose non-linear features, e.g., –Typical linear features: w 0 +  i w i x i –Example of non-linear features: Degree 2 polynomials, w 0 +  i w i x i +  ij w ij x i x j Classifier h w (x) still linear in parameters w –As easy to learn –Data is linearly separable in higher dimensional spaces –Express via kernels

Addressing non-linearly separable data – Option 2, non-linear classifier Slide Credit: Carlos Guestrin(C) Dhruv Batra26 Choose a classifier h w (x) that is non-linear in parameters w, e.g., –Decision trees, neural networks,… More general than linear classifiers But, can often be harder to learn (non-convex optimization required) Often very useful (outperforms linear classifiers) In a way, both ideas are related

New Topic: Neural Networks (C) Dhruv Batra27

Synonyms Neural Networks Artificial Neural Network (ANN) Feed-forward Networks Multilayer Perceptrons (MLP) Types of ANN –Convolutional Nets –Autoencoders –Recurrent Neural Nets [Back with a new name]: Deep Nets / Deep Learning (C) Dhruv Batra28

Biological Neuron (C) Dhruv Batra29

Artificial “Neuron” Perceptron (with step function) Logistic Regression (with sigmoid) (C) Dhruv Batra30

Sigmoid w 0 =0, w 1 =1w 0 =2, w 1 =1w 0 =0, w 1 =0.5 Slide Credit: Carlos Guestrin(C) Dhruv Batra31

Many possible response functions Linear Sigmoid Exponential Gaussian …

Limitation A single “neuron” is still a linear decision boundary What to do? (C) Dhruv Batra33

(C) Dhruv Batra34

Limitation A single “neuron” is still a linear decision boundary What to do? Idea: Stack a bunch of them together! (C) Dhruv Batra35

Hidden layer 1-hidden layer (or 3-layer network): –On board (C) Dhruv Batra36

Neural Nets Best performers on OCR – NetTalk –Text to Speech system from 1987 – Rick Rashid speaks Mandarin – (C) Dhruv Batra37

Universal Function Approximators Theorem –3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi ’89] (C) Dhruv Batra38

Neural Networks Demo – pprox.htmlhttp://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionA pprox.html (C) Dhruv Batra39