ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
–Supervised Learning
–General Setup, learning from data
–Nearest Neighbour
Readings: Barber 14 (kNN)

Administrativia
New classroom
–GBJ 102 (more space)
–Force-adds approved
Scholar
–Anybody not have access?
–Still having problems reading/submitting? Resolve ASAP.
–Please post questions on the Scholar forum.
–Please check the Scholar forums; you might not know you have a doubt.

Administrativia
Reading/Material/Pointers
–Slides on Scholar
–Scanned handwritten notes on Scholar
–Readings/video pointers on the public website

Administrativia
Computer Vision & Machine Learning Reading Group
–Meets: Fridays 5-6pm, Whittemore 654
–Reading CV/ML conference papers

Plan for today
Supervised/Inductive Learning
–Setup
–Goal: Classification, Regression
–Procedural View
–Statistical Estimation View
–Loss functions
Your first classifier: k-Nearest Neighbour

Types of Learning
Supervised learning
–Training data includes desired outputs
Unsupervised learning
–Training data does not include desired outputs
Weakly or semi-supervised learning
–Training data includes a few desired outputs
Reinforcement learning
–Rewards from a sequence of actions

Supervised / Inductive Learning
Given
–examples of a function (x, f(x))
Predict function f(x) for new examples x
–Discrete f(x): Classification
–Continuous f(x): Regression
–f(x) = Probability(x): Probability estimation

Slide Credit: Pedro Domingos, Tom Mitchell, Tom Dietterich

Supervised Learning
Input: x (images, text, emails…)
Output: y (spam or non-spam…)
(Unknown) Target Function
–f: X → Y (the "true" mapping / reality)
Data
–(x_1, y_1), (x_2, y_2), …, (x_N, y_N)
Model / Hypothesis Class
–g: X → Y
–y = g(x) = sign(w^T x)
Learning = Search in hypothesis space
–Find the best g in the model class.
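To make the hypothesis-class idea concrete, here is a minimal sketch in Python (NumPy assumed); the inputs and the weight vector w are made up for illustration, not taken from the slides:

import numpy as np

# Toy 2-D inputs and one candidate hypothesis w from the linear class
# g(x) = sign(w^T x); the numbers are illustrative only.
X = np.array([[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]])
w = np.array([0.7, -0.2])

def g(x, w):
    # Predict a label in {-1, +1} for a single example x
    return np.sign(w @ x)

print([g(x, w) for x in X])
# Learning = searching this space of w's for the one with lowest loss.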

Slide Credit: Yaser Abu-Mostafa

Basic Steps of Supervised Learning
Set up a supervised learning problem:
Data collection
–Start with training data for which we know the correct outcome, provided by a teacher or oracle.
Representation
–Choose how to represent the data.
Modeling
–Choose a hypothesis class: H = {g: X → Y}
Learning/Estimation
–Find the best hypothesis you can in the chosen class.
Model Selection
–Try different models. Pick the best one. (More on this later.)
If happy, stop; else refine one or more of the above.
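As a rough end-to-end illustration of these steps, here is a hedged sketch using scikit-learn (assumed installed); the dataset and the choice of hypothesis class are placeholders, not the course's prescribed setup:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                # data collection + representation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)         # hold out data for evaluation

model = KNeighborsClassifier(n_neighbors=3)      # modeling: pick a hypothesis class
model.fit(X_train, y_train)                      # learning/estimation
print(model.score(X_test, y_test))               # if happy stop, else refine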

Learning is hard!
No assumptions = No learning

Klingon vs Mlingon Classification
Training Data
–Klingon: klix, kour, koop
–Mlingon: moo, maa, mou
Testing Data: kap
Which language? Why?
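The point of this toy problem is that some inductive bias is unavoidable. A hedged sketch: if we assume, purely for illustration, that the first letter of a word determines its language, a trivial classifier follows:

# Training words from the slide; the "first letter matters" rule is an
# assumed inductive bias, not something the data alone can justify.
train = {"klix": "Klingon", "kour": "Klingon", "koop": "Klingon",
         "moo": "Mlingon", "maa": "Mlingon", "mou": "Mlingon"}

def classify(word):
    # Vote among training words sharing the test word's first letter
    votes = [lang for w, lang in train.items() if w[0] == word[0]]
    return max(set(votes), key=votes.count) if votes else "unknown"

print(classify("kap"))  # -> Klingon, but only under this assumption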

Loss/Error Functions
How do we measure performance?
Regression:
–L2 error
Classification:
–#misclassifications
–Weighted misclassification via a cost matrix
–For 2-class classification: True Positive, False Positive, True Negative, False Negative
–For k-class classification: Confusion Matrix
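A minimal sketch of these losses on made-up arrays (NumPy assumed):

import numpy as np

y_true = np.array([1, 0, 1, 1])          # toy classification labels
y_pred = np.array([1, 1, 1, 0])
r_true = np.array([2.0, 0.5, 1.0])       # toy regression targets
r_pred = np.array([1.5, 0.0, 1.2])

l2_error = np.mean((r_true - r_pred) ** 2)   # regression: squared (L2) error
errors = np.sum(y_true != y_pred)            # classification: #misclassifications

# 2-class confusion-matrix entries
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
tn = np.sum((y_pred == 0) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
print(l2_error, errors, tp, fp, tn, fn)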

Training vs Testing
What do we want?
–Good performance (low loss) on training data? No, good performance on unseen test data!
Training Data
–{(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}
–Given to us for learning f
Testing Data
–{x_1, x_2, …, x_M}
–Used to see if we have learnt anything

Procedural View
Training Stage:
–Raw Data → x (Feature Extraction)
–Training Data {(x, y)} → f (Learning)
Testing Stage:
–Raw Data → x (Feature Extraction)
–Test Data x → f(x) (Apply function, evaluate error)

Statistical Estimation View
Probabilities to the rescue:
–x and y are random variables
–D = (x_1, y_1), (x_2, y_2), …, (x_N, y_N) ~ P(X, Y)
IID: Independent and Identically Distributed
–Both training & testing data sampled IID from P(X, Y)
–Learn on the training set
–Have some hope of generalizing to the test set
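A toy picture of the IID assumption (the joint distribution below is invented for illustration): train and test sets are separate draws from the same P(X, Y):

import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # A made-up P(X, Y): X ~ N(0, 1), Y depends on X plus noise
    x = rng.normal(size=n)
    y = (x + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, y

x_train, y_train = sample(100)   # learn on this draw...
x_test, y_test = sample(50)      # ...and hope to generalize to this one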

Concepts
Capacity
–Measures how large the hypothesis class H is.
–Are all functions allowed?
Overfitting
–f works well on training data
–Works poorly on test data
Generalization
–The ability to achieve low error on new test data
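A small sketch of capacity and overfitting on toy data (NumPy assumed): fitting polynomials of increasing degree drives training error down while test error eventually blows up:

import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):                     # increasing capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)       # degree 9 nails train, fails test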

Guarantees
20 years of research in Learning Theory, oversimplified:
If you have:
–Enough training data D
–and H is not too complex
–then we can probably generalize to unseen test data
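One classical way to make this precise (a sketch of the standard finite-hypothesis-class bound, not stated on the slide): for 0/1 loss, with probability at least $1 - \delta$ over the draw of $N$ IID training examples, every $g \in H$ satisfies

$$\mathrm{err}_{\mathrm{test}}(g) \;\le\; \mathrm{err}_{\mathrm{train}}(g) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2N}}$$

so more data (larger N) and a simpler class (smaller |H|) tighten the guarantee.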

New Topic: Nearest Neighbours (Image Credit: Wikipedia)

Synonyms
–Nearest Neighbours
–k-Nearest Neighbours
Member of the following families:
–Instance-based Learning
–Memory-based Learning
–Exemplar methods
–Non-parametric methods

Nearest Neighbour is an example of… Instance-based learning
–A long-standing idea: to make a prediction, search the database for similar datapoints, and fit with the local points.
–Assumption: nearby points behave similarly with respect to y
–Stored database: (x_1, y_1), (x_2, y_2), (x_3, y_3), …, (x_n, y_n)

Instance/Memory-based Learning
Four things make a memory-based learner:
–A distance metric
–How many nearby neighbours to look at?
–A weighting function (optional)
–How to fit with the local points?
Slide Credit: Carlos Guestrin

1-Nearest Neighbour
Four things make a memory-based learner:
–A distance metric: Euclidean (and others)
–How many nearby neighbours to look at? 1
–A weighting function (optional): unused
–How to fit with the local points? Just predict the same output as the nearest neighbour.
Slide Credit: Carlos Guestrin
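A minimal 1-NN sketch under exactly these choices (Euclidean distance, k = 1, no weighting); the stored points are toy data:

import numpy as np

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # stored inputs
y_train = np.array([0, 1, 1])                             # stored outputs

def predict_1nn(x):
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    return y_train[np.argmin(dists)]             # copy the nearest point's output

print(predict_1nn(np.array([0.9, 1.2])))  # -> 1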

k-Nearest Neighbour
Four things make a memory-based learner:
–A distance metric: Euclidean (and others)
–How many nearby neighbours to look at? k
–A weighting function (optional): unused
–How to fit with the local points? Just predict the average output among the k nearest neighbours.
Slide Credit: Carlos Guestrin
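Extending the 1-NN sketch above to general k (again on toy data): sort by distance, keep the k closest, and average their outputs; for binary 0/1 labels the mean acts as a majority vote once thresholded at 0.5:

import numpy as np

def predict_knn(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]      # indices of the k closest stored points
    return y_train[nearest].mean()       # average output among the neighbours

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])
print(predict_knn(np.array([1.4]), X_train, y_train, k=3))  # mean of 3 nearest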

1 vs k Nearest Neighbour (figures; Image Credit: Ying Wu)

Nearest Neighbour Demos
–Demo 1: http://cgm.cs.mcgill.ca/~soss/cs644/projects/perrier/Nearest.html
–Demo 2:

Spring 2013 Projects
Gender classification from body proportions
–Igor Janjic & Daniel Friedman, Juniors

Scene Completion [Hays & Efros, SIGGRAPH 2007]

Hays and Efros, SIGGRAPH 2007

… 200 total (Hays and Efros, SIGGRAPH 2007)

Context Matching (Hays and Efros, SIGGRAPH 2007)

Graph cut + Poisson blending (Hays and Efros, SIGGRAPH 2007)