CS 2750 (Machine Learning): Machine Learning Basics + Matlab Tutorial

CS 2750: Machine Learning. Machine Learning Basics + Matlab Tutorial. Prof. Adriana Kovashka, University of Pittsburgh, January 11, 2016.

Course Info. Website: http://people.cs.pitt.edu/~kovashka/cs2750. Instructor: Adriana Kovashka (kovashka@cs.pitt.edu); use "CS2750" at the beginning of your email subject line. Office: Sennott Square 5325. Office hours: Mon/Wed, 12:15pm-1:15pm. Textbook: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, plus other readings. Plan: plan + website (4 min), introductions of students (10 min), intro to visual recognition (12 min), topics (12 min), structure of course (14 min), prereqs test (15 min), questions + extra (8 min).

TA: Changsheng Liu. Office: Sennott Square 6805. Office hours: Wednesday 5-6pm and Thursday 12:30-2:30pm.

Should I take this class? It will be a lot of work, but you will learn a lot. Some parts will be hard and require that you pay close attention; I will have periodic ungraded pop quizzes to see how you're doing, and I will also pick on students randomly to answer questions. Use the instructor's and TA's office hours!

Should I take this class? Homework 1 should be fairly representative of the other homeworks; I think it will take you 20-25 hours. Projects will take at least three times as much work, so overall you will spend 10+ hours per week working on this class. The drop deadline is January 19.

Plan for Today: machine learning notation, tasks, and challenges; HW1 overview; Matlab tutorial.

ML in a Nutshell. y = f(x), where x is the input feature (e.g. an image feature), f is the prediction function, and y is the output. Training: given a training set of labeled examples {(x1,y1), ..., (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). Slide credit: L. Lazebnik
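
To make the training/testing recipe concrete, here is a minimal MATLAB sketch, assuming a toy 1-D regression problem (the data and the choice of a linear model are illustrative assumptions, not part of the slide):

% Toy 1-D training data (illustrative only)
x_train = (1:10)';                    % training inputs
y_train = 2*x_train + randn(10,1);    % noisy training labels
% Training: estimate f by minimizing squared prediction error on the training set
w = polyfit(x_train, y_train, 1);     % fit a line, f(x) = w(1)*x + w(2)
% Testing: apply f to a never-before-seen test example
x_test = 15;
y_pred = polyval(w, x_test);          % predicted value y = f(x_test)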

ML in a Nutshell. Apply a prediction function to a feature representation of the image to get the desired output: f([image of an apple]) = "apple", f([image of a tomato]) = "tomato", f([image of a cow]) = "cow". Slide credit: L. Lazebnik

ML in a Nutshell. [Pipeline diagram. Training: training images + training labels -> features -> training -> learned model. Testing: test image -> features -> learned model -> prediction.] Slide credit: D. Hoiem and L. Lazebnik

Training vs Testing vs Validation. What do we want? High accuracy on training data? No, high accuracy on unseen/new/test data! Why is this tricky? Training data: features (x) and labels (y) used to learn the mapping f. Test data: features used to make a prediction; labels only used to see how well we've learned f! Validation data: a held-out set of the training data; can use both features and labels to see how well we are doing.
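
A minimal MATLAB sketch of such a split, assuming a feature matrix X and label vector y (the synthetic data and the 60/20/20 proportions are illustrative assumptions):

% Synthetic data standing in for real features and labels
X = randn(100, 5);              % 100 examples, 5 features each
y = randi(2, 100, 1);           % labels in {1, 2}
N = size(X, 1);
idx = randperm(N);              % shuffle example indices
n_train = round(0.6 * N);       % 60% training
n_val   = round(0.2 * N);       % 20% validation (held out from the training data)
Xtrain = X(idx(1:n_train), :);               ytrain = y(idx(1:n_train));
Xval   = X(idx(n_train+1:n_train+n_val), :); yval   = y(idx(n_train+1:n_train+n_val));
Xtest  = X(idx(n_train+n_val+1:end), :);     ytest  = y(idx(n_train+n_val+1:end));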

Statistical Estimation View. Probabilities to the rescue: x and y are random variables, and D = (x1,y1), (x2,y2), ..., (xN,yN) ~ P(X,Y). IID: independent and identically distributed. Both training and testing data are sampled IID from P(X,Y), so if we learn on the training set, we have some hope of generalizing to the test set. Slide credit: Dhruv Batra

Example of Solving a ML Problem: spam vs. regular email. Slide credit: Dhruv Batra

Intuition. Spam emails: a lot of words like "money", "free", "bank account". Regular emails: word usage pattern is more spread out. Slide credit: Dhruv Batra, Fei Sha

Simple strategy: let's count! [Figure: per-email word counts form the features X; the spam / not-spam labels form Y.] Slide credit: Dhruv Batra, Fei Sha

… and weigh these counts (final step) Why these words? Where do the weights come from? Adapted from Dhruv Batra, Fei Sha
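
A minimal MATLAB sketch of the count-and-weigh idea (the word list, counts, weights, and threshold are all illustrative assumptions; in practice the weights are what the learning algorithm estimates):

% Counts of a few hand-picked words in one email (illustrative numbers)
words   = {'money', 'free', 'bank account', 'meeting'};
counts  = [3, 2, 1, 0];            % x: how often each word appears in the email
weights = [1.0, 0.8, 1.5, -0.7];   % importance of each word (assumed here, learned in practice)
score   = dot(weights, counts);    % weighted sum of the counts
is_spam = score > 2;               % classify by thresholding the score (threshold assumed)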

Klingon vs Mlingon Classification. Training data: Klingon: klix, kour, koop; Mlingon: moo, maa, mou. Testing data: kap. Which language? Why? Slide credit: Dhruv Batra
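
One possible way to formalize an answer in MATLAB (counting shared characters is an assumption about the intended reasoning, not stated on the slide):

% Toy character-overlap classifier for the Klingon/Mlingon example
klingon_chars = 'klixkourkoop';    % all characters seen in Klingon training words
mlingon_chars = 'moomaamou';       % all characters seen in Mlingon training words
test_word = 'kap';
score_k = sum(ismember(test_word, klingon_chars));   % characters of 'kap' seen in Klingon
score_m = sum(ismember(test_word, mlingon_chars));   % characters of 'kap' seen in Mlingon
if score_k > score_m, disp('Klingon'), else, disp('Mlingon'), end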

Why not just hand-code these weights? We're letting the data do the work rather than developing hand-coded classification rules; the machine is learning to program itself. But there are challenges…

“I saw her duck” Slide credit: Dhruv Batra, figure credit: Liang Huang

“I saw her duck” Slide credit: Dhruv Batra, figure credit: Liang Huang

“I saw her duck” Slide credit: Dhruv Batra, figure credit: Liang Huang

“I saw her duck with a telescope…” Slide credit: Dhruv Batra, figure credit: Liang Huang

What humans see Slide credit: Larry Zitnick

What computers see: a grid of raw pixel intensity values (239, 240, 225, 206, 185, ...). Slide credit: Larry Zitnick
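
To see this for yourself in MATLAB, you can print a small block of raw pixel values; 'photo.jpg' below is a hypothetical image file on your path:

% What the computer "sees": a matrix of intensity values
img = imread('photo.jpg');     % 'photo.jpg' is a placeholder file name
img = img(:, :, 1);            % keep a single channel if the image is RGB
img(1:5, 1:5)                  % display a 5x5 block of pixel intensities (0-255)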

Challenges. Some challenges: ambiguity and context; machines take data representations too literally; humans are much better than machines at generalization, which is needed since test data will rarely look exactly like the training data.

Challenges Why might it be hard to: Predict if a viewer will like a movie? Recognize cars in images? Translate between languages?

The Time is Ripe to Study ML Many basic effective and efficient algorithms available. Large amounts of on-line data available. Large amounts of computational resources available. Slide credit: Ray Mooney

Where does ML fit in? Slide credit: Dhruv Batra, Fei Sha

Defining the Learning Task: improve on task T, with respect to performance metric P, based on experience E. Example 1: T: playing checkers; P: percentage of games won against an arbitrary opponent; E: playing practice games against itself. Example 2: T: recognizing hand-written words; P: percentage of words correctly classified; E: a database of human-labeled images of handwritten words. Example 3: T: driving on four-lane highways using vision sensors; P: average distance traveled before a human-judged error; E: a sequence of images and steering commands recorded while observing a human driver. Example 4: T: categorizing email messages as spam or legitimate; P: percentage of email messages correctly classified; E: a database of emails, some with human-given labels. Slide credit: Ray Mooney

ML in a Nutshell. Every machine learning algorithm has: a representation / model class, an evaluation / objective function, and an optimization procedure. You also need: a way to represent your data. Adapted from Pedro Domingos

Representation / model class: decision trees, sets of rules / logic programs, instances, graphical models (Bayes/Markov nets), neural networks, support vector machines, model ensembles, etc. Slide credit: Pedro Domingos

Evaluation / objective function: accuracy, precision and recall, squared error, likelihood, posterior probability, cost / utility, margin, entropy, K-L divergence, etc. Slide credit: Pedro Domingos

Optimization: discrete / combinatorial optimization (e.g. graph algorithms) or continuous optimization (e.g. linear programming). Adapted from Dhruv Batra, image from Wikipedia

Data Representation Let’s brainstorm what our “X” should be for various “Y” prediction tasks…

Types of Learning. Supervised learning: training data includes desired outputs. Unsupervised learning: training data does not include desired outputs. Weakly or semi-supervised learning: training data includes a few desired outputs. Reinforcement learning: rewards from a sequence of actions. Slide credit: Dhruv Batra

Tasks. Supervised learning: classification (x -> y, with y discrete) and regression (x -> y, with y continuous). Unsupervised learning: clustering (x -> discrete cluster ID) and dimensionality reduction (x -> y, with y continuous). Slide credit: Dhruv Batra

Slide credit: Pedro Domingos, Tom Mitchell, Tom Dietterich

Measuring Performance. If y is discrete: accuracy = # correctly classified / # all test examples, or weighted misclassification via a confusion matrix. In the case of only two classes, each prediction is a true positive, false positive, true negative, or false negative (diagnosis example).
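
A minimal MATLAB sketch of accuracy and a two-class confusion matrix (the label vectors are illustrative assumptions):

% Toy true labels and predictions (1 = positive class, 2 = negative class)
ytrue = [1 1 1 2 2 2 2 1];
ypred = [1 2 1 2 2 1 2 1];
accuracy = mean(ypred == ytrue);                   % fraction correctly classified
C = accumarray([ytrue(:), ypred(:)], 1, [2 2]);    % confusion matrix: rows = true, columns = predicted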

Measuring Performance. If y is discrete: precision and recall. Precision = # correctly predicted positives / # predicted positives; Recall = # correctly predicted positives / # actual positives; F-measure = 2PR / (P + R). We want the evaluation metric to be in some fixed range, e.g. [0, 1], where 0 = worst possible classifier and 1 = best possible classifier.

Precision / Recall / F-measure example. True positives: images that contain people. True negatives: images that do not contain people. Predicted positives: images predicted to contain people. Predicted negatives: images predicted not to contain people. Precision = 2 / 5 = 0.4; Recall = 2 / 4 = 0.5; F-measure = 2*0.4*0.5 / (0.4+0.5) = 0.44; Accuracy = 5 / 10 = 0.5.
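
A minimal MATLAB sketch reproducing these numbers from the underlying counts (TP = 2, FP = 3, FN = 2, TN = 3 in the usual confusion-matrix sense, back-solved from the slide's fractions; the image-level details are assumed):

% Counts consistent with the example: 10 images, 4 contain people, 5 predicted to
TP = 2; FP = 3; FN = 2; TN = 3;
precision = TP / (TP + FP);                                  % 2/5 = 0.4
recall    = TP / (TP + FN);                                  % 2/4 = 0.5
f_measure = 2 * precision * recall / (precision + recall);   % ~0.44
accuracy  = (TP + TN) / (TP + FP + FN + TN);                 % 5/10 = 0.5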

Measuring Performance. If y is continuous: L2 error between the true and predicted y, e.g. the sum of squared differences over the test examples.
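
For example, in MATLAB (ytrue and ypred are hypothetical vectors of true and predicted continuous values):

% Squared (L2) error between true and predicted continuous outputs
ytrue = [1.0 2.5 3.1];
ypred = [0.8 2.7 3.0];
l2_error = sum((ytrue - ypred).^2);    % sum of squared differences over test examples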

Homework 1 http://people.cs.pitt.edu/~kovashka/cs2750/hw1.htm

Matlab Tutorial http://cs.brown.edu/courses/cs143/2011/docs/matlab-tutorial/ https://people.cs.pitt.edu/~milos/courses/cs2750/Tutorial/ http://www.math.udel.edu/~braun/M349/Matlab_probs2.pdf

Homework. Get started on Homework 1. Do the assigned readings: Bishop Ch. 1 and Sec. 3.2; Szeliski Sec. 4.1; Grauman/Leibe Ch. 1-3. Review the linked Matlab tutorials.

Matlab Exercise: http://www.facstaff.bucknell.edu/maneval/help211/basicexercises.html. Do Problems 1-8 and 12; most also have solutions. Ask the TA if you have any problems.

Next time: the bias-variance trade-off; data representations in computer vision.