Introduction to Machine Learning

Presentation transcript:

Introduction to Machine Learning. TUM Praktikum: Sensorgestützte intelligente Umgebungen. Oscar Martinez Mozos, Robotics Group, University of Zaragoza, Spain.

Example classification problem. Here is the weather prediction for two cities over the next days. This is what classification does: take some data and assign a label to it. Classify each prediction as Alicante or Munich…

Machine learning, pattern recognition, classification: the difference is not clear-cut, since both classification and pattern recognition sit inside machine learning, and the three terms are often used interchangeably. In this course we concentrate on labeling observations -> a pure classification problem.

What does classification mean? We give labels to examples representing something. Label: People. Label: Vehicles.

Classes and instances. Class: a group of examples sharing some characteristic. Instance: just one concrete example. (Examples: Man and Woman are instances of the class People; Truck and Car are instances of the class Vehicles.) In this talk we concentrate on classes.

Representing the examples. Computers understand numbers, so we transform examples into numbers. How? → Feature extraction. (Images: examples of People and Vehicles.)

Representing the examples. Computers understand numbers, so we transform examples into numbers. How? → Feature extraction. (Figure: each example is annotated with two features, f1: width and f2: height.)

Feature vector. Each example is now represented by its set of features: x1 = {f1, f2}, x2 = {f1, f2}, x3 = {f1, f2}, x4 = {f1, f2}.
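
To make this concrete, here is a minimal sketch of building such feature vectors in Python with NumPy. The numbers are hypothetical widths and heights (in meters); the first four rows correspond to x1..x4, and a few extra hypothetical examples are added so that the later sketches have enough data to train on.

import numpy as np

# Hypothetical measurements: f1 = width (m), f2 = height (m)
X = np.array([
    [0.6, 1.8],   # x1: man
    [0.5, 1.7],   # x2: woman
    [2.5, 3.0],   # x3: truck
    [1.8, 1.5],   # x4: car
    [0.7, 1.9],   # extra person
    [0.5, 1.6],   # extra person
    [2.2, 2.9],   # extra vehicle (bus)
    [1.7, 1.4],   # extra vehicle (car)
])
y = np.array(["people", "people", "vehicles", "vehicles",
              "people", "people", "vehicles", "vehicles"])
print(X.shape)   # (8, 2): eight examples, two features each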

Feature space. We plot the feature vectors in the feature space, which the classification algorithms will work on. (Figure: f1 (width) vs. f2 (height), with x1 (man), x2 (woman), x3 (truck), x4 (car).)

General idea of classification. Find a mathematical function that separates the classes in the feature space. (Figure: x1..x4 in the f1 (width) / f2 (height) feature space.)

General idea of classification. Find a mathematical function that separates the classes in the feature space. (Figure: a separating curve splits the feature space into a People region (x1, x2) and a Vehicles region (x3, x4).)

New classifications. Once we have the function, we classify new examples in the feature space. (Figure: a new example x5 = {f1, f2} falls on the People side of the boundary, so x5 is a person.)

New classifications. Classifiers are not perfect: there are errors in the classification. (Figure: a new example x6 = {f1, f2} falls near the boundary; x6 is a person???)

Training set. The set of already labeled examples that is used to compute the classifier; this process is called training. (Diagram: training set (People, Vehicles) → classifier.)

Test set. The set of new examples that we want to classify (assign a label to). (Diagram: the training set produces a classifier, which assigns labels to the test set.)

Measuring the quality of a classifier. We use the test set to measure the quality of the resulting classifier. Of course, the examples in the test set need to be labeled in order to compute statistics. Usually a set of labeled examples is divided into a training set (~70%) and a test set (~30%).
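
A minimal sketch of such a split with scikit-learn, continuing with the X and y defined earlier (the split below holds out ~30% of the examples and keeps the class proportions with stratify):

from sklearn.model_selection import train_test_split

# Hold out part of the labeled examples for testing, train on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(len(X_train), len(X_test))   # 5 training examples, 3 test examples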

Measuring the quality of a classifier. The confusion matrix shows the confusions between classes when classifying the test set:

             Predicted People   Predicted Vehicles
People       90%                10%
Vehicles     30%                70%

The diagonal entries are the true classifications; the rest are false classifications. A good classifier has high true classifications and low false classifications.
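
For example, a row-normalized confusion matrix can be computed with scikit-learn from the true and predicted labels of the test set (the label vectors below are hypothetical):

from sklearn.metrics import confusion_matrix

y_true = ["people", "people", "vehicles", "vehicles", "vehicles"]   # hypothetical test labels
y_pred = ["people", "vehicles", "vehicles", "vehicles", "people"]   # hypothetical predictions
cm = confusion_matrix(y_true, y_pred,
                      labels=["people", "vehicles"],
                      normalize="true")     # each row sums to 1 (per-class rates)
print(cm)   # rows: true class, columns: predicted class; the diagonal holds the true classifications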

Detectors. A detector is a classifier that tries to distinguish only one class of objects from the rest of the world. In this case, the class to detect is called positive and the remaining classes are called negative. Example: a people detector. (Figure: positive examples (people) and negative examples (everything else).)

Measuring the quality of a classifier. In the case of a detector, the confusions have special names: true positives (TP), true negatives (TN), false positives (FP), false negatives (FN):

                      Actually positive   Actually negative
Detected positive     TP                  FP
Detected negative     FN                  TN

A good detector has high trues (T*) and low falses (F*).
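
A small, self-contained sketch of counting these four quantities for a detector with 0/1 labels (1 = positive detection); the function name and the example arrays are illustrative:

import numpy as np

def detector_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for arrays of 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    return tp, fp, fn, tn

print(detector_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # (2, 1, 1, 1)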

Example: binary classifier based on Bayes. Classes w1 and w2 are represented by Gaussians. An example x is assigned to class w1 if P(w1 | x) > P(w2 | x). (Figure: two Gaussian classes w1 and w2 in the f1/f2 feature space, with a query point x.)

Decision hyperplane. (Figure: the decision boundary separating the Gaussian classes w1 and w2 in the f1/f2 feature space.)

Bayesian classification

Bayesian classification using Gaussians
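
The equations on these slides are not included in the transcript; the underlying rule is Bayes' theorem, P(wi | x) ∝ p(x | wi) P(wi), with each class-conditional density p(x | wi) modeled as a Gaussian. A self-contained sketch with SciPy (means, covariances, and priors are hypothetical):

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical Gaussian models for the two classes in the (f1, f2) feature space
g1 = multivariate_normal(mean=[0.6, 1.7], cov=np.diag([0.05, 0.05]))   # w1: people
g2 = multivariate_normal(mean=[2.0, 2.2], cov=np.diag([0.40, 0.60]))   # w2: vehicles
prior1, prior2 = 0.5, 0.5

x = np.array([0.7, 1.8])
p1 = g1.pdf(x) * prior1   # proportional to P(w1 | x)
p2 = g2.pdf(x) * prior2   # proportional to P(w2 | x)
print("w1" if p1 > p2 else "w2")   # assign x to the class with the larger posterior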

Naive Bayes

Estimating probabilities in Naive Bayes

Classifying with Naive Bayes
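
The naive Bayes equations are likewise missing from the transcript; the key assumption is that the features are conditionally independent given the class, so P(w | x) ∝ P(w) Π_j p(fj | w). A sketch with scikit-learn's Gaussian naive Bayes, continuing with the train/test split from the earlier sketch:

from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()                  # fits one 1-D Gaussian per feature and per class
nb.fit(X_train, y_train)
print(nb.predict(X_test))          # predicted class labels
print(nb.predict_proba(X_test))    # posterior probabilities P(class | x)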

Linear classifiers. Do not assume any model for the classes wi; just look for a hyperplane dividing the data. (Figure: examples x1..x4 in the f1/f2 feature space separated by a line.)

Support Vector Machines. Find the hyperplane that maximizes the margin d between the two classes. The examples closest to the hyperplane, here x2 and x3, are called support vectors. (Figure: separating hyperplane with margin d on either side; x2 and x3 lie on the margin.)
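
A sketch of a linear SVM with scikit-learn, continuing with the same split; the fitted model exposes the support vectors directly:

from sklearn.svm import SVC

svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)
print(svm.support_vectors_)    # the training examples that define the margin
print(svm.predict(X_test))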

Nonlinear case. If the classes are not linearly separable we can apply a transformation of the feature space. Usually a radial function is used, of the form φ(x) = h(||x − c||), which is a function of the distance to a center c. The typical choice in SVMs has a Gaussian form: K(x, c) = exp(−||x − c||² / (2σ²)).

Nonlinear case. (Figure: original feature space, where x1..x4 are not linearly separable.)

Nonlinear case. (Figure: transformed feature space, where x1..x4 become linearly separable.)
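
In practice the transformation is rarely computed explicitly: a kernel SVM with the Gaussian (RBF) kernel from the previous slide has the same effect. A sketch, continuing with the split from earlier (gamma plays the role of 1 / (2σ²)):

from sklearn.svm import SVC

rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0)
rbf_svm.fit(X_train, y_train)
print(rbf_svm.predict(X_test))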

Boosting. A meta-classifier that improves the result of simple classifiers. For each feature fj we create a simple linear classifier hj(x) = +1 if pj fj(x) < pj θj, and −1 otherwise, where θj is a threshold and pj indicates the direction of the inequality (> or <). (Figure: examples x1..x4 on the f1 axis, split by a threshold θ1.)

Boosting. Simple classifiers are combined to form a strong classifier. (Diagram: for examples x1..xN with features f1..fM, weak classifiers h1..hT are weighted by α1..αT and summed, Σ αt ht; the sign of the sum gives the binary strong classifier H ∈ {1, −1}.)
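
A sketch of boosting with scikit-learn's AdaBoost, continuing with the same split; its default weak learner is a depth-1 decision tree, i.e. a one-feature threshold classifier like the one described above:

from sklearn.ensemble import AdaBoostClassifier

boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)
print(boost.predict(X_test))   # weighted vote of the weak classifiers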

Main idea. (Figures: examples x1..x5 on a single feature axis; a threshold θ splits them into positive and negative, and across iterations the threshold and the direction of the inequality are adjusted to reduce the classification errors.)

Video: AdaBoost

Example application: people detection in 2D laser data

Detection in One Level Divide the scan into segments.

Detection in One Level Divide the scan into segments. Select the segments corresponding to people.

Detection in One Level Divide the scan into segments. Learn the segments corresponding to people. Classification problem: select the features representing a segment, then learn a classifier using Boosting.

Features from Segments

Results

Final considerations when learning a classifier: outliers, overfitting, normalization of data, feature selection.

Outliers: examples in which some features have extreme values. Possible reasons: an error during the extraction/calculation of the features, or an exceptional example within the class. Solution: ignore these examples.

Overfitting: the learned classifier fits the training examples too closely. This reduces generalization when new examples are slightly different from the training data.

Normalization of data. The ranges of the features can be very different, so it is useful to normalize them: for example, f1 in [1, 2] and f2 in [-100, 100] are both rescaled to [0, 1].
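
A sketch of this rescaling with scikit-learn; the scaler is fitted on the training set only and the same transformation is then applied to the test set:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                      # maps each feature to the range [0, 1]
X_train_n = scaler.fit_transform(X_train)    # learn per-feature min/max on the training data
X_test_n = scaler.transform(X_test)          # apply the same scaling to the test data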

Feature selection. Some features discriminate better than others. (Figure: examples in the f1/f2 feature space.)

Feature selection. Projection onto f1: the classes remain well separated, so classification is easy. (Figure: x1..x4 projected onto the f1 axis.)

Feature selection. Projection onto f2: the classes overlap, so classification is difficult. (Figure: x1..x4 projected onto the f2 axis.)

End