Introduction to Machine Learning for Category Representation


1 Introduction to Machine Learning for Category Representation
Jakob Verbeek October 1, 2010 Course website: Many slides adapted from S. Lazebnik

2 Plan for the course Session 1, October 1 2010
Cordelia Schmid: Introduction Jakob Verbeek: Introduction Machine Learning Session 2, December Jakob Verbeek: Clustering with k-means, mixture of Gaussians Cordelia Schmid: Local invariant features Student presentation 1: Scale and affine invariant interest point detectors, Mikolajczyk, Schmid, IJCV 2004. Session 3, December Cordelia Schmid: Instance-level recognition: efficient search Student presentation 2: Scalable Recognition with a Vocabulary Tree, Nister and Stewenius, CVPR 2006.

3 Plan for the course Session 4, December 17 2010
Jakob Verbeek: Mixture of Gaussians, EM algorithm, Fisher Vector image representation Cordelia Schmid: Bag-of-features models for category-level classification Student presentation 3: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, Lazebnik, Schmid and Ponce, CVPR 2006. Session 5, January Jakob Verbeek: Classification 1: generative and non-parametric methods Student presentation 4: Large-Scale Image Retrieval with Compressed Fisher Vectors, Perronnin, Liu, Sanchez and Poirier, CVPR 2010. Cordelia Schmid: Category level localization: Sliding window and shape model Student presentation 5: Object Detection with Discriminatively Trained Part Based Models, Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010. Session 6, January Jakob Verbeek: Classification 2: discriminative models Student presentation 6: TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, Guillaumin, Mensink, Verbeek and Schmid, ICCV 2009. Student presentation 7: IM2GPS: estimating geographic information from a single image, Hays and Efros, CVPR 2008.

4 What is machine learning?
According to Wikipedia: “Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding, and may involve synthesizing different types of information. The ability to learn is possessed by humans, animals and some machines. Progress over time tends to follow learning curves.” “Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to change behavior based on data, such as from sensor data or databases. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Hence, machine learning is closely related to fields such as statistics, probability theory, data mining, pattern recognition, artificial intelligence, adaptive control, and theoretical computer science.”

5 Why machine learning? Extract knowledge/information from past experience/data Use this knowledge/information to analyze new experiences/data Designing rules to deal with new data by hand can be difficult How to design a rule to decide whether there is a cat in an image? Collecting data can be easier Find images with cats, and ones without them Use machine learning to automatically find such rules. Goal of this course: introduction to machine learning techniques used in current object recognition systems.

6 Steps in machine learning
Problem formulation: what is it that we try to predict for new data? Data collection: “training data”, optionally with “labels” provided by a “teacher”. Representation: how the data are encoded into “features” when presented to the learning algorithm. Modeling: choose the class of models that the learning algorithm will choose from. Estimation: find the model that best explains the data: simple and fits well. Validation: evaluate the learned model and compare to solutions found using other model classes. Deploy the learned model.

7 Data Representation Important issue when using learning techniques
Different types of representations Vectorial, graphs, … Homogeneous or heterogeneous, e.g. Images + text Choice of representation may impact the choice of learning algorithm. Domain knowledge can help to design or select good features. The ultimate feature would solve the learning problem… Automatic methods known as “feature selection” methods

8 Probability & Statistics in Learning
Many learning methods formulated as a probabilistic model of data Can deal with uncertainty in the data Missing values for some data can be handled Provides a unified framework to combine many different models for different types of data Statistics are used to analyze the behavior of learning algorithms Does the learning algorithm recover the underlying model given enough data: “consistency” How fast does it do so: rate of convergence Common important assumption Training data sampled from the true data distribution The test data is sampled from the same distribution

9 Different forms of learning
Supervised Classification Regression Unsupervised Clustering Dimension reduction Topic models Density estimation Semi-supervised Combine labeled data with unlabeled data Active learning Determine the most useful data to label next Many other forms…

10 Supervised learning Training data provided as pairs (x,y)
The goal is to predict an “output” y from an “input” x Output y for each input x is the “supervision” that is given to the learning algorithm. Often obtained by manual “annotation” of the inputs x Can be costly to do Most common examples Classification Regression

11 Classification Predict for input x to which of a finite set of classes the input belongs Training data consists of pairs (x,y) Example: Input x : image Output y : category label, e.g. “cat” vs. “no cat” Output y : category label, e.g. “cat” vs. “dog” vs. “bird” Learn a “classifier” function f(x) from the input data that outputs the class label or a probability over the class labels. Classification can be binary (two classes), or over a larger number of classes (multi-class). In binary classification we often refer to one class as “positive”, and the other as “negative” Classifiers partition the input space into regions assigned to each class
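As a minimal sketch of a classifier f(x) learned from pairs (x,y), the code below assigns an input to the class whose mean feature vector is closest. The nearest-class-mean rule, the function names and the toy 2-D features are assumptions chosen for illustration; the slides do not prescribe a particular classifier.

```python
from collections import defaultdict


def fit_class_means(inputs, labels):
    """Compute the mean feature vector of each class from training pairs (x, y)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(inputs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(means, x):
    """Return the label whose class mean is closest to x (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda y: sq_dist(means[y], x))


# Toy training set: hypothetical 2-D features standing in for image descriptors.
train_x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["no cat", "no cat", "cat", "cat"]

means = fit_class_means(train_x, train_y)
prediction = classify(means, [5.5, 4.0])
```

With two classes, this rule partitions the input space into two half-spaces separated by the perpendicular bisector of the two class means, which is one concrete instance of the partitioning mentioned on the slide.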

12 Example of classification
Given: training images and their categories What are the categories of these test images?

13 Regression Similar to classification, but output y has the form of one or more real numbers. Training data consists of pairs (x,y) y can be a vector x might contain both continuous values, as well as discrete Learn a function f(x) that gives an output close to the true y. A “loss” function measures how good a certain function f is In classification we want to minimize the number of errors using a 0/1 loss: correct classification : loss 0 incorrect classification : loss 1 In regression loss gets bigger as f(x) is further from correct y Squared loss: (y - f(x))^2
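The two losses on this slide can be written down directly. The sketch below averages each loss over a small set of predictions; the helper names and the example values are made up for illustration.

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss for classification: 0 if correct, 1 if incorrect."""
    return 0 if y_true == y_pred else 1


def squared_loss(y_true, y_pred):
    """Squared loss for regression: grows as the prediction moves away from y."""
    return (y_true - y_pred) ** 2


def average_loss(loss, targets, predictions):
    """Mean of a per-example loss over a data set."""
    return sum(loss(y, p) for y, p in zip(targets, predictions)) / len(targets)


# One of four labels is wrong, so the average 0/1 loss is the error rate 0.25.
classification_error = average_loss(
    zero_one_loss, ["cat", "dog", "cat", "bird"], ["cat", "cat", "cat", "bird"]
)

# Errors of 2 and 4 years give squared losses 4 and 16, so the mean is 10.0.
regression_error = average_loss(squared_loss, [30.0, 45.0], [32.0, 41.0])
```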

14 Regression: example 2 Training set:
x: face image, processed by detection of characteristic points y: age of that person Learn: function f(x) to predict the age of person Vector of pairwise distances Appearance around points Age estimate f(x)

15 Other forms of supervised learning
Structured prediction tasks: predict several interdependent output variables Word recognition can be easier than recognizing the individual letters Context of other easier letters disambiguates the interpretation of the more difficult letters Word Image

16 Structured Prediction
Estimation of body poses: part locations interdependent Data association problem: assigning edges to body parts of the model Source: D. Ramanan

17 Other supervised learning scenarios
Metric Learning: learn distance metric to compare objects Training data Pairs of images: x1, x2 Label: +1 (same class) or -1 (different classes) Decide if a new pair of images belongs to the same class Source: X. Sui, K. Grauman

18 Learning face similarities
Training data: pairs of faces labeled as same/different Similarity measure should ignore: pose, expression, … Some examples of face-pairs recognized as the same [Guillaumin, Verbeek, Schmid, ICCV 2009]

19 Unsupervised learning
Input data x given without desired output variables y. Goal is to learn something about the “structure” of the data Examples include Clustering Dimensionality reduction Density estimation Not always clear how to measure success of unsupervised learning Probabilistic models can be evaluated by computing likelihood assigned to other data sampled from the same distribution Clustering can be evaluated by learning on labeled data and measuring how clusters correspond to classes, but classes may not define the most apparent clusters Dimensionality reduction can be evaluated by reconstruction errors

20 Clustering Finding a group structure in the data
Data in one cluster similar to each other Data in different clusters dissimilar Map each data point to a discrete cluster index “flat” methods find k groups (k known, or automatically set) “hierarchical” methods define a tree structure over the data
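As an illustration of a “flat” method with k known, here is a minimal k-means sketch (the course plan lists k-means for session 2). Initializing the centers with the first k points is a simplifying assumption made here so the example is deterministic; practical implementations usually use random or more careful initialization.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20):
    """Alternate between assigning points to the nearest center and re-estimating centers."""
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: map each data point to a discrete cluster index.
        assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return assignment, centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [1.0, 0.0]]
assignment, centers = kmeans(points, k=2)
```

On this toy data the three points near the origin end up in one cluster and the two points near (10, 10) in the other.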

21 Clustering example Metric learning from training face-pairs labeled as same/different Clustering of other faces (different people) produced using the learned similarity [Guillaumin, Verbeek, Schmid, ICCV 2009]

22 Dimension reduction Finding a lower dimensional representation of the data Useful for compression, visualization, noise reduction Unlike regression: target values not given

23 Dimension reduction High dimensional input: black image with moving white square Representation: 20x20 pixel values collected in 400d vector x 3D visualization: linear projection of 400d space, images with white square in neighboring locations are connected for visualization

24 Dimension reduction High dimensional input: 20x28 pixel grey valued images of a face 2D visualization: automatically found, captures pose + expression

25 Density estimation Fit probability density on the training data
Can be combination of discrete and continuous data Good fit: high likelihood on training data Smooth function: generalizes to new data Can be used to detect anomalies Many forms of (un)supervised learning can be understood as doing density estimation Clustering Dimension reduction Classification
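A minimal sketch of density estimation used for anomaly detection: fit a one-dimensional Gaussian by maximum likelihood and flag points whose log-density is lower than that of any training point. The Gaussian model, the threshold rule and the data values are assumptions chosen to keep the example short.

```python
import math


def fit_gaussian(samples):
    """Maximum-likelihood estimates of the mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return mean, variance


def log_density(x, mean, variance):
    """Log of the Gaussian probability density at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


train = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, variance = fit_gaussian(train)

# Hypothetical rule: anything less likely than the least likely training point is anomalous.
threshold = min(log_density(x, mean, variance) for x in train)
is_anomaly = log_density(9.0, mean, variance) < threshold
```

A threshold taken from the training data itself is a crude choice; in practice it would be tuned on held-out data.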

26 Different forms of learning
Supervised Classification Regression Unsupervised Clustering Dimension reduction Density estimation Semi-supervised Combine labeled data with unlabeled data Active learning Determine the most useful data to label next Many other forms…

27 Semi-supervised learning
Learn from supervised and unsupervised data Labeled data often expensive to obtain Unlabeled data often cheap to obtain Why should this work? Unsupervised data used to learn about distribution on inputs x Supervised data used to learn about input x given output y ?

28 Example of semi-supervised learning
Classification of newsgroup articles into 20 different classes: politics, sports, education,… Use EM to iteratively estimate the class labels of unlabeled data and update the model Helps when few labeled examples are available [Nigam et al., Machine Learning, Vol. 39, pp. 103-134, 2000]

29 Active learning The learning algorithm can choose its own training examples, or ask a “teacher” for an answer on selected inputs Labeling of most uncertain images Labeling of images that maximally reduce uncertainty in model parameters S. Vijayanarasimhan and K. Grauman, “Cost-Sensitive Active Visual Category Learning,” 2009 
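The “label the most uncertain images” strategy can be sketched for a binary classifier that outputs a probability for the positive class: the next example to send to the teacher is the one whose probability is closest to 0.5. The function name and the probabilities below are hypothetical, and this simple rule is not the cost-sensitive method of the cited paper.

```python
def most_uncertain(probabilities):
    """Index of the unlabeled example whose P(positive) is closest to 0.5."""
    return min(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))


# Hypothetical classifier outputs P(positive | x) for four unlabeled images.
predicted = [0.95, 0.10, 0.55, 0.80]
query_index = most_uncertain(predicted)  # the image to label next
```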

30 Generalization The goal is to predict as well as possible on new data, not seen during training, but sampled from the same underlying distribution. To learn models we only have access to the (labeled) training set What makes generalization possible? Inductive bias: set of assumptions a learner uses to predict the target value for previously unseen inputs Use domain knowledge to choose good features Use domain knowledge to design good models (and learn their parameters from training data) Types of inductive bias Occam’s razor: simple models to be preferred over complex ones, unless invalidated by (training) data Similarity/continuity bias: similar inputs should have similar outputs

31 Achieving good generalization
Consideration 1: Bias How well does your model fit the observed data? It may be a good idea to accept some fitting error, because it may be due to noise or other “accidental” characteristics of one particular training set Consideration 2: Variance How robust is the model to the selection of a particular training set? To put it differently, if we learn models on two different training sets, how consistent will the models be?

34 Bias/variance tradeoff
Models with too many parameters may fit the training data well (low bias), but are sensitive to choice of training set (high variance) Generalization error is due to overfitting Models with too few parameters may not fit the data well (high bias) but are consistent across different training sets (low variance) Generalization error is due to underfitting
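For squared loss, the tradeoff has a standard closed form, which is not stated on the slides. It assumes data generated as y = f(x) + ε with zero-mean noise of variance σ², and writes \hat{f}_D for the model learned from training set D; this notation is introduced here for illustration. Taking the expectation over training sets and noise:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Making the model more flexible typically shrinks the first term and inflates the second, while the noise term cannot be reduced by any model.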

35 Underfitting and overfitting
How to recognize underfitting? High training error and high test error How to deal with underfitting? Find a more complex model How to recognize overfitting? Low training error, but high test error How to deal with overfitting? Get more training data Decrease the number of parameters in your model Regularization: penalize certain parts of the parameter space or introduce additional constraints to deal with a potentially ill-posed problem
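One common form of regularization is to penalize large parameter values. The sketch below uses an L2 (ridge) penalty on a one-parameter linear model y ≈ w·x without intercept, where the minimizer of the penalized squared error has a closed form. The model, the penalty values and the data are assumptions for illustration only.

```python
def ridge_slope(xs, ys, penalty):
    """Minimize sum((y - w*x)^2) + penalty * w^2 over w, using the closed form."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_slope(xs, ys, penalty=0.0)   # 28 / 14: fits the data exactly
regularized = ridge_slope(xs, ys, penalty=14.0)    # 28 / 28: shrunk toward zero
```

A larger penalty pulls the parameter further toward zero, trading a worse fit on the training set for lower sensitivity to that particular training set.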

36 Methodology Distinction between training and testing is crucial
Correct performance on training set is just memorization! Not enough to perform well on new test data Strictly speaking, the researcher should never look at the test data when designing the system Generalization performance should be evaluated on a held-out or validation set Raises some troubling issues for learning “benchmarks” Source: R. Parr
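A minimal sketch of the separation the slide calls for: shuffle once, then carve out disjoint training, validation and test sets. Design choices are compared on the validation set, and the test set is touched only for the final evaluation. The function name, the 20% fractions and the fixed seed are illustrative assumptions.

```python
import random


def split_dataset(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into disjoint train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(10))
```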

37 Plan for the course Session 1, October 1 2010
Cordelia Schmid: Introduction Jakob Verbeek: Introduction Machine Learning Session 2, December Jakob Verbeek: Clustering with k-means, mixture of Gaussians Cordelia Schmid: Local invariant features Student presentation 1: Scale and affine invariant interest point detectors, Mikolajczyk, Schmid, IJCV 2004. Session 3, December Cordelia Schmid: Instance-level recognition: efficient search Student presentation 2: Scalable Recognition with a Vocabulary Tree, Nister and Stewenius, CVPR 2006. Course website:

