Download presentation
Presentation is loading. Please wait.
Published byJessie Stevenson Modified over 9 years ago
1
Machine Learning ICS 178 Instructor: Max Welling Supervised Learning
2
Supervised Learning I Example: Imagine you want to classify versus Data: 100 monkey images and 200 human images with labels what is what. where x represents the greyscale of the image pixels and y=0 means “monkey” while y=1 means “human”. Task: Here is a new image: monkey or human?
3
1 nearest neighbors (your first ML algorithm!) Idea: 1.Find the picture in the database which is closest to your query image. 2.Check its label. 3.Declare the class of your query image to be the same as that of the closest picture. query closest image
4
1NN Decision Surface decision curve
5
Distance Metric How do we measure what it means to be “close”? Depending on the problem we should choose an appropriate “distance” metric (or more generally, a (dis)similarity measure) -Demo: http://www.comp.lancs.ac.uk/~kristof/research/notes/nearb/cluster.htmlhttp://www.comp.lancs.ac.uk/~kristof/research/notes/nearb/cluster.html -Matlab Demo.
6
Remarks on NN methods We only need to construct a classifier that works locally for each query. Hence: We don’t need to construct a classifier everywhere in space. Classifying is done at query time. This can be computationally taxing at a time where you might want to be fast. Memory inefficient. Curse of dimensionality: imagine many features are irrelevant / noisy distances are always large. Very flexible, not many prior assumptions. k-NN variants robust against “bad examples”.
7
Non-parametric Methods Non-parametric methods keep all the data cases/examples in memory. A better name is: “instance-based” learning As the data-set grows, the complexity of the decision surface grows. Sometimes, non-parametric methods have some parameters to tune... “Assumption light”
8
In unsupervised learning we are concerned with finding the structure in the data. For instance, we may want to learn the probability distribution from which the data was sampled. Data now looks like (no labels): A nonparametric estimate of the probability distribution can be obtained through Parzen-windowing (or kernel estimation). Unsupervised learning
9
First define a “kernel”. This is some function that “smoothes” a data-point and satisfies: We estimate the distribution as: Example: Gaussian kernel (RBF kernel) When you add these up with weights 1/N you get: Parzen Windowing
10
Example: Distribution of Grades
11
Overfitting Fundamental issue with Parzen windowing: How do you choose the width of the kernel ? To broad: you loose the details, to small, you see too much detail. Imagine you left out part of the data-points when you produced your estimate for the distribution. A good fit is one that would have high probability for the left out data-points. So you can tune on the left out data! This is called (cross) validation. This issue is fundamental to machine learning: how much fitting is just right.
12
Overfitting in a Picture error on hold-out data error on training data
13
Generalization Consider the following regression problem: Predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples: {Xi,Yi}. What is the y-value for a new query point X* ? X*
14
Generalization
16
which curve is best?
17
Ockham’s razor: prefer the simplest hypothesis consistent with data. Generalization
18
Your Take-Home Message Learning is concerned with accurate prediction of future data, not accurate prediction of training data. (The single most important sentence you will see in the course)
19
Homework Read chapter 4, pages 161-165, 168-17. Download netflix data and plot: – ratings as a function of time. – variance in ratings as a function of the mean of movie ratings.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.