Introduction
Some labeled training examples
Bag-of-words bit vector
4 USENET groups: comp, rec, sci, talk
No single threshold value will serve to unambiguously discriminate between the two categories. The value marked l∗ will lead to the smallest number of errors, on average.
Three types of iris flowers: setosa, versicolor, virginica
Red: setosa. Green: versicolor. Blue: virginica. Which flower is easiest to classify?
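The threshold remark above can be illustrated with a small sketch. It assumes a single numeric feature and two overlapping classes. The data values and the function name `best_threshold` are hypothetical, and the code minimizes errors on a finite sample rather than the expected error the slide refers to.

```python
def best_threshold(values_a, values_b):
    """Pick the threshold t that minimizes training errors for the rule
    'predict class A if value < t, otherwise class B'."""
    candidates = sorted(set(values_a) | set(values_b))
    best_t, best_errors = None, None
    for t in candidates:
        # Class A points at or above t and class B points below t are errors.
        errors = sum(v >= t for v in values_a) + sum(v < t for v in values_b)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# Hypothetical feature values for two overlapping classes.
class_a = [1.0, 2.0, 2.5, 3.0, 4.5]
class_b = [3.5, 4.0, 5.0, 5.5, 6.0]
t_star, n_errors = best_threshold(class_a, class_b)
print(t_star, n_errors)
```

Because the classes overlap, even the best threshold still misclassifies at least one point, which is the situation the slide describes.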
Features permuted
Face detection
Regression
Unsupervised learning
Principal components: dimensionality reduction
Left: 2D linear subspace embedded in 3D. Right: 2D representation of the data.
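A minimal sketch of this reduction, assuming NumPy is available and using synthetic data (the plane, noise level, and variable names are illustrative assumptions, not taken from the slides). It projects 3D points that lie near a 2D plane onto their top two principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 points near a 2D plane embedded in 3D.
latent = rng.normal(size=(200, 2))
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
X = latent @ basis + 0.01 * rng.normal(size=(200, 3))

mean = X.mean(axis=0)
X_centered = X - mean
# Rows of Vt are the principal directions, ordered by singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

Z = X_centered @ Vt[:2].T   # 2D representation of the data
X_hat = Z @ Vt[:2] + mean   # reconstruction back in 3D

print(Z.shape, S)
```

The third singular value is tiny compared with the first two, which is what indicates that two dimensions capture almost all of the variation.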
25 individual faces
Eigenfaces
Missing data
A noisy image with an occluder.
An estimate of the underlying pixel intensities, based on a pairwise Markov random field model.
Voronoi tessellation: Euclidean distance vs. Manhattan distance
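The choice of distance changes which Voronoi cell a point falls into. A small sketch, with hypothetical sites and a hypothetical query point chosen so that the two metrics disagree:

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_site(point, sites, distance):
    """Index of the site whose Voronoi cell contains `point`."""
    return min(range(len(sites)), key=lambda i: distance(point, sites[i]))

# Hypothetical sites and query point.
sites = [(0.0, 0.0), (3.0, 8.0)]
point = (3.0, 3.0)

cell_euclidean = nearest_site(point, sites, euclidean)
cell_manhattan = nearest_site(point, sites, manhattan)
print(cell_euclidean, cell_manhattan)
```

Under Euclidean distance the point belongs to the first site (about 4.24 vs. 5), while under Manhattan distance it belongs to the second (6 vs. 5), so the cell boundaries differ between the two tessellations.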
3-NN
10-nearest neighbors: red class
10-nearest neighbors: blue class
Maximum a posteriori (MAP) estimate of class labels (blue: class 2)
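The K-nearest-neighbor slides above can be sketched as follows: the class probabilities at a query point are estimated as the fraction of its K nearest training points in each class, and the MAP label is the class with the largest fraction. The training points, labels, and function names are hypothetical.

```python
from collections import Counter

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_class_probabilities(query, points, labels, k):
    """Empirical class fractions among the k nearest training points."""
    order = sorted(range(len(points)),
                   key=lambda i: squared_distance(query, points[i]))
    counts = Counter(labels[i] for i in order[:k])
    return {c: counts[c] / k for c in sorted(set(labels))}

def knn_map_label(query, points, labels, k):
    """Class with the highest estimated probability at `query`."""
    probs = knn_class_probabilities(query, points, labels, k)
    return max(probs, key=probs.get)

# Hypothetical training set with two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3), (3, 4), (4, 3), (4, 4)]
labels = ["red"] * 4 + ["blue"] * 4

query = (2.2, 2.2)
probs = knn_class_probabilities(query, points, labels, k=3)
label = knn_map_label(query, points, labels, k=3)
print(probs, label)
```

Evaluating `knn_class_probabilities` over a grid of query points would give per-class probability maps like those on the 10-nearest-neighbors slides.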
Polynomial regression: degree 14 vs. degree 20
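A hedged sketch of how the training error behaves as the polynomial degree grows, assuming NumPy is available. The data are synthetic, degree 2 is added only as a baseline for comparison, and fitting in the Chebyshev basis is a numerical-stability choice made here. The fitted curve is still the least-squares polynomial of the given degree.

```python
import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(0)
# Hypothetical data: 21 noisy samples of a smooth curve.
x = np.linspace(0.0, 20.0, 21)
y = 0.05 * (x - 10.0) ** 2 + rng.normal(scale=1.0, size=x.size)

train_mse = {}
for degree in (2, 14, 20):
    # Chebyshev.fit rescales x to [-1, 1], which keeps the least-squares
    # problem better conditioned than raw powers of x.
    model = Chebyshev.fit(x, y, degree)
    train_mse[degree] = float(np.mean((model(x) - y) ** 2))

print(train_mse)
```

With 21 points, a degree-20 polynomial can pass through every training point, so a low training error here reflects overfitting to the noise rather than a better model.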
Sigmoid or logistic function
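For reference, the sigmoid maps any real number eta to the interval (0, 1) via sigm(eta) = 1 / (1 + exp(-eta)). A minimal sketch, where the split into two branches is an implementation choice to avoid overflow for large negative inputs:

```python
import math

def sigmoid(eta):
    """Logistic function 1 / (1 + exp(-eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # safe: eta < 0, so exp(eta) is at most 1
    return z / (1.0 + z)

for eta in (-4.0, 0.0, 4.0):
    print(eta, sigmoid(eta))
```

The function is symmetric in the sense that sigmoid(eta) + sigmoid(-eta) = 1, and sigmoid(0) = 0.5.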
Logistic regression
Axis labels: SAT scores, accept?
Solid black dots are SAT scores. The open red circles are the predicted probabilities of acceptance. The green crosses denote two students with the same SAT score of 525.
Note: logistic regression is a form of classification, not regression!
KNN: K=1 vs. K=5
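A minimal sketch of one-feature logistic regression fitted by batch gradient descent. The (SAT score, accepted?) pairs are invented for illustration and include two students with the same score of 525 but different outcomes. The learning rate, iteration count, and feature scaling are likewise assumptions.

```python
import math

def sigmoid(eta):
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical (SAT score, accepted?) pairs; two students share score 525.
data = [(400, 0), (450, 0), (500, 1), (525, 0),
        (525, 1), (550, 0), (600, 1), (650, 1)]

mean = sum(x for x, _ in data) / len(data)
scale = 100.0  # rescale the feature so gradient descent is well behaved
w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in data:
        z = (x - mean) / scale
        p = sigmoid(w * z + b)
        grad_w += (p - y) * z  # gradient of the negative log-likelihood
        grad_b += (p - y)
    w -= learning_rate * grad_w / len(data)
    b -= learning_rate * grad_b / len(data)

def predict_proba(sat):
    """Estimated probability of acceptance for a given SAT score."""
    return sigmoid(w * (sat - mean) / scale + b)

for sat in (400, 525, 650):
    print(sat, round(predict_proba(sat), 3), predict_proba(sat) > 0.5)
```

The model outputs a probability, and thresholding it at 0.5 yields a class label, which is why it counts as classification. The two students at 525 receive the same predicted probability despite their different labels, so the data cannot be separated perfectly using this feature alone.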