Pattern Recognition
Lecture 1 - Overview
  Jim Rehg
  School of Interactive Computing, Georgia Institute of Technology
  Atlanta, Georgia, USA
  June 12, 2007
J. M. Rehg ©
Goal
  Learn a function that maps features x to predictions C, given a dataset D = {C_k, x_k}.
  Elements of the problem:
  – Knowledge about the data-generating process and the task
  – Design of the feature space for x based on data
  – Decision rule f : x → C’
  – Loss function L(C’, C) for measuring the quality of a prediction
  – Learning algorithm for computing f from D
  – Empirical measurement of classifier performance
  – Visualization of classifier performance and data properties
  – Computational cost of classification (and learning)
Example: Skin Detection in Web Images
  Images containing people are interesting.
  Most images with people in them contain visible skin.
  Skin can be detected in images based on its color.
  Goal: automatic detection of “adult” images.
  DEC Cambridge Research Lab, 1998
Physics of Skin Color
  Skin color is due to melanin and hemoglobin.
  Hue (normalized color) of skin is largely invariant across the human population.
  Saturation of skin color varies with the concentration of melanin and hemoglobin (e.g. lips).
  Detailed color models exist for melanoma identification using calibrated illumination.
  But observed skin color will be affected by lighting, the image acquisition device, etc.
Skin Classification Via Statistical Inference
  Joint work with Michael Jones at DEC CRL:
  M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, IJCV.
  Model the color distribution in the skin and nonskin cases:
  – Estimate p(RGB | skin) and p(RGB | nonskin)
  – Decision rule f : RGB → {“skin”, “nonskin”}
  – A pixel is “skin” when p(skin | RGB) > p(nonskin | RGB)
  Data set D:
  – 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
  – 1 billion hand-labeled pixels in the training set
Some Example Photos
  [Figures: example skin images; example non-skin images]
Manually Labeling Skin and Nonskin
  Labeled skin pixels are segmented by hand. [Figure]
  Labeled nonskin pixels are easily obtained from images without people.
Skin Color Modeling Using Histograms
  Feature space design: standard RGB color space (easily available, efficient)
  Histogram probability model: P(RGB | skin) and P(RGB | nonskin)
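A histogram model of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic pixel data, and the choice of 32 bins per channel (one reading of the "size 32" setting reported in the comparison to mixture models) are all assumptions.

```python
import numpy as np

BINS = 32  # bins per color channel; an assumed setting for this sketch


def fit_color_histogram(pixels, bins=BINS):
    """Estimate P(RGB | class) as a normalized 3D histogram.

    pixels: integer array of shape (N, 3) with values in 0..255.
    """
    idx = (np.asarray(pixels).astype(np.int64) * bins) // 256  # 0..255 -> 0..bins-1
    counts = np.zeros((bins, bins, bins), dtype=np.float64)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)  # accumulate counts
    return counts / counts.sum()


def lookup(hist, pixels):
    """Read the histogram probability for each pixel in an (N, 3) array."""
    bins = hist.shape[0]
    idx = (np.asarray(pixels).astype(np.int64) * bins) // 256
    return hist[idx[:, 0], idx[:, 1], idx[:, 2]]


rng = np.random.default_rng(0)
# Synthetic stand-ins for hand-labeled pixels (hypothetical, not real data).
skin_pixels = np.clip(
    rng.normal([200, 140, 120], 20, size=(5000, 3)), 0, 255
).astype(np.uint8)
nonskin_pixels = rng.integers(0, 256, size=(5000, 3), dtype=np.uint8)

p_rgb_given_skin = fit_color_histogram(skin_pixels)
p_rgb_given_nonskin = fit_color_histogram(nonskin_pixels)
```

Evaluating the model for a pixel is a single table lookup, which is why a histogram is cheap to evaluate at classification time.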
Skin Color Histogram
  Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors.
  [Figure: three views of the same skin histogram]
Non-Skin Color Histogram
  [Figure: three views of the same non-skin histogram, showing the distribution of non-skin colors]
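Decision Rule
  Class labels: “skin” C = 1, “nonskin” C = 0
  $$P(\text{skin} \mid RGB) \;\underset{f=0}{\overset{f=1}{\gtrless}}\; P(\text{nonskin} \mid RGB)$$
  Equivalently:
  $$P(RGB \mid \text{skin})\,P(\text{skin}) \;\underset{f=0}{\overset{f=1}{\gtrless}}\; P(RGB \mid \text{nonskin})\,P(\text{nonskin})$$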
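Likelihood Ratio Test
  $$\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \;\underset{f=0}{\overset{f=1}{\gtrless}}\; \frac{P(\text{nonskin})}{P(\text{skin})}$$
  The ratio of class priors is usually treated as a parameter (threshold) that is adjusted to trade off between the types of errors.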
Skin Classifier Architecture
  [Diagram: input image → P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test (f = 1 or f = 0) → output “skin”]
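The per-pixel likelihood ratio test can be sketched in a few lines. The function name and the example likelihood values below are hypothetical. The comparison is cross-multiplied, a design choice made here so that histogram bins with zero nonskin probability do not cause a division by zero.

```python
import numpy as np


def classify_skin(p_skin, p_nonskin, threshold):
    """Likelihood ratio test.

    Returns 1 ("skin") where P(RGB | skin) / P(RGB | nonskin) > threshold,
    and 0 ("nonskin") otherwise. Written as p_skin > threshold * p_nonskin
    to avoid dividing by an empty nonskin bin.
    """
    p_skin = np.asarray(p_skin, dtype=np.float64)
    p_nonskin = np.asarray(p_nonskin, dtype=np.float64)
    return (p_skin > threshold * p_nonskin).astype(np.uint8)


# Hypothetical likelihoods for four pixels, e.g. read out of two histograms.
p_skin = np.array([0.020, 0.001, 0.004, 0.000])
p_nonskin = np.array([0.001, 0.010, 0.004, 0.002])

labels_default = classify_skin(p_skin, p_nonskin, threshold=1.0)
labels_strict = classify_skin(p_skin, p_nonskin, threshold=30.0)
```

Raising the threshold makes the classifier more conservative: the first pixel is labeled skin at a threshold of 1.0 but not at 30.0.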
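Measuring Classifier Quality
  Given a testing set T = {C_j, x_j} that was not used for training, apply the classifier to obtain predictions f(x_j).
  The testing set is partitioned into four categories by true label and prediction: true positives (C_j = 1, f = 1), false negatives (C_j = 1, f = 0), false positives (C_j = 0, f = 1), and true negatives (C_j = 0, f = 0).
  Indicator function for a boolean B: equal to 1 if B is true and 0 otherwise.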
Measuring Classifier Quality
  A standard convention is to report two rates:
  – Detection rate d_R: the fraction of positive examples classified correctly
  – False positive rate f_R: the fraction of negative examples classified incorrectly
Trading Off Types of Errors
  Consider a classifier that always outputs f = 1 regardless of input:
  – All positive examples correct, all negative examples incorrect
  – d_R = 1 and f_R = 1
  Consider a classifier that always outputs f = 0 regardless of input:
  – All positive examples incorrect, all negative examples correct
  – d_R = 0 and f_R = 0
  Both are limiting cases of the threshold in the likelihood ratio test.
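The two rates, and the two degenerate classifiers described above, can be checked with a short sketch. The function name and the toy labels are assumptions made for illustration, and the code assumes the testing set contains at least one positive and one negative example.

```python
import numpy as np


def detection_and_false_positive_rate(labels, predictions):
    """Return (d_R, f_R) for binary labels and predictions (1 = positive)."""
    labels = np.asarray(labels).astype(bool)
    predictions = np.asarray(predictions).astype(bool)
    tp = np.sum(predictions & labels)    # positives classified correctly
    fn = np.sum(~predictions & labels)   # positives classified incorrectly
    fp = np.sum(predictions & ~labels)   # negatives classified incorrectly
    tn = np.sum(~predictions & ~labels)  # negatives classified correctly
    return float(tp / (tp + fn)), float(fp / (fp + tn))


# Toy testing set: 4 positive and 6 negative examples (hypothetical).
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
predictions = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

d_r, f_r = detection_and_false_positive_rate(labels, predictions)
always_one = detection_and_false_positive_rate(labels, np.ones_like(labels))
always_zero = detection_and_false_positive_rate(labels, np.zeros_like(labels))
```

Here `d_r` is 3/4 and `f_r` is 1/6, while the constant classifiers land on the corners (1, 1) and (0, 0).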
ROC Curve
  [Plot: axes are detection rate d_R and false positive rate f_R]
  Each sample point on the ROC curve is obtained by scoring T with a particular value of the threshold.
  Generating the ROC curve does not require classifier retraining.
ROC Curve
  [Plot: axes are detection rate d_R and false positive rate f_R]
  A fair way to compare two classifiers is to show their ROC curves for the same T.
  ROC stands for “Receiver Operating Characteristic”; it was originally developed for tuning radar receivers.
Scalar Measures of Classifier Performance
  [Plot: axes are detection rate d_R and false positive rate f_R]
  – Equal error rate
  – Area under the ROC curve
ROC Curve Summary
  The ROC curve gives an “application independent” measure of classifier performance.
  Performance reports based on a single point on the ROC curve are generally meaningless.
  Several scalar “summaries” are possible:
  – Area under the ROC curve
  – Equal error rate
  Compute the ROC curve by iterating over the values of the threshold: for each value, compute the detection and false positive rates on the testing set and plot the resulting point.
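The threshold sweep can be sketched as below. The function names, the toy scores, the trapezoidal area estimate, and the nearest-point approximation of the equal error rate are choices made for this illustration rather than anything specified in the lecture.

```python
import numpy as np


def roc_points(labels, scores, thresholds):
    """One (f_R, d_R) point per threshold; predict positive when score > threshold."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=np.float64)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    points = []
    for t in thresholds:
        pred = scores > t
        points.append((np.sum(pred & ~labels) / n_neg, np.sum(pred & labels) / n_pos))
    return np.array(points)


def area_under_curve(points):
    """Trapezoidal area under the ROC points, sorted by f_R and then d_R."""
    order = np.lexsort((points[:, 1], points[:, 0]))
    f, d = points[order, 0], points[order, 1]
    return float(np.sum(np.diff(f) * (d[1:] + d[:-1]) / 2.0))


def equal_error_rate(points):
    """Approximate EER: the sampled point where f_R is closest to the miss rate 1 - d_R."""
    miss = 1.0 - points[:, 1]
    i = np.argmin(np.abs(points[:, 0] - miss))
    return float((points[i, 0] + miss[i]) / 2.0)


# Toy testing set: scores play the role of likelihood ratios (hypothetical values).
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
scores = np.array([9.0, 4.0, 2.5, 0.8, 3.0, 0.6, 0.4, 0.1])

# Sweep from "always positive" (-inf) up to the largest score ("always negative").
thresholds = np.concatenate(([-np.inf], np.unique(scores)))
points = roc_points(labels, scores, thresholds)
auc = area_under_curve(points)
eer = equal_error_rate(points)
```

The classifier is scored once and only the threshold changes, which is why no retraining is needed. For this toy set the area is 0.875 and the approximate equal error rate is 0.25.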
Example Results
  [Figures: skin examples; nonskin examples]
Skin Detector Performance
  Extremely good results, considering that only the color of a single pixel is used.
  Best published results (at the time).
  One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).
  [Plot: ROC curve; axes are detection rate d_R and false positive rate f_R]
  But why does it work so well?
Analyzing the color distributions
  Why does it work so well?
  [Figure: 2D color histogram for photos on the web, projected onto a slice through the 3D histogram]
  [Figure: surface plot of the 2D histogram]
Contour Plots
  [Figure: full color model (includes skin and non-skin)]
Contour Plots Continued
  [Figures: non-skin model; skin model]
  The skin color distribution is surprisingly well separated from the background distribution of color in web images.
Comparison to Mixture Models
  Both histogram and mixture models are examples of graphical models.
  Bin size controls the generalization of the histogram.
  – Size 32 gave the best performance
  Mixture models have often been used for skin color modeling in small-sample-size cases.
  – We found histograms to give better accuracy
  – They are also much faster to evaluate
Adult Image Detection
  Observation: adult images usually contain large areas of skin.
  The output of the skin detector can be used to create a feature vector for an image.
  An adult image classifier is trained on the feature vectors.
  Exploring joint image/text analysis.
  [Diagram: Image → Skin Detector → Skin Features → Neural net Classifier; HTML → Text Features → Classifier; combined output: Adult?]
Adult Detection Examples
  [Figures] These images are all correctly classified as adult images.
More Examples
  [Figure] Classified as not adult
  [Figure] Classified as not adult
  [Figure] Incorrectly classified as adult: close-ups of faces are a failure mode because they contain large amounts of skin
Performance of Adult Image Detector
Adult Image Detection Results
  Two sets of HTML pages were collected:
  – Crawl A: adult sites (2365 pages, images)
  – Crawl B: non-adult sites (2692 pages, images)

                                                    image-based   text-based   combined “OR”
                                                    detector      detector     detector
  % of adult images rated correctly (set A):        85.8%         84.9%        93.9%
  % of non-adult images rated correctly (set B):    92.5%         98.9%        92.0%
Computational Cost Analysis
  General image properties:
  – Average width = 301 pixels
  – Average height = 269 pixels
  – Time to read an image = 0.078 sec
  Skin-color-based adult image detector:
  – Time to classify = 0.043 sec
  – Implies a throughput of 23 images/sec
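The throughput figure follows from the classification time alone, as the short calculation below shows. The end-to-end rate is an additional estimate that assumes reading and classifying happen one after the other with no overlap, which the slide does not state.

```python
read_time_s = 0.078      # time to read an image (from the slide)
classify_time_s = 0.043  # time to classify an image (from the slide)

# Classification only: 1 / 0.043 is about 23.3 images/sec, the slide's "23 images/sec".
classify_only_rate = 1.0 / classify_time_s

# Assumption: read and classify run sequentially, so the times add.
end_to_end_rate = 1.0 / (read_time_s + classify_time_s)  # about 8.3 images/sec

# Average image size and the implied classification time per pixel.
avg_pixels = 301 * 269                           # 80,969 pixels
time_per_pixel_s = classify_time_s / avg_pixels  # roughly half a microsecond
```

Under that sequential assumption, reading the image rather than classifying it would dominate the cost.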
Person Detection From Skin Detection
  The skin detector gives evidence for the presence of people, but has false positives and false negatives.
  Use the skin detector output for person detection:
  – Construct a feature vector from the detected skin pixels
  – Classify the image as person/non-person
  Features:
  – Percent of pixels in the image detected as skin
  – Average probability of skin pixels
  – Largest connected component of skin
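The three features can be sketched from a per-pixel skin probability map. Several details here are assumptions, since the slide does not define them: the 0.5 detection threshold, reading "average probability" as the mean over detected skin pixels, using 4-connectivity, and reporting the largest component as a fraction of the image area.

```python
import numpy as np


def largest_component_size(mask):
    """Pixel count of the largest 4-connected region of True values (flood fill)."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask)
    h, w = mask.shape
    best = 0
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            stack = [(r0, c0)]
            visited[r0, c0] = True
            size = 0
            while stack:
                r, c = stack.pop()
                size += 1
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not visited[rr, cc]:
                        visited[rr, cc] = True
                        stack.append((rr, cc))
            best = max(best, size)
    return best


def person_features(skin_prob, threshold=0.5):
    """Feature vector: [fraction of skin pixels, mean skin probability, largest component fraction]."""
    skin_prob = np.asarray(skin_prob, dtype=np.float64)
    mask = skin_prob > threshold  # detected skin pixels (assumed threshold)
    n_skin = int(mask.sum())
    frac_skin = n_skin / mask.size
    mean_prob = float(skin_prob[mask].mean()) if n_skin else 0.0
    largest = largest_component_size(mask) / mask.size
    return np.array([frac_skin, mean_prob, largest])


# Hypothetical 3x4 skin probability map.
skin_prob = np.array([
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1, 0.6],
    [0.1, 0.1, 0.1, 0.1],
])
features = person_features(skin_prob)
```

A feature vector like this would then be the input to the person/non-person classifier.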
Person Detection Example Results
  [Figures: example images labeled “Person” and “No Person”]
Person Detection Results Continued
  [Figures: example images labeled “No Person” and “Person”]
Person Detector Performance
  Two classifiers were built using these measures on 1400 training images. A test set of 456 images was used to evaluate the classifiers.

  Classifier performance
  Classifier        Training examples   Testing examples
  Neural network    76.2%               74.3%
  Decision tree     75.8%               72.1%
Applications of Person Detection
  “Person Detected” tag for media search:
  – Skin and face analysis tag photos and video frames with people in them
  Improved ranking of query returns:
  – Photos of people appear at the top of the list
  Image similarity measure:
  – Photos with people in them are grouped together
  – Can be used during query refinement
Summary of Skin Detection Example
  What are the factors that made skin detection successful?
  A problem that seemed hard a priori but turned out to be easy (classes surprisingly separable).
  Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
  Intrinsic dimensions are clear a priori
  – Concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces
Perspectives on Pattern Recognition
  Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
  – Linear and Tree Classifiers
  – Gaussian Mixture Classifiers
  – Logistic Regression
  – Neural Networks
  – Support Vector Machines
  – Gaussian Process Classifiers
  – AdaBoost
  – …
Statistical Perspective
  Statistical Inference Approach
  – Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
  – Decision rule is derived from p(C, x | θ)
  – Two philosophical schools
    – Frequentist Statistics
    – Bayesian Statistics
  Learning Theory Approach
  – Classifiers with distribution-free performance guarantees
  – Connections to CS theory, computability, etc.
  – Examples: PAC learning, structural risk minimization, etc.
Decision Theory Perspective
  Three ways to obtain the decision rule f(x)
  Generative Modeling
  – Model p(x | C) and p(C) using D
  – Obtain p(C | x) using Bayes rule
  – Obtain the decision rule from the posterior
  Advantages
  – Use p(x) for novelty detection
  – Sample from p(x) to generate synthetic data and assess model quality
  – Use p(C | x) to assess confidence in the answer (reject region)
  – Easy to compose modules that output posterior probabilities
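The generative route can be illustrated with a two-class example. The likelihood and prior values, the function name, and the 0.9 confidence cutoff for the reject region are hypothetical numbers chosen for this sketch.

```python
import numpy as np


def posterior_from_generative(likelihoods, priors):
    """Bayes rule: p(C | x) = p(x | C) p(C) / p(x), where p(x) = sum over C of p(x | C) p(C).

    Returns the posterior over classes and the evidence p(x).
    """
    likelihoods = np.asarray(likelihoods, dtype=np.float64)
    priors = np.asarray(priors, dtype=np.float64)
    joint = likelihoods * priors  # p(x, C) for each class
    evidence = joint.sum()        # p(x)
    return joint / evidence, float(evidence)


# Classes are ordered [C = 0, C = 1]; the values are hypothetical.
likelihoods = [0.001, 0.02]  # p(x | C = 0), p(x | C = 1)
priors = [0.9, 0.1]          # p(C = 0), p(C = 1)

posterior, p_x = posterior_from_generative(likelihoods, priors)
decision = int(np.argmax(posterior))          # decision rule from the posterior
confident = bool(posterior.max() >= 0.9)      # assumed cutoff for the reject region
```

Here the posterior for C = 1 is about 0.69, so the decision is C = 1, but it falls below the assumed confidence cutoff and would land in the reject region. The evidence `p_x` is the quantity a novelty detector would inspect.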
Decision Rule
  Discriminative modeling
  – Obtain the posterior p(C | x) directly from D
  – Derive the decision rule from the posterior
  Advantages
  – The posterior is often much simpler than the likelihood function
  – The posterior is more directly related to the classification rule, and may yield fewer prediction errors
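As one illustration of fitting the posterior directly, the sketch below uses logistic regression trained by plain gradient descent. Logistic regression is not named on this slide (it appears in the earlier list of methods), and the toy data, learning rate, and step count are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(x):
    """Append a constant column so the last weight acts as the bias."""
    x = np.asarray(x, dtype=np.float64)
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_logistic_regression(x, c, lr=0.1, steps=5000):
    """Fit p(C = 1 | x) = sigmoid(w . [x, 1]) by gradient descent on the log loss."""
    xb = add_bias(x)
    c = np.asarray(c, dtype=np.float64)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = sigmoid(xb @ w)
        w -= lr * (xb.T @ (p - c)) / len(c)  # gradient of the mean log loss
    return w


def posterior(w, x):
    """Model posterior p(C = 1 | x) for each row of x."""
    return sigmoid(add_bias(x) @ w)


# Toy 1D dataset D with overlapping classes (hypothetical).
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
c = np.array([0, 0, 1, 0, 1, 1])

w = fit_logistic_regression(x, c)
p_low = posterior(w, [[0.0]])[0]
p_high = posterior(w, [[5.0]])[0]
decision_high = int(p_high > 0.5)  # decision rule derived from the posterior
```

No model of p(x | C) is ever built here: the parameters describe only how the posterior changes with x.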