1
Pattern Recognition
Lecture 1 - Overview
Jim Rehg
School of Interactive Computing, Georgia Institute of Technology, Atlanta, Georgia USA
June 12, 2007
2
J. M. Rehg © 2007
Goal
– Learn a function that maps features x to predictions C, given a dataset $D = \{(C_k, x_k)\}$
Elements of the problem
– Knowledge about the data-generating process and task
– Design of feature space for x based on data
– Decision rule $f : x \to C'$
– Loss function $L(C', C)$ for measuring quality of prediction
– Learning algorithm for computing f from D
– Empirical measurement of classifier performance
– Visualization of classifier performance and data properties
– Computational cost of classification (and learning)
3
Example: Skin Detection in Web Images
Images containing people are interesting
Most images with people in them contain visible skin
Skin can be detected in images based on its color
Goal: Automatic detection of “adult” images
DEC Cambridge Research Lab, 1998
4
Physics of Skin Color
Skin color is due to melanin and hemoglobin.
Hue (normalized color) of skin is largely invariant across the human population.
Saturation of skin color varies with concentration of melanin and hemoglobin (e.g. lips).
Detailed color models exist for melanoma identification using calibrated illumination.
But observed skin color will be affected by lighting, image acquisition device, etc.
5
Skin Classification Via Statistical Inference
Joint work with Michael Jones at DEC CRL
M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, IJCV, 2001.
Model color distribution in skin and nonskin cases
– Estimate p(RGB | skin) and p(RGB | nonskin)
Decision rule: f : RGB → {“skin”, “nonskin”}
– Pixel is “skin” when p(skin | RGB) > p(nonskin | RGB)
Data set D
– 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
– 1 billion hand-labeled pixels in training set
6
Some Example Photos
[Images: example skin images; example non-skin images]
7
Manually Labeling Skin and Nonskin
Labeled skin pixels are segmented by hand. [Image: hand-segmented skin regions]
Labeled nonskin pixels are easily obtained from images without people.
8
Skin Color Modeling Using Histograms
Feature space design
– Standard RGB color space: easily available, efficient
Histogram probability model
– P(RGB | skin) and P(RGB | nonskin)
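As a rough illustration of the histogram probability model, the sketch below counts labeled pixels into RGB bins and normalizes the counts into an estimate of P(RGB | class). The function names, the toy pixels, and the choice of 32 bins per channel are assumptions made for this example, not details taken from the slides or the paper.

```python
from collections import Counter

BINS_PER_CHANNEL = 32  # assumed histogram resolution; each channel spans 0..255


def bin_index(rgb, bins=BINS_PER_CHANNEL):
    """Map an (r, g, b) triple with 0..255 channels to its histogram bin."""
    r, g, b = rgb
    return (r * bins // 256, g * bins // 256, b * bins // 256)


def fit_histogram(pixels, bins=BINS_PER_CHANNEL):
    """Return a dict mapping bin -> relative frequency of the labeled pixels."""
    counts = Counter(bin_index(p, bins) for p in pixels)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def likelihood(hist, rgb, bins=BINS_PER_CHANNEL):
    """Look up the estimated P(RGB | class); bins never seen in training give 0."""
    return hist.get(bin_index(rgb, bins), 0.0)


# Toy "skin" pixels, invented for illustration only.
skin_pixels = [(220, 170, 140), (222, 172, 142), (200, 150, 120), (90, 60, 50)]
skin_hist = fit_histogram(skin_pixels)
print(likelihood(skin_hist, (217, 169, 137)))
```

A second histogram fitted the same way on nonskin pixels would give the other class-conditional model. Fewer bins smooth the estimate more; more bins need more data per bin.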
9
Skin Color Histogram
Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors.
[Figure: three views of the same skin histogram]
10
Non-Skin Color Histogram
[Figure: three views of the same non-skin histogram, showing the distribution of non-skin colors]
11
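Decision Rule
Class labels: “skin” C = 1, “nonskin” C = 0
Decision rule: $P(C=1 \mid RGB) \underset{f=0}{\overset{f=1}{\gtrless}} P(C=0 \mid RGB)$
Equivalently: $\frac{P(C=1 \mid RGB)}{P(C=0 \mid RGB)} \underset{f=0}{\overset{f=1}{\gtrless}} 1$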
12
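Likelihood Ratio Test
$\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \underset{f=0}{\overset{f=1}{\gtrless}} \frac{P(\text{nonskin})}{P(\text{skin})}$
$\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \underset{f=0}{\overset{f=1}{\gtrless}} \lambda$
The ratio of class priors is usually treated as a parameter (threshold, written $\lambda$ here) which is adjusted to trade off between types of errors.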
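The likelihood ratio test can be sketched in a few lines. The function name and the example probabilities are hypothetical; the comparison is written as a product so that a zero nonskin likelihood does not cause a division by zero.

```python
def classify_pixel(p_rgb_given_skin, p_rgb_given_nonskin, threshold=1.0):
    """Return 1 ("skin") when the likelihood ratio exceeds the threshold, else 0.

    The test ratio > threshold is rewritten as
    p_rgb_given_skin > threshold * p_rgb_given_nonskin.
    """
    return 1 if p_rgb_given_skin > threshold * p_rgb_given_nonskin else 0


# Invented likelihoods for one pixel: the ratio is 0.02 / 0.005 = 4.
print(classify_pixel(0.02, 0.005, threshold=1.0))
print(classify_pixel(0.02, 0.005, threshold=5.0))
```

Raising the threshold makes the classifier more conservative about calling a pixel skin, which lowers both the detection rate and the false positive rate.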
13
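Skin Classifier Architecture
[Diagram: Input Image, then P(RGB | skin) and P(RGB | nonskin), then the likelihood ratio test $\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \underset{f=0}{\overset{f=1}{\gtrless}} \lambda$ with threshold $\lambda$, then Output “skin”]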
14
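Measuring Classifier Quality
Given a testing set $T = \{(C_j, x_j)\}$ that was not used for training, apply the classifier to obtain predictions $f(x_j)$
Testing set partitioned into four categories
– True positives: $C_j = 1$ and $f(x_j) = 1$
– False negatives: $C_j = 1$ and $f(x_j) = 0$
– False positives: $C_j = 0$ and $f(x_j) = 1$
– True negatives: $C_j = 0$ and $f(x_j) = 0$
Indicator function for boolean B: $1(B) = 1$ if B is true, and $1(B) = 0$ otherwise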
15
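Measuring Classifier Quality
A standard convention is to report
– Detection rate: fraction of positive examples classified correctly, $d_R = \frac{\sum_j 1(C_j = 1 \wedge f(x_j) = 1)}{\sum_j 1(C_j = 1)}$
– False positive rate: fraction of negative examples classified incorrectly, $f_R = \frac{\sum_j 1(C_j = 0 \wedge f(x_j) = 1)}{\sum_j 1(C_j = 0)}$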
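The two reported quantities can be computed directly from test labels and predictions. This is a minimal sketch; the function name and the toy labels are invented for illustration.

```python
def detection_and_false_positive_rates(labels, predictions):
    """Return (d_R, f_R) for binary labels C and predictions f, both in {0, 1}."""
    n_pos = sum(1 for c in labels if c == 1)
    n_neg = sum(1 for c in labels if c == 0)
    true_pos = sum(1 for c, f in zip(labels, predictions) if c == 1 and f == 1)
    false_pos = sum(1 for c, f in zip(labels, predictions) if c == 0 and f == 1)
    return true_pos / n_pos, false_pos / n_neg


# Toy test set: 4 positive and 5 negative examples.
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0]
d_r, f_r = detection_and_false_positive_rates(labels, predictions)
print(d_r, f_r)
```

Here three of four positives are detected and one of five negatives is flagged, so the pair is (0.75, 0.2).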
16
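Trading Off Types of Errors
Consider a threshold low enough that the classifier always outputs f = 1 regardless of input
– All positive examples correct, all negative examples incorrect
– $d_R = 1$ and $f_R = 1$
Consider a threshold high enough that the classifier always outputs f = 0 regardless of input
– All positive examples incorrect, all negative examples correct
– $d_R = 0$ and $f_R = 0$
Likelihood ratio test with threshold $\lambda$: $\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \underset{f=0}{\overset{f=1}{\gtrless}} \lambda$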
17
ROC Curve
[Figure: ROC curve, detection rate $d_R$ versus false positive rate $f_R$, both axes from 0 to 1]
Each sample point on the ROC curve is obtained by scoring T with a particular threshold value
Generating the ROC curve does not require classifier retraining
18
ROC Curve
[Figure: ROC curve, detection rate $d_R$ versus false positive rate $f_R$, both axes from 0 to 1]
A fair way to compare two classifiers is to show their ROC curves for the same T
ROC stands for “Receiver Operating Characteristic” and was originally developed for tuning radar receivers
19
Scalar Measures of Classifier Performance
[Figure: ROC curve, detection rate $d_R$ versus false positive rate $f_R$, both axes from 0 to 1, annotated with the two measures below]
Equal Error Rate
Area under the ROC curve
20
ROC Curve Summary
ROC curve gives “application independent” measure of classifier performance
Performance reports based on a single point on the ROC curve are generally meaningless
Several possible scalar “summaries”
– Area under the ROC curve
– Equal error rate
Compute ROC by iterating over the values of the threshold
– Compute the detection and false positive rates on the testing set for each value of the threshold and plot the resulting point.
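The threshold sweep described above might look like the following sketch. It assumes each test example already has a classifier score (for instance a likelihood ratio), so no retraining happens inside the loop. The function names, scores, and threshold grid are invented, and the trapezoidal area is only an approximation based on the sampled points.

```python
def roc_points(labels, scores, thresholds):
    """Return sorted (f_R, d_R) pairs, one per threshold, for fixed scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for c, s in zip(labels, scores) if c == 1 and s > t)
        fp = sum(1 for c, s in zip(labels, scores) if c == 0 and s > t)
        points.append((fp / n_neg, tp / n_pos))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under (f_R, d_R) points sorted by f_R."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores for 4 positive and 4 negative test examples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
points = roc_points(labels, scores, thresholds=[0.0, 0.25, 0.5, 0.75, 1.0])
print(points)
print(area_under_curve(points))
```

The lowest threshold yields the (1, 1) corner and the highest yields (0, 0), matching the two extreme classifiers that always output f = 1 or f = 0.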
21
Example Results
[Images: skin examples; nonskin examples]
22
Skin Detector Performance
Extremely good results considering only color of single pixel is being used.
Best published results (at the time)
One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).
[Figure: ROC curve of the skin detector, detection rate $d_R$ versus false positive rate $f_R$]
But why does it work so well?
23
Analyzing the color distributions
Why does it work so well?
[Figure: 2D color histogram for photos on the web, projected onto a slice through the 3D histogram]
[Figure: surface plot of the 2D histogram]
24
Contour Plots
[Figure: contour plot of the full color model (includes skin and non-skin)]
25
Contour Plots Continued
[Figures: contour plots of the non-skin model and the skin model]
Skin color distribution is surprisingly well-separated from the background distribution of color in web images
26
Comparison to Mixture Models
Both histogram and mixture models are examples of graphical models.
Bin size controls generalization of histogram
– Size 32 gave the best performance
Mixture models have often been used for skin color modeling in small sample size cases.
We found histograms to give better accuracy
– They are also much faster to evaluate
27
Adult Image Detection
Observation: Adult images usually contain large areas of skin
Output of skin detector can be used to create feature vector for an image
Adult image classifier trained on feature vectors
Exploring joint image/text analysis
[Diagram labels: Image, Skin Detector, Skin Features, Neural net Classifier, HTML, Text Features, Classifier, Adult?]
28
Adult Detection Examples
These images are all correctly classified as adult images.
29
More Examples
[Image] Classified as not adult
[Image] Classified as not adult
[Image] Incorrectly classified as adult: close-ups of faces are a failure mode due to large amounts of skin
30
Performance of Adult Image Detector
31
Adult Image Detection Results
Two sets of HTML pages collected.
Crawl A: Adult sites (2365 pages, 11323 images).
Crawl B: Non-adult sites (2692 pages, 13973 images).

                                                 image-based   text-based   combined “OR”
                                                 detector      detector     detector
                                                 -----------   ----------   -------------
% of adult images rated correctly (set A):       85.8%         84.9%        93.9%
% of non-adult images rated correctly (set B):   92.5%         98.9%        92.0%
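One plausible reading of the combined “OR” detector is that an image is rated adult when either the image-based or the text-based detector rates it adult. The sketch below assumes that reading. The function name and the toy flags are invented and are not the experiment's data.

```python
def or_detector(image_flags, text_flags):
    """Rate an image adult (True) if either individual detector does."""
    return [img or txt for img, txt in zip(image_flags, text_flags)]


# Toy detector outputs: first four images are adult, last four are not.
image_flags = [True, True, False, False, True, False, False, False]
text_flags  = [True, False, True, False, False, False, False, False]
combined = or_detector(image_flags, text_flags)
print(combined)
```

An OR combination can only add detections, so it catches more adult images than either detector alone, and it also inherits the false alarms of both. That is consistent with the pattern in the table, where the combined detector scores highest on set A and slightly lowest on set B.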
32
Computational Cost Analysis
General image properties
– Average width = 301 pixels
– Average height = 269 pixels
– Time to read an image = 0.078 sec
Skin Color Based Adult Image Detector
– Time to classify = 0.043 sec
– Implies 23 images/sec throughput
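The throughput figure follows from the classification time alone. The short calculation below reproduces it and, as an added assumption not stated on the slide, also shows the rate if reading and classifying ran one after the other.

```python
classify_seconds = 0.043  # time to classify one image, from the slide
read_seconds = 0.078      # time to read one image, from the slide

# Throughput counting classification only, as on the slide.
classify_only_rate = 1.0 / classify_seconds

# Assumption for illustration: read and classify run sequentially per image.
read_and_classify_rate = 1.0 / (read_seconds + classify_seconds)

print(round(classify_only_rate, 1), round(read_and_classify_rate, 1))
```

Under the sequential assumption, reading the image costs more than classifying it.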
33
Person Detection From Skin Detection
Skin detector gives evidence for the presence of people, but has false positives and negatives.
Use skin detector output for person detection
– Construct feature vector from detected skin pixels.
– Classify image into person/non-person
Features
– Percent of pixels in image detected as skin
– Average probability of skin pixels
– Largest connected component of skin
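The three features could be computed from a per-pixel skin probability map roughly as follows. This is a sketch under assumptions: the input is a 2D list of probabilities, a pixel counts as skin above a hypothetical cutoff of 0.5, the average is taken over detected skin pixels only, and components use 4-connectivity. None of these choices is stated in the slides.

```python
from collections import deque


def skin_features(prob_map, cutoff=0.5):
    """Return (fraction of skin pixels, mean skin probability, largest component size)."""
    rows, cols = len(prob_map), len(prob_map[0])
    mask = [[p > cutoff for p in row] for row in prob_map]
    skin_probs = [p for row in prob_map for p in row if p > cutoff]
    n_skin = len(skin_probs)
    fraction_skin = n_skin / (rows * cols)
    mean_skin_prob = sum(skin_probs) / n_skin if n_skin else 0.0

    # Largest 4-connected component of skin pixels, via breadth-first search.
    seen = [[False] * cols for _ in range(rows)]
    largest = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            largest = max(largest, size)
    return fraction_skin, mean_skin_prob, largest


# Toy 3x4 probability map, invented for illustration.
prob_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]
features = skin_features(prob_map)
print(features)
```

The resulting tuple is the kind of small feature vector that a person/non-person classifier could be trained on.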
34
Person Detection Example Results
[Images labeled: Person; No Person]
35
Person Detection Results Continued
[Images labeled: No Person; Person]
36
Person Detector Performance
Two classifiers were built using these measures on 1400 training images.
A test set of 456 images was used to evaluate the classifier.

Classifier Performance   Training examples   Testing examples
----------------------   -----------------   ----------------
Neural network           76.2%               74.3%
Decision tree            75.8%               72.1%
37
Applications of Person Detection
“Person Detected” tag for media search
– Skin and face analysis tag photos and video frames with people in them.
– Improved ranking of query returns: Photos of people appear at top of list.
Image similarity measure
– Photos with people in them are grouped together.
– Can be used during query refinement.
38
Summary of Skin Detection Example
What are the factors that made skin detection successful?
Problem which seemed hard a priori but turned out to be easy (classes surprisingly separable).
Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
Intrinsic dimensions are clear a priori
– Concentration of nonskin model along grey line is completely predictable from the design of perceptual color spaces
39
Perspectives on Pattern Recognition
Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
– Linear and Tree Classifiers
– Gaussian Mixture Classifiers
– Logistic Regression
– Neural Networks
– Support Vector Machines
– Gaussian Process Classifiers
– AdaBoost
– …
40
Statistical Perspective
Statistical Inference Approach
– Probability model $p(C, x \mid \theta)$, where $\theta$ is a vector of parameters estimated from D using statistical inference
– Decision rule is derived from $p(C, x \mid \theta)$
– Two philosophical schools: Frequentist Statistics and Bayesian Statistics
Learning Theory Approach
– Classifiers with distribution-free performance guarantees
– Connections to CS theory, computability, etc.
– Examples: PAC learning, structural risk minimization, etc.
41
Decision Theory Perspective
Three ways to obtain the decision rule f(x)
Generative Modeling
– Model p(x | C) and p(C) using D
– Obtain p(C | x) using Bayes' rule
– Obtain the decision rule from the posterior
Advantages
– Use p(x) for novelty detection
– Sample from p(x) to generate synthetic data and assess model quality
– Use p(C | x) to assess confidence in answer (reject region)
– Easy to compose modules that output posterior probabilities
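For the two-class case, the generative recipe can be sketched as a single application of Bayes' rule. The function name and the numbers are illustrative assumptions. The evidence term computed along the way is p(x), the quantity the slide suggests using for novelty detection.

```python
def posterior_skin(p_x_given_skin, p_x_given_nonskin, prior_skin):
    """Apply Bayes' rule to get p(skin | x) from class likelihoods and a prior."""
    joint_skin = p_x_given_skin * prior_skin
    joint_nonskin = p_x_given_nonskin * (1.0 - prior_skin)
    evidence = joint_skin + joint_nonskin  # this is p(x)
    if evidence == 0.0:
        raise ValueError("x has zero probability under both class models")
    return joint_skin / evidence


# Invented values: x is 4 times more likely under skin, but skin is rarer a priori.
print(posterior_skin(0.02, 0.005, prior_skin=0.2))
print(posterior_skin(0.02, 0.005, prior_skin=0.5))
```

A decision rule then follows by thresholding the posterior, and a posterior near 0.5 marks a low-confidence region where the classifier might decline to answer.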
42
Decision Rule
Discriminative modeling
– Obtain the posterior p(C | x) directly from D
– Derive the decision rule from the posterior
Advantages
– The posterior is often much simpler than the likelihood function
– Posterior more directly related to the classification rule, may yield fewer prediction errors.
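As one concrete instance of discriminative modeling, logistic regression fits p(C = 1 | x) directly without modeling p(x | C). The sketch below is a one-dimensional version trained by plain gradient ascent. The function names, learning rate, epoch count, and toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_1d(xs, labels, lr=0.1, epochs=5000):
    """Fit p(C=1 | x) = sigmoid(w * x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, c in zip(xs, labels):
            p = sigmoid(w * x + b)
            grad_w += (c - p) * x
            grad_b += c - p
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


# Toy 1D data: class 0 at small x, class 1 at large x.
xs = [0.0, 0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_1d(xs, labels)
print(sigmoid(w * 0.0 + b), sigmoid(w * 4.0 + b))
```

The model has only two parameters for the posterior, whereas a generative approach would need a full density model for each class.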