Pattern Recognition Lecture 1 - Overview Jim Rehg School of Interactive Computing Georgia Institute of Technology Atlanta, Georgia USA June 12, 2007.


1 Pattern Recognition Lecture 1 - Overview Jim Rehg School of Interactive Computing Georgia Institute of Technology Atlanta, Georgia USA June 12, 2007

2 J. M. Rehg © 2007 2 Goal
- Learn a function that maps features x to predictions C, given a dataset D = {(C_k, x_k)}
- Elements of the problem
  - Knowledge about the data-generating process and task
  - Design of feature space for x based on data
  - Decision rule f : x → C’
  - Loss function L(C’, C) for measuring quality of prediction
  - Learning algorithm for computing f from D
  - Empirical measurement of classifier performance
  - Visualization of classifier performance and data properties
  - Computational cost of classification (and learning)

3 Example: Skin Detection in Web Images
- Images containing people are interesting
- Most images with people in them contain visible skin
- Skin can be detected in images based on its color
- Goal: Automatic detection of “adult” images
DEC Cambridge Research Lab, 1998

4 Physics of Skin Color
- Skin color is due to melanin and hemoglobin.
- Hue (normalized color) of skin is largely invariant across the human population.
- Saturation of skin color varies with concentration of melanin and hemoglobin (e.g. lips).
- Detailed color models exist for melanoma identification using calibrated illumination.
- But observed skin color will be affected by lighting, image acquisition device, etc.

5 Skin Classification Via Statistical Inference
- Joint work with Michael Jones at DEC CRL
  - M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, IJCV, 2001.
- Model color distribution in skin and nonskin cases
  - Estimate p(RGB | skin) and p(RGB | nonskin)
- Decision rule: f : RGB → {“skin”, “nonskin”}
  - Pixel is “skin” when p(skin | RGB) > p(nonskin | RGB)
- Data set D
  - 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
  - 1 billion hand-labeled pixels in training set

6 Some Example Photos
[Images: example skin images; example non-skin images]

7 Manually Labeling Skin and Nonskin
- Labeled skin pixels are segmented by hand
- Labeled nonskin pixels are easily obtained from images without people

8 Skin Color Modeling Using Histograms
- Feature space design
  - Standard RGB color space: easily available, efficient
- Histogram probability model
  - P(RGB | skin) and P(RGB | nonskin)
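As a rough illustration of the histogram probability model, the sketch below estimates P(RGB | class) by counting labeled pixels in a quantized RGB cube and normalizing. The function names, the NumPy dependency, and the default of 32 bins per channel are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def color_histogram(pixels, bins=32):
    """Estimate P(RGB | class) from an (N, 3) array of 8-bit RGB pixels.

    `bins` is the number of bins per channel (an assumed default).
    """
    pixels = np.asarray(pixels, dtype=np.int64)
    idx = pixels * bins // 256                 # map 0..255 to 0..bins-1
    counts = np.zeros((bins, bins, bins), dtype=np.float64)
    # np.add.at accumulates correctly when several pixels share a bin.
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return counts / counts.sum()               # normalize to a distribution

def lookup(hist, pixels):
    """Return the histogram probability for each pixel in an (N, 3) array."""
    bins = hist.shape[0]
    idx = np.asarray(pixels, dtype=np.int64) * bins // 256
    return hist[idx[:, 0], idx[:, 1], idx[:, 2]]
```

One histogram would be built from the hand-labeled skin pixels and a second from the nonskin pixels; classification then only needs two table lookups per pixel.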

9 Skin Color Histogram
Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors.
[Figure: three views of the same skin histogram]

10 Non-Skin Color Histogram
[Figure: three views of the same non-skin histogram, showing the distribution of non-skin colors]

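11 Decision Rule
Class labels: “skin” C = 1, “nonskin” C = 0

$$P(C=1 \mid RGB) \;\underset{f=0}{\overset{f=1}{\gtrless}}\; P(C=0 \mid RGB)$$

Equivalently:

$$P(RGB \mid C=1)\,P(C=1) \;\underset{f=0}{\overset{f=1}{\gtrless}}\; P(RGB \mid C=0)\,P(C=0)$$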

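12 Likelihood Ratio Test

$$\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \;\underset{f=0}{\overset{f=1}{\gtrless}}\; \frac{P(\text{nonskin})}{P(\text{skin})}$$

$$\frac{P(RGB \mid \text{skin})}{P(RGB \mid \text{nonskin})} \;\underset{f=0}{\overset{f=1}{\gtrless}}\; \theta$$

The ratio of class priors is usually treated as a parameter (threshold) that is adjusted to trade off between types of errors.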
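The likelihood ratio test can be sketched in a few lines. This is only an illustration: the function name and the parameter name `theta` for the threshold are assumptions, and the comparison is written in cross-multiplied form so that empty histogram bins do not cause a division by zero.

```python
import numpy as np

def classify_skin(p_rgb_given_skin, p_rgb_given_nonskin, theta=1.0):
    """Return 1 ("skin") where P(RGB|skin) / P(RGB|nonskin) > theta, else 0.

    Implemented as P(RGB|skin) > theta * P(RGB|nonskin), which is the same
    test whenever the denominator is positive and is safe when it is zero.
    """
    p_skin = np.asarray(p_rgb_given_skin, dtype=float)
    p_non = np.asarray(p_rgb_given_nonskin, dtype=float)
    return (p_skin > theta * p_non).astype(int)
```

Lowering `theta` labels more pixels as skin (more detections, more false positives); raising it does the opposite.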

13 Skin Classifier Architecture
[Diagram: input image → P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test (f = 1 or f = 0) → output “skin”]

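14 Measuring Classifier Quality
- Given a testing set T = {(C_j, x_j)} that was not used for training, apply the classifier to obtain predictions
- Testing set partitioned into four categories: positive examples classified correctly, positive examples classified incorrectly, negative examples classified correctly, and negative examples classified incorrectly
- Indicator function for boolean B: I(B) = 1 if B is true, and 0 otherwise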

15 Measuring Classifier Quality
A standard convention is to report
- Detection rate d_R: fraction of positive examples classified correctly
- False positive rate f_R: fraction of negative examples classified incorrectly
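A minimal sketch of how the two reported fractions could be computed from a labeled test set follows. The function and variable names are illustrative, and the code assumes the test set contains at least one positive and one negative example.

```python
import numpy as np

def detection_and_false_positive_rate(labels, predictions):
    """Return (d_R, f_R) for binary labels C and predictions f in {0, 1}."""
    labels = np.asarray(labels).astype(bool)
    predictions = np.asarray(predictions).astype(bool)
    # The four categories of the test set.
    true_pos = np.sum(predictions & labels)      # positives classified correctly
    false_neg = np.sum(~predictions & labels)    # positives classified incorrectly
    false_pos = np.sum(predictions & ~labels)    # negatives classified incorrectly
    true_neg = np.sum(~predictions & ~labels)    # negatives classified correctly
    d_r = true_pos / (true_pos + false_neg)
    f_r = false_pos / (false_pos + true_neg)
    return float(d_r), float(f_r)
```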

16 Trading Off Types of Errors
- Consider a classifier that always outputs f = 1 regardless of input
  - All positive examples correct, all negative examples incorrect
  - d_R = 1 and f_R = 1
- Consider a classifier that always outputs f = 0 regardless of input
  - All positive examples incorrect, all negative examples correct
  - d_R = 0 and f_R = 0
[Equation: likelihood ratio test, with f = 1 above the threshold and f = 0 below]

17 ROC Curve
[Figure: ROC curve; x-axis False Positive Rate f_R (0 to 1), y-axis Detection Rate d_R (0 to 1)]
- Each sample point on the ROC curve is obtained by scoring T with a particular threshold
- Generating the ROC curve does not require classifier retraining

18 ROC Curve
[Figure: ROC curve; x-axis False Positive Rate f_R (0 to 1), y-axis Detection Rate d_R (0 to 1)]
- A fair way to compare two classifiers is to show their ROC curves for the same T
- ROC stands for “Receiver Operating Characteristic” and was originally developed for tuning radar receivers

19 Scalar Measures of Classifier Performance
[Figure: ROC curve; x-axis False Positive Rate f_R (0 to 1), y-axis Detection Rate d_R (0 to 1); annotated with the equal error rate and the area under the ROC curve]

20 ROC Curve Summary
- ROC curve gives an “application independent” measure of classifier performance
- Performance reports based on a single point on the ROC curve are generally meaningless
- Several possible scalar “summaries”
  - Area under the ROC curve
  - Equal error rate
- Compute the ROC curve by iterating over the values of the threshold
  - Compute the detection and false positive rates on the testing set for each threshold value and plot the resulting point
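The threshold sweep described above might look like the following sketch. It assumes each test example already has a scalar score (for instance its likelihood ratio), that larger scores mean "more skin-like", and that both classes are present in the test set; the names `roc_curve` and `area_under_curve` are chosen for this example. The area is computed with a hand-written trapezoid rule.

```python
import numpy as np

def roc_curve(scores, labels, thresholds):
    """Return an array of (f_R, d_R) points, one per threshold value."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    points = []
    for theta in thresholds:
        predicted = scores > theta          # no retraining, only re-thresholding
        d_r = np.mean(predicted[labels])    # fraction of positives detected
        f_r = np.mean(predicted[~labels])   # fraction of negatives flagged
        points.append((f_r, d_r))
    return np.array(points)

def area_under_curve(points):
    """Trapezoid-rule area under a set of (f_R, d_R) points."""
    order = np.lexsort((points[:, 1], points[:, 0]))  # sort by f_R, then d_R
    f, d = points[order, 0], points[order, 1]
    return float(np.sum(np.diff(f) * (d[1:] + d[:-1]) / 2.0))
```

Including thresholds of minus and plus infinity in the sweep adds the two trivial classifiers (always f = 1, always f = 0) as the end points (1, 1) and (0, 0) of the curve.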

21 Example Results
[Images: skin examples; nonskin examples]

22 Skin Detector Performance
[Figure: ROC curve of the skin detector; x-axis False Positive Rate f_R, y-axis Detection Rate d_R]
- Extremely good results considering only the color of a single pixel is being used
- Best published results (at the time)
- One of the largest datasets used in a vision model (nearly 1 billion labeled pixels)
But why does it work so well?

23 Analyzing the color distributions
Why does it work so well?
[Figure: 2D color histogram for photos on the web, projected onto a slice through the 3D histogram]
[Figure: surface plot of the 2D histogram]

24 Contour Plots
[Figure: contour plot of the full color model (includes skin and non-skin)]

25 Contour Plots Continued
[Figures: contour plots of the non-skin model and the skin model]
Skin color distribution is surprisingly well-separated from the background distribution of color in web images.

26 Comparison to Mixture Models
- Both histogram and mixture models are examples of graphical models.
- Bin size controls generalization of histogram
  - Size 32 gave the best performance
- Mixture models have often been used for skin color modeling in small sample size cases.
- We found histograms to give better accuracy
- They are also much faster to evaluate

27 Adult Image Detection
- Observation: Adult images usually contain large areas of skin
- Output of skin detector can be used to create feature vector for an image
- Adult image classifier trained on feature vectors
- Exploring joint image/text analysis
[Diagram: Image → Skin Detector → Skin Features → Neural net Classifier; HTML → Text Features → Classifier; output: Adult?]

28 Adult Detection Examples
These images are all correctly classified as adult images.

29 More Examples
- Classified as not adult
- Classified as not adult
- Incorrectly classified as adult: close-ups of faces are a failure mode due to large amounts of skin

30 Performance of Adult Image Detector

31 Adult Image Detection Results
Two sets of HTML pages collected.
Crawl A: Adult sites (2365 pages, 11323 images).
Crawl B: Non-adult sites (2692 pages, 13973 images).

                                                 image-based   text-based   combined “OR”
                                                 detector      detector     detector
                                                 -----------   ----------   -------------
% of adult images rated correctly (set A):       85.8%         84.9%        93.9%
% of non-adult images rated correctly (set B):   92.5%         98.9%        92.0%

32 Computational Cost Analysis
- General image properties
  - Average width = 301 pixels
  - Average height = 269 pixels
  - Time to read an image = 0.078 sec
- Skin Color Based Adult Image Detector
  - Time to classify = 0.043 sec
  - Implies 23 images/sec throughput
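The throughput figure follows directly from the classification time. The second expression is an added assumption, not a number from the slides: it shows what the rate would be if reading and classifying each image happened strictly one after the other.

```latex
\frac{1}{0.043\ \text{s/image}} \approx 23.3\ \text{images/s}
\qquad\qquad
\frac{1}{(0.078 + 0.043)\ \text{s/image}} = \frac{1}{0.121\ \text{s/image}} \approx 8.3\ \text{images/s}
```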

33 Person Detection From Skin Detection
- Skin detector gives evidence for the presence of people, but has false positives and negatives.
- Use skin detector output for person detection
  - Construct feature vector from detected skin pixels.
  - Classify image into person/non-person
- Features
  - Percent of pixels in image detected as skin
  - Average probability of skin pixels
  - Largest connected component of skin
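The three features listed above could be computed along the following lines. This is a sketch under several assumptions that the slides do not specify: the detector output is a per-pixel skin probability map, a pixel counts as skin above a hypothetical threshold of 0.5, "average probability" is taken over the detected skin pixels, components use 4-connectivity, and the largest component is reported as a fraction of the image area.

```python
from collections import deque
import numpy as np

def largest_component_size(mask):
    """Size in pixels of the largest 4-connected region of True values."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    best = 0
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from an unvisited skin pixel.
            size, queue = 0, deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

def skin_features(skin_prob, threshold=0.5):
    """Return [fraction skin, mean skin probability, largest component fraction]."""
    skin_prob = np.asarray(skin_prob, dtype=float)
    mask = skin_prob > threshold
    fraction_skin = mask.sum() / mask.size
    mean_prob = float(skin_prob[mask].mean()) if mask.any() else 0.0
    largest = largest_component_size(mask) / mask.size
    return np.array([fraction_skin, mean_prob, largest])
```

The resulting vector is the kind of input a person/non-person classifier could be trained on.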

34 Person Detection Example Results
[Images: examples classified as Person and as No Person]

35 Person Detection Results Continued
[Images: examples classified as No Person and as Person]

36 Person Detector Performance
Two classifiers were built using these measures on 1400 training images. A test set of 456 images was used to evaluate the classifiers.

Classifier Performance
Classifier       Training examples   Testing examples
Neural network   76.2%               74.3%
Decision tree    75.8%               72.1%

37 Applications of Person Detection
- “Person Detected” tag for media search
  - Skin and face analysis tag photos and video frames with people in them.
  - Improved ranking of query returns: Photos of people appear at top of list.
- Image similarity measure
  - Photos with people in them are grouped together.
  - Can be used during query refinement.

38 Summary of Skin Detection Example
- What are the factors that made skin detection successful?
  - Problem which seemed hard a priori but turned out to be easy (classes surprisingly separable).
  - Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
  - Intrinsic dimensions are clear a priori
    - Concentration of nonskin model along grey line is completely predictable from the design of perceptual color spaces

39 Perspectives on Pattern Recognition
- Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
  - Linear and Tree Classifiers
  - Gaussian Mixture Classifiers
  - Logistic Regression
  - Neural Networks
  - Support Vector Machines
  - Gaussian Process Classifiers
  - AdaBoost
  - …

40 Statistical Perspective
- Statistical Inference Approach
  - Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
  - Decision rule is derived from p(C, x | θ)
  - Two philosophical schools
    - Frequentist Statistics
    - Bayesian Statistics
- Learning Theory Approach
  - Classifiers with distribution-free performance guarantees
  - Connections to CS theory, computability, etc.
  - Examples: PAC learning, structural risk minimization, etc.

41 Decision Theory Perspective
- Three ways to obtain the decision rule f(x)
- Generative Modeling
  - Model p(x | C) and p(C) using D
  - Obtain p(C | x) using Bayes Rule
  - Obtain the decision rule from the posterior
  - Advantages
    - Use p(x) for novelty detection
    - Sample from p(x) to generate synthetic data and assess model quality
    - Use p(C | x) to assess confidence in answer (reject region)
    - Easy to compose modules that output posterior probabilities
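Written out, the generative route uses Bayes rule to turn the class-conditional model and the prior into a posterior. The arg-max rule in the last line is one common choice, given here as an assumption: it minimizes the error rate when all mistakes cost the same, and other loss functions lead to other rules.

```latex
p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)},
\qquad
p(x) = \sum_{C'} p(x \mid C')\, p(C')

f(x) = \arg\max_{C} \; p(C \mid x)
```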

42 Decision Rule
- Discriminative modeling
  - Obtain the posterior p(C | x) directly from D
  - Derive the decision rule from the posterior
  - Advantages
    - The posterior is often much simpler than the likelihood function
    - Posterior more directly related to the classification rule, may yield fewer prediction errors.

