Pattern Recognition Lecture 1 - Overview Jim Rehg School of Interactive Computing Georgia Institute of Technology Atlanta, Georgia USA June 12, 2007.


J. M. Rehg © 2007 2 Goal
- Learn a function that maps features x to predictions C, given a dataset D = {(C_k, x_k)}
- Elements of the problem
  - Knowledge about data-generating process and task
  - Design of feature space for x based on data
  - Decision rule f : x → C’
  - Loss function L(C’, C) for measuring quality of prediction
  - Learning algorithm for computing f from D
  - Empirical measurement of classifier performance
  - Visualization of classifier performance and data properties
  - Computational cost of classification (and learning)

J. M. Rehg © 2007 3 Example: Skin Detection in Web Images
- Images containing people are interesting.
- Most images with people in them contain visible skin.
- Skin can be detected in images based on its color.
- Goal: Automatic detection of “adult” images
DEC Cambridge Research Lab, 1998

J. M. Rehg © 2007 4 Physics of Skin Color
- Skin color is due to melanin and hemoglobin.
- Hue (normalized color) of skin is largely invariant across the human population.
- Saturation of skin color varies with concentration of melanin and hemoglobin (e.g., lips).
- Detailed color models exist for melanoma identification using calibrated illumination.
- But observed skin color will be affected by lighting, image acquisition device, etc.

J. M. Rehg © 2007 5 Skin Classification Via Statistical Inference
- Joint work with Michael Jones at DEC CRL
  - M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, IJCV, 2001.
- Model color distribution in skin and nonskin cases
  - Estimate p(RGB | skin) and p(RGB | nonskin)
- Decision rule: f : RGB → {“skin”, “nonskin”}
  - Pixel is “skin” when p(skin | RGB) > p(nonskin | RGB)
- Data set D
  - 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl
  - 1 billion hand-labeled pixels in training set
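The decision rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the toy pixels, the 32 bins per channel, and the equal class prior are all assumptions made for the example. By Bayes' rule, p(skin | RGB) > p(nonskin | RGB) is equivalent to p(RGB | skin) p(skin) > p(RGB | nonskin) p(nonskin), so only the two class-conditional histograms and a prior are needed.

```python
import numpy as np

def fit_histogram(pixels, bins_per_channel=32):
    """Estimate p(RGB | class) as a normalized 3D histogram of 8-bit RGB pixels.

    bins_per_channel is an assumed parameter; it must divide 256.
    """
    step = 256 // bins_per_channel
    idx = np.asarray(pixels, dtype=np.int64) // step
    hist = np.zeros((bins_per_channel,) * 3, dtype=np.float64)
    # Accumulate one count per pixel into its (R, G, B) bin.
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return hist / hist.sum()

def classify(pixels, p_rgb_skin, p_rgb_nonskin, prior_skin=0.5):
    """Return True where p(skin | RGB) > p(nonskin | RGB).

    prior_skin = 0.5 is an assumption for this sketch.
    """
    step = 256 // p_rgb_skin.shape[0]
    idx = np.asarray(pixels, dtype=np.int64) // step
    like_skin = p_rgb_skin[idx[:, 0], idx[:, 1], idx[:, 2]]
    like_non = p_rgb_nonskin[idx[:, 0], idx[:, 1], idx[:, 2]]
    # The evidence p(RGB) cancels, so comparing the numerators is enough.
    return like_skin * prior_skin > like_non * (1.0 - prior_skin)

# Hypothetical toy training pixels (not data from the study).
skin_train = np.array([[220, 170, 140], [200, 150, 120], [230, 180, 150]])
nonskin_train = np.array([[30, 90, 200], [128, 128, 128], [20, 200, 40]])

p_skin = fit_histogram(skin_train)
p_non = fit_histogram(nonskin_train)
labels = classify(np.array([[222, 171, 141], [25, 92, 203]]), p_skin, p_non)
```

A pixel that falls in a bin neither histogram has seen gets zero likelihood under both models and is labeled nonskin here; a real system would need enough data (or smoothing) to make that case rare.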

J. M. Rehg © 2007 14 Measuring Classifier Quality
- Given a testing set T = {(C_j, x_j)} that was not used for training, apply the classifier to obtain predictions
- Testing set partitioned into four categories
- Indicator function for boolean B:

J. M. Rehg © 2007 16 Trading Off Types of Errors
- Consider: classifier always outputs f = 1 regardless of input
  - All positive examples correct, all negative examples incorrect
  - d_R = 1 and f_R = 1
- Consider: classifier always outputs f = 0 regardless of input
  - All positive examples incorrect, all negative examples correct
  - d_R = 0 and f_R = 0
[Equation: thresholded decision rule, f = 1 for ">" and f = 0 for "<"; the compared quantities are missing from the transcript.]
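The four-way partition of the testing set and the two constant classifiers can be checked numerically. The slide formulas did not survive in this transcript, so the sketch below assumes the standard definitions: detection rate d_R = TP / (TP + FN) and false positive rate f_R = FP / (FP + TN). Function names and the toy labels are hypothetical.

```python
import numpy as np

def confusion_counts(truth, pred):
    """Count true positives, false positives, true negatives, false negatives."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, tn, fn

def rates(truth, pred):
    """Return (d_R, f_R) under the assumed standard definitions."""
    tp, fp, tn, fn = confusion_counts(truth, pred)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical testing labels and predictions.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(truth, pred)
d_r, f_r = rates(truth, pred)

# The two degenerate classifiers from the slide.
always_one = rates(truth, np.ones(len(truth)))
always_zero = rates(truth, np.zeros(len(truth)))
```

The constant classifiers land on the two corners (1, 1) and (0, 0), which is why a single rate on its own says little about classifier quality.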

J. M. Rehg © 2007 17 ROC Curve
[Figure: ROC curve. Axes: False Positive Rate f_R (0 to 1), Detection Rate d_R (0 to 1).]
- Each sample point on the ROC curve is obtained by scoring T with a particular threshold value
- Generating the ROC curve does not require classifier retraining

J. M. Rehg © 2007 18 ROC Curve
[Figure: ROC curve. Axes: False Positive Rate f_R (0 to 1), Detection Rate d_R (0 to 1).]
- A fair way to compare two classifiers is to show their ROC curves for the same T
- ROC stands for “Receiver Operating Characteristic” and was originally developed for tuning radar receivers

J. M. Rehg © 2007 20 ROC Curve Summary
- ROC curve gives an “application independent” measure of classifier performance
- Performance reports based on a single point on the ROC curve are generally meaningless
- Several possible scalar “summaries”
  - Area under the ROC curve
  - Equal error rate
- Compute the ROC by iterating over the values of the decision threshold
  - Compute the detection and false positive rates on the testing set for each threshold value and plot the resulting point.
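The threshold sweep and the two scalar summaries can be sketched as follows. This is an illustrative implementation under stated assumptions: the classifier produces a real-valued score per example, an example is labeled positive when its score is at least the threshold, the area is computed with the trapezoid rule, and the equal error rate is taken at the sampled point where the false positive rate is closest to the miss rate 1 - d_R. Function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_points(scores, truth):
    """Sweep the threshold over the observed scores; no retraining is involved."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    # Start above every score (nothing detected), then lower the threshold.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = truth.sum()
    n_neg = (~truth).sum()
    f_r, d_r = [], []
    for t in thresholds:
        pred = scores >= t
        d_r.append(np.sum(pred & truth) / n_pos)
        f_r.append(np.sum(pred & ~truth) / n_neg)
    return np.array(f_r), np.array(d_r), thresholds

def area_under_curve(f_r, d_r):
    """Trapezoid-rule area under the sampled ROC curve."""
    return float(np.sum(np.diff(f_r) * (d_r[1:] + d_r[:-1]) / 2.0))

def equal_error_rate(f_r, d_r):
    """Error rate at the sampled point where f_R is closest to 1 - d_R."""
    miss = 1.0 - d_r
    i = int(np.argmin(np.abs(f_r - miss)))
    return float((f_r[i] + miss[i]) / 2.0)

# Hypothetical scores on a small testing set.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
truth = [1, 1, 0, 1, 0, 1, 0, 0]

f_r, d_r, thresholds = roc_points(scores, truth)
auc = area_under_curve(f_r, d_r)
eer = equal_error_rate(f_r, d_r)
```

On this toy set the curve runs from (0, 0) to (1, 1), with an area of 0.8125 and an equal error rate of 0.25.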

J. M. Rehg © 2007 22 Skin Detector Performance
[Figure: skin detector ROC plot. Axes: False Positive Rate f_R, Detection Rate d_R.]
- Extremely good results, considering that only the color of a single pixel is being used.
- Best published results (at the time).
- One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).
But why does it work so well?

J. M. Rehg © 2007 23 Analyzing the color distributions
[Figure: 2D color histogram for photos on the web, projected onto a slice through the 3D histogram.]
[Figure: Surface plot of the 2D histogram.]
Why does it work so well?

J. M. Rehg © 2007 26 Comparison to Mixture Models
- Both histogram and mixture models are examples of graphical models.
- Bin size controls generalization of histogram
  - Size 32 gave the best performance
- Mixture models have often been used for skin color modeling in small sample size cases.
- We found histograms to give better accuracy
- They are also much faster to evaluate

J. M. Rehg © 2007 27 Adult Image Detection
- Observation: Adult images usually contain large areas of skin
- Output of skin detector can be used to create feature vector for an image
- Adult image classifier trained on feature vectors
- Exploring joint image/text analysis
[Block diagram labels: Image, Skin Detector, Skin Features, Neural Net Classifier; HTML, Text Features, Classifier; output “Adult?”]

J. M. Rehg © 2007 31 Adult Image Detection Results
Two sets of HTML pages collected.
- Crawl A: Adult sites (2365 pages, 11323 images).
- Crawl B: Non-adult sites (2692 pages, 13973 images).

                                                 image-based   text-based   combined “OR”
                                                 detector      detector     detector
                                                 -----------   ----------   -------------
% of adult images rated correctly (set A):       85.8%         84.9%        93.9%
% of non-adult images rated correctly (set B):   92.5%         98.9%        92.0%
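The slides do not spell out how the combined detector is formed, so the sketch below assumes the simplest reading of “OR”: an image is rated adult when either the image-based or the text-based detector flags it. Under that assumption the combined detector can only gain on set A and only lose on set B relative to each single detector, which matches the pattern in the table (93.9% is above both 85.8% and 84.9%; 92.0% is below both 92.5% and 98.9%). All flags below are hypothetical.

```python
import numpy as np

def combine_or(image_flags, text_flags):
    """Flag an item as adult when either detector flags it (assumed rule)."""
    return np.asarray(image_flags, dtype=bool) | np.asarray(text_flags, dtype=bool)

# Hypothetical detector outputs (1 = flagged as adult).
image_on_adult = [1, 1, 0, 1, 0]
text_on_adult = [1, 0, 1, 1, 0]
image_on_clean = [0, 0, 1, 0, 0]
text_on_clean = [0, 0, 0, 1, 0]

# Fraction of adult items rated correctly (flagged).
adult_correct = float(combine_or(image_on_adult, text_on_adult).mean())
# Fraction of non-adult items rated correctly (not flagged).
clean_correct = 1.0 - float(combine_or(image_on_clean, text_on_clean).mean())
```

In this toy case each detector alone is right on 3 of 5 adult items and 4 of 5 non-adult items, while the OR combination is right on 4 of 5 adult items and 3 of 5 non-adult items.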

J. M. Rehg © 2007 32 Computational Cost Analysis
- General image properties
  - Average width = 301 pixels
  - Average height = 269 pixels
  - Time to read an image = 0.078 sec
- Skin Color Based Adult Image Detector
  - Time to classify = 0.043 sec
  - Implies 23 images/sec throughput
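The throughput figure can be reproduced from the timings on the slide. The arithmetic suggests that 23 images/sec counts classification time only, since 1 / 0.043 is about 23.3; including the read time as well is an extra calculation added here for comparison, not a number from the slides.

```python
# Timings and average image size taken from the slide.
read_time_s = 0.078
classify_time_s = 0.043
avg_width_px, avg_height_px = 301, 269

# Throughput when only classification time is counted.
classify_only_rate = 1.0 / classify_time_s

# Throughput when each image must also be read first (added for comparison).
read_and_classify_rate = 1.0 / (read_time_s + classify_time_s)

# Pixels in an average-sized image.
avg_pixels = avg_width_px * avg_height_px
```

With reading included, the rate drops to roughly 8 images/sec, so image I/O rather than classification would dominate the per-image cost.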

J. M. Rehg © 2007 33 Person Detection From Skin Detection
- Skin detector gives evidence for the presence of people, but has false positives and negatives.
- Use skin detector output for person detection
  - Construct feature vector from detected skin pixels.
  - Classify image into person/non-person
- Features
  - Percent of pixels in image detected as skin
  - Average probability of skin pixels
  - Largest connected component of skin
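The three features listed above can be sketched from a per-pixel skin probability map. Several details are assumptions, since the slides do not give them: the 0.5 detection threshold, reading “average probability of skin pixels” as the mean probability over detected pixels, 4-connectivity, and normalizing the largest component by image size. The function names and the toy map are hypothetical.

```python
from collections import deque

import numpy as np

def largest_component_size(mask):
    """Pixel count of the largest 4-connected region of True values."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    best = 0
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from this unvisited skin pixel.
            size = 0
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

def skin_features(prob_skin, threshold=0.5):
    """Return [fraction skin, mean skin probability, largest component fraction]."""
    prob_skin = np.asarray(prob_skin, dtype=float)
    mask = prob_skin > threshold
    n_pixels = mask.size
    frac_skin = mask.sum() / n_pixels
    mean_prob = float(prob_skin[mask].mean()) if mask.any() else 0.0
    largest = largest_component_size(mask) / n_pixels
    return np.array([frac_skin, mean_prob, largest])

# Hypothetical 3x4 skin probability map.
prob = [[0.9, 0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1, 0.6],
        [0.1, 0.1, 0.1, 0.1]]
features = skin_features(prob)
```

The resulting vector is what a person/non-person classifier would take as input, one vector per image.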

J. M. Rehg © 2007 36 Person Detector Performance
Two classifiers were built using these measures on 1400 training images. A test set of 456 images was used to evaluate the classifiers.

Classifier performance
                   Training examples   Testing examples
Neural network     76.2%               74.3%
Decision tree      75.8%               72.1%

J. M. Rehg © 2007 37 Applications of Person Detection
- “Person Detected” tag for media search
  - Skin and face analysis tag photos and video frames with people in them.
  - Improved ranking of query returns: Photos of people appear at top of list.
- Image similarity measure
  - Photos with people in them are grouped together.
  - Can be used during query refinement.

J. M. Rehg © 2007 38 Summary of Skin Detection Example
- What are the factors that made skin detection successful?
  - A problem which seemed hard a priori but turned out to be easy (classes surprisingly separable).
  - Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
  - Intrinsic dimensions are clear a priori
    - Concentration of nonskin model along grey line is completely predictable from the design of perceptual color spaces

J. M. Rehg © 2007 39 Perspectives on Pattern Recognition
- Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
  - Linear and Tree Classifiers
  - Gaussian Mixture Classifiers
  - Logistic Regression
  - Neural Networks
  - Support Vector Machines
  - Gaussian Process Classifiers
  - AdaBoost
  - …

J. M. Rehg © 2007 40 Statistical Perspective
- Statistical Inference Approach
  - Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference
  - Decision rule is derived from p(C, x | θ)
  - Two philosophical schools
    - Frequentist Statistics
    - Bayesian Statistics
- Learning Theory Approach
  - Classifiers with distribution-free performance guarantees
  - Connections to CS theory, computability, etc.
  - Examples: PAC learning, structural risk minimization, etc.

J. M. Rehg © 2007 41 Decision Theory Perspective
- Three ways to obtain the decision rule f(x)
- Generative Modeling
  - Model p(x | C) and p(C) using D
  - Obtain p(C | x) using Bayes Rule
  - Obtain the decision rule from the posterior
  - Advantages
    - Use p(x) for novelty detection
    - Sample from p(x) to generate synthetic data and assess model quality
    - Use p(C | x) to assess confidence in answer (reject region)
    - Easy to compose modules that output posterior probabilities
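The generative route can be written out as a short worked equation. The first line is Bayes' rule applied to the modeled quantities p(x | C) and p(C); the marginal p(x) in the denominator is the same quantity the slide proposes for novelty detection. The second line is one common way to turn the posterior into a decision rule and assumes a 0-1 loss, which the slide does not state.

```latex
p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)},
\qquad
p(x) = \sum_{c} p(x \mid c)\, p(c)

f(x) = \arg\max_{C} \; p(C \mid x)
     = \arg\max_{C} \; p(x \mid C)\, p(C)
```

Because p(x) does not depend on C, it drops out of the maximization, which is why the skin classifier earlier in the lecture only needs to compare the two class-conditional terms.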

J. M. Rehg © 2007 42 Decision Rule
- Discriminative modeling
  - Obtain the posterior p(C | x) directly from D
  - Derive the decision rule from the posterior
  - Advantages
    - The posterior is often much simpler than the likelihood function
    - Posterior more directly related to the classification rule, may yield fewer prediction errors.
