Pattern Recognition Lecture 1 - Overview Jim Rehg School of Interactive Computing Georgia Institute of Technology Atlanta, Georgia USA June 12, 2007

Goal

Learn a function that maps features x to predictions C, given a dataset D = {(C_k, x_k)}.

Elements of the problem (a toy sketch follows below):
- Knowledge about the data-generating process and the task
- Design of a feature space for x based on the data
- Decision rule f: x → C′
- Loss function L(C′, C) for measuring the quality of a prediction
- Learning algorithm for computing f from D
- Empirical measurement of classifier performance
- Visualization of classifier performance and data properties
- Computational cost of classification (and learning)
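To make these elements concrete, here is a minimal, purely illustrative Python sketch; the function names, the scalar feature, and the toy dataset are my assumptions, not part of the lecture:

    import numpy as np

    def decision_rule(x, threshold=0.5):
        # Toy decision rule f: x -> C' that thresholds a scalar feature
        return 1 if x > threshold else 0

    def zero_one_loss(c_pred, c_true):
        # Loss L(C', C): 1 for a misclassification, 0 for a correct prediction
        return int(c_pred != c_true)

    def empirical_error(dataset):
        # Average loss of the decision rule over D = {(C_k, x_k)}
        return np.mean([zero_one_loss(decision_rule(x), c) for c, x in dataset])

    # Toy dataset D of (C_k, x_k) pairs
    D = [(1, 0.9), (0, 0.2), (1, 0.7), (0, 0.6)]
    print(empirical_error(D))  # 0.25: the rule errs on the last example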

Example: Skin Detection in Web Images

- Images containing people are interesting.
- Most images with people in them contain visible skin.
- Skin can be detected in images based on its color.
- Goal: automatic detection of "adult" images.

DEC Cambridge Research Lab, 1998

Physics of Skin Color

- Skin color is due to melanin and hemoglobin.
- The hue (normalized color) of skin is largely invariant across the human population.
- The saturation of skin color varies with the concentration of melanin and hemoglobin (e.g., lips).
- Detailed color models exist for melanoma identification under calibrated illumination.
- But observed skin color will be affected by lighting, the image acquisition device, etc.

Skin Classification via Statistical Inference

- Joint work with Michael Jones at DEC CRL: M. Jones and J. M. Rehg, "Statistical Color Models with Application to Skin Detection", IJCV, 2002.
- Model the color distribution in the skin and nonskin cases: estimate p(RGB | skin) and p(RGB | nonskin).
- Decision rule f: RGB → {"skin", "nonskin"}: a pixel is "skin" when p(skin | RGB) > p(nonskin | RGB).
- Dataset D: 12,000 example photos sampled from a 2 million image set obtained from an AltaVista web crawl; 1 billion hand-labeled pixels in the training set.

Some Example Photos

[figures: example skin images and example non-skin images]

Manually Labeling Skin and Nonskin

Labeled skin pixels are segmented by hand; labeled nonskin pixels are easily obtained from images without people.

Skin Color Modeling Using Histograms

- Feature space design: the standard RGB color space (easily available, efficient).
- Histogram probability models: P(RGB | skin) and P(RGB | nonskin) (a sketch follows below).
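A minimal sketch of the histogram model in Python; the NumPy implementation and the function names are my assumptions, and the 32-bins-per-channel setting anticipates the comparison slide later in the lecture:

    import numpy as np

    BINS = 32  # bins per channel; the lecture reports that size 32 worked best

    def build_histogram(pixels):
        # Estimate P(RGB | class) as a normalized 3D histogram from an
        # (N, 3) array of labeled RGB pixel values for one class.
        hist, _ = np.histogramdd(pixels, bins=(BINS,) * 3,
                                 range=((0, 256),) * 3)
        return hist / hist.sum()  # normalize counts into probabilities

    def lookup(hist, pixels):
        # Evaluate the histogram model at each pixel's RGB value.
        idx = np.asarray(pixels) // (256 // BINS)
        return hist[idx[:, 0], idx[:, 1], idx[:, 2]]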

Skin Color Histogram

Segmented skin regions produce a histogram in RGB space showing the distribution of skin colors. [figure: three views of the same skin histogram]

Non-Skin Color Histogram

[figure: three views of the same non-skin histogram, showing the distribution of non-skin colors]

Decision Rule

Class labels: "skin" C = 1, "nonskin" C = 0.

Predict f = 1 when p(C=1 | RGB) > p(C=0 | RGB), and f = 0 otherwise. Equivalently, by Bayes rule: f = 1 when p(RGB | C=1) P(C=1) > p(RGB | C=0) P(C=0).

Likelihood Ratio Test

Predict f = 1 when p(RGB | C=1) / p(RGB | C=0) > θ, and f = 0 otherwise, where θ = P(C=0) / P(C=1). The ratio of class priors is usually treated as a parameter (threshold) that is adjusted to trade off between the types of errors.
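Continuing the histogram sketch above (reusing its lookup helper), a hedged rendering of the likelihood ratio test; the epsilon guard for empty bins is my own assumption:

    def classify_skin(pixels, skin_hist, nonskin_hist, theta=1.0):
        # Likelihood ratio test: f = 1 where p(RGB|skin) / p(RGB|nonskin) > theta.
        # theta stands in for the prior ratio P(C=0)/P(C=1) and is the knob
        # adjusted to trade off between the two types of errors.
        p_skin = lookup(skin_hist, pixels)
        p_nonskin = lookup(nonskin_hist, pixels)
        ratio = p_skin / np.maximum(p_nonskin, 1e-12)  # guard empty bins
        return (ratio > theta).astype(int)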

Skin Classifier Architecture

[diagram: input image → per-pixel lookup of P(RGB | skin) and P(RGB | nonskin) → likelihood ratio test (f = 1 / f = 0) → output "skin" map]

J. M. Rehg © Measuring Classifier Quality  Given a testing set T = {C j, x j } that was not used for training, apply the classifier to obtain predictions  Testing set partitioned into four categories Indicator function for boolean B:

Measuring Classifier Quality (cont.)

A standard convention is to report:
- the detection rate d_R: the fraction of positive examples classified correctly, and
- the false positive rate f_R: the fraction of negative examples classified incorrectly.
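A small sketch of these two rates, assuming binary labels with 1 = positive; the guard against an empty class is my addition:

    import numpy as np

    def detection_and_fp_rates(y_true, y_pred):
        # d_R = TP / #positives, f_R = FP / #negatives over the testing set T
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        d_r = tp / max(np.sum(y_true == 1), 1)
        f_r = fp / max(np.sum(y_true == 0), 1)
        return d_r, f_r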

Trading Off Types of Errors

- Consider a classifier that always outputs f = 1 regardless of input: all positive examples are correct and all negative examples are incorrect, so d_R = 1 and f_R = 1.
- Consider a classifier that always outputs f = 0 regardless of input: all positive examples are incorrect and all negative examples are correct, so d_R = 0 and f_R = 0.

ROC Curve

[figure: ROC curve, detection rate d_R vs. false positive rate f_R]

Each sample point on the ROC curve is obtained by scoring T with a particular θ. Generating the ROC curve does not require retraining the classifier.

ROC Curve (cont.)

[figure: ROC curves, detection rate d_R vs. false positive rate f_R]

A fair way to compare two classifiers is to show their ROC curves for the same T. ROC stands for "Receiver Operating Characteristic"; the technique was originally developed for tuning radar receivers.

Scalar Measures of Classifier Performance

[figure: ROC curve, detection rate d_R vs. false positive rate f_R, marking the equal error rate and the area under the ROC curve]

ROC Curve Summary

- The ROC curve gives an "application independent" measure of classifier performance.
- Performance reports based on a single point on the ROC curve are generally meaningless.
- Several scalar "summaries" are possible: the area under the ROC curve and the equal error rate.
- Compute the ROC by iterating over the values of θ: compute the detection and false positive rates on the testing set for each value of θ and plot the resulting point (sketched below).
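Putting the previous sketches together, a hypothetical ROC sweep that reuses classify_skin and detection_and_fp_rates from above:

    def roc_curve(pixels, labels, skin_hist, nonskin_hist, thetas):
        # Sweep the threshold theta to trace out the ROC curve; the classifier
        # is scored on the same testing set each time, with no retraining.
        points = []
        for theta in thetas:
            preds = classify_skin(pixels, skin_hist, nonskin_hist, theta)
            points.append(detection_and_fp_rates(labels, preds))
        return points  # one (d_R, f_R) pair per theta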

Example Results

[figures: skin detector output on skin examples and on nonskin examples]

Skin Detector Performance

[figure: ROC curve, detection rate d_R vs. false positive rate f_R]

- Extremely good results, considering that only the color of a single pixel is being used.
- Best published results (at the time).
- One of the largest datasets used in a vision model (nearly 1 billion labeled pixels).

But why does it work so well?

Analyzing the Color Distributions

Why does it work so well? [figures: 2D color histogram for photos on the web, obtained by projecting the 3D histogram onto a slice, and a surface plot of the 2D histogram]

Contour Plots

[figure: contour plot of the full color model (includes skin and non-skin)]

Contour Plots Continued

[figures: contour plots of the skin model and the non-skin model]

The skin color distribution is surprisingly well-separated from the background distribution of color in web images.

Comparison to Mixture Models

- Both histogram and mixture models are examples of graphical models.
- Bin size controls the generalization of the histogram; size 32 gave the best performance.
- Mixture models have often been used for skin color modeling in small-sample-size cases.
- We found histograms to give better accuracy; they are also much faster to evaluate (a mixture-model sketch for comparison follows below).
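For contrast with the histogram sketch above, a hypothetical mixture-model density using scikit-learn; this is not the lecture's implementation, and the component count is an arbitrary choice:

    from sklearn.mixture import GaussianMixture

    def fit_mixture(pixels, n_components=16):
        # Fit a Gaussian mixture model of p(RGB | class) to an (N, 3)
        # array of labeled pixels for one class.
        return GaussianMixture(n_components=n_components).fit(pixels)

    # Evaluating the density sums over all components for every pixel:
    #   densities = np.exp(gmm.score_samples(pixels))
    # which is why the histogram's constant-time table lookup is much faster.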

Adult Image Detection

- Observation: adult images usually contain large areas of skin.
- The output of the skin detector can be used to create a feature vector for an image.
- An adult image classifier is trained on the feature vectors.
- Exploring joint image/text analysis.

[diagram: image → skin detector → skin features → neural net classifier; HTML → text features → classifier; combined → adult?]

Adult Detection Examples

[figure: images all correctly classified as adult images]

More Examples

[figures: two images correctly classified as not adult, and one incorrectly classified as adult]

Close-ups of faces are a failure mode due to large amounts of skin.

Performance of Adult Image Detector

[figure]

Adult Image Detection Results

Two sets of HTML pages were collected. Crawl A: adult sites (2365 pages, images). Crawl B: non-adult sites (2692 pages, images).

                                                   image-based   text-based   combined "OR"
                                                   detector      detector     detector
    % of adult images rated correctly (set A):     85.8%         84.9%        93.9%
    % of non-adult images rated correctly (set B): 92.5%         98.9%        92.0%

Computational Cost Analysis

- General image properties: average width = 301 pixels, average height = 269 pixels, time to read an image = 0.078 sec.
- Skin color based adult image detector: time to classify = 0.043 sec, which implies a throughput of 23 images/sec.

Person Detection From Skin Detection

- The skin detector gives evidence for the presence of people, but has false positives and negatives.
- Use the skin detector output for person detection: construct a feature vector from the detected skin pixels, then classify the image as person/non-person.
- Features (a sketch follows below):
  - Percent of pixels in the image detected as skin
  - Average probability of the skin pixels
  - Largest connected component of skin
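A sketch of the three features, assuming the detector outputs a per-pixel skin probability map; the threshold and the normalization of the component size are my assumptions (scipy.ndimage.label finds the connected components):

    import numpy as np
    from scipy import ndimage

    def person_features(skin_probmap, threshold=0.5):
        # The three image-level features named on the slide
        mask = skin_probmap > threshold
        pct_skin = mask.mean()                 # percent of pixels detected as skin
        avg_prob = skin_probmap[mask].mean() if mask.any() else 0.0
        labels, n = ndimage.label(mask)        # connected skin components
        largest = np.bincount(labels.ravel())[1:].max() / mask.size if n else 0.0
        return np.array([pct_skin, avg_prob, largest])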

Person Detection Example Results

[figures: example images labeled "Person" and "No Person"]

Person Detection Results Continued

[figures: example images labeled "No Person" and "Person"]

Person Detector Performance

Two classifiers were built using these measures on 1400 training images. A test set of 456 images was used to evaluate the classifiers.

    Classifier performance    Training examples   Testing examples
    Neural network            76.2%               74.3%
    Decision tree             75.8%               72.1%

Applications of Person Detection

- "Person detected" tag for media search: skin and face analysis tags photos and video frames with people in them, improving the ranking of query returns (photos of people appear at the top of the list).
- Image similarity measure: photos with people in them are grouped together; can be used during query refinement.

Summary of the Skin Detection Example

What are the factors that made skin detection successful?
- A problem which seemed hard a priori but turned out to be easy (the classes are surprisingly separable).
- Low dimensionality makes adequate data collection feasible and classifier design a non-issue.
- The intrinsic dimensions are clear a priori: the concentration of the nonskin model along the grey line is completely predictable from the design of perceptual color spaces.

Perspectives on Pattern Recognition

Our goal is to uncover the underlying organization for what often seems to be a laundry list of methods:
- Linear and tree classifiers
- Gaussian mixture classifiers
- Logistic regression
- Neural networks
- Support vector machines
- Gaussian process classifiers
- AdaBoost
- …

Statistical Perspective

- Statistical inference approach:
  - Probability model p(C, x | θ), where θ is a vector of parameters estimated from D using statistical inference.
  - The decision rule is derived from p(C, x | θ).
  - Two philosophical schools: frequentist statistics and Bayesian statistics.
- Learning theory approach:
  - Classifiers with distribution-free performance guarantees.
  - Connections to CS theory, computability, etc.
  - Examples: PAC learning, structural risk minimization, etc.

Decision Theory Perspective

Three ways to obtain the decision rule f(x).

Generative modeling (a toy sketch follows below):
- Model p(x | C) and p(C) using D.
- Obtain p(C | x) using Bayes rule.
- Obtain the decision rule from the posterior.
- Advantages:
  - Use p(x) for novelty detection.
  - Sample from p(x) to generate synthetic data and assess model quality.
  - Use p(C | x) to assess confidence in the answer (reject region).
  - Easy to compose modules that output posterior probabilities.
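As a toy instance of the generative recipe, a diagonal-covariance Gaussian model of p(x | C); all implementation details here are my assumptions, not the lecture's:

    import numpy as np

    class GaussianGenerativeClassifier:
        # Generative recipe: model p(x | C) and p(C), then apply Bayes rule
        # p(C | x) ∝ p(x | C) p(C) to obtain the posterior and decision rule.

        def fit(self, X, y):
            self.classes = np.unique(y)
            self.priors = np.array([np.mean(y == c) for c in self.classes])
            self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
            self.vars = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
            return self

        def posterior(self, X):
            # Diagonal-covariance Gaussian likelihood p(x | C) for each class
            lik = np.stack([
                np.exp(-0.5 * ((X - m) ** 2 / v).sum(axis=1))
                / np.sqrt((2 * np.pi * v).prod())
                for m, v in zip(self.means, self.vars)], axis=1)
            joint = lik * self.priors                        # p(x | C) p(C)
            return joint / joint.sum(axis=1, keepdims=True)  # Bayes rule

        def predict(self, X):
            return self.classes[self.posterior(X).argmax(axis=1)]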

J. M. Rehg © Decision Rule  Discriminative modeling Obtain the posterior p(C | x) directly from D Derive the decision rule from the posterior Advantages – The posterior is often much simpler than the likelihood function – Posterior more directly related to the classification rule, may yield fewer prediction errors.