Computer Vision: 3D Shape Reconstruction
  Use images to build a 3D model of an object or site
  Figure: 3D site model built from laser range scans collected by the CMU autonomous helicopter
Computer Vision: Guiding Motion
  Visually guided manipulation
    – Hand-eye coordination
  Visually guided locomotion
    – Robotic vehicles
  Figure: CMU NavLab II
Computer Vision: Recognition & Classification
Challenges in Object Recognition
Object Recognition Research
  Challenges: low image quality, large quantity of data, intra-class object variation, large number of object classes
  Issue categories: object detection issues, quality/quantity issues
  Research directions: automated learning, robust algorithms, advanced image enhancement, segmentation and hierarchical analysis
  Example object classes: lips, face, text, building, hand gesture, vehicle, clock, license plate
Intra-Class Variation
Lighting Variation
Geometric Variation
Simpler Problem: Classification
  Fixed-size input
  Fixed object size, orientation, and alignment
  Decision: "Object is present" or "Object is NOT present" (at the fixed size and alignment)
Detection: Apply Classifier Exhaustively
  Search in position
  Search in scale
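A minimal sketch of the exhaustive search described on this slide: a fixed-size classifier is applied at every position, then the image is shrunk and the scan is repeated so that larger objects fit the same window. The names `detect`, `downscale`, `window`, `step`, and `scale_step` are illustrative assumptions, as is the nearest-neighbour resizing; the slides do not specify any of them.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downscale(image: Image, factor: float) -> Image:
    """Nearest-neighbour resize by `factor` (a factor below 1 shrinks the image)."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, int(h * factor)), max(1, int(w * factor))
    return [[image[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)]
            for r in range(new_h)]


def detect(image: Image,
           classify: Callable[[Image], bool],
           window: int = 4,
           step: int = 1,
           scale_step: float = 0.5) -> List[Tuple[int, int, int]]:
    """Apply a fixed-size classifier at every position and scale.

    Returns (row, col, size) boxes expressed in original-image coordinates.
    """
    detections = []
    scale = 1.0
    scaled = image
    # Search in scale: keep shrinking until the window no longer fits.
    while len(scaled) >= window and len(scaled[0]) >= window:
        # Search in position: slide the window over the current image.
        for r in range(0, len(scaled) - window + 1, step):
            for c in range(0, len(scaled[0]) - window + 1, step):
                patch = [row[c:c + window] for row in scaled[r:r + window]]
                if classify(patch):
                    detections.append(
                        (int(r / scale), int(c / scale), int(window / scale)))
        scale *= scale_step
        scaled = downscale(image, scale)
    return detections
```

Any function that maps a `window` x `window` patch to a yes/no decision can be passed as `classify`. A practical detector would typically use a finer scale step and merge overlapping detections, which this sketch omits.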
View-based Classifiers
  Face Classifier #1, Face Classifier #2, Face Classifier #3
1) Apply Local Operators
  $f_1(0, 1) = \#3214$
  $f_1(0, 0) = \#5710$
  $f_k(n, m) = \#723$
2) Look Up Probabilities
  $f_1(0, 0) = \#5710$: $P_1(\#5710, 0, 0 \mid \text{obj}) = 0.53$, $P_1(\#5710, 0, 0 \mid \text{non-obj}) = 0.56$
  $f_1(0, 1) = \#3214$: $P_1(\#3214, 0, 1 \mid \text{obj}) = 0.57$, $P_1(\#3214, 0, 1 \mid \text{non-obj}) = 0.48$
  $f_k(n, m) = \#723$: $P_k(\#723, n, m \mid \text{obj}) = 0.83$, $P_k(\#723, n, m \mid \text{non-obj}) = 0.19$
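3) Make Decision
  $$\frac{P_1(\#5710, 0, 0 \mid \text{obj}) \cdot P_1(\#3214, 0, 1 \mid \text{obj}) \cdots P_k(\#723, n, m \mid \text{obj})}{P_1(\#5710, 0, 0 \mid \text{non-obj}) \cdot P_1(\#3214, 0, 1 \mid \text{non-obj}) \cdots P_k(\#723, n, m \mid \text{non-obj})} = \frac{0.53 \times 0.57 \times \cdots \times 0.83}{0.56 \times 0.48 \times \cdots \times 0.19} > \lambda$$
  Classify as "object" when the likelihood ratio exceeds the threshold $\lambda$.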
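The look-up and decision steps can be sketched as a likelihood-ratio test, assuming the operator outputs are treated as independent so that their probabilities multiply. The sketch uses only the three probability pairs printed on the slides and ignores the elided terms. The function names, the stand-in values for k, n, and m, and the default threshold of 1.0 are assumptions for illustration, not values from the slides. Logarithms are summed instead of multiplying raw probabilities, which avoids underflow when many terms are involved.

```python
import math
from typing import Dict, Iterable, Tuple

# (operator index, quantized operator output, row, column)
Observation = Tuple[int, int, int, int]


def log_likelihood_ratio(observations: Iterable[Observation],
                         p_obj: Dict[Observation, float],
                         p_non_obj: Dict[Observation, float]) -> float:
    """Sum of log P(. | obj) - log P(. | non-obj) over all observations."""
    return sum(math.log(p_obj[o]) - math.log(p_non_obj[o]) for o in observations)


def is_object(observations: Iterable[Observation],
              p_obj: Dict[Observation, float],
              p_non_obj: Dict[Observation, float],
              threshold: float = 1.0) -> bool:
    """Decide "object" when the likelihood ratio exceeds `threshold`."""
    return log_likelihood_ratio(observations, p_obj, p_non_obj) > math.log(threshold)


# Probabilities copied from the slides. K, N, M are arbitrary stand-ins
# for the symbolic indices k, n, m (an assumption for illustration).
K, N, M = 2, 3, 4
P_OBJ = {(1, 5710, 0, 0): 0.53, (1, 3214, 0, 1): 0.57, (K, 723, N, M): 0.83}
P_NON_OBJ = {(1, 5710, 0, 0): 0.56, (1, 3214, 0, 1): 0.48, (K, 723, N, M): 0.19}
OBSERVED = list(P_OBJ)
```

With these three terms the ratio is (0.53 × 0.57 × 0.83) / (0.56 × 0.48 × 0.19), roughly 4.9, so `is_object(OBSERVED, P_OBJ, P_NON_OBJ)` returns `True` at a threshold of 1 and `False` at a threshold of 10. Raising the threshold trades missed detections for fewer false alarms.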
Two Classifiers Trained for Faces
Eight Classifiers Trained for Cars
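Probabilities Estimated Off-Line
  Count each operator output over the training images:
    $f_1(0, 0) = \#567 \;\Rightarrow\; H_1(\#567, 0, 0) \leftarrow H_1(\#567, 0, 0) + 1$
    $f_k(n, m) = \#350 \;\Rightarrow\; H_k(\#350, n, m) \leftarrow H_k(\#350, n, m) + 1$
  Normalize the counts into probabilities:
    $P_1(\#567, 0, 0) = \dfrac{H_1(\#567, 0, 0)}{\sum_i H_1(\#i, 0, 0)}$
    $P_k(\#350, n, m) = \dfrac{H_k(\#350, n, m)}{\sum_i H_k(\#i, n, m)}$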
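The counting and normalization on this slide can be sketched as follows. Each training example is assumed to have already been reduced to a list of (operator index, output code, row, column) tuples; that representation and the name `estimate_probabilities` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (operator index k, quantized output code, row, column)
Observation = Tuple[int, int, int, int]


def estimate_probabilities(
        samples: Iterable[List[Observation]]) -> Dict[Observation, float]:
    """Histogram-based estimate of P_k(code, row, col) from training samples."""
    counts = defaultdict(int)  # H_k(code, row, col)
    totals = defaultdict(int)  # sum over codes i of H_k(i, row, col)
    for observations in samples:
        for k, code, row, col in observations:
            counts[(k, code, row, col)] += 1
            totals[(k, row, col)] += 1
    # Normalize each count by the total for its operator and position.
    return {(k, code, row, col): n / totals[(k, row, col)]
            for (k, code, row, col), n in counts.items()}
```

Running this once on the object images and once on the non-object images would give the two tables needed for the likelihood ratio. Codes never seen in training have no entry in the result, so a practical version would likely need some smoothing, which the slides do not discuss.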
Training Classifiers
  Cars: images per viewpoint
  Faces: 2,000 images per viewpoint
  ~1,000 synthetic variations of each original image
    – Background scenery, orientation, position, frequency
  2,000 non-object images
    – Samples selected by bootstrapping
  Minimization of classification error on training set
    – AdaBoost algorithm (Freund & Schapire '97, Schapire & Singer '99)
    – Iterative method
    – Determines weights for samples
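To illustrate how AdaBoost iteratively determines sample weights, here is a minimal sketch of the textbook discrete AdaBoost procedure, using one-dimensional threshold stumps as weak classifiers. This is not necessarily the variant used to train the detectors above: the stumps, the function names, and the toy data are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, int]  # (alpha, threshold, polarity)


def adaboost(xs: List[float], ys: List[int], thresholds: List[float],
             rounds: int) -> Tuple[List[Stump], List[float]]:
    """Discrete AdaBoost with 1-D threshold stumps; labels must be -1 or +1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x > t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Raise the weights of misclassified samples, lower the rest.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights


def predict(ensemble: List[Stump], x: float) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x > t else -pol) for a, t, pol in ensemble)
    return 1 if score > 0 else -1
```

For example, with `xs = [1, 2, 3, 4]`, `ys = [-1, -1, 1, 1]`, and the single threshold 1.5, one round misclassifies only the sample at `x = 2`, and after re-weighting that sample carries half of the total weight. The next weak classifier is therefore pushed to get it right.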
Web-based Demo of Face Detector