Object Detection
Object Detection vs. Object Identification/Recognition
Object detection: Is there a face in the image? Where is a face?
Object identification/recognition: Who is it? Is it Ahmed or Hassan? Where is Rania?
A challenge: object perception. Object detection, object segmentation, object recognition (examples: fruit detection, fruit segmentation, fruit recognition). Typical systems require human-prepared training data and cannot adapt to new situations autonomously; can we use autonomous experimentation instead?
Object Detection. Find the location of an object if it appears in an image: does the object appear, and where is it? (Where's Wally / Waldo?)
Face Detector
Face detection. We start with a walk through a face detector. Slide a window over the image; extract a feature vector x for each window; classify each window into face/non-face with y = F(x).
Classification. Examples are points in R^n. Positives are separated from negatives by the hyperplane w: y = sign(w^T x - b). We have converted all windows to n-dimensional vectors, i.e. points in n-dimensional space.
Classification. x in R^n: data points. P(x): distribution of the data. y(x): true value of y for each x. F: decision function y = F(x, θ). θ: parameters of F, e.g. θ = (w, b). We want an F that makes few mistakes. Not all points are equally likely, and P(x) captures the distribution of the data; for each point there is a correct prediction value y(x).
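The decision function y = F(x, θ) with θ = (w, b) can be sketched in a few lines; the weights and offset below are illustrative, not learned:

```python
import numpy as np

# Minimal sketch of F(x, theta) = sign(w.T x - b) for face/non-face.
# The hyperplane parameters here are made up for illustration.
def classify(x, w, b):
    """Return +1 (face) or -1 (non-face) for feature vector x."""
    return 1 if np.dot(w, x) - b > 0 else -1

w = np.array([1.0, -2.0])   # hypothetical hyperplane normal
b = 0.5                     # hypothetical offset
print(classify(np.array([2.0, 0.0]), w, b))   # 2.0 - 0.5 = 1.5 > 0 -> 1
print(classify(np.array([0.0, 1.0]), w, b))   # -2.0 - 0.5 < 0    -> -1
```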
Loss function. Our decision may have severe implications (e.g. reporting POSSIBLE CANCER vs. ABSOLUTELY NO RISK OF CANCER). L(y(x), F(x, θ)): loss function, i.e. how much we pay for predicting F(x, θ) when the true value is y(x). Examples: classification (0/1) error and hinge loss.
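The two losses named on the slide can be sketched directly, assuming labels y in {-1, +1} and a raw classifier score s = w^T x - b:

```python
# Sketch of the two loss functions, for y in {-1, +1} and score s.
def zero_one_loss(y, s):
    """Classification error: 1 if the sign of s disagrees with y."""
    return 0.0 if y * s > 0 else 1.0

def hinge_loss(y, s):
    """Hinge loss: 0 once the margin y*s reaches 1, linear otherwise."""
    return max(0.0, 1.0 - y * s)

print(zero_one_loss(+1, 2.0))   # correct and confident -> 0.0
print(hinge_loss(+1, 2.0))      # margin 2 >= 1         -> 0.0
print(hinge_loss(+1, 0.5))      # correct, inside margin -> 0.5
print(hinge_loss(-1, 0.5))      # wrong side of margin   -> 1.5
```

Note that the hinge loss still charges for correct but low-margin predictions, which is what pushes the hyperplane away from the data.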
Face Detection: basic scheme. Off-line training uses face examples and non-face examples to train a classifier. At run time, feature extraction converts each pixel pattern into a feature vector (x1, x2, ..., xn) and the classifier produces the classification result; we search for faces at different resolutions and locations.
Feature Engineering or Feature Learning?
In Vision: SIFT, HOG, pixels, sparse coding, RBM, autoencoder, LCC, scattering net (Mallat), deep conv net (discriminative feature learning), etc.
In NLP: N-gram, hashing, XXX, YYY, ZZZ, etc.
In Speech: MFCCs, PLPs, SPLICE (for noise robustness), autoencoder, scattering spectra, learned mapping from filterbank to MFCCs, DNN, etc.
Training and Testing. Train a classifier on a labeled training set, then classify a labeled test set and count correct detections and false positives to measure sensitivity.
Learning Components. Start with small initial regions; expand into one of four directions; extract new components from images; train SVM classifiers; choose the best expansion according to the error bound of the SVMs.
Some Examples
What Types of Problems Fit (or Don't Fit) Deep Learning? (some conjectures)
"Perceptual" AI, with non-obvious data representations: e.g. image/video recognition, speech recognition, speech/text understanding, sequential data with temporal structure (stock market prediction?).
"Data matching", with easy data representations (e.g. a histogram of events, the movies a user watched): e.g. malware detection (ICASSP 2013), movie recommenders, speaker/language detection?
Deep networks are not a panacea. We know that they work well on visual object recognition and speech recognition, and there is early evidence that they will work well on document understanding. In all of these kinds of problems it is very difficult for an engineer to specify or build a representation for the data (beyond the lowest-level building blocks, such as pixels or words); deep networks should be state of the art there. Many machine learning applications inside and outside of Microsoft are not difficult AI problems: they fall under "data mining". An example is movie recommendation, where the representation problem is easy: represent the movies a user watches as a big sparse matrix, perhaps add the user's demographics, then apply an existing ML algorithm. We expect deep networks to have no benefit in this (or similar) cases. Deep learning already shows tremendous benefits, but it may not win over standard machine learning everywhere.
Window-based models generate and score candidates: scan the detector at multiple locations and scales, and pass each window to a face/non-face classifier.
Haar-features (1). Each feature is the difference between the sums of pixel values in the white and black areas. There are four types, placed at different locations with different scales.
Haar-features (2) Capture the face symmetry
Haar-features (3). The four types of Haar features (e.g. Type A) can be extracted at any location with any scale within a 24x24 detection window.
Integral image (1). The integral image at (x, y) stores the sum of all pixel values above and to the left of (x, y), inclusive, so the sum of pixel values in any rectangle (e.g. the blue area) can be read off with a few lookups. Example: a small image and its integral image (figure). Time complexity: a single pass over the image, O(1) work per pixel.
Integral image (2). Let a = sum(1), b = sum(1+2), c = sum(1+3), d = sum(1+2+3+4) be the integral-image values at the four corners around region 4. Then Sum(4) = d + a - b - c: a four-point calculation! Two-rectangle features (A, B) need 6 lookups, three-rectangle features (C) need 8, and four-rectangle features (D) need 9.
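The integral image and the d + a - b - c rectangle sum above can be sketched with NumPy cumulative sums; the toy 3x4 image is just for illustration:

```python
import numpy as np

# ii[y, x] = sum of all pixels above and to the left of (x, y), inclusive,
# so any rectangle sum takes four lookups: d + a - b - c.
def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] via d + a - b - c."""
    d = ii[bottom, right]
    a = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    b = ii[top - 1, right] if top > 0 else 0
    c = ii[bottom, left - 1] if left > 0 else 0
    return d + a - b - c

img = np.arange(1, 13).reshape(3, 4)   # toy 3x4 image
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))        # img[1:3,1:3] = 6+7+10+11 = 34
```

A Haar feature is then just the difference of two (or more) such rectangle sums, each costing four lookups regardless of scale.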
Feature selection. A weak classifier h compares one feature value f against a threshold θ: h = 1 if f > θ (face), and h = 0 otherwise (not a face).
Feature selection. Idea: combine several weak classifiers to generate a strong classifier: α1 h1 + α2 h2 + α3 h3 + ... + αT hT ≷ threshold. Each weak classifier hi ∈ {0, 1} is defined by a feature (type, location, scale) and a threshold; each weight αi reflects the performance of that weak classifier on the training set.
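The weighted vote above can be sketched directly; the feature values, thresholds, and alphas below are made up for illustration (in practice AdaBoost selects them from training data):

```python
# Sketch of the boosted strong classifier: each weak classifier h_i
# thresholds one feature value; the alphas weight their votes.
def weak(f, theta):
    return 1 if f > theta else 0

def strong(features, thetas, alphas, threshold):
    score = sum(a * weak(f, t) for f, t, a in zip(features, thetas, alphas))
    return 1 if score >= threshold else 0   # 1 = face, 0 = not a face

alphas = [0.6, 0.3, 0.9]    # hypothetical weak-classifier weights
thetas = [0.5, 0.2, 0.7]    # hypothetical per-feature thresholds
print(strong([0.9, 0.1, 0.8], thetas, alphas, threshold=1.0))  # 0.6+0.9=1.5 -> 1
print(strong([0.1, 0.1, 0.1], thetas, alphas, threshold=1.0))  # 0.0 -> 0
```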
K Nearest Neighbors. Memorize all training data; find the K closest points to the query; the neighbors vote for the label, e.g. Vote(+) = 2, Vote(-) = 1, so predict +.
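The three steps above fit in a few lines; the 2-D toy data here is illustrative:

```python
from collections import Counter

# Minimal K-nearest-neighbours sketch: memorise the data, then let the
# K closest training points vote on the query's label.
def knn_predict(train, query, k=3):
    # train: list of ((x, y), label) pairs
    dists = sorted(train, key=lambda p: (p[0][0] - query[0]) ** 2
                                      + (p[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), '+'), ((1, 0), '+'), ((0, 1), '+'),
         ((5, 5), '-'), ((6, 5), '-')]
print(knn_predict(train, (1, 1), k=3))   # 3 nearest are all '+' -> '+'
```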
K-Nearest Neighbors (silhouettes). Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes
K-Nearest Neighbors Silhouettes from other views 3D Visual hull Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes
Support vector machines. A simple decision with good classification and good generalization: choose the hyperplane w that maximizes the margin.
Support vector machines. The support vectors are the training points that lie on the margin and determine the hyperplane w.
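The slides don't spell out the optimization; as a hedged sketch, subgradient descent on the regularized hinge loss (the same hinge loss introduced earlier) finds a large-margin hyperplane on toy data:

```python
import numpy as np

# Not the full SVM derivation -- a small subgradient descent on the
# regularised hinge loss max(0, 1 - y(w.x - b)) + lam*|w|^2/2.
def train_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi - b) < 1:          # inside margin: hinge active
                w += lr * (yi * xi - lam * w)
                b -= lr * yi
            else:                              # only the regulariser acts
                w -= lr * lam * w
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_svm(X, y)
print(all(np.sign(w @ xi - b) == yi for xi, yi in zip(X, y)))
```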
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
Gradient filter choices evaluated for HOG: centered, uncentered, diagonal, cubic-corrected, Sobel.
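The centered [-1, 0, 1] filter (which Dalal and Triggs found to work best) is easy to sketch in NumPy, along with the gradient magnitude and orientation that HOG histograms; the ramp image is just for illustration:

```python
import numpy as np

# Centered-difference gradients plus magnitude and orientation,
# the raw ingredients of a HOG descriptor.
def gradients(img):
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # centered difference in x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # centered difference in y
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx))
    return mag, ang

img = np.tile(np.arange(5.0), (5, 1))   # intensity ramp along x
mag, ang = gradients(img)
print(mag[2, 2], ang[2, 2])             # 2.0 0.0 (horizontal gradient)
```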
EE465: Introduction to Digital Image Processing, Copyright Xin Li 2003. Neural Network based FD. Cited from "Neural network based face detection", by Henry A. Rowley, Ph.D. thesis, CMU, May 1999
Vehicle Detection. Intelligent vehicles aim at improving driving safety with machine vision techniques.
Chain Code
This example shows that the chain code is independent of location and, after normalization, of starting point and orientation.
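An 8-direction chain code and its "first difference" normalization can be sketched as follows; the unit-square contour is illustrative. The code itself is translation-invariant, and the first difference removes dependence on orientation (in 45-degree steps); rotating the result to its minimum form would also remove the starting-point dependence:

```python
# 8-direction chain code for a closed contour of boundary pixels.
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """points: consecutive boundary pixels of a closed contour."""
    n = len(points)
    return [DIRS[(points[(i + 1) % n][0] - points[i][0],
                  points[(i + 1) % n][1] - points[i][1])] for i in range(n)]

def first_difference(code):
    """Turns between consecutive moves: rotation-invariant (mod 45 deg)."""
    n = len(code)
    return [(code[(i + 1) % n] - code[i]) % 8 for i in range(n)]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]     # unit square, counter-clockwise
print(chain_code(square))                      # [0, 2, 4, 6]
print(first_difference(chain_code(square)))    # [2, 2, 2, 2]
```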
Segmentation. Roughly speaking, segmentation partitions an image into meaningful parts that are relatively homogeneous in some sense.
Segmentation by Fitting a Model. One view of segmentation is to group pixels (tokens, etc.) that belong together because they conform to some model. In many cases explicit models are available, such as a line. Note that in an image a line may consist of pixels that are not connected, or even close to each other.
Canny and Hough Together
Hough Transform. It locates straight lines, straight line intervals, circles, algebraic curves, and arbitrary specific shapes in an image, but you pay progressively for the complexity of the shape in time and memory usage.
Hough Transform for circles. You need three parameters to describe a circle, so the vote space is three-dimensional.
First Parameterization of Hough Transform for lines
Hough Transform (cont.): straight line case. Consider a single isolated edge point (xi, yi). There are an infinite number of lines that could pass through that point, and each of these lines can be characterized by some particular equation.
Line detection. Mathematical model of a line: y = mx + n. Every point P(xi, yi) on the line satisfies yi = m xi + n, from P(x1, y1) with y1 = m x1 + n through P(xN, yN) with yN = m xN + n.
Image and Parameter Spaces. A line y = mx + n in image space, with slope m and intercept n, collects the points satisfying y1 = m x1 + n, ..., yN = m xN + n; another line y = m'x + n' has parameters (m', n'). A line in image space corresponds to a point (m, n) in parameter space.
Looking at it backwards: y1 = m x1 + n can be rewritten as n = -x1 m + y1. Fixing (x1, y1) and varying (m, n) gives a line in parameter space with slope -x1 and intercept y1.
Least squares line fitting. Data: (x1, y1), ..., (xn, yn); line equation: yi = m xi + b. Find (m, b) to minimize E = Σi (yi - m xi - b)². Stacking rows [xi, 1] into a matrix A, this is the least-squares solution of A p ≈ y; Matlab: p = A \ y; (Modified from S. Lazebnik)
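The Matlab one-liner translates directly to NumPy; the noiseless toy data is illustrative:

```python
import numpy as np

# Least-squares line fit: stack rows [x_i, 1] into A, solve A p ~= y.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                      # noiseless line y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(m, 6), round(b, 6))        # 2.0 1.0
```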
Hough Transform (cont.). Hough transform algorithm:
1. Find all of the desired feature points in the image.
2. For each feature point, and for each possibility i in the accumulator that passes through the feature point, increment that position in the accumulator.
3. Find local maxima in the accumulator.
4. If desired, map each maximum in the accumulator back to image space.
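The accumulator loop above can be sketched with the slide's (m, n) parameterization over a coarse grid; note that real implementations usually prefer the (rho, theta) form, which handles vertical lines:

```python
import numpy as np

# Hough transform for lines in (m, n) space: each feature point (x, y)
# votes along the parameter-space line n = y - m x.
def hough_lines(points, m_vals, n_vals):
    acc = np.zeros((len(m_vals), len(n_vals)), dtype=int)
    for x, y in points:                       # each feature point ...
        for i, m in enumerate(m_vals):        # ... votes for every slope
            n = y - m * x
            j = np.argmin(np.abs(n_vals - n)) # nearest intercept bin
            acc[i, j] += 1
    return acc

points = [(0, 1), (1, 3), (2, 5)]             # all on y = 2x + 1
m_vals = np.arange(-3.0, 3.5, 0.5)
n_vals = np.arange(-3.0, 3.5, 0.5)
acc = hough_lines(points, m_vals, n_vals)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(m_vals[i], n_vals[j])                    # 2.0 1.0
```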
HT for Circles. The HT extends to other shapes that can be expressed parametrically. A circle with fixed radius r and centre (a, b) satisfies (x1 - a)² + (x2 - b)² = r². The accumulator array must be 3D unless the circle radius r is known. Rearranging the equation, every point (x1, x2) on the circle edge votes for the range of centres (a, b) consistent with the given r.
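With a known radius the vote space collapses to 2D, and the voting can be sketched as below; the synthetic circle (centre (10, 10), radius 5) is illustrative:

```python
import numpy as np

# Circle Hough transform with known radius r: each edge point (x, y)
# votes for every candidate centre (a, b) on the circle around it.
def hough_circle(edge_points, r, size):
    acc = np.zeros((size, size), dtype=int)
    thetas = np.linspace(0, 2 * np.pi, 90, endpoint=False)
    for x, y in edge_points:
        a = np.round(x - r * np.cos(thetas)).astype(int)
        b = np.round(y - r * np.sin(thetas)).astype(int)
        ok = (a >= 0) & (a < size) & (b >= 0) & (b < size)
        np.add.at(acc, (a[ok], b[ok]), 1)     # accumulate duplicate votes
    return acc

# edge points sampled from a circle with centre (10, 10), radius 5
ts = np.linspace(0, 2 * np.pi, 40, endpoint=False)
edges = [(10 + 5 * np.cos(t), 10 + 5 * np.sin(t)) for t in ts]
acc = hough_circle(edges, r=5, size=21)
print(np.unravel_index(acc.argmax(), acc.shape))   # (10, 10)
```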
Hough Transform – cont. Here the radius is fixed
Hough Circle Fitting