COMPUTER VISION: SOME CLASSICAL PROBLEMS ADWAY MITRA MACHINE LEARNING LABORATORY COMPUTER SCIENCE AND AUTOMATION INDIAN INSTITUTE OF SCIENCE June 24, 2013.


WHAT IS COMPUTER VISION and WHY IS IT DIFFICULT?
Computer Vision aims to build computers that can see.
In other words, it deals with analyzing and understanding images and videos by computer.
The aim of the analysis is either to find known patterns in images (Detection) or to match images with known patterns (Recognition).
To analyze an image, we first need a representation for it.
An image is stored in a computer as a 2- or 3-dimensional matrix in which each element is a pixel.
A single pixel carries very little, if any, semantic information.

Representation with Features
For most applications of machine learning, the first step is to find features.
Features are used to represent the data.
Features should be chosen so that they live in a metric space; usually they are vectors.
Very elaborate (high-dimensional) features should be avoided for computational reasons.
Diagram labels: Representation; Feature Vector (difficult to process); Dimensionality Reduction; Smaller Feature Vector.

Features for Computer Vision
Pixel values can serve as features, but are often not very meaningful.
Groups of pixels can carry more meaning, but how should such groups be formed?
Candidate features include groups of pixels (sub-images) at a large number of scales and positions, and image gradients or edges.
Various filter outputs have also been explored. They are difficult to interpret semantically, but have been found to work well in certain applications.
Finding concise, semantically meaningful features is still a major open issue in Computer Vision.

SIFT Interest Points
SIFT: Scale-Invariant Feature Transform.
A filter is an operator that processes a signal and removes some undesired components.
Difference-of-Gaussian filters are a popular choice for images.
The positions of local extrema (maxima and minima) of this filter output are the interest points.
Some interest points, such as those lying on edges, are discarded.
At each interest point, a feature vector is computed from the image gradients and their orientations inside small windows around the point.
This feature is invariant to the orientation and scale of the image.
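The Difference-of-Gaussian step can be illustrated with a minimal sketch. It works on a 1-D signal at a single pair of scales, whereas SIFT operates on 2-D images across a whole pyramid of scales. The function names, the border clamping, and the choice of sigmas are assumptions made for illustration only and are not taken from the slides.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    radius = max(1, int(3 * sigma))  # assumed cut-off at about 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma_small, sigma_large):
    """Subtract a strongly smoothed copy of the signal from a lightly smoothed one."""
    fine = smooth(signal, sigma_small)
    coarse = smooth(signal, sigma_large)
    return [a - b for a, b in zip(fine, coarse)]

def local_extrema(values):
    """Indices whose value is strictly above or strictly below both neighbours."""
    found = []
    for i in range(1, len(values) - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            found.append(i)
    return found

# A single bright spot at index 10 produces an extremum at that position.
signal = [0.0] * 21
signal[10] = 1.0
dog = difference_of_gaussians(signal, sigma_small=1.0, sigma_large=2.0)
candidates = local_extrema(dog)
```

In this toy signal the extremum at the bright spot plays the role of an interest point; discarding edge responses and computing the gradient-orientation descriptor are not shown.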

SIFT INTEREST POINTS

FACE DETECTION-PROBLEM
Given an image, find the faces in it.
Face detection is used in many places, such as digital cameras and photo-sharing albums, including Facebook.
The basic question: given a rectangular region in an image, decide whether or not it is a face.
This decision is repeated for every location and every size of the rectangular region.
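The exhaustive search over locations and sizes can be sketched as follows. This is only an illustration: square windows, the `stride` parameter, the default sizes, and the `is_face` callback (standing in for whatever classifier is used) are all assumptions, not details from the slides.

```python
def sliding_windows(image_width, image_height, sizes, stride):
    """Yield (x, y, size) for every square window that fits inside the image."""
    for size in sizes:
        for y in range(0, image_height - size + 1, stride):
            for x in range(0, image_width - size + 1, stride):
                yield (x, y, size)

def detect(image_width, image_height, is_face, sizes=(24, 48), stride=8):
    """Return every window that the (hypothetical) classifier accepts as a face."""
    return [window
            for window in sliding_windows(image_width, image_height, sizes, stride)
            if is_face(*window)]

# Toy usage: a stand-in classifier that only accepts windows at the top-left corner.
hits = detect(4, 4, lambda x, y, size: x == 0 and y == 0, sizes=(2, 4), stride=1)
```

Even on a small image the number of candidate windows grows quickly with the number of sizes and a finer stride, which is why the classifier applied to each window has to be cheap.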

FACE DETECTION-GENERAL APPROACH
Basically a binary classification problem.
Requires building a model for faces.
Needs training samples, both positive and negative.
Positive samples are face images; negative samples are non-face images.
The learning algorithm finds the boundary between face and non-face images.
Figure labels: FACE images; NON-FACE images; Candidate.

FACE DETECTION- BENCHMARK and EVALUATION
Standard face-detection benchmark datasets are available.
FDDB (Face Detection Data Set and Benchmark) covers face detection in unconstrained settings.
Performance is usually measured using Precision and Recall.
Precision: of the reported face detections, how many were actually faces?
Recall: of the faces actually present, how many were detected?
F-score: the harmonic mean of precision and recall.
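The three measures can be computed from detection counts as in this minimal sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made here.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F-score from detection counts."""
    reported = true_positives + false_positives   # everything the detector reported
    present = true_positives + false_negatives    # every face actually in the images
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / present if present else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Toy counts: 10 detections reported, 8 of them correct, 16 faces present in total.
precision, recall, f_score = precision_recall_f(8, 2, 8)
```

With these counts precision is 8/10 and recall is 8/16. The harmonic mean is pulled toward the lower of the two, so a detector cannot score well by maximizing only one of them.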

FACE RECOGNITION-PROBLEM
Consists of a training phase and a testing phase.
In the training phase we are given many face images, each labelled with the identity of the person.
In the testing phase we are given a new face image belonging to one of these persons.
The task is to find the identity of that person.
This is a standard classification problem in Machine Learning.
Suitable features and representations have to be found first.

FACE RECOGNITION-PROBLEM
One approach is to build a model for each person, using the training images provided for that person.
A second approach is to compare the test image with each of the training images and find the closest match.
Not every part of a face image helps in recognition: certain things about faces are common to everyone.
A good strategy is to find the most distinctive features and represent images only by them.
Eigenfaces (1991) uses the last two strategies.
Recognition accuracy is the obvious evaluation criterion.
A good recognition algorithm should work well with few training images.
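The second approach, finding the closest training image, can be sketched as a nearest-neighbour search. The sketch assumes each face has already been reduced to a short feature vector; the Euclidean distance, the function names, and the toy data are illustrative assumptions, and the Eigenfaces projection itself is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(test_vector, training_set):
    """Return the identity attached to the training vector closest to the test vector.

    training_set is a list of (feature_vector, identity) pairs.
    """
    _, identity = min(training_set,
                      key=lambda item: euclidean(test_vector, item[0]))
    return identity

# Hypothetical 3-dimensional features for two people.
training = [
    ([0.9, 0.1, 0.3], "person_a"),
    ([0.8, 0.2, 0.4], "person_a"),
    ([0.1, 0.7, 0.9], "person_b"),
]
who = recognize([0.2, 0.6, 0.8], training)
```

The quality of the match depends almost entirely on the features: with raw pixels as the vectors, changes in lighting or pose can dominate the distance.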

FACE RECOGNITION-CURRENT STATUS
Face recognition has traditionally been done with well-cropped, focused face images, that is, in a controlled environment.
In that setting it is considered a solved problem.
Face recognition is now being revisited for semi-controlled and uncontrolled environments.
LFW (Labeled Faces in the Wild), a dataset of face images taken in such settings, is a new benchmark.

OBJECT RECOGNITION- PROBLEM
A classification task like face recognition, but much more complex in practice.
A large number of images is given from many object categories.
The task is to classify a test image into one of these categories.
The problem is made very difficult by intra-class variations.

OBJECT RECOGNITION- GENERAL APPROACH
Once again the idea is to build models for the different objects.
No single feature may be enough for classification: some objects have a distinctive color, others a distinctive shape.
Multiple Kernel Learning, a sophisticated machine learning formulation, is generally considered the best approach for this problem.
Caltech-101: a dataset of 101 object categories. Close to 80% accuracy has been obtained by Multiple Kernel Learning.
Caltech-256: a dataset of 256 object categories. An accuracy of 50% is considered good.
Intra-class variations continue to pose a significant challenge, and even provoke scepticism: is it a valid problem at all?

OBJECT DETECTION
Given an image, find all the birds, trees, and cars in it.
Requires building models for each of these objects.
Once again the entire image is searched at multiple positions and scales.
Part-based models of objects are considered efficient.
Instead of modelling the whole object, the different parts are modelled separately.
This helps to handle occlusion and perhaps intra-class variations.

IMAGE SEGMENTATION
Given an image, divide it so that each segment contains an object.
Basically a clustering problem.
Does not require features and is done purely with pixel values.
Has inspired advanced clustering techniques such as spectral clustering.
The graph-based method models the image as a graph in which each pixel is a node and adjacent pixels are connected by edges.
Each edge is given a weight according to the similarity of the corresponding pixel values.
Requires the number of segments to be specified.
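The graph construction can be sketched as below for a small grayscale image. The slides do not say how similarity is measured, so the Gaussian weight, the `sigma` parameter, and the 4-connected neighbourhood are assumptions; the clustering step that cuts the graph into segments is not shown.

```python
import math

def build_pixel_graph(image, sigma=10.0):
    """Build a 4-connected pixel graph from a grayscale image.

    image is a list of rows of intensity values. Returns a list of
    (node_a, node_b, weight) edges, where a node is a (row, column) pair.
    The weight is close to 1 for similar pixels and close to 0 for very
    different ones (assumed Gaussian similarity).
    """
    height = len(image)
    width = len(image[0])
    edges = []
    for r in range(height):
        for c in range(width):
            # Linking only to the right and lower neighbours adds each edge once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    diff = image[r][c] - image[rr][cc]
                    weight = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
                    edges.append(((r, c), (rr, cc), weight))
    return edges

# Two flat regions: a dark top row and a bright bottom row.
toy_image = [[0, 0],
             [100, 100]]
toy_edges = build_pixel_graph(toy_image)
```

In the toy image the two horizontal edges join identical pixels and get weight 1.0, while the two vertical edges cross the region boundary and get weights near zero. A segmentation method would prefer to cut those weak edges.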

IMAGE SEGMENTATION
Segmentation is evaluated with respect to a gold-standard segmentation.
Every pair of pixels that falls in the same segment in the gold standard should also fall in the same segment in the reported segmentation.
Similarly, every pair of pixels that falls in different segments in the gold standard should fall in different segments in the reported segmentation.
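This pairwise criterion corresponds to what is commonly called the Rand index. A minimal sketch, assuming each segmentation is given as a flat list of per-pixel segment labels (the function name and the toy labels are illustrative):

```python
from itertools import combinations

def pair_agreement(gold_labels, predicted_labels):
    """Fraction of pixel pairs on which the two segmentations agree.

    A pair agrees if both segmentations put the two pixels in the same
    segment, or both put them in different segments.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(gold_labels)), 2):
        same_in_gold = gold_labels[i] == gold_labels[j]
        same_in_pred = predicted_labels[i] == predicted_labels[j]
        if same_in_gold == same_in_pred:
            agree += 1
        total += 1
    return agree / total if total else 1.0

gold = [0, 0, 1, 1]
perfect = pair_agreement(gold, [5, 5, 7, 7])   # same grouping, different label names
merged = pair_agreement(gold, [0, 0, 0, 0])    # everything lumped into one segment
```

Because only pairs are compared, the actual label values do not matter. Checking all pairs is quadratic in the number of pixels, so real evaluations usually compute the same quantity from a contingency table instead.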

Video Problems
Videos are collections of images taken over an interval of time; successive images are quite similar.
Having to handle several images rather than one may make video problems tougher.
But the temporal continuity of videos provides a way out.
Joint modelling of multiple similar images can, in fact, give better performance than modelling a single image.
For video tasks, additional motion-based features such as optical flow can be used.
The concept of interest points for images is extended to Space-Time Interest Points for videos.
Face recognition, face detection, and similar tasks can also be done in videos, often more effectively than in images.

OBJECT TRACKING-PROBLEM
Given a video that shows a person or object moving, find it in each frame.
The naive approach is to reduce tracking to an object detection problem in every frame.
If the object is at position (x, y) in frame t, it will be very close to that position in frame (t + 1).
So if we know the position at time t, we need to search only around that same position. This greatly reduces the search space.
The main idea is to build an appearance model for the object.
The appearance may change over time due to variations in size, illumination, viewpoint, etc.
The appearance model must therefore be adaptive and recomputed throughout the video.
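The local search can be sketched as below. The sketch assumes grayscale frames stored as lists of rows, a fixed template as the appearance model, and sum of absolute differences as the matching cost. All of these are illustrative choices, and the adaptive update of the appearance model is deliberately left out.

```python
def sad(frame, template, x, y):
    """Sum of absolute differences between the template and the frame patch at (x, y)."""
    return sum(abs(frame[y + r][x + c] - template[r][c])
               for r in range(len(template))
               for c in range(len(template[0])))

def track_step(frame, template, previous_xy, search_radius=2):
    """Find the best template position within search_radius of the previous position."""
    template_h, template_w = len(template), len(template[0])
    frame_h, frame_w = len(frame), len(frame[0])
    prev_x, prev_y = previous_xy
    best_xy, best_cost = None, None
    # Only positions near the previous location are examined, clipped to the frame.
    y_lo = max(0, prev_y - search_radius)
    y_hi = min(frame_h - template_h, prev_y + search_radius)
    x_lo = max(0, prev_x - search_radius)
    x_hi = min(frame_w - template_w, prev_x + search_radius)
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            cost = sad(frame, template, x, y)
            if best_cost is None or cost < best_cost:
                best_xy, best_cost = (x, y), cost
    return best_xy

# A 6x6 frame with a bright 2x2 object whose top-left corner is at x=3, y=2.
frame = [[0] * 6 for _ in range(6)]
for row in (2, 3):
    for col in (3, 4):
        frame[row][col] = 9
template = [[9, 9], [9, 9]]
new_position = track_step(frame, template, previous_xy=(2, 2), search_radius=2)
```

With a search radius of 2 at most 25 positions are examined per frame, however large the frame is. The price is that the tracker loses the object if it moves further than the radius between frames.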

OBJECT TRACKING- BENCHMARK and EVALUATION
Performance is measured with respect to a gold standard in which a bounding box is provided for each frame.
The score is the proportion of overlap between the areas of the gold-standard and reported bounding boxes.
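One common way to turn "proportion of overlapping areas" into a number is intersection over union. Whether the slides mean exactly this ratio is an assumption, as are the box format (x_min, y_min, x_max, y_max) and the function name.

```python
def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the intersection rectangle; zero if the boxes do not overlap.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

gold_box = (0, 0, 2, 2)
reported_box = (1, 1, 3, 3)
score = overlap_ratio(gold_box, reported_box)
```

Here the boxes share a 1x1 square out of a combined area of 7, so the score is 1/7. A per-video score can then be obtained by averaging the ratio over frames, or by counting the frames in which it exceeds a chosen threshold.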

OBJECT TRACKING-CURRENT STATUS
Considered a solved problem under controlled illumination and background.
Current research aims to handle occlusion of the object and sudden changes in background and illumination.
Tracking multiple objects at the same time is another important problem.
Tracking is a real-time application, and efforts are under way to process as many frames per second as possible.
To adapt or not to adapt remains the fundamental problem in vision.
A single miss can make the whole tracking go wrong, so detecting and correcting a miss is an important problem to solve.

ACTION RECOGNITION IN VIDEOS
Surveillance cameras are nowadays installed at many sensitive public locations.
The aim is to record the activities of people.
This requires dynamic features, which make use of the motion in videos.
Some image-based features can be extended to videos, such as space-time interest points.
These can be used by viewing the video as a space-time volume.
The features can also be in the form of time series.

ACTION RECOGNITION IN VIDEOS
In the presence of a benign background, a static camera, and a single actor, the problem is considered solved.
Current research aims to handle complex environments, such as crowded places, where people are frequently occluded.
Recognition of multi-person interactions is another recent offshoot of the problem.