Categorical Perception 강우현

Contents: 1. Introduction; 2. Scalable representations for visual categorization; 3. Representation for functional and affordance-based categorization.

Introduction. Recently there has been significant progress in visual class recognition: visual category modeling, visual category learning, robustness, and scalability to more classes. This chapter discusses the important design and paradigm choices, highlighting their implied metrics and drawbacks.

Many object classes have high intra-class variability, yet similar features and appearances are shared across different classes, which can lead to low inter-class variation.

Generative vs. discriminative paradigms. In the generative paradigm, the object model represents all properties of the object instances of a class: each class is learned and it is determined to which class an observation belongs. This is computationally expensive; examples are Gaussian mixture models (GMMs) and hidden Markov models (HMMs). In the discriminative paradigm, the model retreats to a simpler mapping from the observed properties to the predicted category label: the differences between classes are determined without modeling each class itself. This requires a large training set, including a substantial amount of background data; examples are logistic regression and support vector machines (SVMs).
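
To make the contrast concrete, here is a minimal sketch (not from the slides) comparing the two paradigms on generic feature vectors, assuming scikit-learn is available; the toy data, class count, and model settings are illustrative assumptions.

```python
# Sketch: generative (one GMM per class) vs. discriminative (linear SVM) classification.
# Assumes scikit-learn; data and hyperparameters are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))          # toy feature vectors
y_train = rng.integers(0, 3, size=200)        # three object classes
X_test = rng.normal(size=(20, 16))

# Generative: fit a density model per class, pick the class with the highest likelihood.
gmms = {c: GaussianMixture(n_components=2, covariance_type="diag").fit(X_train[y_train == c])
        for c in np.unique(y_train)}
log_liks = np.stack([gmms[c].score_samples(X_test) for c in sorted(gmms)], axis=1)
gen_pred = np.array(sorted(gmms))[np.argmax(log_liks, axis=1)]

# Discriminative: learn the decision boundary directly, without modeling each class.
svm = LinearSVC().fit(X_train, y_train)
disc_pred = svm.predict(X_test)
```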

Local vs. global object representation paradigms. Local object representations can be made robust to scale and rotation changes and enable robustness to partial occlusion, but they require sophisticated models to describe the geometric and topological structure of object classes. Global object representations describe this structure implicitly, by describing the global appearance of an object, but they do not offer the desired robustness to partial occlusion and are difficult to generalize across instances.

Different learning paradigms. The paradigms span a spectrum of supervision: pixel-level segmentation (fine-level annotation: more supervision, less training data), bounding-box annotations, image-level annotation (coarse-level annotation: less supervision, more training data), and unsupervised methods (no labels).

Central role of representation. Representation plays a central role in a cognitive system: the way categories are stored, organized, and related to each other determines the overall system's capability to learn, evolve, and interact.

Hierarchical representation. Features are derived and organized at multiple levels that build on top of each other by exploiting the shareability of features among more complex compositions, giving high computational efficiency and expressive power. This part (scalable representations for visual categorization) proposes a novel approach to representing object categories within an indexable, hierarchical compositional framework.

Hierarchical compositionality. Each hierarchical unit is shared among many more complex higher-layer compositions, so the computational cost is greatly reduced compared to searching for each complex interpretation in isolation. Receptive field sizes increase with the level of the hierarchy, and the formed compositions are designed to respond only to smaller spatial subsets of their receptive fields, giving higher robustness and faster processing. Statistics-driven learning: features and higher-level combinations are learned in an unsupervised manner, avoiding hand-labelling of massive image data while capturing the regularities within the visual data effectively and compactly.

Robust and repeatable detection. The features comprising each hierarchical layer should be manifested as models that enable robust verification of the presence of their underlying components. These models should incorporate loose geometric relations to achieve spatial binding of features, and should encode enough flexibility to ensure repeatability while gaining discrimination gradually through composition within the hierarchy.

Indexing and matching. Indexing: lists in the hierarchical library can be accessed in constant time during image processing to retrieve the local spatial neighborhood. Matching: the local spatial neighborhood is compared against the allowable prototypical compositions within the hierarchical library, checking the presence of the subparts pertaining to each composition, namely their relative locations, the allowed variance, and the position of the central part.
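
A minimal sketch of how such constant-time indexing and tolerance-based matching could look; the library layout, part names, and variance threshold are hypothetical and not taken from the original framework.

```python
# Sketch: index prototypical compositions by their central part, then match a local
# neighborhood against them with a positional tolerance. All structures are illustrative.
import math

# Hypothetical library: central part id -> list of compositions, each a list of
# (subpart_id, expected_dx, expected_dy, allowed_variance).
library = {
    "edge_h": [[("edge_h", -5.0, 0.0, 2.0), ("edge_v", 5.0, 0.0, 2.0)]],
}

def match(central_part, neighborhood, library):
    """neighborhood: list of (part_id, dx, dy) relative to the central part."""
    for composition in library.get(central_part, []):   # constant-time index lookup
        ok = True
        for part_id, ex, ey, var in composition:
            # The subpart must be present near its expected relative location.
            if not any(p == part_id and math.hypot(dx - ex, dy - ey) <= var
                       for p, dx, dy in neighborhood):
                ok = False
                break
        if ok:
            return composition
    return None

print(match("edge_h", [("edge_h", -4.5, 0.5), ("edge_v", 5.2, -0.3)], library))
```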

Unsupervised learning of part compositions. Statistically salient compositions that encode spatial relations between the constituent parts from the layer below are extracted by performing local inhibition around each image feature, statistically updating spatial maps that capture pairwise geometric relations between parts, and learning higher-order compositions by tracking the co-occurrence of spatial pairs. Category-specific higher layers: learning of the higher layers proceeds only on a subset of parts, namely the ones that are most repeatable in a specific category, and is performed on images of individual categories. The final categorical layer combines the most repeatable parts through the object center to form the representation of a category.
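
As a rough illustration of the statistical updating of pairwise spatial maps (not the authors' implementation), the following sketch accumulates, for each pair of part types, a 2D histogram of relative offsets over detections; the map size, cell size, and part labels are assumptions.

```python
# Sketch: accumulate pairwise spatial maps (2D histograms of relative offsets) between
# detected parts across images, then keep the statistically salient peaks.
from collections import defaultdict
import numpy as np

MAP_SIZE, CELL = 21, 4.0            # 21x21 map of 4-pixel cells (illustrative)

def update_maps(maps, detections):
    """detections: list of (part_id, x, y) for one image."""
    for pid_a, xa, ya in detections:
        for pid_b, xb, yb in detections:
            if (pid_a, xa, ya) == (pid_b, xb, yb):
                continue
            # Relative offset of part b with respect to part a, quantized into map cells.
            i = int(round((xb - xa) / CELL)) + MAP_SIZE // 2
            j = int(round((yb - ya) / CELL)) + MAP_SIZE // 2
            if 0 <= i < MAP_SIZE and 0 <= j < MAP_SIZE:
                maps[(pid_a, pid_b)][i, j] += 1

maps = defaultdict(lambda: np.zeros((MAP_SIZE, MAP_SIZE)))
update_maps(maps, [("corner", 10, 10), ("edge", 18, 12)])

# Salient compositions: pairs whose maps have pronounced peaks relative to their mass.
salient = {k: np.unravel_index(m.argmax(), m.shape)
           for k, m in maps.items() if m.max() > 0.2 * m.sum()}
```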

Experiments and results [result figures on the original slides].

Functional/affordance-based categorization. For functional and affordance-based categorization, an object's shape and geometry should be represented explicitly, yet local feature-based representations have rarely been employed for this purpose. In this work, various shape-based features are compared with appearance-based descriptors.

Shape-based features: k-adjacent segments (k-AS). An extension of contour segment networks, a graph-based method for template matching. Edgels are detected with a boundary detector, neighboring edgels are chained, and the edgel-chains are replaced by straight-line approximations (contour segments) that are joined into a global contour segment network for the image.
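
The following is a loose approximation of that pipeline (edge detection, chaining, straight-line approximation), assuming OpenCV is available; it uses the Canny detector rather than the learned boundary detector of the original work, and the image path and thresholds are illustrative.

```python
# Sketch: extract straight contour segments from an image (approximation of the
# k-adjacent-segments preprocessing). Uses Canny instead of a learned boundary detector.
import cv2
import numpy as np

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
edges = cv2.Canny(img, 50, 150)                          # edgel detection

# Chain neighboring edgels into contours, then replace each chain by a polyline of
# straight-line approximations (the contour segments).
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
segments = []
for chain in contours:
    poly = cv2.approxPolyDP(chain, epsilon=2.0, closed=False)
    pts = poly.reshape(-1, 2)
    segments += [(tuple(pts[i]), tuple(pts[i + 1])) for i in range(len(pts) - 1)]

# 'segments' now holds line segments that could be linked into a contour segment
# network by connecting segments whose endpoints lie close to each other.
print(len(segments), "contour segments")
```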

Shape-based features: geometric blur (GB). Robust to small affine distortions. Four channels of oriented edge energy are extracted to obtain a sparse signal; within this sparse signal, the region centered at the interest point is blurred with a Gaussian kernel to obtain the geometric blur, which is then sub-sampled over all channels at distinct locations on a circular grid. The final descriptor is the concatenation of all samples.
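
A simplified numpy/scipy sketch of this construction follows; the orientation binning, grid layout, and blur amounts are assumptions, and increasing blur with distance from the interest point is approximated by using a larger Gaussian sigma for the outer rings of the sampling grid.

```python
# Sketch: simplified geometric-blur-style descriptor. Four oriented edge-energy channels,
# Gaussian-blurred (more blur for outer rings), sampled on a circular grid and concatenated.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def oriented_edge_channels(img):
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx) % np.pi
    bins = np.minimum((ang / np.pi * 4).astype(int), 3)        # 4 orientation bins
    return [np.where(bins == b, mag, 0.0) for b in range(4)]   # sparse oriented energy

def geometric_blur_descriptor(img, x, y, radii=(4, 8, 16), n_angles=8):
    channels = oriented_edge_channels(img.astype(float))
    samples = []
    for ring, r in enumerate(radii):
        sigma = 1.0 + 2.0 * ring                               # more blur further out
        blurred = [gaussian_filter(c, sigma) for c in channels]
        for a in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
            px = int(round(x + r * np.cos(a)))
            py = int(round(y + r * np.sin(a)))
            if 0 <= py < img.shape[0] and 0 <= px < img.shape[1]:
                samples += [c[py, px] for c in blurred]
    return np.array(samples)

desc = geometric_blur_descriptor(np.random.rand(64, 64), x=32, y=32)
print(desc.shape)
```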

Shape-based features: shape context (SC). Originally based on edge information: for a given interest point location, it accumulates the relative locations of nearby edge points in a coarse log-polar histogram; here a histogram with 9 spatial bins over 4 edge orientation channels is computed. The descriptor is similar for homologous points and dissimilar for non-homologous points.
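
A compact numpy sketch of such a log-polar histogram; the 9 spatial bins are realized here as a central bin plus two log-radius rings of four angular sectors, which is one plausible layout assumed for illustration rather than taken from the paper.

```python
# Sketch: log-polar shape-context-style histogram at an interest point.
# 9 spatial bins (center bin + 2 log-radius rings x 4 angular sectors, an assumed layout)
# over 4 edge orientation channels -> 36-dimensional descriptor.
import numpy as np

def shape_context(edge_points, orientations, cx, cy, r_inner=5.0, r_outer=40.0):
    """edge_points: (N, 2) array of (x, y); orientations: (N,) edge angles in [0, pi)."""
    hist = np.zeros((9, 4))
    dx, dy = edge_points[:, 0] - cx, edge_points[:, 1] - cy
    r, theta = np.hypot(dx, dy), np.arctan2(dy, dx) % (2 * np.pi)
    ori_bin = np.minimum((orientations / np.pi * 4).astype(int), 3)
    for ri, ti, oi in zip(r, theta, ori_bin):
        if ri > r_outer:
            continue                                  # only nearby edge points
        if ri < r_inner:
            spatial = 0                               # central bin
        else:
            ring = 1 if ri < np.sqrt(r_inner * r_outer) else 2   # log-radius split
            spatial = 1 + (ring - 1) * 4 + int(ti / (np.pi / 2))
        hist[spatial, oi] += 1
    return hist.ravel()

pts = np.random.rand(100, 2) * 80
desc = shape_context(pts, np.random.rand(100) * np.pi, cx=40, cy=40)
print(desc.shape)   # (36,)
```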

Appearance-based descriptors: SIFT (scale-invariant feature transform). A 3D histogram over local gradient locations and orientations, with each entry weighted by the gradient magnitude.
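
For reference, a minimal sketch of extracting SIFT keypoints and descriptors, assuming OpenCV 4.4 or newer where SIFT is part of the main module; the input image path is hypothetical.

```python
# Sketch: detect keypoints and compute 128-dimensional SIFT descriptors with OpenCV.
import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)                  # N keypoints x 128 dimensions
```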

Appearance-based descriptors: GLOH (gradient location and orientation histogram). An extension of the SIFT descriptor: a histogram with 17 location bins and 16 orientation bins over a log-polar location grid. Interest point detectors are used to compute the local region descriptors (GB, SC, SIFT, GLOH): Harris-Laplace selects corner-like locations, Hessian-Laplace responds to blob-like structures, and the salient regions detector identifies local image regions that are non-predictable across scales.

Evaluation: cluster precision. Measures to what extent features of a given class are grouped together by clustering: high scores are obtained by large clusters containing features from many instances of a single object class, while low scores are obtained by small clusters with few features drawn from multiple classes.
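
The exact formula is not given on the slides; the sketch below implements one plausible reading (size-weighted purity of each cluster with respect to its dominant class), so treat it as an assumption rather than the measure used in the evaluation.

```python
# Sketch: a size-weighted cluster purity score, one plausible form of "cluster precision".
# Each feature has a cluster id and the class label of the image it came from.
from collections import Counter
import numpy as np

def cluster_precision(cluster_ids, class_labels):
    cluster_ids, class_labels = np.asarray(cluster_ids), np.asarray(class_labels)
    score, total = 0.0, len(cluster_ids)
    for c in np.unique(cluster_ids):
        labels = class_labels[cluster_ids == c]
        dominant = Counter(labels.tolist()).most_common(1)[0][1]
        score += dominant          # features of the dominant class in this cluster
    return score / total           # larger, purer clusters contribute more

print(cluster_precision([0, 0, 0, 1, 1], ["car", "car", "cow", "cow", "car"]))
```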

Evaluation: naïve Bayes. Each object is represented in terms of occurrence statistics over codebook entries; a multi-class classifier is trained on a training set of such representations, modelling the posterior distribution of an object class.
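
A minimal bag-of-words pipeline in that spirit, assuming scikit-learn; the codebook size, clustering method, and toy data are illustrative choices, not those of the evaluation.

```python
# Sketch: bag-of-words over a k-means codebook, classified with multinomial naive Bayes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(1)
train_descs = [rng.normal(size=(rng.integers(50, 150), 64)) for _ in range(30)]  # per image
train_labels = rng.integers(0, 3, size=30)

codebook = KMeans(n_clusters=32, n_init=10).fit(np.vstack(train_descs))

def bow_histogram(descs, codebook):
    words = codebook.predict(descs)                     # assign each descriptor to a word
    return np.bincount(words, minlength=codebook.n_clusters)

X = np.array([bow_histogram(d, codebook) for d in train_descs])
clf = MultinomialNB().fit(X, train_labels)              # models P(class | word counts)
print(clf.predict(X[:3]))
```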

Evaluation: localized bag-of-words. Also based on histograms of feature occurrences over a codebook, it measures the impact of adding location information in terms of classification accuracy.
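
A sketch of one common way to localize a bag-of-words representation: separate codebook histograms per cell of a coarse spatial grid, concatenated. The grid size is an assumption and this may differ from the scheme evaluated in the chapter; the codebook object is assumed to behave like the KMeans codebook in the previous sketch.

```python
# Sketch: localized bag-of-words: one codebook histogram per cell of a 2x2 spatial grid,
# concatenated into a single feature vector (grid size is illustrative).
import numpy as np

def localized_bow(descs, positions, codebook, img_w, img_h, grid=2):
    """descs: (N, D) descriptors; positions: (N, 2) keypoint (x, y) locations."""
    words = codebook.predict(descs)
    k = codebook.n_clusters
    hist = np.zeros((grid, grid, k))
    cx = np.minimum((positions[:, 0] / img_w * grid).astype(int), grid - 1)
    cy = np.minimum((positions[:, 1] / img_h * grid).astype(int), grid - 1)
    for w, i, j in zip(words, cx, cy):
        hist[j, i, w] += 1                       # count word w in grid cell (j, i)
    return hist.ravel()                          # grid*grid*k dimensional vector
```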

Results. Local shape-based and appearance-based features do not show a great difference, and the choice of detector matters more on average than the choice of descriptor; Hessian-Laplace with SIFT or GLOH is best on average. Shape-based features perform mostly worse than appearance-based features: k-AS capture generic local shape properties rather than information that discriminates an object category, but they benefit more from added location information than appearance-based features do.

Thank You!!!