Categorical Perception
강우현
Contents
1. Introduction
2. Scalable representations for visual categorization
3. Representation for functional and affordance-based categorization
1. Introduction

Recently, there has been significant progress in visual class recognition:
- Visual category modeling
- Visual category learning
- Visual category robustness
- Scalability to more classes
This chapter discusses important design and paradigm choices, highlighting the implied metrics and drawbacks.
Many object classes have high intra-class variability, yet instances of different classes often share similar features and appearances. This can lead to low inter-class variation.
Generative vs. discriminative paradigms
Generative paradigm
- The model represents all properties of the object instances of a class
- Learn a model for each class and determine to which class an observation belongs
- Computationally expensive
- Examples: GMMs (Gaussian mixture models), HMMs (hidden Markov models)
Discriminative paradigm
- The model retreats to a simpler mapping from the observed properties to the predicted category label
- Learns the differences between classes without modeling each class itself
- Requires a large training set, including a substantial amount of background data
- Examples: logistic regression, SVMs (support vector machines)
(A minimal contrast of the two paradigms is sketched below.)
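The following sketch contrasts the two paradigms on synthetic 2-D data, assuming scikit-learn; the data, the number of mixture components, and the equal class priors are illustrative choices, not details from the chapter.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

# Synthetic 2-D data for two classes (placeholder, not from the chapter).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Generative: fit one GMM per class, classify by the highest class likelihood
# (equal class priors assumed here).
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X[y == c]) for c in (0, 1)}
log_lik = np.column_stack([gmms[c].score_samples(X) for c in (0, 1)])
gen_pred = log_lik.argmax(axis=1)

# Discriminative: learn the class boundary directly, without modeling each class.
disc_pred = LinearSVC().fit(X, y).predict(X)

print("generative accuracy:", (gen_pred == y).mean())
print("discriminative accuracy:", (disc_pred == y).mean())
```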
Local vs. global object representation paradigms
Local object representation
- Can be made robust to scale and rotation changes
- Enables robustness to partial occlusion
- Requires sophisticated models to describe the geometric and topological structure of object classes
Global object representation
- Describes the geometric and topological structure via the global appearance of an object
- Does not offer the desired robustness to partial occlusion
- Difficult to generalize across instances
Different learning paradigms, ordered from fine-level annotation (more supervision, less training data) to coarse-level annotation and no labels (less supervision, more training data):
- Pixel-level segmentation
- Bounding-box annotations
- Image-level annotation
- Unsupervised methods (no labels)
Central role of representation
- Representation plays a central role in a cognitive system
- The way categories are stored, organized, and related to each other determines the overall system's capabilities to learn, evolve, and interact
2. Scalable representations for visual categorization

Hierarchical representation
- Derives and organizes features at multiple levels
- Levels build on top of each other by exploiting the sharability of features among more complex compositions
- High computational efficiency and expressive power
- The chapter proposes a novel approach to representing object categories within an indexable, hierarchical compositional framework
Hierarchical compositionality
- Each hierarchical unit is shared among many more complex, higher-layer compositions
- Computational cost is greatly reduced compared to searching for each complex interpretation in isolation
- Receptive field sizes increase with the level of the hierarchy
- Compositions are designed to respond to only small spatial subsets of their receptive fields, giving higher robustness and faster processing
Statistics-driven learning
- Features and their higher-level combinations are learned in an unsupervised manner
- Avoids hand-labelling of massive image data
- Captures the regularities within the visual data effectively and compactly
Robust and repeatable detection
- Features comprising each hierarchical layer should be manifested as models that enable robust verification of the presence of their underlying components
- Models should incorporate loose geometric relations to achieve spatial binding of features
- Models should encode enough flexibility to ensure repeatability and to gain discrimination gradually through composition within the hierarchy
Indexing
- Lists in the library can be accessed in constant time during image processing
- Retrieve the local spatial neighborhood
Matching
- Compare the local spatial neighborhood against the allowable prototypical compositions within the hierarchical library
- Check the presence of the subparts pertaining to each composition: their relative locations, the allowed variance, and the position of the central part (see the sketch below)
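To make the indexing/matching step concrete, here is a small sketch: the library is a dict keyed by the central part's type (constant-time lookup), and each candidate composition lists its subparts with an expected offset and an allowed variance. The part names ("edge_h", "edge_v"), the composition "corner_L", and the tolerances are hypothetical; the real library stores learned prototypes rather than a hand-written dict.

```python
import numpy as np

library = {
    "edge_h": [  # compositions whose central part is a horizontal edge
        {"name": "corner_L", "subparts": [("edge_v", np.array([5.0, 0.0]), 2.0)]},
    ],
}

def match_compositions(central_type, central_pos, detections):
    """detections: list of (part_type, position) found in the local neighborhood."""
    found = []
    for comp in library.get(central_type, []):          # constant-time index by part type
        ok = True
        for sub_type, offset, tol in comp["subparts"]:
            expected = central_pos + offset             # relative location of the subpart
            ok &= any(t == sub_type and np.linalg.norm(p - expected) <= tol
                      for t, p in detections)           # allowed variance around it
        if ok:
            found.append(comp["name"])
    return found

dets = [("edge_v", np.array([15.2, 10.5]))]
print(match_compositions("edge_h", np.array([10.0, 10.0]), dets))  # ['corner_L']
```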
Unsupervised learning of part compositions
- Extract statistically salient compositions that encode spatial relations between the constituent parts from the layer below
- Local inhibition is performed around each image feature
- Spatial maps that capture pairwise geometric relations between parts are updated statistically
- Higher-order compositions are learned by tracking the co-occurrence of spatial pairs (see the sketch below)
Category-specific higher layers
- Learning of the higher layers proceeds only on a subset of parts: the ones that are most repeatable in a specific category
- Performed on images of individual categories
- The final categorical layer combines the most repeatable parts through the object center to form the representation of a category
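A rough sketch of the pairwise spatial statistics described above: for every pair of part types, a spatial map (a 2-D histogram of relative positions) is accumulated across images, and pairs whose map has a strong, repeatable peak are kept as candidate compositions. The function names, bin size, map size, and saliency threshold are invented for illustration.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

MAP_SIZE, BIN = 21, 4          # 21x21 map of relative offsets, 4-pixel bins
spatial_maps = defaultdict(lambda: np.zeros((MAP_SIZE, MAP_SIZE)))

def update_maps(parts):
    """parts: list of (part_type, (x, y)) detections in one image."""
    for (t1, p1), (t2, p2) in combinations(parts, 2):
        dx, dy = (np.asarray(p2) - np.asarray(p1)) // BIN + MAP_SIZE // 2
        if 0 <= dx < MAP_SIZE and 0 <= dy < MAP_SIZE:
            spatial_maps[(t1, t2)][int(dy), int(dx)] += 1

def salient_pairs(min_peak_ratio=0.2):
    """Return pairs whose relative-position statistics concentrate in one bin."""
    out = []
    for pair, m in spatial_maps.items():
        if m.sum() > 0 and m.max() / m.sum() >= min_peak_ratio:
            peak = np.unravel_index(m.argmax(), m.shape)
            out.append((pair, peak))
    return out

update_maps([("edge_h", (10, 10)), ("edge_v", (14, 10))])
print(salient_pairs())
```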
Experiments and results
3. Representation for functional and affordance-based categorization

Functional/affordance-based categorization
- An object's shape and geometry should be represented explicitly
- Local feature-based representations have rarely been employed for this
- In this work, various shape-based features are compared with appearance-based descriptors
Shape-based features: k-adjacent segments (k-AS)
- An extension of contour segment networks (a graph-based method for template matching)
- Edgels are detected using a boundary detector
- Neighboring edgels are chained
- Edgel chains are replaced by straight-line approximations (contour segments) and joined into a global contour segment network for the image (see the sketch below)
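A rough sketch of that pipeline, using standard OpenCV primitives as stand-ins: Canny instead of the boundary detector used in the chapter, findContours for edgel chaining, and approxPolyDP for the straight-line approximation. The synthetic input image and the thresholds are placeholders, and grouping segments into k-AS features is only indicated in the final comment.

```python
import cv2
import numpy as np

# Synthetic test image (a filled rectangle) stands in for a real photograph.
img = np.zeros((128, 128), np.uint8)
cv2.rectangle(img, (32, 32), (96, 96), 255, -1)

edges = cv2.Canny(img, 100, 200)                        # edgel detection
chains, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

segments = []                                           # global contour segment network
for chain in chains:                                    # each chain of neighboring edgels
    approx = cv2.approxPolyDP(chain, 2.0, False)        # straight-line approximation
    pts = approx.reshape(-1, 2)
    segments += [(tuple(pts[i]), tuple(pts[i + 1])) for i in range(len(pts) - 1)]

print(len(segments), "straight contour segments extracted")
# Groups of k spatially adjacent segments would then form the k-AS features.
```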
Shape-based features: geometric blur
- Robust to small affine distortions
- Extracts 4 channels of oriented edge energy to obtain a sparse signal
- Within the sparse signal, the region centered at the interest point is blurred with a Gaussian kernel to obtain the geometric blur
- The geometric blur is then sub-sampled over all channels at distinct locations on a circular grid
- The final descriptor is the concatenation of all samples (see the sketch below)
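A much-simplified sketch of a geometric-blur style descriptor. The true method uses a spatially varying blur whose amount grows with distance from the interest point; here one Gaussian blur level per sampling radius stands in for that, and building the 4 oriented channels with squared-cosine weighting of the gradient is an assumption. Radii, blur widths, and the number of angular samples are illustrative.

```python
import cv2
import numpy as np

def geometric_blur_descriptor(gray, x, y, radii=(4, 8, 16), sigmas=(1.0, 2.0, 4.0),
                              n_angles=8):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    # 4 channels of oriented edge energy (the sparse signal)
    channels = [mag * np.cos(ang - t) ** 2
                for t in np.linspace(0, np.pi, 4, endpoint=False)]

    desc = []
    for r, sigma in zip(radii, sigmas):                  # blur grows with sampling radius
        blurred = [cv2.GaussianBlur(c, (0, 0), sigma) for c in channels]
        for a in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
            sx, sy = int(x + r * np.cos(a)), int(y + r * np.sin(a))
            if 0 <= sy < gray.shape[0] and 0 <= sx < gray.shape[1]:
                desc += [b[sy, sx] for b in blurred]     # sample every channel
    d = np.asarray(desc, dtype=np.float32)
    return d / (np.linalg.norm(d) + 1e-8)                # concatenation of all samples

gray = np.zeros((64, 64), np.uint8)
cv2.circle(gray, (32, 32), 12, 255, -1)
print(geometric_blur_descriptor(gray, 32, 32).shape)     # 3 radii x 8 angles x 4 channels
```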
Shape-based features: shape context
- Originally based on edge information
- For a given interest point, it accumulates the relative locations of nearby edge points in a coarse log-polar histogram
- Here, a histogram with 9 spatial bins over 4 edge orientation channels is computed
- Descriptors are similar for homologous points and dissimilar for non-homologous points (see the sketch below)
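A minimal sketch of a shape-context style histogram: nearby edge points are binned by log-polar location and by edge orientation, then the bins are concatenated. Splitting the 9 spatial bins as 3 radial by 3 angular is one possible layout consistent with the bin count above, and the maximum radius is an assumption.

```python
import numpy as np

def shape_context(center, edge_points, edge_orients, r_max=32.0,
                  n_r=3, n_theta=3, n_orient=4):
    center = np.asarray(center, dtype=float)
    pts = np.asarray(edge_points, dtype=float)
    orients = np.asarray(edge_orients, dtype=float)

    hist = np.zeros((n_r, n_theta, n_orient))
    d = pts - center                                    # relative locations (N, 2)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    keep = (r > 0) & (r < r_max)

    r_bin = np.floor(np.log1p(r[keep]) / np.log1p(r_max) * n_r).astype(int)
    t_bin = np.floor(theta[keep] / (2 * np.pi) * n_theta).astype(int)
    o_bin = np.floor((orients[keep] % np.pi) / np.pi * n_orient).astype(int)
    for rb, tb, ob in zip(np.clip(r_bin, 0, n_r - 1), t_bin, o_bin):
        hist[rb, tb, ob] += 1
    return (hist / max(hist.sum(), 1)).ravel()          # 3 x 3 x 4 = 36-D descriptor

pts = np.array([[5.0, 0.0], [0.0, 10.0], [-8.0, -3.0]])
orients = np.array([0.1, 1.2, 2.0])
print(shape_context((0.0, 0.0), pts, orients).shape)    # (36,)
```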
Appearance-based descriptors: SIFT (scale-invariant feature transform)
- A 3D histogram over local gradient locations and orientations
- Contributions are weighted by gradient magnitude (see the sketch below)
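A quick sketch of computing such descriptors with OpenCV's SIFT implementation (cv2.SIFT_create, available in OpenCV 4.4 and later); the smoothed-noise input image is a placeholder for a real photograph.

```python
import cv2
import numpy as np

# Synthetic textured image stands in for a real photograph.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (256, 256)).astype(np.uint8)
img = cv2.GaussianBlur(img, (0, 0), 2.0)     # smooth the noise so blob-like structure appears

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each row of `descriptors` is a 128-D histogram of gradient orientations over a
# 4x4 spatial grid around the keypoint, weighted by gradient magnitude.
print(len(keypoints), "keypoints")
```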
Appearance-based descriptors: GLOH (gradient location-orientation histogram)
- An extension of the SIFT descriptor
- 17 bins for location and 16 bins for orientation in a histogram over a log-polar location grid
Interest point detectors (used to compute the local region descriptors GB, SC, SIFT, and GLOH)
- Harris-Laplace: selects corner locations
- Hessian-Laplace: responds to blob-like structures
- Salient Regions: identifies local image regions that are non-predictable across scales
Evaluation: cluster precision
- Measures to what extent features of a given class are grouped together by clustering
- High scores are obtained by large clusters containing features from many instances of a single object class
- Low scores are obtained by small clusters with few features that come from multiple classes (see the sketch below)
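One plausible instantiation of such a score is a size-weighted purity: for each cluster, count the features belonging to its dominant class and normalize by the total feature count. This is a simplified stand-in; the chapter's exact definition of cluster precision may differ.

```python
import numpy as np

def cluster_precision(cluster_ids, class_labels):
    cluster_ids = np.asarray(cluster_ids)
    class_labels = np.asarray(class_labels)
    correct = 0
    for c in np.unique(cluster_ids):
        labels = class_labels[cluster_ids == c]
        correct += np.bincount(labels).max()   # features of the cluster's dominant class
    return correct / len(cluster_ids)          # pure, well-populated clusters score high

# One cluster is pure, the other mixes two classes.
print(cluster_precision([0, 0, 0, 1, 1], [2, 2, 2, 2, 5]))  # 0.8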
Evaluation: Naïve Bayes
- Represents each object in terms of occurrence statistics over codebook entries
- Trains a multi-class classifier on a training set of such representations
- Models the posterior distribution over object classes (see the sketch below)
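A hedged sketch of this bag-of-words pipeline, assuming scikit-learn: local descriptors are quantized against a k-means codebook, each image becomes an occurrence histogram, and a multinomial Naïve Bayes classifier models the class posterior. The placeholder data and the codebook size are not from the chapter.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
# Placeholder data: 15 "images", each a set of 50 descriptors of dimension 128,
# drawn from a class-dependent distribution.
train = [(rng.normal(c, 1.0, (50, 128)), c) for c in (0, 1, 2) for _ in range(5)]

# Codebook: quantize all local descriptors with k-means.
codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(
    np.vstack([d for d, _ in train]))

def bow_histogram(descs):
    words = codebook.predict(descs)                     # nearest codebook entry per feature
    return np.bincount(words, minlength=codebook.n_clusters)

X = np.array([bow_histogram(d) for d, _ in train])      # occurrence statistics per image
y = np.array([label for _, label in train])
clf = MultinomialNB().fit(X, y)                         # models the posterior over classes
print(clf.predict(X[:3]))
```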
Evaluation: localized bag-of-words
- Measures the impact of adding location information in terms of classification accuracy
- Based on histograms of feature occurrences over a codebook (see the sketch below)
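One common way to add location information to a bag-of-words representation, sketched here as an assumption rather than the chapter's exact scheme: split the image into a coarse spatial grid, build one codebook histogram per cell, and concatenate the cells. The grid size is an illustrative choice.

```python
import numpy as np

def localized_bow(keypoint_xy, words, img_w, img_h, n_words, grid=(2, 2)):
    """keypoint_xy: (N, 2) feature positions; words: (N,) codebook indices."""
    gx = np.minimum((keypoint_xy[:, 0] / img_w * grid[0]).astype(int), grid[0] - 1)
    gy = np.minimum((keypoint_xy[:, 1] / img_h * grid[1]).astype(int), grid[1] - 1)
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            in_cell = words[(gx == i) & (gy == j)]               # features in this cell
            hists.append(np.bincount(in_cell, minlength=n_words))
    return np.concatenate(hists)          # grid[0] * grid[1] * n_words dimensional

xy = np.array([[10.0, 10.0], [90.0, 20.0], [30.0, 80.0]])
w = np.array([3, 7, 3])
print(localized_bow(xy, w, img_w=100, img_h=100, n_words=10).shape)  # (40,)
```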
Results
- Local shape-based and appearance-based features do not differ greatly in performance
- The choice of detector matters more on average than the choice of descriptor
- Hessian-Laplace with SIFT or GLOH is best on average
- Shape-based features mostly perform worse than appearance-based features
- k-AS capture generic local shape properties rather than information that discriminates an object category
- Shape-based features benefit more from added location information than appearance-based features do
Thank You!!!