Grammar of Image. Zhaoyin Jia, 03-30-2009.


Presentation transcript:

Grammar of Image
Zhaoyin Jia

Problems
• Enormous amount of vision knowledge
• Computational complexity
• Semantic gap
…… Classification, Recognition

Task of image parsing

Objectives in this paper
• Framework for vision
  – And-Or Graph
• Algorithm for this framework
  – Top-down/bottom-up computation
• Generalization from small samples
  – Use Monte Carlo simulation to synthesize more configurations
• Fill the semantic gap

Grammar
• Language: co-occurrence is more than chance (example: CONSTANTINOPLE)
• Image: parallel lines; T-junctions

Formulation of grammar
• Start symbol: S
• Non-terminal nodes: V_N
• Production rules: R
• Terminal nodes: V_T
Example rules:
  S → NP VP
  VP → VP PP
  VP → V NP
  ……
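To make the four components concrete, here is a minimal sketch of a toy grammar written as (S, V_N, V_T, R) in Python. Only the three rules S → NP VP, VP → VP PP and VP → V NP come from the slide; the lexical rules, the words, and the helper name `derive` are assumptions made up for illustration.

```python
import random

# Toy context-free grammar (S, V_N, V_T, R).
# S -> NP VP, VP -> VP PP, VP -> V NP are taken from the slide;
# every other rule and all words are invented for this sketch.
START = "S"
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "NP": [["I"], ["the", "man"], ["the", "telescope"]],
    "PP": [["with", "NP"]],
    "V": [["saw"]],
}
NONTERMINALS = set(RULES)
TERMINALS = {
    sym for alts in RULES.values() for rhs in alts for sym in rhs
} - NONTERMINALS


def derive(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal words by applying production rules."""
    if symbol in TERMINALS:
        return [symbol]
    alternatives = RULES[symbol]
    # Past max_depth, always take the first alternative so recursion terminates.
    rhs = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    words = []
    for child in rhs:
        words.extend(derive(child, rng, depth + 1, max_depth))
    return words


sentence = derive(START, random.Random(0))
print(" ".join(sentence))
```

Every derivation starts at S and stops only when all symbols are terminals, which is what makes the terminal string a sentence of the grammar.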

Image grammar
• Start symbol: S
• Production rules
• Non-terminal nodes: V_N
• Terminal nodes: V_T

Overlapping parts/Ambiguity
• Similar color, occlusion, etc.

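Stochastic Context Free Grammar
• For each non-terminal node in V_N, we have production rules, with a probability associated with each one
• Probability of a parse tree: the product of the probabilities of the rules it uses
• Probability of a sentence: the sum of the probabilities of all parse trees that yield it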
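The slide's formulas did not survive extraction. The block below is a sketch of the standard SCFG definitions in assumed notation: A is a non-terminal, A → β_i are its alternative rules, pt is a parse tree, and w is a sentence. The symbols are chosen for illustration and may differ from the ones used on the original slide.

```latex
% Assumed notation, not recovered from the slide.
\begin{align}
  \sum_{i} p(A \to \beta_i) &= 1 \quad \text{for each } A \in V_N, \\
  p(pt) &= \prod_{(A \to \beta) \,\in\, pt} p(A \to \beta), \\
  p(w) &= \sum_{pt \,:\, \mathrm{yield}(pt) = w} p(pt).
\end{align}
```

The sum in the last line is where ambiguity shows up: a sentence with several valid parse trees collects probability from each of them.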

Stochastic Grammar with Context
• From left to right: bi-gram model (Markov chain) over a sentence with n words
• Non-local relations: tree model
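As a rough illustration of the bi-gram (Markov chain) model, the sketch below scores a sentence as p(w_1) times the product of p(w_{i+1} | w_i), with both factors estimated by counting. The three-sentence corpus and the function names are invented for the example, and there is no smoothing.

```python
from collections import Counter
from fractions import Fraction


def fit_bigram(corpus):
    """Estimate p(w_1) and p(w_{i+1} | w_i) by counting; no smoothing."""
    first = Counter(sentence[0] for sentence in corpus)
    pair = Counter()
    prev = Counter()
    for sentence in corpus:
        for a, b in zip(sentence, sentence[1:]):
            pair[(a, b)] += 1
            prev[a] += 1
    p_first = {w: Fraction(c, len(corpus)) for w, c in first.items()}
    p_next = {(a, b): Fraction(c, prev[a]) for (a, b), c in pair.items()}
    return p_first, p_next


def sentence_prob(sentence, p_first, p_next):
    """p(w_1, ..., w_n) = p(w_1) * prod_i p(w_{i+1} | w_i)."""
    prob = p_first.get(sentence[0], Fraction(0))
    for a, b in zip(sentence, sentence[1:]):
        prob *= p_next.get((a, b), Fraction(0))
    return prob


# Hypothetical corpus of non-empty, tokenized sentences.
corpus = [
    ["I", "saw", "the", "man"],
    ["I", "saw", "the", "dog"],
    ["the", "man", "saw", "the", "dog"],
]
p_first, p_next = fit_bigram(corpus)
print(sentence_prob(["I", "saw", "the", "man"], p_first, p_next))  # 1/3
```

Each word depends only on its left neighbour, which is exactly the ordering assumption that is lost when moving from sentences to images.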

New issues in Image Grammar
• Loss of “left to right” order: region adjacency graph
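A region adjacency graph replaces the left-to-right order of words with a neighbour relation between image regions. The sketch below builds one from a small label grid using 4-connectivity; the grid, the choice of connectivity, and the function name are assumptions for illustration.

```python
def region_adjacency_graph(labels):
    """Return {region: set of regions sharing a 4-connected border} for a 2D label grid."""
    rows, cols = len(labels), len(labels[0])
    graph = {lab: set() for row in labels for lab in row}
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Checking only the right and down neighbours visits each border once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph[b].add(a)
    return graph


# Hypothetical segmentation: each number is a region label.
segmentation = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 1],
    [3, 3, 3],
]
rag = region_adjacency_graph(segmentation)
print(rag)
```

Regions 0 and 3 never touch in this grid, so the graph has no edge between them; no linear ordering of the four regions is implied.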

New issues in Image Grammar
• Scaling changes which terminal nodes appear in the parse tree

New issues in Image Grammar
• Switch between texture and structure

Building the image grammar
• Visual vocabulary: primitives, sketch graph, textons…
• Relations and configurations: co-occurrence, attached, hinged, supported, occluded…
• And-Or Graph representation embedding the image grammar
• Learning/testing: inferring the possible parse graph

Database
• Lotus Hill Institute Dataset
• 636,748 images, 3,927,130 physical objects
• A few hundred are free
Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks.” EMMCVPR,

Free Data
• 6 categories, 145 subsets

  Category              Subsets
  Manmade Object        75
  Nature Object         40
  Objects in Scene      6
  Transportation        9
  UCLA Aerial Image     5
  UIUC Sport Activity   10

• Annotations shown:
  – Outline and segmentation of the object
  – Segmentation of a scene (street)
  – Physical parts of the object

Visual Vocabulary
• The “Lego Land”
• Language

Visual Vocabulary
• Each element is a function of image primitives
• Attributes: a) geometric transformation, b) appearance
• Bonds between primitives

Visual Vocabulary
• Sketch and Texture
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997.

Primal sketch model
Panels: input image, sketch graph, texture pixels
C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of International Conference on Computer Vision, 2003.


High level visual vocabulary
• Cloth: collar, left/right sleeves, hands
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Relations and configurations
• Definition of relation: defined over bonds, with a structure and a compatibility
• Three types of relations
  – Bonds and connections
  – Joints and junctions
  – Object interactions/semantics
• Definition of configurations

Relations
• Bonds and connections
  – Connect primitives into bigger graphs
  – Intensity/color compatibility

Relations
• Joints and junctions

Relations
• Object interactions

Configuration
• Spatial layout of entities at a certain level
Primal sketch – parts – object – scene

Reconfigurable graphs
• Treat bonds as random variables: address nodes

Inference of the configuration
• Have the primal sketch of the image
• Detect the ‘T-junction’
• Simulated annealing to infer the Gestalt law
Legend: red dot: connect region; black line: known edge; green line: inferred connection
R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006.
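The slide names simulated annealing but gives no detail, so the following is only a generic sketch of the technique on an invented problem: choosing which candidate connections between four T-junction ends to switch on. The junction pairs, costs, penalty weight, and cooling schedule are all assumptions and are not taken from Gao and Zhu's report.

```python
import math
import random

# Hypothetical candidate connections between junction ends 0..3 and their costs.
PAIRS = [(0, 1), (1, 2), (2, 3), (0, 3)]
COST = [0.2, 1.5, 0.3, 1.4]


def energy(state):
    """Cost of active connections plus a penalty for ends not used exactly once."""
    degree = [0, 0, 0, 0]
    total = 0.0
    for on, (a, b), cost in zip(state, PAIRS, COST):
        if on:
            total += cost
            degree[a] += 1
            degree[b] += 1
    return total + 2.0 * sum(abs(d - 1) for d in degree)


def flip_one(state, rng):
    """Propose a neighbour by toggling one connection."""
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


def simulated_annealing(state, energy, propose, rng,
                        t_start=1.0, t_end=0.01, steps=2000):
    """Metropolis acceptance with geometric cooling; returns the best state seen."""
    cur, cur_e = state, energy(state)
    best, best_e = cur, cur_e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = propose(cur, rng)
        cand_e = energy(cand)
        # Always accept improvements; accept worse moves with probability exp(-dE / t).
        if cand_e <= cur_e or rng.random() < math.exp((cur_e - cand_e) / t):
            cur, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = cur, cur_e
    return best, best_e


start = (0, 0, 0, 0)
best, best_e = simulated_annealing(start, energy, flip_one, random.Random(0))
print(best, best_e)
```

Accepting some uphill moves early on is what lets the search leave poor local configurations before the temperature drops.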

Reconfigurable graphs
Panels: source image, T-junction, inferred connection, layer extraction
Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with Mixed Markov Random Field”

Reconfigurable graphs R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006

And-Or Graph
• Parse graph of the image, where pt is the parse tree over the vocabulary and E is the set of relations
• Inferring the parse graph
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.

And-Or Graph
• Contains all the valid parse graphs
• And-node, Or-node, leaf node
• Relations between children of an And-node
• Parse tree: assigning a label at each Or-node
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
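The bullets above can be sketched as a small data structure: And-nodes expand into all of their children, Or-nodes select exactly one child, and a parse tree is fixed once every Or-node has made its choice. The clock example, the node names, and both function names are hypothetical, and relations between children of And-nodes are left out to keep the sketch short.

```python
import itertools
import random

# Hypothetical toy And-Or graph: node -> (type, children).
AOG = {
    "clock": ("and", ["frame", "hands"]),
    "frame": ("or", ["round_frame", "square_frame"]),
    "hands": ("or", ["two_hands", "three_hands"]),
    "round_frame": ("leaf", []),
    "square_frame": ("leaf", []),
    "two_hands": ("leaf", []),
    "three_hands": ("leaf", []),
}


def configurations(node):
    """List every valid configuration (tuple of leaf nodes) reachable from `node`."""
    kind, children = AOG[node]
    if kind == "leaf":
        return [(node,)]
    if kind == "or":
        # An Or-node contributes the configurations of any single child.
        return [cfg for child in children for cfg in configurations(child)]
    # An And-node combines one configuration from each of its children.
    parts = [configurations(child) for child in children]
    return [sum(combo, ()) for combo in itertools.product(*parts)]


def sample_parse_tree(node, rng):
    """Draw one parse tree by picking a child uniformly at each Or-node."""
    kind, children = AOG[node]
    if kind == "leaf":
        return node
    if kind == "or":
        return (node, [sample_parse_tree(rng.choice(children), rng)])
    return (node, [sample_parse_tree(child, rng) for child in children])


all_configs = configurations("clock")
tree = sample_parse_tree("clock", random.Random(0))
print(len(all_configs), tree)
```

Two Or-nodes with two alternatives each already give four configurations, which hints at how a compact graph can stand for a large set of valid parse graphs.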

And-Or Graph
• Definition:
  – V_T: image primitives
  – R: relations at all levels
  – A probability model defined on the And-Or graph
  – The set of valid configurations of terminal nodes

Stochastic Model on And-Or graph
• Terminal (leaf) node
• And-Or node
• Set of links
• Switch variable at Or-node
• Attributes of primitives
Terms of the model:
• SCFG term: weighs the frequency at the children of Or-nodes
• Weighs the local compatibility of primitives (geometric and appearance)
• Spatial and appearance relations between primitives (parts or objects)

Learning And-Or Graph
• Learning the vocabulary, and the hierarchic And-Or Graph
• Learning the relation set R, given the vocabulary
• Learning the parameters, given R and the vocabulary
Discussed in the paper

Learning And-Or Graph
• Learning and pursuing the relation set R:
  – Start from a Stochastic Context Free Grammar (a)
  – Learn the relations that maximally reduce the KL divergence to the observation (b-e)
J. Porway, Z. Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical Report, Department of Statistics, 2007.
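One way to read the pursuit step is as a greedy choice: among candidate relations, add the one whose observed statistics are farthest, in KL divergence, from what the current model produces. The sketch below shows only that selection step under strong simplifying assumptions. The histograms, relation names, and function names are invented, and the procedure in the cited report is not reproduced here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def pursue_next_relation(observed, model):
    """Pick the candidate relation with the largest KL gap between data and model."""
    gains = {name: kl_divergence(observed[name], model[name]) for name in observed}
    best = max(gains, key=gains.get)
    return best, gains


# Hypothetical histograms of two candidate relations, measured on the
# observed data and on samples from the current (SCFG-only) model.
observed = {
    "relative_position": [0.7, 0.2, 0.1],
    "relative_scale": [0.4, 0.3, 0.3],
}
model = {
    "relative_position": [1 / 3, 1 / 3, 1 / 3],
    "relative_scale": [1 / 3, 1 / 3, 1 / 3],
}
best, gains = pursue_next_relation(observed, model)
print(best, gains)
```

After a relation is added, the model would be refit and the histograms recomputed before the next selection, so the gaps shrink step by step.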

Learning And-Or Graph
• Learning the graph parameters
• Approximating the learned model to the observation
• Similar to texture synthesis
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997.

Case I: Rectangle
• Nodes: rectangles
• Two vanishing points, four edge directions
• Rules
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.

Case I: Rectangle
• Get the primal sketch of the scene
• Find the ‘strong’ rectangles (bottom-up, red)
• Weigh (score) the different hypotheses (top-down, blue)
  – The weight is the compatibility of the image with the proposed rectangle (primal sketch)
• Accept the best one
• Repeat the previous three steps until all the weights are small (negative)
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.
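The loop described on this slide can be sketched as a greedy procedure: score every remaining rectangle hypothesis, accept the best one, and stop once no hypothesis has a positive weight. The scoring rule used here (newly explained sketch edges minus a fixed cost) and the candidate sets are assumptions for illustration and are not the weight used by Han and Zhu.

```python
def greedy_parse(candidates, cost=2):
    """Greedily accept rectangle hypotheses.

    candidates: {name: set of primal-sketch edge ids the rectangle would explain}
    cost: hypothetical fixed penalty per accepted rectangle.
    """
    explained, accepted = set(), []
    remaining = dict(candidates)
    while remaining:
        # Re-score each hypothesis against the edges that are still unexplained.
        weights = {name: len(edges - explained) - cost
                   for name, edges in remaining.items()}
        best = max(weights, key=weights.get)
        if weights[best] <= 0:
            break  # all weights are small (non-positive): stop
        accepted.append(best)
        explained |= remaining.pop(best)
    return accepted


# Hypothetical bottom-up proposals.
proposals = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6, 7, 8},
    "D": {1, 2},
}
print(greedy_parse(proposals))  # ['A', 'C']
```

Rectangles B and D are rejected because, once A and C are accepted, the edges they would explain are already accounted for.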

Case I: Rectangle
• Inference process

Case I: Rectangle
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.

Case II: Human Cloth
• Use the And-Or graph to generate a matching model
• Vocabulary (training dataset)
• Matching using the And-Or Graph

Case II: Human Cloth
• The And-Or Graph
• Novel configuration
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Case II: Human Cloth
• Inference process
  – Localize the face, then estimate the parts of the body
  – Bottom-up: a coarse matching of the parts
  – Top-down: refine the matching using the relations
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Case II: Human Cloth
• Inference result
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Case II: Human Cloth
• Inference result
  – Hands are not exactly the same: find the best match in the dataset
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Case III: Recognition
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.

Conclusion
• Enormous amount of vision knowledge: (And-Or graph) ……

Conclusion
• Computational complexity:
  – Scheduling the bottom-up/top-down procedure remains open
• Semantic gap
  – Learning the And-Or Graph
  – Learning the vocabulary, and its attributes
After all, we are not supposed to define so many things: ideal vision words vs. what we have now

Thank you Zhaoyin Jia