Research on Intelligent Image Indexing and Retrieval
James Z. Wang, School of Information Sciences and Technology, The Pennsylvania State University
Joint work: Jia Li, Department of Statistics, The Pennsylvania State University

Photographer annotation: Greece (Athens, Crete, May/June 2003), taken by James Z. Wang

“Building, sky, lake, landscape, Europe, tree” Can a computer do this?

EMPEROR database by C.-c. Chen

Outline Introduction Our related SIMPLIcity work ALIP: Automatic modeling and learning of concepts Conclusions and future work

The field: Image Retrieval. The retrieval of relevant images from an image database on the basis of automatically derived image features. Applications: biomedicine, homeland security, law enforcement, NASA, defense, commercial, cultural, education, entertainment, the Web, ... Our approach: wavelets, statistical modeling, supervised and unsupervised learning; address the problem in a generic way for different applications.

Chicana Art Project: high-quality paintings from the Stanford Art Library. Goal: help students and researchers find visually related paintings. Used wavelet-based features [Wang+, 1997].

Feature-based Approach. + Handles low-level semantic queries. + Many features can be extracted. -- Cannot handle higher-level queries (e.g., objects). Signature: feature 1, feature 2, ..., feature n.

Region-based Approach: extract objects from images first. + Handles object-based queries, e.g., find images with objects similar to some given objects. + Reduces feature storage adaptively. -- Object segmentation is very difficult. -- User interface: region marking, feature combination.

UCB Blobworld [Carson+, 1999]

Outline Introduction Our related SIMPLIcity work ALIP: Automatic modeling and learning of concepts Conclusions and future work

Motivations. Observations: human object segmentation relies on knowledge; precise computer image segmentation is a very difficult open problem. Hypothesis: it is possible to build robust computer matching algorithms without first segmenting the images accurately.

Our SIMPLIcity Work [PAMI, 2001(1)] [PAMI, 2001(9)] [PAMI, 2002(9)]: Semantics-sensitive Integrated Matching for Picture LIbraries. Major features: sensitive to semantics (combines statistical semantic classification with image retrieval); efficient processing (wavelet-based feature extraction); reduced sensitivity to inaccurate segmentation and a simple user interface (Integrated Region Matching, IRM).

Wavelets

Fast Image Segmentation: partition an image into 4×4 blocks; extract wavelet-based features from each block; use the k-means algorithm to cluster the feature vectors into 'regions'; compute the shape feature by normalized inertia. A sketch of the first three steps follows below.
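A minimal sketch of the block partition, wavelet feature extraction, and k-means steps. Assumptions not stated on the slide: grayscale input, a single-level Haar wavelet per block, and PyWavelets plus scikit-learn's KMeans as off-the-shelf components rather than the original implementation.

```python
import numpy as np
import pywt                          # PyWavelets
from sklearn.cluster import KMeans

def segment_image(img: np.ndarray, n_regions: int = 4) -> np.ndarray:
    """Partition `img` into 4x4 blocks, extract simple wavelet-based
    features per block, and cluster the blocks into `n_regions` regions."""
    h, w = img.shape[0] // 4 * 4, img.shape[1] // 4 * 4
    img = img[:h, :w].astype(np.float64)

    feats = []
    for i in range(0, h, 4):
        for j in range(0, w, 4):
            block = img[i:i + 4, j:j + 4]
            cA, (cH, cV, cD) = pywt.dwt2(block, "haar")   # 2x2 subbands
            feats.append([
                block.mean(),                # average intensity
                np.sqrt((cH ** 2).mean()),   # energy of horizontal detail
                np.sqrt((cV ** 2).mean()),   # energy of vertical detail
                np.sqrt((cD ** 2).mean()),   # energy of diagonal detail
            ])

    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(np.array(feats))
    return labels.reshape(h // 4, w // 4)    # one region label per block
```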

K-means Statistical Clustering. Some segmentation algorithms take 8 minutes of CPU time per image. Our approach: use an unsupervised statistical learning method to analyze the feature space. Goal: minimize the mean squared error between the training samples and their representative prototypes (learning VQ) [Hastie+, Elements of Statistical Learning, 2001].
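Written out, the mean-squared-error goal above is the standard k-means objective; here x_i denotes a block feature vector, c(i) its cluster assignment, and mu_k the representative prototype (centroid) of cluster k (notation introduced here for illustration):

```latex
\min_{\{\mu_k\},\, c}\ \frac{1}{N}\sum_{i=1}^{N}\bigl\| x_i - \mu_{c(i)} \bigr\|^{2}
```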

IRM: Integrated Region Matching. IRM defines an image-to-image distance as a weighted sum of region-to-region distances. The weighting matrix is determined based on significance constraints and an 'MSHP' greedy algorithm; a sketch of this idea follows below.
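A sketch of the weighted-sum idea on this slide: region significances (assumed here to be area fractions summing to one) are spent greedily on the closest region pairs first. This illustrates the greedy weighting scheme, not necessarily the exact published MSHP algorithm.

```python
import numpy as np

def irm_distance(regions1, sig1, regions2, sig2):
    """regions*: (n_i, d) arrays of region feature vectors;
    sig*: per-region significances (e.g. area fractions) summing to 1."""
    r1, r2 = np.asarray(regions1, float), np.asarray(regions2, float)
    s1, s2 = np.asarray(sig1, float).copy(), np.asarray(sig2, float).copy()

    # pairwise region-to-region distances
    d = np.linalg.norm(r1[:, None, :] - r2[None, :, :], axis=2)
    pairs = sorted((d[i, j], i, j) for i in range(len(s1)) for j in range(len(s2)))

    total = 0.0
    for dist, i, j in pairs:          # most similar pairs get weight first
        w = min(s1[i], s2[j])
        if w <= 0.0:
            continue
        total += w * dist             # weighted sum of region-to-region distances
        s1[i] -= w
        s2[j] -= w
    return total
```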

A 3-D Example for IRM

IRM: Major Advantages 1. Reduces the influence of inaccurate segmentation 2. Helps to clarify the semantics of a particular region given its neighbors 3. Provides the user with a simple interface

Experiments and Results. Speed: 800 MHz Pentium PC with Linux OS. Databases: a 200,000-image general-purpose DB (60,000 photographs + 140,000 hand-drawn artworks) and 70,000 pathology image segments. Image indexing time: one second per image. Image retrieval time: without the scalable IRM, 1.5 seconds/query CPU time; with the scalable IRM, 0.15 seconds/query CPU time. External query: one extra second of CPU time.

RANDOM SELECTION DB: 200,000 COREL

Current SIMPLIcity System Query Results

External Query

EMPEROR Project C.-C. Chen Simmons College

(1) Random Browsing

(2) Similarity Search

(3) External Image Query

Outline Introduction Our related SIMPLIcity work ALIP: Automatic modeling and learning of concepts Conclusions and future work

Why ALIP? Size: 1 million images. Understandability and vision: "meaning" depends on the point of view. Can we translate contents and structure into linguistic terms (e.g., "dogs", "Kyoto")?

(cont.) Query formulation. SIMILARITY: looks similar to a given picture. OBJECT: contains a historical building. OBJECT RELATIONSHIP: contains a cat and a person. MOOD: a happy picture. TIME/PLACE: sunset in Athens.

Automatic Linguistic Indexing of Pictures (ALIP): an exciting research direction. Differences from computer vision -- ALIP: deals with a large number of concepts. ALIP: rarely finds enough "good" (diversified/3D?) training images. ALIP: builds knowledge bases automatically for real-time linguistic indexing (generic method). ALIP: highly interdisciplinary (AI, statistics, mining, imaging, applied math, domain knowledge, ...).

Automatic Modeling and Learning of Concepts for Image Indexing. Observations: human beings are able to build models of objects or concepts by mining visual scenes; the learned models are stored in the brain and used in the recognition process. Hypothesis: it is achievable for computers to mine and learn a large collection of concepts by 2D or 3D image-based training. 2000-now: [ACM MM, 2002] [PAMI, 2003(10)].

Concepts to be Trained. Concepts: basic building blocks in determining the semantic meanings of images. Training concepts can be categorized, roughly from low-level to high-level, as: basic object (flower, beach); object composition (building+grass+sky+tree); location (Asia, Venice); time (night sky, winter frost); abstract (sports, sadness).

Database: most significant Asian paintings (Copyright: museums/governments/…) Question: can we build a “dictionary” of different painting styles?

Database: terracotta soldiers of the First Emperor of China Question: can we train a computer to annotate these? EMPEROR images of C.-c. Chen

System Design. Train statistical models for a dictionary of concepts using sets of training images (2D images are currently used; 3D-image training could be much better). Compare images based on model comparison. Select the most statistically significant concept(s) to index images linguistically (see the sketch below). Initial experiment: 600 concepts, each trained with 40 images; 15 minutes of Pentium CPU time per concept, trained only once; highly parallelizable algorithm.
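A minimal sketch of the indexing step just described: every concept in the dictionary has a trained statistical model, a new image is scored against each model, and the best-matching concepts are kept. The `score` interface and the helper name are placeholders for illustration, not the actual ALIP code.

```python
def rank_concepts(image_features, concept_models, top_k=5):
    """concept_models: dict mapping concept name -> trained model exposing a
    .score(features) method that returns a log-likelihood."""
    scores = {name: model.score(image_features)
              for name, model in concept_models.items()}
    # keep the top_k concepts whose models explain the image best
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```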

Training Process

Automatic Annotation Process

Training Training images used to train the concept “male” with description “man, male, people, cloth, face”

Initial Model: 2-D Wavelet MHMM [Li+, 1999]. Model: inter-scale and intra-scale dependence. States: hierarchical Markov mesh, unobservable. Features, as in SIMPLIcity: multivariate Gaussian distributed given the states. A model is a knowledge base for a concept.

2D MHMM: start from the conventional 1-D HMM; extend to 2D transitions with conditionally Gaussian distributed feature vectors; then add Markovian statistical dependence across resolutions; use the EM algorithm to estimate parameters. A sketch of the 1-D starting point follows below.
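The 1-D starting point mentioned above, a conventional hidden Markov model with Gaussian emissions fitted by EM (Baum-Welch), sketched on toy data with the third-party hmmlearn library (an assumption); the actual system uses a 2-D multiresolution extension of this idea.

```python
import numpy as np
from hmmlearn import hmm   # third-party library, assumed available

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))        # toy sequence of 4-D block feature vectors

model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=50)
model.fit(X)                         # EM (Baum-Welch) parameter estimation
states = model.predict(X)            # most likely hidden-state sequence
loglik = model.score(X)              # log-likelihood of the data under the model
```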

Annotation Process. Statistical significances are computed to annotate images, favoring the selection of rare words; a simpler approximation applies when n, m >> k (one possible formulation is sketched below).
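One way to make the "favor rare words" criterion concrete: score each candidate word by how unlikely its appearance among the top-k matched categories would be by chance, so that words occurring in few categories become highly significant. The hypergeometric/binomial formulation below is an illustration consistent with the slide's n, m >> k remark, not necessarily ALIP's exact formula.

```python
from scipy.stats import hypergeom, binom

def word_significance(n, m, k, j, approximate=False):
    """n: total categories; m: categories containing the word;
    k: top-ranked categories for the image; j: how many of those contain
    the word. Returns P(at least j occurrences by chance); smaller means
    more significant, and rare words (small m) score better."""
    if approximate:                       # binomial approximation for n, m >> k
        return binom.sf(j - 1, k, m / n)
    return hypergeom.sf(j - 1, n, m, k)   # exact hypergeometric tail
```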

Preliminary Results. Computer predictions for sample images: "people, Europe, man-made, water"; "building, sky, lake, landscape, Europe, tree"; "people, Europe, female"; "food, indoor, cuisine, dessert"; "snow, animal, wildlife, sky, cloth, ice, people".

More Results

Results: using our own photographs. P: photographer annotation. Underlined words: words predicted by the computer. (Parentheses): words not in the computer's learned "dictionary".

Systematic Evaluation. 10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.

600-class Classification. Task: classify a given image into one of 600 semantic classes. Gold standard: the photographer/publisher classification. This procedure provides lower bounds of the accuracy measures because: there can be overlaps of semantics among classes (e.g., "Europe" vs. "France" vs. "Paris", or "tigers I" vs. "tigers II"); training images in the same class may not be visually similar (e.g., the class "sport events" includes different sports and different shooting angles). Result: on 11,200 test images, 15% of the time ALIP selected the exact class as the best choice; since random guessing would be correct only about 1/600 ≈ 0.17% of the time, ALIP is about 90 times better than a random-drawing system.

Can ALIP learn to be a domain specialist? For each concept, train the ALIP system with a few EMPEROR images and their metadata. Use the trained models of these concepts to annotate images.

Experiments Eight (8) images are used to train the concept “Terracotta Warriors and Horses” Four (4) images are used to train the concept “Roof Tile-end”

Classification Tests. Terracotta warriors and horses: 98%; The Great Wall: 98%; Roof Tile-end: 82%; Terracotta warriors – head: 96%; Afang Palace: 82%.

The only wrongly annotated images and their true annotations

Preliminary Results on Painting Images

Classification of Painters Five painters: SHEN Zhou (Ming Dynasty), DONG Qichang (Ming), GAO Fenghan (Qing), WU Changshuo (late Qing), ZHANG Daqian (modern China)

Outline Introduction Our related SIMPLIcity work ALIP: Automatic modeling and learning of concepts Conclusions and future work

Conclusions. Automatic linguistic indexing of pictures is highly challenging but crucially important; interdisciplinary collaboration is critical. Approaches: Penn State, UC Berkeley, ... Our ALIP system: automatic modeling and learning of semantic concepts; 600 concepts can be learned automatically; application of ALIP to historical materials.

Advantages of Our Approach. Accumulative learning. Highly scalable (unlike CART, SVM, ANN). Flexible: the amount of training depends on the complexity of the concept. Context-dependent: spatial relations among pixels are taken into consideration. Universal image similarity: statistical likelihood rather than reliance on segmentation.

Limitations: training with 2D images, which carry no information about actual object sizes; training with too few images for complex concepts; low diversity of available training images. Iterative training with negative examples? Training with conflicting opinions (learning the same subject from two teachers with different viewpoints)? ...

Future Work. Explore new methods for better accuracy: refine statistical modeling of images; learn from 3D (possibly using 2D-to-3D technologies if data cannot be acquired directly in 3D); refine matching schemes. Apply these methods to special image databases (e.g., art, biomedicine) and very large databases (e.g., Web images). Integration with large-scale information systems. ...

Acknowledgments NSF ITR (since 08/2002) Endowed professorship from the PNC Foundation Equipment grants from SUN and NSF Penn State Univ. Earlier funding ( ): IBM QBIC, NEC AMORA, SRI AI, Stanford Lib/Math/Biomedical Informatics/CS, Lockheed Martin, NSF DL2

More Information. Papers in PDF, demos, etc. Monographs: 1. J. Z. Wang, Integrated Region-based Image Retrieval, Kluwer. 2. J. Li, R. M. Gray, Image Segmentation and Compression using Hidden Markov Models, Kluwer, 2000.