1
Image Annotation and Feature Extraction
Digital Forensics: Image Annotation and Feature Extraction Latifur Khan, November 2007
2
Outline
How do we retrieve images?
Motivation
Annotation
Enhancement
Correspondence: Models
Enhancement
Future Work
Results
References
3
How do we retrieve images?
Use Google image search! Google relies on filenames and surrounding text, and ignores the contents of the images.
4
Motivation How to retrieve images/videos?
CBIR is based on similarity search over visual features. It does not support textual queries and does not capture "semantics." Instead, automatically annotate images and then retrieve them based on the textual annotations. Example annotations: tiger, grass.
5
Motivation
There is a gap between the perceptual level and the conceptual level. Semantic gap: it is hard to represent semantic meaning using low-level image features such as color, texture, and shape. For example, a query for "red ball" may be answered with a "red rose." (Figure: a query by CBIR and the retrieved image.)
6
Motivation Most current automatic image annotation and retrieval approaches consider: keywords; low-level image features for each visual token/region/object; and the correspondence between keywords and visual tokens. Our goal is to develop automated image annotation techniques with better accuracy.
7
Annotation
8
Annotation Major steps (a high-level sketch follows this list):
1. Segment images into regions.
2. Cluster regions to construct blob-tokens.
3. Analyze the correspondence between keywords and blob-tokens.
4. Automatically annotate new images.
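A minimal sketch, assuming the 30-dimensional region descriptors are already extracted and a blob-to-word probability table is already learned; the function and variable names here are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_blob_tokens(region_features, n_blobs=500):
    """region_features: (num_regions, 30) array of descriptors pooled from all
    training images. Each K-means centroid becomes one blob-token."""
    return KMeans(n_clusters=n_blobs, random_state=0).fit(region_features)

def annotate(image_region_features, kmeans, p_word_given_blob, top_k=5):
    """Map a new image's region descriptors to blob-tokens, then rank keywords
    by summing P(word | blob) over the image's blobs."""
    blobs = kmeans.predict(image_region_features)
    scores = p_word_given_blob[blobs].sum(axis=0)
    return np.argsort(scores)[::-1][:top_k]        # ids of the top-k keywords
```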
9
Annotation: Segmentation & Clustering
(Figure: images → segments → blob-tokens.)
10
Annotation: Correspondence/Linking
Our purpose is to find the correspondence between words and blob-tokens, e.g., P(Tiger | V1), P(V2 | grass), …
11
Auto Annotation (Figure: a new image is automatically labeled with keywords such as lion, grass, tiger.)
12
Segmentation: Image Vocabulary
Can we represent all images with a finite set of symbols? Text documents consist of words; images consist of visual terms. (Figure: image regions labeled with visual terms V123, V89, V988, …; copyright © R. Manmatha.)
13
Construction of Visual Terms
Segment images (e.g., with Blobworld or the normalized-cuts algorithm), then cluster the segments; each cluster is a visual term/blob-token. (Figure: images → segments → visterms/blob-tokens V1, V2, V3, ….)
14
Discrete visual terms A rectangular partition works better!
Partition each keyframe and cluster the tiles across images; to some extent, this avoids the segmentation problem. copyright © R. Manmatha
15
Visual terms Alternatively, partition using a rectangular grid and cluster the tiles. In practice this works better, as sketched below.
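A minimal sketch of the rectangular-grid alternative, assuming the image is a NumPy array; the 4x6 grid size is an arbitrary choice for illustration.

```python
import numpy as np

def grid_partition(image, rows=4, cols=6):
    """Split an image (H x W x 3 array) into a fixed rectangular grid of tiles.
    Unlike per-image segmentation, the same partition is applied to every image,
    and the tiles from all images are clustered together afterwards."""
    h, w = image.shape[:2]
    tiles = []
    for i in range(rows):
        for j in range(cols):
            tile = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols]
            tiles.append(tile)
    return tiles
```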
16
Grid vs. Segmentation
Results: a rectangular partition works better than segmentation! The grid model is learned over many images, whereas segmentation operates on one image at a time.
17
Feature Extraction & Clustering
Features: color, texture, and shape. K-means clustering generates a finite set of visual terms; each cluster's centroid represents one visual term.
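A minimal sketch of this clustering step; the random array stands in for the real 30-dimensional region descriptors, and 500 clusters matches the blob vocabulary reported in the results.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the (num_regions, 30) matrix of color/texture/shape descriptors.
region_features = np.random.rand(10000, 30)

kmeans = KMeans(n_clusters=500, random_state=0).fit(region_features)

# Each centroid is one visual term (blob-token); each region is mapped
# to the id of its nearest centroid.
blob_tokens = kmeans.cluster_centers_      # (500, 30)
region_to_blob = kmeans.labels_            # blob-token id for every region
```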
18
Co-Occurrence Models Mori et al. 1999
Create a co-occurrence table from a training set of annotated images, e.g. (partial counts):

      w1   w2   w3   w4
V1    12    2    1
V2    32   40   13
V3
V4    65   43

P(w1 | v1) = 12 / (12 + 2 + 1) = 0.8; P(v3 | w2) = 12 / (sum of the w2 column) = 0.12.
Limitations: tends to annotate with high-frequency words; context is ignored; joint probability models are needed.
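A minimal sketch of the co-occurrence approach in the style of Mori et al. 1999: count how often each blob-token co-occurs with each word, then normalize rows or columns to obtain the conditional probabilities above. The toy data and variable names are illustrative only.

```python
import numpy as np

# Toy training set: each entry is (blob-token ids in the image, word ids in its annotation).
training_images = [
    ([0, 0, 1], [0, 1]),   # e.g., two "v0" regions and one "v1" region, annotated w0, w1
    ([1, 2],    [2, 3]),
    ([0, 2],    [1, 3]),
]
n_blobs, n_words = 3, 4

counts = np.zeros((n_blobs, n_words))
for blobs, words in training_images:
    for v in blobs:
        for w in words:
            counts[v, w] += 1

# P(w | v): normalize each blob row; P(v | w): normalize each word column.
p_w_given_v = counts / counts.sum(axis=1, keepdims=True)
p_v_given_w = counts / counts.sum(axis=0, keepdims=True)
```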
19
Correspondence: Translation Model (TM)
Pr(f | e) = Σ_a Pr(f, a | e); analogously, Pr(w | v) = Σ_a Pr(w, a | v).
20
Translation Models Duygulu et al. 2002
Use classical IBM machine translation models to translate visterms into words. The IBM models need a bilingual corpus to train on, e.g., "Mary did not slap the green witch" ↔ "Mary no daba una bofetada a la bruja verde." Analogously, each annotated image is a bilingual pair: visterms {V2, V4, V6} ↔ words {Maui, people, dance}; {V1, V34, V321, V21} ↔ {tiger, grass, sky}; and so on. A minimal training sketch follows.
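A minimal sketch of an IBM Model 1 style EM loop adapted to (blob, word) pairs, in the spirit of Duygulu et al. 2002; this is an illustrative simplification, not the authors' exact training procedure.

```python
import numpy as np

def train_translation_model(corpus, n_blobs, n_words, iterations=10):
    """corpus: list of (blob ids, word ids) pairs, one per annotated training image.
    Returns t, where t[v, w] approximates Pr(word w | blob v)."""
    t = np.full((n_blobs, n_words), 1.0 / n_words)       # uniform start
    for _ in range(iterations):
        counts = np.zeros_like(t)
        for blobs, words in corpus:
            for w in words:
                denom = sum(t[v, w] for v in blobs)      # E-step: how strongly each blob
                for v in blobs:                          # in the image "explains" word w
                    counts[v, w] += t[v, w] / denom
        row_sums = counts.sum(axis=1, keepdims=True)
        t = counts / np.maximum(row_sums, 1e-12)         # M-step: renormalize per blob
    return t
```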
21
Correspondence (TM) (Figure: matrix equation B × X = W, with B the image-by-blob matrix and W the image-by-word matrix over the N training images.)
22
Correspondence (TM) (Figure: matrices B and W, with columns Bj and Wi, over the N training images.)
23
Results Dataset Corel Stock Photo CDs.
600 CDs, each containing 100 images on the same topic. We select 5,000 images (4,500 for training, 500 for testing). Each image has a manual annotation; there are 374 words and 500 blobs. Example annotations: "sun, city, sky, mountain"; "grizzly, bear, meadow, water."
24
Results Experimental Context 3,000 training objects
300 images for testing. Each object is represented by a 30-dimensional vector of color, texture, and shape features.
25
Results Each Image Object/Blob-token has 30 features:
Size -- portion of the image covered by the region. Position -- coordinates of the region's center of mass, normalized by the image dimensions. Color -- average and standard deviation of (R, G, B) and (L, a, b) over the region. Texture -- average and variance of 16 filter responses: four differences of Gaussian filters with different sigmas and twelve oriented filters aligned in 30-degree increments. Shape -- six features: area, x, y, boundary, convexity, and moment of inertia.
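A minimal sketch of computing a few of these descriptors for one region, assuming the region is given as a boolean mask over an RGB image; the texture responses and the six shape features would be appended in the same way but are omitted here for brevity.

```python
import numpy as np

def region_descriptor(image, mask):
    """image: (H, W, 3) float RGB array; mask: (H, W) boolean region mask.
    Returns a partial descriptor: size, position, and color statistics."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)

    size = mask.sum() / (h * w)                      # portion of image covered
    position = (ys.mean() / h, xs.mean() / w)        # normalized center of mass

    pixels = image[mask]                             # (n_pixels, 3) RGB values in the region
    color_mean = pixels.mean(axis=0)
    color_std = pixels.std(axis=0)

    return np.concatenate(([size], position, color_mean, color_std))
```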
26
Results Examples for automatic annotation
27
Results The number of segments annotated correctly among 299 testing segments for different models
28
Results Correspondence based on K-means --- PTK.
Correspondence based on weighted feature selection --- PTS. With GDR, the dimensionality of each image object is reduced (say, from 30 to 20) before applying K-means, and so on; a generic sketch follows.
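The slides do not spell out the GDR procedure, so this sketch uses PCA as a generic stand-in for reducing the 30-dimensional object vectors to 20 dimensions before clustering; the random matrix is placeholder data sized to the 3,000 training objects mentioned earlier.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Placeholder for the (num_objects, 30) matrix of image-object descriptors.
object_features = np.random.rand(3000, 30)

# Reduce from 30 to 20 dimensions (PCA here is only a stand-in for GDR),
# then cluster the reduced vectors into blob-tokens as before.
reduced = PCA(n_components=20).fit_transform(object_features)
kmeans = KMeans(n_clusters=500, random_state=0).fit(reduced)
```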
29
Results Precision, recall, and the common E measure.
NumCorrect: the number of retrieved images whose original annotation contains the query keyword.
NumRetrieved: the number of retrieved images.
NumExist: the total number of images in the test set whose annotation contains the query keyword.
Precision p = NumCorrect / NumRetrieved; recall r = NumCorrect / NumExist; E = 1 - 2 / (1/p + 1/r).
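A minimal sketch of these retrieval metrics as defined above; the example counts are made up for illustration.

```python
def retrieval_metrics(num_correct, num_retrieved, num_exist):
    """Precision, recall, and the common E measure for one query keyword.
    Assumes all three counts are nonzero."""
    p = num_correct / num_retrieved       # precision
    r = num_correct / num_exist           # recall
    e = 1 - 2 / (1 / p + 1 / r)           # E = 1 - harmonic mean of p and r
    return p, r, e

# Example: 30 of 50 retrieved images are correct, and 60 relevant images exist.
print(retrieval_metrics(30, 50, 60))      # precision 0.6, recall 0.5, E ≈ 0.4545
```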
30
Results: Precision, Recall and E-measure
Precision of retrieval for different models
31
Results: Precision, Recall and E-measure
Recall of retrieval for different models
32
Results: Precision, Recall and E-measure
E Measure of retrieval for different models