Jinhui Tang†, Shuicheng Yan†, Richang Hong†, Guo-Jun Qi‡, Tat-Seng Chua†
† National University of Singapore
‡ University of Illinois at Urbana-Champaign
Outline:
- Motivation
- Sparse-Graph based Semi-supervised Learning
- Handling of Noisy Tags
- Inferring Concepts in the Semantic Concept Space
- Experiments
- Summary and Future Work
No manual annotation is required.
- With models: SVM, GMM, …
- Infer labels directly: k-NN, graph-based semi-supervised methods
Common disadvantages:
- These methods have parameters that require manual tuning, and performance is sensitive to that tuning.
- The graphs are constructed based on visual distance, which produces many links between samples of unrelated concepts, so label information is propagated incorrectly.
- Locally linear reconstruction still needs to select neighbors based on visual distance.
- Sparse Graph based Learning
- Noisy Tag Handling
- Inferring Concepts in the Concept Space
The human vision system seeks a sparse representation of an incoming image, using a few visual words from a feature vocabulary (neural science). Advantages:
- Reduces concept-unrelated links, avoiding the propagation of incorrect information;
- Practical for large-scale applications: the sparse representation reduces storage requirements and is feasible for large-scale numerical computation.
(Figures: normal graph construction vs. sparse graph construction.)
The ℓ1-norm based linear reconstruction error minimization naturally leads to a sparse representation for the images [Wright et al., TPAMI 2009]. The sparse reconstruction is obtained by solving the convex optimization problem:

    min_w ||w||_1,  s.t.  x = Dw

where
- w ∈ R^n: the vector of reconstruction coefficients;
- x ∈ R^d: the feature vector of the image to be reconstructed;
- D ∈ R^(d×n) (d < n): a matrix formed by the feature vectors of the other images in the dataset.

Reference: J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.
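As a minimal sketch (not the paper's implementation), the ℓ1 problem above can be recast as a linear program by splitting w into its positive and negative parts; the function name `sparse_reconstruction` is ours:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_reconstruction(D, x):
    """Solve min ||w||_1 s.t. D w = x by linear programming.

    Split w = u - v with u, v >= 0, so that ||w||_1 = 1^T (u + v)
    and the equality constraint becomes D u - D v = x.
    """
    d, n = D.shape
    c = np.ones(2 * n)                    # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])             # D u - D v = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v                          # recover the signed coefficients
```

In practice a dedicated ℓ1 solver (e.g., a homotopy or proximal method) would scale better than a generic LP solver.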
Handling noise on certain elements of x: reformulate x = Dw + ξ, where ξ ∈ R^d is the noise term. Then solve

    min_{w'} ||w'||_1,  s.t.  x = B w',  with B = [D, I] ∈ R^(d×(n+d)) and w' = [w; ξ].

Set the edge weights of the sparse graph from the reconstruction coefficients: the weight of the edge from sample i to sample j is the absolute coefficient of sample j in the sparse reconstruction of sample i.
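A sketch of the resulting graph construction, reusing numpy and `sparse_reconstruction` from the sketch above, and assuming the common ℓ1-graph convention that absolute reconstruction coefficients serve as edge weights:

```python
def build_sparse_graph(X):
    """Build a sparse graph over samples X (one row per feature vector).

    Each sample is reconstructed from all the others, with an identity
    block appended to the dictionary to absorb noise; the absolute
    coefficients become the weights of the outgoing edges.
    """
    m, d = X.shape
    W = np.zeros((m, m))
    for i in range(m):
        others = np.delete(X, i, axis=0).T      # d x (m-1) dictionary
        B = np.hstack([others, np.eye(d)])      # append noise terms: [D, I]
        w = sparse_reconstruction(B, X[i])      # l1 reconstruction
        W[i, np.arange(m) != i] = np.abs(w[:m - 1])   # drop noise coeffs
    return W
```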
The problem with solving the propagation directly: M_uu is typically very large for image annotation, so computing its inverse directly is computationally prohibitive. An iterative solution with non-negativity constraints may not be reasonable either, since some samples can have negative contributions to other samples. Solution: reformulate the inference as a large-scale sparse system of linear equations; the generalized minimum residual method (GMRES) can then solve this system effectively and efficiently.
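A minimal sketch of this step, assuming the standard harmonic-function formulation M_uu f_u = W_ul y_l with M = D − W the graph Laplacian (the paper's exact system may differ):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

def propagate_labels(W, y_l, labeled):
    """Infer scores for unlabeled nodes by solving M_uu f_u = W_ul y_l
    with GMRES instead of explicitly inverting M_uu."""
    W = sp.csr_matrix(W)
    m = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(m), labeled)
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    M = (D - W).tocsr()                         # graph Laplacian
    M_uu = M[unlabeled][:, unlabeled]
    b = W[unlabeled][:, labeled] @ y_l
    f_u, info = gmres(M_uu, b)                  # iterative Krylov solver
    return f_u
```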
(Legend for the tag examples: √ correct; ? ambiguous; m missing.)
We cannot assume that the training tags stay fixed during the inference process; the noisy training tags should themselves be refined during label inference. Solution: add two regularization terms to the inference framework to handle the noise (see the paper for the exact objective).
Alternating solution: set the original label vector y as the initial estimate of the ideal label vector, then solve for a refined f_l. Fix f_l and solve for the refined label vector. Use the refined label vector in place of y in the graph-based method above, and solve the sparse system of linear equations to infer the labels of the unlabeled samples. (A sketch of one such objective is given below.)
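As an illustrative sketch only (not the paper's exact objective), a regularized framework of this shape, where ỹ denotes the refined label vector and λ1, λ2 are placeholder trade-off weights of ours, would be:

$$\min_{f,\,\tilde{y}} \; f^{\top} M f \;+\; \lambda_1 \lVert f_l - \tilde{y} \rVert^2 \;+\; \lambda_2 \lVert \tilde{y} - y \rVert^2$$

Fixing ỹ and minimizing over f refines f_l; fixing f_l and minimizing over ỹ refines the labels; alternating the two steps gives the procedure above.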
It is well known that inferring concepts based on low-level visual features does not work well, due to the semantic gap. To bridge this semantic gap: construct a concept space and then infer the semantic concepts in this space. The semantic relations among different concepts are inherently embedded in this space, which helps the concept inference.
- Low semantic gap: concepts in the constructed space should have small semantic gaps;
- Informative: these concepts should cover the semantic space spanned by all useful concepts (tags);
- Compact: the set of all concepts forming the space should be compact (i.e., the dimension of the concept space is small).
Basic terms:
- Ω: the set of all concepts;
- Θ: the constructed concept set.
Three measures:
- Semantic modelability: SM(Θ)
- Coverage of the semantic concept space: CE(Θ, Ω)
- Compactness: CP(Θ) = 1/#(Θ)
Objective: maximize a combination of SM(Θ), CE(Θ, Ω), and CP(Θ).
Simplification: fix the size of the concept space. The maximization can then be transformed into a standard quadratic programming problem; see the paper for details. (A relaxed toy version is sketched below.)
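A toy sketch of a fixed-size concept selection posed as a relaxed quadratic program; here A (pairwise concept relatedness) and b (per-concept scores) are placeholders of ours, not the paper's actual measures:

```python
import numpy as np
from scipy.optimize import minimize

def select_concepts(A, b, k):
    """Relaxed QP: choose k concepts maximizing s^T A s + b^T s
    over s in [0, 1]^n with sum(s) = k, then round to the top k."""
    n = len(b)
    obj = lambda s: -(s @ A @ s + b @ s)          # maximize => minimize negative
    cons = [{"type": "eq", "fun": lambda s: s.sum() - k}]
    res = minimize(obj, np.full(n, k / n), bounds=[(0, 1)] * n,
                   constraints=cons, method="SLSQP")
    return np.argsort(res.x)[-k:]                 # indices of selected concepts
```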
Image mapping: x_i → D(i). Query concept mapping: c_x → Q(c_x). Ranking the given images: score each image by the similarity between D(i) and Q(c_x) in the concept space.
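Assuming cosine similarity as the concept-space matching function (one natural choice; the paper's exact ranking function may differ), a sketch:

```python
import numpy as np

def rank_images(D_imgs, q):
    """Rank images by cosine similarity between their concept-space
    representations (rows of D_imgs) and the query concept vector q."""
    D_imgs = np.asarray(D_imgs, dtype=float)
    q = np.asarray(q, dtype=float)
    sims = D_imgs @ q / (np.linalg.norm(D_imgs, axis=1)
                         * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)              # most relevant images first
```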
Dataset: NUS-WIDE Lite version (55,615 images).
Low-level features: color histogram (CH) and edge direction histogram (EDH), concatenated directly.
Evaluation: 81 concepts; AP and MAP.
Ex1: Comparisons among Different Learning Methods
Ex2: Concept Inference with and without Concept Space
Ex3: Inference with Tags vs. Inference with Ground Truth. The MAP achieved by inference from tags in the concept space is comparable to the MAP obtained by inference from the ground truth of the training labels.
We explored the problem of inferring semantic concepts from community-contributed images and their associated noisy tags. Three key points:
- Sparse-graph-based label propagation
- Noisy tag handling
- Inference in a low-semantic-gap concept space
Future work: training set construction from web resources.
Thanks! Questions?