IIIT Hyderabad Synthesizing Classifiers for Novel Settings. Viresh Ranjan, CVIT, IIIT-H. Adviser: Prof. C. V. Jawahar, IIIT-H. Co-Adviser: Dr. Gaurav Harit, IIT Jodhpur. 1
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
4. Handling large number of categories
IIIT Hyderabad Introduction: Visual Recognition & Retrieval – Object Recognition. Pipeline: Image → Feature Extraction → Classifier → Image labels (“Car” / “Not Car”).
IIIT Hyderabad Introduction: Visual Recognition & Retrieval – Word image retrieval. Pipeline: Image → Feature Extraction → Classifier → Image labels (“room” / “Not room”).
IIIT Hyderabad Introduction: Visual Recognition & Retrieval – Handwritten digit classification. Pipeline: Image → Feature Extraction → Classifier → Image labels (“2” / “Not 2”).
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
4. Handling large number of categories
IIIT Hyderabad Introduction: Challenges in Visual Recognition & Retrieval – Dataset Shift. Example: Dataset Shift in object recognition, source (training set) images vs. target (test set) images.
IIIT Hyderabad Introduction: Challenges in Visual Recognition & Retrieval – Dataset Shift. Example: Dataset Shift in digit classification, printed digits as source (training set) vs. handwritten digits as target (test set).
IIIT Hyderabad Introduction: Challenges in Visual Recognition & Retrieval – Dataset Shift. Example: Dataset Shift in word image retrieval, source (training set) vs. target (test set) word images.
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
4. Handling large number of categories
IIIT Hyderabad Introduction: Challenges in Visual Recognition & Retrieval – Dataset Shift, and too many categories. Around 200K word categories in the English language.
IIIT Hyderabad Introduction: Tackling the challenges. Dataset Shift – i) Domain Adaptation, ii) Kernelized feature extraction. Too many categories – Transfer Learning.
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
   a) Handling Dataset Shift in object recognition by Domain Adaptation
   b) Handling Dataset Shift in digit classification by Domain Adaptation
   c) Handling Dataset Shift in word image retrieval by Kernelized Feature Extraction
4. Handling large number of categories
IIIT Hyderabad 3. a. Handling Dataset Shift in object recognition by Domain Adaptation
IIIT Hyderabad Problem Statement 16 Given: labeled source domain, unlabeled target domain. Goal: classify target domain images.
IIIT Hyderabad Overview of Domain Adaptation 17 Figure: (a) target classification using the source classifier, (b) target classification using DA; labeled source domain images and unlabeled target domain images.
IIIT Hyderabad Proposed Approach 18 Decompose the features of the source and target domains into domain specific features and domain independent features.
IIIT Hyderabad Proposed Approach 19 Discard the domain specific (source specific and target specific) features; keep the domain independent features.
IIIT Hyderabad Proposed Approach 20 Train classifiers using the domain independent features: train on the source domain, test on the target domain.
IIIT Hyderabad Learning Domain Specific & Domain Independent features 21 Sparse representation: an image $y$ is coded over a dictionary $D$ with a sparse coefficient vector $\alpha$, i.e. $y \approx D\alpha$ with only a few non-zero entries in $\alpha$. However, this sparse representation by itself cannot separate domain specific and domain independent features. How do we separate them?
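As a reference for the sparse coding step above, here is a minimal sketch using scikit-learn's lasso solver; the dictionary size, feature dimension, and sparsity weight are illustrative placeholders, not the settings used in the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(y, D, lam=0.1):
    """Sparse-code a feature vector y over dictionary D (lasso, i.e. l1-regularised least squares)."""
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    lasso.fit(D, y)            # columns of D are atoms, y is the image feature vector
    return lasso.coef_         # sparse coefficient vector alpha

# toy usage: an 800-d BoW feature coded over a 256-atom dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((800, 256))
y = rng.standard_normal(800)
alpha = sparse_code(y, D)
print(np.count_nonzero(alpha), "non-zero coefficients")
```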
IIIT Hyderabad Learning Domain Specific & Domain Independent features 22 Key idea: the dictionary contains domain specific and shared atoms. A source image is coded over the source specific atoms and the shared atoms; a target image is coded over the target specific atoms and the shared atoms. Each image therefore gets coefficients for its domain specific atoms and coefficients for the shared atoms.
IIIT Hyderabad Learning Domain Specific & Domain Independent features 23 The source and target dictionaries share a block of atoms: $D_s = [\,\hat{D}_s \;\; D_{sh}\,]$ (1) and $D_t = [\,\hat{D}_t \;\; D_{sh}\,]$ (2), where $\hat{D}_s$ and $\hat{D}_t$ hold the source specific and target specific atoms and $D_{sh}$ holds the shared atoms.
IIIT Hyderabad 24 Learning Cross Domain Classifiers. Sparse representation: source images yield source specific coefficients plus coefficients for the shared atoms; target images yield target specific coefficients plus coefficients for the shared atoms.
IIIT Hyderabad 25 Learning Cross Domain Classifiers. Discard the domain specific coefficients; train classifiers using the coefficients for the shared atoms.
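A minimal sketch of this classification step under the layout assumed here: each sparse code is a concatenation of [domain specific | shared] coefficients, the domain specific part is dropped, and a linear SVM is trained on the shared part. The atom counts and the choice of LinearSVC are illustrative, not the exact experimental settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

n_spec, n_shared = 128, 128        # [domain-specific | shared] atom counts (illustrative)

def shared_part(codes):
    """Keep only the coefficients of the shared atoms (last n_shared entries of each code)."""
    return codes[:, n_spec:]

# A_src: sparse codes of labeled source images over [source-specific | shared] atoms
# A_tgt: sparse codes of unlabeled target images over [target-specific | shared] atoms
rng = np.random.default_rng(0)
A_src = rng.standard_normal((200, n_spec + n_shared))
y_src = rng.integers(0, 10, 200)
A_tgt = rng.standard_normal((50, n_spec + n_shared))

clf = LinearSVC().fit(shared_part(A_src), y_src)   # train on shared coefficients only
pred_tgt = clf.predict(shared_part(A_tgt))         # classify target-domain images
```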
IIIT Hyderabad 26 Learning Domain Specific & Domain Independent features. The dictionaries are learnt by jointly minimizing the source reconstruction error and the target reconstruction error, $\min_{D_s, D_t, A_s, A_t} \|Y_s - D_s A_s\|_F^2 + \|Y_t - D_t A_t\|_F^2$ (3), subject to sparsity constraints on the coefficient matrices (4), (5). Here $Y_s$ contains the source images, $Y_t$ contains the target images, and $D_s$ and $D_t$ are the source and target dictionaries.
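A minimal sketch of one way such a partially shared dictionary could be learnt, alternating OMP-based sparse coding with least-squares dictionary updates in which the shared block is updated from both domains jointly. This is an illustrative stand-in for the actual PSDL optimisation, with made-up sizes and a simple update rule.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def normalize(D):
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)

def psdl_sketch(Ys, Yt, n_spec=64, n_shared=64, sparsity=10, n_iter=10, seed=0):
    """Illustrative partially shared dictionary learning.
    Ys, Yt: (dim, n_src) and (dim, n_tgt) matrices of source/target features (one column per image)."""
    dim = Ys.shape[0]
    rng = np.random.default_rng(seed)
    Ds_spec = normalize(rng.standard_normal((dim, n_spec)))   # source-specific atoms
    Dt_spec = normalize(rng.standard_normal((dim, n_spec)))   # target-specific atoms
    D_sh = normalize(rng.standard_normal((dim, n_shared)))    # shared atoms
    for _ in range(n_iter):
        Ds = np.hstack([Ds_spec, D_sh])                       # D_s = [Ds_spec, D_sh]
        Dt = np.hstack([Dt_spec, D_sh])                       # D_t = [Dt_spec, D_sh]
        # sparse coding step (OMP), one domain at a time
        As = orthogonal_mp(Ds, Ys, n_nonzero_coefs=sparsity)
        At = orthogonal_mp(Dt, Yt, n_nonzero_coefs=sparsity)
        # dictionary update: specific blocks from their own domain,
        # shared block from the residuals of both domains jointly
        Ds_spec = normalize((Ys - D_sh @ As[n_spec:]) @ np.linalg.pinv(As[:n_spec]))
        Dt_spec = normalize((Yt - D_sh @ At[n_spec:]) @ np.linalg.pinv(At[:n_spec]))
        R_sh = np.hstack([Ys - Ds_spec @ As[:n_spec], Yt - Dt_spec @ At[:n_spec]])
        A_sh = np.hstack([As[n_spec:], At[n_spec:]])
        D_sh = normalize(R_sh @ np.linalg.pinv(A_sh))
    return Ds_spec, Dt_spec, D_sh, As, At
```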
IIIT Hyderabad Experiments 27 Dataset: 10 object classes from Caltech-256 (C), Webcam (W), DSLR (D), Amazon (A). Feature representation: SURF features, BoW representation (800 visual words).
IIIT Hyderabad Results 28 Unsupervised setting (no target labels). Table: comparison of MOD src, MOD tgt, Gopalan et al., Gong et al., Ni et al., and PSDL (ours) across the domain pairs C→A, C→D, A→C, A→W, W→C, W→A, D→A, D→W.
IIIT Hyderabad Results 29 Figure: example query and retrieved images, comparing PSDL with the original features.
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
   a) Handling Dataset Shift in object recognition by Domain Adaptation
   b) Handling Dataset Shift in digit classification by Domain Adaptation
   c) Handling Dataset Shift in word image retrieval by Kernelized Feature Extraction
4. Handling large number of categories
IIIT Hyderabad 3. b. Handling Dataset Shift in digit classification by Domain Adaptation
IIIT Hyderabad Problem Statement 32 Given: labeled source domain, unlabeled target domain. Goal: classify target domain images.
IIIT Hyderabad Approach Overview 33 Figure: source data and target data are projected onto a source subspace and a target subspace, which are then brought into a common subspace.
IIIT Hyderabad Locality Preserving Subspace Alignment (LPSA) 34 Desired properties for the subspace: preserve the local geometry of the data; utilize label information. Locality Preserving Projections (LPP) [1]: preserves the local neighborhood and can utilize label information. [1] X. He and P. Niyogi, “Locality preserving projections,” in NIPS, 2003, pp. 234–241.
IIIT Hyderabad Locality Preserving Subspace Alignment (LPSA) 35 Locality Preserving Projection (LPP): find a projection $w$ that keeps neighboring points close, $\min_{w} \sum_{i,j} (w^{\top} x_i - w^{\top} x_j)^2 S_{ij}$ (6), where $S_{ij}$ is the similarity between neighboring samples $x_i$ and $x_j$.
IIIT Hyderabad Locality Preserving Subspace Alignment (LPSA) 36 Supervised Locality Preserving Projection (sLPP): the same objective (6) is used with a label-aware similarity, i.e. $S_{ij}$ is non-zero only when $x_i$ and $x_j$ share the same class label.
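A minimal sketch of LPP/sLPP as described above: build a neighborhood graph, optionally keep only within-class edges (the supervised variant assumed here), and solve the generalised eigenvalue problem of the graph Laplacian. The neighborhood size, dimensionality, and the small ridge term are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=10, n_neighbors=5, labels=None):
    """Locality Preserving Projections. X: (n_samples, dim) data matrix,
    labels: optional (n_samples,) integer array for the supervised variant.
    Returns a projection matrix W of shape (dim, n_components)."""
    S = kneighbors_graph(X, n_neighbors, mode="connectivity", include_self=False)
    S = 0.5 * (S + S.T).toarray()                        # symmetric adjacency
    if labels is not None:                               # sLPP: keep only within-class edges
        S *= (labels[:, None] == labels[None, :]).astype(float)
    Dg = np.diag(S.sum(axis=1))
    L = Dg - S                                           # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ Dg @ X + 1e-6 * np.eye(X.shape[1])         # small ridge for numerical stability
    vals, vecs = eigh(A, B)                              # generalised eigenproblem, ascending order
    return vecs[:, :n_components]                        # directions with smallest eigenvalues
```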
IIIT Hyderabad Locality Preserving Subspace Alignment (LPSA) 37
IIIT Hyderabad Locality Preserving Subspace Alignment (LPSA) 38 Approach: aligning the subspaces. With $X_s$ and $X_t$ the source and target subspace bases, the alignment matrix is $M^{*} = \arg\min_{M} \|X_s M - X_t\|_F^2 = X_s^{\top} X_t$ (7), and the source subspace is aligned to the target subspace as $X_s M^{*}$.
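A minimal sketch of the alignment and projection step, following the subspace alignment formulation of Fernando et al. (ICCV 2013) with subspaces such as those from the LPP sketch above; the variable names and dimensions are illustrative.

```python
import numpy as np

def align_and_project(Xs, Xt, Zs, Zt):
    """Xs, Xt: (dim, d) orthonormal source/target subspace bases (e.g. sLPP directions).
    Zs, Zt: (n_src, dim) and (n_tgt, dim) raw feature matrices."""
    M = Xs.T @ Xt              # alignment matrix M* = Xs^T Xt  (eq. 7)
    Zs_aligned = Zs @ Xs @ M   # project source data, then align it to the target subspace
    Zt_proj = Zt @ Xt          # project target data onto its own subspace
    return Zs_aligned, Zt_proj

# a source classifier trained on Zs_aligned (with source labels) is then applied to Zt_proj
```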
IIIT Hyderabad Approach: projection of the source and target data onto the aligned subspaces.
IIIT Hyderabad Datasets 40 Printed digits: rendered digits in 300 different fonts. Handwritten digits (HW): 300 images per digit sampled from MNIST, 3000 images.
IIIT Hyderabad Experimental Results 41
Source | Target | Method | Accuracy
Handwritten | Printed | No Adaptation | 48.8
Handwritten | Printed | PCA (source) | 55.9
Handwritten | Printed | PCA (target) | 56.5
Handwritten | Printed | PCA (combined) | 56.5
Handwritten | Printed | Fernando et al. [2] | 57.0
Handwritten | Printed | LPSA (ours) | 64.8
[2] B. Fernando et al., “Unsupervised visual domain adaptation using subspace alignment,” ICCV 2013.
IIIT Hyderabad Experimental Results 42
Source | Target | Method | Accuracy
Printed | Handwritten | No Adaptation | 70.0
Printed | Handwritten | PCA (source) | 68.1
Printed | Handwritten | PCA (target) | 68.9
Printed | Handwritten | PCA (combined) | 70.2
Printed | Handwritten | Fernando et al. [2] | 70.6
Printed | Handwritten | LPSA (ours) | 73.2
[2] B. Fernando et al., “Unsupervised visual domain adaptation using subspace alignment,” ICCV 2013.
IIIT Hyderabad Experimental Results 43 Figure: example test images with predictions, without adaptation vs. with DA using LPSA.
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
   a) Handling Dataset Shift in object recognition by Domain Adaptation
   b) Handling Dataset Shift in digit classification by Domain Adaptation
   c) Handling Dataset Shift in word image retrieval by Kernelized Feature Extraction
4. Handling large number of categories
IIIT Hyderabad 3. c. Handling Dataset Shift in word image retrieval by Kernelized Feature Extraction
IIIT Hyderabad Style-Content Factorization 46
IIIT Hyderabad Style-Content Factorization 47 Asymmetric Bilinear Model (Freeman et al. 2000): an image is modelled as the interaction of two factors.
IIIT Hyderabad Style-Content Factorization 48 Asymmetric Bilinear Model (Freeman et al. 2000): here the two factors are the style and the content of the image.
IIIT Hyderabad Style-Content Factorization 50 Asymmetric Bilinear Model (Freeman et al. 2000): an image is written as $y^{sc} = W^{s} c^{c}$ (8), where $W^{s}$ holds the style-dependent basis vectors and $c^{c}$ is the content vector. Notation: $s$ refers to the style (font), $c$ refers to the content.
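A minimal sketch of fitting such an asymmetric bilinear model from complete style × content training data by a rank-constrained SVD factorization, in the spirit of Freeman et al. (2000); the matrix layout and the least-squares content estimate are illustrative assumptions.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim, rank):
    """Fit y^{sc} ~ W^s c^c.  Y: (n_styles * dim, n_contents) matrix whose rows stack the
    observations style by style (rows s*dim .. (s+1)*dim hold the images of style s)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    W_stack = U[:, :rank] * s[:rank]            # stacked style matrices [W^1; ...; W^S]
    C = Vt[:rank]                               # content vectors, one column per content class
    W = W_stack.reshape(n_styles, dim, rank)    # W[s] is the basis matrix of style s
    return W, C

def content_of(y, W_s):
    """Estimate the content vector of a new image y rendered in a known style s."""
    c, *_ = np.linalg.lstsq(W_s, y, rcond=None)
    return c
```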
IIIT Hyderabad Style-Content Factorization 53 Problems with the Asymmetric Bilinear Model: – it needs separate learning for each new style (font); – it is too simplistic and overlooks nonlinear style-content interactions. To tackle these problems, we propose a kernelized version of the Asymmetric Bilinear Model.
IIIT Hyderabad Non-linear Style-Content Factorization 54 Asymmetric Kernel Bilinear Model (AKBM): images are mapped into a high-dimensional feature space through a nonlinear map $\phi(\cdot)$, with the associated kernel $k(y_i, y_j) = \phi(y_i)^{\top}\phi(y_j)$ (10), (11).
IIIT Hyderabad Non-linear Style-Content Factorization 55 Asymmetric Kernel Bilinear Model (AKBM): $\phi(y^{sc}) = W^{s} c^{c}$ (12), where $W^{s}$ is the style basis (now living in the feature space) and $c^{c}$ is the content vector.
IIIT Hyderabad Non-linear Style-Content Factorization 56 Learning the Asymmetric Kernel Bilinear Model (AKBM) parameters: minimize a data fitting term, $\sum_{s,c}\|\phi(y^{sc}) - W^{s} c^{c}\|_2^2$, plus a regularizer on the model parameters (13).
IIIT Hyderabad Non-linear Style-Content Factorization 57 Learning the AKBM parameters: the mapping function $\phi(\cdot)$ is not known explicitly; the kernel trick comes to the rescue. (13)
IIIT Hyderabad Non-linear Style-Content Factorization 58 Learning the AKBM parameters. Kernel trick: expressing the style basis in terms of the mapped training images, $W^{s} = \Phi(Y) A^{s}$, objective (13) can be rewritten entirely in terms of the kernel matrix $K = \Phi(Y)^{\top}\Phi(Y)$ (14).
IIIT Hyderabad Non-linear Style-Content Factorization 59 Learning the AKBM parameters. Objective (14) is non-convex jointly in the style and content parameters, but convex with respect to any one of them. We solve it by alternating: solve the convex problem in the style parameters keeping the content vectors constant, and vice versa.
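To make the alternating scheme concrete, here is a minimal sketch of one such kernelized alternating fit. It assumes the parameterisation $W^{s} = \Phi(Y) A^{s}$ from the previous slide, closed-form ridge-style updates, and made-up sizes; it is an illustrative stand-in, not the exact AKBM updates from the thesis.

```python
import numpy as np

def akbm_fit_sketch(K, style_of, content_of, n_styles, n_contents,
                    rank=20, lam=1e-2, n_iter=10, seed=0):
    """Illustrative alternating fit of a kernelized asymmetric bilinear model.
    K: (N, N) kernel matrix over the N training word images.
    style_of / content_of: integer arrays giving the font and word-class index of each image.
    Style bases are parameterised as W^s = Phi(Y) A^s, and we fit phi(y_i) ~ W^{s_i} c^{c_i}."""
    N = K.shape[0]
    rng = np.random.default_rng(seed)
    A = [0.01 * rng.standard_normal((N, rank)) for _ in range(n_styles)]  # per-style kernel coefficients
    C = 0.01 * rng.standard_normal((rank, n_contents))                    # content vectors
    for _ in range(n_iter):
        # content step: regularized least squares per word class, with the styles fixed
        for c in range(n_contents):
            idx = np.where(content_of == c)[0]
            if len(idx) == 0:
                continue
            G = sum(A[style_of[i]].T @ K @ A[style_of[i]] for i in idx)
            b = sum(A[style_of[i]].T @ K[:, i] for i in idx)
            C[:, c] = np.linalg.solve(G + lam * np.eye(rank), b)
        # style step: closed-form update of each font's kernel coefficients, with the contents fixed
        for s in range(n_styles):
            idx = np.where(style_of == s)[0]
            if len(idx) == 0:
                continue
            Cc = C[:, content_of[idx]]                       # (rank, n_s) content vectors used by style s
            E = np.zeros((N, len(idx)))
            E[idx, np.arange(len(idx))] = 1.0                # selector of this style's training images
            A[s] = E @ Cc.T @ np.linalg.inv(Cc @ Cc.T + lam * np.eye(rank))
    return A, C
```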
IIIT Hyderabad Non-linear Style-Content Factorization 60 Representing content using AKBM: for a novel query word image, in any style, the content vector is found by minimizing the reconstruction error of the query under the learnt model (15), (16); the recovered content vector is then used to compare word images across fonts.
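Once content vectors are available for the query and for the database word images, cross-font retrieval can be performed by comparing them; the sketch below assumes cosine similarity as the matching score, which is an illustrative choice rather than the exact measure used in the thesis.

```python
import numpy as np

def retrieve(c_query, content_db, top_k=5):
    """Rank database word images by cosine similarity of their content vectors.
    c_query: (rank,) content vector of the query; content_db: (n_images, rank) matrix."""
    cq = c_query / (np.linalg.norm(c_query) + 1e-12)
    db = content_db / (np.linalg.norm(content_db, axis=1, keepdims=True) + 1e-12)
    scores = db @ cq
    order = np.argsort(-scores)[:top_k]     # indices of the best-matching word images
    return order, scores[order]
```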
IIIT Hyderabad Datasets. Table: number of distinct words and number of word images for D1-D5 and Dlab. D1-D5 consist of word images from 5 different books, varying in font. Dlab is generated under laboratory settings and consists of 10 widely varying fonts.
IIIT Hyderabad Datasets Dlab
IIIT Hyderabad Experimental Results 63 Table: cross-font retrieval performance for the book pairs D1→D2, D1→D3, D1→D4, D2→D1, D2→D3, D2→D4, comparing No Transfer, ABM (Freeman et al.) and AKBM (ours). Asymmetric Kernel Bilinear Model (AKBM) refers to our kernelized style-content factorization.
IIIT Hyderabad Figure: qualitative cross-font retrieval, query and retrieved images, comparing No Transfer and AKBM.
IIIT Hyderabad Experimental Results 65 Retrieval results on Dlab.
IIIT Hyderabad Overview
1. Visual Recognition & Retrieval Tasks
2. Challenges in Visual Recognition & Retrieval
   a) Dataset Shift
   b) Large number of categories
3. Handling Dataset Shift
   a) Handling Dataset Shift in object recognition by Domain Adaptation
   b) Handling Dataset Shift in digit classification by Domain Adaptation
   c) Handling Dataset Shift in word image retrieval by Kernelized Feature Extraction
4. Handling large number of categories via Transfer Learning
IIIT Hyderabad 4. Handling large number of categories via Transfer Learning
IIIT Hyderabad Problem Statement 68 Design a scalable classifier-based document image retrieval system. Around 200K word categories exist in the English language.
IIIT Hyderabad Proposed Approach 69 The top few frequent words account for most of the coverage. A query word can therefore be a frequent query, corresponding to the frequent words (higher coverage), or a rare query, corresponding to the rare words (lower coverage).
IIIT Hyderabad Proposed Approach 70 Classifiers are trained for frequent queries and synthesized on-the-fly for rare queries. Rare queries consist of characters already present in one or more frequent queries, so to synthesize a classifier for a novel rare query, we cut and paste the relevant portions from existing frequent classifiers (see the sketch below).
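A minimal sketch of this cut-and-paste synthesis under a simplifying assumption: each frequent-word classifier is a linear weight vector laid out as a concatenation of fixed-width per-character blocks, so the block for any character can be lifted from a frequent word that contains it. The fixed block width and the crude bias estimate are illustrative; the actual formulation in the thesis may differ.

```python
import numpy as np

def synthesize_classifier(rare_word, frequent_classifiers, block_dim):
    """Synthesize a linear classifier for a rare word by cutting per-character weight blocks
    out of frequent-word classifiers and pasting them together.
    frequent_classifiers: dict mapping a frequent word to (weights, bias), where weights is a
    concatenation of block_dim-sized blocks, one block per character of that word."""
    blocks = []
    for ch in rare_word:
        donor = next((w for w in frequent_classifiers if ch in w), None)
        if donor is None:
            raise ValueError(f"no frequent word contains the character {ch!r}")
        w_donor, _ = frequent_classifiers[donor]
        pos = donor.index(ch)                                  # position of the character in the donor word
        blocks.append(w_donor[pos * block_dim:(pos + 1) * block_dim])
    w_new = np.concatenate(blocks)                             # weights for the rare word
    b_new = float(np.mean([b for _, b in frequent_classifiers.values()]))  # rough bias estimate
    return w_new, b_new

# toy usage: synthesize a classifier for "tin" from blocks of "ten" and "in"
block_dim = 4
clfs = {"ten": (np.arange(12, dtype=float), -1.0), "in": (np.arange(8, dtype=float), -0.5)}
w, b = synthesize_classifier("tin", clfs, block_dim)
```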
IIIT Hyderabad Proposed Approach 71 On-the-fly classifier synthesis
IIIT Hyderabad Proposed Approach 72 On-the-fly classifier synthesis
IIIT Hyderabad Datasets 73
Dataset | Source | Type | No. of Images
D1 | 1 book | Clean | 26,555
D2 | 2 books | Clean | 35,730
D3 | 1 book | Noisy | 4,373
IIIT Hyderabad Experimental Results 74 Table: per-dataset retrieval performance, listing source, type, number of images, number of queries, OCR mAP and LDA mAP for D1 (1 book, clean, 26,555 images), D2 (2 books, clean, 35,730 images) and D3 (1 book, noisy). Here mAP is the mean average precision over the 100 queries.
IIIT Hyderabad Experimental Results 75 Table: per-dataset number of queries and mAP for frequent queries versus rare queries (D1, D2, D3). Here mAP is the mean average precision over the 100 queries.
IIIT Hyderabad Conclusion 76 Domain Adaptation reduces the mismatch between the source and target domains. AKBM is more robust to font variations than the Asymmetric Bilinear Model. Transfer learning can be used to design scalable classifier-based word image retrieval systems.
IIIT Hyderabad Contributions 77 PSDL: a joint dictionary learning strategy suitable for domain adaptation. LPSA: a subspace alignment strategy for domain adaptation. AKBM: a nonlinear style-content factorization model. DQC: a transfer learning strategy for on-the-fly learning of word image classifiers.
IIIT Hyderabad Thank You 78 Related Publications
1. Viresh Ranjan, Gaurav Harit and C. V. Jawahar: Enhancing Word Image Retrieval in Presence of Font Variations, International Conference on Pattern Recognition (ICPR), 2014 (Oral).
2. Viresh Ranjan, Gaurav Harit and C. V. Jawahar: Document Retrieval with Unlimited Vocabulary, IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
3. Viresh Ranjan, Gaurav Harit and C. V. Jawahar: Learning Partially Shared Dictionaries for Domain Adaptation, 12th Asian Conference on Computer Vision (ACCV 2014), Workshop: FSLCV 2014.
4. Viresh Ranjan, Gaurav Harit and C. V. Jawahar: Domain Adaptation by Aligning Locality Preserving Subspaces, 8th International Conference on Advances in Pattern Recognition (ICAPR 2015).