Exploiting Big Data via Attributes (Offline Contd.)
Recap - Attributes What are attributes? Slide Credit: Devi Parikh
Recap - Attributes Rich Understanding Image Credit: Ali Farhadi
Recap - Annotations Zero-shot learning Frogs are green, have heads and legs. What is this? Image Credit: Olga Russakovsky
Recap - Annotations Attributes help elicit richer descriptions from annotators. Image Credit: Devi Parikh
Understanding Single Image Or Learning a Classifier (w/ Human Feedback) Single or Few images to Big Data Slide Credit: Abhinav Gupta
Big Data 90% of web data is visual! 142 Billion Images, 6 Billion added monthly. 6 Billion Images. 72 hours of video uploaded every minute. How can attributes help in learning from big data? Slide Credit: Abhinav Gupta
What this part is about Semi-supervised Learning Slide Credit: Abhinav Gupta
What this part is about Before the start of the debate, Mr. Obama and Mrs. Clinton met with the moderators, Charles Gibson, left, and George Stephanopoulos, right, of ABC News. A officer on the left of car checks the speed of other cars on the road. Weakly-labeled Learning Slide Credit: Abhinav Gupta
Key insight: Attributes couple the learning of different classifiers and hence provide constraints for joint learning. Goal: Learn multiple classifiers simultaneously (Amphitheatre, Auditorium, Banquet, Bedroom). Slide Credit: Abhinav Gupta
Semi-supervised Learning Shrivastava et al., 2012 Slide Credit: Abhinav Gupta
SEMI-SUPERVISED [Zhu, TR, 2005], [Chunsheng Fang, Slides, 2009] Slide Credit: Abhinav Gupta
BOOTSTRAPPING Labeled Seed Examples (Amphitheatre) and Unlabeled Data: Train Models, Select Candidates, Add to Labeled Set, Retrain Models. Slide Credit: Abhinav Gupta
BOOTSTRAPPING Labeled Seed Examples (Amphitheatre) and Unlabeled Data: Select Candidates, Add to Labeled Set, Retrain Models. 25th Iteration: Semantic Drift (Amphitheatre + Auditorium) [Curran et al., PACL 2007] Slide Credit: Abhinav Gupta
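The bootstrapping loop on the slides above can be sketched as follows. This is a minimal illustration rather than the method used in the cited work: the nearest-centroid model, the squared-distance ranking, and the names `bootstrap`, `centroid`, `sq_dist` and `per_iter` are all assumptions made for the example. Each category promotes its own candidates independently, which is the setting in which semantic drift can arise.

```python
def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bootstrap(seeds, unlabeled, iterations=3, per_iter=1):
    """Self-training with a nearest-centroid model (a stand-in classifier).

    seeds:     dict mapping label -> list of feature vectors (seed examples)
    unlabeled: list of feature vectors
    """
    labeled = {label: list(pts) for label, pts in seeds.items()}
    pool = list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        # Train models: one centroid per category.
        models = {label: centroid(pts) for label, pts in labeled.items()}
        # Select candidates: each category independently takes the
        # unlabeled points closest to its own centroid.
        for label, model in models.items():
            pool.sort(key=lambda p: sq_dist(p, model))
            promoted, pool = pool[:per_iter], pool[per_iter:]
            # Add to labeled set; models are retrained on the next pass.
            labeled[label].extend(promoted)
    return labeled
```

Because nothing ties the categories together, a point that lies between two categories is eventually promoted into one of them once the clearer candidates run out, and later models are trained on that mistake.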
GRAPH-BASED METHODS [Ebert et al., ECCV 2010] [Fergus et al., NIPS 2009] Slide Credit: Abhinav Gupta
CONSTRAINED BOOTSTRAPPING Amphitheatre Auditorium Amphitheatre Auditorium Slide Credit: Abhinav Gupta
CONSTRAINED BOOTSTRAPPING Amphitheatre Auditorium Amphitheatre Auditorium Joint Learning: Share Data [Carlson et al., NAACL HLT Workshop on SSL for NLP 2009] Slide Credit: Abhinav Gupta
Amphitheatre Auditorium Banquet Hall Banquet Hall Conference Room Conference Room Binary Attributes (BA) [Farhadi et al., CVPR 2009] [Lampert et al., CVPR 2009] Slide Credit: Abhinav Gupta
Binary Attributes (BA) [Patterson and Hays, CVPR 2012] Scenes: Conference Room, Banquet Hall, Auditorium, Amphitheatre. Attributes: Tables and Chairs, Indoor, Large Seating Capacity, Man-made. Slide Credit: Abhinav Gupta
Auditorium Slide Credit: Abhinav Gupta
Sharing via Dissimilarity Amphitheatre Auditorium Has Larger Circular Structures [Parikh and Grauman, ICCV 2011] [Gupta and Davis, ECCV 2008] Slide Credit: Abhinav Gupta
Amphitheatre Auditorium ? Slide Credit: Abhinav Gupta
Amphitheatre Auditorium Slide Credit: Abhinav Gupta
COMPARATIVE ATTRIBUTES Dissimilarity: Has Larger Circular Structures [Parikh and Grauman, ICCV 2011] [Gupta and Davis, ECCV 2008] Slide Credit: Abhinav Gupta
COMPARATIVE ATTRIBUTES Similar to Relative Attributes. Uses pairs of images as data points during learning. Instead of predicting a real number, it uses a binary classifier. Slide Credit: Abhinav Gupta
COMPARATIVE ATTRIBUTES Dissimilarity: Has Larger Circular Structures [Parikh and Grauman, ICCV 2011] [Gupta and Davis, ECCV 2008] Features: GIST; RGB (Tiny Image); Line Histograms of Length and Orientation; LAB histogram. Slide Credit: Abhinav Gupta
COMPARATIVE ATTRIBUTES Dissimilarity: Has Larger Circular Structures [Parikh and Grauman, ICCV 2011] [Gupta and Davis, ECCV 2008] Classifier: Boosted Decision Tree [Hoiem et al., IJCV 2007] Slide Credit: Abhinav Gupta
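To make the pairwise setup concrete, the sketch below trains a binary classifier whose data points are ordered image pairs (A, B) and whose label is 1 when A has more of the attribute than B. Several things here are assumptions for illustration only: a perceptron stands in for the boosted decision tree named on the slide, a pair is represented by the difference of two toy feature vectors rather than by GIST, colour, or line histograms, and the function names are hypothetical.

```python
def pair_features(feat_a, feat_b):
    """Represent an ordered image pair (A, B) by the feature difference A - B."""
    return [a - b for a, b in zip(feat_a, feat_b)]


def score(model, x):
    """Linear score w . x + b."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_comparative_attribute(pairs, labels, epochs=20, lr=0.1):
    """Perceptron over pair features (a stand-in for boosted decision trees).

    pairs:  list of (feat_a, feat_b) tuples
    labels: 1 if image A has more of the attribute than image B, else 0
    """
    samples = [pair_features(a, b) for a, b in pairs]
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if score((w, b), x) > 0 else 0
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def has_more(model, feat_a, feat_b):
    """Binary decision: does image A have more of the attribute than image B?"""
    return 1 if score(model, pair_features(feat_a, feat_b)) > 0 else 0
```

A usage example with a toy two-dimensional feature, where the first dimension loosely plays the role of "size of circular structures":

```python
pairs = [([3, 1], [1, 1]), ([1, 2], [4, 2]), ([5, 0], [2, 0]), ([0, 3], [2, 3])]
labels = [1, 0, 1, 0]
model = train_comparative_attribute(pairs, labels)
print(has_more(model, [4, 1], [1, 1]), has_more(model, [1, 1], [4, 1]))
```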
Comparative Attributes [Parikh and Grauman, ICCV 2011] [Gupta and Davis, ECCV 2008] Is More Open: Amphitheatre > Barn, Amphitheatre > Conference Room, Desert > Barn. Has Taller Structures: Church (Outdoor) > Cemetery, Barn > Cemetery. Slide Credit: Abhinav Gupta
Amphitheatre Auditorium Amphitheatre Auditorium Labeled Seed Examples Bootstrapping Slide Credit: Abhinav Gupta
Labeled Seed Examples Amphitheatre Auditorium Amphitheatre Auditorium Bootstrapping Amphitheatre Auditorium Constrained Bootstrapping Indoor Has Seat Rows Attributes Has Larger Circular Structures Comparative Attributes Slide Credit: Abhinav Gupta
Labeled Data (Banquet, Bedroom) and Unlabeled Data. Training Pairwise Data: has more space, has larger structures. Comparative Attribute Classifiers: more space, larger structures [Gupta and Davis, ECCV 2008]. Attribute Classifiers: indoor, has grass. Scene Classifiers: bedroom, banquet hall. Promoted Instances: Conference Room, Banquet Hall. Slide Credit: Abhinav Gupta
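One simple way to picture how attribute classifiers constrain candidate selection is to rank unlabeled images by a joint score instead of the scene score alone. The sketch below is an assumption-laden illustration, not the formulation of the cited work: the scores are treated as independent probabilities and multiplied, comparative attributes are left out, and `joint_score`, `select_candidates` and the dictionary layout are hypothetical.

```python
def joint_score(candidate, category, expected_attrs):
    """Scene score multiplied by agreement with the category's binary attributes.

    candidate:      dict with 'scene_scores' and 'attr_scores', values in [0, 1]
    expected_attrs: dict mapping attribute -> True/False for this category
    """
    score = candidate["scene_scores"][category]
    for attr, present in expected_attrs.items():
        p = candidate["attr_scores"][attr]
        # Penalise candidates whose attribute prediction contradicts the category.
        score *= p if present else (1.0 - p)
    return score


def select_candidates(candidates, category, expected_attrs, top_k=1):
    """Return the ids of the top_k candidates under the joint score."""
    ranked = sorted(
        candidates,
        key=lambda c: joint_score(c, category, expected_attrs),
        reverse=True,
    )
    return [c["id"] for c in ranked[:top_k]]
```

With an empty `expected_attrs` this reduces to plain bootstrapping on the scene score; adding a constraint such as `{"indoor": True}` for Auditorium can demote an outdoor amphitheatre image even when its scene score is higher.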
Bootstrapping BA Constraints Amphitheatre C-Bootstrapping Seed Images
BA Constraints Bridge Seed Images Bootstrapping C-Bootstrapping Slide Credit: Abhinav Gupta
Attributes help improve Recall Slide Credit: Abhinav Gupta
Banquet Hall Seed Images Iterations 1, 10, 40 Slide Credit: Abhinav Gupta
Bedroom Seed Images Bootstrapping vs. C-Bootstrapping at Iteration-1 and Iteration-60
Scene Classification Eigen Functions: [Fergus et al., NIPS 2009] Slide Credit: Abhinav Gupta
Co-training (Large Scale) 15 Scene Categories, 25 Seed Images per Category. Unlabeled Set: 1 Million images (SUN Database + ImageNet), >95% distractors. SUN Database: [Xiao et al., CVPR 2010] ImageNet: [Deng et al., CVPR 2009] Improves 12 out of 15 scene classifiers. Slide Credit: Abhinav Gupta
LIMITATIONS C-bootstrapping uses semantic attributes and needs manually specified relationships, e.g. Is More Open: Amphitheatre > Barn, Amphitheatre > Conference Room, Desert > Barn. Can we learn the relationships? Slide Credit: Abhinav Gupta
Choi et al., Adding Unlabeled Samples to Categories by Learned Attributes, CVPR 2013. Framework for jointly learning visual classifiers and noun-attribute mapping.
Formulation A joint optimization for: learning a classifier in visual feature space ( w_c^v ); learning a classifier in attribute space ( w_c^a ) with a selection criterion; finding the samples ( I ); mutual exclusion. Non-convex: the classifiers are continuous and the sample selection is discrete, giving a mixed integer program (NP-complete problem). Solution: block coordinate descent. Slide Credit: Jonghyun Choi
Overview Diagram Initial Labeled-Samples Build Attribute Space Project Find Useful Attributes Unlabeled Samples Project Choose Confident Examples To Add Auxiliary data Slide Credit: Jonghyun Choi
Example Qualitative Results Categorical: common traits of a category Selected by Categorical Attributes Initial Labeled Training Examples Dotted Animal-like shape … Slide Credit: Jonghyun Choi
Weakly-Labeled Learning Gupta et al., 2008 Slide Credit: Abhinav Gupta
Captions - Bag of Nouns Learning classifiers involves establishing correspondence. Caption: "A officer on the left of car checks the speed of other cars on the road." Nouns: officer, car, road. Slide Credit: Abhinav Gupta
Correspondence - Co-occurrence Relationship Bear Water Bear Field Water Bear Field Slide Credit: Abhinav Gupta
Co-occurrence Relationship (Problems) Road and Car appear together in every image. Hypothesis 1 vs. Hypothesis 2: the Car and Road labels swapped. Slide Credit: Abhinav Gupta
Beyond Nouns – Exploit Relationships Use annotated text to extract nouns and relationships between nouns. Caption: "A officer on the left of car checks the speed of other cars on the road." Nouns: officer, car, road. Relationships: On (car, road), Left (officer, car). Constrain the correspondence problem using the relationships: given On (Car, Road), the assignment that places the Car region on the Road region is More Likely and the reverse is Less Likely. Key insight: Solve the correspondence problem jointly using constraints! Slide Credit: Abhinav Gupta
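The effect of a relationship constraint on the correspondence problem can be sketched with a tiny exhaustive search over noun-to-region assignments. Everything specific here is an assumption for illustration: the `on` test (region A's centre lies above region B's centre), the 0.9 / 0.1 compatibility values, the appearance likelihoods, and the function names are hypothetical and are not taken from the cited paper.

```python
from itertools import permutations


def on(region_a, region_b):
    """Hypothetical spatial test: A is 'on' B if A's centre is above B's centre.

    Image coordinates are assumed, so y grows downward.
    """
    return region_a["cy"] < region_b["cy"]


RELATIONS = {"on": on}


def best_assignment(regions, nouns, relations, appearance):
    """Pick the noun -> region assignment with the highest joint score.

    relations:  list of (relation_name, noun_a, noun_b) extracted from the caption
    appearance: dict mapping (noun, region_name) -> appearance likelihood
    """
    best, best_score = None, -1.0
    for perm in permutations(regions, len(nouns)):
        assign = dict(zip(nouns, perm))
        score = 1.0
        for noun, region in assign.items():
            score *= appearance[(noun, region["name"])]
        for rel, a, b in relations:
            # Assignments that satisfy the relationship are more likely.
            score *= 0.9 if RELATIONS[rel](assign[a], assign[b]) else 0.1
        if score > best_score:
            best, best_score = assign, score
    return {noun: region["name"] for noun, region in best.items()}
```

When the appearance likelihoods are uninformative, as with pure co-occurrence, both hypotheses tie and the search simply returns the first one; adding `("on", "car", "road")` breaks the tie in favour of the assignment whose geometry matches the caption.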
Relationships Prepositions – A preposition usually indicates the temporal, spatial or logical relationship of its object to the rest of the sentence. The most common prepositions in English are "about," "above," "across," "after," "against," "along," "among," "around," "at," "before," "behind," "below," "beneath," "beside," "between," "beyond," "but," "by," "despite," "down," "during," "except," "for," "from," "in," "inside," "into," "like," "near," "of," "off," "on," "onto," "out," "outside," "over," "past," "since," "through," "throughout," "till," "to," "toward," "under," "underneath," "until," "up," "upon," "with," "within," and "without." The vast majority of these (shown in bold on the slide) have clear utility for the analysis of images and video. Comparative attributes – relating to color, size, movement: "larger", "smaller", "taller", "heavier", "faster", … Goal: Learn models of nouns, prepositions, and comparative attributes simultaneously from weakly-labeled data. Slide Credit: Abhinav Gupta
Learning the Model – Chicken-Egg Problem The Assignment Problem (which region is Car, which is Road) needs the learned models, and the Learning Problem (e.g. the model for On (car, road)) needs the assignment. We treat the assignment as missing data and formulate an EM approach. Slide Credit: Abhinav Gupta
EM Approach - Learning the Model E-Step: Compute the noun assignment for a given set of object and relationship models from the previous iteration. M-Step: For the noun assignment computed in the E-step, find the new ML parameters by learning both relationship and object classifiers. To initialize the EM approach, we can use any image annotation approach with localization, such as the translation-based model described in [1]. [1] Duygulu, P., Barnard, K., Freitas, N., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. ECCV (2002)
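The alternation between the two steps can be sketched as below. This is a deliberately reduced, hard-assignment variant written for illustration: each region is a single number, each noun model is just a mean, relationship models are omitted, and the initial means are supplied by hand instead of coming from an annotation model such as [1]. The names `e_step`, `m_step` and `run_em` are hypothetical.

```python
from itertools import permutations


def e_step(images, means):
    """Assign each caption noun to the region that best fits the current models.

    images: list of (regions, nouns), where regions is a list of 1-D features
    means:  dict mapping noun -> current model (a mean feature value)
    """
    assignments = []
    for regions, nouns in images:
        best = min(
            permutations(range(len(regions)), len(nouns)),
            key=lambda perm: sum(
                (regions[r] - means[n]) ** 2 for n, r in zip(nouns, perm)
            ),
        )
        assignments.append(dict(zip(nouns, best)))
    return assignments


def m_step(images, assignments, means):
    """Re-estimate each noun model from the regions currently assigned to it."""
    collected = {}
    for (regions, _), assign in zip(images, assignments):
        for noun, r in assign.items():
            collected.setdefault(noun, []).append(regions[r])
    new_means = dict(means)
    for noun, values in collected.items():
        new_means[noun] = sum(values) / len(values)
    return new_means


def run_em(images, init_means, iterations=5):
    """Alternate the two steps for a fixed number of iterations."""
    means, assignments = dict(init_means), []
    for _ in range(iterations):
        assignments = e_step(images, means)
        means = m_step(images, assignments, means)
    return means, assignments
```

In the full approach described above, the E-step would also score each assignment against the relationship models and the M-step would retrain the relationship classifiers alongside the object classifiers.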
Relationships modeled Most relationships are learned "correctly": above, behind, below, left, right, beside, bluer, greener, nearer, more-textured, smaller, larger, brighter. But some are associated with the wrong features: in (topological relationships not captured by color, shape and location); on-top-of; taller (most tall objects are thin and the segmentation algorithm tends to fragment them). Slide Credit: Abhinav Gupta
Resolution of Correspondence Ambiguities Duygulu et al. [1] vs. Our Approach. Relationships used: below(birds, sun), above(sun, sea), brighter(sun, sea), below(waves, sun); above(statue, rocks), ontopof(rocks, water), larger(water, statue); below(flowers, horses), ontopof(horses, field), below(flowers, foals). [1] Duygulu, P., Barnard, K., Freitas, N., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. ECCV (2002) [2] Barnard, K., Fan, Q., Swaminathan, R., Hoogs, A., Collins, R., Rondot, P., Kaufold, J.: Evaluation of localized semantics: data, methodology and experiments. Univ. of Arizona, TR-2005 (2005) Slide Credit: Abhinav Gupta
Summary Attributes can help in exploiting big data. Attributes represent how class A is similar to class B, and how class B differs from class A. These relationships help formulate a joint-learning problem and improve learning from large unlabeled and weakly labeled data. Slide Credit: Abhinav Gupta