Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines Gunhee Kim Eric P. Xing 1 School of Computer.

Slides:



Advertisements
Similar presentations
Weakly supervised learning of MRF models for image region labeling Jakob Verbeek LEAR team, INRIA Rhône-Alpes.
Advertisements

Location Recognition Given: A query image A database of images with known locations Two types of approaches: Direct matching: directly match image features.
Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.
Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.
Automatic Image Annotation Using Group Sparsity
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.
Human Identity Recognition in Aerial Images Omar Oreifej Ramin Mehran Mubarak Shah CVPR 2010, June Computer Vision Lab of UCF.
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis Gunhee Kim 1 Antonio Torralba 2 1: SCS, CMU 2: CSAIL, MIT Neural Information.
Carolina Galleguillos, Brian McFee, Serge Belongie, Gert Lanckriet Computer Science and Engineering Department Electrical and Computer Engineering Department.
Image Congealing (batch/multiple) image (alignment/registration) Advanced Topics in Computer Vision (048921) Boris Kimelman.
LARGE-SCALE IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill road building car sky.
CVPR2013 Poster Representing Videos using Mid-level Discriminative Patches.
Patch to the Future: Unsupervised Visual Prediction
INTRODUCTION Heesoo Myeong, Ju Yong Chang, and Kyoung Mu Lee Department of EECS, ASRI, Seoul National University, Seoul, Korea Learning.
Data-driven Visual Similarity for Cross-domain Image Matching
Graph-Based Image Segmentation
Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.
On Multiple Foreground Cosegmentation Gunhee Kim Eric P. Xing 1 School of Computer Science, Carnegie Mellon University June 18, 2012.
Computer Vision Group University of California Berkeley Shape Matching and Object Recognition using Shape Contexts Jitendra Malik U.C. Berkeley (joint.
Learning to Detect A Salient Object Reporter: 鄭綱 (3/2)
Event prediction CS 590v. Applications Video search Surveillance – Detecting suspicious activities – Illegally parked cars – Abandoned bags Intelligent.
An Optimization Approach to Improving Collections of Shape Maps Andy Nguyen, Mirela Ben-Chen, Katarzyna Welnicka, Yinyu Ye, Leonidas Guibas Computer Science.
Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.
Gunhee Kim1 Eric P. Xing1 Li Fei-Fei2 Takeo Kanade1
LARGE-SCALE NONPARAMETRIC IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill CVPR 2011Workshop on Large-Scale.
Supervised Distance Metric Learning Presented at CMU’s Computer Vision Misc-Read Reading Group May 9, 2007 by Tomasz Malisiewicz.
Abstract Extracting a matte by previous approaches require the input image to be pre-segmented into three regions (trimap). This pre-segmentation based.
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects By John Winn & Jamie Shotton CVPR 2006 presented by Tomasz.
Scalable Text Mining with Sparse Generative Models
Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques
Time-Sensitive Web Image Ranking and Retrieval via Dynamic Multi-Task Regression Gunhee Kim Eric P. Xing 1 School of Computer Science, Carnegie Mellon.
Tal Mor  Create an automatic system that given an image of a room and a color, will color the room walls  Maintaining the original texture.
Image Segmentation Image segmentation is the operation of partitioning an image into a collection of connected sets of pixels. 1. into regions, which usually.
Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign.
School of Electronic Information Engineering, Tianjin University Human Action Recognition by Learning Bases of Action Attributes and Parts Jia pingping.
Bag-of-Words based Image Classification Joost van de Weijer.
Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction Gunhee Kim Leonid Sigal Eric P. Xing 1 June 16, 2014.
Prakash Chockalingam Clemson University Non-Rigid Multi-Modal Object Tracking Using Gaussian Mixture Models Committee Members Dr Stan Birchfield (chair)
Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.
MRFs and Segmentation with Graph Cuts Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 02/24/10.
Nonparametric Part Transfer for Fine-grained Recognition Presenter Byungju Kim.
Object Stereo- Joint Stereo Matching and Object Segmentation Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on Michael Bleyer Vienna.
I 3D: Interactive Planar Reconstruction of Objects and Scenes Adarsh KowdleYao-Jen Chang Tsuhan Chen School of Electrical and Computer Engineering Cornell.
ALIP: Automatic Linguistic Indexing of Pictures Jia Li The Pennsylvania State University.
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
INTRODUCTION Heesoo Myeong and Kyoung Mu Lee Department of ECE, ASRI, Seoul National University, Seoul, Korea Tensor-based High-order.
Locality-constrained Linear Coding for Image Classification
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
Learning Features and Parts for Fine-Grained Recognition Authors: Jonathan Krause, Timnit Gebru, Jia Deng, Li-Jia Li, Li Fei-Fei ICPR, 2014 Presented by:
Object Recognition by Integrating Multiple Image Segmentations Caroline Pantofaru, Cordelia Schmid, Martial Hebert ECCV 2008 E.
Photoconsistency constraint C2 q C1 p l = 2 l = 3 Depth labels If this 3D point is visible in both cameras, pixels p and q should have similar intensities.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Semi-Supervised Clustering
Data Driven Attributes for Action Detection
Nonparametric Semantic Segmentation
Fast Preprocessing for Robust Face Sketch Synthesis
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science
Object-Graphs for Context-Aware Category Discovery
“The Truth About Cats And Dogs”
Brief Review of Recognition + Context
Lecture 31: Graph-Based Image Segmentation
Announcements more panorama slots available now
Noah Snavely.
Announcements more panorama slots available now
Topological Signatures For Fast Mobility Analysis
A Graph-Matching Kernel for Object Categorization
“Traditional” image segmentation
Presentation transcript:

Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines Gunhee Kim Eric P. Xing 1 School of Computer Science, Carnegie Mellon University June 19, 2013

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 2

3 Background Online photo sharing becomes so popular Information overload problem in visual data

4 Background Query scuba+diving from Flickr Any meaningful structural summary? Likely to share common storylines Taken in different spatial, temporal, and personal perspective

5 Our Ultimate Goal Narrative structural summary vs. independently retrieved images beach boat divingon boatunderwater coralsunset dinner cf) ranking and retrieval by Reconstructing photo storylines from large-scale online images An example of scuba+diving storyline

6 Objective of This Paper As a first technical step, jointly perform two crucial tasks... PS14 User 2 at 03/19/2005 (Phuket, Thailand) PS3 User 3 at 08/27/2008 (Cozumel, Mexico) PS2 User 1 at 10/19/2008 (Cayman Islands) Mutually rewarding! AlignmentCosegmentation Match images from different photo streams Segment K common regions from aligned M images

PS14 User 2 at 03/19/2005 (Phuket, Thailand) PS3 User 3 at 08/27/2008 (Cozumel, Mexico) PS2 User 1 at 10/19/2008 (Cayman Islands) 7 Objective of This Paper As a first technical step, jointly perform two crucial tasks... Mutually rewarding! AlignmentCosegmentation Online images are too diverse to segment together at once The alignment discovers the images that share common regions

8 Objective of This Paper As a first technical step, jointly perform two crucial tasks... Mutually rewarding! AlignmentCosegmentation Improve image matching by a better image similarity measure Closing a loop between the two tasks

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 9

10 Flickr Dataset Flickr dataset of 15 outdoor recreational activities Experiments with more than 100K images of 1K photo streams Larger than those of previous work by orders of magnitude # of images (1,514,976) # photo streams (13,157) Surfing Beach Horse Riding RAftingYAcht Air Ballooning ROwingScuba Diving Formula One SNow boarding Safari Park Mountain Camping Rock Climbing Tour de France London Marathon Fly Fishing

11 Image Descriptor and Similarity Measure Image description Image similarity measure 1. No segmentation available 2. Segmentation available Not robust against location/pose changes HSV color SIFT and HOG features on regular grid L1 normalized spatial pyramid histogram using 300 visual words (Our assumption) Segmentation enhances the image alignment. Histogram intersection on SPH Histogram intersection on the best assignment of segments

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 12

13 Alignment of Photo Streams Input: A set of photo streams (PS): P = {P 1,…,P L } Photo Stream: a set of photos taken in sequence by a single user in a single day P1P1 P2P2 P3P3 P4P4 Idea: Align all photo streams at once after building K-NN graph Naïve-Bayes Nearest Neighbor(NBNN) [Boiman et al. 08] for similarity metric

14 Alignment of Photo Streams Input: A set of photo streams (PS): P = {P 1,…,P L } Photo Stream: a set of photos taken in sequence by a single user in a single day Idea: Align all photo streams at once after building K-NN graph P1P1 P2P2 P3P3 P4P4 For simplicity, first consider pairwise alignment of two photo streams Pairwise alignment Naïve-Bayes Nearest Neighbor(NBNN) [Boiman et al. 08] for similarity metric

15 Pairwise Alignment Goal of alignment: find a matching btw a pair of PS Optimization: MRF-based energy minimization means I in P 1 has no match in P 2. ex) Deformable image matching [Shekhovtsov et al.07] or SIFT flow [Liu et al. 2009] Solved by discrete BP Flexibility: Various energy terms

16 Pairwise Alignment Objective function Data term : The matched image pairs should be visually similar. Time term : The matched image pairs should be temporally similar. Smoothness term : The matched images to neighbors in P 1 should be neighbors in P 2. 9AM 10AM 6PM

P1P1 P2P2 P3P3 P4P4 17 Alignment of Multiple Photo Streams Objective : MRF-based energy minimization Pairwise alignment : All pairs of NN photo streams Message-passing based optimization until convergence or for fixed iterations

P1P1 P2P2 P3P3 P4P4 18 Alignment of Multiple Photo Streams Objective : MRF-based energy minimization : All pairs of NN photo streams Message-passing based optimization until convergence or for fixed iterations

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 19

20 Build an Image Graph Idea: Connect the images that are similar enough to be cosegmented Image Graph G = (I, E) E = E B U E w E B : Edges between different photo streams (results of alignment) E W : Edges within a photo stream I : The set of images. E : The set of edges. EBEB For each image I, links I with the K-NN of I ( E W ). consider the images such that EwEw

21 Scalable Cosegmentation Iteratively run the MFC algorithm [Kim and Xing, 2012] on the image graph Review of MFC algorithm Foreground Modeling Region Assignment Learn appearance models of K FGs and BG Allocate the regions of image into one of K FGs or BG Iterate Ex. Gaussian mixture on RGB, linear SVM on SPH Any region classifiers or their combination Very efficiently solve using the idea of combinatorial auction Cosegmentation: Jointly segment M images into K+1 regions (K foregrounds (FG) + background (BG))

22 Scalable Cosegmentation on Image Graph FG 2 (road) FG 3 (BG) FG 1 (car) Message-passing based optimization Learn FG Models from neighbors of I i. Run region assignment on I i. Iteratively solve…

23 Scalable Cosegmentation on Image Graph FG 1 (car) FG 3 (BG) FG 2 (road) Message-passing based optimization Learn FG Models from neighbors of I i. Run region assignment on I i. Initialization Iteratively solve… Supervised: start from seed labels Unsupervised: use the algorithm of CoSand [Kim et al. 2011].

24 Summary of Algorithm As a first technical step, jointly perform two tasks... Mutually rewarding! AlignmentCosegmentation Build K-NN graph between photo streams (PS) Align all photo streams at once guided by the PS graph Message-passing style optimization Build K-NN graph between images Cosegment all images at once guided by the image graph Computation time: O(T|E|) O(TL) where L: # of photo streams O(TN) where N: # of images

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 25

26 Evaluation for Alignment Evaluation for cosegmentation Evaluation – Two Experiments Task: Temporal localization (inspired by geo-location estimation) Task: Foreground detection We manually annotate 100 images per class Very hard to obtain groundtruth! Correspondences btw two sets of thousands of images? Accuracy is measured by intersection-over-union [Hays and Efros. 2008] Where is it likely to be taken? When are they likely to be taken? P1P1 Timeline

27 Evaluation of Alignment Procedures of temporal localization Training (80%) Test (20%) 1. Given a set of photo streams, randomly split training and test sets 2. Run alignment 3. Estimate timestamps of all images in test photo streams 4. Temporal localization is correct if Baselines Popular multiple sequence alignment – Image similarity only (the simplest) Justify closing a loop BPS: Our Alignment + Cosegmentation BP: Our alignment only KNN: K-nearest neighbors HMM: Hidden Markov Models DTW: Dynamic Time Windows Better temporal localization ≠ Better Alignment

28 BPS: Our Alignment + Cosegmentation BP: Our alignment only KNN: K-nearest neighbors HMM: Hidden Markov Models DTW: Dynamic Time Windows Evaluation of Alignment Temporal localization is correct if = 60 min.

29 Evaluation of Cosegmentation Task: Foreground detection Examples BP+MFC: (Proposed) Alignment + Cosegmentation MFC: Our cosegmentation without alignment COS : Submodular optimization [Kim et al. ICCV11] LDA: LDA-based localization [Russell et al. CVPR06]

Problem Statement Algorithm  Dataset and preprocessing  Alignment of Multiple Photo Streams  Large-scale Cosegmentation Experiments Conclusion Outline 30

31 Conclusion Ultimate goal: building photo storylines from large-scale online images safari+park horse+riding

32 As a first technical step Code will be available soon. An unified framework for both alignment and cosegmentation Scalable (ex. linear computation time with # PS, # of images) Jointly perform Alignment and cosegmentation Message-passing based optimization Conclusion Much remains to be done.

Thank you ! Stop by our poster. 33