CVPR 2013 Poster: Representing Videos using Mid-level Discriminative Patches

Outline
1. Introduction
2. Mining Discriminative Patches
3. Analyzing Videos
4. Experimental Evaluation & Conclusion

1. Introduction
Q1: What does it mean to understand this video?
Q2: How might we achieve such an understanding?

1. Introduction
Prior approaches represent a video either as a single global feature vector, or as semantic "bits and pieces": detected objects and primitive actions composed within a general framework such as Bayesian networks or storyline models.

1. Introduction
Drawback: computational models for identifying semantic entities are not robust enough to serve as a basis for video analysis.

1. Introduction
We represent a video using discriminative spatio-temporal patches rather than a global feature vector or a set of semantic entities. A discriminative patch may correspond to a primitive human action, a semantic object, a human-object pair, or simply a random but informative patch. These patches are automatically mined from training data consisting of hundreds of videos.

1. Introduction
The spatio-temporal patches act as a discriminative vocabulary for action classification and establish strong correspondences between patches in training and test videos. Using label transfer techniques, we can align the videos and perform further tasks such as object localization and finer-level action detection.

1. Introduction

2. Mining Discriminative Patches
Two conditions a discriminative patch must satisfy:
(1) It occurs frequently within a class.
(2) It is distinct from patches in other classes.
Challenges:
(1) The space of potential spatio-temporal patches is extremely large, given that patches can occur over a range of scales.
(2) The overwhelming majority of video patches are uninteresting.

2. Mining Discriminative Patches
Standard paradigm: bag-of-words.
Step 1: Sample a few thousand patches and perform k-means clustering to find representative clusters.
Step 2: Rank these clusters based on membership in different action classes.
Major drawbacks:
(1) High-dimensional distance metric
(2) Partitioning
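A minimal sketch of this baseline, assuming HOG3D-like patch descriptors and scikit-learn's KMeans (names and parameters are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(descriptors, labels, n_clusters=1000):
    """descriptors: (n_patches, d) patch features; labels: (n_patches,) integer action ids."""
    km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(descriptors)
    scores = []
    for c in range(n_clusters):
        member_labels = labels[km.labels_ == c]
        if len(member_labels) == 0:
            scores.append(0.0)
            continue
        # Purity of the cluster: fraction of members from its dominant action class.
        counts = np.bincount(member_labels)
        scores.append(counts.max() / counts.sum())
    order = np.argsort(scores)[::-1]   # purest clusters first
    return km, order
```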

2. Mining Discriminative Patches
(1) High-dimensional distance metric: k-means relies on a standard distance metric (e.g., Euclidean or normalized cross-correlation), which does not work well in high-dimensional spaces. ※ We use HOG3D descriptors.

2. Mining Discriminative Patches
(2) Partitioning: Standard clustering algorithms partition the entire feature space, so every data point is assigned to a cluster. However, in many cases assigning cluster memberships to rare background patches is hard; because of this forced clustering, such patches significantly diminish the purity of the good clusters to which they are assigned.

2. Mining Discriminative Patches
To resolve these issues, we learn Exemplar-SVMs (e-SVMs):
1. Use an exemplar-based clustering approach.
2. Every patch is considered as a possible cluster center.
Drawback: considering every patch is computationally infeasible.
Resolution: prune candidate patches using motion, then form initial clusters with nearest neighbors.
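A rough sketch of training one e-SVM: a single positive patch descriptor against many negatives from other classes. LinearSVC and the class weights below are illustrative assumptions, not the paper's exact training setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_esvm(exemplar, negatives):
    """exemplar: (d,) descriptor of one patch; negatives: (n_neg, d) descriptors from other classes."""
    X = np.vstack([exemplar[None, :], negatives])
    y = np.array([1] + [0] * len(negatives))
    # Heavily weight the single positive so it is not overwhelmed by the negatives.
    clf = LinearSVC(C=0.1, class_weight={1: 50.0, 0: 1.0}, max_iter=10000)
    clf.fit(X, y)
    return clf  # clf.decision_function(patches) gives e-SVM detection scores
```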

2. Mining Discriminative Patches
Training videos are split into two partitions.
Training partition (cluster formation):
(i) Form initial clusters with a simple nearest-neighbor approach (typically k = 20).
(ii) Score each patch and rank.
(iii) Select a few top patches per action class and train an e-SVM for each.
(iv) The e-SVMs are used to form clusters.
(v) Re-rank.
Validation partition: rank the clusters based on representativeness.
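Step (i) could look roughly like the following, using scikit-learn's NearestNeighbors to gather the k = 20 nearest patches around each candidate (an illustrative sketch; the tightness-based scoring is an assumption):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def initial_clusters(candidates, all_descriptors, k=20):
    """candidates: (m, d) seed patches; all_descriptors: (n, d) training-partition patches."""
    nn = NearestNeighbors(n_neighbors=k).fit(all_descriptors)
    dist, idx = nn.kneighbors(candidates)
    # Each row of idx is one initial cluster; tighter neighborhoods score higher.
    scores = -dist.mean(axis=1)
    return idx, scores
```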

2. Mining Discriminative Patches
Goal: a smaller dictionary (set of representative patches).
Criteria:
(a) Appearance consistency (consistency score)
(b) Purity (tf-idf-style score: firings on the same class vs. different classes)
※ All patches are ranked using a linear combination of the two scores.
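As a sketch of the final ranking, assuming a simple purity score and a mixing weight alpha (both assumptions, since the slide does not give the exact formulas):

```python
import numpy as np

def rank_patches(appearance_consistency, same_class_fires, other_class_fires, alpha=0.5):
    # tf-idf-like purity: reward same-class firings, penalize cross-class firings.
    purity = same_class_fires / (1.0 + other_class_fires)
    combined = alpha * appearance_consistency + (1.0 - alpha) * purity
    return np.argsort(combined)[::-1]   # best patches first
```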

2. Mining Discriminative Patches

3. Analyzing Videos
Action classification: the top-n e-SVM detectors are run on a test video; their responses form a feature vector that is fed to an SVM classifier, which outputs the action class.
Beyond classification: explanation via discriminative patches.
Q. How can we use detections of discriminative patches to establish correspondences between training and test videos?
Q. Which detections should be selected for establishing correspondence?
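A sketch of the classification step: each e-SVM detector's maximum response over a video's patches becomes one feature dimension, and the resulting vector is classified with a standard SVM. Function and variable names here are placeholders for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def video_feature(video_patches, detectors):
    """video_patches: (p, d) descriptors of all patches in one video; detectors: trained e-SVMs."""
    # Max response of each e-SVM detector over the video's patches.
    return np.array([det.decision_function(video_patches).max() for det in detectors])

# Training the final classifier on such vectors (y_train holds action-class labels):
# X_train = np.stack([video_feature(v, detectors) for v in training_videos])
# clf = SVC(kernel="linear").fit(X_train, y_train)
```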

3. Analyzing Videos
Context-dependent patch selection:
Vocabulary size: N; candidate detections: {D_1, D_2, ..., D_N}; x_i indicates whether the detection of e-SVM i is selected.
Appearance term (A_i): the e-SVM score for patch i.
Class consistency term (C_li): promotes selection of certain e-SVMs over others given action class l; for example, for the weightlifting class it prefers patches showing a man and a bar with vertical motion. We learn C_l from the training data by counting the number of times each e-SVM fires for each class.
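Combining these terms, the selection can be posed as a binary quadratic objective. A plausible form, written from the terms defined on this slide and the penalty term on the next one (the paper's exact formulation may differ), is:

```latex
\max_{x \in \{0,1\}^N} \;\; \sum_{i=1}^{N} x_i \left( A_i + C_{l i} \right) \;-\; \sum_{i \neq j} x_i \, x_j \, P_{ij}
```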

3. Analyzing Videos
Penalty term (P_ij): the penalty for selecting a pair of detections together; it is high when (1) e-SVMs i and j do not fire frequently together in the training data, or (2) e-SVMs i and j are trained from different action classes.
Optimization: the resulting integer program is NP-hard, so we use the IPFP algorithm, which typically converges in 5-10 iterations.
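A compact IPFP-style iteration for the binary selection problem above, taking u_i = A_i + C_li and assuming P is symmetric; this is a simplified illustration of the algorithm family the slide refers to, not the authors' code.

```python
import numpy as np

def ipfp_select(u, P, n_iter=10):
    """u: (N,) unary scores A_i + C_li;  P: (N, N) symmetric penalty matrix."""
    x = np.full(len(u), 0.5)                # relaxed starting point in [0, 1]^N
    for _ in range(n_iter):
        grad = u - 2.0 * P @ x              # gradient of f(x) = u.x - x' P x
        b = (grad > 0).astype(float)        # best binary point for the linearized objective
        d = b - x
        curv = d @ P @ d                    # curvature along the step direction
        t = 1.0 if curv <= 0 else min(1.0, (grad @ d) / (2.0 * curv))
        x = x + t * d                       # move toward b with the optimal step size
    return x > 0.5                          # threshold to a final binary selection
```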

4. Experimental Evaluation
Datasets: UCF-50 and Olympic Sports.
Implementation details:
※ Our current implementation considers only cuboid patches.
※ Patches are represented with HOG3D features (4x4x5 cells with 20 discrete orientations).
Classification results:
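As a quick sanity check on descriptor size (assuming each cell contributes one 20-bin orientation histogram, which the slide implies but does not state explicitly): 4 × 4 × 5 = 80 cells, so each patch descriptor has 80 × 20 = 1600 dimensions.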

4. Experimental Evaluation

Correspondence and Label Transfer

4. Experimental Evaluation

Conclusion
1. A new representation for videos based on mid-level discriminative spatio-temporal patches.
2. These patches are mined automatically using an exemplar-based clustering approach.
3. They provide strong correspondences that align videos for transferring annotations.
4. Used as a vocabulary, they achieve state-of-the-art results for action classification.