https://github.com/zaeemzadeh/IPM Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision.

Presentation transcript:

https://github.com/zaeemzadeh/IPM Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision. Alireza Zaeemzadeh*, Mohsen Joneidi*, Nazanin Rahnavard, and Mubarak Shah. Joint work between the Communications and Wireless Networks Lab (CWNLAB) and the Center for Research in Computer Vision (CRCV), University of Central Florida. (* indicates shared first authorship.)

Problem: Selecting K distinct representative samples from M available data points (K << M), while preserving the structure of the data.

Applications: Active learning, dataset summarization, video summarization, ...

Challenges:
- Computational complexity: methods based on convex relaxation are usually not feasible on large datasets.
- Robustness to outliers: selection based on diversity is not robust to outliers.
- Generalization: the selection algorithm should work for different problems and datasets without much effort.

Intuition of Our Method: Eigenfaces (singular vectors) are not actual data points, and the full SVD is computationally expensive. Our algorithm selects real samples according to the spectrum of the data.

Iterative Projection and Matching

Iterative Projection and Matching: What does this mean mathematically? We seek a rank-K factorization A ≈ UV, where V contains K rows of A (i.e., actual data samples), while there is no constraint on U. Equivalently, with T a subset of data samples and \pi_T(A) the projection of all data onto the span of the selected samples, we look for the subset T of size K whose projection best preserves A. Assume we select one sample at a time, namely the one that best represents all the data:

  max_v ||A v^T||_2^2  s.t.  v ∈ {normalized rows of A}.   (3)

Since Problem (3) involves a combinatorial search and is not easy to tackle, we modify it into two consecutive sub-problems. The first sub-problem (projection) relaxes the constraint v ∈ A in (3) to the moderate constraint ||v|| = 1, whose solution is the first right singular vector of A. The second sub-problem (matching) re-imposes the underlying constraint by selecting the data sample most correlated with that first right singular vector.
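To make the projection/matching alternation concrete, here is a minimal numpy sketch written from the description above. It is not the reference implementation from the linked repository, and all names are ours; the power-iteration helper anticipates the linear-time claim on the next slide.

```python
import numpy as np

def first_right_singular_vector(R, n_iter=100, seed=0):
    # Power iteration on R^T R: each step costs O(#entries of R), which is
    # how the first singular vector can be found in (roughly) linear time.
    v = np.random.default_rng(seed).standard_normal(R.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = R.T @ (R @ v)
        v /= np.linalg.norm(v)
    return v

def ipm_select(A, K):
    """Select K structure-preserving representatives (row indices of A).

    Illustrative sketch of Iterative Projection and Matching, assuming
    A is an (M, N) matrix with one sample per row.
    """
    R = A.astype(float).copy()            # data projected away from picks
    selected = []
    for _ in range(K):
        # Projection step: relaxed problem (||v|| = 1), solved by the first
        # right singular vector of the current projected data.
        v = first_right_singular_vector(R)
        # Matching step: re-impose "v must be a data sample" by picking the
        # sample whose projected direction is most correlated with v.
        norms = np.linalg.norm(R, axis=1)
        corr = np.abs(R @ v) / np.maximum(norms, 1e-12)
        corr[selected] = -np.inf          # never re-select a sample
        idx = int(np.argmax(corr))
        selected.append(idx)
        # Remove the span of the chosen sample before the next iteration.
        u = R[idx] / np.linalg.norm(R[idx])
        R -= np.outer(R @ u, u)
    return selected
```

Each iteration projects the data away from the chosen sample, so later selections capture directions not yet represented.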

Theoretical Guarantees:
- Existence: at least one sample with high correlation with the first singular vector exists.
- Robustness to noise: the first right singular vector is the least susceptible to noise compared to all the other singular vectors.
- Linear complexity: the first singular vector can be calculated in linear time.

Experiments: Active learning on UCF-101; learning using representatives on CMU Multi-PIE and ImageNet; video summarization on UTE Egocentric.

Task I: Active Learning. Addresses the costly data-labeling problem. The loop: train the model on the labeled training set, extract features and/or uncertainty scores for the unlabeled data, select samples, and send them to an oracle for labeling.

Task I: Active Learning. Dataset: UCF-101 (13,320 action instances from 101 human action classes; the average duration of each video is about 7 seconds). Model: 3D ResNet-18 architecture, pretrained on the Kinetics-400 dataset. Feature space: convolutional features from the last convolutional layer.
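For concreteness, one plausible way to obtain such clip features is sketched below with torchvision's r3d_18 (a 3D ResNet-18 with Kinetics-400 weights); the slides do not name the exact toolchain, so treat this as an assumption.

```python
import torch
from torchvision.models.video import r3d_18

model = r3d_18(pretrained=True)          # 3D ResNet-18, Kinetics-400 weights
model.fc = torch.nn.Identity()           # expose the pooled last-conv features
model.eval()

clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, frames, H, W)
with torch.no_grad():
    feat = model(clip)                   # 512-d feature vector per clip
```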

Task I: Active Learning (UCF-101). During the first few cycles, the classifier cannot yet generate reliable uncertainty scores, so uncertainty-based selection does not lead to a performance gain. IPM, on the other hand, is able to select the critical samples and outperforms the other methods.

Task I: Active Learning (UCF-101). [Figure: frames of the first representative selected by DS3 and by IPM for the Clean and Jerk (lifting) and Kayaking classes.] In general, in the clips selected by IPM, the critical features of the action, such as the barbell and the kayak, are more visible and/or the bounding box of the action is larger.

Task I: Active Learning (UCF-101). [Figure: 2D visualization of two classes of the UCF-101 dataset (Knitting, PlayingFlute).]

Task II: Learning Using Representatives. Find the representatives and use only them for learning. If the selected samples contain enough information, performance will not deteriorate, while computation and storage are saved.

Task II: Learning Using Representatives. Dataset: Multi-PIE (249 subjects; 9 poses, 20 illuminations, and two expressions, i.e., 9 × 20 × 2 = 360 images per subject). Feature space: 200-dimensional space using PCA.
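A minimal sketch of constructing that feature space, assuming flattened face images as input (the image resolution and random data here are placeholders, not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(249 * 360, 64 * 64)   # placeholder: flattened face images
pca = PCA(n_components=200)
features = pca.fit_transform(X)          # 200-d features used for selection
# e.g. 9 representatives per subject: ipm_select(features[subject_rows], K=9)
```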

Task II: Learning Using Representatives (GAN). Multi-view face generation using CR-GAN, trained on a reduced training set (9 images per subject) and on the full dataset (360 images per subject).

Task II: Learning Using Representatives (GAN). Quantitative performance evaluation: identity similarity between the real and generated images, using 256-dimensional features from a ResNet-18 trained on MS-Celeb-1M; distances between features correspond to face dissimilarity.
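As a sketch, the dissimilarity score could be the distance between the two identity embeddings; the slide does not state the exact distance, so Euclidean on normalized features is our assumption:

```python
import numpy as np

def face_dissimilarity(f_real, f_gen):
    # Distance between 256-d identity embeddings of a real and a generated
    # face; smaller means identity is better preserved. The normalization
    # is our assumption, not stated on the slide.
    f_real = f_real / np.linalg.norm(f_real)
    f_gen = f_gen / np.linalg.norm(f_gen)
    return float(np.linalg.norm(f_real - f_gen))
```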

Task II: Learning Using Representatives. [Figure: representatives selected by K-medoids, DS3, and IPM.] IPM selects images from 10 different angles, while the images selected by DS3 and K-medoids contain repetitious angles.

Task II: Learning Using Representatives (ImageNet). Dataset: ImageNet (1000 classes, 700-1300 images per class). Feature space: 128-dimensional space using non-parametric instance discrimination (unsupervised). Only IPM and k-medoids were able to generate results at this scale.

Task II: Learning Using Representatives (ImageNet). [Figure: selecting 5 representatives per class, IPM vs. K-medoids.]

Task II: Learning Using Representatives (ImageNet). How well do the selected images cover the space? Measured by k-NN classification accuracy using only the selected representatives; for reference, accuracy using all the labeled data (1.2M samples) is 46.86%.
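A sketch of that coverage evaluation with scikit-learn; the variable names and the neighbor count are ours, not from the slides:

```python
from sklearn.neighbors import KNeighborsClassifier

def coverage_accuracy(feats_train, labels_train, feats_test, labels_test):
    # Fit k-NN on the selected representatives only, then classify held-out
    # samples in the 128-d feature space; high accuracy = good coverage.
    knn = KNeighborsClassifier(n_neighbors=1)   # neighbor count: our choice
    knn.fit(feats_train, labels_train)
    return knn.score(feats_test, labels_test)
```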

Task III: Video Summarization. The goal is to select key clips and create a video summary that contains the most essential content of the video. Example: a two-minute summary of the first video of the UTE Egocentric dataset (232 minutes long), built from 24 selected five-second clips. The selected scenes cover the story of the whole video, which is about 4 hours long.

Task III: Video Summarization. Dataset: UT Egocentric (UTE) dataset (4 first-person videos of 3-5 hours of daily activities). Feature space: 1024-dimensional feature vectors extracted using GoogLeNet. Pipeline: video clip → feature extraction → clustering → selection → video summary.
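Gluing the pieces together, a hypothetical sketch of the feature-extraction stage with torchvision's GoogLeNet, feeding the ipm_select sketch from earlier; the frame pooling is our guess and the clustering stage is elided:

```python
import numpy as np
import torch
from torchvision.models import googlenet

def extract_clip_features(clips):
    # One 1024-d GoogLeNet feature per clip, obtained here by average-pooling
    # per-frame features; the slides do not specify the pooling scheme.
    net = googlenet(pretrained=True)
    net.fc = torch.nn.Identity()         # expose the 1024-d pooled features
    net.eval()
    feats = []
    with torch.no_grad():
        for frames in clips:             # frames: (T, 3, 224, 224) tensor
            feats.append(net(frames).mean(dim=0).numpy())
    return np.stack(feats)

# keep = ipm_select(extract_clip_features(clips), K=24)   # 24 x 5 s = 2 min
```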

Task III: Video Summarization F-measure and recall scores using ROUGE-SU metric.

Conclusions. IPM: Iterative Projection and Matching.
- Linear complexity with respect to the number of data points.
- Robust to outliers.
- No parameters to fine-tune.
- The superiority of IPM is shown in different applications.

Thank You