1
https://github.com/zaeemzadeh/IPM
Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision. Alireza Zaeemzadeh*, Mohsen Joneidi*, Nazanin Rahnavard, and Mubarak Shah. Joint work between the Communications and Wireless Networks Lab (CWNLAB) and the Center for Research in Computer Vision (CRCV), University of Central Florida. ∗ indicates shared first authorship.
2
Problem: Selecting K distinct representative samples from M available data points (K << M), while preserving the structure of the data.
3
Applications: Active learning, dataset summarization, video summarization, ...
4
Challenges
Computational complexity: methods based on convex relaxation are usually not feasible on large datasets.
Robustness to outliers: selection based on diversity alone is not robust to outliers.
Generalization: the selection algorithm should work for different problems and datasets, without much effort.
5
Intuition of Our Method
Eigenfaces are not among the data points, and SVD is computationally expensive. Our algorithm selects actual data samples according to the spectrum of the data.
6
Iterative Projection and Matching
7
Iterative Projection and Matching
What does this mean mathematically? Let T be a subset of the data samples and \pi_T(A) the projection of all data onto the span of the selected samples. Selecting the representatives amounts to a rank-K factorization A \approx UV in which V contains K rows of A, while there is no constraint on U. Assume we select one sample at a time, namely the one that best represents all of the data. Since this selection problem involves a combinatorial search and is not easy to tackle, we split it into two consecutive sub-problems. The first sub-problem relaxes the constraint that v_k be one of the data samples (v_k \in \mathcal{A}) to the moderate constraint \|v\| = 1; its solution is the first right singular vector. The second sub-problem re-imposes the underlying constraint by matching this singular vector to the closest data sample.
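A minimal NumPy sketch of one way to implement this iterate-project-and-match loop is shown below. This is not the authors' released code (see the GitHub link on the title slide for that); the exact matching and projection details here are illustrative assumptions.

```python
import numpy as np

def ipm_select(A, K):
    """Select K representative rows of A (M samples x d features).

    A minimal sketch: at each step, take the first right singular vector of the
    residual data (relaxed sub-problem), match it to the most-correlated residual
    sample (re-imposed constraint), then project the data onto the orthogonal
    complement of the selected sample before repeating.
    """
    R = np.asarray(A, dtype=float).copy()    # residual data matrix
    selected = []
    for _ in range(K):
        _, _, Vt = np.linalg.svd(R, full_matrices=False)
        v = Vt[0]                            # first right singular vector
        corr = np.abs(R @ v) / (np.linalg.norm(R, axis=1) + 1e-12)
        corr[selected] = -np.inf             # never pick the same sample twice
        idx = int(np.argmax(corr))
        selected.append(idx)
        u = R[idx] / (np.linalg.norm(R[idx]) + 1e-12)
        R -= np.outer(R @ u, u)              # project residual off the selected direction
    return selected

# Tiny usage example: 200 random points in 16 dimensions, pick 5 representatives.
A = np.random.randn(200, 16)
print(ipm_select(A, K=5))
```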
8
Theoretical Guarantees
Linear complexity: the first singular vector can be calculated in linear time.
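The linear-time claim rests on the fact that the leading right singular vector can be obtained with a few power iterations, each touching every data point only once. A hedged NumPy sketch (the iteration cap and tolerance below are arbitrary choices, not values from the paper):

```python
import numpy as np

def first_right_singular_vector(A, iters=50, tol=1e-8):
    """Power iteration on A^T A; cost per iteration is O(M*d), linear in the data."""
    v = np.random.randn(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.T @ (A @ v)            # two matrix-vector products, no full SVD needed
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            return w
        v = w
    return v
```

In the selection sketch shown earlier, this routine could replace the full SVD call, keeping each selection step linear in the number of samples.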
9
Theoretical Guarantees
Robustness to noise: the first right singular vector is the least susceptible to noise compared to all the other singular vectors.
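A small NumPy experiment (illustrative only, not from the slides) that hints at why the leading right singular vector is the most stable under additive noise:

```python
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((500, 5)))
V, _ = np.linalg.qr(rng.standard_normal((64, 5)))
A = (U * [20.0, 10.0, 5.0, 2.5, 1.25]) @ V.T      # rank-5 data with a decaying spectrum
noisy = A + 0.05 * rng.standard_normal(A.shape)    # additive Gaussian noise

_, _, Vt = np.linalg.svd(A, full_matrices=False)
_, _, Vt_n = np.linalg.svd(noisy, full_matrices=False)

# |cosine| between clean and noisy right singular vectors; the leading one stays
# closest to 1, i.e. it is perturbed the least by the noise.
for k in range(5):
    print(k + 1, round(abs(Vt[k] @ Vt_n[k]), 4))
```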
10
Theoretical Guarantees
Existence of a highly correlated sample: at least one sample with high correlation with the first singular vector is guaranteed to exist.
11
Experiments
Active learning on UCF-101
Learning using representatives: CMU Multi-PIE, ImageNet
Video summarization on UTE Egocentric
12
Task I: Active Learning
Addresses the costly data labeling problem. Active learning loop: train the model on the labeled training set, extract features and/or uncertainty scores for the unlabeled data, select samples, query the oracle for their labels, and repeat.
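A schematic of that loop in Python-style pseudocode; the helpers train_model, extract_features, ipm_select, and the oracle callback are placeholders, not APIs from the paper's code:

```python
def active_learning(labeled, unlabeled, oracle, budget_per_cycle, n_cycles):
    """Schematic active-learning loop: train, embed/score the unlabeled pool,
    select a batch of representatives, query the oracle, and repeat."""
    model = train_model(labeled)
    for _ in range(n_cycles):
        feats = extract_features(model, unlabeled)        # features and/or uncertainty scores
        picked = ipm_select(feats, K=budget_per_cycle)    # structure-preserving selection
        labeled += [(unlabeled[i], oracle(unlabeled[i])) for i in picked]
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in set(picked)]
        model = train_model(labeled)                      # retrain on the enlarged set
    return model
```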
13
Task I: Active Learning
Dataset: UCF-101, 13,320 action instances from 101 human action classes; the average duration of each video is about 7 seconds.
Model: 3D ResNet-18 architecture, pretrained on the Kinetics-400 dataset.
Feature space: convolutional features from the last convolutional layer.
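A hedged sketch of how such clip features might be pulled from torchvision's Kinetics-pretrained 3D ResNet-18; the layer choice, clip size, and pooling below are assumptions about the setup, not a reproduction of the paper's pipeline:

```python
import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1).eval()

feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o))

# One dummy clip: batch of 1, 3 channels, 16 frames, 112x112 crops.
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    model(clip)

# Global-average-pool the last convolutional feature map into one vector per clip.
feature = feats["out"].mean(dim=[2, 3, 4])   # shape: (1, 512)
print(feature.shape)
```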
14
Task I: Active Learning (UCF101)
During the first few cycles, since the classifier is not yet able to generate reliable uncertainty scores, uncertainty-based selection does not lead to a performance gain. IPM, on the other hand, is able to select the critical samples and outperforms the other methods.
15
Task I: Active Learning (UCF101)
Frames of the first selected representative for the Clean and Jerk (lifting) and Kayaking classes, as chosen by DS3 and IPM. In general, in the clips selected by IPM, the critical features of the action, such as the barbell and the kayak, are more visible and/or the bounding box of the action is larger.
16
Task I: Active Learning (UCF101)
2D visualization of two classes of the UCF-101 dataset (Knitting and PlayingFlute).
17
Task II: Learning Using Representatives
Find the representatives and use them for learning. If the selected samples contain enough information, performance will not deteriorate, while computation and storage are saved.
18
Task II: Learning Using Representatives
Dataset: Multi-PIE, 249 subjects, 9 poses, 20 illuminations, and two expressions, i.e., 9 × 20 × 2 = 360 images per subject.
Feature space: 200-dimensional space using PCA.
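A 200-dimensional PCA feature space of this kind can be obtained with scikit-learn; a generic sketch, assuming images are flattened into rows of X (random placeholder data stands in for the Multi-PIE images):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: one flattened face image per row, e.g. 360 images per subject stacked together.
X = np.random.rand(3600, 32 * 32)            # placeholder data in lieu of Multi-PIE images
features = PCA(n_components=200).fit_transform(X)
print(features.shape)                        # (3600, 200)
```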
19
Task II: Learning Using Representatives (GAN)
Multi-view face generation using CR-GAN, trained on a reduced training set (9 images per subject) and on the full dataset (360 images per subject).
20
Task II: Learning Using Representatives (GAN)
Quantitative performance investigation: identity similarity between the real and generated images, measured with 256-D features from a ResNet-18 trained on MS-Celeb-1M. Distances between features correspond to face dissimilarity.
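One common way to turn such embeddings into an identity-similarity score is cosine similarity between the real and generated face features. A minimal sketch under that assumption (the embedding network itself is assumed to be available elsewhere; random vectors stand in for its outputs):

```python
import numpy as np

def identity_similarity(real_feat, gen_feat):
    """Cosine similarity between two 256-D face embeddings; higher means more similar identity."""
    real = real_feat / np.linalg.norm(real_feat)
    gen = gen_feat / np.linalg.norm(gen_feat)
    return float(real @ gen)

# Example with random stand-ins for the embeddings of a real and a generated image.
print(identity_similarity(np.random.randn(256), np.random.randn(256)))
```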
21
Task II: Learning Using Representatives
Representatives selected by K-medoids, DS3, and IPM. IPM selects from 10 different angles, while the images selected by DS3 and K-medoids contain repetitious angles.
22
Task II: Learning Using Representatives (ImageNet)
Dataset: ImageNet, 1,000 classes (1.2M labeled images in total).
Feature space: 128-dimensional space using non-parametric instance discrimination (unsupervised).
At this scale, only IPM and k-medoids were able to generate results.
23
Task II: Learning Using Representatives (ImageNet)
Selecting 5 representatives per class: example representatives chosen by IPM and by K-medoids.
24
Task II: Learning Using Representatives (ImageNet)
How well do the selected images cover the space? K-NN classification accuracy using only the selected representatives; for reference, the accuracy using all the labeled data (1.2M samples) is 46.86%.
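Coverage here is measured by fitting a K-NN classifier on only the selected representatives and evaluating it on held-out data. A scikit-learn sketch under those assumptions, with random placeholder features standing in for the 128-D ImageNet embeddings (5 representatives per class x 1,000 classes = 5,000 training points):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
rep_feats, rep_labels = rng.standard_normal((5000, 128)), rng.integers(0, 1000, 5000)
test_feats, test_labels = rng.standard_normal((2000, 128)), rng.integers(0, 1000, 2000)

knn = KNeighborsClassifier(n_neighbors=1).fit(rep_feats, rep_labels)
accuracy = knn.score(test_feats, test_labels)   # fraction of correctly classified test samples
print(accuracy)
```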
25
Task III: Video Summarization
The goal is to select key clips and create a video summary that contains the most essential content of the video. For a two-minute summary of the first video of the UTE Egocentric dataset (232 minutes long), short clips are selected. The selected scenes cover the story of the whole video, which is about 4 hours long.
26
Task III: Video Summarization
Dataset: UT Egocentric (UTE) dataset, containing 4 first-person videos of 3-5 hours of daily activities.
Feature space: 1024-dimensional feature vectors extracted using GoogLeNet.
Pipeline: video clip → feature extraction → clustering → selection → video summary.
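The pipeline on this slide might be sketched as follows; the helpers googlenet_features, cluster_clips, and ipm_select are placeholders standing in for the actual feature extractor, clustering step, and selection step, and the per-cluster budget rule is an assumption:

```python
def summarize(video_clips, n_keyclips):
    """Schematic video-summarization pipeline: features -> clustering -> selection -> summary."""
    feats = [googlenet_features(clip) for clip in video_clips]    # 1024-D vector per clip
    clusters = cluster_clips(feats)                                # lists of clip indices
    per_cluster = max(1, n_keyclips // len(clusters))
    selected = []
    for cluster in clusters:                                       # representatives per cluster
        local = ipm_select([feats[i] for i in cluster], K=per_cluster)
        selected += [cluster[j] for j in local]                    # map back to global indices
    return [video_clips[i] for i in sorted(selected)]              # keep chronological order
```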
27
Task III: Video Summarization
F-measure and recall scores using the ROUGE-SU metric.
28
Conclusions IPM: Iterative Projection and Matching
Linear complexity w.r.t. the number of data points. Robustness to outliers. No parameters to fine-tune. The superiority of IPM is demonstrated across different applications.
29
Thank You