
1 Computational Models for Multimedia Pattern Recognition
Shayok Chakraborty, Assistant Professor, Department of Computer Science, Florida State University (FSU)

2 Outline
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions

3 What is Machine Learning?
Sensors and actuators in today's world generate humongous amounts of digital data:
- Produces about … frames per second
- Processes about 100 petabytes of data per day
- About 300 hours of video uploaded every minute

4 What is Machine Learning? (continued)
Machine Learning is used to develop predictive models:
- Organize, categorize, and make sense of large volumes of data
- Develop a model that is a good and useful approximation to the data
- Predict the future behavior of a system by making use of past data (e.g., weather forecasting)

5 Challenges
Each challenge motivates a corresponding technique:
- Data Annotation: Active Learning
- Scanning the Database: Data Summarization
- Efficient Storage and Retrieval: Hash Code Learning
- Learning the Best Set of Features: Deep Learning

6 Outline
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions

7 Active vs. Passive Learning
A passive learner receives input data from the world and outputs a model / classifier. An active learner instead issues queries to the world, receives responses, and uses them to build the model / classifier.
Tong and Koller, JMLR 2000

8 Active Learning: Applications
- Text Categorization
- Image Retrieval
- Spam Filter Design
- Face Recognition
- Optical Character Recognition (OCR)
- Medical Image Classification
- Natural Language Parsing

9 Categories of Active Learning
Representative references for the different categories:
- Sculley, CEAS 2007; Monteleoni, CVPR 2007
- Guo and Schuurmans, NIPS 2007; Guo, NIPS 2010; Hoi, ICML 2006
- Tong and Koller, JMLR 2000; Cohn et al., JAIR 1996; Holub et al., CVPR 2008

10 Batch Mode Active Learning: Illustration
The classifier selects a batch from the unlabeled pool of points; a human annotator labels the batch; the labeled points are added to the training set; and the classifier update module retrains the model.

11 Problem Formulation
Given: a training set Lt and an unlabeled set Ut at time t.
Goal: select a batch B containing k unlabeled points so as to maximize the performance of the future learner.
Basic idea:
- Information criterion: select a batch of points that furnish high information.
- Redundancy criterion: select a diverse batch of points.
Metric for information: Shannon's entropy (larger entropy implies more information).
Metric for redundancy: Kullback-Leibler (KL) divergence (a high KL divergence between two points means they are less redundant with each other). A sketch of both metrics follows.
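A minimal sketch (not the paper's code) of the two selection criteria, assuming `probs` is an (n, C) array of class-posterior probabilities produced by the current classifier for the n unlabeled samples:

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each unlabeled sample; larger => more informative."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def kl_divergence_matrix(probs, eps=1e-12):
    """Pairwise KL divergences KL(p_i || p_j); larger => less redundancy."""
    n = probs.shape[0]
    log_p = np.log(probs + eps)
    R = np.zeros((n, n))
    for i in range(n):
        # KL(p_i || p_j) = sum_c p_ic * (log p_ic - log p_jc), for all j at once
        R[i] = np.sum(probs[i] * (log_p[i] - log_p), axis=1)
    return R
```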

12 Problem Formulation: An Integer Quadratic Program
For a set of n unlabeled samples:
- Compute a vector c ∈ R^(n×1) quantifying the entropy of every unlabeled sample.
- Compute a matrix R ∈ R^(n×n) quantifying the divergence between every pair of samples.
- Define a binary vector m ∈ {0, 1}^(n×1), where m_i denotes whether the unlabeled sample x_i is selected in the batch (m_i = 1) or not (m_i = 0).
Active batch selection can then be posed as an optimization over m that rewards high entropy and high pairwise divergence within the batch; combining R and c into a single matrix D yields an NP-hard Integer Quadratic Programming (IQP) problem.
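A hedged sketch of the batch-selection IQP. The exact way the paper combines c and R into D is not reproduced here; the objective and the trade-off weight `lam` below are illustrative assumptions, and the brute-force search only demonstrates why the problem is intractable at scale:

```python
import itertools
import numpy as np

def select_batch_bruteforce(c, R, k, lam=1.0):
    """Exact IQP solution for tiny n (exponential time), for illustration only.
    Maximizes batch entropy plus lam * pairwise divergence within the batch."""
    n = len(c)
    best_score, best_batch = -np.inf, None
    for batch in itertools.combinations(range(n), k):
        m = np.zeros(n)
        m[list(batch)] = 1.0
        score = c @ m + lam * m @ R @ m   # information + diversity (assumed form)
        if score > best_score:
            best_score, best_batch = score, batch
    return best_batch
```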

13 BatchRank: Convex Relaxation 1
Lemma 1: The integer quadratic programming (IQP) problem can be simplified into an equivalent integer linear programming (ILP) problem.
Lemma 2: The convex LP relaxation of the ILP is equivalent to a ranking of the entries in the matrix D.
Theorem 1: Let m* be the solution to the original NP-hard IQP (the minimizer of the associated objective) and m' be the solution obtained by the convex LP relaxation. Then the objective value of m' is within a provable factor of that of m*: a deterministic performance guarantee.
The Iterative Truncated Power algorithm can be used to further improve the solution quality, with guaranteed monotonic convergence; a sketch follows.
S. Chakraborty, V. Balasubramanian, Q. Sun, S. Panchanathan, J. Ye, IEEE TPAMI 2015
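A loose sketch of the general truncated power iteration (in the style of Yuan and Zhang) for refining a k-sparse batch indicator vector; the update rule and the handling of D's sign here are assumptions, not the paper's exact procedure:

```python
import numpy as np

def truncated_power(D, m0, k, iters=50):
    """Iteratively improve a k-sparse binary candidate for max m^T D m."""
    m = m0.astype(float)
    for _ in range(iters):
        s = D @ m                    # ascent direction of the quadratic form
        top = np.argsort(s)[-k:]     # keep the k largest entries
        m_new = np.zeros_like(m)
        m_new[top] = 1.0
        if np.array_equal(m_new, m): # fixed point reached
            break
        m = m_new
    return m
```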

14 BatchRand: Convex Relaxation 2
The original NP-hard IQP is reformulated through a change of variables.

15 BatchRand: Convex Relaxation 2 (continued)
- Relax each variable y to a multi-dimensional vector v.
- Solve for the vectors v from a semidefinite programming (SDP) formulation.
- Take a random unit vector r uniformly distributed on the unit sphere.
- Compute the dot product of r with all the vectors v.
- Select the unlabeled samples whose vectors v yield a positive dot product with r.
This rounding is similar to the MAX-CUT relaxation [Goemans and Williamson, 1995]; a sketch of the rounding step follows the theorem.
Theorem 2: Let W denote the value of the objective function produced by BatchRand and E(W) denote its expectation. Also, let D_total denote the sum of all entries in the matrix D. Then E(W) is lower-bounded in terms of D_total: a probabilistic performance guarantee.
S. Chakraborty, V. Balasubramanian, Q. Sun, S. Panchanathan, J. Ye, IEEE TPAMI 2015
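A compact sketch of the random-hyperplane rounding step, as in Goemans-Williamson MAX-CUT rounding. It assumes `V` holds the unit vectors returned by an SDP solver, one row per unlabeled sample:

```python
import numpy as np

def round_sdp_solution(V, rng=None):
    """Select samples whose SDP vector has a positive dot product with a
    random unit vector r drawn uniformly from the sphere."""
    rng = np.random.default_rng(rng)
    r = rng.standard_normal(V.shape[1])
    r /= np.linalg.norm(r)           # normalized Gaussian => uniform direction
    return np.where(V @ r > 0)[0]    # indices selected into the batch
```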

16 Sample Results: Active Learning Performance
[Plots: active learning performance on face recognition datasets]
S. Chakraborty, V. Balasubramanian, Q. Sun, S. Panchanathan, J. Ye, IEEE TPAMI 2015

17 Sample Results: Computation Time Analysis
Average time (in seconds) required to query a batch of points from an unlabeled pool:

Dataset      | Unlabeled size | Random | Most Uncertain | Fisher | Disc  | Matrix | BatchRank | BatchRand
VidTIMIT     | 1000           | 0.01   | 1.78           | 3.92   | 923.6 | 171.8  | 14.27     | 26.08
MOBIO        | n/a            | n/a    | 1.18           | 2.37   | 757.4 | 204.5  | 11.92     | 24.69
Mind Reading | n/a            | n/a    | n/a            | 1.48   | 88.71 | 12.23  | 6.19      | 13.32
MMI          | n/a            | n/a    | 0.44           | 1.86   | 73.94 | 129.7  | 6.43      | 11.94

S. Chakraborty, V. Balasubramanian, Q. Sun, S. Panchanathan, J. Ye, IEEE TPAMI 2015

18 Active Learning: Other Problem Variants
- Dynamic Batch Mode Active Learning: compute the batch size adaptively based on the complexity of a data stream (IEEE CVPR 2011, IEEE TNNLS 2015, US Patent)
- Batch Mode Active Learning for Hierarchical Classification: batch selection framework for hierarchical classification problems, a continuation of work done at MSR (ACM SIGKDD 2015)
- Optimal Batch Selection for Multi-label Learning: BMAL framework for multi-label classification problems (ACM Multimedia 2011)
- Active Matrix Completion: select the most informative entries to complete a low-rank matrix (IEEE ICDM 2013)

19 Outline
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions

20 Video Summarization
- Extracts the salient and informative frames from a video
- Provides an easily interpreted synopsis
- Reduces viewing time
- Storage efficient
Other applications: security and surveillance, traffic monitoring, sports highlights, news videos, documentaries, music videos

21 Categories of Video Summarization
- Offline Summarization [Shroff, IEEE TMM 2010; Khosla, CVPR 2013; Gong, NIPS 2014; Chu, CVPR 2015]
- Distributed Summarization [Ioannis, ICIST 2010]
- Online Summarization [Almageed, ICIP 2008; Almeida, PR Letters 2012; Almeida, JVCIR 2013]

22 Problem Formulation
Given: a video V with |V| frames and a summary length k.
Objective: select a subset S of size k containing the most informative frames in V.
Proposed approach: select the summary S such that the frames in the summary are:
- Exhaustive: they capture a large portion of the events in the video.
- Mutually exclusive: the summary frames are non-redundant (distinct from each other).
The exhaustive score of a summary S is quantified in terms of the similarity between frames i and j; the mutually exclusive score is quantified in terms of the distance between frames i and j. A sketch of plausible instantiations follows.
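A minimal sketch of plausible instantiations of the two scores under stated assumptions: exhaustiveness as coverage computed from a frame-similarity matrix `sim`, and mutual exclusivity as total pairwise distance from a matrix `dist`. Both matrices, and these exact functional forms, are assumptions for illustration rather than the paper's equations:

```python
import numpy as np

def exhaustive_score(sim, S):
    """Coverage: how well summary S represents every frame of the video.
    sim is a (|V|, |V|) frame-similarity matrix, S a list of summary indices."""
    return np.sum(np.max(sim[:, S], axis=1))

def exclusive_score(dist, S):
    """Diversity: total pairwise distance among summary frames (non-redundancy)."""
    S = list(S)
    return sum(dist[i, j] for a, i in enumerate(S) for j in S[a + 1:])
```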

23 Proposed Approach
The net utility score of a summary set S combines its exhaustive and mutually exclusive scores, and summary selection can be posed as maximizing this utility over all subsets of size k. The search space is exponentially large, so exhaustive techniques are prohibitive. However, the utility function is submodular: it satisfies the diminishing-returns property, i.e., adding a frame x to a smaller set A increases the utility at least as much as adding x to a larger set B that contains A.

24 Proposed Approach (continued)
- Offline summarization: use the standard greedy algorithm (Nemhauser et al., 1978) to solve the submodular maximization problem; it produces a solution guaranteed to be within 1 - 1/e ≈ 63% of the optimum (a sketch follows this list).
- Online summarization: the Sieve-Streaming algorithm (Badanidiyuru et al., KDD 2014) maximizes a monotone submodular function in the online setting with provable performance guarantees.
- Distributed summarization: the GreeDi framework for distributed submodular maximization (Mirzasoleiman et al., NIPS 2013) gives a provable performance guarantee with respect to the optimum centralized solution.
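A minimal sketch of the standard greedy algorithm of Nemhauser et al. (1978) for maximizing a monotone submodular function under a cardinality constraint; `f` is assumed to be a set-function handle such as the utility sketched earlier:

```python
def greedy_summary(f, n_frames, k):
    """Pick k frames, each time adding the frame with the largest marginal gain.
    Guarantees f(S) >= (1 - 1/e) * f(OPT) for monotone submodular f."""
    S = []
    for _ in range(k):
        gains = [(f(S + [v]) - f(S), v) for v in range(n_frames) if v not in S]
        _, best = max(gains)   # frame with the largest marginal gain
        S.append(best)
    return S
```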

25 Sample Results: Offline Summarization
S. Chakraborty, O. Tickoo, R. Iyer, IEEE WACV 2015

26 Sample Results: Distributed Summarization
S. Chakraborty, O. Tickoo, R. Iyer, ACM Multimedia 2015

27 Sample Results: Online Summarization
S. Chakraborty, O. Tickoo, R. Iyer, S. Panchanathan, IJCV (Under Review)

28 Sample Results: User Study
[Panels: user-study results for offline, distributed, and online summarization]

29 Outline
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions

30 Domain Adaptation
Domain disparity arises across applications:
- Object recognition
- Sentiment analysis: "On the edge of my seat all through. Absolute thriller." vs. "Delightful service. Delicious food."
- Handwritten digit recognition: MNIST vs. USPS
- Facial expression recognition

31 Domain Adaptation using Hierarchical Features
Domain Adaptive Hash (DAH) network:
- Supervised hash loss for source data
- MMD alignment for the fully connected layers
- Unsupervised entropy loss for target data
H. Venkateswara, J. Eusebio, S. Chakraborty, S. Panchanathan, IEEE CVPR 2017

32 Domain Adaptation using Hierarchical Features (continued)
Supervised hash loss for the source: encourages hash codes of source samples from the same class to be similar, and codes from different classes to be dissimilar.
MMD alignment loss for source and target: penalizes the Maximum Mean Discrepancy between the source and target feature distributions in the fully connected layers. A sketch of an RBF-kernel MMD penalty follows.
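A hedged sketch of a Gaussian-kernel MMD penalty between source and target activations of a fully connected layer. The paper's multi-kernel variant and layer choices are not reproduced, and `bandwidth` is an assumed hyperparameter:

```python
import torch

def mmd_rbf(x_src, x_tgt, bandwidth=1.0):
    """Squared MMD between two batches of features (biased estimate)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2               # pairwise squared distances
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    k_ss = kernel(x_src, x_src).mean()
    k_tt = kernel(x_tgt, x_tgt).mean()
    k_st = kernel(x_src, x_tgt).mean()
    return k_ss + k_tt - 2 * k_st                 # 0 when distributions match
```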

33 Domain Adaptation using Hierarchical Features (continued)
Unsupervised entropy loss for the target: minimizes the entropy of the class-probability estimates of target samples, pushing each target sample toward a confident assignment to one of the source categories.
The overall objective combines the three terms: hash loss + MMD loss + entropy loss. A sketch of the entropy term and the combination follows.
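A sketch of the target entropy term and a weighted sum of the three losses. The weights `gamma` and `eta` are illustrative assumptions, not the paper's values, and `hash_loss` / `mmd_loss` stand in for the terms on the previous slides:

```python
import torch
import torch.nn.functional as F

def target_entropy_loss(logits_tgt):
    """Minimizing this pushes each target sample toward a confident
    (low-entropy) class assignment, aligning targets with source categories."""
    p = F.softmax(logits_tgt, dim=1)
    return -(p * torch.log(p + 1e-12)).sum(dim=1).mean()

def total_loss(hash_loss, mmd_loss, entropy_loss, gamma=1.0, eta=0.1):
    # overall objective: supervised hash loss + gamma * MMD + eta * entropy
    return hash_loss + gamma * mmd_loss + eta * entropy_loss
```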

34 Dataset for Domain Adaptation
Inadequacy of current datasets: they were created before the deep learning era and are small in size.
Office-Home dataset:
- 4 domains: Art, Clipart, Product, and Real-World
- About 15,000 images
- 65 categories

35 Sample Results: Recognition Accuracy
Recognition accuracies (%) for domain adaptation experiments on the Office-Home dataset: Art (Ar), Clipart (Cl), Product (Pr), Real-World (Rw). Ar → Cl implies Ar is the source domain and Cl the target.

36 Sample Results: Feature Analysis
A-distance between domain pairs: A = 2(1 - 2ε), where ε is the generalization error of a binary classifier trained to distinguish between the two domains. A sketch of this estimate follows.
[Plots: t-SNE embeddings of Deep features, DAN features, and DAH features for 10 categories from the Art (o) and Clipart (+) domains]
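A sketch of estimating the A-distance: train a binary classifier to separate the two domains and plug its held-out error into A = 2(1 - 2ε). Using a linear SVM and a 50/50 split here is an assumed choice for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def a_distance(feats_a, feats_b):
    X = np.vstack([feats_a, feats_b])
    y = np.r_[np.zeros(len(feats_a)), np.ones(len(feats_b))]  # domain labels
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    eps = 1.0 - LinearSVC(dual=False).fit(X_tr, y_tr).score(X_te, y_te)
    return 2.0 * (1.0 - 2.0 * eps)   # well-separated domains => small eps => A near 2
```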

37 Outline
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions

38 Deep Active Learning (Wang and Shang, IJCNN 2014)
- Train a deep neural network on the labeled training data using a conventional loss function.
- Pass each unlabeled sample through the network to obtain output probabilities with respect to all the classes.
- Compute measures of uncertainty from the probability values and query the sample with the maximal uncertainty.
Uncertainty measures: least confidence, margin sampling, and entropy sampling (sketched below).
Limitation: this does not leverage the feature learning capabilities of deep neural networks; the network is trained with a conventional loss function and merely reused for active learning. Can we use the feature learning capabilities of deep networks to develop a deep model that is specially tailored for the active learning task?
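A sketch of the three uncertainty measures computed from softmax outputs; `probs` is an (n, C) array of class probabilities for the unlabeled samples, and each measure is oriented so that a larger value means more uncertain:

```python
import numpy as np

def least_confidence(probs):
    return 1.0 - probs.max(axis=1)        # low top-class confidence => uncertain

def margin_sampling(probs):
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])   # small top-2 margin => uncertain

def entropy_sampling(probs, eps=1e-12):
    return -np.sum(probs * np.log(probs + eps), axis=1)

# query = np.argmax(entropy_sampling(probs))   # sample with maximal uncertainty
```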

39 Problem Formulation: Our Approach
Modify the loss function of the deep network to derive a function specific to active learning; the modified loss function consists of two terms:
- Cross-entropy loss for labeled data: ensures that the labeled training samples are correctly classified.
- Entropy loss for unlabeled data: ensures that the unlabeled data have minimal entropy.
The joint loss for active learning combines the cross-entropy and entropy losses; a sketch follows.
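A hedged sketch of the joint objective: cross-entropy on a labeled batch plus a weighted entropy term on an unlabeled batch. The weight `lam` and its schedule are assumptions, not the values from the ICIP 2017 paper:

```python
import torch
import torch.nn.functional as F

def joint_active_loss(logits_lab, labels, logits_unlab, lam=0.5):
    ce = F.cross_entropy(logits_lab, labels)              # labeled: classify correctly
    p = F.softmax(logits_unlab, dim=1)
    ent = -(p * torch.log(p + 1e-12)).sum(dim=1).mean()   # unlabeled: minimal entropy
    return ce + lam * ent
```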

40 Network Architecture

41 Deep Active Learning

42 Sample Results
H. Ranganathan, H. Venkateswara, S. Chakraborty, S. Panchanathan, IEEE ICIP 2017

43 Thank You
- Active Learning
- Video Summarization
- Deep Learning for Unsupervised Domain Adaptation
- Deep Active Learning
- Questions and Discussions


