Semantic Video Classification

Slides:

Advertisements

Similar presentations

Applications of one-class classification

Advertisements

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Generative Models Thus far we have essentially considered techniques that perform classification indirectly by modeling the training data, optimizing.

Image Retrieval: Current Techniques, Promising Directions, and Open Issues Yong Rui, Thomas Huang and Shih-Fu Chang Published in the Journal of Visual.

Face Recognition & Biometric Systems, 2005/2006 Face recognition process.

Robust Moving Object Detection & Categorization using self- improving classifiers Omar Javed, Saad Ali & Mubarak Shah.

A Study of Approaches for Object Recognition

ACM Multimedia th Annual Conference, October , 2004

1 Integrating User Feedback Log into Relevance Feedback by Coupled SVM for Content-Based Image Retrieval 9-April, 2005 Steven C. H. Hoi *, Michael R. Lyu.

Traditional Database Indexing Techniques for Video Database Indexing Jianping Fan Department of Computer Science University of North Carolina at Charlotte.

Presented by Zeehasham Rasheed

Data mining and statistical learning - lecture 13 Separating hyperplane.

Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.

Smart Traveller with Visual Translator for OCR and Face Recognition LYU0203 FYP.

Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.

Face Processing System Presented by: Harvest Jang Group meeting Fall 2002.

A fuzzy video content representation for video summarization and content-based retrieval Anastasios D. Doulamis, Nikolaos D. Doulamis, Stefanos D. Kollias.

Oral Defense by Sunny Tang 15 Aug 2003

DVMM Lab, Columbia UniversityVideo Event Recognition Video Event Recognition: Multilevel Pyramid Matching Dong Xu and Shih-Fu Chang Digital Video and Multimedia.

Computer vision: models, learning and inference Chapter 6 Learning and Inference in Vision.

Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.

CSE 185 Introduction to Computer Vision Pattern Recognition.

Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM) Guo-Jun Qi Beckman Institute University of Illinois at Urbana-Champaign.

Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation

Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.

Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.

Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.

IBM QBIC: Query by Image and Video Content Jianping Fan Department of Computer Science University of North Carolina at Charlotte Charlotte, NC 28223

A Two-level Pose Estimation Framework Using Majority Voting of Gabor Wavelets and Bunch Graph Analysis J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard,

PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.

Hierarchical Classification

1 Research Question  Can a vision-based mobile robot  with limited computation and memory,  and rapidly varying camera positions,  operate autonomously.

Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.

Boosted Particle Filter: Multitarget Detection and Tracking Fayin Li.

Image Classification for Automatic Annotation

Autonomous Robots Vision © Manfred Huber 2014.

Guest lecture: Feature Selection Alan Qi Dec 2, 2004.

GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.

Multimedia Analytics Jianping Fan Department of Computer Science University of North Carolina at Charlotte.

Machine Learning CUNY Graduate Center Lecture 6: Linear Regression II.

Jianping Fan Department of Computer Science University of North Carolina at Charlotte Charlotte, NC Relevance Feedback for Image Retrieval.

1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Course : T Computer Vision

Data Mining – Intro.

Computer vision: models, learning and inference

Convolutional Neural Network

ECE 417 Lecture 1: Multimedia Signal Processing

Automatic Video Shot Detection from MPEG Bit Stream

LECTURE 09: BAYESIAN ESTIMATION (Cont.)

Supervised Time Series Pattern Discovery through Local Importance

Personalized Social Image Recommendation

Mixture of SVMs for Face Class Modeling

Basic machine learning background with Python scikit-learn

Machine Learning Basics

Dynamical Statistical Shape Priors for Level Set Based Tracking

Object detection as supervised classification

Project Implementation for ITCS4122

Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

In summary C1={skin} C2={~skin} Given x=[R,G,B], is it skin or ~skin?

Pattern Recognition CS479/679 Pattern Recognition Dr. George Bebis

Image Segmentation Techniques

I don’t need a title slide for a lecture

Multimedia Information Retrieval

Ying Dai Faculty of software and information science,

John H.L. Hansen & Taufiq Al Babba Hasan

Paper Reading Dalong Du April.08, 2011.

Midterm Exam Closed book, notes, computer Similar to test 1 in format:

Midterm Exam Closed book, notes, computer Similar to test 1 in format:

Human-object interaction

Presentation transcript:

Semantic Video Classification Jianping Fan Department of Computer Science UNC-Charlotte, USA

Outline of presentation Traditional Approaches Deep Learning Approach

1. Research Motivation Challenges: Content Representation and Concept Interpretation Tusk – Spear Ear – Huge Fan Body -- Wall Tail -- Snake Trunk --- Tree Leg --- Pillar Most existing CBVR systems describe multimedia like six blind men!

Semantic Medical Concepts 1. Research Motivation Semantic Medical Concepts Semantic Gap Low-Level Multimodal Signals We focus on a medical education video domain!

2. Problem Definition Quality of features Video content representation Semantic video concept modeling Classifier training and feature selection Semantic video classification Knowledge Visualization for Video Access

3. Video Content Representation Video Shots Easy to implement Features may not be representative Semantic Video Objects Difficult to implement Features are representative How to take both advantages?

3. Video Content Representation Semantic Medical Concepts Gap 2 Semantic Gap Principal Video Shots Gap 1 Low-Level Multimodal Signals

3. Video Content Representation Definition of Principal Video Shot Video Shots Salient Objects of Interesting Principal Video Shot Physical Video Shot Concept-Sensitive Salient Objects

3. Video Content Representation Salient Object Definition Medical Video Stream Video Stream Audio Stream Surgery Objects Medical Equipment Video Text Medical Environment Human Voice Noise Music Taxonomy of Medical Video Components

3. Video Content Representation Salient object definition: semantic-sensitive, feature-invariant objects in the domain of interest General News Medical Face Microphone Visceral Videotext Crowd Blood

3. Video Content Representation Automatic Salient Object Detection

3. Video Content Representation Automatic Salient Object Detection Support Vector Machine is used for binary region classification!

3. Video Content Representation Automatic Salient Object Detection Salient Objects Human Face Lecture Slide Blood Regions Background Music Precision 90.3% 92.4% 90.2% 94.6% Recall 87.5% 89.6% 86.7% 81.2% Salient Objects Human Speech Human Skin Lecture Sketch Blue Cloth Precision 89.8% 96.3% 93.3% 96.7% Recall 82.6% 95.4% 88.5% 95.8%

3. Video Content Representation

3. Video Content Representation

3. Video Content Representation

3. Video Content Representation

3. Video Content Representation

3. Video Content Representation Global Visual Features for Video Shot Representation; Local Visual Features for Salient Object Representation

3. Video Content Representation Global Visual Features for Video Shot Representation Video Sequence Shot 1 Shot i Shot n Color HSV color histogram, dominant color, … Texture Edge histogram, wavelet coefficients, Tamura features, … Motion Directional motion histogram, Camera motion, … Other features

3. Video Content Representation Local Visual Features for Salient Object Representation Video Sequence Object 1 Object i Object n Color HSV color histogram, dominant color, … Texture Edge histogram, Tamura, …. Shape Rectangular box, moments, ….. Motion Trajectory, motion histogram, … Confidence Map

4. Semantic Video Concept Modeling Concept i Semantic Video Concept Nc Gap 2 PVS Class 1 PSV Class j PSV Class S Semantic Gap Gap 1 Low-Level Auditory Signal Low-Level Vision Signal Low-Level Image-Textual Signal

4. Semantic Video Concept Modeling

4. Semantic Video Concept Modeling

4. Semantic Video Concept Modeling How can we formulate this composition relationship (concept modeling)?

4. Semantic Video Concept Modeling How to model the data distributions for concepts?

4. Semantic Video Concept Modeling Atomic Video Concept Modeling Atomic Video Concept Type 1 PSV Type i PSV Type n PSV

4. Semantic Video Concept Modeling Hierarchical Video Concept Modeling Semantic Video Concept at the second level: Semantic Video Concept at the nth level:

5. Video Concept Learning for Classification Model Selection & Parameter Estimation We need techniques for estimating: Model Structure Model Parameters

5. Video Concept Learning for Classification Adaptive EM Algorithm (a) Start from a large K Perform automatic merging, splitting, & elimination © Incorporating negative samples for concept learning

5. Video Concept Learning for Classification Adaptive EM Algorithm Overlapped mixture components from same concept model: merging

5. Video Concept Learning for Classification Adaptive EM Algorithm Elongated mixture components from same concept model: splitting Sample Distributions Mixture Component

5. Video Concept Learning for Classification Adaptive EM Algorithm Tailed mixture components from different concept models: splitting & elimination Concept 1 Concept 2

5. Video Concept Learning for Classification Merging: Splitting: Elimination:

5. Video Concept Learning for Classification Normalization Acceptance Probability

5. Video Concept Learning for Classification Advantages Performing model selection and parameter estimation jointly in a single algorithm! Negative samples are incorporated for discriminative learning of finite mixture models with higher classification accuracy! © New approach for hierarchical classifier training by performing automatic merging, splitting, and elimination of components!

6. Learning with Unlabeled Samples Unlabeled Samples for Concept Learning Certain Unlabeled Samples Unlabeled Samples Used for Concept Learning Unlabeled Samples Informative Unlabeled Samples Uncertain Unlabeled Samples Outliers & New Concepts

6. Learning with Unlabeled Samples Unlabeled samples partitioning Certain unlabeled samples Outliers & New Concepts Informative unlabeled samples

6. Learning with Unlabeled Samples Unlabeled Samples for Concept Learning Confidence Score:

6. Learning with Unlabeled Samples Unlabeled Samples for Concept Learning Penalty Term: Certain Unlabeled Samples Uncertain Unlabeled Samples 1

6. Learning with Unlabeled Samples Birth of New Mixture Components Birth operation is added to capture unknown image context classes that are interpreted by informative unlabeled samples .

7. Hierarchical Video Concept Learning Second-Level Semantic Video Concept Parent Node Atomic Video Concept 1 Atomic Video Concept i Atomic Video Concept Children Nodes

7. Hierarchical Video Concept Learning Model Integration by combining mixture components from all its children nodes where the initial model structure is:

7. Hierarchical Video Concept Learning Elimination Merging

7. Hierarchical Video Concept Learning Splitting Conclusion: Using class distribution for high-level concept learning can achieve better performance than using their prediction outputs!

8. Semantic Video Classification Bayesian Rule for Video Classification:

8. Semantic Video Classification

8. Semantic Video Classification

8. Semantic Video Classification

9. Benchmark and Algorithm Evaluation Precision Missing-recall

9. Algorithm Evaluation Convergence of Adaptive EM Algorithm

9. Algorithm Evaluation Convergence of Adaptive EM Algorithm

9. Algorithm Evaluation Effectiveness of Adaptive EM Algorithm

9. Algorithm Evaluation Hierarchical Concept Learning More concept levels may reduce hypothesis variance! (b) More concept levels may increase risk of transmission of errors among levels!

9. Algorithm Evaluation Training with Unlabeled Samples

9. Benchmark Classification accuracy vs. representation Concepts Class Lecture Trauma Surgery Diagnosis Salient Objects 81.6%, 82.1% 82.3%, 81.9% 81.7%, 89.3% Video Shots 64.9%, 68.5% 64.3%, 65.7% 61.1%, 59.7% Gastroinstinal Surgery Concepts Dialog Burn Surgery Salient Objects 81.4%, 85.2% 85.1%, 89.3% 79.2%, 78.6% Video Shots 71.5%, 69.3% 67.5%, 68.2% 64.8%, 69.1%

9. Benchmark

10. Video Skimming

11. Video Privacy Protection

11. Video Privacy Protection

12. Semantic Video Concept Modeling via Support Vector Machines

12. Semantic Video Concept Modeling via Support Vector Machines positive samples negative samples positive support vectors margin negative support vectors

12. Semantic Video Concept Modeling via Support Vector Machines

13. SVM Classifier Training Support Vector Machine may be one solution, but.. -- Training cost is , M is sample size -- Curse of Dimensionality – large-scale training samples are needed for reliable SVM classifier training! High-dimensional feature space! -- Kernel functions should be able to characterize the underlying geometric properties of image data! Heterogeneous feature space!

13. SVM Classifier Training If RBF kernel is used, we need techniques for estimating the SVM parameters: C Coast to Fine Search

13. SVM Classifier Training Incremental SVM Classifier Training & Convergence ---How to incorporate informative unlabeled samples for classifier training? Batch-Based Approach Incremental Approach

13. SVM Classifier Training Unlabeled Samples for Classifier Training Weak SVM Classifier Certain Unlabeled Samples Incremental SVM Classifier Training Unlabeled Samples Informative Unlabeled Samples Uncertain Unlabeled Samples New Concepts & Outliers

14. Feature Partition and Selection Feature Heterogeneity – Different feature types RBF kernel cannot effectively characterize the underlying geometric properties of image data in such heterogeneous feature space More existing feature selection techniques can only work on homogeneous feature space! filtering, wrapping, boosting Feature correlation is ignored!

14. Feature Partition and Selection Semantic Video Concept Feature Selection by selecting the most relevant classifiers! High-Dimensional Visual Features Prior knowledge PCA Feature Subset i Feature Subset 1 Feature Subset n Boost on both training samples and features!

14. Feature Partition and Selection Advantages -- The dimensions for each feature subset are relative low, thus less training samples are needed --- speed up SVM classifier training -- Feature selection in heterogeneous space via classifier selection -- The geometric property of image data in each feature subset can be characterized effectively by using RBF kernels. --Hierarchical feature selection

15. Benchmark More is Less: Optimal Feature Subset

16. Hierarchical Video Database Summarization & Visualization

16. Hierarchical Video Database Summarization & Visualization Concept-Oriented Video Summarization

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Deep Learning for Video Classification

Multi-Modal Deep Learning for Video Classification

Multi-Modal Deep Learning for Video Classification

Multi-Modal Deep Learning for Video Classification

Multi-Modal Deep Learning for Video Classification

Multi-Modal Deep Learning for Video Classification