Unsupervised Mining of Statistical Temporal Structures in Video. Liu Zeyuan, May 15, 2011.

What purpose does Markov Chain Monte Carlo (MCMC) serve in this chapter? Quiz of the Chapter

 1 Introduction  1.1 Keywords  1.2 Examples  1.3 Structure discovery problem  1.4 Characteristics of video structure  1.5 Approach  2 Methods  Hierarchical Hidden Markov Models  Learning HHMM parameters with EM  Bayesian model adaptation  Feature selection for unsupervised learning  3 Experiments & Results  4 Conclusion Agenda

 Algorithms for discovering statistical structures and finding informative features from videos in an unsupervised setting.  Effective solutions to video indexing require detection and recognition of structures and events.  We focus on temporal structures. 1 Introduction

 Hierarchical Hidden Markov Model (HHMM)  Hidden Markov Model (HMM)  Markov Chain Monte Carlo (MCMC)  Dynamic Bayesian Network (DBN)  Bayesian Information Criterion (BIC)  Maximum Likelihood (ML)  Expectation Maximization (EM) 1.1 Introduction: keywords

 General to various domains and applicable at different levels  At the lowest level, repeating color schemes in a video  At the mid level, seasonal trends in web traffic  At the highest level, genetic functional regions 1.2 Introduction: examples

 The problem of identifying structure consists of two parts: finding and locating.  The former is referred to as training, while the latter is referred to as classification.  The Hidden Markov Model (HMM) is a discrete state-space stochastic model with efficient learning algorithms; it works well for temporally correlated data streams and has been applied successfully in many domains. However, such supervised approaches are restricted to specific domains, so we propose a new algorithm based on fully unsupervised statistical techniques. 1.3 Introduction: the structure discovery problem

 Fixed domain: audio-visual streams  The structures have the following properties:  video structures are in a discrete state space  features are stochastic  sequences are correlated in time  Focus on dense structures  Assumptions  Within events, states are discrete and Markov  Observations are associated with states under a Gaussian distribution (formalized below) 1.4 Introduction: Characteristics of Video Structure
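As a rough formalization of these assumptions (the notation is mine, not taken from the slides): with hidden state q_t and observed feature vector x_t at time t,

    P(q_t = j \mid q_{t-1} = i) = a_{ij}                      % discrete, Markov states within an event
    p(x_t \mid q_t = i) = \mathcal{N}(x_t;\, \mu_i, \Sigma_i)  % Gaussian observations given the state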

 Model the temporal dependencies in video and the generic structure of events in a unified statistical framework  Model recurring events in each video as HMMs within an HHMM, where state inference and parameter estimation are learned using EM  Develop algorithms to address the model selection and feature selection problems  Bayesian learning techniques for model complexity  Bayesian Information Criterion as model posterior  Filter-wrapper method for feature selection 1.5 Introduction: Approach

 Use a two-level hierarchical hidden Markov model  Higher-level elements correspond to semantic events and lower-level elements represent variations  A special case of the Dynamic Bayesian Network  Could be extended to more levels, and the feature distribution is not constrained to a mixture of Gaussians 2 Hierarchical Hidden Markov Models (HHMM)

2. Hierarchical Hidden Markov Models: Graphical Representation

 A generalization of the HMM with a hierarchical control structure.  Bottom-up structure 2 Hierarchical Hidden Markov Models: Structure of HHMM

 (1) supervised learning  (2) unsupervised learning  (3) a mixture of the above 2 Hierarchical Hidden Markov Models: Structure of HHMM: applications

 Multi-level hidden state inference with HHMM is O(T^3); however, this is not optimal, since an alternative algorithm achieves O(T).  A generalized forward-backward algorithm for hidden state inference  A generalized EM algorithm for parameter estimation with complexity O(DT·|Q|^{2D}). 2 Complexity of Inference and Learning with HHMM

 Representation of the states and the parameter set of an HHMM  The scope of EM is basic parameter estimation  The model size is given, and parameters are learned over a pre-defined feature set 2 Learning HHMM parameters with EM

2 Learning HHMM parameters with EM: representing an HHMM. The entire configuration of the hierarchical states from top to bottom is written as a single D-digit, N-ary integer. The whole parameter set Θ of an HHMM consists of the transition and emission parameters at each level.
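A minimal sketch of this integer state encoding, assuming each of the D levels has N possible sub-states indexed from 0 (names and layout are illustrative, not the paper's code):

    def encode_state(levels, N):
        """Pack a top-to-bottom list of D per-level state indices (each in 0..N-1)
        into a single D-digit base-N integer."""
        code = 0
        for q in levels:
            code = code * N + q
        return code

    def decode_state(code, N, D):
        """Invert encode_state: recover the per-level indices from the packed integer."""
        levels = []
        for _ in range(D):
            levels.append(code % N)
            code //= N
        return list(reversed(levels))

    assert decode_state(encode_state([1, 0, 2], N=3), N=3, D=3) == [1, 0, 2]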

2 Learning HHMM parameters with EM: Overview of EM algorithm
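As a rough illustration of the E/M structure, here is a minimal Baum-Welch sketch for a flat Gaussian-emission HMM; the HHMM version replaces the E-step with the generalized forward-backward pass mentioned earlier. This is a sketch under my own assumptions, not the paper's implementation:

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gaussian_hmm(X, K, n_iter=20, seed=0):
        """Baum-Welch (EM) for a flat K-state HMM with Gaussian emissions.
        X: (T, d) array of feature vectors. Illustrative sketch only."""
        X = np.asarray(X, dtype=float)
        T, d = X.shape
        rng = np.random.default_rng(seed)
        pi = np.full(K, 1.0 / K)                    # initial state distribution
        A = np.full((K, K), 1.0 / K)                # state transition matrix
        means = X[rng.choice(T, K, replace=False)]  # random observations as initial means
        base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-3 * np.eye(d)
        covs = np.array([base_cov.copy() for _ in range(K)])

        for _ in range(n_iter):
            # E-step: scaled forward-backward gives state posteriors (gamma)
            # and expected transition counts (xi).
            B = np.column_stack([multivariate_normal.pdf(X, means[k], covs[k]) for k in range(K)])
            alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
            alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[t]
                c[t] = alpha[t].sum(); alpha[t] /= c[t]
            for t in range(T - 2, -1, -1):
                beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
            gamma = alpha * beta
            gamma /= gamma.sum(axis=1, keepdims=True)
            xi = np.zeros((K, K))
            for t in range(T - 1):
                xi += np.outer(alpha[t], B[t + 1] * beta[t + 1]) * A / c[t + 1]

            # M-step: re-estimate parameters from expected sufficient statistics.
            pi = gamma[0]
            A = xi / xi.sum(axis=1, keepdims=True)
            for k in range(K):
                w = gamma[:, k]
                means[k] = (w[:, None] * X).sum(axis=0) / w.sum()
                diff = X - means[k]
                covs[k] = (w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(d)
        return pi, A, means, covs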

 Parameter learning for HHMM using EM is known to converge only to a local maximum, and it assumes a predefined model structure.  These drawbacks motivate adopting a Bayesian model adaptation scheme.  Use Markov Chain Monte Carlo (MCMC) to maximize the Bayesian Information Criterion (BIC) 2 Bayesian Model adaptation
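One standard form of the criterion being maximized (stated as a hedged reminder; the weighting factor lambda and the exact normalization used in the paper may differ):

    \mathrm{BIC} = \log P(X \mid \hat{\Theta}) \;-\; \frac{\lambda}{2}\, |\Theta| \log T

where |Θ| is the number of free parameters and T is the length of the observation sequence.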

 A class of algorithms designed to solve high-dimensional optimization problems  MCMC iterates between two steps  A proposal step draws a new model sample based on the current model and statistics of the data  A decision step computes an acceptance probability based on the fitness of the proposed new model  Converges to the global optimum 2 Overview of MCMC
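A generic sketch of one propose/accept iteration (Metropolis-Hastings style; the function names and log-domain scoring are my assumptions, not the paper's code):

    import math
    import random

    def mcmc_step(current, score, propose, rng=random):
        """One propose/accept step.
        score(model):   fitness to maximize, e.g. log-posterior or BIC.
        propose(model): returns (candidate_model, log_proposal_ratio)."""
        candidate, log_q_ratio = propose(current)
        log_alpha = score(candidate) - score(current) + log_q_ratio
        accept_prob = math.exp(min(0.0, log_alpha))  # acceptance probability, capped at 1
        return candidate if rng.random() < accept_prob else current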

 Model adaptation for HHMM involves an iterative procedure.  Based on the current model, compute a probability profile over the candidate moves EM, split(d), merge(d), and swap(d)  An acceptance formula then determines whether a proposed move is accepted 2 MCMC for HHMM
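A toy sketch of drawing one of those moves from such a probability profile (the probabilities below are placeholders; in the paper the profile depends on the current model):

    import random

    # Placeholder probability profile over the candidate moves.
    profile = {"em": 0.4, "split": 0.2, "merge": 0.2, "swap": 0.2}
    move = random.choices(list(profile), weights=list(profile.values()), k=1)[0]
    print("proposed move:", move)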

 Select a relevant and compact feature subset that fits the HHMM model  The task of feature selection is divided into two aspects:  eliminating irrelevant features  eliminating redundant features 2 Feature selection for unsupervised learning

 Suppose the feature pool is a discrete set, e.g. F = {f_1, …, f_D}  Markov blanket filtering is used to eliminate redundant features  A human operator is needed to decide whether to iterate 2 Feature selection for unsupervised learning: feature selection algorithm

2 Feature selection for unsupervised learning: evaluating information gain
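A minimal information-gain sketch for scoring one discrete candidate feature against a reference labeling (e.g. labels induced by the model learned on the reference feature set); the function and its discrete treatment are my assumptions:

    import numpy as np

    def information_gain(labels, feature_values):
        """I(L; F) = H(L) - H(L | F) for two discrete sequences of equal length."""
        def entropy(x):
            _, counts = np.unique(x, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        labels = np.asarray(labels)
        feature_values = np.asarray(feature_values)
        h_conditional = 0.0
        for v in np.unique(feature_values):
            mask = feature_values == v
            h_conditional += mask.mean() * entropy(labels[mask])
        return entropy(labels) - h_conditional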

 After wrapping with the information gain criterion, we are left with possible redundancy.  Markov blanket filtering is applied to remove it  An iterative algorithm with a threshold of less than 5% 2 Feature selection for unsupervised learning: finding a Markov blanket
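The redundancy score typically used in Markov-blanket filtering (Koller-Sahami style; stated here as a hedged reminder, the paper's exact form may differ) is the expected divergence

    \delta(F_i \mid \mathbf{M}_i) = \sum_{\mathbf{m},\, f} P(\mathbf{M}_i = \mathbf{m},\, F_i = f)\; D_{\mathrm{KL}}\!\left( P(C \mid \mathbf{M}_i = \mathbf{m}, F_i = f) \,\|\, P(C \mid \mathbf{M}_i = \mathbf{m}) \right)

where M_i is the candidate Markov blanket of feature F_i and C is the target labeling; F_i is dropped when delta falls below the threshold.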

 Computes a value that influences the decision on whether to accept a candidate feature subset.  Initialization and convergence issues exist, so randomization is used. 2 Feature selection for unsupervised learning: normalized BIC

3 Experiments & Results Sports videos represent an interesting structure discovery

 We compare the learning accuracy of four different learning schemes against the ground truth  Supervised HMM  Supervised HHMM  Unsupervised HHMM without model adaptation  Unsupervised HHMM with model adaptation  EM  MCMC 3 Experiments & Results: parameter and structure learning

Run each of the four algorithms 15 times with random starting points.

 Test the performance of the automatic feature selection method on the two video clips  For the Spain case, the evaluation has an accuracy of 74.8% and the Korea clip achieves an accuracy of 74.5% 3 Experiments & Results: feature selection

 Test on a baseball video clip, i.e. a different domain  HHMM learning with full model adaptation  Results are consistent and agree with intuition 3 Experiments & Results: testing on a different domain

 The simplified HHMM reduces each sub-HMM to a left-to-right model with skips  Compared against the fully connected, general 2-level HHMM model  Results show the constrained model's accuracy is 2.3% lower than that of the fully connected model, which has more modeling power 3 Experiments & Results: comparing to HHMM with simplifying constraints

 In this chapter, we proposed algorithms for unsupervised discovery of structure from video sequences.  We model video structures using an HHMM with parameters learned using EM and MCMC.  We test them on two different video clips and achieve results comparable to their supervised learning counterparts  The approach applies to many other domains and to models with simplified constraints. 4 Conclusion

 It is used to solve high-dimensional optimization problems; in this chapter, MCMC adapts the HHMM model structure by maximizing BIC. Solution to the Quiz

 Questions? Q&A