Unsupervised Modelling, Detection and Localization of Anomalies in Surveillance Videos Project Advisor : Prof. Amitabha Mukerjee Deepak Pathak (10222) Abhijit Sharang (10007)

What is an “Anomaly”? An anomaly is an unusual (or rare) event occurring in the video. The definition is ambiguous and depends on context. Idea: learn the “usual” events in the video and use that information to tag the rare events.

Overview: (1) Modelling: Unsupervised Modelling; (2) Detection: Anomalous Clip Detection; (3) Localization: Spatio-Temporal Anomaly Localization.

Step 1: Unsupervised Modelling. Model the “usual” behaviour of the scene using parametric Bayesian modelling. Topic Models: leveraged from Natural Language Processing. Given: documents and a vocabulary, where each document is a histogram over the vocabulary. Goal: identify the topics in a given set of documents (topics are latent variables). Alternate view: clustering in topic space, i.e. dimensionality reduction.

NLP to Vision: Notations. Text Analysis → Video Analysis: vocabulary of words → vocabulary of visual words; text documents → video clips; topics → actions/events.

Video Clips (or Documents). A 45-minute video footage of traffic is available at 25 frames per second, containing 4 kinds of anomaly. The footage is divided into clips of a fixed size of 4 seconds (length obtained empirically last semester), as sketched below.
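As a minimal sketch (the video path and helper name are hypothetical, not from the project code), splitting the 25 fps footage into fixed 4-second clips of 100 frames could look like:

```python
# Minimal sketch (hypothetical path/helper): split the 25 fps footage into
# fixed 4-second clips, i.e. 100 frames per clip.
import cv2

FPS = 25
CLIP_SECONDS = 4
FRAMES_PER_CLIP = FPS * CLIP_SECONDS  # 100 frames

def read_clips(video_path):
    """Yield lists of consecutive frames, one list per 4-second clip."""
    cap = cv2.VideoCapture(video_path)
    clip = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        clip.append(frame)
        if len(clip) == FRAMES_PER_CLIP:
            yield clip
            clip = []
    cap.release()
    if clip:  # trailing, possibly shorter, clip
        yield clip
```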

Feature Extraction. Three components of a visual word: location, spatio-temporal gradient and flow information, and object size. Features are extracted only from foreground pixels to increase efficiency.

Foreground Extraction. The foreground is extracted using the ViBe algorithm and smoothed afterwards using morphological filters (a sketch follows).
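A rough sketch of this step, assuming OpenCV; since ViBe is not bundled with stock OpenCV, MOG2 is used here purely as a stand-in background subtractor before the morphological smoothing:

```python
# Sketch of the foreground-extraction step. The project uses ViBe; since ViBe
# is not shipped with stock OpenCV, MOG2 is used here only as a stand-in
# background subtractor, followed by the morphological smoothing.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def foreground_mask(frame):
    mask = subtractor.apply(frame)                          # raw foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return mask
```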

Visual Word. Location: each frame of dimension m x n is divided into blocks of 20 x 20. HOG-HOF descriptor: for each block, a foreground pixel is selected at random and a spatio-temporal descriptor is computed around it. From the descriptors obtained on the training set, 200,000 descriptors are randomly selected, and 20 cluster centres are obtained from them by k-means clustering; each descriptor is then assigned to one of these centres (see the sketch below). Size: in each block, we compute the connected components of the foreground pixels; the size of a connected component is quantised to two values, large and small.
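A hedged sketch of the codebook construction described above, assuming a pre-computed array of HOG-HOF descriptors; the function names and the library choice are ours, not the project's:

```python
# Hypothetical sketch of the codebook step: cluster 200,000 randomly sampled
# HOG-HOF descriptors into 20 centres with k-means, then quantise every
# descriptor to its nearest centre. `train_descriptors` is assumed to be an
# (N x D) array of descriptors collected from the training clips.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, n_words=20, n_sample=200_000, seed=0):
    """Cluster sampled HOG-HOF descriptors into a visual-word codebook."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_descriptors), size=n_sample, replace=False)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(
        train_descriptors[idx])

def quantise(kmeans, descriptors):
    """Map each descriptor to one of the codebook indices (0..n_words-1)."""
    return kmeans.predict(descriptors)
```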

pLSA : Topic Model

Step 2: Detection. We propose the “Projection Model Algorithm” with the following key idea: project the information learnt during training onto the test document's word space, and analyse each word individually to tag it as usual or anomalous. The method is robust to the quantity of anomaly present in a video clip.

Preliminaries

Flowchart: for each word in the test document, find the m nearest training documents (Bhattacharyya distance between word histograms), form the cumulative histogram of their words, and check the frequency of the word and its eight spatial neighbours; if frequent enough, the word is “usual” (a sketch follows).
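The following is only our reading of the flowchart above, sketched in Python; the threshold, the value of m, and the neighbour lookup are illustrative assumptions rather than the exact implementation:

```python
# Loose sketch of our reading of the flowchart above; the distance measure is
# Bhattacharyya as stated, but the threshold, `m`, and the neighbour lookup
# are illustrative assumptions, not the exact implementation.
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two word-count histograms."""
    p = p / (p.sum() + 1e-12)
    q = q / (q.sum() + 1e-12)
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

def usual_words(test_hist, train_hists, neighbours, m=5, min_count=1):
    """Return indices of words in the test clip judged 'usual'."""
    dists = [bhattacharyya(test_hist, h) for h in train_hists]
    nearest = np.argsort(dists)[:m]                      # m nearest training documents
    cumulative = np.sum([train_hists[i] for i in nearest], axis=0)
    usual = set()
    for w in np.flatnonzero(test_hist):
        candidates = [w] + list(neighbours.get(w, []))   # word + its 8 spatial neighbours
        if any(cumulative[c] >= min_count for c in candidates):
            usual.add(int(w))
    return usual
```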

Detection: each visual word has now been labelled as “anomalous” or “usual”. Depending on the proportion of anomalous words, the complete test document is labelled anomalous or usual.

Step 3: Localization. Spatial localization: since every word carries location information, we can directly localize the anomalous words in the test document to their spatial locality. Temporal localization: this requires some book-keeping while creating the term-frequency matrix of documents; we maintain a list of frame numbers for each document-word pair (see the sketch below).
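A minimal sketch of this book-keeping, assuming each clip is given as a list of (word index, frame number) pairs; this data layout is hypothetical:

```python
# Minimal sketch of the book-keeping, assuming each clip is a list of
# (word_index, frame_number) pairs; this data layout is hypothetical.
from collections import defaultdict
import numpy as np

def build_tf_with_frames(clips, vocab_size):
    """Build the term-frequency matrix plus a (doc, word) -> frames map."""
    tf = np.zeros((len(clips), vocab_size), dtype=int)
    frames = defaultdict(list)
    for d, words in enumerate(clips):
        for w, frame_no in words:
            tf[d, w] += 1
            frames[(d, w)].append(frame_no)   # remember where the word occurred in time
    return tf, frames
```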

Results: demo of anomaly detection and anomaly localization.

Results : Precision-Recall Curve

Results : ROC Curve

Main Contributions. A richer word feature space obtained by incorporating local spatio-temporal gradient-flow information. A proposed “projection model algorithm” that is agnostic to the quantity of anomaly present. Anomaly localization in the spatio-temporal domain. Other benefit: extraction of the common actions corresponding to the most probable topics.

References
Varadarajan, Jagannadan, and J.-M. Odobez. "Topic models for scene analysis and abnormality detection." Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on. IEEE, 2009.
Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008).
Barnich, Olivier, and Marc Van Droogenbroeck. "ViBe: A universal background subtraction algorithm for video sequences." IEEE Transactions on Image Processing 20.6 (2011).
Mahadevan, Vijay, et al. "Anomaly detection in crowded scenes." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
Roshtkhari, Mehrsan Javan, and Martin D. Levine. "Online Dominant and Anomalous Behavior Detection in Videos."
Laptev, Ivan, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld. "Learning realistic human actions from movies." Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on, pages 1-8. IEEE, 2008.
Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." The Journal of Machine Learning Research 3 (2003).

Summary (Last Semester): related work; image processing (foreground extraction, dense optical flow, blob extraction); implementing adapted pLSA; empirical estimation of certain parameters; extraction of tangible actions/topics.

Extra Slides: background subtraction, HOG-HOF, pLSA and its EM, previous results.

Background subtraction. Extraction of the foreground from an image. Frame difference: D(t+1) = |I(x,y,t+1) - I(x,y,t)|. Thresholding this value gives a binary output. A simplistic approach (it may keep extra, spurious foreground, but must not miss any essential element). The foreground is smoothed using a median filter, as in the sketch below.
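A small sketch of this frame-difference baseline, assuming OpenCV and grayscale frames; the threshold value is illustrative:

```python
# Small sketch of the frame-difference baseline, assuming OpenCV and grayscale
# frames; the threshold value of 25 is illustrative.
import cv2

def frame_difference_mask(prev_gray, curr_gray, thresh=25):
    diff = cv2.absdiff(curr_gray, prev_gray)                   # D(t+1) = |I(t+1) - I(t)|
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return cv2.medianBlur(mask, 5)                             # smooth the binary foreground
```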

Optical flow example: (a) translation perpendicular to a surface; (b) rotation about an axis perpendicular to the image plane; (c) translation parallel to a surface at a constant distance; (d) translation parallel to an obstacle in front of a more distant background. Slides from Apratim Sharma’s presentation on optical flow, CS676.

Optical flow mathematics. Gradient-based optical flow. Basic assumption (brightness constancy): I(x+Δx, y+Δy, t+Δt) = I(x, y, t). A first-order expansion gives I_x V_x + I_y V_y + I_t = 0. Flow can be sparse or dense. Dense flow adds a smoothness constraint: motion vectors are spatially smooth, obtained by minimising a global energy function.
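For dense flow, a common off-the-shelf choice is OpenCV's Farneback implementation; the sketch below uses its standard positional parameters with the usual default values, not ones tuned for this project:

```python
# Sketch of dense gradient-based optical flow using OpenCV's Farneback method;
# parameter values are the commonly used defaults, not tuned for this project.
import cv2

def dense_flow(prev_gray, curr_gray):
    """Return an (H, W, 2) array of per-pixel motion vectors (Vx, Vy)."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        0.5,   # pyramid scale
        3,     # pyramid levels
        15,    # averaging window size
        3,     # iterations per level
        5,     # pixel neighbourhood for polynomial expansion
        1.2,   # Gaussian sigma for the expansion
        0)     # flags
```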

pLSA : Topic Model

EM Algorithm: Intuition. E-step: the expectation of the likelihood function is calculated with the current parameter values. M-step: update the parameters using the calculated posterior probabilities, i.e. find the parameters that maximise the expected likelihood.

EM: Formalism

EM in pLSA: E-step. The E-step computes the probability that an occurrence of word w in document d is explained by aspect z.
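For reference, the standard pLSA E-step from Hofmann (1999), which this slide paraphrases, can be written as:

```latex
P(z \mid d, w) \;=\; \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}
```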

EM in pLSA: M-step. All the M-step update equations use P(z|d,w) calculated in the E-step. The procedure converges to a local maximum of the likelihood function (see the sketch below).
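A compact numpy sketch of pLSA EM under the asymmetric parameterisation, meant only to illustrate the update equations; it is not the project code, and `tf` is assumed to be the documents x words term-frequency matrix:

```python
# Compact numpy sketch of pLSA EM under the asymmetric parameterisation
# (P(w|z), P(z|d)); `tf` is the documents x words term-frequency matrix.
# This illustrates the update equations only; it is not the project code.
import numpy as np

def plsa_em(tf, n_topics, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n_docs, n_words = tf.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # E-step: p_z_dw[d, w, z] = P(z | d, w)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]           # (docs, words, topics)
        p_z_dw = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts
        expected = tf[:, :, None] * p_z_dw                        # n(d, w) * P(z|d, w)
        p_w_z = expected.sum(axis=0).T                            # (topics, words)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=1)                              # (docs, topics)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```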

Results (ROC Plot)

Results (PR Curve)