Automatic Video-Based Human Motion Analyzer for Consumer Surveillance System
Weilun Lao, Jungong Han, and Peter H.N. de With, Fellow, IEEE
IEEE Transactions on Consumer Electronics, Vol. 55, No. 2, May 2009

Outline
- Introduction
- Literature on surveillance video analysis
- Requirements of surveillance analysis systems
- Overview of proposed visual motion analysis system
- Techniques for human motion analysis
- Experimental results

Introduction Video surveillance can contribute to the safety of people at home and ease the control of home-entrance and equipment-usage functions.

Literature on surveillance video analysis Most surveillance systems have focused on understanding events through the study of trajectories and positions of persons, using a priori knowledge about the scene. The Pfinder [2] system was developed to describe a moving person in an indoor environment. The VSAM [3] system can monitor activities over various scenarios, using multiple cameras connected in a network. The real-time visual surveillance system W4 [4] employs the combined techniques of shape analysis and body tracking, and models different appearances of a person.
[2] C.R. Wren, A. Azarbayejani, T. Darrell and A.P. Pentland, “Pfinder: real-time tracking of the human body,”
[3] R.T. Collins, A.J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto and O. Hasegawa, “A system for video surveillance and monitoring,”
[4] I. Haritaoglu, D. Harwood and L. Davis, “W4: real-time surveillance of people and their activities,”

Literature on surveillance video analysis These systems rely on the detected trajectories of the objects of interest. Because the local properties of the detected persons are missing, they lack a semantic recognition result for dynamic human activities. In this paper, we explore the combination of trajectory and posture recognition in order to improve the semantic analysis of human behavior.

Requirements of surveillance analysis systems The specific challenges for consumer applications are as follows:
- The posture and motion analysis results should have sufficient accuracy for consumer acceptance.
- High processing efficiency, achieving (near) real-time operation with low-cost consumer hardware.
- A conversion of 2-D results to a 3-D space can facilitate the analysis of special events such as burglary.

In this paper The total framework consists of four processing levels:
1. A pre-processing level including background modeling and multiple-person detection.
2. An object-based level performing trajectory estimation and posture classification.
3. An event-based level for semantic analysis.
4. A visualization level including camera calibration and 3-D scene reconstruction.
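As a rough orientation, the four levels could be wired together as in the sketch below; this is only an illustrative skeleton, and every function name is invented here rather than taken from the paper.

```python
# Illustrative skeleton of the four-level pipeline; each function is a
# placeholder standing in for the processing stage described above.

def preprocess(frame):                   # level 1: background modeling + person detection
    return []                            # would return per-person foreground regions

def track_and_classify(frame, regions):  # level 2: trajectory estimation + posture classification
    return [], []                        # would return trajectories and posture labels

def analyze_event(tracks, postures):     # level 3: semantic event analysis
    return "no_event"

def reconstruct(tracks, postures):       # level 4: calibration + 3-D scene reconstruction
    return None

def process_frame(frame):
    regions = preprocess(frame)
    tracks, postures = track_and_classify(frame, regions)
    return analyze_event(tracks, postures), reconstruct(tracks, postures)
```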

In this paper It achieves near real-time performance (6-8 frames/second).

In this paper The location and posture of persons are visualized in a 3-D space after performing camera calibration and integrating context knowledge. The accurate and realistic reconstruction in a virtual space can significantly contribute to scene understanding, such as crime-evidence collection and healthcare behavior analysis.

Overview of proposed visual motion analysis system
- Pre-processing level: background modeling and object detection.
- Object-based level: performs trajectory estimation and posture classification.
- Event-based level: interaction relationships are modeled to infer a multiple-person event.
- Visualization level: performs the 2D-to-3D mapping via camera calibration.

Overview of proposed visual motion analysis system

Techniques for human motion analysis
- Pre-processing level: multi-person detection
- Object-based level: trajectory estimation; individual action recognition with CHMM
- Event-based level: interaction modeling
- Visualization level: 3-D scene reconstruction

Multi-person detection
Background subtraction: We perform a pixel-based background subtraction. The scene model has a probability density function for each pixel separately. A pixel from a new frame is considered to be a background pixel if its new value is well described by its density function. The Gaussian Mixture Model (GMM) is employed for the background subtraction.
Recognizing persons: We use the k-Nearest Neighbor (k-NN) classifier. The classifier utilizes two features: the area and the ratio of the bounding box attached to each detected object.
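A minimal sketch of this detection stage, assuming OpenCV's MOG2 background subtractor as the GMM model and a k-NN classifier trained on toy (area, box-ratio) samples; the feature values and parameters below are invented for illustration, not taken from the paper.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# GMM-based background subtraction (OpenCV's MOG2 is one common GMM implementation).
bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

# Toy training set of (blob area, bounding-box height/width ratio) with labels
# 1 = person, 0 = other; in practice these come from labeled surveillance frames.
train_X = np.array([[3000, 2.5], [5000, 2.2], [400, 0.8], [300, 1.0]])
train_y = np.array([1, 1, 0, 0])
knn = KNeighborsClassifier(n_neighbors=1).fit(train_X, train_y)

def detect_persons(frame):
    """Return bounding boxes of foreground blobs that the k-NN labels as persons."""
    fg_mask = bg_model.apply(frame)
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    fg_mask = cv2.medianBlur(fg_mask, 5)                              # suppress noise
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    persons = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        features = np.array([[cv2.contourArea(c), h / float(w)]])
        if knn.predict(features)[0] == 1:
            persons.append((x, y, w, h))
    return persons
```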

Trajectory estimation We use the mean-shift algorithm to track persons, based on their individual appearance models, each represented as a color histogram.

Trajectory estimation
1. Extract every new person entering the scene.
2. Calculate the corresponding histogram model in the image domain.
3. In subsequent frames, shift the person window to the location whose histogram is closest to that of the previous frame.
Once the trajectory is located, we can conduct the body-based analysis at the location of the person in every frame and estimate the positions of the persons involved in the video scene.
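A hedged sketch of these steps using OpenCV's mean-shift on a hue-histogram back-projection; the bin count and termination criteria are arbitrary choices, not values from the paper.

```python
import cv2

def init_person_model(frame, box):
    """Build the color-histogram appearance model for a newly detected person."""
    x, y, w, h = box
    hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])    # hue histogram
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def track_person(frame, box, hist):
    """Shift the window to the location whose histogram best matches the model."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, new_box = cv2.meanShift(back_proj, box, criteria)
    return new_box                                           # appended to the trajectory
```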

Individual action recognition with CHMM Posture representation HV-PCA: a new, simple and effective shape descriptor to represent the silhouette in each frame.
1. Every detected person silhouette is scaled to an M×N pixel template in a normalization phase (M=180, N=80).
2. We apply the horizontal and vertical projections to the normalized silhouette.

Individual action recognition with CHMM HV-PCA:
Vertical projection: 180-D shape vector → split into 3 parts of 60 → 2 coefficients per part (by PCA) → reshaped into a 6×1 vector.
Similarly, an 8×1 vector is reshaped from the horizontal projection.
P(.) indicates our part-based PCA implementation.
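One possible reading of the part-based projection/PCA descriptor in code; the 4-way split of the horizontal projection is an assumption (the slide only states the resulting 8×1 vector), and the PCA bases would be learned from a set of training silhouettes.

```python
import numpy as np
from sklearn.decomposition import PCA

M, N = 180, 80    # normalized silhouette template (rows x columns)

def projections(silhouette):
    """Vertical (per-row, 180-D) and horizontal (per-column, 80-D) projections."""
    return silhouette.sum(axis=1).astype(float), silhouette.sum(axis=0).astype(float)

def fit_part_pca(train_vectors, n_parts, n_components=2):
    """Fit one PCA per equal-length part of the projection vectors (part-based PCA)."""
    parts = np.split(np.asarray(train_vectors), n_parts, axis=1)
    return [PCA(n_components=n_components).fit(p) for p in parts]

def part_pca_descriptor(vector, pcas):
    """Concatenate the per-part PCA coefficients into one compact descriptor."""
    parts = np.split(vector, len(pcas))
    return np.concatenate([pca.transform(p[None, :])[0] for pca, p in zip(pcas, parts)])

# Vertical projection: 180-D -> 3 parts of 60 -> 2 coefficients each -> 6-D vector.
# Horizontal projection: 80-D -> assumed 4 parts of 20 -> 2 coefficients each -> 8-D vector.
# The concatenated per-frame descriptor is the observation fed to the CHMM.
```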

PCA Principal component analysis (PCA) is a mathematical procedure to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components.

Individual action recognition with CHMM Temporal modeling with CHMM A single-frame recognition is not sufficiently accurate when we require general motion classification; temporal consistency is therefore required. We use the Continuous Hidden Markov Model (CHMM) with left-right topology [12].
[12] L.R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,”

Individual action recognition with CHMM Suppose a CHMM has E states and F output symbols. It is fully specified by the triplet λ = (A, B, π):
- the E×E state-transition matrix A,
- the E×F state-output probability matrix B,
- the initial state distribution vector π.

Individual action recognition with CHMM We assign a CHMM model to each of the predefined posture types for the observed human body and train each CHMM with the Baum-Welch algorithm, which yields the triplet λ_k for every model. Given an observation sequence O, we calculate P(O | λ_k) for each model and recognize the posture class as the one represented by the most probable model: k* = argmax_k P(O | λ_k), k = 1, …, K (K = 5, for the types left-pointing, right-pointing, squatting, raising hands overhead, and lying).
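A sketch of per-class HMM training and maximum-likelihood posture recognition using the hmmlearn package; the number of states, the plain ergodic topology and the diagonal Gaussian emissions are illustrative choices rather than the paper's exact left-right CHMM configuration.

```python
import numpy as np
from hmmlearn import hmm

POSTURES = ["left-pointing", "right-pointing", "squatting",
            "raising-hands-overhead", "lying"]               # K = 5 classes

def train_posture_models(sequences_per_class, n_states=4):
    """Train one Gaussian-emission HMM per posture class with Baum-Welch (EM).

    sequences_per_class maps a posture name to a list of observation sequences,
    each of shape (T, D) -- e.g. the per-frame HV-PCA descriptors.
    """
    models = {}
    for name, seqs in sequences_per_class.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=30)
        model.fit(X, lengths)    # Baum-Welch re-estimation of the model parameters
        models[name] = model
    return models

def recognize_posture(models, observation_seq):
    """Pick the class whose model gives the highest likelihood P(O | lambda_k)."""
    return max(models, key=lambda name: models[name].score(observation_seq))
```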

Interaction Modeling In multi-person events, the event analysis is achieved by understanding the interactions between people. The events rely on the temporal order and relationships of their sub-events (the individual postures).

Interaction Modeling To represent the temporal relationships of sub-events, we use the set TR = {after, meets, during, finishes, overlaps, equal, starts}. We can apply heuristic rules over these relations to understand the scene; for example, in robbery detection the posture ‘pointing’ is a key reference posture.
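The rule below is a hypothetical illustration of how such temporal relations could be checked on sub-event intervals; the actual robbery rule and thresholds used by the paper are not specified on this slide.

```python
# Sub-events are represented as (start_frame, end_frame) intervals.

def during(a, b):
    """Interval a lies strictly inside interval b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """Interval a starts before b and ends inside b."""
    return a[0] < b[0] < a[1] < b[1]

def looks_like_robbery(pointing, hands_up):
    """Hypothetical rule: one person's 'raising hands overhead' occurs during,
    or overlapping with, another person's 'pointing' sub-event."""
    return during(hands_up, pointing) or overlaps(pointing, hands_up)

# Person A points during frames 120-300, person B raises hands during frames 150-280.
print(looks_like_robbery((120, 300), (150, 280)))   # True
```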

Interaction Modeling

3-D Scene Reconstruction We want to implement the 2D-3D mapping, which is useful for scene understanding. Camera calibration: since both the ground plane and the displayed image are planar, the mapping between them is a homography: p = Hp'.

3-D Scene Reconstruction In our previous work [11], we developed an automatic algorithm to establish the homography mapping for analyzing tennis video. Here, we manually put four white lines forming a rectangle on the ground and measured the length of each line in the real world, thereby defining their coordinates in the real-world domain. After performing the mapping, it plays a useful role in crime-scene analysis, data retrieval and evidence collection.
[11] J. Han, D. Farin, P.H.N. de With and W. Lao, “Real-time video content analysis tool for consumer media storage system,”
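A minimal sketch of the ground-plane calibration, assuming the four rectangle corners are known in both image pixels and real-world metres (the coordinates below are made up); OpenCV's findHomography estimates H, after which a tracked foot point can be mapped to the ground plane.

```python
import cv2
import numpy as np

# Image coordinates (pixels) of the four rectangle corners and their
# ground-plane coordinates (metres); both sets of numbers are illustrative.
image_pts  = np.float32([[102, 410], [540, 400], [585, 215], [130, 220]])
ground_pts = np.float32([[0.0, 0.0], [3.0, 0.0], [3.0, 2.0], [0.0, 2.0]])

H, _ = cv2.findHomography(image_pts, ground_pts)      # ground point = H * image point

def image_to_ground(x, y):
    """Map the foot point of a tracked person from the image to the ground plane."""
    q = H @ np.array([x, y, 1.0])
    return q[0] / q[2], q[1] / q[2]

print(image_to_ground(320, 405))    # approximate ground position in metres
```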

3-D Scene Reconstruction

Experimental results
Training: 10 video sequences containing various single/multi-person motions (15 frames/s).
Testing: 15 similar sequences.
Results:
- Person detection: 98% accuracy rate
- Person tracking: 95% detection rate

Experimental results The robbery detection rate is 90% on our captured simulated-robbery video sequences (10 sequences in total). Our system is efficient, achieving near real-time performance (6-8 frames/second at 640×480 (VGA) resolution on a P-IV 3-GHz PC).

Experimental results