Monitoring Human Behavior in an Office Environment. Douglas Ayers and Mubarak Shah*. Research conducted at the University of Central Florida. *Now at Harris Corporation.

Goals of the System
Recognize human actions in a room for which prior knowledge is available
Handle multiple people
Provide a textual description of each action
Extract "key frames" for each action
Recognize all actions simultaneously for a given field of view

Possible Actions
Enter
Leave
Sitting or Standing
Picking Up Object
Put Down Object
Remove Object
Open or Close
Use Terminal

Prior Knowledge
Spatial layout of the scene:
–Location of entrances and exits
–Location of objects and some information about how they are used
Context can then be used to improve recognition and save computation

Layout of Scene 1

Layout of Scene 2

Layout of Scene 3

Layout of Scene 4

Major Components
Skin Detection
Tracking
Scene Change Detection
Action Recognition

Flow of the System
1. Skin Detection
2. Track People and Objects for this Frame
3. Determine Possible Interactions Between People and Objects
4. Scene Change Detection
5. Update States, Output Text, Output Key Frames

Skin Detection
Uses a modified version of the Kjeldsen and Kender method
Uses the Y'CbCr color space instead of HSV
3D histogram-based approach
Training phase and detection phase
Tracking is initialized with a window containing the largest connected skin region
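The histogram approach above can be sketched as follows. This is a minimal illustration, assuming 8-bit Y'CbCr input and a uniform 32-bin quantization per channel; the function names and the detection threshold are our own, not from the paper:

```python
import numpy as np

def train_skin_histogram(skin_pixels, bins=32):
    # skin_pixels: (N, 3) array of Y'CbCr values from hand-labeled skin regions.
    hist, _ = np.histogramdd(skin_pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist / hist.sum()  # normalize so bin counts act as skin likelihoods

def detect_skin(frame_ycbcr, hist, bins=32, threshold=1e-4):
    # Look up each pixel's histogram bin; pixels scoring above the
    # (illustrative) threshold are marked as skin.
    idx = (frame_ycbcr.astype(int) * bins) // 256
    scores = hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    return scores > threshold
```

The binary mask returned by `detect_skin` would then be cleaned up with connected-component analysis to find the largest skin region for tracker initialization.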

Tracking
Tracking is done on heads and movable objects
Code provided by Fieguth and Terzopoulos
Uses the Y'CbCr color space
Tracks multiple people and objects
Simple non-linear estimator
Goodness-of-fit equation (shown as an image on the original slide)
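The goodness-of-fit equation itself is not reproduced in this transcript. As a hedged sketch only: the Fieguth and Terzopoulos tracker compares a stored target color against the measured mean color of the tracked region. The ratio-based formulation below is our own illustration of that idea, not necessarily the slide's exact equation:

```python
def goodness_of_fit(target_color, measured_color):
    # Compare target and measured colors channel by channel.
    # A value of 1.0 is a perfect match; larger values are worse fits.
    # (Illustrative formulation; the actual equation appears only as an
    # image on the original slide.)
    ratios = [max(t, m) / min(t, m)
              for t, m in zip(target_color, measured_color) if min(t, m) > 0]
    return sum(ratios) / len(ratios)
```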

Scene Change Detection
Similar to the method used by Sandy Pentland's group in Pfinder
Uses brightness-normalized chrominance
Takes a 0.5-second clip to build the background model
Determines change based on Mahalanobis distance
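A per-pixel version of this detector can be sketched as follows. This assumes a diagonal covariance per pixel and a chi-squared-style threshold of our own choosing; the paper's exact parameters are not given on the slide:

```python
import numpy as np

def build_background_model(clip):
    # clip: (T, H, W, C) stack of brightness-normalized chrominance frames
    # captured while the scene is static (about 0.5 s of video).
    mean = clip.mean(axis=0)
    var = clip.var(axis=0) + 1e-6  # diagonal covariance, regularized
    return mean, var

def changed_pixels(frame, mean, var, threshold=9.0):
    # Squared Mahalanobis distance per pixel under a diagonal covariance;
    # pixels exceeding the threshold are flagged as scene changes.
    d2 = ((frame - mean) ** 2 / var).sum(axis=-1)
    return d2 > threshold
```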

Possible Actions
Enter: a significant skin region is detected for several frames
Leave: the tracking box is close to the edge of an entrance area
Sitting or Standing: a significant increase or decrease in the tracking box's y-component
Picking Up Object: after the person has moved away from the object's initial position, a scene change is detected there

Possible Actions (continued)
Put Down Object: the person moves away from the object
Remove Object: the object is close to the edge of an entrance area
Open or Close: an increase or decrease in the amount of scene change
Use Terminal: the person is sitting and a scene change is detected near the mouse, but not behind the terminal

State Model for Action Recognition
(Diagram on the original slide: states include Start, Standing, Sitting, Near Cabinet, Opening/Closing Cabinet, Near Terminal, Using Terminal, Near Phone, Talking on Phone, Hanging Up Phone, and End; transitions are labeled Enter, Leave, Sit, Stand, Open/Close Cabinet, Use Terminal, Pick Up Phone, and Put Down Phone.)
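The state model amounts to a table of (state, event) → next-state transitions, one per person being tracked. A partial sketch, with state and event names transcribed from the slide (the full transition set is only shown in the diagram):

```python
# Partial transition table; unlisted (state, event) pairs
# leave the current state unchanged.
TRANSITIONS = {
    ("Start", "Enter"): "Standing",
    ("Standing", "Sit"): "Sitting",
    ("Sitting", "Stand"): "Standing",
    ("Standing", "Leave"): "End",
    ("Sitting", "Use Terminal"): "Using Terminal",
    ("Sitting", "Pick Up Phone"): "Talking on Phone",
    ("Talking on Phone", "Put Down Phone"): "Sitting",
    ("Standing", "Open/Close Cabinet"): "Opening/Closing Cabinet",
}

def step(state, event):
    # Advance the per-person state machine by one recognized event.
    return TRANSITIONS.get((state, event), state)
```

Each recognized low-level event (from the skin, tracking, and scene-change modules) drives one `step` call, and the resulting transition determines the textual output and key-frame capture.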

Key Frames
Why get key frames?
–Key frames take less space to store
–Key frames take less time to transmit
–Key frames can be viewed more quickly
We use heuristics to determine when key frames are taken
–Some are taken before the action occurs
–Some are taken after the action occurs

Key Frames
"Enter" key frames: taken as the person leaves the entrance/exit area
"Leave" key frames: taken as the person enters the entrance/exit area
"Standing/Sitting" key frames: taken after the tracking box has stopped moving up or down, respectively
"Open/Close" key frames: taken when the percentage of changed pixels stabilizes
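The "Open/Close" stabilization test can be sketched as a simple window check over the per-frame change fraction. The window size and tolerance here are illustrative assumptions, not values from the paper:

```python
def change_stabilized(change_fractions, window=5, tol=0.01):
    # change_fractions: per-frame fraction of pixels flagged as changed.
    # Returns True once that fraction has varied by less than `tol`
    # over the last `window` frames, signaling a key-frame capture.
    if len(change_fractions) < window:
        return False
    recent = change_fractions[-window:]
    return max(recent) - min(recent) <= tol
```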

Demo Sequences
Two people: opening cabinet, closing cabinet, sitting and standing
One person: picking up phone and putting down phone
One person: moving a file folder
One person: sitting and using a terminal

Sequence 1

Key Frames Sequence 1 (350 frames), Part 1

Key Frames Sequence 1 (350 frames), Part 2

Sequence 2

Key Frames Sequence 2 (200 frames)

Sequence 3

Key Frames Sequence 3 (200 frames)

Sequence 4

Key Frames Sequence 4 (399 frames), Part 1

Key Frames Sequence 4 (399 frames), Part 2

Sequence 5

Sequence 6

Summary
Successfully recognizes a set of human actions
Uses prior knowledge about scene layout
Handles multiple people simultaneously
Generates key frames and textual descriptions

Limitations
Limited by:
–Skin detection
–Tracking
–Scene change detection
Some actions are difficult to model
Limited by prior knowledge
Limited by the camera's field of view

Future Work
Increase the action vocabulary
Determine people's identity
Improve field of view
–Wide-angle lens
–Multiple cameras
Implement in real time