
Stochastic Grammars: Overview
- Representation: Stochastic grammar
  - Terminals: object interactions
  - Context-sensitive due to internal scene models
- Domain: Towers of Hanoi
  - Requires activities with strong temporal constraints
- Contributions
  - Showed recognition & decomposition with very weak appearance models
  - Demonstrated usefulness of feedback from high-level to low-level reasoning components
  - Extended SCFG: parameters and abstract scene models

Expectation Grammars (CVPR 2003)
Analyze video of a person physically solving the Towers of Hanoi task:
- Recognize valid activity
- Identify each move
- Segment objects
- Detect distracters / noise

System Overview

Low-Level Vision
- Foreground/background segmentation
  - Automatic shadow removal
  - Classification based on chromaticity and brightness differences
- Background model
  - Per-pixel RGB means
  - Fixed mapping from chromaticity difference (CD) and brightness difference (BD) to foreground probability
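The slide does not give the exact CD/BD formulas or the probability mapping. The sketch below assumes one common (Horprasert-style) formulation: BD is the scale factor alpha that best fits the observed RGB vector to the per-pixel background mean, and CD is the distance left over after that scaling. The exponential mapping and the thresholds `cd_scale`, `bd_scale`, `cd_max`, `bd_low` and `bd_high` are hypothetical, chosen only for illustration.

```python
import numpy as np

def brightness_chromaticity(frame, bg_mean, eps=1e-6):
    """Return per-pixel brightness (BD) and chromaticity (CD) differences.

    Assumed formulation: BD is the scale alpha minimizing ||I - alpha*E||
    for observed RGB I and background mean E; CD is ||I - alpha*E||.
    """
    I = frame.astype(np.float64)
    E = bg_mean.astype(np.float64)
    bd = np.sum(I * E, axis=-1) / (np.sum(E * E, axis=-1) + eps)
    cd = np.linalg.norm(I - bd[..., None] * E, axis=-1)
    return bd, cd

def foreground_probability(bd, cd, cd_scale=10.0, bd_scale=0.2):
    """Hypothetical fixed mapping: larger CD or |BD - 1| pushes probability toward 1."""
    score = cd / cd_scale + np.abs(bd - 1.0) / bd_scale
    return 1.0 - np.exp(-score)

def is_shadow(bd, cd, cd_max=10.0, bd_low=0.5, bd_high=0.9):
    """Shadow: noticeably darker than the background but with little color change."""
    return (cd < cd_max) & (bd > bd_low) & (bd < bd_high)

def classify(frame, bg_mean):
    """Foreground probability per pixel, with shadow pixels suppressed to 0."""
    bd, cd = brightness_chromaticity(frame, bg_mean)
    shadow = is_shadow(bd, cd)
    p_fg = np.where(shadow, 0.0, foreground_probability(bd, cd))
    return p_fg, shadow
```

With a uniform gray background, a uniformly darker pixel keeps its chromaticity and is labeled shadow, while a strongly red pixel has a large CD and gets a foreground probability close to 1.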

ToH: Low-Level Vision
[Figure: foreground and shadow detection. Panels: Raw Video, Background Model, Foreground Components]

Low-Level Features
- Explanation-based symbols
- Blob interaction events
  - merge, split, enter, exit, tracked, noise
  - Future work: hidden, revealed, blob-part, coalesce
- All possible explanations generated
- Inconsistent explanations heuristically pruned
[Figure: example events, Enter and Merge]
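The slide lists the event vocabulary but not how events are detected. As a hypothetical sketch, assume blobs are axis-aligned bounding boxes `(x0, y0, x1, y1)` and that frame-to-frame correspondence comes from box overlap. Unlike the system described, which generates all possible explanations and prunes inconsistent ones, this produces a single labeling per frame pair; the box format, `min_area`, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]               # hypothetical (x0, y0, x1, y1)
Event = Tuple[str, Tuple[int, ...], Tuple[int, ...]]  # (label, prev ids, curr ids)

def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes intersect with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def blob_events(prev: List[Box], curr: List[Box], min_area: float = 1.0) -> List[Event]:
    events: List[Event] = []
    # Tiny blobs in the current frame are labeled noise and not matched.
    kept = []
    for j, c in enumerate(curr):
        if area(c) < min_area:
            events.append(("noise", (), (j,)))
        else:
            kept.append(j)
    # Forward (prev -> curr) and backward (curr -> prev) overlap lists.
    fwd: Dict[int, List[int]] = {
        i: [j for j in kept if overlaps(p, curr[j])] for i, p in enumerate(prev)}
    bwd: Dict[int, List[int]] = {
        j: [i for i, p in enumerate(prev) if overlaps(p, curr[j])] for j in kept}
    for j, srcs in bwd.items():
        if not srcs:
            events.append(("enter", (), (j,)))
        elif len(srcs) > 1:
            events.append(("merge", tuple(srcs), (j,)))
    for i, dsts in fwd.items():
        if not dsts:
            events.append(("exit", (i,), ()))
        elif len(dsts) > 1:
            events.append(("split", (i,), tuple(dsts)))
        elif len(bwd[dsts[0]]) == 1:
            events.append(("tracked", (i,), (dsts[0],)))
    return events
```

For example, a hand blob and a disc blob that overlap a single blob in the next frame yield one `merge` event, which is the situation shown in the "Merge" panel.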

Expectation Grammars
- Representation:
  - Stochastic grammar
  - Parser augmented with parameters and internal scene model

    ToH -> Setup, enter(hand), Solve, exit(hand);
    Setup -> TowerPlaced, exit(hand);
    TowerPlaced -> enter(hand, red, green, blue), Put_1(red, green, blue);
    Solve -> state(InitialTower), MakeMoves, state(FinalTower);
    MakeMoves -> Move(block) [0.1] | Move(block), MakeMoves [0.9];
    Move -> Move_1-2 | Move_1-3 | Move_2-1 | Move_2-3 | Move_3-1 | Move_3-2;
    Move_1-2 -> Grab_1, Put_2;
    Move_1-3 -> Grab_1, Put_3;
    Move_2-1 -> Grab_2, Put_1;
    Move_2-3 -> Grab_2, Put_3;
    Move_3-1 -> Grab_3, Put_1;
    Move_3-2 -> Grab_3, Put_2;
    Grab_1 -> touch_1, remove_1(hand,~) | touch_1(~), remove_last_1(~);
    Grab_2 -> touch_2, remove_2(hand,~) | touch_2(~), remove_last_2(~);
    Grab_3 -> touch_3, remove_3(hand,~) | touch_3(~), remove_last_3(~);
    Put_1 -> release_1(~) | touch_1, release_1;
    Put_2 -> release_2(~) | touch_2, release_2;
    Put_3 -> release_3(~) | touch_3, release_3;
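The probabilities on the recursive MakeMoves rule imply a geometric prior on the number of moves: expanding into exactly n moves uses the 0.9 branch n - 1 times and the 0.1 branch once, so P(n) = 0.9^(n-1) · 0.1, with mean 1/0.1 = 10 moves. The small worked example below uses only those two probabilities from the grammar and evaluates the prior for the three-disc puzzle (red, green, blue), whose shortest solution is 2^3 - 1 = 7 moves. It covers only this length prior, not the probability of a full parse.

```python
# Rule probabilities copied from the MakeMoves production above.
P_STOP = 0.1      # MakeMoves -> Move(block)             [0.1]
P_CONTINUE = 0.9  # MakeMoves -> Move(block), MakeMoves  [0.9]

def makemoves_prob(n: int) -> float:
    """Probability that MakeMoves expands into exactly n Move symbols."""
    if n < 1:
        return 0.0
    # n - 1 recursive expansions followed by one terminating expansion
    return P_CONTINUE ** (n - 1) * P_STOP

def expected_moves(max_n: int = 2000) -> float:
    """Mean number of moves under this prior (truncated sum; exact value is 10)."""
    return sum(n * makemoves_prob(n) for n in range(1, max_n + 1))

if __name__ == "__main__":
    print(f"P(7 moves) = {makemoves_prob(7):.4f}")  # shortest 3-disc solution
    print(f"E[moves]   = {expected_moves():.2f}")
```

The optimal seven-move solution therefore receives a prior of about 0.053 from this rule alone; longer, suboptimal solutions remain valid parses but are progressively less likely.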

Forming the Symbol Stream
- Domain-independent blob interactions are converted to grammar terminals via heuristic domain knowledge
  - Examples: merge + (x ≈ 0.33) → touch_1; split + (x ≈ 0.50) → remove_2
- A grammar rule can only fire if the internal scene model is consistent with the terminal
  - Example (B): can't remove_2 if there are no discs on peg 2
  - Example (C): can't move a disc on top of a smaller disc
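A hypothetical sketch of both steps follows. Peg 1 at x ≈ 0.33 and peg 2 at x ≈ 0.50 come from the examples above; the peg 3 position (0.67), the matching tolerance, the event-to-action table, and representing discs as integer sizes are all assumptions. The scene model here only gates primitive remove/put actions, whereas the described parser checks consistency when a grammar rule fires.

```python
from typing import Dict, List, Optional

PEG_X = {1: 0.33, 2: 0.50, 3: 0.67}   # peg 3 position is an assumption
TOLERANCE = 0.08                      # hypothetical matching tolerance
EVENT_TO_ACTION = {"merge": "touch", "split": "remove"}  # hypothetical table

def to_terminal(event: str, x: float) -> Optional[str]:
    """Map a blob event at normalized x to a grammar terminal, or None."""
    action = EVENT_TO_ACTION.get(event)
    if action is None:
        return None
    peg, px = min(PEG_X.items(), key=lambda kv: abs(kv[1] - x))
    if abs(px - x) > TOLERANCE:
        return None  # event not near any peg: no terminal emitted
    return f"{action}_{peg}"

class SceneModel:
    """Pegs as stacks of disc sizes (bottom first); larger int = larger disc."""

    def __init__(self, pegs: Dict[int, List[int]]):
        self.pegs = {k: list(v) for k, v in pegs.items()}
        self.held: Optional[int] = None  # disc currently in the hand

    def can_remove(self, peg: int) -> bool:
        # (B): nothing to remove from an empty peg; assume one disc held at a time
        return self.held is None and len(self.pegs[peg]) > 0

    def can_put(self, peg: int) -> bool:
        # (C): a disc may not be placed on top of a smaller disc
        if self.held is None:
            return False
        stack = self.pegs[peg]
        return not stack or stack[-1] > self.held

    def remove(self, peg: int) -> None:
        if not self.can_remove(peg):
            raise ValueError(f"remove_{peg} inconsistent with scene model")
        self.held = self.pegs[peg].pop()

    def put(self, peg: int) -> None:
        if not self.can_put(peg):
            raise ValueError(f"put on peg {peg} inconsistent with scene model")
        self.pegs[peg].append(self.held)
        self.held = None
```

Rejecting a terminal here corresponds to the parser refusing the rule; the observation would then have to be explained as noise (an insertion error) instead.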

ToH: Example Frames
- Explicit noise detection
- Objects recognized by behavior, not appearance

ToH: Example Frames
- Grammar can fill in for occluded observations
- Detection of distracter objects

Finding the Most Likely Parse
- Terminals and rules are probabilistic
- Each parse has a total probability
  - Computed by the Earley-Stolcke algorithm
  - Probabilistic penalty for insertion and deletion errors
- Highest-probability parse chosen as the best interpretation of the video
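The scoring idea can be illustrated without a parser. The sketch below is not the Earley-Stolcke algorithm, which computes these probabilities incrementally over all parses with dynamic programming; instead it scores a few explicitly enumerated candidate parses. Each score is the product of rule probabilities, terminal likelihoods, and a per-error penalty for insertions and deletions, computed in log space. `CandidateParse`, `P_INSERT` and `P_DELETE` are hypothetical names and values.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class CandidateParse:
    name: str
    rule_probs: List[float]      # probabilities of the grammar rules used
    terminal_probs: List[float]  # likelihoods of the observed terminals it explains
    insertions: int = 0          # observed symbols treated as spurious (noise)
    deletions: int = 0           # symbols hypothesized but never observed (e.g. occluded)

P_INSERT = 0.05  # hypothetical penalty per insertion error
P_DELETE = 0.05  # hypothetical penalty per deletion error

def log_score(p: CandidateParse) -> float:
    """Log of: product of rule and terminal probabilities times error penalties."""
    return (sum(math.log(x) for x in p.rule_probs)
            + sum(math.log(x) for x in p.terminal_probs)
            + p.insertions * math.log(P_INSERT)
            + p.deletions * math.log(P_DELETE))

def best_parse(candidates: List[CandidateParse]) -> CandidateParse:
    """Highest-probability candidate, i.e. the chosen interpretation."""
    return max(candidates, key=log_score)
```

Explaining a dubious observation costs its low likelihood, while discarding it costs `P_INSERT`; whichever product is larger wins, which is how a weak detection can end up labeled as noise.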

Expectation Grammars Summary
- Sensory Input: Video
- Pre-processing: Blobs, Interaction Events
- Pre-conceptual Reasoning: Object IDs
- Semantic Reasoning: Stochastic Parser
- Memory: Parse Tree
- Learning: None (Bg)
- Given Knowledge: Grammar, Scene Model Rules
- Action: Report Best Interpretation
- Feedback

Contributions
- Showed activity recognition and decomposition without appearance models
- Demonstrated usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions
- Extended SCFG representational power with parameters and abstract scene models

Lessons
- Efficient error recovery is important for realistic domains
- All sources of information should be included (e.g., appearance models)
- Concurrency and partial ordering are common, and thus should be easily representable
- Temporal constraints are not the only kind of action relationship (e.g., causal, statistical)

Representational Issues
- Extend temporal relations
  - Concurrency
  - Partial ordering
  - Quantitative relationships
- Causal (not just temporal) relationships
- Parameterized activities