Stochastic Grammars: Overview
  Representation: stochastic grammar
    Terminals: object interactions
    Context-sensitive due to internal scene models
  Domain: Towers of Hanoi
    Requires activities with strong temporal constraints
  Contributions
    Showed recognition and decomposition with very weak appearance models
    Demonstrated the usefulness of feedback from high-level to low-level reasoning components
    Extended SCFG with parameters and abstract scene models
Expectation Grammars (CVPR 2003)
  Analyze video of a person physically solving the Towers of Hanoi task
    Recognize valid activity
    Identify each move
    Segment objects
    Detect distracters / noise
System Overview
Low-Level Vision
  Foreground/background segmentation
    Automatic shadow removal
    Classification based on chromaticity and brightness differences
  Background model
    Per-pixel RGB means
    Fixed mapping from chromaticity difference (CD) and brightness difference (BD) to foreground probability
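The segmentation step above can be sketched as follows. This is a minimal illustration, not the system's actual code: it assumes BD is the per-pixel scale that best maps the background mean colour onto the observed colour, and CD is the colour distance left after that scaling. The slide describes a fixed mapping from CD and BD to a foreground probability; this sketch substitutes hard thresholds, and all names and threshold values (`cd_max`, `bd_tol`, `shadow_range`) are hypothetical.

```python
import numpy as np

# Pixel labels produced by classify(); the values are arbitrary.
BACKGROUND, SHADOW, FOREGROUND = 0, 1, 2

def background_model(frames):
    """Per-pixel RGB mean over a stack of background frames, shape (T, H, W, 3)."""
    return np.asarray(frames, dtype=float).mean(axis=0)

def distortions(frame, bg_mean):
    """Return (BD, CD) per pixel.

    BD (alpha) is the scale that best maps the background colour onto the
    observed colour; CD is the distance remaining after that scaling.
    """
    obs = np.asarray(frame, dtype=float)
    ref = np.asarray(bg_mean, dtype=float)
    denom = np.maximum((ref * ref).sum(axis=-1), 1e-9)  # avoid dividing by zero on black pixels
    alpha = (obs * ref).sum(axis=-1) / denom
    cd = np.linalg.norm(obs - alpha[..., None] * ref, axis=-1)
    return alpha, cd

def classify(frame, bg_mean, cd_max=10.0, bd_tol=0.1, shadow_range=(0.5, 0.9)):
    """Label each pixel; every threshold here is a hypothetical placeholder."""
    alpha, cd = distortions(frame, bg_mean)
    labels = np.full(alpha.shape, FOREGROUND, dtype=np.uint8)
    same_chroma = cd < cd_max
    # Same colour and similar brightness: background.
    labels[same_chroma & (np.abs(alpha - 1.0) <= bd_tol)] = BACKGROUND
    # Same colour but darker: cast shadow, removed from the foreground.
    darker = (alpha >= shadow_range[0]) & (alpha < shadow_range[1])
    labels[same_chroma & darker] = SHADOW
    return labels
```

For a grey background of value 100, an unchanged pixel is labelled background, a uniformly darkened pixel (70, 70, 70) is labelled shadow, and a strongly red pixel (200, 20, 20) is labelled foreground, because its chromaticity differs from the background even though its overall brightness is similar.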
ToH: Low-Level Vision
  [Figure: Foreground and shadow detection. Panels: Raw Video, Background Model, Foreground Components]
Low-Level Features
  Explanation-based symbols
  Blob interaction events
    merge, split, enter, exit, tracked, noise
    Future work: hidden, revealed, blob-part, coalesce
  All possible explanations generated
  Inconsistent explanations heuristically pruned
  [Figure: example Enter and Merge events]
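The basic blob interaction events can be illustrated by comparing pixel overlap between consecutive frames. This is a simplified sketch under stated assumptions: blobs are sets of (row, col) pixels, a small blob is treated as noise using a hypothetical `min_size` threshold, and only one explanation is produced per frame pair, whereas the system described above generates all possible explanations and prunes inconsistent ones heuristically. The function name `blob_events` is hypothetical.

```python
def blob_events(prev, curr, min_size=5):
    """Classify blob interactions between two consecutive frames.

    prev, curr: dict mapping blob id -> set of (row, col) pixels.
    Returns a list of (event, prev_ids, curr_ids) tuples.
    """
    events = []
    kept = {}
    for cid, pix in curr.items():
        if len(pix) < min_size:  # hypothetical size threshold for noise
            events.append(("noise", (), (cid,)))
        else:
            kept[cid] = pix
    # Which current blobs does each previous blob overlap, and vice versa?
    fwd = {pid: {cid for cid, c in kept.items() if p & c} for pid, p in prev.items()}
    back = {cid: {pid for pid, cids in fwd.items() if cid in cids} for cid in kept}
    for pid, cids in fwd.items():
        if not cids:
            events.append(("exit", (pid,), ()))
        elif len(cids) > 1:
            events.append(("split", (pid,), tuple(sorted(cids))))
    for cid, pids in back.items():
        if not pids:
            events.append(("enter", (), (cid,)))
        elif len(pids) > 1:
            events.append(("merge", tuple(sorted(pids)), (cid,)))
        elif len(fwd[next(iter(pids))]) == 1:
            # One-to-one overlap in both directions: the same blob, tracked.
            events.append(("tracked", tuple(pids), (cid,)))
    return events
```

For example, a hand blob and a disc blob that touch in the next frame produce one `merge` event, which the symbol-stream stage can then interpret in domain terms.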
Expectation Grammars
  Representation: stochastic grammar
    Parser augmented with parameters and an internal scene model

    ToH -> Setup, enter(hand), Solve, exit(hand);
    Setup -> TowerPlaced, exit(hand);
    TowerPlaced -> enter(hand, red, green, blue), Put_1(red, green, blue);
    Solve -> state(InitialTower), MakeMoves, state(FinalTower);
    MakeMoves -> Move(block) [0.1] | Move(block), MakeMoves [0.9];
    Move -> Move_1-2 | Move_1-3 | Move_2-1 | Move_2-3 | Move_3-1 | Move_3-2;
    Move_1-2 -> Grab_1, Put_2;
    Move_1-3 -> Grab_1, Put_3;
    Move_2-1 -> Grab_2, Put_1;
    Move_2-3 -> Grab_2, Put_3;
    Move_3-1 -> Grab_3, Put_1;
    Move_3-2 -> Grab_3, Put_2;
    Grab_1 -> touch_1, remove_1(hand,~) | touch_1(~), remove_last_1(~);
    Grab_2 -> touch_2, remove_2(hand,~) | touch_2(~), remove_last_2(~);
    Grab_3 -> touch_3, remove_3(hand,~) | touch_3(~), remove_last_3(~);
    Put_1 -> release_1(~) | touch_1, release_1;
    Put_2 -> release_2(~) | touch_2, release_2;
    Put_3 -> release_3(~) | touch_3, release_3;
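As a worked example of what the rule probabilities on MakeMoves encode: a sequence of n moves needs n - 1 applications of the recursive alternative [0.9] and one application of the terminating alternative [0.1], so the rule alone places a geometric prior on the number of moves. This considers only the MakeMoves probabilities; the probabilities of the Move alternatives are not given on the slide and are left out.

```latex
P(n) = 0.9^{\,n-1} \cdot 0.1, \qquad
\mathbb{E}[n] = \sum_{n \ge 1} n \cdot 0.9^{\,n-1} \cdot 0.1 = \frac{1}{0.1} = 10, \qquad
P(7) = 0.9^{6} \cdot 0.1 \approx 0.053
```

Here n = 7 is the minimum number of moves for the three-disc (red, green, blue) tower.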
Forming the Symbol Stream
  Domain-independent blob interactions are converted to grammar terminals using heuristic domain knowledge
    Examples: merge + (x ≈ 0.33) → touch_1; split + (x ≈ 0.50) → remove_2
  A grammar rule can fire only if the internal scene model is consistent with the terminal
    Example: can’t remove_2 if there are no discs on peg 2 (B)
    Example: can’t move a disc on top of a smaller disc (C)
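Both steps above can be sketched in a few lines. This is an illustrative sketch, not the system's implementation: peg positions 0.33 and 0.50 come from the slide's examples, while 0.67 for peg 3 is an assumption, and the names `PEG_X`, `to_terminal`, and `HanoiScene` are hypothetical. The scene model handles only `touch`, `remove`, and `release` terminals and rejects a terminal that would violate the two example constraints (B) and (C).

```python
# Hypothetical normalized peg x-positions: 0.33 and 0.50 follow the slide's
# examples; 0.67 for peg 3 is an assumption.
PEG_X = {1: 0.33, 2: 0.50, 3: 0.67}
# Domain heuristic from the slide: merge -> touch, split -> remove.
VERB = {"merge": "touch", "split": "remove"}

def to_terminal(event, x):
    """Map a domain-independent blob event at position x to a grammar terminal."""
    if event not in VERB:
        return None
    peg = min(PEG_X, key=lambda k: abs(PEG_X[k] - x))  # nearest peg
    return f"{VERB[event]}_{peg}"

class HanoiScene:
    """Internal scene model: disc sizes per peg (bottom to top) plus the held disc."""

    def __init__(self, discs=(3, 2, 1)):
        self.pegs = {1: list(discs), 2: [], 3: []}
        self.held = None

    def consistent(self, terminal):
        """Can this terminal fire in the current scene? touch is always allowed."""
        verb, peg = terminal.rsplit("_", 1)
        stack = self.pegs[int(peg)]
        if verb == "remove":  # (B): the peg must hold at least one disc
            return self.held is None and bool(stack)
        if verb == "release":  # (C): never place a disc on a smaller one
            return self.held is not None and (not stack or stack[-1] > self.held)
        return True

    def apply(self, terminal):
        """Update the scene, refusing terminals the scene model rules out."""
        if not self.consistent(terminal):
            raise ValueError(f"{terminal} is inconsistent with the scene")
        verb, peg = terminal.rsplit("_", 1)
        if verb == "remove":
            self.held = self.pegs[int(peg)].pop()
        elif verb == "release":
            self.pegs[int(peg)].append(self.held)
            self.held = None
```

For example, with all discs on peg 1, `consistent("remove_2")` is false; after moving the smallest disc to peg 2 and picking up the middle disc, `consistent("release_2")` is false while `consistent("release_3")` is true.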
ToH: Example Frames
  Explicit noise detection
  Objects recognized by behavior, not appearance
ToH: Example Frames
  Grammar can fill in for occluded observations
  Detection of distracter objects
Finding the Most Likely Parse
  Terminals and rules are probabilistic
  Each parse has a total probability
    Computed by the Earley-Stolcke algorithm
    Probabilistic penalty for insertion and deletion errors
  The highest-probability parse is chosen as the best interpretation of the video
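The idea of choosing the highest-probability parse can be sketched with a Viterbi variant of Earley's algorithm, which keeps only the best inner probability for each chart state. This is a simplified stand-in for the Earley-Stolcke parser named above: it assumes no empty (epsilon) rules and omits prefix probabilities, scene-model checks, and the insertion/deletion penalties. The function name and the toy grammar, a hypothetical fragment modelled on the MakeMoves and Move rules, are illustrative only.

```python
def viterbi_earley(grammar, start, tokens):
    """Probability of the most likely parse of tokens, or 0.0 if none exists.

    grammar: dict nonterminal -> list of (rhs tuple, rule probability).
    Symbols that are not keys of grammar are terminals. Assumes no empty rules.
    """
    n = len(tokens)
    # chart[k] maps an Earley state (lhs, rhs, dot, origin) to its best inner probability.
    chart = [dict() for _ in range(n + 1)]

    def add(k, state, p):
        if p > chart[k].get(state, 0.0):
            chart[k][state] = p
            return True
        return False

    for rhs, p in grammar[start]:
        add(0, (start, rhs, 0, 0), p)
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            state = agenda.pop()
            lhs, rhs, dot, origin = state
            p = chart[k][state]
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # predict: start every rule for sym at k
                    for r, q in grammar[sym]:
                        if add(k, (sym, r, 0, k), q):
                            agenda.append((sym, r, 0, k))
                elif k < n and tokens[k] == sym:  # scan the next terminal
                    add(k + 1, (lhs, rhs, dot + 1, origin), p)
            else:  # complete: advance every state at origin waiting for lhs
                for (l2, r2, d2, o2), p2 in list(chart[origin].items()):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if add(k, new, p2 * p):  # keep only the best (Viterbi) score
                            agenda.append(new)
    return max((p for (l, r, d, o), p in chart[n].items()
                if l == start and d == len(r) and o == 0), default=0.0)

# Hypothetical toy grammar loosely modelled on MakeMoves / Move.
TOY = {
    "MakeMoves": [(("Move",), 0.1), (("Move", "MakeMoves"), 0.9)],
    "Move": [(("Move_1-2",), 0.5), (("Move_2-1",), 0.5)],
    "Move_1-2": [(("remove_1", "release_2"), 1.0)],
    "Move_2-1": [(("remove_2", "release_1"), 1.0)],
}
```

With `TOY`, the one-move string `["remove_1", "release_2"]` scores 0.1 × 0.5 = 0.05, the two-move string that continues with `remove_2, release_1` scores 0.9 × 0.5 × 0.1 × 0.5 = 0.0225, and an invalid string such as `["remove_1", "release_1"]` scores 0.0. A full system would also need the error-penalty rules so that noisy symbol streams still receive a nonzero score.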
Expectation Grammars Summary
  Sensory Input: Video
  Pre-processing: Blobs, interaction events
  Pre-conceptual Reasoning: Object IDs
  Semantic Reasoning: Stochastic parser
  Memory: Parse tree
  Learning: None (apart from the background model)
  Given Knowledge: Grammar, scene model rules
  Action: Report best interpretation
  Feedback: from high-level interpretation to low-level decisions
Contributions
  Showed activity recognition and decomposition without appearance models
  Demonstrated the usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions
  Extended the representational power of SCFGs with parameters and abstract scene models
Lessons
  Efficient error recovery is important for realistic domains
  All sources of information should be included (e.g., appearance models)
  Concurrency and partial ordering are common, so they should be easy to represent
  Temporal constraints are not the only kind of action relationship (e.g., causal, statistical)
Representational Issues
  Extend temporal relations
    Concurrency
    Partial ordering
    Quantitative relationships
  Causal (not just temporal) relationships
  Parameterized activities