CS 664, Session 19
General architecture
Minimal Subscene
Working definition: the smallest set of objects, actors and actions in a dynamic visual scene that are relevant to present behavior.
For now we will assume:
- Bottom-up: objects/actors/actions must be visible.
- Top-down: relevance to present behavior is explicitly specified, e.g., by posing a question or specifying a task.
- Knowledge base: the system may supplement explicit knowledge with long-term acquired knowledge.
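As a minimal sketch, the working definition can be read as a filter over the scene. An item is kept only if it is visible (bottom-up) and relevant to the task (top-down), and a knowledge base may widen what counts as relevant. The `Entity` type, the `minimal_subscene` function, the dictionary used as a knowledge base and the toy scene are all hypothetical illustrations. They are not the representation used by the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One object, actor or action in the scene (hypothetical representation)."""
    name: str
    kind: str      # "object", "actor" or "action"
    visible: bool  # bottom-up constraint: must be visible


def minimal_subscene(scene, task_terms, knowledge_base=None):
    """Return the visible entities that are relevant to the task.

    task_terms: names explicitly specified by the question/task (top-down).
    knowledge_base: optional dict mapping a term to related terms; it stands
    in (as an assumption) for long-term acquired knowledge.
    """
    relevant = set(task_terms)
    if knowledge_base:
        for term in task_terms:
            relevant.update(knowledge_base.get(term, ()))
    # Keep only entities that satisfy both the bottom-up and top-down constraints.
    return {e for e in scene if e.visible and e.name in relevant}


scene = [
    Entity("man", "actor", True),
    Entity("hammer", "object", True),
    Entity("hammering", "action", True),
    Entity("tree", "object", True),
    Entity("nail", "object", False),  # occluded, so excluded bottom-up
]
kb = {"hammering": ["hammer", "nail", "man"]}
sub = minimal_subscene(scene, ["hammering"], kb)
print(sorted(e.name for e in sub))  # ['hammer', 'hammering', 'man']
```

The knowledge base adds "nail" as relevant, but the occluded nail is still excluded. The visible but irrelevant tree is excluded as well.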
Motivation: Humans
Eye movements over the same picture differ depending on the task the observer is given:
1) Free examination
2) Estimate the material circumstances of the family
3) Give the ages of the people
4) Surmise what the family had been doing before the arrival of the “unexpected visitor”
5) Remember the clothes worn by the people
6) Remember the positions of the people and objects
7) Estimate how long the “unexpected visitor” had been away from the family
Yarbus, 1967
“Beobot”
Visual Attention
see
Object Recognition
Riesenhuber & Poggio, Nat Neurosci, 1999 (MIT)
Action Recognition
Oztop & Arbib, 2001
Start:
- Issue question
- Parse question
- Extract keywords
- Expand to related concepts, using an ontology/knowledge base (KB)
- Fill initial “task list”
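The start-up steps can be sketched as a small pipeline running from question to initial task list. This is only a sketch. The tokenizer is a regular expression, not a real parser. `STOP_WORDS` and `ONTOLOGY` are hypothetical toy stand-ins for keyword extraction and for the ontology/KB. The function names are illustrative only.

```python
import re

# Hypothetical stop-word list and toy ontology; a real system would use a
# proper parser and a knowledge base.
STOP_WORDS = {"who", "is", "the", "a", "an", "what", "with", "to", "whom", "doing", "of"}
ONTOLOGY = {
    "hammering": {"hammer", "nail", "hand"},
    "driving": {"car", "road", "steering wheel"},
}


def parse_question(question):
    """Parse: tokenize the question into lowercase words."""
    return re.findall(r"[a-z]+", question.lower())


def extract_keywords(tokens):
    """Extract keywords: drop stop words, keep order, remove duplicates."""
    keywords = []
    for t in tokens:
        if t not in STOP_WORDS and t not in keywords:
            keywords.append(t)
    return keywords


def expand(keywords, ontology):
    """Expand each keyword with its related concepts from the ontology/KB."""
    concepts = set(keywords)
    for k in keywords:
        concepts |= ontology.get(k, set())
    return concepts


def initial_task_list(question, ontology=ONTOLOGY):
    """Fill the initial task list from the issued question."""
    return expand(extract_keywords(parse_question(question)), ontology)


print(sorted(initial_task_list("Who is hammering the nail?")))
# ['hammer', 'hammering', 'hand', 'nail']
```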
Task list
Working list of currently relevant objects/actors/actions:
- Initially empty
- The question/task specification provides the initial filling-in
- As the scene is scanned and objects/actors/actions are recognized, the contents of the task list are updated
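One way to picture the task list is as a small data structure that starts empty, is filled from the task, and is updated as recognitions come in. The update rule below is an assumption, because the slide does not specify one. Only recognized entities already on the list are recorded, and finding one adds its related concepts as new pending items. The `TaskList` class and its `related` mapping are hypothetical.

```python
class TaskList:
    """Working list of currently relevant objects/actors/actions (hypothetical sketch)."""

    def __init__(self, related=None):
        self.pending = set()  # relevant but not yet found in the scene
        self.found = {}       # name -> location where it was recognized
        self.related = related or {}

    def fill(self, concepts):
        """Initial (or later) filling-in from the question/task specification."""
        self.pending |= set(concepts) - set(self.found)

    def on_recognized(self, name, location):
        """Update after the recognizer reports `name` at `location`.

        Assumed update rule: only entities already on the list are recorded;
        finding one adds its related concepts as new pending items.
        """
        if name not in self.pending:
            return False
        self.pending.discard(name)
        self.found[name] = location
        self.fill(self.related.get(name, ()))
        return True


tasks = TaskList(related={"man": {"hammer"}})  # starts empty
tasks.fill({"man"})                  # filled from the question
tasks.on_recognized("tree", (5, 5))  # ignored: not relevant
tasks.on_recognized("man", (2, 3))   # recorded; "hammer" becomes pending
print(tasks.pending, tasks.found)    # {'hammer'} {'man': (2, 3)}
```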
“Where”: attention, saliency map and task map
- Input: video stream
- Low-level vision: massively parallel extraction of simple visual features from the video input
- Saliency map: localizes conspicuous (potentially interesting) objects irrespective of why they are salient
- Task map: acts as a spatial filter on the saliency map; only locations in the current minimal subscene pass through easily. Other locations need to be exceptionally salient to pass through.
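The task map's role as a spatial filter can be sketched with two thresholds. Locations inside the minimal subscene need only moderate saliency. Locations outside it must exceed a much higher "exceptional" level. The threshold values are assumptions, and so is the plain-list map representation. Picking the maximum (winner-take-all) and suppressing each attended location to produce an attentional trajectory are simplified stand-ins, named here as such. They are not taken from the slide.

```python
def gate_saliency(saliency, task_map, threshold=0.2, exceptional=0.9):
    """Apply the task map as a spatial filter to the saliency map.

    Locations with task_map True (inside the minimal subscene) pass when
    saliency >= threshold; other locations must reach `exceptional`.
    Both thresholds are assumed values. Non-passing locations become 0.0.
    """
    gated = []
    for sal_row, task_row in zip(saliency, task_map):
        row = []
        for s, relevant in zip(sal_row, task_row):
            passes = s >= (threshold if relevant else exceptional)
            row.append(s if passes else 0.0)
        gated.append(row)
    return gated


def next_fixation(gated):
    """Winner-take-all: the most salient surviving location, or None."""
    best, best_loc = 0.0, None
    for y, row in enumerate(gated):
        for x, s in enumerate(row):
            if s > best:
                best, best_loc = s, (y, x)
    return best_loc


def attentional_trajectory(gated, n_fixations):
    """Sequence of fixations; each attended location is then suppressed
    (a simplified, assumed inhibition of return)."""
    remaining = [row[:] for row in gated]
    path = []
    for _ in range(n_fixations):
        loc = next_fixation(remaining)
        if loc is None:
            break
        path.append(loc)
        y, x = loc
        remaining[y][x] = 0.0
    return path


saliency = [
    [0.1, 0.8, 0.0],
    [0.3, 0.0, 0.95],
]
task_map = [
    [False, False, False],
    [True, False, False],
]
gated = gate_saliency(saliency, task_map)
print(gated)                             # [[0.0, 0.0, 0.0], [0.3, 0.0, 0.95]]
print(attentional_trajectory(gated, 3))  # [(1, 2), (1, 0)]
```

The location at 0.8 is fairly salient but falls outside the task map, so it is blocked. The location at 0.95, also outside the task map, is salient enough to pass and is attended first.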
“What” memory
- Relates concepts to visual properties
- Bridge between visual and semantic knowledge
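As a bridge in both directions, a "what" memory can be sketched as a table that links each concept to a prototype vector of visual properties. Going from semantic to visual, it tells the system which features to look for. Going from visual to semantic, the nearest prototype names what was seen. The feature dimensions (redness, elongation, size), the prototype values and the nearest-prototype matching are all hypothetical choices made for illustration.

```python
import math

# Hypothetical "what" memory: each concept is linked to a prototype vector of
# simple visual properties (redness, elongation, size), each in [0, 1].
WHAT_MEMORY = {
    "hammer": (0.2, 0.9, 0.3),
    "apple": (0.9, 0.1, 0.1),
    "car": (0.5, 0.6, 0.9),
}


def expected_features(concept, memory=WHAT_MEMORY):
    """Semantic -> visual: which visual properties to look for (None if unknown)."""
    return memory.get(concept)


def best_concept(features, memory=WHAT_MEMORY):
    """Visual -> semantic: the concept whose prototype is nearest (Euclidean)."""
    return min(memory, key=lambda c: math.dist(features, memory[c]))


print(expected_features("apple"))      # (0.9, 0.1, 0.1)
print(best_concept((0.8, 0.2, 0.15)))  # apple
```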
General architecture
Examples / experiments
Examine the video clips. For each scene, please write down:
- Most salient object
- Most salient action
- Minimal subscene
- Who is doing what to whom
Scene 001
Scene 001 – Attentional Trajectory
Scene 002
Scene 002 – Attentional Trajectory
Scene 003
Scene 003 – Attentional Trajectory
Scene 004
Scene 004 – Attentional Trajectory
Scene 005
Scene 005 – Attentional Trajectory
Scene 006
Scene 006 – Attentional Trajectory
Scene 007
Scene 007 – Attentional Trajectory
Scene 008
Scene 008 – Attentional Trajectory
Scene 009
Scene 009 – Attentional Trajectory
Scene 010
Scene 010 – Attentional Trajectory
Scene 011
Scene 011 – Attentional Trajectory
Scene 012
Scene 012 – Attentional Trajectory
Scene 013
Scene 013 – Attentional Trajectory
Scene 014
Scene 014 – Attentional Trajectory
Scene 015
Scene 015 – Attentional Trajectory
Scene 016
Scene 016 – Attentional Trajectory
Scene 017
Scene 017 – Attentional Trajectory