Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems. Brief presentation of the ICMI '03 N. Oliver & E. Horvitz paper, by Nikolaos Mavridis.


Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems. Brief Presentation of the ICMI '03 N. Oliver & E. Horvitz paper. Nikolaos Mavridis, Feb '02

Introduction
The menu for today:
- An application that served as testbed & excuse
- The architecture of the recognition engines used
- Two varieties of selective perception
- Results
- Big ideas
- An intro to RESOLVER
The main big idea: NO NEED TO NOTICE AND PROCESS EVERYTHING, ALWAYS!

The Application
SEER: a multimodal system for recognizing office activity.
General setting: a basic requirement for visual surveillance and multimodal HCI is the provision of rich, human-centric notions of context in a tractable manner…
Prior work: mainly particular scenarios (waving the hand etc.), HMMs, dynamic Bayesian networks (DynBN).
Output categories:
- PC = Phone Conversation
- FFC = Face-to-Face Conversation
- P = Presentation
- O = Other Activity
- NP = Nobody Present
- DC = Distant Conversation (out of field of view)
Inputs:
- Audio: PCA of LPC coefficients, energy, μ and σ of the fundamental frequency ω0, zero-crossing rate
- Audio localisation: Time Delay of Arrival (TDOA)
- Video: skin color, motion, foreground and face densities
- Mouse & keyboard: histories of 1, 5 and 60 sec of activity
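The per-frame inputs above can be collected into a single observation record. A minimal sketch, assuming hypothetical field names of my own (not the paper's actual feature-extraction code):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One frame of multimodal observations, SEER-style (illustrative)."""
    # Audio features
    lpc_pca: list        # PCA-projected LPC coefficients
    energy: float
    f0_mean: float       # μ of the fundamental frequency
    f0_std: float        # σ of the fundamental frequency
    zcr: float           # zero-crossing rate
    tdoa: float          # sound localisation: time delay of arrival
    # Video features (densities over the image)
    skin_density: float
    motion_density: float
    foreground_density: float
    face_density: float
    # Mouse & keyboard activity over the last 1, 5 and 60 seconds
    km_history: tuple = (0.0, 0.0, 0.0)
```

Grouping the features by modality like this also makes it natural for a selective-perception policy to switch whole feature groups on and off.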

Recognition Engine
Recognition engine: LHMM (Layered HMM!)
First level: parallel discriminative HMMs, one bank per modality:
- Audio: human speech, music, silence, noise, phone ring, keyboard typing
- Video: nobody, static person, moving person, multiple people
Second level:
- Input: outputs of the above + derivative of the sound localisation + keyboard histories
- Output: PC, FFC, P, O, NP, DC; longer temporal extent!
Selective perception strategies are usable at both levels:
- Selecting which features to feed into the HMMs, e.g. motion & skin density for one active person, skin density & face detection for multiple people
- Also at the second stage: selecting which first-stage HMMs to run…
HMMs vs. LHMMs:
- Compared to CP HMMs (Cartesian product, one long feature vector), prior knowledge about the problem is encoded in the LHMM's structure, i.e. decomposition into smaller subproblems -> less training required, more filtered output for the second stage, and only the first level needs retraining!
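A toy sketch of the layered idea, with made-up two-state HMMs and quantised observations rather than SEER's actual models: each first-level HMM filters its own modality, and their MAP outputs are re-quantised into the symbol stream of a second-level activity HMM.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm with per-step normalisation.
    pi: (S,) initial state probs; A: (S, S) transition matrix;
    B: (S, O) discrete emission probs; obs: sequence of symbol indices.
    Returns the filtered state posterior after the last observation."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()
    return alpha

# Layer 1: one small HMM per modality (toy parameters, 2 states each).
audio_hmm = dict(pi=np.array([.5, .5]),
                 A=np.array([[.9, .1], [.2, .8]]),   # speech <-> silence
                 B=np.array([[.8, .2], [.1, .9]]))   # over 2 audio symbols
video_hmm = dict(pi=np.array([.5, .5]),
                 A=np.array([[.8, .2], [.3, .7]]),   # person <-> nobody
                 B=np.array([[.7, .3], [.2, .8]]))   # over 2 video symbols

audio_obs = [0, 0, 1]   # quantised audio features, one symbol per frame
video_obs = [0, 1, 1]   # quantised video features

a_post = forward(**audio_hmm, obs=audio_obs)
v_post = forward(**video_hmm, obs=video_obs)

# Layer 2: the MAP outputs of layer 1 form one joint symbol
# (2 audio classes x 2 video classes = 4 symbols) that drives a
# higher-level activity HMM (here just 2 toy activities).
joint_symbol = int(np.argmax(a_post)) * 2 + int(np.argmax(v_post))
act_hmm = dict(pi=np.array([.5, .5]),
               A=np.array([[.9, .1], [.1, .9]]),
               B=np.full((2, 4), .25))  # uninformative toy emissions
activity_post = forward(**act_hmm, obs=[joint_symbol])
```

The decomposition is visible even in the toy version: each first-level HMM can be retrained or switched off independently, and the second level only ever sees the small filtered symbol alphabet.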

Selective Perception Strategies
Why sense everything and compute everything, always?!?
Two approaches:
- EVI: Expected Value of Information (a la RESOLVER). Decision theory and uncertainty reduction: the EVI is computed for different overlapping feature subsets, in real time, at every frame, with a greedy, one-step-lookahead approach for computing the next best set of observations to evaluate.
- Rate-based perception (somewhat similar to RIP BEHAVIOR): policies defined heuristically, specifying observational frequencies and duty cycles for each computed feature.
Two baselines for comparison:
- Compute everything!
- Randomly select feature subsets.

Expected Value of Information Endowing the perceptual system with knowledge of the value of action in the world …

Expected Value of Information
But what we are really interested in is what we have to gain! Thus the net expected value of a sensing action is its expected value of information minus its cost, where we also account for:
- What we would get given no sensing at all (the value of acting on the current belief)
- The cost of sensing; but we have to map cost and utility to the same currency!
An HMM-ised implementation is used.
Richer cost models:
- Non-identity U matrix
- Constant vs. activity-dependent costs (what else is running?): successful results! (no significant decrease in accuracy ;-))
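The EVI computation can be sketched as follows; the utility matrix, likelihoods, and costs below are illustrative placeholders rather than the paper's values, and the update is the generic one-step Bayesian lookahead, not the paper's HMM-ised version:

```python
import numpy as np

def expected_utility(belief, U):
    """Utility of acting on a belief: pick the action maximising
    expected utility under U[action, true_state]."""
    return (U @ belief).max()

def evi(belief, likelihoods, U, cost):
    """Net one-step Expected Value of Information of evaluating a
    feature set. likelihoods[o, s] = p(observation o | state s).
    Net EVI = expected post-observation utility - current utility - cost."""
    base = expected_utility(belief, U)
    gain = 0.0
    for lik in likelihoods:              # enumerate possible outcomes o
        p_o = float(lik @ belief)        # predictive prob of outcome o
        if p_o == 0.0:
            continue
        post = lik * belief / p_o        # Bayesian belief update
        gain += p_o * expected_utility(post, U)
    return gain - base - cost

def greedy_select(belief, feature_sets, U):
    """Greedy one-step lookahead: pick the feature subset with highest
    net EVI; return None if no subset is worth its cost."""
    best_name, best_val = None, 0.0
    for name, (lik, cost) in feature_sets.items():
        v = evi(belief, lik, U, cost)
        if v > best_val:
            best_name, best_val = name, v
    return best_name

# Toy usage: three candidate activities, utility = accuracy (identity U),
# one informative-but-costly sensor vs one uninformative cheap one.
belief = np.array([0.4, 0.3, 0.3])
U = np.eye(3)
feature_sets = {
    "motion": (np.array([[0.9, 0.05, 0.05],
                         [0.05, 0.9, 0.05],
                         [0.05, 0.05, 0.9]]), 0.05),
    "noise":  (np.full((3, 3), 1 / 3), 0.01),
}
chosen = greedy_select(belief, feature_sets, U)   # -> "motion"
```

Note how the uninformative sensor comes out with a negative net EVI (its posterior equals the prior, so only the cost remains), which is exactly the case where the policy decides not to sense at all.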

Rate-based Perception
Simple idea: fix, heuristically and in advance, how often each feature is computed and for how long. In this case there is no online tuning of the rates… and the scheme doesn't capture sequential prerequisites etc.
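A rate-based policy can be sketched as a fixed schedule of periods and duty cycles; the feature names and rates below are invented for illustration, not taken from the paper:

```python
def make_schedule(policies, n_frames):
    """policies: {feature: (period, duty)}, meaning 'out of every
    `period` frames, compute this feature for the first `duty` frames'.
    Returns the list of active features for each frame."""
    schedule = []
    for t in range(n_frames):
        active = [f for f, (period, duty) in policies.items()
                  if t % period < duty]
        schedule.append(active)
    return schedule

# Hypothetical per-feature rates (period, duty) in frames:
policies = {"skin_color": (1, 1),    # every frame
            "motion":     (2, 1),    # every other frame
            "face_det":   (10, 2)}   # 2 frames out of every 10
schedule = make_schedule(policies, 20)
```

Because the rates are fixed offline, the schedule never adapts to what is currently happening in the scene, which is precisely the limitation noted above relative to EVI.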

Results
EVI: no significant performance decrease, with much less computational cost! Also effective in the activity-dependent cost mode. And even more to be gained!

Take-home Message: Big Ideas
- No need to sense & compute everything, always!
- In essence we have a planner: a planner for goal-based sensing and cognition!
- Not only useful for AI: the approach might be useful for computational modeling of human performance, too…
- Simple satisficing works: no need for fully-optimised planning; with some precautions, one step ahead with many approximations is sufficient, and this is ALSO more plausible for humans! (ref: Ullman)
- Easy co-existence with other goal-based modules: we just need a method for distributing time-varying costs of sensing and cognising actions (a centralised stock market?)
- As a future direction: time-decreasing confidence is mentioned.