Using Acoustic Signal for Enhancing Situational Awareness in SAFIRE

Presentation transcript:

Using Acoustic Signal for Enhancing Situational Awareness in SAFIRE
Dmitri V. Kalashnikov

Speaker notes: My name is … I am a research scientist here at UCI working on the speech component of the SAFIRE project.

SA Apps

Alerts
- Purpose: alerts the IC when certain events happen.
- Captures firefighter conversations.
- E.g., if a conversation mentions "victim", an alert is raised.

Conversation Monitoring & Playback
- Purpose: allows the IC to quickly locate and play back speech blocks that might contain critical info, by visualizing multiple firefighter conversations.

Image & Video Tagging
- Purpose: allows firefighters to capture images of a crisis site and annotate them with important tags using a speech interface.
- The images are then triaged to the IC for analysis.

Spatial Messaging
- Purpose: allows firefighters to leave spatial messages via a speech interface.
- E.g., "This room is clear": anyone walking into this room will get the message.

Localization via Speech
- Purpose: creates an additional firefighter localization capability.
- GPS does not work well indoors.
- E.g., "I'm near room 101 on the 4th floor".

Speaker notes: Sharad has briefly described the five speech-related SA apps we have thought of. Let me provide a few more details on them. The purpose of the alert application is to…

Core Challenge (for ongoing projects)

Recognition quality is the bottleneck: recognition quality is poor in noisy, realistic environments.

Speech: "This is a bad sentence"
Speech recognizer output: "This is a bed sun tan"

Different Goals of ASR & SA Applications

Recognition (ASR)
- Example: "This is a bad sentence" is recognized as "This is a bed sun tan".
- Quality metric: Word Error Rate (WER).

Acoustic tagging & retrieval (SA applications)
- Goal: retrieve correctly. A retrieval algorithm answers queries over a DB built from the recognizer output.
- Quality metric: precision, recall, and F-measure of the returned images or activated triggers.

It is possible to build a good retrieval system on uncertain data: a poor (high) WER does not imply poor retrieval and SA quality.
Observe: errors in words that are not in triggers do not matter.
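
The gap between the two metrics can be illustrated with a small sketch. The WER function below is the standard word-level edit distance divided by the reference length. The second utterance and the trigger set are hypothetical and only serve to show that misrecognized non-trigger words leave the alerts unchanged.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def fired_triggers(text: str, triggers: set) -> set:
    """Return the trigger words that occur in the text."""
    return triggers & set(text.lower().split())


# Example from the slide: 3 word errors against a 5-word reference.
slide_wer = word_error_rate("this is a bad sentence", "this is a bed sun tan")

# Hypothetical utterance: two words are misrecognized, but not the trigger.
spoken = "we found a victim near the stairs"
recognized = "we fund a victim near this stairs"
triggers = {"victim", "fire"}
same_alerts = fired_triggers(spoken, triggers) == fired_triggers(recognized, triggers)
```

In the hypothetical utterance the WER is 2/7, yet the set of fired triggers is identical for the spoken and the recognized text.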

Approach to Building SA Applications

Recognizers offer alternatives: the "N-best list". The N-best lists coming from the speech recognizer are stored in a probabilistic DB.

Utterance | Fire | Emergency | Victims | Dispatch | …
1st alternative | Hire 0.6 | A merchant sea 0.6 | Evict him 0.5 | This patch 0.8
2nd alternative | Fryer 0.5 | Emerging sea 0.55 | With him 0.45 | Dispatch 0.7
3rd alternative | Fire 0.4 | Emergency 0.5 | Victim 0.4 | His batch 0.6
…

Most recognizers give a ranked list of alternatives for every utterance. Shown above is one example of alternatives at the word level.

Possible representations range from high precision with low recall to high recall with low precision. Choose a representation that maximizes the performance of the application (e.g., maximizes precision and recall).

Key issue: accurately estimate P(W in utterance), for all W in Q.
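
A minimal sketch of the precision/recall trade-off, using the N-best lists from the slide. It assumes, for illustration only, that the recognizer scores can be used directly as P(W in utterance); the function name `alerts` and the thresholds are hypothetical.

```python
# N-best lists from the slide: spoken utterance -> [(alternative, score), ...]
N_BEST = {
    "fire": [("hire", 0.6), ("fryer", 0.5), ("fire", 0.4)],
    "emergency": [("a merchant sea", 0.6), ("emerging sea", 0.55), ("emergency", 0.5)],
    "victims": [("evict him", 0.5), ("with him", 0.45), ("victim", 0.4)],
    "dispatch": [("this patch", 0.8), ("dispatch", 0.7), ("his batch", 0.6)],
}
QUERY = {"fire", "emergency", "victim", "dispatch"}  # trigger words Q


def alerts(n_best, query, threshold, top_k=None):
    """Trigger words whose score reaches the threshold in some utterance."""
    raised = set()
    for alternatives in n_best.values():
        # top_k=None keeps the whole list; top_k=1 keeps only the best guess.
        for text, score in alternatives[:top_k]:
            if text in query and score >= threshold:
                raised.add(text)
    return raised


top1 = alerts(N_BEST, QUERY, threshold=0.0, top_k=1)  # best guess only
loose = alerts(N_BEST, QUERY, threshold=0.4)           # whole list, low bar
strict = alerts(N_BEST, QUERY, threshold=0.65)         # whole list, high bar
```

Keeping only the top alternative misses every trigger in this example, while a low threshold over the full list recovers all four; on real data a low threshold would also admit more false alerts.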

Estimating P(W in Utterance): Learning

Convert the confidence levels output by the recognizer into probabilities.

Recognizer output (confidence levels):
Utterance | Fire | Emergency | Victims | Dispatch
1st alternative | Hire 0.6 | A merchant sea 0.6 | Evict him 0.5 | This patch 0.8
2nd alternative | Fryer 0.5 | Emerging sea 0.55 | With him 0.45 | Dispatch 0.7
3rd alternative | Fire 0.4 | Emergency 0.5 | Victim 0.4 | His batch 0.6
…

Learned probabilities:
Word | Probability
Hire | 0.4
Fryer | 0.3
Fire | 0.2
…
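
The slide does not say how the mapping from confidence to probability is learned. One common option is histogram binning, sketched below as an assumption rather than as the method used in SAFIRE: each confidence bin is mapped to the fraction of hypotheses in that bin that turned out to be correct. The training pairs are hypothetical.

```python
from collections import defaultdict


def fit_calibration(samples, n_bins=5):
    """Map each confidence bin to the observed fraction of correct hypotheses.

    samples: iterable of (confidence in [0, 1], was_correct) pairs.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for confidence, was_correct in samples:
        b = min(int(confidence * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += int(was_correct)
    return {b: hits[b] / totals[b] for b in totals}


def to_probability(confidence, table, n_bins=5):
    """Look up the calibrated probability; fall back to the raw confidence."""
    b = min(int(confidence * n_bins), n_bins - 1)
    return table.get(b, confidence)


# Hypothetical labeled data: (recognizer confidence, hypothesis was correct?)
TRAINING = [(0.62, False), (0.65, True), (0.68, False), (0.61, False), (0.7, True),
            (0.45, False), (0.5, True), (0.55, False), (0.42, False)]
table = fit_calibration(TRAINING)
p_calibrated = to_probability(0.63, table)  # bin with 2 correct out of 5
p_unseen_bin = to_probability(0.05, table)  # no training data: raw value kept
```

With these made-up samples a confidence of 0.63 maps to a probability of 0.4, which shows how a recognizer can be systematically overconfident.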

Estimating P(W): Combining Recognizers

Exploit multiple recognizers to estimate the probability.

Recognizer 1:
Utterance | Fire | Emergency | Victims | Dispatch
1st alternative | Hire 0.6 | A merchant sea 0.6 | Evict him 0.5 | This patch 0.8
2nd alternative | Fryer 0.5 | Emerging sea 0.55 | With him 0.45 | Dispatch 0.7
3rd alternative | Fire 0.4 | Emergency 0.5 | Victim 0.4 | His batch 0.6
…

Recognizer 2:
Utterance | Fire | Emergency | Victims | Dispatch
1st alternative | Hire 0.5 | A merchant sea 0.6 | Victory 0.5 | This patch 0.3
2nd alternative | Flyer 0.1 | Emerging sea 0.45 | Victim 0.4 | Dispatch 0.7
3rd alternative | Fire 0.8 | Emergency 0.6 | With him 0.45 | His batch 0.6
…

After merging:
Word | Probability
Hire | 0.3
Fryer | 0.2
Fire | …
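
The merging rule is not given on the slide. As one simple possibility, the sketch below takes a weighted average of the scores from each recognizer and treats a word missing from a list as score 0. The function name and weights are hypothetical, and the resulting numbers are not expected to match the merged table on the slide.

```python
def merge_recognizers(outputs, weights=None):
    """Weighted average of per-word scores across recognizers.

    outputs: list of dicts {alternative: score}, one dict per recognizer.
    A word missing from a recognizer's list contributes a score of 0.
    """
    weights = weights or [1.0] * len(outputs)
    total = sum(weights)
    words = set().union(*outputs)
    return {
        w: sum(wt * out.get(w, 0.0) for wt, out in zip(weights, outputs)) / total
        for w in words
    }


# Alternatives for the utterance "Fire" from the two recognizers on the slide.
recognizer_1 = {"hire": 0.6, "fryer": 0.5, "fire": 0.4}
recognizer_2 = {"hire": 0.5, "flyer": 0.1, "fire": 0.8}
merged = merge_recognizers([recognizer_1, recognizer_2])
best = max(merged, key=merged.get)
```

With equal weights, "fire" ends up ahead of "hire" even though the first recognizer ranked it last, which is the kind of correction that combining recognizers is meant to enable.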

Estimating P(W): Using Semantics

Exploit semantics to adjust the probabilities.

Before:
Utterance | Fire | Emergency | Victims | Dispatch
1st alternative | Hire 0.6 | A merchant sea 0.6 | Evict him 0.5 | This patch 0.8
2nd alternative | Fryer 0.5 | Emerging sea 0.55 | With him 0.45 | Dispatch 0.7
3rd alternative | Fire 0.4 | Emergency 0.5 | Victim 0.4 | His batch 0.6
…

After (↑ raised, ↓ lowered):
Utterance | Fire | Emergency | Victims | Dispatch
1st alternative | Hire 0.6 | A merchant sea 0.6 | Evict him 0.5 | This patch 0.8
2nd alternative | Fryer 0.1↓ | Emerging sea 0.2↓ | With him 0.45 | Dispatch 0.7
3rd alternative | Fire 0.8↑ | Emergency 0.8↑ | Victim 0.4 | His batch 0.6
…

Word | Probability before | Probability after
Hire | 0.6 | 0.6
Fryer | 0.5 | 0.1↓
Fire | … | 0.8↑
…

One SA Application in More Detail

Pipeline: Acoustic Capture (speech, voice, ambient noise) -> Acoustic Analysis -> SA Applications

SA applications:
- Alerts Processing
- Conversation Monitoring & Playback
- Image & Video Tagging
- Spatial Messaging
- Localization via Speech

Types of acoustic analysis:
- Human speech: who spoke to whom about what, from where, and when
- Ambient sounds: explosions, loud sounds, screaming, etc.
- Physiological events: cough, gag, excited state of speaker, slurring, …
- Other features: too loud, too quiet for too long, …

Purpose of Image Tagging

1. Take a picture of an incident.
2. Speak the tags, e.g., "chemical spill nitric acid".
3. Apply the speech recognizer, which will suggest alternatives for each utterance (N-best list):

Spoken tag | chemical | spill | nitric | acid
Alternatives | chemical | spill | nitric | acid
 | physical | still | citric | placid
 | lexical | mill | cyclic | AC

4. Disambiguate among the choices by using a semantic model of how these words have been used in the past.

Tagging Images Using Speech

Challenge: the correctness of tags depends on the quality of the speech recognizer!

Components shown in the diagram:
- User
- Speech & image
- Speech recognizer
- N-best lists
- Disambiguator
- Semantic knowledge
- Image & tags
- Image database
- Interface for image retrieval

Overview of Solution

N-best lists -> Enumerating Possible Sequences -> Computing Score for Each Sequence -> Choosing Sequence -> Results (a sequence of tags)

Enumerating possible sequences
- Smart (greedy) enumerator of possible tag sequences
- Detecting NULLs (i.e., the ground-truth tag is not present in the N-best list)

Computing score for each sequence
- Co-occurrence based score
- Probabilistic score, using Max Entropy & Lidstone's estimation

Choosing sequence
- The sequence with the highest score
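
The three steps can be sketched with an exhaustive enumerator and a simple co-occurrence score. This is a baseline under stated assumptions, not the smart (greedy) enumerator from the slide: the grouping of alternatives per tag follows one reading of the image tagging example, the co-occurrence counts are hypothetical, and NULL detection is left out.

```python
from itertools import combinations, product

# N-best alternatives per spoken tag (one reading of the image tagging example).
N_BEST = [
    ["physical", "chemical", "lexical"],
    ["still", "spill", "mill"],
    ["citric", "nitric", "cyclic"],
    ["placid", "acid", "AC"],
]

# Hypothetical co-occurrence counts learned from previously tagged images.
CO_OCCURRENCE = {
    frozenset({"chemical", "spill"}): 8,
    frozenset({"nitric", "acid"}): 9,
    frozenset({"chemical", "acid"}): 5,
    frozenset({"citric", "acid"}): 4,
    frozenset({"physical", "still"}): 1,
}


def score(sequence):
    """Sum of pairwise co-occurrence counts over all tag pairs."""
    return sum(CO_OCCURRENCE.get(frozenset(pair), 0)
               for pair in combinations(sequence, 2))


# Exhaustive enumeration: N**K candidates for K tags with N alternatives each.
candidates = list(product(*N_BEST))
best = max(candidates, key=score)
```

Even this tiny example has 3**4 = 81 candidate sequences, which is why a smarter enumerator is needed as the number of tags grows.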

Probabilistic Score (Max Entropy)

Lidstone's estimation
- Gives "good" estimates of P for short sequences w1, w2, …, wK:
  - P(wi): marginals
  - P(wi, wj): pairwise joints, for many/most
  - P(wi, wj, wk): triples, for very few

Maximum Entropy (ME)
- Estimates the joint P() from known smaller joint P()
- "No assumptions"/uniformity for unknown P()
- It is an optimization problem
- Computationally expensive
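
For reference, Lidstone's estimate adds a constant lam to every count: P(x) = (count(x) + lam) / (N + lam * V), where N is the total number of observations and V the number of possible outcomes. A minimal sketch for tag marginals follows; the counts, the vocabulary, and the value of lam are hypothetical.

```python
from collections import Counter


def lidstone(counts, vocabulary_size, lam=0.5):
    """Return a function P(x) = (count(x) + lam) / (N + lam * vocabulary_size)."""
    n = sum(counts.values())
    denom = n + lam * vocabulary_size

    def prob(x):
        return (counts.get(x, 0) + lam) / denom

    return prob


# Hypothetical tag occurrences over a small training set of images.
tag_counts = Counter({"hazard": 2, "victim": 2, "acid": 2, "ambulance": 2})
vocab = ["hazard", "victim", "acid", "ambulance", "smoke"]  # "smoke" is unseen
p = lidstone(tag_counts, vocabulary_size=len(vocab), lam=0.5)
total_mass = sum(p(w) for w in vocab)
```

The unseen tag "smoke" receives a small non-zero probability and the estimates still sum to 1 over the vocabulary. The same formula applies to pairs and triples when counts and V refer to tag pairs or triples.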

Correlation Score

Direct correlation
- Base correlation matrix B, where Bij = c(wi, wj)
- Correlation measure: Jaccard similarity

Indirect correlation
- Indirect correlation matrices: B2 = B^2, …, Bk = B^k

General correlations matrix
- Considers correlations of various sizes

Correlation graph (example):
Image 1 | Hazard, victim
Image 2 | Hazard, acid
Image 3 | Victim, ambulance
Image 4 | Ambulance, acid
…
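
A small sketch of direct and indirect correlation on the four example images from the slide. It assumes c(wi, wj) is the Jaccard similarity of the sets of images containing each tag and leaves the diagonal of B at 0; both choices are assumptions made for illustration.

```python
from itertools import combinations

# Tagged images from the slide.
IMAGES = {
    1: {"hazard", "victim"},
    2: {"hazard", "acid"},
    3: {"victim", "ambulance"},
    4: {"ambulance", "acid"},
}
WORDS = sorted(set().union(*IMAGES.values()))
INDEX = {w: i for i, w in enumerate(WORDS)}


def jaccard(a, b):
    """|images with both tags| / |images with either tag|."""
    with_a = {i for i, tags in IMAGES.items() if a in tags}
    with_b = {i for i, tags in IMAGES.items() if b in tags}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Base matrix B with B[i][j] = c(w_i, w_j); the diagonal is left at 0 here.
B = [[0.0] * len(WORDS) for _ in WORDS]
for a, b in combinations(WORDS, 2):
    B[INDEX[a]][INDEX[b]] = B[INDEX[b]][INDEX[a]] = jaccard(a, b)

B2 = matmul(B, B)  # indirect correlations through one intermediate word
h, amb = INDEX["hazard"], INDEX["ambulance"]
direct, indirect = B[h][amb], B2[h][amb]
```

"Hazard" and "ambulance" never share an image, so their direct correlation is 0, but B^2 links them through "victim" and "acid" with a value of 2/9.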

Branch and Bound Method

Motivation
- Computing ME is expensive
- Enumerating all N^K sequences is exponential
- How to scale? Branch and Bound method!

Two logical parts
- Searching part: how to move in the most promising "direction" of the search
- Bounding part: how to bound the search space and prune away unnecessary searches

Complete search tree
- Only the necessary part of it will be built/considered
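
The searching and bounding parts can be sketched as follows. This is an illustration under assumptions, not the algorithm from the talk: it uses a pairwise co-occurrence score instead of the ME score, hypothetical counts, and a simple optimistic bound in which every position pair takes its best possible value.

```python
from itertools import combinations, product

N_BEST = [
    ["physical", "chemical", "lexical"],
    ["still", "spill", "mill"],
    ["citric", "nitric", "cyclic"],
    ["placid", "acid", "AC"],
]
# Hypothetical co-occurrence counts.
CO_OCCURRENCE = {
    frozenset({"chemical", "spill"}): 8,
    frozenset({"nitric", "acid"}): 9,
    frozenset({"chemical", "acid"}): 5,
    frozenset({"citric", "acid"}): 4,
    frozenset({"physical", "still"}): 1,
}


def pair_score(a, b, co):
    return co.get(frozenset((a, b)), 0)


def branch_and_bound(n_best, co):
    """Best tag sequence under a pairwise co-occurrence score, with pruning."""
    k = len(n_best)
    best = {"score": -1, "sequence": None, "visited": 0}

    def upper_bound(prefix):
        # Each position is fixed to its chosen tag, or left open to any alternative.
        options = [[t] for t in prefix] + n_best[len(prefix):]
        return sum(max(pair_score(a, b, co) for a in options[i] for b in options[j])
                   for i, j in combinations(range(k), 2))

    def search(prefix):
        best["visited"] += 1
        if len(prefix) == k:
            # For a complete sequence the bound equals the exact score.
            best["score"], best["sequence"] = upper_bound(prefix), tuple(prefix)
            return
        # Searching part: try the most promising child first.
        children = sorted((prefix + [t] for t in n_best[len(prefix)]),
                          key=upper_bound, reverse=True)
        for child in children:
            # Bounding part: skip subtrees that cannot beat the best so far.
            if upper_bound(child) > best["score"]:
                search(child)

    search([])
    return best


result = branch_and_bound(N_BEST, CO_OCCURRENCE)
total_sequences = len(list(product(*N_BEST)))
```

On this toy input the search visits 5 nodes of the tree while the exhaustive alternative scores all 81 sequences; the bound is admissible because no pair can score more than its per-pair maximum.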

Experiments

Dataset: 60,000 annotated images from Flickr, split into 80% train + 20% test.

Experiment 1
- The Dragon recognizer is used to generate N-best lists for 120 images from the test data.
- Different noise levels were created by introducing white Gaussian noise through a speaker.
- The figure shows a significant quality improvement from using the semantics-based approach.

Metrics
- Precision = number of correct tags / number of tags returned
- Recall = number of correct tags / total number of tags
- Example: with 3 correct tags, 2 incorrect tags, and 2 NULLs, precision is 3/5 and recall is 3/7.
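
The worked example can be checked with a few lines. The sketch assumes that a NULL counts toward the total number of spoken tags but not toward the tags returned, which is the reading that reproduces 3/5 and 3/7. The F-measure is the usual harmonic mean of precision and recall.

```python
def precision_recall(correct, incorrect, nulls):
    """Precision and recall for one tagged image.

    correct / incorrect: tags the system output, split by whether they are right.
    nulls: spoken tags for which the system returned no tag.
    """
    returned = correct + incorrect
    spoken = correct + incorrect + nulls
    precision = correct / returned if returned else 0.0
    recall = correct / spoken if spoken else 0.0
    return precision, recall


precision, recall = precision_recall(correct=3, incorrect=2, nulls=2)
f_measure = 2 * precision * recall / (precision + recall)
```

For the numbers on the slide this gives precision 0.6, recall of about 0.43, and an F-measure of 0.5.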

Experiment 6: Speedup of BB Algorithm

Progress

SA Application | Status
Alerts | A prototype is implemented and integrated into SAFIRE/FICB. Research: several novel retrieval algorithms have been designed and are being evaluated. Algorithms for combining classifiers are being investigated.
Conversation Monitoring & Playback | A prototype is implemented. Integration into SAFIRE is ongoing.
Image & Video Tagging | A prototype system is implemented. Research: two new image tagging methods have been designed, and optimization techniques have been investigated as well.
Spatial Messaging | Future work.
Localization via Speech | Future work. We have extensive experience with closely related topics; possibly some of these ideas can be leveraged.