SAFIRE: Situational Awareness for Firefighters
Using Acoustic Signal for Enhancing Situational Awareness in SAFIRE
Dmitri V. Kalashnikov
High-level Overview & Vision
Type of Acoustic Analysis
− Human Speech: who spoke to whom, about what, from where, and when
− Ambient Sounds: explosions, loud sounds, screaming, etc.
− Physiological Events: cough, gag, excited state of speaker, slurring, …
− Other features: too loud, too quiet for too long, …
[Diagram: Acoustic Capture; Acoustic Analysis (Speech, Voice, Amb. Noise Processing); SA Applications (Alerts, Conversation Monitoring & Playback, Image & Video Tagging, Spatial Messaging, Localization via Speech)]
SA Apps
Alerts
– Purpose: alerts the IC when certain events happen
– Capture firefighter conversations
– E.g., if a conversation mentions “victim”, an alert is raised
Conversation Monitoring & Playback
– Purpose: allows the IC to quickly locate & play back speech blocks that might contain critical info, by visualizing multiple firefighter conversations.
Image & Video Tagging
– Purpose: allows firefighters to capture images of a crisis site and annotate them with important tags using a speech interface. The images are then triaged to the IC for analysis.
Spatial Messaging
– Purpose: allows firefighters to leave spatial messages via a speech interface
– “This room is clear”
– Anyone walking into this room will get the message.
Localization via Speech
– Purpose: creates an additional firefighter localization capability
– GPS does not work well indoors
– E.g., “I’m near room 101 on the 4th floor”
Core Challenge (for ongoing projects)
Recognition quality bottleneck
– Poor recognition quality in noisy & realistic environments
– Example: Speech “This is a bad sentence” → Speech Recognizer → Output “This is a bed sun tan”
Different Goals of ASR & SA Applications
Recognition
– Example: “This is a bad sentence” is recognized as “This is a bed sun tan”
– Quality Metric: Word Error Rate (WER)
Acoustic Tagging & Retrieval
– Query → Retrieval Algo over DB → retrieve correctly
– Quality Metric: precision, recall, F-measure of returned images / activated triggers
It is possible to build a good retrieval system on uncertain data.
A high WER does not imply low retrieval & SA quality.
Observe: errors in words that are not in triggers do not matter.
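The contrast between the two quality metrics can be illustrated with a small sketch. It is not the deck's evaluation code: the function names, the trigger set, and the second utterance are hypothetical, and the convention of returning 1.0 for an empty result set is an arbitrary choice made for the example. WER is computed as word-level edit distance divided by the reference length.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def fired_triggers(text, triggers):
    """Return the trigger words that occur in the text."""
    words = set(text.lower().split())
    return {t for t in triggers if t in words}


def precision_recall_f1(returned, relevant):
    """Set-based precision, recall and F-measure (empty sets count as 1.0)."""
    tp = len(returned & relevant)
    precision = tp / len(returned) if returned else 1.0
    recall = tp / len(relevant) if relevant else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The slide's example: two substitutions and one insertion over five words.
slide_wer = word_error_rate("This is a bad sentence", "This is a bed sun tan")

# Hypothetical utterance: the recognizer garbles only non-trigger words.
spoken = "the victim is near a bad stairwell"
recognized = "the victim is near a bed stare well"
triggers = {"victim", "explosion"}  # hypothetical trigger set

wer = word_error_rate(spoken, recognized)
p, r, f1 = precision_recall_f1(fired_triggers(recognized, triggers),
                               fired_triggers(spoken, triggers))
```

In the hypothetical utterance, three of seven words are wrong, yet the trigger “victim” still fires, so the trigger-level precision and recall are unaffected.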
Research Techniques for Enhancing Quality
Semantics
– Idea: use past data to derive models of how content has been annotated in the past.
– Use N-best lists
– Correlation analysis
– Probabilistic model based on Max Entropy
– Speed optimization techniques
Combining Recognizers
– Idea: combining the results of multiple recognizers can improve recognition quality.
– Analyze the recognizers' mutual behavior on past data
– Build a probabilistic model for combining them
Retrieval
– Idea: (1) use the fact that quality metrics are application dependent; (2) develop algorithms for retrieval under uncertainty.
– Use the given probabilistic representation
– Derive a methodology for optimal retrieval
Approach to Building SA Applications
– Utterances: N-best lists coming from the speech recognizer
– Recognizers offer alternatives (“N-best list”)
– Representation options (diagram labels): high precision / low recall; high recall / low precision; Probabilistic DB
– Choose a representation that maximizes the performance of the application (e.g., maximizes precision and recall)
Key Issue: accurately estimate P(W in utterance), for all W in Q
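A minimal sketch of retrieval over a probabilistic representation, under assumptions the slide does not state: each N-best alternative carries a probability, the alternatives are treated as mutually exclusive, and P(W in utterance) is taken as the summed probability of the alternatives containing W. The function names, the threshold rule, and the data are hypothetical.

```python
def prob_word_in_utterance(nbest, word):
    """Sum the probabilities of the alternatives that contain `word`.

    `nbest` is a list of (hypothesis_text, probability) pairs, assumed to be
    mutually exclusive alternatives whose probabilities sum to at most 1.
    """
    return sum(p for text, p in nbest if word in text.lower().split())


def retrieve(db, query_words, threshold):
    """Return ids of utterances where every query word has P >= threshold."""
    return {uid for uid, nbest in db.items()
            if all(prob_word_in_utterance(nbest, w) >= threshold
                   for w in query_words)}


# Hypothetical probabilistic DB: utterance id -> N-best list with probabilities.
db = {
    "u1": [("victim on second floor", 0.6), ("victor on second floor", 0.3)],
    "u2": [("big dim hallway", 0.7), ("victim hallway", 0.2)],
}

strict = retrieve(db, ["victim"], threshold=0.5)   # leans toward precision
lenient = retrieve(db, ["victim"], threshold=0.1)  # leans toward recall
```

Raising the threshold returns fewer, more certain utterances; lowering it returns more candidates at the cost of false hits, which is the precision/recall trade-off the slide refers to.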
Estimating P(W in Utterance): Learning
Convert the confidence levels output by the recognizer into probabilities
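One common way to turn confidence levels into probabilities is to learn, from past recognitions with known ground truth, how often words in each confidence range were actually correct. The sketch below uses simple histogram binning; the slide does not say which learning method was used, so the binning approach, the function names, and the data are assumptions.

```python
def fit_calibration(history, n_bins=5):
    """Learn P(correct | confidence bin) from (confidence, was_correct) pairs."""
    hits = [0] * n_bins
    totals = [0] * n_bins
    for conf, correct in history:
        b = min(int(conf * n_bins), n_bins - 1)  # confidence assumed in [0, 1]
        totals[b] += 1
        hits[b] += int(correct)
    # Empty bins fall back to the bin midpoint (an arbitrary illustrative choice).
    return [hits[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def to_probability(conf, table):
    """Map a raw confidence to the learned probability of its bin."""
    n_bins = len(table)
    return table[min(int(conf * n_bins), n_bins - 1)]


# Hypothetical past data: (recognizer confidence, was the word correct?)
history = [(0.95, True), (0.90, True), (0.85, False), (0.92, True),
           (0.30, False), (0.35, True), (0.25, False), (0.38, False)]
table = fit_calibration(history)

p_high = to_probability(0.9, table)  # 3 of 4 high-confidence words were right
p_low = to_probability(0.3, table)   # 1 of 4 low-confidence words was right
```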
Estimating P(W): Combining Recognizers
Exploit multiple recognizers to estimate probability
[Diagram: … Merging …]
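As a rough illustration of learning how recognizers behave together on past data, the sketch below estimates, for each agreement pattern (which recognizers output a word), how often the word was truly spoken. This frequency table is a stand-in chosen for brevity, not the probabilistic model from the deck; the function name and the data are hypothetical.

```python
from collections import defaultdict


def fit_combiner(history):
    """Estimate P(word truly spoken | which recognizers output it).

    `history` holds (pattern, truly_spoken) pairs, where `pattern` is a tuple
    of booleans with one entry per recognizer.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [times correct, times seen]
    for pattern, truly_spoken in history:
        counts[pattern][0] += int(truly_spoken)
        counts[pattern][1] += 1
    return {pattern: hit / seen for pattern, (hit, seen) in counts.items()}


# Hypothetical past data for two recognizers.
history = [
    ((True, True), True), ((True, True), True),
    ((True, True), True), ((True, True), False),
    ((True, False), True), ((True, False), False),
    ((False, True), False), ((False, True), False),
    ((False, True), True), ((False, True), False),
]
model = fit_combiner(history)
```

With this toy data, a word reported by both recognizers gets a higher probability than a word reported by only one of them, and the two recognizers are not treated as equally reliable.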
Estimating P(W): Using Semantics
Exploit semantics
One SA Application in More Detail
Type of Acoustic Analysis
− Human Speech: who spoke to whom, about what, from where, and when
− Ambient Sounds: explosions, loud sounds, screaming, etc.
− Physiological Events: cough, gag, excited state of speaker, slurring, …
− Other features: too loud, too quiet for too long, …
[Diagram: Acoustic Capture; Acoustic Analysis (Speech, Voice, Amb. Noise Processing); SA Applications (Alerts, Conversation Monitoring & Playback, Image & Video Tagging, Spatial Messaging, Localization via Speech)]
Purpose of Image Tagging
1. Take a picture of an incident
2. Speak tags (e.g., “Chemical spill nitric acid”)
3. Apply the speech recognizer, which will suggest alternatives for each utterance (N-best list)
4. Disambiguate among the choices by using a semantic model of how these words have been used in the past
Challenge
Tagging Images Using Speech
Challenge: the correctness of tags depends on the quality of the speech recognizer!
[Diagram components: USER, Speech & Image, Speech Recognizer, N-best lists, Disambiguator, Semantic Knowledge, Image & Tags, Image Database, Interface for image retrieval]
Overview of Solution
Input: N-best lists
– Enumerating Possible Sequences: smart (greedy) enumerator of possible tag sequences
– Computing Score for Each Sequence
  1. Co-occurrence based score
  2. Probabilistic score
     − Using Max Entropy & Lidstone’s Estimation
– Choosing Sequence (with the highest score)
– Detecting NULLs (i.e., ground truth tag not present in N-best list)
Results: a sequence of tags
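The enumerate, score, and choose steps can be sketched as follows. To stay short, the sketch enumerates every combination exhaustively instead of using the greedy enumerator named on the slide, scores a sequence by summing pairwise co-occurrence counts (one plausible reading of a co-occurrence based score), and leaves out NULL detection. The function names and the counts are hypothetical.

```python
from itertools import combinations, product


def cooccurrence_score(sequence, cooc):
    """Sum of pairwise co-occurrence counts over all tag pairs in the sequence."""
    return sum(cooc.get(frozenset(pair), 0)
               for pair in combinations(sequence, 2))


def best_sequence(nbest_lists, cooc):
    """Score every combination of one candidate per utterance; keep the best."""
    return max(product(*nbest_lists),
               key=lambda seq: cooccurrence_score(seq, cooc))


# Hypothetical counts of how often two tags annotated the same past image.
cooc = {
    frozenset(("chemical", "spill")): 40,
    frozenset(("chemical", "acid")): 25,
    frozenset(("spill", "acid")): 15,
    frozenset(("comical", "still")): 1,
}
# One N-best list per spoken tag (hypothetical recognizer output).
nbest_lists = [["comical", "chemical"], ["still", "spill"], ["acid", "asset"]]

tags = best_sequence(nbest_lists, cooc)
```

Exhaustive enumeration visits N^K sequences for K utterances with N alternatives each, which is why a smarter enumerator is needed at scale.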
Probabilistic Score (Max Entropy)
Lidstone’s Estimation
– “Good” estimates of P for short sequences $w_1, w_2, \ldots, w_K$
– $P(w_i)$ ← Marginals
– $P(w_i, w_j)$ ← Pairwise joints, for many/most
– $P(w_i, w_j, w_k)$ ← Triples, for very few
Maximum Entropy (ME)
– Estimates joint P() from known smaller joint P()
– “No assumptions”/uniformity for unknown P()
– Optimization problem
– Computationally expensive
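Lidstone's estimate adds a constant λ to every count before normalizing: P(x) = (c(x) + λ) / (N + λ·B), where c(x) is the observed count of outcome x, N the number of observations, and B the number of possible outcomes. A minimal sketch over single tags follows; the function name, the value λ = 0.5, and the data are assumptions, and the deck may apply the estimator to pairs and triples differently.

```python
from collections import Counter


def lidstone_estimator(observations, outcomes, lam=0.5):
    """Lidstone-smoothed probabilities for every outcome in `outcomes`.

    Assumes every observation is a member of `outcomes`.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    denom = total + lam * len(outcomes)
    # Counter returns 0 for unseen outcomes, so they get lam / denom.
    return {o: (counts[o] + lam) / denom for o in outcomes}


# Hypothetical tag occurrences from past annotations.
observed_tags = ["fire", "smoke", "fire", "truck", "smoke",
                 "fire", "smoke", "truck"]
vocabulary = ["fire", "smoke", "truck", "acid"]

probs = lidstone_estimator(observed_tags, vocabulary)
```

The unseen tag “acid” receives a small non-zero probability instead of zero, which keeps products of probabilities from collapsing when a rare combination appears.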
Correlation Score
– Jaccard Similarity
– Correlation Graph: Direct Correlation, Indirect Correlation
– Base Correlation Matrix $B$, where $B_{ij} = c(w_i, w_j)$
– Indirect Correlation Matrices: $B_2 = B^2$, …, $B_k = B^k$
– General Correlations Matrix: considers correlations of various sizes
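A small sketch of direct and indirect correlation, assuming that c(w_i, w_j) is the Jaccard similarity of the sets of images tagged with each word and that the diagonal of B is zero, so that powers of B pick up paths through other words. Both assumptions, the helper names, and the data are illustrative; the slide does not specify how the matrices are combined into the general matrix, so the sketch stops at B².

```python
def jaccard(images_a, images_b):
    """|A ∩ B| / |A ∪ B| for the sets of images tagged with each word."""
    union = images_a | images_b
    return len(images_a & images_b) / len(union) if union else 0.0


def base_matrix(tagged, words):
    """Base correlation matrix B with a zero diagonal (an assumption)."""
    return [[0.0 if a == b else jaccard(tagged[a], tagged[b]) for b in words]
            for a in words]


def matmul(x, y):
    """Plain-Python matrix product, enough for small illustrative matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


# Hypothetical data: word -> ids of past images annotated with it.
tagged = {"fire": {1, 2, 3}, "smoke": {2, 3, 4}, "mask": {4, 5}}
words = ["fire", "smoke", "mask"]

B = base_matrix(tagged, words)   # direct correlations
B2 = matmul(B, B)                # indirect correlations via one other word
```

Here “fire” and “mask” never co-occur, so their direct correlation is 0, but both co-occur with “smoke”, which gives them a non-zero entry in B².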
Branch and Bound Method
Motivation
– Computing ME is expensive
– Enumerating $N^K$ sequences is exponential
– How to scale? Branch and Bound Method!
Two logical parts
1. Searching part: how to go in the most promising “direction” of the search
2. Bounding part: how to bound the search space and prune away unnecessary searches
Complete Search Tree
− Only the necessary part of it will be built/considered
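The two parts can be sketched with a depth-first search over partial tag sequences. For brevity the sketch scores sequences with a non-negative pairwise co-occurrence sum rather than the ME score, and uses a deliberately loose upper bound (every pair still to be added is assumed to reach the largest count). The function names, the bound, and the data are assumptions, not the deck's algorithm.

```python
def pair_score(a, b, cooc):
    """Co-occurrence count of two tags (0 if the pair was never seen)."""
    return cooc.get(frozenset((a, b)), 0)


def branch_and_bound(nbest_lists, cooc):
    """Return (best sequence, its score, number of expanded internal nodes)."""
    k = len(nbest_lists)
    max_pair = max(cooc.values(), default=0)
    best = {"score": -1, "seq": None, "expanded": 0}

    def search(prefix, score):
        depth = len(prefix)
        if depth == k:  # complete sequence: compare with the incumbent
            if score > best["score"]:
                best["score"], best["seq"] = score, tuple(prefix)
            return
        # Bounding: even if every pair still to be added scored max_pair,
        # could this branch beat the best complete sequence found so far?
        remaining_pairs = k * (k - 1) // 2 - depth * (depth - 1) // 2
        if score + remaining_pairs * max_pair <= best["score"]:
            return  # prune the whole subtree
        best["expanded"] += 1
        # Searching: try the candidates that add the most score first.
        gains = [(sum(pair_score(w, c, cooc) for w in prefix), c)
                 for c in nbest_lists[depth]]
        for gain, cand in sorted(gains, key=lambda g: -g[0]):
            search(prefix + [cand], score + gain)

    search([], 0)
    return best["seq"], best["score"], best["expanded"]


# Hypothetical co-occurrence counts and N-best lists.
cooc = {
    frozenset(("chemical", "spill")): 40,
    frozenset(("chemical", "acid")): 25,
    frozenset(("spill", "acid")): 15,
    frozenset(("comical", "still")): 1,
}
nbest_lists = [["comical", "chemical"], ["still", "spill"], ["acid", "asset"]]

seq, score, expanded = branch_and_bound(nbest_lists, cooc)
```

Once a complete sequence with a high score is known, subtrees whose upper bound cannot exceed it are skipped, so only part of the complete search tree is built. A tighter bound would prune more.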
Experiments
Dataset: 60,000 annotated images from Flickr. Split: 80% training + 20% test
Experiment 1:
– Use the Dragon recognizer to generate N-best lists for 120 images from the test data
– Noise levels are varied by introducing white Gaussian noise through a speaker
The figure shows a significant quality improvement from using the semantics-based approach.
Experiment 2
Quality of annotation vs. size of N-best lists
Tradeoff
– With increasing N (size of list): greater chance that the ground truth is present in the list.
– However, there are more options to disambiguate among (more uncertainty).
Experiment 3: Correlation of ME & CM scores (strong correlation)
The figure shows how often the top-1 sequence according to the ME score is contained among the top M sequences according to the CM score.
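The quantity plotted in this experiment can be computed as in the sketch below. The function name and the scores are made up for illustration; they are not results from the deck.

```python
def containment_frequency(cases, m):
    """Fraction of cases whose top-1 sequence by ME score is in the top-m by CM score.

    Each case is a list of (sequence, me_score, cm_score) triples.
    """
    hits = 0
    for seqs in cases:
        top_me = max(seqs, key=lambda s: s[1])[0]
        ranked_by_cm = sorted(seqs, key=lambda s: s[2], reverse=True)
        top_cm = [s[0] for s in ranked_by_cm[:m]]
        hits += top_me in top_cm
    return hits / len(cases)


# Hypothetical scored candidate sequences for two test images.
cases = [
    [("a", 0.9, 5), ("b", 0.5, 7), ("c", 0.1, 1)],
    [("x", 0.8, 9), ("y", 0.2, 3)],
]

freq_m1 = containment_frequency(cases, m=1)
freq_m2 = containment_frequency(cases, m=2)
```

A curve that approaches 1 for small M suggests that the cheaper score can be used to shortlist candidates before the more expensive score is computed.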
Experiment 4: Quality of BB Algorithm
Experiment 5: Quality on a Larger Dataset
Experiment 6: Speedup of BB Algorithm
Experiment 7: Multi-model Case
Progress