AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION


Caokun Yang, Yusuf Ozkan, Buket D. Barkana
Department of Electrical Engineering, School of Engineering, University of Bridgeport, Bridgeport, CT

Abstract
Acoustic scene classification systems are gaining importance because of recent advances in context-aware applications and surveillance systems. Commonly studied acoustic events include bus, beach, street, quiet street, rail station, office, lecture, launderette, football match, car, busy street, open-air market, park, restaurant, supermarket, tube, tube station, gunshot, rain, and dog barking.

Database
The database contains the nine most common suspicious acoustic events: glass breaking (GB), dog barking (DB), scream (S), gunshot (GS), explosion (E), police siren (PS), door slam (DS), footsteps (FS), and house alarm (HA) sounds. All audio files were inspected manually to ensure that exactly one event is present and that it is detectable by human listeners. The files, collected from different sources, were resampled at 44.1 kHz and digitized at 16 bits. Audio gathered by recording was captured with an Apple iPhone 7 running iOS, using the built-in sound recorder application. Monophonic versions of the audio files are used, i.e., the two channels are averaged into one.

Classification
After feature extraction, the next step is classification. In this project, we used a Gaussian mixture model (GMM) classifier to discriminate the nine suspicious sounds. First, a universal background model (UBM) is formed from the training set of 80 wave files from each of the nine classes, and a class-specific model is then derived from the UBM for each class. Next, the likelihoods of the 180 sound files in the test set were calculated. The likelihood of each sound file is compared across the classes, and the decision is based on the highest likelihood. Figures 2 and 3 show the suspicious sound recognition system.
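The channel averaging and resampling described above can be sketched as follows. This is a minimal illustration, not the poster's actual preprocessing code; the helper name `to_mono` is our own.

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def to_mono(stereo, orig_sr, target_sr=44100):
    """Average the two channels into one, then resample to target_sr."""
    mono = stereo.mean(axis=1)            # (n_samples, 2) -> (n_samples,)
    g = gcd(target_sr, orig_sr)           # reduce the rate ratio for polyphase resampling
    return resample_poly(mono, target_sr // g, orig_sr // g)

# toy example: 1 s of stereo noise at 22.05 kHz -> 44.1 kHz mono
stereo = np.random.randn(22050, 2)
mono = to_mono(stereo, orig_sr=22050)
print(mono.shape)  # (44100,)
```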
Suspicious sounds have not yet been studied and analyzed comprehensively. One reason is the lack of an open-access database containing commonly encountered suspicious sound events. Recently, the Signal Processing Research Group at the University of Bridgeport composed a database called "a database of auditory suspicious events (DASE)". This database contains the nine most common suspicious sound events: gunshot, explosion, glass breaking, dog barking, scream, house alarm, police siren, door slam, and footsteps. In this poster, we present a suspicious sound recognition system built on the DASE database, mel-frequency cepstral coefficients (MFCCs), and a Gaussian mixture model (GMM) based classifier.

Feature Extraction
As part of this study, we calculated mel-frequency cepstral coefficients (MFCCs) as the feature set. MFCCs mimic aspects of human auditory perception, namely the logarithmic perception of loudness and pitch. The steps of the MFCC calculation are shown in Figure 1: the signal is transformed into a spectrum by the fast Fourier transform; a mel-scale filterbank maps this spectrum to the mel-frequency spectrum; and the cepstral coefficients are obtained by taking the logarithm followed by the discrete cosine transform.

Gaussian Mixture Model
Figure 2: The proposed suspicious sound recognition system.
The system consists of three major stages: development, enrollment, and test. The extracted MFCC features are used to develop the UBM. A sound-class-specific GMM is adapted from the UBM using maximum a posteriori (MAP) estimation, and the expectation-maximization (EM) algorithm is used during training. At test time, the log-probability of the sound vector is computed and compared to the previously stored values, and the comparison determines the recognized sound class.
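The FFT → mel filterbank → log → DCT chain described above can be sketched for a single windowed frame. This is a simplified illustration of the standard MFCC recipe, not the poster's implementation; filter counts and coefficient counts are assumed typical values.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=44100, n_filters=26, n_coeffs=13):
    """MFCCs for one windowed frame: FFT -> mel filterbank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    n_fft = len(frame)
    # triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(spectrum)))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fbank[i, k] = (k - lo) / max(mid - lo, 1)   # rising edge
        for k in range(mid, hi):
            fbank[i, k] = (hi - k) / max(hi - mid, 1)   # falling edge
    # log mel energies, then DCT to decorrelate into cepstral coefficients
    mel_energies = np.log(fbank @ spectrum + 1e-10)
    return dct(mel_energies, type=2, norm='ortho')[:n_coeffs]

frame = np.hamming(1024) * np.random.randn(1024)
coeffs = mfcc(frame)
print(coeffs.shape)  # (13,)
```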
Figure 3: Flow chart of the system (development → UBM → adaptation → enrollment → test → scoring).
Figure 1: Mel-frequency cepstral coefficient (MFCC) calculation.

Experimental Results
Table 1: Confusion matrix (number of mixtures: 2)
Table 2: Confusion matrix (number of mixtures: 4)
Table 3: Confusion matrix (number of mixtures: 8)

Conclusions
Automatic suspicious sound recognition is of great importance in surveillance systems, since analysis of these signals provides useful information about the surrounding environment. Detection and classification of environmental sounds, acoustic scenes, and events have received increasing attention over the past decade because of their applications in surveillance systems, context-aware applications, and adaptive information systems. In this poster, we designed a baseline system to recognize the nine most common suspicious sounds: glass breaking (GB), dog barking (DB), scream (S), gunshot (GS), explosion (E), police siren (PS), door slam (DS), footsteps (FS), and house alarm (HA). We employed mel-frequency cepstral coefficients and energy as the feature set, and designed a GMM-based classifier. Among the nine sound classes, police sirens are the most commonly misclassified: our baseline system misclassifies 100% of police siren sounds as house alarms.
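The highest-likelihood decision rule used by the GMM classifier can be sketched on toy data. For brevity this sketch trains one GMM per class directly rather than adapting class models from a UBM via MAP as the poster does; the two-class data and 8-mixture setting are illustrative (8 matches Table 3).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy stand-in for MFCC feature vectors of two sound classes
train = {
    "gunshot": rng.normal(0.0, 1.0, size=(200, 13)),
    "scream":  rng.normal(4.0, 1.0, size=(200, 13)),
}
# fit one diagonal-covariance GMM per class
models = {c: GaussianMixture(n_components=8, covariance_type='diag',
                             random_state=0).fit(X)
          for c, X in train.items()}

def classify(features, models):
    """Pick the class whose GMM gives the highest average log-likelihood."""
    scores = {c: m.score(features) for c, m in models.items()}
    return max(scores, key=scores.get)

# a test clip drawn from the "scream" distribution
test_clip = rng.normal(4.0, 1.0, size=(50, 13))
print(classify(test_clip, models))  # scream
```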