AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION


Caokun Yang, Yusuf Ozkan, Buket D. Barkana
Department of Electrical Engineering, School of Engineering, University of Bridgeport, Bridgeport, CT

Database
The database contains the nine most common suspicious acoustic events: glass breaking (GB), dog barking (DB), scream (S), gunshot (GS), explosions (E), police sirens (PS), door slams (DS), footsteps (FS), and house alarm (HA) sounds. All audio files were manually inspected to make sure that only one event is present and that it is detectable by human listeners. The files, collected from different sources, are resampled at a sampling frequency of 44.1 kHz and digitized at 16 bits. The files gathered by recording were captured with an Apple iPhone 7 using the built-in iOS sound recorder application. Monophonic versions of the files are used, i.e., the two channels are averaged into one channel.

After feature extraction, the next step is classification. In this project, we used a Gaussian mixture model (GMM) classifier to discriminate the nine suspicious sounds. First, a universal background model (UBM) is formed by using the training set, 80 wave files from each of the nine classes. This UBM model is unique for each class. Later, the likelihoods of the 180 sound files in the test set were calculated. The likelihoods of each sound file are compared among the different classes, and the decision is based on the highest likelihood ratio. Figures 2 and 3 show the suspicious sound recognition system.

Abstract
Acoustic scene classification systems are gaining importance because of recent advances in context-aware applications and surveillance systems. Commonly studied acoustic events are bus, beach, street, quiet street, rail station, office, lecture, launderette, football match, car, busy street, open air market, park, restaurant, supermarket, tube, tube station, gunshot, rain, and dog barking. Suspicious sounds have not yet been studied and analyzed comprehensively.
One of the reasons is the lack of an open-access database that contains commonly encountered suspicious sound events. Recently, the Signal Processing Research Group at the University of Bridgeport composed a database called “a database of auditory suspicious events (DASE)”. This database contains the nine most common suspicious sound events: gunshot, explosion, glass breaking, dog barking, scream, house alarm, police sirens, door slams, and footsteps. In this poster, we present a suspicious sound recognition system that uses the DASE database, mel-frequency cepstral coefficients (MFCCs), and a Gaussian mixture model (GMM) based classifier.

Feature Extraction
As a part of this study, we calculated Mel-frequency cepstral coefficients (MFCCs) as a feature set. MFCCs mimic some aspects of human hearing, namely the logarithmic perception of loudness and pitch in the human auditory system. The steps of the MFCC calculation are shown in Figure 1. The signal is transformed to a spectrum by the fast Fourier transform. A Mel-scale filter bank transforms this spectrum to a Mel-frequency spectrum. The cepstral coefficients are then obtained by taking the logarithm and applying the discrete cosine transform.

Gaussian Mixture Model
Figure 2: The proposed suspicious sound recognition system

The system consists of three major stages: Development, Enrollment, and Test. The extracted MFCC features are used to develop the UBM. A sound-class-specific GMM is adapted from the UBM by using maximum a posteriori (MAP) estimation. The expectation-maximization (EM) algorithm is used during training. The log-probability of the sound vector is recalculated and compared to the previously stored values. The log-probability that is equal to the stored values provides access to the entire sounds.
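The adaptation and scoring stages can be sketched in a few lines of Python. This is an illustration under assumptions the poster does not state, not the authors' implementation: covariances are diagonal, only the component means are adapted, and the relevance factor is set to 16. All function names and the toy two-dimensional data (standing in for MFCC vectors) are hypothetical.

```python
import math

RELEVANCE = 16.0  # assumed relevance factor; the poster does not give one


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_logs(x, gmm):
    """log(w_k * N(x | mean_k, var_k)) for every mixture component k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of a feature sequence under a GMM."""
    return sum(log_sum_exp(component_logs(x, gmm)) for x in frames) / len(frames)


def map_adapt_means(ubm, frames, relevance=RELEVANCE):
    """Return a copy of the UBM whose means are MAP-adapted to `frames`."""
    n_comp, dim = len(ubm["weights"]), len(ubm["means"][0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logs = component_logs(x, ubm)
        norm = log_sum_exp(logs)
        for k, lp in enumerate(logs):
            post = math.exp(lp - norm)  # responsibility of component k for x
            counts[k] += post
            for d in range(dim):
                sums[k][d] += post * x[d]
    # Blend of data mean and UBM mean, weighted by the soft count n_k:
    # new_mean = (n_k * data_mean + r * ubm_mean) / (n_k + r)
    means = [
        [(sums[k][d] + relevance * ubm["means"][k][d]) / (counts[k] + relevance)
         for d in range(dim)]
        for k in range(n_comp)
    ]
    return {"weights": list(ubm["weights"]), "means": means,
            "vars": [list(v) for v in ubm["vars"]]}


def classify(frames, class_models, ubm):
    """Return the class with the highest average log-likelihood ratio vs. the UBM."""
    base = avg_loglik(frames, ubm)
    return max(class_models,
               key=lambda name: avg_loglik(frames, class_models[name]) - base)


# Toy data: two classes, labelled with the poster's abbreviations.
ubm = {"weights": [0.5, 0.5],
       "means": [[0.0, 0.0], [4.0, 4.0]],
       "vars": [[1.0, 1.0], [1.0, 1.0]]}
enroll = {
    "GS": [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]] * 20,
    "GB": [[-1.0, -1.0], [-1.2, -0.8], [3.0, 3.0], [3.2, 2.8]] * 20,
}
models = {name: map_adapt_means(ubm, frames) for name, frames in enroll.items()}
print(classify([[0.9, 1.1], [5.1, 4.9]], models, ubm))
```

Because the UBM term is the same for every class, ranking by likelihood ratio picks the same class as ranking by raw likelihood; the ratio is kept here only to mirror the decision rule described on the poster.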
Figure 3: Flow chart of the system (block labels: UBM, Enrollment, Development, Adaptation, Test, Scoring)

Figure 1: Mel-frequency cepstral coefficients (MFCCs)

Experimental Results
Table 1: Confusion matrix (# of mixtures is 2)
Table 2: Confusion matrix (# of mixtures is 4)
Table 3: Confusion matrix (# of mixtures is 8)

Conclusions
Automatic suspicious sound recognition is of great importance in surveillance systems. Analysis of these signals provides useful information about the surrounding environment. Detection and classification of environmental sounds, acoustic scenes, and events have received increasing attention over the past decade because of their applications in surveillance systems, context-aware applications, and adaptive information systems. In this poster, we designed a baseline system to recognize the nine most common suspicious sounds: glass breaking (GB), dog barking (DB), scream (S), gunshot (GS), explosions (E), police sirens (PS), door slams (DS), footsteps (FS), and house alarm (HA) sounds. We employed Mel-frequency cepstral coefficients and energy as a feature set and designed a GMM-based classifier. Among the nine sound classes, police sirens are the most commonly misclassified: our baseline system misclassifies 100% of police siren sounds as house alarms.
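The MFCC steps named under Feature Extraction (Fourier transform, Mel-scale filter bank, logarithm, discrete cosine transform) can be illustrated for a single frame as follows. This is a rough sketch under assumptions the poster does not state: a Hamming window, 20 triangular filters, 13 coefficients, a 256-sample frame, and the common 2595·log10(1 + f/700) Mel formula. It uses a direct DFT from the standard library to stay self-contained; a practical system would use an optimized FFT. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the Mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        # Each weight is the height of the triangle (lo, mid, hi) at the bin frequency.
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def dct2(values, n_out):
    """First n_out coefficients of the unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: spectrum -> Mel filter bank -> log -> DCT."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # The small constant keeps the logarithm finite for empty filters.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_coeffs)


# Usage: one 256-sample frame of a 1 kHz tone at the poster's 44.1 kHz rate.
sample_rate = 44100
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(tone, sample_rate)
print(len(coeffs))
```

In a full system, this function would be applied to overlapping frames of each file, and the resulting vectors would be the input to the GMM training and scoring stages.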

