Published by Gregory Clarke. Modified over 9 years ago.
1
G. Valenzise*, L. Gerosa, M. Tagliasacchi*, F. Antonacci*, A. Sarti*
IEEE Int. Conf. on Advanced Video and Signal-based Surveillance, 2007
* Dipartimento di Elettronica e Informazione, Politecnico di Milano
2
Scream and Gunshot Detection and Localization for Audio-Surveillance Systems
AVSS 2007, September 5, 2007
Description of the problem
System Overview
Classification
◦ GMM
◦ Feature extraction
◦ Feature selection
◦ Experimental results
Localization
◦ Time Delay Estimation
◦ Source Localization
◦ Experimental results
3
Increasing need for safety in public places (e.g. squares):
◦ High degree of criminality
◦ Large number of video cameras installed
Aid the human operators of video-surveillance systems by using the audio signal to detect and localize anomalous events (e.g. gunshots, screams) and to steer a video camera
5
Large set of descriptors
◦ innovative ones such as autocorrelation roll-off, decrease, slope
Exhaustive analysis of the feature selection process
◦ formulation of a hybrid approach integrating different techniques proposed in the literature
Improved algorithm for GMM training
◦ Figueiredo-Jain instead of the classical EM algorithm
Proposal of a method to zoom the camera
◦ based on the localization confidence
8
Autocorrelation filtered in the frequency range 1000-2500 Hz
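A band-limited autocorrelation of this kind can be sketched as follows. This is only an illustration, not the authors' implementation: the sampling rate, frame length, test tones, and the FFT-masking approach (zeroing bins outside the band, then applying the Wiener-Khinchin relation) are all assumptions, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (dependency-free sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def band_limited_autocorrelation(frame, fs, f_lo=1000.0, f_hi=2500.0):
    """Circular autocorrelation of `frame` restricted to [f_lo, f_hi] Hz."""
    n = len(frame)
    spectrum = dft(frame)
    power = []
    for k, value in enumerate(spectrum):
        freq = min(k, n - k) * fs / n  # fold negative-frequency bins
        power.append(abs(value) ** 2 if f_lo <= freq <= f_hi else 0.0)
    # Wiener-Khinchin: autocorrelation is the inverse DFT of the power spectrum.
    return [v.real for v in dft(power, inverse=True)]


FS = 8000  # assumed sampling rate in Hz (not given on the slide)
N = 256    # assumed frame length in samples
# Hypothetical frame: a 1500 Hz tone (in band) plus a stronger 250 Hz tone (out of band).
frame = [
    math.sin(2 * math.pi * 1500 * t / FS) + 2.0 * math.sin(2 * math.pi * 250 * t / FS)
    for t in range(N)
]
acf = band_limited_autocorrelation(frame, FS)
```

With these assumed values, lag 16 spans exactly three periods of the 1500 Hz tone, so the filtered autocorrelation at that lag matches its value at lag 0 even though the out-of-band tone dominates the raw frame.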
9
From the full set of features, we want a vector of l features:
◦ Similar discrimination power
◦ Less computationally intensive
◦ Resistant to overfitting
[Figure: filter-based feature vector construction, wrapper-based selection]
11
From the full set of L features, we want a vector of l features (l < L):
◦ Similar discrimination power
◦ Less computationally intensive
◦ Resistant to overfitting
Hybrid two-step method:
◦ Heuristic algorithm to construct feature vectors of different sizes (2 ≤ l ≤ L) using a separability measure (filter approach)
◦ Choose the vector dimension by evaluating validation performance with a GMM classifier (wrapper approach)
Higher performance than filter methods, with lower computational complexity than wrapper approaches
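The two-step idea can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the separability measure is a per-feature two-class Fisher ratio, the construction heuristic is a simple ranking, and a nearest-class-mean classifier stands in for the GMM classifier. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(a, b):
    """Two-class Fisher ratio for one feature (stand-in separability measure)."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def rank_features(X0, X1):
    """Filter step: order feature indices by decreasing separability."""
    n_feat = len(X0[0])
    scores = [
        fisher_score([r[j] for r in X0], [r[j] for r in X1]) for j in range(n_feat)
    ]
    return sorted(range(n_feat), key=lambda j: -scores[j])


def validation_accuracy(train0, train1, val0, val1, feats):
    """Wrapper step: accuracy of a nearest-class-mean classifier
    (a simple stand-in for the GMM classifier named on the slide)."""
    m0 = [mean(r[j] for r in train0) for j in feats]
    m1 = [mean(r[j] for r in train1) for j in feats]

    def predict(r):
        d0 = sum((r[j] - m) ** 2 for j, m in zip(feats, m0))
        d1 = sum((r[j] - m) ** 2 for j, m in zip(feats, m1))
        return 0 if d0 <= d1 else 1

    hits = sum(predict(r) == 0 for r in val0) + sum(predict(r) == 1 for r in val1)
    return hits / (len(val0) + len(val1))


def hybrid_select(train0, train1, val0, val1, min_size=2):
    """Build nested feature vectors of size l = min_size..L, keep the best on validation."""
    order = rank_features(train0, train1)
    best_acc, best_feats = -1.0, None
    for l in range(min_size, len(order) + 1):
        feats = order[:l]
        acc = validation_accuracy(train0, train1, val0, val1, feats)
        if acc > best_acc:  # strict '>' keeps the smallest vector on ties
            best_acc, best_feats = acc, feats
    return best_feats, best_acc


# Synthetic data (made up): features 0 and 1 separate the classes, 2 and 3 are noise.
random.seed(1)


def sample(center, n):
    return [
        [random.gauss(center, 1.0), random.gauss(center, 1.0),
         random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
        for _ in range(n)
    ]


train0, train1 = sample(0.0, 200), sample(3.0, 200)
val0, val1 = sample(0.0, 100), sample(3.0, 100)
selected, accuracy = hybrid_select(train0, train1, val0, val1)
```

The classifier is evaluated only L-1 times here, once per candidate size, which is where the saving over a full wrapper search comes from.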
12
A class is represented by a weighted sum of multivariate normal distributions in an l-dimensional space
Training: estimate the most probable mixture given a dataset
◦ find the mixture that maximizes the likelihood of the training data
Classification: label a new sample
◦ assign the sample to the class that maximizes the likelihood of that sample
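The classification rule can be sketched as follows, assuming the mixtures are already trained. Diagonal covariances are an assumption made to keep the example short, and the class names and mixture parameters are made up for illustration.

```python
import math


def log_gauss_diag(x, mu, var):
    """Log density of a normal distribution with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mu, var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a mixture given as [(weight, mean, variances), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(x, models):
    """Assign x to the class whose mixture gives the highest likelihood."""
    return max(models, key=lambda label: gmm_loglik(x, models[label]))


# Hypothetical two-class model in a 2-dimensional feature space.
models = {
    "scream": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [2.0, 2.0], [0.5, 0.5])],
    "noise": [(1.0, [6.0, 6.0], [1.0, 1.0])],
}
label_near_origin = classify([0.2, -0.1], models)
label_far = classify([5.8, 6.1], models)
```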
13
GMM training is classically carried out by means of the Expectation-Maximization (EM) algorithm
Drawbacks of EM:
◦ Initialization (initial parameters and number of components)
◦ Risk of singular solutions (number of components chosen too high)
Figueiredo-Jain (FJ) algorithm (2002)
◦ starts from a high number of components
◦ "annihilates" components if they are not supported by the data (MML information-theoretic criterion)
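The annihilation step can be illustrated with the mixing-weight update from Figueiredo and Jain (2002): each component's summed responsibility is reduced by half the number of parameters per component, clipped at zero, and renormalized. The sketch below shows only this update, not the full algorithm (component-wise EM and the MML stopping criterion are omitted); the function name and the example numbers are hypothetical.

```python
def fj_weight_update(resp_sums, params_per_component):
    """Mixing weights after one FJ-style update.

    resp_sums[m] is the sum over samples of the responsibility of component m.
    A component whose support falls below params_per_component / 2 gets weight 0.
    """
    penalized = [max(0.0, s - params_per_component / 2.0) for s in resp_sums]
    total = sum(penalized)
    if total == 0.0:
        raise ValueError("all components were annihilated")
    return [p / total for p in penalized]


# Hypothetical case: 100 samples, 3 components, 2-D Gaussians with full covariance
# (2 mean parameters + 3 covariance parameters = 5 per component).
weights = fj_weight_update([60.0, 38.0, 2.0], params_per_component=5)
```

Here the third component is supported by only two samples' worth of responsibility, less than the 2.5 penalty, so its weight drops to zero and it is removed from the mixture.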
14
To evaluate classification performance, three metrics were used:
[Equations: metric definitions not recovered]
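The metric definitions are missing from this transcript. The conclusions name two of them, precision and false rejection rate; the sketch below uses their common definitions, which is an assumption about what the authors used. The counts are made up and are not results from the paper.

```python
def precision(true_pos, false_pos):
    """Fraction of detected events that are real events."""
    return true_pos / (true_pos + false_pos)


def false_rejection_rate(true_pos, false_neg):
    """Fraction of real events that the detector missed."""
    return false_neg / (true_pos + false_neg)


# Hypothetical counts for illustration only.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)
frr = false_rejection_rate(tp, fn)
```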
15
[Figure: result panels for Test: 0dB, Test: 5dB, Test: 10dB, Test: 15dB, Test: 20dB]
16
Consider a T-shaped mic array
Center mic is taken as reference
Localization problem can be split into two tasks:
◦ Estimate Time Differences of Arrival (TDOA) between each mic and the reference mic
◦ Estimate source location from the TDOAs
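To make the TDOA definition concrete, the sketch below computes the ideal, noise-free TDOAs for a known source position. The array geometry (a planar T with 0.3 m arms), the speed of sound, and the function name are assumptions; the slides do not give the actual microphone coordinates.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

REFERENCE_MIC = (0.0, 0.0)                            # center mic, taken as reference
OTHER_MICS = [(-0.3, 0.0), (0.3, 0.0), (0.0, -0.3)]   # hypothetical T-shaped layout


def tdoas(source, mics=OTHER_MICS, ref=REFERENCE_MIC, c=SPEED_OF_SOUND):
    """Time difference of arrival of each mic relative to the reference mic, in seconds."""
    ref_dist = math.dist(source, ref)
    return [(math.dist(source, m) - ref_dist) / c for m in mics]


front = tdoas((0.0, 5.0))  # source on the array's symmetry axis
side = tdoas((5.0, 0.0))   # source off to the right
```

A source on the symmetry axis gives equal TDOAs at the left and right mics, while a source to the right reaches the right mic before the reference (negative TDOA) and the left mic after it (positive TDOA).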
17
Use the ML-GCC estimator to estimate time delays
[Equation: ML-GCC delay estimate, not recovered]
The estimator is defined in terms of:
◦ the Generalized Cross Correlation function
◦ the cross spectrum
◦ the Discrete Fourier Transform (DFT) of the signal
◦ the Magnitude Squared Coherence function between x_i and x_0
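For reference, the maximum-likelihood weighted GCC of Knapp and Carter (1976) is commonly written as below. The notation is an assumption, since the slide's own symbols are missing from this transcript: $\Psi_i$ is the GCC function, $S_{x_i x_0}$ the cross spectrum, $X_i$ the K-point DFT of $x_i$, and $|\gamma_i|^2$ the magnitude squared coherence between $x_i$ and $x_0$.

```latex
\begin{align}
\hat{\tau}_i &= \arg\max_{\tau} \Psi_i(\tau), &
\Psi_i(\tau) &= \sum_{k=0}^{K-1} \Phi_i(k)\, S_{x_i x_0}(k)\, e^{j 2\pi k \tau / K}, \\
S_{x_i x_0}(k) &= X_i(k)\, X_0^{*}(k), &
\Phi_i(k) &= \frac{|\gamma_i(k)|^2}{\left|S_{x_i x_0}(k)\right| \left(1 - |\gamma_i(k)|^2\right)}.
\end{align}
```

The weighting $\Phi_i$ emphasizes frequency bins where the two channels are highly coherent, that is, where the SNR is high.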
18
Acoustic model of the audio signal received at a pair of microphones:
[Equation: signal model, not recovered]
The time delay estimation (TDE) problem consists of estimating τ
[Figure: GCC signal waveform, Generalized Cross Correlation (GCC)]
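A minimal sketch of the TDE problem, assuming the common single-path model x_0(t) = s(t) + n_0(t), x_i(t) = a·s(t - τ) + n_i(t) (an assumption, since the slide's model equation is missing). The delay is estimated as the lag that maximizes the plain, unweighted cross-correlation, a simpler stand-in for the ML-weighted GCC. The signal, attenuation, noise level, and delay are made up.

```python
import random


def estimate_delay(x0, xi, max_lag):
    """Lag (in samples) maximizing sum_k x0[k] * xi[k + lag]."""
    n = len(x0)

    def corr(lag):
        lo, hi = max(0, -lag), min(n, n - lag)  # keep both indices in range
        return sum(x0[k] * xi[k + lag] for k in range(lo, hi))

    return max(range(-max_lag, max_lag + 1), key=corr)


random.seed(0)
N, TRUE_DELAY = 400, 7  # hypothetical frame length and delay, in samples
s = [random.gauss(0.0, 1.0) for _ in range(N)]
# Reference channel: source plus sensor noise.
x0 = [s[k] + 0.1 * random.gauss(0.0, 1.0) for k in range(N)]
# Second channel: attenuated, delayed source plus independent sensor noise.
xi = [
    (0.8 * s[k - TRUE_DELAY] if k >= TRUE_DELAY else 0.0) + 0.1 * random.gauss(0.0, 1.0)
    for k in range(N)
]
estimated = estimate_delay(x0, xi, max_lag=20)
```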
19
We used the Linear-Correction Least Squares algorithm:
◦ Given the spherical error function
[Equation: not recovered]
we want to solve the linear problem
[Equation: not recovered]
subject to the range constraint
[Equation: not recovered]
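The linear step can be sketched under a standard spherical formulation, which is an assumption about the missing equations. With the reference mic at the origin, mic i at r_i, source at r_s, R_s = ||r_s||, and range difference d_i = c·τ_i, each mic gives r_i·r_s + d_i·R_s = (||r_i||² - d_i²)/2, which is linear in θ = [x_s, y_s, R_s]. The range constraint is x_s² + y_s² = R_s². The sketch solves only the unconstrained least-squares problem in 2-D and then checks the constraint; the linear-correction step is not shown. The geometry and source position are hypothetical.

```python
import math


def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def spherical_ls(mics, range_diffs):
    """Unconstrained least squares for theta = [x_s, y_s, R_s] (reference mic at origin)."""
    A = [[mx, my, d] for (mx, my), d in zip(mics, range_diffs)]
    b = [0.5 * (mx * mx + my * my - d * d) for (mx, my), d in zip(mics, range_diffs)]
    # Normal equations: (A^T A) theta = A^T b
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(3)]
    return solve_linear(AtA, Atb)


# Hypothetical geometry and source; noise-free range differences for the check.
mics = [(-0.3, 0.0), (0.3, 0.0), (0.0, -0.3)]
source = (2.0, 3.0)
R_true = math.hypot(*source)
range_diffs = [math.dist(m, source) - R_true for m in mics]

x_s, y_s, R_s = spherical_ls(mics, range_diffs)
constraint_residual = x_s ** 2 + y_s ** 2 - R_s ** 2
```

With noise-free inputs the unconstrained solution already satisfies the range constraint. With noisy TDOAs it generally does not, which is the motivation for the correction step.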
20
Linear-Correction Least Squares Localization (Huang & Benesty, 2004)
21
SNR > threshold: small TDOA estimation errors around the true time delay
SNR < threshold: large errors on TDOA estimation
25
Combined system yields a precision of 93% and a false rejection rate of 5% at 10 dB SNR
Hybrid feature selection makes it possible to select the most representative features with a reasonable computational effort
Future extensions:
◦ Fusion of multiple mic arrays into a sensor network to increase range and precision
26
M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381–396, 2002.
C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327, 1976.
J. Chen, Y. Huang, and J. Benesty, Audio Signal Processing for Next-Generation Multimedia Communication Systems. Kluwer, 2004, ch. 4-5.
J. Ianniello, "Time delay estimation via cross-correlation in the presence of large estimation errors," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 30, no. 6, pp. 998–1003, 1982.