G. Valenzise*, L. Gerosa, M. Tagliasacchi*, F. Antonacci*, A. Sarti*. IEEE Int. Conf. on Advanced Video and Signal-based Surveillance, 2007. * Dipartimento di Elettronica e Informazione, Politecnico di Milano

Presentation transcript:

G. Valenzise *, L. Gerosa, M. Tagliasacchi *, F. Antonacci *, A. Sarti * IEEE Int. Conf. On Advanced Video and Signal-based Surveillance, 2007 * Dipartimento di Elettronica e Informazione, Politecnico di Milano

Scream and Gunshot Detection and Localization for Audio-Surveillance Systems
AVSS 2007, September 5, 2007

• Description of the problem
• System Overview
• Classification
  ◦ GMM
  ◦ Feature extraction
  ◦ Feature selection
  ◦ Experimental results
• Localization
  ◦ Time Delay Estimation
  ◦ Source Localization
  ◦ Experimental results

• Increasing need for safety in public places (e.g. squares):
  ◦ High crime rates
  ◦ Large number of video cameras installed
• Goal: assist human operators of video-surveillance systems by using the audio signal to detect and localize anomalous events (e.g. gunshots, screams) and to steer a video camera

• Large set of descriptors
  ◦ including innovative ones such as autocorrelation roll-off, decrease, and slope
• Exhaustive analysis of the feature selection process
  ◦ formulation of a hybrid approach integrating different techniques proposed in the literature
• Improved algorithm for GMM training
  ◦ Figueiredo-Jain instead of the classical EM algorithm
• Proposal of a method to zoom the camera
  ◦ based on the localization confidence

Autocorrelation filtered in the frequency range Hz

• From the full set of features, we want a vector of l features with:
  ◦ Similar discrimination power
  ◦ Lower computational cost
  ◦ Resistance to overfitting
• Two stages: filter-based feature vector construction; wrapper-based selection

• From the full set of L features, we want a vector of l features (l < L) with:
  ◦ Similar discrimination power
  ◦ Lower computational cost
  ◦ Resistance to overfitting
• Hybrid two-step method:
  ◦ A heuristic algorithm constructs feature vectors of different sizes (2 ≤ l ≤ L) using a separability measure (filter approach)
  ◦ The vector dimension is chosen by evaluating validation performance with a GMM classifier (wrapper approach)
• Higher performance than filter methods, with lower computational complexity than wrapper approaches
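The two-step idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the separability measure is a Fisher-style ratio, the heuristic is a plain ranking, and a nearest-class-mean classifier stands in for the GMM used in the talk. The function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pvariance

def make_data(n_per_class, rng):
    """Synthetic data: features 0 and 2 separate the classes, 1 and 3 are noise."""
    X, y = [], []
    for label in (0, 1):
        shift = 5.0 * label
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1), rng.gauss(0, 1),
                      rng.gauss(shift, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y

def fisher_score(values_a, values_b):
    """Assumed separability measure: squared mean gap over summed variances."""
    gap = (mean(values_a) - mean(values_b)) ** 2
    spread = pvariance(values_a) + pvariance(values_b)
    return gap / spread if spread > 0 else 0.0

def build_nested_vectors(X, y):
    """Filter step: rank features, return index lists of size 2..L."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((fisher_score(a, b), j))
    ranked = [j for _, j in sorted(scores, reverse=True)]
    return [ranked[:size] for size in range(2, n_features + 1)]

def nearest_mean_accuracy(train, val, idx):
    """Wrapper step stand-in: nearest-class-mean classifier (not a GMM)."""
    (Xt, yt), (Xv, yv) = train, val
    centroids = {}
    for c in set(yt):
        rows = [[row[j] for j in idx] for row, label in zip(Xt, yt) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    hits = 0
    for row, label in zip(Xv, yv):
        x = [row[j] for j in idx]
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        hits += pred == label
    return hits / len(yv)

def select_features(train, val):
    """Pick the candidate vector with the best validation accuracy."""
    candidates = build_nested_vectors(*train)
    return max(candidates, key=lambda idx: nearest_mean_accuracy(train, val, idx))

rng = random.Random(0)
train = make_data(50, rng)
val = make_data(50, rng)
selected = select_features(train, val)
print(selected)
```

The filter step is cheap because it never trains a classifier; the classifier is trained only once per candidate size, which is where the saving over a full wrapper search comes from.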

• A class is represented by a weighted sum of multivariate normal distributions in an l-dimensional space
• Training: estimate the most probable mixture given a dataset
  ◦ find the mixture that maximizes the likelihood of the training data
• Classification: label a new sample
  ◦ assign the sample to the class that maximizes the likelihood of that datum
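The classification rule above can be sketched as follows. This is a minimal illustration that assumes diagonal covariances and already-trained mixtures; the class names and all parameter values are made up for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_mixture_likelihood(x, mixture):
    """mixture: list of (weight, mean, var) tuples; uses log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in mixture]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, class_mixtures):
    """Assign x to the class whose mixture gives it the highest likelihood."""
    return max(class_mixtures,
               key=lambda c: log_mixture_likelihood(x, class_mixtures[c]))

# Hypothetical two-class model in a 2-dimensional feature space.
models = {
    "noise": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "scream": [(0.6, [4.0, 4.0], [1.0, 1.0]), (0.4, [6.0, 3.0], [0.5, 0.5])],
}
print(classify([0.2, -0.1], models))
print(classify([5.5, 3.2], models))
```

Working in the log domain avoids underflow when the feature dimension l grows.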

• GMM training is classically carried out by means of the Expectation-Maximization (EM) algorithm
• Drawbacks of EM:
  ◦ Initialization (initial parameters and number of components)
  ◦ Risk of singular solutions (when the number of components is chosen too high)
• Figueiredo-Jain (FJ) algorithm (2002):
  ◦ starts from a high number of components
  ◦ "annihilates" components that are not supported by the data (MML information-theoretic criterion)
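The annihilation step can be illustrated with the mixing-weight update from the Figueiredo-Jain paper, in which each component's total responsibility is reduced by half the number of parameters per component and clipped at zero. The sketch below shows only that update, not the full component-wise EM loop; the function name and the example numbers are hypothetical.

```python
def fj_weight_update(responsibilities, params_per_component):
    """Mixing-weight update with annihilation.

    responsibilities[i][m] is the posterior probability that sample i
    belongs to component m. A component whose total support does not
    exceed params_per_component / 2 receives weight zero.
    """
    n_components = len(responsibilities[0])
    support = [sum(r[m] for r in responsibilities) for m in range(n_components)]
    clipped = [max(0.0, s - params_per_component / 2.0) for s in support]
    total = sum(clipped)
    if total == 0:
        raise ValueError("all components were annihilated")
    return [c / total for c in clipped]

# Ten samples, three components; the third component is weakly supported.
resp = [[0.6, 0.35, 0.05] for _ in range(10)]
# A 1-D Gaussian component has 2 parameters (mean and variance).
weights = fj_weight_update(resp, params_per_component=2)
print(weights)
```

In this example the third component has total support 0.5, below the threshold of 1, so it is removed and the remaining weights are renormalized.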

• Three metrics were used to evaluate classification performance.

[Figure: classification results at test SNRs of 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB]

• Consider a T-shaped microphone array
• The center microphone is taken as the reference
• The localization problem can be split into two tasks:
  ◦ Estimate the Time Differences of Arrival (TDOAs) between each microphone and the reference microphone
  ◦ Estimate the source location from the TDOAs

• Use the ML-GCC estimator to estimate time delays. The estimator is defined in terms of:
  ◦ the Generalized Cross Correlation function
  ◦ the cross spectrum
  ◦ the Discrete Fourier Transform (DFT) of the signal
  ◦ the Magnitude Squared Coherence function between x_i and x_0

• Acoustic model of the audio signal received at a pair of microphones
• The TDE problem consists of estimating the delay τ
[Figure: Generalized Cross Correlation (GCC) signal waveform]
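As a simplified illustration of time delay estimation, the sketch below picks the lag that maximizes a plain time-domain cross-correlation, which corresponds to a GCC with unit weighting. It does not implement the ML weighting used in the talk, and the signals, noise level, and delay are synthetic assumptions.

```python
import random

def cross_correlation(x0, xi, max_lag):
    """R(lag) = sum_n x0[n] * xi[n + lag] for lag in [-max_lag, max_lag]."""
    n = len(x0)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for k in range(n):
            j = k + lag
            if 0 <= j < n:
                acc += x0[k] * xi[j]
        out[lag] = acc
    return out

def estimate_delay(x0, xi, max_lag):
    """Return the lag (in samples) at which the cross-correlation peaks."""
    r = cross_correlation(x0, xi, max_lag)
    return max(r, key=r.get)

# Synthetic test: microphone i receives the source 7 samples after the reference.
rng = random.Random(0)
true_delay = 7
s = [rng.gauss(0, 1) for _ in range(400)]
x0 = [v + 0.1 * rng.gauss(0, 1) for v in s]
xi = [(s[n - true_delay] if n >= true_delay else 0.0) + 0.1 * rng.gauss(0, 1)
      for n in range(400)]
delay_samples = estimate_delay(x0, xi, max_lag=20)
print(delay_samples)
```

Dividing the lag by the sampling rate gives the TDOA in seconds. Restricting the search to `max_lag` reflects that the physical delay is bounded by the microphone spacing.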

• We used the Linear-Correction Least Squares algorithm:
  ◦ Given the spherical error function, we want to solve the linear problem subject to the range constraint

Linear-Correction Least Squares Localization (Huang & Benesty, 2004)
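A minimal sketch of the linear problem behind this approach, under stated assumptions: a 2-D geometry, the reference microphone at the origin, three other microphones, and noise-free TDOAs. With source position s, source range R = |s|, and range difference d_i = c·τ_i for a microphone at r_i, squaring |s − r_i| = R + d_i gives the linear equation r_i·s + d_i·R = (|r_i|² − d_i²)/2 in the unknowns (s, R). The sketch solves this system without the range constraint, so it omits the linear-correction step itself. The array layout and source position are hypothetical.

```python
import math

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def locate_source(mics, tdoas, speed_of_sound=343.0):
    """Solve r_i . s + d_i * R = (|r_i|^2 - d_i^2) / 2 for (x, y, R)."""
    rows, rhs = [], []
    for (mx, my), tau in zip(mics, tdoas):
        d = speed_of_sound * tau  # range difference in metres
        rows.append([mx, my, d])
        rhs.append(0.5 * (mx * mx + my * my - d * d))
    x, y, _source_range = solve_linear(rows, rhs)
    return x, y

# Hypothetical T-shaped array (metres); the reference microphone is at (0, 0).
mics = [(-0.3, 0.0), (0.3, 0.0), (0.0, -0.3)]
source = (2.0, 3.0)
c = 343.0
ref_range = math.hypot(*source)
tdoas = [(math.hypot(source[0] - mx, source[1] - my) - ref_range) / c
         for mx, my in mics]
estimate = locate_source(mics, tdoas, speed_of_sound=c)
print(estimate)
```

With noisy TDOAs the unconstrained solution can be sensitive to errors, which is the motivation for enforcing the range constraint R = |s| afterwards.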

• SNR > threshold → small TDOA estimation errors around the true time delay
• SNR < threshold → large TDOA estimation errors

• The combined system yields a precision of 93% and a false rejection rate of 5% at 10 dB SNR
• Hybrid feature selection makes it possible to select the most representative features with reasonable computational effort

Future Extensions:
• Fusion of multiple microphone arrays into a sensor network to increase range and precision

• M. Figueiredo and A. Jain, "Unsupervised learning of finite mixture models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381–396, 2002.
• C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327.
• J. Chen, Y. Huang, and J. Benesty, Audio Signal Processing for Next-Generation Multimedia Communication Systems. Kluwer, 2004, ch. 4-5.
• J. Ianniello, "Time delay estimation via cross-correlation in the presence of large estimation errors," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 30, no. 6, pp. 998–1003.
