Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
May 19, 2005

Overview
– Motivations for acoustic bat detection
– Machine learning paradigm
– Detection experiments
– Conclusions

Bat detection motivations
– Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).
– Bats affect agriculture and carry diseases (directly or through parasites).
– The acoustic domain is significant for echolocating bats, and acoustic recording is non-invasive.
– Recorded data can be voluminous, so automated algorithms are desired for objective, repeatable detection and classification.

Conventional methods
Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition from the 1970s.
Characteristics of acoustic phonetics:
– Originally mimicked the methods of human experts
– First, boundaries between regions were determined
– Second, features were extracted for each region
– Third, features were compared using decision trees or discriminant function analysis (DFA)
Limitations:
– Boundaries are ill-defined and sensitive to noise
– Many feature extraction algorithms exist, with varying degrees of noise robustness
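For contrast, a minimal sketch of the boundary-first step that such pipelines depend on (the frame length, threshold, and median noise-floor estimate are illustrative assumptions, not values from any published detector); its sensitivity to the threshold and noise-floor estimate is exactly the limitation noted above:

```python
import numpy as np

def energy_segments(x, fs, frame_ms=2.0, thresh_db=10.0):
    """Boundary-first segmentation: mark frames whose energy exceeds
    the estimated noise floor by thresh_db, then merge contiguous
    active frames into (start, end) sample ranges."""
    n = max(1, int(fs * frame_ms / 1000.0))        # samples per frame
    nframes = len(x) // n
    frames = np.reshape(x[:nframes * n], (nframes, n))
    e_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    floor = np.median(e_db)                        # crude noise-floor estimate
    active = e_db > floor + thresh_db              # the noise-sensitive decision
    padded = np.concatenate(([0], active.astype(int), [0]))
    edges = np.flatnonzero(np.diff(padded))        # rising/falling frame edges
    return [(s * n, e * n) for s, e in zip(edges[::2], edges[1::2])]
```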

Machine learning
Acoustic phonetics gave way to machine learning for ASR in the 1980s.
Advantages:
– Decisions based on more information
– Mature statistical foundation for the algorithms
– Frame-based features, informed by expert knowledge
– Improved noise robustness
For bats: increased detection range.

Detection experiments
Database of bat calls:
– 7 recording sites, 8 species
– 1265 hand-labeled calls (labels from spectrogram readings)
Detection experiment design:
– Discrete events: 20-ms bins
– Discrete outcomes: yes or no (does a bin contain any part of a bat call?)
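A minimal sketch of turning hand-labeled call intervals into the per-bin yes/no targets described above (the 20-ms bin width is from the slide; the interval format is an assumption):

```python
import numpy as np

def bin_labels(call_intervals, duration_s, bin_s=0.020):
    """Mark a 20-ms bin 'yes' if any part of a labeled call
    falls inside it, matching the discrete-event design."""
    n_bins = int(np.ceil(duration_s / bin_s))
    labels = np.zeros(n_bins, dtype=bool)
    for start, end in call_intervals:            # times in seconds
        first = int(start // bin_s)
        last = min(n_bins - 1, int(end // bin_s))
        labels[first:last + 1] = True
    return labels

# e.g. two hand-labeled calls in a 1-second recording:
# bin_labels([(0.105, 0.118), (0.530, 0.545)], 1.0)
```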

Detectors
Baseline:
– Threshold on frame energy
Gaussian mixture model (GMM):
– Models the probability distribution of the call features
– Threshold on the model output probability
Hidden Markov model (HMM):
– Similar to the GMM, but adds temporal constraints through piecewise-stationary states
– Threshold on the model output probability along the Viterbi path
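As a concrete illustration of the GMM detector, a minimal sketch using scikit-learn; the component count, covariance type, and threshold are placeholders, not the configuration used in the study:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(train_feats, n_components=8):
    """Fit a GMM to frames of call features (shape (N, 6))."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    gmm.fit(train_feats)
    return gmm

def detect(gmm, feats, log_prob_thresh):
    """Flag frames whose log-likelihood under the call model
    exceeds a threshold, as in the GMM detector above."""
    return gmm.score_samples(feats) > log_prob_thresh
```

For the HMM detector, a library such as hmmlearn (GaussianHMM, whose decode method returns the Viterbi path) could supply the temporal constraints; that extension is omitted here for brevity.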

Feature extraction
Baseline:
– Normalization: session noise floor set to 0 dB
– Feature: frame power
Machine learning:
– Blackman window, zero-padded FFT
– Normalization: log amplitude mean subtraction
   – From ASR: analogous to cepstral mean subtraction
   – Removes the transfer function of the recording environment
   – Mean taken across time for each FFT bin
– Features:
   – Maximum FFT amplitude, dB
   – Frequency at the maximum amplitude, Hz
   – First and second temporal derivatives of each (slope, concavity)
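A sketch of this machine-learning feature chain, assuming a mono signal already in memory; the frame length, hop, and FFT size are placeholders:

```python
import numpy as np

def call_features(x, fs, frame_len=256, hop=128, nfft=1024):
    """Blackman-windowed, zero-padded FFT per frame; log-amplitude
    mean subtraction across time per bin; then max amplitude,
    frequency at the max, and first/second temporal derivatives."""
    win = np.blackman(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spec = np.empty((n_frames, nfft // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * win
        spec[i] = 20 * np.log10(np.abs(np.fft.rfft(frame, nfft)) + 1e-12)
    spec -= spec.mean(axis=0)                 # log amplitude mean subtraction
    p = spec.max(axis=1)                      # max FFT amplitude, dB
    f = np.argmax(spec, axis=1) * fs / nfft   # frequency at the max, Hz
    dp, df = np.gradient(p), np.gradient(f)         # slopes
    ddp, ddf = np.gradient(dp), np.gradient(df)     # concavities
    return np.column_stack([p, f, dp, df, ddp, ddf])  # six features per frame
```

Here np.gradient stands in for whatever derivative estimator the authors used; any smoothed slope estimate would fit the same slot.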

Feature extraction examples

Six features: Power, Frequency, ΔP, ΔF, ΔΔP, ΔΔF

Detection example

Experiment results

Conclusions
– Machine learning algorithms improve detection when specificity is high (> 0.6).
– The HMM is slightly superior to the GMM and uses more temporal information, but is slower to train and test.
– Hand labels were determined from spectrograms and are biased toward high-power calls.
– The machine learning models are applicable to other species.

Bioacoustic applications
To apply machine learning to other species:
– Determine ground-truth training data through expert hand labels
– Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)
– Train models of the features from the hand-labeled data
– Consider training "silence" models for discriminant detection/classification (sketched below)
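A sketch of that last suggestion: fit separate "call" and "silence" GMMs and detect by log-likelihood ratio (the two-class layout and ratio threshold are assumptions):

```python
from sklearn.mixture import GaussianMixture

def train_discriminant(call_feats, silence_feats, n_components=8):
    """Fit separate GMMs to call and silence frames so detection can
    compare the two hypotheses rather than threshold a single score."""
    call = GaussianMixture(n_components, covariance_type="diag",
                           random_state=0).fit(call_feats)
    sil = GaussianMixture(n_components, covariance_type="diag",
                          random_state=0).fit(silence_feats)
    return call, sil

def log_likelihood_ratio(call, sil, feats):
    # Positive values favor the call hypothesis over silence.
    return call.score_samples(feats) - sil.score_samples(feats)
```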

Further information
Acknowledgements
Bat data kindly provided by Brock Fenton, University of Western Ontario, Canada.