Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition Mark D. Skowronski.


Statistical automatic identification of microchiroptera from echolocation calls Lessons learned from human automatic speech recognition Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab Electrical and Computer Engineering University of Florida Gainesville, FL, USA November 19, 2004

Overview Motivations for bat acoustic research Review bat call classification methods Contrast with 1970s human ASR Experiments Conclusions

Bat research motivations Bats are among the most diverse, the most endangered, and the least studied mammals (~25% of all mammal species) Close relationship with insects: agricultural impact, disease vectors Acoustical research: non-invasive, significant domain (echolocation) Simplified biological acoustic communication system (compared to human speech)

Bat echolocation Ultrasonic, brief chirps Determine range, velocity of nearby objects (clutter, prey, conspecifics) Tailored for task, environment Tadarida brasiliensis (Mexican free-tailed bat) Listen to 10x time-expanded search calls:

Echolocation calls Two characteristics –Frequency modulated -- range –Constant frequency -- velocity Features (holistic) –Freq. extrema –Duration –Shape –# harmonics –Call interval Mexican free-tailed calls, concatenated

Current classification methods Expert sonogram readers –Manual or automatic feature extraction –Comparison with exemplar sonograms Automatic classification –Decision trees –Discriminant function analysis –Artificial neural networks –Spectrogram correlation Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).

Acoustic phonetics Bottom up paradigm –Frames, boundaries, groups, phonemes, words Manual or automatic feature extraction –Formants, voicing, duration, intensity, transitions Classification –Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path DH AH F UH T B AO L G EY EM IH Z OW V ER

Acoustic phonetics limitations Variability of conversational speech –Complex rules, difficult to train Boundaries difficult to define –Coarticulation Feature estimates brittle –Variable noise robustness Hard decisions, errors accumulate Human ASR shifted to the information theoretic paradigm, which better accounts for the variability of speech and noise.

Information theoretic ASR Data-driven models from computer science –Non-parametric: dynamic time warp (DTW) –Parametric: hidden Markov model (HMM) Frame-based –Expert information in feature extraction –Models account for feature, temporal variability Information theoretic ASR dominates state-of-the-art speech understanding systems.
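The non-parametric DTW approach named above can be sketched as a small dynamic program. This is the generic textbook formulation (as in Rabiner & Juang), not necessarily the exact variant (e.g., endpoint-range constraints) used in these experiments:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Classic O(len(seq_a) * len(seq_b)) dynamic program: cell (i, j) holds
    the cost of the best alignment of seq_a[:i] with seq_b[:j].
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])  # local distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

A call would then be classified by the label of the nearest training template under this distance, which is why DTW training is fast (just store templates) but classification is slow (one alignment per template).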

Data collection UF Bat House, home to 60,000 bats –Mexican free-tailed bat (vast majority) –Evening bat –Southeastern myotis Continuous recording –90 minutes around sunset –~20,000 calls Equipment: –B&K mic (4939), 100 kHz –B&K preamp (2670) –Custom amp/AA filter –NI 6036E 200kS/s A/D card –Laptop, Matlab

Experiment design Designs and assumptions –All recorded bats are Mexican free-tailed –Calls divided into four intraspecies call classes –All calls are search phase –Hand-labeled call detection is complete (no discarded calls) Hand labels –Narrowband spectrogram –Endpoints, class label –436 calls in sec sequences (2% of data) –Four classes, a priori probabilities: 34, 40, 20, 6% –All experiments on hand-labeled data only

Experiments Baseline –Features: Fmin, Fmax, Fmax_energy, and duration, from zero crossings and MUSIC –Classifier: Discriminant function analysis, quadratic boundaries DTW and HMM –Frame-based features: fundamental frequency (MUSIC super-resolution estimate), log energy, temporal derivatives (HMM only) –DTW: MUSIC frequencies, 10% endpoint range –HMM: 5 states/model, 4 Gaussian mixtures/state, diagonal covariances Tests –Leave one out –75% train, 25% test, 1000 trials –Test on train (HMM only)
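The baseline discriminant-function classifier can be sketched as per-class Gaussians with a maximum a posteriori decision, which yields quadratic boundaries when class covariances differ. This is a minimal diagonal-covariance sketch; function names are illustrative, and the actual study used the four holistic features listed above:

```python
import math

def fit_gaussian(samples):
    """Per-feature mean and variance (diagonal covariance) for one class."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / n for d in range(dim)]
    return mean, [max(v, 1e-9) for v in var]  # floor variance for stability

def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify(x, models, priors):
    """MAP decision; unequal covariances give quadratic boundaries."""
    scores = {c: log_likelihood(x, *models[c]) + math.log(priors[c])
              for c in models}
    return max(scores, key=scores.get)
```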

Results Baseline, zero crossing –Leave one out: 72.5% correct –Repeated trials: 72.5 ± 4% (mean ± std) Baseline, MUSIC –Leave one out: 79.1% –Repeated trials: 77.5 ± 4% DTW, MUSIC –Leave one out: 74.5% –Repeated trials: 74.1 ± 4% HMM, MUSIC –Test on train: 85.3%

Confusion matrices [Slide showed four 4×4 confusion matrices, one per method; per-cell counts were lost in transcription. Overall accuracies: Baseline, zero crossing: 72.5%; Baseline, MUSIC: 79.1%; DTW, MUSIC: 74.5%; HMM, MUSIC: 85.3%]
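The confusion matrices and overall accuracies reported here can be tallied with a few lines of bookkeeping; this generic sketch (illustrative names, not code from the study) shows the computation:

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """counts[t][p] = number of class-t calls classified as class p."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    return counts

def overall_accuracy(true_labels, pred_labels):
    """Fraction of calls on the matrix diagonal."""
    hits = sum(t == p for t, p in zip(true_labels, pred_labels))
    return hits / len(true_labels)
```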

Conclusions Human ASR algorithms applicable to bat echolocation calls Experiments –Weakness: accuracy of class labels –No labeled calls excluded –HMM most accurate, undertrained –MUSIC frequency estimate robust, slow Machine learning –DTW: fast training, slow classification –HMM: slow training, fast classification

Future work Find robust features of bat echolocation calls that match assumptions of machine learning algorithms –Noise robust –Distribution modeled by Gaussian mixtures Use hand-labeled subset of data to create call detection algorithm Explore unsupervised learning –Self-organized maps –Clustering Real-time portable detection/classification system on laptop PC

Further information DTW reference: –L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993. HMM reference: –L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296, Kaufmann, San Mateo, CA, 1990.