Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio

Slides:



Advertisements
Similar presentations
Acoustic/Prosodic Features
Advertisements

Digital Signal Processing
Copyright 2001, Agrawal & BushnellVLSI Test: Lecture 181 Lecture 18 DSP-Based Analog Circuit Testing  Definitions  Unit Test Period (UTP)  Correlation.
2 1 Discrete Markov Processes (Markov Chains) 3 1 First-Order Markov Models.
Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.
CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1 PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS Jan P.H. van Santen and Xiaochuan.
SIMS-201 Characteristics of Audio Signals Sampling of Audio Signals Introduction to Audio Information.
IT-101 Section 001 Lecture #8 Introduction to Information Technology.
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
A 12-WEEK PROJECT IN Speech Coding and Recognition by Fu-Tien Hsiao and Vedrana Andersen.
Speech in Multimedia Hao Jiang Computer Science Department Boston College Oct. 9, 2007.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Natural Language Processing - Speech Processing -
Overview What is in a speech signal?
F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)
Exploring Marine Mammal Bioacoustics Overview of ongoing research and approaches Xanadu
CS 188: Artificial Intelligence Fall 2009 Lecture 21: Speech Recognition 11/10/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
COMP 4060 Natural Language Processing Speech Processing.
SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.
Department of Computer Engineering University of California at Santa Cruz Data Compression (2) Hai Tao.
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
A PRESENTATION BY SHAMALEE DESHPANDE
Basics of Signal Processing. frequency = 1/T  speed of sound × T, where T is a period sine wave period (frequency) amplitude phase.
Representing Acoustic Information
ME 322: Instrumentation Lecture 22 March 11, 2015 Professor Miles Greiner.
Audio Processing for Ubiquitous Computing Uichin Lee KAIST KSE.
LE 460 L Acoustics and Experimental Phonetics L-13
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Lecture 1 Signals in the Time and Frequency Domains
Basics of Signal Processing. SIGNALSOURCE RECEIVER describe waves in terms of their significant features understand the way the waves originate effect.
Introduction Mohammad Beigi Department of Biomedical Engineering Isfahan University
Detection and Segmentation of Bird Song in Noisy Environments
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
Time-Domain Methods for Speech Processing 虞台文. Contents Introduction Time-Dependent Processing of Speech Short-Time Energy and Average Magnitude Short-Time.
LECTURE Copyright  1998, Texas Instruments Incorporated All Rights Reserved Encoding of Waveforms Encoding of Waveforms to Compress Information.
Lecture 18 DSP-Based Analog Circuit Testing
Multimedia Specification Design and Production 2013 / Semester 2 / week 3 Lecturer: Dr. Nikos Gazepidis
1 ELEN 6820 Speech and Audio Processing Prof. D. Ellis Columbia University Midterm Presentation High Quality Music Metacompression Using Repeated- Segment.
Speech Coding Using LPC. What is Speech Coding  Speech coding is the procedure of transforming speech signal into more compact form for Transmission.
Acoustic Analysis of Speech Robert A. Prosek, Ph.D. CSD 301 Robert A. Prosek, Ph.D. CSD 301.
Module 2 SPECTRAL ANALYSIS OF COMMUNICATION SIGNAL.
Wireless and Mobile Computing Transmission Fundamentals Lecture 2.
Harmonics November 1, 2010 What’s next? We’re halfway through grading the mid-terms. For the next two weeks: more acoustics It’s going to get worse before.
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
1 Introduction to Information Technology LECTURE 6 AUDIO AS INFORMATION IT 101 – Section 3 Spring, 2005.
Week 7 Lecture 1+2 Digital Communications System Architecture + Signals basics.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Submitted By: Santosh Kumar Yadav (111432) M.E. Modular(2011) Under the Supervision of: Mrs. Shano Solanki Assistant Professor, C.S.E NITTTR, Chandigarh.
ECE 5525 Osama Saraireh Fall 2005 Dr. Veton Kepuska
VOCODERS. Vocoders Speech Coding Systems Implemented in the transmitter for analysis of the voice signal Complex than waveform coders High economy in.
Singer Similarity Doug Van Nort MUMT 611. Goal Determine Singer / Vocalist based on extracted features of audio signal Classify audio files based on singer.
Performance Comparison of Speaker and Emotion Recognition
Predicting Voice Elicited Emotions
SPEECH CODING Maryam Zebarjad Alessandro Chiumento Supervisor : Sylwester Szczpaniak.
By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate.
Institut für Nachrichtengeräte und Datenverarbeitung Prof. Dr.-Ing. P. Vary On the Use of Artificial Bandwidth Extension Techniques in Wideband Speech.
Acoustic Phonetics 3/14/00.
Speech Recognition through Neural Networks By Mohammad Usman Afzal Mohammad Waseem.
ECE3340 Review of Numerical Methods for Fourier and Laplace Transform Applications – Part 1 Fourier Spring 2016 Prof. Han Q. Le Note: PPT file is the.
The Physics of Sound.
ARTIFICIAL NEURAL NETWORKS
Vocoders.
Presentation on Artificial Neural Network Based Pathological Voice Classification Using MFCC Features Presenter: Subash Chandra Pakhrin 072MSI616 MSC in.
A presentation on Basics of Speech Recognition Systems
1 Vocoders. 2 The Channel Vocoder (analyzer) : The channel vocoder employs a bank of bandpass filters,  Each having a bandwidth between 100 HZ and 300.
Acoustics of Speech Julia Hirschberg CS /7/2018.
CS 188: Artificial Intelligence Fall 2008
Signals and Systems Networks and Communication Department Chapter (1)
ELEN E4810: Digital Signal Processing Topic 11: Continuous Signals
Presentation transcript:

Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio

Xanadu Halkias- Sound to Signal sound is pressure variation of the medium (e.g. speech air pressure, marine mammals water pressure) Pressure waves in water Converting waves to voltage through a microphone Time varying voltage

Xanadu Halkias- Analog to digital sampling quantizing + = digital signal

Xanadu Halkias- Time to frequency and back… Fourier transform=decompose a signal as a sum of sinusoids and cosines Digital signalFourier spectrum Spectrum = the frequency content of the signal (energy/frequency band)

Xanadu Halkias- Back to sampling… Sampling needs to obey the Nyquist limit: Ω Τ  2Ω Μ Signal has to be bandlimited eg. energy up to some frequency Ω Μ Audio is sampled at Ω Τ =2π44100Hz so spectrum has up to 22050Hz

Xanadu Halkias- Looking at sounds-The Spectrogram Looking at energy in time and frequency

Xanadu Halkias- More on spectrograms

Xanadu Halkias- Overview of marine mammal research

Xanadu Halkias- Call detection Detect different calls within the recording automatically Differentiate between species or identify the number of marine mammals in the region through overlapping of calls Tracking marine mammals through their calls Use calls to analyze and construct a possible language structure What is it good for… Problems Data, data, data…

Xanadu Halkias- Call detection approaches Noise is the biggest problem D. K. Mellinger et all use the cross-correlation approach Cross-correlation is a way of measuring how similar two signals are

Xanadu Halkias- Call detection-kernel cross- correlation This method requires manual interference and is performed on the signal waveform Image obtained by D. K. Mellinger and C. W. Clark. "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol. Vol. 29, pp , 1997

Xanadu Halkias- Call detection-spectrogram correlation Image obtained by D. K. Mellinger and C. W. Clark. "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol. Vol. 29, pp , 1997

Xanadu Halkias- “Voiced” calls Energy appears in multiples of some frequency (=pitch)

Xanadu Halkias- Comments Both methods require manual measurements for the construction of the template The quality of the results depends highly on the noise present in the data Quality recordings at high sampling rates decide the course of action Correlation methods can’t capture all types of calls without constructing different kernels

Xanadu Halkias- Linear Predictive Coding Idea: the signal, x[n], is formed by adding white noise, e[n], to previous samples weighted by the linear predictive coefficients, a X[z]E[z] 1/A[z] The number of coefficients defines the detail that we capture of the original signal

Xanadu Halkias- Linear Predictive Coding Used in speech for transmission purposes Intuition: LPCs model the spectral peaks of your signal

Xanadu Halkias- LPCs in marine mammal recordings Model the peaks in the recordings that likely belong to calls that way we alleviate the problem of noise Unveils harmonic structure not visible in original spectrogram

Xanadu Halkias- Hidden Markov Models Machine learning involves training a general model based on your data in order to extract and predict desired features HMMs, M j are defined by:

Xanadu Halkias- HMMs some more… Training: getting the parameters of the model, a, b, π Evaluating: we are given a sequence of states we want to know if the model produced them Decoding: we have some observations and we want to find out the hidden states

Xanadu Halkias- HMMs in marine mammal vocalizations HMMs could provide a call detection tool The data has to be workable Use frequencies of the spectrogram as hidden states Observe the spectrogram and use it for learning Tracking the call in the spectrogram

Xanadu Halkias- References D. P. Ellis D. K. Mellinger and C. W. Clark. "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol. Vol. 29, pp , 1997 R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification, John Wiley & sons, inc. 2001