By Jiazhi Ou Tal Blum Wild Dolphin Project 11-751 Speech Final Project.

by Jiazhi Ou (jzou@cs.cmu.edu) and Tal Blum (blum@cs.cmu.edu)  Wild Dolphin Project  11-751 Speech Final Project

Outline  Wild Dolphin Project, Dolphin Speech  Data, Labeling, Labeling problems  Previous work  Models training  Experiments & Results  Conclusions

The Wild Dolphin Project (WDP)  The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives.  http://www.wilddolphinproject.org/index.cfm

Dolphin’s Speech  Range of frequencies is wider  Two mechanisms for producing sound simultaneously  Directionality of some of the frequencies  Carried in water  Can travel large distances  Dolphin speech is very different from human speech

Dolphin’s Speech (2)  Is used for: identification; communicating (fighting, defending, courting, warning, calling); hunting

Dolphin’s Speech (3)  3 main types: whistles (signature and non-signature), clicks, spike trains

What do we know?  Not much  We know that each dolphin has a unique whistle, called its signature whistle  A baby dolphin's signature whistle is similar to the whistles of the dolphins in close contact with it

Data  164 files, each containing the sounds of one dolphin whose name is known  Average file length is 7 sec  Total data length is less than 20 minutes, of which about half is silence  The data does not contain all of the relevant frequencies

Labeling  Dolphin Names (Dolphin ID project)  Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature Whistles

Labeling Problems  How do we distinguish between the two kinds of whistles (signature and non-signature)?  How do we distinguish between whistles and non-whistles? They co-occur  How do we determine the duration of a label? Should labels that are close together be merged into one label? This has an effect on the model  Some signals are weak, probably due to a change in the dolphin's direction

Mapping from Labels to Models
Label                                                      Model
d                                                          Signature Whistles
dp, md                                                     Non-Signature Whistles
click, electnoise, electricnoise, h#, H#, MachineSpike, s  GARBAGE
pau                                                        PAUSE (Water)
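The label-to-model mapping on this slide amounts to a lookup table. The following is a minimal Python sketch of it; the names `LABEL_TO_MODEL` and `model_for_label` and the model identifier strings are illustrative assumptions, not taken from the project's Janus scripts.

```python
# Hypothetical lookup table built from the slide's label-to-model mapping.
# The model identifier strings are illustrative, not the project's own names.
LABEL_TO_MODEL = {
    "d": "SIGNATURE_WHISTLE",
    "dp": "NON_SIGNATURE_WHISTLE",
    "md": "NON_SIGNATURE_WHISTLE",
    "click": "GARBAGE",
    "electnoise": "GARBAGE",
    "electricnoise": "GARBAGE",
    "h#": "GARBAGE",
    "H#": "GARBAGE",
    "MachineSpike": "GARBAGE",
    "s": "GARBAGE",
    "pau": "PAUSE",
}


def model_for_label(label):
    """Return the model a transcription label is trained into.

    Lookup is case-sensitive because the slide lists both h# and H#.
    """
    try:
        return LABEL_TO_MODEL[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}")


print(model_for_label("dp"))
```

Raising an error on an unknown label, rather than silently mapping it to GARBAGE, is a design choice of this sketch; it makes labeling typos visible early.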

Label Statistics
                             PAUSE  SIGWHISTLE  GARBAGE  DOLPHIN
# occurrences                756    633         13       24
Accumulated time (in secs)   466    320         7.1      11.3
Average time per occurrence  0.6    0.5         0.55     0.47

Previous Work  Dolphin-ID Project by Tanja, Alan and Yue  Task: to identify dolphin ID using their signature whistles  51 files labeled by Alan  13 HMMs: 10 (one for each dolphin) plus DOLPHIN, PAUSE, and GARBAGE  Janus used for training and testing  Different kinds of features tried

Our Work  Model Generalized Signature Whistles  Label more files  Create HMMs for signature whistles, non-signature whistles, garbage, and pause  Train and test the HMMs using Janus  Evaluate the test results with our own method  Compare different model selections

Signal Processing  Tanja's scripts: down sampling, high-pass filter, FFT, LDA

HMM Topologies  [Figure: HMM state topologies, with states labeled b, m, and e, for the four models: Signature Whistles, Non-Signature Whistles, Garbage, Pause (Water)]
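To make the idea of a topology concrete, here is a minimal Python sketch of a left-to-right HMM over the state labels b, m, and e, which presumably stand for begin, middle, and end. It only lists which transitions are allowed; the function name, the optional `loop_back` flag (a nod to the "loop back" variant mentioned under "We also tried"), and the exact transition structure are assumptions, since the slides do not give the topology used in Janus.

```python
def left_to_right_topology(states, loop_back=False):
    """Return allowed transitions as {state: [reachable states]}.

    Every state has a self-loop and a forward transition to the next
    state. With loop_back=True the last state may also return to the
    first one. State names must be unique.
    """
    if len(set(states)) != len(states):
        raise ValueError("state names must be unique")
    transitions = {}
    for i, state in enumerate(states):
        reachable = [state]  # self-loop
        if i + 1 < len(states):
            reachable.append(states[i + 1])  # forward transition
        elif loop_back and len(states) > 1:
            reachable.append(states[0])  # optional return to the start
        transitions[state] = reachable
    return transitions


three_state = left_to_right_topology(("b", "m", "e"))
with_loop = left_to_right_topology(("b", "m", "e"), loop_back=True)
one_state = left_to_right_topology(("m",))
print(three_state)
print(with_loop)
print(one_state)
```

A one-state model such as `one_state` has only a self-loop, so it cannot capture any temporal structure within a segment.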

Model Selection  Scheme 1: Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE  Scheme 2: Signature Whistles, GARBAGE, PAUSE  Scheme 3: 10 HMMs (one for each dolphin), GARBAGE, PAUSE

Evaluation  We cannot use WER here, since there are no words, just segments.  The method we used was to compute a confusion matrix over hidden states.  Janus treats silence differently and does not show the silence classification, which complicates the evaluation.
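A confusion matrix of this kind can be sketched in a few lines of Python. The version below assumes that the reference labeling and the decoder output have both been expanded to one class label per frame, and that each row is normalized by the number of reference frames of that class; both are assumptions of this sketch, as the slides do not describe the exact procedure, and the function name is illustrative.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis, classes):
    """Row-normalized confusion matrix over per-frame class labels.

    matrix[ref][hyp] is the fraction of frames labeled `ref` in the
    reference that the decoder assigned to `hyp`.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    counts = Counter(zip(reference, hypothesis))
    matrix = {}
    for ref in classes:
        row_total = sum(counts[(ref, hyp)] for hyp in classes)
        matrix[ref] = {
            hyp: (counts[(ref, hyp)] / row_total if row_total else 0.0)
            for hyp in classes
        }
    return matrix


# Toy example: 8 frames, two classes.
ref = ["SIG", "SIG", "SIG", "SIG", "PAUSE", "PAUSE", "PAUSE", "PAUSE"]
hyp = ["SIG", "SIG", "SIG", "PAUSE", "SIG", "PAUSE", "PAUSE", "PAUSE"]
matrix = confusion_matrix(ref, hyp, ["SIG", "PAUSE"])
print(matrix)
```

Under this normalization every row with at least one reference frame sums to 1, which makes rows comparable even when the classes differ greatly in total duration.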

Experiments  Data: 162 labeled files were used; half of the data for training, half for testing; then the training set and test set were swapped; 162 test results altogether  Features: the same as those in the Dolphin-ID project  Model Selection: 3 different schemes
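The train/test swap described here is a two-fold cross-validation: each half of the files is used once for training and once for testing, so every file gets exactly one test result. The Python sketch below shows the bookkeeping only; `two_fold_swap`, `toy_train`, and `toy_test` are hypothetical stand-ins for the actual Janus training and decoding steps, and splitting the list down the middle is an assumption about how the halves were chosen.

```python
def two_fold_swap(files, train, test):
    """Train on one half, test on the other, then swap the halves.

    `train` takes a list of files and returns a model; `test` takes a
    model and one file and returns that file's result. Returns a dict
    with one result per file.
    """
    half = len(files) // 2
    fold_a, fold_b = files[:half], files[half:]
    results = {}
    for train_set, test_set in ((fold_a, fold_b), (fold_b, fold_a)):
        model = train(train_set)
        for name in test_set:
            results[name] = test(model, name)
    return results


# Toy stand-ins for the real training and decoding steps (hypothetical).
def toy_train(train_files):
    return frozenset(train_files)


def toy_test(model, file_name):
    # True when the file was not part of the training set.
    return file_name not in model


files = [f"file{i:03d}" for i in range(162)]
results = two_fold_swap(files, toy_train, toy_test)
print(len(results), all(results.values()))
```

With 162 files, each model is trained on 81 files and tested on the other 81, and the pooled results cover all 162 files without any file being tested by a model that saw it in training.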

Results – Scheme 1
         Sig  Non-Sig  Garbage  Pause
Sig      58%  6%       18%      34%
Non-Sig  33%  8%       37%      22%
Garbage  77%  0%       5%       18%
Pause    31%  6%       27%      34%

Results – Scheme 2
         Sig  Garbage  Pause
Sig      79%  9%       21%
Garbage  52%  21%      27%
Pause    48%  14%      38%

Results – Scheme 3
         Sig  Garbage  Pause
Sig      91%  0.6%     8%
Garbage  80%  10%      10%
Pause    69%  1%       30%

Analysis of Results  Results can only be as good as the labels  Scheme 3 is the best at aligning signature whistles (speaker dependent)  Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage  Scheme 2 is in the middle (speaker independent)  Pause is the most difficult to model: it contains many different things, and we modeled it with only 1 state

Conclusion  Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds.  There is a lot of work to be done in the signal processing stage  Partly supervised training: it might be better to construct models only for the labels we are sure of, and let the model learn what the signature whistles are, or which units discriminate between different labels.

We also tried …  One-state model for non-signature whistles, garbage, and pause -- Segmentation fault in training  “Loop back” model for signature whistles -- The loop back transition makes no difference

Acknowledgement  Tanja Schultz, Yue Pan, Alan W Black, Szu-Chen Stan Jou, Hua Yu

Thank You!  Jiazhi Ou, Tal Blum  {jzou, tblum}@cs.cmu.edu
