User Manual of Mining Mouse Vocalizations Prepared by Jesin Zakaria and Eamonn Keogh.

CREATE SPECTROGRAM

Run createSpectro.m to:
1. create a spectrogram from a .wav file
2. idealize the spectrogram
3. extract candidate syllables from the idealized spectrogram

Try the following example. Set:

    rec = '..\031611KOKO02MATED.wav';      % path and name of the .wav file
    D = '...\031611KOKO02MATEDspectro\';   % folder that will contain the syllables

Set the range of the for loop according to the size of main memory and the length of the recording. Each iteration creates the spectrogram of two minutes of the recording; this value can be changed to process a longer section of the recording per iteration.

RUNNING TIME: Because processing is faster than real time, we did not include a running-time analysis in our paper. For example:
- Creating the spectrogram of a two-minute recording took (12.95 + 12.81 + 12.67)/3 = 12.81 seconds on average.
- Extracting connected components from the idealized spectrogram of a six-minute recording took 85.7 seconds.
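The two-minute chunking of the recording can be sketched as follows. This is only an illustration of how the loop range might be derived: the sampling rate FS and the recording length totalSec are assumed values, and the variable names are not taken from createSpectro.m.

```matlab
% Sketch: sample ranges for processing a recording in two-minute chunks.
% FS and totalSec are assumed values, used for illustration only.
FS = 250000;          % assumed sampling rate (samples per second)
totalSec = 330;       % hypothetical recording length in seconds
chunkSec = 120;       % two minutes per iteration, as described in the manual

nChunks = ceil(totalSec / chunkSec);
ranges = zeros(nChunks, 2);
for k = 1:nChunks
    t1 = (k - 1) * chunkSec * FS + 1;        % first sample of chunk k
    t2 = min(k * chunkSec, totalSec) * FS;   % last sample of chunk k
    ranges(k, :) = [t1, t2];
    % Here one would read samples t1..t2 of the .wav file and build
    % the spectrogram of this chunk only.
end
disp(ranges);
```

A larger chunkSec means fewer iterations but more memory per iteration, which is the trade-off the manual refers to.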

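CREATE SPECTROGRAM

Use the following code to create the idealized spectrogram:

    rec = 'C:\Users\Jesin\Desktop\temp\031611KOKO02MATED.wav';
    t1 = 124000*250;                 % first sample to read
    t2 = 125000*250;                 % last sample to read
    [Y, FS] = wavread(rec,[t1,t2]);  % newer MATLAB releases: use audioread instead
    [y,F,T,P] = spectrogram(Y,512,256,512,FS,'yaxis');
    C = -10*log10(P);                % power spectral density on a negated dB scale
    C(C<35) = 0;                     % discard values below 35
    C(C>80) = 0;                     % discard values above 80
    C(C~=0) = 1;                     % every remaining value becomes 1
    imshow(~C);                      % display the inverted binary image

Figure 1: Idealized spectrogram of a recording of laboratory mice, from 124 to 125 seconds (x-axis: time in seconds; y-axis: 40 to 100 kHz).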

EXTRACT CANDIDATE SYLLABLES

The part of createSpectro.m that extracts candidate syllables is marked in the code. The results of all filtering steps are included in the extractcandidatesyllable.zip folder:
- …\031611KOKO02MATEDspectro contains all connected components with duration >10 and <300 and within the frequency range 30 to 110 kHz.
- …\031611KOKO02MATED contains the candidate syllables that remain after filtering out some noise and, where several syllables appear at the same time stamp, keeping only one of them.
- …\sametime contains the syllables that were excluded because they appear at the same time stamp as another syllable.
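The duration and frequency filter on connected components could look roughly like the sketch below. It is not the code from createSpectro.m: the toy image, the frequency axis F, and the choice of time bins as the unit of duration are all assumptions, and bwconncomp and regionprops require the Image Processing Toolbox.

```matlab
% Sketch: keep connected components whose duration and frequency extent
% fall inside given limits. The toy image, the frequency axis F and the
% unit of the duration limits are illustrative assumptions.
BW = false(60, 400);                 % rows = frequency bins, columns = time bins
BW(20:25, 50:90)   = true;           % 41 columns wide, mid frequency
BW(30:32, 200:204) = true;           % 5 columns wide (too short)
BW(2:4, 300:340)   = true;           % 41 columns wide, very low frequency
F = linspace(0, 125, size(BW, 1));   % assumed frequency (kHz) of each row

minDur = 10;  maxDur = 300;          % duration limits, assumed to be in time bins
minF = 30;    maxF = 110;            % frequency limits in kHz

CC = bwconncomp(BW);
stats = regionprops(CC, 'BoundingBox');
keep = false(1, CC.NumObjects);
for k = 1:CC.NumObjects
    bb = stats(k).BoundingBox;       % [x y width height], corner at pixel edge
    firstRow = ceil(bb(2));
    lastRow = firstRow + bb(4) - 1;
    dur = bb(3);
    keep(k) = dur > minDur && dur < maxDur && ...
              F(firstRow) >= minF && F(lastRow) <= maxF;
end
fprintf('%d of %d components kept\n', nnz(keep), CC.NumObjects);
```

In this toy image only the first component passes both tests; the second is too short and the third lies below 30 kHz.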

CLASSIFY CANDIDATE SYLLABLES

Run classifySyllables.m.

Requires:
1. labelGrndTruth.txt, which contains the labels of the ground truth
2. theta.txt, which contains the thresholds for each class: the mean, sigma, mean+sigma and mean+2*sigma for each class of syllables in the ground truth are in columns 1, 2, 4 and 5 of theta.txt
3. Normalized ground truth
4. Candidate syllable bitmaps
5. List of candidate syllables in sorted order

Result (for our sample example):
- ‘dis031611KOKO02MATED.txt’ contains the distances of the candidate syllables to the ground truth.
- ‘label 031611KOKO02MATED.txt’ contains the labels of all the candidate syllables.

To see the class distribution, enable the class-distribution code in classifySyllables.m.
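One plausible way to combine the distances with the per-class thresholds is nearest-neighbour labelling with rejection, sketched below. How classifySyllables.m actually applies theta is an assumption here, and all numbers are made up; label 0 stands in for the unknown class.

```matlab
% Sketch: nearest-neighbour labelling with a per-class distance threshold.
% All values are hypothetical; the way theta is applied is an assumption.
gtLabels = [1 1 3 3];                 % class labels of four ground-truth syllables
dist = [0.10 0.30 0.90 0.80;          % dist(i,j): candidate i to ground truth j
        0.70 0.95 0.20 0.25;
        0.90 0.85 0.95 0.90];
classIds = [1 3];
mu    = [0.20 0.22];                  % hypothetical per-class mean distance
sigma = [0.05 0.06];                  % hypothetical per-class standard deviation
theta = mu + 2 * sigma;               % threshold chosen here: mean + 2*sigma

nCand = size(dist, 1);
labels = zeros(nCand, 1);             % 0 stands for "unknown class"
for i = 1:nCand
    [dMin, j] = min(dist(i, :));      % nearest ground-truth syllable
    c = gtLabels(j);
    if dMin <= theta(classIds == c)   % accept only if close enough for class c
        labels(i) = c;
    end
end
disp(labels');
```

The third candidate is farther from its nearest neighbour than the class threshold allows, so it stays unlabeled.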

CLASSIFY CANDIDATE SYLLABLES
Normalization method

In our paper we stated that all candidate syllables and ground-truth syllables are normalized before the GHT distance between them is computed. For brevity, the paper neither describes the normalization method in detail nor validates it. The following slides present the details of our normalization method.

CLASSIFY CANDIDATE SYLLABLES
Normalization method

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes)

GHT is calculated without normalizing the syllables. Syllables that are not clustered correctly are marked with a red circle.

CLASSIFY CANDIDATE SYLLABLES
Normalization method

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes)

GHT is calculated after normalizing the syllables by dividing x and y by the larger dimension (row or column). Some syllables are still not clustered correctly, as is evident from the following figure.

[Figure: same set of syllables after normalization]

CLASSIFY CANDIDATE SYLLABLES
Normalization method (the one we used in our paper)

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes)

GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively. All the syllables except one (marked with an arrow) are clustered correctly, as is evident from the following figure.

[Figure: same set of syllables after normalization]
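The two normalization variants can be sketched on a toy bitmap as below. Treating x as the column index and y as the row index is an assumption, as is the bitmap itself; the code only illustrates the coordinate scaling, not the GHT computation.

```matlab
% Sketch: normalize the pixel coordinates of a binary syllable bitmap.
% The bitmap B is a made-up example; x = column index and y = row index
% is an assumed convention.
B = false(4, 10);
B(2, 3:8) = true;                    % a short horizontal stroke
B(3, 8)   = true;

[r, c] = find(B);                    % row (y) and column (x) of each "on" pixel
[nRows, nCols] = size(B);

% Per-dimension variant: each coordinate is divided by the extent of its own axis
xn = c / nCols;
yn = r / nRows;

% Larger-dimension variant: both coordinates are divided by the same value
xl = c / max(nRows, nCols);
yl = r / max(nRows, nCols);

disp([xn yn]);
```

With per-dimension scaling every syllable is mapped into the unit square regardless of its aspect ratio, whereas scaling by the larger dimension preserves the aspect ratio.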

CLASSIFY CANDIDATE SYLLABLES
Normalization method (the one we used in our paper)

Set: 16 syllables of class 1 and 27 syllables of class 9 (confusing classes)

GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively.

[Figure: same set of syllables after normalization]

EDITING GROUND TRUTH

[Figure: classification accuracy (y-axis, 0 to 1) versus number of instances added (x-axis, 0 to 700), for the edited ground truth over all the labeled syllables]

Run accuracyGrndTrth.m to generate the plot. It requires:
- editMatrix.txt
- dis692.txt
- label692.txt

DESCRIPTION OF THE FILES

Our paper mentions the 692 syllables annotated by the domain expert. Instead of using all 692 syllables as ground truth, we applied a data editing technique, which produced a set of 108 syllables that we used as the GROUND TRUTH for our experiments.
1. editMatrix.txt contains the result of editing the 692 annotated syllables. Columns 2, 3, 4 and 5 give the number of syllables added to the ground truth, the class label of the syllable, the total number of syllables classified using the edited ground truth, and the accuracy rate.
2. dis692.txt contains the GHT distances of the 692 annotated syllables.
3. label692.txt contains the class labels of the 692 syllables.

groundtruth.zip contains the sets of 692 syllables and 108 syllables mentioned in our paper.
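The accuracy that is tracked while instances are added to the ground truth can be illustrated with a small 1-NN sketch. The distance matrix, labels and subset below are made up, and the assumption that accuracy is measured by 1-NN classification of all labeled syllables against the current subset is ours, not a description of accuracyGrndTrth.m.

```matlab
% Sketch: 1-NN accuracy of all labeled syllables against a ground-truth subset.
% D, labels and gt are hypothetical values.
D = [0 1 4 5;
     1 0 5 4;
     4 5 0 1;
     5 4 1 0];                        % pairwise distances between four syllables
labels = [1 1 2 2];                   % class label of each syllable
gt = [1 3];                           % indices currently in the ground truth

[~, nn] = min(D(:, gt), [], 2);       % nearest ground-truth member for each syllable
pred = labels(gt(nn));                % predicted label = label of that member
acc = mean(pred(:) == labels(:));     % fraction classified correctly
fprintf('accuracy with %d instances: %.2f\n', numel(gt), acc);
```

Repeating this for a growing gt gives one accuracy value per added instance, which is the kind of curve the plot shows.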

MOTIF DISCOVERY

Run findMotif.m to find motifs in a vocalization.

[Figure: example motif occurrences at 944.7 – 945.2 sec and 194.8 – 195.2 sec]

Instructions: In findMotif.m, change the locations of the folders that will contain the motifs, the .wav file, the list of syllables and the labels of the syllables. Also create the output folders, e.g. …/motif/6 and …/motif/7, before running the code. These folders will contain the motifs of length 6, 7, etc.

motif.zip contains the motifs from the attached .wav file.
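Creating one output folder per motif length can be scripted rather than done by hand. In the sketch below the root folder under tempdir and the range of lengths are placeholders; substitute the location you configured in findMotif.m.

```matlab
% Sketch: create one output folder per motif length before running findMotif.m.
% The root folder and the lengths 6 and 7 are placeholders.
root = fullfile(tempdir, 'motif');    % hypothetical location of the motif folders
for len = 6:7
    d = fullfile(root, num2str(len)); % e.g. .../motif/6
    if ~exist(d, 'dir')
        mkdir(d);                     % also creates the parent folder if needed
    end
end
```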

Clustering mice vocalizations

Run clusterMtf.m to cluster motifs from mice vocalizations. The folder ‘dendo_mice’ contains all the files required to generate the dendrograms of figure 12 and figure 13.

Similarity search / Query by content

QUERY: d dqd ddqd (‘q’ means unknown class)

Some additional results are attached here. The 10 nearest neighbors (10 NN) from four vocalizations are presented.

Similarity search / Query by content

QUERY: qaiaiacia (‘q’ means unknown class)

Some additional results are attached here. The 10 nearest neighbors (10 NN) from four vocalizations are presented.

[Figure: syllable labels a q i a i a c i a]

Motif Significance

Run mtfSgnfnc.m to assess the significance of motifs based on their z-score. The folder ‘../mtfSgnfcn’ contains all the files required to generate the plot of figure 17.
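As a reminder of what a z-score measures, the sketch below compares an observed motif count with counts under a null model. The numbers and the choice of null model are hypothetical; mtfSgnfnc.m may compute its z-scores differently.

```matlab
% Sketch: z-score of an observed motif count against a null distribution.
% All counts are hypothetical.
observed = 12;                        % times the motif occurs in the recording
nullCounts = [3 5 4 6 2 5 4 3];       % counts of the same motif under a null model
z = (observed - mean(nullCounts)) / std(nullCounts);
fprintf('z-score: %.2f\n', z);
```

A large positive z means the motif occurs far more often than the null model would suggest.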

Contrast sets

createContrastset.m is used to create the contrast sets. contratset.m is used to extract the patterns in the contrast sets from a vocalization.

The folder ‘../contrastSet’ contains some examples of the contrast sets mentioned in our paper. It also contains the files needed by createContrastset.m. ‘contrastset.txt’ contains the list of substrings sorted in descending order of their information gain.
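Information gain for a single substring can be computed as in the sketch below, using the standard entropy-based definition for a presence/absence split between two sets of vocalizations. The counts are hypothetical and the exact formulation used in createContrastset.m is not confirmed by this manual.

```matlab
% Sketch: information gain of one substring for separating two sets of
% vocalizations (standard entropy-based definition). Counts are hypothetical.
H = @(p) -sum(p(p > 0) .* log2(p(p > 0)));   % entropy of a probability vector

nA = 10;  nB = 10;                    % vocalizations in set A and in set B
inA = 8;  inB = 2;                    % how many of each contain the substring

N = nA + nB;
nIn = inA + inB;                      % vocalizations containing the substring
nOut = N - nIn;                       % vocalizations not containing it

Hbefore = H([nA nB] / N);
Hin  = H([inA inB] / nIn);
Hout = H([nA - inA, nB - inB] / nOut);
gain = Hbefore - (nIn / N) * Hin - (nOut / N) * Hout;
fprintf('information gain: %.4f bits\n', gain);
```

Computing this value for every candidate substring and sorting in descending order yields a ranking of the kind stored in ‘contrastset.txt’.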

Questions or comments? Email jzaka001@cs.ucr.edu
