Published by Maurice Casey. Modified over 9 years ago.
1
Audio Meets Image Retrieval Techniques
Dave Kauchak
Department of Computer Science
University of California, San Diego
dkauchak@cs.ucsd.edu
2
Image vs. Audio
[Figure: examples labeled Classical, Country, Rock]
3
Image techniques to audio
Idea: Apply image retrieval (and classification) techniques to audio
Image is 2-D
Audio is 1-D
4
Benefits
Don’t have to reinvent the wheel
Image techniques have had fairly good success
More literature in image processing
Audio retrieval is a relatively new field
5
Key Concepts and Goals
Image techniques to audio processing
Apply a number of different image techniques (and show they work)
Relate various parts of audio to counterparts in image
Novel data set with known ground truth
Multiple input for audio
Raw audio
6
A first step…
Audio retrieval
Input: A number of songs
Output: “Similar” songs from an audio database
Histogramming methods (Puzicha et al.)
Wavelets instead of Gabor filters
7
Basic Technique
[Diagram: DWT -> histogram -> database -> most “similar” songs]
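The pipeline on this slide (wavelet transform, then one histogram per level) can be sketched roughly as follows. The slides do not say which wavelet was used, so this sketch assumes a plain Haar transform; the helper names, the value range [-1, 1], and the default bin count are illustrative assumptions, not the presenter's implementation.

```python
import math

def haar_dwt_levels(signal):
    """Return Haar detail coefficients per level, finest level first.

    Assumption: Haar wavelet; a trailing odd sample at any level is dropped.
    """
    approx = list(signal)
    levels = []
    while len(approx) >= 2:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        levels.append(detail)
    return levels

def histogram(values, bins, lo, hi):
    """Equal-width histogram on [lo, hi], normalised to sum to 1.

    Values outside the range are clamped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts

def song_signature(signal, bins=16, lo=-1.0, hi=1.0):
    """One normalised histogram per DWT level (hypothetical 'signature')."""
    return [histogram(level, bins, lo, hi) for level in haar_dwt_levels(signal)]
```

A song's signature is then a list of histograms, one per level, which can be compared against the signatures stored for the database.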
8
Normal vs. Proportional Histogramming
Remember the DWT: each level has a different number of samples
Normal: Histogram each level with the same number of bins
Proportional: Histogram each level keeping samples/bin equal
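The difference between the two schemes comes down to how many bins each level receives. The sketch below assumes a dyadic DWT (each level has half as many coefficients as the one before) and assumes that the "proportional" bin setting refers to the coarsest level; the slides do not state this, and the function names are hypothetical.

```python
def level_sizes(num_samples, num_levels):
    """Coefficients per DWT level, finest first: n/2, n/4, ..."""
    return [num_samples >> (k + 1) for k in range(num_levels)]

def normal_bins(sizes, bins):
    """Normal scheme: every level gets the same number of bins."""
    return [bins] * len(sizes)

def proportional_bins(sizes, base_bins):
    """Proportional scheme: keep samples/bin roughly equal across levels.

    Assumption: base_bins is the bin count of the smallest (coarsest) level,
    and larger levels are scaled up from it.
    """
    smallest = min(sizes)
    return [max(1, round(base_bins * s / smallest)) for s in sizes]

sizes = level_sizes(1024, 4)                # [512, 256, 128, 64]
print(normal_bins(sizes, 16))               # [16, 16, 16, 16]
print(proportional_bins(sizes, 4))          # [32, 16, 8, 4]
```

With normal binning the finest level here has 32 samples per bin and the coarsest only 4; with proportional binning every level has 16 samples per bin.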
9
Compare Histograms
Chi-square on each level
Sum the chi-square values and use the total as the dissimilarity measure (lower is better)
Sum dissimilarity over all input songs
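As a rough sketch of this comparison step: the code below uses one common form of the chi-square statistic for histograms, in which each bin is compared against the mean of the two histograms. Whether the presenter used exactly this form is an assumption, and the function names and the dictionary-based database are illustrative only.

```python
def chi_square(h, k):
    """Chi-square statistic between two histograms with the same bins.

    Assumed form: sum over bins of (h_i - m_i)^2 / m_i, with m_i = (h_i + k_i) / 2.
    Bins that are empty in both histograms contribute nothing.
    """
    total = 0.0
    for hi, ki in zip(h, k):
        m = (hi + ki) / 2.0
        if m > 0:
            total += (hi - m) ** 2 / m
    return total

def song_dissimilarity(sig_a, sig_b):
    """Sum the per-level chi-square values (lower means more similar)."""
    return sum(chi_square(h, k) for h, k in zip(sig_a, sig_b))

def rank_database(query_sigs, database):
    """Rank database songs by dissimilarity summed over all input songs.

    database: dict mapping a song name to its per-level histogram list.
    Returns song names, most similar first.
    """
    scores = {
        name: sum(song_dissimilarity(q, sig) for q in query_sigs)
        for name, sig in database.items()
    }
    return sorted(scores, key=scores.get)
```

The per-level histograms of two songs must use the same bin layout at each level, which holds for both the normal and the proportional scheme as long as all songs are processed with the same settings.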
10
Ground Truth Data Set
Songs by 4 different bands (10 songs each)
Dave Matthews Band
U2
Blink 182
Green Day
Mono, sampled at 22 kHz from a number of sources
11
Experiment
Input = 5 songs by a single band
Goal = Pull out 5 other songs by that band
10 random experiments per band (40 total)
Normal bins: 8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512
Proportional bins: 4, 8, 16, 32, 64
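One way to set up such trials is sketched below. It assumes that the 5 input songs are removed from the candidate pool before ranking, which the slide does not state; the function names, the seed, and the dictionary layout are hypothetical.

```python
import random

def make_trial(songs_by_band, band, rng, num_query=5):
    """Split one band's songs into query songs and held-out targets.

    Assumption: query songs are excluded from the candidates to be ranked.
    """
    songs = list(songs_by_band[band])
    rng.shuffle(songs)
    query = songs[:num_query]
    targets = set(songs[num_query:])
    candidates = [s for ss in songs_by_band.values() for s in ss if s not in query]
    return query, targets, candidates

def make_trials(songs_by_band, trials_per_band=10, seed=0):
    """Build (band, query, targets, candidates) tuples for every band."""
    rng = random.Random(seed)
    return [
        (band, *make_trial(songs_by_band, band, rng))
        for band in songs_by_band
        for _ in range(trials_per_band)
    ]
```

With 4 bands of 10 songs each, this gives 40 trials, each with 5 query songs, 5 target songs, and 35 candidates under the stated assumption.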
12
Scoring
By points: 5 pts. for a correct answer in first place, 4 pts. for a correct answer in second place, etc.
Perfect = 5+4+3+2+1 = 15
Percentage correct at each place
Percentage that have a correct answer at or before each place
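The three scoring measures can be sketched as below; the function names are hypothetical. The last helper turns per-place fractions into an average point score, which matches the slides' own numbers: the Dave Matthews fractions .6, .8, .4, .3, .2 give 5(.6) + 4(.8) + 3(.4) + 2(.3) + 1(.2) = 8.2.

```python
def points(ranked, targets, places=5):
    """5 points for a correct answer in first place, 4 in second, and so on."""
    return sum(places - i for i, song in enumerate(ranked[:places]) if song in targets)

def hit_at_place(ranked, targets, places=5):
    """1 if the song returned at each place is correct, else 0."""
    return [1 if song in targets else 0 for song in ranked[:places]]

def hit_by_place(ranked, targets, places=5):
    """1 if a correct answer appears at or before each place, else 0."""
    seen = False
    out = []
    for song in ranked[:places]:
        seen = seen or song in targets
        out.append(1 if seen else 0)
    return out

def expected_points(fractions):
    """Average point score implied by the fraction correct at each place."""
    places = len(fractions)
    return sum((places - i) * f for i, f in enumerate(fractions))
```

Averaging `hit_at_place` and `hit_by_place` over all trials gives the two percentage measures listed on the slide.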
13
Results: Points
14
Results: Points Proportional
15
Best Score Results: 16 bins
Band            1st    2nd    3rd    4th    5th    Score
Dave Matthews   .6     .8     .4     .3     .2     8.2
Blink 182       .3     .1            0             2.3
U2              0      0      0      .1     0      .2
Green Day       .2     .3     .2     0      .5     3.3
Average         .275   .3     .175   .1     .2     3.5
16
Different Bands
Band            Normal   Proportional
Dave Matthews   6.9      5.8
Blink 182       1.3      2
U2              .9       1.5
Green Day       2.1      2
Average         2.8
17
Percentage correct
Places: 1st, 2nd, 3rd, 4th, 5th
Normal: .23 .17 .18
Proportional: .16 .21 .24 .15
18
One last result
19
Summary of Results
Overall, results are not amazing
Band choice has a large influence
Normal and Proportional perform somewhat similarly
Proportional is more even over all bands
Bin size doesn’t appear to be crucial
75% chance that a song by the same band will end up in the top 5
20
Next Step…
Adaptive Binning
Vary Parameters
Levels
Song length
Histogram comparison methods
Another image retrieval algorithm
Boosting for feature selection using large feature set?
Other?
Larger and more diverse database
21
Conclusion
Even though results are not fabulous, image processing techniques CAN be used for audio processing
Using bands for testing allows for ground truth
Audio files are BIG!