Content-based Music Retrieval from Acoustic Input (CBMR)
Outline
- What is CBMR?
- Methods
  - Signal processing
  - Similarity comparison
- Experiment results
- Demo
- Future work
What is CBMR?
- CBMR
  - Content-based Music Retrieval
- Traditional database query
  - Text-based or SQL-based
- Our goal
  - Music retrieval by singing/humming
Related Work
- Query by humming, by Ghias, Logan, and Chamberlin (1995)
  - Autocorrelation pitch detection
  - 183 songs in database
- MELDEX system, by the New Zealand Digital Library Project (1996)
  - Gold/Rabiner algorithm (800 songs)
  - Users sing 'la' or 'ta' for transposition
- Karaoke song recognizer, by J.F. Wang (1997)
  - Novel pitch detection
  - 50 songs in database
Flowchart
- On-line processing
  - Microphone signal input -> Sampling (11 kHz) -> Filtering -> Pitch tracking -> Post signal processing -> Mid-level representation
- Off-line processing
  - Songs database -> MIDI message extraction -> Mid-level representation
- Similarity comparison -> Query results (ranked song list)
Original Wave Input
- Song: 小雨中的回憶
- 11025 Hz, 8 bits, mono
Single Frame
- 512 points/frame
- 340 points overlap
(Figure: zoom-in on overlapping frames)
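The framing step can be sketched as follows. This is a minimal illustration, assuming the frame size and overlap quoted on the slide (512 and 340 points, so each frame advances by 172 points, roughly 15.6 ms at 11025 Hz); the function name `split_into_frames` is hypothetical and not taken from the original system.

```python
def split_into_frames(samples, frame_size=512, overlap=340):
    """Split a sample sequence into overlapping frames.

    frame_size and overlap default to the values on the slide;
    trailing samples that do not fill a whole frame are dropped
    (an assumption, the slide does not say how the tail is handled).
    """
    hop = frame_size - overlap  # 172 new samples per frame
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += hop
    return frames


if __name__ == "__main__":
    frames = split_into_frames(list(range(1024)))
    print(len(frames), frames[1][0])
```

Each frame shares its last 340 points with the first 340 points of the next frame, which keeps the pitch contour smooth between neighbouring frames.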
Pitch Tracking
- Range
  - E2 - C6
  - About 82 Hz to 1047 Hz
- Method
  - Auto-correlation
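A minimal sketch of auto-correlation pitch estimation for one frame, restricted to the E2 - C6 range. The sampling rate follows the 11025 Hz wave format used in the experiments; the upper limit of 1047 Hz is the standard frequency of C6, and the function name `autocorr_pitch` and the "largest peak wins" rule are assumptions for illustration, not the presenter's exact method.

```python
import math


def autocorr_pitch(frame, sample_rate=11025, f_min=82.0, f_max=1047.0):
    """Estimate the pitch (Hz) of one frame from its auto-correlation peak.

    Returns 0.0 when no positive peak is found (e.g. a silent frame).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                   # remove DC offset
    lag_min = max(1, int(sample_rate / f_max))      # shortest period searched
    lag_max = min(n - 1, int(sample_rate / f_min))  # longest period searched
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 220.0 * i / 11025) for i in range(512)]
    print(autocorr_pitch(tone))
```

Because the lag is an integer number of samples, the estimate is quantized; at 11025 Hz a 220 Hz tone can only be resolved to within a few hertz.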
Auto-correlation without Clipping
Center Clipping
(Figure: panels (a), (b), (c))
- Clipping limits are set to a percentage of the absolute maximum of the auto-correlation data
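As an illustration of center clipping, the sketch below zeroes everything inside the clipping limits and shifts the remaining values toward zero. The limit is a fraction of the absolute maximum, as on the slide; the default `ratio=0.3` is a hypothetical placeholder because the percentage is not given here, and this classic clipping function is only one of several possible variants.

```python
def center_clip(values, ratio=0.3):
    """Center-clip a sequence.

    ratio is an assumed fraction of the absolute maximum; values inside
    [-limit, +limit] become 0, the rest are moved toward zero by limit.
    """
    if not values:
        return []
    limit = ratio * max(abs(v) for v in values)
    clipped = []
    for v in values:
        if v > limit:
            clipped.append(v - limit)
        elif v < -limit:
            clipped.append(v + limit)
        else:
            clipped.append(0.0)
    return clipped


if __name__ == "__main__":
    print(center_clip([1.0, -0.2, 0.5, -1.0]))
```

Clipping suppresses the small ripples between the main peaks, so the peak that marks the pitch period stands out more clearly.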
Auto-correlation with Clipping
Pitch Contour
Signal Processing
- Remove outlier points and short notes
- Down-sampling and smoothing
- Frequency to semitone
  - Semitone: a musical scale based on A440
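The frequency-to-semitone step can be sketched as below. The conversion formula is an assumption: it uses the common MIDI convention in which A440 maps to note number 69, which may differ by a constant offset from the formula on the original slide. The median filter is likewise only one plausible way to smooth the contour and suppress isolated outliers; the names `hz_to_semitone` and `median_smooth` are hypothetical.

```python
import math
from statistics import median


def hz_to_semitone(freq_hz):
    """Convert a frequency to a semitone number (assumed: A440 -> 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def median_smooth(values, width=5):
    """Smooth a pitch contour with a sliding median of the given width.

    The window is shortened at both ends of the contour.
    """
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


if __name__ == "__main__":
    print(hz_to_semitone(440.0), hz_to_semitone(880.0))
    print(median_smooth([60, 60, 72, 60, 60], width=3))
```

Working in semitones makes pitch differences independent of register: one octave is always 12 units, whether it lies near E2 or near C6.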
Pitch Contour (After Smoothing)
Mid-level Representation
Mid-level Representation without Rest
Similarity Comparison
- Goal
  - Find the most similar MIDI file
- Challenge
  - Tempo variance
    - Dynamic time warping (DTW)
  - Tune variance
    - Key transposition
Compare by DTW
(Figure: wave file and MIDI file compared by DTW)
Dynamic Time Warping (DTW)
(Figure: DTW grid with axes i and j, points t(i-1), t(i), r(j-1), r(j), and the warping window)
DTW (cont.)
- Local distance between t(i) and r(j):
    if ( t(i) == Rest && r(j) == Rest )
        dist(i,j) = 0;
    elseif ( t(i) == Rest || r(j) == Rest )
        dist(i,j) = restWeight;
    else
        dist(i,j) = |t(i) - r(j)|;
    end
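The local distance with rest handling can be combined with a basic DTW recursion as sketched below. Several points are assumptions for illustration: rests are encoded as `None`, `rest_weight=2.0` is a hypothetical value, the local path constraint is the textbook one (steps from (i-1, j), (i, j-1), and (i-1, j-1)), and no warping window is applied, so this is not necessarily the constraint used in the original system.

```python
REST = None  # assumed encoding of a rest in the mid-level representation


def local_dist(t, r, rest_weight=2.0):
    """Distance between two notes in semitones, with rest handling."""
    if t is REST and r is REST:
        return 0.0
    if t is REST or r is REST:
        return rest_weight
    return abs(t - r)


def dtw_distance(query, ref, rest_weight=2.0):
    """Accumulated DTW distance between two note sequences."""
    inf = float("inf")
    n, m = len(query), len(ref)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(query[i - 1], ref[j - 1], rest_weight)
            cost[i][j] = d + min(cost[i - 1][j],      # repeat ref note
                                 cost[i][j - 1],      # repeat query note
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([60, 62, 64], [60, 60, 62, 62, 64, 64]))
```

In the usage line the reference is the same melody at half the tempo, so the warping path absorbs the tempo difference and the distance is zero.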
Example of DTW
Key Transposition
- Mean shift
- Binary search in the searching area
  - O(N) --> O(log N)
(Figure: mean and searching area)
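One way to read the two steps above is sketched here: first shift the query so that its mean pitch matches the reference, then refine the shift by repeatedly halving a search span around that starting point. This is an interpretation, not the presenter's confirmed procedure; it assumes the distance behaves roughly unimodally near the mean shift. The names `mean_shift` and `best_transposition`, the parameters `half_range` and `steps`, and the caller-supplied `dist` function are all hypothetical.

```python
def mean_shift(query, ref):
    """Key shift (semitones) that aligns the mean pitch of query and ref."""
    return sum(ref) / len(ref) - sum(query) / len(query)


def best_transposition(query, ref, dist, half_range=4.0, steps=5):
    """Refine the mean shift by a halving search over the key shift.

    dist(shifted_query, ref) is any sequence distance (e.g. DTW).
    Each step tries center - span and center + span, keeps the better
    shift, and halves span, so the evaluation count grows with
    log(half_range / precision) instead of the number of candidate keys.
    """
    center = mean_shift(query, ref)
    best = dist([q + center for q in query], ref)
    span = half_range
    for _ in range(steps):
        span /= 2.0
        for cand in (center - span, center + span):
            d = dist([q + cand for q in query], ref)
            if d < best:
                best, center = d, cand
    return center, best


if __name__ == "__main__":
    l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    print(best_transposition([0, 2, 4, 5], [5, 7, 9, 20], l1))
```

The usage line uses a plain sum of absolute differences as a stand-in distance; the outlier in the reference pulls the mean shift to 7.5, and the halving search then moves it to a better-fitting shift.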
Example of Key Transposition
Score Function
- m : length of match string
- n : length of input string
- e : DTW distance
- A = 0.8
- B = 0.6
Experiment Environment
- 290 wave files
  - Wave length : sec
  - Wave format: PCM, 11025 Hz, 8 bits, mono
- Environment
  - Celeron 450 with 128 MB RAM under Matlab 5.3
- Database
  - 493 MIDI files
Experiment Result (Histogram)
Experiment Result (Pie)
- Total time: 4589 sec (15.8 sec per wave)
Experiment Result (Pie) - With Rest
- Total time: 7893 sec (27.2 sec per wave)
How to Accelerate?
- Branch and bound
  - O(N) -> O(ln N)
  - Triangle inequality
    - d(a,b) + d(b,c) ≥ d(a,c)
- Hierarchical
  - 2 phase
    - 3/32 sec
    - 2/32 sec
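The triangle inequality gives a cheap lower bound that lets a search skip candidates without computing the full distance: for any pivot p, d(q, x) ≥ |d(q, p) - d(p, x)|. The sketch below is a generic illustration with a single pivot and a caller-supplied `dist`; it assumes `dist` is a true metric (plain DTW does not satisfy the triangle inequality in general), and the name `nearest_with_pruning` is hypothetical. It does not by itself achieve the O(ln N) behaviour quoted on the slide.

```python
def nearest_with_pruning(query, items, pivot, dist):
    """Nearest-neighbour search that prunes with a triangle-inequality bound.

    Returns (index of best item, its distance, number of full distance
    evaluations against items).
    """
    # Distances from the pivot to every item can be computed off-line.
    pivot_to_item = [dist(pivot, x) for x in items]
    d_qp = dist(query, pivot)
    best_idx, best_d, evaluated = -1, float("inf"), 0
    for idx, x in enumerate(items):
        lower_bound = abs(d_qp - pivot_to_item[idx])
        if lower_bound >= best_d:
            continue  # cannot beat the current best, skip the full distance
        d = dist(query, x)
        evaluated += 1
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx, best_d, evaluated


if __name__ == "__main__":
    manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    songs = [(0, 0), (10, 10), (1, 1), (20, 0)]
    print(nearest_with_pruning((1, 2), songs, (0, 0), manhattan))
```

In the usage line only two of the four candidates need a full distance computation; the other two are rejected by the lower bound alone.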
Experiment Result (Pie) - 3/32 sec
- Total time: 2358 sec (8.9 sec per wave)
Experiment Result (Pie) - 2 Phase
- Total time: 3006 sec (11.2 sec per wave)
Error Analysis
- MIDI error
- Singing error
- Low pitch
- Broken vocalism
- Noise
Future Work
- Reduce computation time
  - Better similarity comparison
  - Different comparison unit
  - Hardware acceleration
  - Better searching algorithm
- Steadier pitch tracking algorithm
- Noise handling