2015/10/22
Progressive Filtering and Its Application for Query-by-Singing/Humming
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab
CS Dept., Tsing Hua Univ., Taiwan

Recent Publications
- Journals
  - Jiang-Chun Chen and J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing.
  - J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, Feb.
- Conferences
  - Liang-Yu Chen, Chun-Jen Lee, and Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept.
  - Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of the 125th AES Convention, San Francisco, USA, Oct.
  - Zhi-Sheng Chen, Jia-Min Zen, and Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of the 125th AES Convention, San Francisco, USA, Oct.

Outline
- Problem definition of QBSH
- Methods for QBSH
- Progressive filtering
- Conclusions

Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: singing or humming recorded from a microphone
  - Output: a ranked list of songs retrieved from the song database
- Overview
  - First paper: around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX

Challenges in QBSH Systems
- Reliable pitch tracking for acoustic input
  - Input from mobile devices
  - Input from noisy karaoke boxes
- Song database preparation
  - Audio music vs. MIDI files
- Efficient and effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs

Goal and Approach
- Goal: retrieve songs effectively within a given response time, e.g., about 5 seconds
- Our strategy
  - Multi-stage progressive filtering
  - Data-driven design methodology based on dynamic programming (DP)

Approaches to QBSH
- Pitch tracking
- Methods for QBSH

A Quick Demo of QBSH
- Demo page of the MIR Lab
  - http://mirlab.org/mir_main/demo.htm
- Demo of QBSH
  - http://mirlab.org/Demo/MusicSearch/index.htm

Progressive Filtering
- Multi-stage representation
  - Each stage is a method for QBSH
  - [Figure: cascade of stage 1 → stage 2 → … → stage i → …]
  - $s_i$: survival rate for stage i
  - $d_i$: delay for stage i
  - $n_{i-1}$: number of input songs to stage i
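The following minimal Python sketch shows the bookkeeping of the cascade. It assumes that stage i keeps the fraction $s_i$ of its $n_{i-1}$ input songs. It also assumes that the stage's delay is $d_i = n_{i-1}\,t_i(s_i)$, where $t_i(s_i)$ is an average one-to-one comparison time read off the stage's TS curve. The `Stage` class, the `simulate` function, and the numbers in the usage note are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """One filtering stage (hypothetical container for illustration)."""
    name: str
    s: float  # survival rate s_i chosen for this stage, 0 < s <= 1
    t: float  # assumed t_i(s_i): average one-to-one comparison time, in seconds


def simulate(n0: int, stages: List[Stage]) -> List[Tuple[str, float, float, float]]:
    """Return (name, n_{i-1}, n_i, d_i) for each stage of the cascade.

    n0 is the size of the song database; song counts are kept as expected
    (real-valued) counts rather than rounded to integers.
    """
    rows = []
    n_in = float(n0)
    for stage in stages:
        delay = n_in * stage.t   # assumed: d_i = n_{i-1} * t_i(s_i)
        n_out = n_in * stage.s   # n_i = s_i * n_{i-1}
        rows.append((stage.name, n_in, n_out, delay))
        n_in = n_out
    return rows
```

Consider three hypothetical stages that each keep 10% of their input, with per-comparison times of 0.1 ms, 1 ms, and 5 ms. Together they reduce 10,000 songs to a 10-song list in about 2.5 s (1 s + 1 s + 0.5 s). Running the 5 ms comparison alone on all 10,000 songs would take 50 s.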

Stage Characteristics for Effectiveness
- RS curve for stage i: recognition rate $r_i(s)$ as a function of the survival rate $s$
- [Figure: RS curves plotting recognition rate (%) against survival rate s (%), from (0, 0) to (100, 100), for a more effective method, a less effective method, and random guessing. Example: the top-10% recognition rate is 65%, i.e., $r_i(10\%) = 65\%$.]
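The next sketch shows one way an RS curve could be estimated from a design set. It assumes the stage has already ranked every song for each query. At survival rate $s$ the stage keeps the top $s \cdot n$ songs, so $r_i(s)$ is the fraction of queries whose correct song ranks within that cut. The function `rs_curve` and its inputs are hypothetical. A random ranking places the correct song uniformly, which gives $r(s) = s$, the random-guess line.

```python
from typing import List, Sequence


def rs_curve(target_ranks: Sequence[int], n_songs: int,
             survival_rates: Sequence[float]) -> List[float]:
    """Empirical RS curve r(s) of one stage.

    target_ranks[q] is the 1-based rank of the correct song for design-set
    query q when the stage compares q against all n_songs songs.
    """
    curve = []
    for s in survival_rates:
        k = round(s * n_songs)  # number of songs that survive at rate s
        hits = sum(1 for rank in target_ranks if rank <= k)
        curve.append(hits / len(target_ranks))  # fraction of queries still correct
    return curve
```

For example, take ranks [1, 3, 15, 40, 100] over 100 songs. Survival rates of 0.1 and 0.5 then give recognition rates of 0.4 and 0.8.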

Stage Characteristics for Efficiency
- TS curve for stage i: average time $t_i(s)$ as a function of the survival rate $s$
- [Figure: TS curves plotting average time (ms) against survival rate (%), for a less efficient and a more efficient method. Example: when s = 10%, the average one-to-one comparison time is 5 ms.]
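
Formulation as an Optimization Problem
- Maximize the overall recognition rate over the survival rates of all stages, subject to constraints on the total response time and on the size of the final ranking list, where
  - $n$ (= $n_0$): size of the song database
  - $T_{max}$: maximum allowable response time, e.g., 5 sec.
  - 10: size of the retrieved ranking list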
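Below is one way to write the objective and constraints. The assumptions are ours, not the slides'. The correct song is retrieved only if it survives every stage, and the stages are treated as independent, so the overall recognition rate is the product of the per-stage rates $r_i(s_i)$. Stage i compares the query with its $n_{i-1}$ input songs at an average cost $t_i(s_i)$ read off its TS curve. The last stage outputs exactly the 10-song ranking list. This is a sketch and not necessarily the authors' exact formulation.

```latex
\begin{aligned}
\max_{s_1,\dots,s_m}\quad & \prod_{i=1}^{m} r_i(s_i) \\
\text{subject to}\quad & \sum_{i=1}^{m} d_i \le T_{\max}, \qquad n_0 \prod_{i=1}^{m} s_i = 10, \\
\text{where}\quad & n_0 = n, \quad n_i = s_i\, n_{i-1}, \quad d_i = n_{i-1}\, t_i(s_i).
\end{aligned}
```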

DP-based Approach
- The original optimization task can be cast into DP:
  - Optimum-value function: $R_i(s, t)$ is the optimum recognition rate at stage i, with a cumulative survival rate $s$ and a cumulative computation time $t$.
  - A recurrent formula for $R_i(s, t)$ can be derived by varying the survival rate of stage i, as follows.

Recurrent Formula for $R_i(s, t)$
- [Figure: cascade of stage 1 → … → stage i-1 → stage i → …]
- $d_i$: delay of stage i
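The recurrence below is one plausible reconstruction. It assumes that the overall recognition rate is the product of the stage recognition rates and that the delay of stage i is $d_i = n_{i-1}\,t_i(s_i)$. The cumulative survival rate after stage i is $s = s_1 s_2 \cdots s_i$, so stage i receives $n_{i-1} = n_0\, s / s_i$ songs. Choosing $s_i$ therefore fixes two things: the delay $d_i$, and the state $(s/s_i,\ t - d_i)$ that stage i-1 must have reached. This is a sketch under those assumptions, not necessarily the authors' exact formula.

```latex
R_i(s, t) = \max_{s_i}\ r_i(s_i)\, R_{i-1}\!\left(\frac{s}{s_i},\ t - d_i\right),
\qquad d_i = n_0\, \frac{s}{s_i}\, t_i(s_i).
```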

DP-based Approach
- Boundary conditions for $R_i(s, t)$:
- Optimum recognition rate:
- We can then backtrack to find the optimum $s_1, s_2, \dots, s_m$.
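The following minimal Python sketch implements a DP design of this kind. It rests on several assumptions of ours, not the slides':

- The overall recognition rate is the product of the stage recognition rates.
- Stage i costs $d_i = n_{i-1}\,t_i(s_i)$ milliseconds, rounded up to whole ms so that time can index the DP table.
- Each stage may only output one of a few candidate list sizes.

The state is the pair (number of surviving songs, cumulative time). This is equivalent to $(s, t)$, because the cumulative survival rate is $s = n_i / n_0$. The boundary condition used here is that, before any stage, all $n_0$ songs survive at time 0 with recognition rate 1. The optimum is the best final state with 10 songs and cumulative time within $T_{max}$. Backtracking through the stored predecessors recovers $s_1, \dots, s_m$. The function name `design_pf` and its arguments are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A stage is (r, t): r(s) from its RS curve, t(s) from its TS curve
# (average one-to-one comparison time in ms). Both are assumed inputs.
Stage = Tuple[Callable[[float], float], Callable[[float], float]]
State = Tuple[int, int]  # (number of surviving songs, cumulative time in ms)


def design_pf(n0: int, stages: Sequence[Stage], counts: Sequence[Sequence[int]],
              t_max_ms: int, list_size: int = 10) -> Tuple[float, List[float]]:
    """DP sketch: pick each stage's survival rate to maximize recognition rate.

    counts[i] lists the candidate output sizes allowed for stage i (an assumed
    discretization); the last stage always outputs list_size songs.
    Returns (optimum recognition rate, [s_1, ..., s_m]), or (0.0, []) if no
    design meets the time budget.
    """
    # Boundary condition: before stage 1, all n0 songs survive at time 0.
    layer: Dict[State, float] = {(n0, 0): 1.0}
    back: List[Dict[State, State]] = []
    for i, (r, t) in enumerate(stages):
        allowed = [list_size] if i == len(stages) - 1 else counts[i]
        nxt: Dict[State, float] = {}
        ptr: Dict[State, State] = {}
        for (n_in, t_used), rate in layer.items():
            for n_out in allowed:
                if not 0 < n_out <= n_in:
                    continue
                s = n_out / n_in                         # survival rate s_i
                t_new = t_used + math.ceil(n_in * t(s))  # add d_i = n_{i-1} t_i(s_i)
                if t_new > t_max_ms:
                    continue                             # violates the time budget
                cand = rate * r(s)                       # R_{i-1} times r_i(s_i)
                key = (n_out, t_new)
                if cand > nxt.get(key, -1.0):
                    nxt[key], ptr[key] = cand, (n_in, t_used)
        layer = nxt
        back.append(ptr)
    if not layer:
        return 0.0, []
    # Optimum recognition rate: best final state within the time budget.
    state = max(layer, key=layer.get)
    best = layer[state]
    # Backtrack through stored predecessors to recover n_0, n_1, ..., n_m.
    sizes = [state[0]]
    for ptr in reversed(back):
        state = ptr[state]
        sizes.append(state[0])
    sizes.reverse()
    return best, [sizes[i + 1] / sizes[i] for i in range(len(stages))]
```

Keying the table on the survivor count rather than on a real-valued $s$ keeps the states exact, so different paths to the same (count, time) pair merge cleanly.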

Five Stages for Our Study
- We chose 5 stages for the DP-based design method:
  - Range comparison
  - Modified edit distance
  - LS (linear scaling)
  - DTW with down-sampled inputs
  - DTW (dynamic time warping)

Corpora
- QBSH corpus
  - second recordings (8 kHz, 8 bits) of 48 children's songs, by 118 subjects
  - 500 recordings for the design set, the rest for the test set
- Song database
  - 13,320 songs
- Comparison mode
  - Anchored beginning (queries are matched from the beginning of each song)

RS Curves

TS Curves

Optimum Recognition Rate vs. Response Time

Survival Rates vs. Response Time

Conclusions & Future Work
- Conclusions
  - Advantages
    - A scalable meta-method
    - Feasible for optimizing QBSH systems
    - Possibly applicable to other multimedia retrieval systems
  - Disadvantages
    - Deriving the RS and TS curves is time-consuming
- Future work
  - More effective/efficient methods for each stage