Progressive Filtering and Its Application for Query-by-Singing/Humming, J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., 2015/10/22


Slide 1: Progressive Filtering and Its Application for Query-by-Singing/Humming
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan
http://www.cs.nthu.edu.tw/~jang
2015/10/22

Slide 2: Recent Publications
- Journals
  - Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008.
  - J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008.
- Conferences
  - Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008.
  - Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.
  - Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.

Slide 3: Outline
- Problem definition of QBSH
- Methods for QBSH
- Progressive Filtering
- Conclusions

Slide 4: Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: singing or humming from a microphone
  - Output: a ranked list retrieved from the song database
- Overview
  - First paper: around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX

Slide 5: Challenges in QBSH Systems
- Reliable pitch tracking for acoustic input
  - Input from mobile devices
  - Input in a noisy karaoke box
- Song database preparation
  - Audio music vs. MIDI files
- Efficient/effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs


Slide 7: Goal and Approach
- Goal: retrieve songs effectively within a given response time (about 5 seconds)
- Our strategy
  - Multi-stage progressive filtering
  - Data-driven design methodology based on dynamic programming (DP)

Slide 8: Approaches to QBSH
- Pitch Tracking
- Methods for QBSH

Slide 9: A Quick Demo of QBSH
- Demo page of MIR lab:
  - http://mirlab.org/mir_main/demo.htm
- Demo of QBSH
  - http://mirlab.org/Demo/MusicSearch/index.htm

Slide 10: Progressive Filtering
- Multi-stage representation
  - Each stage is a method for QBSH
[Figure: cascade of stages: stage 1, stage 2, …, stage i, …]
- s_i: survival rate for stage i
- d_i: delay for stage i
- n_{i-1}: number of input songs to stage i
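The cascade on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `progressive_filter`, the dictionary layout of the song database, and the convention that a scorer returns a distance (smaller means more similar) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical convention: a scorer returns a distance (smaller = more similar).
Scorer = Callable[[Sequence[float], Sequence[float]], float]


def progressive_filter(
    query: Sequence[float],
    songs: Dict[str, Sequence[float]],
    stages: List[Tuple[Scorer, float]],
    top_k: int = 10,
) -> List[str]:
    """Run the query through each stage, keeping only a fraction of candidates.

    Each stage is (scorer, survival_rate): the stage ranks its n_{i-1} input
    songs with its scorer and passes the best s_i fraction to the next stage.
    """
    survivors = list(songs)
    for scorer, survival_rate in stages:
        ranked = sorted(survivors, key=lambda song_id: scorer(query, songs[song_id]))
        keep = max(1, round(len(ranked) * survival_rate))
        survivors = ranked[:keep]
    return survivors[:top_k]
```

In practice a cheap, coarse scorer would go first with a small survival rate, so that the expensive scorer at the end only sees a short candidate list.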

Slide 11: Stage Characteristics for Effectiveness
- RS curve for stage i: recog. rate = r_i(s)
[Figure: RS curves. x-axis: survival rate s (%); y-axis: recognition rate (%). The curves run from (0, 0) to (100, 100) and are labeled "more effective method", "less effective method", and "random guess". Annotated point: at s = 10% the recognition rate is 65%, i.e. the top-10% recognition rate is 65%.]
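An RS curve of this kind can be estimated from a design set once the rank of each query's ground-truth song is known. The sketch below assumes 1-based ranks and a hypothetical helper name `rs_curve`; it shows the definition only and is not taken from the talk.

```python
from typing import List, Sequence


def rs_curve(
    true_ranks: Sequence[int],
    n_songs: int,
    survival_rates: Sequence[float],
) -> List[float]:
    """Return the recognition rate r(s) for each survival rate s.

    true_ranks[k] is the 1-based rank that the stage assigned to the
    ground-truth song of query k. A query counts as recognized at survival
    rate s if its true song is among the top round(s * n_songs) candidates.
    """
    curve = []
    for s in survival_rates:
        keep = round(s * n_songs)
        hits = sum(1 for rank in true_ranks if rank <= keep)
        curve.append(hits / len(true_ranks))
    return curve
```

With this definition the curve starts at (0, 0) and ends at (100%, 100%), matching the endpoints marked in the figure.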

Slide 12: Stage Characteristics for Efficiency
- TS curve for stage i: average time = t_i(s)
[Figure: TS curves. x-axis: survival rate s (%), from (0, 0) to (100, 0); y-axis: average time (ms). The curves are labeled "less efficient method" and "more efficient method". Annotated point: when s = 10%, the average one-to-one comparison time is 5 ms.]
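To see why per-comparison time matters, the following sketch adds up the response time of a cascade. It assumes a simple cost model in which stage i costs n_{i-1} comparisons at its TS-curve time plus its fixed delay d_i; the function names and this cost model are assumptions for illustration, not something stated on the slide.

```python
from typing import List, Tuple


def stage_time_ms(n_in: int, ms_per_comparison: float, delay_ms: float = 0.0) -> float:
    """Assumed cost of one stage: one comparison per input song plus a fixed delay."""
    return n_in * ms_per_comparison + delay_ms


def pipeline_time_ms(n_songs: int, stages: List[Tuple[float, float, float]]) -> float:
    """Total time of a cascade.

    Each stage is (survival_rate, ms_per_comparison, delay_ms), where
    ms_per_comparison is the TS-curve value at the chosen survival rate.
    """
    total = 0.0
    n_in = n_songs
    for survival_rate, ms_per_comparison, delay_ms in stages:
        total += stage_time_ms(n_in, ms_per_comparison, delay_ms)
        n_in = max(1, round(n_in * survival_rate))
    return total
```

As an illustrative calculation, a single stage at 5 ms per comparison over the 13,320-song database used later in the talk would take `pipeline_time_ms(13320, [(1.0, 5.0, 0.0)])`, about 66.6 seconds, far beyond a 5-second budget. Putting a hypothetical 0.01 ms stage with a 10% survival rate in front of it reduces the total to roughly 6.8 seconds.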

Slide 13: Formulation as an Optimization Problem
- Maximize: [objective not preserved in transcript]
- Subject to the constraints: [constraints not preserved in transcript]
- n (= n_0): size of the song database
- T_max: maximum allowable response time, say, 5 sec.
- 10: the size of the retrieved ranking list
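The objective and constraint formulas on this slide did not survive extraction. One formulation that is consistent with the symbols defined on the surrounding slides is given below. It is an assumption, not the authors' exact statement: it treats the overall recognition rate as the product of the per-stage rates r_i(s_i), and it charges stage i for n_{i-1} comparisons at time t_i(s_i) plus the delay d_i.

```latex
\max_{s_1,\dots,s_m} \; \prod_{i=1}^{m} r_i(s_i)
\quad \text{subject to} \quad
n \prod_{i=1}^{m} s_i \le 10,
\qquad
\sum_{i=1}^{m} \bigl( n_{i-1}\, t_i(s_i) + d_i \bigr) \le T_{\max},
\qquad
n_i = n_{i-1}\, s_i, \quad n_0 = n .
```

Under this reading, the first constraint says that at most 10 songs survive the last stage, and the second says that the whole cascade finishes within the response-time budget.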

Slide 14: DP-based Approach
- The original optimization task can be cast as dynamic programming (DP):
  - Optimum-value function: R_i(s, t) is the optimum recognition rate at stage i, with a cumulative survival rate s and a cumulative computation time t.
  - A recurrence formula for R_i(s, t) can be derived by varying the survival rate of stage i, as follows.

Slide 15: Recurrence Formula for R_i(s, t)
[Formula not preserved in transcript]
[Figure: cascade of stages: stage 1, …, stage i-1, stage i, …]
- d_i: delay of stage i

Slide 16: DP-based Approach
- Boundary conditions for R_i(s, t): [not preserved in transcript]
- Optimum recognition rate: [not preserved in transcript]
- We can then backtrack to find the optimum s_1, s_2, …, s_m.
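Because the recurrence and boundary conditions were lost in extraction, the sketch below is not the authors' formulation. It is a small forward dynamic program under stated assumptions: survival rates are restricted to a discrete grid, the overall recognition rate is the product of per-stage rates, stage i costs n_{i-1} comparisons at t_i(s_i) ms plus a delay d_i, and the state is (number of survivors, elapsed milliseconds). All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A stage is (r, t, d): r(s) is the recognition rate in [0, 1], t(s) is the
# time per comparison in ms, and d is a fixed delay in ms.
Stage = Tuple[Callable[[float], float], Callable[[float], float], float]


def optimize_survival_rates(
    stages: Sequence[Stage],
    n_songs: int,
    t_max_ms: int,
    list_size: int,
    s_grid: Sequence[float],
) -> Optional[Tuple[float, List[float]]]:
    """Return (best overall rate, chosen survival rates), or None if infeasible."""
    # State (survivors, elapsed_ms) -> (best rate so far, survival rates chosen).
    states: Dict[Tuple[int, int], Tuple[float, List[float]]] = {(n_songs, 0): (1.0, [])}
    for r, t, d in stages:
        next_states: Dict[Tuple[int, int], Tuple[float, List[float]]] = {}
        for (n_in, elapsed), (rate, picks) in states.items():
            for s in s_grid:
                cost = elapsed + round(n_in * t(s) + d)
                if cost > t_max_ms:
                    continue  # violates the response-time budget
                key = (max(1, round(n_in * s)), cost)
                candidate = (rate * r(s), picks + [s])
                # Keep only the best rate for each state (the DP merge step).
                if key not in next_states or candidate[0] > next_states[key][0]:
                    next_states[key] = candidate
        states = next_states
    feasible = [v for (n_out, _), v in states.items() if n_out <= list_size]
    return max(feasible, key=lambda v: v[0]) if feasible else None
```

Storing the chosen survival rates alongside each state plays the role of the backtracking step: the returned list is the sequence s_1, …, s_m that reaches the best feasible state.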

Slide 17: Five Stages for Our Study
- We chose 5 stages for the DP-based design method:
  - Range comparison
  - Modified edit distance
  - LS (linear scaling)
  - DTW (dynamic time warping) with down-sampled inputs
  - DTW
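The last two stages differ only in the resolution of their inputs. As a rough illustration, here is a textbook DTW distance together with a naive down-sampling helper. This is a generic sketch with assumed names (`dtw_distance`, `downsample`); the DTW variant used in the study may differ in local path constraints, end-point handling, and key transposition.

```python
from typing import List, Sequence


def downsample(pitch: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th pitch value (naive decimation, no smoothing)."""
    return list(pitch[::factor])


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference local cost and both ends anchored."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = minimum accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]
```

DTW takes time proportional to the product of the two sequence lengths, so down-sampling both inputs by a factor of k cuts the work by about k squared, which is what makes the down-sampled variant a plausible earlier, cheaper stage.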

Slide 18: Corpora
- QBSH corpus
  - 2797 8-second recordings (8 kHz, 8 bits) of 48 children's songs, by 118 subjects
  - 500 for the design set, the others for testing
- Song database
  - 13320 songs
- Comparison mode
  - Anchored beginning

Slide 19: RS Curves
[Figure only]

Slide 20: TS Curves
[Figure only]

Slide 21: Optimum RR wrt Response Time
[Figure only]

Slide 22: Survival Rates wrt Response Time
[Figure only]

Slide 23: Conclusions & Future Work
- Conclusions
  - Advantages:
    - A scalable meta-method
    - Feasible for optimizing QBSH systems
    - Applicable (?) to other multimedia retrieval systems
  - Disadvantages:
    - Derivation of RS and TS curves is time-consuming
- Future work
  - More effective/efficient method for each stage

