2015/10/22 Progressive Filtering and Its Application for Query-by-Singing/Humming J.-S. Roger Jang (張智星) Multimedia Information Retrieval Lab CS Dept.,

Presentation transcript:

-1- Progressive Filtering and Its Application for Query-by-Singing/Humming
J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan
2015/10/22

-2- Recent Publications
- Journals
  - Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing.
  - J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, Feb.
- Conferences
  - Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept.
  - Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of 125th AES Convention, San Francisco, USA, Oct.
  - Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of 125th AES Convention, San Francisco, USA, Oct.

-3- Outline
- Problem definition of QBSH
- Methods for QBSH
- Progressive filtering
- Conclusions

-4- Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: Singing or humming from a microphone
  - Output: A ranked list retrieved from the song database
- Overview
  - First paper: around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX

-5- Challenges in QBSH Systems
- Reliable pitch tracking for acoustic input
  - Input from mobile devices
  - Input in a noisy karaoke box
- Song database preparation
  - Audio music vs. MIDI files
- Efficient/effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs

-6-

-7- Goal and Approach
- Goal: To retrieve songs effectively within a given response time, say about 5 seconds
- Our strategy
  - Multi-stage progressive filtering
  - Data-driven design methodology based on dynamic programming (DP)

-8- Approaches to QBSH
- Pitch tracking
- Methods for QBSH

-9- A Quick Demo of QBSH
- Demo page of MIR lab:
  - http://mirlab.org/mir_main/demo.htm
- Demo of QBSH
  - http://mirlab.org/Demo/MusicSearch/index.htm

-10- Progressive Filtering
- Multi-stage representation
  - Each stage is a method for QBSH
- Diagram: stage 1 → stage 2 → … → stage i → …
  - $s_i$: survival rate for stage $i$
  - $d_i$: delay for stage $i$
  - $n_{i-1}$: number of input songs to stage $i$
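The cascade on this slide can be sketched in a few lines. This is only an illustration under assumptions: the function name `progressive_filter`, the `(distance, survival_rate)` stage tuples, and the toy integer "songs" are all hypothetical and not taken from the slides. Each stage scores the surviving candidates with its own distance function and passes on the best-scoring fraction $s_i$ to the next stage.

```python
def progressive_filter(query, candidates, stages):
    """Run candidates through a cascade of (distance, survival_rate) stages.

    Hypothetical sketch: `distance(query, candidate)` returns a number
    (smaller is better) and `survival_rate` is the fraction s_i of the
    stage's input that is kept for the next stage.
    """
    survivors = list(candidates)
    for distance, survival_rate in stages:
        # n_i = s_i * n_{i-1}, but always keep at least one candidate.
        keep = max(1, round(survival_rate * len(survivors)))
        # sorted() is stable, so ties keep their previous order.
        survivors = sorted(survivors, key=lambda c: distance(query, c))[:keep]
    return survivors


# Toy data: "songs" are integers, the query is the integer 42.
toy_stages = [
    (lambda q, c: abs(q // 10 - c // 10), 0.2),  # cheap, coarse comparison
    (lambda q, c: abs(q - c), 0.1),              # expensive, exact comparison
]
toy_result = progressive_filter(42, range(100), toy_stages)
```

In this toy run the coarse stage keeps 20 of 100 candidates and the exact stage keeps 2 of those 20, so the expensive comparison is evaluated on only a fifth of the database.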

-11- Stage Characteristics for Effectiveness
- RS curve for stage $i$: recognition rate = $r_i(s)$
- Figure: recognition rate (%) versus survival rate $s$ (%), running from (0, 0) to (100, 100). Curves shown: more effective method, less effective method, random guess. Annotated example: the top-10% recognition rate is 65%.
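One way to estimate an RS curve from a design set is sketched below. It assumes that, for each query, we already know the rank that the method assigned to the correct song (1 = best); the function name `rs_curve` and its arguments are hypothetical. The recognition rate at survival rate $s$ is then the fraction of queries whose correct song falls within the top $s \cdot n$ candidates.

```python
def rs_curve(true_ranks, n_songs, survival_rates):
    """Estimate r(s) for each s in survival_rates.

    true_ranks: 1-based rank of the correct song for each query.
    n_songs: number of candidates ranked by the method.
    """
    curve = []
    for s in survival_rates:
        keep = max(1, round(s * n_songs))  # size of the surviving set
        hits = sum(1 for rank in true_ranks if rank <= keep)
        curve.append(hits / len(true_ranks))
    return curve


# Four toy queries against a 100-song database.
toy_curve = rs_curve([1, 5, 50, 100], 100, [0.1, 0.5, 1.0])
```

By construction the curve reaches 100% at a survival rate of 100%, which matches the (100, 100) endpoint on the slide.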

-12- Stage Characteristics for Efficiency
- TS curve for stage $i$: average time = $t_i(s)$
- Figure: average time (ms) versus survival rate (%), with the horizontal axis running from (0, 0) to (100, 0). Curves shown: less efficient method, more efficient method. Annotated example: when $s$ = 10%, the average one-to-one comparison time is 5 ms.

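-13- Formulation as an Optimization Problem
- Maximize: [objective formula not preserved in the transcript]
- Subject to the constraints: [constraint formulas not preserved in the transcript]
  - $n$ (= $n_0$): size of the song database
  - $T_{\max}$: maximum allowable response time, say 5 sec
  - 10: the size of the retrieved ranking list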
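The objective and constraint formulas on this slide were images and did not survive in the transcript. One reading that is consistent with the symbols defined on the neighboring slides is sketched below. Two points are assumptions rather than quotations from the slide: that the overall recognition rate is the product of the per-stage rates $r_i(s_i)$, and that the time spent in stage $i$ is $n_{i-1}\, t_i(s_i) + d_i$.

```latex
\max_{s_1,\dots,s_m} \; \prod_{i=1}^{m} r_i(s_i)
\quad \text{subject to} \quad
\sum_{i=1}^{m} \bigl( n_{i-1}\, t_i(s_i) + d_i \bigr) \le T_{\max},
\qquad
n_i = n_{i-1}\, s_i, \;\; n_0 = n,
\qquad
n_m = n \prod_{i=1}^{m} s_i \le 10 .
```

Here $m$ is the number of stages, and the last constraint says that the cascade must narrow the database down to a ranking list of at most 10 songs.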

-14- DP-based Approach
- The original optimization task can be cast as dynamic programming (DP):
  - Optimum-value function: $R_i(s, t)$ is the optimum recognition rate at stage $i$, with a cumulative survival rate $s$ and a cumulative computation time $t$.
  - A recurrence for $R_i(s, t)$ can be derived by varying the survival rate of stage $i$; see the next slide.

-15- Recurrence Formula for $R_i(s, t)$
- Diagram: stage 1 → … → stage i-1 → stage i → …
  - $d_i$: delay of stage $i$
- [recurrence formula not preserved in the transcript]

-16- DP-based Approach
- Boundary conditions for $R_i(s, t)$: [formula not preserved in the transcript]
- Optimum recognition rate: [formula not preserved in the transcript]
- We can then backtrack to find the optimum $s_1, s_2, \dots, s_m$.
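Because the recurrence and boundary conditions are missing from the transcript, the following is only a minimal sketch of how such a design could be computed by dynamic programming, not the formulation from the slides. It rests on several labeled assumptions: the overall recognition rate is the product of per-stage rates; stage $i$ costs $n_{i-1}\, t_i(s_i) + d_i$ milliseconds; survivor counts are restricted to a small grid; and a stage may be skipped at no cost (survival rate 1). The names `design_stages`, `stage_a`, `stage_b`, and the toy power-law curves are hypothetical.

```python
import math
from functools import lru_cache


def design_stages(stages, counts, budget_ms):
    """Pick per-stage survival rates that maximize the product of rates.

    stages: list of (r, t, d); r(s) is the RS curve, t(s) the per-comparison
            time in ms from the TS curve, d the fixed delay in ms.
    counts: descending grid of survivor counts; counts[0] is the database
            size and counts[-1] the size of the final ranking list.
    Returns (best_rate, survival_rates), or (-1.0, ()) if nothing fits.
    """
    m, last = len(stages), len(counts) - 1

    @lru_cache(maxsize=None)
    def best(i, a, budget):
        # State: next stage i, current survivor count counts[a], time left.
        if i == m:
            return (1.0, ()) if a == last else (-1.0, ())
        r, t, d = stages[i]
        top = (-1.0, ())
        for b in range(a, last + 1):
            s = counts[b] / counts[a]
            if b == a:
                rate, cost = 1.0, 0  # assumption: skipping a stage is free
            else:
                rate, cost = r(s), math.ceil(counts[a] * t(s) + d)
            if cost > budget:
                continue
            sub_rate, sub_plan = best(i + 1, b, budget - cost)
            if sub_rate >= 0 and rate * sub_rate > top[0]:
                top = (rate * sub_rate, (s,) + sub_plan)
        return top

    return best(0, 0, budget_ms)


# Toy curves (assumed): stage_a is fast but weak, stage_b slow but strong.
stage_a = (lambda s: s ** 0.3, lambda s: 0.01, 0.0)
stage_b = (lambda s: s ** 0.05, lambda s: 1.0, 0.0)
grid = (10000, 5000, 2000, 1000, 500, 200, 100, 50, 20, 10)
tight_rate, tight_plan = design_stages([stage_a, stage_b], grid, 5000)
loose_rate, loose_plan = design_stages([stage_a, stage_b], grid, 10 ** 6)
```

The returned tuple of survival rates plays the role of the backtracked $s_1, \dots, s_m$. With the 5-second budget the toy design uses the fast stage to cut the database before the slow stage runs; with a very large budget it skips the fast stage and lets the stronger method see every song.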

-17- Five Stages for Our Study
- We chose 5 stages for the DP-based design method:
  - Range comparison
  - Modified edit distance
  - LS (linear scaling)
  - DTW (dynamic time warping) with down-sampled inputs
  - DTW
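The slides do not show how the stages are implemented. As a rough illustration of the last two stages only, here is a textbook dynamic time warping distance between two pitch sequences, plus a naive down-sampling helper that makes a cheaper variant. Everything here is an assumption for illustration: the names `dtw_distance` and `downsample`, the absolute-difference local cost, the anchoring of both ends of the alignment, and the keep-every-other-sample down-sampling.

```python
def dtw_distance(a, b):
    """Textbook DTW between two non-empty numeric sequences.

    Local cost is |a[i] - b[j]|; both the start and the end of the
    alignment are anchored (a simplification chosen for this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def downsample(seq, factor=2):
    """Keep every `factor`-th sample to make DTW cheaper (and coarser)."""
    return list(seq)[::factor]


# Toy pitch sequences in semitones: the same melody sung at two tempos.
query_pitch = [60, 60, 62, 62, 64, 64]
song_pitch = [60, 62, 64]
full_distance = dtw_distance(query_pitch, song_pitch)
coarse_distance = dtw_distance(downsample(query_pitch), song_pitch)
```

DTW runs in time proportional to the product of the two sequence lengths, so halving both inputs cuts the work to roughly a quarter, which is why a down-sampled pass is a natural earlier stage in a cascade.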

-18- Corpora
- QBSH corpus
  - Recordings (8 kHz, 8 bits; count and duration not preserved in the transcript) of 48 children's songs, by 118 subjects
  - 500 for the design set, the others for the test set
- Song database
  - 13320 songs
- Comparison mode
  - Anchored beginning

-19- RS Curves

-20- TS Curves

-21- Optimum Recognition Rate (RR) with Respect to Response Time

-22- Survival Rates with Respect to Response Time

-23- Conclusions & Future Work
- Conclusions
  - Advantages:
    - A scalable meta-method
    - Feasible for optimizing QBSH systems
    - Applicable (?) to other multimedia retrieval systems
  - Disadvantages:
    - Derivation of RS and TS curves is time-consuming
- Future work
  - More effective/efficient methods for each stage