-1- MIR: Status and Trends (The Present and Future of Music Information Retrieval)
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan
2015/6/28
-2- Outline
- Intro. to music information retrieval (MIR)
- Our work on MIR (with demos)
  - Query by singing/humming (QBSH)
  - Singing voice separation
- Conclusions
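-3- Types of MIR Systems
- Text-based MIR
  - Text input
    - Song title, singer, lyrics, lyricist, composer
    - Metadata: genre, mood, overplayed pop hits (口水歌)
- Content-based MIR
  - Symbolic input
    - Music score info: notes, beats, chords, etc.
  - Acoustic input
    - By example: the original recording as input
    - By humans: singing/humming, whistling, tapping, drumming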
-4- Span of MIR Research
- Content analysis
  - Audio music
    - Low-level feature extraction
    - High-level feature representation
  - Symbolic music
    - High-level feature representation
- Retrieval methods
  - Text-based information retrieval
  - Data clustering
  - Pattern recognition
  - Distance measures
-5- MIR Methods for Audio Music
- Audio features
  - Low-level features
    - MFCC, spectral flux, rolloff freq, …
  - High-level features
    - Pitch, onset, beat, tempo, chord, key, …
    - Vocal extraction
  - Others
    - Collaborative filtering
- Retrieval methods
  - Clustering
    - K-means, VQ, hierarchical clustering
  - Classification
    - SVM, GMM, LSA, HMM, ANN, …
  - Distance measure
    - DTW, KL, cosine similarity, edit distance
  - Others: Learning to rank
-6- MIR Major Events
- ISMIR/MIREX
  - Int. Symposium on Music Information Retrieval, since 2000
  - Music Information Retrieval Evaluation eXchange, since 2005
- ICMC
  - Int. Computer Music Conference, since 1974
- ICASSP
  - Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976
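-7- ISMIR Growth
[Table columns: YEAR, LOCATION, ITEMS, PAGES, UNIQUE AUTHORS; the numeric columns are not recoverable]
2000  Plymouth, MA
2001  Bloomington, IN
2002  Paris, FR
2003  Baltimore, MD
2004  Barcelona, ES
2005  London, UK
2006  Victoria, BC
2007  Vienna, AT
2008  Philadelphia, PA
2009  Kobe, JP
TOTALS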
-8- ISMIR Locations
- 2000, Plymouth
- 2001, Bloomington
- 2002, Paris
- 2003, Baltimore
- 2004, Barcelona
- 2005, London
- 2006, Victoria
- 2007, Vienna
- 2008, Philadelphia
- 2009, Kobe
-9- State-of-the-Art MIR: Tasks at MIREX
- Audio music
  - High-level feature identification
    - Audio onset detection
    - Audio beat tracking
    - Audio tempo extraction
    - Audio key detection
    - Audio chord estimation
    - Multiple fundamental frequency estimation & tracking
    - Audio structural segmentation
  - Classification
    - Artist
    - Genre
    - Mood
  - Retrieval
    - Audio cover song identification
    - Audio tag classification
    - Audio music similarity and retrieval
  - Alignment
    - Real-time audio-to-score alignment (a.k.a. score following)
- Symbolic music
  - Symbolic melodic similarity
  - Symbolic music similarity and retrieval
- Hybrid
  - Query by singing/humming
  - Query by tapping
-10- MIREX
[Table columns: Number of Task (and Subtask) “Sets”, Number of Individuals, Number of Countries, Number of Runs; values are not recoverable]
-11- Our Work on MIR
- QBSH: Query by Singing/Humming
- Singing voice separation
- Audio melody extraction
-12- Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: Singing or humming from a microphone
  - Output: A ranked list retrieved from the song database
- Overview
  - First paper: Around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX
-13- Challenges in QBSH Systems
- Reliable pitch tracking for acoustic input
  - Input from mobile devices or noisy karaoke bars
- Song database preparation
  - MIDIs, singing clips, or audio music
- Efficient/effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs
-15- QBSH: Goal and Approach
- Goal: To retrieve songs effectively within a given response time, say, about 5 seconds
- Our strategy
  - Multi-stage progressive filtering
  - Indexing for different comparison methods
  - Repeating pattern identification
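The idea behind multi-stage progressive filtering can be sketched as follows: a cheap comparison prunes most of the database, and a costlier comparison re-ranks the survivors. This is only an illustrative sketch, not the system described in the talk. The function names, the mean-removed absolute-difference distance, the downsampling steps, and the toy database are all assumptions.

```python
import numpy as np

def pitch_distance(query, target, step):
    """Mean absolute difference of mean-removed pitch vectors.

    `step` downsamples both vectors: a large step is cheap and coarse,
    step=1 is the full-resolution comparison. (Hypothetical distance.)
    """
    q = np.asarray(query, dtype=float)[::step]
    t = np.asarray(target, dtype=float)[:len(query)][::step]
    n = min(len(q), len(t))
    q, t = q[:n], t[:n]
    # Removing the mean makes the comparison insensitive to key transposition.
    return float(np.mean(np.abs((q - q.mean()) - (t - t.mean()))))

def progressive_filter(query, database, stages):
    """Run each (step, keep) stage in turn, keeping the `keep` best candidates."""
    candidates = list(database)
    for step, keep in stages:
        candidates.sort(key=lambda name: pitch_distance(query, database[name], step))
        candidates = candidates[:keep]
    return candidates

# Toy database: pitch vectors in semitones, 16 points per note (made-up data).
database = {
    "song_a": np.repeat([60, 62, 64, 65, 67, 69, 71, 72], 16),
    "song_b": np.repeat([67, 64, 65, 62, 60, 62, 64, 60], 16),
    "song_c": np.repeat([60, 60, 67, 67, 69, 69, 67, 65], 16),
    "song_d": np.repeat([72, 71, 69, 67, 65, 64, 62, 60], 16),
}
# The query is the start of song_c, sung 3 semitones higher.
query = database["song_c"][:64] + 3

# Stage 1: coarse comparison keeps 2 songs; stage 2: full resolution keeps 1.
result = progressive_filter(query, database, stages=[(8, 2), (1, 1)])
print(result)
```

Each stage trades accuracy for speed: early stages must be fast enough to scan the whole database, while later stages only see a short candidate list.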
-16- Flowchart of QBSH
- Two steps
  - Pitch tracking
  - Comparison methods
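-17- Frame Blocking for Pitch Tracking
- Frame size: 256 points/frame
- Overlap: 84 points
- Pitch rate: 11025/(256-84) ≈ 64 pitch points/sec
[Figure: waveform, zoomed in to show a frame and its overlap with the next frame]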
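The frame-blocking arithmetic above can be checked with a short sketch. It assumes that 11025 is the sample rate in Hz, which the formula on the slide implies; the function name and the choice to drop a trailing partial frame are illustrative assumptions.

```python
import numpy as np

def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a 1-D signal into overlapping frames, one frame per row.

    The hop between frame starts is frame_size - overlap.
    A trailing partial frame is dropped (an assumption of this sketch).
    """
    signal = np.asarray(signal, dtype=float)
    hop = frame_size - overlap
    if len(signal) < frame_size:
        return np.empty((0, frame_size))
    n_frames = 1 + (len(signal) - frame_size) // hop
    return np.stack([signal[i * hop: i * hop + frame_size] for i in range(n_frames)])

fs = 11025                      # assumed sample rate in Hz
signal = np.arange(fs)          # one second of dummy samples
frames = frame_blocking(signal)
pitch_rate = fs / (256 - 84)    # pitch points per second, about 64.1
print(frames.shape, round(pitch_rate, 1))
```

With one pitch value per frame, a hop of 172 samples yields roughly 64 pitch points for each second of audio.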
-18- ACF: Auto-correlation Function
- Frame: s(n)
- Shifted frame: s(n-τ), with τ = 30 in this example
- acf(30) = inner product of the overlapping parts = dot(s(31:256), s(1:226))
- The lag τ at which acf(τ) peaks gives the pitch period
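A minimal sketch of ACF-based pitch estimation for a single frame follows. It is not the SAP toolbox implementation. The search range (roughly E2 to C6), the function names, and the synthetic test tone are assumptions made for illustration.

```python
import numpy as np

def acf(frame):
    """acf(tau) = inner product of the frame with itself shifted by tau samples."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[tau:], frame[:n - tau]) for tau in range(n)])

def acf_pitch(frame, fs=11025, fmin=82.0, fmax=1047.0):
    """Return (pitch in Hz, pitch period in samples) for one frame.

    Only lags corresponding to fmin..fmax are searched; these bounds
    (about E2 to C6) are assumptions of this sketch.
    """
    values = acf(frame)
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(values) - 1)
    best_lag = min_lag + int(np.argmax(values[min_lag:max_lag + 1]))
    return fs / best_lag, best_lag

# Synthetic frame: a sine wave with a period of exactly 50 samples.
fs = 11025
n = np.arange(256)
frame = np.sin(2 * np.pi * n / 50)
pitch_hz, period = acf_pitch(frame, fs)
print(period, pitch_hz)
```

Restricting the lag range keeps the trivial maximum at τ = 0 out of the search and reduces octave errors from peaks at multiples of the true period.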
-19- Frequency to Semitone Conversion
- Semitone: a musical pitch scale referenced to A440
- Reasonable pitch range: E2 (82 Hz) - C6
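The conversion formula did not survive on this slide. A common convention, assumed here, is the MIDI note scale: semitone = 69 + 12 * log2(f / 440), so that A440 maps to 69. The function names below are illustrative.

```python
import math

def freq_to_semitone(freq_hz):
    """Convert frequency in Hz to a semitone value on the MIDI scale (A440 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse conversion: semitone value back to frequency in Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)

a4 = freq_to_semitone(440.0)      # reference pitch
e2 = freq_to_semitone(82.41)      # E2, the lower end of the range above
c6 = freq_to_semitone(1046.50)    # C6, the upper end of the range above
print(a4, round(e2), round(c6))
```

Working in semitones makes pitch differences linear, so a key transposition becomes a constant offset that later comparison steps can remove.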
-20- Example of Pitch Tracking
-21- Typical Result of Pitch Tracking
[Figure: pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower)]
-22- Comparison of Pitch Vectors
[Figure legend: yellow line = target pitch vector]
-23- Linear Scaling (LS)
- Scale the query linearly to match the candidate
- A typical example of linear scaling
-24- Linear Scaling (LS)
- Characteristics
  - Handles key transposition in one shot
  - Efficient and effective
  - Compatible with some indexing methods
  - Cannot deal with large tempo variations
  - #1 method for task 2 in QBSH/MIREX 2006
- Typical mapping path
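Linear scaling can be sketched as follows: the query is stretched or compressed by several candidate factors, each version is compared with the beginning of the candidate song, and the best match wins. This is an illustrative sketch rather than the MIREX submission. The factor range (0.5 to 2.0), the mean-shift handling of key transposition, and the RMS distance are assumptions.

```python
import numpy as np

def linear_scaling_distance(query, target, factors=None):
    """Return (best distance, best scaling factor) of query against target."""
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    if factors is None:
        factors = np.linspace(0.5, 2.0, 16)   # assumed search range
    best_dist, best_factor = np.inf, None
    for f in factors:
        length = int(round(len(query) * f))
        if length < 2 or length > len(target):
            continue
        # Stretch or compress the query to `length` points by interpolation.
        positions = np.linspace(0, len(query) - 1, length)
        scaled = np.interp(positions, np.arange(len(query)), query)
        segment = target[:length]
        # Key transposition: subtract each vector's mean before comparing.
        diff = (scaled - scaled.mean()) - (segment - segment.mean())
        dist = float(np.sqrt(np.mean(diff ** 2)))
        if dist < best_dist:
            best_dist, best_factor = dist, float(f)
    return best_dist, best_factor

# Candidate song: 16 notes, 8 pitch points per note (made-up melody).
melody = np.array([60, 62, 64, 65, 67, 65, 64, 62, 60, 62, 64, 62, 60, 59, 60, 60])
target = np.repeat(melody, 8)
# Query: the first 8 notes sung at half speed and 5 semitones higher.
query = np.repeat(melody[:8], 16) + 5

best_dist, best_factor = linear_scaling_distance(query, target)
print(best_factor, best_dist)
```

Because the whole query is scaled by one factor, the method is fast but cannot follow tempo changes within the query, which matches the limitation listed above.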
-25- DTW Path of “Match Beginning”
-26- DTW Path of “Match Anywhere”
-27- DTW Path of “Match Anywhere”
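The two DTW variants named on these slides can be sketched as follows. The sketch assumes that "match beginning" anchors the path at the start of the candidate song and that "match anywhere" lets the path start at any position; in both cases the end point is left free. The local path constraints and the absolute-difference cost are illustrative choices, and key transposition is not handled here.

```python
import numpy as np

def dtw_distance(query, target, anywhere=False):
    """DTW distance between two pitch vectors, with a free end point.

    anywhere=False: the path starts at target[0] ("match beginning").
    anywhere=True:  the path may start at any target position ("match anywhere").
    """
    q = np.asarray(query, dtype=float)
    t = np.asarray(target, dtype=float)
    n, m = len(q), len(t)
    cost = np.full((n, m), np.inf)
    first_row = np.abs(q[0] - t)
    # Match anywhere: every start position costs only its own local distance.
    # Match beginning: reaching position j means walking there from position 0.
    cost[0, :] = first_row if anywhere else np.cumsum(first_row)
    for i in range(1, n):
        for j in range(m):
            prev = cost[i - 1, j]
            if j > 0:
                prev = min(prev, cost[i, j - 1], cost[i - 1, j - 1])
            cost[i, j] = abs(q[i] - t[j]) + prev
    # Free end point: the query may finish anywhere in the target.
    return float(cost[-1].min())

target = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]
query = [64, 65, 65, 67]          # an excerpt from the middle of the target

d_anywhere = dtw_distance(query, target, anywhere=True)
d_beginning = dtw_distance(query, target, anywhere=False)
print(d_anywhere, d_beginning)
```

A query taken from the middle of a song matches perfectly only when the start point is free, which is why "match anywhere" matters for users who do not sing from the first note.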
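-28- QBSH at MIREX 2006
- Format: the organizers test the recognition performance of each participating team's submitted code. Teams came from around the world, including Australia, Germany, France, Finland, Taiwan, Uruguay, the Netherlands, and China.
- Corpus:
  - Sung/hummed test data: 2797 wav files (8 seconds long, 8 kHz/8-bit), recorded by 118 people, covering 48 children's songs; freely downloadable.
  - Song database: 2048 monophonic MIDI files; apart from the 48 children's songs above, the remaining songs were provided by the organizers and are not public.
- Evaluation items:
  - Using the 2797 wav files as queries to retrieve from the 2048 MIDI files: evaluated by mean reciprocal rank; we placed third among 13 teams worldwide.
  - Using the 2797 wav files as queries to retrieve the other 2797 wav files: evaluated by mean precision; we placed first among 10 teams worldwide.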
-29- QBSH at MIREX 2006
- Corpus:
  - Queries: 2797 recordings, 8 sec each
  - Song database: 2048 MIDI files
- Evaluations
  - Task 1: To retrieve the correct song, ranked by mean reciprocal rank
  - Task 2: To retrieve similar queries, ranked by mean precision
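Mean reciprocal rank, the Task 1 measure, can be computed as in the sketch below. The song names and ranked lists are made up for illustration, and scoring a missing song as 0 is an assumption rather than a documented MIREX rule.

```python
def reciprocal_rank(ranked_songs, correct_song):
    """1/rank of the correct song in a ranked list (rank starts at 1).

    Returns 0.0 when the correct song is absent (an assumption of this sketch).
    """
    for rank, song in enumerate(ranked_songs, start=1):
        if song == correct_song:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked_songs, correct_song) pairs."""
    return sum(reciprocal_rank(r, c) for r, c in results) / len(results)

# Three hypothetical queries whose correct songs are ranked 1st, 2nd, and 4th.
results = [
    (["song_a", "song_b", "song_c"], "song_a"),
    (["song_b", "song_a", "song_c"], "song_a"),
    (["song_d", "song_b", "song_c", "song_a"], "song_a"),
]
mrr = mean_reciprocal_rank(results)
print(round(mrr, 4))
```

The measure rewards putting the correct song near the top: a first-place hit scores 1, a second-place hit 0.5, and so on, so the example averages (1 + 1/2 + 1/4) / 3.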
-30- Demos of QBSH
- Real-time pitch tracking demo
  - SAP toolbox
    - goPtbyAcf.mdl
- Demo of QBSH
  - http://mirlab.org/new/mir_products.asp#miracle
- Most successful QBSH application
  - http://
-31- Singing Voice Separation
- Characteristics
  - Easier on karaoke stereo songs
  - Harder for monaural polyphonic songs
  - Important step for a number of MIR applications
- Demo clips
  - http://sites.google.com/site/unvoicedsoundseparation/
-32- On-going Research at AIST, Japan
- Systems for listening to singing voices
  - LyricSynchronizer: Automatic sync. of lyrics with polyphonic music recordings
  - Singer ID: Singer identification
  - MiruSinger: Singing skill visualization/training
  - Hyperlinking Lyrics: Creating hyperlinks between phrases in song lyrics
  - Breath Detection: Automatic detection of breath sounds in unaccompanied singing voice
-33- On-going Research at AIST, Japan (II)
- Systems for music information retrieval based on singing voices
  - VocalFinder: Music information retrieval based on singing voice timbre
  - Voice Drummer: Music notation of drums using vocal percussion input
- Systems for singing synthesis
  - SingBySpeaking: Speech-to-singing synthesis
  - VocaListener: Singing-to-singing synthesis
-34- The Grand Challenges of MIR
- Polyphonic audio music transcription
  - Analogous to the problem of image understanding over semitranslucent overlaid images
  - As hard as watching ripples on the water to tell whether a turtle or a frog has swum by
-35- Conclusions
- MIR research is on the rise!
  - MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008)
    - High-level feature identification
    - Applications to genre/mood/tag classification/retrieval
- Preexisting approaches shed light on MIR.
  - Speech recognition/synthesis
  - Text information retrieval
  - Music theory
-36- References
- J. S. Downie, D. Byrd, and T. Crawford, “Ten Years of ISMIR: Reflections on Challenges and Opportunities”, Keynote talk, Kobe, ISMIR.
- M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-Based Music Information Retrieval: Current Directions and Future Challenges”, Proceedings of the IEEE, Vol. 96, No. 4, April.
- J.-S. R. Jang and H.-R. Lee, “A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, Feb.
- Z.-S. Chen and J.-S. R. Jang, “On the Use of Anti-word Models for Audio Music Annotation and Retrieval”, IEEE Transactions on Audio, Speech, and Language Processing.
- C.-L. Hsu and J.-S. R. Jang, “On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset”, IEEE Transactions on Audio, Speech, and Language Processing.
- M. Goto, T. Saitou, T. Nakano, and H. Fujihara, “Singing Information Processing Based on Singing Voice Modeling”, ICASSP 2010.