MIR: Status and Trends (Music Information Retrieval: Present and Future)
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan
2015/6/28

Presentation transcript:

-2- Outline
- Introduction to music information retrieval (MIR)
- Our work on MIR (with demos)
  - Query by singing/humming (QBSH)
  - Singing voice separation
- Conclusions

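-3- Types of MIR Systems
- Text-based MIR
  - Text input
    - Song title, singer, lyrics, lyricist, composer
    - Metadata: genre, mood, catchy popular hits (口水歌)
- Content-based MIR
  - Symbolic input
    - Music score info: notes, beats, chords, etc.
  - Acoustic input
    - By example: the original recording as input
    - By humans: singing/humming, whistling, tapping, drumming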

-4- Span of MIR Research
- Content analysis
  - Audio music
    - Low-level feature extraction
    - High-level feature representation
  - Symbolic music
    - High-level feature representation
- Retrieval methods
  - Text-based information retrieval
  - Data clustering
  - Pattern recognition
  - Distance measures

-5- MIR Methods for Audio Music
- Audio features
  - Low-level features
    - MFCC, spectral flux, rolloff frequency, ...
  - High-level features
    - Pitch, onset, beat, tempo, chord, key, ...
    - Vocal extraction
  - Others
    - Collaborative filtering
- Retrieval methods
  - Clustering
    - K-means, VQ, hierarchical clustering
  - Classification
    - SVM, GMM, LSA, HMM, ANN, ...
  - Distance measures
    - DTW, KL, cosine similarity, edit distance
  - Others: learning to rank

-6- MIR Major Events
- ISMIR/MIREX
  - Int. Symposium on Music Information Retrieval, since 2000
  - Music Information Retrieval Evaluation eXchange, since 2005
- ICMC
  - Int. Computer Music Conference, since 1974
- ICASSP
  - Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976

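-7- ISMIR Growth
YEAR  LOCATION
2000  Plymouth, MA
2001  Bloomington, IN
2002  Paris, FR
2003  Baltimore, MD
2004  Barcelona, ES
2005  London, UK
2006  Victoria, BC
2007  Vienna, AT
2008  Philadelphia, PA
2009  Kobe, JP
[ITEMS, PAGES, and UNIQUE AUTHORS columns and the TOTALS row: values missing from the transcript]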

-8- ISMIR Locations
- 2000, Plymouth
- 2001, Bloomington
- 2002, Paris
- 2003, Baltimore
- 2004, Barcelona
- 2005, London
- 2006, Victoria
- 2007, Vienna
- 2008, Philadelphia
- 2009, Kobe

-9- State-of-the-Art MIR: Tasks at MIREX
- Audio music
  - High-level feature identification
    - Audio onset detection
    - Audio beat tracking
    - Audio tempo extraction
    - Audio key detection
    - Audio chord estimation
    - Multiple fundamental frequency estimation and tracking
    - Audio structural segmentation
  - Classification
    - Artist
    - Genre
    - Mood
  - Retrieval
    - Audio cover song identification
    - Audio tag classification
    - Audio music similarity and retrieval
  - Alignment
    - Real-time audio-to-score alignment (a.k.a. score following)
- Symbolic music
  - Symbolic melodic similarity
  - Symbolic music similarity and retrieval
- Hybrid
  - Query by singing/humming
  - Query by tapping

-10- MIREX
[Table: number of task (and subtask) "sets", number of individuals, number of countries, and number of runs; values missing from the transcript]

-11- Our Work on MIR
- QBSH: query by singing/humming
- Singing voice separation (vocal extraction)
- Audio melody extraction (main melody extraction)

-12- Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: singing or humming from a microphone
  - Output: a ranked list retrieved from the song database
- Overview
  - First paper: around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX

-13- Challenges in QBSH Systems
- Reliable pitch tracking for acoustic input
  - Input from mobile devices or noisy karaoke bars
- Song database preparation
  - MIDIs, singing clips, or audio music
- Efficient/effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs

-14-

-15- QBSH: Goal and Approach
- Goal: to retrieve songs effectively within a given response time, say 5 seconds or so
- Our strategy
  - Multi-stage progressive filtering
  - Indexing for different comparison methods
  - Repeating pattern identification

-16- Flowchart of QBSH
- Two steps
  - Pitch tracking
  - Comparison methods

-17- Frame Blocking for Pitch Tracking
- 256 points/frame
- 84 points overlap
- 11025/(256-84) ≈ 64 pitch values/sec (11025 Hz sampling rate)
[Figure labels: Zoom in, Overlap, Frame]
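The frame-blocking numbers on this slide can be checked with a minimal sketch. The function name `frame_blocking` and the use of NumPy are assumptions made for illustration; only the frame size (256), the overlap (84), and the 11025 Hz rate come from the slide.

```python
import numpy as np

def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    hop = frame_size - overlap  # 172 samples between consecutive frame starts
    starts = range(0, len(signal) - frame_size + 1, hop)
    return np.array([signal[s:s + frame_size] for s in starts])

fs = 11025                       # sampling rate in Hz, as on the slide
one_second = np.zeros(fs)        # placeholder signal, one second long
frames = frame_blocking(one_second)
frame_rate = fs / (256 - 84)     # about 64 frames (pitch values) per second
print(frames.shape, round(frame_rate, 1))
```

Each row then yields one pitch value, so the pitch vector has roughly 64 points per second of audio.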

-18- ACF: Auto-correlation Function
- Frame: s(n)
- Shifted frame: s(n-τ), shown for τ = 30
- acf(30) = inner product of the overlapping part = dot(s(31:256), s(1:226))
- acf(τ): the lag τ at which it peaks gives the pitch period
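A minimal sketch of ACF-based pitch estimation for a single frame follows. The function names, the search range (82 Hz to 1047 Hz), and the plain argmax peak picking are assumptions for illustration; the slides do not specify how the peak is selected.

```python
import numpy as np

def acf(frame):
    """acf(tau) = inner product of the frame with itself shifted by tau samples."""
    n = len(frame)
    return np.array([np.dot(frame[:n - tau], frame[tau:]) for tau in range(n)])

def acf_pitch(frame, fs, fmin=82.0, fmax=1047.0):
    """Estimate pitch in Hz from the lag of the largest ACF value.

    The lag search is restricted to periods between fs/fmax and fs/fmin
    (an assumed range) so that the trivial peak at tau = 0 is skipped.
    """
    r = acf(frame)
    tau_min = max(1, int(fs / fmax))
    tau_max = min(int(fs / fmin), len(frame) - 1)
    tau = tau_min + int(np.argmax(r[tau_min:tau_max + 1]))
    return fs / tau

fs = 11025
n = np.arange(256)
frame = np.sin(2 * np.pi * 220.5 * n / fs)  # test tone with a 50-sample period
print(acf_pitch(frame, fs))
```

Because the unnormalized ACF shrinks as the overlap gets shorter, practical pitch trackers often normalize it or add post-processing; this sketch omits both.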

-19- Frequency to Semitone Conversion
- Semitone: a musical scale based on A440
- Reasonable pitch range:
  - E2 - C6
  - Lower bound: 82 Hz (E2)
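The conversion formula itself did not survive in this transcript. The sketch below assumes the standard MIDI convention, semitone = 69 + 12·log2(f/440), with A440 mapped to 69 and middle C (C4) to 60; under that assumption E2 is semitone 40 and C6 is semitone 84. The function names are illustrative.

```python
import math

def freq_to_semitone(freq_hz):
    """Convert frequency in Hz to a (fractional) MIDI-style semitone number.

    Assumes the standard MIDI mapping where A440 is semitone 69.
    """
    return 69 + 12 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse mapping: MIDI-style semitone number to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)

# Endpoints of the pitch range named on the slide (E2 and C6),
# using the assumed note numbers 40 and 84.
print(round(semitone_to_freq(40), 1), round(semitone_to_freq(84), 1))
```

Working in semitones makes pitch differences linear in musical interval, which is what the later comparison methods rely on when they shift a query to handle key transposition.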

-20- Example of Pitch Tracking

-21- Typical Result of Pitch Tracking Pitch tracking via autocorrelation for 茉莉花 (jasmine)

-22- Comparison of Pitch Vectors Yellow line : Target pitch vector

-23- Linear Scaling (LS)
- Scale the query linearly to match the candidate
- A typical example of linear scaling

-24- Linear Scaling (LS)
- Characteristics
  - One-shot for dealing with key transposition
  - Efficient and effective
  - Some indexing methods
  - Cannot deal with large tempo variations
  - #1 method for task 2 in QBSH/MIREX 2006
- Typical mapping path
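Linear scaling can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the scaling-factor grid (0.5 to 2.0), the anchoring at the candidate's beginning, the mean subtraction used for key transposition, and the RMS distance are all choices made here.

```python
import numpy as np

def linear_scaling_distance(query, candidate, factors=np.linspace(0.5, 2.0, 16)):
    """Smallest RMS distance between the candidate's beginning and the query
    stretched or compressed by each factor, after removing each side's mean."""
    query = np.asarray(query, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    best = np.inf
    for factor in factors:
        n = int(round(len(query) * factor))
        if n < 2 or n > len(candidate):
            continue  # scaled query would be degenerate or longer than the candidate
        # Resample the query to n points by linear interpolation.
        positions = np.linspace(0, len(query) - 1, n)
        scaled = np.interp(positions, np.arange(len(query)), query)
        segment = candidate[:n]
        # Subtracting the means handles key transposition in one shot.
        diff = (scaled - scaled.mean()) - (segment - segment.mean())
        best = min(best, float(np.sqrt(np.mean(diff ** 2))))
    return best

melody = np.repeat([60, 62, 64, 65, 67, 65, 64, 62], 10).astype(float)
other = np.repeat([60, 60, 67, 67, 69, 69, 67, 65], 10).astype(float)
query = melody[::2] + 3  # same tune, twice as fast, three semitones higher
print(linear_scaling_distance(query, melody), linear_scaling_distance(query, other))
```

Each factor applies one uniform tempo change to the whole query, which is why the method is fast but cannot follow large tempo variations within a query.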

-25- DTW Path of “Match Beginning”

-26- DTW Path of “Match Anywhere”

-27- DTW Path of “Match Anywhere”
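The two DTW modes named on these slides can be sketched with one function. This is an assumed reading: "match beginning" anchors the path at the candidate's first frame, "match anywhere" lets it start at any candidate frame, and in both cases the path may end anywhere. The standard step pattern (diagonal, vertical, horizontal) and absolute-difference cost are used here; the slides do not state which local constraints the actual system uses, and key transposition is not handled.

```python
import numpy as np

def dtw_distance(query, candidate, match_anywhere=False):
    """DTW distance from a query pitch vector to a candidate pitch vector."""
    q = np.asarray(query, dtype=float)
    c = np.asarray(candidate, dtype=float)
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    if match_anywhere:
        D[0, :] = 0.0   # the path may start at any candidate position
    else:
        D[0, 0] = 0.0   # the path must start at the candidate's first frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - c[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(D[n, 1:].min())  # free end: the path may stop at any candidate frame

candidate = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]
query = [64, 65, 65, 67]  # a fragment taken from the middle of the candidate
print(dtw_distance(query, candidate, match_anywhere=True))
print(dtw_distance(query, candidate, match_anywhere=False))
```

Match-anywhere suits users who start singing from the middle of a song, at the price of a larger search space than match-beginning.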

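-28- QBSH at MIREX 2006
- Competition format: the organizers test the recognition performance of each participating team's code. Teams came from around the world, including Australia, Germany, France, Finland, Taiwan, Uruguay, the Netherlands, and China.
- Corpus:
  - The sung/hummed test data comprise 2797 wav files (8 seconds each, 8 kHz/8-bit) recorded by 118 people and covering 48 children's songs; the data are freely downloadable.
  - The song database comprises 2048 monophonic MIDI files; apart from the 48 children's songs above, the songs were supplied by the organizers and are not public.
- Evaluation tasks:
  - Retrieve from the 2048 MIDI files using the 2797 wav files as queries: scored by mean reciprocal rank; we placed third among 13 teams worldwide.
  - Retrieve from the other 2797 wav files using the 2797 wav files as queries: scored by mean precision; we placed first among 10 teams worldwide.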

-29- QBSH at MIREX 2006
- Corpus:
  - 2797 8-second recordings
  - Song database: 2048 MIDI files
- Evaluations
  - Task 1: to retrieve the correct song, ranked by mean reciprocal rank
  - Task 2: to retrieve similar queries, ranked by mean precision
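Mean reciprocal rank, the Task 1 metric, can be computed as in the sketch below. The function names and the toy song IDs are hypothetical, and a query whose correct song is absent from the returned list is assumed to score 0.

```python
def reciprocal_rank(ranked_ids, correct_id):
    """1/position of the correct song in the ranked list (1-based), or 0 if absent."""
    for position, song_id in enumerate(ranked_ids, start=1):
        if song_id == correct_id:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked_ids, correct_id) pairs, one per query."""
    return sum(reciprocal_rank(ranked, correct) for ranked, correct in results) / len(results)

# Three toy queries whose correct songs are ranked 1st, 2nd, and 4th.
results = [
    (["a", "b", "c", "d"], "a"),
    (["b", "a", "c", "d"], "a"),
    (["b", "c", "d", "a"], "a"),
]
print(mean_reciprocal_rank(results))
```

The metric rewards placing the correct song near the top: a first-place hit contributes 1, a second-place hit 0.5, and so on.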

-30- Demos of QBSH
- Real-time pitch tracking demo
  - SAP toolbox
    - goPtbyAcf.mdl
- Demo of QBSH
  - http://mirlab.org/new/mir_products.asp#miracle
- Most successful QBSH application
  - http://

-31- Singing Voice Separation
- Characteristics
  - Easier on karaoke stereo songs
  - Harder for monaural polyphonic songs
  - Important step for a number of MIR applications
- Demo clips
  - http://sites.google.com/site/unvoicedsoundseparation/

-32- On-going Research at AIST, Japan
- Systems for listening to singing voices
  - LyricSynchronizer: automatic synchronization of lyrics with polyphonic music recordings
  - Singer ID: singer identification
  - MiruSinger: singing skill visualization/training
  - Hyperlinking Lyrics: creating hyperlinks between phrases in song lyrics
  - Breath Detection: automatic detection of breath sounds in unaccompanied singing voice

-33- On-going Research at AIST, Japan (II)
- Systems for music information retrieval based on singing voices
  - VocalFinder: music information retrieval based on singing voice timbre
  - Voice Drummer: music notation of drums using vocal percussion input
- Systems for singing synthesis
  - SingBySpeaking: speech-to-singing synthesis
  - VocaListener: singing-to-singing synthesis

-34- The Grand Challenges of MIR
- Polyphonic audio music transcription
  - Analogous to the problem of image understanding over semitranslucent overlaid images
  - As hard as watching ripples on the water and telling whether a turtle or a frog swam by

-35- Conclusions
- MIR research is on the rise!
  - MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008)
    - High-level feature identification
    - Applications to genre/mood/tag classification/retrieval
- Preexisting approaches shed light on MIR.
  - Speech recognition/synthesis
  - Text information retrieval
  - Music theory

-36- References
- J. S. Downie, D. Byrd, and T. Crawford, "Ten Years of ISMIR: Reflections on Challenges and Opportunities", keynote talk, ISMIR, Kobe.
- M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, "Content-Based Music Information Retrieval: Current Directions and Future Challenges", Proceedings of the IEEE, Vol. 96, No. 4, April.
- J.-S. R. Jang and H.-R. Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, Feb.
- Z.-S. Chen and J.-S. R. Jang, "On the Use of Anti-word Models for Audio Music Annotation and Retrieval", IEEE Transactions on Audio, Speech, and Language Processing.
- C.-L. Hsu and J.-S. R. Jang, "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset", IEEE Transactions on Audio, Speech, and Language Processing.
- M. Goto, T. Saitou, T. Nakano, and H. Fujihara, "Singing Information Processing Based on Singing Voice Modeling", ICASSP 2010.