2015/10/25 Two Paradigms for Music IR: Query by Singing/Humming and Audio Fingerprinting J.-S. Roger Jang (張智星) Multimedia Information Retrieval Lab.


-1- Two Paradigms for Music IR: Query by Singing/Humming and Audio Fingerprinting (2015/10/25)
J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan

-2- Outline
- Introduction to MIR
- QBSH (query by singing/humming)
  - Intro, demos, conclusions
- AFP (audio fingerprinting)
  - Intro, demos, conclusions

-3- Content-based Music Information Retrieval (MIR) via Acoustic Inputs
- Melody
  - Query by humming (usually "ta" or "da")
  - Query by singing
  - Query by whistling
- Note onsets
  - Query by tapping (at the onsets of notes)
- Metadata
  - Query by speech (for metadata such as title, artist, lyrics)
- Audio contents
  - Query by example (noisy versions of original clips)
- Drums
  - Query by beatboxing

-4- Introduction to QBSH
- QBSH: Query by Singing/Humming
  - Input: singing or humming from a microphone
  - Output: a ranked list retrieved from the song database
- Progression
  - First paper: around 1994
  - Extensive studies since 2001
  - State of the art: QBSH tasks at ISMIR/MIREX

-5- Two Stages in QBSH
- Offline stage
  - Database preparation
    - From MIDI files
    - From audio music (e.g., MP3)
    - From human vocals
  - Indexing (if necessary)
- Online stage
  - Perform pitch tracking on the user's query
  - Compare the query pitch with songs in the database
  - Return a list ranked by similarity

-6- Frame Blocking for Pitch Tracking
- Frame size = 256 samples
- Overlap = 84 samples
- Frame rate = 11025/(256-84) ≈ 64 pitch values per second
(Figure: zoomed-in view of overlapping frames)
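The frame-blocking step above can be sketched as follows. This is a minimal illustration using the slide's parameters (256-sample frames, 84-sample overlap at 11025 Hz); `frame_blocking` is a hypothetical helper, not the system's actual implementation.

```python
import numpy as np

def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a signal into overlapping frames; hop = frame_size - overlap."""
    hop = frame_size - overlap
    n_frames = 1 + (len(signal) - frame_size) // hop
    return np.stack([signal[i * hop : i * hop + frame_size]
                     for i in range(n_frames)])

fs = 11025
x = np.zeros(fs)                 # one second of (silent) audio
frames = frame_blocking(x)
print(frames.shape)              # about 64 frames for one second of audio
print(fs / (256 - 84))           # frame rate from the slide, ≈ 64.1
```

One pitch value is later computed per frame, which is why the slide quotes the frame rate in pitch values per second.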

-7- ACF: Auto-correlation Function
- Take a frame s(i) and its shifted copy s(i+τ), e.g., τ = 30
- acf(τ) = inner product of the overlapping parts of the frame and its shifted copy
- The lag τ that maximizes acf(τ) gives the pitch period

-8- Frequency to Semitone Conversion
- Semitone: a musical scale based on A440 (A4 = 440 Hz)
- Reasonable pitch range: E2 to C6
  - About 82 Hz to 1047 Hz (semitones 40 to 84)
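The standard frequency-to-semitone mapping (A4 = 440 Hz = MIDI note 69) matches the slide's range:

```python
import math

def freq_to_semitone(freq_hz):
    """Convert frequency (Hz) to a semitone (MIDI note) number; A4 = 440 Hz = 69."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

print(round(freq_to_semitone(440.0)))    # A4 -> 69
print(round(freq_to_semitone(82.4)))     # E2 -> 40
print(round(freq_to_semitone(1046.5)))   # C6 -> 84
```

Working in semitones rather than Hz makes pitch vectors key-shift-friendly: transposing a melody just adds a constant.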

-9- Typical Result of Pitch Tracking
(Figure: pitch tracking via autocorrelation for 茉莉花, "Jasmine Flower")

-10- Comparison of Pitch Vectors
(Figure: query pitch vector aligned against the target pitch vector, shown as the yellow line)

-11- Comparison Methods for QBSH
- Categories of approaches to QBSH
  - Histogram/statistics-based
  - Note vs. note
    - Edit distance
  - Frame vs. note
    - HMM
  - Frame vs. frame
    - Linear scaling, DTW, recursive alignment

-12- Linear Scaling
- Scale the query pitch linearly to match the candidates
(Figure: the original input pitch stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; the version closest to the target pitch in the database is the best match)
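The linear-scaling comparison above can be sketched as follows: resample the query at several tempo factors and keep the smallest distance to the target. The factor set mirrors the slide; the distance measure (mean absolute difference) is an assumption for illustration.

```python
import numpy as np

def linear_scaling_distance(query, target,
                            factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Stretch/compress the query pitch vector by each factor and return
    the best (smallest) mean absolute distance to the target."""
    best = np.inf
    for f in factors:
        n = int(len(query) * f)
        if n == 0 or n > len(target):
            continue
        # Resample the query to n points by linear interpolation
        scaled = np.interp(np.linspace(0, len(query) - 1, n),
                           np.arange(len(query)), query)
        best = min(best, np.mean(np.abs(scaled - target[:n])))
    return best

target = np.linspace(60, 72, 100)   # target pitch contour, in semitones
query = np.linspace(60, 72, 80)     # same melody, sung faster
print(linear_scaling_distance(query, target))
```

Linear scaling is much cheaper than DTW, which is why it works well as an early stage of progressive filtering.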

-13- Challenges in QBSH Systems
- Song database preparation
  - MIDI files, singing clips, or audio music
- Reliable pitch tracking for acoustic input
  - Input from mobile devices or noisy karaoke bars
- Efficient and effective retrieval
  - Karaoke machine: ~10,000 songs
  - Internet music search engine: ~500,000,000 songs

-14- Goal and Approach
- Goal: retrieve songs effectively within a given response time, say 5 seconds or so
- Our strategy
  - Multi-stage progressive filtering
  - Indexing for different comparison methods
  - Repeating-pattern identification

-15- MIRACLE
- MIRACLE: Music Information Retrieval Acoustically via Clustered and paralleL Engines
- Database (~13,000 songs)
  - MIDI files
  - Solo vocals (<100)
  - Melodies extracted from polyphonic music (<100)
- Comparison methods
  - Linear scaling
  - Dynamic time warping
- Top-10 accuracy: 70-75%
- Platform: single CPU + GPU

-16- Current MIRACLE
- Single server with GPU
  - NVIDIA 560 Ti, 384 cores (speedup factor = 10)
- Database size: ~13,000 songs
(Figure: clients on PC, PDA/smartphone, and cellular networks send a pitch-vector request to the master server and receive the search result)

-17- QBSH for Various Platforms
- PC
  - Web version
- Embedded systems
  - Karaoke machines
- Smartphones
  - iPhone/iPad
  - Android phones
- Toys

-18- QBSH Demo
- Demo page of MIR Lab: http://mirlab.org/mir_products.asp
- MIRACLE demo: http://mirlab.org/demo/miracle
- Existing commercial QBSH systems
  - www.midomi.com
  - www.soundhound.com

-19- Conclusions for QBSH
- QBSH
  - A fun and interesting way to retrieve music
  - Can be extended to singing scoring
  - Commercial applications are maturing
- Challenges
  - How to deal with massive music databases?
  - How to extract melody from audio music?

-20- Audio Fingerprinting (AFP)
- Goal
  - Identify a noisy version of a given audio clip (query by example, not by "cover versions")
- Technical barriers
  - Robustness
  - Efficiency (6M tags/day for Shazam)
  - Effectiveness (15M tracks for Shazam)
- Applications
  - Song purchase
  - Royalty assignment (over radio)
  - Confirmation of commercials (over TV)
  - Copyright-violation detection (over the web)
  - TV program identification

-21- Two Stages in AFP
- Offline
  - Robust feature extraction (audio fingerprinting)
  - Hash table construction
  - Inverted indexing
- Online
  - Robust feature extraction
  - Hash table search
  - Ranked list of the retrieved songs/music

-22- Representative Approaches to AFP
- Philips
  - J. Haitsma and T. Kalker, "A highly robust audio fingerprinting system", ISMIR 2002
- Shazam
  - A. Wang, "An industrial-strength audio search algorithm", ISMIR 2003
- Google
  - S. Baluja and M. Covell, "Content fingerprinting using wavelets", Euro. Conf. on Visual Media Production, 2006
  - V. Chandrasekhar, M. Sharifi, and D. A. Ross, "Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications", ISMIR 2011

-23- Shazam: Landmarks as Features (source: Dan Ellis)
- Compute the spectrogram
- Find local peaks of the spectrogram
- Pair peaks to form landmarks: [t1, f1, t2, f2]
- 20-bit hash key
  - f1: 8 bits
  - Δf = f2 - f1: 6 bits
  - Δt = t2 - t1: 6 bits
- Hash value: song ID & offset time
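The 20-bit key layout above can be packed with simple bit operations. This is a sketch of the slide's bit allocation, not Shazam's actual code; the frequency and time values are assumed to already be quantized to bins/frames.

```python
def landmark_hash(f1, f2, t1, t2):
    """Pack a landmark [t1, f1, t2, f2] into a 20-bit key:
    f1 in 8 bits, delta-f in 6 bits, delta-t in 6 bits."""
    df = (f2 - f1) & 0x3F            # 6-bit frequency difference
    dt = (t2 - t1) & 0x3F            # 6-bit time difference
    return ((f1 & 0xFF) << 12) | (df << 6) | dt

# A tiny hash table: key -> list of (song_id, offset_time), as on the slide
table = {}
key = landmark_hash(f1=100, f2=110, t1=40, t2=55)
table.setdefault(key, []).append((42, 40))   # song 42, landmark at frame 40
print(key)
```

With 20 bits there are only ~1M distinct keys, so collisions are inevitable at scale; the next slide's offset-time check is what keeps matching robust anyway.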

-24- Shazam: Landmarks as Features (II)
- Peak picking after smoothing
- Matched landmarks shown in green
(Source: Dan Ellis)

-25- Shazam: Time-Justified Landmarks
- Valid landmarks are those that agree on the offset time between query and database track, which maintains robustness even with hash collisions
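The offset-time justification above amounts to voting: landmarks from the true song agree on one (song, offset) pair, while collision hits scatter across random offsets. A minimal sketch, with hypothetical data structures:

```python
from collections import Counter

def match_by_offset(query_landmarks, db_index):
    """Vote on (song_id, offset) pairs and return the top candidate.

    query_landmarks: list of (hash_key, query_time)
    db_index: dict mapping hash_key -> list of (song_id, db_time)
    """
    votes = Counter()
    for key, q_time in query_landmarks:
        for song_id, db_time in db_index.get(key, []):
            votes[(song_id, db_time - q_time)] += 1
    return votes.most_common(1)      # [((song_id, offset), vote_count)]

# Toy index: key 2 collides with an unrelated song (id 9)
db_index = {1: [(7, 100)], 2: [(7, 105), (9, 33)], 3: [(7, 110)]}
query = [(1, 0), (2, 5), (3, 10)]    # same song, starting 100 frames in
print(match_by_offset(query, db_index))
```

Here all three query landmarks vote for song 7 at offset 100, while the colliding hit on song 9 gets only one scattered vote, so the true match wins despite the collision.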

-26- Our AFP Engine
- Database
  - 2,500 tracks currently
  - 50K tracks soon
  - 1M tracks in the future
- Driving forces
  - Fundamental issues in computer science (hashing, indexing, ...)
  - Requests from local companies
- Methods
  - Landmarks as features (Shazam)
  - Speedup via hash tables and inverted files
- Platform
  - Currently: single CPU
  - In the future: multiple CPUs and GPUs

-27- Experiments
- Corpora
  - Database: 2,550 tracks
  - Test files: 5 mobile-recorded songs chopped into segments of 5, 10, 15, and 20 seconds
- Accuracy test
  - 5-sec clips: 161/275 = 58.6%
  - 10-sec clips: 121/136 = 89.0%
  - 15-sec clips: 88/90 = 97.8%
  - 20-sec clips: 65/66 = 98.5%
(Figures: accuracy vs. duration; computing time vs. duration; accuracy vs. computing time)

-28- Demos of Audio Fingerprinting
- Commercial apps
  - Shazam
  - SoundHound
- Our demo
  - http://mirlab.org/demo/afpFarmer2550

-29- Conclusions for AFP
- Conclusions
  - Landmark-based methods are effective
  - Machine learning is indispensable for further improvement
- Future work: scaling up
  - Shazam: 15M tracks in the database, 6M tags/day
  - Our goal:
    - 50K tracks with a single PC and GPU
    - 1M tracks with cloud computing on 10 PCs