Music Information Retrieval: Overview and Challenges

Presentation transcript:

Music Information Retrieval: Overview and Challenges J.-S. Roger Jang （張智星） Multimedia Information Retrieval (MIR) Lab CSIE Dept, National Taiwan Univ. http://mirlab.org/jang 2017/4/25

Outline
  Music Information Retrieval (MIR): intro to MIR; intro to ISMIR & MIREX
  Two classical paradigms of MIR: QBSH (query by singing/humming); AFP (audio fingerprinting)
  Conclusions

Introduction to QBSH
  QBSH: Query by Singing/Humming
  Input: Singing or humming from microphone
  Output: A ranked list retrieved from the song database according to similarity to the query
  Progression: first paper around 1994; extensive studies since 2001; state of the art: QBSH tasks at ISMIR/MIREX, since 2006

Two Steps in QBSH
  Pitch tracking: to detect the period of a waveform
    Time domain: ACF (autocorrelation function), NSDF (normalized squared difference function), AMDF (average magnitude difference function)
    Frequency domain: harmonic product spectrum, cepstrum
  Database comparison: to find similarity between query and database songs
    Linear scaling, dynamic time warping, recursive alignment, hybrid methods
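
Dynamic time warping is one of the comparison methods listed above. The sketch below is the textbook form with both endpoints anchored and an absolute-difference local cost; it is not necessarily the variant used in the lab's system, and the function name is hypothetical.

```python
def dtw_distance(query, target):
    """Textbook DTW between two pitch sequences (in semitones).

    Assumptions: both endpoints are anchored, and the local cost is the
    absolute pitch difference. Real QBSH systems may relax either choice.
    """
    inf = float("inf")
    n, m = len(query), len(target)
    # cost[i][j] = best accumulated cost aligning query[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeats
                                 cost[i][j - 1],      # target frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A query sung twice as fast as the target still aligns at zero cost.
print(dtw_distance([60, 62, 64], [60, 60, 62, 62, 64, 64]))
```

Unlike linear scaling, this alignment tolerates tempo changes inside the query, at the price of an O(n·m) table per candidate.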

Frame Blocking for Pitch Tracking
  Sample rate = 16 kHz
  Frame size = 512 samples
  Frame duration = 512/16000 = 0.032 s = 32 ms
  Overlap = 192 samples
  Hop size = frame size - overlap = 512 - 192 = 320 samples
  Frame rate = 16000/320 = 50 frames/sec = pitch rate
  [Figure: zoomed-in waveform showing one frame and the overlap between consecutive frames]
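
The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical and the signal is a stand-in list of sample indices.

```python
def frame_params(sample_rate, frame_size, overlap):
    """Return (hop size in samples, frame duration in s, frame rate in frames/s)."""
    hop_size = frame_size - overlap
    frame_duration = frame_size / sample_rate
    frame_rate = sample_rate / hop_size
    return hop_size, frame_duration, frame_rate


def block_frames(signal, frame_size, overlap):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    hop_size = frame_size - overlap
    last_start = len(signal) - frame_size
    return [signal[i:i + frame_size] for i in range(0, last_start + 1, hop_size)]


hop, duration, rate = frame_params(sample_rate=16000, frame_size=512, overlap=192)
print(hop, duration, rate)

one_second = list(range(16000))  # placeholder for one second of audio
frames = block_frames(one_second, frame_size=512, overlap=192)
print(len(frames))
```

One second of audio yields 49 complete frames rather than 50, because the last hop does not leave room for a full 512-sample frame.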

ACF: Auto-correlation Function
  [Figure: original frame s(t), samples 1 to 128, and the shifted frame s(t-τ) with τ = 30]
  acf(30) = inner product of the overlapping part of the two frames
  Pitch period: the lag at which the ACF peaks
  To play safe, the frame size needs to cover at least two fundamental periods!
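
A minimal sketch of ACF-based pitch estimation for a single frame follows. The search range of 82 Hz to 1047 Hz is borrowed from the pitch range quoted elsewhere in the slides; the function names, and the choice to take the single highest ACF value with no normalization or smoothing, are assumptions for illustration.

```python
import math


def acf(frame, lag):
    """Inner product of the frame with itself shifted by `lag` (overlap part only)."""
    return sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))


def acf_pitch(frame, sample_rate, fmin=82.0, fmax=1047.0):
    """Estimate pitch in Hz as sample_rate / (lag of the highest ACF value)."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf(frame, lag))
    return sample_rate / best_lag


# A 200 Hz sine sampled at 16 kHz has a period of 80 samples,
# so a 512-sample frame covers more than six fundamental periods.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(512)]
print(acf_pitch(frame, sample_rate))
```

Because the overlap shrinks as the lag grows, the unnormalized ACF favors shorter lags, which is one reason the frame should span at least two periods of the lowest pitch of interest.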

Frequency to Semitone Conversion
  Semitone: A music scale based on A440
  Reasonable pitch range: E2 - C6, i.e. 82 Hz - 1047 Hz
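
The conversion formula itself did not survive in this transcript. A common convention, assumed here, is the MIDI note number, where A440 maps to 69 and each octave spans 12 semitones: semitone = 69 + 12·log2(f/440). The function name is hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Convert frequency in Hz to a MIDI-style semitone number (A440 -> 69).

    Assumption: the MIDI convention; the slide's own formula is not preserved.
    """
    return 69 + 12 * math.log2(freq_hz / 440.0)


for name, freq in [("E2", 82.0), ("A4", 440.0), ("C6", 1047.0)]:
    print(name, round(freq_to_semitone(freq), 2))
```

Working in semitones rather than Hz makes a key change a constant offset, which simplifies comparing a hummed query against a stored melody.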

Demos
  Pitch-related demos: pitch tracking; pitch shift

Basic Comparison Method: Linear Scaling
  Scale the query pitch linearly to match the candidates
  [Figure: the original input pitch is compressed by 0.5 and 0.75 and stretched by 1.25 and 1.5; each version is compared with the target pitch in the database, and the best match is kept]
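
Linear scaling can be sketched as follows, using the five scaling factors shown in the figure. Three choices here are assumptions rather than details from the slides: the query is compared against the beginning of the target, key differences are removed by shifting the query to the target's mean pitch, and the distance is the mean absolute difference in semitones. All names are hypothetical.

```python
def resample(pitch, new_len):
    """Stretch or compress a pitch vector to new_len points by linear interpolation."""
    if new_len == 1:
        return [pitch[0]]
    out = []
    for i in range(new_len):
        pos = i * (len(pitch) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pitch) - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out


def linear_scaling(query, target, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Return (best distance, best factor) over the candidate scaling factors."""
    best = (float("inf"), None)
    for factor in factors:
        n = round(len(query) * factor)
        if n < 2 or n > len(target):
            continue  # scaled query would be degenerate or longer than the target
        scaled = resample(query, n)
        ref = target[:n]  # assumption: match from the start of the song
        shift = sum(ref) / n - sum(scaled) / n  # assumption: mean-based key shift
        dist = sum(abs(s + shift - r) for s, r in zip(scaled, ref)) / n
        if dist < best[0]:
            best = (dist, factor)
    return best


# Target: 10 frames per note. Query: the first four notes sung faster
# (8 frames per note) and two semitones lower.
notes = [60, 62, 64, 65, 67, 69, 71, 72]
target = [p for p in notes * 2 for _ in range(10)]
query = [p - 2 for p in notes[:4] for _ in range(8)]
distance, factor = linear_scaling(query, target)
print(factor, round(distance, 3))
```

Each candidate costs only one pass over the vector, which is why linear scaling is attractive as a first comparison method on large databases.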

Typical Result of Pitch Tracking
  Pitch tracking via autocorrelation for 茉莉花 (Jasmine)

Comparison of Pitch Vectors
  [Figure: the yellow line is the target pitch vector]

QBSH Demos
  QBSH demos by our lab: QBSH on the web (MIRACLE); QBSH on toys
  Existing commercial QBSH systems: www.midomi.com, www.soundhound.com

Our QBSH System: Miracle
  Clients: PC, PDA/smartphone, cellular
  Master server: receives the request (pitch vector) and returns the response (search result)
  Single server with GPU: NVIDIA 560 Ti, 384 cores (speedup factor = 10)
  Database size: ~20,000

Improving QBSH
  Many ways to improve QBSH: sorted error vector; various weights for rests; re-ranking for better accuracy; better memory arrangement in GPU; …

Intro to Audio Fingerprinting (AFP)
  Goal: Identify a noisy version of a given audio clip
  Also known as “query by exact example”, so no “cover versions” are allowed

AFP Applications
  Commercial applications of AFP: music identification & purchase; royalty assignment (over radio); identification of TV shows or commercials (over TV); copyright violation (over web)
  Major commercial players: Shazam, Soundhound, Intonow, Viggle…

Two Stages in AFP
  Offline: feature extraction; hash table construction for songs in database (inverted indexing)
  Online: feature extraction; hash table search; ranked list of the retrieved songs/music

Robust Feature Extraction
  Various kinds of features for AFP: invariance along time and frequency; landmark of a pair of local maxima; wavelets; …
  Extensive tests are required for choosing the best features

Representative Approaches to AFP
  Philips
    J. Haitsma and T. Kalker, “A highly robust audio fingerprinting system”, ISMIR 2002.
  Shazam
    A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003.
  Google
    S. Baluja and M. Covell, “Content fingerprinting using wavelets”, Euro. Conf. on Visual Media Production, 2006.
    V. Chandrasekhar, M. Sharifi, and D. A. Ross, “Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications”, ISMIR 2011.

Improvement on AFP
  Re-ranking of AFP by learning to rank
  Demo: http://mirlab.org/demo/audioFingerprinting

Shazam’s Method
  Ideas
    Take advantage of music local structures: find salient peaks on spectrogram; pair peaks to form landmarks for comparison
    Efficient search by hash tables: use positions of landmarks as hash keys; use song ID and offset time as hash values; use time constraints to find matched landmarks
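
The hash-table side of this method can be sketched as below: keys come from landmarks, values are (song ID, offset time) pairs, and the time constraint is enforced by counting how many matched landmarks agree on the same offset between song time and query time. The data layout and function names are assumptions for illustration, not Shazam's actual implementation.

```python
from collections import Counter, defaultdict


def build_hash_table(database):
    """Offline stage. database: {song_id: [(hash_key, time), ...]}."""
    table = defaultdict(list)
    for song_id, landmarks in database.items():
        for key, t in landmarks:
            table[key].append((song_id, t))
    return table


def identify(table, query_landmarks):
    """Online stage. Returns [(song_id, votes, offset), ...], best match first.

    A true match yields many landmarks sharing one (song time - query time)
    offset; accidental hash collisions scatter over many offsets.
    """
    votes = Counter()
    for key, t_query in query_landmarks:
        for song_id, t_song in table.get(key, ()):
            votes[(song_id, t_song - t_query)] += 1
    best = {}
    for (song_id, offset), count in votes.items():
        if count > best.get(song_id, (0, None))[0]:
            best[song_id] = (count, offset)
    ranked = [(song_id, count, offset) for song_id, (count, offset) in best.items()]
    return sorted(ranked, key=lambda item: -item[1])


# Toy data: integer hash keys and frame times (hypothetical values).
database = {
    "song_a": [(1, 0), (2, 5), (3, 9), (4, 12)],
    "song_b": [(1, 3), (9, 4), (3, 20)],
}
table = build_hash_table(database)
query = [(2, 0), (3, 4), (4, 7)]  # an excerpt of song_a starting at time 5
print(identify(table, query))
```

Counting by offset rather than by raw hits is what makes the ranking robust: song_b shares a key with the query but cannot accumulate votes at a single offset.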

How to Find Salient Peaks
  We need to find peaks that are salient along both frequency and time axes
  Frequency axis: Gaussian local smoothing
  Time axis: Decaying threshold over time

How to Find Initial Threshold?
  Goal: To suppress neighboring peaks
  Ideas: find the local maxima of the magnitude spectra of the initial 10 frames; superimpose a Gaussian on each local maximum; find the max of all Gaussians

How to Update the Threshold along Time?
  Decay the threshold
  Find local maxima larger than the threshold; these are the salient peaks
  Define the new threshold as the max of the old threshold and the Gaussians passing through the active local maxima
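
The initial threshold and its update along time can be sketched as a single forward pass. The Gaussian width, the decay factor, and the function names are assumptions, and the spectrogram is taken to be a list of magnitude frames; only the 10-frame initialization comes from the slides.

```python
import math


def local_maxima(x):
    """Interior indices higher than the left neighbour and not lower than the right."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]


def gaussian(center, height, n_bins, sigma):
    """Gaussian over the frequency bins, passing through (center, height)."""
    return [height * math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n_bins)]


def pick_salient_peaks(spectrogram, sigma=2.0, decay=0.9, n_init=10):
    """Forward pass only. Returns salient peaks as (frame index, frequency bin)."""
    n_bins = len(spectrogram[0])
    # Initial threshold: max of the Gaussians placed on the local maxima
    # of the magnitude envelope of the first n_init frames.
    envelope = [max(frame[k] for frame in spectrogram[:n_init]) for k in range(n_bins)]
    threshold = [0.0] * n_bins
    for k in local_maxima(envelope):
        g = gaussian(k, envelope[k], n_bins, sigma)
        threshold = [max(a, b) for a, b in zip(threshold, g)]

    peaks = []
    for t, frame in enumerate(spectrogram):
        threshold = [decay * v for v in threshold]  # decay over time
        active = [k for k in local_maxima(frame) if frame[k] > threshold[k]]
        for k in active:
            peaks.append((t, k))
            g = gaussian(k, frame[k], n_bins, sigma)
            threshold = [max(a, b) for a, b in zip(threshold, g)]  # raise near the peak
    return peaks


# Toy spectrogram: 30 frames of 16 bins, silent except for four isolated bumps.
spec = [[0.0] * 16 for _ in range(30)]
spec[0][4] = 1.0
spec[20][10] = 1.0
spec[21][11] = 0.5   # right after a stronger neighbour: suppressed
spec[29][11] = 0.5   # same bump later, after the threshold has decayed: kept
print(pick_salient_peaks(spec))
```

The two 0.5 bumps at bin 11 show the intended behaviour: a weak peak next to a recent strong one is masked, while the same peak survives once the threshold has decayed.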

Time-decaying Thresholds
  [Figure: decaying thresholds for the forward pass and for the backward pass]

How to Pair Salient Peaks?
  [Figure: the target zone of a salient peak]
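
Only the label "target zone" survives from this slide's figure. A common reading, assumed here, is that each salient peak is paired with a limited number of later peaks lying within a bounded window in time and frequency. The window sizes, the fan-out limit, and the function name below are all hypothetical.

```python
def pair_peaks(peaks, max_dt=63, max_df=31, fan_out=3):
    """Pair each peak with up to fan_out later peaks inside its target zone.

    peaks: iterable of (time, freq_bin).
    Returns landmarks as (t1, f1, f2, dt), where dt = t2 - t1.
    Assumed target zone: 1 <= dt <= max_dt and |f2 - f1| <= max_df.
    """
    peaks = sorted(peaks)
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so no later peak can qualify
            if dt >= 1 and abs(f2 - f1) <= max_df:
                landmarks.append((t1, f1, f2, dt))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks


example_peaks = [(0, 10), (5, 12), (8, 60), (70, 11)]
print(pair_peaks(example_peaks))
```

Limiting the fan-out keeps the number of landmarks roughly proportional to the number of peaks, which bounds the size of the hash table.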

Salient Peaks and Landmarks
  [Figures: peak picking after forward smoothing; matched landmarks (green)] (Source: Dan Ellis)

Landmarks for Hash Table Access
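
The figure for this slide is not preserved, so the exact key layout is unknown. One way to turn a landmark into a hash key, sketched here under assumed bit widths (8 bits for the first frequency, 6 for the frequency difference, 6 for the time difference), is to pack the three values into a single integer. The names and widths are illustrative only.

```python
F1_BITS, DF_BITS, DT_BITS = 8, 6, 6   # assumed widths, 20 bits in total
DF_BIAS = 1 << (DF_BITS - 1)          # makes negative frequency differences storable


def landmark_to_key(f1, f2, dt):
    """Pack (f1, f2 - f1, dt) into one integer usable as a hash key."""
    df = f2 - f1 + DF_BIAS
    if not (0 <= f1 < (1 << F1_BITS) and 0 <= df < (1 << DF_BITS)
            and 0 <= dt < (1 << DT_BITS)):
        raise ValueError("landmark does not fit the assumed bit widths")
    return (f1 << (DF_BITS + DT_BITS)) | (df << DT_BITS) | dt


def key_to_landmark(key):
    """Inverse of landmark_to_key: recover (f1, f2, dt)."""
    dt = key & ((1 << DT_BITS) - 1)
    df = (key >> DT_BITS) & ((1 << DF_BITS) - 1)
    f1 = key >> (DF_BITS + DT_BITS)
    return f1, f1 + df - DF_BIAS, dt


key = landmark_to_key(f1=10, f2=12, dt=5)
print(key, key_to_landmark(key))
```

The start time of the landmark is deliberately left out of the key: it is stored as the hash value together with the song ID, so the same landmark hashes identically wherever it occurs in a recording.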

Optimization Strategies for AFP
  Several ways to optimize AFP: strategy for query landmark extraction; confidence measure; incremental retrieval; better use of the hash table; re-ranking for better performance

Demos of Audio Fingerprinting
  Commercial apps: Shazam, Soundhound
  Our demo: http://mirlab.org/demo/audioFingerprinting

QBSH vs. AFP
                QBSH                                AFP
  Goal          MIR                                 MIR
  Features      Pitch                               Landmarks
                Perceptible                         Not perceptible
                Small data size                     Big data size
  Method        LS (linear scaling)                 Matched LM (landmarks)
  Database      Harder to collect                   Easier to collect
                Small storage                       Large storage
  Bottleneck    CPU/GPU-bound                       I/O-bound

Conclusions
  Successful applications in MIR: QBSH, AFP
  Due to: faster and bigger memory; advances in GPU/CPU (Moore’s law); new machine learning methods
  Challenges in MIR: audio melody extraction from polyphonic music; database collection for QBSH; cover song ID (which cannot be handled by AFP); polyphonic music transcription

Thank you for your attention! Questions & comments?