IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING MARCH 2010 Lan-Ying Yeh
Introduction
- Music Information Retrieval (MIR)
- Singer identification and vocal-timbre similarity
- Feature extraction
- Influence from other instruments
Related Studies
- Used a statistically based speaker-identification method for speech signals in noisy environments
- First estimated an accompaniment-only model from interlude sections
- Then obtained a vocal-only model by subtracting the accompaniment-only model from the vocal-plus-accompaniment model
- Assumes that the singing voice and the accompaniment sounds are statistically independent
- This assumption is not always satisfied, so the estimation can be problematic
Related Studies
- Used a vocal separation method
- Similar to the accompaniment sound reduction method of the presented paper
- Did not deal with interlude sections
- Conducted experiments using only vocal sections
Method Overview
Accompaniment Sound Reduction
F0 estimation
- PreFEst (Predominant-F0 estimation method)
- Express the observed power spectrum on a log-frequency axis in units of cents
- Apply a band-pass filter designed to cover most melody lines
- Treat the filtered spectrum as an observed pdf of frequency components
- Each observed pdf is assumed to be generated from a weighted mixture of tone models for all possible F0s
- Estimate the weights with the EM algorithm (MAP estimation) and regard them as the F0's pdf
- Track the dominant F0
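The weighting step above can be sketched as follows. This is a minimal illustration, not the PreFEst implementation: the 0-cent reference frequency, the Gaussian-shaped tone model, the number of harmonics, and the width `sigma` are all assumptions, and the update shown is plain maximum-likelihood EM without the MAP prior mentioned on the slide. All function names are hypothetical.

```python
import numpy as np

# Assumed 0-cent reference (about 16.35 Hz); with it, 440 Hz maps to about 5700 cents.
REF_HZ = 440.0 * 2.0 ** (3.0 / 12.0 - 5.0)

def hz_to_cent(f_hz):
    """Convert frequency in Hz to cents relative to REF_HZ."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / REF_HZ)

def tone_model(cent_axis, f0_cent, n_harmonics=4, sigma=20.0):
    """Hypothetical tone model: one Gaussian per harmonic, normalized to sum to 1."""
    model = np.zeros_like(cent_axis, dtype=float)
    for h in range(1, n_harmonics + 1):
        center = f0_cent + 1200.0 * np.log2(h)  # h-th harmonic on the cent axis
        model += (1.0 / h) * np.exp(-0.5 * ((cent_axis - center) / sigma) ** 2)
    return model / model.sum()

def em_weights(observed_pdf, models, n_iter=50):
    """Estimate mixture weights of fixed tone models with ML EM (no prior).

    observed_pdf: shape (n_bins,), sums to 1.
    models: shape (n_candidates, n_bins), each row sums to 1.
    """
    n_models = models.shape[0]
    w = np.full(n_models, 1.0 / n_models)
    for _ in range(n_iter):
        mix = w @ models                                        # mixture value per bin
        resp = (w[:, None] * models) / np.maximum(mix, 1e-300)  # E-step
        w = resp @ observed_pdf                                 # M-step
        w /= w.sum()
    return w

if __name__ == "__main__":
    axis = np.arange(3000.0, 9000.0, 10.0)       # cent axis
    candidates = np.arange(3600.0, 6001.0, 100.0)  # candidate F0s in cents
    models = np.stack([tone_model(axis, c) for c in candidates])
    observed = tone_model(axis, 4800.0)          # synthetic observation
    weights = em_weights(observed, models)
    print("most likely F0 (cents):", candidates[np.argmax(weights)])
```

The resulting weight vector plays the role of the F0 pdf; tracking its peak over time corresponds to the final step on the slide.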
Accompaniment Sound Reduction
Harmonic Structure Extraction
- Extract the frequency and amplitude of the l-th overtone
- Allow an error of r cents around each expected overtone frequency
- Search for the local amplitude maximum within that range
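A minimal sketch of the overtone search, assuming the spectrum of one frame is given as parallel arrays of bin frequencies (Hz) and amplitudes. The function name, the default number of overtones, and the fallback for an empty search window are illustrative assumptions, and the largest bin inside the window stands in for the local maximum.

```python
import numpy as np

def extract_harmonics(freqs_hz, amps, f0_hz, n_overtones=20, r_cent=20.0):
    """For each overtone l, pick the strongest bin within +/- r_cent of l * f0_hz."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    ratio = 2.0 ** (r_cent / 1200.0)  # r cents expressed as a frequency ratio
    out_f, out_a = [], []
    for l in range(1, n_overtones + 1):
        lo, hi = l * f0_hz / ratio, l * f0_hz * ratio
        idx = np.flatnonzero((freqs_hz >= lo) & (freqs_hz <= hi))
        if idx.size == 0:
            # No bin falls inside the window: keep the nominal frequency, zero amplitude.
            out_f.append(l * f0_hz)
            out_a.append(0.0)
            continue
        best = idx[np.argmax(amps[idx])]
        out_f.append(freqs_hz[best])
        out_a.append(amps[best])
    return np.array(out_f), np.array(out_a)
```

Because the window is defined in cents, it widens in Hz for higher overtones, which matches the idea of a constant relative tolerance.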
Accompaniment Sound Reduction
Re-synthesis
- Model the signal with a sinusoidal model
- A quadratic function approximates changes in phase
- A linear function approximates changes in amplitude
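The re-synthesis idea can be sketched as below, assuming each overtone is described by one frequency and one amplitude per frame. A quadratic phase gives a linearly changing instantaneous frequency between frames, and the amplitude is interpolated linearly. The function names, the use of a sine rather than a cosine, and the zero initial phase are assumptions for illustration.

```python
import numpy as np

def synth_partial(f_start, f_end, a_start, a_end, n_samples, sample_rate, phase0=0.0):
    """Synthesize one partial over one frame; returns (samples, phase at frame end)."""
    t = np.arange(n_samples) / sample_rate  # time in seconds within the frame
    duration = n_samples / sample_rate
    # Quadratic phase: instantaneous frequency moves linearly from f_start to f_end.
    phase = np.pi * (f_end - f_start) / duration * t ** 2 + 2.0 * np.pi * f_start * t + phase0
    # Linear amplitude interpolation from a_start to a_end.
    amp = (a_end - a_start) / duration * t + a_start
    end_phase = np.pi * (f_end - f_start) * duration + 2.0 * np.pi * f_start * duration + phase0
    return amp * np.sin(phase), end_phase

def resynthesize(freq_tracks, amp_tracks, hop, sample_rate):
    """Sum all partials; freq_tracks and amp_tracks have shape (n_frames, n_partials)."""
    freq_tracks = np.asarray(freq_tracks, dtype=float)
    amp_tracks = np.asarray(amp_tracks, dtype=float)
    n_frames, n_partials = freq_tracks.shape
    out = np.zeros((n_frames - 1) * hop)
    phases = np.zeros(n_partials)  # running phase keeps each partial continuous
    for i in range(n_frames - 1):
        for l in range(n_partials):
            seg, phases[l] = synth_partial(
                freq_tracks[i, l], freq_tracks[i + 1, l],
                amp_tracks[i, l], amp_tracks[i + 1, l],
                hop, sample_rate, phases[l],
            )
            out[i * hop:(i + 1) * hop] += seg
    return out
```

Carrying the end phase of each frame into the next one avoids discontinuities at frame boundaries.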
Accompaniment Sound Reduction Evaluation
To be continued…
Feature Extraction
- LPC-derived mel cepstral coefficients (LPMCCs)
- ∆F0s
Reliable Frame Selection
- Feature vectors obtained from accompaniment-sound regions are unreliable
- Set up a vocal GMM and a nonvocal GMM
- Decide whether a feature vector x is reliable by comparing its likelihoods under the two GMMs against a threshold η
- A single global η is difficult to determine
- Instead, choose the threshold per song so that α% of all frames in the song are selected as reliable frames
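A minimal sketch of the per-song selection, assuming the per-frame log-likelihoods under the vocal and nonvocal GMMs have already been computed. Using the log-likelihood difference as the score is an assumption about the exact criterion, and the function name and the example values are hypothetical.

```python
import numpy as np

def select_reliable_frames(loglik_vocal, loglik_nonvocal, alpha):
    """Keep the alpha percent of frames with the highest vocal-vs-nonvocal score.

    Returns a boolean mask over frames and the implied per-song threshold eta.
    """
    score = np.asarray(loglik_vocal, dtype=float) - np.asarray(loglik_nonvocal, dtype=float)
    n_keep = max(1, int(round(len(score) * alpha / 100.0)))
    order = np.argsort(score)[::-1]   # frame indices, best score first
    keep = np.zeros(len(score), dtype=bool)
    keep[order[:n_keep]] = True
    eta = score[order[n_keep - 1]]    # lowest score that is still kept
    return keep, eta

if __name__ == "__main__":
    # Toy log-likelihoods for a 10-frame "song"; the numbers are made up.
    lv = np.array([-10.0, -5.0, -20.0, -1.0, -8.0, -30.0, -2.0, -9.0, -15.0, -4.0])
    ln = np.full(10, -10.0)
    mask, eta = select_reliable_frames(lv, ln, alpha=30.0)
    print("reliable frames:", np.flatnonzero(mask), "threshold:", eta)
```

Fixing the proportion instead of η makes the threshold adapt to each song, which is the motivation given on the slide.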
Reliable Frame Selection Evaluation