Published by Byron Summers, modified over 9 years ago
1
Singer similarity / identification
Francois Thibault
MUMT 614B, McGill University
2
Introduction
- Humans identify a singing voice relatively easily across various contexts
- Finding time- and environment-invariant features for robust automatic identification is difficult
- Demand for such systems is growing as network databases keep expanding
3
Background (1)
- Speaker identification is well researched, but those systems perform poorly on singing voice (inadequate training)
- Singer identification research can draw heavily on automatic instrument recognition systems
- Artist / singer identification is much harder than song identification because it requires context-invariant features
4
Background (2)
- Often builds on speech / music discrimination systems
- Acoustical features are heavily used to build an N-dimensional Euclidean feature space: loudness, pitch, brightness, bandwidth, harmonicity
- Often uses the same tools as style identification, because each singer corresponds to a ‘micro’ style
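As a rough illustration of how one audio frame becomes a point in that N-dimensional space, the sketch below computes three of the listed features (loudness, brightness, bandwidth) in plain Python. The function name `frame_features` and the specific definitions (RMS level in dB, spectral centroid, magnitude-weighted spread around the centroid) are assumptions chosen for illustration, not the measures used by any particular system in these slides. Pitch and harmonicity are left out.

```python
import cmath
import math


def frame_features(frame, sample_rate):
    """Hypothetical helper: map one audio frame to a point in feature space.

    Returns (loudness_db, brightness_hz, bandwidth_hz):
      loudness   -> RMS level in dB
      brightness -> spectral centroid of the magnitude spectrum
      bandwidth  -> magnitude-weighted spread around the centroid
    """
    n = len(frame)
    rms = math.sqrt(sum(s * s for s in frame) / n)
    loudness_db = 20.0 * math.log10(rms + 1e-12)

    # Naive DFT over bins 0..n/2; acceptable for short illustrative frames.
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    freqs = [k * sample_rate / n for k in range(n // 2 + 1)]

    total = sum(mags) + 1e-12
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(
        sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    )
    return loudness_db, centroid, bandwidth


# A 1 kHz sine sampled at 8 kHz: exactly 8 cycles in a 64-sample frame.
sine = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
features = frame_features(sine, 8000)
```

Stacking such tuples for many frames gives the Euclidean feature space the slide describes; in practice each dimension would usually be normalized before distances are compared.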
5
Kim and Whitman overview
- Vocal regions are segmented before the singer identification algorithm runs
- Assumes singing regions display strong harmonic energy in the voice frequency range
- Band-pass filter (200-2000 Hz)
- Inverse comb filter bank to detect harmonicity
- Identification classifier uses features based on LPC
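A minimal sketch of the harmonicity test follows. It assumes the 200-2000 Hz band-pass stage has already been applied, and that stage is not shown. The function `harmonicity`, its pitch-search bounds `f0_min`/`f0_max`, and the energy-ratio score are hypothetical choices, not Kim and Whitman's exact detector. Each inverse comb filter subtracts a copy of the signal delayed by one candidate period, so a harmonic (periodic) region leaves almost no residual at its own period.

```python
import math
import random


def harmonicity(x, sample_rate, f0_min=80.0, f0_max=1000.0):
    """Hypothetical harmonicity score from a bank of inverse comb filters.

    Each filter y[n] = x[n] - x[n - lag] cancels a signal that repeats every
    `lag` samples. The score is the input energy divided by the smallest
    residual energy across lags covering f0_min..f0_max: strongly periodic
    (harmonic) input gives a large score, white noise a score near 0.5.
    """
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = int(sample_rate / f0_min)
    if len(x) <= max_lag:
        raise ValueError("signal shorter than the longest comb lag")

    # Evaluate every filter on the same sample range so scores are comparable.
    region = range(max_lag, len(x))
    energy = sum(x[n] ** 2 for n in region)
    best_residual = min(
        sum((x[n] - x[n - lag]) ** 2 for n in region)
        for lag in range(min_lag, max_lag + 1)
    )
    return energy / (best_residual + 1e-12)


sr = 8000
# 320 Hz fundamental plus its 2nd harmonic (period = 25 samples).
voiced = [math.sin(2 * math.pi * n / 25) + 0.5 * math.sin(4 * math.pi * n / 25)
          for n in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]
```

Thresholding this score frame by frame would give a crude vocal / non-vocal mask. Like any harmonicity test, it would also flag pitched instruments, which is one reason such segmentation can be inaccurate.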
6
K & W feature extraction
- Formant locations and amplitudes are determined by a 12-pole linear predictor using the autocorrelation method
- Low-frequency resolution is increased without raising the model order by warping the frequency representation with a function approximating the Bark scale
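The sketch below shows one common way to build a Bark-warped, 12-pole linear predictor. It replaces each unit delay with a first-order all-pass section when computing the autocorrelation, then solves the result with Levinson-Durbin. The all-pass substitution and Smith and Abel's formula for the warping coefficient (`bark_warping_lambda`) are assumptions, because the slides only say the warping approximates the Bark scale. All function names are hypothetical. Extracting formant frequencies from the roots of A(z), which would also require unwarping, is not shown.

```python
import math
import random


def bark_warping_lambda(fs_khz):
    """Smith & Abel's approximation of the all-pass coefficient whose
    frequency warping best matches the Bark scale at sample rate fs_khz."""
    return 1.0674 * math.sqrt(2.0 / math.pi * math.atan(0.06583 * fs_khz)) - 0.1916


def warped_autocorrelation(x, order, lam):
    """r[0..order]: correlate x with copies passed through 1..order all-pass
    stages D(z) = (z^-1 - lam) / (1 - lam z^-1). With lam = 0, D(z) is a
    unit delay, so this reduces to ordinary autocorrelation."""
    r = [sum(v * v for v in x)]
    y = list(x)
    for _ in range(order):
        out, prev_in, prev_out = [], 0.0, 0.0
        for v in y:
            cur = -lam * v + prev_in + lam * prev_out  # all-pass difference equation
            out.append(cur)
            prev_in, prev_out = v, cur
        y = out
        r.append(sum(a * b for a, b in zip(x, y)))
    return r


def levinson_durbin(r, order):
    """Autocorrelation-method LPC: returns (a, err) with
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order and residual energy err."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def warped_lpc(frame, fs_khz, order=12):
    """Bark-warped LPC of one (already windowed) frame."""
    lam = bark_warping_lambda(fs_khz)
    return levinson_durbin(warped_autocorrelation(frame, order, lam), order)


rng = random.Random(0)
frame = [rng.uniform(-1.0, 1.0) for _ in range(256)]
coeffs, residual = warped_lpc(frame, fs_khz=16.0)
```

The warping spends more of the 12 poles below roughly 1 kHz, where the Bark scale is finer, without increasing the order.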
7
K & W classification
- A Gaussian mixture model (GMM) captures the behavior of each class
- The Gaussian parameters are estimated by Expectation Maximization (EM)
- PCA is run prior to EM (normalizes the data variance, which helps EM)
- SVMs compute the optimal hyperplane that linearly separates classes
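To illustrate the GMM/EM step, here is a deliberately simplified sketch. It uses one-dimensional features, fits one GMM per singer with EM, and assigns an unknown sample to the singer whose model gives the highest total log-likelihood. The names `fit_gmm_1d`, `log_likelihood` and `classify` are hypothetical. The PCA preprocessing and the SVM alternative from the slide are not shown, and a real system would use multi-dimensional LPC-based features.

```python
import math
import random


def _gauss(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components=2, n_iter=100, seed=0):
    """Fit a 1-D GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    means = rng.sample(list(data), n_components)
    mu = sum(data) / len(data)
    var0 = sum((x - mu) ** 2 for x in data) / len(data) or 1.0
    variances = [var0] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(n_components):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-6
            )
            weights[j] = nj / len(data)
    return weights, means, variances


def log_likelihood(data, model):
    weights, means, variances = model
    return sum(
        math.log(max(sum(w * _gauss(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in data
    )


def classify(samples, models):
    """Pick the singer whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda singer: log_likelihood(samples, models[singer]))


rng = random.Random(1)
singer_a = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(4, 1) for _ in range(100)]
singer_b = [rng.gauss(10, 1) for _ in range(100)] + [rng.gauss(14, 1) for _ in range(100)]
models = {"A": fit_gmm_1d(singer_a), "B": fit_gmm_1d(singer_b)}
```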
8
K & W results
- Testbed contained more than 200 songs by 17 solo singers
- Half used for training, half for testing
- Vocal segmentation was inaccurate (~55%)
- GMM and SVM were tested on complete songs and on vocal segments only
- Overall results fell well short of human performance
9
K & W Experimental results
10
Liu and Huang overview
- Singer classification of MP3 files
- Audio is first segmented into phonemes
- A feature vector is calculated for each phoneme and stored with its associated singer to form the training set
- These feature vectors serve as discriminators for classifying unknown MP3 music objects
11
L & H System Architecture
12
L & H segmentation features
- Phoneme segmentation is derived from a frame energy (FE) measurement computed from the polyphase filter coefficients
13
L & H phoneme database
- Phonemes are separated by a minimum in the frame energy (FE)
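The frame-energy segmentation described on the last two slides can be sketched as follows. It assumes the decoder already supplies the subband samples for each frame. The helper names and the `min_len` guard are hypothetical. The slides state only that FE is computed from the polyphase filter coefficients and that phonemes are split at FE minima.

```python
def frame_energy(subband_frames):
    """FE for each frame: sum of squared polyphase-filter (subband) coefficients.

    `subband_frames` is a list of frames, each a list of subband samples
    (e.g. the 32 subbands of an MP3 polyphase filter bank)."""
    return [sum(c * c for c in frame) for frame in subband_frames]


def segment_at_minima(fe, min_len=2):
    """Split frames into phoneme-like segments at local minima of FE.

    Returns half-open (start, end) frame ranges. `min_len` is a hypothetical
    guard against splitting on tiny energy ripples."""
    if not fe:
        return []
    boundaries = [0]
    for i in range(1, len(fe) - 1):
        is_min = fe[i] < fe[i - 1] and fe[i] <= fe[i + 1]
        if is_min and i - boundaries[-1] >= min_len:
            boundaries.append(i)
    boundaries.append(len(fe))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

For example, an FE contour of `[1, 5, 9, 5, 1, 6, 8, 2, 0.5, 4]` dips at frames 4 and 8, so it splits into three segments.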
14
L & H phoneme features
- The phoneme features are obtained directly from the MDCT coefficients
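In an MP3 decoder the MDCT coefficients come straight from the bitstream, which is what makes this shortcut attractive. So that it runs on its own, the sketch below instead computes a direct MDCT from raw samples and then reduces the coefficients to a small vector. The band log-energy feature in `phoneme_feature` is a hypothetical stand-in, because the slide does not say which feature Liu and Huang derive from the coefficients.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N (windowed) samples -> N coefficients,
    X[k] = sum_t x[t] cos(pi/N (t + 1/2 + N/2)(k + 1/2))."""
    if len(block) % 2:
        raise ValueError("MDCT block length must be even")
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, x in enumerate(block))
        for k in range(n)
    ]


def phoneme_feature(coeff_frames, n_bands=4):
    """Hypothetical phoneme feature: log energy in n_bands equal-width
    coefficient bands, averaged over the frames of one phoneme.
    Coefficients left over when N is not divisible by n_bands are ignored."""
    n = len(coeff_frames[0])
    width = n // n_bands
    feature = [0.0] * n_bands
    for coeffs in coeff_frames:
        for b in range(n_bands):
            band = coeffs[b * width:(b + 1) * width]
            feature[b] += math.log(sum(c * c for c in band) + 1e-12)
    return [f / len(coeff_frames) for f in feature]
```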
15
L & H classification (1)
- Phoneme features are compared with those in the phoneme database
- A discriminating radius (Euclidean distance) determines the uniqueness of a phoneme
- The number of neighbors by the same singer within the discriminating radius is called the frequency (w)
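The frequency (w) of each stored phoneme can be computed by brute force, as in this sketch. The slide does not state whether a phoneme counts as its own neighbor, so the hypothetical `phoneme_frequencies` below excludes it.

```python
import math


def phoneme_frequencies(database, radius):
    """database: list of (feature_vector, singer) entries.

    For each entry, count the OTHER entries by the same singer whose
    Euclidean distance is within the discriminating radius."""
    freqs = []
    for i, (vec, singer) in enumerate(database):
        count = 0
        for j, (other, other_singer) in enumerate(database):
            if i != j and other_singer == singer and math.dist(vec, other) <= radius:
                count += 1
        freqs.append(count)
    return freqs
```

The double loop is O(n²) in the database size. A spatial index (e.g. a k-d tree) would be the usual choice for a large phoneme database.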
16
L & H classification (2)
- A kNN classifier is used to guess the artist of unknown MP3 songs
- For efficiency, only the first N phonemes of the unknown MP3 are used
- The k closest neighbors in the database are found, and each may vote only if its distance is within a threshold
- Each voting neighbor casts a weighted vote that depends on its frequency (w) and its distance
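Here is a sketch of the voting step, assuming phoneme vectors and a database of (vector, singer, w) entries. The exact weighting formula is not given in the text, so the weight (1 + w) / (d + eps) used here is a hypothetical placeholder. It preserves only the stated dependence: higher frequency earns a larger vote, and greater distance earns a smaller one. The names `classify_song`, `n_first` (the slide's N) and `eps` are also assumptions.

```python
import math
from collections import defaultdict


def classify_song(query, database, k=3, threshold=1.0, n_first=None, eps=1e-6):
    """Guess the singer of an unknown song from its phoneme feature vectors.

    query    : phoneme vectors of the unknown song, in time order
    database : list of (vector, singer, w) training entries
    n_first  : use only the first N query phonemes (None = all)
    Returns the singer with the largest total vote, or None if nobody voted.
    """
    if n_first is not None:
        query = query[:n_first]
    scores = defaultdict(float)
    for q in query:
        ranked = sorted(((math.dist(q, vec), singer, w) for vec, singer, w in database),
                        key=lambda entry: entry[0])
        for d, singer, w in ranked[:k]:
            if d <= threshold:  # only neighbors inside the threshold may vote
                # HYPOTHETICAL weight: rises with frequency w, falls with distance d.
                scores[singer] += (1 + w) / (d + eps)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Returning `None` when no neighbor falls inside the threshold lets a caller treat the song as "unknown singer" rather than forcing a guess.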
17
L & H results: 3 influencing factors
- Number of neighbors (N)
- Threshold for vote decision
- Number of singers in database
18
Other works…
- Minnowmatch: MIR engine including artist classification using NN and SVM (Whitman, Flake, Lawrence (NEC))
- Quest for ground truth in musical artist similarity: determines an accurate measure of similarity given the subjective nature of artist classification (Ellis, Whitman, Berenzweig, Lawrence)