Singer Similarity Doug Van Nort MUMT 611. Goal Determine Singer / Vocalist based on extracted features of audio signal Classify audio files based on singer.

1 Singer Similarity Doug Van Nort MUMT 611

2 Goal: Determine the singer/vocalist from features extracted from the audio signal. Classify audio files by singer, for storage and retrieval.

3 Introduction: Identifying a singer is a fairly easy task for humans, regardless of musical context. It is not so easy to find parameters for automatic identification. More file sharing and larger databases lead to increased demand.

4 Introduction: Much work has been done in speech recognition, but it performs poorly for singer ID (systems are trained on speech data, with no background noise). The vocal problem has some fundamental differences: vocals exist in a variety of background noise; voiced/unvoiced content. Singer recognition is a similar problem to solo instrument identification.

5 The Players: Kim and Whitman (2002); Liu and Huang (2002).

6 Kim and Whitman: From the MIT Media Lab. Singer identification which assumes strong harmonicity from vocals; assumes pop music; instrumentation/levels within critical frequency range.

7 Two-step process: Untrained algorithm for automatic segmentation. Classification with training based on vocal segments.

8 Detection of Vocal Regions: Filter out frequencies outside the vocal range of 200-2,000 Hz (Chebyshev IIR digital filter). Detect harmonicity.

9 Filter Frequency Response

10 Filtering alone is not enough: bass and cymbals are gone, but other instruments fall within the range. Need to extract features within the vocal range to find the voice.

11 Harmonic detection: Band-limited output is sent through a bank of inverse comb filters, with the delay varied.

12 The most attenuated signal represents the strongest harmonic content. Harmonicity measure is calculated as the ratio of the signal energy to the energy of the maximally attenuated signal, which allows a threshold to be established.
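The harmonicity measure on slides 11 and 12 can be sketched as follows. This is a minimal illustration, not Kim and Whitman's implementation: the function names, the delay range, the use of mean power (so that outputs of different lengths are comparable), and the toy signals are all assumptions made for the example.

```python
import math
import random

def inverse_comb(x, delay, gain=1.0):
    """FIR inverse comb filter: y[n] = x[n] - gain * x[n - delay]."""
    return [x[n] - gain * x[n - delay] for n in range(delay, len(x))]

def mean_power(x):
    """Average energy per sample, so outputs of different lengths are comparable."""
    return sum(v * v for v in x) / len(x)

def harmonicity(x, delays, eps=1e-12):
    """Return (ratio, best_delay): input power over the power of the
    most attenuated comb output across the bank of delays."""
    best_delay = min(delays, key=lambda d: mean_power(inverse_comb(x, d)))
    min_power = mean_power(inverse_comb(x, best_delay))
    return mean_power(x) / max(min_power, eps), best_delay

# Toy signals (assumed): a harmonic tone with a 50-sample period, and white noise.
period = 50
tone = [sum(math.sin(2 * math.pi * k * n / period) / k for k in (1, 2, 3))
        for n in range(2000)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

delays = range(20, 81)  # hypothetical bank of comb delays, in samples
tone_ratio, tone_delay = harmonicity(tone, delays)
noise_ratio, noise_delay = harmonicity(noise, delays)
print(tone_delay, tone_ratio > noise_ratio)
```

A frame would then be marked as vocal when its ratio exceeds a chosen threshold; the slides do not give the threshold value, so it is left out here.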

13 Singer Identification: Linear Predictive Coding (LPC) is used to extract the location and magnitude of formants. One of two classifiers is used to identify the singer based on formant information.

14 Feature Extraction: A 12-pole linear predictor is used to find formants, using the autocorrelation method. Standard LPC treats frequencies linearly, but human sensitivity is closer to logarithmic: a warp function maps frequencies to an approximation of the Bark scale, which is further beneficial in finding the fundamental.
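As a rough sketch of the autocorrelation method mentioned on slide 14, the predictor coefficients can be obtained with the Levinson-Durbin recursion. The function names and the synthetic test signal are assumptions for illustration; the frequency warping step is not shown, and formants would be read off the peaks of the resulting spectral envelope.

```python
import cmath
import math
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations. Returns (a, err) with a[0] = 1,
    where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(x, order=12):
    """LPC coefficients of a frame via the autocorrelation method."""
    return levinson_durbin(autocorrelation(x, order), order)

def lpc_envelope(a, freq_hz, sample_rate_hz):
    """Magnitude of the all-pole envelope 1/|A(e^jw)| at one frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    return 1.0 / abs(sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a)))

# Synthetic check (assumed data): a known 2-pole process should be recovered.
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(5000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + rng.gauss(0.0, 1.0))
coeffs, residual = lpc(signal, order=2)
print(coeffs)
```

With `order=12` the same call yields the 12-pole predictor named on the slide.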

15

16 Classification Techniques: Two established pattern recognition algorithms are used: Gaussian Mixture Model (GMM) and Support Vector Machine (SVM).

17 GMM: Uses multiple weighted Gaussians to capture the behavior of each class (each vector is assumed to arise from a mixture of Gaussian distributions). Parameters for the Gaussians (mean and variance) are found via Expectation Maximization (EM). Prior to EM, Principal Component Analysis (PCA) is taken of the data, which normalizes variances and avoids highly irregular scalings which EM can produce.
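To make the EM step concrete, here is a deliberately simplified sketch: one-dimensional data, two components per class, no PCA, and one mixture fitted per singer, with classification by highest log-likelihood. The data, the initialization, and all names are assumptions; the real features are multi-dimensional LPC vectors.

```python
import math
import random

def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, model):
    weights, means, variances = model
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

def fit_gmm_1d(data, n_components=2, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    lo, hi = min(data), max(data)
    means = [lo + (hi - lo) * (j + 0.5) / n_components for j in range(n_components)]
    centre = sum(data) / len(data)
    spread = sum((x - centre) ** 2 for x in data) / len(data)
    variances = [max(spread, var_floor)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances

def log_likelihood(data, model):
    return sum(math.log(max(mixture_pdf(x, model), 1e-300)) for x in data)

def classify(data, models):
    """Pick the class whose mixture gives the observations the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(data, models[name]))

# Hypothetical 1-D "features" for two singers.
rng = random.Random(0)
singer_a = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
singer_b = [rng.gauss(10, 1) for _ in range(200)] + [rng.gauss(15, 1) for _ in range(200)]
models = {"singer_a": fit_gmm_1d(singer_a), "singer_b": fit_gmm_1d(singer_b)}
print(classify([4.8, 0.3, 5.5], models))
```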

18 SVM: Computes the optimal hyperplane to linearly separate two classes of data. Does not depend on probability estimation. Determined by a small number of data points (support vectors).
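The idea of a separating hyperplane can be illustrated with a small linear SVM. Note the substitution: this sketch trains by batch subgradient descent on the regularized hinge loss, which only approximates the maximum-margin hyperplane; it is not the solver used in the presented work. The data, the hyperparameters, and the function names are assumptions.

```python
import random

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on (lam/2)|w|^2 + mean hinge loss.
    Labels must be +1 or -1. Returns (w, b)."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge loss
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical two-class data in 2-D.
rng = random.Random(1)
points = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(20)]
points += [(rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)) for _ in range(20)]
labels = [1] * 20 + [-1] * 20
w, b = train_linear_svm(points, labels)
print(w, b)
```

Because an SVM separates two classes, a 17-singer problem needs several such classifiers combined (for example one per singer against the rest); the slides do not say which scheme was used.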

19 Experiments & Results: Testbed of 200 songs by 17 different artists/vocalists. Tracks downsampled to 11.025 kHz; the vocal range (up to 2 kHz) is still well below the Nyquist frequency (about 5.5 kHz).

20 Half of the database used for training, half for testing. Two experiments: LPC features taken from the entire song; LPC features taken from vocal segments.

21 1024 frame analysis with hop size of 2. LP analysis used both linear and warped frequency scales.

22 Results

23 Results are better than chance (~6%, i.e. 1 in 17 artists) but fall short of expected human performance. Linear frequency alone outperforms warped frequency. Oddly, using only vocal segments decreases performance for the SVM.

24 Liu and Huang: Based on an MP3 database. Particularly high demand for such an approach, given the widespread use of MPEG-1 Layer 3. The algorithm works directly on data from the MP3 decoding process.

25 Process: Coefficients of the polyphase filter are taken from the MP3 decoding process. The file is segmented into phonemes based on these coefficients. A feature vector is constructed for each phoneme and stored along with the artist name in a database. A classifier is trained on the database and used to identify unknown MP3 files.

26 Flowchart for singer similarity System of Liu/Huang

27 Phoneme Segmentation: MP3 decoding provides polyphase coefficients.

28 Energy intensity of each subband is the sum of squares of its subband coefficients. Frame energy is calculated from the polyphase coefficients.
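The energy computation on slide 28 amounts to a few lines. In this sketch the frame is represented as a list of subbands, each a list of coefficients; that layout, the function names, and the choice to take frame energy as the total over all subbands are assumptions, since the slide does not spell out the exact formula.

```python
def subband_energies(frame):
    """Energy of each subband: sum of squared polyphase coefficients in that subband."""
    return [sum(c * c for c in band) for band in frame]

def frame_energy(frame):
    """Frame energy, assumed here to be the total over all subbands."""
    return sum(subband_energies(frame))

# Hypothetical toy frame: 4 subbands x 3 coefficients
# (an MPEG-1 Layer 3 granule has 32 subbands x 18 samples).
frame = [[1.0, -1.0, 2.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0], [3.0, 0.0, -1.0]]
print(subband_energies(frame), frame_energy(frame))
```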

29 An energy gap exists between two phonemes. Segmentation looks to automatically identify this gap.
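One simple way to locate such gaps is to threshold the per-frame energy and treat each run of frames above the threshold as a phoneme. The fixed threshold and the function name are assumptions for illustration; the slides do not describe how Liu and Huang detect the gap.

```python
def segment_by_energy_gap(energies, threshold):
    """Return (start, end) frame-index pairs, end exclusive, for each run of
    consecutive frames whose energy is above the threshold."""
    segments = []
    start = None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i  # a phoneme begins
        elif e <= threshold and start is not None:
            segments.append((start, i))  # an energy gap closes the phoneme
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

# Hypothetical frame energies with two low-energy gaps.
energies = [0.1, 5.0, 6.0, 4.0, 0.2, 0.1, 7.0, 8.0, 0.1]
print(segment_by_energy_gap(energies, threshold=1.0))
```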

30 Figures: waveform of two phonemes; frame energy of two phonemes.

31 Phoneme Feature Extraction: Phoneme features are computed directly from MDCT coefficients. 576-dimensional feature vector for each frame. Phoneme feature vector of n frames.

32 Classification, setup: Create a database of phoneme feature vectors, which becomes the training set. Discriminating radius: a measure of uniqueness given by the minimum Euclidean distance between dissimilar vectors.

33 Good vs. Bad discriminators

34 Number of similar phonemes within the discriminating radius is also considered. Number of phonemes within radius = w_f = frequency of phoneme f.
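The discriminating radius and the in-radius count can be sketched as below. Here "dissimilar" is taken to mean "belonging to a different singer" and "similar" to mean "same singer"; that reading, the strict inequality, the toy vectors, and the function names are assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def discriminating_radius(i, vectors, singers):
    """Minimum distance from vector i to any vector of a different singer."""
    return min(euclidean(vectors[i], v)
               for v, s in zip(vectors, singers) if s != singers[i])

def in_radius_count(i, vectors, singers):
    """Number of other same-singer vectors strictly inside the
    discriminating radius of vector i (the w_f of the slide)."""
    radius = discriminating_radius(i, vectors, singers)
    return sum(1 for j, (v, s) in enumerate(zip(vectors, singers))
               if j != i and s == singers[i] and euclidean(vectors[i], v) < radius)

# Hypothetical 2-D phoneme vectors (real ones are much higher-dimensional).
vectors = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
singers = ["A", "A", "A", "B", "B"]
print(discriminating_radius(0, vectors, singers), in_radius_count(0, vectors, singers))
```

A phoneme with a large radius and many same-singer neighbours inside it would be a good discriminator in this picture.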

35 Discriminating ability of each phoneme depends on frequency and distance

36 Classification, in action: Unknown MP3 is segmented into phonemes (only the first N are used, for efficiency). kNN is used as the classifier: the k nearest neighbors of each of the N phonemes are found and weighted by the discriminating function; the k*N weighted “votes” are grouped by singer, and the winner is the one with the largest score.
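The voting scheme can be sketched as follows. Each database entry carries a precomputed weight standing in for the discriminating function, whose exact form is not given in the slides; the entry layout, names, and toy data are assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def identify_singer(query_phonemes, database, k, n_phonemes):
    """Weighted kNN vote. database is a list of (vector, singer, weight).
    Returns (winner, scores) using only the first n_phonemes query vectors."""
    scores = {}
    for q in query_phonemes[:n_phonemes]:
        neighbours = sorted(database, key=lambda entry: euclidean(q, entry[0]))[:k]
        for _, singer, weight in neighbours:
            scores[singer] = scores.get(singer, 0.0) + weight  # one weighted vote
    winner = max(scores, key=scores.get)
    return winner, scores

# Hypothetical database with unit weights, and three query phonemes.
database = [((0, 0), "A", 1.0), ((1, 0), "A", 1.0), ((0, 1), "A", 1.0),
            ((5, 5), "B", 1.0), ((6, 5), "B", 1.0)]
query = [(0.1, 0.1), (0.2, 0.0), (5.0, 5.1)]
winner, scores = identify_singer(query, database, k=2, n_phonemes=3)
print(winner, scores)
```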

37 Experiments/Results: 10 male and 10 female singers, 30 songs apiece: 10 for the phoneme database, 10 for training (discriminator weights), 10 for the test set.

38 Free parameters: User-defined parameters: k value; discrimination threshold; number of singers in a class.

39 Varying threshold

40 Varying k

41 Varying number of singers

42 Results for all Singers

43 Conclusion: Not much work yet strictly on singer identification. Tough because of time and background variances. Quite useful, as many people identify artists with the singer. Initial results are promising but short of human performance. See also: Minnowmatch [Whitman, Flake, Lawrence].

