Singer similarity / identification Francois Thibault MUMT 614B McGill University.

Introduction
- Relatively easy for humans to identify a singing voice in various contexts
- Difficult to find time- and environment-invariant features for robust automatic identification
- Growing demand for such systems as network databases keep expanding

Background (1)
- Significant research exists in speaker identification, but those systems perform poorly on singing voice (inadequate training)
- Singer identification research can draw much from automatic instrument recognition systems
- Artist / singer identification is much harder than song identification (it requires context-invariant features)

Background (2)
- Often builds on speech / music discrimination systems
- Acoustical features are heavily used to create an N-dimensional Euclidean space: loudness, pitch, brightness, bandwidth, harmonicity
- Often uses the same tools as style identification, because each singer corresponds to a ‘micro’ style
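As a rough illustration of the feature-space idea, the sketch below maps one audio frame to a point (loudness, brightness, bandwidth) and compares two frames by Euclidean distance. The concrete definitions are assumptions chosen for the example, not taken from the slides: RMS for loudness, spectral centroid for brightness, and spectral spread for bandwidth. The helper names (`features`, `tone`) are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def features(frame, sample_rate):
    """Return (loudness, brightness, bandwidth) for one frame.

    ASSUMED definitions: RMS, spectral centroid, spectral spread.
    """
    n = len(frame)
    loudness = math.sqrt(sum(x * x for x in frame) / n)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags) or 1.0
    brightness = sum(f * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(
        sum(((f - brightness) ** 2) * m for f, m in zip(freqs, mags)) / total)
    return (loudness, brightness, bandwidth)

def euclidean(a, b):
    """Distance between two points of the feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tone(freq, sample_rate=8000, n=256):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]

low = features(tone(250), 8000)
high = features(tone(2000), 8000)
distance = euclidean(low, high)
```

In practice the dimensions would need normalizing first, since brightness in Hz would otherwise dominate the distance.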

Kim and Whitman overview
- Segmentation of vocal regions prior to the singer identification algorithm
- Assumes singing regions display strong harmonic energy in the voice frequency range
- Band-pass filter ( Hz)
- Inverse comb filter bank to detect harmonicity
- Identification classifier uses features based on LPC
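The inverse comb filter idea can be sketched as follows. An inverse comb filter y[n] = x[n] - x[n - D] cancels a signal whose period is D samples, so a frame whose energy nearly vanishes for some delay D is strongly harmonic. This is a minimal sketch under assumptions: the band-pass stage is omitted, and the normalization, the lag range, and the name `harmonicity` are illustrative choices, not the authors' implementation.

```python
import math
import random

def harmonicity(frame, min_lag, max_lag):
    """Score close to 1 for periodic frames and close to 0 for noise.

    For each delay D in the bank, apply y[n] = x[n] - x[n - D] and
    compare the residual energy with the energy of the two inputs.
    """
    best = float("inf")
    for lag in range(min_lag, max_lag + 1):
        residual = 0.0
        reference = 0.0
        for n in range(lag, len(frame)):
            residual += (frame[n] - frame[n - lag]) ** 2
            reference += frame[n] ** 2 + frame[n - lag] ** 2
        if reference > 0.0:
            best = min(best, residual / reference)
    return 1.0 - best if best != float("inf") else 0.0

# Hypothetical test frames at an assumed 8 kHz sampling rate.
voiced = [sum(math.sin(2 * math.pi * 200 * h * t / 8000) / h for h in (1, 2, 3))
          for t in range(800)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]

voiced_score = harmonicity(voiced, 20, 80)   # lags 20..80 cover 100-400 Hz
noise_score = harmonicity(noise, 20, 80)
```

A vocal/non-vocal decision would then threshold the score per frame; that threshold would have to be tuned on real data.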

K & W feature extraction
- Determine formant location and amplitude with a 12-pole linear predictor using the autocorrelation method
- Augments low-frequency resolution without increasing the model order by warping the frequency representation with a function approximating the Bark scale
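The autocorrelation method can be sketched with the Levinson-Durbin recursion. The code below is a plain, unwarped linear predictor; the Bark-scale frequency warping mentioned above is not included. The function names and the synthetic second-order test signal are assumptions made for the example.

```python
import random

def autocorrelation(frame, max_lag):
    """r[lag] = sum over t of frame[t] * frame[t - lag]."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def lpc(frame, order=12):
    """Levinson-Durbin solution of the autocorrelation normal equations.

    Returns (a, err) with a[0] == 1, so that the prediction is
    x[n] ~ -(a[1] * x[n-1] + ... + a[order] * x[n-order]).
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent frame: nothing to predict
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Hypothetical check: a known 2-pole resonator driven by noise.
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(4000):
    signal.append(1.3 * signal[-1] - 0.8 * signal[-2] + rng.uniform(-1.0, 1.0))

a2, err2 = lpc(signal, order=2)      # should recover roughly [1, -1.3, 0.8]
a12, err12 = lpc(signal, order=12)   # the 12-pole order used in the slides
```

Formant locations and amplitudes would then be read from the roots, or the spectral peaks, of the polynomial with coefficients `a12`.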

K & W classification
- Uses a Gaussian mixture model (GMM) to capture the behavior of a class
- Parameters of the Gaussians are determined by Expectation Maximization (EM)
- PCA is run prior to EM (normalizes the data variance, good for EM)
- SVMs compute the optimal hyperplane that can linearly separate the classes
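To make the GMM/EM step concrete, here is a deliberately reduced sketch: one two-component GMM per singer, fitted by EM on a single scalar feature, with classification by highest log-likelihood. The scalar feature, the two-component limit, the initialization, and the synthetic data are all assumptions for illustration. The PCA step and the SVM alternative are not shown.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D GMM; returns (weights, means, variances)."""
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [var_all, var_all]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6)
    return weights, means, variances

def log_likelihood(samples, model):
    weights, means, variances = model
    return sum(math.log(max(sum(w * gaussian_pdf(x, m, v)
                                for w, m, v in zip(weights, means, variances)),
                            1e-300))
               for x in samples)

def classify(samples, models):
    """Pick the singer whose GMM gives the samples the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(samples, models[name]))

# Hypothetical training data: one scalar feature per frame, two singers.
rng = random.Random(0)
train_a = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
train_b = [rng.gauss(10.0, 1.0) for _ in range(200)] + [rng.gauss(15.0, 1.0) for _ in range(200)]
models = {"A": fit_gmm_1d(train_a), "B": fit_gmm_1d(train_b)}
guess = classify([0.2, 4.8, 5.1], models)
```

A real system would use multivariate Gaussians over the full LPC-derived feature vectors.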

K & W results
- Testbed contained more than 200 songs by 17 solo singers
- Half for training, half for testing
- Vocal segmentation inaccurate (~55%)
- Experiments with GMM and SVM on complete songs and on vocal parts only
- Overall results well short of human performance

K & W Experimental results

Liu and Huang overview
- Singer classification of MP3 files
- First segment the audio into phonemes
- For the training set, calculate each phoneme's feature vector and store it with the associated singer
- The stored feature vectors are then used as discriminators to classify unknown MP3 music objects

L & H System Architecture

L & H segmentation features
- Phoneme segmentation is derived from the polyphase filter coefficients by obtaining a frame energy (FE) measurement

L & H phoneme database
- Phonemes are separated by a minimum in FE
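Segmentation at frame-energy minima can be sketched as below. As an assumption for the example, FE is computed from time-domain samples rather than from the MP3 polyphase filter coefficients used in the slides. The function names and the toy signal are hypothetical.

```python
def frame_energy(samples, frame_len):
    """Energy of consecutive, non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def boundaries(energies):
    """Indices of frames that are local minima of the energy curve."""
    return [i for i in range(1, len(energies) - 1)
            if energies[i] < energies[i - 1] and energies[i] <= energies[i + 1]]

def segments(energies):
    """(start_frame, end_frame) pairs, split at each energy minimum."""
    cuts = [0] + boundaries(energies) + [len(energies)]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]

# Hypothetical signal: per-frame amplitudes with two dips.
amplitudes = [1.0, 0.8, 0.1, 0.9, 1.0, 0.2, 0.7]
frame_len = 4
samples = [amp * (1.0 if t % 2 == 0 else -1.0)
           for amp in amplitudes for t in range(frame_len)]

energies = frame_energy(samples, frame_len)
cut_frames = boundaries(energies)      # frames 2 and 5 are the energy dips
phoneme_spans = segments(energies)
```

Real recordings would likely need smoothing of the energy curve first, so that small fluctuations do not create spurious boundaries.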

L & H phoneme features
- The phoneme features are obtained directly from the MDCT coefficients

L & H classification (1)
- Compares phoneme features with those in the phoneme database
- The discriminating radius (a Euclidean distance) determines the uniqueness of a phoneme
- The number of neighbors by the same singer within the discriminating radius is called the frequency (w)
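The two quantities can be sketched as follows. The slides do not say how the discriminating radius is obtained, so this example assumes it is the distance from a phoneme to the nearest phoneme sung by a different singer. That definition, the function names, and the toy database are hypothetical.

```python
import math

def discriminating_radius(index, database):
    """ASSUMED definition: distance to the nearest phoneme of another singer."""
    vec, singer = database[index]
    others = [math.dist(vec, v) for v, s in database if s != singer]
    return min(others) if others else float("inf")

def frequency(index, database):
    """Number of same-singer phonemes strictly inside the discriminating radius."""
    vec, singer = database[index]
    radius = discriminating_radius(index, database)
    return sum(1 for j, (v, s) in enumerate(database)
               if j != index and s == singer and math.dist(vec, v) < radius)

# Hypothetical database of (feature_vector, singer) pairs.
database = [
    ((0.0, 0.0), "A"),
    ((1.0, 0.0), "A"),
    ((0.0, 1.0), "A"),
    ((5.0, 0.0), "A"),
    ((3.0, 0.0), "B"),
    ((9.0, 9.0), "B"),
]

radius_0 = discriminating_radius(0, database)   # nearest "B" phoneme is 3.0 away
w_0 = frequency(0, database)                    # two "A" phonemes lie within 3.0
w_4 = frequency(4, database)                    # no other "B" phoneme within 2.0
```

Under this reading, a high w marks a phoneme that is typical of its singer and well separated from the others.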

L & H classification (2)
- kNN classifier used to guess the artist of unknown MP3 songs
- For efficiency, only the first N phonemes of the unknown MP3 are used
- Find the k closest neighbors in the database and allow each to vote if its distance is within a threshold
- Each neighbor gives a weighted vote that depends on its frequency w and its distance
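A minimal sketch of the voting scheme follows. The exact weighting formula is not preserved in the slides, so the weight `w / (1 + distance)` used here is purely an assumed stand-in that grows with frequency and shrinks with distance. The function name, parameter defaults, and toy data are hypothetical as well.

```python
import math
from collections import defaultdict

def classify(unknown_phonemes, database, k=3, threshold=2.0, first_n=10):
    """Weighted kNN vote over the first `first_n` phonemes of an unknown song.

    database entries are (feature_vector, singer, frequency_w).
    Returns the singer with the largest total vote, or None if nobody voted.
    """
    votes = defaultdict(float)
    for query in unknown_phonemes[:first_n]:
        ranked = sorted(database, key=lambda entry: math.dist(query, entry[0]))
        for vec, singer, w in ranked[:k]:
            distance = math.dist(query, vec)
            if distance <= threshold:
                # ASSUMED weighting, not the formula from the slides.
                votes[singer] += w / (1.0 + distance)
    return max(votes, key=votes.get) if votes else None

# Hypothetical phoneme database and unknown song.
database = [
    ((0.0, 0.0), "A", 3),
    ((0.5, 0.0), "A", 2),
    ((1.0, 1.0), "B", 1),
    ((10.0, 10.0), "B", 4),
]
unknown = [(0.1, 0.1), (0.4, 0.0)]

winner = classify(unknown, database)
no_match = classify([(50.0, 50.0)], database)   # every neighbor beyond the threshold
```

The three tunable values here (`k`, `threshold`, `first_n`) trade accuracy against speed and would need tuning on a real database.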

L & H results
- 3 influencing factors:
- Number of neighbors (N)
- Threshold for vote decision
- Number of singers in database

Other works…
- Minnowmatch: MIR engine including artist classification using NN and SVM (Whitman, Flake, Lawrence (NEC))
- Quest for ground truth in musical artist similarity: determining an accurate measure of similarity given the subjective nature of artist classification (Ellis, Whitman, Berenzweig, Lawrence)