Singer Similarity: A Brief Literature Review
Catherine Lai
MUMT-611 MIR
March 24, 2005


Outline of Presentation
Introduction
–Motivation
–Related research
Recent publications
–Kim & Whitman, 2002
–Liu & Huang, 2002
–Tsai, Wang, Rodgers, Cheng & Yu, 2003
–Bartsch & Wakefield, 2004
Discussion
Conclusion

Introduction
Motivation
–Multitude of audio files circulating on the Internet
–Replace human documentation efforts and organize collections of music recordings automatically
–Singer identification is relatively easy for humans but not for machines
Related Research
–Speaker identification
–Musical instrument identification

Kim & Whitman, “Singer Identification in Popular Music Recordings Using Voice Coding Features” (MIT Media Lab)
Automatically establish the identity of the singer using acoustic features extracted from songs in a DB of pop music
–Segmentation of the vocal regions is performed prior to singer identification
–The classifier uses features drawn from voice coding based on Linear Predictive Coding (LPC)
 –Good at highlighting formant locations
 –Regions of resonance are perceptually significant

Kim & Whitman, Detection of Vocal Region
Regions of singing are detected from the energy within the frequency band occupied by the voice
–Filter the audio signal with a band-pass filter
–A Chebyshev IIR digital filter of order 12 was used
–Attenuates instruments that fall outside the vocal range, e.g. bass and cymbals
The voice is not the only instrument remaining in that band
–Other sounds, e.g. drums, are discriminated using a measure of harmonicity
 –A vocal segment is > 90% voiced and therefore highly harmonic
–The harmonicity of the filtered signal is measured within each analysis frame and thresholded against a fixed value
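The detection scheme above can be sketched as follows. The paper specifies only an order-12 Chebyshev IIR band-pass filter; the band edges (200–2000 Hz), the Type I variant, the ripple value, the autocorrelation-based harmonicity measure, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import cheby1, sosfilt

def bandpass_vocal(x, sr, lo=200.0, hi=2000.0, order=12, ripple_db=1.0):
    # Order-12 Chebyshev (Type I assumed) band-pass restricted to an
    # assumed vocal band; second-order sections for numerical stability.
    sos = cheby1(order, ripple_db, [lo, hi], btype="bandpass",
                 fs=sr, output="sos")
    return sosfilt(sos, x)

def is_vocal_frame(frame, sr, f0_lo=80.0, f0_hi=1000.0, threshold=0.5):
    # Crude harmonicity: peak of the normalized autocorrelation within a
    # plausible pitch-lag range, thresholded against a fixed value
    # (threshold=0.5 is an assumption, not the paper's value).
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False
    ac = ac / ac[0]
    lo_lag, hi_lag = int(sr / f0_hi), int(sr / f0_lo)
    return float(ac[lo_lag:hi_lag].max()) > threshold
```

A strongly periodic (voiced) frame scores near 1 on this measure, while percussive or noisy frames score low, which is the property the drum/voice discrimination relies on.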

Kim & Whitman, Feature Extraction
Features are extracted with 12-pole LP analysis, based on the general principle behind LPC for speech
LP analysis is performed on both linear and warped frequency scales
–The linear scale treats all frequencies equally
–Human ears are not equally sensitive to all frequencies
–The warping function closely approximates the Bark scale, which models the frequency sensitivity of human hearing
–The warped function is better at capturing formant locations at lower frequencies
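A minimal sketch of the linear-scale 12-pole LP analysis, using the autocorrelation method with the Levinson-Durbin recursion; the Hamming window is an assumed choice, and the Bark-warped variant (which replaces the unit delays with first-order all-pass sections) is not shown.

```python
import numpy as np

def lpc(frame, order=12):
    """Autocorrelation-method LP analysis via Levinson-Durbin.
    Returns A(z) coefficients [1, a1, ..., ap]; the prediction is
    x[n] ~= -(a1*x[n-1] + ... + ap*x[n-p])."""
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]   # order-(i-1) prediction error
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k                    # shrink residual energy
    return a
```

The resulting all-pole spectrum peaks at the formants, which is why these features highlight the resonance regions the slide mentions.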

Kim & Whitman, Experiments
Data set includes 17 different singers and > 200 songs
Two classifiers, a Gaussian Mixture Model (GMM) and an SVM, were used on 3 different feature sets
–Linear-scaled, warped-scaled, and combined linear and warped data
Run on entire songs and on the segments classified as vocal only
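The two classifier setups can be sketched as below on synthetic feature matrices. The `X_by_singer` mapping, the component count, and the song-level decision rules (highest average frame log-likelihood for the GMMs, majority vote over frame predictions for the SVM) are my assumptions for illustration, not details from the paper.

```python
import numpy as np
from collections import Counter
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def train_gmms(X_by_singer, n_components=8):
    # One generative GMM per singer over that singer's feature frames
    # (n_components=8 is an assumed model size).
    return {s: GaussianMixture(n_components, random_state=0).fit(X)
            for s, X in X_by_singer.items()}

def gmm_classify_song(gmms, frames):
    # Song-level decision: the singer whose GMM gives the highest
    # average frame log-likelihood.
    return max(gmms, key=lambda s: gmms[s].score(frames))

def train_svm(X_by_singer):
    # A single discriminative frame classifier over all singers.
    X = np.vstack(list(X_by_singer.values()))
    y = np.concatenate([[s] * len(v) for s, v in X_by_singer.items()])
    return SVC().fit(X, y)

def svm_classify_song(svm, frames):
    # Majority vote over per-frame predictions.
    return Counter(svm.predict(frames)).most_common(1)[0][0]
```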

Kim & Whitman, Results
Linear frequency features tend to outperform warped frequency features when each is used alone; combined, they perform best
Song and frame accuracy increases when using only vocal segments with the GMM
Song and frame accuracy decreases when using only vocal segments with the SVM
(Kim & Whitman, 2002)

Kim & Whitman, Discussion and Future Work
The better performance of linear-scale features vs. warped-scale features indicates that
–the machine finds the increased resolution of the linear scale at higher frequencies useful
–contrary to the human auditory system
The decreased performance of the SVM is puzzling
–The SVM may be exploiting aspects of the features not specifically related to the voice
Add high-level musical knowledge to the system
–Attempt to identify song structure, e.g. locate verses or choruses
–Vocals have a higher probability of occurring in these sections

Liu & Huang, “A Singer Identification Technique for Content-Based Classification of MP3 Music Objects”
Automatically classify MP3 music objects according to their singers
Major steps:
–Coefficients extracted from the compressed raw data are used to compute the MP3 features for segmentation
–These features are used to segment MP3 objects into a sequence of notes or phonemes
 (Figure: waveform of two phonemes)
–For each MP3 phoneme in the training set, its MP3 features are extracted and stored with its associated singer in a phoneme DB
–Phonemes in the phoneme DB are used as discriminators in an MP3 classifier to identify the singers of unknown MP3 objects
(Liu & Huang, 2002)

Liu & Huang, Classification
The number of different phonemes a singer can sing is limited, and singers with different timbres possess unique phoneme sets
Phonemes of an unknown MP3 song can therefore be associated with similar phonemes of the same singer in the phoneme DB
A kNN classifier is used for classification
–Each unknown MP3 song is first segmented into phonemes
–The first N phonemes are used and compared with every discriminator in the phoneme DB
–The k closest neighbors are found
For each of the k closest neighbors,
–if its distance is within a threshold, a weighted vote is given
–k*N weighted votes are accumulated per singer
–the unknown MP3 song is assigned to the singer with the largest score
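The voting scheme reads as follows in sketch form; since the slide does not specify the distance measure or the weighting function, the Euclidean distance and the 1/(1+d) vote weight here are stand-ins.

```python
import numpy as np

def classify_knn_weighted(query_phonemes, db_feats, db_singers,
                          k=80, threshold=0.2):
    # db_feats: (M, d) phoneme discriminators; db_singers: M labels.
    # k=80 and threshold=0.2 are the best-performing values the
    # slides report; the distance and weight are assumptions.
    scores = {}
    for q in query_phonemes:                      # first N phonemes of the song
        d = np.linalg.norm(db_feats - q, axis=1)  # distance to every discriminator
        for i in np.argsort(d)[:k]:               # k nearest neighbors
            if d[i] <= threshold:                 # vote only within the threshold
                w = 1.0 / (1.0 + d[i])            # closer neighbor, bigger vote
                scores[db_singers[i]] = scores.get(db_singers[i], 0.0) + w
    return max(scores, key=scores.get) if scores else None
```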

Liu & Huang, Experiments
Data set consists of 10 male and 10 female Chinese singers, each with 30 songs
3 factors dominate the results of the MP3 music classification method
–Setting of k in the kNN classifier (best k = 80, yielding a 90% precision rate)
–Threshold for the vote decision used by the discriminator (best threshold = 0.2)
–Number of singers allowed in a music class (the larger the number, the higher the precision)
 –More than one singer may be allowed in a musical class
 –Grouping several singers with similar voices provides the ability to find songs by singers with similar voices

Liu & Huang, Results and Future Work
Results were within expectation
–Songs sung by singers with very distinctive styles yielded the highest precision rates (> 90%)
–Songs sung by singers with common voices yielded precision rates of only 50%
Future work will use more music features
–Pitch, melody, rhythm, and harmonicity for music classification
–Representing MP3 features according to the syntax and semantics of the MPEG-7 standard

Tsai et al., “Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics” (ISMIR)
A technique for automatically clustering undocumented music recordings by their singers, given neither singer identities nor the size of the singer population
The clustering is based on the singer’s voice rather than the background music, genre, or other factors
A 3-stage process is proposed:
–Segment each recording into vocal/non-vocal segments
–Suppress the characteristics of the background music in the vocal segments
–Cluster the recordings based on the similarity of singer characteristics

Tsai et al., Classification
Classifier for vocal/non-vocal segmentation
–A front-end signal processor converts the digital waveform into spectrum-based feature vectors
–A back-end statistical processor performs modeling, matching, and decision making

Tsai et al., Classification
The classifier operates in 2 phases: training and testing
–During the training phase, a music DB with manual vocal/non-vocal transcriptions is used to form two separate GMMs: a vocal GMM and a non-vocal GMM
–In the testing phase, the recognizer takes as input feature vectors extracted from an unknown recording and produces as output the frame log-likelihoods for the vocal GMM and the non-vocal GMM
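The two-GMM segmenter can be sketched as below; the component count and the synthetic features are illustrative (real front-end vectors would be spectrum-based features such as cepstral coefficients).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_segmenter(vocal_frames, nonvocal_frames, n_components=16):
    # Fit one GMM per class from manually transcribed training data
    # (n_components=16 is an assumed model size).
    gv = GaussianMixture(n_components, random_state=0).fit(vocal_frames)
    gn = GaussianMixture(n_components, random_state=0).fit(nonvocal_frames)
    return gv, gn

def frame_log_likelihood_ratio(gv, gn, frames):
    # Per-frame log-likelihood difference; positive values favor the
    # vocal GMM, negative values the non-vocal GMM.
    return gv.score_samples(frames) - gn.score_samples(frames)
```

The decision rules on the next slide then turn these per-frame ratios into vocal/non-vocal labels.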

Tsai et al., Classification
(Figure: block diagram of the classifier; Tsai, 2003)

Tsai et al., Decision Rules
The decision for each frame is made according to one of three decision rules:
1. frame-based,
2. fixed-length-segment-based, and
3. homogeneous-segment-based
The segment-based rules assign a single classification per segment (Tsai, 2003)
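Rules 1 and 2 can be sketched directly on the per-frame log-likelihood ratios; rule 3 additionally needs a change detector to locate the homogeneous segment boundaries, which is omitted here. The segment length and the sign/mean decision statistics are assumptions.

```python
import numpy as np

def frame_decision(llr):
    # Rule 1: classify each frame independently by the sign of its
    # vocal-vs-non-vocal log-likelihood ratio.
    return llr > 0

def fixed_segment_decision(llr, seg_len=50):
    # Rule 2: one decision per fixed-length segment (seg_len frames is
    # an assumed value), by the sign of the mean log-likelihood ratio;
    # rule 3 would apply the same statistic to detected homogeneous
    # segments instead of fixed windows.
    out = np.empty_like(llr, dtype=bool)
    for s in range(0, len(llr), seg_len):
        out[s:s + seg_len] = llr[s:s + seg_len].mean() > 0
    return out
```

Segment-level averaging smooths out isolated frames whose ratio disagrees with their neighborhood, which is why the segment-based rules tend to be more robust.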

Tsai et al., Singer Characteristic Modeling
The characteristics of the voice must be modeled to cluster the recordings
–Let V = {v1, v2, v3, …} be the feature vectors from a vocal region; V is a mixture of
 –solo feature vectors S = {s1, s2, s3, …}
 –background accompaniment feature vectors B = {b1, b2, b3, …}
S and B are unobservable
–B can be approximated from the non-vocal segments
–S is subsequently estimated given V and B
A solo model and a background music model are generated for each recording to be clustered

Tsai et al., Clustering
Each recording is evaluated against each singer’s solo model
–The log-likelihood of the vocal portion of one recording tested against one solo model is computed (for all solo models)
A K-means algorithm is used for clustering
–Starts with a single cluster and recursively splits clusters
–The Bayesian Information Criterion (BIC) is employed to decide the best value of k

Tsai et al., Experiments
Data set consists of 416 tracks from Mandarin pop music CDs
Experiments were run to validate the vocal/non-vocal segmentation method
–The best accuracy achieved was 78%, using the homogeneous-segment-based method

Tsai et al., Results
The system was evaluated on the basis of average cluster purity
–When k equals the singer population, the purity is highest (Tsai, 2003)

Tsai et al., Future Work
Test the method on a wider variety of data
–A larger singer population
–A richer set of songs across different genres

Discussion and Conclusion
Singer similarity techniques can be used to
–automatically organize a collection of music recordings based on the lead singer
–label guest performers, information usually omitted in music databases
–replace human documentation efforts
Extend to handle duets, choruses, background vocals, and other musical data with multiple simultaneous or non-simultaneous singers
–In rock band songs with parts sung by the guitarist or drummer, those band members can be identified

Bibliography
Bartsch, M., and G. Wakefield (2004). Singing voice identification using spectral envelope estimation. IEEE Transactions on Speech and Audio Processing 12 (2).
Kim, Y., and B. Whitman (2002). Singer identification in popular music recordings using voice coding features. In Proceedings of the 2002 International Symposium on Music Information Retrieval.
Liu, C., and C. Huang (2002). A singer identification technique for content-based classification of MP3 music objects. In Proceedings of the 2002 Conference on Information and Knowledge Management (CIKM).
Tsai, W., H. Wang, D. Rodgers, S. Cheng, and H. Yu (2003). Blind clustering of popular music recordings based on singer voice characteristics. In Proceedings of the 2003 International Symposium on Music Information Retrieval.