Singer Identification and Clustering of Popular Music Recordings
Wei-Ho Tsai, Institute of Information Science, Academia Sinica

Extracting Information From Music
– Music Information Retrieval (MIR): developing ways of managing collections of musical material for preservation, access, research, and other uses.
– MIR communities and research areas [after Futrelle & Downie, 2002]

Extracting Voice Information From Music
Viewing MIR from a speech-processing perspective

Singer Recognition Tasks (I)
Singer identification
– Determining who is singing

Singer Recognition Tasks (II)
Singer detection
– Determining whether or not a specified singer is present in a music recording

Singer Recognition Tasks (III)
Singer tracking
– Locating where in a music recording a specified singer is present

Singer Recognition Tasks (IV)
Singer clustering
– Grouping recordings performed by the same singer into one cluster

Potential Applications
Indexing
– Finding cameos or guest appearances in live concert recordings
– Identifying the singers in a movie’s musical interludes
Music recommendation systems
– Suggesting music by singers with similar voices
Karaoke services
– Efficiently organizing customers’ recordings
– Personalization
Copyright protection
– Distinguishing between an original song and a cover version
– Rapidly scanning suspect websites for piracy

Singer’s Vocal Characteristics
Humans use several levels of perceptual cues to distinguish among singers.

Major Challenges in Singer Recognition
The vast majority of popular music contains background accompaniment during most or all vocal passages.
– It is therefore infeasible to acquire isolated solo-voice data from which to extract a singer’s vocal characteristics.
The proposed solution: vocal segment detection followed by solo vocal signal modeling.

Vocal/Non-vocal Segmentation
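
A minimal sketch of one way this segmentation step could be realized. It assumes frame-level log-likelihoods from a vocal GMM and a non-vocal GMM are already available (the non-vocal GMM corresponds to the one trained on instrumental-only data in the SID experiments; the vocal GMM is an assumption), and that decisions are smoothed by averaging the log-likelihood ratio over fixed-length segments. The function names, the segment length, and the zero threshold are hypothetical choices for illustration, not details taken from the talk. `frame_accuracy` computes the kind of frame-accuracy figure reported later for segmentation.

```python
from typing import List, Sequence


def segment_vocal_labels(
    vocal_ll: Sequence[float],
    nonvocal_ll: Sequence[float],
    seg_len: int = 3,
    threshold: float = 0.0,
) -> List[bool]:
    """Label frames vocal (True) or non-vocal (False).

    vocal_ll[t] / nonvocal_ll[t] are per-frame log-likelihoods under a vocal
    and a non-vocal GMM (hypothetical inputs). Frames are grouped into
    fixed-length segments; every frame in a segment gets the label chosen by
    the segment's average log-likelihood ratio.
    """
    if len(vocal_ll) != len(nonvocal_ll):
        raise ValueError("score sequences must have equal length")
    if seg_len < 1:
        raise ValueError("seg_len must be positive")
    labels: List[bool] = []
    for start in range(0, len(vocal_ll), seg_len):
        v = vocal_ll[start:start + seg_len]
        n = nonvocal_ll[start:start + seg_len]
        avg_llr = sum(a - b for a, b in zip(v, n)) / len(v)
        labels.extend([avg_llr > threshold] * len(v))
    return labels


def frame_accuracy(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Fraction of frames whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

Averaging over segments trades temporal resolution for robustness: a single frame with a spuriously high vocal score cannot flip its whole segment on its own.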

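Gaussian Mixture Model (I)
Model description
– The distribution of the feature vector $\mathbf{x}$ is represented by a mixture of $M$ component Gaussian densities, i.e.,
$$p(\mathbf{x}\mid\lambda)=\sum_{i=1}^{M} w_i\,\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i),\qquad \sum_{i=1}^{M} w_i=1,$$
where
$$\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)=\frac{1}{(2\pi)^{D/2}\lvert\boldsymbol{\Sigma}_i\rvert^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right)$$
is the $i$-th Gaussian density with mean $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$, and $D$ is the dimension of $\mathbf{x}$.
– A Gaussian mixture model (GMM) is characterized by $\lambda=\{w_i,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i\}_{i=1}^{M}$.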
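
To make the mixture density concrete, here is a small sketch that evaluates $\log p(\mathbf{x}\mid\lambda)$ directly from the definition. It assumes diagonal covariance matrices (a common simplification in speaker and singer modeling, not something the slide states), represents each covariance by its list of variances, and uses a log-sum-exp to avoid underflow. All function names are illustrative.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for a diagonal covariance matrix."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + quad)


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(x, weights, means, variances) -> float:
    """log p(x | lambda) = log sum_i w_i N(x; mu_i, Sigma_i)."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    return log_sum_exp(terms)


def average_log_likelihood(frames, weights, means, variances) -> float:
    """Average per-frame log-likelihood of a sequence of feature vectors."""
    return sum(gmm_log_likelihood(x, weights, means, variances) for x in frames) / len(frames)
```

For example, a single standard-normal component evaluated at the origin gives $-\tfrac{1}{2}\log 2\pi$.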

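Gaussian Mixture Model (II)
Parameter estimation
– Using the EM algorithm, an initial model $\lambda$ is created, and a new model $\bar\lambda$ is then estimated from the training vectors $X=\{\mathbf{x}_1,\dots,\mathbf{x}_T\}$ by maximizing the auxiliary function
$$Q(\lambda,\bar\lambda)=\sum_{t=1}^{T}\sum_{i=1}^{M} p(i\mid\mathbf{x}_t,\lambda)\,\log\!\left[\bar w_i\,\mathcal{N}(\mathbf{x}_t;\bar{\boldsymbol{\mu}}_i,\bar{\boldsymbol{\Sigma}}_i)\right],$$
where
$$p(i\mid\mathbf{x}_t,\lambda)=\frac{w_i\,\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)}{\sum_{j=1}^{M} w_j\,\mathcal{N}(\mathbf{x}_t;\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j)}.$$
– Letting $\partial Q/\partial\theta=0$ for each parameter $\theta$ to be re-estimated (with $\sum_i \bar w_i=1$ enforced by a Lagrange multiplier), we have
$$\bar w_i=\frac{1}{T}\sum_{t=1}^{T} p(i\mid\mathbf{x}_t,\lambda),\qquad
\bar{\boldsymbol{\mu}}_i=\frac{\sum_{t} p(i\mid\mathbf{x}_t,\lambda)\,\mathbf{x}_t}{\sum_{t} p(i\mid\mathbf{x}_t,\lambda)},\qquad
\bar{\boldsymbol{\Sigma}}_i=\frac{\sum_{t} p(i\mid\mathbf{x}_t,\lambda)\,(\mathbf{x}_t-\bar{\boldsymbol{\mu}}_i)(\mathbf{x}_t-\bar{\boldsymbol{\mu}}_i)^{\top}}{\sum_{t} p(i\mid\mathbf{x}_t,\lambda)}.$$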
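
The EM iteration can be illustrated on one-dimensional data. This sketch implements the standard E-step (component posteriors) and the closed-form M-step (weights, means, variances) for a scalar GMM. The variance floor is an extra safeguard added here, not part of the slide, and the names are illustrative. Each iteration should never decrease the training log-likelihood.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data: Sequence[float], weights: List[float], means: List[float],
            variances: List[float], var_floor: float = 1e-6
            ) -> Tuple[List[float], List[float], List[float]]:
    """One EM iteration for a 1-D GMM; returns re-estimated (weights, means, variances)."""
    # E-step: gamma[t][i] = p(i | x_t, lambda)
    gamma = []
    for x in data:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        gamma.append([j / total for j in joint])
    # M-step: closed-form maximizers of the auxiliary function Q
    T = len(data)
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        n_i = sum(g[i] for g in gamma)
        mu = sum(g[i] * x for g, x in zip(gamma, data)) / n_i
        var = sum(g[i] * (x - mu) ** 2 for g, x in zip(gamma, data)) / n_i
        new_w.append(n_i / T)
        new_m.append(mu)
        new_v.append(max(var, var_floor))  # guard against collapsing components
    return new_w, new_m, new_v


def log_likelihood(data: Sequence[float], weights, means, variances) -> float:
    """Total log-likelihood of the data under the 1-D GMM."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)
```

Starting from means of -1 and 1 on data clustered around -2 and 2, repeated calls to `em_step` move the means toward the two cluster centers while the log-likelihood rises monotonically.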

Distilling Singers’ Voices From Music
Substantial similarities exist between the instrumental (non-vocal) regions of a recording and the accompaniment underlying its vocal regions. The solo voice can therefore be modeled by suppressing the background music, whose characteristics are estimated from the instrumental regions.

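Solo Vocal Signal Modeling (I)
Model description
– The vocal-segment features $V=\{\mathbf{v}_1,\dots,\mathbf{v}_T\}$ combine the solo voice $S=\{\mathbf{s}_t\}$ and the background music $B=\{\mathbf{b}_t\}$, modeled by GMMs $\lambda_s$ and $\lambda_b$, respectively.
– $\lambda_b$ can be approximately estimated using the instrumental regions of the music.
– Our aim is to find an optimal $\lambda_s$ such that (in the maximum-likelihood sense)
$$\lambda_s^{*}=\arg\max_{\lambda_s}\; p(V\mid\lambda_s,\lambda_b).$$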

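Solo Vocal Signal Modeling (II)
Parameter estimation
– Define an EM auxiliary function in which each frame $\mathbf{v}_t$ is generated by a hidden pair of components: solo-voice component $i$ of $\lambda_s$ and background component $j$ of $\lambda_b$.
– Letting the derivative of this auxiliary function with respect to each solo-voice parameter equal zero, with $\lambda_b$ held fixed, yields the re-estimation formulas for $\lambda_s$. Their explicit form depends on how $S$ and $B$ combine in the chosen feature domain.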

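Solo Vocal Signal Modeling (III)
Re-estimation formulas for linear spectral features
– Suppose $V$ is a linear spectral feature and $S$ and $B$ are additive in the time domain; then $\mathbf{v}_t=\mathbf{s}_t+\mathbf{b}_t$.
– The density of $\mathbf{v}_t$ given solo component $i$ and background component $j$ is the convolution of the solo and background music densities, i.e.,
$$p(\mathbf{v}_t\mid i,j,\lambda_s,\lambda_b)=\mathcal{N}(\mathbf{v}_t;\boldsymbol{\mu}_{s,i}+\boldsymbol{\mu}_{b,j},\,\boldsymbol{\Sigma}_{s,i}+\boldsymbol{\Sigma}_{b,j}),$$
because the sum of two independent Gaussian vectors is Gaussian with summed means and covariances.
– The re-estimated solo-voice means $\bar{\boldsymbol{\mu}}_{s,i}$ and covariances $\bar{\boldsymbol{\Sigma}}_{s,i}$ can then be expressed in closed form.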
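
The convolution result above is enough to compute the E-step quantities for the linear-spectral case: the frame likelihood $p(\mathbf{v}_t\mid\lambda_s,\lambda_b)=\sum_i\sum_j w_{s,i}\,w_{b,j}\,\mathcal{N}(\mathbf{v}_t;\boldsymbol{\mu}_{s,i}+\boldsymbol{\mu}_{b,j},\boldsymbol{\Sigma}_{s,i}+\boldsymbol{\Sigma}_{b,j})$ and the pair posteriors $p(i,j\mid\mathbf{v}_t)$. This sketch assumes diagonal covariances and independence of voice and accompaniment. It does not reproduce the talk's M-step formulas, and the tuple layout `(weights, means, variances)` and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A diagonal-covariance GMM as (weights, means, variances); hypothetical layout.
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + sum(math.log(v) for v in var) + quad)


def pair_log_terms(v: Sequence[float], solo: GMM, background: GMM) -> Dict[Tuple[int, int], float]:
    """log[w_s,i * w_b,j * N(v; mu_s,i + mu_b,j, Sigma_s,i + Sigma_b,j)] for every pair (i, j)."""
    ws, ms, vs = solo
    wb, mb, vb = background
    terms = {}
    for i, (w_i, mu_i, var_i) in enumerate(zip(ws, ms, vs)):
        for j, (w_j, mu_j, var_j) in enumerate(zip(wb, mb, vb)):
            mean = [a + b for a, b in zip(mu_i, mu_j)]  # means add under convolution
            var = [a + b for a, b in zip(var_i, var_j)]  # so do (diagonal) covariances
            terms[(i, j)] = math.log(w_i) + math.log(w_j) + log_gauss_diag(v, mean, var)
    return terms


def frame_log_likelihood_and_posteriors(v, solo: GMM, background: GMM):
    """Return log p(v | lambda_s, lambda_b) and the posteriors p(i, j | v)."""
    terms = pair_log_terms(v, solo, background)
    m = max(terms.values())
    log_p = m + math.log(sum(math.exp(t - m) for t in terms.values()))
    posteriors = {k: math.exp(t - log_p) for k, t in terms.items()}
    return log_p, posteriors
```

Because $\lambda_b$ is fixed, only the solo-voice side of each pair would be updated in the subsequent M-step.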

Solo Vocal Signal Modeling (IV)
Re-estimation formulas for cepstral features
– Suppose $V$ is a cepstral feature and $S$ and $B$ are additive in the time domain; then $v_t=\log[\exp(s_t)+\exp(b_t)]$. We approximate $v_t\approx\max(s_t,b_t)$.
– Under this approximation, closed-form re-estimation formulas for the solo-voice parameters can also be derived.
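
How good is the approximation $\log[\exp(s_t)+\exp(b_t)]\approx\max(s_t,b_t)$? Writing $\log(e^{s}+e^{b})=\max(s,b)+\log\!\left(1+e^{-\lvert s-b\rvert}\right)$ shows the error always lies in $[0,\log 2]$: it is largest when voice and accompaniment are equally strong and vanishes when one dominates. The sketch below checks this numerically. Applying the max channel by channel to a feature frame is an assumption made here for illustration; the function names are illustrative.

```python
import math
from typing import List, Sequence


def log_add(s: float, b: float) -> float:
    """Exact log(exp(s) + exp(b)), computed stably."""
    m = max(s, b)
    return m + math.log1p(math.exp(-abs(s - b)))


def max_approx_error(s: float, b: float) -> float:
    """Error of log(exp(s) + exp(b)) ~= max(s, b); always in [0, log 2]."""
    return log_add(s, b) - max(s, b)


def combine_frame(s: Sequence[float], b: Sequence[float]) -> List[float]:
    """Apply the max approximation channel by channel to one feature frame."""
    return [max(si, bi) for si, bi in zip(s, b)]
```

With equal levels the error is exactly $\log 2\approx 0.693$; with a 10-unit difference it drops below $10^{-4}$.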

Singer Identification (SID)
Block diagram
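
A minimal sketch of a maximum-likelihood decision stage for GMM-based singer identification, assuming (not confirmed by the transcript) that each test recording is scored against every singer's solo-voice model and only frames labeled vocal by the segmentation step contribute. The input layout and the function name `identify_singer` are hypothetical.

```python
from typing import Dict, Sequence


def identify_singer(frame_scores: Dict[str, Sequence[float]], vocal_mask: Sequence[bool]) -> str:
    """Return the singer whose model gives the highest average log-likelihood on vocal frames.

    frame_scores[singer][t] is the log-likelihood of frame t under that singer's
    model (hypothetical input); vocal_mask[t] is True for frames labeled vocal.
    """
    vocal_idx = [t for t, is_vocal in enumerate(vocal_mask) if is_vocal]
    if not vocal_idx:
        raise ValueError("no vocal frames to score")

    def avg_score(singer: str) -> float:
        scores = frame_scores[singer]
        return sum(scores[t] for t in vocal_idx) / len(vocal_idx)

    return max(frame_scores, key=avg_score)
```

Restricting the score to vocal frames matters: an instrumental frame that happens to fit one singer's model well can otherwise swing the decision.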

SID Experiments
Music data
– 200 tracks from Mandarin pop music CDs
– 10 female and 10 male singers
– 5 tracks per singer for training; 5 tracks per singer for testing
– 20 minutes of instrumental-only data for training the non-vocal GMM
– 22.05 kHz sampling rate (down-sampled from 44.1 kHz)
Vocal/non-vocal segmentation
– 82.3% frame accuracy

Singer Clustering (I)
Block diagram

Singer Clustering (II)
An example of the characteristic vectors

Singer Clustering (III)
Determining the number of clusters
– Bayesian Information Criterion (BIC): measures both how well the model fits a data set and how simple the model is; specifically, it rewards a high log-likelihood and penalizes the number of free parameters.
– The BIC is computed for each candidate K-clustering of the recordings.
– A reasonable number of clusters can then be determined by
$$K^{*}=\arg\max_{K}\;\mathrm{BIC}(K).$$
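
A sketch of BIC-based selection of the number of clusters, assuming the textbook Schwarz form $\mathrm{BIC}=\log L-\tfrac{1}{2}\,\#\text{params}\cdot\log N$ (maximized), with an optional penalty weight. The exact BIC expression used for K-clusterings in the talk may differ. Candidate log-likelihoods and parameter counts are inputs here; the function names are illustrative.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, num_params: int, num_samples: int, penalty_weight: float = 1.0) -> float:
    """Schwarz BIC in 'larger is better' form: log L - (w/2) * #params * log N."""
    return log_likelihood - 0.5 * penalty_weight * num_params * math.log(num_samples)


def choose_num_clusters(candidates: Dict[int, Tuple[float, int]], num_samples: int) -> int:
    """candidates maps K -> (log-likelihood, number of free parameters); returns argmax_K BIC(K)."""
    return max(candidates, key=lambda k: bic(candidates[k][0], candidates[k][1], num_samples))
```

In the illustrative numbers used below, going from 2 to 3 clusters gains only 10 in log-likelihood while adding a penalty of about 23, so $K^{*}=2$.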

Singer Clustering Experiments (I)
Music data
– 200 tracks (20 singers; 10 tracks per singer)
Assessment method
– Cluster purity
$$\rho_k=\sum_{p}\left(\frac{n_{kp}}{n_k}\right)^{2},$$
where $\rho_k$ is the purity of cluster $k$, $n_k$ the total number of recordings in cluster $k$, and $n_{kp}$ the number of recordings in cluster $k$ that were performed by singer $p$.
– Average purity
$$\bar{\rho}=\frac{1}{M}\sum_{k=1}^{K} n_k\,\rho_k,$$
where $M$ is the total number of recordings and $K$ the number of clusters.
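
The purity measures can be computed directly from cluster assignments. This sketch assumes the squared-fraction definition common in speaker clustering, $\rho_k=\sum_p (n_{kp}/n_k)^2$, averaged with weights $n_k/M$. Each cluster is given as a list of singer labels; the function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cluster_purity(labels: Sequence[Hashable]) -> float:
    """rho_k = sum_p (n_kp / n_k)^2 for one cluster's singer labels."""
    n_k = len(labels)
    return sum(c * c for c in Counter(labels).values()) / (n_k * n_k)


def average_purity(clusters: Sequence[Sequence[Hashable]]) -> float:
    """(1/M) * sum_k n_k * rho_k, where M is the total number of recordings."""
    M = sum(len(c) for c in clusters)
    return sum(len(c) * cluster_purity(c) for c in clusters) / M
```

For clusters `[["A", "A", "B"], ["B", "B"]]` the first cluster has purity 5/9, the second 1, and the average purity is (3·5/9 + 2·1)/5 = 11/15. The average reaches 1 only when every cluster holds recordings of a single singer.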

Singer Clustering Experiments (II)
Results

Summary
We have
– Separated vocal from non-vocal segments of music;
– Isolated singers’ vocal characteristics from the background music;
– Distinguished singers from one another.
We will
– Handle a wider variety of music data, including duets, trios, choruses, background vocals, and music with multiple simultaneous or non-simultaneous singers;
– Address other problems of voice information retrieval from music, such as lyric transcription and singing-language recognition.

To Probe Further (I)
Selected references
Music information retrieval
– A. L. Uitdenbogerd, “Music IR: past, present, and future,” Proceedings of the International Symposium on Music Information Retrieval.
– J. Futrelle and J. S. Downie, “Interdisciplinary communities and research issues in music information retrieval,” Proceedings of the International Conference on Music Information Retrieval, pp. 215–221, 2002.
Artist recognition
– B. Whitman, G. Flake, and S. Lawrence, “Artist detection in music with Minnowmatch,” Proceedings of the IEEE Workshop on Neural Networks for Signal Processing.
– A. Berenzweig, D. P. W. Ellis, and S. Lawrence, “Using voice segments to improve artist classification of music,” Proceedings of the International Conference on Virtual, Synthetic and Entertainment Audio.
Singer identification
– Y. E. Kim and B. Whitman, “Singer identification in popular music recordings using voice coding features,” Proceedings of the International Conference on Music Information Retrieval, pp. 164–169.
– C. C. Liu and C. S. Huang, “A singer identification technique for content-based classification of MP3 music objects,” Proceedings of the International Conference on Information and Knowledge Management, pp. 438–445.
– T. Zhang, “Automatic singer identification,” Proceedings of the International Conference on Multimedia and Expo.
– W. H. Tsai, H. M. Wang, and D. Rodgers, “Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal,” Proceedings of the European Conference on Speech Communication and Technology.
Singer clustering
– W. H. Tsai, H. M. Wang, D. Rodgers, S. S. Cheng, and H. M. Yu, “Blind clustering of popular music recordings based on singer voice characteristics,” to appear in Proceedings of the International Conference on Music Information Retrieval, 2003.

To Probe Further (II)
General resources
– Important conferences: International Conference on Music Information Retrieval; International Computer Music Conference; IEEE International Conference on Multimedia and Expo; ACM International Multimedia Conference; International Conference on New Interfaces for Musical Expression
– Organizations: International Computer Music Association; The Australasian Computer Music Association; ACM Multimedia; Acoustical Society of America
– Journals: Computer Music Journal; Journal of New Music Research; Computing in Musicology
– Useful links