Tetsuro Kitahara* Masataka Goto** Hiroshi G. Okuno*

Slides:

Advertisements

Similar presentations

Component Analysis (Review)

Advertisements

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

Easily extensible unix software for spectral analysis, display modification, and synthesis of musical sounds James W. Beauchamp School of Music Dept.

Pattern Recognition and Machine Learning

CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.

Content-based retrieval of audio Francois Thibault MUMT 614B McGill University.

Musical Instruments Contents: What is the difference between high and low notes? Why do different instruments sound different? What does it mean to play.

G. Valenzise *, L. Gerosa, M. Tagliasacchi *, F. Antonacci *, A. Sarti * IEEE Int. Conf. On Advanced Video and Signal-based Surveillance, 2007 * Dipartimento.

Pitch Recognition with Wavelets Final Presentation by Stephen Geiger.

Content-Based Classification, Search & Retrieval of Audio Erling Wold, Thom Blum, Douglas Keislar, James Wheaton Presented By: Adelle C. Knight.

Speaker Adaptation for Vowel Classification

Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.

Recent Research in Musical Timbre Perception James W. Beauchamp University of Illinois at Urbana-Champaign Andrew B. Horner Hong University of Science.

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING MARCH 2010 Lan-Ying Yeh

Database Construction for Speech to Lip-readable Animation Conversion Gyorgy Takacs, Attila Tihanyi, Tamas Bardi, Gergo Feldhoffer, Balint Srancsik Peter.

EE513 Audio Signals and Systems Statistical Pattern Classification Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.

Instrument Recognition in Polyphonic Music Jana Eggink Supervisor: Guy J. Brown University of Sheffield

Isolated-Word Speech Recognition Using Hidden Markov Models

0 Pattern Classification, Chapter 3 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda,

This week: overview on pattern recognition (related to machine learning)

SoundSense by Andrius Andrijauskas. Introduction  Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc.  Arguably,

Pattern Recognition: Baysian Decision Theory Charles Tappert Seidenberg School of CSIS, Pace University.

Presented by Tienwei Tsai July, 2005

Feature extraction 1.Introduction 2.T-test 3.Signal Noise Ratio (SNR) 4.Linear Correlation Coefficient (LCC) 5.Principle component analysis (PCA) 6.Linear.

Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition Bing Zhang and Spyros Matsoukas BBN Technologies Present.

International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.

REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ram’ırez, Jos’e C. Segura and J.M. G’orriz Dept. of Signal Theory Networking and Communications.

Polyphonic Music Transcription Using A Dynamic Graphical Model Barry Rafkind E6820 Speech and Audio Signal Processing Wednesday, March 9th, 2005.

Multimodal Information Analysis for Emotion Recognition

MUMT611: Music Information Acquisition, Preservation, and Retrieval Presentation on Timbre Similarity Alexandre Savard March 2006.

Basics of Neural Networks Neural Network Topologies.

Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.

Instrument Classification in a Polyphonic Music Environment Yingkit Chow Spring 2005.

Nick Wang, 25 Oct Speaker identification and verification using EigenVoices O. Thyes, R. Kuhn, P. Nguyen, and J.-C. Junqua in ICSLP2000 Presented.

Singer similarity / identification Francois Thibault MUMT 614B McGill University.

Chapter 3: Maximum-Likelihood Parameter Estimation l Introduction l Maximum-Likelihood Estimation l Multivariate Case: unknown , known  l Univariate.

Spectral centroid PianoFlute Piano Flute decayed not decayed F0-dependent mean function which captures the pitch dependency (i.e. the position of distributions.

A NOVEL METHOD FOR COLOR FACE RECOGNITION USING KNN CLASSIFIER

GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.

Speech Lab, ECE, State University of New York at Binghamton  Classification accuracies of neural network (left) and MXL (right) classifiers with various.

MSc Project Musical Instrument Identification System MIIS Xiang LI ee05m216 Supervisor: Mark Plumbley.

Over-fitting and Regularization Chapter 4 textbook Lectures 11 and 12 on amlbook.com.

Automatic Transcription System of Kashino et al. MUMT 611 Doug Van Nort.

Statistical Models for Automatic Speech Recognition Lukáš Burget.

METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 9: Review.

2D-LDA: A statistical linear discriminant analysis for image matrix

Machine Learning CUNY Graduate Center Lecture 6: Linear Regression II.

1 Voicing Features Horacio Franco, Martin Graciarena Andreas Stolcke, Dimitra Vergyri, Jing Zheng STAR Lab. SRI International.

LDA (Linear Discriminant Analysis) ShaLi. Limitation of PCA The direction of maximum variance is not always good for classification.

On the relevance of facial expressions for biometric recognition Marcos Faundez-Zanuy, Joan Fabregas Escola Universitària Politècnica de Mataró (Barcelona.

Intro to Fourier Series BY JORDAN KEARNS (W&L ‘14) & JON ERICKSON (STILL HERE )

Ch 1. Introduction Pattern Recognition and Machine Learning, C. M. Bishop, Updated by J.-H. Eom (2 nd round revision) Summarized by K.-I.

Part 3: Estimation of Parameters. Estimation of Parameters Most of the time, we have random samples but not the densities given. If the parametric form.

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION.

Chapter 3: Maximum-Likelihood Parameter Estimation

LECTURE 11: Advanced Discriminant Analysis

School of Computer Science & Engineering

Review – Standing Waves

Special Topics In Scientific Computing

Object Modeling with Layers

Course Outline MODEL INFORMATION COMPLETE INCOMPLETE

ECE539 final project Instructor: Yu Hen Hu Fall 2005

EE513 Audio Signals and Systems

Pattern Recognition and Machine Learning

AUDIO SURVEILLANCE SYSTEMS: SUSPICIOUS SOUND RECOGNITION

Feature Selection Methods

CAMCOS Report Day December 9th, 2015 San Jose State University

Presenter: Shih-Hsiang(士翔)

Combination of Feature and Channel Compensation (1/2)

Hairong Qi, Gonzalez Family Professor

Presentation transcript:

Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution Tetsuro Kitahara* Masataka Goto** Hiroshi G. Okuno* *Grad. Sch’l of Informatics, Kyoto Univ. **PRESTO JST / Nat’l Inst. Adv. Ind. Sci. & Tech. ICASSP’03 (6-10th Apr. 2003 in Hong Kong)

Today’s talk What is musical instrument identification? What is difficult in musical instrument identification? The pitch dependency of timbre How is the pitch dependency coped with? Approximate it as a function of F0 Musical instrument identification using F0-dependent multivariate normal distribution Experimental results Conclusions

1. What is musical instrument identification? It is to obtain the name of musical instruments from sounds (acoustical signals). It is useful for music automatic transcription, music information retrieval, etc. Its research began recently (since 1990s). Feature Extraction (e.g. Decay speed, Spectral centroid) w = argmax p(w|X) = argmax p(X|w) p(w) p(X|wpiano) p(X|wflute) <inst>piano</inst>

2. What is difficult in musical instrument identification? The pitch dependency of timbre e.g. Low-pitch piano sound = Slow decay High-pitch piano sound = Fast decay 1 2 3 -0.5 0.5 (a) Pitch = C2 (65.5Hz) time [s] 1 2 3 -0.5 0.5 (b) Pitch = C6 (1048Hz) time [s]

3. How is the pitch dependency coped with? Most previous studies have not dealt with the pitch dependency. Example: [Martin99] used hierarchical classification. [Brown99] used cepstral coefficients. [Eronen00] used both techniques. [Kashino98] developed a system for computational music scene analysis. [Kashino00] introduced template adaptation and musical contexts

3. How is the pitch dependency coped with? Proposal: Approximate the pitch dependency of each feature as a function of fundamental frequency (F0)

3. How is the pitch dependency coped with? An F0-dependent multivariate normal distribution has following two parameters: F0-dependent mean function which captures the pitch dependency (i.e. the position of distributions of each F0) F0-normalized covariance which captures the non-pitch dependency

4. Musical instrument identification using F0-dependent multivariate normal distribution 1st step: Feature extraction 129 features defined based on consulting literatures are extracted. e.g. Spectral centroid (which captures brightness of tones) Piano Flute Spectral centroid Spectral centroid

4. Musical instrument identification using F0-dependent multivariate normal distribution 1st step: Feature extraction 129 features defined based on consulting literatures are extracted. e.g. Decay speed of power Flute Piano not decayed decayed

4. Musical instrument identification using F0-dependent multivariate normal distribution 2nd step: Dimensionality reduction First, the 129-dimensional feature space is transformed to a 79-dimensional space by PCA (principal component analysis) (with the proportion value of 99%) Second, the 79-dimensional feature space is transformed to an 18-dimensional space by LDA (linear discriminant analysis)

4. Musical instrument identification using F0-dependent multivariate normal distribution 3rd step: Parameter estimation First, the F0-dependent mean function is approximated as a cubic polynomial.

4. Musical instrument identification using F0-dependent multivariate normal distribution 3rd step: Parameter estimation Second, the F0-normalized covariance is obtained by subtracting the F0-dependent mean from each feature. eliminating the pitch dependency

4. Musical instrument identification using F0-dependent multivariate normal distribution Final step: Using the Bayes decision rule The instrument w satisfying w = argmax [log p(X|w; f) + log p(w; f)] is determined as the result. p(X|w; f) … - A probability density function of the F0-dependent multivariate normal distribution. - Defined using the F0-dependent mean function and the F0-normalized covariance.

5. Experiments (Conditions) Database: A subset of RWC-MDB-I-2001 Consists of solo tones of 19 real instruments with all pitch range. Contains 3 individuals and 3 intensities for each instrument. Contains normal articulation only. The number of all sounds is 6,247. Using the 10-fold cross validation. Evaluate the performance both at individual-instrument level and at category level.

Piano Guitars Classical Guitar Ukulele Acoustic Guitar Strings Violin Viola Cello Brass Trumpet Trombone Saxophones Soprano Sax Alto Sax Tenor Sax Baritone Sax Double Reeds Oboe Faggoto Clarinet Air Reeds Piccolo Flute Recorder

5. Experiments (Results) Recognition rates: 79.73% (at individual level) 90.65% (at category level) Improvement: 4.00% (at individual level) 2.45% (at category level) Error reduction (relative): 16.48% (at individual level) 20.67% (at category level) Individual level (19 classes) Category level (8 classes)

5. Experiments (Results) The recognition rates of following 6 instruments were improved by more than 7%. Piano Trumpet Trom-bone Soprano Sax Baritone Sax Faggoto Piano: The best improved (74.21% a 83.27%) Because the piano has the wide pitch range.

6. Conclusions To cope with the pitch dependency of timbre in musical instrument identification, F0-dependent multivariate normal distribution is proposed. Experimental results: Recognition rate: 75.73% a 79.73% (Using 6,247 solo tones of 19 instruments) Future works: Evaluation against mixture of sounds Development of application systems using the proposed method.

Recognition rates at category level Piano Guitar Strings Brass Sax Dbl Rd. Clarinet Air Rd. Err Rdct 35% 8% 23% 33% 20% 13% 15% 8% Recognition rates for all categories were improved. Recognition rates for Piano, Guitar, Strings: 96.7%

Bayes vs k-NN PCA+LDA+Bayes achieved the best performance. We adopt Bayes (18 dim; PCA+LDA) Bayes (79 dim; PCA only) Bayes (18 dim; PCA only) 3-NN (18 dim; PCA+LDA) 3-NN (79 dim; PCA only) 3-NN (18 dim; PCA only) PCA+LDA+Bayes achieved the best performance. 18-dimension is better than 79-dimension. # of training data is not enough for 79-dim. The use of LDA improved the performance. LDA considers separation between classes.

Bayes vs k-NN PCA+LDA+Bayes achieved the best performance. We adopt Bayes (18 dim; PCA+LDA) Bayes (79 dim; PCA only) Bayes (18 dim; PCA only) 3-NN (18 dim; PCA+LDA) 3-NN (79 dim; PCA only) 3-NN (18 dim; PCA only) Jain’s guideline (1982): Having 5 to 10 times as many training data as # of dimensions seems to be a good practice. PCA+LDA+Bayes achieved the best performance. 18-dimension is better than 79-dimension. # of training data is not enough for 79-dim. The use of LDA improved the performance. LDA considers separation between classes.

Relationship between training data and dimension 14 dim. (85%) 18 dim. (88%) 20 dim. (89%) 23 dim. (90%) 32 dim. (93%) 41 dim. (95%) 52 dim. (97%) 79 dim. (99%) Hughes’s peaking phenomenon At 23-dimension, the performance peaked. Any results without LDA are worse than that with LDA.