MSc Project Musical Instrument Identification System MIIS Xiang LI ee05m216 Supervisor: Mark Plumbley.


Motivation of MIIS
Musical instrument identification plays an important role in musical signal indexing and database retrieval. It lets people search music by the instruments it contains rather than by genre or artist: for instance, a user can query "find piano solo parts of a musical database".

Introduction
[Diagram: musical instruments (bass drum, piano, saxophone) produce musical mixtures; the system takes a mixture as input and returns identification results for each instrument.]

Structure of MIIS
Functional components:
- DUET algorithm: separates the input musical mixture X(n) into estimated sources
- Feature extraction: extracts features from each estimated source
- Classification: applies a classifier to each testing source and finds the class it belongs to
Signal flow: input mixture X(n) → DUET separation → estimated sources → feature extraction → classification → results
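The signal flow above can be sketched as a short pipeline. This is an illustrative Python stand-in for the original Matlab system; `duet_separate`, `extract_features` and `knn_classify` are hypothetical placeholders, not the project's actual code:

```python
def duet_separate(mixture, n_sources):
    # Placeholder: a real implementation returns n_sources estimated signals.
    return [mixture] * n_sources

def extract_features(source):
    # Placeholder: a real implementation returns MFCCs, centroid, rolloff, ...
    return [sum(abs(v) for v in source) / len(source)]

def knn_classify(features, training_set):
    # Placeholder: the nearest stored feature vector's label wins.
    return min(training_set, key=lambda ex: abs(ex[0][0] - features[0]))[1]

def miis(mixture, n_sources, training_set):
    # Mixture -> DUET separation -> per-source features -> classification.
    sources = duet_separate(mixture, n_sources)
    return [knn_classify(extract_features(s), training_set) for s in sources]
```

Each stage is expanded on the following slides.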

DUET algorithm
- Time-frequency representation: the mixtures are transformed into the time-frequency domain, e.g. via the Short-Time Fourier Transform or the Modified Discrete Cosine Transform.
- Mixing-parameter computation: each time-frequency point is labelled with its estimated mixing parameters.
- Mask construction: a binary mask for each source is built by grouping the time-frequency points that share the same label.
- Source estimation: applying a mask to the mixture's time-frequency representation yields the time-frequency representation of one source.
- Time-domain conversion: each estimated source is converted back to the time domain.
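As an illustration only, a much-simplified DUET-style mask construction for two mixture channels can be sketched as follows (assumptions: anechoic mixing, clustering on the attenuation estimate alone while ignoring the inter-channel delay, and hypothetical function names):

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Short-Time Fourier Transform with a Hann window; shape (freq, time).
    w = np.hanning(n_fft)
    frames = [np.fft.rfft(w * x[i:i + n_fft])
              for i in range(0, len(x) - n_fft, hop)]
    return np.array(frames).T

def duet_masks(x1, x2, n_sources, iters=25):
    # Label every time-frequency point by its attenuation estimate |X2|/|X1|
    # (clustered with a simple 1-D k-means), then group the points sharing
    # a label into one binary mask per source.
    X1, X2 = stft(x1), stft(x2)
    alpha = (np.abs(X2) / (np.abs(X1) + 1e-12)).ravel()
    centers = np.quantile(alpha, np.linspace(0.1, 0.9, n_sources))
    for _ in range(iters):
        labels = np.argmin(np.abs(alpha[:, None] - centers[None, :]), axis=1)
        for j in range(n_sources):
            if np.any(labels == j):
                centers[j] = alpha[labels == j].mean()
    masks = [(labels == j).reshape(X1.shape) for j in range(n_sources)]
    return masks, X1
```

`masks[j] * X1` is then the estimated time-frequency representation of source j; an inverse STFT with overlap-add converts it back to the time domain.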

Feature Extraction
- Mel-Frequency Cepstral Coefficients (MFCC): computed on the mel scale, which is related nonlinearly to Hertz (commonly mel = 2595·log10(1 + f/700)).
- Spectral rolloff: calculated by summing the power spectrum samples until the desired percentage (threshold) of the total energy is reached.
- Bandwidth: the width of the range of frequencies that the signal occupies.
- Root mean square (RMS): used to detect boundaries between musical instruments.
- Spectral centroid: correlates strongly with the subjective qualities of "brightness" or "sharpness".
- Zero-crossing rate: a simple measure of the frequency content of a signal.
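All of these features except the MFCCs are cheap to compute from a single frame's spectrum. A sketch (hypothetical function names; an 85% rolloff threshold is assumed, since the slide does not state one):

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel/Hertz relationship used for MFCC filter banks.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def spectral_features(x, sr, rolloff_pct=0.85):
    # Features of one frame x sampled at sr Hz.
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    power = mag ** 2
    # Centroid: magnitude-weighted mean frequency ("brightness").
    centroid = (freqs * mag).sum() / mag.sum()
    # Rolloff: frequency below which rolloff_pct of the total energy lies.
    rolloff = freqs[np.searchsorted(np.cumsum(power), rolloff_pct * power.sum())]
    # Bandwidth: magnitude-weighted spread around the centroid.
    bandwidth = np.sqrt(((freqs - centroid) ** 2 * mag).sum() / mag.sum())
    rms = np.sqrt(np.mean(x ** 2))                  # frame energy
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)  # zero-crossing rate
    return {"centroid": centroid, "rolloff": rolloff,
            "bandwidth": bandwidth, "rms": rms, "zcr": zcr}
```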

Classification
K-Nearest Neighbour (k-NN):
- Nonparametric classifier: each test feature vector is assigned the majority class among its k nearest training vectors
- Drawback: large storage is required, since every training vector must be kept
[Diagram: a query point X in a 2-D feature space (axes x, y) surrounded by training points of classes a, b and c.]
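A minimal k-NN classifier matching the description above (hypothetical function name; Euclidean distance and majority voting assumed):

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    # Distances from the query to every stored training vector --
    # this is why k-NN needs the whole training set in memory.
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(d)[:k]
    # Majority vote among the k nearest neighbours.
    labels, counts = np.unique([train_y[i] for i in nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]
```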

Experiments: Musical Instrument Database
- Database: downloaded from the University of Iowa website; mixtures are composed of isolated notes.
- Training set: includes 18 classes of musical instruments.
- Testing set: 3 to 5 instruments are chosen to generate each mixture.
- Instruments tested: alto saxophone, bassoon, double bass, flute, viola.

Experiments of Three Groups
For each group, five mixtures were tested; the results are as follows:

                      Group 1   Group 2   Group 3
No. of sources        3         4         5
Percentage correct    80%       60%       48%
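Assuming "percentage correct" means the fraction of sources identified correctly across a group's five mixtures (an assumption; the slide does not define the metric), it would be computed as:

```python
def percent_correct(predictions, truths):
    # predictions / truths: instrument labels for every source in the group.
    correct = sum(p == t for p, t in zip(predictions, truths))
    return 100.0 * correct / len(truths)
```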

Example
Separation and identification of one three-source mixture (the SDR values shown on the original slide did not survive transcription):

Source               SDR    Result
AltoSaxophone.C4B           correct
Bassoon.C3B                 correct
Double Bass.D2B             correct

[Audio examples: original sources and estimated sources.]

Results Discussion
Without MIIS, the chance of recognising each source correctly among 18 classes is 1/18, about 5.6%. The worst case in our experiments is group 3, where each mixture consists of five sources; even there the percentage correct is 48%. The fewer sources a mixture contains, the better the system performs, because more sources introduce more interference with one another.

Conclusion
MIIS is a system able to identify each musical instrument in a musical mixture. Three functional components were introduced:
- DUET algorithm
- Feature extraction
- Classification
Experiments on three groups, fifteen mixtures in total, have been carried out; the correct percentages are 80%, 60% and 48% respectively.
Future work: more features could be extracted, such as the MPEG-7 audio descriptors, and a more adaptive mask could help overcome interference among sources.