Measuring the Similarity of Rhythmic Patterns


Measuring the Similarity of Rhythmic Patterns
Jouni Paulus, Anssi Klapuri
Tampere University of Technology
ISMIR 2002
8/3/2019 ISE 599 - by Frances Kao

Outline
Background
Proposal
System Modules
  Pre-processing
  Pattern Segmenting
  Acoustic Features for Similarity Judgments
  Dynamic Time Warping (DTW)
Simulation Results

Background
Music is composed of patterns.
Measuring rhythmic similarity can be applied to musical database searching and music context analysis.
Practical methods that quantify this dissimilarity in a computational model are lacking.

Proposal
A method for measuring the similarity of two rhythmic patterns.
The patterns are performed using arbitrary percussion sounds and are presented as two acoustic signals.
The proposed system has four modules, including an optional pre-processing part.

System Overview
[Block diagram with the blocks: sinusoids-plus-noise model preprocessing, finding recurring patterns, feature extraction, DTW, similarity measure]

System Modules (1) – Preprocessing
A sinusoids-plus-noise spectrum model is used to extract the stochastic part of the acoustic musical signal, in order to suppress the sound of other instruments.
[Diagram: acoustic signal -> sinusoids-plus-noise model preprocessing -> noise residual]

System Modules (2) – Pattern Segmenting
Three steps: signal modeling, periodicity detection, and selecting the tatum, tactus and measure lengths.
Signal modeling: retains the metric percept of most signals while reducing the amount of data needed to describe the signal (a DSP approach).
Periodicity detection: uses a fundamental frequency estimation algorithm. The output is used for musical meter estimation.
[Figure: periodicity detection output for a soft rock example. A dip marks a period; the vertical lines mark the actual tactus and measure periods.]
Selecting the lengths: the tatum (time quantum) is the shortest durational value; the tactus is the beat, i.e. the tapping rate. Different distribution functions and conditional probabilities are applied, and some parameters are assigned.
Pattern phase: the authors also discuss pattern phase and propose a method for deciding the temporal starting point of a pattern.
[Diagram: acoustic signal without preprocessing -> finding recurring patterns -> length of the tactus (beat) and of the musical measure]
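The remark that "a dip means a period" can be illustrated with a squared-difference function of the kind used in some fundamental frequency estimators. The following is only a sketch under assumptions: the slides do not say which estimator the authors used, and the function names, the toy envelope and the `max_lag` parameter are hypothetical.

```python
def difference_function(envelope, max_lag):
    """Mean squared difference between the envelope and a lagged copy.

    Returns a list d where d[k] belongs to lag k + 1. A periodic input
    produces dips (values near zero) at lags equal to its period.
    """
    n = len(envelope)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the envelope length")
    return [
        sum((envelope[t] - envelope[t + lag]) ** 2 for t in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def strongest_period(envelope, max_lag):
    """Return the smallest lag (in samples) at which the deepest dip occurs."""
    d = difference_function(envelope, max_lag)
    best_index = min(range(len(d)), key=lambda k: d[k])  # first minimum wins ties
    return best_index + 1


# Toy onset envelope: one impulse every 4 samples.
toy_envelope = [1.0, 0.0, 0.0, 0.0] * 8
period = strongest_period(toy_envelope, max_lag=10)
```

On this toy input the difference function is zero at lags 4 and 8, and the smallest such lag is reported. Choosing between such candidates (tatum, tactus, measure) is the job of the selection step, which this sketch does not model.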

System Modules (3) – Feature Extraction
Three features are calculated in each 23 ms time frame:
Loudness: mean square energy.
Brightness: spectral centroid (SC), the balance point of the spectral power distribution, generally the expected value of the magnitude spectrum.
Mel-frequency cepstral coefficients (MFCC): a discrete cosine transform applied to the log-energy outputs of a mel-scale filterbank (the detailed algorithm is in the paper).
Normalization of the feature vectors: absolute feature values are converted to relative ones, so that only the deviations from the average are modeled.
The output is a 2-D matrix of normalized feature vectors.
[Diagram: noise residual and pattern boundaries -> feature extraction -> matrix with normalized feature vectors]
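As a rough illustration of the loudness and brightness features and of the normalization step, here is a minimal sketch in plain Python. It makes several assumptions: the function names are hypothetical, a naive DFT stands in for an FFT, normalization is taken to be mean removal only (the slides say only that deviations from the average are modeled), and MFCC is omitted.

```python
import cmath
import math


def mean_square_energy(frame):
    """Loudness feature: mean of the squared samples in one frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Brightness feature: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def remove_mean(values):
    """Assumed normalization: keep only deviations from the average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy frame: a sinusoid falling exactly on DFT bin 4 (400 Hz at 6400 Hz, N = 64).
toy_frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
loudness = mean_square_energy(toy_frame)
brightness = spectral_centroid(toy_frame, sample_rate=6400)
```

Applying `remove_mean` to the per-frame values of each feature across a pattern gives one row of the normalized feature matrix.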

System Modules (4) – Dynamic Time Warping (DTW)
A dynamic programming algorithm that has been applied to template matching in speech and image pattern recognition.
Allows flexibility in time alignment.
Has been used successfully to handle musical variations in pattern matching.
DTW can compare two patterns of different lengths.

System Modules (4) – DTW (cont’d)
DTW looks for the optimal path through the matrix of points representing the time alignment of two data sets.
Starting from (0,0), find the global cost of each cell (its local cost plus the minimum global cost among the adjacent cells).
The overall global distance is at the top-right cell.
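The cell-by-cell accumulation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scalar sequences, a squared-difference local cost, and the basic three-neighbor step pattern (left, below, diagonal). The function name is hypothetical.

```python
def dtw_distance(a, b, local_cost=lambda x, y: (x - y) ** 2):
    """Global DTW cost between sequences a and b.

    cost[i][j] is the local cost of aligning a[i] with b[j] plus the
    smallest global cost among the three adjacent predecessor cells.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_cost(a[i], b[j])
            if i == 0 and j == 0:
                cost[i][j] = d  # start of the path at (0, 0)
                continue
            best_previous = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best_previous
    return cost[n - 1][m - 1]  # the far corner holds the overall distance


same = dtw_distance([1, 2, 3], [1, 2, 3])
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

The second call shows the time-alignment flexibility: a pattern with one repeated value still aligns to the original at zero cost, even though the two sequences differ in length.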

System Modules (4) – DTW (cont’d)
Three different local path constraints were tried, and type 3 is used in the algorithm.
With a given path constraint, the similarity measure is built from a global cost and a local cost:
Global cost: $C(n, m) = D(n, m) + \min(\text{cost of the three paths})$
Local cost: $D(n, m) = \sum_i w_i \, \bigl(F_1(i, n) - F_2(i, m)\bigr)^2$, where $w_i$ is the weight of feature $i$.
[Diagram: feature vector sets of two acoustic signals -> DTW -> similarity measure]
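The local cost on this slide, a weighted sum of squared feature differences, can be computed for every frame pair as in the sketch below. The indexing follows the slide (feature i, frame n). The function name, the nested-list layout and the example weights are assumptions made for illustration.

```python
def weighted_local_cost(f1, f2, weights):
    """Local cost matrix D, where D[n][m] = sum_i w_i * (f1[i][n] - f2[i][m])**2.

    f1 and f2 are feature matrices indexed as [feature][frame]. They must
    have the same number of features but may have different frame counts.
    """
    if not (len(f1) == len(f2) == len(weights)):
        raise ValueError("f1, f2 and weights must cover the same features")
    frames_1 = len(f1[0])
    frames_2 = len(f2[0])
    return [
        [
            sum(w * (row1[n] - row2[m]) ** 2 for w, row1, row2 in zip(weights, f1, f2))
            for m in range(frames_2)
        ]
        for n in range(frames_1)
    ]


# Two features; pattern 1 has 2 frames, pattern 2 has 3 frames.
pattern_1 = [[0.0, 1.0], [0.0, 2.0]]
pattern_2 = [[0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
D = weighted_local_cost(pattern_1, pattern_2, weights=[1.0, 0.5])
```

Setting a weight to zero drops that feature from the comparison, which is one way to run an experiment with a single feature.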

Simulation Results (1) – Meter estimation and pattern segmenting
Estimated results are compared with manual annotation.
"Correct" criterion: deviation within a ±10% range.

Item | Correct rate | Data size (pieces) | Typical error
Tactus | 67% | 365 | Tactus period doubling
Pattern (measure) | 77% | 141 | 17%: doubling or halving; 6%: unclassified
Pattern phase | Around 50% | |

The database contains 365 tactus-annotated pieces and 141 pattern-annotated pieces, from 7 different genres.
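The ±10% criterion and the doubling and halving error types can be expressed as a small classifier. This is a sketch under assumptions: the slides do not say how doubling and halving were detected, so the same relative tolerance is applied around twice and half the annotated period. The function name and labels are hypothetical.

```python
def classify_period_estimate(estimated, annotated, tolerance=0.10):
    """Label a period estimate against a manually annotated period.

    An estimate is 'correct' when it deviates from the annotation by at
    most `tolerance` (relative). The same tolerance is applied around
    2x and 0.5x the annotation to detect doubling and halving errors.
    """
    def close_to(target):
        return abs(estimated - target) <= tolerance * target

    if close_to(annotated):
        return "correct"
    if close_to(2.0 * annotated):
        return "doubled"
    if close_to(0.5 * annotated):
        return "halved"
    return "unclassified"


# Periods in seconds, with an annotated tactus period of 0.5 s.
labels = [classify_period_estimate(e, 0.5) for e in (0.52, 1.0, 0.25, 0.8)]
```

Counting the labels over a whole annotated collection gives figures in the same form as the table above.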

Simulation Results (2) – Similarity Measurements
Similarity of drum patterns: the system successfully identifies that the same rhythms played with different instruments are similar. This test used another database: 9 different patterns with 14 deviations in total, each of the 14 played with 3 different drum sets.
Performance of different features: the normalized spectral centroid is the best. Weights were assigned to the different features. With SC alone, the result is more consistent. MFCC favors identical sound sets.
Experiments with complex music signals: in-song similarity of patterns is higher. Only SC was used, with preprocessing.

Simulation Results (2) – Similarity Measurements (cont’d)
Similarity of drum patterns: swing is played with only two drum sets. This experiment uses only one feature (SC) and no preprocessing. The input pattern boundaries come from manual annotation. The white areas with no data are cases where the lengths of the two patterns differ by a factor greater than 2.
Complex music: only SC, with preprocessing.