Measuring the Similarity of Rhythmic Patterns Jouni Paulus, Anssi Klapuri Tampere University of Technology ISMIR 2002 8/3/2019 ISE 599 - by Frances Kao
Outline
Background
Proposal
System Modules
  Pre-processing
  Pattern Segmenting
  Acoustic Features for Similarity Judgments
  Dynamic Time Warping (DTW)
Simulation Results

Background
Music is composed of patterns.
Measuring rhythmic similarity can be applied to musical database searching and music context analysis.
Practical methods that quantify rhythmic dissimilarity in a computational model are lacking.

Proposal
A method for measuring the similarity of two rhythmic patterns.
The patterns are performed with arbitrary percussion sounds and presented as two acoustic signals.
The proposed system has four modules, including an optional pre-processing stage.

System Overview
[Block diagram: Find recurring patterns; Sin+noise model preprocessing; Feature extraction; DTW; output: Similarity measure]

System Modules (1) - Preprocessing
A sinusoids-plus-noise spectrum model is used to extract the stochastic part of the acoustic musical signal.
The purpose is to suppress the sound of other instruments.
[Block diagram: Acoustic signal -> Sin+noise model preprocessing -> Noise residual]

System Modules (2) – Pattern Segmenting
Signal modeling
  Retains the metric percept of most signals while reducing the amount of data needed to describe the signal (a DSP approach).
Periodicity detection
  Uses a fundamental frequency estimation algorithm; its output is used for musical meter estimation.
  The figure is from a soft rock example. A dip indicates a period; the vertical lines mark the actual tactus and measure periods.
Selecting tatum, tactus and measure lengths
  Tatum = time quantum, the shortest durational value. Tactus = beat, the tapping rate.
  Different distribution functions and conditional probabilities are applied, and some parameters are assigned.
  The authors also discuss pattern phase and propose a method for deciding the temporal starting point of a pattern.
[Block diagram: Acoustic signal without preprocessing -> Finding recurring patterns -> Length of the tactus (beat), Music measure]

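The remark that a dip means a period can be illustrated with a difference-function detector. This is a minimal sketch, assuming an average magnitude difference function (AMDF) applied to an amplitude envelope; it is not claimed to be the authors' algorithm, and `amdf` and `find_period` are hypothetical names.

```python
import numpy as np

def amdf(envelope, max_lag):
    """Average magnitude difference for lags 1..max_lag (index 0 holds lag 1)."""
    x = np.asarray(envelope, dtype=float)
    return np.array([np.mean(np.abs(x[lag:] - x[:-lag]))
                     for lag in range(1, max_lag + 1)])

def find_period(envelope, max_lag):
    """Return the lag (in samples) at which the difference function dips lowest."""
    d = amdf(envelope, max_lag)
    return int(np.argmin(d)) + 1  # +1 because index 0 corresponds to lag 1
```

For an envelope with an accent every 8 samples, the deepest dip falls at lag 8. A real meter estimator would also have to choose among multiples of the period, which this sketch does not attempt.
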
System Modules (3) – Feature Extraction
Three features are calculated in each 23 ms time frame:
  Loudness: mean square energy
  Brightness: spectral centroid (SC)
  Mel-frequency cepstral coefficients (MFCC)
The feature vectors are then normalized.
SC: the balance point of the spectral power distribution, generally the expected value of the magnitude spectrum.
MFCC: a discrete cosine transform applied to the log-energy outputs of a mel-scaled filterbank (detailed algorithm in the paper).
Normalization: converts absolute feature values to relative ones, so that only deviations from the average are modeled. The output is a 2-D matrix.
[Block diagram: Noise residual, Pattern boundaries -> Feature extraction -> Matrix with normalized feature vectors]

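The loudness and brightness features can be sketched per frame as follows. This is a minimal sketch under stated assumptions: non-overlapping rectangular frames, a magnitude-weighted centroid, and normalization by subtracting the mean and dividing by the standard deviation of each feature. The slides only say that deviations from the average are modeled, so the scaling step is an assumption, and `frame_features` and `normalize` are hypothetical names. MFCC is omitted for brevity.

```python
import numpy as np

def frame_features(signal, sample_rate, frame_seconds=0.023):
    """Return a (2, n_frames) matrix: row 0 loudness, row 1 spectral centroid in Hz."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(frame_seconds * sample_rate))
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Loudness: mean square energy of each frame.
    loudness = np.mean(frames ** 2, axis=1)

    # Brightness: centroid ("balance point") of the magnitude spectrum.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    total = np.maximum(magnitude.sum(axis=1), 1e-12)  # avoid division by zero
    centroid = (magnitude * freqs).sum(axis=1) / total

    return np.vstack([loudness, centroid])

def normalize(features):
    """Turn absolute feature values into deviations from the per-feature average."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / np.where(std > 0, std, 1.0)  # assumed scaling
```

The result has one row per feature and one column per time frame, matching the 2-D matrix mentioned in the notes.
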
System Modules (4) – Dynamic Time Warping (DTW)
A dynamic programming algorithm that has been applied to template matching in speech and image pattern recognition.
Allows flexibility in time alignment.
Has been used successfully to handle musical variations in pattern matching.
Can compare two patterns of different lengths.

System Modules (4) – DTW (cont’d)
DTW looks for the optimal path through the matrix of points representing the time alignment of two data sets.
Starting from (0,0), compute the global cost of each cell: its local cost plus the minimum global cost among the adjacent cells.
The overall global distance is found at the top-right cell.

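The accumulation of global cost can be sketched as follows. This is a minimal sketch, assuming the basic step pattern in which a cell is reached from its left, lower, or diagonal neighbor; the path constraints used in the paper may differ. `accumulate_cost` is a hypothetical name.

```python
import numpy as np

def accumulate_cost(local_cost):
    """Build the global cost matrix from a local cost matrix, starting at (0, 0)."""
    D = np.asarray(local_cost, dtype=float)
    n_rows, n_cols = D.shape
    C = np.full((n_rows, n_cols), np.inf)
    C[0, 0] = D[0, 0]
    for n in range(n_rows):
        for m in range(n_cols):
            if n == 0 and m == 0:
                continue
            best = np.inf
            if n > 0:
                best = min(best, C[n - 1, m])      # step along the first pattern
            if m > 0:
                best = min(best, C[n, m - 1])      # step along the second pattern
            if n > 0 and m > 0:
                best = min(best, C[n - 1, m - 1])  # diagonal step
            C[n, m] = D[n, m] + best
    return C
```

The overall distance is the last cell, `accumulate_cost(D)[-1, -1]`. Because the two axes are indexed independently, the two patterns may have different lengths.
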
System Modules (4) – DTW (cont’d)
Three different local path constraints were tried; type 3 is used in the algorithm.
With the chosen path constraints, the similarity measure is built from:
  Global cost: $C(n, m) = D(n, m) + \min(\text{cost of the three allowed paths})$
  Local cost: $D(n, m) = \sum_i W_i \, (F_1(i, n) - F_2(i, m))^2$
[Block diagram: Feature vector sets of two acoustic signals -> DTW -> Similarity measure]

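The local cost D(n, m) can be computed for all frame pairs at once. This is a minimal sketch, assuming the sum runs over the feature index i with one weight per feature; `local_cost_matrix` is a hypothetical name, and the feature matrices are laid out as (features, frames).

```python
import numpy as np

def local_cost_matrix(F1, F2, weights):
    """Weighted squared difference between every frame of F1 and every frame of F2.

    F1 has shape (n_features, N), F2 has shape (n_features, M), and weights has
    shape (n_features,). The result has shape (N, M).
    """
    F1 = np.asarray(F1, dtype=float)
    F2 = np.asarray(F2, dtype=float)
    w = np.asarray(weights, dtype=float)
    diff = F1[:, :, None] - F2[:, None, :]  # shape (n_features, N, M)
    return np.einsum('i,inm->nm', w, diff ** 2)
```

Setting a weight to zero removes that feature from the comparison, which is one way to run an experiment with a single feature.
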
Simulation Results (1) – Meter estimation and pattern segmenting
Estimated results vs. manual annotation. An estimate counts as “correct” if it deviates from the annotation by no more than ±10%.

                    Correct rate   Data size (pieces)   Typical error
Tactus              67%            365                  Tactus period doubling
Pattern (measure)   77%            141                  17%: doubling or halving; 6%: unclassified
Pattern phase       Around 50%

The database contains 365 tactus-annotated pieces and 141 pattern-annotated pieces from 7 different genres.

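The ±10% criterion and the doubling or halving error types can be expressed as a small check. This is a minimal sketch, assuming the deviation is measured relative to the annotated period and that the same tolerance applies when testing for doubled or halved periods; `is_correct` and `classify_estimate` are hypothetical names.

```python
def is_correct(estimated, annotated, tolerance=0.10):
    """True if the estimate lies within the tolerance of the annotated period."""
    return abs(estimated - annotated) <= tolerance * annotated

def classify_estimate(estimated, annotated, tolerance=0.10):
    """Label an estimated period as correct, doubling, halving, or unclassified."""
    if is_correct(estimated, annotated, tolerance):
        return "correct"
    if is_correct(estimated, 2 * annotated, tolerance):
        return "doubling"
    if is_correct(estimated, annotated / 2, tolerance):
        return "halving"
    return "unclassified"
```

For example, with an annotated tactus period of 0.5 s, an estimate of 0.52 s counts as correct and an estimate of 1.0 s counts as period doubling.
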
Simulation Results (2) – Similarity Measurements
Similarity of drum patterns
  The system successfully identifies the same rhythms played with different instruments as similar.
  A separate database was used: 9 different patterns with 14 deviations in total, each of the 14 played with 3 different drum sets.
Performance of different features
  The normalized spectral centroid performs best.
  Weights are assigned to the different features. With SC alone, the results are more consistent; MFCC favors identical sound sets.
Experiments with complex music signals
  In-song similarity of patterns is higher.
  Only SC is used, with preprocessing.

Simulation Results (2) – Similarity Measurements (cont’d)
Similarity of drum patterns: swing is played with only two drum sets. This experiment uses only one feature (SC) and no preprocessing. The input pattern boundaries are manually annotated. White areas with no data occur where the lengths of the two patterns differ by a factor greater than 2.
Complex music: only SC, with preprocessing.