Computational analysis on folk music of Cyprus
Internal report
Andreas Neocleous
University of Groningen / University of Cyprus
April 2013
Folk music classification: predicting the class label of an unseen folk tune [1] (Hillewaere, Manderick, Conklin, 2009)

Objective of the study:
- Compare global feature models and event feature models for the task of folk song classification.
- Draw conclusions on the robustness of each feature model.

A global feature set summarizes a piece as a feature vector, which can be viewed as a data point in a feature space.
Global features:
- The Alicante set of 28 global features (12 selected).
- 92 features computed by the FANTASTIC program (Feature ANalysis Technology Accessing STatistics) [2] (37 selected).
- The Jesser set of 40 pitch and duration statistics [3].
- The McKay set of 101 global features, developed for the classification of orchestrated MIDI files [4].
Event features: excerpt of the Scottish jig "With a hundred pipers", illustrating the difference between global features and event features.
Event features: classification with n-gram models
n-gram models are used in probability, communication theory and computational linguistics.
1) The probability of a piece is obtained by computing the joint probability of the individual events in the piece: p(e_1, …, e_m) = ∏_i p(e_i | e_{i−n+1}, …, e_{i−1}).
2) For each class a separate model is built.
3) The predicted class of a piece is the class whose model generates the piece with the highest probability.
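The scheme above can be sketched in Python. This is an illustrative toy, not the paper's implementation: it uses a bigram (n = 2) model with add-alpha smoothing, and all function names are assumptions.

```python
import math
from collections import defaultdict

def train_bigram(pieces):
    """Count bigrams over event sequences (each piece is a list of symbols)."""
    counts = defaultdict(lambda: defaultdict(int))
    for piece in pieces:
        seq = ["<s>"] + piece          # sentinel marks the start of a piece
        for prev, ev in zip(seq, seq[1:]):
            counts[prev][ev] += 1
    return counts

def log_prob(piece, counts, vocab_size, alpha=1.0):
    """Joint log-probability of a piece under an add-alpha smoothed bigram model."""
    seq = ["<s>"] + piece
    lp = 0.0
    for prev, ev in zip(seq, seq[1:]):
        c = counts[prev]
        lp += math.log((c[ev] + alpha) / (sum(c.values()) + alpha * vocab_size))
    return lp

def classify(piece, models, vocab_size):
    """Predict the class whose model generates the piece with highest probability."""
    return max(models, key=lambda cl: log_prob(piece, models[cl], vocab_size))
```

A longer context (e.g. the 5-gram model mentioned later) follows the same pattern with a longer conditioning history.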
Europa-6 collection
Classification accuracies of the global feature sets on the Europa-6 collection were obtained by 10-fold cross validation. With a 5-gram (pentagram) model of a linked viewpoint of melodic interval and duration, the obtained classification accuracy is 72.7%.
Folk music classification: predicting the class label of an unseen folk tune [5] (Hillewaere, Manderick, Conklin, 2012)

Objective of the study:
- Investigate the performance of three string methods.
- Compare the performance of the string methods with the global feature models and event feature models.
- Draw conclusions on the robustness of each model.

String methods rely on a sequential music representation which views a piece as a string of symbols. A pairwise similarity measure between the strings is computed and used to classify unlabeled pieces.
Excerpt of the Scottish jig "With a hundred pipers", illustrating the difference between global features, event features and the string representation.
String methods: (1) Sequence alignment
Estimation of the minimal cost of transforming one sequence into the other by means of edit operations such as substitution, insertion and deletion. Often referred to as the "edit distance", which is in fact the Levenshtein distance.
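The Levenshtein distance has a standard dynamic-programming formulation, sketched here with unit cost for every edit operation (real alignment schemes for music often use weighted costs):

```python
def levenshtein(s, t):
    """Minimal number of substitutions, insertions and deletions
    transforming string s into string t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i              # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j              # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]
```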
String methods: (2) Compression-based distance
K(x) is the Kolmogorov complexity of string x; K(x|y) is the conditional complexity of string x given string y. The normalized information distance, NID(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}, measures how much information is not shared between the two strings relative to the information that they could maximally share.
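Kolmogorov complexity is uncomputable, so in practice it is approximated by the output length of a real compressor, giving the normalized compression distance (NCD). A minimal sketch with zlib (the choice of compressor is an assumption; any off-the-shelf compressor works the same way):

```python
import zlib

def c(s):
    """Approximate the complexity of a string by its compressed length."""
    return len(zlib.compress(s.encode("utf-8"), 9))

def ncd(x, y):
    """Normalized compression distance: near 0 for very similar strings,
    around 1 (or slightly above) for unrelated ones."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```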
String methods: (3) String subsequence kernel (SSK)
Computes a similarity measure between strings based on the number and form of their common subsequences. Given any pair of strings, SSK finds all common subsequences of a specified length k, also allowing non-contiguous matches, although these are penalized with a decay factor λ.
Example: SSK(k = 2, 'ismir', 'music') = λ^5 + λ^6, since the only common length-2 subsequences are 'si' (spans of 3 and 2 symbols) and 'mi' (spans of 2 and 4 symbols).
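For short strings, the kernel can be computed by brute force, assuming the usual convention that λ is raised to the combined lengths (first to last index, inclusive) of the two matching spans. Efficient implementations use dynamic programming instead; this sketch only illustrates the definition:

```python
from itertools import combinations

def ssk(k, s, t, lam):
    """Brute-force string subsequence kernel: sum lam**(span_s + span_t)
    over every pair of matching (possibly non-contiguous) length-k
    subsequences of s and t."""
    total = 0.0
    for i in combinations(range(len(s)), k):       # index tuples into s
        for j in combinations(range(len(t)), k):   # index tuples into t
            if all(s[a] == t[b] for a, b in zip(i, j)):
                total += lam ** ((i[-1] - i[0] + 1) + (j[-1] - j[0] + 1))
    return total
```

With λ = 0.5 this reproduces the slide's example value λ^5 + λ^6.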
Dance-9 collection
Results
Segmentation: popular music vs. folk music
Popular music:
- Complex structure: intro, verse, bridge, chorus
- Professional productions
Folk music:
- Repeating parts (stanzas); similar stanzas, repetitions
- Inaccurate singing of performers
- Variable tempo throughout the song
- Presence of noise
- Performers forget parts of lyrics or melody, or switch to speaking
Finding repeating stanzas in folk songs [6] Bohak C. and Marolt M., 2012
Preprocessing: detecting vocal pauses
- According to signal energy
- According to the signal envelope
- According to the relative difference of pitch
Preprocessing:
- The input audio signal is mixed from stereo to a single channel.
- The sample rate is reduced to 11025 Hz.
- The amplitude is normalized.
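These three steps can be sketched as follows. This is a naive illustration: the downsampling simply keeps every fourth sample, whereas a real pipeline would low-pass filter first to avoid aliasing, and the function name and signature are assumptions.

```python
def preprocess(left, right, src_rate=44100, dst_rate=11025):
    """Mix stereo to mono, naively downsample, and peak-normalize."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]   # stereo -> mono
    step = src_rate // dst_rate                           # 44100 / 11025 = 4
    down = mono[::step]                                   # crude decimation
    peak = max(abs(x) for x in down) or 1.0               # avoid divide-by-zero
    return [x / peak for x in down]                       # normalize amplitude
```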
Detecting vocal pauses according to signal energy:
- Energy is computed on 200 ms long frames; a frame belongs to a pause when its energy is below an experimentally determined threshold, set to ξ1 · Ē, where Ē is the average energy of the signal.
- Consecutive frames with energy values below the threshold are merged into one vocal pause.
- Vocal pauses shorter than ξ2 times the average detected vocal pause length are ignored.
- The parameters ξ1 and ξ2 were determined experimentally.
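A sketch of this procedure; the values of ξ1 and ξ2 below are illustrative placeholders, since the experimentally determined values are not given here:

```python
def vocal_pauses(signal, rate, xi1=0.1, xi2=0.5, frame_ms=200):
    """Return (start_frame, end_frame) pairs of detected vocal pauses."""
    n = int(rate * frame_ms / 1000)                       # samples per 200 ms frame
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    energy = [sum(x * x for x in f) for f in frames]
    thresh = xi1 * (sum(energy) / len(energy))            # xi1 * average energy
    # merge consecutive sub-threshold frames into single pauses
    pauses, start = [], None
    for i, e in enumerate(energy):
        if e < thresh and start is None:
            start = i
        elif e >= thresh and start is not None:
            pauses.append((start, i))
            start = None
    if start is not None:
        pauses.append((start, len(energy)))
    if not pauses:
        return []
    # discard pauses shorter than xi2 times the average pause length
    avg_len = sum(e - s for s, e in pauses) / len(pauses)
    return [(s, e) for s, e in pauses if e - s >= xi2 * avg_len]
```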
Detecting vocal pauses according to the signal envelope:
- The amplitude envelope of the signal is obtained by filtering the full-wave rectified signal with a 4th-order Butterworth filter with a normalized cutoff frequency of 0.001.
- Vocal pauses are parts of the signal where the envelope falls below the threshold ξ3 = −60 dB.
Detecting vocal pauses according to the relative difference of pitch:
- Detect the fundamental frequency (YIN algorithm [7]).
- Smooth the fundamental frequencies with a low-pass filter.
- Select as vocal pauses the parts of the signal whose pitch differs by more than 20 semitones from the average signal frequency.
- The endings of vocal pauses are used as candidates for stanza beginnings.
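The pitch-deviation criterion can be sketched on a frame-by-frame pitch track (treating unvoiced frames, f0 ≤ 0, as pauses directly is an assumption of this sketch):

```python
import math

def pitch_pauses(f0, max_dev_semitones=20.0):
    """Mark frames whose pitch deviates more than max_dev_semitones from
    the average signal frequency as vocal pauses (True = pause)."""
    voiced = [f for f in f0 if f > 0]
    avg = sum(voiced) / len(voiced)
    def dev(f):
        # deviation from the average frequency, in semitones
        return abs(12.0 * math.log2(f / avg))
    return [f <= 0 or dev(f) > max_dev_semitones for f in f0]
```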
Finding candidates for stanza boundaries:
- Calculate 12-dimensional chromagrams.
- Define a root-mean-square (RMS) distance between each pair of 12-dimensional chroma vectors a and b: c(a, b) = sqrt( (1/12) Σ_{i=1..12} (a_i − b_i)^2 ), where a_i and b_i are the i-th elements of the chroma vectors.
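A direct translation of the distance function (the 1/12 normalization follows the RMS reading above; if the paper uses a plain Euclidean distance, drop it):

```python
import math

def chroma_rms(a, b):
    """RMS distance between two 12-dimensional chroma vectors."""
    assert len(a) == len(b) == 12
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / 12.0)
```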
The defined distance function is used by the Dynamic Time Warping (DTW) algorithm to calculate the total distance between the selected stanzas as d(p1, p2) = Σ_{l=1..L} c(p1(l), p2(l)), where p1 and p2 are candidate stanza beginnings, p1(l) and p2(l) are the corresponding chroma vectors, and the index l runs from the first (1) to the last (L) chroma vector in the selected audio part.
The DTW distance between two stanza candidates is the minimal cumulative alignment cost cmin between the parts d0 and dj.
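The minimal alignment cost can be computed with the standard DTW recursion; the sketch below takes any frame distance function (e.g. the chroma RMS distance above) as a parameter:

```python
def dtw(seq1, seq2, dist):
    """Minimal cumulative alignment cost between two sequences under the
    given frame distance, with the standard match/insert/delete recursion."""
    inf = float("inf")
    n, m = len(seq1), len(seq2)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(seq1[i - 1], seq2[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # skip a frame of seq1
                              D[i][j - 1],      # skip a frame of seq2
                              D[i - 1][j - 1])  # align the two frames
    return D[n][m]
```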
To compensate for out-of-tune singing, the chroma vectors are circularly shifted up to two semitones up and down, and the lowest DTW distance is selected: D = min_{r ∈ {−2,…,+2}} DTW(d0, rot_r(dj)), where rot_r rotates the chroma vectors of the selected stanza candidate by r semitones, from two semitones downwards to two semitones upwards in steps of one semitone.
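The rotation-and-minimize step can be sketched as follows; to keep the sketch self-contained, a plain frame-wise sum of distances stands in for the DTW distance the method actually minimizes:

```python
def best_rotation_distance(seq_ref, seq_cand, dist_fn):
    """Try circular shifts of the candidate's chroma vectors from -2 to +2
    semitones and keep the smallest total distance, compensating for
    out-of-tune singing. dist_fn compares two 12-d chroma vectors."""
    def rotate(v, s):
        # circular shift of a 12-bin chroma vector by s semitones
        return v[-s:] + v[:-s] if s else v
    best = float("inf")
    for s in range(-2, 3):
        shifted = [rotate(v, s) for v in seq_cand]
        total = sum(dist_fn(a, b) for a, b in zip(seq_ref, shifted))
        best = min(best, total)
    return best
```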
A fitness function is defined for scoring the candidate stanza beginnings ki.
In the defined fitness function, peaks represent the most likely stanza beginnings, so all peaks above a global threshold, corresponding to the average value of the fitness function, are picked as the actual boundaries between stanzas.
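The peak-picking rule can be sketched directly (the local-maximum test at interior points is an implementation assumption of this sketch):

```python
def pick_stanza_boundaries(fitness):
    """Pick local maxima of the fitness function that exceed its global
    average; these are taken as stanza beginnings."""
    avg = sum(fitness) / len(fitness)
    peaks = []
    for i in range(1, len(fitness) - 1):
        is_peak = fitness[i] >= fitness[i - 1] and fitness[i] >= fitness[i + 1]
        if is_peak and fitness[i] > avg:
            peaks.append(i)
    return peaks
```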
Music of Cyprus: categories
- Fones: Karpasitissa, Avgoritissa, Paphididji, Lyshiotissa, Mariniotou, Tyllirkotissa, Ishia, Komitissa, Akathkiotissa, Nekalisti, Pegiotoua
- Dances: Zeimpekikos, Kartsilamas, Kalamatianos, Syrtos, Arapies
- Religious: a weak category with no sub-categories
Music of Cyprus: preprocessing
- Fundamental frequency detection (YIN).
- Eliminate noise with an aperiodicity threshold.
- Eliminate silence with a loudness threshold.
- Octave/fifth errors: a common problem of frequency detection algorithms is wrong octave detection, also referred to as octave errors, in which the fundamental frequency is confused with its multiples and/or other harmonics. To correct these errors, a moving window was applied to detect and correct unexpected melodic jumps in the estimated pitch trajectory.
- Smoothing.
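A minimal sketch of moving-window octave correction, not the report's exact procedure: the window size and the 7-semitone tolerance are illustrative choices, and the correction pulls outliers back by whole octaves toward the local median.

```python
import math
import statistics

def fix_octave_errors(f0, window=5, max_jump=7.0):
    """Shift pitch estimates that deviate implausibly from the local median
    back by whole octaves (factors of 2)."""
    out = list(f0)
    half = window // 2
    for i, f in enumerate(f0):
        if f <= 0:
            continue  # unvoiced frame, nothing to correct
        ctx = [x for x in f0[max(0, i - half):i + half + 1] if x > 0]
        ref = statistics.median(ctx)
        # halve / double until within the tolerance of the local median
        while 12 * math.log2(out[i] / ref) > max_jump:
            out[i] /= 2.0
        while 12 * math.log2(out[i] / ref) < -max_jump:
            out[i] *= 2.0
    return out
```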
Music of Cyprus Preprocessing Pitch track before pre-processing
Music of Cyprus Preprocessing Pitch track after pre-processing
Music of Cyprus Segmentation Detection of vocal pauses
Music of Cyprus Segmentation Detection of all peaks
Music of Cyprus Segmentation Detection of notes based on the difference of the peaks
Music of Cyprus Repetition
References
[1] Hillewaere R., Manderick B., Conklin D., Global feature versus event models for folk song classification. 10th International Society for Music Information Retrieval Conference (ISMIR), 2009.
[2] Müllensiefen D., FANTASTIC: Feature ANalysis Technology Accessing STatistics (In a Corpus), Technical Report v0.9, 2009.
[3] Jesser B., Interaktive Melodieanalyse, Peter Lang, Bern, 1991.
[4] McKay C. and Fujinaga I., Automatic genre classification using large high-level musical feature sets. Proceedings of the International Conference on Music Information Retrieval, pp. 525–530, 2004.
[5] Hillewaere R., Manderick B., Conklin D., String methods for folk tune genre classification. 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.
[6] Bohak C. and Marolt M., Finding repeating stanzas in folk songs. 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.
[7] de Cheveigné A. and Kawahara H., YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917–1930, 2002.