Ionian University Department of Informatics Introducing the Greek Music Dataset Dimos Makris, Ioanis Karydis, and Spyros Sioutas
Music Information Retrieval (MIR) MIR is the interdisciplinary study of retrieving information from music. It involves musicology, psychology, academic music study, signal processing and machine learning. Applications: recommender systems, track separation and instrument recognition, automatic music transcription (MIDI), automatic categorization and music generation.
Why do we need musical data? What is a dataset? A collection of sound recordings, sheet music and lyrics, together with information associated with the musical content (i.e. metadata, social tags, etc.). Why do we need them? Experimenting with methods on real musical data is a central requirement. Datasets allow researchers to compare and contrast their methods by testing them on a commonly available collection of musical data.
Greek Music in MIR MIR requires data for all kinds of music. Although a number of widely used datasets exist, most are collections of mainstream English-language music. Local music differs in numerous ways (different instruments and rhythms) and has unique genres such as “Ρεμπέτικο” (Rembetiko), “Λαϊκό” (Laiko) and “Έντεχνο” (Entexno). The Greek Music Dataset does not start from scratch: it is a continuation and extension of the Greek Audio Dataset [1]. [1] D. Makris, K. Kermanidis, and I. Karydis. The Greek Audio Dataset. In Artificial Intelligence Applications and Innovations, volume 437 of IFIP Advances in Information and Communication Technology, pages 165-173. Springer Berlin Heidelberg, 2014.
Related Work regarding Datasets The construction of a music dataset is a tedious and demanding effort. Existing datasets tend to avoid containing the music data itself and provide only metadata and information (large data, copyrights).
Contribution and Motivation The Greek Music Dataset: 1400 songs. Audio, lyrics & symbolic features for immediate use in MIR tasks. Manually annotated labels pertaining to mood & genre styles of music. Metadata. Manually selected MIDI files (currently available for 500 of the tracks). A manually selected link to a performance / audio content on YouTube is provided for further research.
Greek Music Dataset vs Greek Audio Dataset +400 songs, focused on traditional, unique Greek genres. 500 MIDI files with symbolic feature sets. Manual multi-label annotation of genre tags. Updated audio feature sets. Lyric feature sets. Last.fm ID tags for further extraction.
Gathering the Content Audio: Broad range of Greek music, from traditional to modern. Removed 100 songs and added 500 new songs. Sourced from the best YouTube links (number of views, number of responses, best audio quality). Lyrics: Retrieved from various sources, mainly stixoi.info [2], and matched to the audio performance. Symbolic: MIDI files were collected from the Greek Midi Database [3], then preprocessed and manually checked for precise correspondence with the music and performance. [2] stixoi info: Greek lyrics for songs and poetry, http://www.stixoi.info/ [3] Greek Midi Database: George's Greek MIDI Site, http://www.greekmidi.com/
Genre Annotation Greek genre tags were taken from MyGreek.fm [4]. Tags oriented to Greek musical culture: Rembetiko, Laiko, Entexno, Modern Laiko, Rock, Hip-Hop/R&B, Pop, Alternative. Multi-label assignment, with listening tests per song. 2421 annotations: 521 single-label annotations from the 8 genre classes; 748 double-label annotations from 17 different combinations; 119 triple-label annotations from 15 different combinations; 12 quad-label annotations from 8 different combinations. [4] Mygreek.fm: The biggest collection of Greek music on the Internet, with different styles and genres, http://www.mygreek.fm/
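As a quick consistency check on the figures quoted above, the per-song label counts can be tallied. The breakdown covers all 1400 songs, but it implies 2422 label assignments, one more than the 2421 quoted, so one of the quoted figures is presumably off by one. The dictionary name below is only illustrative.

```python
# Counts as quoted on the slide: labels per song -> number of songs.
songs_by_label_count = {1: 521, 2: 748, 3: 119, 4: 12}

# Every song appears in exactly one row of the breakdown.
total_songs = sum(songs_by_label_count.values())

# A song with k labels contributes k label assignments.
total_labels = sum(k * n for k, n in songs_by_label_count.items())

print(total_songs)   # 1400
print(total_labels)  # 2422
```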
Mood Annotation Single-label annotation, measuring Valence (A-D) & Arousal (1-4). Mood information: the model of Thayer is adopted, a 2-dimensional emotive plane with Valence (tension) and Arousal (energy). “Arousal” is the level/amount of physical response and “Valence” is the emotional “direction” of that emotion.
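With four valence levels (A-D) and four arousal levels (1-4), each song falls into one of 16 cells of the plane. The sketch below enumerates those cells and splits a label into its two coordinates. It assumes labels are written as a letter followed by a digit (for example "B3"); that string format and the names `parse_mood` and `ALL_MOODS` are illustrative assumptions, not something the dataset documentation confirms.

```python
VALENCE_LEVELS = "ABCD"   # valence axis, as on the slide
AROUSAL_LEVELS = "1234"   # arousal axis, as on the slide

def parse_mood(label: str) -> tuple[int, int]:
    """Split a label such as 'B3' into zero-based (valence, arousal) indices."""
    if len(label) != 2:
        raise ValueError(f"expected a letter and a digit, got {label!r}")
    valence, arousal = label[0].upper(), label[1]
    if valence not in VALENCE_LEVELS or arousal not in AROUSAL_LEVELS:
        raise ValueError(f"unknown mood label {label!r}")
    return VALENCE_LEVELS.index(valence), AROUSAL_LEVELS.index(arousal)

# All cells of the 4 x 4 valence/arousal grid.
ALL_MOODS = [v + a for v in VALENCE_LEVELS for a in AROUSAL_LEVELS]

print(len(ALL_MOODS))     # 16
print(parse_mood("B3"))   # (1, 2)
```

Keeping the two axes as separate indices makes it possible to treat mood either as a 16-class problem or as two independent 4-class problems.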
Audio Features Extraction from CD-quality wave files (44.1 kHz, 16 bit) using the Marsyas software. 454 features divided into 4 sets. Timbral Texture Feature Sets: Standard Timbral Set (68 features): the most commonly used feature set (MFCCs, zero crossings, spectral features). Other Timbral Features (264 features): a combination of features that focus on the magnitude spectrum. Rhythm Features: Beat Histogram (18 features): a vector containing the most common rhythmic features (detecting and measuring peaks, bpm, etc.). Pitch (Chroma) Content Features: Chroma Set (104 features): a combination of Chroma and Linear Prediction Cepstral Coefficients (LPCC) features. Mel-frequency cepstral coefficients (MFCCs) are a set of perceptual features that have been widely used in speech recognition. Method of Moments consists of the first five statistical moments of the spectrogram.
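To give a feel for what one of the simpler timbral descriptors measures, here is a minimal sketch of a zero-crossing rate: the fraction of adjacent sample pairs whose signs differ. This is a generic textbook formulation written for illustration; it does not reproduce Marsyas's implementation or its windowing, and the function name is an assumption.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)   # sign change between neighbours
    )
    return crossings / (len(samples) - 1)

# A signal that flips sign at every sample crosses zero on every step.
alternating = [0.5, -0.5, 0.5, -0.5, 0.5]
print(zero_crossing_rate(alternating))  # 1.0

# A signal that stays positive never crosses zero.
rising = [0.1, 0.2, 0.3]
print(zero_crossing_rate(rising))       # 0.0
```

Noisy or percussive frames tend to give higher values than tonal ones, which is why the measure is commonly grouped with timbral features.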
Lyric Features Selection of 5 feature sets based on the bag-of-words (BOW) model from Greek song lyrics. The most popular BOW features are various unigram, bigram, and trigram representations. Metrics: GMD includes TF (term frequency) and TF-IDF term weighting. 1. A unigram set of the top 250 most frequent words, including “function words”. 2. A unigram set of the top 60 most frequent words, excluding function words. 3. A bigram set of the top 100 most frequent bigrams. 4. A trigram set of the top 60 most frequent trigrams. 5. A unigram set of the top 60 most frequent function words.
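The bag-of-words sets above can be sketched as two steps: pick the k most frequent n-grams across all lyrics, then weight each song's counts by TF-IDF. The code below is a minimal illustration under stated assumptions: lyrics are already tokenized, IDF is the plain log(N / df) variant (the slides do not say which variant GMD uses), function-word filtering is left out, and the toy English tokens and function names are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in one tokenized lyric."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def top_terms(docs, k, n=1):
    """Return the k most frequent n-grams across all lyrics."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngram_counts(tokens, n))
    return [term for term, _ in counts.most_common(k)]

def tf_idf(docs, vocab, n=1):
    """One TF-IDF vector per lyric, restricted to the given vocabulary."""
    grams = [ngram_counts(tokens, n) for tokens in docs]
    # Document frequency: in how many lyrics each term occurs.
    df = {term: sum(1 for g in grams if term in g) for term in vocab}
    vectors = []
    for g in grams:
        total = sum(g.values()) or 1   # avoid division by zero for empty lyrics
        vectors.append([
            (g[term] / total) * math.log(len(docs) / df[term]) if df[term] else 0.0
            for term in vocab
        ])
    return vectors

# Toy tokenized lyrics (hypothetical data).
docs = [["night", "moon", "night"], ["sea", "moon"], ["night"]]
vocab = top_terms(docs, k=2)
vectors = tf_idf(docs, vocab)
print(vocab)  # [('night',), ('moon',)]
```

Passing `n=2` or `n=3` to both functions gives the bigram and trigram variants with the same code path.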
Symbolic Features High-level features that emphasize musical characteristics. Examples: instruments present, melodic contour, chord frequencies and rhythmic density. More powerful than audio features, but rarely used due to the lack of existing symbolic datasets. Feature extraction was done with Music21. 2 different feature sets. jSymbolic Set (78 features): includes features regarding instrumentation, rhythm, dynamics (loudness) and chords, and detects melodic variations or patterns. Native Music21 Set (17 features): a specialized and very high-level feature set that requires a high level of knowledge of musical harmony.
Available Data + Metadata For 621 of its tracks, the GMD additionally includes the equivalent Last.fm ID, to facilitate collecting further information from Last.fm (e.g. social tags). The IDs were collected manually. GMD offers YouTube links, lyrics and MIDI files for further feature extraction.
Dataset format The data is available in two formats, HDF5 and CSV. HDF5: Efficient for handling heterogeneous types of information, such as audio features in arrays of variable length and names as strings, and easy to extend with new types of features. Follows the Million Song Dataset (MSD) structure. CSV: Compatible with Weka, RapidMiner and other similar data mining platforms. GMD provides the audio feature sets commonly used in MIR in separate CSV files. Available for download from the webpage of the Informatics in Humanistic and Social Sciences Lab: http://di.ionio.gr/hilab/gmd
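For readers who prefer scripting over Weka or RapidMiner, a CSV feature file can be read with the Python standard library alone. The sketch below is only an illustration: the column names (`track_id`, `feat_1`, `feat_2`, `genre`) and the sample rows are hypothetical, and the actual layout of the GMD files may differ.

```python
import csv
import io

# Hypothetical excerpt; the real GMD column names and layout may differ.
sample = io.StringIO(
    "track_id,feat_1,feat_2,genre\n"
    "t001,0.12,3.40,Rembetiko\n"
    "t002,0.56,1.25,Laiko\n"
)

def load_features(handle, id_col="track_id", label_col="genre"):
    """Read a feature CSV into {track_id: (feature_vector, label)}."""
    rows = {}
    for row in csv.DictReader(handle):
        track_id = row.pop(id_col)
        label = row.pop(label_col)
        # Every remaining column is treated as a numeric feature.
        rows[track_id] = ([float(value) for value in row.values()], label)
    return rows

data = load_features(sample)
print(data["t001"])  # ([0.12, 3.4], 'Rembetiko')
```

For a file on disk, the same function can be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.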
Future Directions Addition of the remaining tracks' symbolic information. MIDI and audio alignment. Incorporation of contextual information for each track from social networks. Addition of Last.fm ID tags (or similar) for further social tag extraction. Experimentation on data mining tasks using the dataset.
The Greek Music Dataset Thank you for your attention!