The Greek Audio Dataset
Dimos Makris, Katia Lida Kermanidis, and Ioannis Karydis
Dept. of Informatics, Ionian University
Music Information Retrieval
- Musical data
  - acoustic, i.e. sound recordings
  - symbolic, i.e. sheet music
  - information associated with the musical content: metadata, social tags
- Datasets are required for
  - testing the efficiency & effectiveness of methods
  - comparing against existing methods to show improvement
Nature of music
- Highly artistic
- Local music: ornamentation, personal expression during performance, adaptation
- Local music differs markedly from mainstream pop: different instruments & rhythms
- Results of applying MIR methods are not always intuitive
- MIR methods therefore need to be tested on all kinds of music
Intellectual property
- Existing datasets are encumbered by intellectual property issues
The Greek Audio Dataset
- A freely-available collection of Greek musical data for MIR purposes
- Each song provides (see the sketch below):
  - audio features, ready for immediate use in MIR tasks
  - the song's lyrics
  - manually annotated mood & genre labels
  - a YouTube link for further feature extraction
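As a concrete illustration of what one entry bundles together, here is a minimal Python sketch of a per-song record; the class and field names are assumptions made for this example and do not reflect the actual GAD file schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GadSong:
    """Illustrative container for one Greek Audio Dataset entry.

    Field names are assumptions for this sketch; the published GAD files
    store the same kinds of information in HDF5/CSV form.
    """
    artist: str
    title: str
    genre: str                     # one of the 8 Greek-oriented genre labels
    mood: str                      # Thayer-model mood annotation
    youtube_url: str               # link kept for further feature extraction
    lyrics: str                    # full lyric text
    audio_features: Dict[str, List[float]] = field(default_factory=dict)

# Hypothetical entry; all values are invented for illustration only.
example = GadSong(
    artist="Example Artist",
    title="Example Track",
    genre="Entexno",
    mood="positive valence / low arousal",
    youtube_url="https://www.youtube.com/watch?v=XXXXXXXXXXX",
    lyrics="...",
    audio_features={"MFCC": [0.0] * 13},
)
```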
Greek music
- Greek musical tradition & Greek contemporary music: diverse and celebrated
- Greek contemporary music
- Greek traditional music
  - Byzantine music
  - Greek traditional (folk) music
    - a combination of songs, tempos and rhythms from a litany of Greek regions
    - the basis for the modern Greek traditional music scene
Dataset creation process
- Selection of the music tracks, aiming to make the set balanced
  - sourced from personal CD collections
- Audio feature extraction with jAudio
- Lyrics gathered from various sources
- YouTube link selection criteria: number of views, number of responses, best audio quality, audio similarity to the CD version (a scoring sketch follows below)
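The link-selection criteria can be read as a simple ranking problem. The sketch below is a hypothetical illustration of that idea only: candidate links are scored on the listed criteria and the best one is kept. The candidate fields and the weights are assumptions, not part of the dataset's published methodology.

```python
from typing import Dict, List

def pick_youtube_link(candidates: List[Dict]) -> Dict:
    """Pick the candidate video that best matches the stated criteria:
    many views, many responses, good audio quality, and high audio
    similarity to the CD recording.  Weights are illustrative only."""
    def score(c: Dict) -> float:
        return (0.2 * c["views_norm"]        # view count, normalised to [0, 1]
                + 0.1 * c["responses_norm"]  # response/comment count, normalised
                + 0.3 * c["audio_quality"]   # e.g. bitrate-based score in [0, 1]
                + 0.4 * c["cd_similarity"])  # acoustic similarity to the CD version
    return max(candidates, key=score)

# Hypothetical candidates for one song
links = [
    {"url": "https://www.youtube.com/watch?v=aaa", "views_norm": 0.9,
     "responses_norm": 0.8, "audio_quality": 0.5, "cd_similarity": 0.6},
    {"url": "https://www.youtube.com/watch?v=bbb", "views_norm": 0.4,
     "responses_norm": 0.3, "audio_quality": 0.9, "cd_similarity": 0.95},
]
print(pick_youtube_link(links)["url"])  # the second link wins on quality/similarity
```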
Genre classification
- Tags oriented to Greek musical culture: Rembetiko, Laiko, Entexno, Modern Laiko, Rock, Hip-Hop/R&B, Pop, Alternative
- Genre assignment via listening tests per song

  Class          # of tracks
  Rembetiko      65
  Laiko          186
  Entexno        195
  Modern Laiko   175
  Rock           195
  Hip-Hop/R&B    60
  Pop            63
  Alternative    61
Mood classification
- 5 annotators per song
- Thayer model: 16 mood taxonomies
- Valence & arousal split the 2-dimensional emotive plane into 4 parts via positive/high and negative/low values respectively (see the sketch below)
- Arousal: linked to energy; moods range from "angry" & "exciting" to "tired" & "serene"
- Valence: linked to tension; moods range from "sad" & "upset" to "happy" & "content"
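A minimal sketch of how the Thayer valence/arousal plane maps to the four quadrants described above; the quadrant descriptions summarise the slide's example moods, and the numeric value ranges are an assumption for this example.

```python
def thayer_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair to one of the four Thayer quadrants.

    Positive/negative values split the 2-D emotive plane into four parts:
    arousal relates to energy, valence to tension/pleasantness.
    """
    if arousal >= 0 and valence >= 0:
        return "high energy, positive (e.g. exciting, happy)"
    if arousal >= 0 and valence < 0:
        return "high energy, negative (e.g. angry, upset)"
    if arousal < 0 and valence >= 0:
        return "low energy, positive (e.g. serene, content)"
    return "low energy, negative (e.g. tired, sad)"

print(thayer_quadrant(0.7, 0.8))    # high energy, positive
print(thayer_quadrant(-0.6, -0.4))  # low energy, negative
```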
GAD content
- 1000 songs from a total of 277 unique artists
- For each song: its lyrics and a YouTube link
- The accumulated lyrics contain:
  - 32024 lines
  - 143003 words
  - 1397244 characters
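Aggregate lyric statistics of this kind can be computed in a few lines; the sketch below is a hypothetical reproduction that assumes the lyrics are stored as one plain-text file per song under a placeholder directory name.

```python
import glob

lines = words = chars = 0
# Hypothetical layout: one UTF-8 text file of lyrics per song.
for path in glob.glob("gad_lyrics/*.txt"):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    lines += len(text.splitlines())
    words += len(text.split())
    chars += len(text)

print(f"{lines} lines, {words} words, {chars} characters")
# The GAD totals reported above: 32024 lines, 143003 words, 1397244 characters.
```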
GAD availability
- http://di.ionio.gr/hilab/gad
- Two formats: HDF5 & CSV
  - HDF5: efficient for handling the heterogeneous types of information
    - audio features with variable array lengths, names as strings
    - easy to add new types of features
  - CSV: compatible with Weka, RapidMiner & similar data mining platforms
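A minimal loading sketch for the two distribution formats, assuming the h5py and pandas libraries are available; the file names and the HDF5 group/field names used here are illustrative guesses, not the documented GAD layout.

```python
import h5py
import pandas as pd

# CSV: a flat feature table, directly usable in Weka/RapidMiner-style workflows.
# File name and column layout are placeholders.
df = pd.read_csv("gad_features.csv")
print(df.shape)  # (number of songs, number of feature columns)

# HDF5: suited to heterogeneous data (variable-length feature arrays, strings).
# File name and "one group per song" layout are assumptions for this sketch.
with h5py.File("gad.h5", "r") as h5:
    first_song = list(h5.keys())[0]
    grp = h5[first_song]
    print(dict(grp.attrs))       # e.g. artist, title, genre, mood (assumed attributes)
    if "MFCC" in grp:
        print(grp["MFCC"][:10])  # a variable-length feature array (assumed name)
```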
Acoustic features
- Based on timbre, rhythm & pitch; includes derived features (application of meta-features to primary features)
- Timbral texture features: used to differentiate mixtures of sounds based on their instrumental composition
  - FFT, MFCC, Spectrum, Method of Moments (MoM)
- Rhythm features: used to characterize the regularity of rhythm, beat and tempo
  - Beat, Freq., Beat Histogram
- Pitch content features: describe the distribution of pitches
  - Linear Predictive Coding (LPC)
- (a librosa-based sketch of comparable features follows below)
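The GAD features were extracted with jAudio (a Java tool); the hedged Python sketch below uses librosa to compute roughly comparable descriptors from the three categories above, not the exact jAudio feature set. The file path is a placeholder, and the LPC order is an arbitrary illustrative choice.

```python
import librosa
import numpy as np

# Placeholder path; any audio file readable by librosa works.
y, sr = librosa.load("song.wav")

# Timbral texture: MFCCs plus spectral centroid/rolloff (cf. the next slide's
# figure), with mean/std as simple derived (meta) features.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
timbral = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                          [centroid.mean(), rolloff.mean()]])

# Rhythm: global tempo estimate and beat positions.
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)

# Pitch-related content: LPC coefficients on a one-second excerpt.
lpc = librosa.lpc(y[:sr], order=8)

print(timbral.shape, float(np.atleast_1d(tempo)[0]), lpc.shape)
```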
Spectral Centroid & Rolloff compared for the genres Rock and Entexno (figure)
Future directions of the dataset
- Inclusion of user-generated tags from tagging games or web services
- Increase of the number of mood and genre labels (more annotators)
- Expansion of the number of songs to include more & latest top-chart songs
- Refinement of genres: addition of detailed labels with descriptions
- Content balancing in terms of moods and/or genres
- Inclusion of scores
- Development of programming-language wrappers
The Greek Audio Dataset
Thank you for your attention