A Game-Based Approach for Collecting Semantic Music Annotations Douglas Turnbull, Rouran Liu, Luke Barrington, Gert Lanckriet Computer Audition Lab UC San Diego ISMIR September 27, 2007
2 Introduction: Automatic audio content analysis helps to organize, search, recommend, retrieve, and describe huge (and growing) music collections. Computer audition systems require significant amounts of high-quality semantic labels for audio content. Collecting this data can be difficult, expensive, slow, boring, and inaccurate. If only we could get someone else to do it...
3 Sources of Semantic Information [Figure: annotation sources plotted on Quality and Quantity axes; sources shown: web mining, ID3 tags]
5 Sources of Semantic Information [Figure: annotation sources plotted on Quality and Quantity axes; sources shown: CAL500, human tags, web mining, ID3 tags]
7 Sources of Semantic Information [Figure: annotation sources plotted on Quality and Quantity axes; sources shown: CAL500, Pandora, Last.fm, web mining, ID3 tags, human computation]
8 Human Computation: Many problems that are hard for computers can be easily solved by humans. Many humans spend lots of time solving problems that are of little use. How can we put these “gray cycles” to use?
9 Music Games: Multi-player (music is social; music can be subjective; use group consensus, but allow personal variations). Collaborative (there are no “right answers”, but agreed-on answers earn more points). Fun (need to excite players in order to collect lots of data; sacrifice data collection in favor of a compelling game).
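The slide says only that agreed-on answers earn more points; it does not give the scoring rule. As a minimal sketch of how consensus scoring could work, assume each player picks one word for the current song and earns points in proportion to the share of other players who picked the same word. The function names, the 100-point scale, and the example players and words are all hypothetical, not taken from the game itself.

```python
from collections import Counter


def score_round(choices, max_points=100):
    """Score one round of a consensus game (hypothetical rule).

    choices maps player name -> the word that player picked for the song.
    Each player earns max_points scaled by the fraction of OTHER players
    who picked the same word, so agreement pays and lone answers score 0.
    """
    counts = Counter(choices.values())
    n_others = len(choices) - 1
    scores = {}
    for player, word in choices.items():
        agreeing = counts[word] - 1  # do not count the player themself
        scores[player] = round(max_points * agreeing / n_others) if n_others > 0 else 0
    return scores


def tally_associations(choices):
    """Count votes per word; these counts are the collected song-word data."""
    return Counter(choices.values())


# Hypothetical round: four players describe the same song clip.
choices = {"ann": "mellow", "bob": "mellow", "cat": "aggressive", "dan": "mellow"}
scores = score_round(choices)
votes = tally_associations(choices)
```

Under this assumed rule no answer is marked wrong: the minority word still counts as a vote in the collected data, it just earns fewer points.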
players have played at least 1 game. 30,000 song-word associations collected. ISMIR deadline.
11 Evaluation: Evaluate the quality of collected semantic annotations by using them to train an automatic music retrieval system [SIGIR07].
Dataset               Retrieval ROC Area
CAL Words             0.705
Listen Words          0.661
AllMusic 317 Words    0.609
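For readers unfamiliar with the retrieval ROC area metric, the sketch below computes the area under the ROC curve for a single query word using the standard pairwise formulation: the probability that a randomly chosen relevant song is ranked above a randomly chosen irrelevant one, with ties counted as half. This is the textbook definition, not the authors' evaluation code; the function name and the example scores and labels are made up for illustration.

```python
def roc_area(scores, relevant):
    """Area under the ROC curve for one query word.

    scores   -- retrieval score per song (higher means ranked earlier)
    relevant -- True/False per song: is the word a correct label for it?
    Returns the fraction of (relevant, irrelevant) song pairs in which the
    relevant song has the higher score; tied pairs count as 0.5.
    """
    pos = [s for s, r in zip(scores, relevant) if r]
    neg = [s for s, r in zip(scores, relevant) if not r]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant song")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: five songs scored for one query word.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
example_relevant = [True, False, True, False, False]
area = roc_area(example_scores, example_relevant)
```

A value of 0.5 corresponds to a ranking no better than chance and 1.0 to a perfect ranking, which gives a scale for reading the figures in the table.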
15 Annotating Music: Web mining (cheap, collects lots of data; but noisy data, not necessarily related to music content). Surveys / hand-labelling, e.g. Music Genome Project, Last.fm tags, CAL500 (reliable, can be tailored to applications; but expensive, slow, boring, unfocused, free vocabulary). Games (engage users, free, offer new, social music interaction; but need lots of players!).