Data Mining and Text Analytics in Music Audi Sugianto and Nicholas Tawonezvi.



Overview
• Introduction
• Building a ground truth set
• Experiments
• Results

Introduction
• Purpose: music mood classification through lyric text mining approaches
• MIR (Music Information Retrieval)
• Use of audio datasets:
    • AMC (Audio Mood Classification)
    • USPOP, USCRAP, etc.
• Use of social tags from last.fm
Challenges:
• Natural subjectivity of music
• Human perspectives on mood

Generating Ground Truth: Data Collection
• Combination of in-house and public audio tracks
• Collect songs with at least one social tag from last.fm
• Lyrics gathered mainly from Lyricwiki.org
• Use of Lingua to ensure data quality
• Finalise songs that have both correct lyrics and tags
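The collection rules above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Song` record, `looks_english` and `select_songs` are hypothetical names, and the stop-word heuristic is only a stand-in for a statistical language identifier such as Lingua. That the language check targets English is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    title: str
    tags: list = field(default_factory=list)  # social tags, e.g. from last.fm
    lyrics: str = ""                           # empty when no lyrics were found


# Tiny stop-word list used by the stand-in language check (assumption).
COMMON_ENGLISH = {"the", "a", "and", "you", "i", "to", "of", "in", "my", "me", "it", "is"}


def looks_english(text, min_ratio=0.2):
    """Crude stand-in for a language identifier: share of common English words."""
    words = text.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w in COMMON_ENGLISH)
    return hits / len(words) >= min_ratio


def select_songs(songs):
    """Keep songs that have at least one tag and usable lyrics."""
    return [
        s for s in songs
        if s.tags and s.lyrics.strip() and looks_english(s.lyrics)
    ]
```

A song with no tags, no lyrics, or lyrics that fail the language check is dropped, which mirrors the "both correct lyrics and tags" requirement.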

Generating Ground Truth: Algorithms, Resources and Techniques
• WordNet-Affect
    • Used to filter out junk tags
    • Assignment of labels to concepts (emotions, moods, responses)
• Use of human expertise to identify mood-related words in the music domain
    • Affective aspect
    • Judgemental tags
    • Ambiguous meanings
• Use of WordNet to categorise tags into groups based on synonyms
• Use of music experts to merge groups by musical similarity
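The filtering and synonym-grouping steps above might look roughly like the sketch below. It assumes a plain set of mood words in place of WordNet-Affect and a list of synonym pairs in place of WordNet lookups. The function name `group_tags` and both inputs are hypothetical. The manual expert review and merging steps are not modelled.

```python
def group_tags(tags, mood_words, synonym_pairs):
    """Drop non-mood tags, then merge the remaining tags into synonym groups."""
    # Filter junk tags and remove duplicates while preserving order.
    kept = [t for t in dict.fromkeys(tags) if t in mood_words]

    # Union-find over the kept tags: synonyms end up with the same root.
    parent = {t: t for t in kept}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in synonym_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for t in kept:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())
```

For example, with the tags `happy`, `glad`, `sad`, `gloomy`, `favorite` and `rock`, only the first four survive the mood filter, and the synonym pairs decide which of them share a group.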

Generating Ground Truth: Selecting Songs
Approaches:
• Tag identification
• Lyric counts
• Multi-label classification

Mood Categories and Song Distributions

Experiments: Evaluation Measures and Classifiers
• Use of 10-fold cross-validation
    • Break the data into 10 sets of size n/10
    • Train on 9 sets and test on the remaining 1
    • Repeat 10 times and take the mean accuracy
• Classification with Support Vector Machines (SVM)
    • Algorithms to analyse data and recognise patterns
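The 10-fold procedure above can be sketched in plain Python as follows. The helper names (`k_fold_indices`, `cross_validate`, `majority_trainer`) are illustrative, not from the slides. To keep the example dependency-free, a majority-class baseline is swapped in where the experiments use an SVM; any function that takes training data and returns a `predict` callable could be plugged in instead.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of roughly n/k each."""
    if n < k:
        raise ValueError("need at least k samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return the mean accuracy."""
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict = train_fn(
            [samples[j] for j in train_idx],
            [labels[j] for j in train_idx],
        )
        correct = sum(predict(samples[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_trainer(train_x, train_y):
    """Placeholder classifier (not an SVM): always predicts the most frequent label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda x: most_common
```

Calling `cross_validate(samples, labels, majority_trainer)` gives a baseline accuracy that a real classifier should beat.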

Experiments: Lyric Preprocessing
Facts:
• Repetitions of words and sections
    • Lack of verbatim transcripts
• Consisting of sections:
    • Intro, interlude, verse, etc. in the annotations
• Notes about song and instrumentation
Possible solution:
• Identifying and converting repetition and annotation patterns to actual repeated segments
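One way the proposed solution could work is sketched below. The annotation formats are assumptions made for illustration: section headers written as `[Chorus]` or `[Chorus x2]`, and line-level repeats written as a trailing `(x3)`. Real lyric sites use many more conventions, and `expand_lyrics` is a hypothetical helper.

```python
import re

# Assumed formats: "[Chorus]", "[Verse 1]", "[Chorus x2]" and "some line (x3)".
SECTION = re.compile(r"^\[(?P<name>[A-Za-z0-9 ]+?)(?:\s*x(?P<times>\d+))?\]$")
LINE_REPEAT = re.compile(r"^(?P<text>.*?)\s*\(x(?P<times>\d+)\)$")


def expand_lyrics(raw):
    """Replace section references and repeat markers with the repeated lines."""
    sections = {}   # section name -> lines recorded when it first appeared
    output = []
    current = None  # name of the section currently being recorded, if any
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        header = SECTION.match(line)
        if header:
            name = header.group("name").lower()
            times = int(header.group("times") or 1)
            if name in sections:
                # Later reference to a known section: emit its stored lines.
                output.extend(sections[name] * times)
                current = None
            else:
                # First occurrence: start recording (a repeat count here is ignored).
                sections[name] = []
                current = name
            continue
        repeat = LINE_REPEAT.match(line)
        lines = [repeat.group("text")] * int(repeat.group("times")) if repeat else [line]
        output.extend(lines)
        if current is not None:
            sections[current].extend(lines)
    return output
```

The returned list contains only sung lines, with each chorus written out as many times as it is performed, so word counts reflect the actual repetition.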

Experiments: Lyrics Features
• Features common in text classification tasks:
    • Bag-of-words (BOW): collection of unordered words
    • Part-of-Speech (POS): use of Stanford Tagger
    • Function words (the, a, etc.)
• Assigning of values:
    • Frequency
    • Tf-idf weight
    • Normalised frequency
    • Boolean value
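The four value types listed above can be illustrated on bag-of-words features with a short sketch. The function name `bow_features` and the whitespace tokeniser are assumptions. The tf-idf variant used here (raw count times log(N/df)) is one common definition and may differ from the weighting used in the experiments.

```python
import math
from collections import Counter


def bow_features(docs, scheme="frequency"):
    """Return one {term: value} dict per document under the chosen scheme."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)

    features = []
    for c in counts:
        total = sum(c.values())
        if scheme == "frequency":
            vec = dict(c)
        elif scheme == "boolean":
            vec = {t: 1 for t in c}
        elif scheme == "normalised":
            vec = {t: n / total for t, n in c.items()}
        elif scheme == "tfidf":
            vec = {t: n * math.log(n_docs / doc_freq[t]) for t, n in c.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        features.append(vec)
    return features
```

Note that under this tf-idf variant a word occurring in every document gets weight 0, which is why very common words contribute little to the representation.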

Experiments: Stemming
• Stemming: merging words with the same morphological roots
• Snowball Stemmer
• Irregular nouns and verbs as inputs
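The idea of combining a stemmer with irregular-word lists can be sketched as a lookup followed by suffix stripping. Everything here is illustrative: `IRREGULAR` is a tiny hand-made sample rather than the lists used in the experiments, and `toy_stem` is a deliberately naive suffix stripper standing in for the Snowball stemmer, not an implementation of it.

```python
# Hand-made sample of irregular forms mapped to their base form (assumption).
IRREGULAR = {
    "went": "go",
    "gone": "go",
    "sang": "sing",
    "children": "child",
    "feet": "foot",
}


def toy_stem(word):
    """Naive stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(word):
    """Map irregular forms to their base form first, then stem."""
    word = word.lower()
    return toy_stem(IRREGULAR.get(word, word))
```

The lookup step matters because rule-based stemmers cannot relate forms such as "went" and "go", which share no suffix pattern.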

Results
• Text categorisation provides dimensionality and good generalisability
• POS Boolean representation is poorer because of high content of POS types in lyrics
• Content words are more useful in mood classification
10th International Society for Music Information Retrieval Conference (ISMIR 2009)

Acknowledgement
• Hu, X. et al. Lyric Text Mining in Music Mood Classification. International Music Information Retrieval Systems Evaluation Laboratory, University of Illinois at Urbana-Champaign. [Online]. [Accessed 6 December 2013].
• Training and Testing Data Sets. [Online]. [Accessed 5 December 2013].
• Kohavi, Ron (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12): 1137–1143. (Morgan Kaufmann, San Mateo, CA)
• D. Ellis, A. Berenzweig, and B. Whitman: The USPOP2002 Pop Music Data Set.

Software & Additional Resources
• Statistical language identifier
• Irregular verb list
• Irregular noun list
• POS Tagger
• Mood Categories & Song Distributions
• Tests&pid=1087 – Performance identifier