Deriving Topics and Opinions from Microblogs Feng Jiang Supervisors: Jixue Liu & Jiuyong Li.

Presentation transcript:

Deriving Topics and Opinions from Microblogs Feng Jiang Supervisors: Jixue Liu & Jiuyong Li

Contents
- Background of research
- Significance of research
- Problems and challenges
- Main tasks
- Literature review
- Methodology
- Improvement and innovation
- Experiment
- Result

Background
Microblogs: Twitter
- Twitter allows users to post short messages (at most 140 characters), called “tweets”, to communicate with each other.
- It is an information platform that lets people publish, spread and share information, knowledge and personal viewpoints.
- Publishing is easy and convenient: authors post from laptops and smartphones, so useless posts appear alongside good ones.

Significance
- Find useful information
  - Extract hot topics
  - Extract opinions
- Save time and energy
  - Readers can quickly grasp the content without reading every tweet.
  - Readers can quickly find how opinions on a hot topic are classified.
- Seek and track important events
- Identify fashion trends
- Find popular products

Problems and challenges
- It is very hard for individuals to find interesting and popular content manually because of the sheer number of posts.
- Existing web and text mining methods cannot be applied directly to extract hot topics and opinions from microblogs, because microblogs have unique characteristics.

Problems and challenges
- Mass data: at the end of 2009, Twitter had 75 million account holders, of which about 20% were active. There are approximately 2.5 million Twitter posts per day. While the majority of posts are conversational or not very meaningful, about 3.6% of the posts concern topics of mainstream news.

Problems and challenges
- Semi-structured and unstructured data: there are no restrictions or rules on the content and style of microblog posts.
- A great variety of topics and views: an author may discuss popular movies in one part of a post and then give opinions on sports events in the next, which makes the topic of a single tweet unclear.

Main tasks
- Topic extraction: generate a complete and meaningful sentence that summarizes a popular current event (e.g. the London Olympics) from the relevant microblog posts.

Main tasks
- Sentiment analysis: find from the comments who supports the topic and who opposes it.

Literature review M. Chau, et al., "A blog mining framework," IT Professional, vol. 11, pp , 2009.

Literature review M. Hutton, et al., "Summarizing microblogs automatically," presented at the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, California, 2010.

Literature review B. Sharifi, et al., "Experiments in Microblog Summarization," in Social Computing (SocialCom), 2010 IEEE Second International Conference on, 2010, pp

Methodology

Methodology
1 Text pre-processing
- Part-of-speech (POS) tagging
- Feature filtering
- Stop word list: and, or, of
- Word stemming: wants, wanted -> want
- Synonyms and antonyms
- Hypernyms and hyponyms: love -> emotion
- TF-IDF: term frequency * inverse document frequency
- Vector space model
- Similarity analysis
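The TF-IDF, vector space model and similarity steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's implementation. The stop word list is a tiny hand-picked one, stemming and WordNet lookups are left out, and the sample tweets are shortened versions of the experiment input. The names `tokenize`, `tf_idf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop word list (assumption; a real list is much longer).
STOP_WORDS = {"and", "or", "of", "the", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip edge punctuation, drop stop words."""
    words = (w.strip(".,!?-'\"").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, with weight = tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Shortened sample tweets (adapted from the experiment input for illustration).
tweets = [
    "Germany defeats Aussies beach volleyball pair in three sets",
    "Aussies take it to a deciding set in the beach volleyball against Germany",
    "Australian team lost the men's water polo to Italy",
]
vectors = tf_idf_vectors(tweets)
sim_volleyball = cosine(vectors[0], vectors[1])  # the two volleyball tweets
sim_unrelated = cosine(vectors[0], vectors[2])   # volleyball vs. water polo
```

The two volleyball tweets share several terms and get a positive similarity, while the water polo tweet shares none with the first tweet. Because no stemming is applied here, "sets" and "set" count as different terms, which is the kind of mismatch the stemming step is meant to remove.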

Methodology
2 Detect topics: clustering methods
- K-means clustering
- SOM (self-organizing map) clustering
- WordNet-based clustering
3 Detect opinions
- Bayesian classification
- SVM (support vector machine)
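For the opinion detection step, a multinomial Naive Bayes classifier with add-one smoothing is one common form of Bayesian classification. The sketch below assumes that variant; the slides do not say which one was used. The training posts and the `train` and `classify` helpers are invented for illustration, and the labels "support" and "oppose" follow the wording of the sentiment analysis task.

```python
import math
from collections import Counter, defaultdict


def train(labeled_posts):
    """Count words per class; labeled_posts is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_posts:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_posts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_posts)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, made up for this example.
training = [
    ("great win for the team", "support"),
    ("proud of this great result", "support"),
    ("terrible loss for the team", "oppose"),
    ("this result is a terrible shame", "oppose"),
]
model = train(training)
prediction = classify("a great result", model)
```

With this toy data, "a great result" is assigned to the "support" class because "great" appears only in supportive posts. A real system would train on far more labeled posts and would reuse the pre-processing steps (stop words, stemming) before counting.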

Improvement and innovation
- Use WordNet to improve clustering, assign weights to words and generate the topic sentence.
- WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
- For example, suppose the weight of “defeat” is 5 and the weight of “overcome” is 3. They are in the same synset, so their weights are combined and the weight of “defeat” becomes 8.
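The weight-merging example can be reproduced with a short Python sketch. It assumes a small hand-written synset table instead of a real WordNet lookup, and it assumes that the combined weight is kept on the highest-weighted word while the other synonyms are dropped; the slides only state the resulting weight. `SYNSETS` and `merge_synonym_weights` are hypothetical names.

```python
# Hypothetical hand-written synsets; a real system would query WordNet instead.
SYNSETS = [
    {"defeat", "overcome"},
    {"win", "triumph"},
]


def merge_synonym_weights(weights, synsets):
    """Fold the weights of words in the same synset onto one representative."""
    merged = dict(weights)
    for synset in synsets:
        present = [w for w in weights if w in synset]
        if len(present) < 2:
            continue  # nothing to merge for this synset
        # Representative: highest weight, ties broken alphabetically.
        representative = max(sorted(present), key=lambda w: weights[w])
        merged[representative] = sum(weights[w] for w in present)
        for w in present:
            if w != representative:
                del merged[w]
    return merged


weights = {"defeat": 5, "overcome": 3, "germany": 4}
merged = merge_synonym_weights(weights, SYNSETS)
# merged is {"defeat": 8, "germany": 4}
```

Merging in this way lets two tweets that say "defeats" and "overcomes" reinforce the same term instead of splitting the count between two words.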

Improvement and innovation
- Use a clustering method (WordNet-based clustering) to cluster the tweets before detecting hot topics and opinions.
- Other work only calculates word frequency.

Improvement and innovation
Consider related factors
- Word frequency
- Post occurrence time
- Author: whether the author is a celebrity or has many followers
- Users' discrete degree: describes the level of dispersion in the distribution of users who publish or forward posts
- Keywords: some words in Twitter are marked with a hashtag, e.g. #Happy Sweetest Day, #beijing, #Alex Cross

Improvement and innovation
Grammar analysis
- Noun: not changed.
- Verb: word stemming.
- Adjective and adverb: word stemming, then analysed and processed with WordNet.
- Synonyms and antonyms
- Hypernyms and hyponyms, for example the hypernym chain of “love”: entity -> abstract entity -> abstraction -> attribute -> state -> feeling -> emotion -> love
- Create a subject set, a verb set and an object set to generate a simple sentence for the topic.

Improvement and innovation
3-layer tree structure
- The first layer is the subject set, the second layer is the verb set and the last layer is the object set.
- The subject set, verb set and object set are used to generate a simple sentence for the topic.
- The basic sentence unit is SUBJECT plus VERB, or SUBJECT plus VERB plus OBJECT. The subject names what the sentence is about, the verb tells what the subject does or is, and the object receives the action of the verb.
- Although many other structures can be added to this basic unit, the pattern of SUBJECT plus VERB (or SUBJECT plus VERB plus OBJECT) can be found in even the longest and most complicated structures.
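One way to picture the 3-layer tree is as nested counters: subjects at the top, verbs under each subject, objects under each verb. The sketch below assumes that (subject, verb, object) triples have already been extracted and normalised by the earlier steps, and that the topic sentence follows the most frequent branch at each layer. Both are assumptions for illustration, since the slides do not give the selection rule. The triples are written by hand, loosely based on the experiment input.

```python
from collections import Counter, defaultdict


def build_tree(triples):
    """Layer 1: subjects; layer 2: verbs per subject; layer 3: objects per verb."""
    tree = defaultdict(lambda: defaultdict(Counter))
    for subject, verb, obj in triples:
        tree[subject][verb][obj] += 1
    return tree


def topic_sentence(tree):
    """Follow the most frequent branch at each layer (ties: alphabetical)."""

    def subject_weight(s):
        return sum(sum(objs.values()) for objs in tree[s].values())

    subject = max(sorted(tree), key=subject_weight)
    verbs = tree[subject]
    verb = max(sorted(verbs), key=lambda v: sum(verbs[v].values()))
    objects = verbs[verb]
    obj = max(sorted(objects), key=lambda o: objects[o])
    # An empty object gives the SUBJECT plus VERB pattern.
    return " ".join(part for part in (subject, verb, obj) if part)


# Hypothetical, hand-normalised triples (verbs already stemmed and merged).
triples = [
    ("Germany", "defeat", "Aussies"),
    ("Germany", "defeat", "Aussies"),
    ("Australian team", "lose", "water polo"),
    ("Germany", "defeat", "Palmer and Bawden"),
]
sentence = topic_sentence(build_tree(triples))
# sentence is "Germany defeat Aussies"
```

The output keeps the stemmed verb, so a final step would still be needed to inflect it into a grammatical sentence such as "Germany defeats Aussies".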

Improvement and innovation

Experiment
Input:
- Australian Olympic shooters have had a tough morning. They lost - Dina Aspandiyarova finished 14th and Lalita Yauhleuskaya was 40th
- Germany defeats Aussies beach volleyball pair Bec Palmer and Louise Bawden in three sets
- Germany overcomes Aussies beach volleyball pair Bec Palmer and Louise Bawden in August.
- Aussies Palmer and Bawden take it to a deciding set in the beach volleyball against Germany
- Australian team lost the men's water polo to Italy 8-5. The Sharks play Kazakhstan next on Tuesday.
- They lost the men's water polo to Italy. They came back last night.

Experiment Result

Questions