Published by Lambert Glenn. Modified over 9 years ago.
1
Deriving Topics and Opinions from Microblogs
Feng Jiang
Supervisors: Jixue Liu & Jiuyong Li
2
Contents
- Background of research
- Significance of research
- Problems and challenges
- Main tasks
- Literature review
- Methodology
- Improvement and innovation
- Experiment
- Result
3
Background
Microblogs: Twitter
- Twitter lets users post short messages of at most 140 characters, called “tweets”, to communicate with each other.
- It is an information platform on which people publish, spread and share information, knowledge and personal viewpoints.
- Publishing is easy and convenient: authors post from laptops and smartphones, so low-value posts appear alongside good ones.
4
Significance
Find useful information
- Extract hot topics
- Extract opinions
Save time and energy
- Readers do not have to read every tweet to learn what is being said.
- They can quickly see how opinions on a hot topic are classified.
Seek and track important events
- Identify fashion trends
- Find popular products
5
Problems and challenges
- The sheer number of posts makes it very hard for individuals to find interesting and popular items manually.
- Existing web and text mining methods cannot be applied directly to extract hot topics and opinions from microblogs, because microblogs have unique characteristics.
6
Problems and challenges
Mass data
- At the end of 2009, Twitter had 75 million account holders, of which about 20% were active.
- There were approximately 2.5 million Twitter posts per day.
- The majority of posts are conversational or not very meaningful; about 3.6% concern topics of mainstream news.
7
Problems and challenges
Semi-structured and unstructured data
- There are no restrictions or rules on the content and style of microblog posts.
A great variety of topics and views
- An author may discuss popular movies in one passage and then give opinions on sports events in the next, all within one post, so the topic of a single tweet is often unclear.
8
Main tasks
Topic extraction
- Generate a complete and meaningful sentence that summarises a popular current event (e.g. the 2012 London Olympics) from the relevant posts.
9
Main tasks
Sentiment analysis
- From the comments, find who supports the topic and who opposes it.
10
Literature review
M. Chau, et al., "A blog mining framework," IT Professional, vol. 11, pp. 36-41, 2009.
11
Literature review
M. Hutton, et al., "Summarizing microblogs automatically," presented at the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, California, 2010.
12
Literature review
B. Sharifi, et al., "Experiments in Microblog Summarization," in Social Computing (SocialCom), 2010 IEEE Second International Conference on, 2010, pp. 49-56.
13
Methodology
14
Methodology
1 Text pre-processing
- Part-of-speech (POS) tagging
- Feature filtering
- Stop-word list: and, or, of
- Word stemming: wants, wanted -> want
- Synonyms and antonyms
- Hypernyms and hyponyms: love -> emotion
- TF-IDF: term frequency * inverse document frequency
- Vector space model
- Similarity analysis
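The stop-word, stemming, TF-IDF and similarity steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the presenter's implementation: `toy_stem` is a crude stand-in for a real stemmer, the stop-word list extends the three words on the slide with a few assumed extras, and the sample tweets are shortened from the experiment input.

```python
import math
import re
from collections import Counter

# The slide names "and", "or", "of"; the remaining entries are assumed extras.
STOP_WORDS = {"and", "or", "of", "the", "a", "to", "in"}


def toy_stem(word):
    """Crude suffix stripping for illustration only (not a real stemmer)."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lower-case, tokenise, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf_vectors(docs):
    """Return one sparse {term: tf * idf} vector per document."""
    tokenised = [preprocess(d) for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors in the vector space model."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


tweets = [
    "Germany defeats Aussies in three sets",
    "Germany defeated the Aussies in beach volleyball",
    "Australian team lost the water polo to Italy",
]
vectors = tf_idf_vectors(tweets)
print(cosine(vectors[0], vectors[1]))  # tweets sharing stemmed terms
print(cosine(vectors[0], vectors[2]))  # tweets with no terms in common
```

The first two tweets share the stemmed terms "germany", "defeat" and "aussie", so they score higher than the unrelated pair; a similarity of this kind is what a clustering step would consume.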
15
Methodology
2 Detect topics: clustering methods
- K-means clustering
- SOM clustering
- WordNet-based clustering
3 Detect opinions
- Bayesian classification
- SVM (support vector machine)
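As one illustration of the Bayesian classification option for opinion detection, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing. The slides do not say which Bayesian variant was used, so this choice is an assumption, and the training sentences and the "support"/"oppose" labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocabulary = set()

    def fit(self, documents, labels):
        for tokens, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data, for illustration only.
train_docs = [
    "great win for the team".split(),
    "love this brilliant result".split(),
    "terrible loss for the team".split(),
    "hate this awful result".split(),
]
train_labels = ["support", "support", "oppose", "oppose"]

classifier = NaiveBayes()
classifier.fit(train_docs, train_labels)
print(classifier.predict("brilliant win".split()))
print(classifier.predict("awful loss".split()))
```

In practice the tokens would come from the pre-processing step rather than a plain `split()`, and the training set would need to be far larger than four sentences.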
16
Improvement and innovation
Use WordNet to improve clustering, assign weights to words and generate the topic sentence.
- WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
- For example: suppose the weight of “defeat” is 5 and the weight of “overcome” is 3. They are in the same synset, so their weights are combined and the weight of “defeat” becomes 8.
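The weight-merging example above can be written as a few lines of Python. This is a sketch under assumptions: `TOY_SYNSETS` is a hand-written stand-in for WordNet synsets rather than data queried from WordNet, and folding the combined weight onto the heaviest member (and dropping the other members) is one possible reading of the slide.

```python
# Hand-written stand-in for WordNet synsets (hypothetical data).
TOY_SYNSETS = [
    {"defeat", "overcome"},
    {"lose", "drop"},
]


def merge_synonym_weights(weights, synsets):
    """Fold the weights of words that share a synset onto the heaviest member."""
    merged = dict(weights)
    for synset in synsets:
        present = [word for word in weights if word in synset]
        if len(present) < 2:
            continue  # nothing to merge for this synset
        # Heaviest word wins; ties are broken alphabetically.
        representative = min(present, key=lambda w: (-weights[w], w))
        merged[representative] = sum(weights[w] for w in present)
        for word in present:
            if word != representative:
                del merged[word]
    return merged


weights = {"defeat": 5, "overcome": 3, "volleyball": 4}
print(merge_synonym_weights(weights, TOY_SYNSETS))
# {'defeat': 8, 'volleyball': 4}
```

Merging before ranking means a topic expressed with several synonyms is not split into several weaker candidates.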
17
Improvement and innovation
Cluster the tweets before detecting hot topics and opinions
- WordNet-based clustering
- Other work only calculates word frequency
18
Improvement and innovation
Consider related factors
- Word frequency
- Post occurrence time
- Author: a celebrity or a user with many followers
- Users' discrete degree: describes how widely distributed the users who release or forward a post are
- Keywords: some words on Twitter are marked with a hashtag, e.g. #Happy Sweetest Day, #beijing, #Alex Cross
19
Improvement and innovation
Grammar analysis
- Noun: not changed.
- Verb: word stemming.
- Adjective and adverb: word stemming, then analysed and processed with WordNet.
Synonyms and antonyms
Hypernyms and hyponyms, for example the chain for “love”:
entity -> abstract entity -> abstraction -> attribute -> state -> feeling -> emotion -> love
Create a subject set, a verb set and an object set to generate a simple sentence for the topic.
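The hypernym chain above can be walked with a simple parent lookup. In this sketch the `HYPERNYM_OF` table is copied by hand from the chain on the slide rather than loaded from WordNet, and the helper names `hypernym_path` and `generalise` are illustrative.

```python
# Child -> parent links copied from the chain on the slide (not loaded from WordNet).
HYPERNYM_OF = {
    "love": "emotion",
    "emotion": "feeling",
    "feeling": "state",
    "state": "attribute",
    "attribute": "abstraction",
    "abstraction": "abstract entity",
    "abstract entity": "entity",
}


def hypernym_path(word, parents):
    """Return the chain from the most general concept down to `word`."""
    path = [word]
    seen = {word}
    while path[-1] in parents:
        parent = parents[path[-1]]
        if parent in seen:  # guard against cycles in hand-written data
            break
        path.append(parent)
        seen.add(parent)
    return list(reversed(path))


def generalise(word, parents, levels=1):
    """Replace `word` by its hypernym `levels` steps up, stopping at the root."""
    for _ in range(levels):
        word = parents.get(word, word)
    return word


print(" -> ".join(hypernym_path("love", HYPERNYM_OF)))
print(generalise("love", HYPERNYM_OF))  # emotion
```

Generalising words one or two levels up lets posts that use different specific words fall under the same broader concept.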
20
Improvement and innovation
3-layer tree structure
- The first layer is the subject set, the second layer is the verb set, and the last layer is the object set.
- Create a subject set, a verb set and an object set to generate a simple sentence for the topic.
- The basic sentence unit is SUBJECT plus VERB, or SUBJECT plus VERB plus OBJECT.
- The subject names what the sentence is about, the verb tells what the subject does or is, and the object receives the action of the verb.
- Although many other structures can be added to this basic unit, the pattern of SUBJECT plus VERB (or SUBJECT plus VERB plus OBJECT) can be found in even the longest and most complicated structures.
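One way to picture the 3-layer tree is as nested counts, subject -> verb -> object, from which the most frequent branch is read off as the topic sentence. The slides do not specify how the branch is chosen, so the greedy most-frequent selection here is an assumption, and the triples are hand-written examples (already stemmed, with "overcome" merged into "defeat") rather than output of a real parser.

```python
from collections import defaultdict


def build_tree(triples):
    """Nest counts as subject -> verb -> object."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for subject, verb, obj in triples:
        tree[subject][verb][obj] += 1
    return tree


def subtree_count(node):
    """Total number of triples stored under a node."""
    if isinstance(node, int):
        return node
    return sum(subtree_count(child) for child in node.values())


def topic_sentence(tree):
    """Follow the most frequent branch at each layer (ties: alphabetical)."""
    subject = max(sorted(tree), key=lambda s: subtree_count(tree[s]))
    verbs = tree[subject]
    verb = max(sorted(verbs), key=lambda v: subtree_count(verbs[v]))
    objects = verbs[verb]
    obj = max(sorted(objects), key=lambda o: objects[o])
    return f"{subject} {verb} {obj}"


# Hypothetical (subject, verb, object) triples, written by hand for this sketch.
triples = [
    ("Germany", "defeat", "Aussies"),
    ("Germany", "defeat", "Aussies"),
    ("Aussies", "take", "deciding set"),
    ("Australian team", "lose", "water polo"),
]
tree = build_tree(triples)
print(topic_sentence(tree))  # Germany defeat Aussies
```

A fuller version would re-inflect the verb and handle sentences that have no object, which this sketch leaves out.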
21
Improvement and innovation
23
Experiment
Input:
- Australian Olympic shooters have had a tough morning. They lost - Dina Aspandiyarova finished 14th and Lalita Yauhleuskaya was 40th
- Germany defeats Aussies beach volleyball pair Bec Palmer and Louise Bawden in three sets
- Germany overcomes Aussies beach volleyball pair Bec Palmer and Louise Bawden in August.
- Aussies Palmer and Bawden take it to a deciding set in the beach volleyball against Germany
- Australian team lost the men's water polo to Italy 8-5. The Sharks play Kazakhstan next on Tuesday.
- They lost the men's water polo to Italy. They came back last night.
24
Experiment Result
25
Questions