Published by Sara West. Modified over 9 years ago.
1
Exploring in the Weblog Space by Detecting Informative and Affective Articles
Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu (Shanghai Jiao-Tong University)
Qiang Yang (Hong Kong University of Science and Technology)
WWW 2007
2
Introduction
Unique characteristics of blogs
– Mainly maintained by individuals, so the contents are generally personal
– The link structures between blogs generally form localized communities
Ongoing research on blogs
– Content-based analysis
– The evolution of blog communities
– Tools that help users retrieve, organize and analyze blogs
3
Introduction – Genres in Blog Content
Affective
– Online diaries in which people publicly share their daily lives and express their feelings, thoughts or emotions
Informative
– Topic-oriented; the topic can relate to a hobby or to the author's profession or business
4
Introduction – the Problem and the Approach
The problem
– Separating informative articles from affective articles in blogs
The approach
– Considering the problem as binary classification
– Challenges
  – The definitions of the informative articles and the affective articles
  – The training corpus for both categories
  – The machine learning algorithm
5
Introduction – Studies in the Weblog Space
Emotion and topic classification of blog articles
– Improving the effectiveness of emotion classification by filtering out informative articles
Blog search
– An intent-driven blog search engine is proposed that re-ranks the search results by taking their informative-value scores into account
Automatic detection of high-quality blogs
– Measuring the quality of a blog by calculating its percentage of informative articles
6
Definition of Informative and Affective Articles
A survey was conducted among users who regularly take part in blog activities
Contents of informative articles include:
– News similar to the news on traditional news websites
– Technical descriptions, e.g. programming techniques
– Commonsense knowledge
– Objective comments on world events
Contents of affective articles include:
– Diaries about personal affairs
– Descriptions of one's own feelings or emotions
7
Algorithms
Classification algorithms
– Naïve Bayes Classifier (NB)
– Support Vector Machine (SVM)
– Rocchio Classifier
Feature selection algorithms
– Information Gain (IG)
– χ² statistic (CHI)
8
Classification Algorithm – Naïve Bayes Classifier
Laplace smoothing is applied to overcome the zero-frequency problem
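To illustrate how Laplace smoothing avoids the zero-frequency problem, here is a minimal sketch of a Naïve Bayes text classifier. It assumes a multinomial model over tokens, which the slide does not specify; the function names, the `alpha` parameter and the toy documents are illustrative assumptions, not the authors' implementation or data.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha (Laplace) smoothing.

    docs is a list of token lists, labels the matching list of class names.
    """
    vocab = {t for doc in docs for t in doc}
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for doc in class_docs for t in doc)
        total = sum(counts.values())
        # Adding alpha to every count keeps unseen terms from getting
        # probability zero, which would wipe out the whole product.
        log_likelihood[c] = {
            t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
            for t in vocab
        }
    return log_prior, log_likelihood

def predict_nb(model, doc):
    """Return the class with the highest posterior log-score for a token list."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy corpus (made up for illustration only).
docs = [["stock", "market", "report"], ["happy", "birthday", "party"],
        ["market", "news"], ["sad", "rainy", "day"]]
labels = ["informative", "affective", "informative", "affective"]
model = train_nb(docs, labels)
```

Without the `alpha` term, a word such as "happy" that never occurs in the informative training documents would force the informative score of any article containing it to zero.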
9
Classification Algorithm – Rocchio Classifier
Category-profile-based classifier
where |c_j| is the number of documents in the category c_j and each document is represented with terms weighted by TF-IDF
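A category-profile classifier of this kind can be sketched as follows. The sketch assumes the simplest profile, the plain average of a category's TF-IDF document vectors (hence the division by the number of documents in the category), and assumes cosine similarity for matching; the slide confirms neither choice, and all names and weights are illustrative.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Average sparse TF-IDF vectors (dicts of term -> weight) into a profile."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    # Dividing by the number of documents in the category gives the mean vector.
    return {term: weight / len(vectors) for term, weight in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(profiles, doc_vec):
    """Assign the document to the category whose profile is most similar."""
    return max(profiles, key=lambda c: cosine(profiles[c], doc_vec))

# Toy TF-IDF vectors (made-up weights for illustration only).
profiles = {
    "informative": centroid([{"market": 0.9, "report": 0.4},
                             {"market": 0.5, "news": 0.7}]),
    "affective": centroid([{"happy": 0.8, "party": 0.6},
                           {"sad": 0.9, "day": 0.3}]),
}
```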
10
Feature Selection Algorithms
Information Gain (IG)
χ² statistic (CHI)
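Both measures can be computed from a 2×2 table of document counts for one term and one category. The sketch below uses the standard textbook definitions for the two-category case; the exact variants used in the experiments are not shown on the slide, and the function and argument names are illustrative assumptions.

```python
import math

def chi_square(a, b, c, d):
    """Chi-square statistic of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)

def information_gain(a, b, c, d):
    """Entropy of the class minus its entropy given term presence or absence."""
    n = a + b + c + d
    h_cond = 0.0
    if a + b:
        h_cond += (a + b) / n * entropy([a, b])
    if c + d:
        h_cond += (c + d) / n * entropy([c, d])
    return entropy([a + c, b + d]) - h_cond
```

A term that appears in every document of one category and in none of the other gets the maximum score under both measures, while a term spread evenly over both categories scores zero. Features would then be ranked by score and the top ones kept.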
11
Experiment Data
5,000 articles crawled from MSN Space
– 3,547 are labeled as affective and 1,109 as informative; the rest are filtered out because of encoding problems
2,200 articles from the Sohu.com Directory, used as informative articles
– News, commonsense knowledge or objective comments on 22 different topics
Table 1. Statistics of Data Set
12
Experiment – Comparing Classification Algorithms
Table 2. Performances of three classification algorithms
13
Comparing Feature Selection Algorithms
Table 3. Performances on different feature sets
14
Representative Features
Table 4. Top 20 representative features of each category
15
Study on Emotion and Topic Classification
Assume that informative articles do not express personal emotions
– Extracting affective articles can help to build a corpus of purely emotional articles
Figure 1. Two-step approach for topic and emotion classification
16
Experiment on Emotion Classification
Data
– Training: 2,494 blog articles manually labeled with one of two emotion tendencies, positive or negative
– Testing: 1,303 articles from 75 blogs in MSN Space
Table 5. Data set used for emotion classification
17
Experiment Result on Emotion Classification
The information-affectiveness classification is either applied before the binary emotion classifier (I-Approach) or not (II-Approach)
Table 6. Comparison results for two emotion classification approaches
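The difference between the two approaches can be sketched as a pipeline with an optional filtering step. The two classifiers are passed in as plain functions; the keyword-based stand-ins below are toy assumptions for illustration only and bear no relation to the trained classifiers used in the experiments.

```python
def emotion_pipeline(articles, is_affective, emotion_of, use_filter=True):
    """Classify the emotion tendency of each article.

    use_filter=True  mimics the I-Approach: informative articles are
                     filtered out before emotion classification.
    use_filter=False mimics the II-Approach: every article goes straight
                     to the emotion classifier.
    """
    results = {}
    for article in articles:
        if use_filter and not is_affective(article):
            results[article] = None  # assumed to carry no personal emotion
        else:
            results[article] = emotion_of(article)
    return results

# Hypothetical stand-ins for the two trained classifiers.
def toy_is_affective(text):
    return any(w in text.lower().split() for w in ("i", "my", "feel"))

def toy_emotion(text):
    return "positive" if "happy" in text.lower() else "negative"

articles = ["I feel happy today", "Quarterly market report released"]
with_filter = emotion_pipeline(articles, toy_is_affective, toy_emotion)
without_filter = emotion_pipeline(articles, toy_is_affective, toy_emotion,
                                  use_filter=False)
```

In this toy run the unfiltered pipeline is forced to assign an emotion to the market report, which is the kind of noise the filtering step is meant to remove.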
18
Study on Intent-driven Weblog Search Engine
Blog search is currently at the same stage as general Web search
Intent-driven search (re-rank)
S_mixed = λ · S_if + (1 − |λ|) · S_origin
where S_if is a confidence value between −1 (strong affective intent) and 1 (strong informative intent), and S_origin is the original relevance score
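The re-ranking formula can be tried out with a few lines of code. The sketch assumes that λ is chosen in [−1, 1], with positive values expressing informative intent and negative values affective intent; the slide implies this range through the |λ| term but does not state it. The result tuples and scores are made up for illustration.

```python
def mixed_score(s_if, s_origin, lam):
    """S_mixed = lam * S_if + (1 - |lam|) * S_origin."""
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [-1, 1]")
    return lam * s_if + (1 - abs(lam)) * s_origin

def rerank(results, lam):
    """Sort (doc_id, s_if, s_origin) tuples by descending mixed score."""
    return sorted(results,
                  key=lambda r: mixed_score(r[1], r[2], lam),
                  reverse=True)

# Hypothetical search results: (doc_id, S_if, S_origin).
results = [("diary", -0.9, 0.8), ("news", 0.9, 0.6)]
informative_first = rerank(results, lam=0.5)
affective_first = rerank(results, lam=-0.5)
original_order = rerank(results, lam=0.0)
```

With λ = 0 the original relevance ranking is kept unchanged; as |λ| grows, the informative/affective confidence increasingly dominates the original score.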
19
Analysis for the Distribution of Two Genres of Articles
Figure 2. Distribution of informative articles and affective articles on 99,059 blog articles
20
Detecting High-quality Blogs
Figure 3. Distribution of blogs with different levels of quality on 6,319 blogs
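An earlier slide describes blog quality as the percentage of informative articles in a blog. A minimal sketch of that measure, assuming each article has already been labeled by the classifier and that an empty blog scores 0 (an assumption not covered by the slides):

```python
def blog_quality(article_labels):
    """Percentage of a blog's articles that are labeled "informative"."""
    if not article_labels:
        return 0.0  # assumption: a blog with no articles scores zero
    informative = sum(1 for label in article_labels if label == "informative")
    return 100.0 * informative / len(article_labels)

# Hypothetical blog with four classified articles.
quality = blog_quality(["informative", "affective", "informative", "informative"])
```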
21
Conclusion and Future Work
The task of separating informative and affective articles is addressed and treated as a binary classification task.
Applications of this information-affectiveness classification are studied, including emotion classification, intent-driven blog search and high-quality blog detection.
Future work:
1) Building a much larger data set by using semi-supervised learning techniques
2) Applying the existing approach to data in other languages