Stance Classification of Ideological Debates
Sen Han, 4 June 2019
Outline
- Problem introduction
- Previous approach
- Improved approach (this paper)
- Improvements in stance classification: models, features, data, constraints
- Experiments and evaluation
- Results
- Discussion
Problem
Determining the stance expressed in a post written for a two-sided debate in an online debate forum is a relatively new and challenging problem in opinion mining. The goal here is to improve the performance of learning-based stance classification along several dimensions.
Previous approach
Example debate: "Should homosexual marriage be legal?"
- The goal of debate stance classification is to determine which of the two sides (i.e., for or against) the post's author is taking.
- Challenge: debaters use colorful and emotional language to make their points, which may involve sarcasm, insults, and questioning another debater's assumptions and evidence; this acts as noise for a classifier.
- Challenge: only a limited amount of stance-annotated debate data is available.
Improvements
- Data: increase the number of stance-annotated debate posts by collecting training data from different sources.
- Features: add semantic features to an n-gram-based stance classifier.
- Models: exploit the linear structure inherent in a post sequence, and train a better model by learning only from the stance-related sentences, without relying on sentences manually annotated with stance labels.
- Constraints: apply extra-linguistic inter-post constraints, such as author constraints, by post-processing the output of a stance classifier.
Models
- Binary classifiers: Naive Bayes (NB), Support Vector Machines (SVMs)
- Sequence labelers: first-order Hidden Markov Models (HMMs), linear-chain Conditional Random Fields (CRFs)
- Our models: fine-grained variants that jointly model the stance label of a debate post and the stance label of each of its sentences.
Fine-grained model
- Document: d_i
- Document stance: c, with probability P(c)
- Sentence: e_m
- Sentence stance: s, with probability P(s|c)
- n-th feature representing e_m: f_n, with probability P(f_n|s,c)
- Sentence-stance posterior: P(s|e_m, d_i, c)
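The quantities on this slide suggest the usual Naive Bayes factorization with the sentence stance as a latent variable; the following is a sketch of my reading, not necessarily the paper's exact formulation:

```latex
% Generative sketch: pick document stance c, then for each sentence e_m
% pick a latent sentence stance s, then emit e_m's features f_n.
P(d_i, c) \;=\; P(c) \prod_{e_m \in d_i} \sum_{s} P(s \mid c) \prod_{n} P(f_n \mid s, c)

% Sentence-stance posterior, marginalizing by Bayes' rule over the latent s:
P(s \mid e_m, d_i, c) \;\propto\; P(s \mid c) \prod_{n} P(f_n \mid s, c)
```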
Fine-grained model: classification
- Classify each test post d_i with the fine-grained NB model, choosing the document stance with maximum conditional probability.
- S(d_i): the set of sentences in test post d_i.
- E.g., P("for homosexual marriage" | d_1) = 80%; P("for abortion" | d_2) = 5%.
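The classification step above can be sketched in a few lines. This is a minimal illustration with hypothetical probability tables (the function and table names are mine, not the authors'); each sentence's latent stance s is marginalized out, as on the previous slide:

```python
import math

def classify_post(sentences, stances, log_p_c, log_p_s_given_c, log_p_f):
    """Return argmax_c P(c) * prod_m [ sum_s P(s|c) * prod_n P(f_n|s,c) ].

    sentences: list of sentences, each a list of features f_n.
    log_p_c, log_p_s_given_c, log_p_f: log-probability tables keyed by
    c, (s, c), and (f, s, c) respectively.
    """
    best, best_score = None, -math.inf
    for c in stances:
        score = log_p_c[c]
        for sent in sentences:
            # Marginalize over the latent sentence stance s;
            # unseen features get a small smoothing probability.
            sent_prob = sum(
                math.exp(log_p_s_given_c[(s, c)]
                         + sum(log_p_f.get((f, s, c), math.log(1e-6))
                               for f in sent))
                for s in stances
            )
            score += math.log(sent_prob)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow for long posts; the inner sum over s must be done in probability space, hence the `exp`.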
Features
- N-gram features: unigrams and bigrams collected from the training posts.
- Anand et al.'s (2011) features: n-grams, document statistics, punctuation, syntactic dependencies, and the set of features computed for the immediately preceding post in the thread.
- Frame-semantic features: obtain a frame-semantic parse for each sentence; for each frame a sentence contains, create three types of frame-semantic features.
Frame-semantic features (1)
- Frame-word interaction feature (frame-word1-word2): e.g., "Possession-right-woman", "Possession-woman-choose"; the word pair is unordered.
- Frame-pair feature (frame2:frame1): e.g., "Choosing:Possession"; the pair is ordered.
Frame-semantic features (2)
- Frame n-gram feature: replace a word in an n-gram with its frame name (if the word is a frame target) or with its frame-semantic role (if the word is part of a frame element). E.g., from "woman+has": woman+Possession, People+has, People+Possession, Owner+Possession, and Owner+has.
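The expansion above can be sketched as a cross product over each token's rewrite options. The annotation format here is an assumption (each token carries its surface form plus any frame name or frame-element role it may be rewritten as); the generated set also includes the original bigram itself:

```python
from itertools import product

def frame_ngram_features(annotated_bigram):
    """annotated_bigram: list of (word, [frame/role labels]) pairs.

    Returns all bigram variants obtained by substituting each word
    with one of its labels (or keeping the word itself).
    """
    options = [[word] + labels for word, labels in annotated_bigram]
    return {a + "+" + b for a, b in product(*options)}

# "woman" is inside a People frame element and an Owner frame element;
# "has" is the target of a Possession frame (hypothetical parse).
feats = frame_ngram_features([("woman", ["People", "Owner"]),
                              ("has", ["Possession"])])
```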
Data
- Improve the amount and quality of the training data.
- Collect documents relevant to the debate domains from different sources and stance-label them heuristically.
- Train on the combination of these noisily labeled documents and the stance-annotated debate posts.
Data (balance)
Roughly the same number of phrases was created for the two stances in each domain.
Constraints
- Author constraints (ACs): two posts written by the same author for the same debate domain should have the same stance.
- Applied by post-processing the output of a stance classifier: each post casts a probabilistic vote, and the shared stance is chosen by majority voting.
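A sketch of this post-processing step (the function name and the exact voting scheme are assumptions; here each post votes with its signed probability margin, and ties fall to "against"):

```python
from collections import defaultdict

def apply_author_constraints(posts):
    """posts: list of dicts with 'author', 'domain', and 'p_for'
    (the classifier's P(for | post)). Returns one 'for'/'against'
    label per post, shared within each (author, domain) group."""
    votes = defaultdict(float)
    for post in posts:
        # Probabilistic vote: P(for) - P(against), summed over the group.
        votes[(post["author"], post["domain"])] += 2 * post["p_for"] - 1
    return ["for" if votes[(p["author"], p["domain"])] > 0 else "against"
            for p in posts]
```

Note how the second of author a's posts below would be labeled "against" on its own (p_for = 0.4) but is flipped to "for" by the group vote.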
Experiments and evaluation
- 5-fold cross-validation; accuracy is the percentage of test instances correctly classified.
- In each fold experiment: three folds for model training, one fold for development, and one fold for testing.
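The 3/1/1 rotation can be sketched as follows (the particular fold-assignment scheme, dev fold immediately after the test fold, is my assumption; the slide only fixes the 3/1/1 split):

```python
def fold_splits(n_folds=5):
    """Yield (train_folds, dev_fold, test_fold) index triples,
    one per rotation of an n-fold cross-validation."""
    for i in range(n_folds):
        test = i
        dev = (i + 1) % n_folds
        train = [f for f in range(n_folds) if f not in (test, dev)]
        yield train, dev, test
```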
Results
Results are shown for three selected points on each learning curve, corresponding to the three major columns in each sub-table.
Results (systems)
- 'F': fine-grained model.
- 'W': only n-gram features.
- 'A': Anand et al.'s (2011) features.
- 'A+FS': Anand et al.'s features plus frame-semantic features.
- In the last two rows, noisily labeled documents ('N') and author constraints ('AC') are added incrementally to A+FS.
Results (learning curves)
- Learning curves for HMM and the fine-grained HMM across the four domains.
- The best-performing configuration is A+FS+N+AC, followed by A+FS+N, and then A+FS.
Discussion
Thanks
Appendix: unigram stance list
A word enters the list if it appears in the training data at least 10 times and is associated with document stance c at least 70% of the time, i.e., a list of words that appear frequently in the training data and are indicative of the document stance, with p(w) = #w / #(w in corpus).
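The selection rule above can be sketched directly (the function name and input format are mine; the thresholds are the slide's: frequency >= 10, stance association >= 70%):

```python
from collections import Counter, defaultdict

def stance_unigrams(docs, min_count=10, min_assoc=0.7):
    """docs: list of (stance_label, list_of_words) pairs.
    Returns {stance: set of words} of stance-indicative unigrams."""
    total = Counter()
    by_stance = defaultdict(Counter)
    for stance, words in docs:
        for w in words:
            total[w] += 1
            by_stance[stance][w] += 1
    lexicon = defaultdict(set)
    for stance, counts in by_stance.items():
        for w, n in counts.items():
            # Keep w if it is frequent enough and at least min_assoc of
            # its occurrences fall in documents of this stance.
            if total[w] >= min_count and n / total[w] >= min_assoc:
                lexicon[stance].add(w)
    return lexicon
```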