Stance Classification of Ideological Debates
Sen Han, 4 June 2019
Outline
- Problem introduction
- Previous approach
- Improved approach (this paper)
- Improvements in stance classification: models, features, data, constraints
- Experiments and evaluation
- Results
- Discussion
Problem
Determining the stance expressed in a post written for a two-sided debate in an online debate forum is a relatively new and challenging problem in opinion mining. The goal here is to improve the performance of learning-based stance classification along several dimensions.
Previous approach
Example debate: "Should homosexual marriage be legal?"
- The goal of debate stance classification is to determine which of the two sides (i.e., for or against) the post's author is taking.
- Challenge: debaters use colorful and emotional language to make their points, which may involve sarcasm, insults, and questioning another debater's assumptions and evidence; this acts as noise for a classifier.
- Challenge: only a limited amount of stance-annotated debate data is available.
Improvements
- Data: increase the number of stance-annotated debate posts by collecting training data from different sources.
- Features: add semantic features to an n-gram-based stance classifier.
- Models: exploit the linear structure inherent in a post sequence, and train a better model by learning only from the stance-related sentences, without relying on sentences manually annotated with stance labels.
- Constraints: apply extra-linguistic inter-post constraints, such as author constraints, by post-processing the output of a stance classifier.
Models
- Binary classifiers: Naive Bayes (NB), Support Vector Machines (SVMs)
- Sequence labelers: first-order Hidden Markov Models (HMMs), linear-chain Conditional Random Fields (CRFs)
- Our models: fine-grained variants that jointly model the stance label of a debate post and the stance label of each of its sentences.
Fine-grained model
- Document: d_i
- Document stance: c, with probability P(c)
- Sentence: e_m
- Sentence stance: s, with probability P(s|c)
- n-th feature representing e_m: f_n, with probability P(f_n|s,c)
- Sentence-stance posterior: P(s|e_m, d_i, c)
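The quantities on this slide suggest the usual Naive Bayes factorization with the sentence stance as a latent variable; the following is a sketch of my reading, not necessarily the paper's exact formulation:

```latex
% Generative sketch: pick document stance c, then for each sentence e_m
% pick a latent sentence stance s, then emit e_m's features f_n.
P(d_i, c) \;=\; P(c) \prod_{e_m \in d_i} \sum_{s} P(s \mid c) \prod_{n} P(f_n \mid s, c)

% Sentence-stance posterior, marginalizing by Bayes' rule over the latent s:
P(s \mid e_m, d_i, c) \;\propto\; P(s \mid c) \prod_{n} P(f_n \mid s, c)
```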
Fine-grained model: classification
- Classify each test post d_i with the fine-grained NB model, choosing the document stance with maximum conditional probability.
- S(d_i): the set of sentences in test post d_i.
- E.g., P("for homosexual marriage" | d_1) = 80%; P("for abortion" | d_2) = 5%.
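The classification step above can be sketched in a few lines. This is a minimal illustration with hypothetical probability tables (the function and table names are mine, not the authors'); each sentence's latent stance s is marginalized out, as on the previous slide:

```python
import math

def classify_post(sentences, stances, log_p_c, log_p_s_given_c, log_p_f):
    """Return argmax_c P(c) * prod_m [ sum_s P(s|c) * prod_n P(f_n|s,c) ].

    sentences: list of sentences, each a list of features f_n.
    log_p_c, log_p_s_given_c, log_p_f: log-probability tables keyed by
    c, (s, c), and (f, s, c) respectively.
    """
    best, best_score = None, -math.inf
    for c in stances:
        score = log_p_c[c]
        for sent in sentences:
            # Marginalize over the latent sentence stance s;
            # unseen features get a small smoothing probability.
            sent_prob = sum(
                math.exp(log_p_s_given_c[(s, c)]
                         + sum(log_p_f.get((f, s, c), math.log(1e-6))
                               for f in sent))
                for s in stances
            )
            score += math.log(sent_prob)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow for long posts; the inner sum over s must be done in probability space, hence the `exp`.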
Features
- N-gram features: unigrams and bigrams collected from the training posts.
- Anand et al.'s (2011) features: n-grams, document statistics, punctuation, syntactic dependencies, and the set of features computed for the immediately preceding post in the thread.
- Frame-semantic features: obtain a frame-semantic parse for each sentence; for each frame a sentence contains, create three types of frame-semantic features.
Frame-semantic features (1)
- Frame-word interaction feature (frame-word1-word2): e.g., "Possession-right-woman", "Possession-woman-choose"; the word pair is unordered.
- Frame-pair feature (frame2:frame1): e.g., "Choosing:Possession"; the pair is ordered.
Frame-semantic features (2)
- Frame n-gram feature: replace a word in an n-gram with its frame name (if the word is a frame target) or with its frame-semantic role (if the word is part of a frame element). E.g., from "woman+has": woman+Possession, People+has, People+Possession, Owner+Possession, and Owner+has.
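The expansion above can be sketched as a cross product over each token's rewrite options. The annotation format here is an assumption (each token carries its surface form plus any frame name or frame-element role it may be rewritten as); the generated set also includes the original bigram itself:

```python
from itertools import product

def frame_ngram_features(annotated_bigram):
    """annotated_bigram: list of (word, [frame/role labels]) pairs.

    Returns all bigram variants obtained by substituting each word
    with one of its labels (or keeping the word itself).
    """
    options = [[word] + labels for word, labels in annotated_bigram]
    return {a + "+" + b for a, b in product(*options)}

# "woman" is inside a People frame element and an Owner frame element;
# "has" is the target of a Possession frame (hypothetical parse).
feats = frame_ngram_features([("woman", ["People", "Owner"]),
                              ("has", ["Possession"])])
```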
Data
- Improve the amount and quality of the training data.
- Collect documents relevant to the debate domains from different sources and stance-label them heuristically.
- Train on the combination of these noisily labeled documents and the stance-annotated debate posts.
Data (balance)
Roughly the same number of phrases was created for the two stances in each domain.
Constraints
- Author constraints (ACs): two posts written by the same author for the same debate domain should have the same stance.
- Applied by post-processing the output of a stance classifier: each post casts a probabilistic vote, and the shared stance is chosen by majority voting.
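A sketch of this post-processing step (the function name and the exact voting scheme are assumptions; here each post votes with its signed probability margin, and ties fall to "against"):

```python
from collections import defaultdict

def apply_author_constraints(posts):
    """posts: list of dicts with 'author', 'domain', and 'p_for'
    (the classifier's P(for | post)). Returns one 'for'/'against'
    label per post, shared within each (author, domain) group."""
    votes = defaultdict(float)
    for post in posts:
        # Probabilistic vote: P(for) - P(against), summed over the group.
        votes[(post["author"], post["domain"])] += 2 * post["p_for"] - 1
    return ["for" if votes[(p["author"], p["domain"])] > 0 else "against"
            for p in posts]
```

Note how the second of author a's posts below would be labeled "against" on its own (p_for = 0.4) but is flipped to "for" by the group vote.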
Experiments and evaluation
- 5-fold cross-validation; accuracy is the percentage of test instances correctly classified.
- In each fold experiment: three folds for model training, one fold for development, and one fold for testing.
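The 3/1/1 rotation can be sketched as follows (the particular fold-assignment scheme, dev fold immediately after the test fold, is my assumption; the slide only fixes the 3/1/1 split):

```python
def fold_splits(n_folds=5):
    """Yield (train_folds, dev_fold, test_fold) index triples,
    one per rotation of an n-fold cross-validation."""
    for i in range(n_folds):
        test = i
        dev = (i + 1) % n_folds
        train = [f for f in range(n_folds) if f not in (test, dev)]
        yield train, dev, test
```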
Results
Results are shown for three selected points on each learning curve, corresponding to the three major columns in each sub-table.
Results (systems)
- 'F': fine-grained model.
- 'W': only n-gram features.
- 'A': Anand et al.'s (2011) features.
- 'A+FS': Anand et al.'s features plus frame-semantic features.
- In the last two rows, noisily labeled documents ('N') and author constraints ('AC') are added incrementally to A+FS.
Results (learning curves)
- Learning curves for HMM and the fine-grained HMM across the four domains.
- The best-performing configuration is A+FS+N+AC, followed by A+FS+N, and then A+FS.
Discussion
Thanks
Appendix: unigram stance list
A word enters the list if it appears in the training data at least 10 times and is associated with document stance c at least 70% of the time, i.e., a list of words that appear frequently in the training data and are indicative of the document stance, with p(w) = #w / #(w in corpus).
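The selection rule above can be sketched directly (the function name and input format are mine; the thresholds are the slide's: frequency >= 10, stance association >= 70%):

```python
from collections import Counter, defaultdict

def stance_unigrams(docs, min_count=10, min_assoc=0.7):
    """docs: list of (stance_label, list_of_words) pairs.
    Returns {stance: set of words} of stance-indicative unigrams."""
    total = Counter()
    by_stance = defaultdict(Counter)
    for stance, words in docs:
        for w in words:
            total[w] += 1
            by_stance[stance][w] += 1
    lexicon = defaultdict(set)
    for stance, counts in by_stance.items():
        for w, n in counts.items():
            # Keep w if it is frequent enough and at least min_assoc of
            # its occurrences fall in documents of this stance.
            if total[w] >= min_count and n / total[w] >= min_assoc:
                lexicon[stance].add(w)
    return lexicon
```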