Stance Classification of Ideological Debates


Stance Classification of Ideological Debates
Sen Han (leonihsam@gmail.com), 4 June 2019

Outline
- Abstract
- Introduction: problem, previous approach, improved approach (this paper)
- Improvements in stance classification: models, features, data, constraints
- Experiments and evaluation
- Results
- Discussion

Problem
Determining the stance expressed in a post written for a two-sided debate in an online debate forum is a relatively new and challenging problem in opinion mining. The goal here is to improve the performance of learning-based stance classification along several dimensions.

Previous approach
Example debate: "Should homosexual marriage be legal?" The goal of debate stance classification is to determine which of the two sides (i.e., for or against) a post's author is taking. Two difficulties: debaters use colorful and emotional language, which may involve sarcasm, insults, and questioning another debater's assumptions and evidence (effectively noise for a classifier), and stance-annotated debate data is limited.

Improvements
- Data: increase the number of stance-annotated debate posts for training with documents from different sources.
- Features: add semantic features to an n-gram-based stance classifier.
- Models: exploit the linear structure inherent in a post's sentence sequence; train a better model by learning only from the stance-related sentences, without relying on sentences manually annotated with stance labels.
- Constraints: apply extra-linguistic inter-post constraints, such as author constraints, by post-processing the output of a stance classifier.

Models
- Binary classifiers: Naive Bayes (NB), Support Vector Machines (SVMs)
- Sequence labelers: first-order Hidden Markov Models (HMMs), linear-chain Conditional Random Fields (CRFs)
- This paper's models: unigram fine-grained models, which model both the stance label of a debate post and the stance label of each of its sentences
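As a minimal sketch of the binary-classifier baseline (not the authors' exact implementation), a multinomial Naive Bayes stance classifier with Laplace smoothing fits in a few lines; the toy tokens and labels are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def train_nb(posts, labels, alpha=1.0):
    """Train multinomial NB. posts: token lists; labels: 'for'/'against'."""
    prior = Counter(labels)                  # class counts
    word_counts = defaultdict(Counter)       # per-class word counts
    for toks, y in zip(posts, labels):
        word_counts[y].update(toks)
    vocab = {w for c in word_counts.values() for w in c}
    return prior, word_counts, vocab, alpha

def predict_nb(model, toks):
    """Return the stance with the highest log posterior."""
    prior, word_counts, vocab, alpha = model
    total = sum(prior.values())
    best, best_lp = None, float("-inf")
    for y in prior:
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        lp = math.log(prior[y] / total)
        for w in toks:
            if w in vocab:  # ignore out-of-vocabulary words
                lp += math.log((word_counts[y][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

In practice the baselines would be trained on the n-gram and Anand et al. features described later, not raw tokens.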

Fine-grained model
- Document d_i has a document stance c with probability P(c)
- Sentence e_m has a sentence stance s with probability P(s|c)
- The n-th feature representing e_m is f_n, with probability P(f_n|s, c)
- Sentence-stance posterior: P(s|e_m, d_i, c)
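Read as a Naive Bayes model with a latent sentence stance, the quantities above compose into a joint probability of a post and its document stance; the following is a reconstruction from the listed terms, not notation taken from the paper:

```latex
P(d_i, c) = P(c) \prod_{e_m \in d_i} \sum_{s} P(s \mid c) \prod_{n} P(f_n \mid s, c),
\qquad
P(s \mid e_m, d_i, c) \propto P(s \mid c) \prod_{n} P(f_n \mid s, c)
```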

Fine-grained model
Classify each test post d_i using the fine-grained NB: over S(d_i), the set of sentences in test post d_i, choose the stance with maximum conditional probability. E.g., P("for homosexual marriage" | d_1) = 80%, P("for abortion" | d_2) = 5%.
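One way to realize this classification rule in code (a sketch; the probability tables, stance names, and feature format are illustrative assumptions, not the authors' implementation):

```python
def fine_grained_score(post, c, p_c, p_s_given_c, p_f):
    """Score a post under document stance c: P(c) times, per sentence,
    the sum over latent sentence stances s of P(s|c) * prod_n P(f_n|s,c)."""
    score = p_c[c]
    for sent in post:  # post: list of sentences, each a list of features
        sent_prob = 0.0
        for s, p_s in p_s_given_c[c].items():
            p = p_s
            for f in sent:
                p *= p_f[(f, s, c)]
            sent_prob += p
        score *= sent_prob
    return score

def classify(post, stances, p_c, p_s_given_c, p_f):
    """Pick the document stance with the maximum score."""
    return max(stances,
               key=lambda c: fine_grained_score(post, c, p_c, p_s_given_c, p_f))
```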

Features
- N-gram features: unigrams and bigrams collected from the training posts
- Anand et al.'s (2011) features: n-grams, document statistics, punctuation, syntactic dependencies, and the set of features computed for the immediately preceding post in its thread
- Frame-semantic features: obtain a frame-semantic parse for each sentence; for each frame that a sentence contains, create three types of frame-semantic features

Frame-semantic features
- Frame-word interaction feature (frame-word1-word2): an unordered word pair from a frame, e.g. "Possession-right-woman", "Possession-woman-choose"
- Frame-pair feature (frame2:frame1): an ordered pair of co-occurring frames, e.g. "Choosing:Possession"

Frame-semantic features
- Frame n-gram feature: replace a word in an n-gram with its frame name (if the word is a frame target) or its frame-semantic role (if the word is present in a frame element). E.g., "woman+has" yields woman+Possession, People+has, People+Possession, Owner+Possession, and Owner+has.
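The three feature generators can be sketched as below. The input formats (a frame name with its words, a list of frames, a per-word annotation map) are assumed simplifications of a real frame-semantic parser's output, not the paper's data structures:

```python
from itertools import combinations

def frame_word_features(frame, words):
    """Frame-word interaction: frame name with each unordered word pair."""
    return [f"{frame}-{w1}-{w2}" for w1, w2 in combinations(sorted(words), 2)]

def frame_pair_features(frames):
    """Ordered pairs of frames co-occurring in a sentence."""
    return [f"{f2}:{f1}" for i, f1 in enumerate(frames) for f2 in frames[i + 1:]]

def frame_ngram_features(bigram, annotations):
    """Expand a word bigram by substituting each word with its frame name
    or frame-element role, wherever such an annotation exists."""
    w1, w2 = bigram
    subs1 = [w1] + annotations.get(w1, [])
    subs2 = [w2] + annotations.get(w2, [])
    return [f"{a}+{b}" for a in subs1 for b in subs2 if (a, b) != (w1, w2)]
```

With the slide's example, `frame_ngram_features(("woman", "has"), {"woman": ["People", "Owner"], "has": ["Possession"]})` reproduces the five listed expansions of "woman+has".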

Data
To increase the amount and quality of the training data: collect documents relevant to the debate domain from different sources, stance-label them heuristically, and train on the combination of the noisily labeled documents and the stance-annotated debate posts.

Data
Roughly the same number of phrases was created for the two stances in a domain.
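A minimal sketch of such heuristic stance-labeling, assuming hand-written phrase lists per stance (the phrases below are made up for illustration, not taken from the paper):

```python
def heuristic_label(text, for_phrases, against_phrases):
    """Noisily label a document by which side's phrases it matches more;
    return None (discard) when the counts tie."""
    t = text.lower()
    f = sum(t.count(p) for p in for_phrases)
    a = sum(t.count(p) for p in against_phrases)
    if f > a:
        return "for"
    if a > f:
        return "against"
    return None  # ambiguous document: drop from the noisy training set
```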

Constraints
Author constraints (ACs): two posts written by the same author for the same debate domain should have the same stance. Enforced by post-processing the output of a stance classifier: each of an author's posts casts a probabilistic vote, and the author's stance is set by majority voting over those votes.
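One concrete reading of the probabilistic-vote scheme (a sketch; the aggregation rule is an assumption, not necessarily the authors' exact one):

```python
from collections import defaultdict

def apply_author_constraints(posts, p_for):
    """posts: list of (post_id, author); p_for: post_id -> classifier's
    P(stance = 'for'). Each post casts a probabilistic vote; every post by
    an author then receives that author's majority stance."""
    votes = defaultdict(float)
    for pid, author in posts:
        votes[author] += p_for[pid] - 0.5  # positive mass leans 'for'
    return {pid: ("for" if votes[author] > 0 else "against")
            for pid, author in posts}
```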

Experiments and evaluation
5-fold cross-validation; accuracy is the percentage of test instances correctly classified. In each fold experiment, three folds are used for model training, one fold for development, and one fold for testing.
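The 3/1/1 rotation can be sketched as follows (the round-robin fold assignment is an illustrative choice):

```python
def five_fold_splits(items):
    """Yield five (train, dev, test) splits: in each rotation, three folds
    train the model, one fold tunes it, and one fold tests it."""
    folds = [items[i::5] for i in range(5)]  # round-robin fold assignment
    for k in range(5):
        test = folds[k]
        dev = folds[(k + 1) % 5]
        train = [x for j in range(5)
                 if j not in (k, (k + 1) % 5) for x in folds[j]]
        yield train, dev, test
```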

Results
Results are shown for three selected points on each learning curve, which correspond to the three major columns in each sub-table.

Results
- 'F': fine-grained model
- 'W': only n-gram features
- 'A': Anand et al.'s (2011) features
- 'A+FS': Anand et al.'s features plus frame-semantic features
In the last two rows, noisily labeled documents and author constraints are added incrementally to A+FS.

Results
Learning curves for HMM and HMMF over the four domains: the best-performing configuration is A+FS+N+AC, followed by A+FS+N and then A+FS.

Discussion

Thanks

Unigram list
A list of words frequently appearing in the training data that are indicative of a document stance: a word is included if it appears in the training data at least 10 times and is associated with document stance c at least 70% of the time, with p(w) = #w / #(w in corpus).
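A sketch of this word-selection rule, using the counts and thresholds stated above (the input format of tokenized, stance-labeled posts is an assumption):

```python
from collections import Counter, defaultdict

def stance_unigrams(posts, labels, min_count=10, purity=0.7):
    """Keep words seen at least `min_count` times whose occurrences fall
    under a single document stance at least `purity` of the time."""
    total = Counter()
    by_stance = defaultdict(Counter)
    for toks, y in zip(posts, labels):
        total.update(toks)
        by_stance[y].update(toks)
    keep = {}
    for w, n in total.items():
        if n < min_count:
            continue  # too rare: fails the frequency threshold
        stance, cnt = max(((y, c[w]) for y, c in by_stance.items()),
                          key=lambda t: t[1])
        if cnt / n >= purity:  # strongly associated with one stance
            keep[w] = stance
    return keep
```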