Article Filtering for Conflict Forecasting
Benedict Lee and Cuong Than
Comp 540, 4/25/2006

Motivation
One goal of the Ares Project is to predict conflict from event data extracted from various news sources (Reuters, BBC, The Associated Press). These sources contain many irrelevant articles, so we'd like to distinguish relevant articles from irrelevant ones.

Problem
This is a text classification problem: relevant articles cover international political interactions; irrelevant articles cover everything else. Additional context/information is not necessarily available, so we classify solely on the text of each article.

General Approach
Assertion: relevant articles are similar to other relevant articles, and irrelevant articles are similar to other irrelevant articles. For each article, we generate a Relevance Rating and an Irrelevance Rating based on "similarity" with the training articles, where "similarity" is derived from the OKAPI BM25 ranking formula (with inflectional and synonym generation).
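For concreteness, here is a minimal sketch of an Okapi BM25 scorer of the kind this similarity measure builds on. The slides do not show their exact variant, so the parameter values k1 = 1.2 and b = 0.75 and the IDF form below are standard defaults, not the authors' settings:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 similarity between a query article and a training article.

    df: dict mapping term -> number of training docs containing it.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        # Robertson/Sparck Jones-style inverse document frequency
        idf = math.log((n_docs - df.get(term, 0) + 0.5) / (df.get(term, 0) + 0.5))
        # Term-frequency component with document-length normalization
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len))
        score += idf * norm
    return score
```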

Direct Comparison Classifier
The top N OKAPI scores against each class are summed to form the Relevance and Irrelevance Ratings; N is a tunable parameter. An article is classified as relevant if Relevance Rating >= Irrelevance Rating.
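A sketch of this decision rule under the description above (the name score_fn is hypothetical, and preprocessing is left out; the slides do not specify either):

```python
def classify_direct(article, relevant_docs, irrelevant_docs, score_fn, n=25):
    """Sum the top-N similarity scores against each class; pick the larger rating."""
    rel = sorted((score_fn(article, d) for d in relevant_docs), reverse=True)[:n]
    irr = sorted((score_fn(article, d) for d in irrelevant_docs), reverse=True)[:n]
    # Ties go to "relevant", matching the >= rule on the slide
    return "relevant" if sum(rel) >= sum(irr) else "irrelevant"
```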

Logistic Regression Classifier
Input: the relevance/irrelevance ratio. Output: 1 or 0, representing the Relevant and Irrelevant categories. Model: Pr(y = 1 | x) = e^{h(x)} / (1 + e^{h(x)}), where h(x) = β^T x. Decision rule: x is classified as 1 if Pr(y = 1 | x) >= 0.5, or equivalently h(x) >= 0.
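A minimal sketch of the model and decision rule as stated on this slide (how the feature vector x is built from the relevance/irrelevance ratio is assumed, not shown):

```python
import numpy as np

def prob_relevant(beta, x):
    """Pr(y = 1 | x) = e^{h(x)} / (1 + e^{h(x)}), with h(x) = beta^T x."""
    return 1.0 / (1.0 + np.exp(-(beta @ x)))

def classify_lr(beta, x):
    # Pr(y = 1 | x) >= 0.5 is equivalent to h(x) >= 0
    return 1 if (beta @ x) >= 0.0 else 0
```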

Fitting the Model
Compute the likelihood over the training dataset. Setting the gradient of the log-likelihood to zero yields a set of non-linear equations, which we solve with the IRLS method to find the parameter vector β.
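A compact sketch of the standard IRLS (Newton-Raphson) update for logistic regression; initialization at zero and the stopping tolerance are illustrative choices, not from the slides:

```python
import numpy as np

def fit_irls(X, y, n_iter=25, tol=1e-8):
    """Iteratively Reweighted Least Squares for logistic regression.

    X: (n, d) design matrix; y: (n,) vector of 0/1 labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current predicted probabilities
        w = p * (1.0 - p)                        # diagonal of the weight matrix W
        # Newton step: beta += (X^T W X)^{-1} X^T (y - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```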

Dealing with Costs
Motivation: misclassifying a relevant article costs more than misclassifying an irrelevant one. Normally, c10 > c00 and c01 > c11.

                actual neg.   actual pos.
  predict neg.      c00           c01
  predict pos.      c10           c11

Dealing with Costs (cont'd)
Making a decision: classify x into category i if the risk Σ_j Pr(j | x) c(i, j) is minimized. Then x is classified into class 1 if Pr(y = 1 | x) >= p*, where p* = (c10 − c00) / (c10 − c00 + c01 − c11).
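The threshold p* is easy to compute directly; a small sketch, plugged with the cost values reported later in the results slide:

```python
def optimal_threshold(c00, c01, c10, c11):
    """p* = (c10 - c00) / (c10 - c00 + c01 - c11).

    Classify as relevant (class 1) when Pr(y = 1 | x) >= p*.
    """
    return (c10 - c00) / (c10 - c00 + c01 - c11)

# Costs used in the experiments: c00 = c11 = 0, c01 = 1, c10 = 0.7
print(optimal_threshold(0, 1, 0.7, 0))  # ~0.412, below the cost-blind 0.5
```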

Dealing with Costs (cont'd)
Folk Theorem: by altering the example distribution, an error-minimizing classifier solves cost-sensitive problems. For a binary output space, the number of negative examples is multiplied by p*(1 − p0) / ((1 − p*) p0). Intuition: changing the example distribution changes the posterior probability.
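As an illustration of the folk theorem's multiplier (taking p0 = 0.5, the usual error-minimizing threshold, is my assumption here):

```python
def negative_multiplier(p_star, p0=0.5):
    """Replicate/weight negative examples by this factor so that an
    error-minimizing classifier behaves like one thresholded at p*."""
    return p_star * (1 - p0) / ((1 - p_star) * p0)

# With p* = 0.7 / 1.7 from the costs above, the factor is exactly 0.7:
# negatives are down-weighted, shifting decisions toward "relevant".
print(negative_multiplier(0.7 / 1.7))  # 0.7
```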

Extending Classifiers with Voting
We can increase the classifier's accuracy by learning several models and predicting new cases by voting. We build four models from four different pairs of relevance/irrelevance ranks.
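A minimal sketch of such a voting ensemble; the slides don't specify tie-breaking across the four models, so breaking ties toward "relevant" here is an illustrative choice:

```python
def vote(models, x):
    """Majority vote over models trained on different relevance/irrelevance
    rank pairs; ties go to the relevant class (1)."""
    votes = sum(m(x) for m in models)  # each model returns 0 or 1
    return 1 if votes >= len(models) / 2 else 0
```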

Working Dataset
~180,000 categorized Reuters articles from 9/1996 – 7/1997. Relevant categories: GVIO, G13, GDIP. Irrelevant categories: 1POL, 2ECO, 3SPO, ECAT, G12, G131, GDEF, GPOL.

Test Methodology
10-fold cross-validation on the Reuters dataset, 5 trials. Approaches: Naïve Bayes (NB), Weight-based Complement Naïve Bayes (WNB), OKAPI Direct Comparison (ODC), OKAPI Logistic Regression (OLR), OKAPI Cost-Sensitive LR (OCLR).
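A sketch of the evaluation loop described here; the metric returned by train_and_eval is hypothetical (the slides report several):

```python
import random

def k_fold_cv(data, train_and_eval, k=10, trials=5, seed=0):
    """Average a metric over `trials` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]   # k roughly equal folds
        for i in range(k):
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            scores.append(train_and_eval(train, test))
    return sum(scores) / len(scores)
```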

ODC Tests with Varying N
[Figure: ODC performance for several values of N. Different N values do not significantly affect results.]

Classifier Comparison Results
[Figure: comparison of NB, WNB, ODC, OLR, and OCLR. Parameters: ODC and OLR use N = 25; OCLR uses c00 = c11 = 0, c01 = 1, c10 = 0.7.]

Analysis
All OKAPI classifiers performed better than NB and WNB in our tests. OLR has worse recall because it gives equal weight to false positives and false negatives; adding cost-sensitivity improved performance.

Conclusion
The OKAPI classifiers are suitable for text classification, though OLR doesn't perform as well as ODC. The cost table in OCLR can be adjusted to achieve the appropriate trade-off between recall and precision.