Breaking News: Exploring Israeli News Bias using Simple Textual Analysis
Yuval Pinter, Shuki Tausig, Oren Persico


Motivation / Hypotheses
Media is biased
Israeli media is super-biased
Machine learning can detect bias
Headlines could be enough: "Headlines are journalism in its purest form" (Simon Jenkins, 1992)
Which is more significant: class bias or agenda bias?
The idea: classify the news outlet using basic features
Most of the "agenda bias" part will have to wait
No prior work AFAIK; the closest field is authorship attribution

github.com/yuvalpinter/MediaAnalysis
Data
General news sites only
Homepage headlines only
Scraped at 15-minute intervals, July 2014 – May 2015
Most experiments use the February 2015 data
Data and extraction code are available: github.com/yuvalpinter/MediaAnalysis

Data Samples
[Sample homepage snapshots: Nov 23, 15:00; Feb 15, 15:30]

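Text Processing
De-duping of consecutive appearances (a headline that stays on the homepage across successive scrapes is counted once)
Tokenization (including lemmatization and affix deletion) using hspell (Har’el and Kenigsberg)
Mostly good, sometimes not so much:
הפרלמנט הירדני עמד דקת דומייה לזכר המחבלים ("The Jordanian parliament stood for a minute of silence in memory of the terrorists")
=> פרלמנט ירדן עימד דקה דומייה זכר מחבל (NRG, 20/11/2014, 0:15; "stood" is lemmatized as עימד, "paginated")
רעידת אדמה קטלנית כאלף נהרגו בנפאל: "שעות קריטיות" ("Deadly earthquake: about a thousand killed in Nepal: 'critical hours'")
=> רעידה דימה קטלוניה אילף נהרג נפאל שעה קריטי (Mako, 25/4/2015, 19:30; "earth" becomes דימה "imagined", "deadly" becomes קטלוניה "Catalonia", "about a thousand" becomes אילף "tamed")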
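A minimal Python sketch of consecutive-appearance de-duping, assuming each scrape is stored as a (timestamp, outlet, headlines) tuple in time order. The record layout and the name `dedup_consecutive` are hypothetical and not taken from the project's repository; the rule shown is simply "keep a headline only if it was absent from the same outlet's previous snapshot".

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record layout: (timestamp, outlet, headlines on the homepage at that scrape).
Snapshot = Tuple[str, str, Sequence[str]]


def dedup_consecutive(snapshots: Sequence[Snapshot]) -> List[Tuple[str, str, str]]:
    """Keep each headline only when it was not on the same outlet's previous snapshot.

    `snapshots` must be sorted by timestamp; outlets may be interleaved.
    """
    previous: Dict[str, Set[str]] = {}  # outlet -> headlines in its last snapshot
    kept: List[Tuple[str, str, str]] = []
    for timestamp, outlet, headlines in snapshots:
        seen_last_time = previous.get(outlet, set())
        # dict.fromkeys drops repeats within one snapshot while keeping page order
        for headline in dict.fromkeys(headlines):
            if headline not in seen_last_time:
                kept.append((timestamp, outlet, headline))
        previous[outlet] = set(headlines)
    return kept


if __name__ == "__main__":
    scrapes = [
        ("15:00", "ynet", ["A", "B"]),
        ("15:15", "ynet", ["A", "C"]),
        ("15:30", "ynet", ["B", "C"]),
    ]
    for row in dedup_consecutive(scrapes):
        print(row)
```

In this example, headline B is counted twice because it left the homepage for one scrape and then returned. Comparing only against the previous snapshot cannot catch such alternating headlines.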

Features
Form: character length, word count, word length (average/min/median/max), punctuation token count
Lexicon: word/lemma frequency quantiles (average/min/median/max), from wordlists (Hermit Dave) and Israblog (Linzen 2009)
Morphology: affix letters
Word features: probably capture the media cycle
Features and extraction code are available
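The form and lexicon features can be sketched in Python as follows. This is an illustration under assumptions, not the project's extraction code. It uses a plain regex tokenizer instead of hspell. The frequency wordlist is assumed to be a dict from word to its 1-based rank. Rank is scaled to a quantile as rank / (size + 1), and out-of-vocabulary words get 1.0. All function and key names are hypothetical.

```python
import re
import statistics
from typing import Dict, List, Sequence

# Simple regex tokenizer (an assumption; the slides use hspell): runs of word
# characters, or single punctuation marks. \w matches Hebrew letters in Python 3.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
WORD_RE = re.compile(r"\w+")


def tokenize(headline: str) -> List[str]:
    return TOKEN_RE.findall(headline)


def summarize(values: Sequence[float], prefix: str) -> Dict[str, float]:
    """Average/min/median/max of a list of numbers, under prefixed keys."""
    return {
        f"{prefix}_avg": statistics.mean(values),
        f"{prefix}_min": min(values),
        f"{prefix}_median": statistics.median(values),
        f"{prefix}_max": max(values),
    }


def form_features(headline: str) -> Dict[str, float]:
    tokens = tokenize(headline)
    words = [t for t in tokens if WORD_RE.fullmatch(t)]
    features = {
        "char_length": len(headline),
        "word_count": len(words),
        "punct_count": len(tokens) - len(words),
    }
    # Guard against headlines with no word tokens at all
    features.update(summarize([len(w) for w in words] or [0], "word_len"))
    return features


def lexicon_features(headline: str, rank: Dict[str, int]) -> Dict[str, float]:
    """`rank` maps a word to its 1-based position in a frequency-sorted wordlist.

    Positions are scaled to (0, 1]; out-of-vocabulary words get 1.0.
    """
    words = [t for t in tokenize(headline) if WORD_RE.fullmatch(t)]
    denom = len(rank) + 1
    quantiles = [rank.get(w, denom) / denom for w in words] or [1.0]
    return summarize(quantiles, "freq_q")


if __name__ == "__main__":
    print(form_features("Hello, world!"))
    print(lexicon_features("a b z", {"a": 1, "b": 2, "c": 3}))
```

A regex tokenizer splits an abbreviation such as רה"מ at the gershayim and counts the quote mark as punctuation. This is one reason a Hebrew-aware tool like hspell is preferable for real data.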

Setup & Results
7 classes (news outlets), 1785 headlines (all of February 2015)
Weka’s Random Forest
Accuracy: 10 trees: 45.4%; 50 trees: 49.5%
Most significant features: number of words, average word length, average position in the word frequency table
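For scale, the 45.4% and 49.5% accuracies can be compared with trivial baselines. The short Python sketch below uses hypothetical function names and does not reproduce the Weka setup. With 7 outlets, uniform random guessing scores about 14.3%. The majority-class baseline depends on how the 1785 headlines split across outlets, which the slides do not report.

```python
from collections import Counter
from typing import Sequence


def uniform_chance(num_classes: int) -> float:
    """Expected accuracy of guessing one of `num_classes` labels uniformly at random."""
    return 1.0 / num_classes


def majority_baseline(labels: Sequence[str]) -> float:
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    print(f"uniform chance, 7 outlets: {uniform_chance(7):.1%}")
```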

Feature Example: Character length
[Figure omitted; axis: character count]

Pairwise Setup
Binary classifier accuracy (%), one value per pair of the 7 outlets (21 pairs):
72.3, 88, 92.1, 73.4, 78.5, 76.5, 84.5, 91.8, 75.8, 78.1, 77.9, 72.9, 86.7, 79, 79.4, 88.9, 74, 78.2, 69.4, 64.9, 58.6
(Higher = easier to classify = less similar)
Class over agenda (outlet type explains similarity better than political agenda):
Mako, Walla, NRG form a cluster: an “online ethos”
Ha’aretz and Ma’ariv are relatively unique (newspaper-derived)
Israel Hayom resembles its tabloid competitor ynet most, more than the agenda-sharing NRG
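The pairwise setup can be sketched in Python under a few assumptions. Every unordered pair of outlets gets its own binary dataset, so 7 outlets give C(7,2) = 21 tasks, one per listed accuracy. The parameter `evaluate` stands in for whatever train-and-score routine is used; the slides use Weka, which is not reproduced here. `most_similar` applies the slide's rule that a lower binary accuracy means more similar outlets. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def pairwise_tasks(
    headlines: Sequence[str], labels: Sequence[str]
) -> Dict[Pair, Tuple[List[str], List[str]]]:
    """One binary dataset per unordered pair of outlets (7 outlets -> 21 tasks)."""
    tasks = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = ([headlines[i] for i in idx], [labels[i] for i in idx])
    return tasks


def pairwise_accuracies(
    headlines: Sequence[str],
    labels: Sequence[str],
    evaluate: Callable[[List[str], List[str]], float],
) -> Dict[Pair, float]:
    """`evaluate(x, y)` is a hypothetical train-and-score routine returning accuracy."""
    return {pair: evaluate(x, y) for pair, (x, y) in pairwise_tasks(headlines, labels).items()}


def most_similar(accuracies: Dict[Pair, float], outlet: str) -> str:
    """Lowest binary accuracy = hardest to tell apart = most similar outlet."""
    partner_acc = {
        (b if a == outlet else a): acc
        for (a, b), acc in accuracies.items()
        if outlet in (a, b)
    }
    return min(partner_acc, key=partner_acc.get)
```

Keeping `evaluate` as a parameter separates the pairing logic from the classifier, so the same loop works with any learner.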

Future Work
Better content (“agenda”) features: topic models? Sentiment?
Some weird phenomena to be ironed out:
Alternating headlines: dedup based on the most recent k headlines
Very similar headlines: merge, or use edit distance
Location-sensitive features:
Headlines starting with "Netanyahu" (נתניהו): roughly balanced across outlets
Headlines starting with "PM" (רה"מ, Hebrew abbreviation for "prime minister"): 50% in Israel Hayom, another 25% in NRG
More text: main leads / other headlines
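The two dedup ideas can be combined in a single pass, sketched in Python below. This is one possible design, not the authors' method. A headline is dropped if it lies within `max_dist` Levenshtein edits of any of the last k headlines kept from the same outlet. With `max_dist=0`, only exact repeats are dropped. The names `edit_distance` and `dedup_recent` and the parameters `k` and `max_dist` are hypothetical. The input is assumed to be one outlet's headlines in time order.

```python
from collections import deque
from typing import Deque, Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def dedup_recent(headlines: Iterable[str], k: int = 5, max_dist: int = 0) -> List[str]:
    """Drop a headline within `max_dist` edits of any of the last `k` kept headlines."""
    recent: Deque[str] = deque(maxlen=k)  # oldest entries fall off automatically
    kept: List[str] = []
    for h in headlines:
        if any(edit_distance(h, r) <= max_dist for r in recent):
            continue
        kept.append(h)
        recent.append(h)
    return kept


if __name__ == "__main__":
    print(dedup_recent(["A", "B", "A", "B", "C"], k=2))
    print(dedup_recent(["no deal", "no deal!"], k=3, max_dist=1))
```

A fixed `max_dist` treats short and long headlines alike. Scaling the threshold by headline length would be a natural variant.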

Thanks! github.com/yuvalpinter/MediaAnalysis
