Breaking News Exploring Israeli News Bias using Simple Textual Analysis Yuval Pinter Shuki Tausig Oren Persico.


1 Breaking News Exploring Israeli News Bias using Simple Textual Analysis Yuval Pinter Shuki Tausig Oren Persico

2 Motivation / Hypotheses
Media is biased
Israeli media is super-biased
Machine learning detects bias
Headlines could be enough: "Headlines are journalism in its purest form" (Simon Jenkins, 1992; quoted in Hebrew on the slide)
Which is more significant: class bias or agenda bias?
The idea: classify the news outlet using basic features
Most of the "agenda bias" part will have to wait
No prior work AFAIK; closest field: authorship attribution

3 Data
General news sites only
Homepage headlines only
Scraped at 15-minute intervals, July 2014 to May 2015
Most experiments use the February data
Data and extraction code are available: github.com/yuvalpinter/MediaAnalysis

4 Data Samples
Nov 23, 15:00
Feb 15, 15:30

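5 Text Processing
Consecutive appearance de-duping
Tokenization (incl. lemmatization, affix deletion) using hspell (Har'el and Kenigsberg)
Mostly good, sometimes not so much:

Example 1 (NRG, 20/11/2014, 0:15)
Input: הפרלמנט הירדני עמד דקת דומייה לזכר המחבלים
("The Jordanian parliament stood for a minute of silence in memory of the terrorists")
Output: פרלמנט ירדן עימד דקה דומייה זכר מחבל
(roughly "parliament Jordan typeset minute silence memory terrorist"; the verb עמד, "stood", was misanalyzed as עימד, "typeset")

Example 2 (Mako, 25/4/2015, 19:30)
Input: רעידת אדמה קטלנית כאלף נהרגו בנפאל: "שעות קריטיות"
("Deadly earthquake, about a thousand killed in Nepal: 'critical hours'")
Output: רעידה דימה קטלוניה אילף נהרג נפאל שעה קריטי
(roughly "quake likened Catalonia tamed killed Nepal hour critical"; אדמה, "earth", became דימה, "likened"; קטלנית, "deadly", became קטלוניה, "Catalonia"; כאלף, "about a thousand", became אילף, "tamed")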
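The consecutive appearance de-duping step can be sketched as follows. This is a minimal illustration, not the code from the repository: it assumes the scrapes of one outlet arrive as a time-ordered list of headline strings, and the function name `dedupe_consecutive` is hypothetical.

```python
from itertools import groupby


def dedupe_consecutive(snapshots):
    """Collapse runs of identical headlines from successive scrapes.

    `snapshots` is assumed to be a time-ordered list of the headline seen
    at each 15-minute scrape of a single outlet (an assumption of this sketch).
    """
    # groupby groups only adjacent equal items, so a headline that returns
    # after being replaced by another one is kept a second time.
    return [headline for headline, _run in groupby(snapshots)]


snapshots = ["A", "A", "B", "B", "B", "A"]
print(dedupe_consecutive(snapshots))  # ['A', 'B', 'A']
```

Because only adjacent repeats are merged, a headline that alternates with another one survives each time it reappears.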

6 Features
Form: character length, word count, word length (average/min/median/max), punctuation token count
Lexicon: quantile word/lemma frequencies (average/min/median/max)
Frequency sources: Wordlists (Hermit Dave), Israblog (Linzen 2009)
Morphology: affix letters
Word features: probably the media cycle
Features and extraction code are available
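The form features listed on this slide could be computed roughly as below. This is a sketch under stated assumptions: it tokenizes on whitespace and treats a token as punctuation when every character is ASCII punctuation, whereas the talk used hspell for tokenization. The function name `form_features` and the feature keys are hypothetical.

```python
import statistics
import string

PUNCT = set(string.punctuation)


def is_punct(token):
    """True when every character of the token is ASCII punctuation."""
    return all(ch in PUNCT for ch in token)


def form_features(headline):
    """Compute simple form features for one headline (whitespace tokens assumed)."""
    tokens = headline.split()
    words = [t for t in tokens if not is_punct(t)]
    lengths = [len(w) for w in words] or [0]  # avoid errors on empty headlines
    return {
        "char_length": len(headline),
        "word_count": len(words),
        "word_len_avg": statistics.mean(lengths),
        "word_len_min": min(lengths),
        "word_len_median": statistics.median(lengths),
        "word_len_max": max(lengths),
        "punct_token_count": sum(1 for t in tokens if is_punct(t)),
    }


features = form_features("Deadly quake in Nepal : critical hours")
print(features["word_count"], features["punct_token_count"])  # 6 1
```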

7 Setup & Results
7 classes, 1785 headlines (all of February)
Weka's Random Forest accuracy:
10 trees: 45.4%
50 trees: 49.5%
Most significant features:
Number of words
Average word length
Average position in word frequency table

8 Feature Example
Character length (figure; axis: character count)

9 Pairwise Setup
Binary classifier accuracy (%), one value per outlet pair (higher = easier to classify = less similar); the pair labels did not survive extraction:
72.3, 88, 92.1, 73.4, 78.5, 76.5, 84.5, 91.8, 75.8, 78.1, 77.9, 72.9, 86.7, 79, 79.4, 88.9, 74, 78.2, 69.4, 64.9, 58.6
Class over agenda:
Mako, Walla, NRG form a cluster: "online ethos"
Ha'aretz and Ma'ariv relatively unique (newspaper-derived)
Israel Hayom resembles tabloid competitor ynet most, more than agenda-sharing NRG
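A pairwise setup of this kind can be sketched as below: every unordered pair of outlets gets its own two-class dataset, on which a binary classifier would then be trained and scored. The outlet list is taken from the names mentioned on this slide and is assumed to be the seven classes; its order, the `(headline, outlet)` sample format, and the function name `pairwise_subsets` are assumptions of this sketch. The classifier itself is omitted.

```python
from itertools import combinations

# Outlets named on the slide; assumed here to be the seven classes.
OUTLETS = ["Mako", "Walla", "NRG", "Ha'aretz", "Ma'ariv", "Israel Hayom", "ynet"]


def pairwise_subsets(samples):
    """Yield ((a, b), subset) for each unordered outlet pair.

    `samples` is a list of (headline, outlet) tuples; each subset keeps only
    the samples belonging to one of the two outlets in the pair.
    """
    for a, b in combinations(OUTLETS, 2):
        yield (a, b), [(h, o) for h, o in samples if o in (a, b)]


pairs = list(combinations(OUTLETS, 2))
print(len(pairs))  # 21
```

Seven outlets give 21 unordered pairs, which matches the number of accuracy values reported on the slide.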

10 Future Work
Better content ("agenda") features: topic models? Sentiment?
Some weird phenomena to be ironed out:
Alternating headlines: dedup based on recent k
Very similar headlines: merge or use edit distance
Location-sensitive features:
Headlines starting with נתניהו ("Netanyahu"): roughly balanced
Headlines starting with רה"מ ("PM", the Hebrew abbreviation for Prime Minister): 50% in Israel Hayom, another 25% in NRG
More text: main leads / other headlines
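The two de-duplication ideas above (comparing against the recent k headlines, and merging near-identical headlines by edit distance) could be combined as in the sketch below. It is an illustration only: the window size `k`, the threshold `max_dist`, and the function names are hypothetical, and the edit distance is plain character-level Levenshtein distance.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def dedupe_recent(snapshots, k=4, max_dist=0):
    """Drop a headline if it is within max_dist edits of any of the last k snapshots."""
    recent = deque(maxlen=k)  # sliding window over the most recent scrapes
    kept = []
    for headline in snapshots:
        if not any(edit_distance(headline, r) <= max_dist for r in recent):
            kept.append(headline)
        recent.append(headline)
    return kept


print(dedupe_recent(["A", "B", "A", "B"], k=2))  # ['A', 'B']
```

With `max_dist=0` only exact repeats inside the window are dropped; raising it merges headlines that differ by a few characters, at the cost of possibly merging genuinely different short headlines.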

11 Thanks!
github.com/yuvalpinter/MediaAnalysis

