More experiments on FND & Literature Review on Fact Checking


More experiments on FND & Literature Review on Fact Checking Deep Learning Research & Application Center 25 September 2017 Claire Li

New experiments with the fake.csv dataset, using content only and eliminating sentences shorter than 20 words (average length: 458).
New experiments with the LIAR dataset (average length: 17.9).
Review on fact checking.

Fake.csv with content only (diffFN-0920), stacked LSTM
For 4,000 validation samples: FN: 44; FP: 127; TN: 1,852; TP: 1,976
Precision: 0.94; Recall: 0.98; F1: 0.96; Accuracy: 0.96

Fake.csv with content only, eliminating sentences shorter than 20 words, stacked LSTM
For 4,000 validation samples: FN: 49; FP: 44; TN: 1,996; TP: 1,911
Precision: 0.98; Recall: 0.97; F1: 0.97; Accuracy: 0.98
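The precision, recall, F1 and accuracy figures on the two stacked-LSTM slides above follow from the reported confusion-matrix counts. A minimal Python sketch of that arithmetic, assuming the standard definitions (the function name `classification_metrics` is ours, not from the slides):

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy


# Counts reported for the stacked LSTM on fake.csv with content only.
content_only = classification_metrics(tp=1976, fp=127, tn=1852, fn=44)
# Counts reported after eliminating sentences shorter than 20 words.
filtered = classification_metrics(tp=1911, fp=44, tn=1996, fn=49)

for name, scores in [("content only", content_only), ("filtered", filtered)]:
    print(name, [round(score, 3) for score in scores])
```

Recomputing from the counts is a quick consistency check on the rounded figures shown on the slides.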

Fake.csv with content only, eliminating sentences shorter than 20 words, bi-directional LSTM
For 4,000 validation samples

Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection, arXiv preprint arXiv:1705.00648, 2017

LIAR dataset, from the fact-checking website PolitiFact
12,836 human-labeled short statements from news releases, TV/radio, interviews, campaign speeches, etc.
Labels: pants-fire, false, barely-true, half-true, mostly-true, and true

Experiment 1, bi-directional RNN
False dataset (5,113): pants-fire, false, barely-true
True dataset (6,440): half-true, mostly-true, and true
Precision on training: 0.9734

Experiment 2, bi-directional RNN
False dataset (3,219): pants-fire, false, barely-true
True dataset (4,069): half-true, mostly-true, and true
Precision on training: 0.9941
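Both experiments collapse the six LIAR labels into a false group and a true group. A minimal sketch of that mapping, assuming labels arrive as plain strings; the helper name `to_binary` and the 0/1 encoding are our own choices, not taken from the slides:

```python
# Grouping as described on the slides: three "false" labels, three "true" labels.
FALSE_LABELS = {"pants-fire", "false", "barely-true"}
TRUE_LABELS = {"half-true", "mostly-true", "true"}


def to_binary(label):
    """Map a six-way LIAR label to 0 (false group) or 1 (true group)."""
    # Normalise case and spacing so "mostly true" and "mostly-true" both work.
    normalised = label.strip().lower().replace(" ", "-")
    if normalised in FALSE_LABELS:
        return 0
    if normalised in TRUE_LABELS:
        return 1
    raise ValueError(f"unknown LIAR label: {label!r}")


examples = ["pants-fire", "barely-true", "half-true", "mostly true", "true"]
print([to_binary(label) for label in examples])
```

Raising on unknown labels, rather than silently dropping them, keeps the two class counts traceable back to the original dataset.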

Review on Fact Checking
Fact checking is a knowledge-based way of verifying news content. Approaches can be categorized as:
Expert-oriented: relies on human domain experts who investigate relevant data and documents to reach a verdict on claim veracity, e.g., PolitiFact, Snopes.
Crowdsourcing-oriented: exploits the “wisdom of the crowd”, letting ordinary people annotate news content; the annotations are then aggregated to produce an overall assessment of news veracity, e.g., Fiskkit, the ‘for real’ account of LINE, Twitter Trails.
Computational-oriented: provides an automatic, scalable system to classify true and false claims using the open web and structured knowledge graphs (e.g. Google Knowledge Graph Search API). It involves two tasks: identifying check-worthy claims, and discriminating the veracity of factual claims.

Relationship: rather than replacing the factchecker, the software's role is to make their work easier.

Four-stage factchecking process [1]
Factchecking is the same four-stage process whether it is done by humans or machines. http://fullfact.org/automated

Expert-oriented

Fact checking methodologies
Selection process: which claims to evaluate.
Research methods: the techniques and sources that fact checkers use when researching claims, and the official rules and editorial policies that govern their approaches.
Claim evaluation: the systems and processes by which fact checkers establish the veracity of a claim.
Three major fact-checking organizations in the United States: PolitiFact, FactCheck.org, and The Washington Post Fact Checker.

PolitiFact
Began in 2007 as a project of the Tampa Bay Times, a publishing company.
Rates the accuracy of claims by elected officials, candidates, leaders of political parties and political activists on its Truth-O-Meter, from True to False, down to the lowest rating, Pants on Fire. http://www.politifact.com/truth-o-meter/statements/
Monitoring: to find statements to check, its staff comb through speeches, news stories, press releases, campaign brochures, TV ads, Facebook postings and transcripts of TV and radio interviews.

Truth-O-Meter rulings
TRUE: The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE: The statement is accurate but needs clarification or additional information.
HALF TRUE: The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE: The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE: The statement is not accurate.
PANTS ON FIRE: The statement is not accurate and makes a ridiculous claim.

Principles in Truth-O-Meter rulings
Words matter: pay attention to the specific wording of a claim, e.g. mitigating words or phrases.
Context matters: examine the full context of a claim, e.g. the comments made before and after it, the question that prompted it, and the point the speaker was making.
Statements can be right and wrong: rate the overall accuracy of compound statements containing two or more factual assertions.
Timing: consider the date a statement was made and the information available at that time.

Process for Truth-O-Meter rulings
A writer researches the claim and writes the Truth-O-Meter article with a recommended ruling: true, mostly true, half true, mostly false, false, or pants on fire.
After the article is edited, it is reviewed by a panel of at least three editors, which determines the Truth-O-Meter ruling.

Corrections and review policy
For a factual error: an editor's note labeled "CORRECTION" is added, explaining how the article has been changed.
For clarifications or updates: an editor's note labeled "UPDATE" is added, explaining how the article has been changed.
For significant mistakes: the three-editor panel is reconvened; if there is a new ruling, the item is rewritten and the correction is put at the top, indicating how it has been changed.

FactCheck.org
A project of the Annenberg Public Policy Center (APPC) of the University of Pennsylvania, which addresses public policy issues at the local, state and federal levels.
Aims to reduce the level of deception and confusion in U.S. politics.
Rates the accuracy of claims, focusing on presidential candidates in presidential election years and on the top Senate races in midterm elections. In off-election years, its primary focus is on the action in Congress.
Sources of statements to be checked: Sunday talk shows; TV ads; C-SPAN; presidential remarks; CQ Transcripts; campaign and official websites, press releases and similar materials; readers.

Process: focus on false claims
Once a reporter or writer finds a statement that may be inaccurate or misleading, they engage with the person or organization being fact-checked and ask for supporting materials.
If the supporting material does not support the statement, they check with the sources of information: the Library of Congress for congressional testimony; the House Clerk and Senate Secretary’s office for roll call votes; the Bureau of Labor Statistics for employment data; the Securities and Exchange Commission for corporate records; the IRS for tax data; the Bureau of Economic Analysis for economic data; and the Energy Information Administration for energy data.
They also interview experts on other topics as needed; for instance, when researching issues in foreign countries, they contact experts on those areas.

Process of story publishing
Line editing: Is context missing? Is the writing clear? Is the word choice accurate?
Copy editing: for proper style and grammar.
Fact-checking: goes through the story line by line, word by word, to make sure every statement is correct.
By the time of publishing, a story has been reviewed by a line editor, a copy editor, a fact-checker and the director of the APPC, Prof. Kathleen Hall Jamieson, a former dean of the Annenberg School for Communication at the University of Pennsylvania.

Corrections and review policy
If new information comes to light after a story is published that materially changes that story, FactCheck.org will clarify, correct or update the story and provide a note to readers that explains the change, why it was made and the date it was made.

Computational-oriented

The State of Automated Factchecking (2016), fullfact.org/automated
Full Fact intends to develop products that automate fact-checking tasks wherever possible, using statistical analysis and natural language processing technologies in real time.
Open standards, with these aims:
Standard data formats (schema.org), so that any new automated factchecking tool can work with any known source.
Shared monitoring systems, so we do not duplicate work unnecessarily.
Open and shared evaluation, so we know what works and what it works for.
Published roadmaps, to attract volunteers, researchers, partners and funders to work with us.
Think global, so that where possible new automated factchecking tools are designed with the aim of being able to work for many languages and countries.

The state of automated factchecking

Full Fact’s roadmap: Hawk, Stats, Trends, Robocheck
Hawk: a monitoring system that also spots claims that have already been factchecked. Technologies: CrowdTangle, Google Trends, Newswhip, Trendolizer, Trendsmap, and Signal, which monitor content and conversations on Twitter, Facebook, YouTube, Reddit and online forums.
Stats: automatically checks statistical claims.
Trends: using output from Hawk, monitors how common a claim is, where it is being made, and who is making it.
Robocheck: a real-time product that provides subtitles for live TV and adds verdicts to claims using the results from Stats and Hawk.

Automated factchecking, Full Fact (fullfact.org)

1. Popular sources that need to be tracked (1)

1. Metadata on monitored sources (2)
Who, where, and when: for example, whether it is true to say “unemployment is rising” depends on who said it, where, and when.

2. Spot claims (1)
Monitoring new text for claims that have been factchecked before: open source search tools such as Apache Solr, Elasticsearch Percolator, and Apache Lucene with Luwak.
Identifying new factual claims in new text that have not been factchecked before:
Use templates like “x is rising”.
Use machine learning algorithms to detect ‘check-worthy claims’ or factual claims.
ClaimBuster: learns from labelled check-worthy sentences, identifies the features they have in common, and looks for these features in new sentences.
Argumentative zoning: classifies sentences into several different types, which can include identifying factual claims.
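The template approach can be sketched with a regular expression. This is a toy illustration only: the “x is rising” template comes from the slide, while the extra trend words, the sentence splitter and the function name `spot_claims` are our assumptions, not Full Fact's implementation.

```python
import re

# "rising" is the template from the slide; the other trend words are illustrative.
TREND_WORDS = ["rising", "falling", "increasing", "decreasing"]
TEMPLATE = re.compile(
    r"\b(?P<subject>[\w\s-]+?)\s+(?:is|are)\s+(?P<trend>"
    + "|".join(TREND_WORDS)
    + r")\b",
    re.IGNORECASE,
)


def spot_claims(text):
    """Return (subject, trend) pairs for sentences matching the template."""
    claims = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        match = TEMPLATE.search(sentence)
        if match:
            subject = match.group("subject").strip()
            claims.append((subject, match.group("trend").lower()))
    return claims


sample = (
    "Unemployment is rising. We met the minister yesterday. "
    "House prices are falling in the north."
)
print(spot_claims(sample))
```

A template this simple takes everything before "is" as the subject, so longer sentences would need parsing or the machine learning approaches listed on the slide.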

2. Spot claims (2)
Making editorial judgements about the priority of different claims:
Content approaches: prioritize through the content of the claims.
Contextual approaches (using information relevant to understanding the text): identify important content based on reach and engagement (social influence).
Combining the content and context approaches.
Dealing with different phrasing for the same or similar claims:
A fully automated factchecking system should identify different phrases making the same claim, and distinguish very similar phrases making different claims, as humans can.
Paraphrasing in factchecking is harder than in most NLP because precise wording can matter so much to a factchecker’s conclusion.

3. Check claims (1)
Make sure the sources human factcheckers rely on are available as structured data that computers can use. Peter Norvig, Google’s Chief Scientist, explained the success of driverless cars this way: “we didn’t have better algorithms, we just had better data”.
Example claim: Victoria and David Beckham are married.
Reference approaches: look up their names in the register of marriages. Examples: Statcheck (psychology), jEugene (legal documents). Instant answers from search engines do a similar job.
Machine learning approaches: make a mathematical model of things we already know (e.g. from a knowledge base [2]). True claims are likely to be closer together in a knowledge graph.
Contextual approaches: look at how claims that they are married spread, reasoning from social signals and other claims. Claims that they are married survive longer in open discussion, along with the reactions to them, than contradictory claims, so they probably are married.
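The idea that true claims sit closer together in a knowledge graph can be illustrated with a shortest-path search. The sketch below is an assumption-laden toy: the graph is hand-made rather than drawn from a real knowledge base, and plain hop count stands in for whatever proximity measure reference [2] actually defines.

```python
from collections import deque

# Toy, hand-made graph; not extracted from any real knowledge base.
EDGES = [
    ("Victoria Beckham", "David Beckham"),
    ("David Beckham", "Manchester United"),
    ("Victoria Beckham", "Spice Girls"),
    ("Manchester United", "Alex Ferguson"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_distance(graph, source, target):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(EDGES)
print(hop_distance(graph, "Victoria Beckham", "David Beckham"))
print(hop_distance(graph, "Spice Girls", "Alex Ferguson"))
```

A shorter distance between the subject and object of a claim would count as weak evidence in its favour; a real system would also need to weight paths, since very general nodes connect almost everything.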

3. Check claims (2)
Automated checking projects vary in what kinds of sources they deal with, what kinds of claims they deal with, and what topics they deal with.

4. Create and publish (1)
Automated journalism takes data and tries to make stories, while automated factchecking takes stories and tries to reverse-engineer the data or sources.
Encode factchecks as structured data (schema.org) so they can be presented in different places, from shareable widgets to search results.
Representing automated factchecking results:
Real-time pop-ups for audiences: Truth Teller provides automated factchecking annotations for video clips based on previous human factchecks.
Claim tracker for factcheckers and journalists: Full Fact provides a graph of how frequently claims have appeared over time, with details of where each claim has appeared.
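One way to encode a factcheck as structured data is the `ClaimReview` type defined by schema.org. The sketch below builds such a record as JSON-LD; the slides only mention schema.org, so the choice of `ClaimReview`, the helper name `claim_review`, the 1 to 5 rating scale and every value passed in are placeholders of ours, not real factcheck data.

```python
import json


def claim_review(claim, verdict, rating, url, reviewer, date_published):
    """Build a schema.org ClaimReview record as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": claim,
        "url": url,
        "datePublished": date_published,
        "author": {"@type": "Organization", "name": reviewer},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating,
            "bestRating": 5,   # assumed scale, chosen for illustration
            "worstRating": 1,
            "alternateName": verdict,
        },
    }


# Placeholder values only; no real factcheck is being reported here.
record = claim_review(
    claim="Unemployment is rising",
    verdict="Placeholder verdict",
    rating=3,
    url="https://example.org/factchecks/1",
    reviewer="Example Factcheckers",
    date_published="2017-09-25",
)
print(json.dumps(record, indent=2))
```

Because the record is plain JSON, the same factcheck can be embedded in a web page, fed to a widget, or picked up by a search engine without reformatting.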

References
[1] The State of Automated Factchecking, http://fullfact.org/automated
[2] Ciampaglia, Giovanni Luca, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and Alessandro Flammini. "Computational Fact Checking from Knowledge Networks". In: PLoS ONE 10.6 (June 2015), e0128193. http://dx.doi.org/10.1371/journal.pone.0128193