More experiments on FND & Literature Review on Fact Checking

Deep Learning Research & Application Center 25 September 2017 Claire Li

New experiments
New experiments with the fake.csv dataset, using content only and eliminating sentences of fewer than 20 words (average length: 458)
New experiments with the LIAR dataset (average length: 17.9)
Review on fact checking
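The length filter mentioned above can be sketched as follows. This is only an illustration: it assumes length is counted in whitespace-separated tokens, and the function names and sample texts are hypothetical, not taken from the experiments.

```python
def drop_short_texts(texts, min_words=20):
    """Keep only texts with at least `min_words` whitespace-separated tokens."""
    return [t for t in texts if len(t.split()) >= min_words]


def average_length(texts):
    """Mean number of whitespace-separated tokens per text (0.0 if empty)."""
    if not texts:
        return 0.0
    return sum(len(t.split()) for t in texts) / len(texts)


# Hypothetical sample: one 4-word text and one 25-word text.
sample = ["too short to keep", "word " * 25]
kept = drop_short_texts(sample)
```

A different tokenizer would shift the threshold slightly, so the cut-off should be read as approximate.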

Fake.csv with content only (diffFN-0920), stacked LSTM
For 4,000 validation samples
FN: 44; FP: 127; TN: 1,852; TP: 1,976
Precision: 0.94; Recall: 0.98; F1: 0.96; Accuracy: 0.96

Fake.csv with content only, eliminating sentences of fewer than 20 words, stacked LSTM
For 4,000 validation samples
FN: 49; FP: 44; TN: 1,996; TP: 1,911
Precision: 0.98; Recall: 0.97; F1: 0.97; Accuracy: 0.98
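For reference, the four reported scores follow from the confusion-matrix counts by the standard definitions. The sketch below uses the counts from the slide above; the function name is an assumption, and recomputed values can differ from the slide in the last digit depending on rounding.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Counts from the stacked LSTM run with short sentences removed.
metrics = binary_metrics(tp=1911, fp=44, tn=1996, fn=49)
```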

Fake.csv with content only, eliminating sentences of fewer than 20 words, bi-directional LSTM
For 4,000 validating data

Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection, arXiv preprint arXiv: , 2017

LIAR dataset from fact-checking website PolitiFact
12,836 human-labeled short statements from news releases, TV/radio, interviews, campaign speeches, etc.
Labels: pants-fire, false, barely-true, half-true, mostly-true, and true

Experiment 1, Bi-directional RNN
False dataset (5,113): pants-fire, false, barely-true
True dataset (6,440): half-true, mostly-true, and true
Precision on Training:

Experiment 2, Bi-directional RNN
False dataset (3,219): pants-fire, false, barely-true
True dataset (4,069): half-true, mostly-true, and true
Precision on Training:
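The six-to-two label grouping used in both experiments can be sketched as below. The grouping itself is the one stated on the slides; the function names and the (label, statement) row format are assumptions for illustration.

```python
# Grouping from the slides: the three "false-leaning" labels form the
# False dataset, the other three form the True dataset.
FALSE_LABELS = {"pants-fire", "false", "barely-true"}
TRUE_LABELS = {"half-true", "mostly-true", "true"}


def to_binary(label):
    """Map a six-way LIAR label to 0 (false) or 1 (true)."""
    key = label.strip().lower().replace(" ", "-")
    if key in FALSE_LABELS:
        return 0
    if key in TRUE_LABELS:
        return 1
    raise ValueError(f"unknown label: {label!r}")


def split_by_class(rows):
    """Split (label, statement) pairs into (false_statements, true_statements)."""
    false_rows, true_rows = [], []
    for label, statement in rows:
        (true_rows if to_binary(label) else false_rows).append(statement)
    return false_rows, true_rows
```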

Review on Fact Checking
Fact checking is knowledge-based verification of news content. It can be categorized as:
Expert-oriented: relies on human domain experts who investigate relevant data and documents to construct verdicts on claim veracity, e.g., PolitiFact, Snopes
Crowdsourcing-oriented: exploits the “wisdom of the crowd” by letting ordinary people annotate news content; the annotations are then aggregated to produce an overall assessment of news veracity, e.g., Fiskkit, the ‘for real’ account of LINE, Twitter Trails
Computational-oriented: provides an automatic, scalable system to classify true and false claims using the open web and structured knowledge graphs (e.g. the Google Knowledge Graph Search API)
  identifying check-worthy claims
  discriminating the veracity of fact claims

Relationship: rather than replacing the factchecker, the software's role is to make the factchecker's work easier

Four-stage factchecking process [1]
Factchecking follows the same four-stage process (monitor, spot claims, check claims, create and publish) whether it is done by humans or machines

Expert-oriented

Fact checking methodologies
Selection process: what claims to evaluate
Research methods:
  techniques and sources that fact checkers use when conducting research on claims
  the official rules and editorial policies that govern their approaches
Claim evaluation: the systems and processes by which fact checkers establish the veracity of a claim
Three major fact-checking organizations in the United States:
  PolitiFact
  FactCheck.org
  The Washington Post Fact Checker

PolitiFact
Began in 2007 as a project of the Tampa Bay Times, a publishing company
Rates the accuracy of claims by elected officials, candidates, leaders of political parties and political activists on its Truth-O-Meter, from True to False, down to the lowest rating, Pants on Fire
Monitoring: sources of statements to be checked
  staff comb through speeches, news stories, press releases, campaign brochures, TV ads, Facebook postings and transcripts of TV and radio interviews

Truth-O-Meter rulings
TRUE: The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE: The statement is accurate but needs clarification or additional information.
HALF TRUE: The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE: The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE: The statement is not accurate.
PANTS ON FIRE: The statement is not accurate and makes a ridiculous claim.

Principles in Truth-O-Meter rulings
Words matter: pay attention to the specific wording of a claim, e.g. mitigating words or phrases
Context matters: examine the full context of a claim, e.g. the comments made before and after it, the question that prompted it, and the point the speaker was making
Statements can be right and wrong: rate the overall accuracy of compound statements containing two or more factual assertions
Timing: consider the date a statement was made and the information available at that time

Process for Truth-O-Meter rulings
A writer researches the claim and writes the Truth-O-Meter article with a recommended ruling: true, mostly true, half true, mostly false, false, or pants on fire
After the article is edited, it is reviewed by a panel of at least three editors, which determines the Truth-O-Meter ruling

Corrections and review policy
For a factual error: an editor's note is added and labeled "CORRECTION", explaining how the article has been changed
For clarifications or updates: an editor's note is added and labeled "UPDATE", explaining how the article has been changed
For significant mistakes: the three-editor panel is reconvened; if there is a new ruling, the item is rewritten and the correction is put at the top, indicating how it has been changed

FactCheck.org
A project of the Annenberg Public Policy Center (APPC) of the University of Pennsylvania
  addresses public policy issues at the local, state and federal levels
Aims to reduce the level of deception and confusion in U.S. politics
Rates the accuracy of claims, focusing
  on presidential candidates in presidential election years
  on the top Senate races in midterm elections
  on the action in Congress in off-election years
Sources of statements to be checked: Sunday talk shows; TV ads; C-SPAN; presidential remarks; CQ Transcripts; campaign and official websites, press releases and similar materials; readers

Process: focus on false claims
Once a reporter or writer finds a statement suspected to be inaccurate or misleading
  engage with the person or organization being fact-checked and ask for supporting materials
If the supporting material does not support the statement
  check with primary sources of information: the Library of Congress for congressional testimony; the House Clerk and Senate Secretary’s office for roll call votes; the Bureau of Labor Statistics for employment data; the Securities and Exchange Commission for corporate records; the IRS for tax data; the Bureau of Economic Analysis for economic data; and the Energy Information Administration for energy data
  also interview experts on other topics as needed; for instance, when researching issues on foreign countries, contact experts on those areas

Process of story publishing
Line editing: Is context missing? Is the writing clear? Is the word choice accurate?
Copy editing: for proper style and grammar
Fact-checking: goes through the story line by line, word by word, to make sure every statement is correct
By the time of publishing, a story has been reviewed by a line editor, a copy editor, a fact-checker and the director of the APPC, Prof. Kathleen Hall Jamieson, a former dean of the Annenberg School for Communication at the University of Pennsylvania

Corrections and review Policy
If any new information comes to light after we publish a story that materially changes that story, we will clarify, correct or update our story and provide a note to readers that explains the change, why it was made and the date it was made

Computational-oriented

The State of Automated Factchecking - 2016
fullfact.org/automated
Full Fact intends to develop products that automate fact-checking tasks wherever possible, using statistical analysis and natural language processing technologies in real time
Open standards, with these aims:
  Standard data formats (schema.org), so that any new automated factchecking tool can work with any known source.
  Shared monitoring systems, so we do not duplicate work unnecessarily.
  Open and shared evaluation, so we know what works and what it works for.
  Published roadmaps, to attract volunteers, researchers, partners and funders to work with us.
  Think global, so that where possible new automated factchecking tools are designed with the aim of being able to work for many languages and countries.

The state of automated factchecking

Full Fact’s roadmap: Hawk, Stats, Trends, Robocheck
Hawk: a monitoring system that also spots claims that have already been factchecked
  technologies: CrowdTangle, Google Trends, Newswhip, Trendolizer, Trendsmap, and Signal
  monitors content and conversations on Twitter, Facebook, YouTube, Reddit and online forums
Stats: automatically checks statistical claims
Trends: uses the output from Hawk to monitor how common a claim is, where it is being made, and who is making it
Robocheck: a real-time product that provides subtitles of live TV and adds verdicts to claims using the results from Stats and Hawk

Automated factchecking: Full Fact (fullfact.org)

1. Monitor: popular sources that need to be tracked (1)

1. Monitor: metadata on the sources (2)
Who made the claim, where, and when
E.g. whether it is true to say “unemployment is rising” depends on when it was said

2. Spot claims (1)
Monitoring claims that have been factchecked before in new text
  open source search engines: Apache Solr and Elasticsearch Percolator, Apache Lucene & Luwak
Identifying new factual claims that have not been factchecked before in new text
  Use templates like “x is rising”
  Use machine learning algorithms to detect ‘check-worthy claims’ or factual claims
    ClaimBuster: learns from labelled check-worthy sentences, identifies features they have in common, and looks for these features in new sentences
    Argumentative zoning: classifies sentences into several different types, which can include identifying factual claims
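The template idea (“x is rising”) can be illustrated with a small regular expression. This is a minimal sketch under simplifying assumptions: the subject is taken to be the single word before “is/are”, only two trend words are covered, and the function name and sentence splitter are hypothetical rather than anything Full Fact or ClaimBuster uses.

```python
import re

# Template "x is rising" / "x is falling"; the subject is assumed to be
# the single word immediately before "is" or "are".
TEMPLATE = re.compile(
    r"\b(?P<subject>\w+)\s+(?:is|are)\s+(?P<trend>rising|falling)\b",
    re.IGNORECASE,
)


def spot_claims(text):
    """Return one dict per sentence that matches the trend template."""
    claims = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        match = TEMPLATE.search(sentence)
        if match:
            claims.append({
                "sentence": sentence,
                "subject": match.group("subject"),
                "trend": match.group("trend").lower(),
            })
    return claims


claims = spot_claims(
    "The minister said unemployment is rising. We thank everyone for coming."
)
```

Templates are precise but brittle, which is why the slide pairs them with learned check-worthiness detectors.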

2. Spot claims (2)
Making editorial judgements about the priority of different claims
  Content approaches: prioritize through the content of claims
  Contextual approaches (context being information relevant to an understanding of the text): identify important content based on reach and engagement (social influence)
  Combining the content and context approaches
Dealing with different phrasing for the same or similar claims
  A fully automated factchecking system should identify different phrases making the same claim and distinguish very similar phrases making different claims, as humans can
  Paraphrasing in factchecking is harder than in most NLP because precise wording can matter so much to a factchecker’s conclusion
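A small illustration of why phrasing is hard: a surface measure such as token overlap can rate two opposite claims as more similar than two paraphrases of the same claim. The sketch below uses Jaccard similarity over lower-cased tokens; the measure and the example phrases are assumptions chosen for illustration, not a method from the report.

```python
def jaccard(a, b):
    """Jaccard similarity between the lower-cased token sets of two phrases."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Same claim, different wording: little token overlap.
same_claim = jaccard("unemployment is rising", "the jobless total is going up")

# Opposite claims, near-identical wording: high token overlap.
different_claim = jaccard("unemployment is rising", "unemployment is falling")
```

Here the opposite claims score higher than the paraphrases, so matching on wording alone would pair the wrong sentences.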

3. Check claims (1)
Make sure the sources human factcheckers rely on are available as structured data computers can use
  Peter Norvig, Google’s Chief Scientist, explained the success of driverless cars: “we didn’t have better algorithms, we just had better data”
Example claim: Victoria and David Beckham are married
Reference approaches: look up their names in the register of marriages
  Statcheck (psychology), jEugene (legal documents)
  Instant answers from search engines do a similar job
Machine learning approaches: make a mathematical model of things we already know (e.g. from a knowledge base [2])
  True claims are likely to be closer together in a knowledge graph
Contextual approaches: look at how claims that they are married spread, reasoning from the social context and other claims
  Claims that survive longer in open discussion than contradictory claims are probably true
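The intuition that true claims link entities that are close together in a knowledge graph can be sketched with a plain breadth-first search. This is a simplified stand-in, not the scoring method of [2]: the toy graph, the 1/(1+d) score and all function names are assumptions made for illustration.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency dict from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Number of edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def proximity_score(graph, subject, obj):
    """Toy plausibility score: 1 / (1 + distance), 0.0 if disconnected."""
    dist = shortest_path_length(graph, subject, obj)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


# Hypothetical toy graph built around the example claim on the slide.
graph = build_undirected([
    ("Victoria Beckham", "Spice Girls"),
    ("Victoria Beckham", "David Beckham"),
    ("David Beckham", "Manchester United"),
])
graph.setdefault("Unconnected Entity", set())

married = proximity_score(graph, "Victoria Beckham", "David Beckham")
unrelated = proximity_score(graph, "Victoria Beckham", "Unconnected Entity")
```

A real system would also have to weight paths, for example by penalizing routes through very generic, highly connected nodes.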

3. Check claims (2)
Automated checking projects vary in what kinds of sources they deal with, what kinds of claims they deal with, and what topics they deal with

4. Create and publish (1)
Automated journalism takes data and tries to make stories, while automated factchecking takes stories and tries to reverse-engineer the data or sources
  encoding factchecks as structured data so they can be presented in different places, from shareable widgets to search results (schema.org)
Representing automated factchecking results
  Real-time pop-ups for audiences: Truth Teller provides automated factchecking annotations for video clips based on previous human factchecks
  Claim tracker for factcheckers and journalists: Full Fact provides a graph of how frequently claims have appeared over time and the details of where each claim has appeared
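Encoding a factcheck as structured data can be sketched with the schema.org ClaimReview type, serialized as JSON-LD. The property names below are ClaimReview and Rating properties from schema.org; the helper function, the 1 to 5 rating scale and every example value (URL, organization, claim text, date) are placeholders, not real factchecks.

```python
import json


def claim_review(claim, rating_name, rating_value, url, reviewer,
                 date_published, best=5, worst=1):
    """Build a schema.org ClaimReview record as a plain dict (JSON-LD)."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "datePublished": date_published,
        "claimReviewed": claim,
        "author": {"@type": "Organization", "name": reviewer},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating_value,
            "bestRating": best,
            "worstRating": worst,
            "alternateName": rating_name,
        },
    }


# Placeholder values only; nothing here describes a real factcheck.
record = claim_review(
    claim="Example claim text",
    rating_name="Half True",
    rating_value=3,
    url="https://example.org/factchecks/1",
    reviewer="Example Factcheckers",
    date_published="2017-09-25",
)
json_ld = json.dumps(record, indent=2)
```

The resulting string can be embedded in a page so that widgets and search engines read the same verdict.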

References
[1] The State of Automated Factchecking, Full Fact (fullfact.org/automated), 2016.
[2] Ciampaglia, Giovanni Luca, Prashant Shiralkar, Luis M. Rocha, Johan Bollen, Filippo Menczer, and [...] "[...] Knowledge Networks". In: PLoS ONE (June 2015), e[...]
