Review on Fact Checking and Automatic Fact Checking Systems

Deep Learning Research & Application Center, 17 October 2017
Claire Li

Fact Checking
- Factual claims are those that can be verified, e.g. "The room measures ten feet by twelve feet."
- Fact checking is a form of knowledge-based verification of news content: it assigns a truth value (possibly a degree of truth) to a factual claim made in a particular context.
- Important features include the context, time, speaker, multiple sources (URLs), evidence, etc.
- Fact-checkers reach verdicts on factual claims by investigating relevant data and documents, and then publish those verdicts.
- An automatic fact-checking system consists of:
  - Restricting the task in advance to claims that can be fact-checked objectively (the scope of the task)
  - Spotting check-worthy factual claims; related publications: [1], [2]
  - Reaching verdicts on check-worthy factual claims automatically; related publications: [1], [2]; websites: fullfact.org
  - Hybrid technologies: deep learning approaches combined with reasoning techniques over a world knowledge base
  - Integrating existing tools
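The features listed above can be bundled into one record per claim. The following is a minimal, hypothetical Python data structure: the class `FactualClaim`, its field names and the 0-to-1 truth scale are illustrative assumptions, not part of any system cited here. It stores a claim with its context and a graded truth value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class FactualClaim:
    """Hypothetical record: a claim plus the context features used to check it."""
    text: str
    speaker: str
    claim_date: date
    context: str
    sources: List[str] = field(default_factory=list)   # URLs of supporting sources
    evidence: List[str] = field(default_factory=list)  # extracted evidence snippets
    truth_degree: Optional[float] = None               # None means "not checked yet"

    def set_verdict(self, degree: float) -> None:
        """Record a graded truth value in [0, 1], where 1 means fully true (assumed scale)."""
        if not 0.0 <= degree <= 1.0:
            raise ValueError("truth degree must lie in [0, 1]")
        self.truth_degree = degree
```

Keeping speaker, date and context next to the claim text matters because the same sentence can be true in one context and false in another.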

Unsuitable claims for the task of automatic fact-checking [2]
- Claims assessing causal relations, e.g. whether a statistic should be attributed to a particular law
- Claims concerning the future, e.g. speculations involving oil prices
- Claims not concerning facts, e.g. whether a politician is supporting certain policies (opinions, beliefs)
- Statements whose verdict relies on data that are not available online, such as those needing personal communications
- And more…
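One crude way to pre-filter such sentences is a keyword heuristic. The sketch below is a hypothetical illustration only: the cue lists in `UNSUITABLE_CUES` are assumptions made up for this example, [2] does not prescribe them, and a real system would learn these distinctions rather than hard-code them.

```python
import re

# Hypothetical cue phrases per category of unsuitable claim (illustrative only).
UNSUITABLE_CUES = {
    "causal": [r"\bbecause of\b", r"\bcaused by\b", r"\bdue to\b", r"\bthanks to\b"],
    "future": [r"\bwill\b", r"\bgoing to\b", r"\bnext year\b"],
    "opinion": [r"\bi think\b", r"\bi believe\b", r"\bshould\b", r"\bsupport(?:s|ing)?\b"],
}


def unsuitable_reasons(sentence):
    """Return the sorted categories of unsuitable claim whose cues appear in `sentence`."""
    lowered = sentence.lower()
    return sorted(
        category
        for category, patterns in UNSUITABLE_CUES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    )
```

For example, `unsuitable_reasons("Oil prices will rise next year")` flags the sentence as `["future"]`, while a measurable statement such as "The room measures ten feet by twelve feet" yields an empty list and passes through.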

Automatic Fact-checking System
- Stages: collect information and data; analyze claims and extract evidence; match claims with evidence; validate and explain
- Pipeline: Monitor Model → Claim Spotting Model → Claim Verdict Model → Create & publish
- Extract natural-language sentences from textual/audio sources
- Separate factual claims from opinions, beliefs, hyperboles and questions
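The four-stage pipeline can be sketched as plain function composition. This minimal Python illustration rests on loud assumptions: sentence splitting is a naive punctuation rule, claim spotting is a toy filter that drops questions and sentences opening with opinion markers, and the verdict step simply calls whatever lookup function is supplied (for example, a database of already-checked claims). The names `monitor`, `spot_claims`, `verdict` and `run_pipeline` are hypothetical.

```python
import re
from typing import Callable, Dict, List


def monitor(raw_text: str) -> List[str]:
    """Monitor stage: split incoming text into sentences (naive punctuation split)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw_text) if s.strip()]


def spot_claims(sentences: List[str]) -> List[str]:
    """Claim spotting stage: drop questions and sentences opening with opinion markers (toy rule)."""
    opinion_markers = ("i think", "i believe", "in my opinion")
    return [s for s in sentences
            if not s.endswith("?") and not s.lower().startswith(opinion_markers)]


def verdict(claim: str, lookup: Callable[[str], str]) -> str:
    """Claim verdict stage: delegate to any verdict function (database, model, ...)."""
    return lookup(claim)


def run_pipeline(raw_text: str, lookup: Callable[[str], str]) -> Dict[str, str]:
    """Monitor -> spot -> verdict; the returned mapping is what 'create & publish' would publish."""
    return {claim: verdict(claim, lookup) for claim in spot_claims(monitor(raw_text))}
```

With `known = {"The world is flat.": "False"}`, calling `run_pipeline("The world is flat. Is it round? I think it is.", lambda c: known.get(c, "Unverified"))` keeps only the first sentence as a claim and returns its verdict. Replacing one stage (say, a trained check-worthiness model instead of `spot_claims`) leaves the others untouched, which is the main appeal of the modular design.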

Spot claims worth checking [1][2]
- Match new statements against claims that have already been fact-checked (a k-nearest-neighbor / semantic-similarity problem between statements)
  - Create a Google Custom Search Engine over a claim corpus [4], using Google's structured data with ClaimReview markup
  - Build a publicly available local database of fact-checked claims from fact-checking websites:
    - Hoax-Slayer: archives of fact-checked claims
    - PolitiFact: more than 6,000 fact-checks
    - Google's Schema.org: more than 7,000 fact-checks with ClaimReview markup
    - Channel 4: http://blogs.channel4.com/factcheck/
    - Washington Post
  - Calculate semantic similarity between sentences based on word2vec
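The k-nearest-neighbor matching step can be illustrated with averaged word vectors and cosine similarity. In this minimal Python sketch the three-dimensional `word_vectors` are made-up toy values standing in for trained word2vec embeddings. Averaging word vectors is one common baseline for sentence similarity, not necessarily the method behind these slides, and `nearest_checked_claims` is a hypothetical helper.

```python
import math


def sentence_vector(sentence, word_vectors):
    """Average the vectors of known words; None if no word is known."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_checked_claims(statement, checked_claims, word_vectors, k=1):
    """Return the k (similarity, claim) pairs most similar to `statement`."""
    query = sentence_vector(statement, word_vectors)
    if query is None:
        return []
    scored = []
    for claim in checked_claims:
        vec = sentence_vector(claim, word_vectors)
        if vec is not None:
            scored.append((cosine(query, vec), claim))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not real word2vec output).
word_vectors = {
    "world": [1.0, 0.0, 0.0],
    "earth": [0.9, 0.1, 0.0],
    "flat": [0.0, 1.0, 0.0],
    "homicides": [0.0, 0.0, 1.0],
    "murders": [0.1, 0.0, 0.9],
}
checked = ["The world is flat", "More than 3,000 homicides were committed"]
```

Here `nearest_checked_claims("The earth is flat", checked, word_vectors)` ranks "The world is flat" first even though the two sentences share only one content word, which is the point of using embeddings rather than exact string matching.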

- ClaimReview: a subtype of Review, "A fact-checking review of claims made in some creative work."
- claimReviewed: a property of ClaimReview, "A short summary of the specific claims reviewed in a ClaimReview."
- author: a property on Review that indicates the organization behind the review
- claimReviewSiteLogo: on the (Claim)Review, the fact-checking organization's logo
- itemReviewed: the document that carries the claims being reviewed (which, as in the following example, could include offline newspaper articles)
- reviewRating: how the claim was judged by the article
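A fact-checking site could generate this markup from its own records. The Python sketch below assembles the properties listed above into a ClaimReview object and wraps it in a JSON-LD script tag. The function `claim_review_jsonld` and its parameter names are hypothetical, and only a subset of the schema.org properties is emitted.

```python
import json


def claim_review_jsonld(url, claim, reviewer, claimant, rating, best, worst, label):
    """Build a <script type="application/ld+json"> block for one ClaimReview (sketch)."""
    data = {
        "@context": "http://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "claimReviewed": claim,
        "author": {"@type": "Organization", "name": reviewer},      # who did the fact check
        "itemReviewed": {                                         # where the claim appeared
            "@type": "CreativeWork",
            "author": {"@type": "Organization", "name": claimant},
        },
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating,
            "bestRating": best,
            "worstRating": worst,
            "alternateName": label,                               # human-readable verdict
        },
    }
    # Escape "</" so a claim containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

Calling it with the values from the "world is flat" example yields markup equivalent in structure to that example, minus the dates.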

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2014-07-23",
  "url": "http://www.politifact.com/texas/statements/2014/jul/23/rick-perry/rick-perry-claim-about-3000-homicides-illegal-immi/",
  "author": {
    "@type": "Organization",
    "url": "http://www.politifact.com/",
    "sameAs": "https://twitter.com/politifact"
  },
  "claimReviewed": "More than 3,000 homicides were committed by \"illegal aliens\" over the past six years.",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": 1,
    "bestRating": 6,
    "worstRating": 1,
    "alternateName": "False",
    "image": "http://static.politifact.com/mediapage/jpgs/politifact-logo-big.jpg"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "Rick Perry",
      "jobTitle": "Former Governor of Texas",
      "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/15/Gov._Perry_CPAC_February_2015.jpg/440px-Gov._Perry_CPAC_February_2015.jpg"
    },
    "datePublished": "2014-07-17",
    "name": "The St. Petersburg Times interview [...]"
  }
}
</script>

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2016-06-22",
  "url": "http://example.com/news/science/worldisflat.html",
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Organization",
      "name": "Square World Society",
      "sameAs": "https://example.flatworlders.com/we-know-that-the-world-is-flat"
    },
    "datePublished": "2016-06-20"
  },
  "claimReviewed": "The world is flat",
  "author": {
    "@type": "Organization",
    "name": "Example.com science watch"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}
</script>
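To build the local database of fact-checked claims, the same markup can be harvested from fact-checking pages. The following Python sketch uses only the standard library's `html.parser` and `json` modules to pull `(claimReviewed, alternateName)` pairs out of JSON-LD script blocks. The names `JsonLdCollector` and `extract_claim_reviews` are hypothetical, fetching the pages is left out, and blocks that are not valid JSON are simply skipped.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script content may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_claim_reviews(html):
    """Return (claimReviewed, alternateName) pairs found in ClaimReview markup."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    reviews = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "ClaimReview":
                rating = item.get("reviewRating", {})
                reviews.append((item.get("claimReviewed"), rating.get("alternateName")))
    return reviews
```

Storing the verdict label next to the claim text gives exactly the corpus the similarity matching step needs.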

Spot claims worth checking
- Identify new factual claims in new text that have not been fact-checked before
- Use machine-learning algorithms to detect "check-worthy" claims; related publications:
  - ClaimBuster: a platform that scores political sentences by how check-worthy they are
    - uses a human-labeled dataset of check-worthy factual claims from U.S. general election debate transcripts
    - learns from labeled check-worthy sentences, identifies the features they have in common, then looks for these features in new sentences
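The "learn from labeled sentences, then score new ones" idea can be shown with a small supervised text classifier. This Python sketch swaps in multinomial naive Bayes over bag-of-words features. It is a stand-in chosen for brevity, not ClaimBuster's actual model or feature set, and the class name `CheckWorthinessNB` and the toy training sentences are hypothetical.

```python
import math
from collections import Counter


class CheckWorthinessNB:
    """Multinomial naive Bayes on bag-of-words; a stand-in, not ClaimBuster's model."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(sentence.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_score(self, sentence, label):
        """log P(label) + sum of log P(word | label), with add-one (Laplace) smoothing."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in sentence.lower().split():
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, sentence):
        return max(self.classes, key=lambda c: self.log_score(sentence, c))


# Toy, hand-made training data (hypothetical, not the ClaimBuster dataset).
train = [
    ("the unemployment rate fell to 5 percent last year", "check-worthy"),
    ("we spent 3 billion dollars on schools", "check-worthy"),
    ("thank you all for coming tonight", "other"),
    ("i am honored to be here", "other"),
]
model = CheckWorthinessNB().fit([s for s, _ in train], [y for _, y in train])
```

On this toy data, "the unemployment rate is 7 percent" is classified as check-worthy because its known words were seen only in check-worthy training sentences; a real system needs far more labeled data and richer features.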

Claim Verdict Model
- Information retrieval with free, open-source search engines: given the spotted claims, search for documents that contain relevant fact checks or evidence (a ranking and classification problem)
- Apache Lucene: written in Java, cross-platform
  - Fuzzy search, e.g. roam~0.8 finds terms similar in spelling to "roam" with a minimum similarity of 0.8 (the older syntax; current Lucene versions take a maximum edit distance from 0 to 2 instead, e.g. roam~1)
  - Proximity query, e.g. "barack michelle"~10 finds the two words within 10 positions of each other
  - Range query, e.g. title:{Aida TO Carmen} (curly braces exclude the bounds)
  - Phrase query, e.g. "new york"
  - Used by Infomedia, Bloomberg, and Twitter's real-time search
- Apache Solr (better for text search) and Elasticsearch (better for complex time-series search and aggregations) are both built on top of Lucene
  - Basic query: text:obama returns all documents whose text field contains "obama"
  - Phrase query: text:"obama michelle"
  - Proximity query: text:"big analytics"~1 matches both "big analytics" and "big data analytics"
  - Boolean query: solr AND search OR facet NOT highlight
  - Range query: age:[18 TO 30] (square brackets include the bounds)
  - Used by Netflix, eBay, Instagram, and Amazon CloudSearch
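Term, phrase, proximity and fuzzy queries all rest on an inverted index that records term positions. The Python toy below illustrates that idea; it is not Lucene's API or scoring. `PositionalIndex` and `levenshtein` are hypothetical names, and the proximity match is simplified: terms must appear in the query's order with at most `slop` extra words in total, whereas Lucene's slop also permits reordering at a higher cost.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class PositionalIndex:
    """Toy inverted index storing term positions, in the spirit of Lucene (not its API)."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))  # term -> doc_id -> [positions]

    def add(self, doc_id, text):
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def term(self, word):
        """Basic term query, like text:obama."""
        return set(self.postings.get(word.lower(), {}))

    def fuzzy(self, word, max_edits=1):
        """Fuzzy query, like roam~1: docs containing any term within max_edits edits."""
        word = word.lower()
        docs = set()
        for term, by_doc in self.postings.items():
            if levenshtein(word, term) <= max_edits:
                docs |= set(by_doc)
        return docs

    def proximity(self, phrase, slop=0):
        """Ordered phrase query allowing `slop` extra words in total; slop=0 is an exact phrase."""
        terms = phrase.lower().split()
        if not terms:
            return set()
        candidates = set.intersection(*(self.term(t) for t in terms))
        hits = set()
        for doc in candidates:
            if any(self._match(doc, terms, 1, start, slop) for start in self.postings[terms[0]][doc]):
                hits.add(doc)
        return hits

    def _match(self, doc, terms, k, prev_pos, slop_left):
        """Can terms[k:] be placed after prev_pos using at most slop_left skipped words?"""
        if k == len(terms):
            return True
        for pos in self.postings[terms[k]][doc]:
            gap = pos - prev_pos - 1
            if 0 <= gap <= slop_left and self._match(doc, terms, k + 1, pos, slop_left - gap):
                return True
        return False
```

Indexing "big data analytics" and "big analytics", the query `proximity("big analytics", slop=1)` returns both documents, mirroring the "big analytics"~1 example, while `slop=0` returns only the exact phrase.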

Claim Verdict Model [1][2]
[Figure: Monitor Model → Claim Spotting Model → Claim Verdict Model → Create & publish; within the Claim Verdict Model, LSTMs classify claims as True, Mostly true, Half true, Half false, Mostly false or False]

Claim Verdict Model: extraction of evidence

Claim Verdict Model: claim validation
Labels: True, Mostly true, Half true, Half false, Mostly false, False
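The LSTM-based verdict step can be outlined as an encoder that reads a claim token by token, followed by a softmax over the six labels. The pure-Python sketch below shows only that forward pass. The weights are random and untrained, so its predictions are meaningless. The class `TinyLSTMClassifier`, the embedding and hidden sizes and the whitespace tokenizer are all assumptions; a real model would be trained, for example in a deep learning framework, on labeled fact checks.

```python
import math
import random

LABELS = ["True", "Mostly true", "Half true", "Half false", "Mostly false", "False"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Single-layer LSTM encoder + softmax over six labels; weights are random, NOT trained."""

    def __init__(self, vocab, emb_dim=8, hidden=8, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(vocab)}
        self.embeddings = random_matrix(len(vocab) + 1, emb_dim, rng)  # extra row for unknown words
        self.hidden = hidden
        # One weight matrix and bias per gate (input, forget, output, candidate),
        # applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: random_matrix(hidden, emb_dim + hidden, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden for g in "ifoc"}
        self.W_out = random_matrix(len(LABELS), hidden, rng)

    def encode(self, tokens):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for token in tokens:
            x = self.embeddings[self.index.get(token, len(self.index))]
            xh = x + h  # list concatenation [x_t, h_{t-1}]
            z = {g: [a + b for a, b in zip(matvec(self.W[g], xh), self.b[g])] for g in "ifoc"}
            i_gate = [sigmoid(v) for v in z["i"]]
            f_gate = [sigmoid(v) for v in z["f"]]
            o_gate = [sigmoid(v) for v in z["o"]]
            cand = [math.tanh(v) for v in z["c"]]
            c = [f * c_prev + i * g for f, c_prev, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(c_t) for o, c_t in zip(o_gate, c)]
        return h  # final hidden state summarizes the claim

    def predict(self, claim):
        logits = matvec(self.W_out, self.encode(claim.lower().split()))
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(LABELS)), key=lambda k: probs[k])
        return LABELS[best], probs
```

Usage: `TinyLSTMClassifier(vocab=["the", "world", "is", "flat"]).predict("The world is flat")` returns one of the six labels plus a probability for each. Because the weights are untrained, only the shape of the computation is meaningful here, not which label comes out.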

Related works
- 2015: Computational Fact Checking from Knowledge Networks, PLoS One
- 2017: Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster
- 2017: Fully Automated Fact Checking Using External Sources
- 2017: ClaimBuster: The First-ever End-to-end Fact-checking System
- 2017: Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking, EMNLP 2017

References
[1] Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. 2011. Computational journalism: A call to arms to database researchers. In Proceedings of the Conference on Innovative Data Systems Research, volume 2011, pages 148–151.
[2] Fact Checking: Task definition and dataset construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, MD, 2014.
[3] N. Hassan, B. Adair, J. T. Hamilton, C. Li, M. Tremayne, J. Yang, and C. Yu. The quest to automate fact-checking. In Computation Journalism Symposium, 2015.
[4] Creating a Custom Search Engine with configuration files. https://developers.google.com/custom-search/docs/basics