Factual Claim Validation Models

Presentation transcript:

Factual Claim Validation Models
Deep Learning Research & Application Center
October 2017
Claire Li

Available fact checking tools
  ClaimBuster
  Google search API and other free ones
Claim Validation Model with RNN

Available fact checking tools
Automated fact-checking projects vary in the kinds of sources, the kinds of claims, and the topics they deal with.

Narrow scope is the key to practical tools for fact-checkers
  ClaimBuster: currently handles political sentences
    Based on machine learning models
    Framed as a ranking and classification task
  Fake news detection: framed as a stance classification task

Claim Validation with ClaimBuster
  Scoring sentences: classification and scoring models; token features and part-of-speech (PoS) features
  Similarity calculation: token similarity and semantic similarity from SEMILAR
  Retrieve evidence: context from the Google search engine; answers from Wolfram Alpha and the Google Answer Box; verdicts from the above
  Monitors and retrieves sentences

Claim Validation with ClaimBuster
Given a factual claim which is scored:
  Search in a repository for similar claims that have already been fact-checked by professionals (claim matcher)
    A semantic similarity match (3-10) spots the matched fact-checked claims
    Returns the truth rating, if any
    Otherwise, go to 1)
  1) ClaimBuster is not able to produce a verdict
    Processes search engine results for evidence, based on their similarity to the input claim
  Use question-answering systems
    Translate the natural language claim into questions
    Query external knowledge bases (Google Answer Box and Wolfram Alpha) with the derived questions
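The flow on this slide can be sketched as a small dispatcher. This is only an illustration of the control flow, not ClaimBuster's code: `validate_claim` and the three callables passed to it are hypothetical stand-ins for the claim matcher, the search-engine evidence step, and the question-answering step, and the threshold of 3 comes from the "3-10" range quoted above.

```python
from typing import Callable, List


def validate_claim(
    claim: str,
    match_fact_checks: Callable[[str], List[dict]],   # hypothetical claim matcher
    search_evidence: Callable[[str], List[dict]],     # hypothetical search-engine step
    ask_knowledge_bases: Callable[[str], List[dict]], # hypothetical QA step
    min_rating: float = 3.0,
) -> dict:
    """Try the claim matcher first; fall back to evidence gathering."""
    # Keep only matches whose similarity rating is in the "good match" range.
    matches = [
        m for m in match_fact_checks(claim)
        if m["similarity_rating"] >= min_rating
    ]
    if matches:
        # A professional fact-check exists: return its truth rating.
        best = max(matches, key=lambda m: m["similarity_rating"])
        return {"verdict": best["truth_rating"], "matches": matches}

    # No verdict can be produced: return supporting material instead.
    return {
        "verdict": None,
        "contexts": search_evidence(claim),
        "answers": ask_knowledge_bases(claim),
    }
```

Passing the three steps in as callables keeps the sketch independent of any particular search or QA service.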

Search in a repository for similar claims that have already been fact-checked by professionals, e.g.
  claim (string): the matched fact-checked claim
  host (string): the source of the fact-check
  search (string): the search measure which yielded the fact-match
  similarity_rating (number): 3-10 for a good match
  speaker (string): speaker of the fact-checked claim
  truth_rating (string): true, false, pants on fire, indeterminate
  url: the URL location of the matched fact-check
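One way to hold such a record in code is a small typed container. The field names and the allowed truth ratings follow the slide; the class name `FactCheckMatch`, the validation, and the `is_good_match` helper are assumptions made for this sketch, not part of any published API.

```python
from dataclasses import dataclass

# Truth ratings as listed on the slide.
TRUTH_RATINGS = {"true", "false", "pants on fire", "indeterminate"}


@dataclass(frozen=True)
class FactCheckMatch:
    """One claim-matcher result (hypothetical container, fields from the slide)."""
    claim: str
    host: str
    search: str
    similarity_rating: float
    speaker: str
    truth_rating: str
    url: str

    def __post_init__(self) -> None:
        # Reject ratings outside the vocabulary given on the slide.
        if self.truth_rating not in TRUTH_RATINGS:
            raise ValueError(f"unknown truth rating: {self.truth_rating!r}")

    def is_good_match(self) -> bool:
        # The slide describes 3-10 as the range for a good match.
        return 3 <= self.similarity_rating <= 10
```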

Processes search engine results for evidence based on the similarity to the input claim, e.g.
  sentence (string): an anchor sentence is one which has a high similarity score to the input claim
  context (array[string]): a context is composed of some sentences to the left of the anchor + the anchor sentence + some sentences to the right of the anchor
  similarity_rating (number): 0-1, a measure between the input claim and the anchor
  url (string): URL of the context
  host (string): the hostname of the URL
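The anchor-plus-context idea can be illustrated in a few lines. The slides do not say which similarity measure is used for this step, so the sketch below substitutes plain Jaccard overlap of lowercase tokens as a stand-in; `build_context` and the `window` parameter are hypothetical names.

```python
import re
from typing import List


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a sentence."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1] (stand-in for the real measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_context(claim: str, sentences: List[str], window: int = 1) -> dict:
    """Pick the most similar sentence as anchor and return it with neighbours."""
    if not sentences:
        raise ValueError("no sentences to search")
    scores = [jaccard(claim, s) for s in sentences]
    anchor = max(range(len(sentences)), key=scores.__getitem__)
    left = max(0, anchor - window)
    return {
        "sentence": sentences[anchor],
        # Sentences to the left + the anchor + sentences to the right.
        "context": sentences[left:anchor + window + 1],
        "similarity_rating": scores[anchor],
    }
```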

Use question-answering systems, e.g.
  answer_box_html (string, optional): complete raw HTML from which the justification was extracted (Google Answer Boxes)
  justification (string): either the text scraped from the Google Answer Box or the Wolfram Alpha response
  question (string): the question which was derived from your input claim and subsequently input into the question-answering system specified in the source parameter
  source (string): either Google Answer Boxes or the Wolfram Alpha API
  truth_rating (string, optional): present if a truth value of true, false, pants on fire, or indeterminate is inferable

Use a world knowledge base of fact-checked statements
  Google Answer Box: what is the time in Hong Kong
  Wolfram Alpha: How many undocumented people in United States?

Google Custom Search API & Wolfram|Alpha API pricing
By default, the Google Custom Search API has a quota of 100 queries per day. If you exceed this quota, you can upgrade to 1000 queries per day for one month for $5.
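To see what these quotas mean in practice, the following back-of-the-envelope helper estimates how many days a batch of queries would take. The two quota values are the ones quoted on this slide and may not match current pricing; `days_needed` is a hypothetical helper name.

```python
import math

DEFAULT_DAILY_QUOTA = 100    # free quota quoted on the slide
UPGRADED_DAILY_QUOTA = 1000  # upgraded quota quoted on the slide


def days_needed(total_queries: int, daily_quota: int = DEFAULT_DAILY_QUOTA) -> int:
    """Days required to issue total_queries without exceeding the daily quota."""
    if total_queries < 0 or daily_quota <= 0:
        raise ValueError("queries must be >= 0 and quota must be > 0")
    return math.ceil(total_queries / daily_quota)
```

For example, checking 2,500 claims at one query each would take `days_needed(2500)` = 25 days on the default quota and `days_needed(2500, UPGRADED_DAILY_QUOTA)` = 3 days on the upgraded one.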

Free Open Source Search Engines
Information retrieval from free open source search engines
  Given the spotted claims, search for documents that contain relevant fact checks or evidence
  A ranking and classification problem
Apache Lucene: written in Java, cross-platform
  Fuzzy search: e.g. roam~0.8 finds terms similar in spelling to "roam", with a minimum similarity of 0.8
  Proximity query: e.g. "Barack michelle"~10
  Range query: e.g. title:{Aida TO Carmen}
  Phrase query: e.g. "new york"
  Used by Infomedia, Bloomberg, and Twitter's real-time search
Apache Solr (better for text search) and Elasticsearch (better for complex time series search and aggregations)
  Solr and Elasticsearch are built on top of Lucene
  Basic query: text:obama matches all docs whose text field contains "obama"
  Phrase query: text:"Obama michelle"
  Proximity query: text:"big analytics"~1 matches "big analytics" and "big data analytics"
  Boolean query: solr AND search OR facet NOT highlight
  Range query: age:[18 TO 30]
  Used by Netflix, eBay, Instagram, and Amazon CloudSearch
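The query forms listed above are plain strings, so they can be assembled with small helpers and sent to Solr's select handler in the `q` parameter. In this sketch the host, port, and the core name `factchecks` are hypothetical, and the helpers do not escape special characters in the input text.

```python
from urllib.parse import urlencode

# Hypothetical local Solr core; adjust host and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/factchecks/select"


def phrase(field: str, text: str) -> str:
    """Phrase query, e.g. text:"new york"."""
    return f'{field}:"{text}"'


def proximity(field: str, text: str, slop: int) -> str:
    """Proximity query: the words may be up to `slop` positions apart."""
    return f'{field}:"{text}"~{slop}'


def inclusive_range(field: str, low, high) -> str:
    """Range query with inclusive bounds, e.g. age:[18 TO 30]."""
    return f"{field}:[{low} TO {high}]"


def select_url(query: str, rows: int = 10) -> str:
    """URL for a select request; urlencode handles quoting of the query."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "rows": rows})
```

For instance, `select_url(proximity("text", "big analytics", 1))` yields a URL whose `q` parameter is the proximity query from the slide.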

Claim Validation Model with RNN [1][2]
[Diagram. Components: Monitor Model, Claim Spotting Model, Claim Verdict Model, Create & publish. Two components are labeled "LSTMs". Verdict labels: True, Mostly true, Half true, Half false, Mostly false, False.]

Claim Validation Model: extraction of evidence

Claim Verdict Model: Claim Validation
  Verdict labels: true, mostly true, half true, half false, mostly false, false
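If the verdict model ends in a six-way classifier (the slides do not spell out the output layer, so this is an assumption), its raw scores can be mapped to one of the six labels with a softmax followed by an argmax. The function names below are hypothetical and the sketch uses only the standard library.

```python
import math
from typing import List, Tuple

# Label order follows the slide, from most to least truthful.
VERDICTS = ["true", "mostly true", "half true",
            "half false", "mostly false", "false"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_verdict(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable verdict label and its probability."""
    if len(logits) != len(VERDICTS):
        raise ValueError("expected one score per verdict label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return VERDICTS[best], probs[best]
```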

Works with RNN
  CNN- and LSTM-based Claim Classification in Online User Comments, COLING 2016
  Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
  Fake News Detection using Stacked Ensemble of Classifiers, nlpj2017
  Identification and Verification of Simple Claims about Statistical Properties, EMNLP 2015