1
Factual Claim Validation Models
Deep Learning Research & Application Center
October 2017
Claire Li
2
Available fact checking tools
ClaimBuster
Google search API and other free ones
Claim Validation Model with RNN
3
Available fact checking tools
Automated fact checking projects vary in what kinds of sources they deal with, what kinds of claims they deal with, and what topics they deal with
4
Narrow scope is the key for practical tools for fact-checkers
ClaimBuster
  Currently handles political sentences
  Based on machine learning models
  Framed as a ranking and classification task
Fake news detection as a stance classification task
5
Claim Validation with ClaimBuster
Scoring sentences: classification and scoring models; token features and part-of-speech (PoS) tag features
Similarity calculation: token similarity and semantic similarity from SEMILAR
Retrieve evidence: context from the Google search engine; answers from Wolfram Alpha and the Google answer box; verdicts from the above
Monitors and retrieves sentences
6
Claim Validation with ClaimBuster
Given a factual claim that has been scored:
Search a repository for similar claims that have already been fact-checked by professionals (claim matcher)
  Semantic similarity match (rating 3-10) spots the matched fact-checked claims
  Returns the truth rating, if any
  Otherwise go to 1)
When ClaimBuster is not able to produce a verdict:
  1) Process search engine results for evidence, based on their similarity to the input claim
  2) Use question-answering systems
     Translate the natural-language claim into questions
     Query external knowledge bases (Google Answer Boxes and Wolfram Alpha) with the derived questions
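The control flow on this slide can be sketched roughly as follows. This is a minimal illustration, not ClaimBuster's code: the function `validate_claim`, its three callable parameters, and the constant `GOOD_MATCH_MIN` are hypothetical names, and the real services are replaced by stubs in the usage lines.

```python
from typing import Callable, List

# Assumption: the slides describe a similarity rating of 3-10 as a good match.
GOOD_MATCH_MIN = 3.0


def validate_claim(
    claim: str,
    match_fact_checks: Callable[[str], List[dict]],
    search_evidence: Callable[[str], List[dict]],
    ask_knowledge_bases: Callable[[str], List[dict]],
) -> dict:
    """Try the claim matcher first; otherwise collect evidence without a verdict."""
    # Keep only repository matches whose rating falls in the "good match" range.
    matches = [
        m for m in match_fact_checks(claim)
        if m["similarity_rating"] >= GOOD_MATCH_MIN
    ]
    if matches:
        best = max(matches, key=lambda m: m["similarity_rating"])
        return {
            "verdict": best["truth_rating"],
            "source": "claim_matcher",
            "evidence": [best],
        }
    # No professional fact-check matched: return supporting material only.
    evidence = search_evidence(claim) + ask_knowledge_bases(claim)
    return {"verdict": None, "source": "evidence_only", "evidence": evidence}


# Usage with stubbed services (all values are made up for illustration).
result = validate_claim(
    "An example claim",
    match_fact_checks=lambda c: [{"similarity_rating": 7.5, "truth_rating": "false"}],
    search_evidence=lambda c: [],
    ask_knowledge_bases=lambda c: [],
)
print(result["verdict"], result["source"])
```

Passing the three services in as callables keeps the sketch independent of any particular API client.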
7
Search in a repository for similar claims that have already been fact-checked by professionals, e.g.
claim (string): the matched fact-checked claim
host (string): the source of the fact-check
search (string): the search measure which yielded the fact-match
similarity_rating (number): 3-10 for a good match
speaker (string): speaker of the fact-checked claim
truth_rating (string): true, false, pants on fire, indeterminate
url: the URL location of the matched fact-check
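One way to hold such a record in code is a small typed container. The sketch below assumes the record arrives as a plain dictionary with exactly the fields listed on this slide; the class name `FactMatch`, the helper `parse_fact_match`, and all sample values are hypothetical.

```python
from dataclasses import dataclass

# The four truth ratings named on the slide.
TRUTH_RATINGS = ("true", "false", "pants on fire", "indeterminate")


@dataclass
class FactMatch:
    claim: str
    host: str
    search: str
    similarity_rating: float
    speaker: str
    truth_rating: str
    url: str

    def is_good_match(self) -> bool:
        # The slide gives 3-10 as the range for a good match.
        return 3 <= self.similarity_rating <= 10


def parse_fact_match(record: dict) -> FactMatch:
    """Build a FactMatch and reject truth ratings the slide does not list."""
    match = FactMatch(**record)
    if match.truth_rating not in TRUTH_RATINGS:
        raise ValueError(f"unexpected truth_rating: {match.truth_rating!r}")
    return match


# Made-up sample record, for illustration only.
SAMPLE_RECORD = {
    "claim": "An example fact-checked claim",
    "host": "factcheck.example.org",
    "search": "semantic",
    "similarity_rating": 6.2,
    "speaker": "Example Speaker",
    "truth_rating": "indeterminate",
    "url": "https://factcheck.example.org/item/1",
}
match = parse_fact_match(SAMPLE_RECORD)
print(match.is_good_match(), match.truth_rating)
```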
8
Processes search engine results for evidence based on the similarity to the input claim, e.g.
sentence (string): an anchor sentence, i.e. one with a high similarity score to the input claim
context (array[string]): some sentences to the left of the anchor + the anchor sentence + some sentences to the right of the anchor
similarity_rating (number): 0-1, similarity measure between the input claim and the anchor
url (string): URL of the context
host (string): the hostname of the URL
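The anchor-plus-context idea can be illustrated with a short sketch. The slides do not say which similarity measure ClaimBuster uses here, so the code below swaps in Jaccard overlap of lowercase tokens purely as a stand-in that also yields a 0-1 score; the function names and the `window` parameter are hypothetical.

```python
import re
from typing import List


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a sentence."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]: shared tokens over all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_context(claim: str, sentences: List[str], window: int = 1) -> dict:
    """Pick the sentence most similar to the claim and keep its neighbours."""
    if not sentences:
        raise ValueError("no sentences to search")
    scores = [jaccard(claim, s) for s in sentences]
    anchor = max(range(len(sentences)), key=scores.__getitem__)
    lo = max(0, anchor - window)
    hi = min(len(sentences), anchor + window + 1)
    return {
        "sentence": sentences[anchor],
        "context": sentences[lo:hi],
        "similarity_rating": scores[anchor],
    }


# Made-up page text, for illustration only.
page = [
    "The city is large.",
    "Unemployment fell to four percent last year.",
    "Officials welcomed the news.",
]
evidence = build_context("Unemployment fell to four percent", page)
print(evidence["sentence"])
```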
9
Use question-answering systems, e.g.
answer_box_html (string, optional): complete raw HTML from which the justification was extracted (Google Answer Boxes)
justification (string): either the text scraped from the Google Answer Box or the Wolfram Alpha response
question (string): the question which was derived from your input claim and subsequently input into the question-answering system specified in the source parameter
source (string): either Google Answer Boxes or Wolfram Alpha API
truth_rating (string, optional): present if a truth value of true, false, pants on fire, or indeterminate is inferable
10
Use a world knowledge base of fact-checked statements
Google Answer Boxes: What is the time in Hong Kong?
Wolfram Alpha: How many undocumented people in United States?
11
Google Custom Search API & Wolfram|Alpha API pricing
By default, the Google Custom Search API has a quota of 100 queries per day. If you exceed this quota, you can upgrade to queries per day for one month for $5
12
Free Open Source Search Engines
Information retrieval from free open source search engines
  Given the claims spotted, search for documents that contain relevant fact checks or evidence
  A ranking and classification problem
Apache Lucene, in Java, cross-platform
  Fuzzy search: e.g. roam~0.8 finds terms similar in spelling to roam, with a required similarity of 0.8
  Proximity query: e.g. "Barack michellea"~10
  Range query: e.g. title:{Aida TO Carmen}
  Phrase query: e.g. "new york"
  Used by Infomedia, Bloomberg, and Twitter's real-time search
Apache Solr (better for text search) and Elasticsearch (better for complex time-series search and aggregations)
  Solr and Elasticsearch are built on top of Lucene
  Basic query: text:obama, all docs with a text field containing obama
  Phrase query: text:"Obama michellea"
  Proximity query: text:"big analytics"~1 matches big analytics, big data analytics
  Boolean query: solr AND search OR facet NOT highlight
  Range query: age:[18 TO 30]
  Used by Netflix, eBay, Instagram, and Amazon CloudSearch
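To show how such query strings are typically sent to Solr, here is a minimal sketch that builds request URLs for Solr's `select` handler using only the Python standard library. No request is made. The host `localhost:8983`, the core name `factchecks`, and the helper `solr_select_url` are assumptions for illustration; the query strings are the ones from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's select handler with a URL-encoded q parameter."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


# Query strings taken from the slide.
QUERIES = [
    "text:obama",                              # basic field query
    'text:"big analytics"~1',                  # proximity query
    "solr AND search OR facet NOT highlight",  # boolean query
    "age:[18 TO 30]",                          # range query
]

# Hypothetical host and core name.
urls = [solr_select_url("http://localhost:8983", "factchecks", q) for q in QUERIES]

for q, url in zip(QUERIES, urls):
    # Decoding the URL again should give back the original query string.
    decoded = parse_qs(urlsplit(url).query)["q"][0]
    print(decoded == q, url)
```

Encoding through `urlencode` matters here because quotes, brackets, and spaces in Lucene syntax are not safe in a raw URL.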
13
Claim Validation Model with RNN [1][2]
[Diagram: Monitor Model, Claim Spotting Model, Claim Verdict Model, Create & publish; LSTMs; verdict labels: True, Mostly true, Half true, Half false, Mostly false, False]
15
Claim Validation Model: extraction of evidence
16
Claim Verdict Model: Claim Validation
True, mostly true, half true, half false, mostly false, false
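The slides do not show how the model's output is turned into one of these six labels. A common final step for a six-way classifier is a softmax over six output scores followed by picking the most probable label; the sketch below assumes that design and covers only this last step, not the LSTM itself. The names `VERDICT_LABELS`, `softmax`, and `pick_verdict`, and the example scores, are hypothetical.

```python
import math
from typing import List, Tuple

# The six verdict labels, in the order given on the slide.
VERDICT_LABELS = [
    "true", "mostly true", "half true",
    "half false", "mostly false", "false",
]


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities (max-shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_verdict(scores: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(scores) != len(VERDICT_LABELS):
        raise ValueError("expected one score per verdict label")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return VERDICT_LABELS[best], probs[best]


# Made-up scores standing in for a model's six output values.
label, confidence = pick_verdict([0.1, 0.2, 0.3, 2.0, 0.0, -1.0])
print(label, round(confidence, 3))
```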
17
Works with RNN
CNN- and LSTM-based Claim Classification in Online User Comments, COLING 2016
Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
Fake News Detection using Stacked Ensemble of Classifiers, nlpj2017
Identification and Verification of Simple Claims about Statistical Properties, EMNLP 2015