Published by Anna Lester. Modified over 6 years ago.
1
Factual Claim Validation Models: Extraction of Evidence
Deep Learning Research & Application Center 23 October 2017 Claire Li
2
Construct a claim-evidence training corpus
Hybrid Claim Validation (Fact Validation [1]) approach with deep learning
Construct a claim-evidence training corpus
  Crawl claims with truth labels from large archives of fact-checking blogs
    PolitiFact
    Channel 4
  Extract the related evidence for a given claim
    Obtain the evidence related to the given claim from ClaimBuster through its API
3
Claim Validation Approaches
The problem statement
  Input: a claim and a set of articles of facts
  Output: the truth label of the claim (true, mostly true, mostly false/barely true, false)
Approaches
  Semantic similarity based, for repeated and paraphrased claims
    Calculate the semantic similarity between the given claim and the already fact-checked ones, and return the label given by the K nearest neighbors
  Deep learning model, for novel claim validation
    For a claim with associated evidence, learn the support/deny features from the related evidence, and use the learned features to verify new claim-evidence pairs
    Extract the related evidence for a claim based on semantic similarity models
    Construct the claim-evidence training corpus by extending the LIAR dataset's evidence, which includes metadata such as URL, speaker, etc.
  Search a knowledge base or the Google search engine, for world-knowledge claims (e.g. population, GDP rate)
    Wolfram Alpha search API
    Wikipedia (calculation needed)
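The semantic-similarity approach for repeated and paraphrased claims can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the deck's implementation: it assumes a plain bag-of-words cosine similarity in place of a learned semantic model, assumes a majority vote over the K neighbors, and uses a hand-written, hypothetical archive of fact-checked claims. The names `bag_of_words`, `cosine`, `knn_label`, and `archive` are introduced here for illustration only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_label(claim, checked, k=3):
    """Majority label among the k fact-checked claims most similar to `claim`.

    `checked` is a list of (claim_text, label) pairs.
    """
    query = bag_of_words(claim)
    ranked = sorted(
        checked,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-written archive entries (not taken from any dataset).
archive = [
    ("The city population doubled in ten years", "false"),
    ("The population of the city doubled over the last decade", "false"),
    ("Unemployment fell to a record low last year", "true"),
    ("The unemployment rate dropped to its lowest level last year", "true"),
    ("Taxes were raised on every household", "mostly false"),
]

print(knn_label("The city population has doubled in a decade", archive, k=3))
```

Swapping `cosine` over word counts for a similarity computed from sentence embeddings would leave `knn_label` unchanged, which is the main reason for keeping the two functions separate.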
4
Extract the relevant evidence for a claim based on semantic similarity models
Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false/barely true: 1657, false: 1998) from the LIAR dataset
Relevant evidence retrieval: for each claim, use the Google search engine to get the top-20 HTML pages and extract the textual content of each page using BoilerPipe
For each document's textual content, measure the semantic similarity between the sentences in the relevant documents and the given claim
  Word2vec based
    Word embeddings for a semantic space of either similarity or relatedness [2]
    Achieved by learning from both a general corpus and a specialized thesaurus
  Semilar: an open source platform for similarity at the document/paragraph/sentence level
  Create triplets (subject-verb-object) from the sentences in the document using the Stanford parser, then create a triplet from the claim and measure the similarity between these triplets
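The triplet comparison in the last step can be sketched as follows. This is a minimal illustration under stated assumptions: the subject-verb-object triplets are taken as already extracted (the parsing step is not shown), similarity is a slot-wise token Jaccard overlap averaged over the three slots, and the candidate sentences are hand-written for illustration. The names `triplet_similarity` and `rank_evidence` are hypothetical.

```python
import re


def tokens(text):
    """Set of lower-cased alphanumeric tokens in a phrase."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap between two phrases, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def triplet_similarity(t1, t2):
    """Mean slot-wise Jaccard over (subject, verb, object)."""
    return sum(jaccard(x, y) for x, y in zip(t1, t2)) / 3


def rank_evidence(claim_triplet, sentence_triplets):
    """Return (score, sentence) pairs sorted by similarity, best first.

    `sentence_triplets` is a list of (sentence, triplet) pairs; the
    triplets are assumed to come from an external parser.
    """
    scored = [
        (triplet_similarity(claim_triplet, triplet), sentence)
        for sentence, triplet in sentence_triplets
    ]
    return sorted(scored, reverse=True)


# Claim: "Jamaica Inn was directed by Alfred Hitchcock"
claim = ("Alfred Hitchcock", "directed", "Jamaica Inn")

# Hypothetical candidate sentences with hand-written triplets.
candidates = [
    ("The inn serves breakfast.", ("inn", "serves", "breakfast")),
    ("Alfred Hitchcock visited Jamaica.", ("Alfred Hitchcock", "visited", "Jamaica")),
    ("Hitchcock directed Jamaica Inn.", ("Hitchcock", "directed", "Jamaica Inn")),
]

for score, sentence in rank_evidence(claim, candidates):
    print(f"{score:.2f}  {sentence}")
```

Token overlap treats "directed" and "filmed" as unrelated; replacing `jaccard` with an embedding-based similarity would address that without changing the ranking code.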
5
Construct the claim-evidence training corpus through the ClaimBuster API
LIAR dataset
  Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false: 1657, false: 1998)
  Retrieve the one-to-many claim-evidence pairs through the ClaimBuster API
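One possible shape for the one-to-many claim-evidence corpus is sketched below. It is an assumption-laden illustration: the record layout, the collapse of the four labels into true/false, and the names `ClaimRecord`, `build_corpus`, and `to_pairs` are all hypothetical. The evidence lookup is passed in as a caller-supplied function, and a stub stands in for it here, because the actual ClaimBuster request format is not given in the deck.

```python
from dataclasses import dataclass, field

# Assumption: the four labels on the slide collapse into two classes.
BINARY_LABEL = {
    "true": True,
    "mostly true": True,
    "mostly false": False,
    "false": False,
}


@dataclass
class ClaimRecord:
    """One claim with its metadata and any number of evidence snippets."""
    text: str
    speaker: str
    label: str
    evidence: list = field(default_factory=list)

    @property
    def is_true(self):
        return BINARY_LABEL[self.label]


def build_corpus(claims, fetch_evidence):
    """Attach evidence to each (text, speaker, label) claim.

    `fetch_evidence` is a caller-supplied function mapping a claim text
    to a list of evidence strings (for example, a wrapper around an API).
    """
    corpus = []
    for text, speaker, label in claims:
        if label not in BINARY_LABEL:
            continue  # skip labels outside the four listed above
        corpus.append(ClaimRecord(text, speaker, label, list(fetch_evidence(text))))
    return corpus


def to_pairs(corpus):
    """Flatten the corpus into (claim, evidence, is_true) training pairs."""
    return [(r.text, e, r.is_true) for r in corpus for e in r.evidence]


def stub_fetch(text):
    """Placeholder evidence source; returns two dummy snippets per claim."""
    return [f"evidence {i} for: {text}" for i in range(2)]


claims = [
    ("I belong to the AFL-CIO.", "rick-perry", "true"),
    ("The top 1 percent pay over half of the entire revenue for this country.",
     "trent-franks", "false"),
]

pairs = to_pairs(build_corpus(claims, stub_fetch))
print(len(pairs))
```

Keeping the evidence list on the claim record preserves the one-to-many relation, while `to_pairs` produces the flat pairs a classifier would train on.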
6
Examples
Claim1-true: I belong to the AFL-CIO. (rick-perry, Governor)
Claim2-true: Warren (Buffett) still does support me. (barack-obama, President)
Claim3-false: We now have driven (health care) costs down to the lowest they've been in 50 years. (hillary-clinton, Presidential candidate)
Claim4-false: The top 1 percent pay over half of the entire revenue for this country. (trent-franks, U.S. Representative)
Claim5-true: Our tax code has nearly doubled since 1985. (Roy Blunt)
7
CNN concatenating LSTM
8
LSTM
Output labels: true, mostly true, half true, half false, mostly false, false
9
Works with RNN
DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101. GitHub.
  Validates statements by finding confirming sources for them on the web. It takes a statement (such as "Jamaica Inn was directed by Alfred Hitchcock") as input and then tries to find evidence for the truth of that statement by searching for information on the web.
Specializing Word Embeddings for Similarity or Relatedness, EMNLP 2015
Sentence similarity based on semantic kernels for intelligent text retrieval, Journal of Intelligent Information Systems, 2017
CNN- and LSTM-based Claim Classification in Online User Comments, COLING 2016
Fake News Detection using Stacked Ensemble of Classifiers, nlpj2017
From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles, nlpj2017
Detecting the stance of a claim with regard to a piece of evidence
Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
10
Works on Fact Validation
DeFacto - A Multilingual Fact Validation Interface, Journal of Web Semantics, 2015
DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101, 2015
Triple Scoring Using a Hybrid Fact Validation Approach: The Catsear Triple Scorer at WSDM Cup 2017