Factual Claim Validation Models Extraction of Evidence

Presentation transcript:

Factual Claim Validation Models: Extraction of Evidence
Deep Learning Research & Application Center
23 October 2017
Claire Li

Construct a claim-evidence training corpus

Hybrid claim validation (fact validation [1]) approach with deep learning
- Construct a claim-evidence training corpus
  - Crawl claims with truth labels from large archives of fact-checking blogs
    - PolitiFact
    - Channel 4
- Extract the evidence related to a given claim
  - Obtain the evidence related to the given claim from ClaimBuster through its API

Claim Validation Approaches

The problem statement
- Input: a claim and a set of articles of facts
- Output: the truth label of the claim (true, mostly true, mostly false/barely true, false)

Approaches
- Semantic similarity based, for repeated and paraphrased claims
  - Calculate the semantic similarity between the given claim and the already fact-checked ones, and return the label found among the k nearest neighbors
- Deep learning model, for novel claim validation
  - For a claim with associated evidence, learn the support/deny features from the related evidence, and use the learned features to verify a new claim-evidence pair
  - Extract the evidence related to a claim using semantic similarity models
  - Construct the claim-evidence training corpus by extending the LIAR dataset's evidence, which includes metadata such as URL, speaker, etc.
- Search a knowledge base or the Google search engine, for world-knowledge claims (e.g. population, GDP rate)
  - Wolfram Alpha search API
  - Wikipedia (calculation needed)
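The semantic-similarity approach for repeated and paraphrased claims can be sketched as follows. This is a minimal illustration, not the method used in the slides: it assumes a plain bag-of-words cosine similarity in place of a real semantic similarity model, and the function names (`bag_of_words`, `cosine`, `knn_label`) and the example claims in `CHECKED` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_label(claim, checked_claims, k=3):
    """Return the majority label among the k most similar fact-checked claims."""
    query = bag_of_words(claim)
    ranked = sorted(
        checked_claims,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-written examples; not taken from any dataset.
CHECKED = [
    ("The city population doubled in ten years", "false"),
    ("The population of the city doubled within a decade", "false"),
    ("Unemployment fell to a record low last year", "true"),
    ("The unemployment rate dropped to its lowest level", "mostly true"),
]

label = knn_label("City population has doubled in ten years", CHECKED, k=2)
print(label)
```

Swapping `cosine` over word counts for a similarity computed on sentence embeddings would leave `knn_label` unchanged, which is why the similarity function is kept separate here.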

Extract the relevant evidence for a claim based on semantic similarity models

- Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false/barely true: 1657, false: 1998) from the LIAR dataset
- Relevant evidence retrieval: for each claim, use the Google search engine to get the top-20 HTML pages and extract the textual content of each document using BoilerPipe
- With the textual content, measure the semantic similarity between the sentences in the relevant documents and the given claim
  - Word2vec-based word embeddings for a semantic space of either similarity or relatedness [2]
    - Achieved by learning from both a general corpus and a specialized thesaurus
  - Semilar: an open source platform for similarity at the document/paragraph/sentence level
  - Create triplets (subject-verb-object) of the sentences in the document using the Stanford parser, then create a triplet of the claim and find the similarity between these triplets, https://github.com/SilentFlame/Fact-Checker
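The sentence-level similarity step could look roughly like the sketch below, which averages word vectors per sentence and ranks candidate sentences by cosine similarity to the claim. Everything here is an assumption made for illustration: `TOY_VECTORS` is a hand-made 3-dimensional table standing in for trained word2vec embeddings, and `sentence_vector` and `rank_evidence` are hypothetical helper names.

```python
import math
import re

# Hypothetical toy vectors; real word2vec vectors are learned from a corpus
# and typically have hundreds of dimensions.
TOY_VECTORS = {
    "tax": (0.9, 0.1, 0.0),
    "code": (0.7, 0.2, 0.1),
    "doubled": (0.1, 0.9, 0.0),
    "grew": (0.2, 0.8, 0.1),
    "weather": (0.0, 0.1, 0.9),
    "sunny": (0.1, 0.0, 0.8),
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of known words; return None if no word is known."""
    words = re.findall(r"[a-z]+", sentence.lower())
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_evidence(claim, sentences, top_n=2):
    """Return the top_n (score, sentence) pairs most similar to the claim."""
    claim_vec = sentence_vector(claim)
    if claim_vec is None:
        return []
    scored = []
    for sentence in sentences:
        vec = sentence_vector(sentence)
        if vec is not None:  # skip sentences with no known words
            scored.append((cosine(claim_vec, vec), sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


ranked = rank_evidence(
    "Our tax code has nearly doubled since 1985.",
    ["The tax code grew.", "The weather was sunny.", "Nothing relevant here."],
)
for score, sentence in ranked:
    print(f"{score:.3f}  {sentence}")
```

Averaging is the simplest way to turn word vectors into a sentence vector; it ignores word order, so the triplet-based comparison mentioned above captures structure that this sketch does not.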

Construct the claim-evidence training corpus through the ClaimBuster API

- LIAR dataset
  - Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false: 1657, false: 1998)
- Use the ClaimBuster API to retrieve the one-to-many claim-evidence pairs
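One possible in-memory shape for the one-to-many claim-evidence pairs is sketched below. It makes no assumption about how the ClaimBuster API is actually called: the retrieval step is passed in as a function, and `fake_retrieve` is a stand-in that returns placeholder snippets. `ClaimRecord`, `attach_evidence`, and `to_pairs` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClaimRecord:
    """One labeled claim plus the evidence snippets retrieved for it."""
    text: str
    label: str
    speaker: str = ""
    evidence: List[str] = field(default_factory=list)


def attach_evidence(records, retrieve):
    """Fill each record's evidence list, dropping duplicate snippets."""
    for record in records:
        seen = set(record.evidence)
        for snippet in retrieve(record.text):
            if snippet not in seen:
                seen.add(snippet)
                record.evidence.append(snippet)
    return records


def to_pairs(records):
    """Flatten the one-to-many records into (claim, evidence, label) rows."""
    return [(r.text, e, r.label) for r in records for e in r.evidence]


def fake_retrieve(claim_text):
    """Stand-in for a real retrieval call; the snippets are placeholders."""
    canned = {
        "I belong to the AFL-CIO.": ["snippet A", "snippet B", "snippet A"],
    }
    return canned.get(claim_text, [])


records = attach_evidence(
    [
        ClaimRecord("I belong to the AFL-CIO.", "true", "rick-perry"),
        ClaimRecord("Our tax code has nearly doubled since 1985.", "true"),
    ],
    fake_retrieve,
)
pairs = to_pairs(records)
for row in pairs:
    print(row)
```

Keeping the claim's truth label on every flattened row means each (claim, evidence) pair can be used directly as a training example for a support/deny classifier.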

Examples

- Claim 1 (true): "I belong to the AFL-CIO." rick-perry, Governor
- Claim 2 (true): "Warren (Buffett) still does support me." barack-obama, President
- Claim 3 (false): "We now have driven (health care) costs down to the lowest they've been in 50 years." hillary-clinton, Presidential candidate
- Claim 4 (false): "The top 1 percent pay over half of the entire revenue for this country." trent-franks, U.S. Representative
- Claim 5 (true): "Our tax code has nearly doubled since 1985." Roy Blunt

CNN concatenating LSTM

LSTM True, mostly true, half true, half false, mostly false, false

Works with RNN

- DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101, 2015.
  - GitHub: https://github.com/SmartDataAnalytics/DeFacto
  - Validates statements by finding confirming sources for them on the web. It takes a statement (such as "Jamaica Inn was directed by Alfred Hitchcock") as input and then tries to find evidence for the truth of that statement by searching for information on the web.
- Specializing Word Embeddings for Similarity or Relatedness, EMNLP 2015
- Sentence similarity based on semantic kernels for intelligent text retrieval, Journal of Intelligent Information Systems, 2017
- CNN- and LSTM-based Claim Classification in Online User Comments, COLING 2016
- Fake News Detection using Stacked Ensemble of Classifiers, nlpj2017
- From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles, nlpj2017
- Detecting the stance of a claim with regard to a piece of evidence
  - Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

Works on Fact Validation

- DeFacto - A Multilingual Fact Validation Interface, Journal of Web Semantics, 2015
- DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101, 2015
- Triple Scoring Using a Hybrid Fact Validation Approach: The Catsear Triple Scorer at WSDM Cup 2017