Factual Claim Validation Models Extraction of Evidence

Presentation transcript:

Factual Claim Validation Models
Extraction of Evidence
Deep Learning Research & Application Center
23 October 2017
Claire Li

Construct a claim-evidence training corpus
Hybrid claim validation (fact validation [1]) approach with deep learning
- Construct a claim-evidence training corpus
  - Crawl claims with truth labels from large archives of fact-checking blogs: PolitiFact, Channel 4
- Extract the evidence related to a given claim
  - Obtain the evidence related to the given claim from ClaimBuster through its API

Claim Validation Approaches
The problem statement
- Input: a claim and a set of fact articles
- Output: the truth label of the claim (true, mostly true, mostly false/barely true, false)
Approaches
- Semantic-similarity based, for repeated and paraphrased claims
  - Calculate the semantic similarity between the given claim and the already fact-checked ones, and return the label from the K nearest neighbors
- Deep learning model, for novel claim validation
  - For a claim with associated evidence, learn the support/deny features from the related evidence, and use the learned features to verify the new claim-evidence component
  - Extract the evidence related to a claim based on semantic similarity models
  - Construct the claim-evidence training corpus by extending the evidence in the LIAR dataset, which includes metadata such as URL, speaker, etc.
- Search a knowledge base or the Google search engine, for world-knowledge claims (e.g. population, GDP rate)
  - Wolfram Alpha search API
  - Wikipedia (calculation needed)
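The semantic-similarity approach for repeated and paraphrased claims can be sketched as a K-nearest-neighbor vote over already fact-checked claims. This is a minimal sketch only: it swaps in bag-of-words cosine similarity for a real semantic similarity model, and the names `bag_of_words`, `cosine`, `knn_label` and the toy `CHECKED_CLAIMS` list (including its labels) are hypothetical illustrations, not part of the slides.

```python
import math
import re
from collections import Counter

# Toy stand-in for an archive of fact-checked claims (hypothetical data and labels).
CHECKED_CLAIMS = [
    ("the tax code has doubled since 1985", "true"),
    ("the tax code doubled in size since 1985", "true"),
    ("health care costs are the lowest in 50 years", "false"),
    ("the top 1 percent pay half of all revenue", "false"),
]

def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_label(claim, checked_claims, k=3):
    """Return the majority label among the k fact-checked claims most similar to `claim`."""
    query = bag_of_words(claim)
    ranked = sorted(
        checked_claims,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_label("our tax code has nearly doubled since 1985", CHECKED_CLAIMS, k=3))
```

In practice the bag-of-words vectors would be replaced by sentence embeddings, and a similarity threshold would decide whether the claim counts as a repetition at all or should go to the deep learning model instead.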

Extract the relevant evidence given a claim based on semantic similarity models
- Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false/barely true: 1657, false: 1998) from the LIAR dataset
- Relevant evidence retrieval: for each claim, use the Google search engine to get the top-20 HTML pages and extract the textual content of each document using BoilerPipe
- With the textual content, measure the semantic similarity between the sentences in the relevant documents and the given claim
  - Word2vec-based word embeddings for a semantic space of either similarity or relatedness [2]
    - Achieved by learning from both a general corpus and a specialized thesaurus
  - Semilar: an open-source platform for similarity at the document/paragraph/sentence level
  - Create triplets (subject-verb-object) of the sentences in the document using the Stanford parser, then create a triplet of the claim and find the similarity between these triplets, https://github.com/SilentFlame/Fact-Checker
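The sentence-level similarity step can be sketched as follows: average the word vectors of each sentence, then rank the candidate sentences by cosine similarity to the claim. This is a sketch under stated assumptions: `TOY_VECTORS` is a hypothetical hand-made 3-dimensional embedding table standing in for trained word2vec vectors, and `sentence_vector`, `cosine` and `rank_evidence` are illustrative names, not functions from word2vec, Semilar or BoilerPipe.

```python
import math
import re

# Hypothetical 3-d word vectors; a real system would load trained word2vec embeddings.
TOY_VECTORS = {
    "tax": [1.0, 0.0, 0.0],
    "code": [0.9, 0.1, 0.0],
    "doubled": [0.5, 0.5, 0.0],
    "grew": [0.5, 0.5, 0.0],
    "pages": [0.6, 0.4, 0.0],
    "rain": [0.0, 0.0, 1.0],
    "weather": [0.0, 0.1, 0.9],
}

def sentence_vector(sentence, vectors):
    """Average the vectors of the known words; None if no word is in the vocabulary."""
    words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w in vectors]
    if not words:
        return None
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def rank_evidence(claim, sentences, vectors, top_n=2):
    """Return the top_n (score, sentence) pairs, most similar to the claim first."""
    claim_vec = sentence_vector(claim, vectors)
    scored = []
    for sentence in sentences:
        vec = sentence_vector(sentence, vectors)
        score = cosine(claim_vec, vec) if claim_vec is not None and vec is not None else 0.0
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

CANDIDATES = ["The tax code grew in pages.", "Rain and weather."]
for score, sentence in rank_evidence("Our tax code doubled", CANDIDATES, TOY_VECTORS):
    print(round(score, 3), sentence)
```

Averaging word vectors ignores word order, which is one reason the slides also list the subject-verb-object triplet comparison as an alternative.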

Construct the claim-evidence training corpus through the ClaimBuster API
- LIAR dataset: get true claims (true: 1683, mostly true: 1966) and false claims (mostly false: 1657, false: 1998)
- Use the ClaimBuster API to retrieve the one-to-many claim-evidence pairs
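The one-to-many pairing and the two-class grouping of the label counts above can be sketched as below. This is an illustration under assumptions: `fake_retrieve` is a hypothetical placeholder for whatever call returns evidence sentences for a claim (the actual ClaimBuster API is not modeled here), and `BINARY_CLASS`, `to_pairs` and `class_totals` are names introduced only for this sketch.

```python
# Label counts as reported on the slide.
LABEL_COUNTS = {"true": 1683, "mostly true": 1966, "mostly false": 1657, "false": 1998}

# Grouping of the four labels into the true/false classes used on the slide.
BINARY_CLASS = {
    "true": "true",
    "mostly true": "true",
    "mostly false": "false",
    "false": "false",
}

def class_totals(counts):
    """Sum the per-label counts into the two classes."""
    totals = {"true": 0, "false": 0}
    for label, n in counts.items():
        totals[BINARY_CLASS[label]] += n
    return totals

def to_pairs(claim, label, retrieve):
    """Expand one claim into one (claim, evidence, class) row per retrieved evidence."""
    return [(claim, evidence, BINARY_CLASS[label]) for evidence in retrieve(claim)]

def fake_retrieve(claim):
    """Hypothetical evidence source; a real pipeline would query an external service."""
    return ["evidence sentence A for: " + claim, "evidence sentence B for: " + claim]

print(class_totals(LABEL_COUNTS))
for row in to_pairs("Our tax code has nearly doubled since 1985", "true", fake_retrieve):
    print(row)
```

With the slide's counts, the two classes come out nearly balanced (3649 true versus 3655 false), which is convenient for training a binary classifier.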

Examples
Claim 1 (true): "I belong to the AFL-CIO." Rick Perry, Governor
Claim 2 (true): "Warren (Buffett) still does support me." Barack Obama, President
Claim 3 (false): "We now have driven (health care) costs down to the lowest they've been in 50 years." Hillary Clinton, Presidential candidate
Claim 4 (false): "The top 1 percent pay over half of the entire revenue for this country." Trent Franks, U.S. Representative
Claim 5 (true): "Our tax code has nearly doubled since 1985." Roy Blunt

CNN concatenating LSTM

LSTM True, mostly true, half true, half false, mostly false, false

Works with RNN
[1] DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101, 2015. GitHub: https://github.com/SmartDataAnalytics/DeFacto
    DeFacto validates statements by finding confirming sources for them on the web. It takes a statement (such as "Jamaica Inn was directed by Alfred Hitchcock") as input and then tries to find evidence for the truth of that statement by searching for information on the web.
[2] Specializing Word Embeddings for Similarity or Relatedness, EMNLP 2015
[3] Sentence similarity based on semantic kernels for intelligent text retrieval, Journal of Intelligent Information Systems, 2017
[4] CNN- and LSTM-based Claim Classification in Online User Comments, COLING 2016
[5] Fake News Detection using Stacked Ensemble of Classifiers, nlpj2017
[6] From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles, nlpj2017
Detecting the stance of a claim with regard to a piece of evidence:
[7] Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

Works on Fact Validation
- DeFacto - A Multilingual Fact Validation Interface, Journal of Web Semantics, 2015
- DeFacto - Temporal and Multilingual Deep Fact Validation, Web Semantics: Science, Services and Agents on the World Wide Web, 35:85–101, 2015
- Triple Scoring Using a Hybrid Fact Validation Approach: The Catsear Triple Scorer at WSDM Cup 2017