1
Problem Statement of Factual Claim Validation: Domains, Topics, and Task Specification
Deep Learning Research & Application Center
13 November 2017
Claire Li
2
Fact checking - claim validation
- Most popular fact-checking websites
- What are the popular domains
- What are the topics of greatest concern
- Proposed problem scope & task
  - Domains and topics
  - Task specification
- Agreement-based source-claim iterative models for truth discovery
3
The 6 best sites for manual political fact checking on the Internet
- PolitiFact: for finding the truth in American politics.
- FactCheck.org: for getting political facts; monitors the factual accuracy of what is said by the president and top administration officials, as well as congressional and party leaders; focuses on claims that are false or misleading, with sources provided for all stories.
- Washington Post's Fact Checker: assesses claims made by politicians or political advocacy groups and awards Pinocchios (mostly true, half true, half false, whoppers) based on the level of accuracy.
- OpenSecrets: tracks money in U.S. politics and its effect on elections and public policy; RESTful APIs give machine-readable access to the data displayed on OpenSecrets, and Open Data is available for academic research.
- The Sunlight Foundation: uses the power of the Internet to catalyze greater government openness and transparency; its open-government work now takes place at the local, state, federal and international levels.
- Citizens for Responsibility and Ethics in Washington (CREW): provides a necessary knowledge base for building a better political system.
4
Popular Fact-Check Websites for Urban Legends, Religion, Education, and the Economy
- Snopes.com: verdicts are true, false, mostly true, mostly false, mixture, legend, unproven. Covers urban legends and rumors since 2007. Topics include college (90), computers (198), religion (72), food (234), travel, business (450), political news, horrors (234), crimes (432), history (108), media matters (324). Political figures: Trump (922), Obama (338), Hillary (324).
- TruthOrFiction.com: verdicts are truth, fiction, mostly truth, mostly fiction, truth & fiction, commentary, unproven, reality check. Topics include education (51), government (380), food/drink (150), religion (360), health/medical (80), holidays, immigration (40), politics (740), business (60), crimes (280), terrorism (210), viruses, war. Political figures: Obama (210), Trump (20).
- ABC News: verdicts are negative ruling, positive ruling, in between. Determines the accuracy of claims by politicians, public figures, advocacy groups and institutions engaged in public debate. Topics include economy, immigration, environment, education, health.
- Wikipedia: founder Jimmy Wales has launched a crowd-funded news platform, WikiTribune, which aims to combat fake news with the help of journalists and fact-checkers; initially launched in English, with plans to expand to other languages.
- Media Bias/Fact Check (MBFC): verdicts are true, mostly true, mostly false, blatant lie. Rates news sources as least biased, left bias, left-center bias (slight to moderate liberal bias), right-center bias, or right bias (moderately to strongly biased toward conservative causes).
5
Snopes.com examples
- Example 1 (travel): Individuals fleeing danger can request to be "unlisted" in a hotel so no one can find them. Unproven; a further what's-true-and-what's-false analysis is provided.
- Example 2 (horrors): Poisoned Halloween candy. False.
- Example 3 (religion): Christ Church in Alexandria, Virginia, is "ripping out" a plaque dedicated to George Washington because it might offend people. Mostly false; a further what's-true-and-what's-false analysis is provided.
- Example 4 (computers): Accepting a friend request from a stranger will provide hackers with access to your computer and online accounts.
- Example 5 (college): A student mistook examples of unsolved math problems for a homework assignment and solved them. True.
- Example 6 (food): Drinking cocktails from a copper mug can cause copper poisoning. Mixture; a further what's-true-and-what's-false analysis is provided.
6
TruthOrFiction.com examples
- Example 1 (government): Bill Clinton's love child, Danney Williams, found dead. Fiction!
- Example 2 (food): McDonald's McRib sandwiches made from inverted pork rectums. Fiction!
- Example 3 (religion): Sign at Swiss hotel: Jewish guests must shower before using the swimming pool. Truth!
- Example 4 (terrorism): Seddique Mateen visits White House; meets with President Obama. Unproven!
7
ABC News examples
- Health: Do more people die in Australia than in Sweden due to poorly heated homes? Overstated!
- Education: Do Australian taxpayers subsidise over half the cost of each student's higher education? Incorrect!
8
Real-time fact checking (即時核查) in Hong Kong and China
Fact checking has not yet become a dedicated position in mainland China; editors and reporters naturally take on part of the fact-checking work.
- 愛讀網: ONE實驗室 (ONE Lab) became the first media outlet in China to set up a fact-checking position, and in practice it had only one fact-checker, 劉洋 (Liu Yang).
- 北京新浪網 (Beijing Sina): the ONE實驗室 team led by 李海鵬 (Li Haipeng) was disbanded.
2017/03/21: Hong Kong media experimented with real-time fact checking of the Chief Executive election forum.
- 香港01 (HK01): "The whole team numbered about 20 people from different departments, with four more senior colleagues listening closely to what the candidates said; as soon as anything doubtful was heard, a reporter was assigned to verify it immediately." For example, 林鄭月娥 (Carrie Lam) said that when government ministers visited districts there were "no police arrangements at all", but HK01 found that officials' district visits had repeatedly required police deployment.
- 端傳媒 (Initium Media): "Initium invited more than ten experts from different fields to watch the live broadcast simultaneously; when an expert noticed a problem in a candidate's remarks, reporters looked up the records." For example, checking former Financial Secretary 曾俊華's (John Tsang's) claim that "Hong Kong is one of the world's three largest financial centres", they noted that in the Financial Centres Index ranking published the previous April, Hong Kong ranked fourth. 林鄭月娥 also said the asset limit for Home Ownership Scheme (居屋) flats was only just over HK$50,000, when in fact it is HK$850,000.
9
News helper
10
Problem scope & task
Domains: political news; urban legends & rumors (topics: food, computers, education, health, famous figures).
Two-step aim: provide an automatic live fact-check / truth-discovery tool with domain and task specifications (a minimal sketch of this two-step flow follows below).
- Step 1: retrieve already fact-checked / world-knowledge-based claims with their truth labels based on similarity calculation, and provide evidence with associated lists of web sources for novel claims to journalists; based on truth-discovery models, return the veracity label and score of each data value as well as the trustworthiness scores of the sources [VERA].
- Step 2: use the resulting large dataset to build and train an RNN for a domain-specific practical system that automatically suggests truth labels with confidence scores for novel claims.
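A minimal sketch of this two-step flow, assuming a toy word-overlap similarity and an invented mini-database of fact-checked claims; the threshold, the database entries, and the "unverified" fallback are illustrative placeholders, not part of the slides:

```python
# Toy two-step validation: (1) look up a similar already fact-checked claim,
# (2) otherwise fall back to truth discovery over web sources.

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

FACT_CHECKED = [  # invented mini-database of published verdicts
    {"text": "poisoned halloween candy was handed to children", "label": "false"},
    {"text": "copper mugs cause copper poisoning", "label": "mixture"},
]

def validate(claim: str, threshold: float = 0.5) -> dict:
    # Step 1: most similar fact-checked claim above a similarity threshold.
    best = max(FACT_CHECKED, key=lambda c: jaccard(claim, c["text"]))
    score = jaccard(claim, best["text"])
    if score >= threshold:
        return {"label": best["label"], "confidence": score, "matched": best["text"]}
    # Step 2: novel claim; hand off to the truth-discovery model
    # (see the agreement-based iterative model later in this deck).
    return {"label": "unverified", "confidence": 0.0}

print(validate("poisoned candy was handed to children on halloween"))
```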
11
Task specification: real-time fact checking (即時核查) of the Chief Executive election forum (特首選舉論壇) and City Forum (城市論壇)
12
Claim Validation Approaches
Approaches: measurement of relatedness and reliability.
- Semantic-similarity based, for repeated and paraphrased claims: calculate the semantic similarity between the given claim and the already fact-checked ones, and return the label by K-nearest neighbour (see the sketch after this list).
- Deep learning model for novel claim validation. Features: claims, evidence, web sources with trustworthiness scores, speaker, etc. The trustworthiness (or accuracy) of a web source is the probability that it contains the correct value for a fact, e.g. the fact of Barack Obama's nationality.
- How to calculate web-source trustworthiness? Agreement-based source-claim iterative models. Input: claims with true/false labels. Output: disagreeing and agreeing web sources with trustworthiness scores. Truth-discovery model: VERA, the first attempt to demonstrate truth discovery in action over Web data and Twitter data.
- Search a knowledge base or the Google search engine for world-knowledge claims (e.g. population, GDP rate): Wolfram Alpha search API; Wikipedia (calculation needed).
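A small K-nearest-neighbour sketch of the semantic-similarity approach; TF-IDF cosine similarity stands in for a stronger semantic model, and the three fact-checked entries are invented examples loosely based on the Snopes slide:

```python
# KNN over fact-checked claims: return the majority label of the k most
# similar already-checked claims, plus the best similarity score.

from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fact_checked = [
    ("Poisoned Halloween candy was handed out to children", "false"),
    ("A student solved unsolved math problems mistaken for homework", "true"),
    ("Copper mugs cause copper poisoning in cocktails", "mixture"),
]

def knn_label(claim, k=1):
    texts = [t for t, _ in fact_checked]
    vec = TfidfVectorizer().fit(texts + [claim])
    sims = cosine_similarity(vec.transform([claim]), vec.transform(texts))[0]
    top = sims.argsort()[::-1][:k]          # indices of the k nearest claims
    votes = Counter(fact_checked[i][1] for i in top)
    return votes.most_common(1)[0][0], float(sims[top[0]])

print(knn_label("Drinking from a copper mug can poison you"))
```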
13
Claim Validation Architecture with Truth Discovery
Diagram components:
- Input: claim; outputs: claim truth value and trustworthiness scores of the web sources
- Claim query API
- Information extraction (tool: SEMILAR)
- Multi-source & evidence discovery and entity/relationship extraction (tools: TextRunner, DeepDive, TwitIE)
- Truth discovery and data fusion: agreement-based source-claim iterative models
(A toy wiring of these stages is sketched below.)
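A toy wiring of the stages above, just to show how they connect; both inner functions are naive stand-ins (the extraction stub replaces open-IE tools such as TextRunner/DeepDive/TwitIE, and the majority-vote fusion replaces the agreement-based iterative model described on the following slides):

```python
# End-to-end toy pipeline: evidence discovery -> extraction -> truth discovery.

from collections import Counter

def extract_assertions(claim, documents):
    # Stand-in for entity/relationship extraction: here each "document"
    # is already a (source, value) pair about the claim's data item.
    return [{"source": s, "value": v} for s, v in documents]

def truth_discovery(assertions):
    # Stand-in for the agreement-based iterative model: plain majority vote.
    counts = Counter(a["value"] for a in assertions)
    value, support = counts.most_common(1)[0]
    return {"value": value, "score": support / len(assertions)}

def validate_claim(claim, search):
    documents = search(claim)                               # multi-source evidence discovery
    assertions = extract_assertions(claim, documents)       # entity/relationship extraction
    return {"claim": claim, **truth_discovery(assertions)}  # truth discovery / data fusion

print(validate_claim(
    "France.CurrentPresident",
    lambda c: [("s1", "Hollande"), ("s2", "Hollande"), ("s3", "Sarkozy")]))
```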
15
Agreement-based Source-Claim Iterative Models for Truth Discovery
Problem of truth discovery: given a set of assertions claimed by multiple sources, label each claimed value as true or false and compute the reliability of each source. E.g., we combine evidence for claims from web sources of different trustworthiness in order to verify the claims; a web source might be an individual web page or a whole web site.
General principle of truth discovery: if a web source frequently provides trustworthy information, it is assigned a high reliability; meanwhile, if a piece of information is supported by web sources with high reliabilities, it has a higher chance of being selected as the truth (this mutual dependence is written out below).
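Written as mutually dependent scores, this is a generic formulation of the principle using the same E_s / S_e notation as the update rules later in the deck (not necessarily the exact scoring used in VERA), where T(s) is the trustworthiness of source s and C(e) the belief in claim/evidence e:

$$T(s) \propto \sum_{e \in E_s} C(e), \qquad C(e) \propto \sum_{s \in S_e} T(s)$$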
16
Iterative computation of source trustworthiness and claim belief
Input: claims as triples (s_i, d_j, v_k), meaning source s_i asserts value v_k for data item d_j. Output: a true/false label for each claimed value. (A toy encoding of these triples follows below.)
- d1: Russia.CurrentPresident; candidate values v1: Medvedev, v2: Putin, v3: Yeltsin
- d2: USA.CurrentPresident; candidate values v4: Clinton, v5: Obama
- d3: France.CurrentPresident; candidate values v6: Hollande, v7: Sarkozy
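A toy encoding of these triples; which source asserts which value is invented here for illustration, since the slide only lists the data items and their candidate values:

```python
# (source, data item, value) triples: "source s_i claims data item d_j has value v_k".
# The source-to-value assignment below is illustrative only.
claims = [
    ("s1", "Russia.CurrentPresident", "Putin"),
    ("s2", "Russia.CurrentPresident", "Medvedev"),
    ("s3", "Russia.CurrentPresident", "Yeltsin"),
    ("s1", "USA.CurrentPresident",    "Obama"),
    ("s2", "USA.CurrentPresident",    "Clinton"),
    ("s1", "France.CurrentPresident", "Hollande"),
    ("s3", "France.CurrentPresident", "Sarkozy"),
]
```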
17
Claims-Evidences-Sources (example from ClaimBuster)
18
Iterative computation of source trustworthiness and claim belief
Update rules, iterated (initialization, then iterations 1 to 3 on the slide), where E_s is the set of evidences asserted by source s and S_e is the set of sources asserting evidence e:

$$T_i(s) = \sum_{e \in E_s} C_{i-1}(e), \qquad C_i(e) = \sum_{s \in S_e} T_i(s)$$

Sources s are selected when their trustworthiness T_s exceeds a threshold (13 in the slide example); evidences e are selected when their confidence C_e exceeds a threshold (21 in the slide example). [Slide tables: trustworthiness T_s for sources s1..s7 and confidence C_e for evidences e1..e7 over initialization and iterations 1 to 3, ending in T/F labels for claims 1 to 3.] (A runnable sketch of this iteration follows below.)
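Below is a runnable sketch of this iteration on a made-up source-evidence graph; the edges, the three iterations, the threshold value, and the per-iteration max-normalization (added to keep scores bounded, a detail the slide's formulas leave implicit) are all illustrative assumptions:

```python
# Agreement-based source-claim iteration: T_i(s) = sum_{e in E_s} C_{i-1}(e),
# C_i(e) = sum_{s in S_e} T_i(s). The graph below is invented for illustration.

# edges[e] = set of sources asserting evidence e (i.e. S_e); E_s is derived from it.
edges = {
    "e1": {"s1", "s2"},
    "e2": {"s2", "s3", "s4"},
    "e3": {"s4"},
    "e4": {"s5", "s6", "s7"},
}

sources = sorted({s for srcs in edges.values() for s in srcs})
T = {s: 1.0 for s in sources}   # initialization: uniform source trustworthiness
C = {e: 1.0 for e in edges}     # initialization: uniform evidence confidence

for _ in range(3):              # iterations 1..3, as on the slide
    # T_i(s): sum of confidences of the evidences asserted by s
    T = {s: sum(C[e] for e, srcs in edges.items() if s in srcs) for s in sources}
    norm = max(T.values()) or 1.0
    T = {s: t / norm for s, t in T.items()}          # assumed normalization
    # C_i(e): sum of trustworthiness of the sources asserting e
    C = {e: sum(T[s] for s in srcs) for e, srcs in edges.items()}
    norm = max(C.values()) or 1.0
    C = {e: c / norm for e, c in C.items()}          # assumed normalization

threshold = 0.5                 # illustrative selection threshold
print({e: round(c, 2) for e, c in C.items()})
print("accepted evidences:", [e for e, c in C.items() if c >= threshold])
```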
19
Other information
- Top 15 most popular celebrity gossip websites (July 2017)
- Websites that post fake and satirical stories (satirical-stories/)
  - The list of questionable sites
  - The list of satire sites
  - The list of conspiracy-pseudoscience sites
20
Related works
- TextRunner: Open Information Extraction on the Web. University of Washington, Computer Science and Engineering. NAACL-HLT Demonstration Program.
- A Review of Data Fusion Techniques. The Scientific World Journal, Hindawi Publishing Corporation, Volume 2013.
- VERA: A Platform for Veracity Estimation over Web Data. WWW'16 Companion, April 11-15, 2016, Montréal, Québec, Canada. ACM.
- A Survey on Truth Discovery Methods for Big Data. International Journal of Computational Intelligence Research, Volume 13, Number 7 (2017).
- Veracity of Big Data. CIKM 2015 tutorial.
21
Hong Kong Government News website (香港政府新聞網), Mingjing Group website (明鏡集團網), BBC Chinese (BBC中文網), Yahoo News, Google fact-checking algorithm, website authority