Extraction of Relevant Evidence for Factual Claim Validation


Extraction of Relevant Evidence for Factual Claim Validation
Deep Learning Research & Application Center
31 October 2017
Claire Li

Claim Validation
- Verify claims with respect to the evidence
- Measurement of relatedness
- Measurement of reliability
- News database for reliability assessment
- Meta search engine

Claim Validation Approaches

Problem statement
- Input: a claim and a set of articles of facts
- Output: the truth label of the claim (true, mostly true, mostly false/barely true, false)

Approaches: measurement of relatedness and reliability
- Semantic-similarity based, for repeated and paraphrased claims: calculate the semantic similarity between the given claim and claims that have already been fact-checked, and return the label by K-nearest-neighbor vote
- Deep learning model, for novel claim validation: for a claim with associated evidence, learn support/deny features from the related evidence, and use the learned features to verify new claim-evidence pairs
- Extract the related evidence for a claim based on semantic similarity models
- Construct the claim-evidence training corpus by extending the Liar dataset's evidence, which includes metadata such as URL, speaker, etc.
- Search a knowledge base or the Google search engine for world-knowledge claims (e.g., population, GDP growth rate): Wolfram Alpha search API; Wikipedia (calculation needed)
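The K-nearest-neighbor step above can be sketched as follows. This is a minimal illustration only: `jaccard` is a toy lexical-overlap stand-in for the real semantic-similarity model (Semilar or word2vec embeddings) the slides describe, and the function names are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Toy lexical-overlap similarity; stands in for a real semantic model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def knn_label(claim, fact_checked, k=3):
    """fact_checked: list of (claim_text, label) pairs from an already
    fact-checked corpus (e.g. the Liar dataset); returns the majority
    label among the k most similar fact-checked claims."""
    ranked = sorted(fact_checked, key=lambda item: jaccard(claim, item[0]),
                    reverse=True)
    votes = [label for _, label in ranked[:k]]
    return Counter(votes).most_common(1)[0][0]
```

Swapping `jaccard` for an embedding-based similarity leaves the K-NN voting logic unchanged.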

Extracting the Relevant Evidence for a Claim Based on Semantic Similarity Models
- Get true claims (true: 1683, mostly true: 1966) and false claims (mostly false/barely true: 1657, false: 1998) from the Liar dataset
- Relevant evidence retrieval: for each claim, use the Google search engine/meta search engine to get the top-20 HTML pages and extract the textual content using BoilerPipe
- For each document, measure the semantic similarity between the sentences in the relevant documents and the given claim
- Semilar: an open-source platform for similarity at the document/paragraph/sentence level; English only, but possible for Chinese by developing LSA/LDA models with the Semilar API
- Word2vec-based word embeddings for semantic spaces of either similarity or relatedness [2], achieved by learning from both a general corpus and a specialized thesaurus
- From scratch: create (subject, verb, object) triplets for the sentences in the document using the Stanford parser, create a triplet for the claim, and find the similarity between these triplets: https://github.com/SilentFlame/Fact-Checker
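The sentence-scoring step of the retrieval pipeline can be sketched like this. It is a sketch under assumptions: the retrieval and BoilerPipe extraction are assumed to have already produced `documents` as plain text, the sentence splitter is deliberately naive, and `similarity` is whichever pluggable sentence-level model is used (Semilar, word2vec, etc.).

```python
import re

def split_sentences(text):
    """Naive sentence splitter for the BoilerPipe-extracted page text."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def rank_evidence(claim, documents, similarity, top_n=5):
    """Score every sentence of every retrieved document against the claim
    and keep the best top_n as candidate evidence sentences."""
    scored = [(similarity(claim, sent), sent)
              for doc in documents
              for sent in split_sentences(doc)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```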

Meta Search Engines
A meta search engine consults several search engines and combines their answers.
- Dogpile: accepts a customizable list of search engines, directories and specialty search sites; winner of the Best Meta Search Engine award from Search Engine Watch for 2003
- SurfWax: the "SiteSnaps" feature lets you preview any page in the results and see where your terms appear in the document; allows results or documents to be saved for future use
- Vivisimo: not only pulls back matching responses from major search engines but also automatically organizes the pages into categories
- MetaCrawler: news searching is also offered

Semilar API
Similarity calculations based on:
- WordNet 3.0
- Latent Semantic Analysis (LSA) model
- Latent Dirichlet Allocation (LDA) model
- N-gram overlap, BLEU, METEOR
- Pointwise Mutual Information (PMI)
- Syntactic-dependency-based methods; optimized methods based on Quadratic Assignment

Semantic similarity and relatedness granularities:
- Word-to-word
- Sentence-to-sentence (demo): LSA-space based, or developed from the word-to-word model
- Paragraph-to-paragraph
- Document-to-document: LDA model trained on the whole of Wikipedia and the TASA corpus
- Word-to-sentence, paragraph-to-document
- Combinations of the above

Corpora: the TASA corpus and Wikipedia

Sentence-to-Sentence Demo
Query: "I belong to the AFL-CIO", said Rick Perry.
- "I did not belong to the AFL-CIO", said Rick Perry. – 0.78
- "I am a member of the AFL-CIO", said Rick Perry. – 0.89
- Rick Perry's AFL-CIO membership. – 0.66
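A common baseline for sentence-to-sentence similarity of the kind demoed above is the cosine of averaged word embeddings. The sketch below assumes pre-trained embeddings are available as a plain dict (word2vec-style); it does not reproduce Semilar's scores, only the general technique.

```python
import math

def sentence_vector(sentence, embeddings):
    """Average the word vectors of the tokens that have an embedding."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```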

Measurement of Evidence Reliability
OpenSources: a curated resource for assessing online information sources, labeled as fake, satire, bias, conspiracy, rumor, state, hate, clickbait, unreliable, or reliable.
- Title/domain analysis: the appearance of .wordpress or .com.co in the title/domain can be a warning sign, e.g. 70news.wordpress.com (questionable), aceflashman.wordpress.com (satire), deadlyclear.wordpress.com (bias, fake)
- "About Us" analysis: check for a Wikipedia page with citations
- Source analysis: does the website mention or link to studies, sources and quotes?
- Writing-style analysis: frequent use of ALL CAPS; whether the language is free of emotional appeals, e.g. phrases like "WOW!", "Please", etc.
- Social media analysis: check for attention-grabbing and for encouraging likes/click-throughs/shares
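The title/domain check above reduces to pattern matching. A minimal sketch, with a hypothetical and deliberately short pattern list built from the slide's examples:

```python
import re

# Warning patterns from the slide; a real list would be far longer.
SUSPECT_PATTERNS = [r"\.wordpress(\.|$)", r"\.com\.co$"]

def suspicious_domain(domain):
    """Return True when the domain matches a known warning pattern,
    e.g. 70news.wordpress.com or a .com.co lookalike."""
    return any(re.search(p, domain) for p in SUSPECT_PATTERNS)
```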

Measurement of Evidence Reliability (cont.)
- The site is professional (domain analysis); professional sites include .edu/.gov/.mil/.museum/.aero
- The site is published and copyrighted
- Look on the web page to see whether the website has a sponsor or affiliation
- Check for bias: are there advertisements on the page?
- Check the publication date and the last-updated date
- Check whether an author list is presented
- Compile a list of content-farm sites

checkSource() – Scoring Evidence Reliability
Compile and rank the most reliable and the least reliable websites from news databases, OpenSources, etc.
- PolitiFact
- Channel 4
- OpenSources: http://www.opensources.co/
- mediabiasfactcheck.com: left bias, left-center bias, least biased, right-center bias, right bias, pro-science, satire, pseudoscience, questionable
  - Right-center bias: these media sources are slightly to moderately conservative in bias
  - Left-center bias: these media sources have a slight to moderate liberal bias
- Compile the trust levels for news: Checklist by Ideological Group
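One way checkSource() could turn such label lists into a numeric score is sketched below. The trust weights are hypothetical; a real implementation would compile the table from OpenSources and mediabiasfactcheck.com rather than hard-code it.

```python
# Hypothetical trust weights keyed by source labels (assumption, not the
# slides' actual values).
TRUST = {
    "reliable": 1.0, "least biased": 1.0, "pro-science": 0.9,
    "left-center bias": 0.6, "right-center bias": 0.6,
    "bias": 0.4, "clickbait": 0.2, "questionable": 0.1,
    "fake": 0.0, "satire": 0.0,
}

def check_source(domain, labels):
    """labels: mapping domain -> list of label strings; returns a [0, 1]
    reliability score (mean trust of the labels, 0.5 for unknown domains)."""
    tags = labels.get(domain)
    if not tags:
        return 0.5  # unknown domain: neutral prior
    return sum(TRUST.get(t, 0.5) for t in tags) / len(tags)
```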

PageRank Algorithm & HITS
- If a web page i contains a hyperlink to web page j, then j is relevant to page i
- If many pages link to j, then j is important
- If j has only one backlink but it comes from a credible site k such as .gov/.edu/www.google.com/www.wikipedia.org, then k asserts that j is reliable

Credible HITS (Hyperlink-Induced Topic Search)
- To rate a web page ni, calculate its hub/authority scores, credible hub/authority scores, and incredible hub/authority scores
- Update ni's authority score to the sum of the hub scores of each node that points to ni
- Update ni's hub score to the sum of the authority scores of each node that ni points to
- A page's real/fake authority (incoming hub scores) represents a page linked to by many different credible/incredible hubs, enhanced by how many of the linking sites are credible or incredible as scored by checkSource()
- A page's real/fake hub (outgoing authority scores) represents a page that points to many other credible/incredible pages
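The two update rules above can be sketched as the standard HITS iteration. This is the unweighted baseline; the credible/incredible variant in the slides would additionally weight each term by the checkSource() score of the contributing page.

```python
def hits(pages, links, iterations=20):
    """Basic HITS iteration. pages: iterable of node ids;
    links: set of (src, dst) directed edges."""
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority: sum of hub scores of pages linking in
        auth = {p: sum(hub[s] for s, d in links if d == p) for p in pages}
        # hub: sum of authority scores of pages linked out to
        hub = {p: sum(auth[d] for s, d in links if s == p) for p in pages}
        # normalize so the scores stay bounded across iterations
        na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / na for p, v in auth.items()}
        hub = {p: v / nh for p, v in hub.items()}
    return hub, auth
```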

Real/Fake HITS – Scoring
- Given a claim ci as the query, let n1, n2, ..., nj be the top-j web pages returned by the search engine; n1, n2, ..., nj are called the root set
- Construct the base set n1, n2, ..., nk (k >= j) iteratively from each ni in the root set by augmenting the root set with all the web pages that each ni links to (ni's outlinks) and some of the pages that link to ni (backlinks)
  - Use the Mozscape API to find ni's backlinks together with their PageRank scores
- Run checkSource() to get a score for each ni, for i in [1, k]
- Return the real/fake HITS scores for each node ni in the root set
- Extract the evidence ei from the top-j web pages, tied to the real/fake HITS scores
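The root-set-to-base-set expansion can be sketched as below. The link maps are assumed to be pre-fetched (the slides obtain backlinks through the Mozscape API); the function name is illustrative.

```python
def build_base_set(root, outlinks, backlinks, max_back=10):
    """Expand the root set (top-j search results) into the HITS base set:
    add every page each root page links to, plus up to max_back backlinks
    per page. outlinks/backlinks: mappings page -> list of pages."""
    base = set(root)
    for page in root:
        base.update(outlinks.get(page, []))
        base.update(backlinks.get(page, [])[:max_back])
    return base
```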

MozTrust
- Moz tools are free tools for link building and analysis, keyword research, web page performance, local listing audits, etc.
- MozTrust measures a form of link equity tied to the "trustworthiness" of the linking website
- Link equity (also known as PageRank) reflects the number of incoming links (backlinks) to any given page on the target website
- MozTrust is determined by calculating the link "distance" between a given page and a "seed" site: a specific, known trust source on the Internet; the closer you are linked to a trusted website, the more trust you have
- MozTrust is scored on a logarithmic scale between 0 and 10; a higher score generally means more, and more trustworthy, backlinks
- Domain-level MozTrust is like regular MozTrust, but instead of being calculated between web pages, it is calculated between entire domains
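The "trust decays with link distance from a seed" idea can be illustrated with a toy propagation. This is only a rough analogue of MozTrust, not Moz's actual (proprietary) computation; the decay factor and function name are assumptions.

```python
from collections import deque

def trust_by_seed_distance(seeds, outlinks, decay=0.5):
    """Toy analogue of link-distance trust: breadth-first from trusted
    seed sites along outlinks, with trust decaying by `decay` per hop.
    Keeps the best (shortest-path) trust reaching each page."""
    trust = {s: 1.0 for s in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for nxt in outlinks.get(page, []):
            score = trust[page] * decay
            if score > trust.get(nxt, 0.0):
                trust[nxt] = score
                queue.append(nxt)
    return trust
```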

Related Works
- Web credibility: features exploration and credibility prediction. European Conference on Information Retrieval, Springer (2013), pp. 557-568
- Predicting webpage credibility using linguistic features. Proceedings of the 23rd International Conference on World Wide Web, ACM (2014), pp. 1135-1140
- Building trusted social media communities: a research roadmap for promoting credible content. Roles, Trust, and Reputation in Social Media Knowledge Markets, Springer (2015), pp. 35-43
- Understanding and predicting web content credibility using the Content Credibility Corpus. Information Processing and Management 53 (2017)