Coverage and Independence: Measuring Quality in Web Search Results Panagiotis Takis Metaxas Lilia Ivanova Eni Mustafaraj Department of Computer Science.

Presentation transcript:

Coverage and Independence: Measuring Quality in Web Search Results Panagiotis Takis Metaxas Lilia Ivanova Eni Mustafaraj Department of Computer Science Wellesley College, USA

Precision and Recall in Traditional IR

Precision and Recall in Web IR
High precision is easy to achieve but does not convey useful information.
Recall is uninteresting and cannot be computed accurately because of the enormous size of the web.
85% of Web searchers never look past the top 10!

But what is Quality?

Quality when searching controversial issues?

Quality when searching political issues? But Google is usually so good at finding info… Why does it do that?

Define Search Quality in a web-meaningful way
Comprehensive Coverage = lack of bias towards some search results.
For a controversial issue (at a minimum): cover the pro, con and balanced opinions.
For k opinions and top-N results, the expected number of results per opinion is N/k.
Coverage Bias = total distance from N/k.

Define Search Quality in a web-meaningful way
Comprehensive Coverage = lack of bias towards some search results.
(bad coverage) 0 ≤ C ≤ 1 (good coverage)
Now we can talk about, e.g., 60% coverage.
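The coverage definition above can be sketched in a few lines of Python. The slides give only "total distance from N/k" and the range 0 ≤ C ≤ 1, so the normalization used here (dividing by the bias of the worst case, where all N results share one opinion) is an assumption made for illustration, not necessarily the authors' formula. The function name `coverage` and the opinion labels are likewise hypothetical.

```python
from collections import Counter


def coverage(labels, opinions):
    """Return a coverage score C in [0, 1] for a list of labeled results.

    labels   -- opinion label of each of the top-N results
    opinions -- the k distinct opinions that should be covered

    ASSUMPTION: C = 1 - bias / worst_bias, where bias is the total
    distance of the per-opinion counts from N/k and worst_bias is the
    bias when every result expresses the same opinion. The slides do
    not state the normalization.
    """
    n = len(labels)
    k = len(opinions)
    if n == 0 or k < 2:
        raise ValueError("need at least one result and two opinions")
    unknown = set(labels) - set(opinions)
    if unknown:
        raise ValueError(f"labels outside the opinion set: {sorted(unknown)}")

    expected = n / k                      # N/k results per opinion
    counts = Counter(labels)
    # Coverage bias: total distance of the observed counts from N/k.
    bias = sum(abs(counts.get(o, 0) - expected) for o in opinions)
    # Worst case: (N - N/k) for one opinion plus N/k for each of the others.
    worst_bias = 2 * n * (k - 1) / k
    return 1 - bias / worst_bias
```

Under this assumed normalization, a top-10 list with 6 pro, 2 con and 2 balanced results scores about 0.6, which is the kind of figure the "60% coverage" remark refers to.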

Define Search Quality in a web-meaningful way
Independent search results = results that are not dependent due to spamming.
u: URL dependency
r: Redirection dependency
c: Content dependency
l: Link dependency

Example of Dependent Results: Google’s “HGH benefits”
Annotations: redirection dependency, URL dependency.
Table 1: Top-10 results of Google when given the query “HGH benefits” for August, 2007 and September, For each entry we have calculated the size of the backGraph as (|V|, |E|) revealed by the Google API and the change between these two dates.

Example of Dependent Results: Yahoo’s “Is ADHD a real disease”
Annotations: link dependency, content dependency.
Table 4: Top-10 results of the Yahoo search engine when given the query “Is ADHD a real disease” (August and September, 2007).

Define Search Quality in a web-meaningful way
Independent search results = results that are not dependent due to spamming.
u: URL dependency
r: Redirection dependency
c: Content dependency
l: Link dependency
(total dependence) 0 ≤ independence ≤ 1 (total independence)
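The slides name four dependency types and a 0-to-1 scale but do not state how the independence score is computed. As a rough sketch only, the Python below assumes the score is the fraction of top-N results that carry none of the four dependency flags. That formula, the function name `independence`, and the input representation (one set of flags per result) are all assumptions for illustration.

```python
# The four dependency types named on the slide.
DEPENDENCY_FLAGS = {
    "u": "URL dependency",
    "r": "Redirection dependency",
    "c": "Content dependency",
    "l": "Link dependency",
}


def independence(result_flags):
    """Return an independence score in [0, 1].

    result_flags -- one set per search result holding the dependency
                    flags ("u", "r", "c", "l") found for that result;
                    an empty set means the result is independent.

    ASSUMPTION: the score is the share of results with no flags, so
    0 means every result is dependent and 1 means none is. The slides
    give only the range, not the formula.
    """
    if not result_flags:
        raise ValueError("need at least one result")
    for flags in result_flags:
        unknown = set(flags) - DEPENDENCY_FLAGS.keys()
        if unknown:
            raise ValueError(f"unknown dependency flags: {sorted(unknown)}")

    independent = sum(1 for flags in result_flags if not flags)
    return independent / len(result_flags)
```

A result flagged with several dependency types counts once here; a weighted scheme that penalizes multiple dependencies more heavily would be an equally plausible reading of the slide.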

Evaluating Quality of 3 Search Results
Query with commercial interest: “Human Growth Hormone (HGH) benefits”
Query with medical interest: “Is ADHD a real disease?”
Query with political interest: “Morality of abortions”

Evaluating Quality of 3 Search Results
[Charts: Coverage of Google; Coverage of Yahoo; Independence]
Our results show low coverage for controversial questions that are not highly pursued and higher coverage for an issue that is highly pursued (“Abortion”). They also show high independence of results that are not highly pursued and higher independence for an issue that is highly pursued (“Abortion”). There is significant overlap between the top-10 results returned by Yahoo and Google!

Comparing visible neighborhoods
[Figure panels: Google; Yahoo; Both]

Coverage and Independence: Measuring Quality in Web Search Results Panagiotis Takis Metaxas Department of Computer Science Wellesley College, USA Thank you!

Example of Dependent Results: Yahoo’s “HGH benefits”

Example of Dependent Results: Google’s “Is ADHD a real disease”

Example of Dependent Results: Google’s “Morality of Abortion”

Example of Dependent Results: Yahoo’s “Morality of Abortion”