Searching the Web Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike.

Slides:



Advertisements
Similar presentations
Database VS. Search Engine
Advertisements

Chapter 5: Introduction to Information Retrieval
Ziv Bar-YossefMaxim Gurevich Google and Technion Technion TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A A AA.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Information Retrieval Lecture 8 Introduction to Information Retrieval (Manning et al. 2007) Chapter 19 For the MSc Computer Science Programme Dell Zhang.
Exercising these ideas  You have a description of each item in a small collection. (30 web sites)  Assume we are looking for information about boxers,
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
1 Web Search and Web Search Overlap: What the Deal? Amanda Spink Queensland University of Technology.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Evaluating Search Engine
Search Engines and Information Retrieval
Internet Resources Discovery (IRD) Search Engines Quality.
Amanda Spink : Analysis of Web Searching and Retrieval Larry Reeve INFO861 - Topics in Information Science Dr. McCain - Winter 2004.
Mastering the Internet, XHTML, and JavaScript Chapter 7 Searching the Internet.
Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders, worms) that automatically visit Web sites and, starting.
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
University of Kansas Department of Electrical Engineering and Computer Science Dr. Susan Gauch April 2005 I T T C Dr. Susan Gauch Personalized Search Based.
Introduction Web Development II 5 th February. Introduction to Web Development Search engines Discussion boards, bulletin boards, other online collaboration.
Web Archive Information Retrieval Miguel Costa, Daniel Gomes (speaker) Portuguese Web Archive.
Information Retrieval
How Does a Search Engine Work? Part 1 Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial-
What’s The Difference??  Subject Directory  Search Engine  Deep Web Search.
Search Engines and their Public Interfaces: Which APIs are the Most Synchronized? Frank McCown and Michael L. Nelson Department of Computer Science, Old.
1 Web Developer Foundations: Using XHTML Chapter 11 Web Page Promotion Concepts.
Introductions Search Engine Development COMP 475 Spring 2009 Dr. Frank McCown.
Web Characterization: What Does the Web Look Like?
Search Engines and Information Retrieval Chapter 1.
Introduction to SEO August 2011 NowSourcing, Inc..
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
Searching the Web Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Web Search. Structure of the Web n The Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure  The.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
How Does a Search Engine Work? Part 1 Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
CIS 430 November 6, 2008 Emily Pitler. 3  Named Entities  1 or 2 words  Ambiguous meaning  Ambiguous intent 4.
Understanding and Predicting Personal Navigation Date : 2012/4/16 Source : WSDM 11 Speaker : Chiu, I- Chih Advisor : Dr. Koh Jia-ling 1.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
Query trends CS 349 Presentation December 2 nd, 2008 Catherine Grevet.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Effectiveness of Folksonomies on the World Wide Web P. Jason Morrison.
Search Engines.
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
Meet the web: First impressions How big is the web and how do you measure it? How many people use the web? How many use search engines? What is the shape.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Search Tools and Search Engines Searching for Information and common found internet file types.
Understanding User Goals in Web Search University of Seoul Computer Science Database Lab. Min Mi-young.
Evaluation of the NSDL and Google for Obtaining Pedagogical Resources Frank McCown, Johan Bollen, and Michael L. Nelson Old Dominion University Computer.
 Who Uses Web Search for What? And How?. Contribution  Combine behavioral observation and demographic features of users  Provide important insight.
Post-Ranking query suggestion by diversifying search Chao Wang.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 10 Evaluation.
Lecture 4 Access Tools/Searching Tools. Learning Objectives To define access tools To identify various access tools To be able to formulate a search strategy.
Sampath Jayarathna Cal Poly Pomona
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Dr. Frank McCown Comp 250 – Web Development Harding University
Evaluation Anisio Lacerda.
Evaluation of IR Systems
Jim Barton Librarian Glenside Public Library District
Instructor Name Instructor Title Library Name
Detecting Online Commercial Intention (OCI)
Data Mining Chapter 6 Search Engines
Anatomy of a Search Search The Index:
Agenda What is SEO ? How Do Search Engines Work? Measuring SEO success ? On Page SEO – Basic Practices? Technical SEO - Source Code. Off Page SEO – Social.
Panagiotis G. Ipeirotis Luis Gravano
Searching for Truth: Locating Information on the WWW
INF 141: Information Retrieval
Presentation transcript:

Searching the Web Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 3.0 Unported LicenseCreative Commons Attribution-NonCommercial- ShareAlike 3.0 Unported License

How do you locate information on the Web? When seeking information online, one must choose the best way to fulfill one’s information need Most popular: – Web directories – Search engines – primary focus of this lecture – Social media

Web Directories Pages ordered in a hierarchy Usually powered by humans Yahoo started as a web directory in 1994 and still maintains one: Open Directory Project (ODP) is largest and is maintained by volunteers

Search Engines Most often used to fill an information need Pages are collected automatically by web crawlers Users enter search terms into text box get back a SERP (search engine result page) Queries are generally modified and resubmitted to the SE if the desired results are not found on the first few pages of results Types of search engines: – Web search engines (Google, Bing, Baidu)Baidu – Metasearch engines – includes Deep Web (Dogpile, WebCrawler)Dogpile WebCrawler – Specialized (or focused) search engines (Google Scholar, MapQuest)Google Scholar MapQuest

Components of a Search Engine Figure from Introduction to Information Retrieval by Manning et al., Ch 19.Introduction to Information Retrieval

Search query Paid results Organic results SERP Page title Text snippet

Social Media Increasingly being used to find info Limits influence of results to trusted group Figure: Nielsen study (August 2009)

Search Queries Search engines store every query, but companies usually don’t share with the public because of privacy issues – 2006 AOL search log incident 2006 AOL search log incident – 2006 govt subpoenas Google incident 2006 govt subpoenas Google incident Often short: 2.4 words on average 1 but getting longer 2 Most users do not use advanced features 1 Distribution of terms is long-tailed 3 1 Spink et al., Searching the web: The public and their queries,2001Searching the web: The public and their queries Lempel & Moran, WWW 2003

Search Queries 10-15% contain misspellings 1 Often repeated: Yahoo study 2 showed 1/3 of all queries are repeat queries, and 87% of users click on same result 1 Cucerzan & Brill, Teevan et al., History Repeats Itself: Repeat Queries in Yahoo's Logs, Proc SIGIR 2006

Query Classifications Informational – Intent is to acquire info about a topic – Examples: safe vehicles, albert einstein Navigational – Intent is to find a particular site – Examples: facebook, google Transactional – Intent is to perform an activity mediated by a website – Examples: children books, cheap flights Broder, Taxonomy of web search, SIGIR Forum, 2002

Determining Query Type It is impossible to know the user’s intent, but we can guess based on the result(s) selected Example: safe vehicles – Informational if selects web page about vehicle safety – Navigational if selects safevehicle.com – Transactional if selects web page that sells safe vehicles Requires access to SE’s transaction logs Broder, Taxonomy of web search, SIGIR Forum, 2002

Query Classifications Study by Jansen et al., Determining the User Intent of Web Search Engine Queries, WWW 2007 (link)link Analyzed transaction logs from 3 search engines containing 5M queries Findings – Informational: 80.6% – Navigational: 10.2% – Transactional: 9.2%

Google Trends

Google Flu Trends

Google Zeitgeist

My 2010 Top Queries, Sites, & Clicks

My 2010 Monthly, Daily, & Hourly Search Activity

Relevance Search engines are useful if they return relevant results Relevance is hard to pin down because it depends on user’s intent & context which is often not known Relevance can be increased by personalizing search results – What is the user’s location? – What queries has this user made before? – How does the user’s searching behavior compare to others? Two popular metrics are used to evaluate whether the results returned by a search engine are relevant: precision and recall

Precision and Recall Corpus Retrieved Relevant Overlap Precision = Overlap / Retrieved Recall = Overlap / Relevant

Example Given a corpus of 100 documents 20 are about football Search for football results in 50 returned documents, 10 are about football Precision = Overlap / Retrieved = 10/50 =.2 Recall = Overlap / Relevant = 10/20 =.5 Note: Usually precision and recall are at odds

High Precision, Low Recall Corpus RetrievedRelevant Precision = Overlap / Retrieved Recall = Overlap / Relevant Missing a lot of relevant docs!

Low Precision, High Recall Corpus Retrieved Relevant Precision = Overlap / Retrieved Recall = Overlap / Relevant Lots of irrelevant docs!

Evaluating Search Engines We don’t usually know how many documents on the entire Web are about a particular topic, so computing recall for a web search engine is not possible Most people view only the first page or two of search results, so the top N results are most important where N is 10 or 20 is the precision of the top N results

Comparing Search Engine with Digital Library McCown et al. 1 compared the of Google and the National Science Digital Library (NSDL) School teachers evaluated relevance of search results in regards to Virginia's Standards of Learning Overall, Google’s precision was found to be 38.2% compared to NSDL’s 17.1% 1 McCown et al., Evaluation of the NSDL and Google search engines for obtaining pedagogical resources, Proc ECDL 2005

F-score F-score combines precision and recall into single metric F-score is harmonic mean of precision and recall Highest = 1, Lowest = 0

Quality of Ranking Issue search query to SE and have humans rank the first N results in order of relevance Compare human ranking with SE ranking (e.g., Spearman rank-order correlation coefficient) Other ranking methods can be used – Discounted cumulative gain (DCG) 2 which gives higher ranked results more weight than lower ranked results – M measure 3 similar function as DCG which gives sliding scale of importance based on ranking 1 Vaughan, New measurements for search engine evaluation proposed and tested, Info Proc & Mang (2004) 2 Järvelin & Kekäläinen, Cumulated gain-based evaluation of IR techniques, TOIS (2004) 3 Bar-Ilan et al., Methods for comparing rankings of search engine results, Computer Networks (2006)

Other Sources Mark Levene, An Introduction to Search Engines and Web Navigation, Ch 2 & 4, 2010 Steven Levy, Exclusive: How Google’s Algorithm Rules the Web, Wired Magazine google_algorithm/ google_algorithm/