Published by Milo Sutton. Modified over 9 years ago.
1
Meet the web: First impressions How big is the web and how do you measure it? How many people use the web? How many use search engines? What is the shape of the web? How hard is it to go from one page to another? How do people search for information? Can we categorize web searchers? Differences b/w web search & Information Retrieval. Differences between global and local search. Differences between search and navigation.
2
How big is the web? Number of accessible web pages – May 2005 estimate: 11.5 Billion pages Most recent estimate? ________ The deep (or hidden or invisible) web “contains 400-550 times more information” that means __________ pages. Do others agree? Coverage (i.e. the proportion of the web indexed) is crucial for search engines. Today, ____________ pages are indexed
3
How do you measure the size of the web? Capture-recapture method.
SE1 = # of pages indexed by search engine 1.
QSE2 = # of pages returned by search engine 2 for typical queries.
OVR = # of pages returned by both search engines for typical queries.
Estimate: SE1 / WWW = OVR / QSE2 => WWW = (SE1 x QSE2) / OVR
(Figure labels: SE1, OVR, QSE2, WWW.) Lawrence & Giles: Searching the WWW
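The estimate above can be sketched in a few lines of Python. This is only an illustration of the formula WWW = (SE1 x QSE2) / OVR; the function name `estimate_web_size` and the input figures are made up for the example and are not measurements from Lawrence & Giles. The formula rests on the assumption that the share of search engine 2's results also found in search engine 1 (OVR / QSE2) matches the share of the whole web that search engine 1 has indexed (SE1 / WWW).

```python
def estimate_web_size(se1: float, qse2: float, ovr: float) -> float:
    """Capture-recapture estimate: WWW = (SE1 * QSE2) / OVR."""
    if ovr <= 0:
        # With no overlap the ratio OVR / QSE2 is zero and no estimate exists.
        raise ValueError("overlap must be positive to form an estimate")
    return se1 * qse2 / ovr


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
estimate = estimate_web_size(se1=100_000_000, qse2=500, ovr=125)
print(f"{estimate:,.0f}")  # prints 400,000,000
```

Here a quarter of engine 2's results (125 of 500) are also in engine 1, so engine 1 is taken to cover a quarter of the web, which puts the whole at four times its index size.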
4
Relative Size from Overlap
Sample URLs randomly from A. Check if contained in B, and vice versa.
A ∩ B = (1/2) * Size A
A ∩ B = (1/6) * Size B
(1/2) * Size A = (1/6) * Size B
Size A / Size B = (1/6) / (1/2) = 1/3
Each test involves: (i) Sampling (ii) Checking (Assume for now that we can do them reliably)
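A small simulation may make the overlap argument concrete. It is a sketch under stated assumptions: the two "indexes" are invented sets of example.com URLs, and `overlap_fraction` and `relative_size` are hypothetical helper names. Sampling and checking are assumed to be reliable, as on the slide.

```python
from __future__ import annotations

import random


def overlap_fraction(sample_from: list[str], other: set[str], n: int,
                     rng: random.Random) -> float:
    """Sample n URLs from one index and return the fraction found in the other."""
    sample = rng.sample(sample_from, n)
    return sum(url in other for url in sample) / n


def relative_size(frac_a_in_b: float, frac_b_in_a: float) -> float:
    """Size A / Size B, from frac_a_in_b * Size A = frac_b_in_a * Size B."""
    return frac_b_in_a / frac_a_in_b


# Invented indexes: A holds pages 0..299, B holds pages 150..1049,
# so they share 150 pages (1/2 of A, 1/6 of B).
index_a = [f"http://example.com/page{i}" for i in range(0, 300)]
index_b = [f"http://example.com/page{i}" for i in range(150, 1050)]

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
frac_a_in_b = overlap_fraction(index_a, set(index_b), n=200, rng=rng)
frac_b_in_a = overlap_fraction(index_b, set(index_a), n=200, rng=rng)
print(relative_size(frac_a_in_b, frac_b_in_a))  # close to 1/3, up to sampling noise
```

With only 200 samples per side the result is an approximation; sampling every URL reproduces the exact ratio of 1/3.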
5
How many people use the web? SEs? Over 10% of the world’s population were online as of late 2004. Today? ________ Number of broadband users is growing (over 50% of connected Americans use broadband). Search engine share as of June 2004: Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%) Today? _______ 200 million hits per day to Google (mid 2004). Today? ___
6
What is the shape of the web? “Map of the Internet” (1998)
7
Example Look at paths and strongly connected components
8
What is the shape of the web? Bow-tie shape of the web. Broder et al.: Graph structure of the web (2000)
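To see what the bow-tie regions mean, here is a minimal sketch that splits a toy directed graph into a strongly connected core (SCC), the pages that can reach it (IN), the pages reachable from it (OUT), and everything else. It is not the procedure used by Broder et al.; it assumes we already know one seed page that lies in the core, and the graph, the node names and the function names are all invented for illustration.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every link direction flipped."""
    rev = {}
    for src, dsts in graph.items():
        rev.setdefault(src, [])
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph, seed):
    """Split nodes into SCC / IN / OUT / OTHER relative to the seed's component."""
    forward = reachable(graph, seed)            # pages the seed can reach
    backward = reachable(reverse(graph), seed)  # pages that can reach the seed
    core = forward & backward                   # mutually reachable: the SCC
    nodes = set(graph) | {d for dsts in graph.values() for d in dsts}
    return {
        "SCC": core,
        "IN": backward - core,
        "OUT": forward - core,
        "OTHER": nodes - forward - backward,    # tendrils and disconnected pages
    }


# Toy web: a -> b -> c -> a forms the core; in1 links into it, out1 is linked from it.
toy = {
    "in1": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "out1"],
    "out1": [],
    "island": ["island2"],
}
parts = bow_tie(toy, seed="a")
for name, members in parts.items():
    print(name, sorted(members))
```

On this toy graph the core is {a, b, c}, IN is {in1}, OUT is {out1}, and the two island pages fall outside the bow-tie.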
9
How hard is it to go from one page to another? Over 75% of the time there is no directed path from one random web page to another. When a directed path exists, its average length is 16 clicks. When an undirected path exists, its average length is 7 clicks. A short average path between pairs of nodes is characteristic of a small-world network. Kleinberg: The small-world phenomenon (we will revisit later)
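The two statistics on this slide (how often no path exists, and the average length when one does) can be computed for any small graph with breadth-first search. The sketch below uses an invented four-page chain and hypothetical function names; it only shows how the quantities are defined and says nothing about the real web's values.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return the click distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def undirected(graph):
    """Treat every link as usable in both directions."""
    sym = {node: set(nbrs) for node, nbrs in graph.items()}
    for node, nbrs in graph.items():
        for nbr in nbrs:
            sym.setdefault(nbr, set()).add(node)
    return sym


def path_stats(graph):
    """Return (fraction of ordered pairs with no path, average path length)."""
    nodes = list(graph)  # assumes every node appears as a key
    pairs = len(nodes) * (len(nodes) - 1)
    lengths = []
    for src in nodes:
        dist = bfs_distances(graph, src)
        lengths.extend(d for node, d in dist.items() if node != src)
    no_path = 1 - len(lengths) / pairs
    average = sum(lengths) / len(lengths) if lengths else float("nan")
    return no_path, average


# Invented toy web: a single chain of links a -> b -> c -> d.
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}

print("directed:  ", path_stats(chain))
print("undirected:", path_stats(undirected(chain)))
```

In the chain, half of the ordered page pairs have no directed path (you cannot click backwards), while every pair is connected once links are treated as undirected.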
10
How do people search for information? Direct navigation Enter the URL directly into the browser. Navigation within a directory Use a web portal as an entry point to the web. Information seeking on the web is problematic and more users are turning to search engines. Broder: A taxonomy of web search
11
How do people search for information? Query formulation Result selection Surfing Query modification
12
Can we categorize web searchers? Informational ____ %: acquire some information about a topic from web pages. Navigational ____ %: find a site to start navigation from. Transactional ____ %: perform some activity mediated by a web site. Broder: A taxonomy of web search. Think of your own searches. Do you agree? How did Broder find these categories? How did he measure the percentages?
13
Web search vs. Info Retrieval The scale of web search is far beyond that of traditional information retrieval. The web is very dynamic. The web contains an enormous amount of duplication. The quality of web pages is not uniform. The range of topics on the web is open. The web is globally distributed. Users' typical habits are different (short queries, inspect only the top-10 results). The web is hypertextual.
14
Differences b/w global & local search Local search engines on web sites have a bad reputation. Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the local web site search engine. Many companies do not invest in local search. Content management is a problem. Language may be a problem. Information needs on web sites may be different.
15
Differences b/w search & navigation Search – employing a search engine to find information. Navigation (or surfing) – employing a link-following strategy to find information. The web encourages a combination of search, navigation and browsing.