
Meet the web: First impressions
How big is the web and how do you measure it?
How many people use the web? How many use search engines?
What is the shape of the web?
How hard is it to go from one page to another?
How do people search for information? Can we categorize web searchers?
Differences between web search and information retrieval.
Differences between global and local search.
Differences between search and navigation.

How big is the web?
Number of accessible web pages: May 2005 estimate, 11.5 billion pages. Most recent estimate? ________
The deep (or hidden, or invisible) web “contains times more information”, which means __________ pages. Do others agree?
Coverage (i.e. the proportion of the web that is indexed) is crucial for search engines. Today, ____________ pages are indexed.

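How do you measure the size of the web? The capture-recapture method
SE1 = number of pages indexed by search engine 1.
QSE2 = number of pages returned by search engine 2 for typical queries.
OVR = number of pages returned by both search engines for the same typical queries.
Estimate: $\frac{\mathrm{SE1}}{\mathrm{WWW}} \approx \frac{\mathrm{OVR}}{\mathrm{QSE2}} \Rightarrow \mathrm{WWW} \approx \frac{\mathrm{SE1} \times \mathrm{QSE2}}{\mathrm{OVR}}$
This assumes that search engine 1 covers the pages returned by search engine 2 in the same proportion as it covers the whole web, i.e. that the two engines index pages independently of each other.
(Figure: the sets SE1 and QSE2 within WWW, overlapping in OVR.)
Lawrence & Giles: Searching the WWW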
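The capture-recapture estimate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Lawrence & Giles's actual procedure: the function names are hypothetical, and the simulation treats engine 2's whole index as its "returned pages" (QSE2) and draws both engines' indexes independently from a synthetic web of known size, which is exactly the independence assumption the estimate relies on.

```python
import random


def capture_recapture_estimate(se1: int, qse2: int, ovr: int) -> float:
    """Estimate the size of the web with the capture-recapture formula.

    se1  -- number of pages indexed by search engine 1
    qse2 -- number of pages returned by search engine 2
    ovr  -- number of pages returned by both engines
    """
    if ovr <= 0:
        raise ValueError("no overlap: the estimate is undefined")
    # SE1 / WWW = OVR / QSE2  =>  WWW = SE1 * QSE2 / OVR
    return se1 * qse2 / ovr


def simulate(web_size: int = 100_000, p1: float = 0.4, p2: float = 0.3,
             seed: int = 0) -> float:
    """Hypothetical check: two engines index random, independent subsets
    of a synthetic web of known size; the estimate should land near it."""
    rng = random.Random(seed)
    engine1 = {page for page in range(web_size) if rng.random() < p1}
    engine2 = {page for page in range(web_size) if rng.random() < p2}
    ovr = len(engine1 & engine2)
    return capture_recapture_estimate(len(engine1), len(engine2), ovr)


if __name__ == "__main__":
    print(capture_recapture_estimate(100, 50, 10))  # 500.0
    print(round(simulate()))  # close to 100000
```

If the engines are positively correlated (both favour popular pages), OVR is inflated and the formula underestimates the size of the web, which is one reason such estimates are treated with caution.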

A  B = (1/2) * Size A A  B = (1/6) * Size B (1/2)*Size A = (1/6)*Size B  Size A / Size B = (1/6)/(1/2) = 1/3 Sample URLs randomly from A Check if contained in B and vice versa A BA B Each test involves: (i) Sampling (ii) Checking (Assume for now that we can do them reliably) Relative Size from Overlap

How many people use the web? How many use search engines (SEs)?
Over 10% of the world’s population were online as of late. Today? ________
The number of broadband users is growing (over 50% of connected Americans use broadband).
Search engine share as of June 2004: Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%). Today? _______
Google received 200 million hits per day (mid 2004). Today? ___

What is the shape of the web?
“Map of the Internet” (1998)

Example
Look at the paths and strongly connected components (SCCs) of the graph.
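Two pages are in the same strongly connected component when each can reach the other by following links. The example graph from the slide is not in the transcript, so the sketch below uses a small hypothetical link graph and Kosaraju's algorithm (one standard SCC algorithm, chosen here for brevity; the slides do not name one). It uses recursive depth-first search, so it is only suitable for small graphs.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Kosaraju's algorithm: return a list of SCCs, each a set of pages.

    `graph` maps each page to the list of pages it links to.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 1: record pages in order of DFS completion on the original graph.
    visited, order = set(), []

    def dfs_forward(u):
        visited.add(u)
        for v in graph.get(u, ()):
            if v not in visited:
                dfs_forward(v)
        order.append(u)

    for u in nodes:
        if u not in visited:
            dfs_forward(u)

    # Pass 2: DFS on the reversed graph in reverse completion order;
    # every tree found this way is one strongly connected component.
    assigned, components = set(), []

    def dfs_reverse(u, component):
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                dfs_reverse(v, component)

    for u in reversed(order):
        if u not in assigned:
            component = set()
            dfs_reverse(u, component)
            components.append(component)
    return components


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["c"], "c": ["a", "d"],
             "d": ["e"], "e": ["d"], "f": ["a"]}
    print(sorted(sorted(c) for c in strongly_connected_components(links)))
    # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

Here a, b and c form a cycle, d and e link to each other, and f only links into the graph, so it is a component on its own.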

What is the shape of the web?
Bow-tie shape of the web
Broder et al.: Graph structure in the web (2000)
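The bow-tie regions can be computed with two breadth-first searches from a page in the central SCC: pages reachable both forwards and backwards form the core, pages that can only reach the core form IN, and pages only reachable from it form OUT. This is a minimal sketch, assuming the caller already knows a page in the giant SCC (`core_node` is a hypothetical parameter); tendrils, tubes and disconnected pages are lumped together as OTHER rather than separated as in Broder et al.

```python
from collections import deque


def reachable(graph, start):
    """All pages reachable from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(graph, core_node):
    """Split pages into bow-tie regions around the SCC containing core_node."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reverse = {u: [] for u in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    forward = reachable(graph, core_node)     # pages the core can reach
    backward = reachable(reverse, core_node)  # pages that can reach the core
    core = forward & backward
    return {
        "SCC": core,
        "IN": backward - core,
        "OUT": forward - core,
        "OTHER": nodes - forward - backward,  # tendrils, tubes, islands
    }


if __name__ == "__main__":
    links = {"in1": ["c1", "t1"], "c1": ["c2"], "c2": ["c1", "out1"], "iso": []}
    for region, pages in bow_tie(links, "c1").items():
        print(region, sorted(pages))
```

In this toy graph `t1` hangs off the IN side (a tendril) and `iso` is disconnected, so both end up in OTHER.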

How hard is it to go from one page to another?
Over 75% of the time there is no directed path from one random web page to another.
When a directed path exists, its average length is 16 clicks.
When an undirected path exists, its average length is 7 clicks.
A short average path between pairs of nodes is characteristic of a small-world network.
Kleinberg: The small-world phenomenon (we will revisit this later)
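The statistics on this slide (fraction of page pairs with no directed path, and average path length in clicks, directed and undirected) can be sketched on a toy graph with breadth-first search from every page. This is an illustration on a hypothetical four-page graph, not how the published figures were obtained: running BFS from every page costs O(V·E), so at web scale one would have to sample start pages.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length in clicks from `source` to every reachable page."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(graph):
    """Return (fraction of ordered page pairs with no path,
    average path length over the pairs that are connected)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    total_pairs = len(nodes) * (len(nodes) - 1)
    connected, length_sum = 0, 0
    for source in nodes:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                connected += 1
                length_sum += d
    no_path_fraction = 1 - connected / total_pairs
    avg_length = length_sum / connected if connected else float("inf")
    return no_path_fraction, avg_length


def to_undirected(graph):
    """Add the reverse of every link so paths may follow links either way."""
    undirected = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            undirected.setdefault(u, set()).add(v)
            undirected.setdefault(v, set()).add(u)
    return undirected


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
    print(path_statistics(links))                 # (0.25, 1.666...)
    print(path_statistics(to_undirected(links)))  # (0.0, 1.333...)
```

As on the real web, ignoring link direction connects more pairs (here all of them) and shortens the average path.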

How do people search for information?
Direct navigation: enter the URL directly into the browser.
Navigation within a directory: use a web portal as an entry point to the web.
Information seeking on the web is problematic, and more users are turning to search engines.
Broder: A taxonomy of web search

How do people search for information?
Query formulation
Result selection
Surfing
Query modification

Can we categorize web searchers?
Informational (____ %): acquire some information about a topic from web pages.
Navigational (____ %): find a site to start navigation from.
Transactional (____ %): perform some activity mediated by a web site.
Broder: A taxonomy of web search
Think of your own searches. Do you agree? How did Broder find these categories? How did he measure the percentages?

Web search vs. information retrieval
The scale of web search is far beyond that of traditional information retrieval.
The web is very dynamic.
The web contains an enormous amount of duplication.
The quality of web pages is not uniform.
The range of topics on the web is open.
The web is globally distributed.
Users' typical habits are different (short queries; they inspect only the top-10 results).
The web is hypertextual.

Differences between global and local search
Local search engines on web sites have a bad reputation.
Users often use a web search engine such as Google or Yahoo! to find information on a web site, rather than the site's own local search engine.
Many companies do not invest in local search.
Content management is a problem.
Language may be a problem.
Information needs on web sites may be different.

Differences between search and navigation
Search: employing a search engine to find information.
Navigation (or surfing): employing a link-following strategy to find information.
The web encourages a combination of search, navigation and browsing.