CS 115: COMPUTING FOR THE SOCIO-TECHNO WEB FINDING INFORMATION WITH SEARCH ENGINES.

1 CS 115: COMPUTING FOR THE SOCIO-TECHNO WEB FINDING INFORMATION WITH SEARCH ENGINES

2 SEARCHING VS SURFING Search = employing a search engine to find information. Surfing (or navigating) = employing a link-following strategy to find information. The web encourages a combination of search and navigation.

3 SURFING THE WEB

4 THE WEB IS A DIRECTED GRAPH Like a map of a country with cities and one-way roads. A directed graph of nodes and arcs (one-way connections): nodes = web pages; arcs = hyperlinks from one page to another. Why is this cool? Because it can be explored and it can be indexed.
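
To make the nodes-and-arcs picture concrete, here is a minimal sketch of a tiny web stored as an adjacency list. The five page names and their links are made up for illustration; they are not taken from the slides.

```python
# Hypothetical five-page web: each key is a page (node),
# each list holds the pages it links to (outgoing arcs).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
    "E": [],
}


def in_links(graph):
    """Invert the arcs: map each page to the pages that link to it."""
    incoming = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


incoming = in_links(links)
print(incoming["C"])  # pages that link to C
print(incoming["D"])  # nothing links to D, so surfing can never reach it
```

Because the arcs are one-way, the out-links and in-links of a page can differ: here D links to C, but no page links back to D.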

5 GOOGLE

6 HOW SEARCH ENGINES WORK Diagram labels: The Web (pages A, B, C, D, E), Web spider, Indexer, Indexes.

7 HOW SEARCH ENGINES WORK Diagram labels: The Web (pages A, B, C, D, E), Web spider, Indexer, Indexes, Ad indexes, Search, User.
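
The slides name an indexer and indexes but do not say how the index is organized. One common layout is an inverted index, which maps each word to the set of pages containing it. The sketch below assumes that layout; the page texts and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical page contents, as a spider might have fetched them.
pages = {
    "A": "web search engines index pages",
    "B": "spiders crawl the web",
    "C": "search the index",
}


def build_index(pages):
    """Indexer step: map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Search step: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(pages)
print(search(index, "search index"))
```

The point of the design is that a query never scans the pages themselves; it only looks up a few words in the prebuilt index and intersects the resulting sets.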

8 BINARY SEARCH Halve things each time
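
"Halve things each time" can be sketched as follows: a lookup in a sorted list that discards half of the remaining range on every step. The step counter is added only for illustration.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


numbers = list(range(1024))
position, steps = binary_search(numbers, 1023)
print(position, steps)
```

With 1,024 sorted items, the loop needs at most 11 comparisons, because the range can only be halved about log2(1024) = 10 times before it is empty.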

9 MECHANICS OF A TYPICAL SEARCH

10 WEB DIRECTORIES ORGANIZE INFORMATION IN CATEGORIES WITH HUMAN HELP

11 WHAT ARE YOU TRYING TO FIND? Types of queries: Informational (want to learn about something); Navigational (want to go to that page); Transactional (want to do something web-mediated): access a service, downloads, shop; Gray areas: find a good hub (resource collection), exploratory search ("see what's there"). Example queries: Peripheral neuropathy; Wellesley College; Wellesley weather; Mars surface images; Nikon SLR camera; car rental Boston; morality of abortion.

12 HOW FAR DO YOU LOOK FOR RESULTS?

13

14 DIVERSITY IN CONTENT Languages: hundreds of languages (2001). Home pages (1997): English 82%, next 15 languages 13%. Google's index (mid 2001): English 53%, JGCFSKRIP 30%. This trend is expected to continue.
Popular query topics (from 1 million Google queries, Apr 2000):
  1.8%  Regional: Europe             7.2%  Business
  …
  2.3%  Business: Industries         7.3%  Recreation
  3.2%  Computers: Internet          8%    Adult
  3.4%  Computers: Software          8.7%  Society
  4.4%  Adult: Image Galleries      10.3%  Regional
  5.3%  Regional: North America     13.8%  Computers
  6.1%  Arts: Music                 14.6%  Arts

15

16

17 QUESTIONS ABOUT THE WEB How big is the Web? How many people use the Web? How many people use search engines? How hard is it to go from one page to another through clicks? What is the shape of the Web?

18 HOW BIG IS THE WEB? Number of accessible web pages (the visible web): Google claims to have encountered 1 trillion unique URLs (though in the past it claimed to have indexed 26.6 billion pages); Yahoo claims to have indexed 55 billion pages; Cuil claims to have indexed 120 billion pages. The deep web (or hidden or invisible web) “contains 400-550 times more information.” Coverage (i.e., the proportion of the web indexed) is crucial for search engines. Today, fewer than 15% of pages are indexed!

19 CAN YOU MEASURE THE SIZE OF THE WEB? How do you count fish in a lake? Lincoln-Petersen method (aka capture-mark-recapture method). M = # of fish captured, marked, and released on the first visit. C = # of fish captured on the second visit. R = # of marked fish among those captured on the second visit. N = total # of fish in the lake (the unknown). Estimate: M/N = R/C => N = (M x C) / R
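
The estimate N = (M x C) / R can be checked with a few lines of code. This is a minimal sketch; the function name and the fish counts are invented for illustration.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Estimate population size N from M (marked), C (caught on the
    second visit), and R (marked fish among the second catch)."""
    if recaptured <= 0:
        raise ValueError("need at least one recaptured fish to estimate N")
    return marked * caught / recaptured


# Hypothetical counts: mark 100 fish, later catch 80, of which 20 are marked.
estimate = lincoln_petersen(marked=100, caught=80, recaptured=20)
print(estimate)  # 400.0
```

The reasoning: 20 of the 80 fish in the second catch (one quarter) carry a mark, so the 100 marked fish are assumed to be about one quarter of the whole lake.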

20 CAN YOU MEASURE THE SIZE OF THE WEB? Capture-mark-recapture method. SE1 = # of pages indexed by search engine 1. QSE2 = # of pages returned by search engine 2 for typical queries. OVR = # of pages returned by both search engines for typical queries. WWW = total # of pages on the web (the unknown). Estimate: SE1 / WWW = OVR / QSE2 => WWW = (SE1 x QSE2) / OVR
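
Applied to search engines, the overlap OVR can be computed as a set intersection. The sketch below uses made-up URL sets standing in for real search results; `estimate_web_size` and all the data are hypothetical.

```python
def estimate_web_size(se1_index, se2_results):
    """WWW = (SE1 x QSE2) / OVR, with the overlap found by set intersection."""
    se1 = len(se1_index)                 # SE1: pages indexed by engine 1
    qse2 = len(se2_results)              # QSE2: pages returned by engine 2
    ovr = len(se1_index & se2_results)   # OVR: pages known to both
    if ovr == 0:
        raise ValueError("no overlap between the engines; cannot estimate")
    return se1 * qse2 / ovr


# Toy data: engine 1 indexes pages 0..59; engine 2 returns pages 50..69.
se1_index = {f"page{i}" for i in range(60)}
se2_results = {f"page{i}" for i in range(50, 70)}

web_size = estimate_web_size(se1_index, se2_results)
print(web_size)  # 120.0
```

Half of engine 2's results (10 of 20) are in engine 1's index, so engine 1 is assumed to cover about half of the web, giving 60 x 2 = 120 pages.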

21 HOW MANY PEOPLE USE THE WEB? 87% of Americans are online. 70% of Americans use broadband at home. 68% of Americans access the internet on a cell phone, tablet, or other mobile device. 39% of the world had internet access in 2013. What does this tell you about the importance of the Web?

22 HOW MANY PEOPLE USE SEARCH ENGINES? 49% of all internet users use a search engine on a daily basis. 622 million queries per day (18.6 billion searches in April 2014). Search engine usage as of June 2004: Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask (7%). Search engine usage as of April 2014: Google (67.6%), Yahoo! (10%), MSN (18.7%), AOL (1.3%), Ask (2.4%). What does this tell you about the importance of search engines?

23 HOW HARD IS IT TO SURF FROM ONE PAGE TO ANOTHER? Over 75% of the time, there is no directed path from one random web page to another. When a directed path exists, its average length is 16 clicks. A short average path between pairs of nodes is characteristic of a small-world network.
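
Whether a directed path exists, and how many clicks it takes, can be found with a breadth-first search over the link graph. This is a sketch on a made-up five-page web; `clicks_between` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical web: page -> pages it links to.
toy_web = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A"],
    "E": [],
}


def clicks_between(graph, start, goal):
    """Return the fewest clicks from start to goal, or None if no
    directed path exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, distance = queue.popleft()
        for linked in graph.get(page, []):
            if linked == goal:
                return distance + 1
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, distance + 1))
    return None


print(clicks_between(toy_web, "D", "C"))  # 3 clicks: D -> A -> B -> C
print(clicks_between(toy_web, "C", "D"))  # None: links are one-way
```

The second call shows the asymmetry behind the 75% figure: D can reach C, but C cannot reach D, because no page links to D.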

24 WHAT IS THE SHAPE OF THE WEB? “Map of the Internet” (1998)

25 WHAT DOES THE WEB LOOK LIKE? BOW-TIE SHAPE OF THE WEB
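
The slide shows the bow-tie only as a picture. In the usual description of the bow-tie, the knot is a core of pages that can all reach one another, one wing holds pages that can reach the core (IN), and the other holds pages reachable from the core (OUT). The sketch below splits a made-up graph into those parts starting from one page assumed to lie in the core; the region names and helper functions are assumptions, not taken from the slides.

```python
def reachable(graph, start):
    """Return every page reachable from start by following links."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for linked in graph.get(page, []):
            if linked not in seen:
                seen.add(linked)
                stack.append(linked)
    return seen


def reverse(graph):
    """Flip every arc so that in-links become out-links."""
    flipped = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


def bow_tie(graph, core_page):
    """Split the pages into core, IN, OUT, and everything else."""
    forward = reachable(graph, core_page)             # core + OUT
    backward = reachable(reverse(graph), core_page)   # core + IN
    core = forward & backward
    everything = set(graph) | {t for ts in graph.values() for t in ts}
    return {
        "core": core,
        "in": backward - core,
        "out": forward - core,
        "other": everything - forward - backward,
    }


# Hypothetical web: I feeds the core {A, B}, which feeds O; X is isolated.
toy_web = {"I": ["A"], "A": ["B"], "B": ["A", "O"], "O": [], "X": []}
parts = bow_tie(toy_web, "A")
print(parts)
```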

26 A CONSTRUCTIVE ALGORITHM TO PROVE THAT THE WEB IS A BOWTIE Start with disconnected web pages. Examine the shape after 1 link per page is considered. The bowtie appears after the 2nd link per page is considered. After that, the bowtie shape gets stronger.

27 AFTER ONE LINK IS CONSIDERED A collection of pseudo-trees

28 AFTER A SECOND LINK IS CONSIDERED A collection of bowties

29 WHEN MORE LINKS ARE INCLUDED… Consider the combinations of links: within the same bowtie; between bowties.

30 CORRECT THE SHAPE OF THE WEB Bowties are everywhere!

31 EXERCISES 1. Draw a web graph of the course class website. 2. Handout.

32 HOW ABOUT THE CLASS WEB? (Figure label: Crawling starting point.)

33 CAN SE’S COVER ALL THE WEB? (Figure label: Crawling starting point.) Put a starting web page in a queue Q and repeat: pick up a page P from the queue, crawl P, and put on the queue each page that P links to.
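
The queue-based crawl can be sketched as below. Two things are assumptions beyond the slide: a `seen` set, added so that pages linking back to each other are not crawled forever, and an in-memory dictionary standing in for real page fetching. The names `crawl`, `get_links`, and `toy_web` are hypothetical.

```python
from collections import deque

# Hypothetical web: page -> pages it links to.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A"],
    "E": [],
}


def crawl(start, get_links):
    """Crawl breadth-first from start; return pages in crawl order."""
    queue = deque([start])
    seen = {start}          # assumption: skip pages already queued
    crawled = []
    while queue:
        page = queue.popleft()      # pick up a page P from the queue
        crawled.append(page)        # "crawl" P
        for linked in get_links(page):
            if linked not in seen:  # queue each new page P links to
                seen.add(linked)
                queue.append(linked)
    return crawled


order = crawl("A", lambda page: toy_web.get(page, []))
print(order)  # ['A', 'B', 'C']
```

Starting from A, the crawler never finds D or E because no crawled page links to them, which is one reason a search engine cannot cover the whole web from a single starting point.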

