Ch. 13 Structure of the Web Padmini Srinivasan Computer Science Department Department of Management Sciences


1 Ch. 13 Structure of the Web Padmini Srinivasan Computer Science Department Department of Management Sciences http://cs.uiowa.edu/~psriniva padmini-srinivasan@uiowa.edu

2 Origins Origins of WWW (1989/1990: HTTP) – Sir Tim Berners-Lee & Robert Cailliau. First prototype of browser: WorldWideWeb. 1st popular graphical browser: Mosaic (NCSA), Marc Andreessen and others – Mozilla -> Netscape -> Firefox. Lynx 2000 Windows explorer. WAIS, Gopher, Veronica. 1994: W3C. 1993: 1st World Wide Web conference. 1995: Yahoo! 1998: Google. 2006: Live Search -> Bing.

3 Network Metaphor Information network: – Different from social network. Notion of a logical document: different – Decentralized, over many computers – annotation. Network metaphor: “inspired and non-obvious”. Origins in hypertext – origins in citation nets. Citation nets: distinctly temporal, web? – Citation maps (popular): co-citation; bibliographic coupling; h-index (Hirsch); g-index; f-index – Patents; legal cases (precedents); medical literature. Indexes: cross-linkages; see also; Wikipedia
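
The citation measures named on this slide can be made concrete with a small sketch. It assumes the usual definitions: the h-index is the largest h such that h papers have at least h citations each; co-citation counts how many papers cite both members of a pair; bibliographic coupling counts the references two papers share. The `cites` dictionary and its paper names are hypothetical toy data, not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def co_citation(cites):
    """For each pair of papers, count how many papers cite both."""
    counts = defaultdict(int)
    for refs in cites.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def bibliographic_coupling(cites, p, q):
    """Number of references that papers p and q have in common."""
    return len(set(cites.get(p, ())) & set(cites.get(q, ())))


# Hypothetical toy data: paper -> list of papers it cites.
cites = {"A": ["C", "D"], "B": ["C", "D", "E"], "C": ["E"], "D": [], "E": []}
```

Here C and D are co-cited by both A and B, while A and B are bibliographically coupled through the same two references. Co-citation can grow over time as new papers appear; coupling is fixed once both papers are published.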

4 Links/Associations Directed edges, – Friendship nets, name-recognition, business colleagues, collaboration [Erdos number, Bacon number], IM nets, email graphs etc. – paths, shortest paths… Associative memory Semantic nets aka Conceptual networks (free-association studies) Vannevar Bush “As We May Think” (1945) Atlantic Monthly. WW2. MEMEX (on web) – Associative connections between all of knowledge – Acknowledged by most – A way to rechannel human resources
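
Measures such as the Erdős number or Bacon number are shortest-path lengths in a collaboration graph. A minimal sketch using breadth-first search follows; the `coauthors` graph and every name in it other than "Erdos" are invented for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical collaboration graph; undirected, so each edge is listed both ways.
coauthors = {
    "Erdos": ["Alice", "Bob"],
    "Alice": ["Erdos", "Carol"],
    "Bob": ["Erdos"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Carol"],
    "Eve": [],
}

erdos_number = bfs_distances(coauthors, "Erdos")
```

Nodes with no path to the source (here "Eve") simply do not appear in the result, which corresponds to an undefined or infinite Erdős number.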

5 Paths and Connectivity Connected graphs. Path: a sequence of nodes beginning at node X and ending at node Y, in which each consecutive pair of nodes is joined by an edge. A directed graph is strongly connected if there is a directed path from every node to every other node. If it is not strongly connected, we need to examine its ‘reachability’ properties. – Easier in an undirected graph: it splits into connected components – Directed? Find strongly connected components
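
One way to test strong connectivity, sketched here under the assumption that the graph is stored as a dictionary of adjacency lists: pick any node, then check that it reaches every node and that every node reaches it (the second check is a search on the reversed graph). The function names are illustrative only.

```python
def reachable(graph, start):
    """Set of nodes reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Copy of the graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for node, targets in graph.items():
        for t in targets:
            rev.setdefault(t, []).append(node)
    return rev


def is_strongly_connected(graph):
    """True if every node has a directed path to every other node."""
    nodes = set(graph) | {t for ts in graph.values() for t in ts}
    if not nodes:
        return True
    start = next(iter(nodes))
    return (reachable(graph, start) == nodes
            and reachable(reverse(graph), start) == nodes)


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}
```

The three-node cycle is strongly connected; the chain is not, because nothing leads back from "c" to "a".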

6 Strongly Connected Component An SCC in a directed graph is a subset of nodes such that – (1) every node in it has a path to every other node in it – (2) the subset is not part of a larger set of nodes that has the same property. [So it is a maximal such subset.] Why is it interesting to know about such components in the Web?
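
The two conditions above can be computed directly. The sketch below uses Kosaraju's two-pass algorithm, which is one standard choice and not necessarily the method used in the studies discussed in these slides. The graph representation (dictionary of adjacency lists) and the toy graph are assumptions made for illustration.

```python
def strongly_connected_components(graph):
    """Return a list of node sets, one per SCC (Kosaraju's algorithm)."""
    nodes = set(graph) | {t for ts in graph.values() for t in ts}

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All neighbours explored: node is finished.
                order.append(node)
                stack.pop()

    # Build the reversed graph.
    rev = {}
    for node, targets in graph.items():
        for t in targets:
            rev.setdefault(t, []).append(node)

    # Pass 2: search the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in rev.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy graph with three SCCs: {a, b, c}, {d, e}, and {f}.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"],
       "d": ["e"], "e": ["d"], "f": ["a"]}
components = strongly_connected_components(toy)
largest = max(components, key=len)
```

Each set returned is maximal in the sense of condition (2): "f" can reach the cycle a, b, c but nothing leads back to it, so it forms its own single-node component.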

7 Bow-Tie Structure of the Web 1999 Andrei Broder (now Yahoo!), then Alta Vista. SCC; IN; OUT; Tendrils; Tubes; Disconnected. Macro-model – Properties of a reasonable model: Should have a succinct and fairly natural description. Rooted in a plausible macro-level process for creation of Web content. Should not require some prior static set of topics. Should reflect many of the structural phenomena observed in the Web
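
The main bow-tie regions can be recovered from two reachability searches. The sketch below assumes a seed page known to lie in the giant SCC: the SCC is the set of nodes both reachable from the seed and able to reach it, IN is everything else that can reach the seed, and OUT is everything else reachable from it. Tendrils, tubes, and disconnected pages are lumped together as "OTHER" for brevity. The `web` graph and its node names are hypothetical.

```python
def reachable(graph, start):
    """Set of nodes reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse_graph(graph):
    """Copy of the graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for node, targets in graph.items():
        for t in targets:
            rev.setdefault(t, []).append(node)
    return rev


def bow_tie(graph, seed):
    """Split nodes into SCC, IN, OUT, and OTHER relative to the seed's SCC."""
    nodes = set(graph) | {t for ts in graph.values() for t in ts}
    forward = reachable(graph, seed)                  # seed can reach these
    backward = reachable(reverse_graph(graph), seed)  # these can reach seed
    core = forward & backward
    return {
        "SCC": core,
        "IN": backward - core,
        "OUT": forward - core,
        "OTHER": nodes - forward - backward,  # tendrils, tubes, disconnected
    }


# Hypothetical miniature web: a two-page core with one IN page, one OUT page,
# a tendril hanging off IN, and an isolated page.
web = {
    "in1": ["a", "tendril"],
    "a": ["b"],
    "b": ["a", "out1"],
    "out1": [],
    "tendril": [],
    "island": [],
}
regions = bow_tie(web, "a")
```

Separating tendrils from tubes would need further reachability checks from the IN and OUT sets; that refinement is omitted here.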

8 Similar Studies Donato et al., ACM TOIT, 2007. The Web as a Graph: How Far We Are. WebBase, 200 million page Stanford crawl – 39% OUT; 11% IN; 13% Tendrils; 33% SCC (48 million); next SCC: 10 thousand!

9 Similar Studies Buriol et al. (includes Donato): Temporal analysis of Wikigraph.

10 Bow-Tie Why a single SCC? Why not two large ones? Any other explanations? – Interlinked world? – Hard to be disconnected? – What about a new page? Is the SCC static/fixed? How does it change? – Are links permanent? (2004: 25% remain after 1 year and 50% of pages stay the same; Ntoulas et al., 2004) Many naturally occurring graphs have a giant SCC – IM (nodes: people, links: messages): almost all are in the SCC; median path length is 7, mean 6.6.

11 Bow-Tie: points to note Incomplete picture – Doesn’t tell you how this is generated, just that it is. – Macro model: Thematic collections; differences? Organization specific collections Regional: economic incentives/disincentives? Community based: education levels? Bipartite cliques (small sized – many in number) – Fans pointing to centers – Will it always be observed? How about now?

12 Web 2.0 “an attitude not a technology” – Collaboration/collective maintenance Annotation, tags, links, editing, revisions – Data generated by individuals for individual and group sharing; Flickr, Gmail. – Connections between entities beyond “documents”. Social feedback key; ‘wisdom of crowds’; long tail;

13 Web Links Navigational – static pages – passive services Transactional – dynamic / computational services. Deep web Search engines – heuristics – What kinds of rules would you use? – Implications for crawlers

14 Summary Web: origins, network metaphor – Citations, MEMEX Paths Structures (macro) – SCC – Bow-Tie model Next – Ch 14: Hubs and Authorities; PageRank

