Published by Ezra McCarthy. Modified over 9 years ago.
1
Ch. 13 Structure of the Web
Padmini Srinivasan
Computer Science Department; Department of Management Sciences
http://cs.uiowa.edu/~psriniva
padmini-srinivasan@uiowa.edu
2
Origins
Origins of WWW (1989/1990: HTTP)
– Sir Tim Berners-Lee & Robert Cailliau
First prototype of browser: WorldWideWeb
1st popular graphical browser: Mosaic (NCSA), Marc Andreessen and others
– Mozilla -> Netscape -> Firefox
Lynx 2000 Windows explorer
WAIS, Gopher, Veronica
1994: W3C
1994: 1st World Wide Web conference
1995: Yahoo!
1998: Google
2006: Live Search -> Bing
3
Network Metaphor
Information network:
– Different from a social network
Notion of a logical document: different
– Decentralized, over many computers
– Annotation
Network metaphor: “inspired and non-obvious”
Origins in hypertext
– Origins in citation nets
Citation nets are distinctly temporal (citations point back in time); is the Web?
– Citation maps (popular): co-citation; bibliographic coupling; h-index (Hirsch); g-index; f-index
– Patents; legal cases (precedents); medical literature
Indexes: cross-linkages; “see also”; Wikipedia
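The citation measures named on this slide can be made concrete with a small sketch. It assumes a toy citation map `cites` (a dict from each paper to the set of papers it cites) and a plain list of per-paper citation counts; the function names and the data are illustrative, not from the slides. The g-index here is capped at the number of papers, which is one common convention. The f-index is left out.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


def co_citation(cites, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def bibliographic_coupling(cites, a, b):
    """Number of references that a and b have in common."""
    return len(set(cites.get(a, ())) & set(cites.get(b, ())))


# Hypothetical toy data: p1..p3 are citing papers, x, y, z are cited papers.
cites = {"p1": {"x", "y"}, "p2": {"x", "y", "z"}, "p3": {"y"}}
counts = [10, 8, 5, 4, 3]
print(h_index(counts), g_index(counts))
print(co_citation(cites, "x", "y"), bibliographic_coupling(cites, "p1", "p2"))
```

Co-citation looks at who cites a pair of papers; bibliographic coupling looks at what the pair itself cites. On the Web the same two counts apply to pages and hyperlinks.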
4
Links/Associations
Directed edges
– Friendship nets, name recognition, business colleagues, collaboration [Erdős number, Bacon number], IM nets, email graphs, etc.
– Paths, shortest paths…
Associative memory
Semantic nets, aka conceptual networks (free-association studies)
Vannevar Bush, “As We May Think” (1945), Atlantic Monthly. WW2. MEMEX (on web)
– Associative connections between all of knowledge
– Acknowledged by most
– A way to rechannel human resources
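An Erdős or Bacon number is the shortest-path distance to one fixed person in a collaboration graph. A minimal sketch using breadth-first search is below. The graph, the author labels, and the function name `distances_from` are made up for illustration, and edges are stored in both directions because collaboration is symmetric.

```python
from collections import deque


def distances_from(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:  # first visit is along a shortest path
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical co-authorship graph (undirected, so each edge appears twice).
collab = {
    "Erdos": ["A", "B"],
    "A": ["Erdos", "C"],
    "B": ["Erdos"],
    "C": ["A", "D"],
    "D": ["C"],
    "E": [],  # never collaborated: no finite Erdos number
}
numbers = distances_from(collab, "Erdos")
print(numbers)
```

Nodes missing from the result (here "E") are unreachable from the source, which corresponds to an undefined (infinite) number.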
5
Paths and Connectivity
Connected graphs
Path: a sequence of nodes beginning at node X and ending at node Y, in which each consecutive pair of nodes is joined by an edge.
A directed graph is strongly connected if there is a directed path from every node to every other node.
If it is not strongly connected, we need to examine its ‘reachability’ properties.
– Easier in an undirected graph: disconnected components
– Directed? Find strongly connected components
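One way to test strong connectivity, sketched here under the assumption that the graph is a dict mapping each node to a list of nodes it links to: pick any node, check that it reaches every node, and check that every node reaches it (the second check is the same search on the reversed graph). The helper names are illustrative.

```python
def all_nodes(graph):
    """Every node that appears as a source or as a link target."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    return nodes


def reachable(graph, start):
    """Set of nodes reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Same nodes, every edge flipped."""
    rev = {}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev


def is_strongly_connected(graph):
    nodes = all_nodes(graph)
    if not nodes:
        return True
    start = next(iter(nodes))
    # start reaches everyone, and everyone reaches start
    return (reachable(graph, start) == nodes
            and reachable(reverse(graph), start) == nodes)


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}
print(is_strongly_connected(cycle), is_strongly_connected(chain))
```

Two searches suffice because any path from X to Y can be routed through the start node: X reaches start, and start reaches Y.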
6
Strongly Connected Component
An SCC in a directed graph is a subset of nodes such that
– (1) every node in it has a path to every other node in it
– (2) the subset is not part of a larger set of nodes that has the same property. [So it is maximal: no node can be added without breaking property (1)]
Why is it interesting to know about such components in the Web?
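The two conditions above can be turned into a computation. The sketch below uses Kosaraju's two-pass method, which is one standard choice and not something the slides prescribe; the graph is assumed to be a dict from each node to the list of nodes it links to, and the example data is made up.

```python
def strongly_connected_components(graph):
    """Return the SCCs of a directed graph as a list of sets (Kosaraju)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: depth-first search on the graph, recording finish order.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    advanced = True
                    break
            if not advanced:  # all neighbours explored: node is finished
                order.append(node)
                stack.pop()

    # Reverse every edge.
    rev = {node: [] for node in nodes}
    for node, targets in graph.items():
        for target in targets:
            rev[target].append(node)

    # Pass 2: search the reversed graph in decreasing finish order;
    # each search collects exactly one component.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


web = {"a": ["b"], "b": ["c"], "c": ["a", "d"],
       "d": ["e"], "e": ["d"], "f": ["a"]}
sccs = strongly_connected_components(web)
print(sccs)
```

Here {a, b, c} and {d, e} are components, and f is a component on its own: it links into the first group but nothing links back to it.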
7
Bow-Tie Structure of the Web
1999: Andrei Broder (now Yahoo!), then at AltaVista
SCC; IN; OUT; Tendrils; Tubes; Disconnected
Macro-model
– Properties of a reasonable model:
– Should have a succinct and fairly natural description
– Rooted in a plausible macro-level process for creation of Web content
– Should not require some prior static set of topics
– Should reflect many of the structural phenomena observed in the Web
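A rough sketch of how the bow-tie regions can be read off a link graph, assuming the usual definitions: IN is everything that can reach the giant SCC but is not in it, OUT is everything reachable from it but not in it. Tendrils, tubes and disconnected pieces are lumped together as "OTHER" to keep the example short. The SCC search here is a simple forward/backward intersection that is only sensible for small graphs; the function names and sample graph are illustrative.

```python
def reach(adj, sources):
    """All nodes reachable from any node in sources (sources included)."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(graph):
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    rev = {}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(node)

    # Largest SCC: the SCC of v is (reachable from v) & (can reach v).
    core, unassigned = set(), set(nodes)
    while unassigned:
        v = next(iter(unassigned))
        component = reach(graph, [v]) & reach(rev, [v])
        unassigned -= component
        if len(component) > len(core):
            core = component

    out_part = reach(graph, core) - core  # downstream of the core
    in_part = reach(rev, core) - core     # upstream of the core
    other = nodes - core - in_part - out_part
    return {"SCC": core, "IN": in_part, "OUT": out_part, "OTHER": other}


# Hypothetical toy Web: a-b-c form the core, i links in, o is linked out to,
# t hangs off IN (a tendril), x -> y is disconnected from the rest.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "o"],
       "i": ["a", "t"], "x": ["y"]}
parts = bow_tie(toy)
print(parts)
```

On a real crawl one would use a linear-time SCC algorithm and then the same two reachability searches from the core.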
8
Similar Studies
Donato et al., “The Web as a Graph: How Far We Are”, ACM TOIT, 2007.
WebBase, 200 million, Stanford crawl
– 39% OUT; 11% IN; 13% Tendrils; 33% SCC (48 million)
– Next-largest SCC: 10 thousand!
9
Similar Studies Buriol et al. (includes Donato): Temporal analysis of Wikigraph.
10
Bow-Tie
Why a single SCC? Why not two large ones? Any other explanations?
– Interlinked world?
– Hard to be disconnected?
– What about a new page?
Is the SCC static/fixed? How does it change?
– Are links permanent? (25% remain after 1 year and 50% of pages stay the same; Ntoulas et al., 2004)
Many naturally occurring graphs have a giant SCC
– IM (nodes are people, links are messages): almost all are in the SCC; median path length is 7, mean 6.6.
11
Bow-Tie: points to note
Incomplete picture
– Doesn’t tell you how this structure is generated, just that it exists.
– Macro model: thematic collections; differences?
– Organization-specific collections
– Regional: economic incentives/disincentives?
– Community based: education levels?
Bipartite cliques (small in size, many in number)
– Fans pointing to centers
– Will it always be observed? How about now?
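To make “fans pointing to centers” concrete, here is a brute-force sketch that looks for small bipartite cliques: groups of at least `n_fans` pages that all link to the same `n_centers` pages. The function name, parameters and toy graph are assumptions for illustration. The enumeration grows combinatorially with out-degree, so it is only meant for tiny examples.

```python
from itertools import combinations


def bipartite_cores(graph, n_fans, n_centers):
    """Return (fans, centers) pairs where every fan links to every center."""
    # For each candidate set of centers, collect the pages linking to all of them.
    fans_of = {}
    for page, targets in graph.items():
        for centers in combinations(sorted(set(targets)), n_centers):
            fans_of.setdefault(centers, set()).add(page)
    return [(fans, set(centers))
            for centers, fans in fans_of.items()
            if len(fans) >= n_fans]


# Hypothetical link graph: f1..f3 are fans of both c1 and c2.
links = {
    "f1": ["c1", "c2"],
    "f2": ["c1", "c2"],
    "f3": ["c1", "c2", "c3"],
    "f4": ["c3"],
}
cores = bipartite_cores(links, n_fans=3, n_centers=2)
print(cores)
```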
12
Web 2.0
“An attitude, not a technology”
– Collaboration/collective maintenance: annotation, tags, links, editing, revisions
– Data generated by individuals for individual and group sharing: Flickr, Gmail
– Connections between entities beyond “documents”
Social feedback is key; ‘wisdom of crowds’; long tail
13
Web Links
Navigational
– Static pages
– Passive services
Transactional
– Dynamic/computational services
Deep web
Search engines: heuristics
– What kinds of rules would you use?
– Implications for crawlers
14
Summary
Web: origins, network metaphor
– Citations, MEMEX
Paths
Structures (macro)
– SCC
– Bow-Tie model
Next
– Ch. 14: Hubs and Authorities; PageRank