February 17, 2011
There is no practical obstacle whatever now to the creation of an efficient index to all human knowledge, ideas and achievements, to the creation, that is, of a complete planetary memory for all mankind. And not simply an index; the direct reproduction of the thing itself can be summoned to any properly prepared spot. … This in itself is a fact of tremendous significance. It foreshadows a real intellectual unification of our race. The whole human memory can be, and probably in a short time will be, made accessible to every individual.
H. G. Wells (1937)
The Web is one of the facilities or services provided by certain of the computers on the Internet: a logical network of web pages that need not be on physically connected computers.
[Figure: a web of linked pages: http://www.ksg.harvard.edu/, http://www.president.harvard.edu/, http://www.news.harvard.edu/gazette/…, http://www.harvard.edu, http://www.brighamandwomens.org/PressReleases/…]
Your computer sends a request for “www.president.harvard.edu” across the Internet to Harvard’s computer, and receives HTML code in return.
URL = Uniform Resource Locator
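To make the request/response exchange concrete, here is a minimal sketch of what the first step looks like: splitting the URL into its parts and building the text of a plain HTTP GET request for it. It uses only the standard `urllib.parse.urlsplit`; the helper name `describe_request` is hypothetical, it assumes plain `http://` when no scheme is typed, and it builds the request without actually sending it over the network.

```python
from urllib.parse import urlsplit

def describe_request(url: str) -> tuple[str, str]:
    """Return (host, request_text) for a minimal HTTP/1.1 GET of `url`.

    Hypothetical helper: it only builds the request text; nothing is sent.
    """
    # A bare host name such as "www.president.harvard.edu" is treated as http://.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"            # an empty path means the site's root page
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"   # tells Harvard's computer which site we want
        "Connection: close\r\n"
        "\r\n"                          # a blank line ends the request headers
    )
    return parts.hostname, request

host, text = describe_request("www.president.harvard.edu")
print(host)                   # www.president.harvard.edu
print(text.splitlines()[0])   # GET / HTTP/1.1
```

The reply to such a request is the HTML code that the browser then renders.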
We know where you are!
… search companies log your searches …
A search engine has two jobs: finding pages that refer to the search terms, and deciding which of those pages are the most “relevant.”
1. Build an index ahead of time:
   Eddington: URL, URL, …
   Edison: URL, URL, …
   Edmonton: URL, URL, …
2. When queried, look up the search terms in the index.
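The two steps can be sketched as a tiny in-memory inverted index: step 1 maps every word to the set of URLs that contain it, and step 2 answers a query by intersecting the sets for its words. This is an illustration under simplifying assumptions (lower-cased alphanumeric words, everything fits in memory), not how any particular search engine stores its index; the page texts and `example.org` URLs are made up.

```python
import re
from collections import defaultdict

def words_in(text: str) -> list[str]:
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Step 1 (ahead of time): map each word to the URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in words_in(text):
            index[word].add(url)
    return dict(index)

def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Step 2 (at query time): return the URLs that contain every query word."""
    words = words_in(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))   # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages; the URLs are placeholders.
pages = {
    "http://example.org/a": "Edison invented the phonograph",
    "http://example.org/b": "Eddington measured starlight; Edison is mentioned too",
    "http://example.org/c": "Edmonton is a city in Alberta",
}
index = build_index(pages)
print(sorted(lookup(index, "Edison")))  # ['http://example.org/a', 'http://example.org/b']
```

All the expensive work happens in `build_index`; answering a query is just a few lookups, which is why the index is built ahead of time.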
Google “crawls” the entire Web, following links and loading the pages they point to.
Every time it retrieves a page, it indexes everything on the page, and it may keep a “cached” copy of the page.
A complete crawl probably takes a week or two.
Issues: Opt-out? Caching and copyrights?
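A crawl of this kind can be sketched as a breadth-first search over links. In the sketch below the Web is replaced by a hypothetical dictionary `LINKS`, and `fetch` is a stand-in for actually downloading a page; a real crawler would also index each page's text where the comment indicates. Note that page `E`, which nothing links to, is never found from seed `A`: a crawler only sees what is reachable by following links.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": [],
    "E": ["A"],   # nothing links to E, so a crawl starting at A never finds it
}

def fetch(url):
    """Stand-in for downloading a page: returns its text and its outgoing links."""
    return f"text of page {url}", LINKS.get(url, [])

def crawl(seeds):
    """Breadth-first crawl: retrieve each reachable page once and cache its text."""
    cache = {}                 # url -> cached copy of the page text
    frontier = deque(seeds)    # pages waiting to be retrieved
    seen = set(seeds)          # pages already queued, so none is fetched twice
    while frontier:
        url = frontier.popleft()
        text, links = fetch(url)
        cache[url] = text      # a real crawler would also index `text` here
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return cache

print(sorted(crawl(["A"])))   # ['A', 'B', 'C', 'D']
```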
Primary storage: silicon memory chips, up to a gigabit or more. Random access: it takes the same time to reach any datum.
Disk delays: seek delay (moving the read head to the right track) and rotational latency (waiting for the data to spin around under the head).
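The two delays add up, and rotational latency is easy to estimate: on average the wanted data is half a revolution away. The sketch below assumes a disk spinning at 7200 RPM, a common speed that is not taken from the slides, and uses the 5 ms seek time that appears on the next slide; it ignores the time to actually transfer the data.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the wanted sector to rotate under the head.

    On average the sector is half a revolution away, so the wait is
    half the time of one full revolution.
    """
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0

def avg_access_ms(seek_ms: float, rpm: float) -> float:
    """Seek delay plus average rotational latency (transfer time ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)

# 7200 RPM is an assumed, typical figure; 5 ms is the seek time used in the slides.
print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
print(round(avg_access_ms(5.0, 7200), 2))         # 9.17
```

Either way, a disk access costs milliseconds, not nanoseconds.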
Primary: approaching 1 ns = $10^{-9}$ sec
Secondary: seek time 5 ms = $5 \cdot 10^{-3}$ sec
Secondary is $(5 \cdot 10^{-3}) / 10^{-9} = 5 \cdot 10^{6}$, i.e. 5 million times slower.
Imagine that a bookshelf is primary memory and getting a book takes 10 sec. Getting a book from secondary storage would then take $10 \cdot 5 \cdot 10^{6} = 5 \cdot 10^{7}$ sec, more than a year and a half.
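The bookshelf arithmetic can be checked in a few lines. The only figure added here is the length of a year (365.25 days), used to convert seconds into years.

```python
PRIMARY_S = 1e-9                       # primary memory access: about 1 ns
SECONDARY_S = 5e-3                     # disk seek: about 5 ms
BOOKSHELF_S = 10                       # analogy: getting a book off the shelf takes 10 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

slowdown = SECONDARY_S / PRIMARY_S     # how many times slower secondary storage is
disk_book_s = BOOKSHELF_S * slowdown   # the same slowdown applied to the bookshelf

print(f"{slowdown:.0f} times slower")                 # 5000000 times slower
print(f"{disk_book_s / SECONDS_PER_YEAR:.2f} years")  # 1.58 years
```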
Binary search works only if the items are in order and it takes the same amount of time to access any item. Then finding an item in a table of length n takes at most about lg n steps. For example, n = 1 billion gives lg n ≈ 30 steps.
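A minimal binary search that also counts its probes shows where the lg n bound comes from: each probe halves the part of the table that could still hold the target. The sketch uses Python's `range` as a stand-in for a sorted table of one billion entries, since a `range` can be indexed in constant time without storing its elements; for that size the search never needs more than 30 probes.

```python
def binary_search(table, target):
    """Search a sorted table; return (index or None, number of probes used)."""
    lo, hi = 0, len(table) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if table[mid] == target:
            return mid, probes
        if table[mid] < target:
            lo = mid + 1      # the target can only be in the upper half
        else:
            hi = mid - 1      # the target can only be in the lower half
    return None, probes       # not present

# A sorted "table" of one billion entries; range stores no elements,
# yet supports constant-time indexing, like random-access memory.
table = range(1_000_000_000)
print(binary_search(table, 123_456_789))
```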
The lexicon (Eddington, Edison, Edmonton, …) is kept in primary memory; each entry points to its list of pages (URL, URL, …), which is kept in secondary memory.
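The split between the two kinds of memory can be sketched as follows: all the lists of URLs are written one after another into a single file, and the lexicon, a sorted list of words kept in memory, records where each word's list starts and how long it is. A query then costs one binary search in primary memory plus one seek-and-read on the disk. In this sketch an `io.BytesIO` object stands in for the disk file, and the word lists and `example.org` URLs are hypothetical.

```python
import io
from bisect import bisect_left

# Hypothetical lists of pages; the URLs are placeholders.
postings = {
    "eddington": ["http://example.org/1", "http://example.org/7"],
    "edison": ["http://example.org/2"],
    "edmonton": ["http://example.org/3", "http://example.org/4", "http://example.org/9"],
}

def write_index(postings):
    """Write every URL list into one 'secondary storage' file.

    Returns the file plus a lexicon for primary memory: the sorted words and,
    for each word, the (offset, length) of its list inside the file.
    """
    disk = io.BytesIO()                 # stands in for a file on disk
    words, locations = [], []
    for word in sorted(postings):
        data = "\n".join(postings[word]).encode("utf-8")
        words.append(word)
        locations.append((disk.tell(), len(data)))
        disk.write(data)
    return disk, (words, locations)

def read_list(disk, lexicon, word):
    """Binary search in the in-memory lexicon, then one read from the 'disk'."""
    words, locations = lexicon
    i = bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return []
    offset, length = locations[i]
    disk.seek(offset)                   # a single seek on secondary storage
    return disk.read(length).decode("utf-8").split("\n")

disk, lexicon = write_index(postings)
print(read_list(disk, lexicon, "edmonton"))
```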
There are many, many tricks for compressing both the index and the lists of URLs. The notes show how a lexicon with 25 million entries might fit in 16 GB of primary storage. The lists of URLs might be vastly larger, but that is OK as long as a single disk access brings back a lot of URLs.
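One widely used trick, shown here as an illustration rather than as the method in the notes, is to replace each URL by a numeric page ID, sort the IDs, store only the gaps between consecutive IDs, and write each gap in a variable number of bytes (7 bits per byte). Small gaps then take a single byte. The page IDs below are hypothetical.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers 7 bits per byte; a set high bit marks a number's last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)     # lowest 7 bits first
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80               # flag the least significant byte, written last
        out.extend(reversed(chunk))    # most significant byte first
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:                   # last byte of this number
            numbers.append(n)
            n = 0
    return numbers

def compress_postings(page_ids):
    """Sort the IDs, keep the first one plus the gaps, then vbyte-encode."""
    ids = sorted(page_ids)
    if not ids:
        return b""
    gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    """Undo compress_postings by adding the gaps back up."""
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        ids.append(total)
    return ids

ids = [1_000_000, 1_000_003, 1_000_010, 1_000_500]   # hypothetical page IDs
data = compress_postings(ids)
print(len(data))                  # 7
print(decompress_postings(data))  # [1000000, 1000003, 1000010, 1000500]
```

Stored as plain 4-byte integers, these four IDs would take 16 bytes; the gap encoding needs 7.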
Ranking is hugely important commercially; page rank is really a new kind of capital. People try to “spoof” ranking algorithms, and search engineers try to detect and discount the spoofing: an endless game of cat and mouse …
Probably wrong. Also easy to spoof.
[Figure: example sites www.holdthisspear.co.uk, Thedailddoozy.com, about.com]
Circular? Not really. We can calculate a consistent meaning of “importance” in which every page’s importance is the sum of the importance passed along by the pages pointing to it, each page dividing its own importance among the pages it links to. This is like scholarly citations of scholarly papers.
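The circularity is resolved by iteration: start with every page equally important, then repeatedly let each page hand its current importance out in equal shares to the pages it links to. For a well-behaved graph the numbers settle down to a consistent assignment. The sketch below assumes a small hypothetical graph in which every page has at least one outgoing link (otherwise importance would leak away) and omits the extra adjustments that production ranking systems add.

```python
def importance(links, iterations=100):
    """Iteratively share each page's importance equally among the pages it links to.

    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}   # start with equal importance
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for page, targets in links.items():
            share = score[page] / len(targets)     # split importance among out-links
            for t in targets:
                new[t] += share                    # importance = sum of incoming shares
        score = new
    return score

# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print({p: round(s, 3) for p, s in importance(links).items()})
# {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

The result is consistent in the sense described above: C receives half of A's 0.4 plus all of B's 0.2, giving 0.4, and A receives all of C's 0.4.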
Web surfing metric: if you wander the Web at random, how likely are you to wind up at a given page? Page A is ranked higher than page B if you are more likely to wind up at A during a completely random meandering through the Web.
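The random-surfer idea can be simulated directly: follow random links for many steps and count how often each page is visited. The sketch adds one assumption not stated on the slide, a small probability (0.15 here, a commonly quoted value) of jumping to a random page instead of following a link, which also rescues the surfer from pages with no outgoing links. The graph is hypothetical, and the seeded generator makes the run repeatable.

```python
import random

def random_surfer(links, steps=200_000, restart=0.15, seed=1):
    """Estimate the fraction of time a random surfer spends on each page.

    Each step follows a random outgoing link; with probability `restart`,
    or when the page has no links, the surfer jumps to a random page instead.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < restart or not links[page]:
            page = rng.choice(pages)          # jump anywhere
        else:
            page = rng.choice(links[page])    # follow a random link
    return {p: v / steps for p, v in visits.items()}

# Hypothetical graph: B and C both link only to A, so A should rank highest.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
freq = random_surfer(links)
print(max(freq, key=freq.get))  # A
```

Page A comes out on top because every path through B or C leads straight back to it.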
Google’s mission: “to organize the world's information and make it universally accessible and useful.”
Brin: “The perfect search engine would understand exactly what you mean and give back exactly what you want”