Published by Adela O’Neal. Modified over 9 years ago.
Search engine note
Search Signals
“Heuristics” that allow search results to be sorted
– Word based: frequency, position, …
– HTML based: emphasis, header
– URI based: server name, URL
– Page based: not dependent on the search term, but on features of the page; PageRank is the most important
Search results are ranked by a combination of these signals
Anchor text
Other pages, images, documents, etc. are linked via “anchors”
– E.g., …, etc.
Text around the anchor describes the linked page
– UFOs are stealing our cows!
These words are indexed under the LINKED page, not the page that contains the link
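The anchor-text idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it indexes the text inside the `<a>` element (a narrow reading of "text around the anchor"), the names `AnchorTextIndexer` and `index_anchor_text` are made up for this example, and the URL is a placeholder.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextIndexer(HTMLParser):
    """Files the words inside <a href=...> under the LINKED URL (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(set)  # word -> set of linked URLs
        self._href = None              # target of the anchor we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        # Only text seen while inside an anchor is indexed.
        if self._href:
            for raw in data.lower().split():
                word = raw.strip(".,!?")
                if word:
                    self.index[word].add(self._href)


def index_anchor_text(html):
    parser = AnchorTextIndexer()
    parser.feed(html)
    parser.close()
    return dict(parser.index)


# Placeholder page: the link text describes the page being linked TO.
html = '<p>Breaking: <a href="http://example.com/cows">UFOs are stealing our cows!</a></p>'
anchor_index = index_anchor_text(html)
print(anchor_index["cows"])
```

The word "cows" ends up pointing at the linked URL, while "Breaking", which sits outside the anchor, is not indexed at all.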
Search “algorithm”
Single- or multi-word query
– For every word in the query, find the pages the word occurs on and compute
    – Group 1: pages with all of those words (intersection)
    – Group 2: pages with any of those words (union)
– For every page in the returned set, sort by the formula
    – k1 * signal1 + k2 * signal2 + … + kn * signaln
    – (having the k’s sum to 1 is computationally advantageous)
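A minimal Python sketch of the steps above, assuming an inverted index that maps each word to a set of pages. The two signals, their weights, and the sample data are invented for illustration, and listing Group 1 ahead of the remaining Group 2 pages is an assumption the slide does not state.

```python
# Hypothetical sample data: word -> pages, per-page term counts, per-page PageRank.
INVERTED_INDEX = {"ufo": {"a.html", "b.html"}, "cows": {"b.html", "c.html"}}
TERM_COUNTS = {
    "a.html": {"ufo": 3},
    "b.html": {"ufo": 1, "cows": 1},
    "c.html": {"cows": 4},
}
PAGE_RANK = {"a.html": 0.5, "b.html": 0.2, "c.html": 0.3}


def frequency_signal(page, words):
    """Word-based signal: how often the query words occur on the page."""
    return sum(TERM_COUNTS[page].get(w, 0) for w in words)


def pagerank_signal(page, words):
    """Page-based signal: independent of the query words."""
    return PAGE_RANK[page]


def search(query, inverted_index, signals, weights):
    words = query.lower().split()
    postings = [inverted_index.get(w, set()) for w in words]
    if not postings:
        return []
    group1 = set.intersection(*postings)     # pages with ALL query words
    group2 = set.union(*postings) - group1   # pages with only SOME of them

    def score(page):
        # k1 * signal1 + k2 * signal2 + ... + kn * signaln
        return sum(k * signal(page, words) for k, signal in zip(weights, signals))

    def rank(pages):
        # Highest score first; ties broken by page name for a stable order.
        return sorted(pages, key=lambda p: (-score(p), p))

    return rank(group1) + rank(group2)


results = search("ufo cows", INVERTED_INDEX,
                 [frequency_signal, pagerank_signal], [0.7, 0.3])
print(results)  # ['b.html', 'c.html', 'a.html']
```

Here `b.html` comes first because it is the only page containing both words; the other two pages are then ordered by their weighted score.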
Indexes
Search index
– For every page, which words occur on that page
– Plus “features” of each word occurrence (location, HTML, etc.)
Inverted (reverse) index
– For every word, which pages it occurs on
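The two indexes can be built in a single pass over the pages. The sketch below is illustrative only: `build_indexes` is a made-up name, the page texts are placeholders, and word position stands in for the richer "features" the slide mentions.

```python
from collections import defaultdict


def build_indexes(pages):
    """pages: dict of page name -> plain text. Returns (search_index, inverted_index)."""
    search_index = {}             # page -> {word: [positions on that page]}
    inverted = defaultdict(set)   # word -> set of pages it occurs on
    for page, text in pages.items():
        occurrences = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            occurrences[word].append(position)  # the "feature" kept here is location
            inverted[word].add(page)
        search_index[page] = dict(occurrences)
    return search_index, dict(inverted)


# Placeholder pages for illustration.
pages = {"a.html": "ufos steal cows", "b.html": "cows eat grass"}
search_index, inverted_index = build_indexes(pages)
print(search_index["a.html"])
print(sorted(inverted_index["cows"]))  # ['a.html', 'b.html']
```

The search index answers "what is on this page?", while the inverted index answers "which pages contain this word?", which is the lookup a query needs.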
Summary
http://www.youtube.com/watch?v=fnSJBpB_OKQ