How Search Engines Work?
Ziv Bar-Yossef, Department of Electrical Engineering, Technion


1 How Search Engines Work?
Ziv Bar-Yossef, Department of Electrical Engineering, Technion

2 What is the Internet?
A global network of computers connected to each other
Computers “talk” to each other using standard protocols
  - TCP/IP

3 What is the World-Wide Web (WWW)?
Collection of pages available via the Internet
  - Internet users can view pages with web browsers
  - WWW is only one application of the Internet
  - Other applications: email, messengers, VoIP, newsgroups, FTP

4 Web Pages
Various formats
  - PDF, Word, Excel, images, MP3, video, text
Most popular format: HTML
  - HTML pages point to each other using hyperlinks
  - Users “surf the web” by clicking hyperlinks

5 What are Search Engines?
Users have “information needs”
  - Where can I find solutions to my math homework problem?
  - Where can I find MP3s of Miri Messika’s latest album?
  - What is the weather in Eilat during Hanukkah?
  - What other Sharons are famous, except for our prime minister?
Search engines enable us to find web pages that match our information needs

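6 Search Engines
[Diagram: the user turns an “information need” into a query; the search engine matches the query against web pages from the Web and returns a ranked list of matching pages]
Information need: What other Sharons are famous, except for our prime minister?
Query: sharon -ariel
Ranked list of matching pages:
  1. Sharon Creech
  2. Sharon Stone
  3. Sharon, Massachusetts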

7 How Search Engines (don’t) Work?
Common misconception: when a user submits a query, the search engine scans all web pages to find the relevant matches
[Diagram: the user sends the query “sharon -ariel” to the search engine, which scans the web pages of the Web and returns a ranked list of matching pages: 1. Sharon Creech, 2. Sharon Stone, 3. Sharon, Massachusetts]

8 How Search Engines Work?
What do you do when you look for a term in an encyclopedia?
  - Use the index!
[Diagram: the user sends the query “sharon -ariel” to the search engine, which looks it up in an index built from the web pages of the Web and returns a ranked list of matching pages: 1. Sharon Creech, 2. Sharon Stone, 3. Sharon, Massachusetts]

9 Search Engine Architecture
[Diagram: components of a search engine]
  - Crawler
  - Index
  - Ranking Algorithm
  - Query Processor

10 Web Crawler (a.k.a. Spider)
Fetches web pages and stores them in a local repository
Tries to get as many web pages as possible
Follows hyperlinks to learn about new pages
Refetches pages that change frequently
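The crawler's fetch-store-follow loop can be sketched as a breadth-first traversal. This is a minimal illustration, not how any real search engine implements crawling: the toy web `TOY_WEB`, the `*.example` addresses, and the `crawl` and `fetch_links` names are all assumptions made for the example, and refetching of changed pages is left out.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
}

def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl: fetch a page, store it, queue its unseen links."""
    repository = {}            # local repository: url -> outgoing links
    frontier = deque([seed])   # pages discovered but not yet fetched
    seen = {seed}              # avoids queueing the same page twice
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        links = fetch_links(url)
        repository[url] = links
        for link in links:     # follow hyperlinks to learn about new pages
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository

repo = crawl("a.example", lambda url: TOY_WEB.get(url, []))
print(sorted(repo))
```

Starting from a single seed page, the sketch discovers all four toy pages purely by following hyperlinks.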

11 The Index
Example pages (each word is followed by its position in the page):
  www.cnn.com: Ariel(1) Sharon(2), the(3) prime(4) minister(5) of(6) Israel(7) founded(8) a(9) new(10) political(11) party(12).
  www.hollywood.com: Sharon(1) Stone(2) dressed(3) a(4) new(5) Jean(6) Paul(7) Gaultier(8) gown(9) at(10) the(11) Oscars(12) after(13) party(14).
Index:
  ariel: (cnn.com,1)
  dress: (hollywood.com,3)
  found: (cnn.com,8)
  gaultier: (hollywood.com,8)
  gown: (hollywood.com,9)
  israel: (cnn.com,7)
  jean: (hollywood.com,6)
  minister: (cnn.com,5)
  new: (cnn.com,10), (hollywood.com,5)
  oscar: (hollywood.com,12)
  party: (cnn.com,12), (hollywood.com,14)
  paul: (hollywood.com,7)
  political: (cnn.com,11)
  prime: (cnn.com,4)
  sharon: (cnn.com,2), (hollywood.com,1)
  stone: (hollywood.com,2)
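An index of this kind (an inverted index with word positions) could be built roughly as follows. This is a sketch under stated assumptions: the stop-word list is inferred from the words missing in the slide's index, and no stemming is applied, so "founded", "dressed" and "oscars" stay unstemmed here, unlike "found", "dress" and "oscar" on the slide. The names `build_index`, `DOCS` and `STOP_WORDS` are made up for the example.

```python
import re
from collections import defaultdict

# Assumed stop-word list (the words that do not appear in the slide's index).
STOP_WORDS = {"the", "of", "a", "at", "after"}

DOCS = {
    "cnn.com": "Ariel Sharon, the prime minister of Israel founded a new political party.",
    "hollywood.com": "Sharon Stone dressed a new Jean Paul Gaultier gown at the Oscars after party.",
}

def build_index(docs):
    """Map each term to its posting list of (page, position) pairs."""
    index = defaultdict(list)
    for url, text in docs.items():
        words = re.findall(r"[a-z]+", text.lower())
        # Positions count every word, including stop words, starting at 1.
        for position, word in enumerate(words, start=1):
            if word not in STOP_WORDS:
                index[word].append((url, position))
    return index

index = build_index(DOCS)
print(index["sharon"])  # [('cnn.com', 2), ('hollywood.com', 1)]
print(index["party"])   # [('cnn.com', 12), ('hollywood.com', 14)]
```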

12 Index by “Anchor Text”
Anchor text: what’s written inside a link
  - Example: in “Ariel Sharon, the prime minister…”, the linked words “Ariel Sharon” are the anchor text
Usually succinctly describes what’s written in the linked page
By which terms is a page listed in the index?
  - Terms that appear in the page
  - Terms that appear in the anchor text of links to the page
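Indexing by anchor text might look like the sketch below, which uses Python's standard `html.parser` module to collect the text inside each link and credits those terms to the linked page rather than to the page containing the link. The `AnchorCollector` class, the sample HTML and the `http://target.example/` address are assumptions for illustration only.

```python
from html.parser import HTMLParser
from collections import defaultdict

class AnchorCollector(HTMLParser):
    """Collects (target_url, anchor_text) pairs from <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # target of the link currently being read
        self._text = []     # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical page containing one link.
page = '<p><a href="http://target.example/">Ariel Sharon</a>, the prime minister</p>'
collector = AnchorCollector()
collector.feed(page)

# List the linked page under each term of the anchor text.
anchor_index = defaultdict(set)
for target, text in collector.anchors:
    for term in text.lower().split():
        anchor_index[term].add(target)

print(dict(anchor_index))
```

The target page ends up listed under "ariel" and "sharon" even if those words never appear in its own text.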

13 Query Processor
Gets a user query
Fetches relevant posting lists from the index
Extracts relevant matches from the lists
Example: Query = “sharon -ariel”
  - L1 = posting list of sharon
    sharon: (cnn.com,2), (hollywood.com,1)
  - L2 = posting list of ariel
    ariel: (cnn.com,1)
  - Return all pages in L1 that do not occur in L2: hollywood.com (cnn.com is dropped because it occurs in L2)
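The example query can be worked through in code. The sketch below hard-codes the two posting lists from the slide and treats a leading "-" as "exclude pages containing this term"; the `process_query` function and its handling of several required terms (intersection) are assumptions for illustration.

```python
# Posting lists taken from the example: term -> [(page, position), ...]
INDEX = {
    "sharon": [("cnn.com", 2), ("hollywood.com", 1)],
    "ariel": [("cnn.com", 1)],
}

def process_query(query, index):
    """Return pages containing every plain term and none of the '-' terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-"):
            excluded.append(token[1:])
        else:
            required.append(token)
    if not required:
        return []

    def pages(term):
        # Only the page part of each posting matters for matching.
        return {url for url, _position in index.get(term, [])}

    matches = pages(required[0])
    for term in required[1:]:
        matches &= pages(term)   # keep pages that contain all required terms
    for term in excluded:
        matches -= pages(term)   # drop pages that contain an excluded term
    return sorted(matches)

print(process_query("sharon -ariel", INDEX))  # ['hollywood.com']
```

Only the posting lists of the query terms are read; no page is scanned at query time.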

14 Ranking Algorithm
Many queries have many matching pages
  - 472 million matches for “London” in Google
Cannot return all of them to the user
  - User needs the most relevant results anyway
Need to order results by relevance
  - Most relevant results are at the top
Ranking algorithm: a method of ordering matches
  - The “heart” of a search engine
  - The reason why Google is the most preferred search engine today

15 Google’s PageRank
Ranking as an election
  - Candidates: all web pages
  - Voters: all web pages
  - p votes for q if p has a hyperlink to q
Favorites(p) = all the pages p votes for
Fans(p) = all the pages that vote for p
PageRank(p) = 1 if p has no fans; otherwise, the sum over all q in Fans(p) of PageRank(q) / |Favorites(q)|

16 Google’s PageRank
Underlying principles:
  - A page is “important” if it has important fans
  - A page splits its “importance” evenly among its favorite pages
[Figure: example link graph with page scores 1, 1, 1, 1, 1.5, 2.5, 4]
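The two principles can be turned into a small iterative computation. The sketch below is a simplified reading of the slides, not Google's actual algorithm: a page with no fans gets score 1, and every other page gets the sum of its fans' scores, each divided by that fan's number of favorites. The production PageRank formulation also includes a damping factor, which is omitted here. The four-page graph is hypothetical (it is not the graph from the slide's figure) and is acyclic so that the scores settle after a few rounds.

```python
def simple_pagerank(favorites, iterations=20):
    """favorites maps each page to the list of pages it links to (votes for)."""
    pages = set(favorites) | {q for qs in favorites.values() for q in qs}

    # Fans(p): all pages that vote for p.
    fans = {p: [] for p in pages}
    for p, qs in favorites.items():
        for q in qs:
            fans[q].append(p)

    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            if not fans[p]:
                new_rank[p] = 1.0   # a page with no fans keeps score 1
            else:
                # Each fan q splits its score evenly among its favorites.
                new_rank[p] = sum(rank[q] / len(favorites[q]) for q in fans[p])
        rank = new_rank
    return rank

# Hypothetical graph: a links to c and d, b links to c, c links to d.
graph = {"a": ["c", "d"], "b": ["c"], "c": ["d"]}
scores = simple_pagerank(graph)
print(sorted(scores.items()))
```

In this toy graph, a and b stay at 1, c receives 0.5 from a plus 1 from b, and d receives 0.5 from a plus c's full 1.5.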

17 Google’s PageRank
Ranking algorithm:
  - Find pages that match the given query
  - Order them by their PageRank
  - Return the top 10 matches
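The three steps above can be sketched as a single function. The page names and scores below are invented for illustration, and `rank_results` is a hypothetical helper; it assumes the matching pages and their PageRank scores have already been computed.

```python
def rank_results(matching_pages, pagerank, top_k=10):
    """Order matching pages by PageRank (highest first) and keep the top_k."""
    ordered = sorted(matching_pages,
                     key=lambda page: pagerank.get(page, 0.0),
                     reverse=True)
    return ordered[:top_k]

# Hypothetical PageRank scores for three pages that match a query.
PAGERANK = {
    "sharon-ma.example": 1.5,
    "sharon-creech.example": 4.0,
    "sharon-stone.example": 2.5,
}
matches = ["sharon-ma.example", "sharon-creech.example", "sharon-stone.example"]

for position, page in enumerate(rank_results(matches, PAGERANK), start=1):
    print(position, page)
```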

18 But… PageRank Does Not Always Work
Spam

19 Conclusions
Search engines use an index to answer user queries
Ranking is the most important component
Spam is a problem

20 Thank You

