From Memex to Google in 120 minutes
Rivka Taub, Amit Levin

“As We May Think” by Vannevar Bush
A paper that talks about the future

Vannevar Bush: Biography

Vannevar Bush: Biography
* Was born in Massachusetts
* Studied engineering at Tufts College
* Earned his bachelor's and master's degrees in 1913
* Earned his doctorate in engineering in 1917

Vannevar Bush: Biography
* In 1919, Bush joined MIT's electrical engineering department and stayed there for 25 years
* Completed the differential analyzer in 1931
* During the 1930s, worked on microfilm-based technology for document retrieval and information organization
* In 1938, designed and built the microfilm rapid selector, rumored to have been used for cryptanalysis during WWII

Vannevar Bush: Biography
* Planned and chaired the NDRC, a committee that brought together government, military, business and scientists
* Supervised the Manhattan Project, which developed the first atomic bomb
* In reply to President Roosevelt's request for post-war direction, published “As We May Think” (1945) and “Science, the Endless Frontier” (1945)
* Served as the chairman of the MIT Corporation
* Continued pushing for analog computers even as digital computers rose to prominence

Bush's Vision
Organizing the information: by science, for science

Technological Predictions - The Record
* Storage: improved microfilm
* Acquisition: dry photography, dictation technology, head-mounted camera

Technological Predictions - The Record
* Retrieval: microfilm rapid selector
* Calculation and automation: machines will manipulate and analyze data, including calculation of “advanced math” and logical thought

Microfilm Rapid Selector
* Microfilm storage was popular during the 1920s and 1930s
* The problem: selecting documents
* One option: punched cards. But they are too slow, and they retrieve only the address of the document, not the document itself
* Goal: a system that combines the documents and the index

The Memex
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” (“As We May Think”, 1945)

The Memex - Features
* Storage on microfilm
* A workstation for viewing and projecting stored documents
* An option to add new images
* An option to add personal comments to a document
* Retrieval by document and by code

So, what's new?
* Associative annotation and selection: “trails”
* Imitation of the human brain

From Memex to Hypertext
* “The 1987 Hypertext conference: The influence of Bush's essay ‘As We May Think’ on the emerging field of hypertext was widely acknowledged” (“From Memex to Hypertext”, Nyce & Kahn, 1991)
* “To a large part we have MEMEXes on our desks today…a web browser with an editor gives quite a good substitute for a MEMEX.” (Berners-Lee, talk at the Bush symposium, MIT, 1995)

BUT… Previous Ideas
* Emanuel Goldberg's statistical machine: a microfilm selector for which a US patent was issued
* Paul Otlet, 1934: “Traité de Documentation”. Described a workstation for scholars that lets them read, write, and select documents. Scholars can connect documents to one another. Coined the term ‘link’

The Memex - Critique
* Trails are artificial: they are personal associations, not an objective measure
* Every user has his own Memex; there is no networking
* Bush predicted the effect of the record on laboratory research, law, and business accounting, but not on the “ordinary person”

The Birth of the Internet and the WWW
* 1969: The Advanced Research Projects Agency (ARPA) prepared a plan for the United States to maintain control over its missiles and bombers after a nuclear attack. Through this work the Internet was born.
* Almost 20 years after the birth of the Internet, the World Wide Web was born to allow the public exchange of information on a global basis. It was built on the backbone of the Internet.

A Brief History of Search Engines
* WWWW (World Wide Web Worm, 1993): Indexed titles and URLs. Listed results in the order it found them
* Excite (1993): Used statistical analysis of word relationships to make searching more efficient
* Yahoo (1994): A collection of favorite websites that became a searchable directory. It provided a description with each URL

A Brief History of Search Engines
* WebCrawler (1994): Indexed entire web pages. Was bought in 1997 by Excite
* Lycos (1994): Provided ranked relevance retrieval and prefix matching
* AltaVista (1995): Had nearly unlimited bandwidth (for that time), allowed natural language queries and advanced searching techniques, and let users add or delete their own URL within 24 hours

“The Anatomy of a Large-Scale Hypertextual Web Search Engine” by S. Brin and L. Page

Google
* Was born at Stanford University
* Was launched in 1998
* Main goal: high-quality search (quality = relevance)

Obstacles
Web:
* Scalability: the web keeps growing, and so does the number of queries
* There is no control over what is published on the web, so it is a heterogeneous collection
Search engines:
* Purely textual search returns many ‘junk results’ (for example, a search engine that does not return itself among the top 10 results when queried by name)
* Commercial search engines: loss of relevance
* Spam

How Google Achieves Quality Search
It makes use of hypertextual information. In particular, it utilizes:
1. The link structure of the web, to calculate a quality ranking for each web page (PageRank)
2. Anchor text, which is associated with the page the link points to: this improves search results and makes it possible to return pages that are not text-based
3. Other features, such as proximity and visual presentation details (e.g. font size)
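To make the first point concrete, here is a minimal sketch of a link-based quality score computed by power iteration. It is an illustration rather than Google's implementation: the function name `pagerank`, the toy graph, the normalization so that scores sum to 1, and the even spreading of rank from pages without out-links are all assumptions made for this example. The damping factor 0.85 is the value suggested in the Brin and Page paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small graph.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: rank held by pages with no out-links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most links point at C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph, C collects links from three pages and ends up with the highest score, while D, which nobody links to, ends up with the lowest.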

Google's Architecture
Major functions:
1. Crawling
2. Indexing
3. Ranking
4. Searching

Google's Architecture
* URL Server: sends lists of URLs to the crawlers
* Crawler: downloads web pages
* Store Server: compresses the web pages and stores them in the repository
* Indexer: reads the repository and uncompresses the documents, parses the documents, creates the forward index, and parses out the links

Google's Architecture
* URL Resolver: converts relative URLs from the anchors file to absolute URLs and then to docIDs, generates a database of links, and puts the anchor text into the forward index
* Sorter: generates the inverted index
* Searcher: answers queries
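The URL resolver and sorter roles can be sketched in a few lines. This is a simplified illustration, not the original system: the function names `resolve_links` and `invert`, the in-memory dictionaries, and the example URLs are assumptions, and plain words stand in for the wordIDs and hit lists a real index would store.

```python
from collections import defaultdict
from urllib.parse import urljoin


def resolve_links(anchors, doc_ids):
    """Turn (source_url, href, anchor_text) records into (src_id, dst_id, text).

    doc_ids maps absolute URLs to docIDs and is extended as new URLs appear.
    """
    links = []
    for source_url, href, text in anchors:
        absolute = urljoin(source_url, href)  # relative URL -> absolute URL
        # Assumption: docIDs are handed out sequentially on first sight.
        src = doc_ids.setdefault(source_url, len(doc_ids))
        dst = doc_ids.setdefault(absolute, len(doc_ids))
        links.append((src, dst, text))
    return links


def invert(forward_index):
    """Turn docID -> words into word -> sorted list of docIDs."""
    inverted = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    return {word: sorted(ids) for word, ids in inverted.items()}


# Hypothetical anchors file with two records.
anchors = [
    ("http://example.com/a/index.html", "../b.html", "page b"),
    ("http://example.com/b.html", "/a/index.html", "home"),
]
doc_ids = {}
links = resolve_links(anchors, doc_ids)

# Forward index built by the indexer (hypothetical content).
forward_index = {0: ["memex", "trails"], 1: ["google"]}
# Anchor text is credited to the page the link points to.
for _, dst, text in links:
    forward_index.setdefault(dst, []).extend(text.split())

inverted_index = invert(forward_index)
```

Crediting the anchor text to the target docID is what lets a page be found by words that other pages use to describe it.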

Crawling the Web

Searching the Web
1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the search terms.

Searching the Web
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist, go to step 4.
8. Sort the documents that have matched by rank and return the top k.
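The query evaluation loop can be approximated with the following sketch. It simplifies the procedure on these slides: doclists are plain sorted lists of docIDs, the short barrels are exhausted completely before the full barrels are consulted, and set intersection replaces the pointer-by-pointer scan. The names `search` and `intersect`, the toy barrels, and the ranking function are hypothetical.

```python
def intersect(doclists):
    """Return the docIDs that occur in every doclist."""
    if not doclists:
        return []
    common = set(doclists[0])
    for doclist in doclists[1:]:
        common &= set(doclist)
    return sorted(common)


def search(query, lexicon, short_barrels, full_barrels, rank, k=10):
    words = query.lower().split()                    # step 1: parse the query
    try:
        word_ids = [lexicon[w] for w in words]       # step 2: words -> wordIDs
    except KeyError:
        return []                                    # unknown word: no match

    matches = []
    seen = set()
    # Steps 3 and 6: short barrels first, then the full barrels.
    for barrels in (short_barrels, full_barrels):
        doclists = [barrels.get(wid, []) for wid in word_ids]
        for doc_id in intersect(doclists):           # step 4: match all terms
            if doc_id not in seen:
                seen.add(doc_id)
                matches.append((rank(doc_id, word_ids), doc_id))  # step 5

    matches.sort(reverse=True)                       # final step: sort by rank
    return [doc_id for _, doc_id in matches[:k]]


# Hypothetical data: wordID -> sorted doclist.
lexicon = {"memex": 0, "google": 1}
short_barrels = {0: [1, 3], 1: [3]}
full_barrels = {0: [1, 2, 3, 5], 1: [2, 3, 5]}


def toy_rank(doc_id, word_ids):
    # Placeholder score, not a real ranking function.
    return 1.0 / (1 + doc_id)


results = search("memex google", lexicon, short_barrels, full_barrels, toy_rank)
```

A closer rendering of the slides would advance one cursor per doclist and stop early once enough matches are found; the sketch trades that efficiency for brevity.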

The Ranker
* Uses hit lists, anchor text hits and PageRank
* Types of hits: title, anchor, URL, plain text small font, …

The Ranker
Vectors:
* Type-weight vector, indexed by hit type, for single-word queries
* Type-prox-weight vector, for multi-word queries
* Count-weight vector
* The IR score is the dot product of the count-weight and type-weight vectors
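For a single-word query, the dot product can be illustrated as follows. The numeric type weights, the cap on counts, and the function names are invented for this sketch; the paper does not publish the actual values. It describes only the shape of the count weights, which rise with the number of hits and then level off, so a capped linear function is used here as one possible stand-in.

```python
# Hypothetical weights per hit type; the real values are not given in the source.
TYPE_WEIGHTS = {"title": 8.0, "anchor": 6.0, "url": 4.0, "plain": 1.0}
COUNT_CAP = 4  # assumption: hits beyond this count add nothing


def count_weight(count):
    """Grow linearly with the hit count, then stay flat."""
    return float(min(count, COUNT_CAP))


def ir_score(hit_counts):
    """Dot product of the count-weight vector and the type-weight vector.

    hit_counts maps a hit type to how often the query word occurs
    with that type in the document.
    """
    types = sorted(TYPE_WEIGHTS)
    type_vector = [TYPE_WEIGHTS[t] for t in types]
    count_vector = [count_weight(hit_counts.get(t, 0)) for t in types]
    return sum(c * w for c, w in zip(count_vector, type_vector))


# One title hit and ten plain-text hits: the plain hits are capped at 4.
score = ir_score({"title": 1, "plain": 10})
```

The ranker then combines the IR score with PageRank to produce the final rank; that combination is not shown here.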

What we saw so far:
* Bush: Memex, hypertext, Goldberg, Otlet
* Google: goal, obstacles, how to achieve quality, architecture