Searching on the WWW: The Google Phenomenon (Snyder, pp. 119-141)



Searching  The best place to look for something is where it’s likely to be found  Key to finding information.

Searching  A lot of information can be found on the WWW  But, that is the flaw of the WWW:  Too much Information  No organization  No structure

Organizing Information  Classification  Hierarchy  Categories with sub-categories  What are some problems with Hierarchies?

Problem With Hierarchies  I want to find the requirements for the Minor in IS Siena College AcademicsFinancial AidAthletics Degree Requirements Forms Schools Science CS Arts IS Minor

Webs or Networks  Multiple Paths Siena College AcademicsFinancial AidAthletics Degree Requirements Forms Schools Science CS Arts IS Minor

Problem With Web Networks  Less Important to Get things in the correct category  Information architects don’t worry too much about  Classification  Organization

Problem With Web Networks  As different Networks of Information are connected  Excessive redundant links emerge  Different organization strategies class  A mess is created

Search Engines to the Rescue  Alternative to searching via navigation  sc/search_engines_timeline.pdf sc/search_engines_timeline.pdf

How Search Engines Work
1. Web Crawling: a program (robot) surfs from hyperlink to hyperlink, accumulating pages.
2. Web Indexing: each accumulated page is added to a database.
   - The URL of the web page is stored.
   - Each word, its number of occurrences, and sometimes its positions are stored.

How Search Engines Work
3. User Search: the user’s search actually runs against the index database.
4. Ranking: sophisticated algorithms are used to retrieve and rank pages that “match” the user’s search.

Step 1: Web Crawling  Hardest task.  million new web pages are added to the Internet every day.  Robots need to know where to start looking  You need the help and cooperation of web page creators.

Step 2: Web Indexing
Each URL consists of a list of words:
- URL1: word5, word74, word195, word456
- URL2: word7, word82, word135
- URL3: word5, word74, word165, word288
- URL4: word21, word59
- URL5: word25, word74, word188, word432
- URL6: word7, word186, word430
- URL7: word2, word398
- URL8: word34, word39, word84, word193
- ...
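This forward index maps each URL to the words it contains. Since the index can also record each word’s occurrences and positions, a minimal sketch might store, for every page, the positions at which each word appears; the sample page texts below are hypothetical, and tokenizing by lowercasing and splitting on whitespace is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "URL1": "grilled cheese recipes and cheese facts",
    "URL2": "espn sports scores",
    "URL3": "cheese sandwich recipes",
}

def build_forward_index(pages):
    """Map each URL to {word: [positions]}; len(positions) is the occurrence count."""
    forward = {}
    for url, text in pages.items():
        positions = defaultdict(list)
        for pos, word in enumerate(text.lower().split()):  # naive tokenizer
            positions[word].append(pos)
        forward[url] = dict(positions)
    return forward

forward = build_forward_index(PAGES)
print(forward["URL1"]["cheese"])  # → [1, 4]
```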

Step 2: Web Indexing
Inverted index: each word consists of a list of URLs:
- word1: URL19, URL39, URL82, URL91
- word2: URL27, URL41, URL66, URL67
- word3: URL49, URL75, URL65
- word4: URL29, URL89
- word5: URL12, URL48, URL66
- word6: URL53, URL73, URL123, URL144
- word7: URL3, URL41, URL77
- ...
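Inverting the index turns per-page word lists into per-word URL lists, so a query can jump straight to the pages containing a word. The sketch below inverts a small forward index written in the slide’s naming style; the particular word sets are invented for illustration, and sorting each URL list is an assumed convenience for later lookups and merges.

```python
from collections import defaultdict

# Hypothetical forward index in the slide's style: URL -> words on that page.
FORWARD = {
    "URL1": {"word5", "word74", "word195", "word456"},
    "URL2": {"word7", "word82", "word135"},
    "URL3": {"word5", "word74", "word165", "word288"},
}

def invert(forward):
    """Build word -> sorted list of URLs containing that word."""
    inverted = defaultdict(list)
    for url, words in forward.items():
        for word in words:
            inverted[word].append(url)
    # Sort each URL list so every lookup sees a consistent order.
    return {word: sorted(urls) for word, urls in inverted.items()}

inverted = invert(FORWARD)
print(inverted["word74"])  # → ['URL1', 'URL3']
```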

Step 3: User Search  Searching the index database must be quick.  The database is sorted by key words (primary index)  The English language has about 600,000 words  Luckily, only about a tenth them are widely used  The database server needs to store the primary index in memory (RAM).

Step 4: Ranking the results  Searches on common words can return millions of pages.  Ordering or ranking becomes more important as the data increases  Intuitive measures  Number of occurances of search words  Search word in title, keyword, etc  “Importance” of web page  User feedback.

Search Engine Issues  Logical statements AND, OR, etc.  Phrases “Grilled Cheese”  Images – Dali Example  Dishonesty – XXX Example  Differences in Vocabulary - IBM-Issue

Search Engines (Catch 22)  Search Engine Companies make money by placing ads.  More searches = bigger audience = more $$$ from ads  Best Thing: Get as many people to use your search engine as possible  Worst Thing: What if everyone exclusively uses search engines to search the WWW?

How Google became the best  PageRank algorithm  (based on the Clever Algorithm)  PageRank is a measure of importance.  Links from important pages improves your PageRank

PageRank continued   Simplistic Explanation:  Initially all pages have the same PageRank  An iterative process increases the page rank of all pages based on  direct links first (highly weighted *1)  then, one hop links  then, two hop links ...  then, ten hop links (very low weighting *0.001) ...  The algorithm ignores cycles  The algorithm does not reward cliques  Eventually, the page ranks will stabilize (stop increasing) once you’ve considered  Until the page ranks stablize

PageRank intuition  ESPN.com is highly ranked because  Several other highly ranked pages point to it  Millions of low ranked pages point to it  Any page connected-to or part-of ESPN.com will benefit from this.  Intuitively, ESPN is an information authority on sports.

PageRank intuition  Breimer.net is poorly ranked because  Very few pages point to it. None to be exact.  The page is not an authority on “Breimer” until other pages acknowledge its existence via a link.