Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-1 How Search Engines Work Today we show how a search engine works  What happens when.

Slides:



Advertisements
Similar presentations
SEO Best Practices with Web Content Management Brent Arrington, Services Developer, Hannon Hill Morgan Griffith, Marketing Director, Hannon Hill 2009 Cascade.
Advertisements

Computer Information Technology – Section 3-2. The Internet Objectives: The Student will: 1. Understand Search Engines and how they work 2. Understand.
Page 1 June 2, 2015 Optimizing for Search Making it easier for users to find your content.
Search Engines & Search Engine Optimization (SEO) Presentation by Saeed El-Darahali 7 th World Congress on the Management of e-Business.
Information Retrieval in Practice
Internet Resources Discovery (IRD) Search Engines Quality.
Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders, worms) that automatically visit Web sites and, starting.
Chapter 5 Searching for Truth: Locating Information on the WWW.
How Search Engines Work Source:
Information Retrieval
Christy Gavin Spring, 2009 The Google Search Engine.
Search Engine Optimization (SEO)
Overview of Search Engines
IDK0040 Võrgurakendused I Building a site: Publicising Deniss Kumlander.
SEO Lunch How to Grow A Business in 3 Bites Akiva Ben-Ezra
Databases & Data Warehouses Chapter 3 Database Processing.
Search Engine Optimization (SEO) Week 07 Dynamic Web TCNJ Jean Chu.
Chapter 5 Searching for Truth: Locating Information on the WWW.
Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 7-1 Module II Overview PLANNING: Things to Know BEFORE You Start… Why SEM? Goal Analysis.
Search Optimization Techniques Dan Belhassen greatBIGnews.com Modern Earth Inc.
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
آموزش طراحی وب سایت جلسه پانزدهم – بهینه سازی برای موتور جستجو تدریس طراحی وب برای اطلاعات بیشتر تماس بگیرید تاو شماره تماس: پست.
Search Engine Optimization ext 304 media-connection.com The process affecting the visibility of a website across various search engines to.
Promotion & Cataloguing AGCJ 407 Web Authoring in Agricultural Communications.
Search Engines & Search Engine Optimization (SEO).
Web Search. Structure of the Web n The Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure  The.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
The Business Model and Strategy of MBAA 609 R. Nakatsu.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Search Engine Optimization & Pay Per Click Advertising
Gregor Gisler-Merz How to hit in google The anatomy of a modern web search engine.
Web Searching. How does a search engine work? It does NOT search the Web (when you make a query) It contains a database with info on numerous Web sites.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
The Internet 8th Edition Tutorial 4 Searching the Web.
Search engines are the key to finding specific information on the vast expanse of the World Wide Web. Without sophisticated search engines, it would be.
Continuing Education UCC Fall 2010 Search Engine Optimization.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Search engines are used to for looking for documents. They compile their databases by employing "spiders" or "robots" to crawl through web space from.
The Business Model of Google MBAA 609 R. Nakatsu.
Search Engines Reyhaneh Salkhi Outline What is a search engine? How do search engines work? Which search engines are most useful and efficient? How can.
4 1 SEARCHING THE WEB Using Search Engines and Directories Effectively New Perspectives on THE INTERNET.
1 FollowMyLink Individual APT Presentation Third Talk February 2006.
Search Engines By: Faruq Hasan.
Search Engine Optimisation. On page methodologies –Anything you can affect with the construction of a single page Off page methodologies –Refers to all.
Internet Research – Illustrated, Fourth Edition Unit A.
Search Engine Know- How: How To Optimize Your Content, Navigation Pages, & Documents For Search Engines.
Lawrence Snyder University of Washington, Seattle © Lawrence Snyder 2004.
Chapter 1 Getting Listed. Objectives Understand how search engines work Use various strategies of getting listed in search engines Register with search.
Week 1 Introduction to Search Engine Optimization.
 SEO Terms A few additional terms Search site: This Web site lets you search through some kind of index or directory of Web sites, or perhaps both an.
Selecting Relevant Documents Assume: –we already have a corpus of documents defined. –goal is to return a subset of those documents. –Individual documents.
Search Engine Optimization Presented By:- ARKA Softwares Effective! Affordable! Time Groove
Search Engine Optimization Miami (SEO Services Miami in affordable budget)
PPC Tutorial For Beginners. Content  PPC  Paid & Organic Advertisement  What is Search Engine?  How to set up account in Google Adwords? Target Your.
Search Engine Optimization
Information Retrieval in Practice
SEARCH ENGINE OPTIMIZATION.
Fluency with Information Technology
Search Engine Optimization
IS 360 Web Promotion.
BTEC NCF Dip in Comp - Unit 15 Website Development Lesson 04 – Search Engine Optimisation Mr C Johnston.
1 SEO is short for search engine optimization. Search engine optimization is a methodology of strategies, techniques and tactics used to increase the amount.
by Siang Tay Marketing and Development Unit, UniSA
Search Engine Optimization (SEO)
Searching for Truth: Locating Information on the WWW
Agenda What is SEO ? How Do Search Engines Work? Measuring SEO success ? On Page SEO – Basic Practices? Technical SEO - Source Code. Off Page SEO – Social.
Introduction to Information Retrieval
Searching for Truth: Locating Information on the WWW
Searching for Truth: Locating Information on the WWW
SEO (Search Engine Optimization)
Presentation transcript:

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-1 How Search Engines Work Today we show how a search engine works  What happens when a searcher enters keywords  What was performed well in advance  Also explain (briefly) how paid results are chosen If we have time, we will also talk about the size of the Web (If you really want to know how web search engines work, take my CSE345 WWW Search Engines course in the spring!)

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-2 (Google results example) ORGANIC RESULTS PAID RESULTS

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-3 Building an index A search engine does not examine every page on the web when a user puts in a query The engine first builds an index  Custom database of all the words on all pages  Search engine also stores other information

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-4 Overview of organic search

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-5 Matching the Search Query The search query is everything that the user types to get results  It is made up of one or more search terms, plus optional special characters Analyzing the Query  Expanding the query Word variants: plural/singular, various verb forms Spelling correction  Phrases, anti-phrases, and stop words  Word order  Search operators
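
The query analysis step on this slide can be sketched roughly as follows. This is a minimal illustration, not how any particular engine does it: the `analyze_query` function, the tiny `STOP_WORDS` list, and the use of double quotes to mark phrases are all assumptions made for the example. It covers only phrases and stop words, not spelling correction or word variants.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in"}


def analyze_query(query):
    """Split a raw query into quoted phrases and remaining search terms.

    Stop words are dropped from the loose terms but kept inside phrases,
    since a quoted phrase is meant to match exactly.
    """
    phrases, terms = [], []
    # Each match is either a "quoted phrase" or a single whitespace-free token.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query.lower()):
        if phrase:
            phrases.append(phrase)
        elif word not in STOP_WORDS:
            terms.append(word)
    return {"phrases": phrases, "terms": terms}


analysis = analyze_query('the history of "search engines"')
```

Here `analysis` holds one phrase (`search engines`) and one remaining term (`history`); the stop words `the` and `of` are discarded before matching.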

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-6 Matching the Search Query Organic query matches  Find pages with each of the remaining query terms  Document IDs are listed in a term index  Document information is in a separate doc index
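
As a rough sketch of the organic matching described here, the term index can be modeled as a mapping from each term to a list of document IDs, and the doc index as a mapping from document ID to stored page information. The names `term_index`, `doc_index`, and `match_all_terms`, and the sample data, are hypothetical.

```python
def match_all_terms(term_index, query_terms):
    """Return the sorted IDs of documents that contain every query term."""
    postings = [set(term_index.get(term, ())) for term in query_terms]
    if not postings:
        return []
    # Start from the shortest list so the candidate set shrinks quickly.
    postings.sort(key=len)
    result = postings[0]
    for other in postings[1:]:
        result = result & other
    return sorted(result)


# Term index: term -> IDs of the documents containing it (toy data).
term_index = {"search": [1, 2, 4], "engine": [2, 4, 5], "ranking": [4]}

# Doc index: document ID -> stored information about that page (toy data).
doc_index = {
    2: {"title": "How engines work", "url": "http://example.com/how"},
    4: {"title": "Ranking basics", "url": "http://example.com/rank"},
}

matches = match_all_terms(term_index, ["search", "engine"])
titles = [doc_index[doc_id]["title"] for doc_id in matches]
```

Keeping the two indexes separate means the matching step only touches compact lists of IDs; titles and URLs are looked up afterwards, and only for the documents that matched.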

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-7 Matching the Search Query Paid placement matches  Similar to organic match, but using a separate database of ads  Uses similar processing to select which query terms to use  Advertisers choose which queries can match Might require exact match, or allow broad matching  Simpler/faster because there are fewer ads to search through

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-8 Ranking Organic Matches This is a complex, active research area  Goal is to sort matching results from 'best' to 'worst'  Many factors contribute to different rankings in the various engines  Ranking functions are under continuous change Primary factors  Text analysis: keyword density and prominence  Link analysis: page and site authority estimates  Anchor text: terms used to describe page by others  Traffic analysis: which results get clicked on

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-9 Text Analysis: Keyword Density A.k.a. keyword weight Generally refers to the relative frequency of a term on the page Higher keyword density generally means that a document is more 'about' that keyword Natural text has a maximum reasonable density  The book cites a 7% density threshold Multi-term queries target keyword proximity  Pages with the same terms adjacent in same order would benefit most
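
Keyword density as described above (relative frequency of a term on the page) can be computed in a few lines. This is a simplified sketch: the tokenizer, the function names, and the adjacency check used to illustrate proximity are assumptions, and real engines use more elaborate weighting.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, term):
    """Fraction of the words on the page that are the given term."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


def has_adjacent_terms(text, terms):
    """True if the terms appear next to each other, in the same order."""
    words = tokenize(text)
    wanted = [t.lower() for t in terms]
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )


page_text = "Cheap flights and cheap hotels"
density = keyword_density(page_text, "cheap")  # 2 of 5 words, so 0.4
```

A density of 0.4 is far above what natural text would show, which is the kind of signal the 7% threshold mentioned on the slide is meant to catch.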

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-10 Text Analysis: Term Prominence Where do the query terms appear? Good places include:  Title  Headings  Start of body Terms in such places could get extra weight
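
One way to picture "extra weight" for prominent positions is a per-field multiplier. The sketch below is illustrative only: the field names and the weights in `FIELD_WEIGHTS` are invented for the example, and no engine publishes its actual values.

```python
# Hypothetical weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body_start": 1.5, "body": 1.0}


def prominence_score(page, term):
    """Sum the occurrences of a term, weighted by where each one appears."""
    term = term.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


page = {
    "title": "Search engines",
    "body": "a search engine builds an index",
}
score = prominence_score(page, "search")  # 3.0 (title) + 1.0 (body) = 4.0
```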

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-11 Link Analysis: Estimating Authority A typical short query matches millions of pages  Many could even have the same textual (relevance) weight from keyword density and prominence Link analysis estimates the importance of each page, based on the link structure around it The more respected a site is, the more links point to it Some links are more important than others  A link from Yahoo (or the White House!) signifies much more than a link from geocities.com

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-12 Google's PageRank The best-known link analysis algorithm  Algorithm published in 1998  Very well-studied; improvements are still being made to it today The authoritativeness of a page grows if  More pages link to it  The pages that link to it increase their authority The original algorithm is not a significant component of Google's ranking approach today  Many have shown that it performs poorly now
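
The two growth conditions on this slide can be illustrated with a simplified power-iteration version of PageRank. This is a teaching sketch, not Google's production ranking: the damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions commonly made in textbook presentations.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page authority from a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its authority among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its authority evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy web: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph `c` ends up with the highest score because two pages link to it, and `a` outranks `b` because its single inbound link comes from the high-authority page `c`.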

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-13 Anchor Text What is a page about? Page builders often summarize a page (or the significant aspect of a page) in the anchor text (the text of a link)  These short descriptions look a lot like queries!  Can help determine value of link  A significant component for ranking today

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-14 Traffic Analysis Many engines will track which links you click on from a results page Such clicks can be considered “votes” for URLs Re-ordering based on clicks can improve ranking quality [Joachims et al., 2005] The DirectHit search engine used click-throughs to generate top-10 results (purchased by Ask Jeeves in 2000)

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-15 Ranking Paid Placement Simplest approach: rank by highest bidder  Originally developed by Overture (a.k.a. goto.com)  Advertisers can change bids continuously, and can specify a particular budget Google's approach: rank by most valuable  Combination of bid and click-through rate  More relevant (clicked) ads move up in rank  Users find ads more useful
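
The difference between the two paid-ranking approaches can be shown with a small comparison. The slide says only that the second approach combines bid and click-through rate; multiplying the two is an assumption made here for illustration, and the ad records are invented.

```python
def rank_by_bid(ads):
    """Simplest approach: the highest bidder is listed first."""
    return sorted(ads, key=lambda ad: ad["bid"], reverse=True)


def rank_by_value(ads):
    """Rank by bid times click-through rate (assumed form of the combination)."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)


ads = [
    {"name": "A", "bid": 2.00, "ctr": 0.01},  # high bid, rarely clicked
    {"name": "B", "bid": 1.00, "ctr": 0.05},  # lower bid, clicked more often
]

by_bid = [ad["name"] for ad in rank_by_bid(ads)]      # A first
by_value = [ad["name"] for ad in rank_by_value(ads)]  # B first
```

Under the value-based ordering the more frequently clicked ad B moves ahead of A despite its lower bid, which matches the slide's point that more relevant ads move up in rank.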

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-16 Displaying Search Results Once the set of results has been collected and ranked, the results page needs to be generated For first page, select top results (typically 10)  Look up title, URL for linking (and often display)  Generate snippet (portion of page text that illustrates query terms) or look up ad copy
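
A snippet generator can be approximated by taking a window of words around the first query term found in the page text. The `make_snippet` function, its `width` parameter, and the fallback to the start of the page are assumptions for this sketch; production engines choose passages in more sophisticated ways.

```python
def make_snippet(text, query_terms, width=5):
    """Return a short excerpt of the text centered on the first query term."""
    words = text.split()
    terms = {term.lower() for term in query_terms}
    for i, word in enumerate(words):
        # Ignore trailing punctuation when comparing against query terms.
        if word.lower().strip(".,!?;:") in terms:
            start = max(0, i - width)
            end = min(len(words), i + width + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No query term found: fall back to the beginning of the page.
    return " ".join(words[: 2 * width + 1])


text = "A crawler visits pages and the indexer records every term it finds"
snippet = make_snippet(text, ["indexer"], width=2)
# snippet == "... and the indexer records every ..."
```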

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-17 Collecting Material for the Organic Index Primarily using a crawler/spider  Given a seed list of links, visit each one and add any new URLs found to the list of links to visit
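
The crawl loop described on this slide can be sketched as a queue of links to visit plus a set of URLs already seen. To keep the example runnable offline, the "web" here is a small dictionary and `fetch_links` is a stand-in for downloading a page and extracting its links; both are assumptions, as is the `max_pages` cap.

```python
from collections import deque


def crawl(seeds, fetch_links, max_pages=100):
    """Visit pages breadth-first from the seed list; return the visit order."""
    frontier = deque(seeds)  # links still to visit
    seen = set(seeds)        # every URL ever added to the frontier
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # only add new URLs to the list
                seen.add(link)
                frontier.append(link)
    return visited


# Toy web: page -> links found on that page.
toy_web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
order = crawl(["a"], lambda url: toy_web.get(url, []))
# order == ["a", "b", "c", "d"]
```

The `seen` set is what stops the crawler from looping forever on pages that link back to each other, such as `c` linking to `a` in the toy data.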

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-18 Building the Organic Index For each page retrieved, extract the text  For each term in the text, add the page's ID (and optionally, positions) to the list of docs for that term
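
The indexing step above corresponds to what is usually called an inverted index. A minimal sketch, assuming pages arrive as a mapping from a page ID to its extracted text (the `build_index` name, the tokenizer, and the sample pages are hypothetical):

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each term to the pages containing it and the positions in each page."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in pages.items():
        terms = re.findall(r"[a-z0-9]+", text.lower())
        for position, term in enumerate(terms):
            # Add the page's ID and the word position to this term's list.
            index[term][doc_id].append(position)
    return index


pages = {
    1: "search engines index pages",
    2: "pages link to pages",
}
index = build_index(pages)
# index["pages"] maps page 1 to [3] and page 2 to [0, 3]
```

Storing positions is optional, as the slide notes, but it is what makes phrase and proximity matching possible at query time.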

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-19 Building the Organic Index For each page retrieved  Extract the links Record anchor text for each link  Record Title and URL What to crawl?  Can't crawl all pages!  Need to re-crawl oft-changing pages Some engines allow trusted feeds (typically a form of paid inclusion) to get content indexed

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-20 Content Analysis Convert different types of documents  Use a single standard internal representation  Lots of file types: Word, PDF, PostScript, etc. Recognize language used They also extract additional text from a page

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-21 What search engines (and sight-impaired users) don't see They cannot read images (even text in images) Often they do not read Flash content or JavaScript

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-22 What search engines can see Image names Image alt text

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-23 What search engines can see Image names Image alt text Meta text  Title  Description  Keywords (often ignored)  Other directives URL text

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-24 Search Engine Relationships Business relationships have changed significantly over the past five years or so. See the Search Engine Relationship Chart, as it can also show connections over time. There are more players than shown (such as Gigablast, Snap.com) and lots of international engines.

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-25 Evaluating Organic Search Results Precision: fraction of search results that are correct (relevant) to a query Recall: fraction of all correct (relevant) answers included in a set of search results

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-26 Evaluating Organic Search Results Precision: fraction of search results that are correct (relevant) to a query Recall: fraction of all correct (relevant) answers included in a set of search results Improving one usually results in worsening of the other

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-27 Evaluating Organic Search Results Precision: fraction of search results that are correct (relevant) to a query Recall: fraction of all correct (relevant) answers included in a set of search results Improving one usually results in worsening of the other In web search, neither can be measured exactly!  Still useful to think about how a change will affect performance
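
When the full set of relevant documents is known, as in a small test collection, both measures follow directly from the definitions above. The function name and the sample document IDs are invented for illustration.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for a result set against known relevant docs."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


returned = [1, 2, 3, 4]       # what the engine showed
relevant = [2, 4, 6, 8, 10]   # what it should ideally have found
precision, recall = precision_recall(returned, relevant)
# 2 of the 4 results are relevant: precision 0.5
# 2 of the 5 relevant documents were found: recall 0.4
```

Returning more results can only keep recall the same or raise it, but it tends to admit more irrelevant pages and so lower precision, which is the trade-off the slide describes. On the real web the `relevant` set is never fully known, which is why neither value can be measured exactly.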

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-28 How big is the Web?

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-29 How big is the Web? Depends!

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-30 How big is the Web? Depends! What if I turn on a laptop that can produce links to an infinite number of pages?  Proposed by Andrei Broder who has studied this

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-31 How big is the Web? Perhaps you mean the size of the index used by web search engines?

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-32 How big is the Web? Perhaps you mean the size of the index used by web search engines?  This is a recurring debate  In 2005, Google was reporting 8B+ pages indexed  Yahoo then announced it had indexed almost 20B  Google declared Yahoo as counting differently  Google no longer reports its index size and regularly underreports the number of machines it uses

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-33 How big is the Web? Perhaps you mean the size of the index used by web search engines?  This is a recurring debate  In 2005, Google was reporting 8B+ pages indexed  Yahoo then announced it had indexed almost 20B  Google declared Yahoo as counting differently  Google no longer reports its index size and regularly underreports the number of machines it uses Estimated intersection size in 2005 of the top 4 indexes was only about 2.7B (different crawls!)

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-34 How big is the Web? Perhaps you mean the size of the index used by web search engines?  This is a recurring debate  In 2005, Google was reporting 8B+ pages indexed  Yahoo then announced it had indexed almost 20B  Google declared Yahoo as counting differently  Google no longer reports its index size and regularly underreports the number of machines it uses Estimated intersection size in 2005 of the top 4 indexes was only about 2.7B (different crawls!) What about pages not indexed by the engines?

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-35 How big is the Web? How large is the indexable web?  That is, ignoring the pages that require passwords, links within Flash content, or forms to be filled in (search boxes, registration, etc.)  Recent estimate is > 11.5B [Gulli & Signorini, 2005]  Fairly close in time to Yahoo's 20B claim

Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 2-36 How big is the Web? How large is the indexable web?  That is, ignoring the pages that require passwords, links within Flash content, or forms to be filled in (search boxes, registration, etc.)  Recent estimate is > 11.5B [Gulli & Signorini, 2005]  Fairly close in time to Yahoo's 20B claim The hidden web (the rest) is times larger!  Again, just reported estimates... So it is impossible to know the size of the Web!