Dr. Frank McCown Comp 250 – Web Development Harding University

Similar presentations


Presentation on theme: "Dr. Frank McCown Comp 250 – Web Development Harding University"— Presentation transcript:

1 If a Web Page Can't be Googled, Does it Really Exist?
What Web Developers Need to Know About Web Search Engines
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

2 Web Developers & Search Engines
Search engines are the primary way users find information on the Web. If a web page is not indexed by a search engine, most users will never see it. Web developers need to know:
- How search engines work
- How to make pages discoverable
- How to make pages rank highly (search engine optimization, SEO)

3 How do you locate information on the Web?
When seeking information online, one must choose the best way to fulfill one's information need. The most popular options are:
- Web directories
- Search engines (the primary focus of this lecture)
- Social media

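4 Web Directories
- Pages are organized in a hierarchy of categories
- Usually maintained by human editors
- Yahoo started as a web directory in 1994 and maintained the Yahoo Directory until it was shut down at the end of 2014
- The Open Directory Project (ODP, also known as DMOZ) was the largest directory and was maintained by volunteers until it closed in 2017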


6 Search Engines
- Most often used to fill an information need
- Users enter search terms into a text box and get back a SERP (search engine results page)
- If the desired results are not on the first few pages, users generally modify the query and resubmit it to the search engine
- Types of search engines:
  - Intranet search engines (Nutch, Solr)
  - Web search engines (Google, Bing, Baidu)
  - Metasearch engines, which combine results from several other search engines and may include Deep Web sources (Dogpile, WebCrawler)
  - Vertical (or focused) search engines (Google Scholar, Google Shopping)

7 Components of a Search Engine
- Crawler: downloads web pages and looks in them for links to new pages
- Indexer: indexes the content of the pages downloaded by the crawler
- Search: matches the user's query against the indexes (and against the ad indexes) and shows the results to the user
Figure from Introduction to Information Retrieval by Manning et al., Ch. 19.

8 [Annotated screenshot of a SERP with callouts: Search query, Paid results, Page title, Organic results]

9 Web Crawling
- Web crawlers (robots) fetch a page, place all of the page's links in a queue, fetch the next link from the queue, and repeat
- Large search engines use thousands (millions?) of continually running web crawlers to discover new web content and rediscover changed content
- Web crawlers are usually polite:
  - They identify themselves through the HTTP User-Agent request header (e.g., Googlebot)
  - They throttle requests to a web server and crawl at off-peak times
  - They honor the robots exclusion protocol (robots.txt). This example blocks every URL whose path begins with /private:
      User-agent: *
      Disallow: /private
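The crawl loop and the politeness rules above can be sketched in a few lines of Python. This is a minimal, offline sketch under stated assumptions: the tiny "web" is a hypothetical in-memory dictionary standing in for real HTTP downloads, and `ExampleBot` and the example.com URLs are made-up names. The robots.txt handling uses the standard library's `urllib.robotparser`, fed the rules from the slide.

```python
import time
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download each page over HTTP and parse links from the HTML.
WEB = {
    "http://example.com/": ["/about", "/private/notes", "/blog"],
    "http://example.com/about": ["/"],
    "http://example.com/blog": ["/blog/post1", "/about"],
    "http://example.com/blog/post1": [],
    "http://example.com/private/notes": ["/blog"],
}

# The robots.txt rules from the slide.
ROBOTS_TXT = "User-agent: *\nDisallow: /private\n"

USER_AGENT = "ExampleBot"  # hypothetical name the crawler would send in User-Agent


def crawl(seed, delay_seconds=0.0):
    """Breadth-first crawl from seed, skipping URLs that robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())

    frontier = deque([seed])  # queue of URLs waiting to be fetched
    seen = {seed}             # never queue the same URL twice
    fetched = []

    while frontier:
        url = frontier.popleft()
        if not robots.can_fetch(USER_AGENT, url):
            continue                  # honor the robots exclusion protocol
        time.sleep(delay_seconds)     # throttle requests to the server
        links = WEB.get(url, [])      # stand-in for an HTTP GET plus link extraction
        fetched.append(url)
        for link in links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return fetched


print(crawl("http://example.com/"))
# ['http://example.com/', 'http://example.com/about', 'http://example.com/blog', 'http://example.com/blog/post1']
```

The /private/notes page is discovered but never fetched, because its path matches the Disallow rule. A real crawler would fetch each host's robots.txt with `RobotFileParser.set_url()` and `read()` instead of parsing a hard-coded string.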

10 Halloween Easter Egg

11 Indexing
- After crawling, web pages are indexed by the indexer
- An inverted index is built that maps each word to the documents containing it. Example:
    cat → 2, 5
    dog → 1, 5, 6
    fish → 1, 2
    bird → 4
- A query for dog returns pages 1, 5, and 6
- A query for dog and cat returns page 5, the only page in both lists
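Building the inverted index and answering the "dog and cat" query can be sketched as follows. The page contents are hypothetical, chosen only so that the resulting index matches the one on the slide, and the function names are illustrative. An AND query intersects the posting lists of its terms.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search_all(index, *terms):
    """Boolean AND query: IDs of documents containing every term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical page contents chosen to reproduce the index on the slide.
docs = {
    1: "dog fish",
    2: "cat fish",
    4: "bird",
    5: "cat dog",
    6: "dog",
}

index = build_inverted_index(docs)
print(index["cat"], index["dog"], index["fish"], index["bird"])  # [2, 5] [1, 5, 6] [1, 2] [4]
print(search_all(index, "dog"))         # [1, 5, 6]
print(search_all(index, "dog", "cat"))  # [5]
```

Because each query touches only the posting lists of its own terms, the engine never scans pages that lack those words. That is what makes the inverted index practical at Web scale.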

12 Google data center in Iowa

13 Ranking Web Pages
- When a user enters a query, the search engine produces a SERP with the most “relevant” results
- How is relevancy determined?
- A broad range of techniques, from textual analysis to web graph analysis
- According to Google: “Relevancy is determined by over 200 factors…”

14 Textual Analysis
- Term frequency: How often does the page use the search term?
  - Easily defeated by simply repeating the term, so its contribution is usually capped
- Location and emphasis: Where and how does the search term appear?
  - In the title or URL
  - In bold, in a large font, or in headings
- Anchor text: Do other pages use the term in the text of links pointing to this page?
  - Google-bombing exploits this: many sites link to a page with the same phrase so that the page ranks for that phrase
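The three textual signals above can be combined into a toy score. This is a hypothetical sketch, not how any real search engine weighs these factors. The cap of 3 and the boost values are arbitrary assumptions chosen for illustration. A page is represented as a plain dictionary with made-up field names.

```python
def text_score(term, page, tf_cap=3, title_boost=5.0, url_boost=3.0,
               heading_boost=2.0, anchor_boost=4.0):
    """Toy relevance score; the cap and all boost weights are hypothetical."""
    term = term.lower()
    body_words = page["body"].lower().split()
    score = float(min(body_words.count(term), tf_cap))  # capped term frequency

    # Location and emphasis: the term in the title, URL, or headings.
    if term in page["title"].lower().split():
        score += title_boost
    if term in page["url"].lower():
        score += url_boost
    if any(term in h.lower().split() for h in page.get("headings", [])):
        score += heading_boost

    # Anchor text: words that *other* pages use when linking to this page.
    if any(term in a.lower().split() for a in page.get("anchor_texts", [])):
        score += anchor_boost
    return score


page_a = {  # a genuinely relevant page
    "title": "Cat care tips",
    "url": "http://example.com/cat-care",
    "body": "Feed your cat and brush your cat daily",
    "headings": ["Feeding a cat"],
    "anchor_texts": ["great cat advice"],
}
page_b = {  # a page that just repeats the term 50 times
    "title": "Cheap deals",
    "url": "http://example.com/deals",
    "body": "cat " * 50,
}

print(text_score("cat", page_a))  # 16.0
print(text_score("cat", page_b))  # 3.0
```

Because term frequency is capped, repeating "cat" 50 times earns page B no more than 3 points. Page A wins on location, emphasis, and anchor text.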

15 Web Graph Analysis
- PageRank: A page is more “important” if it receives links from many other “important” web pages
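The recursive idea behind PageRank can be sketched with power iteration. This follows a common textbook formulation: PR(p) = (1 − d)/N + d · Σ PR(q)/L(q), summed over pages q that link to p, where L(q) is q's number of outlinks and N is the number of pages. It is not Google's production ranking. The damping factor d = 0.85 is the commonly used value, the four-page link graph is hypothetical, and pages with no outlinks spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links maps each page to the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B, and D; nobody links to D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ends up on top because it collects links from three pages. A ranks second even though it has only one inlink, because that single link comes from the highly ranked C. This is the "important pages vouch for other pages" effect described on the slide.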

16 Search Engine Optimization (SEO)
- Economic incentive to rank highly
- SEO, now a cottage industry, is the process of getting traffic from search results
- White hat:
  - Create meaningful content that uses search terms in ideal locations
  - Get others to create meaningful links, especially from social media
- Black hat:
  - Create web pages designed only for search engine consumption, which trick the engine into thinking the page is about certain topics
  - Create link farms to increase the PageRank of certain pages

17 More Reading
- Introduction to Web Search Engines by McCown
- Google's Webmaster Guidelines
- What is SEO?
