Dr. Frank McCown Comp 250 – Web Development Harding University

Presentation transcript:

If a Web Page Can't be Googled, Does it Really Exist?
What Web Developers Need to Know About Web Search Engines
Dr. Frank McCown
Comp 250 – Web Development
Harding University
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

Web Developers & Search Engines
- Search engines are the primary way users find information on the Web
- If a web page is not indexed by a search engine, it will not be seen (by many)
- Web developers need to know:
  - How search engines work
  - How to make pages discoverable
  - How to make pages rank highly (SEO)

How do you locate information on the Web?
- When seeking information online, one must choose the best way to fulfill one’s information need
- Most popular:
  - Web directories
  - Search engines (primary focus of this lecture)
  - Social media

Web Directories
- Pages ordered in a hierarchy
- Usually powered by humans
- Yahoo started as a web directory in 1994 and still maintains one: http://dir.yahoo.com/
- Open Directory Project (ODP) is the largest and is maintained by volunteers: http://www.dmoz.org/

Search Engines
- Most often used to fill an information need
- Users enter search terms into a text box and get back a SERP (search engine result page)
- Queries are generally modified and resubmitted to the search engine if the desired results are not found on the first few pages of results
- Types of search engines:
  - Intranet search engines (Nutch, Solr)
  - Web search engines (Google, Bing, Baidu)
  - Metasearch engines, which include the Deep Web (Dogpile, WebCrawler)
  - Vertical (or focused) search engines (Google Scholar, Google Shopping)

Components of a Search Engine
- Crawler: Downloads web pages and looks for links to new pages in the pages it downloads
- Indexer: Indexes the content in the pages downloaded by the crawler
- Search: The user’s query is matched against the page indexes and the ad indexes, and the search results are shown to the user
Figure from Introduction to Information Retrieval by Manning et al., Ch 19.

[Figure: annotated SERP. Labels: Search query, Paid results, Page title, Organic results]

Web Crawling
- Web crawlers or robots fetch a page, place all the page’s links in a queue, fetch the next link from the queue, and repeat
- Large search engines use thousands (millions?) of continually running web crawlers to (re-)discover web content
- Web crawlers are usually polite:
  - Identify themselves through the HTTP User-Agent request header (e.g., googlebot)
  - Throttle requests to a web server, crawl at off-peak times
  - Honor the robots exclusion protocol (robots.txt). Example:
      User-agent: *
      Disallow: /private
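The fetch, queue, repeat loop and the robots.txt check can be sketched in a few lines of Python. This is only an illustration, not how any real search engine crawls: the PAGES dictionary is a made-up in-memory stand-in for the Web (no network requests are made), and the example.com URLs and the "examplebot" user agent are assumptions. The robots.txt rules are the ones from the slide, parsed with the standard library's urllib.robotparser.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
    "http://example.com/private/x": ["http://example.com/secret"],
}

# The robots.txt example from the slide, one rule per line.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private"]


def crawl(seed, pages, robots_lines, user_agent="examplebot"):
    """Breadth-first crawl that skips URLs disallowed by robots.txt."""
    robots = RobotFileParser()
    robots.parse(robots_lines)

    queue = deque([seed])   # links waiting to be fetched
    seen = {seed}           # every URL ever queued, to avoid repeats
    fetched = []            # URLs actually "downloaded", in order

    while queue:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue        # polite crawlers honor the exclusion rules
        fetched.append(url)
        for link in pages.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


crawl_order = crawl("http://example.com/", PAGES, ROBOTS_TXT)
print(crawl_order)
```

Because /private/x is never fetched, the link it contains is never discovered. A real crawler would also throttle its requests and send a User-Agent header, which this sketch leaves out.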

Halloween Easter Egg http://www.mattcutts.com/blog/google-protects-itself-from-zombies/

Indexing
- After crawling, web pages are indexed by the Indexer
- An inverted index is built containing the words and the documents that contain the words
- Example:
    cat > 2, 5
    dog > 1, 5, 6
    fish > 1, 2
    bird > 4
- Query for dog results in pages 1, 5, and 6
- Query for dog and cat results in page 5
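The inverted index example can be reproduced with a small Python sketch. The page texts in DOCS are invented so that the resulting index matches the slide (page 3 is assumed to contain none of the four words), and the function names are hypothetical. A real indexer also handles punctuation, stemming, and word positions, which are ignored here.

```python
from collections import defaultdict

# Hypothetical page contents chosen to reproduce the index on the slide.
DOCS = {
    1: "dog fish",
    2: "cat fish",
    3: "hamster",
    4: "bird",
    5: "cat dog",
    6: "dog",
}


def build_index(docs):
    """Map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, *terms):
    """Return the sorted page ids that contain every query term (AND query)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(DOCS)
print(search(index, "dog"))         # pages containing "dog"
print(search(index, "dog", "cat"))  # pages containing both words
```

The AND query is just a set intersection of the two posting lists, which is why looking up "dog and cat" only touches the entries for those two words rather than every page.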

Google data center in Iowa https://www.google.com/about/datacenters/gallery/#/tech/1

Ranking Web Pages
- When a user enters a query, the search engine produces a SERP with the most “relevant” results
- How is relevancy determined?
- Broad range of techniques ranging from textual analysis to web graph analysis
- According to Google: “Relevancy is determined by over 200 factors…”

Textual Analysis
- Term frequency: How often does the page use the search term?
  - Easily gamed by repeating the term over and over, so the count is usually capped
- Location and emphasis: Where and how does the search term appear?
  - Title, URL
  - In bold, large font, headings
- Anchor text: Is the term used by others to create links to the web page?
  - Google-bombing
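A toy scoring function shows how a capped term frequency and location bonuses might combine. Everything numeric here is an assumption made for illustration: the cap of 5 and the bonuses of 3 (title) and 2 (heading) are invented, text_score is a hypothetical name, and anchor text is left out. Real search engines do not publish their weights.

```python
def text_score(term, title, headings, body, tf_cap=5):
    """Score one page for one search term using made-up weights."""
    term = term.lower()

    # Term frequency, capped so that keyword stuffing stops paying off.
    tf = body.lower().split().count(term)
    score = min(tf, tf_cap)

    # Location and emphasis: a term in the title or a heading counts extra.
    if term in title.lower().split():
        score += 3
    if any(term in heading.lower().split() for heading in headings):
        score += 2
    return score


# A page that repeats "cat" 100 times but never uses it in a title or heading.
stuffed = text_score("cat", "Welcome", [], "cat " * 100)

# A page that mentions "cat" once but uses it in the title and a heading.
honest = text_score("cat", "Cat care", ["Feeding your cat"], "a cat needs fresh food")

print(stuffed, honest)
```

With the cap in place, the stuffed page tops out at the cap while the page that uses the term in prominent places scores higher.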

Web Graph Analysis
- PageRank: A page is more “important” if it garners more links from other “important” web pages
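The circular definition (important pages are those linked to by important pages) is usually resolved by iterating until the scores settle. Below is a minimal sketch of the commonly published power-iteration form of PageRank. The four-page link graph is invented, the damping factor of 0.85 is the value conventionally quoted rather than anything stated in these slides, and the sketch assumes every link target also appears as a key in the graph. It is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: A, B, and D all link to C; C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because three pages link to it, and A comes second even though it has only one incoming link, because that link comes from the most important page. B and D, which nobody links to, keep only the baseline share.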

Search Engine Optimization (SEO)
- Economic incentive to rank highly
- Cottage industry called SEO
- Process of getting traffic from search results
- White hat
  - Create content that is meaningful and uses search terms in ideal locations
  - Get others to create meaningful links, esp. from social media
- Black hat
  - Create web pages designed only for search engine consumption, which trick it into thinking the page is about certain topics
  - Create link farms to increase PageRank of certain pages

More Reading
- Introduction to Web Search Engines by McCown
- Google’s Webmaster Guidelines
- What is SEO?