Search Engines and Cloud Computing
Charles Severance

What are the last words of “Where the Wild Things Are”?

Google

Google I/O 2008 Keynote
Keynote by Marissa Mayer: usability, user experience, user testing, architecture, philosophy.
Required viewing.

Lessons
The cloud is wide: we can touch 1,000 servers in 0.1 seconds.
For things that seem “intelligent,” 0.2 seconds is fast enough, as long as you can do a lot of them.
Lots of spread-out storage and a fast scan are important.
Data, information, knowledge: it all starts with data and the ability to look through that data quickly.
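
The “touch many servers at once” idea can be sketched as a scatter-gather: send the same small scan to every shard in parallel, then merge the partial answers. This is a minimal sketch under stated assumptions: threads on one machine stand in for servers, and `SHARDS`, `scan_shard`, and `scatter_gather` are hypothetical names, not Google's actual design.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each dict stands in for one server holding a slice of the data.
SHARDS = [
    {"d1": "where the wild things are", "d2": "cloud computing"},
    {"d3": "wild cloud storage", "d4": "python dictionaries"},
    {"d5": "the wild web"},
]


def scan_shard(shard, term):
    """Fast linear scan of one shard: ids of documents containing term."""
    return [doc_id for doc_id, text in shard.items() if term in text.split()]


def scatter_gather(shards, term, max_workers=8):
    """Run scan_shard on every shard concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(lambda shard: scan_shard(shard, term), shards))
    return sorted(doc_id for part in partial_results for doc_id in part)


print(scatter_gather(SHARDS, "wild"))  # ['d1', 'd3', 'd5']
```

When each shard lives on its own machine, the total latency is roughly that of the slowest shard rather than the sum of all of them, which is what makes a fan-out to many servers fit inside a fraction of a second.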

Scalable Infrastructure 2:50

Infrastructure
The only sustainable scalability is when you scale with inexpensive, green solutions.
Tape backup is a rate-limiting factor, so we need something creative.
Disaster recovery: “Of course!”

Extracting Knowledge for Search

Associative Memory
Humans think in terms of a network of connected pieces of information, as compared to linear lists of things.
This is like Python dictionaries (often called associative arrays).
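
A Python dictionary gives the associative-array behavior the slide compares to human memory. The sketch uses made-up data (`associations` is a hypothetical example) to contrast jumping between connected ideas by key with scanning a linear list.

```python
# Hypothetical associative memory: each idea maps directly to the ideas linked to it.
associations = {
    "google": ["search", "pagerank", "cloud"],
    "search": ["crawler", "index"],
    "pagerank": ["links"],
}

# Dictionary lookup jumps straight from one idea to its connections by key.
print(associations["google"])  # ['search', 'pagerank', 'cloud']

# Follow the connections two steps out, the way one thought leads to another.
second_hop = [idea for first in associations["google"] for idea in associations.get(first, [])]
print(second_hop)  # ['crawler', 'index', 'links']

# Contrast: a linear list of pairs must be scanned item by item to find a match.
pairs = [("google", "search"), ("search", "crawler"), ("pagerank", "links")]
print([b for a, b in pairs if a == "search"])  # ['crawler']
```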

The Web as a Directed Graph
Connectivity: one node is connected to another if there is a series of edges (a path) that gets you from the first node to the second.
“Strongly connected”: there is a path from every node to every other node in the graph.
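
Both definitions can be checked with a breadth-first search over a graph stored as a dictionary of outbound links. This is a minimal sketch; the function names and the tiny example graphs are hypothetical. It relies on the fact that a graph is strongly connected exactly when one chosen node reaches every node and every node reaches that chosen node.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: every node reachable from start along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Flip every edge u -> v into v -> u."""
    flipped = {}
    for node, targets in graph.items():
        flipped.setdefault(node, [])
        for target in targets:
            flipped.setdefault(target, []).append(node)
    return flipped


def strongly_connected(graph):
    """True if there is a path from every node to every other node.

    Pick one node s: s must reach every node, and every node must reach s
    (equivalently, s reaches every node in the reversed graph).
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return True
    start = next(iter(nodes))
    return reachable(graph, start) == nodes and reachable(reverse(graph), start) == nodes


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": ["c"], "c": []}
print(strongly_connected(cycle))  # True
print(strongly_connected(chain))  # False: c is reachable from a, but there is no path back
```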

Search Engine Architecture
Web Crawling, Index Building, Searching

Web Crawler
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.

Web Crawler
Retrieve a page.
Look through the page for links.
Add the links to a list of “to be retrieved” sites.
Repeat...
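
The retrieve / find links / queue / repeat loop above can be sketched with the standard library's `html.parser` and `urllib.parse.urljoin`. This is a minimal breadth-first sketch, not a production crawler: `crawl`, `LinkParser`, and the in-memory `FAKE_WEB` are hypothetical, and `fetch` is injected so the example runs without network access (a real crawler might wrap `urllib.request.urlopen`).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Retrieve a page, look for links, queue the unseen ones, repeat.

    fetch(url) returns the page's HTML, or None if the page cannot be retrieved.
    """
    to_visit = deque([start_url])  # the "to be retrieved" list
    seen = {start_url}
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link: skip it
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links against the page URL
            if absolute not in seen:
                seen.add(absolute)
                to_visit.append(absolute)
    return visited


# Hypothetical in-memory "web" so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="c.html">C</a>',
    "http://example.com/b.html": "no links here",
    "http://example.com/c.html": '<a href="http://example.com/a.html">A</a>',
}

print(crawl("http://example.com/", FAKE_WEB.get))
```

The `seen` set keeps the crawler from looping forever on pages that link back to each other, and `max_pages` is a simple stand-in for a real selection policy.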

Web Crawling Policy
A selection policy that states which pages to download,
a re-visit policy that states when to check for changes to the pages,
a politeness policy that states how to avoid overloading Web sites, and
a parallelization policy that states how to coordinate distributed Web crawlers.
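
Of the four policies, politeness is the simplest to show in code. A minimal sketch, assuming a fixed minimum delay between requests to the same host: the `PolitenessPolicy` class and its `delay` value are hypothetical, not a published standard. Time is passed in explicitly so the example is deterministic; a real crawler would pass `time.monotonic()` and then `time.sleep()` for the returned wait.

```python
from urllib.parse import urlparse


class PolitenessPolicy:
    """Hypothetical politeness policy: at most one request per host every `delay` seconds."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self.last_request = {}  # host -> time of the most recent request to it

    def wait_time(self, url, now):
        """Seconds the crawler should wait before fetching url at time `now`."""
        last = self.last_request.get(urlparse(url).netloc)
        if last is None:
            return 0.0  # this host has not been contacted yet
        return max(0.0, last + self.delay - now)

    def record(self, url, now):
        """Remember that url's host was contacted at time `now`."""
        self.last_request[urlparse(url).netloc] = now


policy = PolitenessPolicy(delay=2.0)
policy.record("http://example.com/a.html", now=10.0)
print(policy.wait_time("http://example.com/b.html", now=10.5))  # 1.5 (same host)
print(policy.wait_time("http://example.org/", now=10.5))        # 0.0 (different host)
```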

robots.txt
A way for a web site to communicate with web crawlers.
An informal and voluntary standard.
Sometimes folks make a “spider trap” to catch “bad” spiders.

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /tmp/
    Disallow: /private/
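
Python's standard library includes `urllib.robotparser`, which can evaluate rules like the example robots.txt file on this slide. A minimal sketch: the user-agent name `MyCrawler` and the example.com URLs are hypothetical. The parser only answers the question; because the standard is voluntary, honoring the answer is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the slide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("MyCrawler", "http://example.com/index.html"))          # True
print(parser.can_fetch("MyCrawler", "http://example.com/private/notes.html"))  # False
print(parser.can_fetch("MyCrawler", "http://example.com/images/logo.png"))     # False
```

In a live crawler, `set_url()` followed by `read()` would download the site's actual robots.txt instead of parsing a string.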

Google Architecture
Web Crawling, Index Building, Searching

Search Indexing
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.

Inverted Index
An inverted index lists all of the documents that contain a particular word.
It allows us to quickly produce a list of documents given one or a few search terms.
The problem with the web is that we have too many documents.
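
An inverted index maps naturally onto a dictionary from each word to the set of documents containing it; a multi-word query then becomes a set intersection instead of a scan of every document. This is a minimal sketch with hypothetical names (`build_index`, `search`, `DOCS`), using simple whitespace tokenization and AND semantics as assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only docs that also contain this term
    return result


DOCS = {
    1: "where the wild things are",
    2: "the web is a directed graph",
    3: "wild things on the web",
}
index = build_index(DOCS)
print(sorted(search(index, "wild things")))  # [1, 3]
print(sorted(search(index, "the web")))      # [2, 3]
```

On the real web, the set for a common word can hold millions of documents, so the result still needs ranking.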

PageRank
Basic idea: incoming links signal “value” or “interest.”
Incoming links from other high-ranking sites have greater value.
Computed by giving all sites some “value” and letting value flow out along the outbound links and in along the inbound links until the values stabilize.
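
The “let value flow until it stabilizes” description is an iterative computation. A minimal sketch under stated assumptions: the damping factor of 0.85 (a baseline share of value given to every page) and spreading the value of pages with no outbound links evenly across all pages come from the commonly described formulation, not from this slide. `pagerank` and the four-page `web` are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Iterative PageRank sketch. links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # every page starts with equal value
    for _ in range(max_iter):
        # Assumed damping: a (1 - damping) share of value goes evenly to every page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Value flows out along outbound links, split evenly among them.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outbound links: spread its value over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # values have stabilized
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ends up highest because three pages link to it, and A benefits from being C's only outbound link. D, which nothing links to, keeps only the baseline share (1 - 0.85) / 4 = 0.0375.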

Free and very valuable

Gaming Google
The real ranking mechanism has many subtle tuning parameters, which are kept secret, as well as human intervention.
Once web site builders *know* the rules, they can game the system.
A busy little consultancy: Search Engine Optimization (SEO).

Google Supplemental Index
Not a good place to be: crawling happens less frequently, and pages seldom appear in search results.
Causes: duplicate content, low PageRank, link manipulation, page freshness, etc.
“Google uses the index as a holding pen for pages it deems to be of low quality or designed to appear artificially high in search results.”

Search Engine Optimization
Very dangerous, and Google has rules.
Google will put your site in the “supplemental index” for as long as a year.
“Google Hell” (cx_ag_0430googhell.html)
“Google uses the index as a holding pen for pages it deems to be of low quality or designed to appear artificially high in search results.”

Google’s Webmaster Central
Lets you work with Google’s crawler and index with regard to your site.
You establish ownership of a site by adding a meta tag.
You can look at crawling activity and PageRank, set up a site map, etc.

Search-Friendly Web Development
Google I/O, Maile Ohye (Google), June 10, 2008
Mission: Organize the world’s information and make it universally accessible and useful.

Webmaster Guidelines
Content design, Search Engine Optimization, Technical issues

Search-Friendly Web Sites
What should you do to ensure your site works well for Google Search (alt tags, title, description, URL design)?
How can your site get in trouble?
Google’s focus on “user experience” and usability: when a user clicks through to your site from a search, the experience reflects on Google.

Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link. Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages. Create a useful, information-rich site, and write pages that clearly and accurately describe your content. Think about the words users would type to find your pages, and make sure that your site actually includes those words within it.
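
The advice to break a site map larger than 100 or so links into separate pages is a simple chunking step. A minimal sketch, assuming the site map is a plain HTML list of links: `build_site_map_pages`, the `per_page` default of 100 (taken from the guideline's rough figure), and the generated markup are hypothetical choices, not a Google-specified format.

```python
from html import escape


def build_site_map_pages(links, per_page=100):
    """Split (url, title) pairs into HTML site map pages of at most per_page links each."""
    pages = []
    for start in range(0, len(links), per_page):
        chunk = links[start:start + per_page]
        items = "\n".join(
            f'  <li><a href="{escape(url)}">{escape(title)}</a></li>' for url, title in chunk
        )
        pages.append(f"<ul>\n{items}\n</ul>")
    return pages


links = [(f"/docs/page{i}.html", f"Page {i}") for i in range(250)]
pages = build_site_map_pages(links)
print(len(pages))                         # 3
print([p.count("<li>") for p in pages])  # [100, 100, 50]
```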

Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images. If you must use images for textual content, consider using the "ALT" attribute to include a few words of descriptive text. Make sure that your <title> elements and ALT attributes are descriptive and accurate. Check for broken links and correct HTML. Keep the links on a given page to a reasonable number (fewer than 100). If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
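
Several of these checks can be automated on a page's HTML. A minimal sketch using the standard `html.parser` and `urllib.parse.parse_qs`: `PageAudit`, `audit`, and the `max_params=2` threshold are hypothetical (the guidelines only say to keep parameters few), and checking for broken links is left out because it needs network access.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class PageAudit(HTMLParser):
    """Collect links and count images without ALT text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images_without_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and not attrs.get("alt"):
            self.images_without_alt += 1


def audit(html, max_links=100, max_params=2):
    """Return warnings for too many links, long dynamic URLs, and missing ALT text."""
    parser = PageAudit()
    parser.feed(html)
    warnings = []
    if len(parser.links) >= max_links:
        warnings.append(f"{len(parser.links)} links (keep fewer than {max_links})")
    for link in parser.links:
        if len(parse_qs(urlparse(link).query)) > max_params:
            warnings.append(f"long dynamic URL: {link}")
    if parser.images_without_alt:
        warnings.append(f"{parser.images_without_alt} image(s) missing ALT text")
    return warnings


page = (
    '<a href="/about.html">About</a> '
    '<a href="/item?id=1&amp;cat=2&amp;sort=price">Item</a> '
    '<img src="logo.png"> <img src="map.png" alt="Site map">'
)
print(audit(page))
```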

Google Architecture
Web Crawling, Index Building, Searching

Search Queries
A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they differ greatly from standard query languages, which are governed by strict syntax rules.

How Search Works

How Search Ads Work

PageRank Story

What PageRank Gets You
July 25, :15 PM; July 25, :55 PM

Google Keyword Tool
Allows you to explore different keywords and see their approximate prices.

Search Summary
Web Crawling, Index Building, Searching, Advertising

Advanced Topics (not required)
Big Table