Wasim Rangoonwala ID# 00506259 CS-460 Computer Security “Privacy is the claim of individuals, groups or institutions to determine for themselves when,

Presentation transcript:

Wasim Rangoonwala ID# CS-460 Computer Security “Privacy is the claim of individuals, groups or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” - Alan Westin, Privacy and Freedom, 1967

What are WWW robots? A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document and then recursively retrieving all the documents it references. Web robots are also called Web wanderers, Web crawlers, spiders, or bots.
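The traversal described above can be sketched in a few lines. This is a minimal illustration, not how any real search engine is implemented: the `SITE` dictionary is a made-up stand-in for the Web, and `fetch_links` is a hypothetical callback that a real robot would replace with an HTTP fetch plus link extraction.

```python
from collections import deque


def crawl(start, fetch_links):
    """Visit every page reachable from `start`, each page exactly once."""
    seen = {start}            # pages already queued, so loops in the link graph end
    queue = deque([start])    # pages waiting to be retrieved
    order = []                # pages in the order they were retrieved
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):   # documents referenced by this page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical four-page site: each key maps a page to the pages it links to.
SITE = {"/": ["/a", "/b"], "/a": ["/b", "/"], "/b": ["/c"], "/c": []}

visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)
```

The `seen` set is what keeps the robot from retrieving the same document twice when pages link back to each other.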

Web Spiders / Robots Collecting Data

Controlling how search engines access and index your website. Google calls its spiders Googlebot and Googlebot-Image. Google has a set of computers that continually crawl the web; together, these machines are known as the Googlebot. In general, you want Googlebot to access your site so that people searching on Google can find your web pages.

Controlling how search engines access and index your website. One key question: how does Google know which parts of a website the site owner wants to show up in search results? Can publishers specify that some parts of the site should be private and non-searchable? The good news is that those who publish on the web have a lot of control over which pages appear in search results and which pages are kept private. Answer: the robots.txt file.

Controlling how search engines access and index your website
Making use of the robots.txt file
1. Robots.txt has been an industry standard for many years; it lets a site owner control how search engines access their website.
2. The robots.txt file lists the pages that search engines shouldn't access.
3. You can exclude pages from Google's crawler by creating a text file called robots.txt and placing it in the root directory of your site.
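Because the file sits in the site's root directory, a crawler can work out where to look from any page address. A small sketch of that step follows, using Python's standard `urllib.parse`; the function name `robots_url` and the example.com addresses are illustrative assumptions, not part of the slides.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that hosts `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/shop/item.html?id=3"))
```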

Controlling how search engines access and index your website
Making use of the robots.txt file (continued)
Examples of pages you may want to keep private from search engines:
1. A directory that contains internal logs.
2. News articles that require payment to access.
3. The administration area of the website: database configuration strings, stored passwords, credit card details.
4. Images that you want to keep private.

Achieving privacy through the robots.txt file
Example of a robots.txt file:

# robots.txt File
# Currently disallow all images to the Google Image bot
User-agent: Googlebot-Image
Disallow: /

# ALL search engine spiders/crawlers (put at end of file)
User-agent: Googlebot
Disallow: /admin/
Disallow: /account_password.html
Disallow: /address_book.html
Disallow: /checkout_payment.html
Disallow: /cookie_usage.html
Disallow: /login.html
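To see how a well-behaved crawler would read rules like these, the sketch below feeds a shortened copy of the example to Python's standard `urllib.robotparser` module and asks which pages each robot may fetch. The example.com addresses and the "SomeOtherBot" name are assumptions made for illustration. Note that robots.txt is a request that cooperating robots honor, so the check runs on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example rules from the slide.
RULES = """\
# Currently disallow all images to the Google Image bot
User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot
Disallow: /admin/
Disallow: /login.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())   # parse the text directly, no network access

# Hypothetical site address used only for the checks below.
site = "https://example.com"

image_ok = parser.can_fetch("Googlebot-Image", site + "/photos/cat.jpg")
admin_ok = parser.can_fetch("Googlebot", site + "/admin/settings.html")
home_ok = parser.can_fetch("Googlebot", site + "/index.html")
other_ok = parser.can_fetch("SomeOtherBot", site + "/admin/settings.html")

print(image_ok, admin_ok, home_ok, other_ok)
```

The last check shows that a robot not named in the file is not restricted by these rules, since only the two Google user agents are listed.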

Privacy through the robots tag. You can use a special HTML meta tag to tell robots not to index the content of a page, and/or not to scan it for links to follow. Example... The "NAME" attribute must be "ROBOTS". Valid values for the "CONTENT" attribute are "INDEX", "NOINDEX", "FOLLOW", and "NOFOLLOW". Multiple comma-separated values are allowed, but only some combinations make sense. If there is no robots tag, the default is "INDEX,FOLLOW", so there is no need to spell that out. Example of Tag
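A robot might apply these rules roughly as in the sketch below, which uses Python's standard `html.parser`. The sample page, the class name `RobotsMetaParser`, and the function `robots_policy` are assumptions made for illustration; the sample tag is one plausible form built from the attribute names and values listed above, not necessarily the example shown on the original slide.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT values of any robots meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":   # HTMLParser reports tag and attribute names in lowercase
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for value in attr.get("content", "").split(","):
                if value.strip():
                    self.directives.add(value.strip().upper())


def robots_policy(html):
    """Return (may_index, may_follow); the default is INDEX,FOLLOW."""
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "NOINDEX" not in parser.directives
    may_follow = "NOFOLLOW" not in parser.directives
    return may_index, may_follow


# Hypothetical page that asks robots neither to index it nor to follow its links.
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(robots_policy(page))
print(robots_policy("<html><head></head></html>"))
```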

Search Engine Web Spider Names
Yahoo! Search - Yahoo Slurp
AltaVista - Scooter
Ask Jeeves - Ask Jeeves/Teoma
MSN Search - MSNbot
Visit For more details on Search Engine Web Spider Names.

Bonus

Google: Anatomy. Google crawlers (Googlebot): multiple distributed crawlers; own DNS cache; 300 connections open at once; send fetched pages to the store server; originally written in Python.

Google: Technology. PageRank™ algorithm; hypertext-matching analysis.

Google Webmaster Central. Webmaster Central offers services that let you: see which parts of a site Googlebot had problems crawling; upload an XML Sitemap file; analyze and generate robots.txt files; remove URLs already crawled by Googlebot; specify the preferred domain; identify issues with title and description meta tags; understand the top searches used to reach a site; get a glimpse of how Googlebot sees pages; remove unwanted sitelinks that Google may use in results.

Protect your privacy on the Web
When surfing the internet, avoid “free” offers and protect your information!
Chatting: guard your information unless you are 100% sure who you are chatting with.
Cookies aren't just for eating; they may be sending your personal information to others.
Protect your passwords like you would your wallet or car keys. Make them complicated!
Email is not secure and should never be thought of as private.
Don't even open spam; download a spam buster!
Beware of phishing: fake emails sent to try to gain your personal and financial information.

For more Details Visit