Web Indexing. ICE0534 – Web-based Software Development, July 21, 2005. Seonah Lee.


Contents
- News related to Web Indexing
- Web Indexing?
- Web Indexing: Styles
- Web Indexing: Tools
- Web Indexing in Search Engine
- Web Indexing in Google
- Summary
- References
- Question

Google tests tool to aid Web indexing
By Dawn Kawamoto, CNET News.com, Monday, June :00 AM

Web Indexing?
- Creating indexes for:
  - individual web sites
  - intranets
  - collections of HTML documents
  - collections of web sites
- Purpose: helping users find information using a variety of keywords, and gathering similar information together

Web Indexing?
- Indexes
  - Systematically arranged items
  - Entry points that lead directly to the desired information within a larger document or set of documents
- Indexing
  - An analytic process of determining which concepts are worth indexing, what entry labels to use, and how to arrange the entries

Web Indexing: Styles (1/2)
- Back-of-the-Book Style Web Indexing
  - Includes "A-Z indexes" to web sites or an intranet
  - Some web indexes take the form of a list of hierarchical categories arranged in alphabetical order

Web Indexing: Styles (2/2)
- Metadata and Web Indexing
  - Assigning keywords or phrases to web pages or web sites within a meta-tag field, so that the page or site can be retrieved by a search engine customized to search the keywords field
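As a rough illustration of metadata-based indexing, the sketch below reads the keywords meta tag from a few pages and builds a keyword-to-page lookup. The page names and HTML snippets are made up for the example, and the helper names (MetaKeywordParser, extract_keywords, build_keyword_index) are hypothetical; only Python's standard html.parser module is assumed.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        for keyword in content.split(","):
            keyword = keyword.strip().lower()
            if keyword:
                self.keywords.append(keyword)


def extract_keywords(html_text):
    """Return the keywords declared in the page's keywords meta tag."""
    parser = MetaKeywordParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


def build_keyword_index(pages):
    """Map each keyword to the sorted list of page names that declare it."""
    index = defaultdict(list)
    for url in sorted(pages):
        for keyword in extract_keywords(pages[url]):
            if url not in index[keyword]:
                index[keyword].append(url)
    return dict(index)


# Hypothetical pages, used only to exercise the functions above.
pages = {
    "a.html": '<html><head><meta name="keywords" content="web indexing, metadata"></head></html>',
    "b.html": '<html><head><meta name="keywords" content="metadata, search engine"></head></html>',
}
index = build_keyword_index(pages)
print(index["metadata"])
```

A search engine customized for the keywords field would then answer a query by looking the term up in this mapping instead of scanning the page text.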

Web Indexing: Tools

Category                      | Description                                                                       | Tools
------------------------------|-----------------------------------------------------------------------------------|----------------------------
Standalone or dedicated tools | Usually used for back-of-the-book indexes                                         | HTML Indexer, XRefHT32
Embedded indexing             | The process of creating index entries electronically in a document's files        | FrameMaker, Microsoft Word
Tagging                       | Inserts numbered dummy tags in the files, and then builds the index separately    | In-house tools
Keywording                    | Used primarily in online help materials                                           | RoboHelp HTML
Utilities and add-ons         | Convert an ASCII index file to HTML documents                                     | HTML/Prep
Text searching tools          | Aspects of information retrieval that indexers are very interested in             | SWISH

Web Indexing: The Most Famous Tool
- HTML Indexer, by Brown Inc.

Web Indexing in Search Engine
- Phases of work of a Web search engine (SE)
  - Document gathering
  - Document indexing
  - Searching in response to a query
  - Visualization of search results
[Diagram labels: Parse Query, Gathering, Indexing, Rank or Match, The Web, Visualization]
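The four phases can be sketched end to end with a tiny inverted index. This is a minimal sketch, not how any particular engine works: gathering is replaced by a hard-coded dictionary of hypothetical documents, matching is a plain "all terms must appear" rule, and the function names (tokenize, build_index, search) are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexing phase: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search phase: parse the query, then keep documents matching every term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Gathering phase stand-in: hypothetical documents instead of a real crawl.
docs = {
    "a.html": "Web indexing creates indexes for web sites",
    "b.html": "A search engine gathers documents from the web",
    "c.html": "Indexing tools for back-of-the-book indexes",
}
index = build_index(docs)

# Visualization phase stand-in: print the matching document names.
for doc_id in search(index, "web indexing"):
    print(doc_id)
```

Because the index is built once and queried many times, the search phase only touches the posting sets of the query terms rather than every gathered document.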

Web Indexing in Search Engine
- Almost every Web search engine uses a slightly different indexing technique
  - The parsing discards some HTML markup
  - Some give different weights to terms in different HTML fields
  - Some do not index the full text of the document, but only part of it
  - Some make full use of "metadata"
  - Very few make use of the information provided by linking: HITS and PageRank (Google)
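To make the field-weighting idea concrete, here is a small sketch that discards the markup and scores each term according to the HTML field it appears in. The weights (title 3, h1 2, everything else 1) and the class name FieldWeightParser are assumptions chosen for illustration; the slides do not say which fields or weights any engine uses.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: terms in the title count most, body text least.
FIELD_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


class FieldWeightParser(HTMLParser):
    """Drops the tags and accumulates a weighted score per term."""

    def __init__(self):
        super().__init__()
        self.field = "body"
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.field = tag

    def handle_endtag(self, tag):
        if tag == self.field:
            self.field = "body"

    def handle_data(self, data):
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[term] += FIELD_WEIGHTS[self.field]


def weighted_terms(html_text):
    """Return a Counter of term scores for one HTML document."""
    parser = FieldWeightParser()
    parser.feed(html_text)
    parser.close()
    return parser.scores


page = (
    "<html><head><title>Web Indexing</title></head>"
    "<body><h1>Indexing tools</h1><p>Indexing the web with tools.</p></body></html>"
)
print(weighted_terms(page).most_common(3))
```

Indexing only part of a document, as some engines do, would amount to ignoring the text of selected fields instead of giving it a lower weight.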

Web Indexing in Google
- PageRank
  - Google assigns a number called the PageRank to every web page that it knows about
  - Assumption: a page is important if other important web pages link to it
  - Each page = a node; each directed edge = a link from one page to another
[Diagram: link graph with nodes labeled Main Page, Google, This Page, Yahoo]

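Web Indexing in Google
- PageRank: Example
  - Three pages with PageRanks R1, R2, R3 [link diagram]
  - Link equations: R1 = R3; R2 = R1 / 2; R3 = R1 / 2 + R2
  - Hence R1 = 2 R2 and R3 = R1
  - Assumption: an average page has a PageRank of 1, so R1 + R2 + R3 = 3
  - Result: R1 = 1.2, R2 = 0.6, R3 = 1.2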
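The example can be checked numerically. The sketch below assumes the link structure implied by the equations (page 1 links to pages 2 and 3, page 2 links to page 3, page 3 links to page 1) and uses the simplified form from the slide, with no damping factor; it is not a description of Google's production algorithm. Each page repeatedly splits its current rank evenly among the pages it links to, starting from a rank of 1 per page so that the average stays at 1.

```python
def simplified_pagerank(links, iterations=100):
    """Iterate the simplified PageRank equations (no damping factor).

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    rank = {page: 1.0 for page in links}  # average PageRank of 1
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in links}
        for source, targets in links.items():
            share = rank[source] / len(targets)  # rank is split evenly
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Link structure assumed from the equations R1 = R3, R2 = R1/2, R3 = R1/2 + R2.
links = {1: [2, 3], 2: [3], 3: [1]}
ranks = simplified_pagerank(links)
print({page: round(value, 3) for page, value in ranks.items()})
# prints {1: 1.2, 2: 0.6, 3: 1.2}
```

The total rank is preserved at every step, which is why the result already satisfies R1 + R2 + R3 = 3 without a separate normalization.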

Web Indexing in Google
- HITS (Hyperlink-Induced Topic Search)
  - Divides pages relating to a topic into two groups
    - Authorities: pages with good content about a topic
    - Hubs: pages that link to many authority pages on a topic (directory)
  - Iteratively calculates hub and authority scores for each page in the neighborhood and ranks results accordingly
    - A document that many pages point to is a good authority
    - A document that points to many authorities is a good hub; pointing to many good authorities makes for an even better hub
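The iterative calculation can be sketched as follows. This is a minimal version under stated assumptions: the neighborhood graph is a small made-up example, scores are rescaled to unit length after each step, and a fixed number of iterations stands in for a convergence test. The function names hits and normalize are chosen for the example.

```python
import math


def normalize(scores):
    """Scale the scores in place so their squares sum to 1."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm > 0:
        for page in scores:
            scores[page] /= norm


def hits(links, iterations=50):
    """Return (hub, authority) scores for a graph given as page -> linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to the page.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        normalize(auth)

        # Hub score: sum of the authority scores of the pages it points to.
        hub = {page: sum(auth[t] for t in links.get(page, ())) for page in pages}
        normalize(hub)
    return hub, auth


# Hypothetical neighborhood: three directory-like pages and two content pages.
graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"], "a1": [], "a2": []}
hub, auth = hits(graph)
print(sorted(auth, key=auth.get, reverse=True)[:2])
print(sorted(hub, key=hub.get, reverse=True)[:2])
```

In this graph a1 ends up as the strongest authority because all three hubs point to it, and h1 and h2 outrank h3 as hubs because they point to both authorities.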

Summary
- Web Indexing
- Web Indexing Styles
  - Back-of-the-Book Style Web Indexing
  - Metadata and Web Indexing
- Web Indexing Techniques in Google
  - HITS
  - PageRank

References
- News
- Definition
- Tools
- Theory: gerank.html

Question?