Information Retrieval Part 2
Sissi, 11/17/2008

Information Retrieval, continued
• Web-Based Document Search
• Page Rank
• Anchor Text
• Document Matching
• Inverted Lists

Page Rank
PR(A) = d + (1 - d) * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
• PR(A): the PageRank of page A.
• C(T): the number of outgoing links from page T.
• d: the minimum value assigned to any page.
• T1 ... Tn: the pages pointing to A.

Algorithm of Page Rank
1. Use the PageRank equation to compute the PageRank of each page in the collection, using the latest PageRanks of the other pages.
2. Repeat step 1 until no PageRank changes significantly.

Example
Initial values: PR(A) = PR(B) = PR(C) = 1, d = 0.1
In the first iteration:
• PR(A) = 0.1 + 0.9*(PR(B) + PR(C)) = 0.1 + 0.9*(1 + 1) = 1.9
• PR(B) = 0.1 + 0.9*(PR(A)/2) = 0.1 + 0.9*(1.9/2) = 0.955
• PR(C) = 0.1 + 0.9*(PR(A)/2) = 0.1 + 0.9*(1.9/2) = 0.955
Final values: PR(A) = 1.48, PR(B) = 0.76, PR(C) = 0.76
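The two algorithm steps and the worked example can be sketched in Python. This is a minimal illustration, not the presenter's code: the function name `pagerank`, the `tol` and `max_iter` parameters, and the link graph (A links to B and C; B and C each link only to A) are assumptions. The graph is inferred from the example equations, where PR(A) is divided by 2 and PR(B) and PR(C) are not divided. The update uses d as the minimum value, d + (1 - d) * sum, which matches the arithmetic of the example.

```python
def pagerank(links, d=0.1, tol=1e-6, max_iter=1000):
    """Iterate PR(A) = d + (1 - d) * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    out_count = {p: len(links[p]) for p in pages}  # C(T)
    incoming = {p: [t for t in pages if p in links[t]] for p in pages}
    pr = {p: 1.0 for p in pages}  # every page starts at 1, as in the example

    for _ in range(max_iter):
        biggest_change = 0.0
        for p in pages:
            # Step 1: apply the equation with the latest values, so pages
            # updated earlier in this pass already contribute their new rank.
            new = d + (1 - d) * sum(pr[t] / out_count[t] for t in incoming[p])
            biggest_change = max(biggest_change, abs(new - pr[p]))
            pr[p] = new
        # Step 2: stop once no PageRank changes significantly.
        if biggest_change < tol:
            break
    return pr


# Hypothetical three-page graph inferred from the example equations.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
first_pass = pagerank(links, max_iter=1)  # one pass over A, B, C
ranks = pagerank(links)                   # iterate until the values settle
```

With this graph, the first pass reproduces the first-iteration arithmetic of the example, and the settled values should land close to the final values quoted there. How close depends on the stopping threshold, which the slides do not state.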

Anchor Text
• The anchor text is the visible, clickable text in a hyperlink.
• For example: Wikipedia
• The anchor text is "Wikipedia"; the complex URL is displayed on the web page as "Wikipedia", which keeps the text or document clean and easy to read.

Anchor Text
• Anchor text usually gives the user relevant descriptive or contextual information about the content of the link's destination.
• The anchor text may or may not be related to the actual text of the link's URL.
• The words contained in the anchor text can influence the ranking that search engines give the page.
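To make the last point concrete, here is a small Python sketch of how a crawler might collect anchor text and credit its words to the destination page. It uses the standard-library `html.parser` module. The class name `AnchorCollector`, the sample page, and the `https://example.org/news` URL are all hypothetical, and real search engines weight anchor text in ways not described in these slides.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from the <a> elements of a page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None  # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an <a href=...> element counts as anchor text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page containing two links to the same destination.
page = (
    '<p>Today our troops have <a href="https://example.org/news">liberated '
    'another country</a> from tyranny. To know more, '
    '<a href="https://example.org/news">click here</a>.</p>'
)
collector = AnchorCollector()
collector.feed(page)

# Credit each anchor word to the page the link points to.
anchor_index = defaultdict(set)
for href, text in collector.pairs:
    for word in text.lower().split():
        anchor_index[word].add(href)
```

In this sketch the destination is indexed under "liberated", "another", and "country", but also under "click" and "here", which say nothing about its content.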

Common Misunderstanding
• Webmasters sometimes misunderstand anchor text.
• Instead of turning appropriate words inside a sentence into a clickable link, they frequently insert extra text.

Example
1. Today our troops have liberated another country from tyranny. To know more, click here. (The link is on "click here".)
2. The more concise way of coding that would be: Today our troops have liberated another country from tyranny. (The link is on "liberated another country".)

Anchor Text
• This proper method of linking benefits not only users but also webmasters, because anchor text carries significant weight in search engine ranking.
• Most search engine optimization experts recommend against using "click here" to designate a link.

Google Bomb
• In September 2000, the first Google bomb was created by Hugedisk Men's Magazine, a now-defunct online humor magazine.
• It linked the text "dumbmotherfucker" to a site selling George W. Bush-related merchandise.
• A Google search for this term would return the pro-Bush online store as its top result.
• After a fair amount of publicity, the George W. Bush-related merchandise site retained lawyers and sent a cease and desist letter to Hugedisk, thereby ending the Google bomb.

Known Google Bombs
• A search for "more evil than Satan" returns the home page of Microsoft.
• A search for "miserable failure", "worst president", or "unelectable" returns the official biography of George W. Bush on the White House website.
• A search for "out of touch executives" or "out of touch management" returns the home page of Google.
• Other, commercial uses

Document Matching
• The query is an arbitrarily long document, not just a few keywords.
• The goal is still to rank the relevant documents and output them as an ordered list.
• The most similar documents are found using the similarity measures described earlier.

Generalization of Searching
• Matching a document to a collection of documents looks like a tedious and expensive operation.
• Even for a short query, comparing it with every large document in the collection is a relatively intensive computation.

Example of Document Matching
• Consider an online help desk, where a complete description of a problem is submitted.
• That document could be matched to stored documents, hopefully finding descriptions of similar problems and their solutions without the user having to experiment with numerous keyword searches.
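The help desk scenario can be sketched in a few lines of Python. The similarity measures referred to earlier are not part of this transcript, so the sketch assumes cosine similarity over raw word counts. The function names, the ticket ids, and the ticket texts are all made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(query_text, collection):
    """Return (score, doc_id) pairs, most similar stored document first."""
    query = term_counts(query_text)
    scored = [
        (cosine(query, term_counts(text)), doc_id)
        for doc_id, text in collection.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical stored help desk tickets.
tickets = {
    "T1": "printer does not print after driver update",
    "T2": "cannot connect laptop to wireless network",
    "T3": "email attachment will not open",
}
problem = "After the latest driver update my printer refuses to print any page"
ranking = rank_documents(problem, tickets)
```

The whole problem description serves as the query, and the stored ticket about the printer driver comes out on top. Note that this version compares the query against every stored document, which is the cost that inverted lists are meant to avoid.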

Summary
1. Search engines and document matchers are not focused on classifying new documents.
2. Their primary goal is to retrieve the most relevant documents from a collection of stored documents.

Inverted Lists
• What are inverted lists?
• Instead of documents pointing to words, a list of words pointing to documents is the primary internal representation for processing queries and matching documents.
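A minimal Python sketch of that representation follows. The function name `build_inverted_lists` and the three sample documents are assumptions for illustration, and splitting on whitespace stands in for whatever tokenization a real system would use.

```python
from collections import defaultdict


def build_inverted_lists(documents):
    """Map each word to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical collection: document id -> text (documents pointing to words).
documents = {
    1: "page rank uses links",
    2: "anchor text describes links",
    3: "inverted lists speed up search",
}
# Inverted representation: word -> document ids (words pointing to documents).
inverted = build_inverted_lists(documents)
```

Looking up `inverted["links"]` now gives the documents that contain the word directly, without scanning the collection.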

Inverted Lists

Example
If the query contains words 100 and 200:
1) First process W(100) to compute the similarity S(i) of each document i: S(1) = 0 + 1, S(2) = 0 + 1, …
2) Then process W(200) in the same way: S(2) = 1 + 1, …
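The accumulation in this example can be sketched as follows. The posting lists are hypothetical: they contain only the documents that the example names (documents 1 and 2 for word 100, document 2 for word 200), and the real lists behind the elided entries may be longer. Counting one point per matching word is likewise an assumption taken from the "+1" steps.

```python
from collections import defaultdict


def score_query(query_words, inverted_lists):
    """Add 1 to S(i) for every query word whose inverted list contains document i."""
    scores = defaultdict(int)  # S(i) starts at 0 for every document
    for word in query_words:
        for doc_id in inverted_lists.get(word, []):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical inverted lists W(100) and W(200), truncated to the named documents.
inverted_lists = {100: [1, 2], 200: [2]}
ranking = score_query([100, 200], inverted_lists)
```

Only documents that appear in the inverted list of some query word are ever touched, which is why this is cheaper than comparing the query with every document in the collection.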

Summary
1. The inverted list is the key to the efficiency of information retrieval systems.
2. The inverted list has helped make nearest-neighbor methods a practical option for prediction.

Conclusion
1. Information retrieval methods are specialized nearest-neighbor methods, which are well-known prediction methods.
2. IR methods typically process unlabeled data, then order and display the retrieved documents.
3. IR methods involve no training and induce no new rules for classification.

Thank You!