Information Retrieval (IR) on the Internet

Contents
• Definition of IR
• Performance Indicators of IR systems
• Basics of an IR system
• Some IR Techniques
• Search Engines
• Challenges faced by IR on the Internet
• Conclusion

What is IR
• IR refers to searching through documents on the Internet and presenting the ones relevant to the search terms
• Presenting ONLY relevant documents is a challenge
• Hence IR systems are measured against certain performance indicators

Performance Indicators
• Response time
  - Time taken to present results
  - Not really an issue these days
• Precision
  - Percentage of the results that are relevant
• Recall
  - Percentage of ALL relevant documents on the Internet that were presented

Performance Indicators contd.
• More on Recall
  - It cannot actually be calculated for the Internet, because that requires knowing ALL relevant documents
  - If ALL relevant documents were known, a search could simply return ONLY those documents
• Neither indicator takes the user into account, and it should
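Precision and recall can only be computed exactly in a small test collection where every relevant document for a query is already known, which is the limitation noted for recall. The following is a minimal Python sketch under that assumption; the function name `precision_recall` and the document IDs and relevance judgments are hypothetical.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) as fractions between 0 and 1.

    returned: documents presented by the system for one query
    relevant: ALL documents relevant to that query (assumed known,
              as in a hand-judged test collection)
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually presented
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments for one query over a tiny test collection.
returned = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d7", "d8", "d9"]
p, r = precision_recall(returned, relevant)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=50% recall=40%
```

Two of the four presented documents are relevant (precision 50%), but only two of the five relevant documents were found (recall 40%).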

Basics of an IR system
• An IR system has three main concerns
  - Create an abstract view of the search terms
  - Create an abstract view of the documents
  - Match both views
• Once all three are achieved, the IR system is working properly

Basics of an IR system contd.
[Diagram: Search terms, Documents, Keywords, Matching, Abstraction, Feedback, Resulting docs]

Basics of an IR system contd.
• The process of arriving at a successful abstract view of the search terms is the query formulation process
• The process of arriving at keywords that represent a document and point to it is the indexing process
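The three concerns (abstracting the search terms, abstracting the documents, matching the two views) can be illustrated with a deliberately simple model. This Python sketch assumes the abstract view is just a set of lower-case keywords with a few stop words removed, and that matching means sharing at least one keyword; the stop-word list, function names and sample documents are all hypothetical.

```python
import re

# Hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "for", "is"}


def abstract_view(text):
    """Reduce free text to a set of lower-case keywords (its abstract view)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def match(query_view, doc_view):
    """Score a document by how many query keywords it shares."""
    return len(query_view & doc_view)


docs = {
    "d1": "Information retrieval on the Internet",
    "d2": "Cooking pasta in the Italian style",
    "d3": "Evaluating retrieval systems: precision and recall",
}
query = abstract_view("retrieval of information")           # query formulation
doc_views = {d: abstract_view(t) for d, t in docs.items()}  # indexing
results = [d for d in docs if match(query, doc_views[d]) > 0]
print(results)  # ['d1', 'd3']
```

Applying the same `abstract_view` to both sides is the design point: the query and the documents only become comparable once they are reduced to the same kind of representation.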

Some Techniques used for IR
• Indexing
• Ranking

Indexing (IR Technique)
• Reducing a document to keywords/search terms
• Using these keywords as pointers to the document
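One common way to make keywords act as pointers to documents is an inverted index, which maps each keyword to the documents that contain it. The slides do not name a specific data structure, so this Python sketch is an illustration only; `build_inverted_index` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the set of document IDs containing it.

    docs: dict of document ID -> document text.
    Each keyword then acts as a pointer back to its documents.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


docs = {
    "d1": "Search engines index the web",
    "d2": "Metadata describes a web page",
}
index = build_inverted_index(docs)
print(sorted(index["web"]))       # ['d1', 'd2']
print(sorted(index["metadata"]))  # ['d2']
```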

Some Approaches to Indexing
• Manual Indexing
  - Documents are indexed by hand, as the name suggests
  - Not feasible for the Internet as a whole because of its size
• Metadata
  - Data attached to a web page, not displayed to the reader, that describes the contents of the page
  - e.g. the Dublin Core Metadata Element Set, which proposes a 15-element set holding data such as creator, title and subject
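One common convention for embedding Dublin Core elements in a web page is a set of HTML `<meta>` tags named `DC.Title`, `DC.Creator` and so on, which the browser does not display. Assuming that convention, this Python sketch uses the standard-library `html.parser` module to collect those elements so an indexer could use them; the class name `DublinCoreParser` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.lower().startswith("dc."):
            # e.g. "DC.Creator" -> "creator"
            self.elements[name[3:].lower()] = attrs.get("content") or ""


# Hypothetical page: the metadata sits in <head> and is not shown to the reader.
page = """
<html><head>
  <meta name="DC.Title" content="An Example Page">
  <meta name="DC.Creator" content="J. Smith">
  <meta name="DC.Subject" content="information retrieval">
</head><body>Visible text the reader sees.</body></html>
"""
parser = DublinCoreParser()
parser.feed(page)
print(parser.elements)
```

As the challenges slide points out, nothing forces authors to supply these tags, so an indexer cannot rely on them being present.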

An Indexing technique
• Term weighting
  - Keywords do not all have the same strength
  - Each keyword is assigned a numerical value, called its weight; the higher the weight, the more relevant the keyword
  - Weights can be based on term frequency (how often the keyword occurs in the document), on inverse document frequency (how few documents in the collection contain it), or on the product of the two (tf-idf)
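The following Python sketch shows one standard tf-idf variant: raw term frequency multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants (log-scaled tf, smoothed idf) exist; this choice, the function name `tf_idf_weights` and the three sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return {doc_id: {term: weight}} using raw tf times log(N / df)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    weights = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)  # term frequency within this document
        weights[doc_id] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights


docs = {
    "d1": "boxer boxer dog",
    "d2": "dog food",
    "d3": "boxer shorts",
}
w = tf_idf_weights(docs)
print(round(w["d1"]["boxer"], 3))  # 0.811
print(round(w["d2"]["food"], 3))   # 1.099
```

"boxer" occurs twice in d1 but also appears in d3, so its idf is lower than that of "food", which occurs in only one document. A term that appears in every document gets weight 0 under this variant.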

Ranking (IR Technique)
• Uses the term weights of each document to give results priority
  - The weights of the matching keywords are summed, and results are ordered by this sum in descending order
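Ranking by summed keyword weights can be sketched in a few lines of Python. The sketch assumes the weights have already been computed (for example by a tf-idf step) and stored per document; the function name `rank` and the weight values are hypothetical.

```python
def rank(query_terms, doc_weights):
    """Order documents by the sum of the weights of the query terms they contain.

    doc_weights: {doc_id: {term: weight}}, e.g. from a tf-idf step.
    Documents containing none of the query terms are left out.
    """
    scores = {}
    for doc_id, weights in doc_weights.items():
        score = sum(weights.get(t, 0.0) for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest total weight first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical precomputed keyword weights for three documents.
doc_weights = {
    "d1": {"boxer": 0.75, "dog": 0.5},
    "d2": {"dog": 0.5, "food": 1.0},
    "d3": {"boxer": 0.25, "shorts": 1.0},
}
print(rank(["boxer", "dog"], doc_weights))
# [('d1', 1.25), ('d2', 0.5), ('d3', 0.25)]
```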

Search Engines
• These are an integral part of IR on the Internet
• They receive search terms and match them with relevant documents
• They consult only their indices, as accessing the entire documents at search time would degrade performance and be too costly
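To see how a query can be answered from the index alone, consider a query that requires every search term to be present. This Python sketch intersects the posting sets of a prebuilt keyword-to-documents index and never opens a document; the AND semantics, the function name `and_query` and the index contents are assumptions for illustration, not a description of any particular search engine.

```python
def and_query(index, terms):
    """Return IDs of documents containing every term, using only the index.

    index: {term: set of doc IDs}; the documents themselves are never opened.
    """
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    # Intersect the smallest posting set first to keep the work small.
    postings.sort(key=len)
    result = set(postings[0])  # copy, so the index itself is not modified
    for p in postings[1:]:
        result &= p
    return result


# Hypothetical prebuilt index (keyword -> documents containing it).
index = {
    "search": {"d1", "d4"},
    "engine": {"d1", "d2", "d4"},
    "metadata": {"d3"},
}
print(sorted(and_query(index, ["Search", "engine"])))  # ['d1', 'd4']
```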

Some Challenges
• The size of the Internet
  - Research shows that only 60% of the Internet is indexed by search engines
  - Any one search engine indexes only 3 to 34% of the Internet
Kobayashi, M. & Takeda, K. (2000). “Information Retrieval on the Web”, [online] in ACM Computing Surveys, Vol. 32, No. 2, June 2000.

Some Challenges contd.
• Even indexed documents may later be amended, replaced or removed altogether, which can make the index inaccurate
• Metadata proposals cannot be enforced
  - e.g. some academic journal articles carry no date
• Users may be unclear about their information need, which affects the search terms they provide

Conclusion
• IR is important because we thrive on information
• Despite its challenges, IR achieves a decent level of success, sometimes with the initial set of search terms and sometimes after a few attempts
• Much work is under way to improve IR techniques, and it is my belief that a breakthrough will be achieved soon

Thank you. Any questions?