INFO624 - Week 4 Query Languages and Query Operations Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.

Query
- A query is a representation of the user’s information needs.
  - It may not represent the information needs exactly because:
    - Information needs are difficult to describe (semantic difficulty).
    - The query must be in a format acceptable to the retrieval system (syntactic difficulty).

Content-based queries
- Words
- Phrases
- Proximity
- Pattern matching
  - Word matching
  - Prefix/suffix
  - Wildcard search
  - Error handling
  - Extended patterns
- Boolean
- Vector
- Natural language
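A minimal sketch of the pattern-matching query types listed above, assuming a small in-memory vocabulary. The vocabulary and the function names are invented for illustration; the error-handling step uses Python's `difflib` similarity as a stand-in for whatever tolerance a real system applies.

```python
import difflib
import fnmatch

# Hypothetical vocabulary of index terms, invented for this example.
VOCABULARY = ["retrieval", "retrieve", "retrieving",
              "information", "informatics", "reform"]

def prefix_match(prefix, vocabulary):
    """Prefix search: terms that start with the given string."""
    return [t for t in vocabulary if t.startswith(prefix)]

def suffix_match(suffix, vocabulary):
    """Suffix search: terms that end with the given string."""
    return [t for t in vocabulary if t.endswith(suffix)]

def wildcard_match(pattern, vocabulary):
    """Wildcard search: * matches any run of characters, ? one character."""
    return [t for t in vocabulary if fnmatch.fnmatchcase(t, pattern)]

def tolerant_match(word, vocabulary, cutoff=0.8):
    """Error handling: terms close to a possibly misspelled word."""
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)

print(prefix_match("retriev", VOCABULARY))
print(wildcard_match("inform*", VOCABULARY))
print(tolerant_match("retreival", VOCABULARY))
```

Each function widens the match in a different way, which is why these query types tend to raise recall at some cost in precision.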

Boolean Queries
- Request: What are the likely problems when someone gets hurt on his knees when playing basketball?
- Write your best Boolean query for this request.
- If the query returns zero hits, how do you modify the query?
- If the query returns too many hits, how do you modify the query?
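One way to explore the exercise above is a toy Boolean evaluator over an inverted index. This is a sketch under stated assumptions: the four documents are invented, and AND, OR, and NOT are modeled as set intersection, union, and difference. It is not a model answer to the exercise.

```python
# Toy collection, invented for illustration.
DOCS = {
    1: "knee injury in basketball players",
    2: "basketball training drills",
    3: "knee ligament tear and sports injury",
    4: "ankle injury in basketball",
}

def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

INDEX = build_index(DOCS)

def term(word):
    """Posting set for one word (empty if the word is not indexed)."""
    return INDEX.get(word, set())

# Literal wording of the request: "hurt" is not an index term, so zero hits.
strict = term("knee") & term("hurt") & term("basketball")

# Zero hits: broaden by OR-ing alternative terms within a concept.
broader = term("knee") & (term("injury") | term("tear")) & term("basketball")
broadest = (term("knee") & (term("injury") | term("tear"))
            & (term("basketball") | term("sports")))

# Too many hits: narrow by AND-ing another concept or excluding one with NOT.
too_many = term("injury")
narrower = term("injury") & term("knee")
without_ankle = term("injury") - term("ankle")

print(strict, broader, broadest, too_many, narrower, without_ankle)
```

OR within a concept tends to raise recall, while AND across concepts and NOT tend to raise precision.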

How does AskJeeves translate the request?
- What are the likely problems when someone gets hurt on his knees when playing basketball?

Construct your best Boolean query for this request:
- I am doing research on personal space boundaries. I want to know if there are any sex or race differences in personal space boundaries.

Interaction with Queries
- Start with a SEED query.
  - The system responds with a list of related terms.
- Add selected terms from the list to the query.
  - The system updates the list of related terms.
- Repeat as needed.
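The loop above can be sketched as follows, assuming that "related terms" means terms that co-occur with the query in matching documents. That definition, the documents, and the function names are all assumptions made for this example; a real search assistant may derive related terms differently.

```python
from collections import Counter

# Each document is reduced to its set of index terms (invented data).
DOCS = [
    {"asthma", "children", "inhaler"},
    {"asthma", "children", "allergy"},
    {"asthma", "inhaler", "steroid"},
    {"allergy", "pollen"},
]

def matching_docs(query, docs):
    """Documents that contain every term of the query."""
    return [d for d in docs if query <= d]

def related_terms(query, docs, limit=3):
    """Rank non-query terms by how often they co-occur with the query."""
    counts = Counter()
    for d in matching_docs(query, docs):
        counts.update(d - query)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:limit]]

query = {"asthma"}                       # the SEED query
suggestions = related_terms(query, DOCS)
print(suggestions)

query = query | {"inhaler"}              # the user adds a selected term
updated = related_terms(query, DOCS)     # the system updates the list
print(updated)
```

Each round narrows the set of matching documents, so the suggestion list changes as the query grows.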

Example: MedLine Search Assistant

Association-based Queries
- Find documents similar to this document.
- Find documents that link to this document:
  - Explicitly
  - Implicitly

Field-based Queries

- Field-based queries will likely improve search precision.
- Field-based queries require that the data source has a fixed structure and is indexed by that structure.
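A small sketch of why indexing by field can improve precision, assuming records with fixed `author` and `title` fields. The records and function names are invented for this example.

```python
# Invented records with a fixed structure (id, author, title).
RECORDS = [
    {"id": 1, "author": "smith", "title": "visual maps for retrieval"},
    {"id": 2, "author": "jones", "title": "a reply to smith on indexing"},
    {"id": 3, "author": "smith", "title": "indexing structured records"},
]

def build_field_index(records, fields):
    """Index every word under a (field, word) key instead of the word alone."""
    index = {}
    for record in records:
        for field in fields:
            for word in record[field].split():
                index.setdefault((field, word), set()).add(record["id"])
    return index

FIELD_INDEX = build_field_index(RECORDS, ["author", "title"])

def search_field(field, word):
    """Field-based query: the word must occur in the named field."""
    return FIELD_INDEX.get((field, word), set())

def search_anywhere(word):
    """Unfielded query: the word may occur in any field."""
    hits = set()
    for (_, indexed_word), ids in FIELD_INDEX.items():
        if indexed_word == word:
            hits |= ids
    return hits

print(search_anywhere("smith"))          # also matches a title that mentions smith
print(search_field("author", "smith"))   # only records written by smith
```

The unfielded search returns record 2 because "smith" appears in its title. The field-based search excludes it, which is the precision gain the slide refers to.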

Citation-based Queries
- Retrieve all documents that document A cites.
- Find all documents that cite document A.
- Find all documents that cite this author.
- Find all documents that cite both document A and document B.
- Find documents that cite both author A and author B.
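The document-level queries above can be sketched over a citation table. The table is invented, and the function names are assumptions for this example. Author-level queries would also need a mapping from documents to authors, which is left out here.

```python
# Invented citation data: each document maps to the set of documents it cites.
CITES = {
    "A": {"X", "Y"},
    "B": {"X"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
    "E": {"A"},
}

def references_of(doc):
    """All documents that the given document cites."""
    return CITES.get(doc, set())

def citing(doc):
    """All documents that cite the given document."""
    return {d for d, refs in CITES.items() if doc in refs}

def citing_both(doc_a, doc_b):
    """All documents that cite both given documents."""
    return citing(doc_a) & citing(doc_b)

print(references_of("A"))
print(citing("A"))
print(citing_both("A", "B"))
```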

Co-Citation
- The college has a tradition of more than 20 years of co-citation research.
- Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document.
[Diagram: a later Document 3 cites both Document 1 and Document 2; the relation between Document 1 and Document 2 is marked "?".]

Co-Citation Analysis
- The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
- Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.

Co-Citation Mapping
- Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
- Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
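Co-citation counting can be sketched as follows, assuming each later paper is given as a list of the items it cites. The reference lists and the threshold of two are invented for this example. For author co-citation, the same counting would run on cited author names instead of document ids.

```python
from collections import Counter
from itertools import combinations

# Invented reference lists of four later papers.
REFERENCE_LISTS = {
    "paper1": ["doc_a", "doc_b", "doc_c"],
    "paper2": ["doc_a", "doc_b"],
    "paper3": ["doc_b", "doc_c"],
    "paper4": ["doc_a", "doc_b", "doc_d"],
}

def cocitation_counts(reference_lists):
    """Count, for every pair of cited items, the papers that cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted(set(...)) removes repeats and gives each pair one fixed order.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def recurrent_pairs(counts, minimum=2):
    """Keep only pairs co-cited at least `minimum` times."""
    return {pair: n for pair, n in counts.items() if n >= minimum}

counts = cocitation_counts(REFERENCE_LISTS)
print(counts[("doc_a", "doc_b")])
print(recurrent_pairs(counts))
```

The resulting pair counts are the raw material for a co-citation map: pairs with higher counts would be placed closer together.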

A Map of Information Scientists

AuthorLinks

Link-Based Queries
- Hypertext structure
  - Is a link a query?
    - nformation+retrieval
    - This is called a query-mediated link.
    - It is also called a “soft link.”
  - Is a query a link?
    - Many pages are dynamically generated from a database or a search engine.
    - Your review pages
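A query-mediated link can be sketched as a URL that carries a query in its query string, so that following the link runs a search. The endpoint and the parameter name `q` below are hypothetical and are not taken from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical search endpoint, used only for illustration.
SEARCH_ENDPOINT = "https://search.example.org/find"

def soft_link(query):
    """Build a link whose target is the result of running the query."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})

def query_behind(link):
    """Recover the query that a soft link would execute."""
    return parse_qs(urlsplit(link).query)["q"][0]

link = soft_link("information retrieval")
print(link)
print(query_behind(link))
```

Read in one direction the link is a stored query, and in the other the query is a link to a dynamically generated page, which is the point of the two questions on the slide.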

Queries, Links, Is there a difference – SIGCHI’97
- An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.

Query Structure
- Hierarchical structure
  - What does the user want when searching for “substance abuse”?
  - We may not know, but adding narrower terms of “substance abuse” will likely get better results:
    - Alcohol Abuse
    - Drug Abuse
    - Alcohol-Related Disorders
    - Amphetamine-Related Disorders
    - Cocaine-Related Disorders
    - Marijuana Abuse

Automatic Expansion
- If there is a defined hierarchy, several search strategies may be defined to expand the query:
  - Search with the query term only.
  - Search with the query term and all the terms in its upper hierarchy.
  - Search with the query term and all the terms in its lower hierarchy.
  - Search with the query term and all its sibling terms.
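The four strategies above can be sketched over a child-to-parent table. The small tree is an assumption made for this example: it places a few of the narrower terms from the "substance abuse" slide directly under that term and adds an invented top term, "mental disorders". It does not reproduce any real thesaurus.

```python
# Hypothetical hierarchy stored as child -> parent.
PARENT = {
    "substance abuse": "mental disorders",
    "alcohol abuse": "substance abuse",
    "drug abuse": "substance abuse",
    "marijuana abuse": "substance abuse",
    "cocaine-related disorders": "substance abuse",
}

def broader_terms(term):
    """All terms above the given term, nearest first."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result

def narrower_terms(term):
    """All terms below the given term, at any depth."""
    result = []
    for child, parent in PARENT.items():
        if parent == term:
            result.append(child)
            result.extend(narrower_terms(child))
    return result

def sibling_terms(term):
    """Terms that share a parent with the given term."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [c for c, p in PARENT.items() if p == parent and c != term]

def expand(term, strategy="term_only"):
    """Return the list of terms to search for under one strategy."""
    if strategy == "broader":
        return [term] + broader_terms(term)
    if strategy == "narrower":
        return [term] + narrower_terms(term)
    if strategy == "siblings":
        return [term] + sibling_terms(term)
    return [term]

print(expand("substance abuse", "narrower"))
print(expand("alcohol abuse", "broader"))
print(expand("alcohol abuse", "siblings"))
```

The expanded list would typically be OR-ed together before searching, so the narrower and sibling strategies trade precision for recall.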

Query Operations
- Query execution
- Query expansion
- Query translation

Query Expansion
- Improve the initial query by automatically:
  - restructuring the query, or
  - adding other new terms, or
  - adjusting the weight of each term.

- Restructuring the query:
  - Identify key concepts through natural language processing.
  - Identify any field information that may be contained in the query.
    - Is this an author?
    - Is this a journal?
  - Reverse the term order in the query.

- Adding new terms:
  - Synonyms
  - Hierarchical terms
  - Scope terms
    - Does the query “Football” retrieve information on football or on soccer?
  - Relevant terms
    - Selected terms from relevant documents
    - Terms that co-occur most often with the query terms

- Adjusting term weighting
  - If relevant documents are known, increase the weights of terms assigned to the relevant documents and decrease the weights of terms assigned to non-relevant documents.
- Adjust term weights in a topic tree:
  - Fruit
    - Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; …; Macintosh, 0.1; Computer, -0.4.
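The weight adjustment described above can be sketched with a simple additive rule. The fixed step of 0.1, the starting weights, and the judged documents are assumptions made for this example. The rule is a simplification in the spirit of relevance feedback, not the specific formula used in the course.

```python
def adjust_weights(weights, relevant_docs, nonrelevant_docs, step=0.1):
    """Raise weights of terms in relevant documents and lower weights of
    terms in non-relevant documents, by a fixed step per document."""
    adjusted = dict(weights)
    for doc_terms in relevant_docs:
        for term in doc_terms:
            adjusted[term] = adjusted.get(term, 0.0) + step
    for doc_terms in nonrelevant_docs:
        for term in doc_terms:
            adjusted[term] = adjusted.get(term, 0.0) - step
    # Round to hide floating-point noise such as 0.7999999999999999.
    return {term: round(w, 2) for term, w in adjusted.items()}

# Invented starting weights and relevance judgments.
weights = {"fruit": 0.9, "apple": 0.7, "macintosh": 0.1}
relevant = [{"fruit", "apple"}]
nonrelevant = [{"macintosh", "computer"}]

print(adjust_weights(weights, relevant, nonrelevant))
```

Terms that appear only in non-relevant documents can end up with negative weights, which matches the negative weight for "Computer" in the topic-tree example.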

Query Translation
- From natural language to queries
  - AskJeeves
- From queries in one system to queries in another system
- From one natural language to another natural language
  - Altavista

Other types of representation for user’s needs?
- Mind-reading?
- Non-text queries?
- Gesture/motion?

IBM – Visualization Space
- This information system understands the user. It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.

Multimedia Queries
- Content-based
  - Text indexing
- Attribute-based
  - Color, size, type, time period, …
- Structure-based
  - Location, shape, layout, etc.
- Cluster-based
  - Semantic groups, physical groups, structure groups
- Example: find a photo that has the White House in the center.

Project Discussion
- Idea 1: Install and implement an IR system
  - Focus on system and technology
  - Need to have a collection
  - Need to have hands-on experience with systems
- Idea 2: Conduct an evaluation experiment on one or two selected IR systems
  - Focus on interfaces and users
- Idea 3: Customize an IR system
  - Focus on functionality and customization

Project Evaluation
- Topics
  - Relevance
  - Problems identified
  - Technical difficulties
  - Solutions/ideas
- The process
  - Design
  - Implementation
- The report
  - Background
  - Written
  - Oral

Midterm
- Concepts
  - What is information retrieval?
  - Data, information, text, and documents
  - Two abstraction principles
  - User’s information needs
  - Queries and query formats
  - Precision and recall
  - Relevance

Midterm
- Procedures & problem solving
  - How to translate a request into a query?
  - How to expand queries for better recall or better precision?
  - How to create an inverted index?
  - How to create a vector space?
  - How to calculate similarities of documents?
  - How to match a query to documents in a vector space?

- Discussions
  - Challenges of IR
  - Advantages and disadvantages of Boolean search (vector space, automatic indexing, association-based queries, etc.)
  - Evaluation of IR systems
    - With or without using precision/recall
  - Difference between data retrieval and information retrieval