INFO624 - Week 4: Query Languages and Query Operations. Dr. Xia Lin, Associate Professor, College of Information Science and Technology, Drexel University
Query: A query is a representation of the user’s information needs. It may not represent the information needs exactly because (1) information needs are difficult to describe (semantic difficulty), and (2) the query must be in a format acceptable to the retrieval system (syntactic difficulty).
Content-based queries: Words; Phrases; Proximity; Pattern matching (word matching, prefix/suffix, wildcard search, error handling, extended patterns); Boolean; Vector; Natural language.
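The pattern-matching options listed above can be illustrated with a small sketch. The vocabulary, the function names, and the use of edit distance for "error handling" are assumptions made for illustration; the slide does not say how any particular system implements these.

```python
import fnmatch

# Hypothetical index vocabulary, made up for this example.
VOCAB = ["retrieval", "retrieve", "retriever", "information", "informatics", "reform"]

def prefix_match(prefix, vocab):
    """Terms that start with the given prefix."""
    return [t for t in vocab if t.startswith(prefix)]

def suffix_match(suffix, vocab):
    """Terms that end with the given suffix."""
    return [t for t in vocab if t.endswith(suffix)]

def wildcard_match(pattern, vocab):
    """Shell-style wildcards: * matches any run of characters, ? exactly one."""
    return [t for t in vocab if fnmatch.fnmatchcase(t, pattern)]

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_match(word, vocab, max_errors=1):
    """Error-tolerant matching: terms within max_errors edits of the word."""
    return [t for t in vocab if edit_distance(word, t) <= max_errors]
```

For example, `wildcard_match("inform*", VOCAB)` selects the two terms beginning with "inform", and `fuzzy_match("retreive", VOCAB, max_errors=2)` recovers "retrieve" from a misspelling.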
Boolean Queries. Request: What are the likely problems when someone gets hurt on his knees when playing basketball? Write your best Boolean query for this request. If the query returns zero hits, how do you modify the query? If the query returns too many hits, how do you modify the query?
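One possible way to think about the two questions above is sketched here: OR broadens a query (more hits), AND narrows it (fewer hits). The tiny inverted index, the document ids, and the chosen terms are all hypothetical; this is not a model answer to the exercise.

```python
# Hypothetical inverted index: term -> set of document ids (made up for illustration).
INDEX = {
    "knee": {1, 2, 3, 5},
    "knees": {4},
    "injury": {1, 2, 4, 6},
    "hurt": {3},
    "basketball": {2, 3, 4, 7},
    "ligament": {2, 6},
}

def postings(term):
    """Documents containing the term (empty set if the term is not indexed)."""
    return INDEX.get(term, set())

def any_of(*terms):
    """OR: union of the postings of all given terms."""
    result = set()
    for term in terms:
        result |= postings(term)
    return result

def all_of(*doc_sets):
    """AND: intersection of the given document sets."""
    if not doc_sets:
        return set()
    result = set(doc_sets[0])
    for docs in doc_sets[1:]:
        result &= docs
    return result

# knees AND hurt AND basketball: too strict, no document contains all three words.
strict = all_of(postings("knees"), postings("hurt"), postings("basketball"))

# Zero hits: broaden each concept with OR (word variants, synonyms).
broad = all_of(any_of("knee", "knees"), any_of("injury", "hurt"), postings("basketball"))

# Too many hits: narrow by AND-ing in one more concept.
narrow = all_of(broad, postings("ligament"))
```

In this toy index the strict query returns nothing, the broadened query returns three documents, and the narrowed query returns one.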
How does AskJeeves translate the request? What are the likely problems when someone gets hurt on his knees when playing basketball?
Construct your best Boolean query for this request: I am doing research on personal space boundaries. I want to know if there are any sex or race differences in personal space boundaries.
Interaction with Queries: (1) The user starts with a SEED query. (2) The system responds with a list of related terms. (3) The user adds selected terms from the list to the query. (4) The system updates the list of related terms. (5) Repeat as needed.
Example: MedLine Search Assistant
Association-based Queries: Find documents similar to this document. Find documents that link to this document (explicitly or implicitly).
Field-based Queries
Field-based queries will likely improve search precision. Field-based queries require that the data source has a fixed structure and is indexed by that structure.
Citation-based Queries: Retrieve all documents that document A cites. Find all documents that cite document A. Find all documents that cite this author. Find all documents that cite both document A and document B. Find documents that cite both author A and author B.
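The citation-based queries above can be sketched over a small citation graph. The graph, the author table, and the function names below are hypothetical and only illustrate how each query maps onto a lookup.

```python
# Hypothetical citation graph: document -> set of documents it cites.
CITES = {
    "A": {"X", "Y"},
    "B": {"X"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "B", "X"},
}

# Hypothetical author of each cited document.
AUTHOR_OF = {"A": "Smith", "B": "Jones", "X": "Smith", "Y": "Lee"}

def references_of(doc):
    """All documents that `doc` cites."""
    return CITES.get(doc, set())

def citing(doc):
    """All documents that cite `doc`."""
    return {d for d, refs in CITES.items() if doc in refs}

def citing_both(doc_a, doc_b):
    """All documents that cite both `doc_a` and `doc_b`."""
    return citing(doc_a) & citing(doc_b)

def citing_author(author):
    """All documents that cite at least one work by `author`."""
    return {d for d, refs in CITES.items()
            if any(AUTHOR_OF.get(r) == author for r in refs)}

def citing_both_authors(author_a, author_b):
    """All documents that cite both authors."""
    return citing_author(author_a) & citing_author(author_b)
```

Here `references_of("A")` follows citations forward, while `citing("A")` needs the reverse direction; a real system would typically keep a second index for that rather than scanning every document.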
Co-Citation: The college has a tradition of more than 20 years of co-citation research. Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document. (Diagram: a later Document 3 cites both Document 1 and Document 2; the relation between Document 1 and Document 2 is marked "?".)
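As a minimal sketch of the definition above, the co-citation count of two documents is the number of later documents whose reference lists mention both. The reference lists here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: later document -> the earlier documents in its reference list.
REFERENCES = {
    "doc3": ["doc1", "doc2"],
    "doc4": ["doc1", "doc2", "doc5"],
    "doc6": ["doc2", "doc5"],
}

def cocitation_counts(references):
    """Count, for every pair of cited documents, how many later documents cite both."""
    counts = Counter()
    for cited in references.values():
        # Sort so each pair has one canonical order; set() ignores repeated references.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(REFERENCES)
```

With this data, `doc1` and `doc2` are co-cited by two later documents (`doc3` and `doc4`), so `counts[("doc1", "doc2")]` is 2. Adding new citing documents to `REFERENCES` can only raise the counts.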
Co-Citation Analysis: The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related. Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
Co-Citation Mapping: Detects patterns in the frequency with which any works by any two authors are jointly cited in later works. Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
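The idea that only recurrent co-citation counts can be sketched at the author level by discarding pairs below a threshold. The author labels, the data, and the threshold of 2 are assumptions; the slide does not state a specific cut-off.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: for each later work, the set of authors it cites.
CITED_AUTHORS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"C", "D"},
    {"A", "B", "D"},
]

def author_cocitations(cited_author_sets, min_count=2):
    """Author pairs cited together in at least `min_count` later works."""
    counts = Counter()
    for authors in cited_author_sets:
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    # Keep only recurrent pairs; one-off co-citations are dropped.
    return {pair: n for pair, n in counts.items() if n >= min_count}

links = author_cocitations(CITED_AUTHORS)
```

The surviving pairs and their counts are the raw material for a map: higher counts would place two authors closer together.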
A Map of Information Scientists
AuthorLinks
Link-Based Queries: Hypertext structure. Is a link a query? nformation+retrieval This is called a query-mediated link. It is also called a “soft link.” Is a query a link? Many pages are dynamically generated from a database or a search engine. Your review pages
Queries, Links, Is there a difference – SIGCHI’97: An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.
Query Structure: Hierarchical structure. What does the user want when searching for “substance abuse”? We may not know, but adding narrower terms of “substance abuse” will likely get better results: Alcohol Abuse; Drug Abuse; Alcohol-Related Disorders; Amphetamine-Related Disorders; Cocaine-Related Disorders; Marijuana Abuse.
Automatic Expansion: If there is a defined hierarchy, several search strategies may be defined to expand the query: (1) search with the query term only; (2) search with the query term and all the terms in its upper hierarchy; (3) search with the query term and all the terms in its lower hierarchy; (4) search with the query term and all its sibling terms.
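The four expansion strategies above can be sketched over a parent-pointer hierarchy. The tree below is a toy arrangement made up for illustration (it is not the actual thesaurus structure, and "Health Problems" is an invented top term); the strategy names are also assumptions.

```python
# Hypothetical hierarchy: term -> its broader (parent) term.
PARENT = {
    "Substance Abuse": "Health Problems",
    "Alcohol Abuse": "Substance Abuse",
    "Drug Abuse": "Substance Abuse",
    "Marijuana Abuse": "Drug Abuse",
}

def ancestors(term):
    """All broader terms, nearest first (the upper hierarchy)."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found

def children(term):
    """Terms whose direct parent is `term`."""
    return [t for t, parent in PARENT.items() if parent == term]

def descendants(term):
    """All narrower terms, depth first (the lower hierarchy)."""
    found = []
    for child in children(term):
        found.append(child)
        found.extend(descendants(child))
    return found

def siblings(term):
    """Other terms that share the same parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [t for t in children(parent) if t != term]

def expand(term, strategy="term_only"):
    """Return the list of terms to search for under the chosen strategy."""
    strategies = {
        "term_only": lambda t: [],
        "upper": ancestors,
        "lower": descendants,
        "siblings": siblings,
    }
    return [term] + strategies[strategy](term)
```

The expanded list would normally be OR-ed together in the search. Expanding downward tends to keep the query on topic, while expanding upward or sideways trades precision for recall.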
Query Operations: Query execution; Query expansion; Query translation.
Query Expansion: Improve the initial query automatically by restructuring the query, adding other new terms, or adjusting the weight of each term.
Restructuring the query: Identify key concepts through natural language processing. Identify any field information that may be contained in the query (Is this an author? Is this a journal?). Reverse term order in the query.
Adding new terms: Synonyms. Hierarchical terms. Scope terms (does the query “Football” retrieve information on football or on soccer?). Relevant terms: selected terms from relevant documents, or terms that co-occur most often with the query terms.
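The last option above, adding terms that co-occur most often with the query terms, can be sketched as follows. The documents are invented, and counting co-occurrence at the whole-document level is one simple choice among several (a window of nearby words is another).

```python
from collections import Counter

# Hypothetical mini collection, one string per document.
DOCS = [
    "football league goal",
    "football soccer goal",
    "soccer keeper",
    "football league touchdown",
]

def cooccurring_terms(query_terms, docs, top_k=2):
    """Terms that appear most often in documents sharing a term with the query."""
    query = set(query_terms)
    counts = Counter()
    for text in docs:
        words = set(text.split())
        if words & query:                  # document mentions a query term
            counts.update(words - query)   # count its other terms once each
    # Most frequent first; ties broken alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

expanded_query = ["football"] + cooccurring_terms(["football"], DOCS)
```

With this collection the query "football" picks up "goal" and "league", each of which appears in two of the three documents that mention it.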
Adjusting term weighting: If relevant documents are known, increase the weights for terms assigned to the relevant documents and decrease the weights for terms assigned to non-relevant documents. Adjust term weights in a topic tree, e.g., for Fruit: Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; ….; Macintosh, 0.1; Computer, -0.4.
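One common way to realize the rule above is a Rocchio-style update, which the slide does not name; it is used here as an assumed stand-in. The `beta` and `gamma` values and the example vectors are arbitrary. Negative weights are kept rather than clipped, in line with the negative weight shown in the topic-tree example.

```python
def adjust_weights(query, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Rocchio-style update over sparse term -> weight dictionaries.

    new weight = old weight + beta * (mean weight in relevant documents)
                            - gamma * (mean weight in non-relevant documents)
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)

    def mean_weight(term, docs):
        # Average weight of the term across docs (0.0 if there are no docs).
        if not docs:
            return 0.0
        return sum(doc.get(term, 0.0) for doc in docs) / len(docs)

    return {
        term: query.get(term, 0.0)
        + beta * mean_weight(term, relevant)
        - gamma * mean_weight(term, nonrelevant)
        for term in terms
    }

# Hypothetical feedback: the user wanted the fruit, not the computer company.
new_query = adjust_weights(
    query={"apple": 1.0},
    relevant=[{"apple": 1.0, "fruit": 1.0}],
    nonrelevant=[{"apple": 1.0, "computer": 1.0}],
)
```

After the update, "fruit" enters the query with a positive weight and "computer" with a negative one, so documents about computers are pushed down in the ranking.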
Query Translation: From natural language to queries (AskJeeves). From queries in one system to queries in another system. From one natural language to another natural language (Altavista).
Other types of representation for user’s needs? Mind-reading? Non-text queries? Gesture/motion?
IBM – Visualization Space: This information system understands the user. It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.
Multimedia Queries: Content-based (text indexing). Attribute-based (color, size, type, time period, …). Structure-based (location, shape, layout, etc.). Cluster-based (semantic groups, physical groups, structure groups). Example: find a photo that has the White House in the center.
Project Discussion. Idea 1: Install and implement an IR system (focus on system and technology; need to have a collection; need to have hands-on experience with systems). Idea 2: Conduct an evaluation experiment on one or two selected IR systems (focus on interfaces and users). Idea 3: Customize an IR system (focus on functionality and customization).
Project Evaluation. Topics: relevance; problems identified; technical difficulties; solutions/ideas. The process: design; implementation.
The report: background; written; oral.
Midterm. Concepts: What is information retrieval? Data, information, text, and documents. Two abstraction principles. User’s information needs. Queries and query formats. Precision and recall. Relevance.
Midterm. Procedures & problem solving: How to translate a request into a query? How to expand queries for better recall or better precision? How to create an inverted index? How to create a vector space? How to calculate similarities of documents? How to match a query to documents in a vector space?
Discussions: Challenges of IR. Advantages and disadvantages of Boolean search (vector space, automatic indexing, association-based queries, etc.). Evaluation of IR systems, with or without using precision/recall. Difference between data retrieval and information retrieval.