Interfaces for Querying Collections
Information Retrieval Activities
Selecting a collection
–Lists, overviews, wizards, automatic selection
Submitting a request
–Queries & expressiveness
–Graphical interfaces
–Natural language
Examining the response
–Next class
Simple Query Interface
Complex Query Interface
Primary HCI Styles
Command language
Form filling
Menu selection
Direct manipulation
Natural language
Others?
Boolean Queries
Most commercial full-text retrieval systems (until recently) supported only Boolean queries.
Many studies show users have difficulty with Boolean expressions
–AND and OR are not used as "and" and "or" are in English: “cats and dogs” usually asks for both kinds of animal (a union), “tea or coffee” usually offers a choice of one
–Syntax for specifying nesting is often cryptic
The Boolean model does not include ranking
–Earlier systems used reverse chronological order
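The mismatch between Boolean operators and English can be made concrete with a small sketch. It assumes a toy in-memory collection and a plain inverted index; the documents and the helper names (`build_index`, `term`, `boolean_and`, `boolean_or`, `boolean_not`) are invented for illustration and do not come from any particular retrieval system.

```python
from collections import defaultdict

# Toy collection; the documents and ids are made up for illustration.
DOCS = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "dogs bark loudly",
    4: "tea and coffee are drinks",
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

INDEX = build_index(DOCS)

def term(word):
    """Posting set for one term (empty if the term is unknown)."""
    return set(INDEX.get(word, set()))

def boolean_and(a, b):
    return a & b  # documents containing both operands

def boolean_or(a, b):
    return a | b  # documents containing either operand

def boolean_not(a, universe=None):
    universe = set(DOCS) if universe is None else set(universe)
    return universe - a

# A user who types "cats AND dogs" meaning "documents about cats, and also
# documents about dogs" gets only the documents that mention both.
strict_and = boolean_and(term("cats"), term("dogs"))
# What that user probably wanted is the disjunction.
english_and = boolean_or(term("cats"), term("dogs"))
```

The results are plain sets with no inherent order, which is why a Boolean system has to fall back on something like date order to present them.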
Web-based Boolean Queries
Search engines built on Boolean or extended Boolean engines needed to make their systems usable by the Web audience
Reduce expressiveness for ease of use
–Use “all the words” and “any of the words”
–Boolean-based search engines added the + prefix to mark a term as required
Ranking performed using statistical algorithms and Web-specific heuristics
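A rough sketch of these simplified query forms follows. The page texts and function names are hypothetical, and the ranking is only a stand-in: it counts matching optional terms, whereas real engines used statistical weighting and Web-specific heuristics that are not modeled here.

```python
# Toy pages; the texts and ids are invented for illustration.
PAGES = {
    1: "cheap flights to paris",
    2: "paris hotels",
    3: "cheap hotels in rome",
    4: "rome travel guide",
}

def words_of(text):
    return set(text.lower().split())

def all_the_words(query, pages=PAGES):
    """'All the words' form option: an implicit conjunction."""
    terms = query.lower().split()
    return sorted(p for p, text in pages.items()
                  if all(t in words_of(text) for t in terms))

def any_of_the_words(query, pages=PAGES):
    """'Any of the words' form option: an implicit disjunction."""
    terms = query.lower().split()
    return sorted(p for p, text in pages.items()
                  if any(t in words_of(text) for t in terms))

def plus_search(query, pages=PAGES):
    """Terms with a + prefix are required; the rest only affect ranking."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            optional.append(token)
    hits = []
    for page_id, text in pages.items():
        words = words_of(text)
        if not all(t in words for t in required):
            continue
        # Stand-in score: number of optional terms present on the page.
        score = sum(1 for t in optional if t in words)
        if required or score > 0:
            hits.append((score, page_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [page_id for _, page_id in hits]
```

For example, `plus_search("+hotels cheap")` keeps only pages that mention hotels and lists the one that also mentions "cheap" first.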
Command Line Search
Command line interfaces for search
Example queries from Melvyl:
–FIND PA darwin and TW species or TW descent
–FIND TW Mt St. Helens AND DATE 1981
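To show what a command language asks of the user, here is a toy splitter for commands shaped like the two examples above. It is not Melvyl's actual grammar: it assumes every clause starts with an index code followed by its terms, treats any token spelled and/or as an operator, and makes no claim about operator precedence. The function names are hypothetical.

```python
OPERATORS = {"AND", "OR"}

def make_clause(tokens):
    """Turn ['TW', 'Mt', 'St.', 'Helens'] into ('TW', 'Mt St. Helens')."""
    if len(tokens) < 2:
        raise ValueError("each clause needs an index code and at least one term")
    return (tokens[0].upper(), " ".join(tokens[1:]))

def parse_find(command):
    """Split a FIND command into (index, terms) clauses and operators."""
    tokens = command.split()
    if not tokens or tokens[0].upper() != "FIND":
        raise ValueError("expected a command starting with FIND")
    parts, clause = [], []
    for token in tokens[1:]:
        if token.upper() in OPERATORS:
            # An operator closes the current clause.
            parts.append(make_clause(clause))
            parts.append(token.upper())
            clause = []
        else:
            clause.append(token)
    parts.append(make_clause(clause))
    return parts
```

Even this small grammar shows the burden on the user: index codes must be memorized, and a search term that happens to be the word "and" would be misread as an operator.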
Command Line Search
Still in use …
Form and Menus
Melvyl
Faceted Queries
Boolean queries often return too many or too few results
–Conjunctions reduce sets too quickly
–Disjunctions grow sets too quickly
Solution:
–Try out smaller queries to see whether each returns an appropriately sized set of results
–Combine the successful smaller queries into a larger query
Example:
1. (osteoporosis OR “bone loss”)
2. (drugs OR pharmaceuticals)
3. (prevention OR cure)
4. 1 AND 2 AND 3
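The osteoporosis example can be sketched in a few lines. The documents are invented, and the `matches` and `facet` helpers are hypothetical; the point is only that each facet is a disjunction whose result size can be inspected before the facets are intersected.

```python
# Toy collection; the texts are made up for illustration.
DOCS = {
    1: "new drugs slow bone loss and aid prevention of fractures",
    2: "osteoporosis prevention through diet and exercise",
    3: "pharmaceuticals offer no cure for osteoporosis yet",
    4: "drugs for heart disease prevention",
    5: "bone loss in astronauts",
}

def matches(text, alternative):
    """True if the term or phrase occurs as whole words in the text."""
    return f" {alternative.lower()} " in f" {text.lower()} "

def facet(alternatives, docs=DOCS):
    """One facet: documents matching any of the alternatives (a disjunction)."""
    return {doc_id for doc_id, text in docs.items()
            if any(matches(text, alt) for alt in alternatives)}

# Steps 1-3: run each small query and check its result size.
condition = facet(["osteoporosis", "bone loss"])
treatment = facet(["drugs", "pharmaceuticals"])
outcome = facet(["prevention", "cure"])
sizes = {"condition": len(condition), "treatment": len(treatment),
         "outcome": len(outcome)}

# Step 4: combine the facets that returned reasonable sets.
combined = condition & treatment & outcome
```

Checking `sizes` before intersecting is the step a single monolithic Boolean query hides from the user.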
Post-Coordinate or Quorum Ranking
Results are first ranked by how many facets of the query they match.
Faceted search with quorum ranking lets each concept be specified in multiple ways while ranking on the number of concepts a document includes.
A further extension is to allow users to weight each facet.
–Found on the Web to help balance different goals of a search (e.g. selecting a car or house)
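A minimal sketch of quorum ranking with optional facet weights, under stated assumptions: the documents, facet names, and weights are invented, and ties are broken by document id only to keep the output deterministic.

```python
# Toy collection; the texts are made up for illustration.
DOCS = {
    1: "new drugs slow bone loss and aid prevention of fractures",
    2: "osteoporosis prevention through diet and exercise",
    3: "pharmaceuticals offer no cure for osteoporosis yet",
    4: "drugs for heart disease prevention",
    5: "bone loss in astronauts",
}

# Each facet lists alternative ways of expressing one concept.
FACETS = {
    "condition": ["osteoporosis", "bone loss"],
    "treatment": ["drugs", "pharmaceuticals"],
    "outcome": ["prevention", "cure"],
}

def quorum_rank(docs, facets, weights=None):
    """Rank documents by the (optionally weighted) number of facets matched."""
    weights = weights or {}
    scored = []
    for doc_id, text in docs.items():
        padded = f" {text.lower()} "
        matched = [name for name, alternatives in facets.items()
                   if any(f" {alt} " in padded for alt in alternatives)]
        if matched:
            # Unweighted facets count 1.0 each, giving the plain quorum score.
            score = sum(weights.get(name, 1.0) for name in matched)
            scored.append((score, doc_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, score) for score, doc_id in scored]

plain = quorum_rank(DOCS, FACETS)
weighted = quorum_rank(DOCS, FACETS, weights={"condition": 3.0})
```

With equal weights, document 4 (two facets) outranks document 5 (one facet); once the condition facet is weighted more heavily, document 5 moves ahead of it.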
Result Size Problem Occurs with Web Search Too
Graphical Query Specification
Graphical interfaces can be static, direct manipulation, or combine the two.
Direct manipulation
–Continuous representation of objects
–Physical actions replace complex syntax
–Rapid incremental reversible operations on objects
–Immediate feedback on actions
Graphical Boolean Queries
Graphical queries are more accurate and faster than command-line queries in some studies
Venn diagrams are a common graphical approach
–Limited to three elements in conjunction
VQuery
–Lets users draw ellipses to create their own queries
VQuery
Process-Based Graphs
Can graphically represent the query as a process of selection.
Filter-flow model presents a set of filters.
–One attribute and set of potential values per filter; multiple values treated as disjunction
–Branches in flow indicate disjunctions
–Serialized filters indicate conjunctions
Fewer errors made with filter-flow than with SQL
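The three filter-flow rules can be sketched as follows. This is an illustration of the semantics only, not of the graphical interface; the records, attribute names, and function names are all hypothetical.

```python
# Hypothetical records flowing through the filters.
RECORDS = [
    {"name": "Ana", "dept": "sales", "city": "Boston", "level": "senior"},
    {"name": "Ben", "dept": "sales", "city": "Denver", "level": "junior"},
    {"name": "Cho", "dept": "support", "city": "Boston", "level": "senior"},
    {"name": "Dee", "dept": "legal", "city": "Austin", "level": "senior"},
]

def passes_filter(record, attribute, values):
    """One filter: one attribute, any of the selected values (disjunction)."""
    return record.get(attribute) in values

def run_path(records, path):
    """Filters in series: each one narrows the flow further (conjunction)."""
    for attribute, values in path:
        records = [r for r in records if passes_filter(r, attribute, values)]
    return records

def run_flow(records, branches):
    """Parallel branches: a record survives if any branch lets it through."""
    return [r for r in records
            if any(all(passes_filter(r, a, v) for a, v in path)
                   for path in branches)]

# (dept is sales OR support) AND (city is Boston)
series = run_path(RECORDS, [("dept", {"sales", "support"}),
                            ("city", {"Boston"})])

# (dept is sales AND level is junior) OR (city is Austin)
branched = run_flow(RECORDS, [
    [("dept", {"sales"}), ("level", {"junior"})],
    [("city", {"Austin"})],
])
```

The nesting that SQL expresses with parentheses is expressed here by the shape of the data structure, which mirrors how the diagram expresses it by layout.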
Filter-Flow
Block-diagram Visualization
Users arrange blocks to specify query.
STARS
–Users initially type in natural language query
–Query terms are turned into blocks
–Blocks are then arranged into query
–Blocks in same row represent conjunction
–Blocks in same column represent disjunction
–Allows for previewing the query results by simple rearrangement of blocks
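One way to read the row and column rules is sketched below. It assumes a layout is a list of columns, where blocks stacked in a column are alternatives (OR) and columns placed side by side in a row must all be satisfied (AND). This is an interpretation for illustration, not the STARS implementation, and the documents and names are invented.

```python
# Toy collection; the texts are made up for illustration.
DOCS = {
    1: "solar panel cost",
    2: "wind turbine cost",
    3: "solar energy subsidy",
    4: "wind energy",
}

def evaluate_layout(columns, docs=DOCS):
    """Evaluate a block layout: OR within a column, AND across columns."""
    result = set(docs)
    for column in columns:
        stacked = set()
        for block in column:
            # Blocks stacked in the same column widen the set.
            stacked |= {doc_id for doc_id, text in docs.items()
                        if block in text.lower().split()}
        # Columns side by side in a row narrow the set.
        result &= stacked
    return result

# Rearranging the same three blocks previews three different queries.
side_by_side = evaluate_layout([["solar"], ["wind"], ["cost"]])
mixed = evaluate_layout([["solar", "wind"], ["cost"]])
stacked_only = evaluate_layout([["solar", "wind", "cost"]])
```

Moving a block from one column to another changes the result set immediately, which is the preview-by-rearrangement behavior the slide describes.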
STARS
Magic Lenses
Lenses act as filters on an overview visualization.
–Disjunction is represented by independent lenses
–Conjunction is expressed by placing multiple lenses over one another
–Lenses can include additional information: where the term must appear, term frequency requirements, switches to use stemming, …
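A rough sketch of lens semantics, assuming documents are plotted as points on a 2D overview and each lens is a rectangle carrying one term and a minimum term frequency. The `Lens` class, coordinates, and texts are hypothetical; real lenses may carry further options that are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Lens:
    """A rectangular filter placed over part of the overview."""
    x0: float
    y0: float
    x1: float
    y1: float
    term: str
    min_count: int = 1  # term frequency requirement

    def covers(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def accepts(self, text):
        return text.lower().split().count(self.term) >= self.min_count

# Hypothetical overview: doc id -> (x, y, text).
POINTS = {
    1: (1.0, 1.0, "jazz festival"),
    2: (2.0, 2.0, "jazz jazz club"),
    3: (5.0, 5.0, "rock festival"),
    4: (2.0, 1.0, "rock club"),
}

def highlighted(points, lenses):
    """A point is shown if some lens covers it and every covering lens accepts it."""
    shown = []
    for doc_id, (x, y, text) in points.items():
        covering = [lens for lens in lenses if lens.covers(x, y)]
        # Overlapping lenses combine as a conjunction; separate lenses
        # each contribute their own matches, which acts as a disjunction.
        if covering and all(lens.accepts(text) for lens in covering):
            shown.append(doc_id)
    return sorted(shown)

jazz = Lens(0, 0, 3, 3, "jazz")
club = Lens(1.5, 0, 3, 3, "club")          # overlaps the jazz lens
festival = Lens(4, 4, 6, 6, "festival")    # independent lens
result = highlighted(POINTS, [jazz, club, festival])
```

Raising `min_count` on a lens tightens only the region that lens covers, without touching the rest of the overview.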
Magic Lenses
Phrases and Proximity
Phrase and proximity constraints can greatly improve precision.
Phrase search is often used in the context of the Web.
–But the phrase must be literal
–“President Lincoln” does not match “President Abraham Lincoln”
Proximity constraints allow for more general queries
–Example: in LEXIS-NEXIS, “white w/3 house” means “white within three words of house”
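The difference between a literal phrase match and a proximity match can be sketched as follows. The tokenizer and the distance rule (word positions differing by at most k, in either order) are simplifying assumptions; production systems may count words or handle stop words differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """Literal phrase search: the words must appear adjacent and in order."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def within(text, first, second, k):
    """Proximity search: some occurrence of each word lies within k positions."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= k
               for i in first_positions for j in second_positions)

sentence = "President Abraham Lincoln spoke at Gettysburg"
literal = phrase_match(sentence, "President Lincoln")
nearby = within(sentence, "president", "lincoln", 3)
```

Here the literal phrase fails because of the intervening name, while the proximity constraint still finds the sentence.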
Natural Language and Free Text Queries
Many systems treat the question as a bag of words
Natural language processing can be used to try to better determine the information need.
–Extract noun (and verb) phrases
–Find noun (and verb) phrases in same sentence
Ask.com uses sites preselected to answer particular question forms.
–Need to recognize type of question
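A small sketch contrasts the bag-of-words treatment with a very crude recognizer for question type. The stop list, the cue table, and the type labels are invented for illustration; this is not how Ask.com works, and real systems rely on far richer linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical stop list and cue table, chosen only for this example.
STOPWORDS = {"a", "an", "the", "of", "who", "what", "where", "when",
             "how", "many", "does", "did", "is"}
QUESTION_TYPES = {
    "who": "person",
    "where": "location",
    "when": "date",
    "how many": "quantity",
}

def bag_of_words(question):
    """Discard order and stop words; keep only term counts."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def question_type(question):
    """Guess the expected answer type from the opening words."""
    text = question.lower().strip()
    # Try longer cues first so "how many" is checked as a unit.
    for cue in sorted(QUESTION_TYPES, key=len, reverse=True):
        if text.startswith(cue + " "):
            return QUESTION_TYPES[cue]
    return "unknown"

same_bag = bag_of_words("dog bites man") == bag_of_words("man bites dog")
```

The final comparison shows what the bag-of-words view throws away: two questions with opposite meanings reduce to the same bag, which is the gap that phrase extraction and question typing try to close.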
Ask.com