Considering a Faceted Search-based Model Marti Hearst UCB SIMS NAS CSTB DNS Meeting on Internet Navigation and the Domain Name System: Technical Alternatives and Policy Implications July 12, 2001
Outline The Klensin proposal Synopsis Issues Recommendations UIs and faceted search
A Proposal A Search-based access model for the DNS IETF Internet-Draft by John Klensin A multi-layer approach to naming Faceted descriptions are used to facilitate both flexible naming and inexact search This talk: What does research tell us about the search issues?
Klensin’s proposal [layer diagram; labels: Lookup, Search] DNS (unchanged); Faceted Classification System (simple, regulated); Faceted System (detailed, unregulated); Free-text Search (unregulated)
Layer 2 Industry Category: Restaurant Geolocation: Miami Language: Spanish Network Location Name: Jose’s Pizza
Faceted System (simple, regulated) Layer 2 Inputs: search values for one or more facets Outputs: appropriate DNS names and all tuples with matched facets Allow for partial (fuzzy) match Jose’s Pizza, Miami Alberto’s Pizza, Miami Jose’s Bistro, Miami Jose’s Pizza, Saratoga Joe’s Pizza, Miami …
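The Layer 2 behaviour described on this slide (facet values in, DNS names plus near-matching tuples out) can be sketched roughly as follows. This is a minimal illustration, not the mechanism in the Klensin draft: the record layout, the `.example` DNS names, the `layer2_search` function, and the 0.5 threshold are all assumptions made for the example, and string similarity from Python's standard `difflib` stands in for whatever fuzzy matching a real service would use.

```python
from difflib import SequenceMatcher

# Hypothetical Layer 2 records: two facets from the slide plus a made-up DNS name.
RECORDS = [
    {"name": "Jose's Pizza", "geolocation": "Miami", "dns": "joses-pizza-miami.example"},
    {"name": "Alberto's Pizza", "geolocation": "Miami", "dns": "albertos-pizza.example"},
    {"name": "Jose's Bistro", "geolocation": "Miami", "dns": "joses-bistro.example"},
    {"name": "Jose's Pizza", "geolocation": "Saratoga", "dns": "joses-pizza-saratoga.example"},
    {"name": "Joe's Pizza", "geolocation": "Miami", "dns": "joes-pizza.example"},
    {"name": "Bay Auto Repair", "geolocation": "Oakland", "dns": "bay-auto-repair.example"},
]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def layer2_search(query, threshold=0.5):
    """Return (score, record) pairs, best first.

    The score is the average similarity over the facets named in the query,
    so a record can still be returned when only some facets match well.
    """
    if not query:
        return []
    results = []
    for record in RECORDS:
        scores = [similarity(value, record[facet]) for facet, value in query.items()]
        score = sum(scores) / len(scores)
        if score >= threshold:
            results.append((score, record))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


hits = layer2_search({"name": "Jose's Pizza", "geolocation": "Miami"})
for score, record in hits:
    print(f"{score:.2f}  {record['name']}, {record['geolocation']} -> {record['dns']}")
```

The exact match ranks first, and the near matches follow as candidates. Because more than one candidate comes back, the user still has to confirm which one was meant.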
Layer 2 Selling Points Allows sharing of name space among different (commercial) entities Allows specification according to meaningful attributes
Layer 2 DNS Issues How to guarantee uniqueness? How to determine appropriate descriptors? How to use in a hyperlink? Requires a user interface for confirmation of correct choice
Layer 2 Descriptor Issues Emphasis on geolocation may be problematic May be too spare SFMOMA SFMOMA exhibits SFMOMA exhibit on digital art called
Faceted System (detailed, unregulated) Layer 3 Not centrally coordinated (provided by commercial services) More detailed facets Allow for inheritance Context-sensitive (e.g., restaurant has menu attribute, auto repair has services, etc.) Inputs: service-dependent Outputs: layer 2 names
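One way to picture inheritance and context-sensitive facets in Layer 3 is a small category schema in which each category adds its own facets to those of its parent. The sketch below is only an assumption about how a service might organise this. The `SCHEMA` table, the `business` parent category, and the `facets_for` helper are hypothetical; only the menu and services attributes come from the slide.

```python
# Hypothetical Layer 3 schema: each category names its parent and the facets it adds.
SCHEMA = {
    "business": {"parent": None, "facets": ["name", "geolocation", "language"]},
    "restaurant": {"parent": "business", "facets": ["cuisine", "menu"]},
    "auto repair": {"parent": "business", "facets": ["services"]},
}


def facets_for(category):
    """Collect facets from the category and all of its ancestors, ancestors first."""
    chain = []
    while category is not None:
        chain.append(category)
        category = SCHEMA[category]["parent"]
    facets = []
    for name in reversed(chain):
        facets.extend(SCHEMA[name]["facets"])
    return facets


print(facets_for("restaurant"))
print(facets_for("auto repair"))
```

A restaurant inherits the general facets and adds a menu, while an auto repair shop inherits the same general facets and adds services instead.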
Free-text Search (unregulated) Layer 4 Use standard search to find sites that discuss topics that relate to the query (as web search works today)
Relation to Web Search Web search is perceived to work better today than two years ago. Why? Finds appropriate starting points Also known as source selection Search for “toyota” no longer returns “Tony’s Toyota pages” as the top-ranked hit Before the web, source selection was a separate operation from free text search Also, queries tended to be longer Web search engines could do this exclusively – but they do other things as well.
Recommendations on Klensin Proposal A promising, intriguing approach One tweak: Combine layers 2 and 3 Have a partly regulated portion, and an open portion This however is susceptible to spamming Not clear if this should be regulated
General Pitfalls of Controlled Vocabularies Difficult to get agreement on the set of labels Difficult to assign labels consistently Granularity Salience / Emphasis Context Connotations New labels always appearing; old ones shift in meaning Lay people won’t know the system
How to do it wrong Force into a Hierarchy Let’s try to find UCB
What is the problem? Two deeply hierarchical facets Region Education Forced in convoluted ways into one hierarchy with irregular cross links
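The alternative to one convoluted hierarchy is to keep Region and Education as two separate hierarchical facets and describe each item by a path in each. The sketch below assumes made-up facet paths and a hypothetical `find` helper. It only illustrates that an item such as UCB can then be reached from either facet, or from both combined, without cross links.

```python
# Hypothetical items, each described by one path per hierarchical facet.
ITEMS = {
    "UC Berkeley": {
        "region": ("North America", "United States", "California", "Berkeley"),
        "education": ("Higher Education", "Universities", "Public"),
    },
    "Stanford University": {
        "region": ("North America", "United States", "California", "Stanford"),
        "education": ("Higher Education", "Universities", "Private"),
    },
    "Berkeley High School": {
        "region": ("North America", "United States", "California", "Berkeley"),
        "education": ("Secondary Education", "High Schools", "Public"),
    },
}


def under(path, prefix):
    """True when `path` lies at or below `prefix` in its facet hierarchy."""
    return path[:len(prefix)] == tuple(prefix)


def find(**prefixes):
    """Return item names whose facet paths fall under every given prefix."""
    return sorted(
        name
        for name, facets in ITEMS.items()
        if all(under(facets[facet], prefix) for facet, prefix in prefixes.items())
    )


# Start from Region, from Education, or narrow with both at once.
print(find(region=("North America", "United States", "California", "Berkeley")))
print(find(education=("Higher Education", "Universities")))
print(find(region=("North America", "United States", "California", "Berkeley"),
           education=("Higher Education",)))
```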
Two Approaches Statistical approaches map words into metadata terms Create flexible user interfaces that progressively reveal appropriate subparts of the system (How to do so is a topic of our research.)
The Practice Using descriptors “under the hood” The limited empirical work indicates Combining free text + descriptors works best Some e-commerce sites do this for finding products Can sometimes match queries to standard information needs “buy” + “palm” “review” + “crouching tiger” “berkeley” + “gap”
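Matching a query to a standard information need can be as simple as recognising a cue word and treating the rest of the query as free text. The sketch below is an assumption about how that might look, not a description of any particular site. The `INTENT_TERMS` table, its labels, and the `interpret` function are hypothetical, and only the "buy" and "review" examples from the slide are covered.

```python
# Hypothetical mapping from cue words to standard information needs.
INTENT_TERMS = {
    "buy": "shopping",
    "review": "reviews",
}


def interpret(query):
    """Split a query into a descriptor (the information need) and free text."""
    words = query.lower().split()
    needs = [INTENT_TERMS[word] for word in words if word in INTENT_TERMS]
    free_text = [word for word in words if word not in INTENT_TERMS]
    return {
        "information_need": needs[0] if needs else None,
        "free_text": " ".join(free_text),
    }


print(interpret("buy palm"))
print(interpret("review crouching tiger"))
```

The descriptor can then restrict or re-rank results "under the hood" while the remaining words go to ordinary free-text search.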
walmart.com Uses metadata “under the hood”
The Promise Using descriptors in the User Interface Use faceted metadata for navigation Query Previews Tailored Search Forms Tightly Combine Navigation & Search
Facets Orthogonal sets of descriptors Gets complicated when they are hierarchical Example: recipes
Metadata Facets Time/Date, Topic, Task, GeoRegion Advantage: Great for Mixing and Matching
Faceted Recipe Metadata Facets of a Recipe: Prepare, Cuisine, Ingredient, Dish
Sunset.com Not the right way
Dynamic Previews Avoid empty results sets Show the possible next steps A way to seamlessly integrate Related topics User preferences (personalization) Context-sensitivity
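A dynamic preview can be computed by counting, for every facet the user has not yet chosen, how many of the current results carry each value. Values with no results never appear, which is how empty result sets are avoided. The sketch below is a minimal illustration with made-up recipe data. The `RECIPES` list, the facet names, and the `previews` function are assumptions for the example.

```python
from collections import Counter

# Hypothetical recipe collection with one value per facet.
RECIPES = [
    {"title": "Margherita pizza", "cuisine": "Italian", "dish": "Main", "ingredient": "Tomato"},
    {"title": "Tiramisu", "cuisine": "Italian", "dish": "Dessert", "ingredient": "Coffee"},
    {"title": "Gazpacho", "cuisine": "Spanish", "dish": "Soup", "ingredient": "Tomato"},
    {"title": "Flan", "cuisine": "Spanish", "dish": "Dessert", "ingredient": "Egg"},
]
FACETS = ("cuisine", "dish", "ingredient")


def matching(selected):
    """Recipes consistent with every facet value chosen so far."""
    return [r for r in RECIPES if all(r[f] == v for f, v in selected.items())]


def previews(selected):
    """For each facet not yet chosen, count the results behind each possible next step."""
    hits = matching(selected)
    return {
        facet: dict(Counter(r[facet] for r in hits))
        for facet in FACETS
        if facet not in selected
    }


print(previews({}))
print(previews({"cuisine": "Italian"}))
```

After choosing Italian, the preview offers only Main and Dessert under dish. Soup is not shown because selecting it would return nothing.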
Metadata Usage in Epicurious Can choose category types in any order But categories never more than one level deep And can never use more than one instance of a category Even though items may be assigned more than one of each category type Items (recipes) are dead-ends Don’t link to “more like this” Not fully integrated with search
Epicurious Metadata Usage Problem: lacks integration with search
This is fixed in marthastewart.com
Advanced search more specific than sunset.com; also allows for disjunction; thus less likely to get null results
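Why disjunction makes null results less likely can be shown with a small filter that combines values with OR inside a facet and AND across facets. This is a sketch under assumptions and does not describe how any of the sites named here implement search. The `RECIPES` data and the `search` function are hypothetical.

```python
# Hypothetical recipe collection with one value per facet.
RECIPES = [
    {"title": "Margherita pizza", "cuisine": "Italian", "dish": "Main"},
    {"title": "Tiramisu", "cuisine": "Italian", "dish": "Dessert"},
    {"title": "Gazpacho", "cuisine": "Spanish", "dish": "Soup"},
    {"title": "Flan", "cuisine": "Spanish", "dish": "Dessert"},
]


def search(selected):
    """`selected` maps a facet to a set of acceptable values.

    Values within one facet are alternatives (OR); facets are combined (AND).
    """
    return [
        r["title"]
        for r in RECIPES
        if all(r[facet] in values for facet, values in selected.items())
    ]


# A single value per facet can easily come back empty.
print(search({"cuisine": {"Italian"}, "dish": {"Soup"}}))
# Allowing either cuisine widens the query and finds a match.
print(search({"cuisine": {"Italian", "Spanish"}, "dish": {"Soup"}}))
```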
UIs for faceted metadata Use dynamic previews Allow user to select metadata in any order At each step, show different types of relevant metadata, based on prior steps and personal history, include # of documents Previews restricted to only those metadata types that might be helpful Tightly integrate with keyword search
The Flamenco Research Project Systematically determine what works for integrating metadata into search interfaces Develop recommendations that reflect both the task structure and the richness of the information structure
Summary Agreement on the assignment of metadata descriptors is difficult to achieve Descriptors need to be constantly updated Layer 2 is probably not rich enough Assigning specifiers is quite different from searching for specified items Fuzzy search can help, but it requires a UI for confirmation of correct choices, and this will end up looking like a search service Can make search more meaningful and task-based
Summary Web search engines can do source selection Sometimes users do want source selection, but search hits based on the content of pages are often closer to what users want We need to be certain not to confuse source selection with content search