Semi-Automated Creation of Facet Hierarchies Marti Hearst School of Information, UC Berkeley Joint work with Dr. Emilia Stoica.


Marti Hearst, Taxonomy Bootcamp ‘06

Outline
- Faceted Metadata
  - Definition
  - Advantages
- Flamenco: Search Interface Design using Faceted Metadata
- Castanet: (Semi) Automated Tool for Creation of Category Systems
  - Comparison to State-of-the-Art Alternatives
- Conclusions

Focus: Search and Navigation of Large Collections
- Image Collections
- E-Government Sites
- Shopping Sites
- Digital Libraries

Problems with Site Search
- Study by Vividence in 2001 on 69 sites
  - 70% eCommerce
  - 31% Service
  - 21% Content
  - 2% Community
- Poorly organized search results
  - Frustration and wasted time
- Poor information architecture
  - Confusion
  - Dead ends
  - "Back and forthing"
  - Forced to search

What We Want to Achieve
- Integrate browsing and searching seamlessly
- Support exploration and learning
- Avoid dead ends, “pogo’ing”, and “lostness”

Main Idea
- Use hierarchical faceted metadata
- Design the interface to:
  - Allow flexible navigation
  - Provide previews of next steps
  - Organize results in a meaningful way
  - Support both expanding and refining the search

The Problem With Hierarchy
- Most things can be classified in more than one way.
- Most organizational systems do not handle this well.
- Example: Animal Classification

[Figure: the same animals (otter, penguin, robin, salmon, wolf, cobra, bat, seal) grouped three different ways: by Skin Covering, by Locomotion, and by Diet]

The Problem with Hierarchy
- Inflexible
  - Force the user to start with a particular category
  - What if I don’t know the animal’s diet, but the interface makes me start with that category?
- Wasteful
  - Have to repeat combinations of categories
  - Makes for extra clicking and extra coding
- Difficult to modify
  - To add a new category type, must duplicate it everywhere or change things everywhere

The Problem With Hierarchy

[Figure: a single tree rooted at "start" that branches on skin covering (fur, scales, feathers), locomotion (swim, fly, run, slither), and diet (fish, rodents, insects) in turn. The same sub-branches are repeated under every parent before the tree reaches the animals (salmon, bat, robin, wolf, …)]

The Idea of Facets
- Facets are a way of labeling data
  - A kind of metadata (data about data)
  - Can be thought of as properties of items
- Facets vs. Categories
  - Items are placed INTO a category system
  - Multiple facet labels are ASSIGNED TO items

The Idea of Facets
- Create INDEPENDENT categories (facets)
  - Each facet has labels (sometimes arranged in a hierarchy)
- Assign labels from the facets to every item
- Example: recipe collection
  - Course: Main Course
  - Cooking Method: Stir-fry
  - Cuisine: Thai
  - Ingredient: Bell Pepper, Curry, Chicken

The Idea of Facets
- Break out all the important concepts into their own facets
- Sometimes the facets are hierarchical
  - Assign labels to items from any level of the hierarchy

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

Using Facets
- Now there are multiple ways to get to each item

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

One item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
Another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
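
To make the "multiple ways to get to each item" point concrete, here is a minimal Python sketch. The item names and the `items_under` helper are hypothetical and not part of Flamenco. Each label is stored as a path from its facet root, so an item can be reached from any facet and from any level of a facet's hierarchy.

```python
# Hypothetical items; each is assigned labels from several independent facets.
# A label is a path from the facet root, e.g. Fruits > Berries > Strawberries.
items = {
    "pineapple upside-down cake": [
        ("Fruits", "Pineapple"),
        ("Desserts", "Cakes"),
        ("Preparation Method", "Bake"),
    ],
    "strawberry sherbet": [
        ("Desserts", "Dairy", "Sherbet"),
        ("Fruits", "Berries", "Strawberries"),
        ("Preparation Method", "Freeze"),
    ],
}


def items_under(label_prefix):
    """Return the names of items that have a label at or below the given path."""
    n = len(label_prefix)
    return sorted(
        name
        for name, labels in items.items()
        if any(label[:n] == label_prefix for label in labels)
    )
```

For example, `items_under(("Fruits",))` reaches both items, while `items_under(("Preparation Method", "Bake"))` reaches only the cake.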

Example: Nobel Prize Winners Collection (Before and After Facets)

Only One Way to View Laureates

First, Choose Prize Type

Next, view the list! The user must first choose an award type (literature), then browse through the laureates in chronological order. No choice is given to, say, organize by year and then award, or by country, then decade, then award.

Flamenco Interface: Using Hierarchical Faceted Metadata

Opening View: Select literature from PRIZE facet

Group results by YEAR facet

Select 1920’s from YEAR facet

Current query is PRIZE > literature AND YEAR: 1920’s. Now remove PRIZE > literature

Now Group By YEAR > 1920’s

Hierarchy Traversal: Group By YEAR > 1920’s, and drill down to 1921

Select an individual item

Use Endgame to expand out

Or use “More like this” to find similar items

Start a new search using keyword “California”

Note that category structure remains after the keyword search

The query is now a keyword ANDed with a facet subhierarchy

Using Facets
- The system only shows the labels that correspond to the current set of items
  - Start with all items and all facets
  - The user then selects a label within a facet
  - This reduces the set of items (only those that have been assigned to the subcategory label are displayed)
  - This also eliminates some subcategories from the view.
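
The narrowing behavior described above can be sketched in a few lines of Python. This is an illustration only, not Flamenco's implementation: the records are made-up placeholders with one label per facet, and `refine` and `previews` are hypothetical names.

```python
from collections import Counter

# Hypothetical records with one label per facet, for brevity.
ITEMS = [
    {"PRIZE": "literature", "YEAR": "1920s", "COUNTRY": "Ireland"},
    {"PRIZE": "literature", "YEAR": "1930s", "COUNTRY": "USA"},
    {"PRIZE": "physics", "YEAR": "1920s", "COUNTRY": "Germany"},
    {"PRIZE": "peace", "YEAR": "1930s", "COUNTRY": "USA"},
]


def refine(items, selections):
    """Keep only the items that carry every selected facet label."""
    return [
        item
        for item in items
        if all(item.get(facet) == label for facet, label in selections.items())
    ]


def previews(items):
    """Count items per label in each facet.

    Labels with no matching items never appear, so every label shown
    leads to a non-empty result set.
    """
    counts = {}
    for item in items:
        for facet, label in item.items():
            counts.setdefault(facet, Counter())[label] += 1
    return counts
```

After `refine(ITEMS, {"PRIZE": "literature"})`, the previews for the remaining items no longer list "physics" or "peace" under PRIZE, and the YEAR facet shows only the decades that still have items.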

Advantages of Facets
- Can’t end up with empty result sets (except with keyword search)
- Helps avoid feelings of being lost.
- Easier to explore the collection.
  - Helps users infer what kinds of things are in the collection.
  - Evokes a feeling of “browsing the shelves”
- Is preferred over standard search for collection browsing in usability studies. (Interface must be designed properly)

Advantages of Facets
- Seamless to add new facets and subcategories
- Seamless to add new items.
- Helps with “categorization wars”
  - Don’t have to agree exactly where to place something
- Interaction can be implemented using a standard relational database.
- May be easier for automatic categorization
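
As a rough illustration of the relational-database point, the sketch below uses Python's built-in `sqlite3` module with an assumed two-table layout (`item` and `item_label`). The schema, the sample rows, and the helper names are hypothetical and are not Flamenco's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (item, facet label) assignment.
    CREATE TABLE item_label (item_id INTEGER, facet TEXT, label TEXT);
    INSERT INTO item VALUES (1, 'thai chicken curry'), (2, 'baked flan');
    INSERT INTO item_label VALUES
        (1, 'Cuisine', 'Thai'),
        (1, 'Cooking Method', 'Stir-fry'),
        (1, 'Ingredient', 'Chicken'),
        (2, 'Course', 'Dessert'),
        (2, 'Cooking Method', 'Bake');
""")


def label_counts(conn, facet):
    """Return (label, item count) pairs for one facet, for preview display."""
    rows = conn.execute(
        "SELECT label, COUNT(DISTINCT item_id) FROM item_label "
        "WHERE facet = ? GROUP BY label ORDER BY label",
        (facet,),
    )
    return rows.fetchall()


def items_with(conn, facet, label):
    """Return the names of items assigned the given facet label."""
    rows = conn.execute(
        "SELECT i.name FROM item i JOIN item_label l ON l.item_id = i.id "
        "WHERE l.facet = ? AND l.label = ? ORDER BY i.name",
        (facet, label),
    )
    return [row[0] for row in rows]
```

Adding a new facet or a new item in this layout is just a matter of inserting rows, which is one way to read the "seamless to add" bullets above.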

Information Previews
- Use the metadata to show where to go next
  - More flexible than canned hyperlinks
  - Less complex than full search
  - Help users see and return to previous steps
- Reduces mental work
  - Recognition over recall
  - Suggests alternatives
- More clicks are OK only if (J. Spool):
  - The “scent” of the target does not weaken
  - Users feel they are going towards, rather than away from, their target.

Facets vs. Hierarchy
- Early Flamenco studies compared allowing multiple hierarchical facets vs. just one facet.
- Multiple facets were preferred and more successful.

Limitation of Facets
- Do not naturally capture MAIN THEMES
- Facets do not show RELATIONS explicitly
  - Example photo labels: Aquamarine, Red, Orange; Door, Doorway, Wall
  - Which color is associated with which object?

Photo by J. Hearst, jhearst.typepad.com

Terminology Clarification
- Facets vs. Attributes
  - Facets are shown independently in the interface
  - Attributes are just associated with individual items
    - E.g., ID number, Source, Affiliation
  - However, can always convert an attribute to a facet
- Facets vs. Labels
  - Labels are the names used within facets
  - These are organized into subhierarchies
- Synonyms
  - There should be alternate names for the category labels
  - Currently (in Flamenco) this is done with subcategories
    - E.g., Deer has subcategories “stag”, “fawn”, “doe”

Usability Study Results

Flamenco Usability Studies
- Usability studies done on 3 collections:
  - Recipes (epicurious): 13,000 items
  - Architecture Images: 40,000 items
  - Fine Arts Images: 35,000 items
- Conclusions:
  - Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  - Very positive results, in contrast with studies on earlier iterations.

Most Recent Usability Study
- Participants & Collection
  - 32 Art History students
  - ~35,000 images from SF Fine Arts Museum
- Study Design
  - Within-subjects
    - Each participant sees both interfaces
    - Balanced in terms of order and tasks
  - Participants assess each interface after use
  - Afterwards they compare them directly
- Data recorded in behavior logs, server logs, and paper surveys; one or two experienced testers at each trial.
- Used 9-point Likert scales.
- Session took about 1.5 hours; pay was $15/hour

Post-Interface Assessments
All significant at p<.05 except “simple” and “overwhelming”

Post-Test Comparison

[Chart: Faceted vs. Baseline, "Which interface preferable for:"]
- Overall assessment: More useful for your tasks; Easiest to use; Most flexible; More likely to result in dead ends; Helped you learn more; Overall preference
- Tasks: Find images of roses; Find all works from a given period; Find pictures by 2 artists in same media

How to Create Facet Hierarchies? Our Approach: Castanet

Example: Recipes (3500 docs)

Castanet Output (shown in Flamenco)

Our Approach: Leverage the structure of WordNet

Our Approach
- Leverage the structure of WordNet

[Diagram: Documents > Select terms > Get hypernym paths (from WordNet) > Build tree > Compress tree > Divide into facets]

1. Select Terms
- Select well-distributed terms from the collection (example terms: red, blue)
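
The slide does not say how "well distributed" is measured. The sketch below assumes it means that a term's document frequency falls inside a band; the `select_terms` function and its `min_df` and `max_frac` thresholds are hypothetical and are not Castanet's actual criteria.

```python
from collections import Counter


def select_terms(docs, min_df=2, max_frac=0.9):
    """Return terms whose document frequency lies in [min_df, max_frac * N].

    Assumption: "well distributed" is approximated by document frequency.
    Terms in too few documents make poor categories, and terms in nearly
    every document do not discriminate between items.
    """
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(doc.lower().split()))
    limit = max_frac * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= limit)
```

With four short recipe titles such as "red pepper soup", "blue cheese dip", "red wine sauce", and "blue corn soup", only the terms that occur in two titles survive the default thresholds.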

2. Get Hypernym Path
- red: abstraction > property > visual property > color > chromatic color > red, redness > red
- blue: abstraction > property > visual property > color > chromatic color > blue, blueness > blue

3. Build Tree

[Figure: the hypernym paths for "red" and "blue" are merged into one tree. The shared path abstraction > property > visual property > color > chromatic color branches into "red, redness" > red and "blue, blueness" > blue]
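
Merging hypernym paths that share a prefix is essentially trie construction. The following Python sketch hard-codes the two paths shown on the slides rather than querying WordNet; `build_tree` is a hypothetical name, and the nested-dict representation is an assumption made for brevity.

```python
# Root-to-term paths as shown on the slides (hard-coded, not looked up).
SHARED = ["abstraction", "property", "visual property", "color", "chromatic color"]
RED_PATH = SHARED + ["red, redness", "red"]
BLUE_PATH = SHARED + ["blue, blueness", "blue"]


def build_tree(paths):
    """Merge root-to-term paths into one nested-dict tree.

    Paths that share a prefix share the corresponding nodes, so common
    hypernyms appear only once.
    """
    root = {}
    for path in paths:
        node = root
        for name in path:
            node = node.setdefault(name, {})
    return root
```

Calling `build_tree([RED_PATH, BLUE_PATH])` yields a single chain down to "chromatic color", which then has two children.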

4. Compress Tree

[Figure: before compression, color > chromatic color > {"red, redness" > red, "blue, blueness" > blue, "green, greenness" > green}; after the first step, color > chromatic color > {red, blue, green}]

4. Compress Tree (cont.)

[Figure: color > chromatic color > {red, blue, green} becomes color > {red, blue, green}]
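
The slides show the before and after pictures but not the rules. The sketch below uses two rules inferred from those pictures, so treat both as assumptions rather than Castanet's exact procedure: (1) a non-root node with fewer than `k` children is replaced by its children, and (2) a child whose name contains its parent's name is removed and its children are promoted. The function names are hypothetical.

```python
def _compress(name, children, is_root, k):
    """Return the list of (name, subtree) nodes that replace this node."""
    kept = {}
    for child_name, grandchildren in children.items():
        for n, sub in _compress(child_name, grandchildren, False, k):
            if sub and name in n:
                # Assumed rule 2: "chromatic color" under "color" is redundant,
                # so promote its children to the parent.
                kept.update(sub)
            else:
                kept[n] = sub
    # Assumed rule 1: drop a non-root internal node with fewer than k children.
    if not is_root and 0 < len(kept) < k:
        return list(kept.items())
    return [(name, kept)]


def compress_tree(tree, k=2):
    """Compress every top-level subtree of a nested-dict tree."""
    out = {}
    for root_name, children in tree.items():
        out.update(_compress(root_name, children, True, k))
    return out


# The subtree from the slide, before compression.
COLOR_TREE = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}
```

On `COLOR_TREE` the two rules reproduce the slide's result: "color" with the three children red, blue, and green.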

5. Divide into Facets

Disambiguation
- Ambiguity in:
  - Word senses
  - Paths up the hypernym tree

Sense 1 for word “tuna”:
  organism, being => plant, flora => vascular plant => succulent => cactus => tuna

Sense 2 for word “tuna” (2 paths for the same sense):
  organism, being => fish => food fish => tuna
  organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna

How to Select the Right Senses and Paths?
- First: build core tree
  - (1) Create paths for words with only one sense
  - (2) Use Domains
    - WordNet has 212 Domains
      - medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
    - Automatically scan the collection to see which domains apply
    - The user selects which of the suggested domains to use, or may add their own
    - Paths for terms that match the selected domains are added to the core tree
- Then: add remaining terms to the core tree.

Using Domains

Glosses for “dip”:
- Sense 1: A depression in an otherwise level surface
- Sense 2: The angle that a magnet needle makes with the horizon
- Sense 3: Tasty mixture into which bite-size foods are dipped

Hypernyms for “dip”:
- Sense 1: solid => concave shape => depression
- Sense 2: shape, form => space => angle
- Sense 3: food => ingredient, fixings => flavorer

Given domain “food”, choose sense 3.
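
The "dip" example can be mimicked with a small Python sketch. It stands in for a real WordNet Domains lookup, which the slides do not detail: here a sense is kept when one of the selected domain names appears as a node on its hypernym path. That matching rule, the hard-coded data, and the `senses_for_domains` name are all assumptions.

```python
# Hypernym paths for the three senses of "dip", copied from the slide.
DIP_SENSES = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}


def senses_for_domains(senses, domains):
    """Return the sense numbers whose hypernym path names a selected domain.

    Assumption: a domain "applies" to a sense when the domain name equals
    one of the synonyms on the sense's hypernym path.
    """
    return [
        number
        for number, path in senses.items()
        if any(d in node.split(", ") for d in domains for node in path)
    ]
```

With the domain set `{"food"}`, only sense 3 is kept, matching the slide.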

Castanet Evaluation

Castanet Evaluation
- This is a tool for information architects, so people of this type did the evaluation
- We compared output on:
  - Recipes
  - Biomedical journal titles
- We compared to two state-of-the-art algorithms:
  - LDA (Blei et al. 04)
  - Subsumption (Sanderson & Croft ’99)
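
For readers unfamiliar with the subsumption baseline, here is a hedged Python sketch of the co-occurrence test usually attributed to Sanderson & Croft: term x subsumes term y when P(x|y) >= 0.8 and P(y|x) < 1. The talk does not state the configuration used in the evaluation, so the threshold, the set-of-terms document representation, and the `subsumes` name are assumptions.

```python
def subsumes(x, y, docs, threshold=0.8):
    """Return True if term x subsumes term y over a list of term sets.

    x subsumes y when most documents containing y also contain x
    (P(x|y) >= threshold) but not every document containing x
    contains y (P(y|x) < 1).
    """
    with_x = {i for i, doc in enumerate(docs) if x in doc}
    with_y = {i for i, doc in enumerate(docs) if y in doc}
    if not with_x or not with_y:
        return False
    both = len(with_x & with_y)
    return both / len(with_y) >= threshold and both / len(with_x) < 1
```

Because the test relies only on co-occurrence statistics, the parent and child terms it produces need not be related by meaning, which is one way it differs from an approach built on WordNet's hypernym links.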

Subsumption Output (shown in Flamenco)

LDA Output (shown in Flamenco)

Evaluation Method
- Information architects assessed the category systems
- For each of 2 systems’ output:
  - Examined and commented on top level
  - Examined and commented on two sub-levels
- Then commented on overall properties
  - Meaningful?
  - Systematic?
  - Likely to use in your work?

Evaluation Results
- Results on recipes collection for “Would you use this system in your work?”
  - Yes in some cases or yes definitely:
    - Pine (Castanet): 29/34
    - Oak (LDA): 0/18
    - Birch (Subsumption): 6/16
- Results on quality of categories:

Opportunities for Tagging
- New opportunity: tagging, folksonomies (Flickr, del.icio.us)
  - People are creating facets in a decentralized manner
  - They are assigning multiple facets to items
  - This is done on a massive scale
  - This leads naturally to meaningful associations

Conclusions
- Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
  - Midway in complexity between simple hierarchies and deep knowledge representation.
  - Currently in use on e-commerce sites; spreading to other domains
- Systems are needed to help create faceted metadata structures
  - Our WordNet-based algorithm, while not perfect, seems like it will be a useful tool for Information Architects.

Acknowledgements
- Flamenco Team
  - Brycen Chun, Ame Elliott, Jennifer English, Kevin Li, Rashmi Sinha, Emilia Stoica, Kirsten Swearingen, Ka-Ping Yee
- Castanet
  - Emilia Stoica
- Funding
  - This work supported in part by NSF (IIS )

For more information: flamenco.berkeley.edu Thank you! Marti Hearst & Emilia Stoica