Published by Kristian Blankenship. Modified over 6 years ago.
1
NLP Support for Faceted Navigation in Scholarly Collections
ACL’09 Workshop on NLP for Scholarly Collections
Marti Hearst and Emilia Stoica
Presented by Preslav Nakov
2
Motivation
Faceted navigation is now standard for “vertical” content collections, such as e-commerce stores and image collections.
It is also being used for digital libraries: WorldCat, NCSU, Chicago.
Problem: the categories within the SUBJECT facet need to be richer. How can they be created automatically?
Our solution: Castanet, applied to scholarly collections.
3
Outline
Definition of faceted metadata
Examples of faceted navigation in use
Castanet: an algorithm for (semi-)automatic creation of facet hierarchies
Application of Castanet to a scholarly collection
4
The Idea of Facets
Facets are a way of labeling data:
A kind of metadata (data about data)
Can be thought of as properties of items
Facets vs. categories:
Items are placed INTO a category system
Multiple facet labels are ASSIGNED TO items
5
The Idea of Facets
Create INDEPENDENT categories (facets)
Each facet has labels (sometimes arranged in a hierarchy)
Assign labels from the facets to every item
Example: recipe collection
Ingredient: Chicken, Bell Pepper, Curry
Cooking Method: Stir-fry
Course: Main Course
Cuisine: Thai
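Here is a minimal Python sketch of this data model. It assumes items are stored as plain dictionaries. The item ids and most labels are hypothetical; only the facet names come from the example. Each item maps every facet to its own set of labels, and a query requires one chosen label per selected facet:

```python
from typing import Dict, List, Set

# Each item maps a facet name to the set of labels assigned from that facet.
Item = Dict[str, Set[str]]

# Hypothetical toy recipe collection; item ids and most labels are illustrative.
RECIPES: Dict[str, Item] = {
    "chicken_stir_fry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Stir-fry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "pepper_soup": {
        "Ingredient": {"Bell Pepper"},
        "Cooking Method": {"Boil"},
        "Course": {"Starter"},
        "Cuisine": {"Spanish"},
    },
    "roast_chicken": {
        "Ingredient": {"Chicken"},
        "Cooking Method": {"Bake"},
        "Course": {"Main Course"},
        "Cuisine": {"French"},
    },
}


def filter_items(items: Dict[str, Item], selection: Dict[str, str]) -> List[str]:
    """Return ids of items that carry every selected (facet, label) pair.

    Because the facets are independent, a selection may combine labels from
    any of them; an empty selection matches every item.
    """
    return sorted(
        item_id
        for item_id, facets in items.items()
        if all(label in facets.get(facet, set()) for facet, label in selection.items())
    )


if __name__ == "__main__":
    print(filter_items(RECIPES, {"Ingredient": "Chicken"}))
    print(filter_items(RECIPES, {"Ingredient": "Chicken", "Cuisine": "Thai"}))
```

Because the facets are independent, the same function answers both "chicken dishes" and "Thai chicken dishes". Neither query needs a predefined category.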
6
The Idea of Facets
Break out all the important concepts into their own facets
Sometimes the facets are hierarchical
Assign labels to items from any level of the hierarchy
Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet), Flan
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
7
Using Facets
Now there are multiple ways to get to each item. Labels for two example items:
One item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
Another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
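The "multiple ways to get to each item" idea can be sketched by storing each label as a path through its facet hierarchy. This is a hypothetical Python encoding: the item ids are made up, and the paths are the two examples on this slide. An item is reachable from every prefix of every one of its paths:

```python
from typing import Dict, List, Tuple

# A label is a path from a facet's top level down to the assigned node.
Path = Tuple[str, ...]

# Hypothetical item ids; the label paths are the two examples from the slide.
ITEMS: Dict[str, List[Path]] = {
    "item_1": [("Fruit", "Pineapple"), ("Dessert", "Cake"), ("Preparation", "Bake")],
    "item_2": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}


def items_under(items: Dict[str, List[Path]], node: Path) -> List[str]:
    """Ids of items with at least one label at or below hierarchy node `node`."""
    return sorted(
        item_id
        for item_id, paths in items.items()
        if any(path[: len(node)] == node for path in paths)
    )


def routes_to(items: Dict[str, List[Path]], item_id: str) -> List[str]:
    """Every hierarchy node (written "A > B") from which `item_id` can be reached."""
    nodes = set()
    for path in items[item_id]:
        for depth in range(1, len(path) + 1):
            nodes.add(" > ".join(path[:depth]))
    return sorted(nodes)


if __name__ == "__main__":
    print(items_under(ITEMS, ("Dessert",)))          # ['item_1', 'item_2']
    print(items_under(ITEMS, ("Dessert", "Dairy")))  # ['item_2']
    print(len(routes_to(ITEMS, "item_2")))           # 8
```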
8
Faceted navigation’s advantages:
Integrates browsing and searching seamlessly
Supports exploration and learning
Avoids dead-ends, “pogo’ing”, and “lostness”
9
Uses of Faceted Navigation in Online Digital Libraries
10
WorldCat
11
WorldCat
12
U Chicago
13
U Chicago
14
Advantages of Facets
Can’t end up with empty result sets (except with keyword search)
Helps avoid feelings of being lost
Makes it easier to explore the collection
Helps users infer what kinds of things are in the collection
Evokes a feeling of “browsing the shelves”
Preferred over standard search for collection browsing in usability studies (the interface must be designed properly)
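One common way an interface can guarantee that browsing never yields an empty result set is to offer only labels that actually occur among the items currently matching, together with their counts. This is an assumption: the slide does not describe Flamenco's mechanism. A minimal Python sketch over a hypothetical toy collection:

```python
from collections import Counter
from typing import Dict, List, Set

Item = Dict[str, Set[str]]

# Hypothetical toy collection.
ITEMS: Dict[str, Item] = {
    "r1": {"Ingredient": {"Chicken"}, "Cuisine": {"Thai"}},
    "r2": {"Ingredient": {"Beef"}, "Cuisine": {"French"}},
    "r3": {"Ingredient": {"Chicken", "Basil"}, "Cuisine": {"Thai"}},
}


def matching(items: Dict[str, Item], selection: Dict[str, str]) -> List[str]:
    """Ids of items that carry every selected (facet, label) pair."""
    return [
        item_id
        for item_id, facets in items.items()
        if all(label in facets.get(facet, set()) for facet, label in selection.items())
    ]


def refinement_counts(items: Dict[str, Item], selection: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Per facet, how many of the currently matching items carry each label.

    Only labels that occur in the current result set get a (nonzero) count, so
    an interface that offers just these labels never leads to an empty result.
    """
    counts: Dict[str, Counter] = {}
    for item_id in matching(items, selection):
        for facet, labels in items[item_id].items():
            counts.setdefault(facet, Counter()).update(labels)
    return {facet: dict(counter) for facet, counter in counts.items()}


if __name__ == "__main__":
    print(refinement_counts(ITEMS, {"Cuisine": "Thai"}))
```

After choosing Cuisine: Thai, "Beef" is simply not offered, because no Thai item uses it.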
15
Limitations of Facets
Facets do not naturally capture MAIN THEMES
Facets do not show RELATIONS explicitly
Example photo labels: Aquamarine, Red, Orange, Door, Doorway, Wall
Which color is associated with which object?
Photo by J. Hearst, jhearst.typepad.com
16
Usability Studies (using Flamenco)
Usability studies done on 3 collections:
Recipes (epicurious): 13,000 items
Architecture images: 40,000 items
Fine arts images: 35,000 items
Conclusions:
Users like, and are successful with, the dynamic faceted hierarchical metadata, especially for browsing tasks
Very positive results, in contrast with studies on earlier iterations
17
How to Create Facet Hierarchies?
Our Approach: Castanet
18
Biomedical Journal Titles (3275 Titles)
Journal of clinical hypertension
American journal of hypertension : journal of the American Society of Hypertension
Hypertension in pregnancy : official journal of the International Society for the Study of Hypertension in Pregnancy
Journal of interventional cardiac electrophysiology : an international journal of arrhythmias and pacing
Heart failure reviews
Hypertension research : official journal of the Japanese Society of Hypertension
Current hypertension reports
European journal of heart failure : journal of the Working Group on Heart Failure of the European Society of Cardiology
Congestive heart failure (Greenwich, Conn.)
Clinical and experimental hypertension (New York, N.Y. : 1993)
Hypertension
Journal of human hypertension
19
Castanet Output (Bio titles)
20
Castanet Output (Bio titles)
21
Castanet Output (LibraryThing Tags)
22
Castanet Output (LibraryThing Tags)
23
Castanet Output (LibraryThing Tags)
24
Our Approach: Leverage the structure of WordNet
25
Our Approach
Leverage the structure of WordNet
Pipeline: Documents -> Select terms -> Get hypernym paths (WordNet) -> Build tree -> Compress tree -> Divide into facets
26
1. Select Terms
Select well-distributed terms from the collection (e.g., red, blue)
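The slide does not say how "well distributed" is measured. One plausible reading, sketched here as an assumption, is a document-frequency band: keep terms that occur in at least `min_df` documents but in no more than a fraction `max_df_ratio` of them. Both thresholds and the whitespace tokenizer are hypothetical choices:

```python
from collections import Counter
from typing import Iterable, List


def select_terms(docs: Iterable[str], min_df: int = 2, max_df_ratio: float = 0.5) -> List[str]:
    """Keep terms that are neither rare nor near-ubiquitous.

    min_df: minimum number of documents a term must appear in (hypothetical default).
    max_df_ratio: maximum fraction of documents a term may appear in (hypothetical default).
    """
    doc_terms = [set(doc.lower().split()) for doc in docs]  # naive whitespace tokenizer
    df = Counter(term for terms in doc_terms for term in terms)  # document frequency
    n_docs = len(doc_terms)
    return sorted(
        term
        for term, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )


if __name__ == "__main__":
    docs = ["red car", "blue car", "red door", "blue sky", "green car"]
    print(select_terms(docs))  # ['blue', 'red']
```

On the five toy documents, "car" is dropped as too common (3 of 5 documents). "door", "sky", and "green" are dropped as too rare (1 document each). That leaves blue and red.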
27
2. Get Hypernym Paths
red: abstraction > property > visual property > color > chromatic color > red, redness
blue: abstraction > property > visual property > color > chromatic color > blue, blueness
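Step 2 can be sketched as a lookup from a term to its root-first hypernym paths. The dictionary below hard-codes the two paths from the slide, so the example runs without any data download. `wordnet_hypernym_paths` shows how the same shape could be read from WordNet with NLTK's `wn.synsets` and `Synset.hypernym_paths()`. That function needs the optional `nltk` package and its WordNet corpus. Real WordNet paths are longer (they start at "entity"), so treat this as an illustration, not Castanet's code:

```python
from typing import Dict, List

# Hypothetical stand-in for WordNet: root-first hypernym paths copied from the
# slide (real WordNet paths also contain more general nodes such as "entity").
HYPERNYM_PATHS: Dict[str, List[List[str]]] = {
    "red": [["abstraction", "property", "visual property", "color", "chromatic color", "red, redness"]],
    "blue": [["abstraction", "property", "visual property", "color", "chromatic color", "blue, blueness"]],
}


def get_hypernym_paths(term: str) -> List[List[str]]:
    """All root-first hypernym paths for `term` (empty list if unknown)."""
    return HYPERNYM_PATHS.get(term, [])


def wordnet_hypernym_paths(term: str) -> List[List[str]]:
    """Same output shape, read from WordNet through NLTK.

    Requires the optional `nltk` package and its WordNet data (nltk.download("wordnet")).
    Each node is labeled with its synonyms joined by commas, e.g. "red, redness".
    """
    from nltk.corpus import wordnet as wn  # optional dependency, imported lazily

    paths = []
    for synset in wn.synsets(term, pos=wn.NOUN):
        for path in synset.hypernym_paths():  # each path runs from the root down to `synset`
            paths.append([", ".join(n.replace("_", " ") for n in s.lemma_names()) for s in path])
    return paths


if __name__ == "__main__":
    for term in ("red", "blue"):
        for path in get_hypernym_paths(term):
            print(" > ".join(path))
```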
28
3. Build Tree
abstraction
  property
    visual property
      color
        chromatic color
          red, redness
          blue, blueness
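Building the tree amounts to merging the hypernym paths so that shared prefixes become shared nodes, which is the same as inserting every path into a trie. A minimal Python sketch, using nested dictionaries as an assumed representation:

```python
from typing import Dict, List

# A tree is a nested dict: label -> subtree (an empty dict marks a leaf).
Tree = Dict[str, dict]


def build_tree(paths: List[List[str]]) -> Tree:
    """Merge root-first paths into one tree, sharing common prefixes."""
    root: Tree = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Indented text lines, one per node, children in insertion order."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


if __name__ == "__main__":
    shared = ["abstraction", "property", "visual property", "color", "chromatic color"]
    tree = build_tree([shared + ["red, redness"], shared + ["blue, blueness"]])
    print("\n".join(render(tree)))
```

With the two paths from step 2, the five shared nodes appear once, and only the leaves differ.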
29
4. Compress Tree
color
  chromatic color
    red, redness
    blue, blueness
    green, greenness
(leaf labels shortened to red, blue, green)
30
4. Compress Tree (cont.)
color
  red
  blue
  green
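The slides show compression before and after, but not its rules. The sketch below assumes two simple rules that are enough to reproduce the example. First, remove a hand-picked list of over-general nodes (hypothetical list: abstraction, property, visual property). Second, remove a node whose label contains its parent's label, so "chromatic color" under "color" disappears. In both cases the removed node's children are promoted. Castanet's own compression rules may differ:

```python
from typing import AbstractSet, Dict

Tree = Dict[str, dict]

# Hypothetical list of over-general top levels to drop.
TOO_GENERAL = frozenset({"abstraction", "property", "visual property"})


def _merge(into: Tree, other: Tree) -> None:
    """Recursively merge `other` into `into` (same-named nodes are unified)."""
    for label, subtree in other.items():
        _merge(into.setdefault(label, {}), subtree)


def compress(tree: Tree, too_general: AbstractSet[str] = TOO_GENERAL, parent: str = "") -> Tree:
    """Return a compressed copy of `tree` using two assumed rules.

    1. Drop nodes whose label is in `too_general`, promoting their children.
    2. Drop a node whose label contains its parent's label (e.g. "chromatic color"
       under "color"), promoting its children.
    Synonym labels are also shortened to their first synonym ("red, redness" -> "red").
    """
    out: Tree = {}
    for label, children in tree.items():
        short = label.split(",")[0].strip()
        subtree = compress(children, too_general, parent=short)
        if short in too_general or (parent and parent != short and parent in short):
            _merge(out, subtree)  # remove this node; its children move up one level
        else:
            _merge(out.setdefault(short, {}), subtree)
    return out


if __name__ == "__main__":
    leaves = {"red, redness": {}, "blue, blueness": {}, "green, greenness": {}}
    tree = {"abstraction": {"property": {"visual property": {"color": {"chromatic color": leaves}}}}}
    print(compress(tree))  # {'color': {'red': {}, 'blue': {}, 'green': {}}}
```

Rule 2 is a crude substring test. It keeps the example small, but a real system would need something safer (for example, it would also fire for "watercolor" under "color").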
31
5. Divide into Facets
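The slide gives no details for this step. Assuming each top-level node of the compressed tree becomes one facet, dividing into facets reduces to splitting the tree at the root and listing the terms each facet covers. A Python sketch with a hypothetical compressed tree:

```python
from typing import Dict, List

Tree = Dict[str, dict]

# Hypothetical compressed tree with two top-level nodes.
COMPRESSED: Tree = {
    "color": {"red": {}, "blue": {}, "green": {}},
    "food": {"fruit": {"pineapple": {}, "strawberry": {}}},
}


def leaf_labels(tree: Tree) -> List[str]:
    """All leaf labels below `tree`, depth-first."""
    labels: List[str] = []
    for label, children in tree.items():
        labels.extend(leaf_labels(children) if children else [label])
    return labels


def divide_into_facets(tree: Tree) -> Dict[str, List[str]]:
    """Treat each top-level node as one facet and list the terms it covers."""
    return {facet: sorted(leaf_labels(subtree)) for facet, subtree in tree.items()}


if __name__ == "__main__":
    print(divide_into_facets(COMPRESSED))
```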
32
Disambiguation
Ambiguity in:
Word senses
Paths up the hypernym tree
Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna
Sense 2 for word “tuna”: => fish => food fish => bony fish => spiny-finned fish => percoid fish
2 paths for the same word; 2 paths for the same sense
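The two kinds of ambiguity can be made concrete by encoding a word as a map from each sense to its list of hypernym paths. The abbreviated paths below follow the “tuna” example. Splitting the fish sense into a "food fish" path and a "bony fish" path is an assumption, based on the "2 paths for the same sense" note. A small Python helper then reports both kinds of ambiguity:

```python
from typing import Dict, List

# sense id -> root-first hypernym paths (abbreviated; hypothetical encoding of the slide).
Senses = Dict[str, List[List[str]]]

TUNA: Senses = {
    "cactus": [["organism, being", "plant, flora", "vascular plant", "succulent", "cactus"]],
    # Assumed: the fish sense branches after "fish" into two paths.
    "fish": [
        ["fish", "food fish"],
        ["fish", "bony fish", "spiny-finned fish", "percoid fish"],
    ],
}


def describe_ambiguity(senses: Senses) -> Dict[str, object]:
    """Summarize both kinds of ambiguity: several senses, and several paths per sense."""
    return {
        "num_senses": len(senses),
        "num_paths": sum(len(paths) for paths in senses.values()),
        "multi_path_senses": sorted(s for s, paths in senses.items() if len(paths) > 1),
    }


if __name__ == "__main__":
    print(describe_ambiguity(TUNA))
```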
33
How to Select the Right Senses and Paths?
First: build a core tree
(1) Create paths for words with only one sense
(2) Use domains: WordNet has 212 domains (medicine, mathematics, biology, chemistry, linguistics, soccer, etc.)
Automatically scan the collection to see which domains apply
The user selects which of the suggested domains to use, or may add their own
Paths for terms that match the selected domains are added to the core tree
Then: add the remaining terms to the core tree
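The core-tree strategy can be sketched as three passes over a sense lexicon. Passes (1) and (2) follow the slide. The slide only says to "add the remaining terms," so pass (3) uses an assumed heuristic: prefer the sense whose paths overlap most with the core tree. The lexicon entries and domain labels are hypothetical, loosely based on the “dip” example:

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]
# term -> {sense id: (domains of the sense, root-first hypernym paths)}
Lexicon = Dict[str, Dict[str, Tuple[Set[str], List[Path]]]]

# Hypothetical lexicon; sense ids and domain labels are illustrative.
LEXICON: Lexicon = {
    "flavorer": {"flavorer.1": ({"food"}, [("food", "ingredient, fixings", "flavorer")])},
    "dip": {
        "dip.1": (set(), [("solid", "concave shape", "depression")]),
        "dip.2": ({"physics"}, [("shape, form", "space", "angle")]),
        "dip.3": ({"food"}, [("food", "ingredient, fixings", "flavorer")]),
    },
    "pepper": {
        "pepper.plant": (set(), [("organism, being", "plant, flora", "pepper")]),
        "pepper.spice": (set(), [("food", "ingredient, fixings", "flavorer", "pepper")]),
    },
}


def choose_senses(lexicon: Lexicon, selected_domains: Set[str]) -> Dict[str, str]:
    """Pick one sense per term, growing the set of core-tree nodes as we go."""
    chosen: Dict[str, str] = {}
    core_nodes: Set[str] = set()

    def add(term: str, sense: str) -> None:
        chosen[term] = sense
        for path in lexicon[term][sense][1]:
            core_nodes.update(path)

    # (1) Terms with only one sense seed the core tree.
    for term, senses in lexicon.items():
        if len(senses) == 1:
            add(term, next(iter(senses)))

    # (2) Ambiguous terms with exactly one sense in a selected domain.
    for term, senses in lexicon.items():
        if term not in chosen:
            in_domain = [s for s, (domains, _) in senses.items() if domains & selected_domains]
            if len(in_domain) == 1:
                add(term, in_domain[0])

    # (3) Remaining terms (assumed heuristic): the sense whose paths share the
    # most nodes with the core tree; ties go to the alphabetically first sense id.
    for term, senses in lexicon.items():
        if term not in chosen:
            def overlap(sense: str) -> int:
                return sum(len(core_nodes & set(p)) for p in senses[sense][1])
            add(term, max(sorted(senses), key=overlap))

    return chosen


if __name__ == "__main__":
    print(choose_senses(LEXICON, {"food"}))
```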
34
Using Domains
Glosses for “dip”:
Sense 1: A depression in an otherwise level surface
Sense 2: The angle that a magnetic needle makes with the horizon
Sense 3: Tasty mixture into which bite-size foods are dipped
Hypernyms for “dip”:
Sense 1: solid => concave shape => depression
Sense 2: shape, form => space => angle
Sense 3: food => ingredient, fixings => flavorer
Given the domain “food”, choose sense 3
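The “dip” example can be sketched as a filter over the senses' hypernym paths. The slides describe matching against WordNet's domains. Matching the domain name against the nodes of each hypernym path, as below, is a simplification that happens to reproduce the choice of sense 3:

```python
from typing import Dict, List, Optional

# Hypernym paths for the three senses of "dip" shown on the slide
# (most general node first).
DIP_HYPERNYMS: Dict[int, List[str]] = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}


def sense_for_domain(hypernyms: Dict[int, List[str]], domain: str) -> Optional[int]:
    """Return the only sense whose hypernym path mentions `domain`, else None."""
    matches = [sense for sense, path in sorted(hypernyms.items()) if domain in path]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(sense_for_domain(DIP_HYPERNYMS, "food"))  # 3
```

Returning None when zero or several senses match leaves those terms for a later pass instead of guessing.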
35
Castanet Evaluation
36
Castanet Evaluation
Castanet is a tool for information architects, so information architects did the evaluation
We compared output on two collections: recipes and biomedical journal titles
We compared against two state-of-the-art algorithms:
LDA (Blei et al. ’04)
Subsumption (Sanderson & Croft ’99)
37
Subsumption Output (Bio titles)
38
Subsumption Output (Bio titles)
39
LDA Output (Bio titles)
40
LDA Output (Bio titles)
41
Evaluation Method
Information architects assessed the category systems
For each of 2 systems’ output, they:
Examined and commented on the top level
Examined and commented on two sub-levels
Then commented on overall properties: Meaningful? Systematic? Likely to use in your work?
42
Evaluation Results (Bio titles)
15 participants, all PubMed users
Results for “Would you use this system in your work?” (answering “Yes, in some cases” or “Yes, definitely”):
Pine (Castanet): /15
Oak (LDA): /7
Birch (Subsumption): 1/8
43
Evaluation Results (recipes)
Results on the recipes collection for “Would you use this system in your work?” (answering “Yes, in some cases” or “Yes, definitely”):
Pine (Castanet): /34
Oak (LDA): /18
Birch (Subsumption): 6/16
Results on quality of categories:
44
Conclusions
Flexible application of hierarchical faceted metadata is a proven approach for navigating scholarly collections
Midway in complexity between simple hierarchies and deep knowledge representation
Currently in use in digital library sites, but the SUBJECT categories need more work
Algorithms are needed to help create faceted metadata structures
Our WordNet-based algorithm, while not perfect, provides a good starting point for scholarly collections
45
For more information: flamenco.berkeley.edu
Thank you! Preslav Nakov, Marti Hearst & Emilia Stoica