Nearly-Automated Metadata Hierarchy Creation. Emilia Stoica and Marti Hearst, SIMS, University of California, Berkeley.


Motivation: We want to assign each item labels from multiple hierarchies.

Motivation
Description: 19th c. paint horse; saddle and hackamore; spurs; bandana on rider; old time cowboy hat; underchin thong; flying off.
- Nature > Animal > Mammal > Horse
- Occupations > Cowboy
- Clothing > Hats > Cowboy Hat
- Media > Engraving > Wood Eng.
- Location > North America > America

Use in Browsing Interfaces like Flamenco

How to Obtain the Hierarchies?
Goal: help an information architect get started.
- Currently they do it all by hand!
- Assume they will do some editing.
- Nearly automated.
- Multiple hierarchies (facets).
- Automatically assign items to multiple hierarchies.

Related Work
- Automated text categorization: LOTS of work on this, but it assumes that a set of categories has already been created.
- To be intuitive, a categorization should contain sets of IS-A (hierarchical) relations: Rosenfeld and Morville (2002); Pratt, Hearst, and Fagan (1999).
- Current automated approaches contain only associative relations.

Examples of Associative Relations
- Hofmann 1999. Collection: machine learning abstracts. Top-level categories: learn, paper, base, model, new, train. Problem: these are not intuitive categories for machine learning.
- Sanderson and Croft 1999. Collection: medical texts. Top-level categories: disease, post polio, serious disease, dengue, infection control, immunology, … Problem: these are at different levels of generality.

Examples of Associative Relations
- Schuetze 1993. Collection: arts descriptions.
- Sample groupings: carriage cart horse ride walk passing horseback wagon men chicken rider bald balding head facing hand faced arm hat haired glove long
- Problem: terms are associated with one another, but are not organized into hierarchies that can be navigated.

Our Approach: Leverage the structure of WordNet.
Inputs: Documents, WordNet.
Pipeline: Select terms => Get hypernym paths => Build tree => Compress tree

1. Select Terms: Select well-distributed terms from the collection (example terms: red, blue).

2. Get Hypernym Path: Get the hypernym path for each term.
- red: abstraction => property => visual property => color => chromatic color => red, redness
- blue: abstraction => property => visual property => color => chromatic color => blue, blueness

3. Build Tree: Merge the hypernym paths to build a tree.
- Before: two separate paths, each running abstraction => property => visual property => color => chromatic color, one ending in "red, redness" (term: red) and one in "blue, blueness" (term: blue).
- After: a single shared path abstraction => property => visual property => color => chromatic color, which branches into "red, redness" (term: red) and "blue, blueness" (term: blue).
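The merge step can be sketched as inserting each path into a nested dictionary, so that shared prefixes collapse into shared nodes. This is only an illustrative sketch: the `build_tree` name and the nested-dict representation are assumptions made here, not something the slides specify.

```python
from typing import List


def build_tree(paths: List[List[str]]) -> dict:
    """Merge root-to-leaf hypernym paths into one tree.

    The tree is a nested dict mapping a node label to its subtree
    (a hypothetical representation chosen for this sketch).
    """
    root: dict = {}
    for path in paths:
        node = root
        for label in path:
            # Reuse the existing child when the prefix is already present.
            node = node.setdefault(label, {})
    return root


prefix = ["abstraction", "property", "visual property", "color", "chromatic color"]
paths = [
    prefix + ["red, redness", "red"],
    prefix + ["blue, blueness", "blue"],
]
tree = build_tree(paths)
```

Both paths share the first five labels, so the resulting tree branches only below "chromatic color".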

4. Compress Tree: Eliminate a parent with fewer than n children, unless it is the root or its distribution is larger than 0.1 * max dist.
- Before: color => chromatic color, with children "red, redness" (term: red), "blue, blueness" (term: blue), and "green, greenness" (term: green).
- After: the single-child parents are eliminated, leaving color => chromatic color with the terms red, blue, and green directly beneath it.
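A minimal sketch of this elimination rule follows. It assumes that a node's "distribution" is the number of collection items under it and that leaf terms are never removed; the `Node` class, the `compress` function, and the values n=2 and max_dist=100 are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    dist: int = 0  # assumption: number of collection items under this node
    children: List["Node"] = field(default_factory=list)


def compress(node: Node, n: int, max_dist: int, is_root: bool = True) -> List[Node]:
    """Return the node(s) that replace `node` after compression (bottom-up)."""
    new_children: List[Node] = []
    for child in node.children:
        new_children.extend(compress(child, n, max_dist, is_root=False))
    node.children = new_children
    keep = (
        is_root
        or not node.children           # leaves (the terms) are kept
        or len(node.children) >= n     # enough children to be a useful parent
        or node.dist > 0.1 * max_dist  # well-distributed nodes are kept
    )
    # An eliminated parent is replaced by its children.
    return [node] if keep else node.children


root = Node("color", 3, [Node("chromatic color", 3, [
    Node("red, redness", 1, [Node("red", 1)]),
    Node("blue, blueness", 1, [Node("blue", 1)]),
    Node("green, greenness", 1, [Node("green", 1)]),
])])
(root,) = compress(root, n=2, max_dist=100)
```

With these hypothetical settings the three single-child synset nodes are removed and their terms attach directly to "chromatic color", as in the slide's example.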

4. Compress Tree (cont.): Eliminate a child whose name appears within the parent's name.
- Before: color => chromatic color => red, blue, green
- After: color => red, blue, green (in this example, the child "chromatic color" repeats its parent's name "color" and is removed)
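The name-overlap rule can be sketched as below. The slide's wording leaves the direction of the match open, so this sketch follows the slide's example and removes an internal child whose name contains its parent's name as a whole word; leaf terms are left alone. The function names and the nested-dict tree are assumptions for illustration.

```python
import re


def contains_name(child: str, parent: str) -> bool:
    """True if `parent` occurs as a whole word (or phrase) inside `child`."""
    pattern = rf"\b{re.escape(parent)}\b"
    return re.search(pattern, child, flags=re.IGNORECASE) is not None


def splice(parent: str, children: dict) -> dict:
    """Remove redundant internal children, moving their children up to `parent`."""
    result: dict = {}
    for child, grandchildren in children.items():
        if grandchildren and contains_name(child, parent):
            # Drop the redundant child; its subtree is re-attached to `parent`.
            result.update(splice(parent, grandchildren))
        else:
            result[child] = splice(child, grandchildren)
    return result


tree = {"color": {"chromatic color": {"red": {}, "blue": {}, "green": {}}}}
compressed = {root: splice(root, kids) for root, kids in tree.items()}
```

Whole-word matching is used so that a pair such as "red" and "redness" is not treated as an overlap.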

5. Remove Top Levels: The top levels of WordNet are too general, e.g., entity; substance, matter; abstraction.

Disambiguation
Ambiguity in:
- Word senses (2 paths for the same word)
- Paths up the hypernym tree (2 paths for the same sense)
Sense 1 for word "tuna": organism, being => plant, flora => vascular plant => succulent => cactus => tuna
Sense 2 for word "tuna", path 1: organism, being => fish => food fish => tuna
Sense 2 for word "tuna", path 2: organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna

How to Select the Right Senses and Paths? (This part is not in the paper.)
Solution: modify the algorithm.
First, build a core tree:
(1) Create paths for words with only one sense.
(2) Use domains:
- WordNet has 212 domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
- Automatically scan the collection to see which domains apply.
- The user selects which of the suggested domains to use, or may add their own.
- Paths for terms that match the selected domains are added to the core tree.
Then, add the remaining terms to the core tree.

Using Domains
"dip" glosses:
- Sense 1: A depression in an otherwise level surface
- Sense 2: The angle that a magnetic needle makes with the horizon
- Sense 3: Tasty mixture into which bite-size foods are dipped
"dip" hypernyms:
- Sense 1: solid => concave shape => depression
- Sense 2: shape, form => space => angle
- Sense 3: food => ingredient, fixings => flavorer
Given the domain "food", choose sense 3.

Enrich Core Tree
For each new term t:
    Q(t) = {}    // set of candidate paths
    for each path p of t:
        compute f_p(t), the fraction of nodes in p that are shared with a path in the core tree
        if f_p(t) > thresh: Q(t) = Q(t) U {p}
    if Q(t) = {}: choose the first sense of t
    else: among all p in Q(t), choose the path in the core tree with the most items assigned
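The selection procedure above can be sketched in Python using the "chip" example from these slides. Several details are assumptions made for this sketch: sharing is measured as the longest common prefix with any core path, the "items assigned" score of a candidate is the item count at its deepest shared core node, and thresh = 0.5 is a hypothetical value (the slides do not give one).

```python
from typing import Dict, List, Optional, Tuple

Path = List[str]


def shared_prefix(path: Path, core_paths: List[Path]) -> Tuple[int, Optional[str]]:
    """Longest prefix `path` shares with any core path, plus its last node."""
    best = 0
    for core in core_paths:
        k = 0
        for a, b in zip(path, core):
            if a != b:
                break
            k += 1
        best = max(best, k)
    return best, (path[best - 1] if best else None)


def choose_path(paths: List[Path], core_paths: List[Path],
                items: Dict[str, int], thresh: float) -> Path:
    """Pick a path for a new term; `paths` is assumed ordered by sense."""
    candidates = []
    for p in paths:
        k, node = shared_prefix(p, core_paths)
        if k / len(p) > thresh:  # f_p(t) > thresh
            candidates.append((items.get(node, 0), p))
    if not candidates:
        return paths[0]  # fall back to the first sense
    return max(candidates, key=lambda c: c[0])[1]


food = ["entity", "substance, matter", "food, nutrient", "nutriment", "dish"]
device = ["entity", "object", "artifact", "instrumentality", "device",
          "conductor", "semiconductor"]
core = [food + ["fondue, fondu"],
        device + ["diode", "light-emitting diode (led)"]]
p1 = food + ["snack food", "chip"]
p2 = device + ["chip"]
items = {"dish": 1699, "fondue, fondu": 40, "semiconductor": 45}
chosen = choose_path([p1, p2], core, items, thresh=0.5)
```

Here p1 shares 5 of its 7 nodes and p2 shares 7 of its 8, so both pass the hypothetical threshold and the item counts decide between them.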

Enrich Core Tree
Core tree:
- entity => substance, matter => food, nutrient => nutriment => dish => fondue, fondu
- entity => object => artifact => instrumentality => device => conductor => semiconductor => diode => light-emitting diode (led) (item: "Toaster with led indicators")
Candidate paths for the new term "chip":
- Chip (p1): entity => substance, matter => food, nutrient => nutriment => dish => snack food => chip
- Chip (p2): entity => object => artifact => instrumentality => device => conductor => semiconductor => chip

Enrich Core Tree
Chip (p1) shares 5 of its 7 nodes with the core tree (entity; substance, matter; food, nutrient; nutriment; dish), so f_p1(Chip) = 5/7 and Q = {p1}.

Enrich Core Tree
Chip (p2) shares 7 of its 8 nodes with the core tree (entity; object; artifact; instrumentality; device; conductor; semiconductor), so f_p2(Chip) = 7/8. With f_p1(Chip) = 5/7, Q = {p1, p2}.

Enrich Core Tree (cont'd)
Items assigned in the core tree: dish (1699); fondue, fondu (40); semiconductor (45).
Choose the food path for "chip" (entity => substance, matter => food, nutrient => nutriment => dish => snack food => chip), since it has more items assigned.

Results on a Recipes/Kitchen Appliances Data Set

Discussion
- This is very simple, but works very well.
- Why hasn't this been done before? Because WordNet did not have enough coverage?

Conclusions
- We can nearly automatically build a set of hierarchies by finding IS-A relations between terms using WordNet.
- The method has been tested on various domains: medicine, mathematics, recipes, news, arts.
- A user study is in progress.
- Limitations: the ontology has to be appropriate for the target domain; there is no disambiguation between nouns, verbs, and adjectives.