Download presentation
Presentation is loading. Please wait.
Published byEmery Foster Modified over 9 years ago
1
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC Information Sciences Institute 2 Department of Computer Science, University of Maryland SIGKDD 2010 2010. 12. 17. Summarized and Presented by Kim Chung Rim, IDS Lab., Seoul National University
2
Copyright 2010 by CEBT Contents Introduction Goal Challenges Learning Folksonomies SAP: Growing a Tree Evaluation Conclusion & Discussion 2
3
Copyright 2010 by CEBT Introduction The social Web has changed the way people create and use information Users organize their content via tags Sites such as Flickr, Del.icio.us and YouTube allows users to tag their contents with descriptive keywords 3
4
Copyright 2010 by CEBT Social Metadata captures collective knowledge of Social Web users Once mined from many users, Such collective knowledge will add a rich semantic layer to the content of Social Web Could be used for information discovery 4
5
Copyright 2010 by CEBT Structured Social Metadata Some Social Websites allow users to organize tags hierarchically Delicious Users can group related tags into bundles Flickr Users can group related photos into sets and related sets into collections 5
6
Copyright 2010 by CEBT Folksonomy Folksonomy – a common taxonomy that is learned from social metadata. There exist predefined hierarchies such as WordNet or Linnaean classification system. Benefits of Folksonomy over already existing hierarchies represent collective agreement of many individuals Are relatively inexpensive to obtain Can adapt to evolving vocabularies and community’s information needs Directly tied to the annotated content 6
7
Copyright 2010 by CEBT Goal The goal of this paper is to Extract users’ knowledge (folk knowledge) from user-generated semantics To build up a tree of communal taxonomy 7 animal fish cat dog biggle jindo animal reptile cat animal fish cat dog biggle jindo reptile
8
Copyright 2010 by CEBT Challenges Sparseness Social metadata is very sparse – only 3.74 tags per photo Nosiy vocabulary Spelling errors and meaningless tags – not sure, mykid Ambiguity Individual tag is often ambiguous – jaguar 8
9
Copyright 2010 by CEBT Challenges Structural noise and conflicts Users use different term to build their own hierarchy Varying granularity level Users’ level of expertise and expressiveness differ 9
10
Copyright 2010 by CEBT Learning Folksonomies - intuition roots with similar term should be aggregated horizontally A Root and a leaf with similar term should be aggregated vertically 10 animal fish cat dog animal reptile cat animal fish cat dog reptile animal fish cat dog biggle jindo animal fish cat dog biggle jindo
11
Copyright 2010 by CEBT Learning Folksonomy Relational Clustering Two nodes should be clustered if their similarity is above a certain threshold Similarity is computed using local similarity & structural similarity – localSim(a,b) measures the similarity of node names and their tag distribution – structSim(a,b) measures the similarity of leaves – is a weight for adjusting contributions from localSim and structSim ( ) 11
12
Copyright 2010 by CEBT Learning Folksonomy - terms A personal hierarchy is called a sapling A sapling is composed of a root node and its leaves Tag statistic of a node x is defined as Where and are tag and its frequency is aggregated from all = {(canada, 1), (vacation, 1), … } = {(bc,1), (canada, 2), (vacation, 1), … } 12
13
Copyright 2010 by CEBT Local Similarity The local similarity of nodes a and b is composed of Name similarity Tag distribution similarity nameSim can be any string similarity metric, which returns a value from 0 to 1 tagSim can be any function for measuring the similarity of distributions – in this paper, tagSim counts the number of common tags n, in the top K tag of a and b. if n ≥ J (a certain threshold), it returns 1, else it returns n/J. 13
14
Copyright 2010 by CEBT Structural Similarity Compute structural similarity in two cases Similarity between two root nodes Similarity between a root node and a leaf node 14
15
Copyright 2010 by CEBT Root-to-Root Similarity Similarity based on How many of their children have common name The tag distribution similarity of those that do not have the same name – Where, – is an aggregation of tag distributions of all 15
16
Copyright 2010 by CEBT Root-to-Leaf Similarity Takes into account when a sapling contains the same leaf as the compared node When merging UK->{Scotland, Glasgow, Edinburgh, London} and Scotland->{Glasgow, Shetland}, it can be said that Scotland and UK are similar because UK contains a shortcut to Glasgow Measures overlap between siblings of and children of 16
17
Copyright 2010 by CEBT Relational Clustering When the nodeSim(a,b) exceeds a certain threshold, two nodes are clustered(merged) 17 animal fish cat dog animal reptile cat animal fish cat dog reptile
18
Copyright 2010 by CEBT SAP: Growing a Tree Using Incremental Relational Clustering, 18 canada victoria Horizontal aggregation Vertical aggregation
19
Copyright 2010 by CEBT Handling Problems Shortcuts Take the longer route Noisy vocabularies If a name is used by less than 1% of the user who contributed in the sapling, it is considered as noisy and discarded 19
20
Copyright 2010 by CEBT Evaluation 20,759 saplings from 7,121 users are selected from Flickr Values for several parameters are chosen 20 ParametersValue K40 J4 RR 0.1 LR 0.8 0.5
21
Copyright 2010 by CEBT Evaluation Methodologies Against the reference hierarchy (ODP) Taxonomic Overlap – measuring structure similarity between two trees. Measures how many ancestor and descendant nodes overlap to those in the reference tree for each node and take the harmonic mean (fmTO) Lexical Recall – measuring how well an approach can discover concepts in the reference hierarchy (LR) Structural evaluation Area Under Tree – a measurement to define how bushy and deep a tree is (AUT) Manual evaluation (for concepts not in ODP) Asking 5 judges whether the path is correct 21
22
Copyright 2010 by CEBT Evaluation metric - AUT AUT for below tree is: 1*(3+1)/2 + 1*(3+2)/2 = 4.5 22 animal fish cat dog biggle jindo
23
Copyright 2010 by CEBT Baseline of comparison SIG – past work of the authors Assume nodes having the same name refer to the same concept Keep the relations between node pairs Combine all relations into a tree 23
24
Copyright 2010 by CEBT Results 24
25
Copyright 2010 by CEBT Results Simple comparison 25
26
Copyright 2010 by CEBT Some of the Learned Folksonomies 26
27
Copyright 2010 by CEBT Conclusion & Discussion This work can create more accurate and more detailed folksonomies than the current state-of-the-art approach by exploiting structural information This work is more scalable – it incrementally grows the folksonomies Lacks comparison with other state-of-the-art approaches 27
28
Copyright 2010 by CEBT Thank you 28
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.