Presentation is loading. Please wait.

Presentation is loading. Please wait.

Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC.

Similar presentations


Presentation on theme: "Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC."— Presentation transcript:

1 Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC Information Sciences Institute 2 Department of Computer Science, University of Maryland SIGKDD 2010 2010. 12. 17. Summarized and Presented by Kim Chung Rim, IDS Lab., Seoul National University

2 Copyright  2010 by CEBT Contents  Introduction  Goal  Challenges  Learning Folksonomies  SAP: Growing a Tree  Evaluation  Conclusion & Discussion 2

3 Copyright  2010 by CEBT Introduction  The social Web has changed the way people create and use information Users organize their content via tags  Sites such as Flickr, Del.icio.us and YouTube allows users to tag their contents with descriptive keywords 3

4 Copyright  2010 by CEBT Social Metadata  captures collective knowledge of Social Web users  Once mined from many users, Such collective knowledge will add a rich semantic layer to the content of Social Web Could be used for information discovery 4

5 Copyright  2010 by CEBT Structured Social Metadata  Some Social Websites allow users to organize tags hierarchically  Delicious Users can group related tags into bundles  Flickr Users can group related photos into sets and related sets into collections 5

6 Copyright  2010 by CEBT Folksonomy  Folksonomy – a common taxonomy that is learned from social metadata.  There exist predefined hierarchies such as WordNet or Linnaean classification system.  Benefits of Folksonomy over already existing hierarchies represent collective agreement of many individuals Are relatively inexpensive to obtain Can adapt to evolving vocabularies and community’s information needs Directly tied to the annotated content 6

7 Copyright  2010 by CEBT Goal  The goal of this paper is to Extract users’ knowledge (folk knowledge) from user-generated semantics To build up a tree of communal taxonomy 7 animal fish cat dog biggle jindo animal reptile cat animal fish cat dog biggle jindo reptile

8 Copyright  2010 by CEBT Challenges  Sparseness Social metadata is very sparse – only 3.74 tags per photo  Nosiy vocabulary Spelling errors and meaningless tags – not sure, mykid  Ambiguity Individual tag is often ambiguous – jaguar 8

9 Copyright  2010 by CEBT Challenges  Structural noise and conflicts Users use different term to build their own hierarchy  Varying granularity level Users’ level of expertise and expressiveness differ 9

10 Copyright  2010 by CEBT Learning Folksonomies - intuition  roots with similar term should be aggregated horizontally  A Root and a leaf with similar term should be aggregated vertically 10 animal fish cat dog animal reptile cat animal fish cat dog reptile animal fish cat dog biggle jindo animal fish cat dog biggle jindo

11 Copyright  2010 by CEBT Learning Folksonomy  Relational Clustering Two nodes should be clustered if their similarity is above a certain threshold  Similarity is computed using local similarity & structural similarity – localSim(a,b) measures the similarity of node names and their tag distribution – structSim(a,b) measures the similarity of leaves –  is a weight for adjusting contributions from localSim and structSim ( ) 11

12 Copyright  2010 by CEBT Learning Folksonomy - terms  A personal hierarchy is called a sapling  A sapling is composed of a root node and its leaves  Tag statistic of a node x is defined as Where and are tag and its frequency is aggregated from all  = {(canada, 1), (vacation, 1), … }  = {(bc,1), (canada, 2), (vacation, 1), … } 12

13 Copyright  2010 by CEBT Local Similarity  The local similarity of nodes a and b is composed of Name similarity Tag distribution similarity nameSim can be any string similarity metric, which returns a value from 0 to 1 tagSim can be any function for measuring the similarity of distributions – in this paper, tagSim counts the number of common tags n, in the top K tag of a and b. if n ≥ J (a certain threshold), it returns 1, else it returns n/J. 13

14 Copyright  2010 by CEBT Structural Similarity  Compute structural similarity in two cases Similarity between two root nodes Similarity between a root node and a leaf node 14

15 Copyright  2010 by CEBT Root-to-Root Similarity  Similarity based on How many of their children have common name The tag distribution similarity of those that do not have the same name – Where, – is an aggregation of tag distributions of all 15

16 Copyright  2010 by CEBT Root-to-Leaf Similarity  Takes into account when a sapling contains the same leaf as the compared node  When merging UK->{Scotland, Glasgow, Edinburgh, London} and Scotland->{Glasgow, Shetland}, it can be said that Scotland and UK are similar because UK contains a shortcut to Glasgow  Measures overlap between siblings of and children of 16

17 Copyright  2010 by CEBT Relational Clustering  When the nodeSim(a,b) exceeds a certain threshold, two nodes are clustered(merged) 17 animal fish cat dog animal reptile cat animal fish cat dog reptile

18 Copyright  2010 by CEBT SAP: Growing a Tree  Using Incremental Relational Clustering, 18 canada victoria Horizontal aggregation Vertical aggregation

19 Copyright  2010 by CEBT Handling Problems  Shortcuts Take the longer route  Noisy vocabularies If a name is used by less than 1% of the user who contributed in the sapling, it is considered as noisy and discarded 19

20 Copyright  2010 by CEBT Evaluation  20,759 saplings from 7,121 users are selected from Flickr Values for several parameters are chosen 20 ParametersValue K40 J4  RR 0.1  LR 0.8  0.5

21 Copyright  2010 by CEBT Evaluation Methodologies  Against the reference hierarchy (ODP) Taxonomic Overlap – measuring structure similarity between two trees. Measures how many ancestor and descendant nodes overlap to those in the reference tree for each node and take the harmonic mean (fmTO) Lexical Recall – measuring how well an approach can discover concepts in the reference hierarchy (LR)  Structural evaluation Area Under Tree – a measurement to define how bushy and deep a tree is (AUT)  Manual evaluation (for concepts not in ODP) Asking 5 judges whether the path is correct 21

22 Copyright  2010 by CEBT Evaluation metric - AUT  AUT for below tree is: 1*(3+1)/2 + 1*(3+2)/2 = 4.5 22 animal fish cat dog biggle jindo

23 Copyright  2010 by CEBT Baseline of comparison  SIG – past work of the authors Assume nodes having the same name refer to the same concept Keep the relations between node pairs Combine all relations into a tree 23

24 Copyright  2010 by CEBT Results 24

25 Copyright  2010 by CEBT Results  Simple comparison 25

26 Copyright  2010 by CEBT Some of the Learned Folksonomies 26

27 Copyright  2010 by CEBT Conclusion & Discussion  This work can create more accurate and more detailed folksonomies than the current state-of-the-art approach by exploiting structural information  This work is more scalable – it incrementally grows the folksonomies  Lacks comparison with other state-of-the-art approaches 27

28 Copyright  2010 by CEBT Thank you 28


Download ppt "Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC."

Similar presentations


Ads by Google