1 Constructing Folksonomies from User- Specified Relations on Flickr Anon Plangprasopchok and Kristina Lerman (WWW 2009)

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Location Recognition Given: A query image A database of images with known locations Two types of approaches: Direct matching: directly match image features.
Conceptual Clustering
Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme Presented by Smitashree Choudhury.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Dong Liu Xian-Sheng Hua Linjun Yang Meng Weng Hong-Jian Zhang.
Clustering V. Outline Validating clustering results Randomization tests.
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Anon Plangprasopchok & Kristina Lerman USC Information Sciences Institute Constructing Folksonomies from User-Specificed Relations on Flickr.
The Complex Dynamics of Collaborative Tagging Harry Halpin University of Edinburgh Valentin Robu CWI, Netherlands Hana Shepherd Princeton University WWW.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
Xyleme A Dynamic Warehouse for XML Data of the Web.
WMES3103 : INFORMATION RETRIEVAL
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 The Enhanced Entity- Relationship (EER) Model.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
Online Data Gathering for Maximizing Network Lifetime in Sensor Networks IEEE transactions on Mobile Computing Weifa Liang, YuZhen Liu.
PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment Natalya F. Noy and Mark A. Musen.
Recommender Systems; Social Information Filtering.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
1 Prototype Hierarchy Based Clustering for the Categorization and Navigation of Web Collections Zhao-Yan Ming, Kai Wang and Tat-Seng Chua School of Computing,
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Tag-based Social Interest Discovery
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Anon Plangprasopchok 1, Kristina Lerman 1, Lise Getoor 2 1 USC.
Information Flow using Edge Stress Factor Communities Extraction from Graphs Implied by an Instant Messages Corpus Franco Salvetti University of Colorado.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Laboratory for InterNet Computing CSCE 561 Social Media Projects Ryan Benton October 8, 2012.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
A Graph-based Friend Recommendation System Using Genetic Algorithm
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
Semantic web course – Computer Engineering Department – Sharif Univ. of Technology – Fall Knowledge Representation Semantic Web - Fall 2005 Computer.
1 Shape Segmentation and Applications in Sensor Networks Xianjin Xhu, Rik Sarkar, Jie Gao Department of CS, Stony Brook University INFOCOM 2007.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
Algorithmic Detection of Semantic Similarity WWW 2005.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Thesis Proposal: Prediction of popular social annotations Abon.
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
1 Constructing Folksonomies from User- Specified Relations on Flickr Anon Plangprasopchok and Kristina Lerman.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
2015/12/251 Hierarchical Document Clustering Using Frequent Itemsets Benjamin C.M. Fung, Ke Wangy and Martin Ester Proceeding of International Conference.
Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.
Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.
Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems Latin American Web Conference IEEE Computer Society, 2008 Presenter:
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Using category-Based Adherence to Cluster Market-Basket Data Author : Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen Graduate : Chien-Ming Hsiao.
Facilitating Document Annotation Using Content and Querying Value.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
Junjie Yao, Yuxin Huang Tag Cube: Acquiring Latent Conceptual Structures from Folksonomy Data.
Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University.
Queensland University of Technology
Exploring Social Tagging Graph for Web Object Classification
Sofus A. Macskassy Fetch Technologies
Postdoc, School of Information, University of Arizona
Web Page Cleaning for Web Mining
Presentation transcript:

1 Constructing Folksonomies from User- Specified Relations on Flickr Anon Plangprasopchok and Kristina Lerman (WWW 2009)

2 Motivation Users Web content hierarchical classification Consume Produce Annotate Organize Discover Annotation / Metadata Organize Search Recommend Leverage Categorize

3 Motivation Goal: to induce category knowledge from social annotation produced by many users Metadata from an individual user may be too inaccurate and incomplete… The metadata from different users may complement each other, making it, in combination, meaningful.

4 Folksonomy Original definition: classification emerging from the use of tags by users (Thomas Vander Wal) In this work: hidden classification hierarchies from annotation created by many users

5 Hierarchical Relations in Social Web Appear Implicitly Appear Explicitly Tags: Insect Grasshopper Australian Macro Orthopteran ( 直翅類 ) Folder (collection) Sub folder (set) Relations Goal: to induce deeper hierarchies from this metadata

6 Outline Motivation Approaches Results Discussion Related work

7 Inducing Hierarchy from Tags Existing approaches Graph based (Mika05) build a network of associated tags (node = tag, edge = co- occurrence of tags) suggest applying betweenness centrality and set theory to determine broader/narrower relations Hierarchical Clustering (Brooks06; Heymann06+) Tags appear more frequently would have higher centrality and thus more abstract. Probabilistic subsumption ( Sanderson99+, Schmitz06) x is broader than y if x subsumes y x subsumes y if p(x|y) > t & p(y|x) < t x y

8 Inducing Hierarchy from Tags Some difficulties when using tags to induce hierarchy: Above relations induced using subsumption approach on tags [Sanderson99+, Schmitz06] Washington  United States Car  Automobile Notation: A  B (A is broader than B) (hypernym relation) Insect  Hongkong Color  Brazilian Specificity  Rarity Tags are from different facets*

9 Inducing Hierarchy from user-specified relations User specified relations, e.g., –Flickr’s Collection-Set, –Delicious’ Bundle-Tag, –Bibsonomy’s Relation-Tag Key intuition: Not so many people specify peculiar relations like –“automobile”  “car”, or –“Washington”  “United States”

10 Simple Strategy Sets Collection The Netherlands - Holanda Set Collection Blijdorp - Rotterdam Tokenize + Stem … Concept relations netherland holanda blijdorp rotterdam holanda rotterdam countri netherland blijdorp 2. Link concepts & Select path blijdorp countri netherland hollandchina …… 1.Remove “noisy” relations -Conflict resolution -Significance test

11 Remove noisy relations: 1 st approach Conflict Resolution (when both a->b and b->a appear) –Relation conflicts occur because of noise –Voting scheme: Keep a  b (and discard b  a) If N u (a  b) > 1 and N u (a  b) > N u (b  a) insect butterfly insect 10 2

12 Remove noisy relations: 2 nd approach Significance Test - Use statistical significance test to decide if a  b is significant - Null hypothesis: observed relation a  b was generated by chance, via the random, independent generation of individual concepts a, b (according to the binomial distribution). # observations reject accept  # of a  b Is “b” narrower than “a” by chance?

13 Link concepts and select path Link concepts: assume that same terms refer to the same concept. anim bug insect moth possible paths from anim  moth: 1)a  b  i  m 2)a  i  m 3)a  m 4)a  b  m Network Bottleneck idea: “the flow bottleneck is a minimum flow capacity among all relations in the path” 1) a  b  i  m [BN score = min(26,1,18) = 1] 2) a  i  m [BN score = min(72,18) = 18] 3) a  m [BN score = min(10) = 10] 4) a  b  m [BN score = min(26,4) = 4] 10 anim bug + Select path: link relations from many users can cause a spaghetti graph anim insect  anim buginsect

14 Evaluation & Data Set Contribution#2:Learning Concept Hierarchies Hypothesis: the approach that takes explicit relations into account can induce better hierarchies. “Better” means more consistent with hand-built hierarchies (ODP ver. 10/08) The baseline approach is subsumption approach [Schmitz06] Collection and set terms are used instead of tags, making it comparable. Data Set:  Data from 17 user groups, devoted to wildlife and naturalist photography  21,792 of 39,922 users specify at least one collection  110,543 unique terms (c.f. 166,153 unique terms in ODP), 15,495 terms in common.

15 Evaluation methodology ODP has many sub hierarchies: comparing to the induced ones are impractical! Contribution#2:Learning Concept Hierarchies It’s easier to compare when specifying “root concept” and “leaf concepts”, i.e., specifying a certain sub tree to compare. Reference hierarchy Relations (right after tokenized) Induced hierarchy Induce (remove noise+link) (ODP)

16 Metrics Taxonomic Overlap [adapted from Maedche02+] –measuring structure similarity between two trees –for each node, determining how many ancestor and descendant nodes overlap to those in the reference tree. Lexical Recall –measuring how well an approach can discover concepts, existing in the reference hierarchy (coverage) Contribution#2:Learning Concept Hierarchies

17 Quantitative Results

18 Quantitative Results Contribution#2:Learning Concept Hierarchies Manually selecting 32 root nodes Taxonomic Overlap : 27 of them are better than those by subsumption 3 of them get zero score in both approaches Lexical Recall: 28 of them are better than those by subsumption 2 of them get similar score on both approaches the rest, by subsumption, only induce the root node. The proposed approach can induce deeper trees The proposed approach can induce hierarchies more consistent with ODP in almost all cases.

19 Sport hierarchy

20 Invertebrate hierarchy

21 Country hierarchy

22 Discussion Simple strategy to aggregate a large number of shallow relations specified by different users into a common, deeper hierarchy Induced hierarchies are more consistent with ODP Future work includes:  Term ambiguity  Relation types  Global path  Apply to other datasets

23 Thank You! & Questions?