Anon Plangprasopchok & Kristina Lerman, USC Information Sciences Institute. Constructing Folksonomies from User-Specified Relations on Flickr.

Presentation transcript:

Motivation
[Diagram labels: Users; Web content; classification; Consume, Produce, Annotate, Organize, Discover; Annotation / Metadata; Organize, Search, Recommend; Leverage]

Inducing Folksonomy
GOAL: induce hidden classification hierarchies, "folksonomies*", from user-generated metadata.
- Metadata from an individual user may be too inaccurate and incomplete, but metadata from different users may complement each other and become meaningful in combination.
- In this work, we explore strategies that combine metadata from many users and then induce folksonomies.
* This definition differs somewhat from the original one by Thomas Vander Wal.

Outline
- Motivation
- Hierarchical Relations
- Approaches
- Results
- Discussion
- Related work

Hierarchical Relations in Social Web
- Appear implicitly: tags, e.g., Insect, Grasshopper, Australian, Macro, Orthoptera
- Appear explicitly: relations between a folder (collection) and a sub-folder (set)
Goal: to induce deeper hierarchies from this metadata

Inducing Hierarchy from Tags
Existing approaches:
- Graph based [Mika05]
  - Build a network of associated tags (node = tag, edge = co-occurrence of tags)
  - Suggests applying betweenness centrality and set theory to determine broader/narrower relations
- Hierarchical clustering [Brooks06; Heymann06+]
  - Tags that appear more frequently likely have higher centrality and are thus more abstract.
- Probabilistic subsumption [Sanderson99+; Schmitz06]
  - x is broader than y if x subsumes y
  - x subsumes y if p(x|y) > t and p(y|x) < t
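
The probabilistic subsumption rule above can be sketched in a few lines of Python. This is a minimal sketch, not the cited authors' implementation: the function name `subsumption_pairs`, the sample photos, and the threshold t = 0.8 are assumptions made for illustration, and the conditional probabilities are estimated from plain co-occurrence counts.

```python
from collections import Counter
from itertools import combinations


def subsumption_pairs(tagged_items, t=0.8):
    """Return sorted (broader, narrower) pairs where x subsumes y,
    i.e. p(x|y) > t and p(y|x) < t, estimated from co-occurrence counts."""
    tag_count = Counter()   # number of items carrying each tag
    pair_count = Counter()  # number of items carrying both tags of a pair
    for tags in tagged_items:
        unique = set(tags)
        tag_count.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_count[(a, b)] += 1

    result = []
    for (a, b), n_ab in pair_count.items():
        # Test both directions of every co-occurring pair.
        for x, y in ((a, b), (b, a)):
            p_x_given_y = n_ab / tag_count[y]
            p_y_given_x = n_ab / tag_count[x]
            if p_x_given_y > t and p_y_given_x < t:
                result.append((x, y))
    return sorted(result)


# Hypothetical tag sets for seven photos (illustration only).
photos = [
    ["insect", "grasshopper"],
    ["insect", "grasshopper"],
    ["insect", "butterfly"],
    ["insect", "butterfly"],
    ["insect", "macro"],
    ["macro"],
    ["macro"],
]
pairs = subsumption_pairs(photos)
```

On this toy data, "insect" comes out broader than both "grasshopper" and "butterfly", while "macro" is related to nothing because neither conditional probability clears the threshold.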

Inducing Hierarchy from Tags
Some difficulties when using tags to induce hierarchy:
- Washington → United States
- Car → Automobile
- Insect → Hongkong
- Color → Brazilian
The relations above were induced using the subsumption approach on tags [Sanderson99+, Schmitz06].
Notation: A → B (A is broader than B, or hypernym relation)
- Specificity → Rarity
- Tags are from different facets

Inducing Hierarchy from User-Specified Relations
User-specified relations, e.g.,
- Flickr's Collection-Set
- Delicious' Bundle-Tag
- Bibsonomy's Relation-Tag
Key intuition: few people specify peculiar relations like "automobile" → "car" or "Washington" → "United States".
In this work, we concentrate on metadata from Flickr.

Simple Strategy
[Diagram: collection and set names (e.g., "Mushrooms & Fungi"; "Fungi, Puffballs & Shelf Fungi") are tokenized and stemmed into concept relations among terms such as live thing, fungi, mushroom, puffballs, shelf fungi, and plant.]
1. Remove "noisy" relations
   - Conflict resolution
   - Significance test
2. Link concepts & select path

Remove noisy relations: 1st approach
Conflict Resolution (when both A → B and B → A appear)
- Relation conflicts occur because of noise.
- Voting scheme: keep A → B (and discard B → A) if N_u(A → B) > 1 and N_u(A → B) > N_u(B → A)
[Diagram: insect and butterfly linked in both directions, with counts 10 and 2]
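
The voting scheme above might look like the following in Python. This is a sketch under assumptions: N_u is taken to be the number of users who specified a relation, the helper name `resolve_conflicts` is hypothetical, and the assignment of the counts 10 and 2 to the two directions is a guess at what the slide's diagram shows.

```python
def resolve_conflicts(user_counts):
    """Keep A -> B only if N_u(A -> B) > 1 and N_u(A -> B) > N_u(B -> A).

    user_counts maps (parent, child) to the number of users who specified
    that relation (an assumed reading of N_u).
    """
    kept = {}
    for (a, b), n_ab in user_counts.items():
        n_ba = user_counts.get((b, a), 0)  # 0 if the reverse was never seen
        if n_ab > 1 and n_ab > n_ba:
            kept[(a, b)] = n_ab
    return kept


# Hypothetical counts for illustration.
counts = {
    ("insect", "butterfly"): 10,
    ("butterfly", "insect"): 2,
    ("insect", "hongkong"): 1,  # specified by a single user only
}
kept = resolve_conflicts(counts)
```

Here only insect → butterfly survives: the reverse relation is outvoted, and the single-user relation fails the N_u > 1 condition.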

Remove noisy relations: 2nd approach
Significance Test
- Use a statistical significance test to decide if A → B is significant.
- Null hypothesis: the observed relation A → B was generated by chance, via the random, independent generation of the individual concepts A and B.
[Figure: distribution of the number of observations of A → B, with accept and reject regions. Is B narrower than A by chance?]
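
The slide does not say which statistical test is used, so the following is only one plausible reading of the null hypothesis, written as a one-sided exact binomial test. Everything specific here is an assumption: the function names, the way the chance probability is estimated (share of relations with A as parent times share with B as child), and the 0.01 significance level.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_significant(n_ab, n_a_parent, n_b_child, n_total, alpha=0.01):
    """Reject the null hypothesis that A -> B arose by chance.

    Assumed null model: A is drawn as parent and B as child independently,
    so one relation is A -> B with probability
    (n_a_parent / n_total) * (n_b_child / n_total).
    """
    p0 = (n_a_parent / n_total) * (n_b_child / n_total)
    return binomial_tail(n_ab, n_total, p0) < alpha


# Hypothetical counts: 200 relations overall, A is the parent in 20 of them
# and B is the child in 20 of them, so about 2 A -> B relations are expected
# by chance.
strong = is_significant(n_ab=10, n_a_parent=20, n_b_child=20, n_total=200)
weak = is_significant(n_ab=2, n_a_parent=20, n_b_child=20, n_total=200)
```

Ten observations against an expectation of about two is far out in the tail, so the relation is kept; two observations are what chance alone would produce, so the relation is treated as noise. For very large relation counts, a numerically safer tail computation would be needed.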

Link Concepts
- Link concepts together.
- Simply assume that the same terms refer to the same concept.
[Diagram: the relations anim → bug, anim → insect, bug → insect, anim → moth, bug → moth, and insect → moth are merged into one graph over anim, bug, insect, and moth.]
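
A minimal sketch of the linking step, assuming each input relation is a (user, parent, child) triple and that an edge's weight is the number of distinct users who specified it. The function name `link_concepts` and the sample users are hypothetical.

```python
from collections import defaultdict


def link_concepts(user_relations):
    """Merge per-user relations into one weighted graph.

    Identical terms are treated as the same concept (node); the weight of
    parent -> child is the number of distinct users who specified it.
    """
    users_per_edge = defaultdict(set)
    for user, parent, child in user_relations:
        users_per_edge[(parent, child)].add(user)

    graph = defaultdict(dict)
    for (parent, child), users in users_per_edge.items():
        graph[parent][child] = len(users)
    return dict(graph)


# Hypothetical relations from three users.
relations = [
    ("u1", "anim", "bug"),
    ("u2", "anim", "bug"),
    ("u2", "bug", "moth"),
    ("u3", "anim", "insect"),
    ("u3", "insect", "moth"),
    ("u3", "insect", "moth"),  # a repeat by the same user counts once
]
linked = link_concepts(relations)
```

Matching on the literal term is the simplest possible choice; it ignores term ambiguity, which the Discussion slide lists as future work.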

Select path
Linking relations from many users can produce a spaghetti graph.
Network bottleneck idea: "the flow bottleneck is a minimum flow capacity among all relations in the path"
Possible paths from anim → moth, with bottleneck (BN) scores:
1) a → b → i → m [BN score = min(26,1,18) = 1]
2) a → i → m [BN score = min(72,18) = 18]
3) a → m [BN score = min(10) = 10]
4) a → b → m [BN score = min(26,4) = 4]
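
The bottleneck scores above can be reproduced with a short Python sketch. The edge weights are the ones shown on the slide; the function names are hypothetical, and the sketch assumes that the path with the highest bottleneck score is the one kept, which the slide implies but does not state.

```python
def all_paths(graph, source, target, path=None):
    """Yield every simple path from source to target (depth-first)."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, {}):
        if nxt not in path:  # avoid revisiting nodes
            yield from all_paths(graph, nxt, target, path)


def bottleneck(graph, path):
    """Smallest edge weight along the path."""
    return min(graph[u][v] for u, v in zip(path, path[1:]))


def best_path(graph, source, target):
    """Path with the largest bottleneck score (assumed selection rule)."""
    return max(all_paths(graph, source, target),
               key=lambda p: bottleneck(graph, p))


# Edge weights taken from the slide's example.
graph = {
    "anim": {"bug": 26, "insect": 72, "moth": 10},
    "bug": {"insect": 1, "moth": 4},
    "insect": {"moth": 18},
}
scores = {tuple(p): bottleneck(graph, p)
          for p in all_paths(graph, "anim", "moth")}
best = best_path(graph, "anim", "moth")
```

Under that assumption the selected path is anim → insect → moth with a score of 18. Enumerating every simple path is exponential in the worst case, so this only suits small graphs like the example.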

Evaluation & Data Set
- Hypothesis: the approach that takes explicit relations into account can induce better hierarchies.
- "Better" means more consistent with the reference hierarchy, obtained from the Open Directory Project (ODP).
- The ODP hierarchy is created by volunteer editors working under ODP guidelines.

Evaluation & Data Set (2)
- The baseline is the subsumption approach [Schmitz06]. Collection and set terms are used instead of tags, making it comparable.
Data set:
- Data from 17 user groups devoted to wildlife and naturalist photography
- 21,792 of 39,922 users specify at least one collection
- 110,543 unique terms (cf. 166,153 unique terms in ODP); 15,495 terms in common

Evaluation methodology
- ODP has many sub-hierarchies: comparing all of them to the induced ones is impractical!
- It is easier to compare when specifying a "root concept" and "leaf concepts", i.e., specifying a certain subtree to compare.
[Diagram: relations (right after tokenization) → induce (remove noise + link) → induced hierarchy, compared against the reference hierarchy (ODP)]

Metrics
- Taxonomic Overlap [adapted from Maedche02+]: measures structural similarity between two trees; for each node, it determines how many ancestor and descendant nodes overlap with those in the reference tree.
- Lexical Recall: measures how well an approach discovers the concepts that exist in the reference hierarchy (coverage).
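
A simplified sketch of the two metrics follows. The exact formulas are not given on the slide, so these are assumptions: lexical recall is the share of reference concepts found in the induced hierarchy, and taxonomic overlap is the Jaccard overlap of each shared node's ancestors and descendants, averaged over shared nodes. The function names and toy hierarchies are hypothetical.

```python
def nodes(edges):
    """All concepts mentioned in a list of (parent, child) edges."""
    return {n for edge in edges for n in edge}


def reach(neighbours, start):
    """All nodes reachable from start through the neighbours mapping."""
    seen, stack = set(), [start]
    while stack:
        for n in neighbours.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def related(edges, node):
    """Ancestors and descendants of node."""
    up, down = {}, {}
    for parent, child in edges:
        up.setdefault(child, set()).add(parent)
        down.setdefault(parent, set()).add(child)
    return reach(up, node) | reach(down, node)


def lexical_recall(induced, reference):
    ref_nodes = nodes(reference)
    return len(nodes(induced) & ref_nodes) / len(ref_nodes)


def taxonomic_overlap(induced, reference):
    shared = nodes(induced) & nodes(reference)
    if not shared:
        return 0.0
    total = 0.0
    for c in shared:
        a, b = related(induced, c), related(reference, c)
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(shared)


# Toy hierarchies for illustration.
reference = [("animal", "insect"), ("insect", "moth"), ("insect", "butterfly")]
induced = [("animal", "insect"), ("insect", "moth"), ("animal", "bug")]
recall = lexical_recall(induced, reference)
overlap = taxonomic_overlap(induced, reference)
```

In the toy example the induced hierarchy finds three of the four reference concepts, and its structure agrees fully only at "moth", whose ancestors match the reference exactly.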

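Quantitative Result
Manually selecting 32 root nodes
[Three chart panels; panel titles did not survive extraction]
- Panel 1: Subs 1/32, Conres 11/32, Sig001 15/32
- Panel 2: Subs 2/32, Conres 17/32, Sig001 6/32
- Panel 3: Subs ~ 0.85, Conres ~ 2.24, Sig001 ~ 2.08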

Sport hierarchy

Invertebrate hierarchy

Country hierarchy

Discussion
- A simple strategy to aggregate a large number of shallow relations specified by different users into a common, deeper hierarchy
- Induced hierarchies are more consistent with ODP
Future work includes:
- Term ambiguity
- Global structure
- Relation types
- Applying to other datasets

Related Work
- Learning concept hierarchy from text data
  - Syntactic based [Hearst92, Caraballo99, Pasca04, Cimiano+05, Snow+06]
  - Word clustering [e.g., Segal+02, Blei+03]
- Inducing concept hierarchy from tags
  - Graph-based & clustering-based [Mika05, Brooks+06, Heymann+06, Zhou07+]
  - Probabilistic subsumption [Schmitz06]
- Ontology alignment [e.g., Udrea+07]
- Exploiting user-specified hierarchy
  - GiveALink [Markines06+]

Questions?
- Is the metric used in the evaluation meaningful?
- How scalable is the system?
- WordNet and ODP already exist. Why do we need this system?
- How is this work related to ontology enrichment?
- Is it ethical to collect users' data?
- …
THANK YOU!