Published by William Charles. Modified over 8 years ago.
1
Tag Cube: Acquiring Latent Conceptual Structures from Folksonomy Data
Junjie Yao, Yuxin Huang
2010-05-13
2
Taxonomy & Folksonomy
3
Delicious Snapshot
Tag cloud: Design, blog, software, music, tools, reference, art, video, programming, webdesign, web2.0, mac, howto, linux, tutorial, web, free, news, photography, shopping, blogs, css, imported, education, travel, javascript, food, games, Development, inspiration, politics, flash, apple, tips, java, google, osx, business, windows, iphone, science, productivity, books, toread, helath, funny, internet, wordpress, ajax, ruby, research, humor, fun, technology, search, opensource, Photoshop, media, recipes, cool, work, article, marketing, security, mobile, jobs, rails, lifehacks, tutorials, resources, php, social, download, diy, ubuntu, freeware, portfolio, photo, movies, writing, graphics, youtube, audio, online
4
User Contributed Description
[Diagram labels: Users; Web content classification; Consume, Produce, Annotate, Organize, Discover, Organize, Search, Recommend, Leverage]
5
Automatic Taxonomy Induction
  Decomposed into two subtasks: term extraction and relation formation.
  Relation formation is the focus of most research, with two families of approaches: pattern-based and clustering-based.
  Clustering-based approaches cannot generate relations as accurately as pattern-based ones.
  Their performance is largely influenced by the types of features used.
6
Clustering features
  Contextual
  Co-occurrence
  Syntactic dependency
Usually, we need the is-a and sibling relations.
7
Tree and relation construction
  Directed maximum spanning tree, or prefix tree
  Hierarchical agglomerative clustering
  Pachinko allocation models
  LDA-based extension algorithm
8
Tag hierarchy challenges
  Sparse and short context, compared to textual documents
  Low quality and noise: typos, synonyms, tags used as descriptors rather than categorizers
  Scalability and data size
9
Inducing Hierarchy from Tags
Existing approaches:
  Graph based [Mika05]
    Build a network of associated tags (node = tag, edge = co-occurrence of tags).
    Suggests applying betweenness centrality and set theory to determine broader/narrower relations.
  Hierarchical clustering [Brooks06; Heymann06+]
    Tags appearing more frequently would likely have higher centrality and thus be more abstract.
  Probabilistic subsumption [Sanderson99+; Schmitz06]
    x is broader than y if x subsumes y.
    x subsumes y if p(x|y) > t and p(y|x) < t.
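The probabilistic subsumption rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming each post is a set of tags and estimating p(x|y) as the fraction of posts tagged y that are also tagged x. The function name `subsumption_pairs`, the toy `posts` data, and the threshold value 0.5 are illustrative assumptions, not part of the cited work.

```python
from collections import Counter
from itertools import combinations


def subsumption_pairs(posts, t=0.5):
    """Return (broader, narrower) pairs under the slide's rule:
    x subsumes y if p(x|y) > t and p(y|x) < t."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in posts:
        unique = set(tags)
        tag_count.update(unique)
        # Count each unordered co-occurring pair once per post.
        for a, b in combinations(sorted(unique), 2):
            pair_count[(a, b)] += 1

    pairs = []
    for (a, b), n_ab in pair_count.items():
        p_a_given_b = n_ab / tag_count[b]
        p_b_given_a = n_ab / tag_count[a]
        if p_a_given_b > t and p_b_given_a < t:
            pairs.append((a, b))  # a is broader than b
        elif p_b_given_a > t and p_a_given_b < t:
            pairs.append((b, a))  # b is broader than a
    return sorted(pairs)


# Hypothetical toy data: one set of tags per bookmarking post.
posts = [
    {"programming", "python"},
    {"programming", "java"},
    {"programming", "python", "tutorial"},
    {"programming"},
    {"programming", "java"},
]
broader_narrower = subsumption_pairs(posts, t=0.5)
```

With this toy data, "programming" comes out broader than "python", "java", and "tutorial", while "python" and "tutorial" stay unordered because p(tutorial|python) equals the threshold rather than falling below it.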
10
Difficulties…
Some difficulties when using tags to induce hierarchy. The following relations were induced using the subsumption approach on tags [Sanderson99+, Schmitz06]:
  Washington → United States
  Car → Automobile
Notation: A → B (A is broader than B, i.e., a hypernym relation)
  Insect → Hongkong
  Color → Brazilian
Specificity, Rarity
Tags are from different facets
  Obama → President
  Chrome → Browser
Frequency, Abstractness
11
Our Previous Work
  Graph-based representation
    Build a graph of associated tags (node = tag, edge = co-occurrence of tags in bookmarking actions).
    Apply association rules and confidence levels to determine broader/narrower relations.
    Greedy algorithm for tree induction.
  Evaluation
    Empirical, lacking ground truth.
  Temporal fusion
    Consecutive smoothing.
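The slide does not spell out the greedy algorithm, so the following is only one plausible sketch. It assumes the broader/narrower candidates are already available as (parent, child) edges scored by a confidence value (for an association rule x → y, the usual confidence is count(x and y) / count(x)). The function name `greedy_tree`, the artificial `ROOT` node, and the sample scores are hypothetical.

```python
def greedy_tree(edges, root="ROOT"):
    """Greedily pick a parent for each tag.

    edges maps (parent, child) -> confidence. Edges are visited from the
    highest confidence down; an edge is kept if the child has no parent yet
    and the edge would not close a cycle. Returns a child -> parent map.
    """
    parent = {}

    def creates_cycle(p, c):
        # Attaching c under p closes a cycle if c is p or an ancestor of p.
        while p in parent:
            if p == c:
                return True
            p = parent[p]
        return p == c

    ranked = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
    for (p, c), _conf in ranked:
        if c in parent or creates_cycle(p, c):
            continue
        parent[c] = p

    # Tags that never received a parent hang off an artificial root.
    nodes = {n for edge in edges for n in edge}
    for n in sorted(nodes):
        if n not in parent:
            parent[n] = root
    return parent


# Hypothetical confidence scores for candidate broader -> narrower edges.
edges = {
    ("web", "css"): 0.9,
    ("css", "web"): 0.3,
    ("design", "css"): 0.6,
    ("web", "javascript"): 0.8,
    ("javascript", "ajax"): 0.7,
}
tree = greedy_tree(edges)
```

Here the low-confidence reverse edge css → web is rejected because it would close a cycle, and "design" ends up directly under the root because "css" already has a stronger parent.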
12
Review Comment
  Positive:
    Interesting work, evolutionary aspect
    Remarkable data size
  Negative:
    Heuristic contribution of apparent marginal value
    Weak evaluation, non-convincing at this point
  Overall: the solution is intuitively reasonable but heuristic. The technical depth of this work is limited and the evaluation is insufficient to support the proposed solution.
13
Proposed Improvements
  Rich similarity measurements
  Better partial order determination
  Efficient and accurate tree deduction
  Reasonable evaluation and practical application scenarios
  Incorporate burst detection into taxonomy evolution?
14
Why cubes?
  OLAP, Text Cube
  Cube organization and searching in folksonomy / social media?
  Multiple categories and roll-up/drill-down operations distinguish a cube from a single taxonomy.
15
Similarity Measurements
  Co-action
  Co-resource
  Co-user
  Clustering or pLSA to cope with sparse features.
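One way to read the three measurements is as cosine similarity between tag profiles built over different contexts. The sketch below assumes tagging data arrives as (user, resource, tag) triples and that a "co-action" context is the (user, resource) pair; these readings, the helper names `tag_profiles` and `cosine`, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def tag_profiles(actions, key):
    """Build tag -> {context: count}; key picks the context from (user, resource)."""
    profiles = defaultdict(lambda: defaultdict(int))
    for user, resource, tag in actions:
        profiles[tag][key(user, resource)] += 1
    return profiles


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical tagging actions: (user, resource, tag).
actions = [
    ("alice", "r1", "python"),
    ("alice", "r1", "programming"),
    ("bob", "r1", "python"),
    ("bob", "r2", "programming"),
    ("carol", "r3", "music"),
]

by_resource = tag_profiles(actions, key=lambda user, resource: resource)
by_user = tag_profiles(actions, key=lambda user, resource: user)
by_action = tag_profiles(actions, key=lambda user, resource: (user, resource))

co_resource = cosine(by_resource["python"], by_resource["programming"])
co_user = cosine(by_user["python"], by_user["programming"])
co_action = cosine(by_action["python"], by_action["programming"])
```

The three values differ for the same tag pair, which is the point of keeping several measurements: "python" and "programming" share both users here but only one bookmarking action.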
16
Partial Order Determination
  Define new ontology metrics to replace frequency.
  Take various aspects into consideration:
    Resource coverage, e.g., browser, firefox, chrome
    Temporal duration, e.g., president, obama
    Abstractness
    Terms appearing in tags and in the corresponding resource descriptions
  Pair training
  Graph centrality?
  Term association
17
Tree Deduction
  Given the tagging terms and the similarity / partial order relations, deduce the underlying taxonomy.
  Prefix tree or incremental clustering.
  Optimization objective: minimize the number of conflicting order pairs plus a weighted layer penalty.
  Iteratively update until convergence.
  Incrementally expand the taxonomy tree.
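The slide names the objective but does not define its terms, so the sketch below fills them in with explicit assumptions: a pair (a, b) meaning "a is broader than b" counts as a conflict when the tree places b above a, and the layer penalty is taken to be a weight times the sum of node depths. Both definitions, the name `tree_cost`, and the default weight are hypothetical.

```python
def depth(parent, node):
    """Number of edges from node up to its root in a child -> parent map."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def is_ancestor(parent, a, b):
    """True if a lies strictly above b in the tree."""
    while b in parent:
        b = parent[b]
        if b == a:
            return True
    return False


def tree_cost(parent, order_pairs, layer_weight=0.1):
    """Assumed objective: conflicting order pairs + weighted layer penalty."""
    # Conflict (assumption): the tree reverses a required broader/narrower pair.
    conflicts = sum(1 for a, b in order_pairs if is_ancestor(parent, b, a))
    # Layer penalty (assumption): deeper trees cost more.
    layer_penalty = layer_weight * sum(depth(parent, n) for n in parent)
    return conflicts + layer_penalty


# Hypothetical candidate tree and partial-order constraints (broader, narrower).
candidate = {"browser": "software", "firefox": "browser", "chrome": "browser"}
order_pairs = [
    ("software", "browser"),
    ("browser", "firefox"),
    ("chrome", "browser"),  # contradicts the candidate tree
]
cost = tree_cost(candidate, order_pairs)
```

An iterative or incremental deduction procedure would compare this cost across candidate trees and keep the cheaper one; the search strategy itself is not shown here.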
18
Evaluation Under Ground Truth
It’s easier to compare when specifying a “root concept” and “leaf concepts”, i.e., specifying a certain subtree to compare.
[Diagram labels: Reference hierarchy (ODP); Relations (right after tokenized); Induce (remove noise + link); Induced hierarchy]
19
Metrics
  Taxonomic Overlap [adapted from Maedche02+]
    Measures structural similarity between two trees.
    For each node, determines how many of its ancestor and descendant nodes overlap with those in the reference tree.
  Lexical Recall
    Measures how well an approach discovers the concepts that exist in the reference hierarchy (coverage).
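Both metrics can be sketched over trees stored as child -> parent maps. This is a simplified reading, not the exact adapted formula from the cited work: for every concept present in both trees, the node together with its ancestors and descendants is compared across the two trees using a Jaccard ratio, and the ratios are averaged. The function names and the two toy trees are assumptions.

```python
def concepts(parent):
    """All nodes mentioned in a child -> parent map."""
    return set(parent) | set(parent.values())


def cotopy(parent, node):
    """The node plus all of its ancestors and descendants."""
    related = {node}
    cur = node
    while cur in parent:  # walk up to collect ancestors
        cur = parent[cur]
        related.add(cur)
    for other in parent:  # any node whose upward path hits `node` is a descendant
        cur = other
        while cur in parent:
            cur = parent[cur]
            if cur == node:
                related.add(other)
                break
    return related


def lexical_recall(induced, reference):
    """Share of reference concepts that also appear in the induced tree."""
    ref = concepts(reference)
    return len(concepts(induced) & ref) / len(ref)


def taxonomic_overlap(induced, reference):
    """Average Jaccard overlap of cotopies over concepts shared by both trees."""
    shared = concepts(induced) & concepts(reference)
    if not shared:
        return 0.0
    total = 0.0
    for node in shared:
        a, b = cotopy(induced, node), cotopy(reference, node)
        total += len(a & b) / len(a | b)
    return total / len(shared)


# Hypothetical reference and induced hierarchies.
reference = {"python": "programming", "java": "programming", "programming": "computers"}
induced = {"python": "programming", "java": "python"}

recall = lexical_recall(induced, reference)
overlap = taxonomic_overlap(induced, reference)
```

In this toy case the induced tree misses the concept "computers" (lowering recall) and misplaces "java" under "python" (lowering overlap).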
20
Application Scenario
  Organize and represent resources in the tag cube.
  A new similarity measurement based on tree distance.
  Resource search and recommendation.
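A tree-distance similarity could look like the sketch below, which counts the edges on the path between two nodes through their lowest common ancestor. The mapping from distance to similarity, 1 / (1 + d), is an assumption chosen for illustration, as are the function names and the sample taxonomy.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to its root in a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """Number of edges on the path between a and b, or None if disconnected."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for j, n in enumerate(path_to_root(parent, b)):
        if n in steps_from_a:  # first shared node is the lowest common ancestor
            return steps_from_a[n] + j
    return None


def tree_similarity(parent, a, b):
    """Assumed mapping from distance to similarity: 1 / (1 + distance)."""
    d = tree_distance(parent, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical taxonomy as a child -> parent map.
taxonomy = {
    "firefox": "browser",
    "chrome": "browser",
    "browser": "software",
    "linux": "software",
}
siblings = tree_distance(taxonomy, "firefox", "chrome")
cousins = tree_distance(taxonomy, "firefox", "linux")
```

Resources could then be compared by the tree distance between the taxonomy nodes they are attached to, instead of by raw tag-vector overlap.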
21
Evaluation
  ODP dataset
    About 50,000 pages overlap between our Delicious dataset and ODP.
  Ground truth: page distance in the original ODP tree.
  Baseline 0: tag-vector-based similarity measurement.
  Baseline x: other taxonomy induction methods.
22
Cube Materialization
  Tree compression
  MDL principle
23
References
[1] D. Burdick, P. M. Deshpande, T. S. Jayram, R. Ramakrishnan, and S. Vaithyanathan. OLAP over uncertain and imprecise data. In Proc. of VLDB, pages 970–981, 2005.
[2] Y. Cao, H. Duan, C. Lin, Y. Yu, and H. Hon. Recommending questions using the MDL-based tree cut model. In Proc. of WWW, pages 81–90, 2008.
[3] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Int. Res., 24(1):305–339, 2005.
[4] C. X. Lin, B. Ding, J. Han, F. Zhu, and B. Zhao. Text cube: Computing IR measures for multidimensional text database analysis. In Proc. of ICDM, pages 905–910, 2008.
[5] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluating similarity measures for emergent semantics of social tagging. In Proc. of WWW, pages 641–650, 2009.
[6] M. Sanderson and B. Croft. Deriving concept hierarchies from text. pages 206–213, 1999.
[7] H. Yang and J. Callan. A metric-based framework for automatic taxonomy induction. In Proc. of ACL, pages 271–279, 2009.
[8] X. Yin and S. Shah. Building taxonomy of web search intents for name entity queries. In Proc. of WWW, pages 1001–1010, 2010.
[9] Y. Yu, C. X. Lin, Y. Sun, C. Chen, J. Han, B. Liao, T. Wu, C. Zhai, D. Zhang, and B. Zhao. iNextCube: Information network-enhanced text cube. Proc. VLDB Endow., 2(2):1622–1625, 2009.