Junjie Yao, Yuxin Huang 2010-05-13 Tag Cube: Acquiring Latent Conceptual Structures from Folksonomy Data.


1 Junjie Yao, Yuxin Huang 2010-05-13 Tag Cube: Acquiring Latent Conceptual Structures from Folksonomy Data

2 Taxonomy & Folksonomy

3 Delicious Snapshot
Design blog software music tools reference art video programming webdesign web2.0 mac howto linux tutorial web free news photography shopping blogs css imported education travel javascript food games
Development inspiration politics flash apple tips java google osx business windows iphone science productivity books toread health funny internet wordpress ajax ruby research humor fun technology search opensource
Photoshop media recipes cool work article marketing security mobile jobs rails lifehacks tutorials resources php social download diy ubuntu freeware portfolio photo movies writing graphics youtube audio online

4 User Contributed Description Users Web content classification Consume Produce Annotate Organize Discover Organize Search Recommend Leverage

5 Automatic Taxonomy Induction
- Decomposed into two subtasks:
  - term extraction
  - relation formation
- Relation formation has become the focus of most research, with two families of approaches: pattern-based and clustering-based.
- Clustering-based approaches cannot generate relations as accurately as pattern-based ones.
- Performance is largely influenced by the types of features used.

6 Clustering features
- Contextual
- Co-occurrence
- Syntactic dependency
Usually, we need the is-a and sibling relations.

7 Tree, relation construction
- Directed maximum spanning tree, or prefix tree.
- Hierarchical agglomerative clustering.
- Pachinko allocation models; LDA-based extension algorithm.

8 Tag hierarchy challenges:
- Sparse and short context, compared to textual documents
- Low quality and noise: typos, synonyms, describers instead of categorizers
- Scalability and data size

9 Inducing Hierarchy from Tags
Existing approaches:
- Graph based [Mika05]: build a network of associated tags (node = tag, edge = co-occurrence of tags); suggests applying betweenness centrality and set theory to determine broader/narrower relations.
- Hierarchical clustering [Brooks06; Heymann06+]: tags that appear more frequently are likely to have higher centrality and thus to be more abstract.
- Probabilistic subsumption [Sanderson99+; Schmitz06]: x is broader than y if x subsumes y; x subsumes y if p(x|y) > t and p(y|x) < t.
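The subsumption rule on this slide can be illustrated with a minimal Python sketch. It assumes each tag maps to the set of resources it annotates and estimates p(x|y) as the share of y's resources that also carry x. The function name `subsumption_pairs`, the threshold value 0.8, and the toy data are illustrative assumptions, not taken from the slides.

```python
from itertools import permutations


def subsumption_pairs(tag_resources, t=0.8):
    """Return (broader, narrower) tag pairs under the rule
    'x subsumes y if p(x|y) > t and p(y|x) < t'.

    tag_resources maps tag -> set of resource ids (an assumed input format).
    """
    pairs = []
    for x, y in permutations(tag_resources, 2):
        rx, ry = tag_resources[x], tag_resources[y]
        if not rx or not ry:
            continue  # conditional probabilities are undefined for unused tags
        both = len(rx & ry)
        p_x_given_y = both / len(ry)  # how often x appears where y appears
        p_y_given_x = both / len(rx)  # how often y appears where x appears
        if p_x_given_y > t and p_y_given_x < t:
            pairs.append((x, y))
    return pairs


# Toy data (hypothetical): resource ids annotated with each tag.
toy = {
    "browser": {1, 2, 3, 4, 5},
    "firefox": {1, 2, 3},
    "chrome": {4, 5},
    "music": {6, 7},
}
print(subsumption_pairs(toy))
```

On this toy input the rule marks "browser" as broader than both "firefox" and "chrome". Because the rule only looks at co-occurrence counts, it can also produce the kind of spurious pairs discussed on the next slide.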

10 Difficulties…
Some difficulties when using tags to induce hierarchy. The relations below were induced using the subsumption approach on tags [Sanderson99+, Schmitz06].
Notation: A → B (A is broader than B, or a hypernym relation)
- Washington → United States
- Car → Automobile
- Insect → Hongkong
- Color → Brazilian
- Specificity vs. rarity
- Tags are from different facets
- Obama → President
- Chrome → Browser
- Frequency vs. abstractness

11 Our Previous Work
- Graph-based representation
  - Build a graph of associated tags (node = tag, edge = co-occurrence of tags in bookmarking actions)
  - Apply association rules and confidence levels to determine broader/narrower relations
  - Greedy algorithm for tree induction
- Evaluation
  - Empirical, and lacking ground truth
- Temporal fusion
  - Consecutive smoothing

12 Review Comment
Positive:
- Interesting work, evolutionary aspect
- Remarkable data size
Negative:
- Heuristic contribution of apparent marginal value
- Weak evaluation, non-convincing at this point
Overall: solution is intuitively reasonable but heuristic. The technical depth of this work is limited and the evaluation is insufficient to support the proposed solution.

13 Proposed Improvements
- Rich similarity measurements
- Better partial order determination
- Efficient and accurate tree deduction
- Reasonable evaluation and practical application scenarios
- Incorporate burst detection into taxonomy evolution?

14 Why cubes?
- OLAP, Text Cube
- Cube organization and searching in folksonomy/social media?
- The multiple categories and the roll-up and drill-down characteristics differ from a single taxonomy.

15 Similarity Measurements
- Co-action
- Co-resource
- Co-user
- Clustering or pLSA to cope with sparse features.
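The formulas for the three measurements did not survive in the transcript, so the following Python sketch is only one plausible reading. It assumes tagging data arrives as (user, resource, tag) triples and uses Jaccard overlap of each tag's contexts as the similarity. The choice of Jaccard, the function names, and the toy triples are all assumptions.

```python
from collections import defaultdict


def context_sets(triples):
    """Collect, per tag, the contexts it occurs in at three granularities."""
    actions = defaultdict(set)    # (user, resource) bookmarking actions
    resources = defaultdict(set)  # resources carrying the tag
    users = defaultdict(set)      # users who applied the tag
    for user, resource, tag in triples:
        actions[tag].add((user, resource))
        resources[tag].add(resource)
        users[tag].add(user)
    return {"co_action": actions, "co_resource": resources, "co_user": users}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_similarity(triples, x, y):
    """Return the three assumed similarity scores between tags x and y."""
    return {name: jaccard(sets[x], sets[y])
            for name, sets in context_sets(triples).items()}


# Hypothetical toy data: (user, resource, tag)
triples = [
    ("u1", "r1", "python"),
    ("u1", "r1", "programming"),
    ("u2", "r1", "programming"),
    ("u2", "r2", "python"),
]
print(tag_similarity(triples, "python", "programming"))
```

The three scores get progressively looser: co-action requires the same user to tag the same resource, co-resource only the same resource, and co-user only the same user. Co-action is therefore the sparsest of the three, which is one reason clustering or pLSA might be needed.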

16 Partial Order Determination
- Define new ontology metrics to replace frequency.
- Take various aspects into consideration:
  - Resource coverage, e.g. browser, firefox, chrome
  - Temporal duration, e.g. president, obama
  - Abstractness: terms appearing in tags and in the corresponding resource descriptions; pair training
  - Graph centrality?
  - Term association

17 Tree Deduction
Given the tagging terms and the similarity/partial order relations, deduce the underlying taxonomy:
- Prefix tree or incremental clustering.
- The optimization objective: minimize the conflicting order pairs and the weighted layer penalty.
- Iteratively update until convergence.
- Incrementally expand the taxonomy tree.
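The slide does not define the objective precisely, so the sketch below is one hypothetical way to score a candidate tree in Python. A (broader, narrower) order pair counts as a conflict when the tree places the narrower tag above the broader one, and the layer penalty is a weighted sum of node depths. The tree encoding (child to parent map), `layer_weight`, and all function names are assumptions.

```python
def depth(parent, node):
    """Number of edges from node up to the root (root has parent None)."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def is_ancestor(parent, a, b):
    """True if a is a proper ancestor of b in the tree."""
    node = parent.get(b)
    while node is not None:
        if node == a:
            return True
        node = parent.get(node)
    return False


def tree_cost(parent, order_pairs, layer_weight=0.1):
    """Assumed objective: conflicting order pairs plus weighted layer penalty."""
    conflicts = sum(
        1 for broader, narrower in order_pairs
        if is_ancestor(parent, narrower, broader)  # tree inverts the pair
    )
    layer_penalty = layer_weight * sum(depth(parent, n) for n in parent)
    return conflicts + layer_penalty


# Hypothetical partial order and two candidate trees (child -> parent).
order_pairs = [("web", "browser"), ("browser", "firefox")]
good = {"web": None, "browser": "web", "firefox": "browser"}
bad = {"firefox": None, "browser": "firefox", "web": "browser"}
print(tree_cost(good, order_pairs), tree_cost(bad, order_pairs))
```

An incremental or greedy procedure could use such a score to decide where to attach each new tag: try each candidate parent and keep the placement with the lowest cost.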

18 Evaluation Under Ground Truth
It's easier to compare when specifying a "root concept" and "leaf concepts", i.e., specifying a certain subtree to compare.
[Diagram labels: Reference hierarchy; Relations (right after tokenized); Induced hierarchy; Induce (remove noise + link); (ODP)]

19 Metrics
- Taxonomic Overlap [adapted from Maedche02+]
  - Measures the structural similarity between two trees.
  - For each node, determines how many ancestor and descendant nodes overlap with those in the reference tree.
- Lexical Recall
  - Measures how well an approach can discover the concepts that exist in the reference hierarchy (coverage).
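Both metrics can be sketched in a few lines of Python. This is a simplified reading, not the exact adaptation used in the talk: trees are child to parent maps, the per-node overlap is the Jaccard ratio of each node's ancestors, descendants, and itself, and the average runs only over nodes present in both trees. All names and the toy trees are assumptions.

```python
def ancestors(parent, node):
    """All proper ancestors of node in a child -> parent map."""
    out = set()
    node = parent.get(node)
    while node is not None:
        out.add(node)
        node = parent.get(node)
    return out


def cotopy(parent, node):
    """The node together with all of its ancestors and descendants."""
    descendants = {n for n in parent if node in ancestors(parent, n)}
    return ancestors(parent, node) | descendants | {node}


def lexical_recall(induced, reference):
    """Share of reference concepts that also appear in the induced tree."""
    if not reference:
        return 0.0
    return len(set(induced) & set(reference)) / len(reference)


def taxonomic_overlap(induced, reference):
    """Average per-node overlap of ancestor/descendant sets (simplified)."""
    shared = set(induced) & set(reference)
    if not shared:
        return 0.0
    total = 0.0
    for n in shared:
        a, b = cotopy(induced, n), cotopy(reference, n)
        total += len(a & b) / len(a | b)
    return total / len(shared)


# Hypothetical toy trees (child -> parent, root -> None).
reference = {"web": None, "browser": "web", "firefox": "browser", "chrome": "browser"}
induced = {"web": None, "browser": "web", "firefox": "web"}
print(lexical_recall(induced, reference), taxonomic_overlap(induced, reference))
```

Here the induced tree misses "chrome" (which lowers lexical recall) and attaches "firefox" directly under "web" (which lowers taxonomic overlap), so the two metrics capture different kinds of error.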

20 Application Scenario
- Organize and represent resources in the tag cube.
- A new way of measuring similarity, based on tree distance.
- Resource search and recommendation.
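A tree-distance similarity of the kind mentioned above might look like the following Python sketch. It assumes the taxonomy is stored as a child to parent map, takes the distance to be the number of edges on the path through the lowest common ancestor, and maps it to a similarity with 1 / (1 + distance). That mapping and all names are illustrative assumptions.

```python
def path_to_root(parent, node):
    """List of nodes from node up to the root, inclusive."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for j, n in enumerate(path_to_root(parent, b)):
        if n in steps_from_a:  # first shared node is the lowest common ancestor
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def tree_similarity(parent, a, b):
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + tree_distance(parent, a, b))


# Hypothetical toy taxonomy (child -> parent, root -> None).
taxonomy = {
    "web": None,
    "browser": "web",
    "firefox": "browser",
    "chrome": "browser",
    "music": "web",
}
print(tree_distance(taxonomy, "firefox", "chrome"))
print(tree_similarity(taxonomy, "firefox", "music"))
```

The same distance could be computed between pages in the reference tree, which would give comparable numbers for the induced and the reference hierarchy.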

21 Evaluation
- ODP dataset
  - About 50,000 pages overlap between our Delicious dataset and ODP.
- Ground truth: page distance in the original ODP tree
- Baseline 0: tag-vector-based similarity measurement
- Baseline x: other taxonomy induction methods

22 Cube Materialization
- Tree compression
- MDL principle

23 References
[1] D. Burdick, P. M. Deshpande, T. S. Jayram, R. Ramakrishnan, and S. Vaithyanathan. OLAP over uncertain and imprecise data. In Proc. of VLDB, pages 970–981, 2005.
[2] Y. Cao, H. Duan, C. Lin, Y. Yu, and H. Hon. Recommending questions using the MDL-based tree cut model. In Proc. of WWW, pages 81–90, 2008.
[3] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Int. Res., 24(1):305–339, 2005.
[4] C. X. Lin, B. Ding, J. Han, F. Zhu, and B. Zhao. Text cube: Computing IR measures for multidimensional text database analysis. In Proc. of ICDM, pages 905–910, 2008.
[5] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluating similarity measures for emergent semantics of social tagging. In Proc. of WWW, pages 641–650, 2009.
[6] M. Sanderson and B. Croft. Deriving concept hierarchies from text. pages 206–213, 1999.
[7] H. Yang and J. Callan. A metric-based framework for automatic taxonomy induction. In Proc. of ACL, pages 271–279, 2009.
[8] X. Yin and S. Shah. Building taxonomy of web search intents for name entity queries. In Proc. of WWW, pages 1001–1010, 2010.
[9] Y. Yu, C. X. Lin, Y. Sun, C. Chen, J. Han, B. Liao, T. Wu, C. Zhai, D. Zhang, and B. Zhao. iNextCube: information network-enhanced text cube. Proc. VLDB Endow., 2(2):1622–1625, 2009.

