Integrating Class Hierarchies Yuzhong Qu
NJVR: 2,996 vocabularies from 261 PLDs (many are from w3.org). 455,718 terms: 396,023 classes and 59,868 properties (many are in the YAGO namespace). Instantiation found for: 115,707 classes (29.2%), e.g. foaf:Person; 25,963 properties (43.4%), e.g. dc:creator; 1,874 vocabularies (62.6%).
Select: vocabulary; class and property; instantiated classes (and their ancestors); the amount of instantiation, e.g. k 10 (100?)
Instantiated Class Hierarchy
Homomorphism. Let $M = \{S_1, S_2, \ldots\}$ be a partially ordered set (poset), and likewise $N = \{C_1, C_2, \ldots\}$. Let $H: M \to N$ be a functional relation from M to N (partial?) such that $S_i \le S_j \Rightarrow H(S_i) \le H(S_j)$. Note: this covers merging class hierarchies (taxonomies), and the abstractive summary of a given class hierarchy with $|\mathrm{Range}(H)| \le K$.
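A minimal sketch of checking the homomorphism condition, assuming each hierarchy is given as a set of direct (subclass, superclass) edges and that the order is the reflexive-transitive closure of those edges. The function names and the toy classes (Student, Person, Agent, Human, Thing) are hypothetical and only illustrate the definition; pairs outside the domain of a partial H are skipped.

```python
from itertools import product


def reflexive_transitive_closure(elements, edges):
    """Return all pairs (a, b) with a <= b, given direct (child, parent) edges."""
    leq = {(x, x) for x in elements} | set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that new pairs can be added safely.
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def is_homomorphism(h, m_leq, n_leq):
    """True if Si <= Sj in M implies H(Si) <= H(Sj) in N (H may be partial)."""
    return all(
        (h[a], h[b]) in n_leq
        for a, b in m_leq
        if a in h and b in h
    )


# Hypothetical toy hierarchies: M is a chain of three classes, N a chain of two.
m_leq = reflexive_transitive_closure(
    {"Student", "Person", "Agent"},
    {("Student", "Person"), ("Person", "Agent")},
)
n_leq = reflexive_transitive_closure({"Human", "Thing"}, {("Human", "Thing")})

h = {"Student": "Human", "Person": "Human", "Agent": "Thing"}
print(is_homomorphism(h, m_leq, n_leq))  # order is preserved
print(len(set(h.values())))              # |Range H|, to compare against K
```

The size of the range, `len(set(h.values()))`, is the quantity that the summary setting bounds by K.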
Distance C S H
Distance
Merge S
Summary of instances (class hierarchy)
Summary of instances (class hierarchy) ? OR
Instance category and taxonomy: each leaf node is weighted
Related Problems: Huffman coding; minimum-cost flow problem; (directed) Steiner tree; node-weighted Steiner tree; (weighted) vertex cover; (weighted) dominating set; maximum coverage problem (select no more than K sets), with a weighted version (elements are weighted); minimum set cover, with a weighted version (sets are weighted).
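To make the maximum coverage problem concrete, here is a minimal sketch of the standard greedy heuristic, which picks at most K sets and is known to achieve a (1 - 1/e) approximation. Reading the sets as classes and the elements as weighted instances is an assumption made for illustration; the function name and the toy data are hypothetical.

```python
def greedy_max_coverage(sets, k, weight=None):
    """Pick at most k sets, each time taking the largest uncovered weight.

    sets:   dict mapping a set name to a Python set of elements
    weight: optional dict of element weights (missing elements weigh 1)
    """
    weight = weight or {}
    covered, chosen = set(), []
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(sets):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = sum(weight.get(e, 1) for e in sets[name] - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left to gain
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical example: four candidate sets over the elements 1..7.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
print(greedy_max_coverage(sets, k=2))
```

Passing a `weight` dict gives the weighted version, in which elements rather than sets carry the weights.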
Huffman Coding (minimum weighted path length). Huffman D A. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 1952, 40(9): 1098-1101.
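As an illustration of minimum weighted path length, the following sketch runs Huffman's merging procedure on weighted leaves and reports each leaf's depth in the resulting binary tree. The symbol names and weights are a made-up example, and treating leaf weights as instance counts of taxonomy leaves is an assumption.

```python
import heapq
from itertools import count


def huffman_depths(weights):
    """Return the depth of every symbol in a Huffman tree for the given weights."""
    tie = count()  # tie-breaker so that the heap never compares the symbol lists
    heap = [(w, next(tie), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes all of their leaves one level deeper.
        for sym in s1 + s2:
            depth[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return depth


def weighted_path_length(weights, depth):
    """Sum of weight times depth over all leaves."""
    return sum(weights[s] * depth[s] for s in weights)


weights = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
depth = huffman_depths(weights)
print(depth)
print(weighted_path_length(weights, depth))  # 224
```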
Minimum-cost flow problem. Given a directed graph with source s and sink t, where each edge (u, v) has capacity c(u, v), flow f(u, v), and cost a(u, v), send an amount of flow d from s to t so as to minimize the total cost $\sum_{(u,v) \in E} a(u,v) \cdot f(u,v)$.
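For reference, the problem can be written as a linear program. The formulation below is a sketch that assumes the common convention of non-negative flows on a directed edge set E (other presentations use a skew-symmetric flow instead); the symbols c, f, a, d, s, and t are the ones introduced on the slide.

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{(u,v) \in E} a(u,v)\, f(u,v) \\
\text{subject to} \quad & 0 \le f(u,v) \le c(u,v)
                          && \text{for all } (u,v) \in E, \\
                        & \sum_{w:(u,w) \in E} f(u,w) - \sum_{w:(w,u) \in E} f(w,u) = 0
                          && \text{for all } u \notin \{s, t\}, \\
                        & \sum_{w:(s,w) \in E} f(s,w) - \sum_{w:(w,s) \in E} f(w,s) = d .
\end{aligned}
```

Conservation at every intermediate vertex together with the requirement at s implies that a net flow of d arrives at t.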
Minimum Steiner Tree. Given a set V of points (vertices), interconnect them by a network (graph) of shortest length, where the length is the sum of the lengths of all edges.
Minimum Steiner Tree. Given an edge-weighted graph G = (V, E, w) and a subset S ⊆ V of required vertices, a Steiner tree is a tree in G that spans all vertices of S. The task is to find a minimum-weight Steiner tree.
Dominating Set problem. A dominating set for a graph G = (V, E) is a subset D of V such that every vertex not in D is adjacent to at least one member of D. Finding a minimum dominating set is NP-hard, and its decision version is a classical NP-complete problem. Parameterized by the solution size k, the problem is not fixed-parameter tractable: no algorithm with running time $f(k) \cdot n^{O(1)}$ exists for any function f unless the W-hierarchy collapses to FPT = W[2]. If the input graph is planar, the problem remains NP-hard, but a fixed-parameter algorithm is known.
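Since exact solutions are out of reach in general, a common fallback is the greedy heuristic sketched below, which repeatedly picks the vertex that dominates the most still-undominated vertices. This is the set-cover greedy applied to closed neighbourhoods, so it gives a logarithmic approximation rather than an optimum. The adjacency-dict representation and the example graph are assumptions for illustration.

```python
def greedy_dominating_set(adj):
    """Greedy dominating set for an undirected graph given as {vertex: set(neighbours)}."""
    undominated = set(adj)
    chosen = []
    while undominated:
        # Pick the vertex whose closed neighbourhood covers the most undominated vertices.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj, d):
    """True if every vertex is in d or adjacent to a member of d."""
    d = set(d)
    return all(v in d or adj[v] & d for v in adj)


# Hypothetical example: a star centred at 1 with a path 4-5-6 attached.
adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5}, 5: {4, 6}, 6: {5}}
ds = greedy_dominating_set(adj)
print(ds, is_dominating(adj, ds))
```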
Vertex Cover problem. A vertex cover of a graph is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. Finding a minimum vertex cover is NP-hard; its decision version, the vertex cover problem, was one of Karp's 21 NP-complete problems. Deciding whether G has a vertex cover of k vertices is fixed-parameter tractable, with running time $O(kn + 1.2852^k)$.
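To show why the parameterized question is tractable, here is a sketch of the textbook bounded search tree: any cover must contain an endpoint of every edge, so branching on the two endpoints of one uncovered edge gives a recursion of depth at most k. This simple version runs in roughly O(2^k · m) time and is not the refined O(kn + 1.2852^k) algorithm; the function name and the example graph are hypothetical.

```python
def vertex_cover_at_most_k(edges, k):
    """Return a vertex cover with at most k vertices as a set, or None if none exists."""
    if not edges:
        return set()  # nothing left to cover
    if k == 0:
        return None   # edges remain but the budget is spent
    u, v = edges[0]
    # Any cover must contain u or v, so try both choices.
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover_at_most_k(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Hypothetical example: a 4-cycle 1-2-3-4 with the chord (1, 3).
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(vertex_cover_at_most_k(edges, 1))  # None
print(vertex_cover_at_most_k(edges, 2))  # {1, 3}
```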
Other Techniques: graph summarization; graph edit distance
Reference (Minimum cost flow) Orlin J B. A polynomial time primal network simplex algorithm for minimum cost flows. Mathematical Programming, 1997, 78: 109-129.
Reference (Steiner Tree) Klein P, Ravi R. A nearly best-possible approximation algorithm for node-weighted Steiner trees. Journal of Algorithms, 1995, 19(1): 104-115. Zelikovsky A. A series of approximation algorithms for the acyclic directed Steiner tree problem. Algorithmica, 1997, 18(1): 99-110. Charikar M, Chekuri C, Cheung T, et al. Approximation algorithms for directed Steiner problems. Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms. 1998: 192-200. Zosin L, Khuller S. On directed Steiner trees. Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. 2002: 59-63.
Reference (Vertex Cover) Niedermeier R, Rossmanith P. On efficient fixed-parameter algorithms for weighted vertex cover. Journal of Algorithms, 2003, 47(2): 63-77. White L J, Gillenson M L. An efficient algorithm for minimum k-covers in weighted graphs. Mathematical Programming, 1975, 8(1): 20-42. Chen J, Kanj I A, Xia G. Improved parameterized upper bounds for vertex cover. Mathematical Foundations of Computer Science 2006. Springer Berlin Heidelberg, 2006: 238-249.
Reference (Graph Summarization) Navlakha S, Rastogi R, Shrivastava N. Graph summarization with bounded error. Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 419-432. Tian Y, Hankins R A, Patel J M. Efficient aggregation for graph summarization. Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 567-580. Zhang N, Tian Y, Patel J M. Discovery-driven graph summarization. Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010: 880-891. Gao X, Xiao B, Tao D, et al. A survey of graph edit distance. Pattern Analysis and Applications, 2010, 13(1): 113-129.
Reference (Document Summarization) Celikyilmaz A, Hakkani-Tur D. A hybrid hierarchical model for multi-document summarization. ACL 2010: 815-824. Shen C, Li T. Multi-document summarization via the minimum dominating set. COLING 2010: 984-992.
Acknowledgement Q&A Discussion