Cross-lingual Knowledge Linking Across Wiki Knowledge Bases
Zhichun Wang, Juanzi Li, Zhigang Wang, Jie Tang
A tiny example
Every article can be represented as a five-tuple.
[Figure: articles a, b, c on one side and x, y, z on the other]
Simple Solution (A)
1. The similarity of titles, exploiting the Google Translation API
Simple Solution (B)
1. Similarity aggregation
Simple Solution (C)
1. The vector of the similarities
2. An SVM learns from the existing links
3. Classify
There are 3*3 article pairs, and one is labeled. Our task is to label the other eight pairs.
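The three simple solutions can be sketched on the 3*3 toy example. This is a minimal illustration, not the authors' implementation: the article contents, the use of Jaccard similarity, and the averaging rule are all assumptions made for the example, and the titles are assumed to be already translated into one language.

```python
from itertools import product

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy articles: translated title tokens and category labels.
left = {
    "a": {"title": {"beijing"}, "cats": {"city", "capital"}},
    "b": {"title": {"yellow", "river"}, "cats": {"river"}},
    "c": {"title": {"tsinghua", "university"}, "cats": {"university"}},
}
right = {
    "x": {"title": {"beijing"}, "cats": {"city", "capital"}},
    "y": {"title": {"yangtze", "river"}, "cats": {"river"}},
    "z": {"title": {"peking", "university"}, "cats": {"university", "city"}},
}

def similarity_vector(u, v):
    """Solution (C) input: one similarity per feature (title, categories)."""
    return (jaccard(u["title"], v["title"]), jaccard(u["cats"], v["cats"]))

# All 3*3 candidate pairs and their similarity vectors.
vectors = {
    (i, j): similarity_vector(left[i], right[j])
    for i, j in product(left, right)
}

def aggregate(vec):
    """Solution (B): collapse the vector into one score (here, the mean)."""
    return sum(vec) / len(vec)

scores = {pair: aggregate(vec) for pair, vec in vectors.items()}
best_for_a = max(right, key=lambda j: scores[("a", j)])
```

In Solution (C) the tuples in `vectors` would be passed to a classifier such as an SVM trained on the labeled pair, instead of being averaged.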
The weakness of the SVM approach
The SVM only considers the similarity of articles’ local features; it does not take the relations among predictions or any constraints into account.
Two intuitions:
1. Similarity functions capture the relations between candidate cross-lingual links and existing ones; now we should also model the relations within candidate cross-lingual links.
2. One article from one knowledge base can have a cross-lingual link with only one article from the other.
Incorporate this information into a unified model
1. Model the possible cross-lingual links (graph product)
2. Similarity functions capture the relations between candidate cross-lingual links and existing ones
3. Model the relations within candidate cross-lingual links
4. Global constraints
[Figure: graph product of articles a, b, c and x, y, z, giving candidate nodes ax, bx, cx, ay, by, cy, az, bz, cz]
For each node, one set contains the nodes having relations to it, and another denotes the set of labels conflicting with it according to the 1-to-1 linking constraint.
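The graph product and the 1-to-1 conflict sets can be sketched for the toy articles as follows. The function names are hypothetical, and treating "conflicting" as "sharing one endpoint with another candidate" is an assumption about how the constraint is encoded.

```python
from itertools import product

def build_candidate_nodes(left, right):
    """Graph product: one node per possible cross-lingual pair."""
    return list(product(left, right))

def conflict_sets(nodes):
    """For each node, the other nodes that reuse one of its articles.

    Under a 1-to-1 linking constraint, a node and any member of its
    conflict set cannot both be labeled as true links.
    """
    return {
        (l, r): {
            (l2, r2)
            for (l2, r2) in nodes
            if (l2, r2) != (l, r) and (l2 == l or r2 == r)
        }
        for (l, r) in nodes
    }

nodes = build_candidate_nodes(["a", "b", "c"], ["x", "y", "z"])
conflicts = conflict_sets(nodes)
```

Here `conflicts[("a", "x")]` contains the two other pairs that use `a` and the two other pairs that use `x`, so every node in the 3*3 example has four conflicting neighbours.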
Linkage Factor Graph Model
Node feature function: f
Edge feature function: g
Constraint feature function: h
Details about the feature functions: exponential-linear functions
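To make the three factor types concrete, here is a small sketch with exponential-linear factors on three candidate nodes. The feature definitions, the weights `ALPHA`, `BETA`, `GAMMA`, and the similarity values are all illustrative assumptions; the slides do not give the exact features. The joint distribution is normalized by brute-force enumeration, which is only feasible for a toy graph.

```python
import math
from itertools import product

NODES = ["ax", "by", "ay"]                 # candidate links (hypothetical)
SIM = {"ax": 0.9, "by": 0.8, "ay": 0.2}    # assumed local similarities
EDGES = [("ax", "by")]                     # related candidates
CONFLICTS = [("ax", "ay")]                 # both use article a
ALPHA, BETA, GAMMA = 4.0, 1.0, -5.0        # assumed weights

def exp_linear(weight, feature):
    """Exponential-linear factor: exp(weight * feature)."""
    return math.exp(weight * feature)

def f(node, y):
    """Node factor: rewards label 1 when similarity is above 0.5."""
    return exp_linear(ALPHA, y * (SIM[node] - 0.5))

def g(yi, yj):
    """Edge factor: rewards related candidates that share a label."""
    return exp_linear(BETA, 1.0 if yi == yj else 0.0)

def h(yi, yj):
    """Constraint factor: penalizes two conflicting links both set to 1."""
    return exp_linear(GAMMA, 1.0 if yi == 1 and yj == 1 else 0.0)

def unnormalized(labels):
    score = 1.0
    for n in NODES:
        score *= f(n, labels[n])
    for i, j in EDGES:
        score *= g(labels[i], labels[j])
    for i, j in CONFLICTS:
        score *= h(labels[i], labels[j])
    return score

labelings = [dict(zip(NODES, ys)) for ys in product([0, 1], repeat=len(NODES))]
Z = sum(unnormalized(l) for l in labelings)
probs = {tuple(l[n] for n in NODES): unnormalized(l) / Z for l in labelings}
best = max(probs, key=probs.get)
```

With these assumed weights, the most probable labeling links `ax` and `by` and rejects `ay`, because the constraint factor makes the conflicting combination very unlikely.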
Model learning and inference
Objective function: log-likelihood of the labeled data
How to maximize the log-likelihood?
1. Gradient descent method
2. Loopy Belief Propagation algorithm
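The learning step can be illustrated on a deliberately simplified model. The sketch below keeps only a node factor with a single similarity feature, so the model reduces to logistic regression and the marginals are exact; in the full factor graph those marginals would have to be approximated, which is presumably where Loopy Belief Propagation comes in. The data, learning rate, and step count are assumptions. The update is written as gradient ascent on the log-likelihood, which is equivalent to gradient descent on its negative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, data):
    """Sum of log p(y | x) over the labeled pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(data, lr=0.3, steps=200):
    """Gradient ascent: gradient = observed label minus expected label."""
    w = b = 0.0
    for _ in range(steps):
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in data)
        grad_b = sum(y - sigmoid(w * x + b) for x, y in data)
        w += lr * grad_w
        b += lr * grad_b
    return w, b

# Hypothetical labeled pairs: (similarity, is_true_link).
data = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
w, b = fit(data)
```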
Model learning and inference (cont.)
Tricks in the implementation
1. Candidate selection: only article pairs that have at least one common outlink are mapped to nodes in the LFG model.
2. Distributed learning: MPI
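The candidate selection trick can be sketched as a simple filter. This assumes that "common outlink" means an outlink of one article whose known cross-lingual counterpart is an outlink of the other; the function name and the toy data are hypothetical.

```python
def select_candidates(en_outlinks, zh_outlinks, known_links):
    """Keep only pairs sharing at least one outlink via known links.

    en_outlinks / zh_outlinks: article -> set of outlinked article titles.
    known_links: English title -> Chinese title for existing links.
    """
    candidates = []
    for en, en_out in en_outlinks.items():
        # Map English outlinks into the Chinese side where a link is known.
        mapped = {known_links[t] for t in en_out if t in known_links}
        for zh, zh_out in zh_outlinks.items():
            if mapped & zh_out:
                candidates.append((en, zh))
    return candidates

# Hypothetical toy data.
en_outlinks = {"a": {"China", "Asia"}, "b": {"China"}, "c": {"Physics"}}
zh_outlinks = {"x": {"zh:China"}, "y": {"zh:Asia", "zh:China"}, "z": {"zh:Math"}}
known_links = {"China": "zh:China", "Asia": "zh:Asia"}

candidates = select_candidates(en_outlinks, zh_outlinks, known_links)
```

In this toy case only four of the nine possible pairs survive, which is the point of the trick: the factor graph is built over far fewer nodes than the full graph product.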
Experiment
Experiment setting
1. Select 2000 English articles with cross-lingual links to Chinese articles from Wikipedia, then pick out the corresponding 2000 Chinese articles
2. 3-fold cross-validation
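A 3-fold split over 2000 labeled items could look like the sketch below. The shuffling seed and the round-robin fold assignment are assumptions; the slides do not say how the folds were formed.

```python
import random

def k_fold_splits(items, k=3, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment keeps fold sizes within one item of each other.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

pairs = list(range(2000))   # stand-ins for the 2000 linked article pairs
splits = list(k_fold_splits(pairs, k=3))
```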
Factor contribution analysis
1. According to the decrease in F1-scores, all these factors are useful in predicting new cross-lingual links.
2. LFG achieves a 3.1% increase in F1-score by considering the relations among article pairs.
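For reference, the F1-score used in this comparison is the harmonic mean of precision and recall over predicted links. The sketch below uses made-up predictions, not the experiment's data.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 over sets of predicted and true links."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical example: two of three predicted links are correct.
gold = {("a", "x"), ("b", "y"), ("c", "z")}
predicted = {("a", "x"), ("b", "y"), ("c", "y")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```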
The end Thanks for listening!