Published by Toby Patrick. Modified over 9 years ago.
1
Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute, www.deri.ie
Enabling Networked Knowledge

Cross-Lingual Linking of News Stories using ESA
Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia
DERI, NUI Galway, Ireland
OEG, UPM, Madrid, Spain
Tuesday, 18 Dec, 2012
CL!NSS, FIRE-2012
2
Overview
  Problem Space
  Approach
    – Search Space Reduction
    – Semantic Ranking
  Cross-Lingual Explicit Semantic Analysis (CL-ESA)
  Evaluations
  Conclusion & Future Work
3
Problem Space
  Cross-lingual news story linking
    – Identify the same news articles in different languages
    – Cross-lingual plagiarism detection
  Data set
    – 50 English news stories
    – 50K Hindi news stories
  Challenge
    Stories are not directly translated
    – Similar keywords in different stories
    – Different keywords in similar stories
4
Approach
  Search Space Reduction
    News publication dates
    – by taking a window of K days
    Vocabulary overlap
    – translating the English news stories using Google Translate
  Semantic Ranking
    Rank the news stories by their semantic relatedness
    CL-ESA semantic relatedness score
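The date-based part of the search space reduction can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy story IDs and dates are all hypothetical, and `k_days` is assumed to be the half-width of the window, following the Experiments slide, where a 4-day window means 2 days before and 2 days after.

```python
from datetime import date

def within_window(query_date, candidate_date, k_days):
    """True if the candidate was published at most k_days before or after the query."""
    return abs((candidate_date - query_date).days) <= k_days

def reduce_by_date(query_date, candidates, k_days):
    """Keep the IDs of (doc_id, pub_date) pairs that fall inside the window."""
    return [doc_id for doc_id, pub_date in candidates
            if within_window(query_date, pub_date, k_days)]

# Hypothetical toy data: three Hindi stories and one English story date.
hindi_stories = [
    ("hi-001", date(2012, 3, 1)),
    ("hi-002", date(2012, 3, 4)),
    ("hi-003", date(2012, 3, 9)),
]
english_date = date(2012, 3, 3)

# Assumed half-width of 2 days, i.e. a 4-day window in total.
candidates = reduce_by_date(english_date, hindi_stories, k_days=2)
print(candidates)
```

Only the surviving candidates would then be passed to the semantic ranking step, which keeps the more expensive relatedness computation away from most of the 50K Hindi stories.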
5
Semantic Ranking/Relatedness
  Corpus-based Relatedness
    Semantic meaning as a distributional vector
    – Words that occur in similar contexts tend to have similar or related meanings, i.e. the meaning of a word can be defined in terms of its context (Distributional Hypothesis; Harris, 1954)
    Latent Semantic Analysis (LSA)
    – Latent or implicit semantics (unsupervised)
    Explicit Semantic Analysis (ESA)
    – Explicit semantics from explicitly derived concepts (supervised)
6
Cross-lingual ESA (CL-ESA)
  [Diagram: for each language (EN, HI, ES), an inverted index maps Word 1 ... Word n to a weighted concept vector w_1*URI_1 + w_2*URI_2 + ... + w_n*URI_n. The vectors for Term@en and Term@hi are compared by vector cosine to give the semantic relatedness.]
  Multilingual Wikipedia Index
    – EN, DE, ES, PT, FR, NL, HI
    – Easily extendable for other languages
  Performed better than CL-latent models
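The idea in the diagram can be sketched in a few lines. This is a toy illustration under stated assumptions, not the system presented here: the two dictionaries stand in for the per-language inverted indexes, the concept URIs and weights are invented, the Hindi tokens are romanized placeholders, and a text vector is assumed to be the plain sum of its word vectors. What it does show is the key property of CL-ESA: both languages are mapped into the same concept space, so their vectors can be compared directly with cosine.

```python
import math
from collections import defaultdict

# Hypothetical per-language indexes: word -> {shared concept URI: weight}.
EN_INDEX = {
    "election": {"wiki:Election": 0.9, "wiki:Parliament": 0.4},
    "cricket": {"wiki:Cricket": 1.0},
}
HI_INDEX = {
    "chunav": {"wiki:Election": 0.8, "wiki:Parliament": 0.5},
    "kriket": {"wiki:Cricket": 0.9},
}

def concept_vector(tokens, index):
    """Sum the concept vectors of all tokens found in the index."""
    vec = defaultdict(float)
    for tok in tokens:
        for uri, weight in index.get(tok, {}).items():
            vec[uri] += weight
    return dict(vec)

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(uri, 0.0) for uri, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def cl_esa_relatedness(en_tokens, hi_tokens):
    """Relatedness of an English and a Hindi token list in the shared concept space."""
    return cosine(concept_vector(en_tokens, EN_INDEX),
                  concept_vector(hi_tokens, HI_INDEX))

same_topic = cl_esa_relatedness(["election"], ["chunav"])
other_topic = cl_esa_relatedness(["election"], ["kriket"])
print(same_topic > other_topic)
```

In a real setting the weights would come from an index built over Wikipedia articles in each language, with articles on the same topic sharing one concept identifier; how the index in this work was weighted is not stated on the slide.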
7
Experiments
  Run1
    – Window of 4 days (2 days before and 2 days after)
    – Rank all news stories using CL-ESA
  Run2
    – Window of 14 days (7 days before and 7 days after)
    – Rank all news stories using Modified CL-ESA
  Run3
    – English stories were translated into Hindi using Google Translate
    – Took the top 1000 Hindi news stories by vocabulary overlap
    – Re-rank all news stories using CL-ESA
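The candidate selection in Run3 can be sketched as below. The slides do not say how vocabulary overlap was measured, so this sketch assumes the simplest option, the number of distinct shared tokens; the function names and the romanized toy tokens are hypothetical, and the translation step is taken as already done.

```python
def overlap_score(query_tokens, doc_tokens):
    """Number of distinct query tokens that also occur in the document."""
    return len(set(query_tokens) & set(doc_tokens))

def top_n_by_overlap(query_tokens, docs, n):
    """Return the IDs of the n documents with the highest overlap score."""
    ranked = sorted(docs.items(),
                    key=lambda item: overlap_score(query_tokens, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]

# Hypothetical toy data: a machine-translated English story and three Hindi stories.
translated_query = ["sarkar", "chunav", "dilli"]
hindi_docs = {
    "hi-001": ["chunav", "dilli", "sarkar", "aaj"],
    "hi-002": ["kriket", "match"],
    "hi-003": ["chunav", "parinam"],
}

# Run3 kept the top 1000 stories; n=2 is used here only for the toy data.
shortlist = top_n_by_overlap(translated_query, hindi_docs, n=2)
print(shortlist)
```

The shortlist would then be re-ranked with the CL-ESA relatedness score, so the overlap step only has to avoid discarding the correct story.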
8
Evaluation: Results
  CL!NSS challenge
9
Conclusion
  Initial approach for cross-lingual linking of news stories
    – Bigger window with modified CL-ESA works best
    – Translated vocabulary overlap did not work well
  Future work
    – Use other ranking scores: LSA, LDA
    – Evaluate the separate effect of components: bigger window size vs. ranking function
10
Thank You
Questions?