 Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Enabling Networked Knowledge.

Presentation transcript:

Cross-Lingual Linking of News Stories using ESA
Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia
DERI, NUI Galway, Ireland; OEG, UPM, Madrid, Spain
Tuesday, 18 Dec, 2012
CL!NSS, FIRE-2012

Overview
- Problem Space
- Approach
  - Search Space Reduction
  - Semantic Ranking
- Cross-Lingual Explicit Semantic Analysis (CL-ESA)
- Evaluations
- Conclusion & Future Work

Problem Space
- Cross-lingual news story linking
  - Identify the same news articles in different languages
  - Cross-lingual plagiarism detection
- Data set
  - 50 English news stories
  - 50K Hindi news stories
- Challenge
  - Stories are not direct translations of each other
    - Similar keywords in different stories
    - Different keywords in similar stories

Approach
- Search Space Reduction
  - News publication dates: take a window of K days
  - Vocabulary overlap: translate the English news stories using Google Translate
- Semantic Ranking
  - Rank the news stories by their semantic relatedness
  - CL-ESA semantic relatedness score
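The date-based reduction step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the window is centred on the English story's publication date, and the function names, the `(story_id, date)` layout, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta


def within_window(source_date, candidate_date, days_before, days_after):
    """True if candidate_date falls inside the window around source_date."""
    start = source_date - timedelta(days=days_before)
    end = source_date + timedelta(days=days_after)
    return start <= candidate_date <= end


def filter_by_date(source_date, candidates, days_before=2, days_after=2):
    """Keep the ids of candidate stories published inside the window.

    candidates: iterable of (story_id, publication_date) pairs
    (a hypothetical layout chosen for this sketch).
    """
    return [
        story_id
        for story_id, published in candidates
        if within_window(source_date, published, days_before, days_after)
    ]


# Toy data, invented for illustration only.
hindi_stories = [
    ("hi-001", date(2012, 3, 1)),
    ("hi-002", date(2012, 3, 4)),
    ("hi-003", date(2012, 3, 9)),
]

english_story_date = date(2012, 3, 3)
narrow = filter_by_date(english_story_date, hindi_stories, 2, 2)
wide = filter_by_date(english_story_date, hindi_stories, 7, 7)
print(narrow, wide)
```

A wider window keeps more candidates for the ranking step at the cost of a larger search space.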

Semantic Ranking/Relatedness
Corpus-based Relatedness
- Semantic meaning as a distributional vector
  - Words that occur in similar contexts tend to have similar or related meanings, i.e. the meaning of a word can be defined in terms of its context (Distributional Hypothesis; Harris, 1954)
- Latent Semantic Analysis (LSA)
  - Latent or implicit semantics (unsupervised)
- Explicit Semantic Analysis (ESA)
  - Explicit semantics from explicitly derived concepts (supervised)

Cross-lingual ESA (CL-ESA)
[Diagram: for each language (EN, HI, ES), Word 1 ... Word n are each mapped through an inverted index to a weighted concept vector w1*URI1 + w2*URI2 + ... + wn*URIn; the vectors are compared by cosine to give the semantic relatedness.]
- Multilingual Wikipedia index: EN, DE, ES, PT, FR, NL, HI
  - Easily extendable to other languages
- Performed better than CL-latent models
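The relatedness computation on this slide can be sketched as below. It is a toy illustration under stated assumptions: each language has an inverted index from words to weighted Wikipedia concepts, the concept identifiers are shared across languages, a text's vector is the sum of its words' concept vectors, and relatedness is the cosine of two such vectors. The index contents, weights, concept labels, and function names are invented for the example and are not taken from the presentation.

```python
import math
from collections import defaultdict

# Toy per-language inverted indexes: word -> {concept URI: weight}.
# Entries and weights are made up for illustration.
EN_INDEX = {
    "election": {"wiki:Election": 0.9, "wiki:Parliament": 0.4},
    "cricket": {"wiki:Cricket": 1.0},
}
HI_INDEX = {
    "चुनाव": {"wiki:Election": 0.8, "wiki:Parliament": 0.3},
    "क्रिकेट": {"wiki:Cricket": 0.9},
}


def esa_vector(tokens, index):
    """Sum the concept vectors of all tokens found in the index."""
    vector = defaultdict(float)
    for token in tokens:
        for uri, weight in index.get(token, {}).items():
            vector[uri] += weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(uri, 0.0) for uri, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no known concepts on one side
    return dot / (norm_u * norm_v)


def cl_esa_relatedness(tokens_a, index_a, tokens_b, index_b):
    """Relatedness of two texts in (possibly) different languages."""
    return cosine(esa_vector(tokens_a, index_a), esa_vector(tokens_b, index_b))


same_topic = cl_esa_relatedness(["election"], EN_INDEX, ["चुनाव"], HI_INDEX)
other_topic = cl_esa_relatedness(["election"], EN_INDEX, ["क्रिकेट"], HI_INDEX)
print(same_topic > other_topic)
```

Because both texts are projected into the same concept space, no translation of the texts themselves is needed at comparison time.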

Experiments
- Run1
  - Window of 4 days (2 days before and 2 days after)
  - Rank all news stories using CL-ESA
- Run2
  - Window of 14 days (7 days before and 7 days after)
  - Rank all news stories using Modified CL-ESA
- Run3
  - English stories were translated into Hindi using Google Translate
  - Took the top 1000 Hindi news stories by vocabulary overlap
  - Re-rank all news stories using CL-ESA
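The shortlist step of Run3 might look like the sketch below. The slides do not say how vocabulary overlap was measured, so this assumes the simplest option, the number of distinct shared tokens, and it assumes the English story has already been translated into Hindi and tokenized. The function names and the placeholder tokens are hypothetical.

```python
def vocabulary_overlap(query_tokens, doc_tokens):
    """Number of distinct tokens shared by the two texts (assumed measure)."""
    return len(set(query_tokens) & set(doc_tokens))


def top_k_by_overlap(query_tokens, docs, k=1000):
    """Return the ids of the k stories with the highest overlap.

    docs: dict mapping story_id -> list of tokens (hypothetical layout).
    Ties are broken by story id so the order is deterministic.
    """
    ranked = sorted(
        docs,
        key=lambda story_id: (-vocabulary_overlap(query_tokens, docs[story_id]), story_id),
    )
    return ranked[:k]


# Placeholder tokens standing in for a translated English story
# and three Hindi stories.
translated_query = ["t1", "t2", "t3"]
hindi_docs = {
    "hi-001": ["t1", "t2", "t9"],
    "hi-002": ["t1", "t2", "t3"],
    "hi-003": ["t8"],
}

shortlist = top_k_by_overlap(translated_query, hindi_docs, k=2)
print(shortlist)
```

The resulting shortlist would then be re-scored with the semantic relatedness measure rather than used as the final ranking.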

Evaluation: Results
- CL!NSS challenge

Conclusion
- Initial approach for cross-lingual linking of news stories
  - Bigger window with modified CL-ESA works best
  - Translated vocabulary overlap did not work well
- Use other ranking scores
  - LSA, LDA
- Evaluate the separate effect of components
  - Bigger window size vs. ranking function

Thank You
Questions?