This research is supported by NIH grant U54-GM114838, a grant from the Allen Institute for Artificial Intelligence (allenai.org), and Contract HR0011-15-2-0025.

The contract is with the US Defense Advanced Research Projects Agency (DARPA). Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. The task requires comparing textual clues across languages, and therefore a notion of similarity between text snippets written in different languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wikipedia titles. The proposed method can be applied to all languages represented in Wikipedia, including those for which no machine translation technology is available. We create a challenging dataset in 12 languages and show that our approach outperforms various baselines. Moreover, our model compares favorably with the best systems on the TAC KBP2015 Entity Linking task, including those that relied on translation from the target language to English.

1. Learning monolingual word and title embeddings [Wang et al., 2014]
Each hyperlinked mention is replaced by the title it links to. For example,
  It is led by and mainly composed of Sunni Arabs from Iraq…
becomes
  It is led by and mainly composed of en.wikipedia.org/wiki/Sunni_Islam Arabs from en.wikipedia.org/wiki/Iraq…
A Skip-gram model with negative sampling [Mikolov et al., 2013] is then trained on the transformed text. Since each title appears as a single token in this text, the model yields an embedding for every word and every title.
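The text transformation in step 1 can be sketched in Python as below. This is a minimal illustration, not the authors' preprocessing code. The input format is an assumption: a list of hypothetical `(start, end, title)` character spans, one per hyperlink. The helper name `replace_anchors_with_titles` is also hypothetical. The `en.wikipedia.org/wiki/` token prefix simply mirrors the example above.

```python
from typing import List, Tuple

# Hypothetical link format: (start, end, title), character offsets of the
# anchor text inside the sentence and the Wikipedia title it points to.
Link = Tuple[int, int, str]

# Token prefix copied from the example sentence above (an assumption).
TITLE_PREFIX = "en.wikipedia.org/wiki/"


def replace_anchors_with_titles(text: str, links: List[Link]) -> str:
    """Replace every hyperlinked anchor span with a single title token."""
    pieces = []
    cursor = 0
    for start, end, title in sorted(links):  # process links left to right
        if start < cursor:
            raise ValueError("overlapping links are not supported")
        pieces.append(text[cursor:start])                      # plain text before the link
        pieces.append(TITLE_PREFIX + title.replace(" ", "_"))  # one token per title
        cursor = end
    pieces.append(text[cursor:])                               # trailing plain text
    return "".join(pieces)


sentence = "It is led by and mainly composed of Sunni Arabs from Iraq"
i, j = sentence.index("Sunni"), sentence.index("Iraq")
print(replace_anchors_with_titles(sentence, [(i, i + 5, "Sunni_Islam"), (j, j + 4, "Iraq")]))
```

The transformed corpus can then be whitespace-tokenized and passed to any Skip-gram trainer with negative sampling, for instance gensim's `Word2Vec` with `sg=1` and `negative` > 0. Each title token then receives its own vector alongside the ordinary words.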
2. Aligning the embeddings of two languages with canonical correlation analysis (CCA) [Hotelling, 1936; Faruqui and Dyer, 2014]
Instead of a dictionary that maps words between the two languages, we use the title mapping obtained from Wikipedia inter-language links. CCA is computed on the embeddings of the linked title pairs, and the resulting projections are applied to the monolingual embeddings:

$$P_{en},\, P_{tr} = \mathrm{CCA}(\,\cdot\,,\,\cdot\,), \qquad M_{en} = E_{en} P_{en}, \qquad M_{tr} = E_{tr} P_{tr}$$

where E denotes monolingual embeddings, M denotes multilingual embeddings, and tr denotes the foreign language (Turkish in the running example). Unlike other multilingual embedding models, this method can be applied to every language in Wikipedia.

Features. For each (foreign mention, English title) pair, we use the multilingual embeddings to compute several features. A foreign mention is represented using the embeddings of (a) the other mentions in the same document, (b) its context words, and (c) the disambiguated English titles that precede it. An example document sentence is "Tayvan, ABD ve İngiltere'de hukuk okuması, Tsai'ye bir LL.B. kazandırdı" ("Studying law in Taiwan, the USA and England earned Tsai an LL.B."). A candidate title is represented by its English title embedding.

Ranking. The ranking features are Pr(c|m), Pr(m|c), and the cosine similarity between the candidate title embedding and each of the mention representations above, where m is the mention and c is a candidate title. Candidates are ranked with a linear Ranking SVM.

One third of the test mentions are hard: they cannot be resolved by choosing the most common title for the mention.

Task. Given mentions in a non-English document, such as the Turkish sentence above, find the corresponding titles in the English Wikipedia. The main challenge is comparing non-English words to English Wikipedia titles. English titles and their Turkish counterparts correspond as follows: United_States ↔ Amerika_Birleşik_Devletleri, Texas ↔ Teksas, Turkey ↔ Türkiye, Istanbul ↔ İstanbul, … We focus on the English titles in the intersection of the English and the foreign-language Wikipedia title spaces.

Candidate generation. Two dictionaries are used: (1) hyperlinked foreign string → all possible English titles, and (2) foreign word → all possible English titles. Query the first dictionary with the full mention string.
If that fails, query the second dictionary with each word in the mention.

Results on the 12-language dataset:

Language   Method       Hard   Easy   Total
Spanish    EsWikifier
           MonoEmb
           WordAlign
           WikiME
German     WordAlign
           WikiME
French     WordAlign
           WikiME
Italian    WikiME
Chinese    WikiME
Hebrew     WikiME
Thai       WikiME
Arabic     WikiME
Turkish    WikiME
Tamil      WikiME
Tagalog    WikiME
Urdu       WikiME

Results on the TAC KBP2015 Entity Linking task:

Approach                    Spanish   Chinese
Translation + EnWikifier    79.35     N/A
EsWikifier                  79.04     N/A
WikiME
  + Typing
Top TAC'15 System
WikiME
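The CCA alignment in step 2 can be sketched with NumPy as follows. It uses classical CCA: whitening, followed by an SVD of the whitened cross-covariance. This is an illustrative sketch under stated assumptions, not the implementation of Faruqui and Dyer [2014]. The following are all hypothetical:

- the function names `inv_sqrt` and `cca`;
- the small ridge term `reg` and the number of kept dimensions `k`;
- the random toy matrices standing in for title-pair embeddings.

Row i of `e_en` and row i of `e_tr` play the role of an English title and the foreign title it is linked to. The projections are then applied as in M = E P.

```python
import numpy as np


def inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T  # V diag(w^-1/2) V^T


def cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-6):
    """Classical CCA on paired rows of x (n, d1) and y (n, d2).

    Returns projections p_x (d1, k), p_y (d2, k) and the top-k canonical
    correlations in decreasing order.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)  # center each side before estimating covariances
    yc = y - y.mean(axis=0)
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])  # hypothetical ridge for stability
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions;
    # the singular values are the canonical correlations.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    p_x = wx @ u[:, :k]
    p_y = wy @ vt.T[:, :k]
    return p_x, p_y, s[:k]


# Hypothetical toy data: 300 "linked title pairs" with 8-dim embeddings per
# language; the foreign side is a noisy linear map of the English side.
rng = np.random.default_rng(0)
e_en = rng.normal(size=(300, 8))
e_tr = e_en @ rng.normal(size=(8, 8)) + 0.01 * rng.normal(size=(300, 8))

p_en, p_tr, corr = cca(e_en, e_tr, k=4)
m_en = e_en @ p_en  # M_en = E_en P_en
m_tr = e_tr @ p_tr  # M_tr = E_tr P_tr
```

In a real setting, the CCA would be fit on the linked title pairs only. P_en and P_tr would then be applied to every word and title vector in E_en and E_tr. A foreign mention's context and an English candidate title could then be compared by cosine similarity in the shared space.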