Cross-lingual Information Extraction System Evaluation
Kiyoshi Sudo, Satoshi Sekine, Ralph Grishman
New York University
8/13/2004, NYCNLP (COLING 2004)



Outline
1. Introduction
2. Cross-lingual IE system
   – Translation-based QDIE system
   – Cross-lingual QDIE system
3. Experiment
4. Discussion
5. Conclusion

Information Extraction
Identifying entities in the source text and mapping them into a pre-defined table.
“A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
Table (Terrorism Activity):
Date: today
Location: downtown Jerusalem
Perpetrator: A … suicide bomber

Local Context
Local context provides useful information for identifying entities.
“A smiling Palestinian suicide bomber triggered a massive explosion in the heavily policed heart of downtown Jerusalem today, …”
Date: today
Location: downtown Jerusalem
Perpetrator: A … suicide bomber

Extraction Patterns
Extraction patterns have been widely used as an effective means to extract entities.
– Pre-defined template (Riloff 1993): (kidnapped in )
– Predicate-Argument (Yangarber et al. 2000): (, appoint, )
– Dependency Tree (Sudo et al. 2003): (trigger(OBJ: explosion) (ADV: )))
Because porting an IE system is costly, automatic pattern discovery has become important.
– Application of bootstrapping methods (Riloff and Jones 1999, Yangarber et al. 2000)

Pattern Discovery (Sudo et al. 2003)
QDIE = query-driven information extraction
Input: query (keyword, narrative) and source documents
Preprocess source documents (NE-tagging, dependency parsing)
(1) Get relevant documents (IR)
(2) Score pattern candidates based on TF/IDF
    – A pattern candidate is any subtree that contains at least one NE instance
(3) Use pattern matching
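As a rough illustration of step (2), the sketch below ranks pattern candidates with a TF/IDF-style score. The slides do not give the exact formula, so the weighting used here (frequency in the retrieved documents times the log inverse document frequency over the whole collection) is an assumption, as are the function name `score_candidates` and the representation of each candidate as a plain string rather than a dependency subtree.

```python
import math
from collections import Counter


def score_candidates(relevant_docs, all_docs):
    """Rank pattern candidates by a TF/IDF-style score (assumed formula).

    Each document is a list of candidate-pattern strings. relevant_docs
    must be drawn from all_docs so every candidate has a document frequency.
    """
    n_docs = len(all_docs)
    # TF: how often a candidate occurs in the retrieved (relevant) documents.
    tf = Counter(p for doc in relevant_docs for p in doc)
    # DF: in how many documents of the whole collection it occurs.
    df = Counter(p for doc in all_docs for p in set(doc))
    scores = {p: tf[p] * math.log(n_docs / df[p]) for p in tf}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection (illustrative data, not from the experiment).
all_docs = [
    ["appoint(PERSON)", "say(PERSON)"],
    ["resign(PERSON)", "say(PERSON)"],
    ["say(PERSON)"],
    ["say(ORG)"],
]
relevant_docs = all_docs[:2]  # pretend IR returned the first two documents
ranked = score_candidates(relevant_docs, all_docs)
```

In this toy run the generic candidate `say(PERSON)` is frequent in the relevant set but also common across the collection, so it ranks below the scenario-specific `appoint(PERSON)` and `resign(PERSON)`.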

Cross-lingual IE
Assume we have
– Machine Translation System
– Basic linguistic tools for source and target language
  Morphological analyzer, parser, NE-tagger, IR system
[Diagram labels: query, Japanese, English, source document, E-QDIE, J-QDIE, MT system]


Translation-based QDIE system
(1) Translate the source documents
(2) Use English QDIE system
[Diagram labels: query, Japanese, English, source document]

Cross-lingual QDIE system
(1) Translate the user’s query
(2) Use Japanese QDIE system
(3) Translate the extracted table
[Diagram labels: query, Japanese, English, source document]

Comparison of two systems
Translation-based QDIE
– No source-language-specific tools are necessary except the MT system.
– Tools for the E-QDIE system were customized for English (not for the output of an MT system).
Cross-lingual QDIE
– MT is applied only to short sentences or phrases (the query and the extracted entities).
– Tools for the J-QDIE system were customized for Japanese.
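The difference between the two systems is mainly where MT sits in the pipeline. The sketch below makes that explicit; it is only a schematic under stated assumptions. The function names, the table representation (a list of slot-to-value dictionaries), and the toy MT and QDIE stand-ins are all hypothetical and are not the authors' implementation.

```python
def translation_based_qdie(query_en, docs_ja, mt_ja_en, qdie_en):
    """Translate every source document, then run the English QDIE system."""
    docs_en = [mt_ja_en(doc) for doc in docs_ja]
    return qdie_en(query_en, docs_en)


def cross_lingual_qdie(query_en, docs_ja, mt_en_ja, mt_ja_en, qdie_ja):
    """Translate the query, run Japanese QDIE, then translate the table."""
    query_ja = mt_en_ja(query_en)
    table_ja = qdie_ja(query_ja, docs_ja)
    return [{slot: mt_ja_en(value) for slot, value in row.items()}
            for row in table_ja]


# Toy stand-ins (not real systems) that record what the MT component sees.
mt_inputs = []


def toy_mt(text):
    mt_inputs.append(text)
    return f"<{text}>"


def toy_qdie(query, docs):
    # Pretend the first word of each document fills the Person slot.
    return [{"Person": doc.split()[0]} for doc in docs]


# English strings stand in for Japanese source documents in this toy run.
docs = ["Akiyama becomes chairman", "Okajima resigns formally"]

mt_inputs.clear()
table_a = translation_based_qdie("succession", docs, toy_mt, toy_qdie)
seen_by_mt_translation_based = list(mt_inputs)

mt_inputs.clear()
table_b = cross_lingual_qdie("succession", docs, toy_mt, toy_mt, toy_qdie)
seen_by_mt_cross_lingual = list(mt_inputs)
```

In the toy run the translation-based pipeline passes whole documents through MT before any analysis, while the cross-lingual pipeline passes only the query and the extracted slot values.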

Experiment
Management Succession Extraction Task (simple version of MUC-6 task)
– Identify the entities involved in a succession event: Person, Post, Organization
Test documents
– 100 articles (61 relevant, 39 irrelevant) accumulated from Yomiuri Newspaper 1999 (Japanese)
– Person (173/651), Post (210/626), Organization (111/709)
Source documents and tools
– 130,000 articles from Yomiuri Newspaper 1998 (Japanese)
– MT system: “King of Translation” (IBM)
– NE tagger: (Sekine and Nobata 2004)
Extraction performance is measured by recall/precision of extracted entities.
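For reference, recall and precision over extracted entities can be computed as in the minimal sketch below. The slides do not state the matching criterion, so exact match on (document, slot, string) triples is an assumption, and the example data is made up for illustration.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two collections of entity triples.

    An entity counts as correct only if the same (doc_id, slot, text)
    triple appears in the gold set (assumed exact-match criterion).
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only, not taken from the experiment.
gold = {
    ("d1", "Person", "Akiyama"),
    ("d1", "Post", "chairman"),
    ("d1", "Organization", "Kansai Economic Federation"),
    ("d2", "Person", "Okajima"),
}
extracted = {
    ("d1", "Person", "Akiyama"),
    ("d1", "Organization", "Kansai"),  # partial match counts as wrong here
    ("d2", "Person", "Okajima"),
}
precision, recall = precision_recall(extracted, gold)
```

Here two of the three extracted entities are correct and two of the four gold entities are found.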

Cross-lingual QDIE does better
Maximum recall:
– cross-lingual system: 60%
– translation-based system: 41%

Translation-based QDIE suffers from NE recognition errors
The NE tagger was customized for English (WSJ)
– Many of the Japanese NEs do not occur in WSJ.
  [ Kansai Economic Federation ] ORG → [ Kansai ] LOC [ Economic Federation ] ORG
– Translation errors result in fewer and noisier pattern candidates
Translation / Cross-lingual
– Person: 4543/
– Post: 3924/
– Organization: 4014/ 11812

NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
– Used GIZA++ (Och et al. 2003) to make word alignments between the original Japanese sentences and the MT-ed English sentences.
– Doubled the number of pattern candidates.
Japanese: 順天堂 大 の 水野 美邦 教授
MT output: Professor Mizuno 美邦 of 順天堂 large
(= Yoshikuni Mizuno, professor at Juntendo Univ.)
大 = abbreviation of 大学 (= Univ.); frequently mistranslated as “Large”
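The projection idea can be sketched as follows: NE tags assigned on the Japanese side are copied to the MT output through word-alignment links. This is a simplified illustration, not the authors' code. The function name `project_ne_tags`, the token-level tag scheme, the Japanese-side tags, and the alignment pairs below are all assumptions made for the example (in practice the links would come from an aligner such as GIZA++).

```python
def project_ne_tags(src_tags, alignment, n_tgt):
    """Copy NE tags from source tokens to aligned target tokens.

    src_tags:  one tag per source token ("O" means not an entity).
    alignment: (source_index, target_index) pairs.
    n_tgt:     number of target tokens.
    """
    tgt_tags = ["O"] * n_tgt
    for src_idx, tgt_idx in alignment:
        if src_tags[src_idx] != "O":
            tgt_tags[tgt_idx] = src_tags[src_idx]
    return tgt_tags


ja_tokens = ["順天堂", "大", "の", "水野", "美邦", "教授"]
ja_tags = ["ORG", "ORG", "O", "PERSON", "PERSON", "O"]  # assumed tagger output
en_tokens = ["Professor", "Mizuno", "美邦", "of", "順天堂", "large"]
# Hypothetical alignment links (Japanese index, MT-output index).
alignment = [(0, 4), (1, 5), (2, 3), (3, 1), (4, 2), (5, 0)]

en_tags = project_ne_tags(ja_tags, alignment, len(en_tokens))
tagged = list(zip(en_tokens, en_tags))
```

With these assumed links, the mistranslated token "large" and the untranslated tokens still receive ORG and PERSON tags, which an English-only NE tagger would be unlikely to assign.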

Still Cross-lingual QDIE does better
Maximum recall:
– cross-lingual system: 60%
– translation-based system with NE projection: 52%
– translation-based system: 41%

Problems in Translation
Incorrect dependency structure caused by MT errors.

Correct Translation:
On the sixth, since the financial reports for the fiscal year that ended in February, 1999 will end in a deficit, "Okajima" (Marunouchi, Kofu-city), the leading department store in the prefecture, announced that six of the thirteen full-time directors, including President Hiroyuki Okajima (40), two executive directors and a managing director, submitted the resignation letter and will formally resign at the general meeting of shareholders of the company.

MT Output:
From Muika the term settlement of accounts ended February, 99 having become the prospect of the first deficit settlement of accounts after the war etc., six of President Hiroyuki Okajima ( 40 ), two managing directors, one managing directors, the full-time directors that are 13 persons submitted the resignation report, “Okajima” of Marunouchi, Kofu-shi who is the major department store within the prefecture announced that he resigns formally by the fixed general meeting of shareholders of the company planned at the end of this month.

Problems in Translation
Structural difference
– Multiple translations of a single source-language expression make pattern discovery more difficult on MT output.
に就任する。
– be appointed to
– assume
– be inaugurated as (translation error)

Related Work
Riloff et al.
– showed how CLIE systems can be developed with IE learning tools, bitext alignment and an MT system.
– conducted experiments on a relatively close language pair: English and French
  “achieved roughly the same level of performance as the source-language IE system”
We expect that the performance gap between translation-based IE and cross-lingual IE is more pronounced with a more divergent language pair like Japanese and English.

Conclusion
We discussed the difficulty in cross-lingual information extraction caused by the translation of the source text.
Cross-lingual QDIE performs better
– Translation-based QDIE suffers from NE recognition errors.
– Structural errors and incorrect dependency analysis in MT output result in fewer and noisier pattern candidates.

Further Discussions
Linguistic tools necessary for QDIE systems are available for major languages.
Speculation from the TIDES Surprise Language Exercise: development of tools in a new language
– Machine Translation
– Cross-lingual Information Retrieval
– Named Entity tagger
– (dependency/shallow/full) parser needs more work
Additional performance gain for Cross-lingual QDIE may be achieved by techniques for query translation + query expansion.


NE tagging by Cross-language Projection (inspired by Riloff et al. 2002)
– Used GIZA++ (Och et al. 2003) to make word alignments between the original Japanese sentences and the MT-ed English sentences.
– Doubled the number of pattern candidates.
Japanese: 秋山社長が関西経済連合会の次期会長に就任する。
MT output: President Akiyama is inaugurated as the following chairman of Kansai Economic Federation.