J. Turmo, 2006 Adaptive Information Extraction Summary Information Extraction Systems Multilinguality Introduction Language guessers Machine Translators.

Presentation transcript:

Summary
Information Extraction Systems
Multilinguality: Introduction; Language guessers; Machine Translators; Translingual architectures; Information integration in MIE systems
Evaluation
Adaptability

Multilinguality: Introduction
Multilingual IE (MIE) tasks: the textual information in the output templates must be presented in a language different from that of the input documents. Typically, the input documents are written in one language and the output templates in another.

Multilinguality: Introduction
Relatively little research has been done in MIE.
LRE program in Europe (ECRAN, FACILE, AVENTINUS, SPARKLE, …): tools and components for IE in different languages.
TIDES program in the USA (PROTEUS, RIPTIDES, CREST, …): fast machine translation and information access.

Multilinguality: Introduction
Up to now, multilingual IE has been evaluated only for NE tasks. Two recent scenarios:
CoNLL: language-independent NE recognition.
ACE 2007: Arabic input documents, English output NE mentions.
Reference: Fei Huang (2005). Multilingual NE Extraction and Translation from Text and Speech. PhD thesis.
MIE remains an open research line.

Multilinguality: Introduction
Basic elements of MIE architectures: language guessers and monolingual architectures.
Classical approaches: use of Machine Translation with monolingual IE architectures, or extension of monolingual architectures to translingual architectures.

Multilinguality: Language guessers
Goal: identify the language of a document.
Linguistic approach: based on a manually built vocabulary of keywords for each language. Idea: at least one word of a typical sentence written in a given language should be included in the corresponding vocabulary.
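The linguistic approach can be illustrated with a minimal sketch. The vocabularies below are tiny, hypothetical examples invented for illustration; a real guesser would rely on much larger, manually curated keyword lists, and the function name is an assumption as well.

```python
# Hypothetical, hand-built keyword vocabularies (illustration only).
VOCABULARIES = {
    "english": {"the", "and", "of", "is", "in"},
    "spanish": {"el", "la", "de", "y", "en", "es"},
    "french": {"le", "les", "et", "des", "est", "dans"},
}


def guess_language_by_keywords(text):
    """Return the language whose vocabulary matches most words, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in VOCABULARIES.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword of any language occurs, the guesser cannot decide.
    return best if scores[best] > 0 else None
```

Returning None when no keyword matches makes the main weakness of this approach visible: a text that uses none of the listed keywords cannot be classified.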

Multilinguality: Language guessers
Stochastic approach (the most widely used), based on two steps:
generate a frequency table of elements for each language;
compare the frequencies of the elements in the document with those in the tables.
Depending on the approach, the elements are special characters, word sequences, or character sequences.

Multilinguality: Language guessers
Stochastic approach:
Pros: good results (over 95% accuracy).
Cons: short texts are problematic; [Zhdanova, 02] copes with this problem.
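A minimal sketch of the stochastic approach follows, assuming character trigrams as the elements and cosine similarity as the comparison; both choices, the function names, and the one-sentence training samples are assumptions made for illustration and are not taken from the slides.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of the whitespace-normalized, lowercased text."""
    text = " " + " ".join(text.lower().split()) + " "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def frequency_table(text, n=3):
    """Relative frequency of each character n-gram in the text."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency tables."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical training samples; real tables are built from large corpora.
TRAINING = {
    "english": "the president of the company will leave his job tomorrow night",
    "spanish": "el presidente de la empresa dejará su trabajo mañana por la noche",
}
TABLES = {lang: frequency_table(sample) for lang, sample in TRAINING.items()}


def guess_language(text):
    """Pick the language whose frequency table is closest to the document's."""
    doc_table = frequency_table(text)
    return max(TABLES, key=lambda lang: cosine(doc_table, TABLES[lang]))
```

With so little evidence per document, a very short input yields only a handful of trigrams, which is one way to see why short texts are listed as the weak point.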

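Multilinguality: Machine translators
Architecture 1: a set of monolingual IE systems.
[Diagram] An input document in source language s_i goes through the language guesser, which routes it to the corresponding monolingual system IE(s_i), for i = 1, …, k. The extracted templates are then translated into the target language t by the translator mt(s_i, t).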

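Multilinguality: Machine translators
Architecture 2: just one monolingual IE system.
[Diagram] An input document in source language s_i goes through the language guesser and is translated into the language t' by the translator MT(s_i, t'), for i = 1, …, k. The single system IE(t') extracts the templates, which are then translated into the target language t by mt(t', t).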
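The two MT-based architectures can be sketched as function compositions. This is only an illustrative reading of the diagrams: the function names and the representation of a template as a dictionary of slot strings are assumptions, and the language guesser, extractors and translators are passed in as plain callables.

```python
def mie_many_extractors(doc, guess, extractors, template_translators):
    """Architecture 1: IE(s_i) on the source text, then mt(s_i, t) on the template."""
    source = guess(doc)                      # language guesser
    template = extractors[source](doc)       # IE(s_i)
    translate = template_translators[source] # mt(s_i, t)
    return {slot: translate(value) for slot, value in template.items()}


def mie_single_extractor(doc, guess, doc_translators, extractor, template_translator):
    """Architecture 2: MT(s_i, t') on the document, IE(t'), then mt(t', t)."""
    source = guess(doc)                      # language guesser
    pivot_doc = doc_translators[source](doc) # MT(s_i, t')
    template = extractor(pivot_doc)          # IE(t')
    return {slot: template_translator(value) for slot, value in template.items()}
```

The signatures make the trade-off explicit: the first variant needs k extractors and k template translators, while the second needs k document translators but only one extractor and one template translator.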

Multilinguality: Translingual architectures
They try to overcome the inefficiency of the MIE architectures based on MT.
Fusion of IE and interlingua MT.
Idea: when dealing with a particular domain, it is possible to build a language-independent conceptual model of the particular extraction scenario [Gaizauskas et al. 97].

Multilinguality: Translingual architectures
Each source language requires:
different lexical preprocessors;
different syntactico-semantic parsers;
different sets of IE patterns (if the MIE system is based on pattern matching).
Language-independent processors (e.g., NERC) may also be used.

Multilinguality: Translingual architectures
Use of a language-independent ontology: the internal representation of the extracted information is language independent.
Use of soft techniques for NL generation: the output templates are generated using the lexicon of the target language, which raises the lexical choice problem.

Multilinguality: Translingual architectures
M-LASIE system [Gaizauskas et al. 97]:
ad-hoc representation of the domain model;
lexicons mapped to concepts.
Adding a new source language involves adding a new lexicon plus its mappings, a new tagger and parser, …

Multilinguality: Translingual architectures
M-TURBIO system [Turmo et al. 99]:
EuroWordNet (EWN);
sets of IE patterns for each source language;
mappings from IE patterns to ILIs in EWN.
Adding a new source language involves adding new IE patterns, a new tagger and parser, …
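The idea of mapping language-specific lexicons to language-independent concepts can be sketched as below. This is a toy illustration, not the actual design of M-LASIE or M-TURBIO: the concept identifiers, lexicon entries and function names are all hypothetical.

```python
# Hypothetical lexicons mapping words to language-independent concept ids.
LEXICONS = {
    "english": {"president": "C_PRESIDENT", "company": "C_COMPANY"},
    "spanish": {"presidente": "C_PRESIDENT", "empresa": "C_COMPANY"},
}


def to_concept(word, source_lang):
    """Map a source-language word to its concept id, or None if unknown."""
    return LEXICONS[source_lang].get(word.lower())


def generate(concept, target_lang):
    """Pick a target-language word for a concept.

    Taking the alphabetically first candidate is a naive stand-in for the
    lexical choice problem, which a real generator must handle properly.
    """
    candidates = sorted(
        word for word, c in LEXICONS[target_lang].items() if c == concept
    )
    return candidates[0] if candidates else None


def add_language(lang, lexicon):
    """Adding a language only requires a new lexicon with concept mappings."""
    LEXICONS[lang] = dict(lexicon)
```

Because extraction and generation only meet at the concept level, supporting a new language in this sketch does not touch the entries of the existing ones.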

Multilinguality: Information Integration in MIEs
The most general architecture:
input documents in different source languages, not aligned;
output templates in different target languages.
Possible approaches: an MIE system followed by an II system, or a combined MIE/II system.

Multilinguality: Information Integration in MIEs
Pros:
Versatile.
An instance may occur in just one document written in a specific language.
It can be easier to extract an instance expressed in one language than in another (better processors or resources).
Cons:
Problems inherent to II: inconsistent values, similar values, generalizations, …
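The inconsistent-values problem can be made concrete with a small sketch. It assumes that templates extracted from documents in different languages have already been translated into one target language and are known to describe the same instance; the function name and the dictionary representation are assumptions for illustration.

```python
def integrate(templates):
    """Merge slot values of templates that describe the same instance.

    Returns the merged template (first value seen wins) and, for each slot
    with disagreeing values, the set of competing values.
    """
    merged, conflicts = {}, {}
    for template in templates:
        for slot, value in template.items():
            if slot not in merged:
                merged[slot] = value
            elif merged[slot] != value:
                conflicts.setdefault(slot, {merged[slot]}).add(value)
    return merged, conflicts
```

The sketch only detects exact disagreements; similar values and generalizations would need a similarity measure or an ontology, which is left out here.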

Summary
Information Extraction Systems
Multilinguality
Evaluation: Introduction; Metrics; Data sets
Adaptability

Evaluation: Introduction
The evaluation of the performance of an IE system depends on different factors:
The IE task: domain, language, document style, …
The user needs: software use, human use, just some clues about the relevant facts, the context in which they occur, …
What does "correctly extracted" mean?
What are the right metrics?
What are the best data sets?

Evaluation: Introduction
[Figure: the sentence "The president of ALP in Spain will leave his job tomorrow night" shown several times with different NP bracketings, asking which of them counts as an exact extraction.]
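The difference between exact and partial extraction can be sketched with token spans. The bracketings in the original figure are not recoverable, so the spans used in the usage note are hypothetical; spans are assumed to be half-open (start, end) token offsets.

```python
def exact_match(system_span, reference_span):
    """Exact extraction: both boundaries must coincide."""
    return system_span == reference_span


def partial_match(system_span, reference_span):
    """Partial extraction: the two half-open spans share at least one token."""
    (s_start, s_end), (r_start, r_end) = system_span, reference_span
    return s_start < r_end and r_start < s_end
```

For example, if the reference NP were "The president of ALP in Spain", tokens (0, 6), and a system returned "The president of ALP", tokens (0, 4), the extraction would count as correct under partial matching but not under exact matching.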

Evaluation: Metrics
Different evaluation frameworks take different points of view on what is correctly extracted:
MUC: correct = partial extraction (up to MUC-5); correct = exact extraction (MUC-6, MUC-7). Metrics: Recall, Precision and F (cf. Historical Framework).
PASCAL: correct = exact extraction. Same metrics as in MUC-6.
ACE: correct = partial extraction (more sophisticated than MUC).
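Recall, precision and F can be computed as in the following minimal sketch, which assumes exact matching between extracted and reference items represented as hashable values. The official MUC scorers handle further cases (such as partial credit), which are not modelled here.

```python
def precision_recall_f(system, reference, beta=1.0):
    """Precision, recall and F-measure over sets of extracted items."""
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    # F_beta weights recall beta times as much as precision.
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f
```

With four extracted items of which two are among three reference items, precision is 2/4, recall is 2/3 and F1 is 4/7.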

Evaluation: Metrics
ACE metric
Idea: how well does the information extracted by a system match that of the reference model?
Given a system output, s, and a reference model, m, find the global optimum of the function Value(s, m), which maximizes the matching between instances in s and instances in m.

Evaluation: Metrics
ACE metric
\[ \mathrm{Value}(s,m) = \frac{\sum_i \mathrm{Value}(\mathrm{sys\_token}_i)}{\sum_j \mathrm{Value}(\mathrm{ref\_token}_j)} \]
where a token is an extracted instance, token = [attributes, args or mentions], and
\[ \mathrm{Value}(\mathrm{token}) = \mathrm{Element\_value}(\mathrm{token}) \cdot \mathrm{Argument\_value}(\mathrm{token}) \]
Penalties: unmapped attributes, unmapped arguments, wrong mappings.
Parameters: weights for the penalties.
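The ratio above can be sketched as follows. This is a simplification and not the official ACE scorer: each token is assumed to be given as an (element_value, argument_value) pair in which the penalties for unmapped or wrongly mapped attributes and arguments have already been applied, and the search for the optimal mapping between system and reference instances is left out.

```python
def token_value(element_value, argument_value):
    """Value(token) = Element_value(token) * Argument_value(token)."""
    return element_value * argument_value


def ace_value(system_tokens, reference_tokens):
    """Sum of system token values divided by sum of reference token values."""
    sys_total = sum(token_value(e, a) for e, a in system_tokens)
    ref_total = sum(token_value(e, a) for e, a in reference_tokens)
    return sys_total / ref_total if ref_total else 0.0
```

For instance, system tokens worth 0.5 each against two reference tokens worth 1.0 each give a value of 0.5.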

Evaluation: Metrics
ACE metric
Software for ACE evaluation and more information on ACE evaluation available in

Evaluation: Data sets
Ad-hoc data sets.
State-of-the-art data sets (e.g., from MUC, ACE, PASCAL).
Each one is appropriate for evaluating different IE tasks, depending on different factors: Availability? Suitability?

Evaluation: Data sets: MUC
Sources: free text, written (Newswire); MUC-6 and MUC-7 data sets.
Suitable tasks:
NE subtasks;
Element Extraction tasks (template element, TE);
Event Extraction tasks (scenario template, ST);
Relation Extraction tasks (quite easy).
Language: English.
Available from LDC (Linguistic Data Consortium).

Evaluation: Data sets: ACE
Sources:
free text, written (Newswires, Weblogs, Discussion Forums);
free text, oral transcripts (Broadcast News, telephone conversations).
Suitable tasks (up to now):
NE subtasks (extended from MUC);
Relation Extraction tasks;
Event Extraction tasks (need more annotation effort).
Language: English, Arabic, Chinese, Spanish, depending on the input source.
Available from LDC (Linguistic Data Consortium).

Evaluation: Data sets: PASCAL
Sources: semi-structured documents (Seminar announcements, Corporate acquisitions, Legal sentences).
Suitable tasks (up to now): Element Extraction tasks.
Language: English, Italian.
Available from
Similar sources in repository RISE.