Information Extraction Junichi Tsujii Graduate School of Science University of Tokyo Japan Ronen Feldman Bar Ilan University Israel.

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
Advertisements

Information Extraction: What It Is How to Do It Where It’s Going Douglas E. Appelt Artificial Intelligence Center SRI International.
Processing of large document collections Part 8 (Information extraction) Helena Ahonen-Myka Spring 2005.
NYU ANLP-00 1 Automatic Discovery of Scenario-Level Patterns for Information Extraction Roman Yangarber Ralph Grishman Pasi Tapanainen Silja Huttunen.
Chapter 20: Natural Language Generation Presented by: Anastasia Gorbunova LING538: Computational Linguistics, Fall 2006 Speech and Language Processing.
Sunita Sarawagi.  Enables richer forms of queries  Facilitates source integration and queries spanning sources “Information Extraction refers to the.
Information Extraction CS 652 Information Extraction and Integration.
Introduction to Computational Linguistics Lecture 2.
Information Extraction from Scientific Texts Junichi Tsujii Graduate School of Science University of Tokyo Japan.
Information Extraction CS 652 Information Extraction and Integration.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
Basi di dati distribuite Prof. M.T. PAZIENZA a.a
Introduction to Information Extraction Chia-Hui Chang Dept. of Computer Science and Information Engineering, National Central University, Taiwan
Using Information Extraction for Question Answering Done by Rani Qumsiyeh.
ISP 433/633 Week 9 NLP in IR. Natural Language Processing Simple Definition: –A study of how to use computers to do things with human languages. What.
Empirical Methods in Information Extraction - Claire Cardie 자연어처리연구실 한 경 수
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Information Extraction from Biomedical Text Jerry R. Hobbs Artificial Intelligence Center SRI International.
Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/2010 Overview of NLP tasks (text pre-processing)
Information Extraction
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Huimin Ye.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
School of Engineering and Computer Science Victoria University of Wellington COMP423 Intelligent agents.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
AQUAINT Kickoff Meeting – December 2001 Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering.
Search Engines and Information Retrieval Chapter 1.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
CSC 594 Topics in AI – Text Mining and Analytics
Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
NLP And The Semantic Web Dainis Kiusals COMS E6125 Spring 2010.
Chapter 10. Parsing with CFGs From: Chapter 10 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by.
Finite State Parsing & Information Extraction CMSC Intro to NLP January 10, 2006.
10. Parsing with Context-free Grammars -Speech and Language Processing- 발표자 : 정영임 발표일 :
©2003 Paula Matuszek CSC 9010: Information Extraction Dr. Paula Matuszek (610) Fall, 2003.
©2003 Paula Matuszek CSC 9010: Text Mining Applications Document Summarization Dr. Paula Matuszek (610)
Introduction to CL & NLP CMSC April 1, 2003.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
Introduction to NLP Chapter 1: Overview Heshaam Faili University of Tehran.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Data Mining: Text Mining
Information Retrieval
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
©2012 Paula Matuszek CSC 9010: Information Extraction Overview Dr. Paula Matuszek (610) Spring, 2012.
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
NATURAL LANGUAGE PROCESSING
Pattern Recognition. What is Pattern Recognition? Pattern recognition is a sub-topic of machine learning. PR is the science that concerns the description.
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد [ ] 1 جامعة الملك فيصل عمادة.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
CSC 594 Topics in AI – Natural Language Processing
Statistical NLP: Lecture 3
Information Extraction
Introduction to Information Extraction
Social Knowledge Mining
Machine Learning in Natural Language Processing
Automatic Detection of Causal Relations for Question Answering
Natural Language Processing
CS246: Information Retrieval
Introduction Dataset search
Presentation transcript:

Information Extraction Junichi Tsujii Graduate School of Science University of Tokyo Japan Ronen Feldman Bar Ilan University Israel

Application Tasks of NLP (1)Information Retrieval/Detection (2)Passage Retrieval (3)Information Extraction (5)Text Understanding (4) Question/Answering Tasks To search and retrieve documents in response to queries for information To search and retrieve part of documents in response to queries for information To extract information that fits pre-defined database schemas or templates, specifying the output formats To answer general questions by using texts as knowledge base: Fact retrieval, combination of IR and IE To understand texts as people do: Artificial Intelligence

(1)Information Retrieval/Detection (2)Passage Retrieval (3)Information Extraction (5)Text Understanding (4) Question/Answering Tasks Ranges of Queries Pre-Defined: Fixed aspects of information carried in texts

IE definitions  Entity: an object of interest such as a person or organization  Attribute: A property of an entity such as name, alias, descriptor or type  Fact: A relationship held between two or more entities such as Position of Person in Company  Event: An activity involving several entities such as terrorist act, airline crash, product information

IE accuracy typical figures by information type  Entity: 90-98%  Attribute: 80%  Fact: 60-70%  Event:50-60%

MUC conferences  MUC 1 to MUC 7  1987 to 1997  Topics: Naval operations (2) Terrorist Activity (2) Joint venture and microelectronics Management changes Space Vehicles and Missile launches

The ACE Evaluation  The ACE program – challenge of extracting content from human language. Research effort directed to master first the extraction of “entities” Then the extraction of “relations” among these entities Finally the extraction of “events” that are causally related sets of relations  After two years, top systems successfully capture well over 50 % of the value at the entity level

Applications of IE  Routing of information  Infrastructure for IR and categorization (higher level features)  Event based summarization  Automatic creation of databases and knowledge bases

Where would IE be useful?  Semi-structured text  Generic documents like news articles  Most of the information in the doc is centred around a set of easily identifiable entities

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

FASTUS 1.Complex Words: Recognition of multi-words and proper names 2.Basic Phrases: Simple noun groups, verb groups and particles 3.Complex phrases: Complex noun groups and verb groups 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA) set up new Twaiwan dallors a Japanese trading house had set up production of 20, 000 iron and metal wood clubs [company] [set up] [Joint-Venture] with [company]

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month. Example of IE: FASTUS(1993) TIE-UP-1 Relationship: TIE-UP Entities: “Bridgestone Sport Co.” “a local concern” “a Japanese trading house” Joint Venture Company: “Bridgestone Sports Taiwan Co.” Activity: ACTIVITY-1 Amount: NT$ ACTIVITY-1 Activity: PRODUCTION Company: “Bridgestone Sports Taiwan Co.” Product: “iron and ‘metal wood’ clubs” Start Date: DURING: January 1990

Information Extraction ………. Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives. ………. Name of the Venture: Yaxing Benz Products: buses and bus chassis Location: Yangzhou,China Companies involved: (1)Name: X? Country: German (2)Name: Y? Country: China

Information Extraction A German vehicle-firm executive was stabbed to death …. ………. Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives. ………. Crime-Type: Murder Type: Stabbing The killed: Name: Jurgen Pfrang Age: 51 Profession: Deputy general manager Location: Nanjing, China Different template for crimes

(1)Information Retrieval/Detection (2)Passage Retrieval (3)Information Extraction (5)Text Understanding (4) Question/Answering Tasks Interpretation of Texts User System

Collection of Texts IR System Characterization of Texts Queries

Collection of Texts IR System Characterization of Texts Queries Interpretation Knowledge

Collection of Texts Passage IR System Characterization of Texts Queries Interpretation Knowledge

Collection of Texts Passage IR System Characterization of Texts Queries Interpretation Knowledge IE System Texts Templates Structures of Sentences NLP

Interpretation Knowledge IE System Texts Templates

Interpretation Knowledge IE System Texts Templates Predefined General Framework of NLP/NLU IE as compromise NLP

(1)Information Retrieval/Detection (2)Passage Retrieval (3)Information Extraction (5)Text Understanding (4) Question/Answering Tasks Performance Evaluation Rather clear A bit vague Rather clear A bit vague Very vague

N N: Correct Documents M:Retrieved Documents C: Correct Documents that are actually retrieved M C Query Collection of Documents Precision: Recall: C M C N F-Value: P R P+R 2P ・ R

N N: Correct Templates M:Retrieved Templates C: Correct Templates that are actually retrieved M C Query Collection of Documents Precision: Recall: C M C N F-Value: P R P+R 2P ・ R More complicated due to partially filled templates

Framework of IE IE as compromise NLP

Syntactic Analysis General Framework of NLP Morphological and Lexical Processing Semantic Analysis Context processing Interpretation Difficulties of NLP (1) Robustness: Incomplete Knowledge Incomplete Domain Knowledge Interpretation Rules Predefined Aspects of Information

Syntactic Analysis General Framework of NLP Morphological and Lexical Processing Semantic Analysis Context processing Interpretation Difficulties of NLP (1) Robustness: Incomplete Knowledge Incomplete Domain Knowledge Interpretation Rules Predefined Aspects of Information

Approaches for building IE systems  Knowledge Engineering Approach Rules crafted by linguists in cooperation with domain experts Most of the work done by insoecting a set of relevant documents

Approaches for building IE systems  Automatically trainable systems Techniques based on statistics and almost no linguistic knowledge Language independent Main input – annotated corpus Small effort for creating rules, but crating annotated corpus laborious

Techniques in IE (1) Domain Specific Partial Knowledge: Knowledge relevant to information to be extracted (2) Ambiguities: Ignoring irrelevant ambiguities Simpler NLP techniques (4) Adaptation Techniques: Machine Learning, Trainable systems (3) Robustness: Coping with Incomplete dictionaries (open class words) Ignoring irrelevant parts of sentences

General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Anaysis Context processing Interpretation Open class words: Named entity recognition (ex) Locations Persons Companies Organizations Position names Domain specific rules:, Inc. Mr.. Machine Learning: HMM, Decision Trees Rules + Machine Learning Part of Speech Tagger FSA rules Statistic taggers 95 % F-Value 90 Domain Dependent Local Context Statistical Bias

General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Anaysis Context processing Interpretation FASTUS 1.Complex Words: Recognition of multi-words and proper names 2.Basic Phrases: Simple noun groups, verb groups and particles 3.Complex phrases: Complex noun groups and verb groups 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA)

General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Anaysis Context processing Interpretation FASTUS 1.Complex Words: Recognition of multi-words and proper names 2.Basic Phrases: Simple noun groups, verb groups and particles 3.Complex phrases: Complex noun groups and verb groups 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA)

General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Analysis Context processing Interpretation FASTUS 1.Complex Words: Recognition of multi-words and proper names 2.Basic Phrases: Simple noun groups, verb groups and particles 3.Complex phrases: Complex noun groups and verb groups 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA)

Chomsky Hierarchy Hierarchy of Grammar of Automata Regular Grammar Finite State Automata Context Free Grammar Push Down Automata Context Sensitive Grammar Linear Bounded Automata Type 0 Grammar Turing Machine Computationally more complex, Less Efficiency

Chomsky Hierarchy Hierarchy of Grammar of Automata Regular Grammar Finite State Automata Context Free Grammar Push Down Automata Context Sensitive Grammar Linear Bounded Automata Type 0 Grammar Turing Machine Computationally more complex, Less Efficiency A B nn

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover

PN ’s ADJ Art N PN P ’s Art John’s interesting book with a nice cover Pattern-maching PN ’s (ADJ)* N P Art (ADJ)* N {PN ’s/ Art}(ADJ)* N(P Art (ADJ)* N)*

General Framework of NLP Morphological and Lexical Processing Syntactic Analysis Semantic Analysis Context processing Interpretation FASTUS 1.Complex Words: Recognition of multi-words and proper names 2.Basic Phrases: Simple noun groups, verb groups and particles 3.Complex phrases: Complex noun groups and verb groups 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA)

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 “metal wood” clubs a month. Example of IE: FASTUS(1993) 1.Complex words 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location Attachment Ambiguities are not made explicit

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 “metal wood” clubs a month. Example of IE: FASTUS(1993) 1.Complex words 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location {{}} a Japanese tea house a [Japanese tea] house a Japanese [tea house]

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 “metal wood” clubs a month. Example of IE: FASTUS(1993) 1.Complex words 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location Structural Ambiguities of NP are ignored

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 “metal wood” clubs a month. Example of IE: FASTUS(1993) 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location 3.Complex Phrases

[COMPNY] said Friday it [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] and [COMPNY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE], [COMPNY], capitalized at 20 million [CURRENCY-UNIT] [START] production in [TIME] with production of 20,000 [PRODUCT] a month. Example of IE: FASTUS(1993) 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location 3.Complex Phrases Some syntactic structures like …

[COMPNY] said Friday it [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE] capitalized at [CURRENCY] [START] production in [TIME] with production of [PRODUCT] a month. Example of IE: FASTUS(1993) 2.Basic Phrases: Bridgestone Sports Co.: Company name said : Verb Group Friday : Noun Group it : Noun Group had set up : Verb Group a joint venture : Noun Group in : Preposition Taiwan : Location 3.Complex Phrases Syntactic structures relevant to information to be extracted are dealt with.

Syntactic variations GM set up a joint venture with Toyota. GM announced it was setting up a joint venture with Toyota. GM signed an agreement setting up a joint venture with Toyota. GM announced it was signing an agreement to set up a joint venture with Toyota.

Syntactic variations GM set up a joint venture with Toyota. GM announced it was setting up a joint venture with Toyota. GM signed an agreement setting up a joint venture with Toyota. GM announced it was signing an agreement to set up a joint venture with Toyota. [SET-UP] GM plans to set up a joint venture with Toyota. GM expects to set up a joint venture with Toyota. S NPVP VNP N VP V GM signed agreement setting up

Syntactic variations GM set up a joint venture with Toyota. GM announced it was setting up a joint venture with Toyota. GM signed an agreement setting up a joint venture with Toyota. GM announced it was signing an agreement to set up a joint venture with Toyota. [SET-UP] GM plans to set up a joint venture with Toyota. GM expects to set up a joint venture with Toyota. S NPVP V GM set up

[COMPNY] [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE] capitalized at [CURRENCY] [START] production in [TIME] with production of [PRODUCT] a month. Example of IE: FASTUS(1993) 3.Complex Phrases 4.Domain Events [COMPANY][SET-UP][JOINT-VENTURE]with[COMPNY] [COMPANY][SET-UP][JOINT-VENTURE] (others)* with[COMPNY] The attachment positions of PP are determined at this stage. Irrelevant parts of sentences are ignored.

Complications caused by syntactic variations Relative clause The mayor, who was kidnapped yesterday, was found dead today. [NG] Relpro {NG/others}* [VG] {NG/others}*[VG] [NG] Relpro {NG/others}* [VG]

Complications caused by syntactic variations Relative clause The mayor, who was kidnapped yesterday, was found dead today. [NG] Relpro {NG/others}* [VG] {NG/others}*[VG] [NG] Relpro {NG/others}* [VG]

Complications caused by syntactic variations Relative clause The mayor, who was kidnapped yesterday, was found dead today. [NG] Relpro {NG/others}* [VG] {NG/others}*[VG] [NG] Relpro {NG/others}* [VG] Basic patterns Surface Pattern Generator Patterns used by Domain Event Relative clause construction Passivization, etc.

FASTUS 1.Complex Words: 2.Basic Phrases: 3.Complex phrases: 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA) Piece-wise recognition of basic templates Reconstructing information carried via syntactic structures by merging basic templates NP, who was kidnapped, was found.

FASTUS 1.Complex Words: 2.Basic Phrases: 3.Complex phrases: 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA) Piece-wise recognition of basic templates Reconstructing information carried via syntactic structures by merging basic templates NP, who was kidnapped, was found.

FASTUS 1.Complex Words: 2.Basic Phrases: 3.Complex phrases: 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA) Piece-wise recognition of basic templates Reconstructing information carried via syntactic structures by merging basic templates NP, who was kidnapped, was found.

FASTUS 1.Complex Words: 2.Basic Phrases: 3.Complex phrases: 4.Domain Events: Patterns for events of interest to the application Basic templates are to be built. 5. Merging Structures: Templates from different parts of the texts are merged if they provide information about the same entity or event. Based on finite states automata (FSA) Piece-wise recognition of basic templates Reconstructing information carried via syntactic structures by merging basic templates NP, who was kidnapped, was found.

Current state of the arts of IE 1.Carefully constructed IE systems F-60 level (interannotater agreement: 60-80%) Domain: telegraphic messages about naval operation (MUC-1:87, MUC-2:89) news articles and transcriptions of radio broadcasts Latin American terrorism (MUC-3:91, MUC-4:1992) News articles about joint ventures (MUC-5, 93) News articles about management changes (MUC-6, 95) News articles about space vehicle (MUC-7, 97) 2.Handcrafted rules (named entity recognition, domain events, etc) Automatic learning from texts: Supervised learning : corpus preparation Non-supervised, or controlled learning