Information Extraction

Information Extraction September 28, 2006 11/12/2018

Dictionary Approaches
Problem of scale for all ML approaches: a classifier must be built for each sense ambiguity
Machine readable dictionaries (Lesk ‘86)
  Retrieve all definitions of content words occurring in the context of the target (e.g. "the happy seafarer ate the bass")
  Compare for overlap with the sense definitions of the target entry (bass2: a type of fish that lives in the sea)
  Choose the sense with the most overlap
Limits: entries are short --> expand entries to ‘related’ words
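
The overlap procedure above can be sketched roughly as follows. This is a simplified illustration, not Lesk's original implementation: the toy dictionary entries, the stopword list, and the function names `content_words` and `lesk_overlap` are all assumptions made for the example.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "that", "to", "and", "or", "with", "on", "by", "who"}


def content_words(text):
    """Lower-case the text and keep only alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def lesk_overlap(context, sense_definitions, word_definitions):
    """Score each sense by the overlap between its definition and the
    definitions of the content words found in the context."""
    context_bag = set()
    for word in content_words(context):
        context_bag |= content_words(word_definitions.get(word, ""))
    scores = {
        sense: len(content_words(definition) & context_bag)
        for sense, definition in sense_definitions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up dictionary entries (not taken from any real dictionary).
sense_definitions = {
    "bass1": "a musical instrument or voice of the lowest range",
    "bass2": "a type of fish that lives in the sea",
}
word_definitions = {
    "seafarer": "a person who travels by sea",
    "ate": "took in food",
    "happy": "feeling pleasure",
}

best, scores = lesk_overlap("the happy seafarer ate the bass", sense_definitions, word_definitions)
print(best, scores)
```

With such short entries the winning sense is decided by a single shared word ("sea"), which illustrates the limit noted on the slide.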

Disambiguation using machine readable dictionaries
Lesk’s approach [Lesk 1986]:
  Senses are represented by different definitions
  Look up the definitions of the context words
  Find co-occurring words
  Select the most similar sense
Accuracy: 50% - 70%
Problem: not enough overlapping words between definitions

Disambiguation using machine readable dictionaries
Wilks’ approach [Wilks 1990]:
  Attempts to solve Lesk’s problem by expanding the dictionary definitions
  Uses the Longman Dictionary of Contemporary English (LDOCE)
  Collects more word co-occurrence evidence
Accuracy: between 53% and 85%

Wilks’ approach [Wilks 1990]
[Not preserved in this transcript: commonly co-occurring words in LDOCE, from Wilks 1990]

Disambiguation using machine readable dictionaries
Luk’s approach [Luk 1995]:
  Statistical sense disambiguation
  Uses definitions from LDOCE
  Co-occurrence data collected from the Brown corpus
  Defining concepts: the 1792 words used to write the definitions of LDOCE
  LDOCE is pre-processed: conceptual expansion

Luk’s approach [Luk 1995]: the noun “sentence” and its conceptual expansion
Entry in LDOCE --> Conceptual expansion
1. (an order given by a judge which fixes) a punishment for a criminal found guilty in court
   --> {order, judge, punish, crime, criminal, find, guilt, court}
2. a group of words that forms a statement, command, exclamation, or question, usu. contains a subject and a verb, and (in writing) begins with a capital letter and ends with one of the marks . ! ?
   --> {group, word, form, statement, command, question, contain, subject, verb, write, begin, capital, letter, end, mark}

Luk’s approach [Luk 1995] cont.
Collect co-occurrence data of defining concepts by constructing a two-dimensional Concept Co-occurrence Data Table (CCDT):
  The Brown corpus is divided into sentences
  Conceptual co-occurrence data is collected for each defining concept that occurs in a sentence
  The collected data is inserted into the Concept Co-occurrence Data Table
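
A minimal sketch of building such a table is given below. It assumes that the table simply stores raw sentence-level co-occurrence counts and that a word-to-concept mapping is already available. The transcript does not preserve how Luk's table was actually populated or scored, so `build_ccdt`, the toy sentences, and the mapping are illustrative assumptions only.

```python
from collections import defaultdict
from itertools import combinations


def build_ccdt(sentences, word_to_concepts):
    """Count, for every pair of defining concepts, the number of
    sentences in which both concepts occur (symmetric table)."""
    ccdt = defaultdict(int)
    for sentence in sentences:
        concepts = set()
        for word in sentence.lower().split():
            concepts |= word_to_concepts.get(word, set())
        for a, b in combinations(sorted(concepts), 2):
            ccdt[(a, b)] += 1
            ccdt[(b, a)] += 1
    return ccdt


# Hypothetical stand-in for a sentence-split corpus.
sentences = [
    "the judge gave the criminal a long sentence",
    "the judge entered the court",
    "the sentence begins with a capital letter",
]
# Hypothetical mapping from surface words to defining concepts.
word_to_concepts = {
    "judge": {"judge"},
    "criminal": {"criminal", "crime"},
    "court": {"court"},
    "capital": {"capital"},
    "letter": {"letter"},
}

ccdt = build_ccdt(sentences, word_to_concepts)
print(ccdt.get(("judge", "court"), 0), ccdt.get(("judge", "letter"), 0))
```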

Luk’s approach [Luk 1995] cont.
Score each sense S with respect to context C [Luk 1995]
[Scoring formula not preserved in this transcript]

Luk’s approach [Luk 1995] cont.
Select the sense with the highest score
Accuracy: 77%
Human accuracy: 71%

Approaches using Roget's Thesaurus [Yarowsky 1992]
Resources used:
  Roget's Thesaurus
  Grolier Multimedia Encyclopedia
Senses of a word: categories in Roget's Thesaurus
1042 broad categories covering areas like tools/machinery or animals/insects

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Some words placed into the tools/machinery category [Yarowsky 1992]:
tool, implement, appliance, contraption, apparatus, utensil, device, gadget, craft, machine, engine, motor, dynamo, generator, mill, lathe, equipment, gear, tackle, tackling, rigging, harness, trappings, fittings, accoutrements, paraphernalia, equipage, outfit, appointments, furniture, material, plant, appurtenances, a wheel, jack, clockwork, wheel-work, spring, screw, ...

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Collect context for each category:
  From the Grolier Encyclopedia, for each occurrence of each member of the category, extract the 100 surrounding words
[Figure not preserved: sample occurrences of words in the tools/machinery category, from Yarowsky 1992]

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Identify and weight salient words
[Figure not preserved: sample salient words for Roget categories 348 and 414, from Yarowsky 1992]
To disambiguate a word: sum the weights of all salient words appearing in its context
Accuracy: 92% when disambiguating 12 words
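
The disambiguation step can be sketched as below, assuming the salient-word weights have already been computed per category. The weights, the category names used as keys, and the function name `disambiguate` are made up for illustration and are not Yarowsky's published values.

```python
def disambiguate(context_words, salient_weights):
    """Sum, per category, the weights of the salient words that appear
    in the context, and return the highest-scoring category."""
    scores = {
        category: sum(weights.get(word, 0.0) for word in context_words)
        for category, weights in salient_weights.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical weights; real weights would be estimated from the encyclopedia contexts.
salient_weights = {
    "tools/machinery": {"lathe": 2.0, "engine": 1.5, "hoist": 1.0},
    "animals/insects": {"species": 2.0, "nest": 1.5, "feeds": 1.0},
}

# Context of an ambiguous word such as "crane" (toy example).
context = "the crane feeds near its nest".split()
category, scores = disambiguate(context, salient_weights)
print(category, scores)
```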

Summary
Many useful approaches developed to do WSD
  Supervised and unsupervised ML techniques
  Novel uses of existing resources (WN, dictionaries)
Future
  More tagged training corpora becoming available
  New learning techniques being tested, e.g. co-training

Information Extraction from Scientific Texts
Junichi Tsujii
Graduate School of Science, University of Tokyo, Japan

What is IE?

Application Tasks of NLP
(1) Information Retrieval/Detection: to search and retrieve documents in response to queries for information
(2) Passage Retrieval: to search and retrieve parts of documents in response to queries for information
(3) Information Extraction: to extract information that fits pre-defined database schemas or templates, specifying the output formats
(4) Question/Answering Tasks: to answer general questions by using texts as a knowledge base (fact retrieval, combination of IR and IE)
(5) Text Understanding: to understand texts as people do (Artificial Intelligence)

Ranges of Queries
(1) Information Retrieval/Detection
(2) Passage Retrieval
(3) Information Extraction
    Pre-defined: fixed aspects of information carried in texts
(4) Question/Answering Tasks
(5) Text Understanding

Example of IE: FASTUS (1993)
Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month.
TIE-UP-1
  Relationship: TIE-UP
  Entities: “Bridgestone Sports Co.”, “a local concern”, “a Japanese trading house”
  Joint Venture Company: “Bridgestone Sports Taiwan Co.”
  Activity: ACTIVITY-1
  Amount: NT$20000000
ACTIVITY-1
  Activity: PRODUCTION
  Company: “Bridgestone Sports Taiwan Co.”
  Product: “iron and ‘metal wood’ clubs”
  Start Date: DURING: January 1990

FASTUS
Based on finite state automata (FSA)
1. Complex Words: recognition of multi-words and proper names
   e.g. "set up", "new Taiwan dollars"
2. Basic Phrases: simple noun groups, verb groups and particles
   e.g. "a Japanese trading house", "had set up"
3. Complex Phrases: complex noun groups and verb groups
   e.g. "production of 20,000 iron and metal wood clubs"
4. Domain Events: patterns for events of interest to the application; basic templates are to be built
   e.g. [company] [set up] [Joint-Venture] with
5. Merging Structures: templates from different parts of the texts are merged if they provide information about the same entity or event

Information Extraction
………. Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives.
Name of the Venture: Yaxing Benz
Products: buses and bus chassis
Location: Yangzhou, China
Companies involved:
  (1) Name: X?  Country: Germany
  (2) Name: Y?  Country: China

Information Extraction
A German vehicle-firm executive was stabbed to death …. ………. Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives.
Different template for crimes:
  Crime-Type: Murder
  Type: Stabbing
  The killed:
    Name: Jurgen Pfrang
    Age: 51
    Profession: Deputy general manager
  Location: Nanjing, China

Interpretation of Texts
(1) Information Retrieval/Detection
(2) Passage Retrieval
(3) Information Extraction
(4) Question/Answering Tasks
(5) Text Understanding
[Diagram not preserved: the tasks are arranged along an axis labeled from User to System]

[Diagram built up over four slides, not preserved. Labels: Characterization of Texts, Queries, IR System, Collection of Texts, Interpretation, Knowledge, Passage, IE System, Templates, Structures of Sentences, NLP, Texts]

[Diagram not preserved. Labels: Interpretation, Knowledge, IE System, Templates, Texts]

IE as compromise NLP
[Diagram not preserved. Labels: General Framework of NLP/NLU, Interpretation, Knowledge (predefined), IE System, Templates (predefined), Texts]

Performance Evaluation
(1) Information Retrieval/Detection
(2) Passage Retrieval
(3) Information Extraction
(4) Question/Answering Tasks
(5) Text Understanding
Clarity of evaluation, going from (1) to (5): rather clear, a bit vague, very vague

Query over a Collection of Documents
N: correct documents
M: retrieved documents
C: correct documents that are actually retrieved
Precision: $P = \frac{C}{M}$
Recall: $R = \frac{C}{N}$
F-Value: $F = \frac{2 P \cdot R}{P + R}$

Query over a Collection of Documents
N: correct templates
M: retrieved templates
C: correct templates that are actually retrieved
Precision: $P = \frac{C}{M}$
Recall: $R = \frac{C}{N}$
F-Value: $F = \frac{2 P \cdot R}{P + R}$
More complicated due to partially filled templates
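
As a small worked example of these measures, the sketch below treats the correct and retrieved items as sets of identifiers. It does not handle the partially filled templates mentioned on the slide. The function name and the item identifiers are assumptions for illustration.

```python
def precision_recall_f(correct, retrieved):
    """Compute precision P = C/M, recall R = C/N and F = 2PR/(P+R),
    where N = |correct|, M = |retrieved| and C = |correct & retrieved|."""
    correct, retrieved = set(correct), set(retrieved)
    c = len(correct & retrieved)
    precision = c / len(retrieved) if retrieved else 0.0
    recall = c / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical identifiers: 4 correct items, 3 retrieved, 2 in common.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d5"})
print(round(p, 3), round(r, 3), round(f, 3))
```

Here C = 2, M = 3 and N = 4, so P = 2/3, R = 1/2 and F = 4/7.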

Framework of IE
IE as compromise NLP

Difficulties of NLP
(1) Robustness: Incomplete Knowledge
General Framework of NLP:
  Morphological and Lexical Processing
  Syntactic Analysis
  Semantic Analysis
  Context processing
  Interpretation
[Diagram labels: Predefined Aspects of Information, Incomplete Domain Knowledge, Interpretation Rules]

Techniques in IE
(1) Domain Specific Partial Knowledge: knowledge relevant to the information to be extracted
(2) Ambiguities: ignoring irrelevant ambiguities; simpler NLP techniques
(3) Robustness: coping with incomplete dictionaries (open class words); ignoring irrelevant parts of sentences
(4) Adaptation Techniques: machine learning, trainable systems

General Framework of NLP
Framework stages: Morphological and Lexical Processing, Syntactic Analysis, Domain Dependent Semantic Analysis, Context processing, Interpretation
Part of Speech Tagger: FSA rules, statistical taggers (local context, statistical bias)
Open class words: named entity recognition (e.g. locations, persons, companies, organizations, position names)
  Domain specific rules: <Word><Word>, Inc.    Mr. <Cpt-L>. <Word>
  Machine Learning: HMM, Decision Trees
  Rules + Machine Learning
Figures shown on the slide: 95 %, F-Value 90
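
The two domain specific rules can be approximated with regular expressions, as in the sketch below. It assumes that <Word> stands for a capitalized word and <Cpt-L> for a single capital letter. That reading, the labels COMPANY and PERSON, and the function name `tag_entities` are assumptions, not something the slide spells out.

```python
import re

# "<Word><Word>, Inc." read as two capitalized words followed by ", Inc."
COMPANY_RULE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+, Inc\.")
# "Mr. <Cpt-L>. <Word>" read as "Mr.", an initial, and a capitalized word
PERSON_RULE = re.compile(r"\bMr\. [A-Z]\. [A-Z][a-z]+")


def tag_entities(text):
    """Return (label, matched text) pairs in order of appearance."""
    found = [(m.start(), "COMPANY", m.group()) for m in COMPANY_RULE.finditer(text)]
    found += [(m.start(), "PERSON", m.group()) for m in PERSON_RULE.finditer(text)]
    return [(label, span) for _, label, span in sorted(found)]


# Made-up sentence for illustration.
entities = tag_entities("Mr. J. Smith joined Acme Widgets, Inc. last year.")
print(entities)
```

Hand-written rules like these are precise but brittle, which is one reason the slide pairs them with machine learning.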

FASTUS and the General Framework of NLP
Based on finite state automata (FSA)
Framework stages: Morphological and Lexical Processing, Syntactic Analysis, Semantic Analysis, Context processing, Interpretation
1. Complex Words: recognition of multi-words and proper names
2. Basic Phrases: simple noun groups, verb groups and particles
3. Complex Phrases: complex noun groups and verb groups
4. Domain Events: patterns for events of interest to the application; basic templates are to be built
5. Merging Structures: templates from different parts of the texts are merged if they provide information about the same entity or event

Chomsky Hierarchy
Hierarchy of Grammar          Hierarchy of Automata
Regular Grammar               Finite State Automata
Context Free Grammar          Push Down Automata
Context Sensitive Grammar     Linear Bounded Automata
Type 0 Grammar                Turing Machine
Moving down the hierarchy: computationally more complex, less efficient
A^n B^n

[Figure, repeated over ten animation steps, not preserved: a finite state automaton with states 1 to 4 and arcs labeled PN, ’s, Art, ADJ, N and P, stepping through the noun phrase "John’s interesting book with a nice cover"]

Pattern-matching
Example: John’s interesting book with a nice cover
  PN ’s (ADJ)* N P Art (ADJ)* N
General pattern:
  {PN ’s / Art} (ADJ)* N (P Art (ADJ)* N)*
[Figure not preserved: the finite state automaton with states 1 to 4 and arcs PN, ’s, Art, ADJ, N, P]
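
Because the noun-group pattern is regular, it can be checked with an ordinary regular expression over the tag sequence. The sketch below assumes the words have already been tagged and that the tags are joined with single spaces. The tag spellings and the function name `matches_np` are assumptions.

```python
import re

# {PN 's / Art} (ADJ)* N (P Art (ADJ)* N)*  written over a space-joined tag string
NP_PATTERN = re.compile(r"(?:PN 's|Art)(?: ADJ)* N(?: P Art(?: ADJ)* N)*")


def matches_np(tags):
    """Return True if the whole tag sequence is accepted by the pattern."""
    return NP_PATTERN.fullmatch(" ".join(tags)) is not None


# John 's interesting book with a nice cover
tags = ["PN", "'s", "ADJ", "N", "P", "Art", "ADJ", "N"]
print(matches_np(tags))
```

A regular expression engine is used here in place of an explicit state table; both accept the same tag sequences for this pattern.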

Example of IE: FASTUS (1993)
Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and “metal wood” clubs a month.
1. Complex Words
2. Basic Phrases:
  Bridgestone Sports Co. : Company name
  said : Verb Group
  Friday : Noun Group
  it : Noun Group
  had set up : Verb Group
  a joint venture : Noun Group
  in : Preposition
  Taiwan : Location
Attachment ambiguities are not made explicit.
Structural ambiguities of NP are ignored, e.g. "a Japanese tea house": a [Japanese tea] house vs. a Japanese [tea house]
3. Complex Phrases

Example of IE: FASTUS (1993)
3. Complex Phrases
[COMPANY] said Friday it [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] and [COMPANY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE], [COMPANY], capitalized at 20 million [CURRENCY-UNIT] [START] production in [TIME] with production of 20,000 [PRODUCT] a month.
Some syntactic structures like …
[COMPANY] said Friday it [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE] capitalized at [CURRENCY] [START] production in [TIME] with production of [PRODUCT] a month.
Syntactic structures relevant to information to be extracted are dealt with.

Syntactic variations
GM set up a joint venture with Toyota.
GM announced it was setting up a joint venture with Toyota.
GM signed an agreement setting up a joint venture with Toyota.
GM announced it was signing an agreement to set up a joint venture with Toyota.
GM plans to set up a joint venture with Toyota.
GM expects to set up a joint venture with Toyota.
[Parse tree diagrams not preserved. Labels: S, NP, VP, V, N; "GM signed agreement setting up" and "GM set up" are both marked [SET-UP]]

Example of IE: FASTUS (1993)
3. Complex Phrases
[COMPANY] [SET-UP] [JOINT-VENTURE] in [LOCATION] with [COMPANY] to produce [PRODUCT] to be supplied to [LOCATION]. [JOINT-VENTURE] capitalized at [CURRENCY] [START] production in [TIME] with production of [PRODUCT] a month.
4. Domain Events
  [COMPANY] [SET-UP] [JOINT-VENTURE] with [COMPANY]
  [COMPANY] [SET-UP] [JOINT-VENTURE] (others)* with [COMPANY]
The attachment positions of PP are determined at this stage.
Irrelevant parts of sentences are ignored.
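
The second domain-event pattern, with its "(others)*" gap, can be sketched over a sequence of labeled chunks as follows. The chunk representation as (label, text) pairs, the label "P" for prepositions, and the function name `match_tie_up` are assumptions made for this illustration rather than FASTUS internals.

```python
def match_tie_up(chunks):
    """Match [COMPANY][SET-UP][JOINT-VENTURE] (others)* with [COMPANY]
    over a list of (label, text) chunks; return a basic template or None."""
    labels = [label for label, _ in chunks]
    for i in range(len(chunks) - 2):
        if labels[i:i + 3] != ["COMPANY", "SET-UP", "JOINT-VENTURE"]:
            continue
        # Skip any intervening chunks until "with" followed by a COMPANY.
        for j in range(i + 3, len(chunks) - 1):
            if chunks[j] == ("P", "with") and labels[j + 1] == "COMPANY":
                return {
                    "Relationship": "TIE-UP",
                    "Entities": [chunks[i][1], chunks[j + 1][1]],
                }
    return None


# Hand-labeled chunks for part of the Bridgestone sentence (labels assumed).
chunks = [
    ("COMPANY", "Bridgestone Sports Co."),
    ("SET-UP", "had set up"),
    ("JOINT-VENTURE", "a joint venture"),
    ("P", "in"),
    ("LOCATION", "Taiwan"),
    ("P", "with"),
    ("COMPANY", "a local concern"),
]
template = match_tie_up(chunks)
print(template)
```

The chunks "in" and "Taiwan" fall into the "(others)*" gap, which is how irrelevant parts of the sentence are skipped.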

Complications caused by syntactic variations
Relative clause: The mayor, who was kidnapped yesterday, was found dead today.
  [NG] Relpro {NG/others}* [VG] {NG/others}* [VG]
  [NG] Relpro {NG/others}* [VG]
Surface Pattern Generator: basic patterns --> patterns used by Domain Event
  Relative clause construction
  Passivization, etc.

FASTUS
Based on finite state automata (FSA)
Example: NP, who was kidnapped, was found.
1. Complex Words
2. Basic Phrases
3. Complex Phrases
4. Domain Events: patterns for events of interest to the application; basic templates are to be built
   Piece-wise recognition of basic templates
5. Merging Structures: templates from different parts of the texts are merged if they provide information about the same entity or event
   Reconstructing information carried via syntactic structures by merging basic templates
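
The merging step can be illustrated with a simple slot-compatibility check: two partial templates are merged when no filled slot conflicts. This compatibility rule, the dictionary representation, and the function name `merge_templates` are assumptions for the sketch. The slides do not state the exact merging criterion.

```python
def merge_templates(first, second):
    """Merge two partial templates (dicts of slot -> value or None).
    Return the merged template, or None if a filled slot conflicts."""
    merged = dict(first)
    for slot, value in second.items():
        existing = merged.get(slot)
        if existing is not None and value is not None and existing != value:
            return None  # incompatible: the templates describe different things
        if existing is None:
            merged[slot] = value
    return merged


# Two partial ACTIVITY templates recognized piece-wise (slot values from the Bridgestone example).
partial_a = {
    "Activity": "PRODUCTION",
    "Company": "Bridgestone Sports Taiwan Co.",
    "Product": None,
    "Start Date": "January 1990",
}
partial_b = {
    "Activity": "PRODUCTION",
    "Company": None,
    "Product": "iron and 'metal wood' clubs",
}

merged = merge_templates(partial_a, partial_b)
print(merged)
```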

Current state of the art of IE
Carefully constructed IE systems: F-60 level (inter-annotator agreement: 60-80%)
Domains:
  Telegraphic messages about naval operations (MUC-1: 1987, MUC-2: 1989)
  News articles and transcriptions of radio broadcasts on Latin American terrorism (MUC-3: 1991, MUC-4: 1992)
  News articles about joint ventures (MUC-5: 1993)
  News articles about management changes (MUC-6: 1995)
  News articles about space vehicles (MUC-7: 1997)
Handcrafted rules (named entity recognition, domain events, etc.)
Automatic learning from texts:
  Supervised learning: corpus preparation
  Non-supervised, or controlled learning