Plain Text Information Extraction (based on Machine Learning)


Plain Text Information Extraction (based on Machine Learning)
Chia-Hui Chang, Department of Computer Science & Information Engineering, National Central University, chia@csie.ncu.edu.tw, 9/24/2002

Introduction
- Plain Text Information Extraction: the task of locating specific pieces of data in a natural language document, so as to obtain useful structured information from unstructured text
- DARPA's MUC program
- The extraction rules are based on a syntactic analyzer and a semantic tagger

Related Work
- Free-text documents:
  - PALKA, MUC-5, 1993
  - AutoSlog, AAAI-1993, E. Riloff
  - LIEP, IJCAI-1995, Huffman
  - CRYSTAL, IJCAI-1995, KDD-1997, Soderland
- On-line documents:
  - SRV, AAAI-1998, D. Freitag
  - RAPIER, ACL-1997, AAAI-1999, M. E. Califf
  - WHISK, ML-1999, Soderland

SRV: Information Extraction from HTML: Application of a General Machine Learning Approach
Dayne Freitag, Dayne@cs.cmu.edu, AAAI-98

Introduction
- SRV: a general-purpose relational learner; a top-down relational algorithm for IE; relies on a set of token-oriented features
- Extraction pattern: a first-order logic extraction pattern with predicates based on attribute-value tests

Extraction as Text Classification
- Identify the boundaries of field instances
- Treat each fragment as a bag of words
- Find the relations from the surrounding context

Relational Learning
- Inductive Logic Programming (ILP)
  - Input: class-labeled instances
  - Output: classifier for unlabeled instances
- Typical covering algorithm
  - Attribute values are added greedily to a rule
  - The number of positive examples is heuristically maximized while the number of negative examples is heuristically minimized
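
To make the covering idea concrete, here is a minimal sketch of a greedy covering learner. It is illustrative only, not SRV's implementation: the example representation, the `learn_one_rule`/`covering` names, and the simple positives-minus-negatives scoring heuristic are all assumptions for the sketch.

```python
# Minimal covering-algorithm sketch (illustrative only, not SRV itself).
# An example is a dict of attribute -> value; a rule is a set of (attribute, value) tests.

def matches(rule, example):
    """A rule matches an example if every attribute-value test holds."""
    return all(example.get(attr) == val for attr, val in rule)

def learn_one_rule(positives, negatives, candidate_tests):
    """Grow one rule greedily, preferring tests that keep positives and drop negatives."""
    rule = set()
    pos, neg = list(positives), list(negatives)
    while neg and candidate_tests - rule:
        def score(test):
            r = rule | {test}
            return (sum(matches(r, e) for e in pos)
                    - sum(matches(r, e) for e in neg))
        best = max(candidate_tests - rule, key=score)
        rule.add(best)
        pos = [e for e in pos if matches(rule, e)]
        neg = [e for e in neg if matches(rule, e)]
    return rule

def covering(positives, negatives, candidate_tests):
    """Learn rules until (almost) all positive examples are covered."""
    rules, remaining = [], list(positives)
    while remaining:
        rule = learn_one_rule(remaining, negatives, candidate_tests)
        covered = [e for e in remaining if matches(rule, e)]
        if not covered:              # no progress: stop rather than loop forever
            break
        rules.append(rule)
        remaining = [e for e in remaining if not matches(rule, e)]
    return rules
```

Real relational learners search a richer hypothesis space and use better scoring heuristics (e.g., information gain), but the overall covering loop has this shape.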

Simple Features
- Features on individual tokens:
  - Length (e.g., single letter or multiple letters)
  - Character type (e.g., numeric or alphabetic)
  - Orthography (e.g., capitalized)
  - Part of speech (e.g., verb)
  - Lexical meaning (e.g., geographical_place)

Individual Predicates
- Length (=3): accepts only fragments containing three tokens
- Some(?A [] capitalizedp true): the fragment contains some token that is capitalized
- Every(numericp false): every token in the fragment is non-numeric
- Position(?A fromfirst <2): the token bound to ?A is either first or second in the fragment
- Relpos(?A ?B =1): the token bound to ?A immediately precedes the token bound to ?B
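
Read over a candidate fragment (a list of tokens), the predicates above amount to simple checks. The sketch below illustrates their semantics only; the function names and the plain-list fragment representation are assumptions for the example, not SRV's internal representation.

```python
# Illustrative semantics of SRV-style predicates over a token fragment.

def length_eq(fragment, n):
    """length(=n): the fragment contains exactly n tokens."""
    return len(fragment) == n

def some_capitalized(fragment):
    """some(?A [] capitalizedp true): some token in the fragment is capitalized."""
    return any(tok[:1].isupper() for tok in fragment)

def every_non_numeric(fragment):
    """every(numericp false): every token in the fragment is non-numeric."""
    return all(not tok.isdigit() for tok in fragment)

def position_fromfirst_lt(token_index, k):
    """position(?A fromfirst <k): the bound token is among the first k tokens."""
    return token_index < k

def relpos_eq(index_a, index_b, d):
    """relpos(?A ?B =d): token ?A occurs exactly d positions before token ?B."""
    return index_b - index_a == d

fragment = ["Dayne", "Freitag", "1998"]
print(length_eq(fragment, 3))          # True
print(some_capitalized(fragment))      # True
print(every_non_numeric(fragment))     # False: "1998" is numeric
print(position_fromfirst_lt(1, 2))     # True: "Freitag" is the second token
print(relpos_eq(0, 1, 1))              # True: "Dayne" immediately precedes "Freitag"
```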

Relational Features
- Feature types:
  - Adjacency (next_token)
  - Linguistic syntax (subject_verb)

Example

Search
- Predicates are added greedily, attempting to cover as many positive and as few negative examples as possible
- At every step in rule construction, all documents in the training set are scanned and every text fragment of appropriate size is counted
- Every legal predicate is assessed in terms of the number of positive and negative examples it covers
- A position predicate is not legal unless a some predicate is already part of the rule
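
The fragments "of appropriate size" form the candidate space that is counted over. A small sketch of that enumeration follows; the `candidate_fragments` name and the length bounds are illustrative assumptions (in practice the bounds would come from the field lengths observed in training).

```python
# Enumerate every contiguous token fragment whose length lies between
# min_len and max_len (e.g., the field lengths observed in the training data).
def candidate_fragments(tokens, min_len, max_len):
    for start in range(len(tokens)):
        for length in range(min_len, max_len + 1):
            if start + length <= len(tokens):
                yield start, tokens[start:start + length]

tokens = "Course : CS 552 Machine Learning Instructor : Dayne Freitag".split()
for start, fragment in candidate_fragments(tokens, 2, 3):
    print(start, fragment)
```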

Relational Paths
- Relational features are used only in the path argument of the some-predicate
- Some(?A [prev_token prev_token] capitalized true): the fragment contains some token preceded by a capitalized token two tokens back

Validation
- Training phase: 2/3 of the data for learning, 1/3 for validation
- Testing: all rules matching a given fragment are used to assign a confidence score (Bayesian m-estimates)
- Combined confidence:
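
The confidence formulas themselves are not preserved in the transcript. As an illustration only, the sketch below uses the standard Bayesian m-estimate for a single rule's accuracy on the held-out validation data, and a noisy-or combination when several rules support the same prediction; treating these as SRV's exact formulas is an assumption.

```python
def m_estimate(correct, total, prior, m=2.0):
    """Bayesian m-estimate of one rule's accuracy from held-out validation counts."""
    return (correct + m * prior) / (total + m)

def combined_confidence(rule_confidences):
    """Noisy-or combination: the prediction is wrong only if every supporting
    rule is wrong (an assumption; the slides do not show the exact formula)."""
    p_all_wrong = 1.0
    for c in rule_confidences:
        p_all_wrong *= (1.0 - c)
    return 1.0 - p_all_wrong

# Two rules that matched the same fragment, validated on held-out data:
c1 = m_estimate(correct=18, total=20, prior=0.5)   # ~0.86
c2 = m_estimate(correct=5, total=8, prior=0.5)     # 0.60
print(combined_confidence([c1, c2]))               # ~0.95
```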

Adapting SRV for HTML

Experiments
- Data source: four university computer science departments (Cornell, U. of Texas, U. of Washington, U. of Wisconsin)
- Data set: 105 course pages (course: title, number, instructor) and 96 project pages (project: title, member)
- Two experiments:
  - Random: 5-fold cross-validation
  - LOUO (leave one university out): 4-fold experiments

OPD
- Coverage
- Each rule has its own confidence

MPD

Baseline Strategies
- A baseline that simply memorizes field instances
- Random guesser
- OPD and MPD

Conclusions
- Increased modularity and flexibility: domain-specific information is separate from the underlying learning algorithm
- Top-down induction: from general to specific
- Accuracy-coverage trade-off: associate a confidence score with predictions
- Critique: single-slot extraction rules

RAPIER: Relational Learning of Pattern-Match Rules for Information Extraction
M.E. Califf and R.J. Mooney, ACL-97, AAAI-1999

Rule Representation
- Single-slot extraction patterns
- Syntactic information (part-of-speech tagger)
- Semantic class information (WordNet)

The Learning Algorithm
- A specific-to-general search
- Most specific initial rules:
  - The pre-filler pattern contains an item for each word before the filler
  - The filler pattern has one item for each word in the filler
  - The post-filler pattern has one item for each word after the filler
- Compress the rules for each slot:
  - Generate the least general generalization (LGG) of each pair of rules
  - When the LGG of two constraints is a disjunction, create two alternatives: (1) the disjunction, (2) removal of the constraints
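
A minimal sketch of that pairwise generalization for a single pattern item with word, tag, and semantic-class constraints. The representation and names are assumptions for illustration, not RAPIER's code; when two word constraints disagree, the sketch yields exactly the two alternatives mentioned above (their disjunction, or dropping the constraint).

```python
# Least general generalization (LGG) of two single-item constraints.
# Each item is a dict like {"word": {"atlanta"}, "tag": {"nnp"}, "class": None},
# where None means "unconstrained".

def lgg_item(a, b):
    """Return the candidate generalizations of pattern items a and b."""
    with_disjunction, with_dropped = {}, {}
    for key in ("word", "tag", "class"):
        va, vb = a.get(key), b.get(key)
        if va is None or vb is None:
            with_disjunction[key] = with_dropped[key] = None
        elif va == vb:
            with_disjunction[key] = with_dropped[key] = set(va)
        else:
            # The constraints disagree: either take the disjunction of their
            # values or drop the constraint (the two alternatives on the slide).
            with_disjunction[key] = set(va) | set(vb)
            with_dropped[key] = None
    candidates = [with_disjunction]
    if with_dropped != with_disjunction:
        candidates.append(with_dropped)
    return candidates

a = {"word": {"atlanta"}, "tag": {"nnp"}, "class": None}
b = {"word": {"kansas"},  "tag": {"nnp"}, "class": None}
for candidate in lgg_item(a, b):
    print(candidate)
# One keeps word in {"atlanta", "kansas"}; the other drops the word constraint.
```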

Example
- "Located in Atlanta, Georgia."
- "Offices in Kansas City, Missouri."

Example
- "Located in Atlanta, Georgia."
- "Offices in Kansas City, Missouri."
- Assume there is a semantic class for states, but not one for cities.

Experimental Evaluation
- 300 computer-related job postings
- 17 slots, e.g., employer, location, salary, job requirements, language, and platform

Experimental Evaluation
- 485 seminar announcements
- 4 slots: speaker, location, start time, end time

WHISK: Learning Information Extraction Rules for Semi-structured and Free Text
S. Soderland, University of Washington, Machine Learning journal, 1999

Semi-structured Text

Free Text
- (annotated example: person name, position, verb stem)

WHISK Rule Representation For Semi-structured IE

WHISK Rule Representation for Free Text IE
- Skip only within the same syntactic field
- (annotated example: person name, position, verb stem)

Example – Tagged by Users

The WHISK Algorithm

Creating a Rule from a Seed Instance
- Top-down rule induction:
  - Start from an empty rule
  - Add terms within the extraction boundary (Base_1)
  - Add terms just outside the extraction (Base_2)
  - Repeat until the seed is covered
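
A minimal sketch of this anchoring step for a single slot: build a candidate pattern from the terms just inside the extraction (Base_1) or just outside it (Base_2), and keep the candidates that still cover the seed. The '*' skip / parentheses notation loosely follows WHISK; the matcher and the `anchor` helper are assumptions for illustration, not Soderland's implementation.

```python
def match(terms, tokens):
    """Match a WHISK-like term list against a token list.
    '*' skips any number of tokens; '(' and ')' bracket the extracted slot.
    Returns the extracted (start, end) token span, or None if the rule fails."""
    def helper(pi, ti, slot):
        if pi == len(terms):
            return slot if slot is not None and slot[1] is not None else None
        term = terms[pi]
        if term == "(":
            return helper(pi + 1, ti, (ti, None))
        if term == ")":
            return helper(pi + 1, ti, (slot[0], ti))
        if term == "*":
            for nxt in range(ti, len(tokens) + 1):   # try skipping 0, 1, 2, ... tokens
                result = helper(pi + 1, nxt, slot)
                if result is not None:
                    return result
            return None
        if ti < len(tokens) and tokens[ti] == term:  # literal term
            return helper(pi + 1, ti + 1, slot)
        return None
    return helper(0, 0, None)

def anchor(tokens, start, end):
    """Grow a seed rule: Base_1 uses the terms just inside the extraction,
    Base_2 the terms just outside it; return the candidates that cover the seed."""
    filler = tokens[start:end]
    base_1 = (["*", "(", filler[0], ")", "*"] if len(filler) == 1
              else ["*", "(", filler[0], "*", filler[-1], ")", "*"])
    candidates = [base_1]
    if start > 0 and end < len(tokens):
        candidates.append(["*", tokens[start - 1], "(", "*", ")", tokens[end], "*"])
    return [rule for rule in candidates if match(rule, tokens) == (start, end)]

tokens = "Located in Atlanta , Georgia .".split()
for rule in anchor(tokens, 2, 3):        # seed extraction: "Atlanta"
    print(rule)
# ['*', '(', 'Atlanta', ')', '*'] and ['*', 'in', '(', '*', ')', ',', '*']
```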

Example


AutoSlog: Automatically Constructing a Dictionary for Information Extraction Tasks
Ellen Riloff, Dept. of Computer Science, University of Massachusetts, AAAI-93

AutoSlog
- Purpose: automatically constructs a domain-specific dictionary for IE
- Extraction patterns (concept nodes):
  - Conceptual anchor: a trigger word
  - Enabling conditions: constraints

Concept Node Example
- Physical target slot of a bombing template

Construction of Concept Nodes
1. Given a target piece of information (a string to be extracted).
2. AutoSlog finds the first sentence in the text that contains the string.
3. The sentence is handed over to CIRCUS, which generates a conceptual analysis of the sentence.
4. The first clause in the sentence is used.
5. A set of heuristics is applied to suggest a good conceptual anchor point for a concept node.
6. If none of the heuristics is satisfied, AutoSlog searches for the next sentence containing the string and goes back to step 3.
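
To illustrate step 5, the sketch below applies a few anchor-point heuristics of the "<subject> passive-verb" / "active-verb <direct-object>" kind to a simplified clause analysis. The clause dictionary and helper names are assumptions for the example, only a small subset of heuristics is shown, and real input would come from CIRCUS.

```python
# Simplified anchor-point heuristics (a small illustrative subset).
# A clause analysis is a dict like:
#   {"subject": "public buildings", "verb": "bombed", "voice": "passive", "dobj": None}

HEURISTICS = [
    # (role of the target string, required voice, resulting concept-node pattern)
    ("subject", "passive", "<target> passive-verb"),
    ("subject", "active",  "<target> active-verb"),
    ("dobj",    "active",  "active-verb <target>"),
    ("dobj",    "passive", "passive-verb <target>"),
]

def propose_concept_node(clause, target_role):
    """Return (trigger_word, pattern) for the first heuristic that applies."""
    for role, voice, pattern in HEURISTICS:
        if role == target_role and clause.get("voice") == voice:
            return clause["verb"], pattern
    return None

clause = {"subject": "public buildings", "verb": "bombed",
          "voice": "passive", "dobj": None}
print(propose_concept_node(clause, "subject"))
# ('bombed', '<target> passive-verb')  -> e.g. "<target> was bombed"
```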

Conceptual Anchor Point Heuristics

Background Knowledge
- Concept node construction:
  - Slot: the slot of the answer key
  - Hard and soft constraints
  - Type: use template types such as bombing, kidnapping
  - Enabling condition: heuristic pattern
- Domain specification:
  - The type of a template
  - The constraints for each template slot

Another good concept node definition
- Perpetrator slot from a perpetrator template

A bad concept node definition
- Victim slot from a kidnapping template

Empirical Results
- Input: an annotated corpus of texts in which the targeted information is marked and annotated with semantic tags denoting the type of information (e.g., victim) and the type of event (e.g., kidnapping); 1500 texts with 1258 answer keys containing 4780 string fillers
- Output: 1237 concept node definitions
- Human intervention: 5 user-hours to sift through all generated concept nodes; 450 definitions are kept
- Performance:

Conclusion
- In 5 person-hours, AutoSlog creates a dictionary that achieves 98% of the performance of a hand-crafted dictionary
- Each concept node is a single-slot extraction pattern
- Reasons for bad definitions:
  - A sentence contains the targeted string but does not describe the event
  - A heuristic proposes the wrong conceptual anchor point
  - CIRCUS incorrectly analyzes the sentence

CRYSTAL: Inducing a Conceptual Dictionary
S. Soderland, D. Fisher, J. Aseltine, W. Lehnert, University of Massachusetts, IJCAI'95

Concept Nodes (CN)
- CN-type
- Subtype
- Extracted syntactic constituents
- Linguistic patterns
- Constraints on syntactic constituents

The CRYSTAL Induction Tool
- Creating initial CN definitions, one for each instance
- Inducing generalized CN definitions by relaxing constraints for highly similar definitions:
  - Word constraints: intersecting strings of words
  - Class constraints: moving up the semantic hierarchy

Inducing Generalized CN Definitions
1. Start from a CN definition D.
2. Find a second definition D' which is similar to D, and create a new, more general definition U.
3. Delete from the dictionary all definitions covered by U, e.g., D and D'.
4. Test whether U extracts only marked information.
5. If yes, set D = U and go to step 2; if no, start over from another definition as D.
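
A schematic of this loop with a deliberately crude representation: each CN definition is a set of constraints, unification is constraint intersection, and "extracts only marked information" is approximated by checking that the unified definition covers no negative (unmarked) instance. Everything here (names, representation, the similarity choice) is an illustrative assumption, not CRYSTAL's implementation.

```python
def unify(d1, d2):
    """Relax two definitions into their common constraints (a very crude LGG)."""
    return d1 & d2

def covers(defn, instance):
    """A definition covers an instance if all of its constraints hold there."""
    return defn <= instance

def induce(definitions, negatives):
    """Generalize similar CN definitions while no negative instance is covered."""
    dictionary = set(definitions)
    for d in list(dictionary):
        if d not in dictionary:        # already subsumed by an earlier generalization
            continue
        current = d
        while True:
            others = dictionary - {current}
            if not others:
                break
            # Most similar other definition = largest constraint overlap.
            similar = max(others, key=lambda o: len(o & current))
            unified = unify(current, similar)
            if any(covers(unified, neg) for neg in negatives):
                break                  # too general: try another start definition
            # Drop everything the new definition subsumes; keep generalizing from it.
            dictionary = {x for x in dictionary if not covers(unified, x)}
            dictionary.add(unified)
            current = unified
    return dictionary

# Tiny example: constraints as strings; instances as sets of facts.
d1 = frozenset({"verb=diagnosed", "subj_class=patient", "subj_word=she"})
d2 = frozenset({"verb=diagnosed", "subj_class=patient", "subj_word=he"})
neg = [frozenset({"verb=diagnosed", "subj_class=doctor", "subj_word=he"})]
print(induce({d1, d2}, neg))
# {frozenset({'verb=diagnosed', 'subj_class=patient'})}
```

CRYSTAL itself also tolerates a small error rate (the error tolerance parameter mentioned in the experimental results) and relaxes class constraints by moving up the semantic hierarchy rather than only intersecting.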

Implementation Issues
- Finding similar definitions: index CN definitions by verbs and by extraction buffers
- Similarity metric: intersecting classes or intersecting strings of words
- Testing the error rate of a generalized definition: a database of instances segmented by the sentence analyzer is constructed

Experimental Results
- 385 annotated hospital discharge reports; 14719 training instances
- The choice of the error tolerance parameter is used to manipulate a tradeoff between precision and recall
- Output CN definitions: 194 with coverage = 10, 527 with 2 < coverage < 10

Comparison
- Bottom-up (from specific to general): CRYSTAL [Soderland, 1996], RAPIER [Califf & Mooney, 1997]
- Top-down (from general to specific): SRV [Freitag, 1998], WHISK [Soderland, 1999]

References
- I. Muslea, Extraction Patterns for Information Extraction Tasks: A Survey, AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
- E. Riloff, Automatically Constructing a Dictionary for Information Extraction Tasks, AAAI-93, pp. 811-816, 1993.
- S. Soderland, D. Fisher, J. Aseltine, and W. Lehnert, CRYSTAL: Inducing a Conceptual Dictionary, IJCAI-95, 1995.
- D. Freitag, Information Extraction from HTML: Application of a General Machine Learning Approach, AAAI-98, 1998.
- M. E. Califf and R. J. Mooney, Relational Learning of Pattern-Match Rules for Information Extraction, AAAI-99, Orlando, FL, pp. 328-334, July 1999.
- S. Soderland, Learning Information Extraction Rules for Semi-structured and Free Text, Machine Learning, 1999.