1
Information Extraction from Biomedical Text
  Jerry R. Hobbs
  Artificial Intelligence Center, SRI International
2
Introduction
  Information Extraction: extract entities, relations, and events
  Capture structured information
  Domain specific
  Focus only on the relevant parts
  Mainly applied to areas of economic and military interest?
  Biomedical domain
3
Cascaded Finite-State Transducers
  Separate processing into several stages
  FASTUS (Finite-State Automaton Text Understanding System)
  Earlier stages: smaller linguistic objects; domain independent
  Later stages: domain-dependent patterns
4
Cascaded Finite-State Transducers
  1. Complex Words
  2. Basic Phrases
  3. Complex Phrases
  4. Domain Patterns
  5. Merging Structures
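The cascade idea can be sketched in a few lines of Python: each stage consumes the output of the stage before it. This is a toy illustration, not the FASTUS implementation; `join_pairs` and `run_cascade` are hypothetical names, and the two "stages" simply fuse adjacent tokens.

```python
from typing import Callable, List, Set, Tuple

Stage = Callable[[List[str]], List[str]]

def join_pairs(pairs: Set[Tuple[str, str]]) -> Stage:
    """Build a toy transducer that fuses the listed adjacent token pairs."""
    def stage(tokens: List[str]) -> List[str]:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
                out.append(tokens[i] + " " + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        return out
    return stage

def run_cascade(stages: List[Stage], tokens: List[str]) -> List[str]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        tokens = stage(tokens)
    return tokens

# Stage 1 builds a complex word; stage 2 can only fire on stage 1's output.
complex_words = join_pairs({("Escherichia", "coli")})
basic_phrases = join_pairs({("an", "Escherichia coli")})

result = run_cascade([complex_words, basic_phrases],
                     "from an Escherichia coli strain".split())
print(result)  # ['from', 'an Escherichia coli', 'strain']
```

The second stage matches only because the first stage has already produced the single token "Escherichia coli", which is the point of ordering the stages from smaller to larger linguistic objects.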
5
Example
  gamma-Glutamyl kinase, the 1st enzyme of the proline biosynthetic pathway, was purified to homogeneity from an Escherichia coli strain resistant to the proline analog 3,4-dehydroproline. The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.
6
Target Database
  Reaction Object attributes: ID, Pathway, Enzyme, ..
  Enzyme Object attributes: ID, Name, Molecular-Weight, Subunit-Component, Subunit-Number
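One way to picture the target database is as two record types. The sketch below uses Python dataclasses; the attribute names follow the slide, but the field types, the optional defaults, and the IDs "E1" and "R1" are assumptions made for illustration. Attributes elided on the slide ("..") are left out rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Enzyme:
    id: str
    name: Optional[str] = None
    molecular_weight: Optional[int] = None   # type is an assumption
    subunit_component: Optional[str] = None  # meaning not given on the slide
    subunit_number: Optional[int] = None

@dataclass
class Reaction:
    id: str
    pathway: Optional[str] = None
    enzyme: Optional[Enzyme] = None
    # Further attributes are elided in the source ("..") and omitted here.

# Values taken from the example passage; the IDs are hypothetical.
enzyme = Enzyme(id="E1", name="gamma-Glutamyl kinase",
                molecular_weight=236000, subunit_number=6)
reaction = Reaction(id="R1", pathway="proline biosynthetic pathway",
                    enzyme=enzyme)
```

Leaving every attribute optional reflects how extraction works in practice: a record is filled in piece by piece as patterns fire.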
7
Complex Words
  gamma-Glutamyl kinase, the 1st enzyme of the proline biosynthetic pathway, was purified to homogeneity from an Escherichia coli strain resistant to the proline analog 3,4-dehydroproline. The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.
  Recognizes multiword fixed phrases and proper names
  Such terms are abundant in the biological domain
  Use a lexicon, or machine-learning and statistical methods
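The lexicon option can be sketched as a greedy longest-match lookup. This is a minimal illustration under stated assumptions: the three lexicon entries and the labels ENZYME, ORGANISM and PATHWAY are hypothetical, and a real system would need a far larger lexicon or a learned recognizer.

```python
# Hypothetical lexicon: token tuples mapped to illustrative type labels.
LEXICON = {
    ("gamma-Glutamyl", "kinase"): "ENZYME",
    ("Escherichia", "coli"): "ORGANISM",
    ("proline", "biosynthetic", "pathway"): "PATHWAY",
}
MAX_LEN = max(len(key) for key in LEXICON)

def tag_complex_words(tokens):
    """Greedy longest-match lookup of multiword terms in the lexicon."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                out.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            # No lexicon entry starts here: pass the token through untagged.
            out.append((tokens[i], None))
            i += 1
    return out

tagged = tag_complex_words("enzyme of the proline biosynthetic pathway".split())
print(tagged[-1])  # ('proline biosynthetic pathway', 'PATHWAY')
```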
8
Basic Phrases
  Segment a sentence into noun groups, verb groups, and particles
  Use the Sager (1981) grammar
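A finite-state segmenter of this kind can be approximated with a regular expression over part-of-speech codes. The sketch below is not the Sager grammar: the tag set, the two group patterns, and the hand-tagged input are all assumptions, and treating every leftover token as a particle is a simplification.

```python
import re

# Map hypothetical POS tags to one-character codes; anything else becomes "O".
TAG_CODE = {"DET": "D", "ADJ": "A", "NUM": "A", "NOUN": "N",
            "AUX": "X", "ADV": "R", "VERB": "V"}

# Noun group: optional determiner, any modifiers, one or more nouns.
# Verb group: auxiliaries and adverbs followed by a main verb.
GROUP_RE = re.compile(r"(?P<NG>D?A*N+)|(?P<VG>[XR]*V)")

def chunk(tagged):
    """Segment (word, tag) pairs into noun groups, verb groups, and particles."""
    codes = "".join(TAG_CODE.get(tag, "O") for _, tag in tagged)
    words = [word for word, _ in tagged]
    groups, pos = [], 0
    for m in GROUP_RE.finditer(codes):
        # Tokens skipped between matches are emitted one by one as particles.
        groups.extend(("PART", w) for w in words[pos:m.start()])
        groups.append((m.lastgroup, " ".join(words[m.start():m.end()])))
        pos = m.end()
    groups.extend(("PART", w) for w in words[pos:])
    return groups

sentence = [("The", "DET"), ("enzyme", "NOUN"), ("was", "AUX"),
            ("apparently", "ADV"), ("comprised", "VERB"), ("of", "ADP"),
            ("six", "NUM"), ("identical", "ADJ"), ("subunits", "NOUN")]
groups = chunk(sentence)
# [('NG', 'The enzyme'), ('VG', 'was apparently comprised'),
#  ('PART', 'of'), ('NG', 'six identical subunits')]
```

Because each token maps to exactly one character, match offsets in the code string are also token offsets, which keeps the mapping back to words trivial.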
9
Complex Phrases
  Attach appositives to their head noun groups
  Attach “of” prepositional phrases to their head noun groups
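Both attachments can be sketched as passes over the stage-2 output. The representation here is an assumption: groups are `(label, text)` pairs with the hypothetical labels NG, VG and PART, and an appositive is recognized only in the comma-delimited form "NG , NG ,".

```python
def attach_of(groups):
    """Fold 'NG of NG' sequences into a single, larger noun group."""
    out, i = [], 0
    while i < len(groups):
        g = groups[i]
        while (g[0] == "NG" and i + 2 < len(groups)
               and groups[i + 1] == ("PART", "of")
               and groups[i + 2][0] == "NG"):
            g = ("NG", g[1] + " of " + groups[i + 2][1])
            i += 2
        out.append(g)
        i += 1
    return out

def attach_appositives(groups):
    """Turn 'NG , NG ,' into one head noun group carrying an appositive."""
    out, i = [], 0
    while i < len(groups):
        window = groups[i:i + 4]
        labels = [label for label, _ in window]
        if (labels == ["NG", "PART", "NG", "PART"]
                and window[1][1] == "," and window[3][1] == ","):
            out.append(("NG", window[0][1], window[2][1]))
            i += 4
        else:
            label, text = groups[i]
            out.append((label, text, None))
            i += 1
    return out

basic = [("NG", "gamma-Glutamyl kinase"), ("PART", ","),
         ("NG", "the 1st enzyme"), ("PART", "of"),
         ("NG", "the proline biosynthetic pathway"), ("PART", ","),
         ("VG", "was purified")]
complex_groups = attach_appositives(attach_of(basic))
# [('NG', 'gamma-Glutamyl kinase',
#   'the 1st enzyme of the proline biosynthetic pathway'),
#  ('VG', 'was purified', None)]
```

Running the "of" pass first matters: the appositive is only a single noun group once its prepositional phrase has been attached.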
10
Complex Phrases
  Structures of basic and complex phrases, entities, and events
11
Clause-Level Domain Patterns
  The enzyme had a native molecular weight of 236,000 and was apparently comprised of six identical 40,000-dalton subunits.
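As a rough illustration of a domain pattern, the sketch below fills two Enzyme attributes from this sentence with regular expressions over the raw string. A cascaded system would match over the phrase structures built by the earlier stages instead; the two patterns, the `NUMBER_WORDS` table, and the dictionary keys are assumptions.

```python
import re

NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

# Hypothetical clause-level patterns for two Enzyme attributes.
WEIGHT_RE = re.compile(
    r"had a (?:native )?molecular weight of (?P<mw>[\d,]+)")
SUBUNIT_RE = re.compile(
    r"comprised of (?P<count>\w+) identical [\d,]+-dalton subunits")

def extract(sentence):
    """Return the attributes that the two patterns can fill."""
    record = {}
    m = WEIGHT_RE.search(sentence)
    if m:
        record["molecular_weight"] = int(m.group("mw").replace(",", ""))
    m = SUBUNIT_RE.search(sentence)
    if m and m.group("count") in NUMBER_WORDS:
        record["subunit_number"] = NUMBER_WORDS[m.group("count")]
    return record

text = ("The enzyme had a native molecular weight of 236,000 and was "
        "apparently comprised of six identical 40,000-dalton subunits.")
record = extract(text)
print(record)  # {'molecular_weight': 236000, 'subunit_number': 6}
```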
13
Merging Structures
  First 4 levels: operate within a single sentence
  This level: collect and combine information for one entity or relationship
  Three criteria: (1) the internal structure of the noun groups; (2) nearness along some metric; (3) consistency and compatibility of the two structures
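The third criterion, consistency and compatibility, can be sketched as a check that two partial records never disagree on a filled attribute. This covers only that one criterion; noun-group structure and nearness are not modeled, and the dictionary representation is an assumption.

```python
def compatible(a, b):
    """Two partial structures are compatible if no filled attribute conflicts."""
    return all(a[k] == b[k] for k in a.keys() & b.keys()
               if a[k] is not None and b[k] is not None)

def merge(a, b):
    """Combine two compatible structures; return None if they conflict."""
    if not compatible(a, b):
        return None
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged

# Partial structures from the two sentences of the example passage.
first = {"name": "gamma-Glutamyl kinase", "molecular_weight": None}
second = {"molecular_weight": 236000, "subunit_number": 6}

merged = merge(first, second)
# {'name': 'gamma-Glutamyl kinase', 'molecular_weight': 236000,
#  'subunit_number': 6}

# A structure with a different weight is inconsistent and is not merged.
conflict = merge(merged, {"molecular_weight": 40000})  # None
```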
15
Compile-Time Transformations
  Subject-Verb-Object patterns are expanded into their linguistic variants (passive, relative clauses, etc.)
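The expansion can be pictured as a function that takes one S-V-O specification and emits a pattern per syntactic variant. Everything in this sketch is hypothetical: the slot notation (`VG-active(...)`, `REL`), the three variants chosen, and the ENZYME/catalyze/REACTION example are illustrations rather than the system's actual pattern language.

```python
def expand_svo(subj, verb, obj):
    """Expand one S-V-O specification into surface-pattern variants."""
    return {
        # "ENZYME catalyzes REACTION"
        "active": [subj, f"VG-active({verb})", obj],
        # "REACTION is catalyzed by ENZYME"
        "passive": [obj, f"VG-passive({verb})", "by", subj],
        # "ENZYME that catalyzes REACTION"
        "relative": [subj, "REL", f"VG-active({verb})", obj],
    }

patterns = expand_svo("ENZYME", "catalyze", "REACTION")
for name, pattern in patterns.items():
    print(name, pattern)
```

The pattern author writes only the simple active form; the variants are generated once, before any text is processed.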
16
Types of Specialized Domains
  “Noun-driven” approach: the type of an entity is highly predictive of its role in an event; loose S-V-O patterns
  “Verb-driven” approach: the roles of entities in events cannot be predicted from their types; tight S-V-O patterns
17
Limitations of IE Technology
  MUC (1990): name recognition ~95% recall and precision; event recognition ~60% recall and precision
  Possible reasons: the merging process; only explicit information is used; common cases are covered, but what about rare cases?
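For reference, recall and precision in their standard set-based form can be computed as below. This is the textbook definition, not the MUC scoring procedure, and the extracted and gold facts are invented purely to show the arithmetic.

```python
def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical facts: 5 extracted, 5 in the gold standard, 3 in common.
extracted = {"f1", "f2", "f3", "x1", "x2"}
gold = {"f1", "f2", "f3", "g1", "g2"}

precision, recall = precision_recall(extracted, gold)
print(precision, recall)  # 0.6 0.6
```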