Detection of Relations in Textual Documents
Manuela Kunze, Dietmar Rösner
University of Magdeburg
Knowledge Based Systems and Document Processing
Introduction
– to extract information from text, you can use techniques like simple pattern matching etc.
– additional knowledge is required:
  – 'Thursday': a day of the week
  – meaning of (implicit) 'open' vs. 'close'
  – 'Pay-what-you-wish'
– text understanding / techniques of NLP
– 'Exhibition of over 30 color photographs and stories of life in China's Yunnan Province …'
Introduction
– ontologies contain information about:
  – definition/description of concepts and description of instances
  – kind of relation (name, type)
    – definition of domain and range values
    – characteristics of the relation: cardinality, transitivity, ...
Natural Language Processing
– NLP techniques:
  – case frame analysis
  – exploiting syntactic structures
  – corpus-based
– IE for an initial ontology
– corpus:
  – autopsy protocols (400 protocols)
  – different document parts: findings, histological findings, background, discussion, …
  – short linguistic structures
  – typical attribute-value structures
Overview
– Case Frame
– Analysis of Specific Syntactic Structures
– Discussion/Conclusion
Case Frames
– resources:
  – results from syntactic parser, e.g. for 'Flachschnitt in das Zungengewebe' (flat cut into the tongue tissue)
  – results from semantic tagger
  – description of case frames
Case Frames
– (corpus-based) definition of roles for a concept
– 'Flachschnitt' (flat cut)
  – role 'location'
    – sem. category: 'tissue'
    – PP; case of NP: accusative; preposition: 'in'
– 'Herausschleudern' (being flung out)
  – role 'patient'
    – sem. category: 'body-hum'
    – NP; case of NP: genitive
  – role 'location'
    – sem. category: 'vehicle'
    – PP; case of NP: dative; preposition: 'aus'
Case Frames
…
Flachschnitt        medizinischer Schnitt (medical cut)
    TISSUE          P(akk, fak, in)      in das Zungengewebe
Herausschleudern    event
    BODY-HUM        N(gen, fak)          des Koerpers
    VEHICLE         P(dat, fak, aus)
…
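The case frame entries above can be read as a small data structure: each concept lists roles, and each role restricts the semantic category, phrase type, case and (for PPs) preposition of its filler. The following is a minimal Python sketch of that reading. The class names, field names and the hand-written parser/tagger output are assumptions for illustration, not the authors' resource format, and the `fak` marker from the slide is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleSpec:
    role: str                          # e.g. "location", "patient"
    sem_category: str                  # required semantic category of the filler
    phrase_type: str                   # "NP" or "PP"
    case: str                          # "gen", "dat", "akk"
    preposition: Optional[str] = None  # only set for PPs


@dataclass(frozen=True)
class Phrase:
    text: str
    sem_category: str                  # as delivered by a semantic tagger
    phrase_type: str                   # as delivered by a syntactic parser
    case: str
    preposition: Optional[str] = None


# Hypothetical encoding of the two case frames shown on the slides.
CASE_FRAMES = {
    "Flachschnitt": [RoleSpec("location", "tissue", "PP", "akk", "in")],
    "Herausschleudern": [
        RoleSpec("patient", "body-hum", "NP", "gen"),
        RoleSpec("location", "vehicle", "PP", "dat", "aus"),
    ],
}


def fill_roles(head, phrases):
    """Return (head, role, filler) triples for phrases that satisfy a role."""
    relations = []
    for spec in CASE_FRAMES.get(head, []):
        for p in phrases:
            if (p.sem_category == spec.sem_category
                    and p.phrase_type == spec.phrase_type
                    and p.case == spec.case
                    and p.preposition == spec.preposition):
                relations.append((head, spec.role, p.text))
    return relations


example = fill_roles(
    "Flachschnitt",
    [Phrase("in das Zungengewebe", "tissue", "PP", "akk", "in")],
)
print(example)
```

In this sketch the relation name comes from the role, the domain from the head concept and the range from the role's semantic category, which matches the slide's statement that relations are defined by the case frame.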
Case Frames
– coverage of phrases like 'fracture of elbow joint'?
– abstraction
  – 'fracture' (sem. category: 'trauma'), role 'patient': sem. category: 'bone'
  – 'bruise' (sem. category: 'trauma'), role 'patient': sem. category: 'organ'
  – 'hematoma' (sem. category: 'trauma'), role 'patient': sem. category: 'tissue'
– concept x (sem. category: 'trauma')
  – role 'patient': sem. category: 'body-part'
Case Frames
– results:
  – relations are defined by the case frame: name/type of relation, domain, range
  – corpus-based abstractions: redefinition of semantic restriction
  – use the least general hypernym as semantic restriction
– not yet extracted:
  – information about the characteristics of a relation
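The abstraction step (from 'bone', 'organ' and 'tissue' to 'body-part') amounts to finding the most specific category that subsumes all observed fillers of a role. A minimal sketch, assuming a toy hypernym table; the table, including the top node 'entity', is invented for illustration and is not the taxonomy used by the authors.

```python
# Toy hypernym table (child -> parent); an assumption for illustration only.
HYPERNYM = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "entity",
}


def ancestors(category):
    """Return the category followed by its hypernyms, most specific first."""
    chain = [category]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def least_general_hypernym(categories):
    """Return the most specific category that subsumes all given categories."""
    categories = list(categories)
    common = set(ancestors(categories[0]))
    for c in categories[1:]:
        common &= set(ancestors(c))
    # Walk upwards from the first category; the first shared node is the answer.
    for node in ancestors(categories[0]):
        if node in common:
            return node
    return None


# Semantic categories observed for role 'patient' of the 'trauma' concepts.
observed_patient_categories = ["bone", "organ", "tissue"]
restriction = least_general_hypernym(observed_patient_categories)
print(restriction)
```

Choosing the least general hypernym keeps the restriction as tight as the corpus allows: 'entity' would also cover all three fillers, but it would no longer exclude anything.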
Analysis of Specific Syntactic Structures
– from general to specific information
– resources:
  – results from syntactic parser
  – results from semantic tagger
  – description of interpretation of syntactic structures
– Which word class can be interpreted as concept/instance? Which word class describes a relation?
  – adjective in an NP: describes the noun in the NP; relation 'prop'
  – negations: negate concepts, verbs, or properties of a concept
  – particle: modification of adjectives
Analysis of Specific Syntactic Structures
(diagram labels: CL, Med)
– N ADJ: prop(N, ADJ)
  – N interpreted as concept
  – ADJ interpreted as concept
– results: prop_cat(ADJ)(N, ADJ)
Analysis of Specific Syntactic Structures
'liver tissue bloodless'
Steps:
– nouns and adjectives are interpreted as concept/instance
  – 'liver tissue': instance liver_tissue*, concept tissue
  – 'bloodless': instance bloodless*, concept blood concentration
– adjectives describe a relation, in general: 'prop'
  – prop_blood-concentration
(diagram legend: concept, instance, relation)
Analysis of Specific Syntactic Structures
'liver tissue bloodless' … …
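The 'liver tissue bloodless' example can be written out as the ontology fragment it produces: two concepts, two instances and one relation whose name is the general 'prop' specialised by the adjective's semantic category. A minimal sketch; the function name and the dictionary layout are hypothetical, and reading the noun's concept as domain and the adjective's concept as range is an assumption.

```python
def interpret_noun_adjective(noun, noun_cat, adj, adj_cat):
    """Turn an 'N ADJ' phrase into concepts, instances and a refined relation."""
    # The general relation 'prop' is specialised by the adjective's category.
    relation = f"prop_{adj_cat}"
    return {
        "concepts": {noun_cat, adj_cat},
        "instances": {noun: noun_cat, adj: adj_cat},
        # Assumption: domain = concept of the noun, range = concept of the adjective.
        "relation": {"name": relation, "domain": noun_cat, "range": adj_cat},
        "assertion": (relation, noun, adj),
    }


# Semantic categories as a tagger might deliver them for this phrase.
fragment = interpret_noun_adjective(
    "liver_tissue", "tissue", "bloodless", "blood-concentration"
)
print(fragment["assertion"])
```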
Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly detectable hematomas")
– results of syntactic parser: kaum wahrnehmbare Unterblutungen
– results of semantic tagger:
  – 'kaum': weak-graduation
  – 'wahrnehmbar': unknown token
  – 'Unterblutung': trauma
– resources for interpretation:
  – N: concept/instance
  – ADJ: concept/instance; rel: prop
  – ADV: concept/instance; rel: mod
– adverb specifies adjective
– adjective specifies noun
Analysis of Specific Syntactic Structures
'hardly detectable hematomas'
Steps:
– nouns, adjectives and adverbs are interpreted as concept/instance
  – 'hematoma': instance hematoma*, concept trauma
  – 'detectable': instance detectable*, concept unspecified
  – 'hardly': instance hardly*, concept weak-graduation
– adjectives and adverbs describe relations
  – prop_unspecified
  – mod_weak-graduation
(diagram legend: concept, instance, relation)
Analysis of Specific Syntactic Structures
'hardly detectable hematomas'
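The interpretation of "kaum wahrnehmbare Unterblutungen" can be sketched as a pass over the tagged tokens: every token becomes an instance of its semantic category (or of 'unspecified' if the tagger does not know it), and adverbs and adjectives contribute a 'mod' or 'prop' relation to the token they specify. The lexicon keyed by surface forms and the rule "a modifier attaches to its right neighbour" are simplifying assumptions that only fit ADV ADJ N phrases like this one.

```python
# Hypothetical semantic tagger lexicon, keyed by surface form for simplicity.
# Tokens missing here count as unknown.
TAGGER = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}

# Interpretation resource from the slide: relation contributed by a word class.
RELATION_BY_CLASS = {"ADJ": "prop", "ADV": "mod"}


def interpret(tokens):
    """Interpret a list of (word, word_class) pairs of the form ADV ADJ N.

    Returns the instance -> concept mapping and the derived relations.
    """
    # Unknown tokens are assigned the placeholder concept 'unspecified'.
    instances = {word: TAGGER.get(word, "unspecified") for word, _ in tokens}
    relations = []
    # Pair each token with its right neighbour: ADV -> ADJ, ADJ -> N.
    for (word, wclass), (head, _) in zip(tokens, tokens[1:]):
        base = RELATION_BY_CLASS.get(wclass)
        if base is not None:
            relations.append((f"{base}_{instances[word]}", head, word))
    return instances, relations


phrase = [("kaum", "ADV"), ("wahrnehmbare", "ADJ"), ("Unterblutungen", "N")]
instances, relations = interpret(phrase)
for relation in relations:
    print(relation)
```

For the unknown adjective the result is prop_unspecified, i.e. only little more than the general relation is delivered, which is the limitation named in the conclusion.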
Analysis of Specific Syntactic Structures
– Protégé plugin for visualization: Ontoviz (diagram legend: concept, instance, relation)
– phrases like: NP NP NP NP N Adj Conj Adj NP N conj N Adj …
Analysis of Specific Syntactic Structures
– results
  – definition of concepts/instances
  – corpus-based definition/refinement of relations: prop is specialised to prop_cat(ADJ)
  – information about domain, relation
– not extracted:
  – information about the characteristics of a relation
Conclusion
– NLP techniques for extraction of information
  – analyse syntactic structures
  – information about semantic categories
  – result: corpus-based description of an initial ontology
– case frame analysis
  – relations are described in the case frame
  – disadvantage: effort of creating the case frames
  – advantage: a definition of the relation
– analysis of specific syntactic structures
  – a general interpretation of tokens and the syntactic structures
  – refined by results from the semantic tagger
  – disadvantage: in some cases, only the general relation definition is delivered
  – advantage: less effort to describe the resources
Conclusion
– no information about the characteristics of a relation (cardinality, …)
– solutions
  – analyse occurrences in the corpus: corpus-based assumption about cardinality
  – integration of additional knowledge: initial domain-specific ontology
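One way to read "corpus-based assumption about cardinality" is to count, for each relation, how many distinct values a single subject is ever observed with. The sketch below does exactly that. It is an illustration of the idea rather than the authors' procedure, and the observations, including the relation name prop_color and the instance identifiers, are invented.

```python
from collections import defaultdict


def estimate_cardinality(assertions):
    """Guess a cardinality per relation from (relation, subject, value) triples.

    A relation is labelled 'single' if no subject was seen with more than one
    distinct value, otherwise 'multiple'. This is only an assumption drawn from
    the corpus: a second value that never occurs is not thereby ruled out.
    """
    values = defaultdict(set)
    for relation, subject, value in assertions:
        values[(relation, subject)].add(value)

    max_count = defaultdict(int)
    for (relation, _), vals in values.items():
        max_count[relation] = max(max_count[relation], len(vals))

    return {rel: "single" if n == 1 else "multiple" for rel, n in max_count.items()}


# Invented observations for illustration; not taken from the autopsy corpus.
observations = [
    ("prop_blood-concentration", "liver_tissue#1", "bloodless"),
    ("prop_blood-concentration", "liver_tissue#2", "bloodless"),
    ("prop_color", "liver_tissue#1", "brown"),
    ("prop_color", "liver_tissue#1", "red"),
]
cardinalities = estimate_cardinality(observations)
print(cardinalities)
```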
Key Aspects for IE
– 'conceptual' preprocessing steps: names of concepts occur in different linguistic structures; compound vs. complex noun phrase (like 'liver tissue' and 'tissue of liver')
  – handle only one canonical linguistic structure as a representative for all paraphrases
– treatment of generalisation within local contexts
  – the token 'liver' may occur in the first sentence of a paragraph; in the following sentences of the paragraph, only the hypernym 'organ' is used
– concept or instance: which term in a linguistic structure has to be interpreted as a concept and which as an instance of a concept
– definition of the scope for a concept:
  – a paragraph starts with a description of an organ (e.g. organ 'liver' in: 'The liver shows.... Bloodrichness of the tissue.'); after this follows a description of parts of the organ (e.g., 'Gewebe', tissue). In such cases, additional knowledge about the domain has to be employed (for example, about meronyms or holonyms)
  – tissue part-of liver vs. tissue part-of concept X
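The first preprocessing step, one canonical structure for all paraphrases of a concept name, can be illustrated on the English example from the slide. The sketch below maps both 'liver tissue' and 'tissue of liver' to the same identifier. The function name and the rewriting rule are assumptions; for the German corpus a comparable step would additionally need compound splitting, which is not shown here.

```python
ARTICLES = {"the", "a", "an"}


def canonical_concept_name(phrase):
    """Map 'X of Y' and 'Y X' to the single canonical form 'y_x'."""
    words = phrase.lower().split()
    if "of" in words:
        i = words.index("of")
        head, modifier = words[:i], words[i + 1:]
        # Drop articles in the modifier, e.g. 'tissue of the liver'.
        modifier = [w for w in modifier if w not in ARTICLES]
        # Put the modifier in front of the head, as in the compound form.
        words = modifier + head
    return "_".join(words)


print(canonical_concept_name("liver tissue"))
print(canonical_concept_name("tissue of liver"))
```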