Detection of Relations in Textual Documents
Manuela Kunze, Dietmar Rösner, University of Magdeburg, Knowledge Based Systems and Document Processing


Introduction
To extract information from text, you can use techniques such as simple pattern matching.
Additional knowledge is required:
– 'Thursday': a day of the week
– meaning of (implicit) 'open' vs. 'close'
– 'Pay-what-you-wish'
Text understanding / NLP techniques:
– 'Exhibition of over 30 color photographs and stories of life in China's Yunnan Province …'

Introduction
Ontologies contain information about:
– definition/description of concepts and description of instances
– kind of relation (name, type)
  – definition of domain and range values
  – characteristics of the relation: cardinality, transitivity, ...
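
The relation information listed above (name, domain, range, characteristics) could be held in a small record. The following is only a sketch: the class name `RelationDef`, its field names, and the `part-of` example values are assumptions made for illustration and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationDef:
    """Hypothetical record for one ontology relation."""
    name: str
    domain_category: str                    # category allowed as subject
    range_category: str                     # category allowed as value
    max_cardinality: Optional[int] = None   # None = not known yet
    transitive: bool = False


def accepts(rel: RelationDef, subject_category: str, value_category: str) -> bool:
    """Check a candidate pair against the relation's domain and range."""
    return (subject_category == rel.domain_category
            and value_category == rel.range_category)


# Illustrative values only.
part_of = RelationDef("part-of", "body-part", "body-part", transitive=True)
```

Leaving `max_cardinality` as `None` reflects that such characteristics may not be known when the relation is first recorded.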

Natural Language Processing
NLP techniques:
– case frame analysis
– exploiting syntactic structures
– corpus-based IE for an initial ontology
corpus:
– autopsy protocols (400 protocols)
– different document parts: findings, histological findings, background, discussion, …
– short linguistic structures
– typical attribute-value structures

Overview
– Case Frame
– Analysis of Specific Syntactic Structures
– Discussion/Conclusion

Case Frames
resources:
– results from syntactic parser, e.g. for 'Flachschnitt in das Zungengewebe' (flat cut into the tongue tissue)
– results from semantic tagger
– description of case frames

Case Frames
(corpus-based) definition of roles for a concept
– 'Flachschnitt' (flat cut)
  role 'location'
  – sem. category: 'tissue'
  – PP, case of NP: accusative, preposition: 'in'
– 'Herausschleudern' (skidding)
  role 'patient'
  – sem. category: 'body-hum'
  – NP, case of NP: genitive
  role 'location'
  – sem. category: 'vehicle'
  – PP, case of NP: dative, preposition: 'aus'

Case Frames
…
Flachschnitt (medizinischer Schnitt)
– TISSUE: P(akk, fak, in), e.g. 'in das Zungengewebe'
Herausschleudern (event)
– BODY-HUM: N(gen, fak), e.g. 'des Koerpers'
– VEHICLE: P(dat, fak, aus)
…
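
A case frame of this kind can be sketched as a lookup table of roles that is matched against the phrases attached to a head noun. The class names, the `fill_roles` function, and the case abbreviations below are assumptions for illustration. The slides do not show how the authors' system stores or applies case frames.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Role:
    name: str                 # e.g. 'location', 'patient'
    sem_category: str         # required semantic category of the filler
    phrase_type: str          # 'PP' or 'NP'
    case: str                 # 'acc', 'dat', 'gen'
    preposition: Optional[str] = None


@dataclass(frozen=True)
class Phrase:
    """Hypothetical merged output of syntactic parser and semantic tagger."""
    phrase_type: str
    case: str
    sem_category: str
    preposition: Optional[str] = None


# Case frames transcribed from the two examples on the slides.
CASE_FRAMES = {
    "Flachschnitt": [Role("location", "tissue", "PP", "acc", "in")],
    "Herausschleudern": [
        Role("patient", "body-hum", "NP", "gen"),
        Role("location", "vehicle", "PP", "dat", "aus"),
    ],
}


def fill_roles(head: str, dependents: List[Phrase]) -> List[Tuple[str, str, str]]:
    """Return (role, head, filler category) for every dependent that
    satisfies the syntactic and semantic restrictions of a role."""
    relations = []
    for role in CASE_FRAMES.get(head, []):
        for dep in dependents:
            if (dep.phrase_type == role.phrase_type
                    and dep.case == role.case
                    and dep.preposition == role.preposition
                    and dep.sem_category == role.sem_category):
                relations.append((role.name, head, dep.sem_category))
    return relations
```

For 'Flachschnitt in das Zungengewebe', a dependent `Phrase("PP", "acc", "tissue", "in")` would fill the role 'location'.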

Case Frames
coverage of phrases like 'fracture of elbow joint'?
abstraction
– 'fracture' (sem. category: 'trauma'), role 'patient': sem. category: 'bone'
– 'bruise' (sem. category: 'trauma'), role 'patient': sem. category: 'organ'
– 'hematoma' (sem. category: 'trauma'), role 'patient': sem. category: 'tissue'
concept x (sem. category: 'trauma')
– role 'patient': sem. category: 'body-part'

Case Frames
results:
– relations are defined by the case frame
  – name/type of relation
  – domain, range
– corpus-based abstractions: redefinition of the semantic restriction
  – use the least general hypernym as semantic restriction
not yet extracted:
– information about the characteristics of a relation
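
Choosing the least general hypernym as semantic restriction can be sketched as finding the lowest common ancestor in a hypernym hierarchy. The toy hierarchy below contains the categories from the 'trauma' example. The top node `physical-object` and the function names are assumptions added for illustration.

```python
from typing import Iterable, List, Optional

# Toy hypernym table: category -> direct hypernym.
# 'physical-object' is a hypothetical top node, not taken from the slides.
HYPERNYM = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "physical-object",
}


def ancestors(category: str) -> List[str]:
    """The category itself followed by its hypernyms, most specific first."""
    chain = [category]
    while category in HYPERNYM:
        category = HYPERNYM[category]
        chain.append(category)
    return chain


def least_general_hypernym(categories: Iterable[str]) -> Optional[str]:
    """Most specific category that subsumes all observed categories."""
    categories = list(categories)
    if not categories:
        return None
    common = set(ancestors(categories[0]))
    for category in categories[1:]:
        common &= set(ancestors(category))
    # Walk upwards from the first category; the first shared node is the answer.
    for candidate in ancestors(categories[0]):
        if candidate in common:
            return candidate
    return None
```

With the role fillers observed for 'fracture', 'bruise' and 'hematoma', the restriction for the abstract 'trauma' concept becomes 'body-part' rather than the more general top node.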

Overview
– Case Frame
– Analysis of Specific Syntactic Structures
– Discussion/Conclusion

Analysis of Specific Syntactic Structures
from general to specific information
resources:
– results from syntactic parser
– results from semantic tagger
– description of interpretation of syntactic structures
  Which word class can be interpreted as concept/instance?
  Which word class describes a relation?
  – adjective in an NP: describes the noun in the NP → relation 'prop'
  – negations: negate concepts, verbs, or properties of a concept
  – particle: modification of adjectives

Analysis of Specific Syntactic Structures
CL Med → N ADJ
prop(N, ADJ)
– N interpreted as concept
– ADJ interpreted as concept
results: prop_cat_ADJ(N, ADJ), where cat_ADJ is the semantic category of the adjective

Analysis of Specific Syntactic Structures
'liver tissue bloodless'
Steps:
– nouns and adjectives are interpreted as concept/instance
  – 'liver tissue': instance liver_tissue* of concept tissue
  – 'bloodless': instance bloodless* of concept blood concentration
– adjectives describe a relation, in general: 'prop'
  – here: prop_blood-concentration
(diagram legend: concept, instance, relation)
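
The N ADJ interpretation can be sketched as a function that refines the general relation 'prop' with the adjective's semantic category. The function name, the dictionary standing in for the semantic tagger, and the triple layout are assumptions. The fallback label 'unspecified' follows the 'hardly detectable hematomas' example in the slides.

```python
from typing import Dict, Tuple


def interpret_n_adj(noun: str, adjective: str,
                    sem_category: Dict[str, str]) -> Tuple[str, str, str]:
    """Interpret an 'N ADJ' structure as a relation between two instances.

    The general relation 'prop' is specialised with the semantic category
    of the adjective; unknown adjectives get the category 'unspecified'.
    """
    adj_category = sem_category.get(adjective, "unspecified")
    return (f"prop_{adj_category}", noun, adjective)


# Hypothetical tagger output for the example phrase.
TAGGER_OUTPUT = {
    "liver tissue": "tissue",
    "bloodless": "blood-concentration",
}
```

Calling `interpret_n_adj("liver tissue", "bloodless", TAGGER_OUTPUT)` yields the relation name `prop_blood-concentration` between the two instances.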

Analysis of Specific Syntactic Structures
'liver tissue bloodless' … …

Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly detectable hematomas")
results of syntactic parser: kaum wahrnehmbare Unterblutungen
results of semantic tagger:
– 'kaum': weak-graduation
– 'wahrnehmbar': unknown token
– 'Unterblutung': trauma
resources for interpretation:
– N: concept/instance
– ADJ: concept/instance, rel: prop
– ADV: concept/instance, rel: mod
– adverb specifies adjective
– adjective specifies noun

Analysis of Specific Syntactic Structures
'hardly detectable hematomas'
Steps:
– nouns, adjectives and adverbs are interpreted as concept/instance
  – 'hematoma': instance hematoma* of concept trauma
  – 'detectable': instance detectable* of concept unspecified
  – 'hardly': instance hardly* of concept weak-graduation
– adjectives and adverbs describe relations
  – prop_unspecified
  – mod_weak-graduation
(diagram legend: concept, instance, relation)
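
The ADV ADJ N case can be sketched in the same spirit: the adjective contributes a 'prop' relation to the noun and the adverb a 'mod' relation to the adjective, each refined by the tagger category. The function name, the argument order of the triples, and the dictionary standing in for the semantic tagger are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def interpret_adv_adj_n(adverb: str, adjective: str, noun: str,
                        sem_category: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Interpret 'ADV ADJ N': the adjective specifies the noun (prop),
    the adverb specifies the adjective (mod). Tokens unknown to the
    tagger get the category 'unspecified'."""
    adj_category = sem_category.get(adjective, "unspecified")
    adv_category = sem_category.get(adverb, "unspecified")
    return [
        (f"prop_{adj_category}", noun, adjective),
        (f"mod_{adv_category}", adjective, adverb),
    ]


# Tagger results as given on the slide; 'wahrnehmbar' is an unknown token.
TAGGED = {"kaum": "weak-graduation", "Unterblutung": "trauma"}
```

For `interpret_adv_adj_n("kaum", "wahrnehmbar", "Unterblutung", TAGGED)` the relation names are `prop_unspecified` and `mod_weak-graduation`, matching the labels in the example.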

Analysis of Specific Syntactic Structures
'hardly detectable hematomas'

Analysis of Specific Syntactic Structures
Protégé plugin for visualization: Ontoviz (diagram legend: concept, instance, relation)
Phrases like:
– NP → NP NP
– NP → N Adj Conj Adj
– NP → N Conj N Adj
– …

Analysis of Specific Syntactic Structures
results:
– definition of concepts/instances
– corpus-based definition/concretion of relations: prop → prop_cat_ADJ
– information about domain, relation
not extracted:
– information about the characteristics of a relation

Overview
– Case Frame
– Analysis of Specific Syntactic Structures
– Discussion/Conclusion

Conclusion
NLP techniques for extraction of information
– analyse syntactic structures
– information about semantic categories
– result: corpus-based description of an initial ontology
case frame analysis
– relations are described in the case frame
– disadvantage: creation of case frames
– advantage: a definition of the relation
analysis of specific syntactic structures
– a general interpretation of tokens and the syntactic structures
– refined by results from the semantic tagger
– disadvantage: in some cases, only the general relation definition is delivered
– advantage: less effort to describe the resources

Conclusion
no information about the characteristics of a relation (cardinality, …)
solutions:
– analyse occurrences in the corpus
  – corpus-based assumption about cardinality
– integration of additional knowledge
  – initial domain-specific ontology
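
One way to form a corpus-based assumption about cardinality is to count how many distinct values each subject instance has per relation. This is a sketch of that idea only. The slides do not describe the authors' procedure, and the function name, the 'single'/'multiple' labels, and the observations in the usage note are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def estimate_cardinality(
        observations: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Guess per relation whether it is single- or multi-valued.

    observations: (relation, subject instance, value) triples extracted
    from the corpus. A relation is assumed 'multiple' as soon as one
    subject occurs with more than one distinct value.
    """
    values = defaultdict(set)
    for relation, subject, value in observations:
        values[(relation, subject)].add(value)

    max_values = defaultdict(int)
    for (relation, _subject), seen in values.items():
        max_values[relation] = max(max_values[relation], len(seen))

    return {relation: ("single" if count <= 1 else "multiple")
            for relation, count in max_values.items()}
```

For example, the invented observations `("location", "cut_1", "tongue_tissue")` and `("location", "cut_1", "skin")` would mark 'location' as 'multiple'. Such a guess stays an assumption: a value that never co-occurs in the corpus may still be possible.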

Key Aspects for IE
'conceptual' preprocessing steps:
names of concepts occur in different linguistic structures: compound vs. complex noun phrase (like 'liver tissue' and 'tissue of liver')
– handle only one canonical linguistic structure as a representative for all paraphrases
treatment of generalisation within local contexts
– The token 'liver' may occur in the first sentence of a paragraph. In the following sentences of the paragraph, only the hypernym 'organ' is used.
concept or instance: which term in a linguistic structure has to be interpreted as a concept and which as an instance of a concept
definition of the scope for a concept:
– a paragraph starts with a description of an organ (e.g. organ 'liver' in: 'The liver shows.... Bloodrichness of the tissue.'), followed by a description of parts of the organ (e.g., 'Gewebe', tissue). In such cases, additional knowledge about the domain has to be employed (for example, about meronyms or holonyms)
– tissue part-of liver vs. tissue part-of concept X
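
Mapping paraphrases to one canonical linguistic structure can be sketched for the simplest English pattern, 'X of Y' versus the compound 'Y X'. The function name and the regular expression are assumptions. Real data (and German compounds in particular) would need a parser-based treatment rather than this string pattern.

```python
import re

# Matches 'head of modifier' with an optional article, e.g. 'tissue of the liver'.
_OF_PHRASE = re.compile(r"(\w+) of (?:the )?(\w+)")


def canonical_np(phrase: str) -> str:
    """Rewrite 'tissue of liver' as the compound form 'liver tissue'.

    Phrases that do not match the 'X of Y' pattern are only lower-cased
    and whitespace-normalised.
    """
    normalised = " ".join(phrase.lower().split())
    match = _OF_PHRASE.fullmatch(normalised)
    if match:
        head, modifier = match.groups()
        return f"{modifier} {head}"
    return normalised
```

Both `canonical_np("tissue of liver")` and `canonical_np("liver tissue")` then yield the same concept name, so later steps only see one representative.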