© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004 Human Language Technology in Ontology Engineering Ontology Learning from Text Paul Buitelaar.


Human Language Technology in Ontology Engineering: Ontology Learning from Text
Paul Buitelaar
DFKI GmbH Language Technology Lab, DFKI Competence Center Semantic Web
Saarbrücken, Germany

Overview
- HLT and Ontology Engineering
- Automated Linguistic Analysis
- Ontology Learning from Text
- Further Issues: Evaluation
- Conclusions

Ontology Lifecycle
Stages shown: Creating, Populating, Validating, Evolving, Maintaining, Deploying

HLT in the Ontology Lifecycle
Human Language Technology = Automated Linguistic Analysis
HLT for Ontology Learning and Population from Text
- Ontology Learning (Development & Evolution): Linguistic Analysis of Documents (Text) to Extract Classes and Relations/Properties for the Ontology (Knowledge)
- Ontology Population (Knowledge Base Generation): Linguistic Analysis of Documents (Text) to Extract Instances

Automated Linguistic Analysis

Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
[Diagram: nodes Dell computer, flat screen, motherboard, reject, failure, animate-entity; edge labels has-a, location-of]

Levels of Linguistic Analysis
Lexical Analysis
- Word Class: Part-of-Speech (also Semantic Class)
- Word Structure: Morphology
Phrase Analysis
- Sentence Structure: Phrases (if 'shallow': Chunks)
- Semantic Units
Dependency Structure Analysis
- Sentence Meaning: Predicate Argument Structure (Clause)
- Semantic Structure

Part-of-Speech, Morphology
Part-of-Speech
- e.g.: noun, verb, adjective, preposition, …
- PoS tag sets may have between 10 and 50 (or more) tags
Morphology
- Most languages have inflection (declension, conjugation), e.g.:
  Singular/Plural: computer, computers
  Present/Past: reject, rejected
- Many languages also have complex (de)composition, e.g.:
  Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
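The decomposition of Flachbildschirm can be illustrated with a recursive lexicon lookup. This is only a minimal sketch: the three-entry `LEXICON` and the `split_compound` helper are hypothetical names invented for this example, and a real decompounder would also need to handle linking elements and ambiguous splits.

```python
# Toy lexicon: a hypothetical stand-in for a real morphological lexicon.
LEXICON = {"flach", "bild", "schirm"}


def split_compound(word, lexicon=LEXICON):
    """Return one segmentation of `word` into lexicon entries, or None."""
    word = word.lower()
    if not word:
        return []  # nothing left to segment
    # Try each prefix, shortest first, and recurse on the remainder.
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            rest = split_compound(word[i:], lexicon)
            if rest is not None:
                return [head] + rest
    return None  # no full segmentation with this lexicon


print(split_compound("Flachbildschirm"))
```

Words that cannot be fully covered by lexicon entries yield `None`, which is a deliberate simplification: partial analyses are discarded rather than guessed.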

Phrases, Terms, Named Entities
Semantic Units
- Phrases (e.g. nominal: NP, prepositional: PP)
  NP: a flat screen
  PP: with a flat screen
  NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
- Terms (domain-specific phrases)
  Dell computer
  Dell computer with a flat screen
- Named Entities (phrases corresponding to dates, names, …)
  COMPANY: Dell
  COMPANY: Dell Computer Corporation
  PERSON: Michael Dell

Dependency Structure (I)
Semantic Structure
- Dependencies between Predicates and Arguments
  the Dell computer with a flat screen had to be rejected
  PRED: reject
  ARG1: ENTITY
  ARG2: 'the Dell computer with a flat screen'
  'Logical Form': reject(x,y) & animate-entity(x) & computer(y) & …
- Dependency Structure Analysis is based on:
  Sub-categorization Frames: reject :: Subj:NP, Obj:NP
  Selection Restrictions: reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
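How a sub-categorization frame and its selection restrictions constrain an analysis can be sketched with a small lookup. Everything here is an assumption made for illustration: the `ISA` type hierarchy, the `FRAMES` table and the `satisfies` helper are hypothetical, and only the frame for reject is taken from the slide. Unexpressed arguments (such as the subject in the passive example) are simply left unconstrained.

```python
# Hypothetical semantic type hierarchy (child -> parent), for illustration only.
ISA = {
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "COMPUTER": "ENTITY",
}

# Frame for "reject" as given on the slide: Subj must be animate, Obj any entity.
FRAMES = {"reject": {"Subj": "ANIMATE-ENTITY", "Obj": "ENTITY"}}


def is_a(sem_type, required):
    """Walk up the hierarchy until `required` is found or the root is passed."""
    while sem_type is not None:
        if sem_type == required:
            return True
        sem_type = ISA.get(sem_type)
    return False


def satisfies(pred, args):
    """Check the expressed arguments of `pred` against its frame.

    `args` maps grammatical roles to the semantic type of the filler.
    Roles missing from `args` are treated as unexpressed and not checked.
    """
    frame = FRAMES[pred]
    for role, filler_type in args.items():
        if role not in frame:
            return False  # role not licensed by the sub-categorization frame
        if not is_a(filler_type, frame[role]):
            return False  # selection restriction violated
    return True


print(satisfies("reject", {"Obj": "COMPUTER"}))                      # passive, no subject
print(satisfies("reject", {"Subj": "COMPUTER", "Obj": "COMPUTER"}))  # inanimate subject
```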

Dependency Structure (II)
The Dell computer that has been rejected was claimed to have suffered from handling.
Logical form:
  reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) &
  claim(e2,x2,e3) & animate-entity(x2) &
  suffer_from(e3,y1,y2) & handling(y2)
Lexical Functional Grammar (LFG) structure (nesting reconstructed from the flattened slide text), with y1: computer
  PRED claim
  SUBJ y1: PRED computer, MOD Dell, ADJUNCT (PRED reject)
  XCOMP: PRED suffer, SUBJ y1, OBL-from handling
[Dependency graph: nodes claim, y1, Dell, reject, suffer, handling; edge labels SUBJ, XCOMP, MOD, ADJUNCT, OBL-from]

Ontology Learning from Text

Some History
Lexical Knowledge Extraction
- Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries: 1970s/80s
- Extraction of semantic lexicons from corpora for Information Extraction systems: 1980s/90s, e.g. CRYSTAL (Soderland)
- Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
Thesaurus Extraction
- Similar work, (complex, multilingual) term extraction
- e.g. Sextant (Grefenstette); DR-Link (Liddy)
Ontology Learning from Text
- Similar work, (domain-specific) term / relation extraction
- e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
- Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)

TextToOnto: Association Rules

OntoLearn: Domain-Specific WordNet Tuning and Extension

OntoLT: Some Background
Ontology Learning from Text
- Taxonomy Extraction, Document Clustering: String-based, Document Level
- "Unnamed" Relation Extraction, Word Clustering: Stemming & Part-of-Speech, Token Level
- Extraction of Terms, "Named" Relations: Pred-Arg & Head-Mod Structure, Term Level
Systems labelled on this slide: TextToOnto, OntoLearn
Text in Ontology Engineering
- Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
- Text-based Ontology Monitoring: Compare Language Use over Time

OntoLT: Some Background
Ontology Learning from Text
- Taxonomy Extraction, Document Clustering: String-based, Document Level
- "Unnamed" Relation Extraction, Word Clustering: Stemming & Part-of-Speech, Token Level
- Extraction of Terms, "Named" Relations: Pred-Arg & Head-Mod Structure, Term Level
Text in Ontology Engineering
- Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
- Text-based Ontology Monitoring: Compare Language Use over Time
System labelled on this slide: OntoLT

OntoLT
What is it?
OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
How does it work?
1. automatic linguistic annotation
2. automatic statistical preprocessing
3. interactive definition of mapping rules
4. interactive user validation of candidates
5. automatic integration into an ontology

OntoLT: Architecture

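Linguistic Annotation
Example sentence: An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
(English gloss: In 40 knee-joint specimens, mid patellar ligament thirds were fixed femorally with a new bone-block technique in a two-stage drill channel.)
Annotated phrase: mittlere Patellarsehnendrittel (mid patellar ligament third)
Lemmas after decomposition: mittler, patellar, Sehne, Drittel
[The surrounding XML annotation is elided on the slide.]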

Mapping Rules
Precondition Language
- Var (Y, XPath (Y)): get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
- Concat, ConcatList
- Combined through AND, OR, NOT, EQUAL
Operators
- CreateCls: create a new class with super-class
- AddSlot: add a slot with range to a new or existing class
- CreateInst: introduce an instance for a new or existing class
- FillSlot: set the value of a slot of an instance
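The idea of a mapping rule, a precondition over the annotated text paired with an ontology operator, can be sketched as follows. This is not OntoLT's actual rule syntax or XML format, neither of which is shown on the slides: the `ANNOTATION` markup, the `create_cls` and `apply_head_modifier_rule` helpers, and the `Thing` root class are all hypothetical. The precondition is expressed with the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation format, invented for this sketch.
ANNOTATION = """
<sentence>
  <np><Modifier>flat</Modifier><HeadNoun>screen</HeadNoun></np>
  <np><Modifier>Dell</Modifier><HeadNoun>computer</HeadNoun></np>
  <np><HeadNoun>motherboard</HeadNoun></np>
</sentence>
""".strip()


def create_cls(ontology, name, super_cls):
    """CreateCls-like operator: add class `name` under `super_cls` if new."""
    ontology.setdefault(name, super_cls)


def apply_head_modifier_rule(root, ontology):
    """Rule: for every NP with a HeadNoun and a Modifier, create a class for
    the head noun and a sub-class for the modifier + head noun combination."""
    for np in root.findall(".//np"):
        head = np.find("HeadNoun")
        mod = np.find("Modifier")
        if head is not None and mod is not None:  # precondition (AND)
            create_cls(ontology, head.text, "Thing")
            create_cls(ontology, f"{mod.text}_{head.text}", head.text)
    return ontology


ontology = apply_head_modifier_rule(ET.fromstring(ANNOTATION), {})
for cls, parent in ontology.items():
    print(f"{cls} is-a {parent}")
```

The ontology is kept as a plain class-to-superclass dictionary so that the candidates can be inspected before anything is committed, in the spirit of the interactive validation step; the NP without a modifier (motherboard) does not satisfy the precondition and produces no candidate.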


Example Experiment
Ontology Extraction for Neurology
- Neurology Section of a Medical Corpus: Medical Scientific Journal Abstracts (MuchMore Project)
- XML-based Linguistic Annotation: PoS, Lemmatization, Phrases, Pred-Arg Structure
- Statistical Preprocessing (chi-square): Select Domain-Relevant Linguistic Entities
- Definition of Mapping Rules: Define Operators for Selected Linguistic Entities
- Generate & Validate Class/Slot Candidates: Select Candidates for Integration in Neurology Ontology
- Generate "Ontology Fragments" for Neurology
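The chi-square preprocessing step can be illustrated with the standard Pearson statistic on a 2x2 table (term vs. all other tokens, domain corpus vs. reference corpus). The slides do not say how OntoLT computes or thresholds the score, so this is a sketch under assumptions: the `chi_square` helper is hypothetical, the counts are invented, and 3.84 is simply the usual 5% critical value for one degree of freedom.

```python
def chi_square(term_dom, total_dom, term_ref, total_ref):
    """Pearson chi-square for a 2x2 contingency table.

    term_dom / term_ref: occurrences of the term in the domain / reference corpus
    total_dom / total_ref: total token counts of the two corpora
    """
    a, b = term_dom, term_ref                          # term counts
    c, d = total_dom - term_dom, total_ref - term_ref  # all other tokens
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Invented counts: (occurrences in domain corpus, occurrences in reference corpus).
counts = {"neuron": (50, 10), "patient": (10, 10)}
TOTAL_DOM, TOTAL_REF = 100, 100

for term, (dom, ref) in counts.items():
    score = chi_square(dom, TOTAL_DOM, ref, TOTAL_REF)
    # Keep terms that are both significant and over-represented in the domain.
    relevant = score > 3.84 and dom / TOTAL_DOM > ref / TOTAL_REF
    print(f"{term}: chi2={score:.2f} relevant={relevant}")
```

The extra relative-frequency check matters because chi-square is symmetric: a term that is strongly under-represented in the domain corpus would score just as high.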

Further Issues
Future Development
- Organization of Class/Slot Candidate List: Inference & Clustering ("Graph Restructuring")
- Extend Statistical Preprocessing: Multiple Reference Corpora, Extended Frequency Information
- Include Machine Learning Approach: Semi-Automatic Definition of Mapping Rules
Performance Evaluation
- Guidelines: ECAI04 Workshop on OLP
- Benchmark: Challenge within PASCAL NoE

Evaluation: What? Subtasks
- Classes
  (Multilingual) Term Extraction
  Named-Entity Recognition
  Similarity Thesaurus
  Term, Document Clustering
- Class-Hierarchy (Taxonomy)
  Thesaurus Extraction
  Term, Document Clustering
- Class-Properties (Relations)
  Relation Extraction
  ? Formal Properties of Relations (Properties)
- Class-Instances (Individuals)
  (Multilingual) Term Extraction
  Named-Entity Recognition
  Term, Document Classification

Evaluation: How?
By Sub-Task, Evaluation of:
- Classes: Term, NE Extraction, Clustering
- Class-Hierarchy: Thesaurus Extraction
- Class-Properties: Relation Extraction
- Class-Instances: Term, NE Extraction, Classification
By Application, Evaluation of:
- Ontology Learning and Population: Gold Standard
- IR, QA: Precision/Recall Increase with Ontology?
- Interactive QA: Increased User Satisfaction?
- Information Access: Increased User Performance?
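Gold-standard evaluation of a sub-task such as class extraction usually reduces to precision and recall over sets. The sketch below assumes exact string matching between extracted and gold class names, which is a simplification; the `precision_recall` helper and the example class names are invented for illustration.

```python
def precision_recall(extracted, gold):
    """Precision and recall of `extracted` items against a `gold` standard."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: four extracted class candidates, three gold classes.
extracted = {"neuron", "synapse", "patient", "study"}
gold = {"neuron", "synapse", "axon"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```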

Conclusions
Stay Tuned
- OntoLT Release: To be Announced on Protégé-Discussion List
- Evaluation: Ontology Learning & Population (OLP) Challenge within PASCAL NoE, First Task Spring 2005; ECAI04 Workshop: Evaluation of Text-based OLP