A Domain Ontology Engineering Tool with General Ontologies and Text Corpus
Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Noriaki Izumi, and Takahira Yamaguchi

DODDLE and DODDLE II
- Domain Ontology rapiD DeveLopment Environment
- Builds taxonomic and non-taxonomic relationships
- Uses a dictionary approach and a text corpus (a body of text) to build relationships

DODDLE & DODDLE II
- Large ontologies are difficult to build by hand
- Locates relationships between words based on context similarity, even if the words are separated
- Disadvantages
  - Human interaction is still required
  - Low success rate

DODDLE vs DODDLE II
- DODDLE only works on taxonomic relationships
- DODDLE II
  - Extension of DODDLE
  - Finds non-taxonomic relationships

Outline
- Overview
- Taxonomic Relationships
- Non-Taxonomic Relationships
- Case Studies
- Problems/Future Work
- Conclusion
- Assessment

Overview (diagram labels): Domain Terms, Domain Specific Text Corpus, Concept Extraction Module, NTRL Module, TRA Module

Overview: TRA Module (diagram labels): Matched Result Analysis, Trimmed Result Analysis, Modification using syntactic strategies, Taxonomic Relationship, MRD (WordNet)

Overview: NTRL Module (diagram labels): Extraction of frequent words, WordSpace creation, Extraction of similar concept pairs, Non-Taxonomic Relationship, Concept specification templates, Domain Specific Text Corpus

Overview (diagram labels): Taxonomic Relationship, Non-Taxonomic Relationship, Interaction Module

TRA Module (diagram labels): Matched Result Analysis, Trimmed Result Analysis, Modification using syntactic strategies, Taxonomic Relationship, MRD (WordNet)

TRA
- Matched Result Analysis
  - Constructs PAB and STM
- Trimmed Result Analysis
  - Removes unnecessary nodes
- Modification using syntactic strategies
  - Allows for human input
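The "remove unnecessary nodes" step can be pictured with a small sketch. This is only one simplified reading of trimming, not the tool's actual algorithm: it assumes the taxonomy is given as a child-to-parent map and that every node not in a `keep` set is unnecessary, so each kept node is reattached to its nearest kept ancestor. The function name `trim_taxonomy`, the `parent` map, and the sample hierarchy are all hypothetical.

```python
def trim_taxonomy(parent, keep):
    """Map each kept node to its nearest kept ancestor (None if it has none).

    Assumption: `parent` maps child -> parent (None for the root) and
    every node outside `keep` counts as unnecessary.
    """
    trimmed = {}
    for node in keep:
        ancestor = parent.get(node)
        # Skip over ancestors that are not kept.
        while ancestor is not None and ancestor not in keep:
            ancestor = parent.get(ancestor)
        trimmed[node] = ancestor
    return trimmed


# Hypothetical fragment of a general ontology (child -> parent).
parent = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "goods": "artifact",
    "vehicle": "artifact",
    "car": "vehicle",
}
keep = {"entity", "goods", "car"}
trimmed = trim_taxonomy(parent, keep)
```

In this sketch "goods" and "car" both end up directly under "entity", because the intermediate nodes were not kept. A fuller version would probably also preserve branching nodes so that sibling relationships survive.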

PAB and STM

TRA

NTRL Module (diagram labels): Extraction of frequent words, WordSpace creation, Extraction of similar concept pairs, Non-Taxonomic Relationship, Concept specification templates, Domain Specific Text Corpus

NTRL
- Extraction of key words
  - Primitive: a 4-gram (a sequence of 4 words)
  - Collocation matrix
- $a_{i,j}$ = number of times 4-gram $f_i$ appears before 4-gram $f_j$
  - 4-gram sequence from the slide's figure: … f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …
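A minimal sketch of how such a collocation matrix could be filled in, using the 4-gram sequence shown on the slide. It assumes "before" means "immediately before", which the slide does not state explicitly. The function name `collocation_matrix` and the ordering of the vocabulary are illustrative choices.

```python
def collocation_matrix(sequence, vocabulary):
    """Return a matrix where entry [i][j] counts how often vocabulary[i]
    appears immediately before vocabulary[j] in the sequence."""
    index = {f: k for k, f in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    for left, right in zip(sequence, sequence[1:]):
        matrix[index[left]][index[right]] += 1
    return matrix


# The 4-gram identifiers from the slide.
sequence = "f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5".split()
# Order the vocabulary by the numeric suffix: f1, f2, f3, ...
vocabulary = sorted(set(sequence), key=lambda f: int(f[1:]))
a = collocation_matrix(sequence, vocabulary)

row_f8 = vocabulary.index("f8")
col_f4 = vocabulary.index("f4")
count_f8_before_f4 = a[row_f8][col_f4]
```

Each row of the matrix can then serve as a vector describing the contexts in which one 4-gram occurs.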

NTRL
- WordSpace Creation
  - Context Vectors
  - Word Vectors
    - Sum of context vectors
    - $\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$
      - $\tau(w)$: a vector representation of a word or phrase $w$
      - $\varphi(f)$: a 4-gram vector of a 4-gram $f$
      - $C(w)$: appearance places of a word or phrase $w$
  - WordSpace is the collection of all $\tau(w)$
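The double sum for a word vector can be sketched as follows. Two points are assumptions: "close to i" is taken to mean "within `window` positions of i, including i itself", and the 4-gram vectors φ(f) are supplied as a plain dictionary. The names `word_vector`, `gram_vectors`, and `window`, and the toy data, are hypothetical.

```python
def word_vector(positions, sequence, gram_vectors, window=2):
    """Sum phi(f) over every 4-gram f within `window` positions of each
    appearance place i in C(w)."""
    dim = len(next(iter(gram_vectors.values())))
    total = [0] * dim
    for i in positions:                       # outer sum: i in C(w)
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for f in sequence[lo:hi]:             # inner sum: f close to i
            for k, value in enumerate(gram_vectors[f]):
                total[k] += value
    return total


# Toy data: three 4-grams with 2-dimensional vectors.
sequence = ["f1", "f2", "f3", "f1"]
gram_vectors = {"f1": [1, 0], "f2": [0, 1], "f3": [1, 1]}

# A word appearing at position 1 only.
single = word_vector([1], sequence, gram_vectors, window=1)
# A word appearing at positions 0 and 3.
double = word_vector([0, 3], sequence, gram_vectors, window=1)
```

Words that occur in similar contexts accumulate similar sums, which is what makes the resulting vectors comparable.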

NTRL
- Extraction of Concept Pairs
  - Each input term has a best-matched “synset”
    - Synset: collection of word vectors
  - The sum of the word vectors in the synset is assigned to the concept that corresponds to each input term
  - The inner product is computed for all combinations of concept pairs
  - A match is determined by a user-set threshold
    - Case study: 0.87
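The pair-extraction step might look like the sketch below. It assumes concept vectors are scaled to unit length before the inner product is taken, so that a threshold such as 0.87 is comparable across concepts. The slide does not say whether vectors are normalized, nor whether the comparison is "at least" or "greater than". Function names and the sample vectors are hypothetical.

```python
import math
from itertools import combinations


def concept_vector(word_vectors):
    """Sum a synset's word vectors, then scale to unit length
    (the normalization is an assumption)."""
    summed = [sum(components) for components in zip(*word_vectors)]
    norm = math.sqrt(sum(x * x for x in summed))
    return [x / norm for x in summed] if norm else summed


def similar_concept_pairs(concepts, threshold=0.87):
    """Return (name_a, name_b, similarity) for every pair of concepts
    whose inner product is at least `threshold`."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(concepts.items()), 2):
        similarity = sum(x * y for x, y in zip(vec_a, vec_b))
        if similarity >= threshold:
            pairs.append((name_a, name_b, similarity))
    return pairs


# Made-up word vectors for three input terms.
concepts = {
    "goods": concept_vector([[3, 1], [1, 1]]),
    "contract": concept_vector([[2, 1]]),
    "time": concept_vector([[0, 5]]),
}
pairs = similar_concept_pairs(concepts, threshold=0.87)
pair_names = [(a, b) for a, b, _ in pairs]
```

Lowering the threshold admits more candidate pairs for the user to review, while raising it trades coverage for precision.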

NTRL
- Finding Association Rules
  - Locates rules of the form:
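The slide's own rule form is not shown, so the sketch below uses the standard definitions of support and confidence for a single-term rule x => y. This form is an assumption, as is the choice to treat each sentence as a transaction whose items are the input terms it contains. The default thresholds mirror the support (0.4%) and confidence (80%) values given for the case study. The function name and the sample sentences are hypothetical.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.004, min_confidence=0.8):
    """Return (x, y, support, confidence) for rules x => y that meet both thresholds.

    support(x => y)    = fraction of transactions containing both x and y
    confidence(x => y) = fraction of transactions containing x that also contain y
    """
    total = len(transactions)
    sets = [set(t) for t in transactions]
    terms = sorted(set().union(*sets))
    rules = []
    for x, y in permutations(terms, 2):
        count_x = sum(1 for s in sets if x in s)
        count_xy = sum(1 for s in sets if x in s and y in s)
        support = count_xy / total
        confidence = count_xy / count_x
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Each inner list stands for the input terms found in one sentence (made-up data).
sentences = [
    ["seller", "goods"],
    ["seller", "goods", "buyer"],
    ["buyer", "goods"],
    ["seller"],
]
rules = association_rules(sentences)
```

With these four sentences only "buyer => goods" passes, because every sentence mentioning "buyer" also mentions "goods".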

NTRL
- Constructing Concept Specification Templates
  - Set of similar concept pairs and association rules
  - DODDLE sets priorities between concept pairs
    - Based on the TRA Module and co-occurrence information

Case Study
- Law: “Contract for International Sale of Goods”
- Business: “XML Common Business Library”
  - Support: 0.4%
  - Confidence: 80%

Law Case Study
- Given 46 concepts
- WordSpace: 77 concept pairs
- Association between input terms: 55 pairs of terms
- Templates

Business Case Study
- Input: 57 terms
- WordSpace: 40 pairs
- Association between input terms: 39

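Taxonomic Results (tables; most cell values not recoverable)
- Business: columns Precision, Recall per path, Recall per subtree; rows Matched Result, Trimmed Result
- Law: columns Precision, Recall per path, Recall per subtree; rows Matched Result, Trimmed Result; surviving values: .3, .15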

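Non-taxonomic Results (tables; cell values not recoverable)
- Law: columns WS (WordSpace), AR (association rules), Join of WS and AR; rows # Extracted Concept Pairs, # Accepted Concept Pairs, Precision, Recall
- Business: columns WS, AR, Join of WS and AR; rows # Extracted Concept Pairs, # Accepted Concept Pairs, Precision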

Problems / Future Work
- Threshold
  - Changes with each domain
- Specification of a Concept Relation
  - Relationships still need to be specified
- Ambiguity of Multiple Terminology
  - Example: “transmission”
  - Semantic specialization of multi-definition words is needed
- DODDLE-R
  - Uses RDF tags

Conclusion
- Uses MRD and text corpus
- Two strategies for taxonomic relationships: matched result analysis and trimmed result analysis
- Non-taxonomic relationships: extracted from co-occurrence information in the text corpus
- Concept specification: a way to eliminate concept pairs to build an ontology

Assessment
- Designed to be a tool
- No time results
- Determining thresholds is plug-and-guess

Questions?