1
A Domain Ontology Engineering Tool with General Ontologies and Text Corpus Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Noriaki Izumi, & Takahira Yamaguchi
2
DODDLE and DODDLE II Domain Ontology rapiD DeveLopment Environment Builds taxonomic and non-taxonomic relationships Uses a dictionary approach and a text corpus (body of text) to build relationships
3
DODDLE & DODDLE II Large ontologies are difficult to build by hand Locates relationships between words based on context similarity, even when the words are separated in the text Disadvantages Human interaction is still required Low success rate
4
DODDLE vs DODDLE II DODDLE only works on taxonomic relationships DODDLE II Extension of DODDLE Finds non-taxonomic relationships
5
Outline Overview Taxonomic Relationships Non-Taxonomic Relationships Case Studies Problems/Future Work Conclusion Assessment
6
Overview Domain Terms Domain Specific Text Corpus Concept Extraction Module NTRL Module TRA Module
7
Overview TRA Module Matched Result Analysis Trimmed Result Analysis Modification using syntactic strategies Taxonomic Relationship MRD (WordNet)
8
Overview NTRL Module Extraction of frequent words WordSpace creation Extraction of similar concept pairs Non-Taxonomic Relationship Concept specification templates Domain Specific Text Corpus
9
Overview Taxonomic Relationship Non-Taxonomic Relationship Interaction Module
10
TRA Module Matched Result Analysis Trimmed Result Analysis Modification using syntactic strategies Taxonomic Relationship MRD (WordNet)
11
TRA Matched Result Analysis Constructs PAB and STM Trimmed Result Analysis Remove unnecessary nodes Modification using statistical strategies Allows for human input
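The "remove unnecessary nodes" step can be pictured with a small sketch. This is a deliberate simplification and not DODDLE's actual procedure: it assumes the hierarchy is a plain parent-to-children dictionary and treats an internal node as unnecessary when it is not a matched domain term and has exactly one child. The function name `trim` and the toy hierarchy are hypothetical.

```python
def trim(tree, root, matched):
    """Return (new_root, trimmed_tree) without unmatched single-child nodes.

    tree: dict mapping a node to the list of its children.
    matched: set of nodes that correspond to input domain terms.
    Assumption: a node is "unnecessary" if it is unmatched and has one child.
    """
    trimmed = {}

    def visit(node):
        # Walk down through chains of unmatched nodes with a single child.
        while node not in matched and len(tree.get(node, [])) == 1:
            node = tree[node][0]
        # Keep this node and attach the (possibly lifted) children to it.
        trimmed[node] = [visit(child) for child in tree.get(node, [])]
        return node

    return visit(root), trimmed


# Hypothetical toy hierarchy; only "goods" and "contract" are domain terms.
tree = {
    "entity": ["object"],
    "object": ["artifact", "document"],
    "artifact": ["instrumentality"],
    "instrumentality": ["goods"],
    "document": ["contract"],
}
new_root, trimmed = trim(tree, "entity", matched={"goods", "contract"})
print(new_root, trimmed)
```

In this toy case the long chains above "goods" and "contract" collapse, while "object" survives because it is the branching point that keeps the two terms related.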
12
PAB and STM
13
TRA
14
NTRL Module Extraction of frequent words WordSpace creation Extraction of similar concept pairs Non-Taxonomic Relationship Concept specification templates Domain Specific Text Corpus
15
NTRL Extraction of key words Primitive: 4 words (4-gram) Collocation matrix: a_{i,j} = number of times f_i appears before f_j Example 4-gram sequence: … f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …
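As a rough illustration of the collocation matrix, the sketch below counts how often one 4-gram identifier directly precedes another in the example sequence from the slide. Reading "before" as "immediately before" is an assumption, and `collocation_counts` is a hypothetical helper name, not part of DODDLE II.

```python
from collections import Counter


def collocation_counts(sequence):
    """Count how often item f_i appears immediately before item f_j."""
    # zip pairs each element with its successor: (s[0], s[1]), (s[1], s[2]), ...
    return Counter(zip(sequence, sequence[1:]))


# The sequence of 4-gram identifiers shown on the slide.
sequence = ["f8", "f4", "f3", "f7", "f8", "f4", "f1", "f3",
            "f4", "f9", "f2", "f5", "f1", "f7", "f1", "f5"]

counts = collocation_counts(sequence)
print(counts[("f8", "f4")])  # f8 directly precedes f4 twice in this sequence
```

A sparse counter is used here instead of a dense matrix because most 4-gram pairs never co-occur; the entry for a pair (f_i, f_j) plays the role of a_{i,j}.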
16
NTRL WordSpace Creation Context Vectors Word Vectors: sum of context vectors
$\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$
where $\tau(w)$ is the vector representation of a word or phrase $w$, $\varphi(f)$ is the 4-gram vector of a 4-gram $f$, and $C(w)$ is the set of appearance places of $w$ WordSpace is a collection of the vectors $\tau(w)$
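The double sum in the word-vector formula can be sketched as follows: for each place where the word appears, add up the vectors of the 4-grams located near that place. The meaning of "close to" is not specified on the slide, so the fixed `window` parameter is an assumption, as are the function name `word_vector` and the toy data.

```python
def word_vector(occurrences, gram_positions, gram_vectors, window=1):
    """Sum the vectors of all 4-grams within `window` positions of each occurrence.

    occurrences: positions where the word or phrase w appears.
    gram_positions: list of (position, gram_id) pairs found in the corpus.
    gram_vectors: dict mapping gram_id to its 4-gram vector (list of floats).
    """
    dim = len(next(iter(gram_vectors.values())))
    total = [0.0] * dim
    for i in occurrences:                      # outer sum over appearance places
        for pos, gram in gram_positions:       # inner sum over nearby 4-grams
            if abs(pos - i) <= window:
                total = [t + v for t, v in zip(total, gram_vectors[gram])]
    return total


# Hypothetical toy data: three 4-grams with 2-dimensional vectors.
gram_vectors = {"f1": [1.0, 0.0], "f2": [0.0, 2.0], "f3": [1.0, 1.0]}
gram_positions = [(0, "f1"), (1, "f2"), (5, "f3")]
word_vec = word_vector(occurrences=[1, 5],
                       gram_positions=gram_positions,
                       gram_vectors=gram_vectors)
print(word_vec)
```

Collecting one such vector per word or phrase gives the WordSpace.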
17
NTRL Extraction of Concept Pairs Each input term has a best-matched “synset” Synset: collection of word vectors The sum of the word vectors is assigned to the concept that corresponds to each input term Inner product is computed for all combinations of concept pairs A match is determined by a user-set threshold Case study threshold: 0.87
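A minimal sketch of the pair-extraction step: take the inner product of every pair of concept vectors and keep the pairs that reach the threshold. Normalizing the vectors to unit length first is an assumption made here so that a threshold such as 0.87 is meaningful; the concept names and vectors are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def similar_pairs(concept_vectors, threshold):
    """Return (a, b, score) for every concept pair whose inner product >= threshold."""
    unit = {name: normalize(vec) for name, vec in concept_vectors.items()}
    pairs = []
    for a, b in combinations(sorted(unit), 2):
        score = sum(x * y for x, y in zip(unit[a], unit[b]))
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical concept vectors.
concept_vectors = {
    "goods": [3.0, 1.0, 0.0],
    "delivery": [2.0, 1.0, 0.0],
    "court": [0.0, 0.0, 1.0],
}
pairs = similar_pairs(concept_vectors, threshold=0.87)
print(pairs)
```

Lowering the threshold admits more candidate pairs for the user to review, which is why the value has to be tuned per domain.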
18
NTRL Finding Association Rules Locates Rules of the form:
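The slide does not reproduce the rule form, so the sketch below falls back on the standard support and confidence definitions from association rule mining, restricted to rules between single terms. Treating each sentence as one transaction (a set of input terms) is an assumption, and the terms and thresholds are hypothetical.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for single-term rules x => y.

    support    = fraction of transactions containing both x and y
    confidence = fraction of transactions containing x that also contain y
    """
    n = len(transactions)
    terms = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(terms, 2):
        both = sum(1 for t in transactions if x in t and y in t)
        has_x = sum(1 for t in transactions if x in t)
        support = both / n
        confidence = both / has_x if has_x else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical transactions: the input terms found in each sentence.
transactions = [
    {"buyer", "seller", "goods"},
    {"buyer", "goods"},
    {"seller", "price"},
    {"buyer", "goods", "price"},
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.8)
for x, y, support, confidence in rules:
    print(f"{x} => {y}  support={support:.2f}  confidence={confidence:.2f}")
```

The brute-force enumeration is only suitable for a small term list; a real corpus would call for an Apriori-style search.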
19
NTRL Constructing Concept Specification Templates Set of Similar concept pairs and association rules DODDLE sets priorities between concept pairs Based on TRA Module and Co-occurrence information
20
Case Study Law: “Contract for International Sale of Goods” Business: “XML Common Business Library” Support: 0.4% Confidence: 80%
21
Law Case Study Given 46 concepts WordSpace: 77 concept pairs Association between input terms: 55 pairs of terms Templates
22
Business Case Study Input: 57 terms WordSpace: 40 pairs Association between input terms: 39 pairs
23
Taxonomic Results
Business          Precision   Recall per path   Recall per subtree
Matched Result    0.2         0.29              0.71
Trimmed Result    0.22        0.13              0.5
Law               Precision   Recall per path   Recall per subtree
Matched Result    0.25        0.23              0.19
Trimmed Result    0.3         0.15
24
Non-taxonomic Results (WS = WordSpace, AR = association rules)
Law                          WS     AR     Join of WS and AR
# Extracted Concept Pairs    77     55     117
# Accepted Concept Pairs     18     13     27
Precision                    0.23   0.24   0.23
Recall                       0.38   0.27   0.56
Business                     WS     AR     Join of WS and AR
# Extracted Concept Pairs    40     39     66
# Accepted Concept Pairs     30     20     39
Precision                    0.75   0.51   0.59
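The precision figures on this slide follow directly from the pair counts, as accepted pairs divided by extracted pairs. The short check below recomputes them from the numbers on the slide; recall is left out because it depends on the total number of correct pairs, which the slide does not give.

```python
def precision(extracted, accepted):
    """Precision = accepted concept pairs / extracted concept pairs."""
    return accepted / extracted


# (extracted, accepted) counts taken from the slide.
results = {
    "Law": {"WS": (77, 18), "AR": (55, 13), "Join": (117, 27)},
    "Business": {"WS": (40, 30), "AR": (39, 20), "Join": (66, 39)},
}

computed = {
    domain: {method: round(precision(*counts), 2) for method, counts in rows.items()}
    for domain, rows in results.items()
}
print(computed)
```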
25
Problems/Future Work Threshold Changes with each domain Specification of a Concept Relation Still need to specify relationships Ambiguity of Multiple Terminology “transmission” Semantic specialization of multi-definition words needed DODDLE-R Uses RDF tags
26
Conclusion Uses MRD and text corpus Two strategies for taxonomic: matched result analysis and trimmed result analysis Non-Taxonomic: extracted by co-occurrence information in text corpus Concept Specification: a way to eliminate concept pairs to build an ontology
27
Assessment Designed to be a tool No time results Determining thresholds is plug-and-guess.
28
Questions ?