

A Domain Ontology Engineering Tool with General Ontologies and Text Corpus
Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Naoki Izumi, & Takahira Yamaguchi




1 A Domain Ontology Engineering Tool with General Ontologies and Text Corpus
Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Naoki Izumi, & Takahira Yamaguchi

2 DODDLE and DODDLE II
• DODDLE: Domain Ontology rapiD DeveLopment Environment
• Builds taxonomic and non-taxonomic relationships
• Uses a dictionary approach and a text corpus (body) to build relationships

3 DODDLE & DODDLE II
• Large ontologies are difficult to build by hand
• Locates relationships between words based on context similarity, even when the words do not appear next to each other
• Disadvantages
  ◦ Human interaction is still required
  ◦ Low success rate

4 DODDLE vs. DODDLE II
• DODDLE only handles taxonomic relationships
• DODDLE II
  ◦ Extension of DODDLE
  ◦ Also finds non-taxonomic relationships

5 Outline
• Overview
• Taxonomic Relationships
• Non-Taxonomic Relationships
• Case Studies
• Problems/Future Work
• Conclusion
• Assessment

6 Overview
[Diagram: Domain Terms; Domain-Specific Text Corpus; Concept Extraction Module; NTRL Module; TRA Module]

7 Overview
[Diagram: TRA Module. Matched Result Analysis; Trimmed Result Analysis; Modification using syntactic strategies; MRD (WordNet); Taxonomic Relationship]

8 Overview
[Diagram: NTRL Module. Extraction of frequent words; WordSpace creation; Extraction of similar concept pairs; Concept specification templates; Domain-Specific Text Corpus; Non-Taxonomic Relationship]

9 Overview
[Diagram: Taxonomic Relationship; Non-Taxonomic Relationship; Interaction Module]

10 TRA Module
[Diagram: Matched Result Analysis; Trimmed Result Analysis; Modification using syntactic strategies; MRD (WordNet); Taxonomic Relationship]

11 TRA
• Matched Result Analysis
  ◦ Constructs the PAB and STM
• Trimmed Result Analysis
  ◦ Removes unnecessary nodes
• Modification using syntactic strategies
  ◦ Allows for human input
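The trimming step can be pictured with a small tree operation. This is a minimal sketch under several assumptions. The matched result is taken to be a hypernym tree stored as a child-to-parent map. "Unnecessary" nodes are taken to be those not matched to any input term. Every kept node is reattached to its nearest kept ancestor. The function `trim_tree` and the toy tree are hypothetical. They are not DODDLE's code or WordNet data, and the sketch does not cover PAB/STM construction.

```python
from typing import Dict, Optional, Set

def trim_tree(parent: Dict[str, Optional[str]], keep: Set[str]) -> Dict[str, Optional[str]]:
    """Drop nodes not in `keep`; reattach each kept node to its nearest kept ancestor.

    `parent` maps every node of a (hypothetical) matched hypernym tree to its
    parent, with None for the root. The result is a child-to-parent map over
    the kept nodes only.
    """
    trimmed: Dict[str, Optional[str]] = {}
    for node, ancestor in parent.items():
        if node not in keep:
            continue  # an "unnecessary" node: not carried into the trimmed tree
        # Climb past removed nodes until a kept ancestor (or the top) is reached.
        while ancestor is not None and ancestor not in keep:
            ancestor = parent[ancestor]
        trimmed[node] = ancestor
    return trimmed

# Toy hypernym paths (hypothetical, not taken from WordNet).
tree = {
    "entity": None,
    "abstraction": "entity",
    "relation": "abstraction",
    "contract": "relation",
    "person": "entity",
    "seller": "person",
}
print(trim_tree(tree, {"entity", "contract", "seller"}))
# {'entity': None, 'contract': 'entity', 'seller': 'entity'}
```

After trimming, "contract" and "seller" sit directly under "entity". The intermediate nodes no longer appear, but the ancestor order along each path is preserved.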

12 PAB and STM

13 TRA

14 NTRL Module
[Diagram: Extraction of frequent words; WordSpace creation; Extraction of similar concept pairs; Concept specification templates; Domain-Specific Text Corpus; Non-Taxonomic Relationship]

15 NTRL  Extraction of key words Primitive: 4 words Collocation matrix  a i,j = f i before f j …f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 … …f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …

16 NTRL
• WordSpace creation
  ◦ Context vectors
  ◦ Word vectors: sum of context vectors

$$\tau(w) = \sum_{i \in C(w)} \Big( \sum_{f \text{ close to } i} \varphi(f) \Big)$$

  where $\tau(w)$ is the vector representation of a word or phrase $w$, $\varphi(f)$ is the 4-gram vector of a 4-gram $f$, and $C(w)$ is the set of appearance places of $w$
• WordSpace is the collection of the vectors $\tau(w)$
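The formula for $\tau(w)$ translates into two nested sums. This is a sketch under explicitly hypothetical choices. First, $\varphi(f)$ is taken as row $f$ of the collocation matrix. Second, "close to $i$" is taken to mean within `window` positions of place $i$, including $i$ itself. Third, $C(w)$ is given as positions in the 4-gram sequence. The slides do not fix any of these details, and the functions `phi`, `context_vector` and `word_vector` are illustrative names only.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[float]

def phi(matrix: Matrix, f: int) -> Vector:
    """4-gram vector of 4-gram f (0-based ID).

    ASSUMPTION: taken as row f of the collocation matrix, i.e. the counts of
    4-grams that follow f.
    """
    return [float(x) for x in matrix[f]]

def context_vector(matrix: Matrix, seq: List[int], i: int, window: int) -> Vector:
    """Sum of phi(f) over the 4-grams f "close to" position i.

    ASSUMPTION: "close to" means within `window` positions on either side,
    including position i itself.
    """
    total = [0.0] * len(matrix)
    for pos in range(max(0, i - window), min(len(seq), i + window + 1)):
        total = [t + x for t, x in zip(total, phi(matrix, seq[pos]))]
    return total

def word_vector(matrix: Matrix, seq: List[int], places: List[int], window: int) -> Vector:
    """tau(w): sum of the context vectors at every appearance place i in C(w)."""
    tau = [0.0] * len(matrix)
    for i in places:
        tau = [t + x for t, x in zip(tau, context_vector(matrix, seq, i, window))]
    return tau

# Toy 3x3 collocation matrix and 4-gram sequence (hypothetical data).
M = [[0, 1, 0],
     [1, 0, 2],
     [0, 1, 0]]
seq = [0, 1, 2, 1, 0]
print(word_vector(M, seq, places=[1, 3], window=1))  # [2.0, 4.0, 4.0]
```

Each appearance place contributes one context vector, and $\tau(w)$ adds them up. Words used in similar surroundings therefore end up with similar vectors.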

17 NTRL  Extraction of Concept Pairs Each input has a best-matched “synset”  Synset: collection of word vectors Sum of the word vectors set to a concept which corresponds with each input term Inner product of all combinations of concept pairs Match is determined by user set threshold  Case Study:.87

18 NTRL  Finding Association Rules Locates Rules of the form:

19 NTRL  Constructing Concept Specification Templates Set of Similar concept pairs and association rules DODDLE sets priorities between concept pairs  Based on TRA Module and Co-occurrence information

20 Case Study  Law-“Contract for International Sale of Goods”  Business -“XML Common Business Library” Support: 0.4 % Confidence: 80%

21 Law Case Study  Given 46 Concepts  WordSpace: 77 concept pairs  Association between input terms: 55 pairs or terms  Templates

22 Business Case Study  Input: 57 terms  Wordspace: 40 pairs  Association between input terms: 39

23 Taxonomic Results
Business
                  Precision   Recall per path   Recall per subtree
  Matched result  0.2         0.29              0.71
  Trimmed result  0.22        0.13              0.5
Law
                  Precision   Recall per path   Recall per subtree
  Matched result  0.25        0.23              0.19
  Trimmed result  0.3         0.15

24 Non-Taxonomic Results (WS = WordSpace, AR = association rules)
Law
                              WS     AR     Join of WS and AR
  # Extracted concept pairs   77     55     117
  # Accepted concept pairs    18     13     27
  Precision                   0.23   0.24   0.23
  Recall                      0.38   0.27   0.56
Business
                              WS     AR     Join of WS and AR
  # Extracted concept pairs   40     39     66
  # Accepted concept pairs    30     20     39
  Precision                   0.75   0.51   0.59
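The non-taxonomic counts can be cross-checked against each other. Every precision value equals accepted pairs divided by extracted pairs. The "Join" column behaves like the set union of the WS and AR results, so the pairs found by both methods can be derived as WS + AR - Join. These overlap figures are derived here and are not reported on the slides. Likewise, the Law recall values are consistent with a reference set of 48 correct pairs (18/48 ≈ 0.38, 13/48 ≈ 0.27, 27/48 ≈ 0.56), but the slides do not state that number.

```python
# Extracted and accepted concept-pair counts copied from the non-taxonomic results table.
results = {
    "Law": {"WS": (77, 18), "AR": (55, 13), "Join": (117, 27)},
    "Business": {"WS": (40, 30), "AR": (39, 20), "Join": (66, 39)},
}

def precision(extracted: int, accepted: int) -> float:
    """Share of extracted concept pairs that were accepted."""
    return accepted / extracted

for domain, cols in results.items():
    for method, (extracted, accepted) in cols.items():
        print(domain, method, "precision", round(precision(extracted, accepted), 2))
    # Derived, not reported on the slides: pairs proposed by both WS and AR,
    # assuming "Join" is the set union of the two methods' pairs.
    both_extracted = cols["WS"][0] + cols["AR"][0] - cols["Join"][0]
    both_accepted = cols["WS"][1] + cols["AR"][1] - cols["Join"][1]
    print(domain, "found by both: extracted", both_extracted, "accepted", both_accepted)
```

For Law, this gives 15 extracted pairs found by both methods, 4 of them accepted. For Business, it gives 13, of which 11 were accepted.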

25 Problems/Future Work
• Threshold changes with each domain
• Specification of concept relations
  ◦ Relationships still need to be specified
• Ambiguity of terms with multiple meanings
  ◦ e.g., “transmission”
  ◦ Semantic specialization of words with multiple definitions is needed
• DODDLE-R
  ◦ Uses RDF tags

26 Conclusion  Uses MRD and text corpus  Two strategies for taxonomic: matched result analysis and trimmed result analysis  Non-Taxonomic: extracted by co- occurrence information in text corpus  Concept Specification: a way to eliminate concept pairs to build an ontology

27 Assessment  Designed to be a tool  No time results  Determining thresholds is plug-and- guess.

28 Questions?




