Knowledge Discovery in Ontology Learning: A Survey

1 Knowledge Discovery in Ontology Learning: A Survey

2 Outline
Introduction
OL Data Input
OL Application Fields
OL Methods
OL Tools (practical session)

3 Introduction
Ontology engineering is a time-consuming task.
Ontology Learning (OL) is the semi-automatic process that supports ontology engineering.
OL is a bottom-up, data-driven process.
OL is an interdisciplinary field.

4 OL Data Input
Pure NL text
Ontologies
KB (DB) instances
Schemata
–DB schemata
–Web schemata
Log files

5 OL Application Fields
OL can support ontology engineering (and management) in different phases:
–Ontology extraction: based on some input, the ontology engineer gets an ontology proposal.
–Ontology reuse: pruning existing domain ontologies for a specific application.
–Ontology interoperability (multiple ontology management): mapping discovery.

6 OL Methods (outline)
Ontology Extraction (from text)
–Weak ontology notion: Document Ontology extraction
–Strong ontology notion: Association rules, Conceptual clustering
Ontology Reuse
–Ontology Pruning
Ontology Learning for interoperability

7 Document Ontology extraction (1)
Extraction of concepts from a set of documents and identification of the relationships between these concepts and different individual terms [3]
No extraction of semantic relations
Only concepts are extracted (a concept aggregates the terms identified with it)
Uses statistical analysis over a set of documents
Good for domain-specific applications

8 Document Ontology extraction (2)
Input (text documents) → Pre-processing → Normalization → LSI (using SVD) → Document Ontology Construction

9 Document Ontology extraction (3)
Singular Value Decomposition: $A_{m \times n} = U_{m \times r} \, S_{r \times r} \, V^T_{r \times n}$
A is the terms × documents matrix; U is the terms × concepts matrix.
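
The decomposition on this slide can be tried out on a toy matrix. The following is a minimal sketch, assuming NumPy is available; the term list, the counts in `A`, the rank `r=2`, and the helper name `truncated_svd` are all made up for illustration and are not taken from [3].

```python
import numpy as np

# Toy term-document matrix A (m terms x n documents); the counts are invented.
terms = ["ontology", "concept", "pan", "oven"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

def truncated_svd(matrix, r):
    """Return the rank-r factors U (m x r), S (r x r) and Vt (r x n)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :r], np.diag(s[:r]), Vt[:r, :]

U, S, Vt = truncated_svd(A, r=2)
A_approx = U @ S @ Vt  # rank-2 approximation of A

# Assign each term to the latent concept on which it loads most strongly.
term_concept = {t: int(np.argmax(np.abs(U[i]))) for i, t in enumerate(terms)}
print(term_concept)
```

In this toy example the two cooking terms end up on one latent concept and the two ontology terms on the other, which is the kind of term aggregation the "weak" ontology notion refers to.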

10 Association Rules (1)
Makes use of shallow text processing techniques [6]
No taxonomic relations
Assumption: syntactic relations ⇒ semantic relations

11 Association Rules (2)
Preprocess the text documents
–Morphological analysis
–Recognition of named entities
–Retrieval of domain-specific concepts (if available)
–Disambiguation using context information
Determine the set of concept pairs (CP) using several heuristics (either general or domain-dependent)
–NP-PP heuristic
–Sentence heuristic
–Title heuristic

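12 Association Rules (3)
Determine $T = \{\{a_{i,1}, \ldots, a_{i,n}\} \mid (a_{i,1}, a_{i,2}) \in CP \wedge \forall m > 2 \; ((a_{i,1}, a_{i,m}) \in H \vee (a_{i,2}, a_{i,m}) \in H)\}$
Determine support and confidence for all association rules $X_k \Rightarrow Y_k$, where $|X_k| = |Y_k| = 1$
Propose to the user only the rules that exceed the user-defined thresholds
$\mathrm{support}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{n}$, where n in this formula is the number of transactions $t_i$ in T
$\mathrm{confidence}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{|\{t_i \mid X_k \subseteq t_i\}|}$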
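
Support and confidence for single-item rules can be computed directly from these definitions. The sketch below is illustrative only: the transactions, the threshold values, and the function names are assumptions, and the transactions are given as plain sets rather than derived from CP and H.

```python
def support(transactions, x, y):
    """Fraction of all transactions that contain both x and y."""
    both = sum(1 for t in transactions if {x, y} <= t)
    return both / len(transactions)

def confidence(transactions, x, y):
    """Among the transactions containing x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

def propose_rules(transactions, min_support, min_confidence):
    """Return the rules x => y whose support and confidence exceed both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x in items:
        for y in items:
            if x == y:
                continue
            s = support(transactions, x, y)
            c = confidence(transactions, x, y)
            if s > min_support and c > min_confidence:
                rules.append((x, y, s, c))
    return rules

# Invented transactions over concept names.
transactions = [
    {"hotel", "accommodation", "area"},
    {"hotel", "accommodation", "city"},
    {"restaurant", "area"},
    {"hotel", "city"},
]

for rule in propose_rules(transactions, min_support=0.4, min_confidence=0.6):
    print(rule)
```

Note that support is symmetric in x and y while confidence is not, so a pair of concepts can yield a rule in one direction only.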

13 Conceptual Clustering (1)
A conceptual clustering approach [2,5] is used to extract a hierarchy of concepts and to learn subcategorization frames
Here, the examples to cluster are sets of words, each associated with the frequency of the corresponding instantiated frame in the corpora
A syntactic parser provides parsed sentences, including the attachments of noun phrases to verbs and clauses
Unambiguously parsed sentences are not required; noise is taken into account
The meaning of the concepts in the ontology is characterized by the subcategorization frames in which they appear

14 Conceptual Clustering (2) E.g.:

15 Conceptual Clustering (3)
C1: to cook in: oven (4), stew pan (12), frying pan (2)
C2: to put in: oven (5), stew pan (3), wok (6), pan (2)
Clusters with maximum overlap (that is, clusters that contain the same words with the same frequencies) have to be merged.
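
To make the merging idea concrete, the sketch below scores the overlap of the two example clusters and merges them. Two things here are assumptions: the assignment of words to C1 and C2 is one reading of the slide, and the overlap score (a frequency-weighted Jaccard ratio) is a stand-in chosen for illustration, since the slide does not specify the measure used in [2].

```python
from collections import Counter

# Head words with their frequencies, as read from the slide.
c1 = Counter({"oven": 4, "stew pan": 12, "frying pan": 2})      # "to cook in"
c2 = Counter({"oven": 5, "stew pan": 3, "wok": 6, "pan": 2})    # "to put in"

def overlap(a, b):
    """Weighted Jaccard: shared frequency mass divided by total frequency mass."""
    shared = sum((a & b).values())  # per-word minimum of the two counts
    total = sum((a | b).values())   # per-word maximum of the two counts
    return shared / total if total else 0.0

def merge(a, b):
    """Merge two clusters by summing the frequencies of their words."""
    return a + b

print(overlap(c1, c2))
print(merge(c1, c2))
```

The score is 1.0 only when both clusters contain the same words with the same frequencies, which matches the slide's description of maximum overlap.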

16 Ontology Pruning
Ontology pruning is a data-driven means of reusing existing (general) ontologies by tailoring them to a certain domain [4]
The approach uses data-oriented techniques based on word/concept frequencies
The idea is to compare the frequencies of words/concepts in two different corpora, one domain-specific and one generic
Words/concepts whose frequency in the domain-specific corpus exceeds their frequency in the generic corpus by a certain percentage are accepted; the others are rejected
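
The frequency comparison can be sketched in a few lines. This is only an illustration of the idea, not the procedure of [4]: the use of relative frequencies, the `margin` parameter (the "certain percentage"), the token lists, and the function names are all assumptions.

```python
from collections import Counter

def relative_freq(tokens):
    """Map each token to its relative frequency in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def prune(concepts, domain_tokens, generic_tokens, margin=0.5):
    """Keep concepts whose domain frequency exceeds the generic one by `margin` (0.5 = 50%)."""
    dom = relative_freq(domain_tokens)
    gen = relative_freq(generic_tokens)
    kept = []
    for c in concepts:
        if dom.get(c, 0.0) > gen.get(c, 0.0) * (1 + margin):
            kept.append(c)
    return kept

# Invented mini-corpora, already tokenized and mapped to concept names.
domain_tokens = "hotel room hotel booking room price".split()
generic_tokens = "price car hotel price tree house".split()
concepts = ["hotel", "room", "price", "tree"]

print(prune(concepts, domain_tokens, generic_tokens))
```

Here "hotel" and "room" are kept, while "price" (more frequent in the generic corpus) and "tree" (absent from the domain corpus) are rejected.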

17 OL for Interoperability (1)
The key challenge here is to find semantic mappings between similar elements of two ontologies [1]
First problem: how can we define a meaningful similarity measure?
Second problem: how can we compute such a measure using the available data?
A working assumption is that instances are available from which the concepts can be learned

18 OL for Interoperability (2)
Similarity measure
–Many definitions are possible (the choice is task-dependent)
–Many similarity measures are based on the joint probability distribution: P(A, B), P(¬A, B), P(A, ¬B), P(¬A, ¬B)
–Jaccard coefficient: $JC(A, B) = \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A, B)}{P(A, B) + P(\neg A, B) + P(A, \neg B)}$
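
As a small worked example, the Jaccard coefficient can be computed from three entries of the joint distribution. The function name and the probability values below are illustrative assumptions.

```python
import math

def jaccard(p_ab, p_nota_b, p_a_notb):
    """JC(A, B) = P(A, B) / (P(A, B) + P(not A, B) + P(A, not B))."""
    denom = p_ab + p_nota_b + p_a_notb
    return p_ab / denom if denom > 0 else 0.0

# Invented joint distribution: P(A,B)=0.2, P(not A,B)=0.1, P(A,not B)=0.1.
score = jaccard(0.2, 0.1, 0.1)
print(score)
assert math.isclose(score, 0.5)
```

The coefficient is 0 when A and B are disjoint and 1 when they coincide; P(¬A, ¬B) does not enter the formula.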

19 OL for Interoperability (3)
Distribution estimator
–We assume a set of instances that is representative of the universe covered by the ontology
–$N(U_i^{A,B})$ is the number of instances of the i-th ontology that belong to both A and B
–$P(A, B) = \frac{N(U_1^{A,B}) + N(U_2^{A,B})}{N(U_1) + N(U_2)}$
–Problem: what if A and B do not belong to the same ontology? (This is exactly our case!)
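
The estimator amounts to counting. The sketch below assumes every instance already carries both membership labels (in practice one of the two has to be predicted, which is the problem the slide raises); the label pairs and the function name are invented for illustration.

```python
def estimate_joint(u1, u2):
    """Estimate P(A, B) from two instance sets.

    u1, u2: lists of (in_a, in_b) boolean pairs, one pair per instance
    of the first and second ontology respectively.
    """
    n_ab = sum(1 for a, b in u1 if a and b) + sum(1 for a, b in u2 if a and b)
    total = len(u1) + len(u2)
    return n_ab / total if total else 0.0

# Invented labels: 7 instances for ontology 1, 6 for ontology 2.
u1 = [(True, True), (True, False), (True, True), (True, False),
      (False, False), (False, True), (False, False)]
u2 = [(True, True), (False, True), (True, True), (False, True),
      (True, False), (False, False)]

print(estimate_joint(u1, u2))  # 4 instances in both A and B, out of 13
```

The other three entries of the joint distribution can be estimated the same way by changing the condition on the label pair.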

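20 OL for Interoperability (4)
[Figure, after [1]: the first taxonomy (nodes R, A, C, D, E, F) has instances t1 to t7, split into $U_1^{A}$ = {t1, t2, t3, t4} and $U_1^{\neg A}$ = {t5, t6, t7}, which are used to train a learner L. The second taxonomy (nodes G, B, H, I, J) has instances s1 to s6, split into $U_2^{B}$ and $U_2^{\neg B}$. Applying the trained learner L to the instances of the second taxonomy partitions them into $U_2^{A,B}$, $U_2^{A,\neg B}$, $U_2^{\neg A,B}$ and $U_2^{\neg A,\neg B}$; in the example the resulting groups are {s1, s3}, {s2, s4}, {s5} and {s6}.]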

21 OL Tools (KAON)
http://kaon.semanticweb.org
Open source, Java-based
Implements a modular framework
Text2Onto: module for OL from text (association rules, see Association Rules (1))
Ontology pruning is implemented (simple filter on TF)

22 References
[1] A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to map between ontologies on the Semantic Web. In Proc. of the 11th International World Wide Web Conference (WWW 2002), Hawaii, USA, May 2002.
[2] D. Faure, C. Nedellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In 1st International Conference on Language Resources and Evaluation, Workshop on Adapting lexical and corpus resources to sublanguages and applications, Granada, Spain, pages 1-8, 1998.
[3] G. R. Maddi, C. S. Velvadapu, S. Srivastava, J. Gil de Lamadrid. Ontology Extraction from text documents by Singular Value Decomposition.
[4] A. Maedche, R. Volz, R. Studer, B. Lauser. Pruning-based identification of a domain in ontologies. In Proc. of I-KNOW'03, Graz, Austria, July 2003.
[5] A. Maedche, V. Zacharias. Ontology-based Instance Clustering. In Proc. of ECML/PKDD. Springer, 2002.
[6] A. Maedche, S. Staab. Discovering Conceptual Relations from Text. In Proc. of ECAI-2000.

