Use of FCA in the Ontology Extraction Step for the Improvement of the Semantic Information Retrieval
Peter Butka
TU Košice, Slovakia

Semantic Web Environment and Retrieval Tasks
Improving information retrieval (IR) over an unknown set of text documents:
– Preprocessing of the document set
– Building of the ontology
   – Creating the concept hierarchy
   – Finding relations between concepts
   – Extracting instances (ontology population)
– Using the created ontology and instances to improve IR

[Figure: unknown set of documents → "classic" indexing methods; preprocessing steps → building of ontology; both feed the IR task (ontology-based IR combination)]
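The slides do not specify how the document set is preprocessed. The following is a minimal sketch, assuming regex tokenization, a small hypothetical stopword list, and tf-idf weights max-normalized per document to [0, 1] so that they can serve as fuzzy attribute values for FCA. Stemming is omitted here, although the concept labels shown later (e.g. `provinc`, `negoti`) look stemmed. `STOPWORDS`, `tokenize` and `fuzzy_context` are hypothetical names.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fuzzy_context(docs):
    """Build a document-term matrix with tf-idf weights scaled to [0, 1] per document."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    matrix = []
    for tokens in token_lists:
        tf = Counter(tokens)
        row = [tf[t] * math.log(n / df[t]) for t in vocab]
        top = max(row, default=0.0)
        # Max-normalize so the strongest term of each document gets weight 1.0.
        matrix.append([w / top if top > 0 else 0.0 for w in row])
    return vocab, matrix
```

For example, `fuzzy_context(["India and Pakistan negotiate", "Kashmir India province", "Pakistan Kashmir"])` yields the vocabulary `india, kashmir, negotiate, pakistan, province` and one row of weights per document. Max-normalization (rather than, say, cosine normalization) is chosen only because it guarantees values in [0, 1].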
Use of FCA and labeling of concepts
Formal Concept Analysis (FCA)
– Exploratory method for data analysis
– Concept lattice
   – A concept is a cluster of "similar" objects (similarity is based on the presence of the same attributes)
   – Concepts are hierarchically organized (specific vs. general)
Use of FCA on texts
– Output: a one-sided fuzzy concept lattice
– Clustering via concepts (agglomerative)
– Interpretation
Use of the LabelSOM method to improve the interpretability of concepts (clusters)
Example:
Concept (467): IntSet 467 (|467| = 2) = {6, 59}
Labels (467): indian, provinc, negoti, nehru, kashmir, india, pakistan
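The slides name the techniques but not the algorithms. Below is a minimal sketch, assuming documents are crisp objects and terms are fuzzy attributes with weights in [0, 1]. An extent is then a set of the form {x : R[x][y] ≥ g(y) for every term y}, and its intent assigns each term the minimum weight over the extent. The enumeration here is a naive closure under intersection, exponential in the worst case and suitable only for small contexts. `label` is a simplified, hypothetical adaptation of the LabelSOM idea (use the cluster centroid as the model vector, keep terms above a weight threshold, prefer low quantization error), not necessarily the exact procedure used in this work. All function names are hypothetical.

```python
def extents(R):
    """All closed document sets of a one-sided fuzzy context R (rows: documents, columns: terms).

    Each extent is an intersection of threshold sets {x : R[x][y] >= t},
    so closing that family under intersection yields every extent.
    """
    n, m = len(R), len(R[0])
    generators = {
        frozenset(x for x in range(n) if R[x][y] >= t)
        for y in range(m)
        for t in {R[x][y] for x in range(n)}
    }
    result = {frozenset(range(n))}  # the top concept contains every document
    for g in generators:
        result |= {e & g for e in result}
    return result


def intent(R, extent):
    """Fuzzy set of shared terms: each term's minimum weight over the extent (1.0 if empty)."""
    return tuple(min((R[x][y] for x in extent), default=1.0) for y in range(len(R[0])))


def concepts(R):
    """Concepts (extent, intent), from the most general (largest extent) to the most specific."""
    return sorted(((e, intent(R, e)) for e in extents(R)),
                  key=lambda c: (-len(c[0]), sorted(c[0])))


def label(R, extent, terms, k=5, min_weight=0.0):
    """LabelSOM-style labels: terms with high mean weight and low spread inside the cluster."""
    if not extent:
        return []
    scored = []
    for y, term in enumerate(terms):
        values = [R[x][y] for x in extent]
        mean = sum(values) / len(values)  # centroid used as the cluster's model vector
        if mean > min_weight:
            qerror = sum(abs(mean - v) for v in values)  # quantization error of this term
            scored.append((qerror, -mean, y, term))
    # Lowest quantization error first; ties go to the higher mean weight.
    return [term for *_, term in sorted(scored)[:k]]
```

On the toy context `R = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.5, 1.0]]` with terms `india, pakistan, kashmir`, `concepts(R)` returns six concepts, and `label(R, {2}, terms)` ranks `kashmir` before `pakistan` because both have zero quantization error and `kashmir` has the higher weight.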
FCA and texts
– Interpretability problems
– Time-consuming computation
Solution: problem reduction, e.g. by using clustering algorithms
– Pre-clustering of the document set (e.g. hierarchical SOM)
– Creation of ontology parts from the smaller sets (FCA)
– Merging of the small models into the complete ontology
Possible use of the reduction approach in the ontology creation step

[Figure: starting set of documents → clustering phase (clusters C1, C2, …, Cn) → ontology parts creating phase (ontology parts O1, O2, …, On) → merging phase → final merged ontology]
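The three phases of the reduction approach can be sketched as follows, under loudly stated assumptions. Pre-clustering uses a hypothetical stand-in (each document joins the cluster of its strongest term) in place of a hierarchical SOM. Each ontology part is the one-sided fuzzy concept set of one cluster, computed with the same naive extent enumeration as a plain FCA sketch would use, so the block runs on its own. The merge rule (fuse concepts with identical intents by uniting their extents) is an assumption, since the slides do not describe how the parts are merged. All names are hypothetical.

```python
from collections import defaultdict


def extents(R):
    """Naive enumeration of one-sided fuzzy extents (small contexts only)."""
    n, m = len(R), len(R[0])
    gens = {frozenset(x for x in range(n) if R[x][y] >= t)
            for y in range(m) for t in {R[x][y] for x in range(n)}}
    result = {frozenset(range(n))}
    for g in gens:
        result |= {e & g for e in result}
    return result


def precluster(R):
    """Hypothetical stand-in for hierarchical SOM: group documents by their strongest term."""
    clusters = defaultdict(list)
    for doc, row in enumerate(R):
        clusters[max(range(len(row)), key=row.__getitem__)].append(doc)
    return list(clusters.values())


def ontology_part(R, docs):
    """Run FCA on one cluster only and map local extents back to global document ids."""
    local = [R[d] for d in docs]
    part = []
    for e in extents(local):
        if e:  # skip the empty bottom concept of each part
            ext = frozenset(docs[i] for i in e)
            part.append((ext, tuple(min(R[d][y] for d in ext) for y in range(len(R[0])))))
    return part


def merge(parts):
    """Assumed merge rule: concepts with identical intents are fused by uniting their extents."""
    by_intent = defaultdict(frozenset)
    for part in parts:
        for ext, intent in part:
            by_intent[intent] |= ext
    return sorted(((ext, intent) for intent, ext in by_intent.items()),
                  key=lambda c: sorted(c[0]))


def reduced_ontology(R):
    """Clustering phase, ontology parts creating phase, merging phase."""
    return merge(ontology_part(R, docs) for docs in precluster(R))
```

The gain comes from running FCA on each cluster of roughly n/k documents instead of on all n at once. The price is that an extent closed within its cluster need not be closed in the full context, because a document from another cluster could also carry the same terms, so the merged result approximates the full lattice rather than reproducing it.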