

1 A Semantic Text Classification Based on DBpedia
Rujiang Bai, Junhua Liao
Shandong University of Technology Library, Zibo 255049, China
{brj, ljhbrj}@sdut.edu.cn
AST2009

2 OUTLINE
1. BACKGROUND
2. DBpedia
3. OUR PROPOSED METHODS
4. EXPERIMENT
5. CONCLUSION

3 1. BACKGROUND
"Bag of Words" (BOW) vs. "Bag of Concepts" (BOC)
Semantic feature representation
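To make the BOW/BOC contrast concrete, here is a minimal Python sketch. It assumes a tiny hand-made lexicon (`lexicon`, hypothetical, not taken from DBpedia) that maps surface words to concept labels; the function names are also illustrative. Synonymous words collapse into a single BOC feature, while words with no lexicon entry are simply dropped here (one possible design choice; a real system might keep them as additional word features).

```python
from collections import Counter


def bag_of_words(tokens):
    """BOW: one feature per distinct lower-cased word."""
    return Counter(t.lower() for t in tokens)


def bag_of_concepts(tokens, term_to_concept):
    """BOC: one feature per concept; words missing from the lexicon are dropped."""
    return Counter(
        term_to_concept[t.lower()] for t in tokens if t.lower() in term_to_concept
    )


# Hypothetical toy lexicon: two synonyms map to the same concept label.
lexicon = {
    "microbe": "Microorganism",
    "microorganism": "Microorganism",
    "genetics": "Genetics",
}

doc = ["Microbe", "and", "microorganism", "genetics"]
print(bag_of_words(doc))              # 4 word features, each with count 1
print(bag_of_concepts(doc, lexicon))  # 2 concept features: Microorganism x2, Genetics x1
```

In BOW, "Microbe" and "microorganism" are unrelated dimensions; in BOC they reinforce the same feature, which is the motivation for a semantic representation.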

4 2. DBpedia
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

5 3. OUR PROPOSED METHODS
Definition 1 (Core Ontology). A core ontology is a structure $O := (C, <_C)$ consisting of a set $C$, whose elements are called concept identifiers, and a partial order $<_C$ on $C$, called the concept hierarchy or taxonomy.
Definition 2 (Subconcepts and Superconcepts). If $c_1 <_C c_2$ for any $c_1, c_2 \in C$, then $c_1$ is a subconcept (specialization) of $c_2$ and $c_2$ is a superconcept (generalization) of $c_1$. If $c_1 <_C c_2$ and there exists no $c_3 \in C$ with $c_1 <_C c_3 <_C c_2$, then $c_1$ is a direct subconcept of $c_2$, and $c_2$ is a direct superconcept of $c_1$, denoted by $c_1 \prec c_2$.
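Definition 2 can be read directly as a computation. The sketch below assumes the taxonomy is given as a set of (c1, c2) pairs, meaning c1 is a subconcept of c2, and that this set is already transitively closed. The helper names and the toy Science/Biology hierarchy (borrowed from the Yahoo! categories on the dataset slide) are illustrative assumptions, not part of the original method.

```python
def superconcepts(c, less_than):
    """All superconcepts of c: every c2 such that (c, c2) is in the order."""
    return {c2 for (c1, c2) in less_than if c1 == c}


def direct_pairs(less_than):
    """Direct subconcept pairs: (c1, c2) in the order with no c3 strictly between."""
    concepts = {c for pair in less_than for c in pair}
    return {
        (c1, c2)
        for (c1, c2) in less_than
        if not any(
            (c1, c3) in less_than and (c3, c2) in less_than for c3 in concepts
        )
    }


# Hypothetical toy taxonomy, written out transitively closed.
ORDER = {
    ("Genetics", "Biology"),
    ("Microbiology", "Biology"),
    ("Biology", "Science"),
    ("Genetics", "Science"),
    ("Microbiology", "Science"),
}

print(sorted(superconcepts("Genetics", ORDER)))  # ['Biology', 'Science']
print(sorted(direct_pairs(ORDER)))
```

The pair ("Genetics", "Science") is a subconcept relation but not a direct one, because "Biology" lies between the two concepts.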

6 3. OUR PROPOSED METHODS
The candidate expression detection algorithm
Input: document d = {w_1, w_2, …, w_n}, Lex = (SC, RefC) and window size k ≥ 1.
    i ← 1
    list Ls
    index-term s
    while i ≤ n do
        for j = min(k, n - i + 1) down to 1 do
            s ← {w_i, …, w_(i+j-1)}
            if s ∈ SC then
                save s in Ls
                i ← i + j
                break
            else if j = 1 then
                i ← i + j
            end if
        end for
    end while
    return Ls
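A direct Python transcription of the detection loop, offered as a sketch: indices are 0-based, each entry of SC is assumed to be stored as a tuple of tokens, and RefC (the mapping from lexical entries to concepts) is not used by this step, so it is left out. `detect_candidates` and the toy lexicon are hypothetical.

```python
def detect_candidates(words, sc, k):
    """Greedy longest-match detection of lexicon expressions in a token list.

    words: document tokens w_1..w_n (0-based here)
    sc:    set of lexical entries, each a tuple of tokens (stand-in for SC)
    k:     window size, k >= 1
    """
    if k < 1:
        raise ValueError("window size k must be >= 1")
    n = len(words)
    found = []  # plays the role of Ls
    i = 0
    while i < n:
        # Try the longest window first, shrinking down to a single word.
        for j in range(min(k, n - i), 0, -1):
            s = tuple(words[i:i + j])
            if s in sc:
                found.append(s)
                i += j  # skip past the matched expression
                break
            if j == 1:
                i += 1  # no entry starts at this word; move on
    return found


# Hypothetical toy lexicon of single- and multi-word entries.
SC = {("gene",), ("gene", "expression"), ("bio", "informatics"), ("data",)}
words = "the gene expression data in bio informatics".split()
print(detect_candidates(words, SC, 3))
```

Because the window shrinks from k down to 1, the longest matching expression wins: with both ("gene",) and ("gene", "expression") in SC, only the two-word expression is recorded when k ≥ 2.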

7 4. EXPERIMENT

8 Datasets
Our goal is to achieve high classification performance on closely related categories. To test our approach, we therefore built a crawler to collect a data set from the Yahoo! website. The data set contains closely related (ambiguous) categories under Science -> Biology. The categories under Science -> Biology used for training and testing are: Bio-Archaeology, Bio-Informatics, Genetics, Food Science and Microbiology.

9 4. EXPERIMENT
Table 1. Confusion Matrix before Applying Semantic Processing
(rows: actual class; columns 0-4: predicted class)

    Class name          0    1    2    3    4   Total  Accuracy %
0   Bio-archaeology     8    0    0    2    0      10      80.00%
1   Bio-informatics     0    8    1    1    0      10      80.00%
2   Food Science        0    0    7    3    0      10      70.00%
3   Genetics            2    1    2    5    0      10      50.00%
4   Microbiology        2    1    0    0    7      10      70.00%

10 4. EXPERIMENT
Table 2. Confusion Matrix after Applying Semantic Processing
(rows: actual class; columns 0-4: predicted class)

    Class name          0    1    2    3    4   Total  Accuracy %
0   Bio-archaeology     9    0    0    1    0      10      90.00%
1   Bio-informatics     0   10    0    0    0      10     100.00%
2   Food Science        0    0    8    2    0      10      80.00%
3   Genetics            1    0    1    8    0      10      80.00%
4   Microbiology        2    1    0    0    7      10      70.00%
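The Accuracy % column in both tables is the diagonal entry divided by the row total (10 documents per class). The sketch below re-derives those figures from the two matrices and also computes an overall accuracy, which the slides do not report; the variable and function names are illustrative, and rows are taken as the actual class.

```python
CLASSES = ["Bio-archaeology", "Bio-informatics", "Food Science", "Genetics", "Microbiology"]

# Rows: actual class; columns: predicted class (values from Tables 1 and 2).
BEFORE = [
    [8, 0, 0, 2, 0],
    [0, 8, 1, 1, 0],
    [0, 0, 7, 3, 0],
    [2, 1, 2, 5, 0],
    [2, 1, 0, 0, 7],
]
AFTER = [
    [9, 0, 0, 1, 0],
    [0, 10, 0, 0, 0],
    [0, 0, 8, 2, 0],
    [1, 0, 1, 8, 0],
    [2, 1, 0, 0, 7],
]


def per_class_accuracy(matrix):
    """Diagonal entry over row total, i.e. the tables' Accuracy % column."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified documents over all documents."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for label, m in (("before", BEFORE), ("after", AFTER)):
    per_class = {c: f"{a:.0%}" for c, a in zip(CLASSES, per_class_accuracy(m))}
    print(label, per_class, f"overall {overall_accuracy(m):.0%}")
```

For these tables the overall accuracy works out to 35/50 = 70% before and 42/50 = 84% after semantic processing.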

11 4. EXPERIMENT
Fig. 3. Accuracy of semantic representation terms vs. bag of words

12 5. CONCLUSION
In this paper, we have presented a novel approach that uses DBpedia's background knowledge to represent documents, with the aim of boosting text categorization performance. Our experiments show that semantic-level processing and normalization help achieve higher accuracy when classifying documents whose words refer to more than one category.

13 END
THANKS!
