A Semantic Text Classification Based on DBpedia Rujiang Bai, Junhua Liao Shandong University of Technology Library Zibo 255049, China { brj,

A Semantic Text Classification Based on DBpedia
Rujiang Bai, Junhua Liao
Shandong University of Technology Library, Zibo 255049, China { brj,
AST2009

OUTLINE
1. BACKGROUND
2. DBpedia
3. OUR PROPOSED METHODS
4. EXPERIMENT
5. CONCLUSION

1. BACKGROUND "Bag of Words" (BOW) vs. "Bag of Concepts" (BOC): semantic feature representation
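The contrast between the two representations can be illustrated with a minimal sketch. The term-to-concept mapping `TERM_TO_CONCEPT` below is a hypothetical hand-written lexicon, not data taken from DBpedia, and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical term-to-concept lexicon (illustrative only, not from DBpedia).
TERM_TO_CONCEPT = {
    "gene": "Genetics",
    "genes": "Genetics",
    "dna": "Genetics",
    "bacteria": "Microbiology",
    "microbe": "Microbiology",
}


def bag_of_words(tokens):
    """Count every surface token as its own feature."""
    return Counter(tokens)


def bag_of_concepts(tokens, term_to_concept):
    """Count concepts instead of tokens; tokens without a concept are dropped."""
    return Counter(term_to_concept[t] for t in tokens if t in term_to_concept)


tokens = "gene dna bacteria genes microbe sample".split()
bow = bag_of_words(tokens)                       # six distinct word features
boc = bag_of_concepts(tokens, TERM_TO_CONCEPT)   # two concept features
```

In this toy input, different words that refer to the same concept collapse into one feature, which is the effect a concept-based representation is meant to achieve.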

2. DBpedia DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

3. OUR PROPOSED METHODS
Definition 1 (Core Ontology). A core ontology is a structure O := (C, <_C) consisting of a set C, whose elements are called concept identifiers, and a partial order <_C on C, called the concept hierarchy or taxonomy.
Definition 2 (Subconcepts and Superconcepts). If c1 <_C c2 for c1, c2 ∈ C, then c1 is a subconcept (specialization) of c2 and c2 is a superconcept (generalization) of c1. If c1 <_C c2 and there exists no c3 ∈ C with c1 <_C c3 <_C c2, then c1 is a direct subconcept of c2, and c2 is a direct superconcept of c1, denoted by c1 ≺ c2.
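Definitions 1 and 2 can be sketched in code as follows. This is a minimal illustration, assuming the hierarchy is given as acyclic "is-a" links; the class name `CoreOntology`, its methods, and the example concepts are hypothetical and are not part of the presentation.

```python
from typing import Dict, Set


class CoreOntology:
    """Concept identifiers plus a strict partial order built from is-a links."""

    def __init__(self, is_a: Dict[str, Set[str]]):
        # is_a maps a concept to the concepts stated as more general (assumed acyclic).
        self.is_a = is_a
        self.concepts = set(is_a) | {p for ps in is_a.values() for p in ps}

    def superconcepts(self, c: str) -> Set[str]:
        """All c2 with c <_C c2 (transitive closure of the is-a links)."""
        seen: Set[str] = set()
        stack = list(self.is_a.get(c, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.is_a.get(p, ()))
        return seen

    def is_subconcept(self, c1: str, c2: str) -> bool:
        """True if c1 <_C c2."""
        return c2 in self.superconcepts(c1)

    def is_direct_subconcept(self, c1: str, c2: str) -> bool:
        """True if c1 <_C c2 and no c3 satisfies c1 <_C c3 <_C c2."""
        if not self.is_subconcept(c1, c2):
            return False
        return not any(self.is_subconcept(c3, c2) for c3 in self.superconcepts(c1))


# Hypothetical example hierarchy; the Genetics -> Science link is redundant.
onto = CoreOntology({
    "Genetics": {"Biology", "Science"},
    "Biology": {"Science"},
})
```

Here `Genetics` is a subconcept of `Science` but not a direct one, because `Biology` lies between them.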

3. OUR PROPOSED METHODS
The candidate expression detection algorithm
Input: document d = {w1, w2, …, wn}, Lex = (SC, RefC) and window size k ≥ 1.
    i ← 1
    list Ls
    index-term s
    while i ≤ n do
        for j = min(k, n - i + 1) to 1 do
            s ← {wi … wi+j-1}
            if s ∈ SC then
                save s in Ls
                i ← i + j
                break
            else if j = 1 then
                i ← i + j
            end if
        end for
    end while
    return Ls
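The algorithm above can be sketched in Python as a greedy longest-match scan. This is an illustrative reading of the pseudocode, not the authors' implementation: the function name, the representation of SC as a set of word tuples, and the sample lexicon are all assumptions.

```python
from typing import List, Sequence, Set, Tuple


def detect_candidate_expressions(
    words: Sequence[str],
    lexicon: Set[Tuple[str, ...]],
    k: int,
) -> List[Tuple[str, ...]]:
    """Return lexicon expressions found in `words`, preferring the longest match.

    `lexicon` plays the role of SC; `k` is the maximum window size in words.
    """
    if k < 1:
        raise ValueError("window size k must be >= 1")
    found: List[Tuple[str, ...]] = []
    n = len(words)
    i = 0  # 0-based position; the pseudocode uses 1-based indices
    while i < n:
        # Try the widest window first, then shrink down to a single word.
        for j in range(min(k, n - i), 0, -1):
            s = tuple(words[i:i + j])
            if s in lexicon:
                found.append(s)
                i += j  # skip past the matched expression
                break
            elif j == 1:
                i += 1  # nothing matched at this position; move on one word
    return found


# Hypothetical lexicon and document, for illustration only.
sample_lexicon = {("dna",), ("sequence", "alignment"), ("gene", "expression"), ("gene",)}
sample_words = "the dna sequence alignment of gene expression data".split()
candidates = detect_candidate_expressions(sample_words, sample_lexicon, k=3)
```

Because wider windows are tried first, `("gene", "expression")` is chosen over the shorter entry `("gene",)` at the same position.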

4. EXPERIMENT

Datasets
Our goal is to obtain high performance on closely related categories. To test our approach, we therefore built a crawler to collect a data set from the Yahoo! website. The data set contains the closely related (ambiguous) categories under Science->Biology. The categories used for training and testing are: Bio-Archaeology, Bio-Informatics, Genetics, Food Science and Microbiology.

4. EXPERIMENT
Table 1. Confusion Matrix before Applying Semantic Processing
Columns: Class, Name, 0, 1, 2, 3, 4, Total, Accuracy %
Rows: 0 Bio-archaeology, 1 Bio-informatics, 2 Food Science, 3 Genetics, 4 Microbiology

4. EXPERIMENT
Table 2. Confusion Matrix after Applying Semantic Processing
Columns: Class, Name, 0, 1, 2, 3, 4, Total, Accuracy %
Rows: 0 Bio-archaeology, 1 Bio-informatics, 2 Food Science, 3 Genetics, 4 Microbiology
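For readers unfamiliar with the layout of Tables 1 and 2, the sketch below shows how a per-class Accuracy % column is commonly derived from a confusion matrix. It assumes rows are true classes, columns are predicted classes, and accuracy is the diagonal count divided by the row total; the presentation does not state this explicitly. The matrix `example_matrix` is made up purely for illustration and does not reproduce any result from the experiment.

```python
from typing import List


def per_class_accuracy(matrix: List[List[int]]) -> List[float]:
    """Percentage of each row (true class) that falls on the diagonal."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(100.0 * row[i] / total if total else 0.0)
    return result


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Percentage of all documents that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total if total else 0.0


# Made-up 3-class counts, for illustration only (not the paper's data).
example_matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 5, 5],
]
class_acc = per_class_accuracy(example_matrix)
total_acc = overall_accuracy(example_matrix)
```

Off-diagonal cells show which categories are confused with each other, which is what makes the matrix useful for closely related classes.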

4. EXPERIMENT
Fig. 3. Accuracy from Semantic Representation Terms vs. Bag of Words

5. CONCLUSION
In this paper, we have discussed a novel approach that applies DBpedia's background knowledge to represent documents and thereby boost text categorization performance. Our experiments indicate that semantic-level processing and normalization help achieve higher accuracy when classifying documents whose words carry cross-category references.

END. THANKS!