System for Semi-automatic ontology construction

Slides:



Advertisements
Similar presentations
Background Knowledge for Ontology Construction Blaž Fortuna, Marko Grobelnik, Dunja Mladenić, Institute Jožef Stefan, Slovenia.
Advertisements

Data Mining Tools Overview Business Intelligence for Managers.
1 Relational Data Mining Applied to Virtual Engineering of Product Designs Monika Žáková 1, Filip Železný 1, Javier A. Garcia-Sedano 2, Cyril Masia Tissot.
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
Landmark Classification in Large- scale Image Collections Yunpeng Li David J. Crandall Daniel P. Huttenlocher ICCV 2009.
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Interfaces for Selecting and Understanding Collections.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Multi-view Exploratory Learning for AKBC Problems Bhavana Dalvi and William W. Cohen School Of Computer Science, Carnegie Mellon University Motivation.
Thien Anh Dinh1, Tomi Silander1, Bolan Su1, Tianxia Gong
Blaz Fortuna, Marko Grobelnik, Dunja Mladenic Jozef Stefan Institute ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR.
The use of machine translation tools for cross-lingual text-mining Blaz Fortuna Jozef Stefan Institute, Ljubljana John Shawe-Taylor Southampton University.
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Data mining and machine learning A brief introduction.
Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman Presented by Steve Hookway 10/20/05.
Funded by: European Commission – 6th Framework Project Reference: IST WP 2: Learning Web-service Domain Ontologies Miha Grčar Jožef Stefan.
Data Clustering 1 – An introduction
2007. Software Engineering Laboratory, School of Computer Science S E Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying.
The identification of interesting web sites Presented by Xiaoshu Cai.
Chapter 9 Neural Network.
Improving Web Spam Classification using Rank-time Features September 25, 2008 TaeSeob,Yun KAIST DATABASE & MULTIMEDIA LAB.
1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.
Unsupervised Learning of Visual Sense Models for Polysemous Words Kate Saenko Trevor Darrell Deepak.
Pattern Recognition April 19, 2007 Suggested Reading: Horn Chapter 14.
Article by Dunja Mladenic, Marko Grobelnik, Blaz Fortuna, and Miha Grcar, Chapter 3 in Semantic Knowledge Management: Integrating Ontology Management,
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Triplet Extraction from Sentences Technical University of Cluj-Napoca Conf. Dr. Ing. Tudor Mureşan “Jožef Stefan” Institute, Ljubljana, Slovenia Assist.
META-LEARNING FOR AUTOMATIC SELECTION OF ALGORITHMS FOR TEXT CLASSIFICATION Karol Furdík, Ján Paralič, Gabriel Tutoky {Jan.Paralic,
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Some questions -What is metadata? -Data about data.
Extracting Keyphrases to Represent Relations in Social Networks from Web Junichiro Mori and Mitsuru Ishizuka Universiry of Tokyo Yutaka Matsuo National.
April 2014 SEWM Event Detection from Social Media: User-centric Parallel Split-n-merge and Composite Kernel  Truc-Vien T. Nguyen, Lugano University,
1Ellen L. Walker Category Recognition Associating information extracted from images with categories (classes) of objects Requires prior knowledge about.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Image Classification over Visual Tree Jianping Fan Dept of Computer Science UNC-Charlotte, NC
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Musical Genre Categorization Using Support Vector Machines Shu Wang.
Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
Data Mining and Text Mining. The Standard Data Mining process.
String Kernels on Slovenian documents Blaž Fortuna Dunja Mladenić Marko Grobelnik.
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Information Organization: Overview
Semi-Supervised Clustering
Sentiment analysis algorithms and applications: A survey
Bag-of-Visual-Words Based Feature Extraction
Constrained Clustering -Semi Supervised Clustering-
Development of the Amphibian Anatomical Ontology
Sparsity Analysis of Term Weighting Schemes and Application to Text Classification Nataša Milić-Frayling,1 Dunja Mladenić,2 Janez Brank,2 Marko Grobelnik2.
Presented by: Hassan Sayyadi
Introduction to Data Science Lecture 7 Machine Learning Overview
Topic 3: Cluster Analysis
Presented by: Prof. Ali Jaoua
Topic Oriented Semi-supervised Document Clustering
Information Networks: State of the Art
Concave Minimization for Support Vector Machine Classifiers
Information Retrieval
Junheng, Shengming, Yunsheng 11/09/2018
Topic 5: Cluster Analysis
Hierarchical, Perceptron-like Learning for OBIE
Semi-Automatic Data-Driven Ontology Construction System
Machine Learning – a Probabilistic Perspective
Information Organization: Overview
Unsupervised Machine Learning: Clustering Assignment
Ontology-Enhanced Aspect-Based Sentiment Analysis
Presentation transcript:

System for Semi-automatic ontology construction Blaž Fortuna Dunja Mladenić Marko Grobelnik Jožef Stefan Institute

Why semi-automatic? There is a big divide between unsupervised and fully supervised construction tools. Both approaches have weak points: it is difficult to obtain desired results using unsupervised methods, e.g. limited background knowledge manual tools (e.g. Protégé, OntoStudio) are time consuming, user needs to know the entire domain. We combined these two approaches in order to eliminate these weaknesses: the user guides the construction process, the system helps the user with suggestions based on the document collection.

What is topic ontology? It consists of: set of topics, set of ‘Subtopic-of’ relations between topics, collection of documents, set of ‘Subject-of’ relations between documents and topics. Topic Subtopics Documents

How do we help the user? Our Topic ontology construction tool helps the user by: … identifying the topics and relations: hierarchical clustering, latent semantic analysis … naming the topics: keywords extraction. … visualizing the topic ontology, … outlier detection.

Topic and relation identification k-means clustering used to identify topics: cluster of documents => topic documents are assigned to clusters => ‘subject-of’ relation We can repeat clustering on a subset of documents assigned to a specific topic => identifies subtopics and ‘subtopic-of’ relation

Keywords extraction … using centroid vector: A centroid vector of a given topic is the average document from this topic (normalised sum of topic’s documents) Most descriptive keywords for a given topic are the words with the highest weights in the centroid vector. … using linear SVM classifier: SVM classifier is trained to seperate documents of the given topic from the other document in the context Words that are found most important for the classification are selected as keywords for the topic Context Topic

Topic ontology visualization Selected topic Suggestions of subtopics Topic Keywords All documents Outlier detection Topic document

Dataset Company name www.textmining.net Company description

Topic ontology of Yahoo! Finances

Other features & Future work Reads and writes to RDF Can work as plug-in for OntoStudio Already actively used in more projects and by different people Future work Identification of relations based on context of instances Extend to more general types of ontologies