Matic Perovšek, Anže Vavpetič, Nada Lavrač
Jožef Stefan Institute, Slovenia
A Wordification Approach to Relational Data Mining: Early Results


Overview
- Introduction
- Methodology
- Experimental results
- Conclusion

Introduction
- Relational data mining algorithms aim to induce models and/or relational patterns from multiple tables.
- Individual-centered relational databases can be transformed into a single-table form, a process known as propositionalization.

Motivation
- Wordification is inspired by text mining techniques
- Large number of simple, easy-to-understand features
- Greater scalability, handling large datasets
- Can be used as a preprocessing step to propositional learners, as well as to declarative modeling / constraint solving (De Raedt et al., today's invited talk)

Methodology
1. Transformation from relational database to a textual corpus
2. TF-IDF weight calculation

Transformation from relational database to a textual corpus
- One individual of the initial relational database -> one text document
- Features -> the words of this document
- Words constructed as a combination:

Transformation from relational database to a textual corpus
- For each individual, the words generated for the main table are concatenated with the words generated from the secondary (BK) tables.
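The transformation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide's own definition of how a word is combined did not survive in the transcript, so the `table_attribute_value` word format used here is an assumption, and the `movie` and `genre` tables, their columns, and the helper names `row_to_words` and `wordify` are hypothetical.

```python
def row_to_words(table, row, skip):
    """Turn one table row into words (assumed format: table_attribute_value)."""
    return [f"{table}_{attr}_{value}" for attr, value in row.items() if attr not in skip]


def wordify(main_name, main_rows, key, secondary):
    """Build one document (a list of words) per individual of the main table.

    secondary is a list of (table_name, foreign_key, rows) triples; words from
    these tables are appended to the document of the individual they refer to.
    """
    documents = {}
    for row in main_rows:
        documents[row[key]] = row_to_words(main_name, row, skip={key})
    for table, foreign_key, rows in secondary:
        for row in rows:
            words = row_to_words(table, row, skip={foreign_key})
            documents[row[foreign_key]].extend(words)
    return documents


# Hypothetical toy database: a main table and one secondary table.
movies = [
    {"movie_id": 1, "rank": "top"},
    {"movie_id": 2, "rank": "bottom"},
]
genres = [
    {"movie_id": 1, "name": "drama"},
    {"movie_id": 1, "name": "crime"},
    {"movie_id": 2, "name": "horror"},
]

corpus = wordify("movie", movies, "movie_id", [("genre", "movie_id", genres)])
for movie_id, words in corpus.items():
    print(movie_id, " ".join(words))
```

Each individual ends up as a bag of words, so any text mining tool that accepts tokenized documents could consume the result.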

Example

TF-IDF weights
- Our features make no explicit use of existential variables; TF-IDF weights are used instead.
- The weight of a word gives a strong indication of how relevant the feature is for the given individual.
- The TF-IDF weights can then be used either to filter out words of low importance or directly by a propositional learner.
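As a rough illustration of the weighting step, the sketch below computes TF-IDF with the common textbook definition (relative term frequency multiplied by the logarithm of the number of documents over the document frequency). The slides do not state which TF-IDF variant was used, so this formula is an assumption; the toy documents and the function names `tf_idf` and `filter_low_weights` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {doc_id: {word: weight}} with weight = tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))  # count each word once per document
    weights = {}
    for doc_id, words in documents.items():
        counts = Counter(words)
        total = len(words)
        weights[doc_id] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


def filter_low_weights(weights, threshold):
    """Keep only the words whose weight exceeds the threshold."""
    return {
        doc_id: {w: v for w, v in doc.items() if v > threshold}
        for doc_id, doc in weights.items()
    }


# Hypothetical documents, one per individual.
documents = {
    1: ["a_x_1", "b_y_2"],
    2: ["a_x_1", "b_y_3"],
}
weights = tf_idf(documents)
filtered = filter_low_weights(weights, threshold=0.0)
print(filtered)
```

A word that occurs in every document gets weight zero under this definition, so the filter removes it; the remaining weights could be passed to a propositional learner as a feature table.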

Experimental results
- Slovenian traffic accidents database
- IMDB database
  - Top 250 and bottom 100 movies
  - Movies, actors, movie genres, directors, director genres
- Applied the wordification methodology
- Performed association rule learning

Experimental results

Conclusion
- Novel propositionalization technique called Wordification
- Greater scalability
- Easy-to-understand features
Further work:
- Test on larger databases
- Experimental comparison with other propositionalization techniques
- Combine with a propositionalization-like approach to mining heterogeneous information networks (Grčar et al. 2012), applicable to CLP in data preprocessing
Reference:
Grčar, Trdin, Lavrač: A Methodology for Mining Document-Enriched Heterogeneous Information Networks, Computer Journal 2012