Matic Perovšek, Anže Vavpetič, Nada Lavrač. Jožef Stefan Institute, Slovenia. A Wordification Approach to Relational Data Mining: Early Results.


1 Matic Perovšek, Anže Vavpetič, Nada Lavrač. Jožef Stefan Institute, Slovenia. A Wordification Approach to Relational Data Mining: Early Results

2 Overview Introduction; Methodology; Experimental results; Conclusion

3 Introduction Relational data mining algorithms aim to induce models and/or relational patterns from multiple tables. Individual-centered relational databases can be transformed to a single-table form; this transformation is called propositionalization.

4 Motivation Wordification is inspired by text mining techniques. It produces a large number of simple, easy-to-understand features. It offers greater scalability when handling large datasets. It can be used as a preprocessing step for propositional learners, as well as for declarative modeling / constraint solving (De Raedt et al., today’s invited talk).

5 Methodology 1. Transformation from relational database to a textual corpus 2. TF-IDF weight calculation

6 Transformation from relational database to a textual corpus One individual of the initial relational database -> one text document. Features -> the words of this document. Words constructed as a combination:

7 Transformation from relational database to a textual corpus For each individual, the words generated for the main table are concatenated with words generated from the secondary (BK) tables
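The transformation on slides 6 and 7 can be sketched roughly as follows. The transcript does not preserve the exact word-construction rule from slide 6, so this sketch assumes each word has the form table_attribute_value. The function names (wordify_row, wordify), the key and foreign-key column names, and the toy movie data are all hypothetical and chosen only for illustration.

```python
def wordify_row(table, row, skip=()):
    """Turn one table row into words.

    Assumption: a word is built as table_attribute_value; the slide's
    exact construction rule is not preserved in the transcript.
    """
    return [f"{table}_{attr}_{value}"
            for attr, value in row.items() if attr not in skip]


def wordify(main_name, main_rows, secondary, key="id"):
    """Build one document (a list of words) per individual.

    secondary maps a table name to (foreign_key_attribute, rows).
    Words from the main table come first, followed by words from
    every secondary-table row that references the individual.
    """
    docs = {}
    for row in main_rows:
        individual = row[key]
        words = wordify_row(main_name, row, skip=(key,))
        for name, (fk, rows) in secondary.items():
            for r in rows:
                if r[fk] == individual:
                    words += wordify_row(name, r, skip=(fk,))
        docs[individual] = words
    return docs


# Hypothetical toy database: one main table and one secondary table.
movies = [{"id": 1, "rank": "top"}, {"id": 2, "rank": "bottom"}]
genres = [
    {"movie_id": 1, "genre": "drama"},
    {"movie_id": 1, "genre": "crime"},
    {"movie_id": 2, "genre": "comedy"},
]

docs = wordify("movie", movies, {"genre": ("movie_id", genres)})
for individual, words in docs.items():
    print(individual, " ".join(words))
```

Each individual of the main table ends up as one bag of words, which is the form that text mining tools expect as input.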

8 Example

9 TF-IDF weights No explicit use of existential variables in our features; TF-IDF weights are used instead. The weight of a word gives a strong indication of how relevant the feature is for the given individual. The TF-IDF weights can then be used either to filter out words with low importance or directly by a propositional learner.
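A minimal sketch of the weighting step, assuming the common formulation tf(w, d) = count(w, d) / |d| and idf(w) = log(N / df(w)); the slides do not state which TF-IDF variant was used. The function name tfidf and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a dict {individual: list_of_words}.

    Assumed variant: relative term frequency times log(N / df).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each word.
    df = Counter()
    for words in docs.values():
        df.update(set(words))

    weights = {}
    for individual, words in docs.items():
        counts = Counter(words)
        total = len(words)
        weights[individual] = {
            w: (c / total) * math.log(n_docs / df[w])
            for w, c in counts.items()
        }
    return weights


# Hypothetical documents, one per individual.
toy_docs = {
    "m1": ["genre_drama", "genre_crime", "actor_a"],
    "m2": ["genre_drama", "actor_b"],
}

weights = tfidf(toy_docs)
for individual, w in weights.items():
    print(individual, w)
```

Under this variant, a word that occurs in every document gets weight 0, so it can be filtered out as uninformative, while words specific to a few individuals keep a positive weight and can be passed on to a propositional learner.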

10 Experimental results Databases: Slovenian traffic accidents database; IMDB database (top 250 and bottom 100 movies; tables: movies, actors, movie genres, directors, director genres). Applied the wordification methodology. Performed association rule learning.

11 Experimental results

12 Conclusion Novel propositionalization technique called Wordification: greater scalability, easy-to-understand features. Further work: test on larger databases; experimental comparison with other propositionalization techniques; combine with a propositionalization-like approach to mining heterogeneous information networks (Grčar et al. 2012), applicable to CLP in data preprocessing. Grčar, Trdin, Lavrač: A Methodology for Mining Document-Enriched Heterogeneous Information Networks, Computer Journal 2012

