Christine Preisach, Steffen Rendle and Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany. Relational.

Presentation transcript:

1 Relational Classification Using Automatically Extracted Relations by Record Linkage
Christine Preisach, Steffen Rendle and Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

2 Outline
–Motivation
–Relation Extraction and Multi-Relational Classification Framework
–Relation Extraction
–Multi-Relational Classification
–Evaluation
–Conclusion

3 Example: Motivation
Publication | Title | Author | Conference | Category
1 | Classification of scientific publications | John Smith | ICDM | Data Mining
2 | Classification of Hypertext | John Smith | KDD | Data Mining
3 | Hierarchical Clustering | Dan Miller | ICDM | Data Mining
(Figure: graph with nodes P1, P2 and P3, one per publication.)

4 Motivation
Traditional classifiers take only local attributes such as keywords, title and abstract into account.
Assumption: instances are independent.
But: the assumption does not hold
–Instances can be related to other documents by authorship, citations, the same conference, etc.
These relations should be exploited and combined in order to improve classification accuracy.
But: manual extraction of relations by experts is expensive.
Hence: automatic extraction of relations from noisy attributes.

5 Relation Extraction and Relational Classification Framework
Publication | Title | Author | Conference
1 | Classification of scientific publications | J. Smith | ICDM 2005
2 | Classification of Hypertext | John Smith | KDD
3 | Hierarchical Clustering | Dan Miller | 5th International Conference on Data Mining
Category: Data Mining
Relation Extraction Component
–Extraction of relations from objects with noisy attributes
Multi-Relational Classification Component
–Use the extracted relations instead of, or in addition to, local attributes for classification

6 Relation Extraction
Pairwise feature extraction
–From noisy attributes, with several similarity measures (e.g. TFIDF, cosine similarity, Levenshtein)
Probabilistic pairwise decision model
–Use the extracted similarities as features for a probabilistic classifier and build a model on the training data
–Apply the model to unknown pairs
Collective decision model
–If the relation is an equivalence relation, use constrained clustering (e.g. HAC), with the pairwise decision model as a learned similarity measure, to transform the pairwise decisions into a binary relation
(Diagram: Attributes, then pairwise feature extraction, then probabilistic pairwise decision model, then collective decision model, then Relations.)
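
The pairwise feature extraction step on this slide can be sketched roughly as follows. This is only an illustration under assumptions: the function names (`levenshtein_similarity`, `jaccard_similarity`, `pair_features`), the record layout as a plain dict, and the choice of exactly these two measures are hypothetical and not taken from the slides. The resulting vectors would then be fed to whatever probabilistic classifier serves as the pairwise decision model.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Overlap of lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pair_features(rec_a: dict, rec_b: dict, attributes: list) -> list:
    """One feature per (attribute, similarity measure) for a record pair."""
    features = []
    for attr in attributes:
        va, vb = rec_a.get(attr, ""), rec_b.get(attr, "")
        features.append(levenshtein_similarity(va, vb))
        features.append(jaccard_similarity(va, vb))
    return features


# Hypothetical noisy records, modelled on the example table in the slides.
p1 = {"author": "J. Smith", "conference": "ICDM 2005"}
p2 = {"author": "John Smith", "conference": "KDD"}
print(pair_features(p1, p2, ["author", "conference"]))
```

Each record pair yields a fixed-length vector, so any standard classifier that outputs probabilities can be trained on labelled pairs.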

7 Relation Extraction: Collective Decision Model
(Figure panels: Initialisation, Must Links, Cannot Links.)
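
A minimal sketch of constrained agglomerative clustering with must-link and cannot-link constraints is given below. It assumes average linkage, a fixed stopping threshold, and a user-supplied similarity function standing in for the learned pairwise model; the function name `constrained_hac` and all parameters are hypothetical, and the slides do not specify these details.

```python
from itertools import combinations


def constrained_hac(n, sim, must_links=(), cannot_links=(), threshold=0.5):
    """Cluster items 0..n-1 given sim(i, j) in [0, 1] and pair constraints."""
    clusters = [{i} for i in range(n)]
    cannot = {frozenset(pair) for pair in cannot_links}

    def find(item):
        return next(c for c in clusters if item in c)

    # Initialisation: merge every must-link pair before clustering starts.
    for i, j in must_links:
        ci, cj = find(i), find(j)
        if ci is not cj:
            clusters.remove(cj)
            ci |= cj

    def allowed(ca, cb):
        # A merge is forbidden if it would join any cannot-link pair.
        return not any(frozenset((i, j)) in cannot for i in ca for j in cb)

    def average_similarity(ca, cb):
        return sum(sim(i, j) for i in ca for j in cb) / (len(ca) * len(cb))

    while True:
        best, best_pair = threshold, None
        for ca, cb in combinations(clusters, 2):
            if allowed(ca, cb):
                s = average_similarity(ca, cb)
                if s >= best:
                    best, best_pair = s, (ca, cb)
        if best_pair is None:  # nothing left above the threshold
            return clusters
        ca, cb = best_pair
        clusters.remove(cb)
        ca |= cb


# Hypothetical similarities between four citation records.
_scores = {frozenset((0, 1)): 0.9, frozenset((2, 3)): 0.8, frozenset((1, 2)): 0.6}


def toy_sim(i, j):
    return _scores.get(frozenset((i, j)), 0.1)


print(constrained_hac(4, toy_sim))
print(constrained_hac(4, toy_sim, cannot_links=[(0, 1)]))
```

Items that end up in the same cluster are treated as related, which turns the probabilistic pairwise decisions into a binary, transitive relation.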

8 Multi-Relational Classification
Relational classification problem:
–Make use of additional information about related objects (i.e. their classes or attributes)
–Propositionalize the relational data, e.g. using the neighborhood of each instance [formula not preserved]

9 Multi-Relational Classification
Algorithm:
1. For each relation R, 1 to m:
(a) Build an undirected weighted graph
(b) Perform relational classification simultaneously for all instances in the test set
(c) Output a probability distribution
2. Apply ensemble classification to the resulting probability distributions of these relations
3. Output the final classification
(Diagram: one relational classification per relation, all feeding into a single ensemble classification.)

10 Multi-Relational Classification
Simple Relational Methods
–Probabilistic Relational Neighbor Classifier (EPRN) [Macskassy and Provost 2003]: [formula not preserved], defined in terms of a normalization factor, a weight and an iteration index
–EPRN2HOP: additionally takes the neighbors of the direct neighbors into account if the direct neighborhood is small
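
Because the slide's formula is not available here, the following is only a sketch of a generic weighted-vote relational neighbor classifier in the spirit of Macskassy and Provost, not the authors' exact EPRN definition. It assumes that an unlabelled node's class distribution is the weight-normalized sum of its neighbors' distributions from the previous iteration, with training labels held fixed. The name `relational_neighbor` and the graph layout are hypothetical.

```python
def relational_neighbor(graph, labels, classes, iterations=10):
    """graph: {node: {neighbor: weight}}; labels: {node: class} for known nodes."""
    uniform = {c: 1.0 / len(classes) for c in classes}
    probs = {}
    for node in graph:
        if node in labels:
            # Known labels are clamped to a one-hot distribution.
            probs[node] = {c: 1.0 if c == labels[node] else 0.0 for c in classes}
        else:
            probs[node] = dict(uniform)

    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in labels:
                updated[node] = probs[node]
                continue
            # Weighted vote over the neighbors' previous estimates.
            scores = {c: sum(w * probs[nb][c] for nb, w in neighbors.items())
                      for c in classes}
            z = sum(scores.values())  # normalization factor
            updated[node] = ({c: s / z for c, s in scores.items()}
                             if z > 0 else dict(uniform))
        probs = updated
    return probs


# Hypothetical co-authorship graph: b shares 2 authors with a and 1 with c.
toy_graph = {
    "a": {"b": 2.0},
    "b": {"a": 2.0, "c": 1.0},
    "c": {"b": 1.0},
    "d": {},
}
result = relational_neighbor(toy_graph, {"a": "DM", "c": "IR"}, ["DM", "IR"])
print(result["b"])
```

A node without neighbors keeps the uniform distribution in this sketch, which is the situation the 2-hop variant on the slide is meant to mitigate.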

11 Multi-Relational Classification
Aggregation-based Relational Learning Methods
–Use aggregation functions in order to propositionalize the set-valued attribute
–Use the aggregated values as attributes for traditional machine learning methods
–We used Logistic Regression as the classifier
(Figure: an instance whose neighbors are labelled Category 1, Category 2, Category 3 and Category 1.)
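
As an illustration of propositionalization by aggregation, the sketch below turns the set of neighbor classes into a fixed-length vector of counts or proportions. The choice of count and proportion as aggregation functions and the name `aggregate_neighbor_classes` are assumptions; the slides do not say which aggregates were used.

```python
from collections import Counter


def aggregate_neighbor_classes(neighbor_classes, classes, normalize=True):
    """Map a variable-size list of neighbor labels to one value per class."""
    counts = Counter(neighbor_classes)
    total = sum(counts[c] for c in classes)
    if normalize and total > 0:
        return [counts[c] / total for c in classes]
    return [float(counts[c]) for c in classes]


# Neighbor labels as in the slide's figure.
neighbors = ["Category 1", "Category 2", "Category 3", "Category 1"]
all_classes = ["Category 1", "Category 2", "Category 3"]
print(aggregate_neighbor_classes(neighbors, all_classes))
```

The resulting vector has the same length for every instance, so it can be passed directly to a propositional learner such as logistic regression.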

12 Ensemble Classification
Methods which combine different models
Increases classification accuracy
Usage
–Combine the results achieved by relational classification for different relations
–Combine the results of relational and local models
Voting
Stacking
–Use a meta-classifier to learn a model on the results of the different models
–Build new instances
–Apply cross validation
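
One simple reading of the voting step is to average the per-relation probability distributions and pick the most probable class. The sketch below assumes this form of (optionally weighted) soft voting; the slides do not state which voting scheme was used, and the names `soft_vote` and `predict` are hypothetical.

```python
def soft_vote(distributions, weights=None):
    """Weighted average of class-probability dicts, one dict per model."""
    if not distributions:
        raise ValueError("at least one distribution is required")
    if weights is None:
        weights = [1.0] * len(distributions)
    classes = sorted({c for d in distributions for c in d})
    total = sum(weights)
    return {c: sum(w * d.get(c, 0.0) for d, w in zip(distributions, weights)) / total
            for c in classes}


def predict(distributions, weights=None):
    """Class with the highest combined probability."""
    combined = soft_vote(distributions, weights)
    return max(combined, key=combined.get)


# Hypothetical outputs of three relational classifiers for one paper.
by_author = {"DM": 0.6, "IR": 0.4}
by_journal = {"DM": 0.3, "IR": 0.7}
by_citation = {"DM": 0.8, "IR": 0.2}
print(predict([by_author, by_journal, by_citation]))
```

Stacking differs in that the per-model distributions become the input features of a trained meta-classifier, with cross validation used to produce those features without leaking training labels.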

13 Evaluation
Data
–CompuScience data set: scientific papers, 77 topics (categories). Relations: authors, reviewer, journals
–Cora deduplication data set: citations, 112 unique publications. Relation: samePaper
–Cora data set: 3298 papers, 12 categories. Relations: conferences, authors, citations

14 Evaluation – Relation Extraction
F1 measure for finding the SamePaper relation on Cora
Pairwise feature extraction with TFIDF, Levenshtein, Jaccard and Cosine on all attributes
(Table: F1 by evaluation set for single linkage, complete linkage and average linkage; the values are not preserved.)
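
For reference, an F1 measure over extracted pairs can be computed as in the sketch below. It assumes the evaluation compares the set of predicted related pairs against the set of true pairs, which is a common convention for record linkage but is not confirmed by the slides; `pairwise_f1` is a hypothetical name.

```python
def pairwise_f1(predicted_pairs, true_pairs):
    """F1 of predicted unordered pairs against ground-truth unordered pairs."""
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Pair order is ignored: (1, 2) and (2, 1) denote the same pair.
print(pairwise_f1([(1, 2), (2, 3)], [(2, 1), (3, 4)]))
```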

15 Evaluation – Multi-Relational Classification
3-fold cross validation on CompuScience for the Author, Reviewer and Journal relations
The ensemble of relational and content-based text classification achieved a significantly higher F-measure than the pure text classifier

16 Evaluation
Multi-Relational Classification using automatically extracted relations
50%/50% splits, 10 runs

17 Conclusion and Future Work
Summary:
–Presented a framework for relation extraction and multi-relational classification
–Automatic relation extraction with record linkage
–Relational classification using each extracted relation for classification and fusing the results with ensemble methods
Future Work
–Evaluate our framework on different data sets and relations
–Evaluate the relational classifiers' quality depending on the quality of the extracted relations

18 Questions?
Christine Preisach, Steffen Rendle, Lars Schmidt-Thieme
Thank you