Collective Classification: A brief overview and possible connections to email-acts classification
Vitor R. Carvalho
Text Learning Group Meetings, Carnegie Mellon University, November 10th, 2004

Data Representation
“Flat” data
– Object: email messages
– Attributes: words, sender, etc.
– Class: spam / not spam
– Usually assumed IID
Sequential data
– Object: words in text
– Attributes: capitalized, number, in dictionary
– Class: POS tag (or name / not name)
Relational data
– Class + attributes
– + links (relations)
– Example: webpages
[Figure: example labels pron/name/det/name/verb for the token sequence, and spam vs. not spam for the messages]
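
To make the distinction concrete, here is a small Python sketch of the three representations; the attribute names follow the slide, but the exact schema is an illustrative assumption rather than anything prescribed in the talk.

# Flat data: one independent feature vector per email message
flat_example = {"words": ["meeting", "tomorrow"], "sender": "alice", "label": "not_spam"}

# Sequential data: an ordered list of tokens, each with attributes and a label
sequential_example = [
    {"token": "John", "capitalized": True, "in_dictionary": False, "label": "name"},
    {"token": "called", "capitalized": False, "in_dictionary": True, "label": "verb"},
]

# Relational data: objects with attributes plus explicit links between objects
relational_example = {
    "objects": {
        "page1": {"words": ["faculty", "research"], "label": "faculty"},
        "page2": {"words": ["course", "syllabus"], "label": "course"},
    },
    "links": [("page1", "page2")],  # e.g., a hyperlink from page1 to page2
}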

[Figure from J. Neville et al., 2003]

Relational Data and Collective Classification
Different objects interact
Different types of relations (links)
Attributes may be correlated
Examples:
– actors, directors, movies, companies
– papers, authors, conferences, citations
– company, employee, customer, …
Classify objects collectively
Use predictions on some objects to improve predictions on related objects

Collective Classification Methods
Relational Probability Trees (RPT)
Iterative methods (relaxation-based methods)
Relational Dependency Networks (RDN)
Relational Bayesian Networks (RBN/PRM)
Relational Markov Networks (RMN)
Other models (ILP-based, vector-space-based, etc.)
Overall:
– Lack of direct comparison among methods
– Results are usually compared to a “flat” model
– Splitting data into train/test sets can be an issue

Relational Probability Trees (RPT)
Decision trees applied to relational data
Predicts the target class label based on:
– the object's own attributes
– attributes and links in the “relational neighborhood” (one link away)
– counts of attributes and links in the neighborhood
Enhanced feature selection (chi-square tests, pruning, randomization tests)
Results were not exciting
Neville et al., KDD-2003; related work in Blockeel et al. (Artificial Intelligence, 1998) and Kramer, AAAI-96
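
As an illustration of the one-hop neighborhood features an RPT can split on (degree, plus counts of neighbor attribute values), here is a small Python sketch; the link representation and field names are assumptions for illustration, not the published RPT feature set.

from collections import Counter

def neighborhood_features(obj_id, attributes, links, attr="topic"):
    # Degree and value counts of one attribute among objects one link away.
    neighbors = {b for a, b in links if a == obj_id} | {a for a, b in links if b == obj_id}
    feats = {"degree": len(neighbors)}
    for value, count in Counter(attributes[n].get(attr) for n in neighbors).items():
        feats["count_%s=%s" % (attr, value)] = count
    return feats

# Toy usage
attributes = {"p1": {"topic": "faculty"}, "p2": {"topic": "course"}, "p3": {"topic": "course"}}
links = [("p1", "p2"), ("p3", "p1")]
print(neighborhood_features("p1", attributes, links))  # {'degree': 2, 'count_topic=course': 2}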

Iterative Methods
Predicts the target class label based on:
– the object's own attributes
– attributes and links of the relational neighborhood
– the CLASS LABELS of the neighborhood
– features derived from those class labels
Different update strategies:
– by threshold on prediction confidence
– by top-N most confident predictions
– heuristic-based
Slattery & Mitchell, ICML-2000; Neville & Jensen, AAAI-2000; Chakrabarti et al., ACM SIGMOD-98
Some results with email-acts
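
A minimal Python sketch of this iterative loop with a confidence-threshold update strategy; the classifier interface clf.predict and the threshold value are illustrative assumptions, not the exact procedure of any of the cited papers.

def iterative_classify(objects, links, clf, rounds=10, threshold=0.9):
    # clf.predict(attrs, neighbor_labels) is assumed to return (label, confidence)
    # from the object's own attributes plus the labels of already-classified neighbors.
    neighbors = {o: [b for a, b in links if a == o] + [a for a, b in links if b == o]
                 for o in objects}
    labels = {o: None for o in objects}  # no relational evidence at the start
    for _ in range(rounds):
        changed = False
        for o, attrs in objects.items():
            if labels[o] is not None:
                continue  # already committed in an earlier pass
            known = [labels[n] for n in neighbors[o] if labels[n] is not None]
            label, confidence = clf.predict(attrs, known)
            if confidence >= threshold:  # update strategy: confidence threshold
                labels[o] = label
                changed = True
        if not changed:
            break
    return labels  # objects that never reach the threshold remain None in this sketch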

Relational Bayesian Networks (RBN/PRM)
Bayes nets extended to the relational domain
Given an “instantiation”, it induces a Bayes net that specifies a joint probability distribution over all attributes of all entities
Directed graphical model, with an acyclicity constraint
Exact model, with closed-form parameter estimation
– products of conditional probabilities
Has been applied mostly to simple domains, since the acyclicity constraint is very restrictive for most relational applications
Friedman et al., IJCAI-99; Getoor et al., ICML-2001; Taskar et al., IJCAI-2001
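
Written out in standard PRM notation (not text from the slide): for an instantiation \mathcal{I} with objects x and descriptive attributes x.A, the induced Bayes net factorizes the joint distribution as

P(\mathcal{I}) \;=\; \prod_{x \in \mathcal{I}} \prod_{A} P\bigl(x.A \mid \mathrm{Pa}(x.A)\bigr),

which is what gives closed-form parameter estimates, and also why the induced dependency graph over attributes must remain acyclic.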

Relational Markov Networks (RMN)
Extension of the CRF idea to the relational domain
Given an instantiation, it induces a Markov network that specifies a probability distribution over labels, given links and attributes
Undirected, discriminative model
Parameter estimation is expensive and requires approximate probabilistic inference (belief propagation)
Taskar et al., UAI-2002
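
In the usual RMN notation (following Taskar et al., not text from the slide), the induced Markov network defines the conditional distribution

P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}_c, \mathbf{y}_c),

where the cliques c are generated by relational clique templates and Z(\mathbf{x}) is the partition function; computing Z and its gradients is what makes parameter estimation expensive and forces approximate inference such as loopy belief propagation.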

Relational Dependency Networks (RDN)
Dependency networks extended to the relational domain
P(X) ≈ ∏_i P(X_i | neighbors(X_i))
Given an “instantiation”, it induces a dependency network that specifies an “approximate” joint probability distribution over all attributes of all objects
Undirected graphical model, no acyclicity constraint
Approximate model: simple parameter estimation, approximate inference (Gibbs sampling)
Neville & Jensen, KDD-MRDM-2003
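
A minimal Python sketch of the Gibbs-sampling inference step, assuming each variable already has a learned local conditional model; the local_model interface and the sampling settings are illustrative assumptions.

import random
from collections import Counter

def gibbs_sample(variables, neighbors, local_model, label_set, rounds=200, burn_in=50):
    # Repeatedly resample each variable from its local conditional P(Xi | neighbors(Xi)).
    # local_model(v, neighbor_state) is assumed to return a dict {label: probability}.
    state = {v: random.choice(label_set) for v in variables}  # random initialization
    tallies = {v: Counter() for v in variables}
    for t in range(rounds):
        for v in variables:
            dist = local_model(v, {n: state[n] for n in neighbors[v]})
            state[v] = random.choices(list(dist), weights=list(dist.values()))[0]
        if t >= burn_in:  # discard early samples
            for v in variables:
                tallies[v][state[v]] += 1
    return tallies  # per-variable label counts, i.e. estimated marginals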

Other Models
[Figure/table from Neville et al., 2003]

Comparing Some Results
Comparison of PRM, RMN, SVM and M^3N
RN* (Relational Neighbor; Macskassy et al., 2003) is a very simple relational classifier
M^3N: Taskar et al., 2003
[Figure: accuracy differences between PRM and RMN, and between mSVM and RMN]

End of overview… now, the email-act problem
[Figure: a timeline of email messages labeled with acts such as Delivery, Request, Commit, Proposal and Acknowledgment]
Strong correlation with the acts of the previous and next messages
Flat data? Sequential data?
A “verb” has little or no correlation with other “verbs” of the same message