Discovering Interesting Patterns Through User's Interactive Feedback
Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han
This paper was presented at KDD '06.
Presented by: Jeff Boisvert, April 11, 2007

"Well begun is half done." – Aristotle

Outline
– Introduction and Background
– The Algorithm
– Examples
– Conclusions/Future Work
– Critique of Paper

Introduction and Background

Motivation
– Discover 'interesting' patterns in data
– 'Interestingness' is subjective; it depends on the user
– Often there are too many patterns to assess manually

Setting
– Assume an available set of candidate patterns (frequent item sets, etc.)
– Have the user rank a subset of the candidate patterns
– Learn from the user's ranking
– Have the user rank more patterns
– Learn
– …

Introduction and Background

SVM
– I think we have been presented with this enough

Clustering
– k clusters: minimize the maximum distance from each pattern to the nearest sample in a cluster

Distance measure
– Jaccard distance (between two patterns)

Ranking
– Linear: e.g., 2 < 3 (difference in ranking is 3 − 2 = 1)
– Log-linear: e.g., log(2) < log(3) (difference in ranking is log(3) − log(2) ≈ 0.176)
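The Jaccard distance between two patterns, treated as item sets, is straightforward to compute; a minimal sketch (the function name is mine, not from the paper):

```python
def jaccard_distance(p1: set, p2: set) -> float:
    """Jaccard distance: 1 - |intersection| / |union|.
    0 means identical patterns, 1 means disjoint ones."""
    union = p1 | p2
    if not union:  # two empty patterns: define the distance as 0
        return 0.0
    return 1.0 - len(p1 & p2) / len(union)

# Patterns sharing most of their items are close; disjoint ones are far.
d_close = jaccard_distance({"a", "b", "c"}, {"a", "b"})  # 1 - 2/3 ≈ 0.333
d_far = jaccard_distance({"a"}, {"b"})                   # 1.0
```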

"An algorithm must be seen to be believed." – Donald Knuth

Outline
– Introduction and Background
– The Algorithm
– Examples
– Conclusions/Future Work
– Critique of Paper

The Algorithm

Overview
1. Prune candidate patterns (micro-clustering)
2. Cluster the N patterns into k clusters
3. Present k patterns to the user for ranking
4. Refine the model with the new user rankings
5. Re-rank all N patterns with the new model
6. Reduce N = a*N
7. Go to step 2

Areas to discuss
– (1) Preprocessing: pruning and micro-clustering
– Clustering: see the introduction
– (2) Selecting the k patterns to present to the user
– (3) Modeling the user's knowledge/ranking ***

[Slide diagram: cluster N patterns into k clusters → user ranks k patterns → refine model → re-rank all N patterns → N = aN]
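The seven steps above can be sketched as a skeleton loop; every callback here is a placeholder for a step detailed on the following slides, and all names are mine:

```python
def interactive_mining(patterns, k, a, n_iter, cluster_fn, rank_fn, refine_fn):
    """Skeleton of the feedback loop: cluster, ask the user to rank k
    representatives, refine the knowledge model, re-rank, shrink N."""
    model = None
    for _ in range(n_iter):
        reps = cluster_fn(patterns, k)      # step 2: one pattern per cluster
        feedback = rank_fn(reps)            # step 3: user ranks k patterns
        model = refine_fn(model, feedback)  # step 4: update the model
        patterns = sorted(patterns, key=model, reverse=True)   # step 5: re-rank
        patterns = patterns[: max(1, int(a * len(patterns)))]  # step 6: N = a*N
    return patterns
```

With trivial stand-ins (e.g., scoring a numeric "pattern" by its own value), the loop simply keeps the top fraction a of the ranked list on each iteration.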

The Algorithm (Preprocessing)

Pruning
– Get representative patterns from the candidates
– Start with the maximal patterns
– Merge candidates into the maximals
– Representative pattern = the maximal
– Discard the merged patterns; keep the micro-clusters (the maximals)

Micro-clustering
– Two patterns are merged if D(P1, P2) < epsilon
– D is the Jaccard distance
– epsilon is provided by the user (e.g., 0.1)

[Figure source: Zaiane, COMPUT 695 notes]
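A minimal sketch of the merging step, assuming the Jaccard distance from the introduction; the function names are mine, and merging each candidate into the first maximal within epsilon is an illustrative simplification:

```python
def jaccard(p1: set, p2: set) -> float:
    """Jaccard distance between two patterns viewed as item sets."""
    union = p1 | p2
    return 1.0 - len(p1 & p2) / len(union) if union else 0.0

def micro_cluster(maximals, candidates, eps):
    """Merge each candidate into the first maximal pattern within Jaccard
    distance eps; the maximal is the cluster's representative, and
    candidates close to no maximal are discarded."""
    clusters = {frozenset(m): [] for m in maximals}
    for c in candidates:
        for m in maximals:
            if jaccard(set(m), set(c)) < eps:
                clusters[frozenset(m)].append(c)
                break
    return clusters
```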

The Algorithm (k patterns)

Which k patterns should be presented to the user?

Clustering patterns
– Really we have N micro-clusters, but …

Selecting patterns
– Criterion 1: the patterns presented should not be redundant
  – Redundant patterns often rank close to each other
  – Patterns are redundant if they have the same composition/frequency
– Criterion 2: the selection should help refine the model of the user's knowledge of interesting patterns (not of uninteresting patterns)

Method [Gonzalez, "Clustering to minimize the maximum intercluster distance"]
– Randomly select the first pattern
– Second pattern: the one at maximum distance from the first
– Third pattern: the one at maximum distance to the nearer of the first two patterns
– …
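The Gonzalez-style selection above is a greedy max-min (farthest-first) traversal; a sketch assuming a generic distance function (for determinism this version seeds with the first pattern, where the slide picks one at random):

```python
def farthest_first(patterns, k, dist):
    """Greedy max-min selection: repeatedly add the pattern whose distance
    to its nearest already-chosen pattern is largest."""
    chosen = [patterns[0]]  # the paper seeds randomly; first for determinism
    while len(chosen) < k:
        best = max(
            (p for p in patterns if p not in chosen),
            key=lambda p: min(dist(p, c) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Numbers on a line: 0 is picked first, then 10 (farthest from 0),
# then 2 (its nearest chosen point, 0, is farther away than 9's, 10).
picked = farthest_first([0, 2, 9, 10], 3, lambda a, b: abs(a - b))
```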

The Algorithm (refine model 1)

*** The main contribution of the paper
– How do we model the user's knowledge?
– So far we have only ranked k out of N patterns …

Interestingness
– The difference between the observed frequency f_o(P) and the expected frequency f_e(P)
– f_o(P) is observed from the input data
– f_e(P) is calculated from the model of the user's knowledge: f_e(P) = M(P, θ)
– If f_o(P) and f_e(P) differ, the pattern is interesting

Ranking
– If the user ranks P_i as more interesting than P_j: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
– Log-linear model: R[f_o(P), f_e(P)] = log f_o(P) − log f_e(P)
– Each such ordering is a constraint on the model optimization
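The log-linear interestingness measure is a one-liner; a sketch (the frequencies below are made up for illustration):

```python
import math

def interestingness(f_obs: float, f_exp: float) -> float:
    """Log-linear measure R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P):
    how far the observed frequency deviates from what the model of the
    user's knowledge expects."""
    return math.log(f_obs) - math.log(f_exp)

# A pattern far more frequent than the user expects is surprising,
# hence interesting; one that matches expectation is not.
surprising = interestingness(0.30, 0.05)  # log(6) ≈ 1.79
boring = interestingness(0.10, 0.10)      # 0.0
```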

The Algorithm (refine model 2)

Log-linear model
– Say we have a pattern P in a data set of s items; the slide gives the log-linear expression for f_e(P) [formula shown as an image on the slide]
– Recall the user's ordering of patterns as a constraint: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
– Define a weight vector w and a new representation v(P), and rewrite the constraint above in terms of them [formula shown as an image on the slide]
– With k ranked patterns there will be k constraints
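After the rewrite, each user ordering R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)] becomes a linear constraint on the weight vector, w · (v(P_i) − v(P_j)) > 0, which is the form a ranking SVM consumes; a sketch with illustrative vectors (not the paper's actual feature construction):

```python
def satisfies_ordering(w, v_i, v_j):
    """Check the constraint w . (v(Pi) - v(Pj)) > 0, i.e. whether the
    model scores pattern Pi above pattern Pj."""
    diff = [a - b for a, b in zip(v_i, v_j)]
    return sum(wk * dk for wk, dk in zip(w, diff)) > 0
```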

The Algorithm (re-rank all N patterns)

Log-linear model (cont.)
– Modified from an SVM black box
– Can now rank ALL N patterns with the interestingness measure: R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)], where R[f_o(P), f_e(P)] = K[v(P), w]
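With the learned weight vector, re-ranking all N patterns is just scoring and sorting; a sketch assuming a linear kernel K(v(P), w) = w · v(P) (the names and feature vectors are illustrative):

```python
def rerank(patterns, features, w):
    """Score every pattern with the learned weights and sort,
    most interesting first."""
    def score(p):
        # linear kernel: K(v(P), w) = w . v(P)
        return sum(wi * vi for wi, vi in zip(w, features[p]))
    return sorted(patterns, key=score, reverse=True)

# Pattern "b" scores 2.0, pattern "a" scores 1.0, so "b" ranks first.
order = rerank(["a", "b"], {"a": [1.0, 0.0], "b": [0.0, 2.0]}, [1.0, 1.0])
```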

The Algorithm (Reduce N)

Reduce the number of patterns
– Discard some patterns: N = aN
– a is specified by the user
– This reduces the number of patterns presented to the user at the end
– Stop when the maximum number of iterations (also specified by the user) is reached

END OF ALGORITHM

Biased belief model
– Not presented in detail
– Identical formulation to the log-linear model, but assigns a user's belief probability to each transaction [formula shown as an image on the slide]
– m = number of transactions; x_k(P) = 1 if transaction k contains P; p_k = the user's belief probability
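The biased-belief formula itself was an image on the slide; the sketch below is one plausible reconstruction from the variables listed (m, x_k(P), p_k), labeled as an assumption rather than the paper's exact model:

```python
def expected_freq_biased(pattern, transactions, beliefs):
    """Assumed reading of the slide: f_e(P) = (1/m) * sum_k p_k * x_k(P),
    where x_k(P) = 1 if transaction k contains pattern P and p_k is the
    user's belief probability for transaction k."""
    m = len(transactions)
    return sum(p for t, p in zip(transactions, beliefs) if pattern <= t) / m

# Only the first of two transactions contains {"a"}, with belief 0.5,
# so the expected frequency is 0.5 / 2 = 0.25.
f_e = expected_freq_biased({"a"}, [{"a", "b"}, {"c"}], [0.5, 0.9])
```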

The Algorithm

Overview
1. Pre-process: prune / micro-cluster
2. Cluster the N patterns into k clusters; present them to the user
3. Refine the model with the new user rankings; re-rank the patterns
4. Reduce N = a*N
5. Stop when the maximum number of iterations is reached

Input parameters
– a: shrinking ratio
– k: number of user-feedback patterns
– niter: number of iterations (controls the number of patterns in the output)
– epsilon: micro-clustering parameter
– Model type: log-linear vs. biased belief
– Ranking type: linear vs. log

"Few things are harder to put up with than the annoyance of a good example." – Mark Twain

Outline
– Introduction and Background
– The Algorithm
– Examples
– Conclusions/Future Work
– Critique of Paper

Example 1
– Transactions (35)
– Get the micro-clusters (19)
– Pick pattern #1, pick pattern #2, …, pick pattern #k
– If k = 2, present the two selected patterns to the user for ranking
– Refine the log-linear model
– With the new f_e, use the SVM to rank all 19 patterns
– Reduce N: sort the patterns by rank and keep the top a*N; with 19 patterns, keeping 90% leaves the top 17 (19 * 0.9 ≈ 17)

Example 2
Their results on item sets:
– Use data to simulate a person's prior knowledge
– Partition the data into two subsets: one for background and one for observed data
– Background = the user's prior
– Accuracy measured by [formula shown as an image on the slide]
– Data set: 49,046 transactions; 2,113 items; average transaction length of 74
– The first 1,000 transactions are the observed set
– 8,234 closed frequent item sets
– Micro-clustering reduces this to 769
– Compare the top-k ranked patterns

Example 3
Their results on sequences:
– 1,609 sentences
– 967 closed sequential patterns
– Full feedback: use k = 967

Example 4
Their results compared to other algorithms:
– Same data as Example 3 (1,609 sentences)
– They claim their algorithm performs better
– Baselines: Selective Sampling (Yu, KDD '05) and Top-N (Shen and Zhai, SIGIR '05)

"I would never die for my beliefs because I might be wrong." – Bertrand Russell

Outline
– Introduction and Background
– The Algorithm
– Examples
– Conclusions/Future Work
– Critique of Paper

Conclusions
– Interactive with the user
– Tries to learn the user's knowledge
– Flexible (but flexible = many parameters)
– Does not work well with sparse data

Proposed future work
– Study different models for sparse data
– Better feedback strategies to maximize learning
– Apply to other data types/sets

"He has a right to criticize, who has a heart to help." – Abraham Lincoln

Outline
– Introduction and Background
– The Algorithm
– Examples
– Conclusions/Future Work
– Critique of Paper

Critique
– Sensitivity to the input parameters
– Little guidance on selecting the input parameters
– Order of the paper
– Level of detail in the graphs/examples
– No examples that actually use a 'user's interactive feedback'

Questions

"It is better to know some of the questions than all of the answers." – James Thurber
"It is not the answer that enlightens, but the question." – Eugene Ionesco
"A wise man's question contains half the answer." – Solomon Ibn Gabirol