
1 Outline
- Criticism of support/confidence
- Loglinear modeling
- Causal modeling

2 Background — Interaction analysis
- Association rules (Creighton & Hanash 03): associations instead of interactions; undirected; need to discretize data
- Loglinear modeling (Wu et al. 03): multi-way nonlinear interactions; undirected; need to discretize data
- Graphical Gaussian model (Kishino & Waddell 00): pairwise interactions; undirected; efficient
- Causal networks: pairwise; directed; high complexity

3 Background on Association Rules
An association rule X => Y must satisfy user-specified minimum support and confidence:
- support, s = P(X U Y): the probability that a transaction contains X U Y
- confidence, c = P(Y|X): the conditional probability that a transaction containing X also contains Y
Efficient algorithms: Apriori (Agrawal & Srikant, VLDB 94), FP-tree (Han, Pei & Yin, SIGMOD 2000), etc.
(Figure: Venn diagram of customers buying X, buying Y, and buying both.)

4 Criticism of Support and Confidence
Example 1 (Aggarwal & Yu, PODS 98): among 5000 students,
- 3000 play basketball
- 3750 eat cereal
- 2000 both play basketball and eat cereal
play basketball => eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball => not eat cereal [20%, 33.3%] is far more accurate, although it has lower support and confidence.

5 Criticism of Support and Confidence
We need a measure of dependence or correlation between events. The ratio P(Y|X) / P(Y) is called the lift of the rule X => Y; lift > 1 indicates positive correlation, lift < 1 negative correlation.
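As a sanity check, the lift of the basketball/cereal rule in Example 1 can be computed directly from the counts given there:

```python
# Counts from the basketball/cereal example (Aggarwal & Yu, PODS 98).
n = 5000
basketball = 3000
cereal = 3750
both = 2000

support_rule = both / n              # s = P(X U Y) = 0.4
confidence_rule = both / basketball  # c = P(Y|X) ~= 0.667
lift = confidence_rule / (cereal / n)  # P(Y|X) / P(Y)

# lift < 1: playing basketball and eating cereal are negatively correlated,
# even though the rule's support and confidence look respectable.
print(round(lift, 3))  # 0.889
```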

6 Criticism of lift
Suppose a triple ABC is unusually frequent because
- Case 1: AB and/or AC and/or BC are unusually frequent, or
- Case 2: there is something special about the triple such that all three occur together frequently.
Example 2 (DuMouchel & Pregibon, KDD 01): suppose that in a database of patient adverse drug reactions, A and B are two drugs, and C is the occurrence of kidney failure.
- Case 1: A and B may act independently upon the kidney; the many occurrences of ABC arise because A and B are sometimes prescribed together.
- Case 2: A and B may have no effect on the kidney if taken alone, but when taken together a drug interaction occurs that often leads to kidney failure.
- Case 3: A and B may each have a small effect on the kidney if taken alone, but when taken together there is a strong effect.

7 Criticism of lift: EXCESS2
EXCESS2 = observed count minus the count predicted by the all-two-factor model fitted to the two-way distributions (using shrinkage estimates, or raw counts). It estimates the number of transactions containing the itemset over and above those that can be explained by the pairwise associations of the items.

8 Motivation
EXCESS2 can separate cases 2 and 3 from case 1, but cannot separate case 2 from case 3.
It requires building many all-two-factor models: for the itemset ABCDE, one model is needed for each multi-item set (ABC, ABD, ..., ABCD, ..., ABCDE).
It cannot fully analyze the interestingness of multi-item associations. E.g., even if we know the EXCESS2 for ABCD is large, is it due to ABC, ABD, or ABCD?
Instead, fit one optimal loglinear model that describes all the possible associations, rather than building many all-two-factor models:
- The λ-terms precisely describe the interactions of items.
- By analyzing residuals, we can pick out the multi-item associations that cannot be explained by the associations included in the fitted model.

9 Saturated loglinear model
For two variables A and B with expected cell counts m_ij, the saturated model is
log m_ij = λ + λ_i^A + λ_j^B + λ_ij^AB
where λ is the main effect, λ_i^A and λ_j^B are the 1-factor effects, and λ_ij^AB is the 2-factor effect, which shows the dependency between the distributions of A and B.

10 Computing λ-terms
- Linear constraints on the coefficients: loglinear parameters sum to 0 over all indices.
- UpDown method (Sarawagi et al., EDBT 98).
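For a saturated model on a 2 x 2 table, the λ-terms have a closed form in the log counts under the sum-to-zero constraints; a minimal sketch with a hypothetical table of counts:

```python
import numpy as np

# Hypothetical 2x2 contingency table of counts for items A and B
# (rows: A present/absent; columns: B present/absent).
n = np.array([[40.0, 10.0],
              [20.0, 30.0]])

logn = np.log(n)
lam = logn.mean()                     # main effect: grand mean of log counts
lam_A = logn.mean(axis=1) - lam       # 1-factor effects of A (row deviations)
lam_B = logn.mean(axis=0) - lam       # 1-factor effects of B (column deviations)
lam_AB = logn - lam - lam_A[:, None] - lam_B[None, :]  # 2-factor effects

# The sum-to-zero constraints hold by construction:
print(np.allclose(lam_A.sum(), 0))         # True
print(np.allclose(lam_AB.sum(axis=0), 0))  # True
# A positive lam_AB[0, 0] indicates a positive A-B interaction.
print(lam_AB[0, 0] > 0)                    # True
```

For higher-dimensional or unsaturated models the λ-terms are no longer closed-form and are obtained by iterative fitting, which is where methods like UpDown come in.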

11 Interpreting associations (KDD'03, Washington, D.C.)
- Comparison with lift and EXCESS2.
- Derive association patterns by examining λ-terms. E.g., we can derive a positive interaction between A and C, a negative interaction between A and B, no significant interaction between B and C, and a positive three-factor interaction among A, B, and C.
(Figures: independence model, pairwise model, fitted model.)

12 Decomposition
Decomposition is necessary because
- the contingency table from market-basket data is too sparse, and
- the complexity is exponential in the number of dimensions.
Step 1.1: build an independence graph.
Step 1.2: apply graph-theoretical results to decompose the graph into irreducible components.

13 Independence graph
- Every vertex of the graph corresponds to a variable.
- Each edge denotes a dependency between the two variables it links.
- A missing edge represents the conditional independence of the two variables it would join.
- Test conditional independence for every pair of variables, controlling for the other variables (Cochran-Mantel-Haenszel test, etc.).
(Figure: example independence graph on vertices A through J.)

14 Independence graph decomposition
Graph-theoretical result: if the graph corresponding to a graphical model for a contingency table can be decomposed into subgraphs by a clique separator, the MLEs for the parameters of the model can easily be derived by combining the estimates of the models on the lower-dimensional tables represented by the simpler subgraphs (divide and conquer).
(Figure: the example graph split at clique separators into smaller subgraphs.)
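One decomposition step can be sketched as follows, assuming a candidate clique separator is already known (finding separators is a separate graph problem). The adjacency structure and node names below are made up for illustration.

```python
# Given an undirected graph as an adjacency dict and a candidate separator S,
# verify that S is a clique, then return the connected components of the
# graph with S removed, each re-joined with S (the lower-dimensional pieces).

def is_clique(adj, nodes):
    nodes = list(nodes)
    return all(v in adj[u] for u in nodes for v in nodes if u != v)

def decompose(adj, separator):
    if not is_clique(adj, separator):
        raise ValueError("separator must be a clique")
    rest = set(adj) - set(separator)
    components = []
    while rest:
        comp, stack = set(), [next(iter(rest))]
        while stack:                       # depth-first search within `rest`
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(v for v in adj[u] if v in rest and v not in comp)
        rest -= comp
        components.append(comp | set(separator))  # each piece keeps the separator
    return components

# Illustrative graph: vertex A is a one-node clique separator
# splitting {B, C} from {G}.
adj = {"A": {"B", "C", "G"}, "B": {"A", "C"}, "C": {"A", "B"}, "G": {"A"}}
parts = decompose(adj, {"A"})
print(sorted(sorted(p) for p in parts))  # [['A', 'B', 'C'], ['A', 'G']]
```

Each returned piece corresponds to a smaller contingency table on which a model can be fitted independently, matching the divide-and-conquer result above.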

15 Data generator
parameter   value      meaning
ntrans      10k - 1M   number of transactions
nitems      50, 100    number of different items
tlen        10         average items per transaction
npats       10000      number of patterns (large itemsets)
patlen      4          average length of maximal pattern
corr        0.25       correlation between patterns
conf        0.75       average confidence in a rule
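A toy sketch of an IBM-style synthetic transaction generator driven by the parameters in the table: draw a pool of patterns (potential large itemsets), then build each transaction by unioning randomly chosen patterns. The sampling details here are simplified guesses for illustration, not the exact generator of Agrawal & Srikant; smaller values of ntrans and npats are used so the sketch runs quickly.

```python
import random

random.seed(0)
nitems, npats, patlen, tlen, ntrans = 50, 100, 4, 10, 1000

# Each pattern is a small random itemset of `patlen` distinct items.
patterns = [random.sample(range(nitems), patlen) for _ in range(npats)]

transactions = []
for _ in range(ntrans):
    t = set()
    while len(t) < tlen:
        t.update(random.choice(patterns))  # add one whole pattern at a time
    transactions.append(t)

print(len(transactions))                          # 1000
print(all(len(t) >= tlen for t in transactions))  # True
```

Because transactions are built from shared patterns rather than independent item draws, the resulting database contains genuinely frequent itemsets for mining algorithms to find.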

16 Other measures
Objective interestingness measures for a rule A => B can be defined from the 2 x 2 contingency table of A and B.

17 Outline
- Criticism of support/confidence
- Loglinear modeling
- Causal modeling

Partial correlation
The correlation between two variables after the common effects of one or more other variables are removed.
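The first-order partial correlation has a closed form in terms of pairwise correlations. A sketch on synthetic data where x and y are both driven by a common cause z (the variable names and data are illustrative):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Synthetic example: x and y share the common cause z but have no
# direct link, so the partial correlation should vanish.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)
y = z + 0.5 * rng.normal(size=2000)

print(np.corrcoef(x, y)[0, 1] > 0.5)     # True: strongly correlated marginally
print(abs(partial_corr(x, y, z)) < 0.1)  # True: near zero once z is removed
```

This is exactly the effect the PC-style algorithms below exploit: an edge between two variables is removed once some conditioning set renders them (partially) uncorrelated.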

Causal Interaction Learning
Bayesian approaches (search and score), Friedman et al. 00:
- Apply heuristic search methods to construct a model, then evaluate it using a scoring measure (e.g., Bayesian scoring, entropy, MDL).
- Averaging over the space of structures is computationally intractable, as the number of DAGs is super-exponential in the number of genes.
- Sensitive to the choice of local model.
Constraint-based conditional independence approaches (PC, Spirtes et al. 93):
- Instead of searching the space of models, start from the complete undirected graph, then thin this graph by removing edges with zero-order conditional independence relations, thin again with first-order conditional independence relations, and so on.
- Slow when dealing with large numbers of variables.

PC Algorithm
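The edge-thinning loop described above (zero-order tests, then first-order, and so on) can be sketched as a simplified skeleton phase of PC. A conditional independence oracle `indep(x, y, S)` is assumed; in practice this would be a statistical test on data, and here it is faked from a known three-variable chain for illustration.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    # Start from the complete undirected graph.
    adj = {v: set(nodes) - {v} for v in nodes}
    order = 0
    while any(len(adj[x] - {y}) >= order for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                # Try every conditioning set of the current size drawn
                # from x's remaining neighbors.
                for S in combinations(adj[x] - {y}, order):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        order += 1  # zero-order tests first, then first-order, and so on
    return adj

# Fake oracle for the chain A -> B -> C: A and C are independent given B.
def indep(x, y, S):
    return {x, y} == {"A", "C"} and "B" in S

skeleton = pc_skeleton(["A", "B", "C"], indep)
print(sorted(skeleton["B"]))  # ['A', 'C']: the A-C edge has been removed
```

The full PC algorithm additionally records the separating sets and uses them to orient edges; this sketch covers only the skeleton-thinning phase.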

D-Separation
X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z. Let p be any path between a vertex in X and a vertex in Y; Z is said to block p if there is a vertex w on p satisfying one of the following:
- w has converging arrows along p, and neither w nor any of its descendants is in Z, or
- w does not have converging arrows along p, and w is in Z.
Equivalently, we say that X and Y are independent conditional on Z.

Path Blockage
A path is active, given evidence Z, if
- whenever the path contains a converging-arrows configuration A -> B <- C, B or one of its descendants is in Z, and
- no other node on the path is in Z.
A path is blocked, given evidence Z, if it is not active.
(Figure: the collider configuration A -> B <- C.)
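The two blocking conditions can be turned into a check for a single path, given the DAG's directed edges. The collider example below is the one from the slide; the function is a minimal sketch, not a full d-separation algorithm (which must also enumerate all paths).

```python
# A node w on a path blocks it given Z if (a) w is a collider (converging
# arrows) and neither w nor any descendant of w is in Z, or (b) w is a
# non-collider and w is in Z. The DAG is a set of (parent, child) edges.

def descendants(children, w):
    out, stack = set(), [w]
    while stack:
        u = stack.pop()
        for v in children.get(u, ()):
            if v not in out:
                out.add(v)
                stack.append(v)
    return out

def path_blocked(edges, path, Z):
    children = {}
    for a, b in edges:  # edge a -> b
        children.setdefault(a, set()).add(b)
    for i in range(1, len(path) - 1):
        prev, w, nxt = path[i - 1], path[i], path[i + 1]
        collider = (prev, w) in edges and (nxt, w) in edges
        if collider:
            if w not in Z and not (descendants(children, w) & Z):
                return True   # condition (a): inactive collider blocks
        elif w in Z:
            return True       # condition (b): observed non-collider blocks
    return False

# Collider A -> B <- C: the path A-B-C is blocked by default,
# but conditioning on B activates it.
edges = {("A", "B"), ("C", "B")}
print(path_blocked(edges, ["A", "B", "C"], set()))  # True
print(path_blocked(edges, ["A", "B", "C"], {"B"}))  # False
```

This matches the slide: the collider blocks the path until B (or one of its descendants) is observed, at which point the path becomes active.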