Association Rules with Graph Patterns Yinghui Wu Washington State University Wenfei Fan Jingbo Xu University of Edinburgh Southwest Jiaotong University.

Presentation transcript:

Association Rules with Graph Patterns. Yinghui Wu (Washington State University); Wenfei Fan and Jingbo Xu (University of Edinburgh); Xin Wang (Southwest Jiaotong University).

Association in social networks. Traditional association rules are defined over itemsets and have the form X ⇒ Y. An association in a social network looks different: “if customers x and x′ are friends living in the same city c, there are at least 3 French restaurants in c that x and x′ both like, and if x′ visits a newly opened French restaurant y in c, then x may also visit y.” This is association with topological constraints. Use case: identify customers for a new French restaurant. [Figure: pattern with nodes x, x′, city, French restaurant (3 copies) and the new French restaurant y, linked by visit edges.]

Association via graph patterns is more involved than rules for itemsets! Question 1: How to define association rules via graph patterns? Question 2: How to discover interesting rules? Question 3: How to use the rules to identify customers? [Figure: two example patterns. One, for detecting fake accounts, has nodes acc #1 (“fake”), acc #2, article and blog, with the keywords “claim a prize”. The other, for identifying customers for a released album, has nodes x, x1, x2, Ecuador and “Shakira” album.]

Graph Pattern Association Rules (GPARs). A graph-pattern association rule (GPAR) R(x, y) has the form R(x, y): Q(x, y) ⇒ q(x, y), where Q(x, y) is a graph pattern in which x and y are two designated nodes, and q(x, y) is an edge labeled q from x to y (a predicate). Q and q are the antecedent and consequent of R, respectively. Example: R(x, French restaurant): Q(x, French restaurant) ⇒ like(x, French restaurant). [Figure: antecedent pattern with nodes x, x′, city and French restaurant (3 copies); consequent edge from x to French restaurant.]

Graph Pattern Association Rules (GPARs): semantics. Given a GPAR R(x, y): Q(x, y) ⇒ q(x, y), if there exists a match h that identifies v_x and v_y as matches of the designated nodes x and y in Q, respectively, then the consequent q(v_x, v_y) will likely hold. Example: R(x, French restaurant): Q(x, French restaurant) ⇒ like(x, French restaurant) reads as “if x and x′ are friends living in city c, there are at least 3 French restaurants in c that x and x′ both like, and if x′ visits a newly opened French restaurant y in c, then x may also visit y.” [Figure: the same antecedent pattern and consequent edge as on the previous slide.]
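
To make the syntax concrete, here is a minimal sketch of how a GPAR could be held in memory. The class names, the edge labels (friend, live_in, in, like, visit) and the node names are illustrative assumptions, not taken from the slides; the consequent is written as a visit edge to follow the quoted sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Graph pattern Q(x, y) with two designated nodes x and y."""
    labels: tuple      # ((node, label), ...) pairs for the pattern nodes
    edges: frozenset   # {(source, edge_label, target), ...}
    x: str
    y: str


@dataclass(frozen=True)
class GPAR:
    """Rule R(x, y): Q(x, y) => q(x, y)."""
    antecedent: Pattern
    q: str             # label of the consequent edge from x to y

    def consequent_edge(self):
        return (self.antecedent.x, self.q, self.antecedent.y)

    def rule_pattern_edges(self):
        # Edges of the pattern obtained by adding q(x, y) to Q.
        return self.antecedent.edges | {self.consequent_edge()}


# Restaurant example; every node and edge name below is hypothetical.
restaurants = ("r1", "r2", "r3")   # the 3 French restaurants both customers like
edges = {("x", "friend", "x2"), ("x", "live_in", "c"), ("x2", "live_in", "c"),
         ("y", "in", "c"), ("x2", "visit", "y")}
for r in restaurants:
    edges |= {("x", "like", r), ("x2", "like", r), (r, "in", "c")}

labels = (("x", "cust"), ("x2", "cust"), ("c", "city"),
          ("y", "French restaurant"),
          *((r, "French restaurant") for r in restaurants))
Q = Pattern(labels, frozenset(edges), x="x", y="y")
R = GPAR(Q, "visit")
print(R.consequent_edge())           # ('x', 'visit', 'y')
print(len(R.rule_pattern_edges()))   # 15
```

Keeping the consequent as a single edge label mirrors the restriction that q(x, y) is one edge between the designated nodes.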

Support and Confidence. Support of R(x, y): Q(x, y) ⇒ p(x, y). Should it be the number of isomorphic subgraphs in a single graph? [Figure: graph with customers u1, u2, u3, the French restaurants Le Bernardin and Per se, and New York (city), next to the pattern with nodes x, x′, city and French restaurant (3 copies).]
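
The question on this slide can be explored with a small brute-force sketch. It counts two things on a toy graph: all isomorphic matches of a pattern, and the distinct nodes matched to the designated node x. Treating the second count as the support is an assumption here, suggested by the Q(x, G) notation on the pattern-matching slide; the graph, labels and function names are made up for illustration.

```python
from itertools import permutations


def matches(p_labels, p_edges, g_labels, g_edges):
    """Yield each injective mapping h from pattern nodes to graph nodes
    that preserves node labels and pattern edges (brute force, tiny inputs only)."""
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        h = dict(zip(p_nodes, image))
        if (all(p_labels[u] == g_labels[h[u]] for u in p_nodes)
                and all((h[s], l, h[t]) in g_edges for s, l, t in p_edges)):
            yield h


def support(p_labels, p_edges, x, g_labels, g_edges):
    """Number of distinct graph nodes matched to the designated node x (assumed measure)."""
    return len({h[x] for h in matches(p_labels, p_edges, g_labels, g_edges)})


# Toy graph (hypothetical data).
g_labels = {"u1": "cust", "u2": "cust",
            "Le Bernardin": "French restaurant", "Per se": "French restaurant",
            "New York": "city"}
g_edges = {("u1", "like", "Le Bernardin"), ("u1", "like", "Per se"),
           ("u2", "like", "Per se"),
           ("Le Bernardin", "in", "New York"), ("Per se", "in", "New York")}

# Pattern: a customer x likes a French restaurant r located in a city c.
p_labels = {"x": "cust", "r": "French restaurant", "c": "city"}
p_edges = {("x", "like", "r"), ("r", "in", "c")}

all_matches = list(matches(p_labels, p_edges, g_labels, g_edges))
print(len(all_matches))                                     # 3 isomorphic matches
print(support(p_labels, p_edges, "x", g_labels, g_edges))   # 2 distinct customers
```

The two numbers differ because u1 takes part in two matches, which is one reason a raw count of isomorphic subgraphs is an awkward support measure in a single graph.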

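Support and Confidence. Confidence of R(x, y) is defined under the local closed world assumption. Candidates: the number of nodes x that have an edge of type q but are not a match for q(x, y). Nodes in the example are marked “positive”, “negative” or “unknown”. [Figure: pattern with nodes x, x1, x2, Ecuador and Shakira album; graph with nodes v, v′, v″, v1, v2, Ecuador, Shakira album, MJ’s album and the hobby Pop music.]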
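
One way to read the three labels under the local closed world assumption is sketched below: a candidate with a q edge to a matching target is positive, a candidate with q edges only to other targets is negative, and a candidate with no q edge at all is unknown. This reading, the function name and the toy data are assumptions for illustration, not the slide's exact definition.

```python
def classify(v, q, y_matches, g_edges):
    """Label candidate v for predicate q under a local closed world reading.

    y_matches: graph nodes that match the designated node y.
    g_edges:   set of (source, edge_label, target) triples.
    """
    targets = {t for (s, label, t) in g_edges if s == v and label == q}
    if not targets:
        return "unknown"      # nothing is known about v's q edges
    if targets & y_matches:
        return "positive"     # q(v, v_y) holds for some match v_y of y
    return "negative"         # v has q edges, but none to a match of y


# Hypothetical data loosely following the album example.
g_edges = {("v1", "like", "Shakira album"),
           ("v2", "like", "MJ's album"),
           ("v3", "hobby", "Pop music")}
y_matches = {"Shakira album"}

for v in ("v1", "v2", "v3"):
    print(v, classify(v, "like", y_matches, g_edges))
```

Only negative candidates count against a rule in this reading; unknown candidates are left out because the graph says nothing about their q edges.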

Discover GPARs. The diversified mining problem: given a graph G, a predicate q(x, y), a support bound α, and positive integers k and d, find a set S of k nontrivial GPARs pertaining to q(x, y) such that (a) F(S) is maximized; and (b) for each GPAR R ∈ S, supp(R, G) ≥ α and r(P_R, x) ≤ d. Motivation: mining GPARs for a particular event often leads to the same group of entities. The objective F is a bi-criteria diversification function built on a difference function between GPARs. [Figure: graph with customers u1 to u6, the restaurants Le Bernardin, Per se and Patina, the cities New York and LA, and French and Asian restaurants; two diversified GPARs, one over French restaurants and one over Asian restaurants.]
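
The slide does not preserve the formulas for the difference function or for F, so the following is only a sketch of a typical bi-criteria objective. It assumes that relevance is the sum of rule confidences, that the difference between two rules is the Jaccard distance between the entity sets they identify, and that a hypothetical parameter lam balances the two terms. The exhaustive search is for illustration on tiny inputs.

```python
from itertools import combinations


def diff(a, b):
    """Jaccard distance between two entity sets (assumed difference function)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def objective(rules, lam):
    """Assumed form of F(S): (1 - lam) * total confidence
    + (2 * lam / (k - 1)) * total pairwise difference."""
    k = len(rules)
    relevance = sum(conf for conf, _ in rules.values())
    if k < 2:
        return (1 - lam) * relevance
    diversity = sum(diff(rules[r][1], rules[s][1])
                    for r, s in combinations(rules, 2))
    return (1 - lam) * relevance + (2 * lam / (k - 1)) * diversity


def best_set(candidates, k, lam):
    """Try every k-subset of candidate rules and return the names of the best one."""
    return max(combinations(sorted(candidates), k),
               key=lambda names: objective({n: candidates[n] for n in names}, lam))


# Hypothetical candidates: name -> (confidence, identified entities).
candidates = {
    "R1": (0.90, {"u1", "u2", "u3"}),
    "R2": (0.85, {"u1", "u2", "u3"}),   # same entities as R1
    "R3": (0.60, {"u4", "u5"}),         # lower confidence, different entities
}
print(best_set(candidates, k=2, lam=0.0))   # ('R1', 'R2'): confidence only
print(best_set(candidates, k=2, lam=0.5))   # ('R1', 'R3'): diversity rewarded
```

With lam set to 0 the two most confident rules win even though they identify the same customers; a positive lam swaps one of them for a rule that reaches a different group.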

Parallel GPAR Discovery. DMine is a parallel GPAR discovery algorithm. The coordinator S_c divides G into n-1 fragments, each assigned to a processor S_i. DMine discovers GPARs in parallel by bulk synchronous processing in d rounds. In each round, S_c posts a set M of GPARs to each processor; each processor generates GPARs locally by extending those in M; the new GPARs are collected and assembled by S_c in the barrier synchronization phase; and S_c incrementally updates the top-k GPAR set L_k. [Figure: a coordinator and workers 1 to n; step 1, distribute M_i; step 2, locally expand M_i; step 3, synchronize and update L_k = {R_1, ..., R_k}, producing the assembled new GPARs M_{i+1}.]

Identifying entities using GPARs. Given a set Σ of GPARs pertaining to the same q(x, y), a graph G and a confidence threshold η, the set of entities identified by Σ is the set [formula not preserved in the transcript].

Entity identification algorithm.

Experimental study. We used real-world social networks. Pokec: 1.63 million nodes of 269 different types, and 30.6 million edges of 11 types (e.g., follow, like). Google+: 4 million entities of 5 types and 53.5 million links of 5 types. Algorithms: DMine vs. DMine without optimization and GRAMI*; Match_c vs. disVF2, Match and Match_S.
* M. Elseidy, E. Abdelhamid, S. Skiadopoulos, and P. Kalnis. GRAMI: frequent subgraph and pattern mining in a single large graph. PVLDB, 7(7):517–528, 2014.

Scalability of discovery algorithm. On average 3.2 times faster.

Scalability of entity identification. On average 3.53 times faster. Finds matches for 24 GPARs over G of size 150M in 45s, using 20 processors.

Conclusion and future work. Association rules in social and information networks are more involved than rules over itemsets. We propose association rules with graph patterns (GPARs), from syntax and semantics to support and confidence. We studied the DMP and EIP problems, for discovering GPARs and for identifying potential customers with GPARs. We show that it is feasible to discover and make practical use of GPARs over large networks. Future work: extend GPARs by supporting graph patterns as consequents; allow more types of graph pattern matching semantics.

Thank you! Association Rules with Graph Patterns.

GPARs discovery examples. “If x follows user1, user1 follows user2, user2 follows x, user1 and user2 share the hobby of listening to music, x and user1 share the hobby of partying, and if user2 likes Disco music, then x likes Disco.” “If x follows x′, both x and x′ went to CMU, both x and x′ are employees of Microsoft, and if x′ majored in CS, then x likely also majored in CS.” “If x and x′ follow each other and both like books on professional development, and if x′ likes books about personal development, then so does x.” [Figure: three patterns, with nodes usr, usr1, usr2, listen to music, party and Disco; x, x′, CMU and Computer science; usr, usr2, Professional development and personal development.]

GPARs with different confidence measures.

Pattern matching semantics. We use subgraph patterns whose matches are defined by subgraph isomorphism. Match Q(G): the set of all subgraphs of G isomorphic to Q. Q(x, G): the matches of x, in the example {u1, u2, u3}. [Figure: graph with customers u1, u2, u3, the French restaurants Le Bernardin and Per se, and New York (city), next to the pattern with nodes x, x′, city and French restaurant (3 copies).]

Discovery Algorithm. Key components:
Local generation of GPARs: expand an existing GPAR by adding new (frequent) edges; construct local support and confidence as messages.
Message assembling: assemble the global support and confidence for the same GPAR R; retain R with high support in a cached set of new GPARs.
Incremental maintenance of L_k: select two GPARs R_i, R_j that maximize a revised objective function; this simulates the max-dispersion problem.
Optimization: filter candidate GPARs using an estimated confidence upper bound; check automorphism via bisimulation as preprocessing.
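
The message-assembling component can be sketched as follows. The sketch assumes each worker reports, per GPAR, the set of local nodes it matched to x, and that the coordinator takes the union before applying the support bound; the message format, the function name and the value of alpha are hypothetical.

```python
from collections import defaultdict


def assemble(messages, alpha):
    """Merge per-worker matches of x for each GPAR and keep rules whose
    global support reaches alpha. Returns {rule: global_support}."""
    matched = defaultdict(set)
    for message in messages:                  # one message per worker
        for rule, local_x in message.items():
            matched[rule] |= local_x          # union, so shared nodes count once
    return {rule: len(xs) for rule, xs in matched.items() if len(xs) >= alpha}


# Hypothetical messages from three workers.
messages = [
    {"R1": {"u1", "u2"}, "R2": {"u1"}},
    {"R1": {"u2", "u3"}},
    {"R2": set()},
]
print(assemble(messages, alpha=2))   # {'R1': 3}
```

Taking a union rather than a sum keeps the count correct when a node near a fragment border is reported by more than one worker.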