The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Mining Complex Data COMP 790-90 Seminar Spring 2011.

Slides:



Advertisements
Similar presentations
Graph Algorithms Algorithm Design and Analysis Victor AdamchikCS Spring 2014 Lecture 11Feb 07, 2014Carnegie Mellon University.
Advertisements

Lecture 15. Graph Algorithms
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Graph Mining Laks V.S. Lakshmanan
 Data mining has emerged as a critical tool for knowledge discovery in large data sets. It has been extensively used to analyze business, financial,
Mining Frequent Patterns II: Mining Sequential & Navigational Patterns Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
gSpan: Graph-based substructure pattern mining
Analysis of Algorithms Depth First Search. Graph A representation of set of objects Pairs of objects are connected Interconnected objects are called “vertices.
22C:19 Discrete Structures Trees Spring 2014 Sukumar Ghosh.
©Brooks/Cole, 2003 Chapter 12 Abstract Data Type.
Frequent Structure Mining Prajwal Shrestha Department of Computer Science The University of Vermont Spring 2015.
Mining Graphs.
Association Analysis (7) (Mining Graphs)
1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.
Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.
1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 9 Instructor: Paul Beame.
COMP171 Depth-First Search.
Course Review COMP171 Spring Hashing / Slide 2 Elementary Data Structures * Linked lists n Types: singular, doubly, circular n Operations: insert,
Efficient Data Mining for Path Traversal Patterns CS401 Paper Presentation Chaoqiang chen Guang Xu.
CSE 321 Discrete Structures Winter 2008 Lecture 25 Graph Theory.
1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 10 Instructor: Paul Beame.
Huffman Coding Draw the tree, label all nodes w/ index, leaves w/ letter and all links(0 or 1) What is the most frequent letter? What is the least frequent.
CISC220 Fall 2009 James Atlas Nov 13: Graphs, Line Intersections.
Elementary graph algorithms Chapter 22
1 Structures and Strategies for State Space Search 3 3.0Introduction 3.1Graph Theory 3.2Strategies for State Space Search 3.3Using the State Space to Represent.
Mining Graphs with Constrains on Symmetry and Diameter Natalia Vanetik Deutsche Telecom Laboratories at Ben-Gurion University IWGD10 workshop July 14th,
1 Efficiently Mining Frequent Trees in a Forest Mohammed J. Zaki.
Review of Graphs A graph is composed of edges E and vertices V that link the nodes together. A graph G is often denoted G=(V,E) where V is the set of vertices.
Important Problem Types and Fundamental Data Structures
FAST FREQUENT FREE TREE MINING IN GRAPH DATABASES Marko Lazić 3335/2011 Department of Computer Engineering and Computer Science,
GRAPHS CSE, POSTECH. Chapter 16 covers the following topics Graph terminology: vertex, edge, adjacent, incident, degree, cycle, path, connected component,
“On an Algorithm of Zemlyachenko for Subtree Isomorphism” Yefim Dinitz, Alon Itai, Michael Rodeh (1998) Presented by: Masha Igra, Merav Bukra.
Representing and Using Graphs
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 5 Graph Algorithms.
Graphs. What is a graph? A data structure that consists of a set of nodes (vertices) and a set of edges that relate the nodes to each other The set of.
Frequent Structure Mining Presented By: Ahmed R. Nabhan Computer Science Department University of Vermont Fall 2011.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar.
1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.
Graph Theory and Applications
Graphs A graphs is an abstract representation of a set of objects, called vertices or nodes, where some pairs of the objects are connected by links, called.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Ver Chapter 13: Graphs Data Abstraction & Problem Solving with C++
GRAPHS. Graph Graph terminology: vertex, edge, adjacent, incident, degree, cycle, path, connected component, spanning tree Types of graphs: undirected,
© 2006 Pearson Addison-Wesley. All rights reserved 14 A-1 Chapter 14 Graphs.
Trees Dr. Yasir Ali. A graph is called a tree if, and only if, it is circuit-free and connected. A graph is called a forest if, and only if, it is circuit-free.
Association Analysis (3)
2004/12/31 報告人 : 邱紹禎 1 Mining Frequent Query Patterns from XML Queries L.H. Yang, M.L. Lee, W. Hsu, and S. Acharya. Proc. of 8th Int. Conf. on Database.
Frequent Structure Mining Robert Howe University of Vermont Spring 2014.
Graph Representations And Traversals. Graphs Graph : – Set of Vertices (Nodes) – Set of Edges connecting vertices (u, v) : edge connecting Origin: u Destination:
Indexing and Mining Free Trees Yun Chi, Yirong Yang, Richard R. Muntz Department of Computer Science University of California, Los Angeles, CA {
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
CSC 213 – Large Scale Programming Lecture 31: Graph Traversals.
Ning Jin, Wei Wang ICDE 2011 LTS: Discriminative Subgraph Mining by Learning from Search History.
Gspan: Graph-based Substructure Pattern Mining
Code: BCA302 Data Structures with C Prof. (Dr.) Monalisa Banerjee By.
Graphs ORD SFO LAX DFW Graphs 1 Graphs Graphs
Data Structures & Algorithm Analysis lec(8):Graph T. Souad alonazi
Mining in Graphs and Complex Structures
Mining Frequent Subgraphs
Mining Complex Data COMP Seminar Spring 2011.
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
FP-Growth Wenlong Zhang.
Mining Frequent Subgraphs
Graphs G = (V, E) V are the vertices; E are the edges.
Graph Algorithms DS.GR.1 Chapter 9 Overview Representation
Minimum Spanning Trees (MSTs)
Important Problem Types and Fundamental Data Structures
Finding Frequent Itemsets by Transaction Mapping
GRAPHS.
Presentation transcript:

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Mining Complex Data COMP Seminar Spring 2011

COMP Data Mining: Concepts, Algorithms, and Applications 2 Mining Complex Patterns  Common Pattern Mining Tasks:  Itemsets (transactional, unordered data)  Sequences (temporal/positional: text, bioseqs)  Tree patterns (semi-structured/XML data, web mining)  Graph patterns (protein structure, web data, social network)

COMP Data Mining: Concepts, Algorithms, and Applications 3 Example Pattern Types ADCBADCB ItemsetSequence A D CB Tree A D CB Graph Can add attributes To nodes To edges Attributes Labels Type (directed or undirected) Set-valued

COMP Data Mining: Concepts, Algorithms, and Applications 4 Induced vs Embedded Sub-trees Induced Sub-trees: S = (V s, E s ) is a sub-tree of T = (V,E) if and only if V s ⊆ V e = (n x, n y ) ∊ E s iff (n x, n y ) ∊ E (n x directly connected to n y ) Embedded Sub-trees: S = (V s, E s ) is a sub-tree of T = (V,E) if and only if V s ⊆ V e = (n x, n y ) ∊ E s iff n x ≤ l n y in T (n x connected to n y ) An induced sub-tree is a special case of embedded sub- tree. We say S occurs in T and T contains S if S is an embedded sub-tree of T If S has k nodes, we call it a k-sub-tree

COMP Data Mining: Concepts, Algorithms, and Applications 5 Mining Frequent Trees Support: the support of a subtree in a database of trees, is the number of trees containing the subtree. A subtree is frequent if its support is at least the minimum support. TreeMiner: Given a database of trees (a forest) and a minimum support, find all frequent subtrees.

COMP Data Mining: Concepts, Algorithms, and Applications 6 String Representation of Trees With N nodes, M branches, F max fanout Adjacency Matrix requires: N(F+1) space Adjacency List requires: 4N-2 space Tree requires (node, child, sibling): 3N space String representation requires: 2N-1 space n0 n1 n2 n3 n4 n5 n6

COMP Data Mining: Concepts, Algorithms, and Applications 7 Tree: String Representation Like an itemset -1 as the backtrack item Assuming only labels on nodes For trees labels on edges can be treated as labels on nodes: edge-label+node-label = new label!

COMP Data Mining: Concepts, Algorithms, and Applications 8 Match labels n0 n1 n5n2 n4 n3 n6 [0,6] [1,5] [2,4] [3,3] [4,4] [5,5] [6,6] Match Label: Support: 1 Tree Subtree vector

COMP Data Mining: Concepts, Algorithms, and Applications 9 An example

COMP Data Mining: Concepts, Algorithms, and Applications 10 Generic Mining Algorithms Horizontal pattern matching based Vertical intersection based BFS or DFS

COMP Data Mining: Concepts, Algorithms, and Applications 11 Candidate Generation & Support Counting Candidate Generation Extend by a node or an edge Avoid duplicates as far as possible

COMP Data Mining: Concepts, Algorithms, and Applications 12 Trees: Systematic Candidate Generation Two subtrees are in the same class iff they share a common prefix string P up to the (k-1)th node A valid element x attached to only the nodes lying on the path from root to rightmost leaf in prefix P Not valid position: Prefix x

COMP Data Mining: Concepts, Algorithms, and Applications 13 Candidate generation Given an equivalence class of k-subtrees, how do we generate candidate (k+1)- subtrees? Main idea: consider each ordered pair of elements in the class for extension, including self extension Sort elements by node label and position

COMP Data Mining: Concepts, Algorithms, and Applications 14 Class extension

COMP Data Mining: Concepts, Algorithms, and Applications 15 Candidate Generation (Join operator) Equivalence Class Prefix: 1 2, Elements: (3,1) (4,0) Self Join New Candidates Join New Equivalence Class Prefix: Elements: (3,1) (3,2) (4,0)

COMP Data Mining: Concepts, Algorithms, and Applications 16 Candidate Generation (Join operator) Equivalence Class Prefix: 1 2, Elements: (3,1) (4,0) Join Self Join New Equivalence Class Prefix: Elements: (4,0) (4,1)

COMP Data Mining: Concepts, Algorithms, and Applications 17 Candidate Generation (Join operator) Equivalence Class Prefix: 1 2, Elements: (3,1) (4,1) Self Join New Equivalence Class Prefix: Elements: (3,1) (3,2) (4,1) (4,2) Join New Candidates

COMP Data Mining: Concepts, Algorithms, and Applications 18 Candidate Generation (Join operator) Equivalence Class Prefix: 1 2, Elements: (3,1) (4,1) Join New Equivalence Class Prefix: Elements: (3,1) (3,2) (4,1) (4,2) SelfJoin New Candidates

COMP Data Mining: Concepts, Algorithms, and Applications 19 Apriori Style TreeMiner