Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk PAKDD 2008 A Tree-based Approach for Frequent Pattern Mining from Uncertain Data.

Presentation transcript:

Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk PAKDD 2008 A Tree-based Approach for Frequent Pattern Mining from Uncertain Data

Outline: Motivation; UF-Growth algorithm (construction of the UF-tree; mining of frequent patterns from the UF-tree); Improvements to the UF-Growth algorithm; Experimental results; Conclusion

Motivation: Over the past decade, there have been numerous studies on mining frequent patterns from precise data. However, there are situations in which users are uncertain about the presence or absence of some items and can only express a suspicion that an item is present.

UF-Growth Algorithm: The algorithm consists of two operations: (1) the construction of the UF-tree and (2) the mining of frequent patterns from the UF-tree.

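Construction of the UF-Tree: With minsup = 1, the DB is scanned once to accumulate the expected support of each item (a: 2.7; the values for b, c, d, and e are not preserved in this transcript) and scanned a second time to build the UF-tree. [Figure: example DB and the resulting UF-tree]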
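The two scans can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the names `UFNode`, `expected_supports`, `build_uf_tree`, and `SAMPLE_DB` are hypothetical, and the sample transactions are illustrative values chosen so that item a has expected support 2.7 as on the slide. The sketch assumes that a node is shared only when both the item and its probability match, and that items are ordered by descending expected support.

```python
from collections import defaultdict

# Illustrative data only (an assumption, not the slide's full example DB).
SAMPLE_DB = [
    {"a": 0.9, "d": 0.72, "e": 0.71875, "f": 0.8},
    {"a": 0.9, "d": 0.71875, "e": 0.72},
    {"a": 0.9, "d": 0.72, "e": 0.71875},
]


class UFNode:
    """Tree node keyed by (item, probability); count = transactions through it."""

    def __init__(self, item=None, prob=None, parent=None):
        self.item = item
        self.prob = prob
        self.count = 0
        self.parent = parent
        self.children = {}  # (item, prob) -> UFNode


def expected_supports(db):
    """First scan: sum each item's probability over all transactions."""
    totals = defaultdict(float)
    for txn in db:
        for item, prob in txn.items():
            totals[item] += prob
    return dict(totals)


def build_uf_tree(db, minsup):
    """Second scan: insert each transaction, keeping only frequent items."""
    totals = expected_supports(db)
    frequent = [i for i, s in totals.items() if s >= minsup]
    # Assumed ordering: descending expected support, ties broken by item name.
    frequent.sort(key=lambda i: (-totals[i], i))
    position = {item: k for k, item in enumerate(frequent)}

    root = UFNode()
    for txn in db:
        items = sorted((i for i in txn if i in position), key=position.get)
        node = root
        for item in items:
            key = (item, txn[item])
            child = node.children.get(key)
            if child is None:
                child = UFNode(item, txn[item], node)
                node.children[key] = child
            child.count += 1  # one more transaction passes through this node
            node = child
    return root, totals


root, totals = build_uf_tree(SAMPLE_DB, minsup=1.0)
```

In this sketch item f (expected support 0.8) falls below minsup and never enters the tree, while the three transactions share the single node a:0.9 with count 3.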

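Mining of Frequent Patterns from the UF-Tree: In the {e}-projected DB,
expSup({a,e}) = (1*0.72*0.9) + (2*0.71875*0.9) = 1.94175
expSup({d,e}) = (1*0.72*0.71875) + (2*0.71875*0.72) = 1.5525
Both values are at least minsup = 1, so {a,e} and {d,e} are frequent. [Figure: {e}-projected DB]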

(Cont.) In the {d,e}-projected DB, each path contributes expSup({d,e}) = 0.71875*0.72 = 0.5175, so
expSup({a,d,e}) = 3*0.5175*0.9 = 1.39725
The frequent patterns found are {a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, and {e}. [Figures: {e}-projected DB; {d,e}-projected DB]
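The expected supports on these slides can be checked directly against their definition: the expected support of a pattern is the sum, over the transactions that contain it, of the product of its items' probabilities. The sketch below is a brute-force check rather than the tree-based mining itself. The function name `exp_sup` and the three transactions are assumptions, chosen to match the probabilities that appear in the slide's equations.

```python
from math import prod

# Illustrative transactions (an assumption) consistent with the slide's numbers:
# one path with e:0.72 and two paths with e:0.71875.
db = [
    {"a": 0.9, "d": 0.71875, "e": 0.72},
    {"a": 0.9, "d": 0.72, "e": 0.71875},
    {"a": 0.9, "d": 0.72, "e": 0.71875},
]


def exp_sup(pattern, db):
    """Sum over transactions of the product of the pattern's item probabilities."""
    return sum(
        prod(txn[item] for item in pattern)
        for txn in db
        if all(item in txn for item in pattern)
    )


ae = exp_sup({"a", "e"}, db)        # 1*0.72*0.9 + 2*0.71875*0.9
de = exp_sup({"d", "e"}, db)        # 1*0.72*0.71875 + 2*0.71875*0.72
ade = exp_sup({"a", "d", "e"}, db)  # 3*0.5175*0.9
```

Up to floating-point rounding, the three values agree with the hand calculations: about 1.94175, 1.5525, and 1.39725.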

Improvements to the UF-Growth Algorithm: The UF-tree above may appear to require a large amount of memory. Improvement 1: To increase the chance of path sharing, we discretize the expected support of each tree node by rounding it to k decimal places.
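The effect of Improvement 1 on tree size can be illustrated with a small sketch. It assumes that nodes are shared only when both the item and its (possibly rounded) probability match, and it uses Python's built-in `round` as a stand-in for the discretization step, which the slide does not specify further. The function name `count_tree_nodes`, the item order, and the sample transactions are hypothetical.

```python
# Illustrative transactions (an assumption, not the slide's full example DB).
db = [
    {"a": 0.9, "d": 0.71875, "e": 0.72},
    {"a": 0.9, "d": 0.72, "e": 0.71875},
    {"a": 0.9, "d": 0.72, "e": 0.71875},
]


def count_tree_nodes(db, order, k=None):
    """Count nodes of a prefix tree whose node keys are (item, probability).

    Each distinct prefix of (item, probability) pairs corresponds to exactly
    one tree node, so counting distinct prefixes counts the nodes.
    If k is given, probabilities are rounded to k decimal places first.
    """
    prefixes = set()
    for txn in db:
        path = []
        for item in order:
            if item in txn:
                prob = txn[item] if k is None else round(txn[item], k)
                path.append((item, prob))
                prefixes.add(tuple(path))
    return len(prefixes)


nodes_exact = count_tree_nodes(db, order=["a", "d", "e"])
nodes_rounded = count_tree_nodes(db, order=["a", "d", "e"], k=2)
```

With k = 2, 0.71875 becomes 0.72, so the three transactions collapse into a single path and the node count drops from 5 to 3. The rounded probabilities are slightly larger than the originals here, which is one way an expected support can end up overestimated.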

(Cont.) Improvement 2: The improved UF-growth does not need to build subsequent UF-trees for any non-singleton patterns. Instead, for each path of the {e}-projected DB, it enumerates all subsets of the path. The first path yields {a,e}, {a,d,e}, and {d,e} with expected supports 0.648, 0.46575, and 0.5175 so far. After the remaining paths are processed, the accumulated expected supports of {a,e}, {a,d,e}, and {d,e} are 1.94175, 1.39725, and 1.5525. [Figure: {e}-projected DB]
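The per-path enumeration can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `accumulate_from_paths` and the representation of the {e}-projected DB as (prefix, suffix probability, count) triples are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

# Assumed representation of the {e}-projected DB: each entry is
# (prefix items with probabilities, probability of e on that path, path count).
e_projected = [
    ({"a": 0.9, "d": 0.71875}, 0.72, 1),
    ({"a": 0.9, "d": 0.72}, 0.71875, 2),
]


def accumulate_from_paths(paths, suffix):
    """Accumulate expected supports of every pattern extending `suffix`.

    For each path, enumerate all non-empty subsets of its prefix items and add
    count * P(suffix) * product of the subset's probabilities to the total of
    the pattern (subset + suffix). No further projected trees are built.
    """
    totals = defaultdict(float)
    for prefix, suffix_prob, count in paths:
        items = sorted(prefix)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                pattern = frozenset(subset) | {suffix}
                totals[pattern] += (
                    count * suffix_prob * prod(prefix[i] for i in subset)
                )
    return dict(totals)


totals = accumulate_from_paths(e_projected, "e")
frequent = {p: s for p, s in totals.items() if s >= 1.0}  # minsup = 1
```

After the first path alone the running totals are 0.648, 0.46575, and 0.5175; after both paths they reach roughly 1.94175, 1.39725, and 1.5525, so all three patterns pass minsup = 1.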

Experimental Results

(Cont.)

Conclusion: Improvement 1 (rounding the expected supports) may cause false positives.