1
New York University EDBT’98
Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set
Dao-I Lin and Zvi M. Kedem
Department of Computer Science, Courant Institute of Mathematical Sciences, New York University
2
Overview
• The importance of the maximum frequent set
• Structural properties
• Traditional one-way search algorithms
• Pincer-Search algorithm
• Experiments on synthetic and census databases
• Conclusions
3
Setting
• Basic terms:
  - {1, 2, …, n}: the set of all items
  - Transaction: a set of items
  - Database: a set of transactions
  - User-defined threshold (supp_min): a number in [0,1]
  - Frequent itemset: a combination of items (an itemset) occurring in at least a supp_min fraction of the transactions in the database
• Maximum frequent set
  - Maximal frequent itemset: a frequent itemset with no frequent proper superset
  - An itemset is frequent if and only if it is a subset of a maximal frequent itemset
  - Maximum frequent set: the set of all maximal frequent itemsets
• Discovering the maximum frequent set is a key problem in many data mining applications
  - Association rules, strong rules, episodes, and minimal keys
4
An Example
Database:
  Transaction  Itemset
  1            {1,2,3,5}
  2            {1,5}
  3            {1,2}
  4            {1,2,3}
• Set supp_min to 0.5
• Frequent itemsets are {1}, {2}, {3}, {5}, {1,2}, {1,3}, {1,5}, {2,3}, and {1,2,3}, since each occurs in at least 2 out of 4 transactions
• Maximum frequent set is {{1,2,3},{1,5}}
[Figure: itemset lattice, top to bottom]
  {1,2,3,4,5}
  {1,2,3}
  {1,2} {1,3} {2,3} {1,5}
  {1} {2} {3} {4} {5}
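The example can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every itemset over the items that occur in the database and filters by support. The function names `support`, `frequent_itemsets`, and `maximum_frequent_set` are hypothetical, chosen for illustration only.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, supp_min):
    """Enumerate all non-empty itemsets and keep the frequent ones (brute force)."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(c, transactions) >= supp_min
    ]

def maximum_frequent_set(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [f for f in frequent if not any(f < g for g in frequent)]

# The four transactions of the example database.
db = [frozenset(t) for t in ({1, 2, 3, 5}, {1, 5}, {1, 2}, {1, 2, 3})]
freq = frequent_itemsets(db, 0.5)
mfs = maximum_frequent_set(freq)
print(len(freq))                        # 9
print(sorted(sorted(s) for s in mfs))   # [[1, 2, 3], [1, 5]]
```

Brute-force enumeration is exponential in the number of items, so it is only useful for verifying small examples like this one.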
5
An Example
Database:
  Transaction  Itemset
  1            {1,2,3,4,5}
  2            {1,3}
  3            {1,2}
  4            {1,2,3,4}
• Set supp_min to 0.5
• Frequent itemsets are {1}, {2}, {3}, {4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, and {1,2,3,4}, since each occurs in at least 2 out of 4 transactions
• Maximum frequent set is {{1,2,3,4}}
[Figure: itemset lattice, top to bottom]
  {1,2,3,4,5}
  {1,2,3,4}
  {1,2,3} {1,2,4} {1,3,4} {2,3,4}
  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
  {1} {2} {3} {4} {5}
Legend: blue = frequent itemsets, red = maximal frequent itemsets, black = infrequent itemsets
7
Two Observations
• Let A and B be two itemsets with A ⊆ B
• Observation-1: A infrequent ⇒ B infrequent (if a transaction does not contain A, it cannot contain B)
• Observation-2: B frequent ⇒ A frequent (if a transaction contains B, it must contain A)
[Figure: two lattice fragments. One shows A = {5} at the bottom with its supersets above it: {1,5}, {2,5}, {3,5}, {4,5}, the three-item sets containing 5, and the four-item sets containing 5. The other shows B = {1,2,3,4} at the top with its subsets below it.]
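Both observations can be checked numerically. The sketch below is an illustration rather than part of the slides. It assumes the four-transaction database {1,2,3,4,5}, {1,3}, {1,2}, {1,2,3,4} with supp_min = 0.5, and takes A = {5} and B = {1,2,3,4} as in the figure. The helper name `support` is hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Assumed example database over items 1..5.
db = [frozenset(t) for t in ({1, 2, 3, 4, 5}, {1, 3}, {1, 2}, {1, 2, 3, 4})]
supp_min = 0.5
itemsets = [frozenset(c) for k in range(1, 6) for c in combinations(range(1, 6), k)]

A = frozenset({5})           # infrequent: contained in 1 of 4 transactions
B = frozenset({1, 2, 3, 4})  # frequent: contained in 2 of 4 transactions

# Observation-1: every superset of the infrequent itemset A is infrequent.
supersets_of_A = [s for s in itemsets if A <= s]
obs1_holds = all(support(s, db) < supp_min for s in supersets_of_A)

# Observation-2: every non-empty subset of the frequent itemset B is frequent.
subsets_of_B = [s for s in itemsets if s <= B]
obs2_holds = all(support(s, db) >= supp_min for s in subsets_of_B)

print(obs1_holds, len(supersets_of_A))  # True 16
print(obs2_holds, len(subsets_of_B))    # True 15
```

In this database the 16 supersets of A and the 15 non-empty subsets of B together account for all 31 non-empty itemsets, so counting just two itemsets is enough to classify the whole lattice.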
8
Computing the Maximum Frequent Set
• Observation-1 leads to bottom-up search algorithms, such as AIS (AIS93), Apriori (AS94), OCD (MTV94), SETM (HS95), DHP (PCY95), Partition (SON95), ML-T2+ (HF95), Sampling (T96), DIC (BMUT97), Clique (ZPOL97)
• Observation-2 leads to top-down search algorithms, such as TopDown (ZPOL97), guess-and-correct (MT97)
[Figure: itemset lattice, top to bottom]
  {1,2,3,4,5}
  {1,2,3,4} {1,2,3,5} {1,2,4,5} {1,3,4,5} {2,3,4,5}
  {1,2,3} {1,2,4} {1,3,4} {2,3,4} {1,2,5} {1,3,5} {1,4,5} {2,3,5} {2,4,5} {3,4,5}
  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4} {1,5} {2,5} {3,5} {4,5}
  {1} {2} {3} {4} {5}
Legend: blue = frequent itemsets, red = maximal frequent itemsets, black = infrequent itemsets
9
Complexity of One-Way Search
• For bottom-up search, every frequent itemset is explicitly examined (in the example, until {1,2,3,4} is examined)
• For top-down search, every infrequent itemset is explicitly examined (in the example, until {5} is examined)
[Figure: itemset lattice, top to bottom]
  {1,2,3,4,5}
  {1,2,3,4} {1,2,3,5} {1,2,4,5} {1,3,4,5} {2,3,4,5}
  {1,2,3} {1,2,4} {1,3,4} {2,3,4} {1,2,5} {1,3,5} {1,4,5} {2,3,5} {2,4,5} {3,4,5}
  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4} {1,5} {2,5} {3,5} {4,5}
  {1} {2} {3} {4} {5}
Legend: blue = frequent itemsets, red = maximal frequent itemsets, black = infrequent itemsets
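The cost of the two one-way strategies can be counted on a small example. The sketch below is an illustration under stated assumptions: it uses the four-transaction database {1,2,3,4,5}, {1,3}, {1,2}, {1,2,3,4} with supp_min = 0.5. `bottom_up` is a plain level-wise (Apriori-style) search, and `top_down` is a simple descent from the full itemset. Neither is the exact procedure of any algorithm cited on the slides, and the function names are hypothetical.

```python
from itertools import combinations

def is_frequent(itemset, transactions, supp_min):
    """True if `itemset` occurs in at least a supp_min fraction of transactions."""
    return sum(itemset <= t for t in transactions) >= supp_min * len(transactions)

def bottom_up(transactions, supp_min):
    """Level-wise search; returns (frequent itemsets, number of itemsets counted)."""
    items = sorted(frozenset().union(*transactions))
    candidates = {frozenset([i]) for i in items}
    frequent, examined, k = set(), 0, 1
    while candidates:
        examined += len(candidates)
        level = {c for c in candidates if is_frequent(c, transactions, supp_min)}
        frequent |= level
        k += 1
        # Join two frequent (k-1)-itemsets, then prune by Observation-1.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent, examined

def top_down(transactions, supp_min):
    """Descend from the full itemset; returns (maximal frequent itemsets, number counted)."""
    level = {frozenset().union(*transactions)}
    maximal, examined = set(), 0
    while level:
        examined += len(level)
        infrequent = {c for c in level if not is_frequent(c, transactions, supp_min)}
        maximal |= level - infrequent
        # Descend only below infrequent itemsets, and skip subsets of known
        # frequent itemsets, which are frequent by Observation-2.
        children = {c - {i} for c in infrequent if len(c) > 1 for i in c}
        level = {c for c in children if not any(c <= m for m in maximal)}
    return maximal, examined

db = [frozenset(t) for t in ({1, 2, 3, 4, 5}, {1, 3}, {1, 2}, {1, 2, 3, 4})]
frequent, n_up = bottom_up(db, 0.5)
maximal, n_down = top_down(db, 0.5)
print(len(frequent), n_up)  # 15 16
print(n_down)               # 17
```

On this database the bottom-up search counts all 15 frequent itemsets plus {5}, and the top-down search counts all 16 infrequent itemsets plus {1,2,3,4}.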
10
Pincer-Search: Combining Top-down and Bottom-up Searches
• Use Observation-1 to eliminate candidates in the top-down search
• Use Observation-2 to eliminate candidates in the bottom-up search
• This example shows how combining both searches could dramatically reduce the number of candidates examined and the number of passes over the database
[Figure: itemset lattice, top to bottom]
  {1,2,3,4,5}
  {1,2,3,4} {1,2,3,5} {1,2,4,5} {1,3,4,5} {2,3,4,5}
  {1,2,3} {1,2,4} {1,3,4} {2,3,4} {1,2,5} {1,3,5} {1,4,5} {2,3,5} {2,4,5} {3,4,5}
  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4} {1,5} {2,5} {3,5} {4,5}
  {1} {2} {3} {4} {5}
Legend: blue = frequent itemsets, red = maximal frequent itemsets, black = infrequent itemsets, green = itemsets not examined
11
MFCS: A New Data Structure
• For bottom-up search: candidate set (as usual)
• For top-down search: a new, dynamically maintained data structure, the maximum frequent candidate set (MFCS)
• MFCS is a set of itemsets such that:
  - Every known frequent itemset is a subset of some element of MFCS
  - No currently known infrequent itemset is a subset of any element of MFCS
  - It is of minimum cardinality
• MFCS supports efficient coordination between bottom-up and top-down searches
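One way to maintain these three properties is to split every MFCS element that contains a newly found infrequent itemset. The sketch below is an assumption about how such an update could look, not code from the slides. The function name `update_mfcs` is hypothetical, and the input uses the infrequent itemsets {2,5}, {3,5}, and {4,5} over items 1 to 5.

```python
def update_mfcs(mfcs, infrequent):
    """Return a new MFCS in which no element contains any itemset in `infrequent`.

    Each element m that contains an infrequent itemset s is replaced by the
    sets m - {e} for e in s. A replacement is kept only if it is not already
    covered by another element, which keeps the collection small.
    """
    mfcs = set(mfcs)
    for s in infrequent:
        for m in [x for x in mfcs if s <= x]:
            mfcs.remove(m)
            for e in s:
                candidate = m - {e}
                if candidate and not any(candidate <= other for other in mfcs):
                    mfcs.add(candidate)
    return mfcs

start = {frozenset({1, 2, 3, 4, 5})}
newly_infrequent = [frozenset(s) for s in ({2, 5}, {3, 5}, {4, 5})]
result = update_mfcs(start, newly_infrequent)
print(sorted(sorted(m) for m in result))  # [[1, 2, 3, 4], [1, 5]]
```

Splitting by {2,5} yields {1,3,4,5} and {1,2,3,4}. Splitting {1,3,4,5} by {3,5} yields {1,4,5}, while {1,3,4} is dropped because {1,2,3,4} already covers it. Splitting {1,4,5} by {4,5} yields {1,5}.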
12
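Pincer-Search: Search Path
[Figure: search path through the itemset lattice. The top-down side shows {1,2,3,4,5}, then {1,2,3,4} and {1,3,4,5}, then {1,3,4} and {1,4,5}, with the splits labeled "By {2,5}", "By {3,5}", and "By {4,5}". The bottom-up side shows the 1-itemsets {1}, {2}, {3}, {4}, {5} and the 2-itemsets {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, {1,5}, {2,5}, {3,5}, {4,5}.]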
13
Pincer-Search Algorithm
L_0 := ∅; k := 1; C_1 := {{i} | i ∈ {1, 2, ..., n}}
MFCS := {{1, 2, ..., n}}; MFS := ∅
while C_k ≠ ∅
    read database and count supports for C_k and MFCS
    MFS := MFS ∪ {frequent itemsets in MFCS}
    determine frequent set L_k and infrequent set S_k
    use S_k to update MFCS
    generate new candidate set C_{k+1} (join, recover, and prune)
    k := k + 1
return MFS
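The loop above can be sketched in Python. This is a simplified illustration under two loudly stated assumptions, not the authors' implementation. First, candidate generation does not use the join, recover, and prune procedures: it enumerates the (k+1)-subsets of MFCS elements directly. Second, the loop also continues while MFCS still holds an element not yet known to be frequent, so that every remaining MFCS element is counted at least once. All function names are hypothetical.

```python
from itertools import combinations

def is_frequent(itemset, transactions, supp_min):
    return sum(itemset <= t for t in transactions) >= supp_min * len(transactions)

def covered(itemset, family):
    """True if `itemset` is a subset of some member of `family`."""
    return any(itemset <= m for m in family)

def update_mfcs(mfcs, infrequent):
    """Split every MFCS element that contains a newly found infrequent itemset."""
    mfcs = set(mfcs)
    for s in infrequent:
        for m in [x for x in mfcs if s <= x]:
            mfcs.remove(m)
            for e in s:
                candidate = m - {e}
                if candidate and not covered(candidate, mfcs):
                    mfcs.add(candidate)
    return mfcs

def pincer_search(transactions, supp_min):
    """Return (maximum frequent set, number of database passes)."""
    items = frozenset().union(*transactions)
    candidates = {frozenset([i]) for i in items}  # C_1
    mfcs, mfs = {items}, set()
    k, passes = 1, 0
    # Assumption: keep going while MFCS has elements not yet known to be frequent.
    while candidates or mfcs - mfs:
        passes += 1  # one pass counts both C_k and the open MFCS elements
        mfs |= {m for m in mfcs - mfs if is_frequent(m, transactions, supp_min)}
        frequent_k = {c for c in candidates if is_frequent(c, transactions, supp_min)}
        mfcs = update_mfcs(mfcs, candidates - frequent_k)  # S_k updates MFCS
        k += 1
        # Simplified C_k: k-subsets of MFCS elements that are not already
        # covered by MFS and whose (k-1)-subsets are all known to be frequent.
        candidates = {
            c
            for m in mfcs
            for c in map(frozenset, combinations(sorted(m), k))
            if not covered(c, mfs)
            and all(frozenset(s) in frequent_k or covered(frozenset(s), mfs)
                    for s in combinations(c, k - 1))
        }
    return mfs, passes

db = [frozenset(t) for t in ({1, 2, 3, 4, 5}, {1, 3}, {1, 2}, {1, 2, 3, 4})]
mfs, passes = pincer_search(db, 0.5)
print(sorted(sorted(m) for m in mfs), passes)  # [[1, 2, 3, 4]] 2
```

On this database, the first pass finds {5} infrequent and shrinks MFCS to {{1,2,3,4}}. The second pass finds {1,2,3,4} frequent, so no three-item candidate is ever counted. Enumerating subsets of MFCS elements is exponential in the worst case, which is why this is only a sketch for small inputs.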
14
Performance: Observations and Experiments
• Non-monotone property of the maximum frequent set
  - Both the number of candidates and the number of frequent itemsets increase as supp_min decreases
  - This is NOT true for the number of maximal frequent itemsets
    - If MFS is {{1,2},{1,3},{2,3}} when supp_min is 9%
    - and supp_min decreases to 6%, then MFS could become {{1,2,3}}
  - This property will NOT help bottom-up search algorithms
  - However, this property may help the Pincer-Search algorithm
• Concentrated and scattered distributions
  - Concentrated: on each level, the frequent itemsets have many common items; the frequent items tend to cluster (narrow and tall)
  - Scattered: the frequent itemsets do not have many common items (wide and flat)
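The non-monotone property can be reproduced on a tiny database. The sketch below is an illustration with made-up data: the four transactions and the thresholds 0.5 and 0.25 are assumptions chosen so that the effect is visible, not figures from the experiments. The function name `frequent_and_maximal` is hypothetical.

```python
from itertools import combinations

def frequent_and_maximal(transactions, supp_min):
    """Brute-force the frequent itemsets and the maximal ones among them."""
    items = sorted(frozenset().union(*transactions))
    min_count = supp_min * len(transactions)
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if sum(frozenset(c) <= t for t in transactions) >= min_count
    ]
    maximal = [f for f in frequent if not any(f < g for g in frequent)]
    return frequent, maximal

# Hypothetical database: every pair occurs twice, the triple occurs once.
db = [frozenset(t) for t in ({1, 2, 3}, {1, 2}, {1, 3}, {2, 3})]
counts = {}
for supp_min in (0.5, 0.25):
    frequent, maximal = frequent_and_maximal(db, supp_min)
    counts[supp_min] = (len(frequent), len(maximal))
    print(supp_min, len(frequent), len(maximal))
# 0.5 6 3
# 0.25 7 1
```

Lowering the threshold raises the number of frequent itemsets from 6 to 7, while the number of maximal frequent itemsets drops from 3 to 1, because {1,2,3} absorbs the three pairs.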
15
Scattered Distributions
16
Scattered Distributions
17
Concentrated Distributions
18
Concentrated Distributions
19
Census Data
20
Conclusions
• Pincer-Search is good for concentrated distributions
• In general, Adaptive Pincer-Search can be used
• More experiments on real-life databases are needed