
V Storage Manager Shahram Ghandeharizadeh Computer Science Department University of Southern California.


1 V Storage Manager
Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

2 Traces
- Make sure your persistent BDB is configured with 256 MB of memory.
- With a trace, say 21, use its “21Objs.Save” to create and populate your persistent database. Subsequently, use its “Trace21.1KGet” to debug your software. Start with 1 thread and expand to 2, 3, and 4.
- Try to make your software as efficient as possible. If it is too slow (maybe because of low byte hit rates) then you may not be able to run “Trace21.1MGet”.

3 Questions

4 Questions
- Will there be another release of the workload generator before Friday? I do not anticipate one unless there is a bug report.
- Is there an obvious item missing from the current workload generator?
  - Mandatory: invocation of the method to report cache and byte hit rates.
  - Optional: dump the content of the cache to analyze the behavior of your cache replacement technique.

5 Hints
- BDB-Disk is a full-fledged storage manager with a buffer pool, locking, crash-recovery, and index structures. Configure its buffer pool size to be 256 MB.
- [Diagram: the V functionalities and cache replacement layer sit on top of the BDB-Mem and BDB-Disk instances.]

6 Hints
- Your implementation may need to keep track of different counters. Example: count the number of requests issued (and the number of requests serviced from the main-memory instance of BDB) to compute the cache hit rate.
- How to do this with multiple worker threads?

7 Hints
- Your implementation may need to keep track of different counters. Example: count the number of requests issued to compute the cache hit rate.
- How to do this with multiple worker threads?
  - The Interlocked functions provide a mechanism for synchronizing access to a variable that is shared by multiple threads.
  - You may define a “long” variable and use InterlockedIncrement: “long cntr; InterlockedIncrement(&cntr);”
  - Make sure to include <windows.h>.
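The InterlockedIncrement idiom above is specific to the Win32 API. As a rough, portable sketch of the same idea, several worker threads bumping shared request and hit counters, here is a Python version that uses a threading.Lock in place of the Interlocked call; the function names and the 4-thread, 1000-request workload are made up for illustration.

```python
import threading

# Shared counters; the lock plays the role of the Interlocked functions.
lock = threading.Lock()
requests = 0
hits = 0

def record_request(serviced_from_memory):
    """Atomically count one request, and one hit if BDB-Mem serviced it."""
    global requests, hits
    with lock:                      # stands in for InterlockedIncrement
        requests += 1
        if serviced_from_memory:
            hits += 1

def cache_hit_rate():
    with lock:
        return hits / requests if requests else 0.0

# Four worker threads, as in the traces: each issues 1000 requests,
# every other one a (simulated) cache hit.
threads = [
    threading.Thread(target=lambda: [record_request(i % 2 == 0) for i in range(1000)])
    for _ in range(4)
]
for t in threads: t.start()
for t in threads: t.join()
```

With this workload the counters end at 4000 requests and 2000 hits regardless of thread interleaving, which is exactly the property the Interlocked functions buy you on Windows.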

8 Hints
- To compute byte hit rates, you need to maintain two counters and increment them by the size of the referenced object.
- Use the “InterlockedExchangeAdd” function to perform an atomic addition of a 32-bit value to a shared 32-bit variable.
  - Example: a = a + b; becomes InterlockedExchangeAdd(&a, b);
- Other Interlocked methods might be useful to you, such as InterlockedExchangePointer.
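The same fetch-and-add idea applies to the byte counters. A minimal sketch, again in portable Python rather than Win32; the object sizes and the record_reference name are hypothetical:

```python
import threading

lock = threading.Lock()
bytes_requested = 0
bytes_from_memory = 0

def record_reference(size, in_memory):
    """Add the referenced object's size to the counters, atomically
    (the role InterlockedExchangeAdd plays on Windows)."""
    global bytes_requested, bytes_from_memory
    with lock:
        bytes_requested += size
        if in_memory:
            bytes_from_memory += size

# Hypothetical references: two cached 1 KB objects and one uncached 1 MB object.
for size, cached in [(1024, True), (1024, True), (1048576, False)]:
    record_reference(size, cached)

byte_hit_rate = bytes_from_memory / bytes_requested
```

Note how one large miss dominates the byte hit rate even though two of the three references were hits, which is why the slides warn that low byte hit rates make the 1M-request trace slow.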

9 Hints
- With invocation of methods, local variables are pushed on the stack of a thread.
  - 4 different threads invoking a method will have 4 different sets of mutually exclusive local variables as declared by that method.
    Foo() { char res[200]; int cntr; … }
- A global variable is not part of the stack and must be protected when multiple threads are manipulating it. How?

10 Hints
- With invocation of methods, local variables are pushed on the stack of a thread.
  - 4 different threads invoking a method will have 4 different sets of mutually exclusive local variables as declared by that method.
    Foo() { char res[200]; int cntr; … }
- A global variable is not part of the stack and must be protected when multiple threads are manipulating it. How?
  - Consider making it a variable local to a method. Ask: does this variable have to be global?
  - Use critical sections.
  - Manage memory.

11 Hints
- With invocation of methods, local variables are pushed on the stack of a thread.
  - 4 different threads invoking a method will have 4 different sets of mutually exclusive local variables as declared by that method.
    Foo() { char res[200]; int cntr; … }
- Similarly, memory allocated from the heap (new/malloc) is not part of the stack and must be managed: no memory leaks.
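The stack-versus-global distinction in the slides above can be demonstrated directly. In this sketch (Python stand-in for the project's C++; names are illustrative), cntr plays the role of the per-thread local and shared_total the role of the global that needs a critical section:

```python
import threading

shared_total = 0            # global: shared by every thread, needs protection
lock = threading.Lock()

def foo(n):
    global shared_total
    cntr = 0                # local: each thread gets its own copy on its stack
    for _ in range(n):
        cntr += 1
    with lock:              # critical section guarding the global
        shared_total += cntr
    return cntr

threads = [threading.Thread(target=foo, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

Each thread's cntr reaches 1000 independently of the others; only the update to shared_total needs the lock.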

12 Hints
- Consider an admission control technique.
  - Without admission control: every time an object is referenced and it is not in memory, you place it in memory.
  - With admission control: every time a disk-resident object is referenced, compare its Q value with the minimum Q value to see if it should be admitted into memory.
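A minimal sketch of the admission-control check described above, assuming each object carries a numeric Q value and the cache has a fixed capacity. All names (maybe_admit, cache) and the capacity are invented for illustration; the project's actual Q value comes from its replacement technique.

```python
# cache maps object id -> Q value; capacity measured in objects for simplicity.
CAPACITY = 3
cache = {"a": 5.0, "b": 2.0, "c": 7.0}

def maybe_admit(obj_id, q):
    """Admission control: admit a disk-resident object only if its Q value
    beats the minimum Q value currently in the cache."""
    if obj_id in cache:
        return True                       # already memory resident
    if len(cache) < CAPACITY:
        cache[obj_id] = q                 # room available: always admit
        return True
    victim = min(cache, key=cache.get)    # object with the minimum Q value
    if q > cache[victim]:
        del cache[victim]                 # evict it and admit the new object
        cache[obj_id] = q
        return True
    return False                          # not admitted; serve from disk
```

Without the q > cache[victim] test, every miss would evict something, which is the "without admission control" behavior on the slide.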

13 Fast Algorithms for Mining Association Rules (by R. Agrawal and R. Srikant)
Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

14 Terminology
- Objective: discover association rules over basket data.
- Example: 98% of customers who purchase tires and auto accessories also get automotive services done.
- Motivation: valuable for cross-marketing and attached mailing applications. Watch Googlezon, http://www.youtube.com/watch?v=AT9ho2G0N_Y
- Requirements:
  - Fast algorithms,
  - Must manipulate large data sets.

15 Problem Statement

16 Terminology
- Association rule X => Y has confidence c: out of those transactions that contain X, c% also contain Y.
- Association rule X => Y has support s: s% of transactions in D contain X and Y.
- Note:
  - X => A does not mean X+Y => A: the extended rule may not have minimum support.
  - X => A and A => Z does not mean X => Z: the composed rule may not have minimum confidence.

17 Example
- I = {beer, chips, salsa, nail-polish, toothpaste, toilet-paper}
- D = {T1, T2, T3, …, T9999999}
  - T1 = {beer, chips, salsa}
  - T2 = {beer, toilet-paper}
  - T3 = {nail-polish, toothpaste}
- TID is the unique identifier for each transaction.
- If X = {beer} then both T1 and T2 contain X.
- If X = {beer, chips} then T1 contains X.
- If X = {beer, nail-polish} then no transaction contains X.
- The rule {beer, chips} => {salsa} has confidence 90% if 90% of transactions that contain {beer, chips} also contain {salsa}.
  - NOTE: {beer, chips} intersect {salsa} is empty, satisfying the constraint of the formal problem specification.
- The rule {beer, chips} => {salsa} has support 75% if 75% of transactions contain {beer, chips, salsa}.
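The definitions in this example can be written down as two small functions; subset testing implements "transaction contains itemset". A sketch over the three transactions T1-T3 above:

```python
def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """Of the transactions containing x, the fraction that also contain y."""
    containing_x = [t for t in db if x <= t]
    return sum(y <= t for t in containing_x) / len(containing_x)

# The slide's transactions T1-T3 (as sets, so <= is subset testing).
db = [
    {"beer", "chips", "salsa"},        # T1
    {"beer", "toilet-paper"},          # T2
    {"nail-polish", "toothpaste"},     # T3
]
```

For instance, support({"beer", "chips"}, db) is 1/3 because only T1 contains both items, matching the slide's containment claims.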

18 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {nail-polish} => {tooth-paste}?

19 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {nail-polish} => {tooth-paste}?
  - 100%, because 5000 out of 5000 transactions that contain {nail-polish} also contain {tooth-paste}.

20 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {beer} => {salsa}?
  - 20%, because 1000 out of 5000 transactions that contain {beer} also contain {salsa}.

21 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {chips}?

22 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {chips}?
  - 100%, because 6000 out of 6000 transactions that contain {salsa} also contain {chips}.

23 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {nail-polish}?

24 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {nail-polish}?
  - 5/6 (83.33%), because 5000 out of 6000 transactions that contain {salsa} also contain {nail-polish}.
  - Note: support for {salsa, nail-polish} is …

25 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {nail-polish}?
  - 5/6 (83.33%), because 5000 out of 6000 transactions that contain {salsa} also contain {nail-polish}.
  - Note:
    - Support for {salsa, nail-polish} is 50% (5000 out of 10000)
    - Support for {salsa} is …

26 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {salsa} => {nail-polish}?
  - 5/6 (83.33%), because 5000 out of 6000 transactions that contain {salsa} also contain {nail-polish}.
  - Note:
    - Support for {salsa, nail-polish} is 50% (5000 out of 10000)
    - Support for {salsa} is 60% (6000 out of 10000)
    - Conf = 50% / 60% = 83.33%

27 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {beer, chips} => {toilet-paper}?

28 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the confidence in {beer, chips} => {toilet-paper}?
  - 0%, because none of the transactions satisfy this association rule.

29 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the support in {beer} => {toilet-paper}?

30 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the support in {beer} => {toilet-paper}?
  - 40%, because 4000 transactions (out of 10,000) contain {beer, toilet-paper}.

31 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the support in {chips} => {salsa}?

32 Example (Cont…)
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- What is the support in {chips} => {salsa}?
  - 60%, because 6000 transactions contain {chips, salsa}.
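The worked answers in the preceding slides can be checked mechanically. A small sketch that represents the assumed database as (count, transaction) pairs instead of 10,000 explicit transactions:

```python
from fractions import Fraction

# The slides' assumed database: 10,000 transactions in three groups.
DB = [
    (1000, {"beer", "chips", "salsa"}),
    (4000, {"beer", "toilet-paper"}),
    (5000, {"nail-polish", "tooth-paste", "chips", "salsa"}),
]
TOTAL = sum(n for n, _ in DB)

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(n for n, t in DB if itemset <= t)

def support(x, y):
    """Support of rule x => y: fraction of all transactions containing x and y."""
    return Fraction(count(x | y), TOTAL)

def confidence(x, y):
    """Confidence of x => y: of the transactions containing x, the fraction
    that also contain y."""
    return Fraction(count(x | y), count(x))
```

Fractions keep the results exact, e.g. confidence({"salsa"}, {"nail-polish"}) comes out as 5/6.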

33 Example Queries
- Compute all association rules with support and confidence greater than 55%.
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- Answer:

34 Example Queries
- Compute all association rules with support and confidence greater than 55%.
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- Answer:
  - {chips} => {salsa},
  - {salsa} => {chips}

35 Example Queries
- Compute all association rules with support > 30% and confidence greater than 45%.
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- Answer:

36 Example Queries
- Compute all association rules with support > 30% and confidence greater than 45%.
- Assume:
  - 1000 transactions {beer, chips, salsa}
  - 4000 transactions {beer, toilet-paper}
  - 5000 transactions {nail-polish, tooth-paste, chips, salsa}
- Answer:
  - {chips} => {salsa},
  - {salsa} => {chips},
  - {nail-polish} => {tooth-paste},
  - {tooth-paste} => {nail-polish},
  - {nail-polish} => {chips},
  - {nail-polish} => {salsa},
  - …

37 Divide the Problem into Two
1. Find all sets of items that have support above minimum support.
   - Itemsets with minimum support are called large itemsets; all others are small itemsets.
   - Algorithms: Apriori and AprioriTid.
2. Use large itemsets to generate the desired rules.
   - For every large itemset l, find all non-empty subsets of l. Let a denote one such subset.
   - For every subset a, output a rule of the form a => (l - a) if support(l) / support(a) is at least minconf.
   - Say ABCD and AB are large itemsets. Compute conf = support(ABCD) / support(AB). If conf >= minconf, then AB => CD holds.
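Step 2 above follows directly from the definition. A sketch, with a made-up support table (real support counts would come from step 1; because every subset of a large itemset is also large, the table contains all subsets):

```python
from itertools import combinations

def generate_rules(supports, minconf):
    """From a table mapping frozenset itemsets to support counts, emit every
    rule a => (l - a) whose confidence support(l)/support(a) >= minconf."""
    rules = []
    for l, supp_l in supports.items():
        if len(l) < 2:
            continue                      # need a non-empty a and l - a
        for r in range(1, len(l)):
            for a in map(frozenset, combinations(l, r)):
                conf = supp_l / supports[a]
                if conf >= minconf:
                    rules.append((a, l - a, conf))
    return rules

# Hypothetical support counts for items A and B.
supports = {
    frozenset("A"): 8,
    frozenset("B"): 6,
    frozenset("AB"): 6,
}
```

At minconf = 0.8 only B => A qualifies (confidence 6/6 = 1.0), while A => B has confidence 6/8 = 0.75 and is dropped, mirroring the slide's support(ABCD)/support(AB) test.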

38 Conquer
- Focus on item 1: find all sets of items that have support above a pre-specified minimum support.
- Example: assume the following database:
- Itemsets with minimum support of 2 transactions?

39 Conquer
- Focus on item 1: find all sets of items that have support above a pre-specified minimum support.
- Example: assume the following database:
- Itemsets with minimum support of 2 transactions?

40 How?
- General idea: multiple passes over the data.
  - First pass: count the support of individual items.
  - Subsequent passes:
    - Generate candidates using the previous pass's large itemsets.
    - Go over the data and check the actual support of the candidates.
  - Stop when no new large itemsets are found.

41 How?
- Make several passes over the DB.
- Pass 1: count item occurrences to determine the large 1-itemsets.

42 How?
- Make several passes over the DB.
- Pass 1: count item occurrences to determine the large 1-itemsets. Notice that {4} is missing!
- Pass 2: compute the following query:
  SELECT p.item1, q.item1
  FROM L1 p, L1 q
  WHERE p.item1 < q.item1

43 How?
- Make several passes over the DB.
- Pass 1: count item occurrences to determine the large 1-itemsets. Notice that {4} is missing!
- Pass 2: compute the apriori-gen query and count the support for each candidate by making a pass over the DB.

44 How?
- Make several passes over the DB.
- Pass 1: count item occurrences to determine the large 1-itemsets. Notice that {4} is missing!
- Pass 2: compute the apriori-gen query and count the support for each candidate by making a pass over the DB. Drop those with support < minsup.
- Pass j (j >= 3): compute the candidate set using the apriori-gen algorithm.

45 Apriori-gen Algorithm
- Intuition: generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass. How?
- Note that when k=2, this query computes a large number of rows: the cartesian product of L1 minus the self-pairs is 9900 ordered pairs (100 × 100 - 100 when L1 has 100 rows); the p.item1 < q.item1 predicate keeps one of each pair, yielding 4950 candidate 2-itemsets.

46 Apriori-gen Algorithm
- Intuition: generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass.
- What is the result when k = 3? What is the SQL command?

47 Apriori-gen Algorithm
- Intuition: generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass.
- What is the result when k = 3?
  INSERT INTO Ck
  SELECT p.item1, p.item2, q.item2
  FROM L2 p, L2 q
  WHERE p.item1 = q.item1 AND p.item2 < q.item2
- Result?

48 Apriori-gen Algorithm
- Intuition: generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass.
- What is the result when k = 3?
  INSERT INTO Ck
  SELECT p.item1, p.item2, q.item2
  FROM L2 p, L2 q
  WHERE p.item1 = q.item1 AND p.item2 < q.item2

49 Apriori-gen Algorithm
- Intuition: generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass.
- What is the result when k = 3?
- [Tables: C3 as computed by the SQL query; support counts computed by making a pass on the DB.]

50 Intuition
- Any subset of a large itemset is large. Therefore, to find the large k-itemsets:
  - Create candidates by combining large (k-1)-itemsets.
  - Delete those that contain any subset that is not large.

51 Assumptions & Definitions
- Items in each transaction are kept sorted in their lexicographic order.
- The number of items in an itemset is its size; an itemset of size k is a k-itemset.
- Each itemset has a count field to store the support for this itemset.
- Lk is the set of large k-itemsets (those with minimum support).
- Ck is the set of candidate k-itemsets; its members are potential members of Lk.

52 Apriori Algorithm

53 Apriori Algorithm
- Important detail: with apriori-gen, the join may compute itemsets whose subsets do NOT exist in Lk-1. Prune these by deleting any candidate c of Ck such that some (k-1)-subset of c is not in Lk-1.
- Example:
  - Let L3 be {{1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}}
  - What is the output for C4?

54 Apriori Algorithm
- Important detail: with apriori-gen, the join may compute itemsets whose subsets do NOT exist in Lk-1. Prune these by deleting any candidate c of Ck such that some (k-1)-subset of c is not in Lk-1.
- Example:
  - Let L3 be {{1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}}
  - The join yields { {1 2 3 4}, {1 3 4 5} }.
  - Subsets of {1 2 3 4} are { {1 2 3}, {2 3 4}, {1 3 4}, {1 2 4} }; all are in L3, so {1 2 3 4} is kept.
  - Subsets of {1 3 4 5} are { {1 3 4}, {1 3 5}, {3 4 5}, {1 4 5} }; {3 4 5} and {1 4 5} are not in L3, so {1 3 4 5} is pruned. Hence C4 = { {1 2 3 4} }.
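The join-and-prune procedure on this slide can be sketched as a single function. Representing itemsets as sorted tuples matches the lexicographic-order assumption of slide 51:

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Apriori candidate generation: join L_{k-1} with itself on the first
    k-2 items, then prune candidates having any (k-1)-subset not in L_{k-1}."""
    L_prev = {tuple(sorted(s)) for s in L_prev}
    k = len(next(iter(L_prev))) + 1
    # Join step: p and q agree on the first k-2 items, and p's last < q's last.
    candidates = {
        p + (q[-1],)
        for p in L_prev for q in L_prev
        if p[:-1] == q[:-1] and p[-1] < q[-1]
    }
    # Prune step: every (k-1)-subset of a candidate must itself be large.
    return {
        c for c in candidates
        if all(sub in L_prev for sub in combinations(c, k - 1))
    }

# The slide's example.
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
C4 = apriori_gen(L3)
```

Applied to this L3, the join produces {1 2 3 4} and {1 3 4 5}, and the prune step removes the latter, leaving C4 = { {1 2 3 4} }.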

55 Correctness
- Show that Ck is a superset of Lk:
  - The join extends each itemset of Lk-1 with all possible items.
  - Apriori then removes those candidates whose (k-1)-subsets are not in Lk-1; this is safe because any subset of a large itemset must also be large.
  - The ordering condition in the join prevents duplications.

56 AIS & SETM
- AIS & SETM generate candidate itemsets based on transactions; Apriori instead uses the large itemsets of the previous pass to generate larger itemsets.
- Example:
  - Let L3 be {{1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}}
  - In pass 4, when encountering a transaction with items {1 2 3 4 5}, AIS and SETM generate the following five candidate sets:
    - {1 2 3} => {1 2 3 4} and {1 2 3 5}
    - {1 2 4} => {1 2 4 5}
    - {1 3 4} => {1 3 4 5}
    - {2 3 4} => {2 3 4 5}

57 AprioriTid
- Uses the database only once, to count support for 1-itemsets in Pass 1.
- Builds a storage set C^k.
  - Members have the form <TID, {Xk}>, where the Xk are potentially large k-itemsets present in transaction TID.
  - For k=1, C^1 is the database.
- Uses C^k in pass k+1.
- Advantages:
  - C^k could be smaller than the database: if a transaction does not contain a candidate k-itemset, then C^k will not have an entry for this transaction.
  - For large k, each entry may be smaller than the transaction, since the transaction might contain only a few candidates.

58 How? (Assume minsup = 2)
1. Make a pass of the DB and count item occurrences to determine the large 1-itemsets.

59 How? (Assume minsup = 2)
1. Make a pass of the DB and count item occurrences to determine the large 1-itemsets.

60 How? (Assume minsup = 2)
1. Make a pass of the DB and count item occurrences to determine the large 1-itemsets. You are Here!

61 How? (Assume minsup = 2) 2. Construct C^1  Note that C^1 = Database You are Here!

62 How? (Assume minsup = 2) 4. Compute C2 by invoking apriori-gen You are Here!

63 How? (Assume minsup = 2) 9. Count the support of the candidates in C2 by making a pass over the DB. You are Here!

64 How? (Assume minsup = 2) 10. Compute C^2. Notice what happened to T100. You are Here!

65 How? (Assume minsup = 2) 12. Compute L2: all entries of C2 with support >= 2. You are Here!

66 How? (Assume minsup = 2) Iter 2, Step 4: Compute C3 You are Here! ?

67 How? (Assume minsup = 2) Iter 2, Step 4: Compute C3 You are Here!

68 How? (Assume minsup = 2) Iter 2, Step 9: Count Support You are Here!

69 How? (Assume minsup = 2) Iter 2, Step 10: Compute C^3 Transactions 100 and 400 are gone! You are Here!

70 How? (Assume minsup = 2) Iter 2, Step 12: Generate L3 You are Here!

71 How? (Assume minsup = 2) Iter 3, Step 4: Generate C4 You are Here! ?

72 How? (Assume minsup = 2) Iter 3, Step 4: Generate C4 Since C4 is empty, terminate the algorithm. You are Here! Empty set

73 Apriori versus AprioriTid
- The size of the candidate storage set C^k shrinks with AprioriTid for larger values of k.
- [Chart: sizes of Ck for Apriori & AprioriTid, and Lk, per pass.]

74 Apriori versus AprioriTid
- AprioriTid outperforms Apriori when C^k fits in memory and the distribution of the large itemsets has a long tail.
- [Chart annotation: AprioriTid's time jumps when C^k no longer fits in memory.]

75 Execution Time Per Pass
- In the earlier passes, Apriori does better than AprioriTid.
- AprioriTid is better than Apriori in the later passes.

76 Apriori & AprioriTid
- Similarities; both:
  - Use the same candidate generation procedure, counting the same itemsets.
  - Observe a drop in the number of candidate itemsets in the later passes.
- Differences:
  - In each pass, Apriori examines every transaction, while AprioriTid scans C^k, whose size shrinks in each pass and eventually becomes smaller than the database.
  - When C^k fits in main memory, AprioriTid does not incur the cost of writing and reading C^k.

77 AprioriHybrid
- Key idea:
  - Use Apriori in the initial passes.
  - Switch to AprioriTid when it expects that C^k at the end of the pass will fit in memory.
- How to estimate whether C^k will fit in memory in the next pass?

78 Cost of Switching
- Switching in the last pass incurs the cost of constructing C^(k+1) without using it.
  - In the kth pass, AprioriHybrid incurs the cost of constructing C^(k+1).
  - If there are no large (k+1)-itemsets (i.e., this is the last pass), the algorithm terminates.
  - With Apriori, the algorithm would terminate without making a pass over the transactions; AprioriHybrid builds C^(k+1) and then terminates.

79 Comparison
- AprioriHybrid is faster if there is a gradual decline in the size of C^k.
- [Chart annotation: AprioriHybrid switched in the last pass!]

80 Comparison (Cont…)
- If C^k remains large until nearly the end and then drops abruptly, AprioriHybrid performs the same as Apriori.

81 Question

82 Question
- Why is AprioriTid worse than Apriori?
- Is AprioriTid better than Apriori for some experiment reported in this paper? If not, why not?

83 Answer
- Why is AprioriTid worse than Apriori?
- C^k is large in the first few passes, killing the overall execution time.

84 Characteristics
- For a fixed collection of system parameters (e.g., minimum support level):
  - Response time increases linearly as a function of the number of transactions.
  - With a larger number of items (1,000 versus 10,000), the execution time decreases a little as the average support for an item decreases; fewer itemsets provide faster execution times.

85 Rest of this Semester
- Project is due midnight on Friday, April 24.
- Review for midterm on April 28th, covering 4 papers:
  - Variant indexes.
  - Access path selection.
  - Overview of query optimization.
  - Mining Association Rules.
- Midterm 2 on April 30th.
- Meeting with the teams during the 1st week of May. E-mail to schedule a meeting to follow.




