Big Data Analytics: HW#2

Slides:

Advertisements

Similar presentations

Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.

Advertisements

A distributed method for mining association rules

CSE 634 Data Mining Techniques

Association rules and frequent itemsets mining

Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.

Mining Multiple-level Association Rules in Large Databases

Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.

Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.

Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Data Mining Association Analysis: Basic Concepts and Algorithms

Ex. 11 (pp.409) Given the lattice structure shown in Figure 6.33 and the transactions given in Table 6.24, label each node with the following letter(s):

Data Mining Association Analysis: Basic Concepts and Algorithms

Data Mining Association Analysis: Basic Concepts and Algorithms

6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.

1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.

Fast Algorithms for Association Rule Mining

Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.

Chapter 5 Mining Association Rules with FP Tree Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.

Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.

Ch5 Mining Frequent Patterns, Associations, and Correlations

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.

Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )

ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.

IR Homework #2 By J. H. Wang Mar. 31, Programming Exercise #2: Query Processing and Searching Goal: to search relevant documents for a given query.

Introduction of Data Mining and Association Rules cs157 Spring 2009 Instructor: Dr. Sin-Min Lee Student: Dongyi Jia.

Association Rule.. Association rule mining  It is an important data mining model studied extensively by the database and data mining community.  Assume.

Association rule mining Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf). Assume all data.

Association rule mining Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf). Assume all data.

Association Rule Mining

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.

Charles Tappert Seidenberg School of CSIS, Pace University

Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.

 Frequent Word Combinations Mining and Indexing on HBase Hemanth Gokavarapu Santhosh Kumar Saminathan.

HEMANTH GOKAVARAPU SANTHOSH KUMAR SAMINATHAN Frequent Word Combinations Mining and Indexing on HBase.

Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.

Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.

Improvement of Apriori Algorithm in Log mining Junghee Jaeho Information and Communications University,

By Shivaraman Janakiraman, Magesh Khanna Vadivelu.

Mining Association Rules in Large Database This work is created by Dr. Anamika Bhargava, Ms. Pooja Kaul, Ms. Priti Bali and Ms. Rajnipriya Dhawan and licensed.

Stats 202: Statistical Aspects of Data Mining Professor Rajan Patel

MapReduce MapReduce is one of the most popular distributed programming models Model has two phases: Map Phase: Distributed processing based on key, value.

Data Mining Find information from data data ? information.

Mining Dependent Patterns

Data Mining Association Analysis: Basic Concepts and Algorithms

Data Mining: Concepts and Techniques

Homework Assignment #1 J. H. Wang Oct. 11, 2016.

Data Mining: Concepts and Techniques

Association rule mining

Homework #2 J. H. Wang Oct. 19, 2017.

Mining Association Rules

Big Data Analytics: HW#3

Frequent Pattern Mining

Waikato Environment for Knowledge Analysis

Chapter 6 Tutorial.

Big Data Analytics: HW#2

Targeted Association Mining in Time-Varying Domains

Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak

Association Rule Mining

Transactional data Algorithm Applications

Proposal for Term Project Operating Systems, Fall 2018

732A02 Data Mining - Clustering and Association Analysis

Frequent patterns and Association Rules

MIS2502: Data Analytics Association Rule Mining

Market Basket Analysis and Association Rules

Department of Computer Science National Tsing Hua University

Homework #2 J. H. Wang Oct. 18, 2018.

Charles Tappert Seidenberg School of CSIS, Pace University

Presentation transcript:

Big Data Analytics: HW#2 By J. H. Wang Mar. 31, 2017

Homework #2 Chap.6: (* denotes the optional implementation projects) 6.8 6.14 *6.7,*6.15(a) (* denotes the optional implementation projects) Due: 2 weeks (Apr. 14, 2017)

Exercises for Chap.6 6.8: A database has four transactions. Let min_sup=60% and min_conf=80%. Cust_ID TID Items_bought (in the form of brand-item_category) 01 T100 {King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread} 02 T200 {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread} T300 {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie} 03 T400 {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}

(a) At the granularity of item_category, (e. g (a) At the granularity of item_category, (e.g. item_i could be ”Milk”), for the rule template,  Xtransaction, buys(X,item1)buys(X,item2)=> buys(X,item3) [s,c] list the frequent k-itemset for the largest k, and all the strong association rules (with their support s and confidence c) containing the frequent k- itemset for the largest k. (b) At the granularity of brand-item_category, (e.g. item_i could be ”Sunset- Milk”), for the rule template,  Xcustomer, buys(X,item1)buys(X,item2)=> buys(X,item3) list the frequent k-itemset for the largest k (but do not print any rules).

6.14: The following contingency table summarizes supermarket transaction data, where hot dogs refers to the transactions containing hot dogs, !(hot dogs) refers to the transactions that do not contain hot dogs, hamburgers refers to the transactions containing hamburgers, !(hamburgers) refers to the transactions that do not contain hamburgers. hot dogs !(hot dogs) total hamburgers 2000 500 2500 !(hamburgers) 1000 1500 Total 3000 5000

(a) suppose that the association rule “hot dogs => hamburgers” is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong? (b) Based on the given data, is the purchase of hot dogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two? (c) Compare the use of the all_confidence, max_confidence, Kulczynski, and cosine measures with lift and correlation on the given data.

Implementation Projects *6.7: Using a programming language that you are familiar with, such as C++ or Java, implement two frequent mining algorithms introduced in this chapter: (1) Apriori, (2) FP-Growth. Compare the performance of each algorithm with various kinds of large data sets. Write a report to analyze the situations (e.g., data size, data distribution, minimal support threshold setting, and pattern density) where one algorithm may perform better than the others, and state why.

6. 15: The DBLP data set (www. informatik. uni-trier *6.15: The DBLP data set (www.informatik.uni-trier.de/~ley/db/) consists of over one million entries of research papers published in computer science conferences and journals. Among these entries, there are a good number of authors that have coauthor relationships. (a) Propose a method to efficiently mine a set of coauthor relationships that are closely correlated (e.g. often coauthoring papers together).

Note on Optional Implementation Projects Completion of each implementation project is considered extra bonus Implementation has to be done individually (one person per team) You can use any programming language to implement

Homework Submission For hand-written exercises, please hand in your homework in class (paper version) Remember to write your student ID For implementation projects, please submit a compressed file containing your source codes, and documentation on how to compile, install, or configure the environment.

Thanks for Your Attention!