Entity Tables, Relationship Tables

We classify using any table (as the training table) on any of its columns, the class label column.

Medical Expert System: using the entity training table Patients(PID, Symptom1, Symptom2, Symptom3, Disease) and class label Disease, we can classify new patients based on their symptoms using Nearest Neighbor or model-based (Decision Tree, Neural Network, SVM, SVD, etc.) classification.

Netflix Contest: using the relationship training table Rents(UID, MID, Rating, Date) and class label Rating, we classify new (UID, MID, Date) tuples. How, given that there are really no feature columns?

Example entity-relationship schemas (from the slide's diagrams):
Netflix: User(UID, CNAME, AGE) -- Rents(TranID, Rating, Date) -- Movie(MID, Mname, Date)
Market Basket: Customer(C#, CNAME, AGE) -- buys(TranID, Count, Date) -- Item(I#, Iname, Suppl, Date, Price)
Educational: Student(S#, SNAME, GEN) -- Enrollments(S#, C#, GR) -- Course(C#, CNAME, ST, TERM)
Medical: Patients(PID, PNAME, Symptom1, Symptom2, Symptom3, Disease) -- is in / has -- Ward(WID, Wname, Capacity)

We cluster on a table too, but usually just to "prepare" a classification training table (to move up a semantic hierarchy, e.g., in Items, cluster on price ranges in 100-dollar intervals) or to determine the classes in the first place (each cluster is then declared a separate class).

We do Association Rule Mining (ARM) on relationships (e.g., Market Basket Research: ARM on the "buys" relationship).

We can let near-neighbor movies vote. What makes a movie, mid, "near" to MID? That mid is rated similarly to MID by users uid_1, ..., uid_k; here we use correlation for "near" instead of an actual distance. We can also let near-neighbor users vote. What makes a user, uid, "near" to UID? That uid rates similarly to UID on movies mid_1, ..., mid_k. Again we use correlation for "near" instead of distance; this is typical when classifying on a relationship table.
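As a small concrete illustration of classifying on an entity training table like Patients, here is a minimal nearest-neighbor sketch in Python; the patient rows and the match-count similarity are invented for illustration (only the Patients schema comes from the slide).

```python
# Minimal sketch: nearest-neighbor classification on an entity training table.
# The rows below are made-up illustrative data; only the schema
# Patients(PID, Symptom1, Symptom2, Symptom3, Disease) comes from the slide.

patients = [
    # (PID, Symptom1, Symptom2, Symptom3, Disease)
    (1, "fever",  "cough",    "fatigue",  "flu"),
    (2, "fever",  "rash",     "fatigue",  "measles"),
    (3, "nausea", "headache", "dizzy",    "migraine"),
    (4, "fever",  "cough",    "headache", "flu"),
]

def similarity(row, sample):
    """Count how many feature columns (Symptom1..Symptom3) match the sample."""
    return sum(a == b for a, b in zip(row[1:4], sample))

def classify(sample):
    """Return the Disease (class label) of the most similar training patient."""
    best = max(patients, key=lambda row: similarity(row, sample))
    return best[4]

print(classify(("fever", "cough", "dizzy")))   # -> 'flu'
```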

WALK THRU of a kNN classification example: 3NN classification of an unclassified sample a = (a5, a6, a11, a12, a13, a14). [The slide's training table, with key t, feature attributes a1-a20, class label a10 = C, and the sample's bit values, did not survive extraction; only the narration remains.] A workspace holds the 3 nearest neighbors found so far. Each training tuple is scanned, its distance from the sample is computed, and it replaces the farthest workspace entry only if it is strictly closer: most tuples give "distance = 2, 3, or 4, don't replace", while one gives "distance = 1, replace". After this single scan, C = 1 wins the 3NN vote.
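A minimal sketch of that one-scan 3NN workspace procedure (Python; the training bits are invented since the slide's table did not survive, and only the procedure mirrors the walk-through):

```python
# One-scan 3NN with a "workspace" of the 3 nearest neighbors found so far.
# Training rows are (key, feature_bits, class_label); the bit values are made up.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def three_nn_one_scan(training, sample, k=3):
    workspace = []                          # list of (distance, class_label)
    for key, bits, label in training:
        d = hamming(bits, sample)
        if len(workspace) < k:
            workspace.append((d, label))    # fill the workspace first
        else:
            worst = max(range(k), key=lambda i: workspace[i][0])
            if d < workspace[worst][0]:     # "replace" only if strictly closer
                workspace[worst] = (d, label)
    votes = {}
    for _, label in workspace:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), workspace

training = [
    ("t1", (0, 0, 1, 0, 1, 1), 1),
    ("t2", (1, 1, 0, 0, 1, 0), 0),
    ("t3", (0, 0, 1, 1, 1, 1), 1),
    ("t4", (1, 0, 0, 1, 0, 0), 0),
]
print(three_nn_one_scan(training, (0, 0, 1, 0, 1, 1)))
```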

WALK THRU of the required 2nd scan to find the Closed 3NN set. Does it change the vote? In the closed version, every training tuple whose distance is no larger than the 3rd-nearest distance found in the first scan (here, 2) joins the neighbor set: tuples at d = 1 have already voted, tuples at d = 2 are "include it also", and tuples at d = 3 or 4 are not included. [The slide's training-table bit values did not survive extraction.] The vote after the 1st scan favored C = 1; after the closed neighbor set is completed, YES, the vote changes: C = 0 wins now.
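A matching sketch of the closed-3NN step: find the 3rd-nearest distance, then keep every tuple within that cutoff (ties included) and re-vote. It reuses hamming() and the illustrative training rows from the previous sketch.

```python
# Closed 3NN: find the 3rd-nearest distance, then include every training tuple
# whose distance does not exceed that cutoff (ties included) and re-vote.
# Reuses hamming() and the illustrative `training` rows from the sketch above.

def closed_three_nn(training, sample, k=3):
    dists = [(hamming(bits, sample), label) for _, bits, label in training]
    cutoff = sorted(d for d, _ in dists)[k - 1]     # distance of the 3rd-nearest
    closed_set = [(d, label) for d, label in dists if d <= cutoff]
    votes = {}
    for _, label in closed_set:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get), closed_set

print(closed_three_nn(training, (0, 0, 1, 0, 1, 1)))
```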

WALK THRU: Closed 3NN (C3NN) using P-trees. First let all training points at distance = 0 vote, then distance = 1, then distance = 2, ..., until at least 3 neighbors have voted. For distance = 0 (exact matches), construct the P-tree P_s by ANDing, over the relevant attributes, the basic P-tree P_i where the sample bit s_i = 1 and its complement where s_i = 0 (on the slide, black denoted a complemented P-tree and red an uncomplemented one); then AND P_s with P_C and with P_C' to compute the vote for each class. In this example there are no neighbors at distance = 0. [The slide's attribute bit columns for keys t12, t13, t15, t16, t21, t27, t31, t32, t33, t35, t51, t53, t55, t57, t61, t72, t75 did not survive extraction.]
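A sketch of that distance-0 step with uncompressed P-trees represented as Python integer bitmasks; the bit vectors, class mask, and sample below are invented for illustration.

```python
# Sketch of the distance-0 step with uncompressed "P-trees" as Python bitmasks.
# Bit k of each mask corresponds to training tuple k; the data are invented.

N = 8                      # number of training tuples
ALL = (1 << N) - 1         # mask covering all tuples

P = {                      # P[j] = bitmask of tuples with attribute a_j = 1
    5: 0b10110100, 6: 0b01110010, 11: 0b11010001,
    12: 0b00101110, 13: 0b10011010, 14: 0b01100101,
}
P_C = 0b10101100           # tuples with class label C = 1

sample = {5: 0, 6: 0, 11: 1, 12: 0, 13: 1, 14: 1}   # unclassified sample's bits

# P_s: AND the attribute mask where the sample bit is 1, its complement where it is 0.
P_s = ALL
for j, s_j in sample.items():
    P_s &= P[j] if s_j == 1 else (~P[j] & ALL)

votes_for_1 = bin(P_s & P_C).count("1")            # AND with P_C
votes_for_0 = bin(P_s & (~P_C & ALL)).count("1")   # AND with P_C'
print(votes_for_1, votes_for_0)
```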

WALK THRU: C3NN, distance = 1 neighbors. Construct the P-tree of the tuples at distance exactly 1 from the sample s as an OR over the six relevant attributes, flipping one attribute at a time:

P_{S(s,1)} = OR_{i in {5,6,11,12,13,14}} P_i, where P_i = P_{|s_i - t_i| = 1 and |s_j - t_j| = 0 for all j in {5,6,11,12,13,14} - {i}}, i.e., P_i = P_{S(s_i,1) ∧ S(s_j,0), j ≠ i}.

[The slide's resulting bit columns, built from P_5, P_6, P_11, P_12, P_13, P_14 and labeled P_{D(s,1)}, did not survive extraction.]

WALK THRU: C3NN, distance = 2 neighbors. OR together all "double-dimension" interval P-trees, flipping two attributes at a time:

P_{D(s,2)} = OR_{i,j in {5,6,11,12,13,14}, i < j} P_{i,j}, where P_{i,j} = P_{S(s_i,1) ∧ S(s_j,1) ∧ S(s_k,0) for all k in {5,6,11,12,13,14} - {i,j}}.

This gives the fifteen P-trees P_{5,6}, P_{5,11}, P_{5,12}, P_{5,13}, P_{5,14}, P_{6,11}, P_{6,12}, P_{6,13}, P_{6,14}, P_{11,12}, P_{11,13}, P_{11,14}, P_{12,13}, P_{12,14}, P_{13,14} (their bit columns did not survive extraction). At this point we have at least 3 nearest neighbors, so we could quit and declare C = 1 the winner; but once the full distance-2 ring is included we have the complete C3NN set and can declare C = 0 the winner.
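A general sketch of this ring construction: OR, over every way of flipping exactly d of the relevant attributes, the AND of flipped and matching attribute masks. It reuses N, P, and sample from the distance-0 sketch above; the data are still invented.

```python
# Sketch: the "ring" of training tuples at exactly Hamming distance d from the
# sample, built by OR-ing, over every d-subset of attributes, the AND of the
# flipped masks for that subset and the matching masks for the rest.
# Reuses N, P, sample from the distance-0 sketch above.
from itertools import combinations

def ring(P, sample, d, n_bits=N):
    all_mask = (1 << n_bits) - 1
    attrs = list(sample)
    result = 0
    for flipped in combinations(attrs, d):
        conj = all_mask
        for j in attrs:
            want = sample[j] ^ (1 if j in flipped else 0)   # flip bit j if chosen
            conj &= P[j] if want == 1 else (~P[j] & all_mask)
        result |= conj
    return result

# Sizes of the distance-1 and distance-2 rings:
print(bin(ring(P, sample, 1)).count("1"), bin(ring(P, sample, 2)).count("1"))
```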

In this example there were no exact matches (distance = 0, i.e., similarity = 6 neighbors) for the sample. Two neighbors were found at distance 1 (similarity = 5) and nine at distance 2 (similarity = 4). All 11 neighbors got an equal vote, even though the two similarity-5 neighbors are much closer than the nine similarity-4 neighbors; processing those nine is also costly. A better approach is to weight each vote by the similarity of the voter to the sample. We will use a vote-weight function that is linear in the similarity (admittedly, a function Gaussian in the similarity would be a better choice, but so far it has been too hard to compute). As long as we are weighting votes by similarity, we might as well weight attributes by relevance too (assuming some attributes are more relevant than others; e.g., the relevance weight of a feature attribute could be its correlation with the class label). P-trees accommodate this method very well; in fact, a variation on this theme won the KDD-Cup competition in 2002 and is published as the so-called Podium Classification method. Notice, though, that the P-tree method (Horizontal Processing of Vertical Data, HPVD) really relies on a bona fide distance, in this case Hamming distance (L1). Vertical Processing of Horizontal Data (VPHD) does not, so VPHD works even when we use a correlation as the notion of "near". If you have been involved in the DataSURG Netflix Prize effort, you will note that we used P-tree technology to calculate correlations, but not to find near neighbors through correlations.
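A small sketch of similarity-weighted voting with a weight linear in the similarity, as described above; the specific weight (similarity = number of attributes minus distance) is an assumption, not the slide's exact function.

```python
# Sketch of similarity-weighted voting: each neighbor's vote counts in
# proportion to its similarity to the sample (a linear "podium"-style weight).
# The exact weight function used on the slides is not given; this one is assumed.

def weighted_vote(neighbors, n_attributes=6):
    """neighbors: list of (distance, class_label) pairs."""
    votes = {}
    for d, label in neighbors:
        weight = n_attributes - d          # similarity = #attributes - distance
        votes[label] = votes.get(label, 0) + weight
    return max(votes, key=votes.get), votes

# Two sim=5 neighbors voting C=1 and nine sim=4 neighbors voting C=0:
nbrs = [(1, 1), (1, 1)] + [(2, 0)] * 9
print(weighted_vote(nbrs))   # C=0 still wins here: 2*5 = 10 vs 9*4 = 36
```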

Netflix [ER schema: User(UID, CNAME, AGE) -- Rents(TID, Rating, Date) -- Movie(MID, Mname, Date)]. Again, Rents is itself a table, Rents(TID, UID, MID, Rating, Date); using the class label column Rating we can classify (predict) potential new ratings, and using those predicted ratings we can recommend Rating = 5 rentals.

Does ARM require a binary relationship? Yes. But since every table gives a binary relationship between any two of its columns, we can do ARM on them. E.g., we could do ARM on Patients(Symptom1, Disease) to try to determine which diseases Symptom1 alone implies (with high confidence). And in some cases the columns of a table are really instances of another entity. E.g., images, such as Landsat satellite images, LS(R, G, B, NIR, MIR, TIR), with wavelength intervals in micrometers: B = (.45, .52], G = (.52, .60], R = (.63, .69], NIR = (.76, .90], MIR = (2.08, 2.35], TIR = (10.4, 12.5]. Known recording instruments (which record the number of photons detected in a small span of time in a given wavelength range) seem to be very limited in capability (e.g., our eyes, CCD cameras, ...). If we could record all consecutive bands (e.g., of radius .025 µm), then we would have a relationship between pixels (given by latitude-longitude) and wavelengths, and we could do ARM on imagery!

Intro to Association Rule Mining (ARM). We are given any relationship between entities, T (e.g., a set of transactions an enterprise performs) and I (e.g., a set of items acted upon by those transactions). In Market Basket Research (MBR), the transactions T are the checkout transactions (a customer going through checkout) and the items I are the items available for purchase in that store. The itemset T(I) associated with (or related to) a particular transaction T is the subset of the items found in the shopping cart or market basket that the customer brings through checkout at that time.

An Association Rule, A → C, associates two disjoint subsets of I (called itemsets); A is called the antecedent, C the consequent.

Example Horizontal Transaction Table (its bipartite graph between T = {t1, ..., t5} and I = {i1, ..., i4} was also drawn on the slide):
t1 | i1
t2 | i1, i2, i4
t3 | i1, i3
t4 | i1, i2, i4
t5 | i3, i4

The support [set] of itemset A, supp(A), is the set of t's related to every a ∈ A. E.g., if A = {i1, i2} and C = {i4}, then supp(A) = {t2, t4} (support ratio = |{t2, t4}| / |{t1, t2, t3, t4, t5}| = 2/5). Note: | | means set size, the count of elements in the set.

The support [ratio] of rule A → C, supp(A → C), is the support of A ∪ C: |{t2, t4}| / |{t1, t2, t3, t4, t5}| = 2/5.

The confidence of rule A → C, conf(A → C), is supp(A → C) / supp(A) = (2/5) / (2/5) = 1.

Data miners typically want to find all STRONG RULES, A → C, with supp(A → C) ≥ minsupp and conf(A → C) ≥ minconf (minsupp and minconf are threshold levels). Note that conf(A → C) is also just the conditional probability of t being related to C, given that t is related to A.
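A short sketch computing these support and confidence values on the five example transactions above (Python; the helper-function names are mine):

```python
# Support and confidence on the five example transactions above.

transactions = {
    "t1": {"i1"},
    "t2": {"i1", "i2", "i4"},
    "t3": {"i1", "i3"},
    "t4": {"i1", "i2", "i4"},
    "t5": {"i3", "i4"},
}

def supp_set(itemset):
    """Support set: the transactions containing every item of the itemset."""
    return {t for t, items in transactions.items() if itemset <= items}

def supp(itemset):
    """Support ratio."""
    return len(supp_set(itemset)) / len(transactions)

def conf(antecedent, consequent):
    return supp(antecedent | consequent) / supp(antecedent)

A, C = {"i1", "i2"}, {"i4"}
print(supp_set(A))        # {'t2', 't4'}
print(supp(A | C))        # 0.4  (= 2/5)
print(conf(A, C))         # 1.0
```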

APRIORI Association Rule Mining: Given a transaction-item relationship, the APRIORI algorithm for finding all strong I-rules can proceed by: 1. vertical processing of a Horizontal Transaction Table (HTT), or 2. horizontal processing of a Vertical Transaction Table (VTT).

In 1., a Horizontal Transaction Table (HTT) is processed through vertical scans to find all frequent I-sets (I-sets with support ≥ minsupp, i.e., I-sets "frequently" found in transaction market baskets). In 2., a Vertical Transaction Table (VTT) is processed through horizontal operations to find all frequent I-sets. Then each frequent I-set found is analyzed to determine whether it is the support set of a strong rule.

Finding all frequent I-sets is the hard part. To do this efficiently, the APRIORI algorithm takes advantage of the "downward closure" property of frequent I-sets: if an I-set is frequent, then all its subsets are also frequent. E.g., in the market basket example, if A is an I-subset of B and all of B is in a given transaction's basket, then certainly all of A is in that basket too. Therefore Supp(A) ⊇ Supp(B) (and so supp(A) ≥ supp(B)) whenever A ⊆ B.

First, APRIORI scans to determine all frequent 1-item I-sets (they contain 1 item, so they are called 1-itemsets); next it uses downward closure to efficiently generate candidates for frequent 2-itemsets; next it scans to determine which of those candidate 2-itemsets are actually frequent; next it uses downward closure to efficiently generate candidates for frequent 3-itemsets; next it scans to determine which of those candidate 3-itemsets are actually frequent; ... until no candidates remain. (On the next slide we walk through an example using both an HTT and a VTT.)
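A compact Apriori sketch, assuming HTT-style transactions stored as Python sets; it is illustrative, not the slides' implementation, and the data set at the bottom is the four-transaction example used in the walk-through below.

```python
# A compact Apriori sketch: generate candidate k-itemsets from frequent
# (k-1)-itemsets (downward closure prunes candidates with an infrequent
# subset), then scan the transactions to keep the truly frequent ones.
from itertools import combinations

def apriori(transactions, minsupp_count):
    items = {i for t in transactions for i in t}
    freq = [frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= minsupp_count]
    all_frequent, k = list(freq), 2
    while freq:
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # downward closure: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = [c for c in candidates
                if sum(c <= t for t in transactions) >= minsupp_count]
        all_frequent.extend(freq)
        k += 1
    return all_frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]   # the walk-through data below
print(apriori(D, minsupp_count=2))   # includes {1,3} {2,3} {2,5} {3,5} and {2,3,5}
```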

ARM example. The relationship between transactions and items can be expressed in a Horizontal Transaction Table (HTT) or a Vertical Transaction Table (VTT). minsupp is set by the querier at 1/2 and minconf at 3/4 (note that minsupp and minconf can be expressed as counts rather than ratios; if so, since there are 4 transactions, minsupp = 2 and minconf = 3 as counts):

HTT:
T1 | 1, 3, 4
T2 | 2, 3, 5
T3 | 1, 2, 3, 5
T4 | 2, 5

Downward closure property of "frequent": any subset of a frequent itemset is frequent.

APRIORI METHOD: iteratively find the frequent k-itemsets, k = 1, 2, ...; then find all strong association rules supported by each frequent itemset. (C_k denotes the candidate k-itemsets generated at each step; F_k denotes the frequent k-itemsets.)

Start by finding the frequent 1-itemsets (supp ≥ 2). Counting item occurrences across the transactions: two 1's, three 2's, three 3's, one 4, three 5's.

Example ARM using uncompressed P-trees (the 1-count is placed at the root of each P-tree).

Scan D and build the P-trees:
P1 2 //\\ 1010
P2 3 //\\ 0111
P3 3 //\\ 1110
P4 1 //\\ 1000
P5 3 //\\ 0111
F1 = L1 = {1} {2} {3} {5}

Scan D for the candidate 2-itemsets C2 by ANDing P-trees:
P1^P2 1 //\\ 0010
P1^P3 2 //\\ 1010
P1^P5 1 //\\ 0010
P2^P3 2 //\\ 0110
P2^P5 3 //\\ 0111
P3^P5 2 //\\ 0110
F2 = L2 = {1,3} {2,3} {2,5} {3,5}

Candidate 3-itemsets C3: {1,2,3}, {1,3,5}, {2,3,5}; {1,2,3} is pruned since {1,2} is not frequent, and {1,3,5} is pruned since {1,5} is not frequent. Scan D:
P1^P2^P3 1 //\\ 0010
P1^P3^P5 1 //\\ 0010
P2^P3^P5 2 //\\ 0110
F3 = L3 = {2,3,5}
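The same vertical, AND-based support counting can be sketched with each item's bit vector stored as a Python integer; the counts reproduce the ones above.

```python
# Vertical (uncompressed P-tree) support counting, with each item's bit vector
# stored as a Python int (bit k = 1 iff transaction k contains the item).

P = {1: 0b1010, 2: 0b0111, 3: 0b1110, 4: 0b1000, 5: 0b0111}

def support_count(itemset):
    """AND the item bit vectors and count the 1-bits (the root count)."""
    mask = 0b1111                 # 4 transactions
    for i in itemset:
        mask &= P[i]
    return bin(mask).count("1")

print(support_count({1, 3}))      # 2  (P1 ^ P3 = 1010)
print(support_count({2, 5}))      # 3  (P2 ^ P5 = 0111)
print(support_count({2, 3, 5}))   # 2  (P2 ^ P3 ^ P5 = 0110)
```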

1-itemsets don't support association rules (they would have no antecedent or no consequent); 2-itemsets and larger do.

Are there any strong rules supported by the frequent (large) 2-itemsets (at minconf = .75)?
{1,3}: conf({1} → {3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75, STRONG; conf({3} → {1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75.
{2,3}: conf({2} → {3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75; conf({3} → {2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75.
{2,5}: conf({2} → {5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75, STRONG; conf({5} → {2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75, STRONG.
{3,5}: conf({3} → {5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75; conf({5} → {3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75.

Are there any strong rules supported by the frequent (large) 3-itemset?
{2,3,5}: conf({2,3} → {5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75, STRONG; conf({2,5} → {3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75; conf({3,5} → {2}) = supp{2,3,5}/supp{3,5} = 2/2 = 1 ≥ .75, STRONG.

No smaller antecedent yields a further strong rule: conf({2} → {3,5}) and conf({5} → {2,3}) need not be computed, since their denominators are at least as large as that of the non-strong rule {2,5} → {3}, so their confidences are at least as low; and conf({3} → {2,5}) = supp{2,3,5}/supp{3} = 2/3 = .67 < .75. DONE!
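A short sketch that enumerates and checks these rules directly, reusing support_count() from the P-tree sketch above; the enumeration order and rounding are my own.

```python
# Sketch: enumerate candidate rules A -> (F - A) from a frequent itemset F and
# keep the strong ones.  Reuses support_count() from the P-tree sketch above.
from itertools import combinations

def strong_rules(F, minconf=0.75):
    rules = []
    for r in range(1, len(F)):
        for antecedent in combinations(sorted(F), r):
            A = set(antecedent)
            conf = support_count(F) / support_count(A)
            if conf >= minconf:
                rules.append((A, F - A, conf))
    return rules

for A, C, conf in strong_rules({2, 3, 5}):
    print(A, "->", C, "conf =", round(conf, 2))
# prints {2, 3} -> {5} and {3, 5} -> {2}, each with conf = 1.0
```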

E.g., the $1,000,000 Netflix Contest was to develop a ratings-prediction program that beats the one Netflix uses (called Cinematch) by 10% in predicting the ratings users gave to movies, i.e., predict rating(M,U) where (M,U) ∈ QUALIFYING(MovieID, UserID). Netflix uses Cinematch to decide which movies a user will probably like next (based on all past rating history). All ratings are "5-star" ratings (5 is highest, 1 is lowest; caution: 0 means "did not rate"). Rating = 0 does not mean the user "disliked" that movie, only that it wasn't rated at all, and most "ratings" are 0; therefore the ratings data sets are NOT vector spaces!

One can approach the Netflix contest as a data mining classification/prediction problem. A history of ratings given by users to movies, TRAINING(MovieID, UserID, Rating, Date), is provided with which to train your predictor, which must then predict the ratings given to QUALIFYING movie-user pairs (Netflix knows the ratings given to QUALIFYING pairs, but we don't). Since TRAINING is very large, Netflix also provides a smaller but representative subset of TRAINING, PROBE(MovieID, UserID) (about 2 orders of magnitude smaller than TRAINING). Netflix gave 5 years to submit QUALIFYING predictions; the contest was won in the late summer of 2009, when the submission window was about half gone.

The Netflix Contest Problem is an example of the Collaborative Filtering Problem, which is ubiquitous in the retail business world (how do you filter out what a customer will want to buy or rent next, based on similar customers?). Collaborative filtering is the prediction of likes and dislikes (retail or rental) from the history of previously expressed purchase or rental satisfactions (filtering new likes through the historical filter of "collaborator" likes).
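To make the collaborative-filtering idea concrete, here is a tiny user-based sketch: predict rating(M,U) as the correlation-weighted average of ratings that positively correlated ("near") users gave movie M. The ratings dictionary and the Pearson-over-common-movies similarity are illustrative assumptions, not the contest method.

```python
# Tiny user-based collaborative-filtering sketch: predict rating(M, U) from
# the ratings that positively correlated users gave movie M.

ratings = {   # user -> {movie: rating}; made-up data
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m3": 2, "m4": 4},
    "u3": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}

def pearson(a, b):
    """Pearson correlation over the movies both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    xs = [ratings[a][m] for m in common]
    ys = [ratings[b][m] for m in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def predict(user, movie):
    num = den = 0.0
    for other in ratings:
        if other != user and movie in ratings[other]:
            w = pearson(user, other)
            if w > 0:                 # let only positively correlated "near" users vote
                num += w * ratings[other][movie]
                den += w
    return num / den if den else None

print(predict("u1", "m4"))   # 4.0 -- only u2 correlates positively with u1
```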

ARM in Netflix? [ER schema: User(UID, CNAME, AGE) -- Rents(Rating, Date) -- Movie(MID, Mname, Date)] Can ARM be used for pattern mining in the Netflix data? In general we look for rating patterns (RP) with the property RP true ==> MID^r (the movie MID is rated r).
Singleton homogeneous rating patterns: RP = N^r (a single movie N rated r).
Multiple homogeneous rating patterns: RP = N1^r & ... & Nk^r (several movies all rated r).
Multiple heterogeneous rating patterns: RP = N1^r1 & ... & Nk^rk (several movies, with possibly different ratings).
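One illustrative way to mine "multiple homogeneous" rating patterns: treat each user's set of movies rated r as a transaction, and keep antecedent pairs that imply another movie rated r with enough support and confidence. The toy data, thresholds, and pair-sized antecedents below are all assumptions.

```python
# Sketch: mine "multiple homogeneous" rating patterns {N1^r, N2^r} ==> MID^r by
# treating each user's set of movies rated r as a transaction and checking
# support/confidence.  Data and thresholds are invented.
from itertools import combinations

training = {   # user -> {movie: rating}
    "u1": {"m1": 5, "m2": 5, "m3": 5},
    "u2": {"m1": 5, "m2": 5, "m3": 5, "m4": 1},
    "u3": {"m1": 5, "m2": 5, "m4": 5},
    "u4": {"m2": 5, "m3": 1},
}

def homogeneous_rules(r, minsupp=2, minconf=0.75):
    baskets = [{m for m, rating in movies.items() if rating == r}
               for movies in training.values()]
    movies = sorted({m for b in baskets for m in b})
    rules = []
    for n1, n2 in combinations(movies, 2):
        for mid in movies:
            if mid in (n1, n2):
                continue
            supp_ante = sum({n1, n2} <= b for b in baskets)
            supp_rule = sum({n1, n2, mid} <= b for b in baskets)
            if supp_rule >= minsupp and supp_ante and supp_rule / supp_ante >= minconf:
                rules.append(((n1, n2), mid, supp_rule / supp_ante))
    return rules

print(homogeneous_rules(r=5))   # e.g. ('m1','m3') ==> 'm2' and ('m2','m3') ==> 'm1'
```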