Presentation transcript:

Collaborative filtering, AKA customer preference prediction, AKA Business Intelligence, is critical for on-line retailers (Netflix, Amazon, Yahoo, ...). It is just classical classification: based on a rating-history training set, predict how customer c would rate item i. Use relationships to find "neighbors" to predict rating(c=3, i=5): find all customers whose rating history is similar to that of c=3. That is, for each rating value k = 1, 2, 3, 4, 5, find all other customers who gave rating k to the movies that c=3 gave rating k to; each such set is the AND, over those movies, of the customer pTrees from the relationship k(C,I). Then intersect those k-customer sets, &_k ( &_{i rated k by c=3} k_i ), and let the resulting customers vote to predict rating(c=3, i=5).

[Figure residue: the TrainingSet(C, I, Rating) shown in the Binary relationship model (one C×I matrix per rating value, 1(C,I) ... 5(C,I)), the Rolodex relationship model, and the Multihop relationship model (1(I,C) ... 5(I,C)); the matrix entries were lost in extraction.]
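A minimal sketch of that pTree AND-and-vote, using Python integers as customer bitmaps; the ptree layout and the function names are illustrative assumptions, not the deck's actual code:

```python
# bit c of ptree[k][i] is 1 iff customer c gave item i the rating k.
from collections import Counter

def neighbors(ptree, ratings_of_c, num_customers):
    """AND, over every rating k and every item i that customer c rated k,
    the customer bitmaps ptree[k][i]; the surviving bits are c's neighbors."""
    nbrs = (1 << num_customers) - 1          # start with all customers
    for k, items in ratings_of_c.items():    # k = rating value 1..5
        for i in items:                      # items c gave rating k
            nbrs &= ptree[k][i]
    return nbrs

def predict_rating(ptree, nbrs, item):
    """Let the neighbors vote: the most common rating they gave `item`."""
    votes = Counter()
    for k in ptree:                          # each rating value
        bitmap = ptree[k].get(item, 0) & nbrs
        votes[k] += bin(bitmap).count("1")
    return votes.most_common(1)[0][0] if votes else None
```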

Collaborative filtering (AKA customer preference prediction or Business Intelligence) is critical for on-line retailing (e.g., Netflix, Amazon, Yahoo, ...). Use MYRRH to predict rating(c=3, i=5)? Approach 2: judging that rating=3 is "no opinion", focus the count on the middle (customer) axis?

[Figure residue: the same rating data in the Binary, Rolodex, and Multihop relationship models; the matrix entries were lost in extraction.]

50% Satlog-Landsat sample, stride=64; classes: red soil, cotton, grey soil, damp grey soil, stubble, very damp grey soil.

[Table residue: per-class value tables for the R, G, ir1 and ir2 bands and their pairings (R×G, R×ir1, R×ir2, G×ir1, ..., band×class); the values were lost in extraction.]

For the 50% Satlog-Landsat sample with stride=320: note that the means are way off, so stride=320 will probably produce very inaccurate classification.

A level-0 pVector is a bit string with 1 bit per record. A level-1 pVector is a bit string with 1 bit per record stride, giving the truth of a predicate applied to that stride. An n-level pTree consists of level-k pVectors (k = 0, ..., n-1), all with the same predicate, such that each level-k stride is contained within one level-(k-1) stride.

[Table residue: the 320-bit stride boundaries (start, end, class) and the per-class means and stds of R, G, ir1, ir2; the values were lost in extraction.]
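A minimal sketch of those definitions, assuming a caller-supplied record predicate and a "purely 1" stride predicate for level-1; all names are illustrative:

```python
def level0_pvector(records, predicate):
    """One bit per record: 1 iff the predicate holds for that record."""
    return [1 if predicate(r) else 0 for r in records]

def level1_pvector(level0, stride, agg=all):
    """One bit per stride of records: 1 iff the predicate holds for the whole
    stride (agg=all for 'purely 1'; pass agg=any for 'contains a 1')."""
    return [1 if agg(level0[s:s + stride]) else 0
            for s in range(0, len(level0), stride)]

# Example: bit-slice pVector for "bit 7 of the R band is 1", rolled up with a
# 64-record stride as in the 50% Satlog-Landsat sample above.
# level0 = level0_pvector(rows, lambda r: (r.R >> 7) & 1)
# level1 = level1_pvector(level0, stride=64)
```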

[Table residue: 50% Satlog-Landsat sample, stride=64 — per-class value tables for the R, G, ir1 and ir2 bands; the values were lost in extraction.]

APPENDIX. FAUST Oblique formula: P_{X o d < a}, where X is any set of vectors (e.g., a training class). [Figure residue: a 2-D scatter of the r and v training points with their means m_r and m_v, the d-line and the cut-line.]

To separate the r's from the v's using the midpoint of the means as the cut-point, calculate a as follows. Let D ≡ the vector from m_r to m_v and d = D/|D|. Viewing m_r and m_v as vectors (e.g., m_r ≡ the vector from the origin to the point m_r),

a = ( m_r + (m_v - m_r)/2 ) o d = ( (m_r + m_v)/2 ) o d.

What if d points away from the intersection of the cut-hyperplane (a cut-line in this 2-D case) and the d-line, as it does for class v, where d = (m_v - m_r)/|m_v - m_r|? Then a is the negative of the distance shown (the angle is obtuse, so its cosine is negative). But each v o d is then a larger negative number than a = ( (m_r + m_v)/2 ) o d, so we still want v o d < ½(m_v + m_r) o d.
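A sketch of the means-midpoint cut with numpy standing in for the pTree arithmetic; class_r, class_v and the function names are assumptions:

```python
import numpy as np

def oblique_cut_midpoint(class_r, class_v):
    """Return (d, a): unit vector d along m_r -> m_v and cut-point a = ((m_r+m_v)/2) o d."""
    m_r, m_v = class_r.mean(axis=0), class_v.mean(axis=0)
    D = m_v - m_r
    d = D / np.linalg.norm(D)
    a = (m_r + m_v) / 2 @ d
    return d, a

def on_r_side(X, d, a):
    """True where a sample falls on the r side of the cut, i.e. X o d < a."""
    return X @ d < a
```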

FAUST Oblique, vector-of-stds cut-point: P_{X o d < a} = P_{Σ_i d_i X_i < a}. [Figure residue: the same r/v scatter with m_r, m_v, the d-line and the std-weighted cut.]

Let D ≡ the vector from m_r to m_v and d = D/|D|. To separate r from v using the vector-of-stds cut-point, calculate a as

a = ( m_r + (m_v - m_r) · std_r/(std_r + std_v) ) o d.

What are these stds? Approach 1: for each coordinate (dimension), calculate the std of that coordinate's values, and use the vector of those stds. Let's remind ourselves that the formula (Md's formula) does not require looping through the X-values; it requires only one AND program across the pTrees: P_{X o d < a} = P_{Σ_i d_i X_i < a}.
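The "one program across the pTrees" claim can be illustrated, in simplified form, by evaluating the predicate on columns reassembled from vertical bit slices (the real FAUST computation stays in compressed pTree form); the slices layout and function name below are assumptions:

```python
import numpy as np

def dot_lt_a_mask(slices, d, a):
    """Boolean mask, one bit per record, for sum_i d_i * X_i < a.
    slices[i][j] is the 0/1 array for bit j of attribute i."""
    total = np.zeros_like(slices[0][0], dtype=float)
    for i, attr_slices in enumerate(slices):                 # attributes
        col = sum((1 << j) * s for j, s in enumerate(attr_slices))
        total += d[i] * col                                  # d_i * X_i, vectorized
    return total < a
```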

FAUST Oblique, Approach 2: stds of the projections. D ≡ the vector from m_r to m_v, d = D/|D|. [Figure residue: the r/v scatter with the projected means pm_r, pm_v marked on the d-line.]

By pm_r we mean the distance m_r o d, which is also mean{ r o d | r ∈ R }; by pstd_r, std{ r o d | r ∈ R } (and likewise pm_v, pstd_v). To separate r from v using the stds of the projections, calculate a as

a = pm_r + (pm_v - pm_r) · pstd_r/(pstd_r + pstd_v) = ( pm_r·pstd_v + pm_v·pstd_r ) / (pstd_r + pstd_v).

Next? Double pstd_r:

a = pm_r + (pm_v - pm_r) · 2·pstd_r/(2·pstd_r + pstd_v) = ( pm_r·pstd_v + pm_v·2·pstd_r ) / (2·pstd_r + pstd_v).

In this case the predicted classes will overlap (i.e., a given sample point may be assigned multiple classes), so we will have to order the class predictions. As before, P_{X o d < a} = P_{Σ_i d_i X_i < a}.
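A sketch of Approach 2, including the doubled-pstd_r variant used in the Satlog evaluation on the next slide; the r_weight parameter and the other names are assumptions:

```python
import numpy as np

def projection_cut(class_r, class_v, r_weight=1.0):
    """Cut-point a = pm_r + (pm_v - pm_r) * w*pstd_r / (w*pstd_r + pstd_v)."""
    m_r, m_v = class_r.mean(axis=0), class_v.mean(axis=0)
    d = (m_v - m_r) / np.linalg.norm(m_v - m_r)
    proj_r, proj_v = class_r @ d, class_v @ d        # r o d and v o d
    pm_r, pm_v = proj_r.mean(), proj_v.mean()
    pstd_r, pstd_v = proj_r.std(), proj_v.std()
    a = pm_r + (pm_v - pm_r) * (r_weight * pstd_r) / (r_weight * pstd_r + pstd_v)
    return d, a

# r_weight=1.0 gives a = (pm_r*pstd_v + pm_v*pstd_r)/(pstd_r+pstd_v);
# r_weight=2.0 is the doubled-pstd_r cut.
```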

FAUST Satlog evaluation. [Table residue: per-band means and stds (R, G, ir1, ir2) and, for each configuration below, the per-class (1's, 2's, 3's, 4's, 5's, 7's) true-positive and false-positive counts and the class totals; the numbers were lost in extraction.] The configurations compared:

Non-oblique, level-0.
Non-oblique, level-1, 50% sample.
Oblique level-0 using the midpoint of means.
Oblique level-0 using means and stds of the projections (without class elimination).
Oblique level-0, means and stds of the projections, with class elimination in order (note that "none" occurs).
Oblique level-0, means and stds of the projections, doubling pstd_r: a = pm_r + (pm_v - pm_r)·2·pstd_r/(2·pstd_r + pstd_v) = (pm_r·pstd_v + pm_v·2·pstd_r)/(2·pstd_r + pstd_v).
Oblique level-0, doubling pstd_r, classify then eliminate in the order 2, 3, 4, 5, 7, 1.
Oblique level-0, doubling pstd_r, classify then eliminate in the order 3, 4, 7, 5, 1, 2.

So the number of FPs is drastically reduced and the TPs somewhat reduced. Is that better? If we parameterize the 2 (the doubling factor) and adjust it to maximize TPs and minimize FPs, what is the optimal multiplier value? Next, try a low-to-high std elimination ordering; above = (std + std_up)/gap and below = (std + std_dn)/gap_dn suggest the ordering. [Table residue: the above/below values per band and class, the s1/(2·s1+s2), s1/(s1+s2) and 2·s1/(2·s1+s2) variants with their elimination orders and TP/FP totals, and the level-1 50% results; the numbers were lost in extraction.]
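The "classify, then eliminate classes in a fixed order" runs can be sketched as a one-vs-rest pass in which each class in turn claims the still-unlabeled samples on its side of its cut, followed by per-class TP/FP counting; the cuts dictionary, side test and all other names are assumptions, not the deck's code:

```python
import numpy as np

def classify_with_elimination(X, cuts, order):
    """cuts[c] = (d, a, side) where side(X @ d, a) is a boolean mask for class c."""
    pred = np.full(len(X), -1)                  # -1 = not yet assigned
    for c in order:                             # elimination order, e.g. [2, 3, 4, 5, 7, 1]
        d, a, side = cuts[c]
        mask = side(X @ d, a) & (pred == -1)    # only still-unclaimed samples
        pred[mask] = c
    return pred

def tp_fp(pred, truth, classes):
    """Per-class (true positives, false positives)."""
    return {c: (int(((pred == c) & (truth == c)).sum()),
                int(((pred == c) & (truth != c)).sum()))
            for c in classes}
```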

Can MYRRH classify (e.g., pixel classification)? Try 4-hop using the attributes of IRIS(Cls, SL, SW, PL, PW), stride=10, level-1. [Table residue: the level-1 value table (val, SL, SW, PL, rnd(PW/10)) for the setosa, versicolor and virginica strides, and the binary relationships R (SW–PW), S (PW–PL), T (SL–PL) and U (Cls–SL); the entries were lost in extraction.]

For A = {3,4} (SW values) and C = {se}: is A ⇒ C confident?

conf = ct( ( &_{pw ∈ &_{sw∈A} R_sw} S_pw ) & ( &_{sl ∈ &_{cls∈C} U_cls} T_sl ) ) / ct( &_{pw ∈ &_{sw∈A} R_sw} S_pw ) = 1/2

(the S-side AND gives pl = {1,2}, and the combined AND gives pl = {1}).
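The 4-hop confidence count can be sketched with integer bitmaps over the shared PL axis; R, S, T, U here are dicts mapping an SW, PW, Cls or SL value to a bitmap of related values, and every name is an illustrative assumption:

```python
def bits(bitmap):
    """Indices of the 1-bits in an integer bitmap."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1

def and_all(bitmaps, universe):
    out = universe
    for b in bitmaps:
        out &= b
    return out

def four_hop_confidence(A, C, R, S, T, U, pw_all, sl_all, pl_all):
    """conf(A => C) = ct(PL set reached from A & PL set reached from C) / ct(PL set reached from A)."""
    pw_set = and_all((R[sw] for sw in A), pw_all)               # &_{sw in A} R_sw
    pl_from_A = and_all((S[pw] for pw in bits(pw_set)), pl_all)
    sl_set = and_all((U[cls] for cls in C), sl_all)             # &_{cls in C} U_cls
    pl_from_C = and_all((T[sl] for sl in bits(sl_set)), pl_all)
    num = bin(pl_from_A & pl_from_C).count("1")
    den = bin(pl_from_A).count("1")
    return num / den if den else 0.0
```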

1-hop: IRIS(Cls, SL, SW, PL, PW), stride=10, level-1. [Table residue: the level-1 value table (val, SL, SW, PL, rnd(PW/10)) for the setosa, versicolor and virginica strides, and the binary relationships SW–Cls, SL–Cls, PL–Cls and PW–Cls; the entries were lost in extraction.]

For C = {se} and A = {3,4} (SW values), the 1-hop A ⇒ C is more confident:

ct( R_A & &_{cls∈{se}} R_cls ) / ct( R_A ) = 1, where R is the SW–Cls relationship and R_A = R_{sw∈{3,4}}.

But what about just taking R_{class}? That gives {3,4} ⇒ se, {2,3} ⇒ ve, {3} ⇒ vi. This is not very differentiating of class. Include the other three attributes?

SL: {4,5} ⇒ se, {5,6} ⇒ ve, {6,7} ⇒ vi
SW: {3,4} ⇒ se, {2,3} ⇒ ve, {3} ⇒ vi
PL: {1,2} ⇒ se, {3,4,5} ⇒ ve, {5,6} ⇒ vi
PW: {0} ⇒ se, {1,2} ⇒ ve, {1,2} ⇒ vi

These rules were derived from the binary relationships only. A minimal decision tree classifier suggested by the rules:

PW = 0 → se
else, PL ∈ {3,4} and SW = 2 and SL = 5 → ve
else, 2 of 3 of ( PL ∈ {3,4,5}, SW ∈ {2,3}, SL ∈ {5,6} ) → ve
otherwise → vi

I was hoping for a "Look at that!" but it didn't happen ;-)
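The reconstructed tree written out directly as a function over the rounded level-1 attribute values; a sketch, with the class labels abbreviated as in the slides:

```python
def classify_iris(SL, SW, PL, PW):
    """Minimal decision tree suggested by the 1-hop rules above."""
    if PW == 0:
        return "se"
    if PL in (3, 4) and SW == 2 and SL == 5:
        return "ve"
    votes = sum([PL in (3, 4, 5), SW in (2, 3), SL in (5, 6)])  # 2-of-3 test
    return "ve" if votes >= 2 else "vi"
```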

2-hop: stride=10, level-1. [Table residue: the level-1 IRIS value table and the binary relationships T (PL–SL) and U (SL–Cls); the entries were lost in extraction.]

For A = {1,2} (PL values) and C = {se}:

ct( OR_{pl∈A} T_pl & &_{cls∈C} U_cls ) / ct( OR_{pl∈A} T_pl ) = 1.

Mine out all confident se-rules with minsup = 3/4 (U_se has sl = {4,5}). Closure: if A ⇒ {se} is non-confident and OR_{pl∈A} T_pl ⊇ U_se, then B ⇒ {se} is non-confident for all B ⊇ A. So, starting with singleton A's:

ct(T_pl=1 & U_se) / ct(T_pl=1) = 2/2, yes.
ct(T_pl=2 & U_se) / ct(T_pl=2) = 1/1, yes.
ct(T_pl=3 & U_se) / ct(T_pl=3) = 0/1, no.
ct(T_pl=4 & U_se) / ct(T_pl=4) = 0/1, no.
ct(T_pl=5 & U_se) / ct(T_pl=5) = 1/2, no.
ct(T_pl=6 & U_se) / ct(T_pl=6) = 0/1, no.
etc.

A = {1,3}, {1,4}, {1,5} or {1,6} will yield non-confidence with OR_{pl∈A} T_pl ⊇ U_se, so all supersets will yield non-confidence. A = {2,3}, {2,4}, {2,5} or {2,6} will yield non-confidence, but the closure property does not apply. A = {1,2} will yield confidence. I conclude that this closure property is just too weak to be useful. It also appears from this example that trying to use MYRRH to do classification (at least in this way) is not productive.
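A sketch of the singleton-first search with the closure pruning described above: a non-confident A whose ORed T-set already covers U_se has all of its supersets skipped. T and U_se are integer bitmaps over the SL axis; the names and bitmap layout are assumptions for illustration:

```python
from itertools import combinations

def or_all(bitmaps):
    out = 0
    for b in bitmaps:
        out |= b
    return out

def popcount(x):
    return bin(x).count("1")

def mine_confident_se(pl_values, T, U_se, min_conf=0.75, max_size=3):
    confident, pruned = [], []
    for size in range(1, max_size + 1):
        for A in combinations(pl_values, size):
            if any(set(p) <= set(A) for p in pruned):
                continue                               # superset of a pruned set
            T_A = or_all(T[pl] for pl in A)            # OR_{pl in A} T_pl
            conf = popcount(T_A & U_se) / popcount(T_A) if T_A else 0.0
            if conf >= min_conf:
                confident.append(A)
            elif (T_A & U_se) == U_se:                 # closure condition: T_A covers U_se
                pruned.append(A)
    return confident
```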

Level-2, 50% Satlog-Landsat sample, stride=640; classes: red soil, cotton, grey soil, damp grey soil, stubble, very damp grey soil. [Table residue: the pairwise band and band-versus-class value tables (R×G, R×ir1, R×ir2, R×class, G×ir1, G×ir2, G×class, ir1×ir2, ir1×class, ir2×class); the values were lost in extraction.]