
1 K-nearest neighbor methods. William Cohen, 10-601, April 2008

2 But first…

3 Onward: multivariate linear regression. Univariate vs. multivariate: in the multivariate case, each row of X is an example and each column is a feature.
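As a quick illustration (not from the slides), multivariate linear regression can be fit by least squares with NumPy; the toy data and variable names below are my own.

```python
import numpy as np

# Toy data: 5 examples (rows), 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0],
              [5.0, 2.5]])
y = np.array([3.1, 3.0, 5.2, 8.1, 9.0])

# Add a column of ones for the intercept term.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve the least-squares problem min_w ||Xb w - y||^2.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("weights:", w)
print("predictions:", Xb @ w)
```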

4 [Figure: scatter plot of X vs. Y]

5

6

7 ACM Computing Surveys 2002

8

9 Review of K-NN methods (so far)

10 Kernel regression, a.k.a. locally weighted regression, locally linear regression, LOESS, … What does making the kernel wider do to bias and variance?
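A minimal sketch of kernel (Nadaraya-Watson) regression with a Gaussian kernel; the bandwidth h and the toy data are my own choices. Making the kernel wider (larger h) averages over more neighbors, which raises bias and lowers variance.

```python
import numpy as np

def kernel_regression(x_query, X_train, y_train, h=1.0):
    """Nadaraya-Watson estimate: kernel-weighted average of training targets."""
    weights = np.exp(-(X_train - x_query) ** 2 / (2 * h ** 2))
    return np.sum(weights * y_train) / np.sum(weights)

X_train = np.linspace(0, 10, 50)
y_train = np.sin(X_train) + 0.1 * np.random.randn(50)
print(kernel_regression(5.0, X_train, y_train, h=0.5))   # narrow kernel: low bias
print(kernel_regression(5.0, X_train, y_train, h=5.0))   # wide kernel: high bias
```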

11 BellCore's MovieRecommender
– Participants sent email to videos@bellcore.com
– The system replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset needed to be rated
– A new participant P sends in rated movies via email
– The system compares P's ratings to the ratings of (a random sample of) previous users
– The most similar users are used to predict scores for P's unrated movies (more later)
– The system returns recommendations in an email message

12 Suggested Videos for: John A. Jamus.
Your must-see list with predicted ratings:
7.0 "Alien (1979)"
6.5 "Blade Runner"
6.2 "Close Encounters Of The Third Kind (1977)"
Your video categories with average ratings:
6.7 "Action/Adventure"
6.5 "Science Fiction/Fantasy"
6.3 "Children/Family"
6.0 "Mystery/Suspense"
5.9 "Comedy"
5.8 "Drama"

13 The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar.
Correlation with target viewer:
0.59 viewer-130 (unlisted@merl.com)
0.55 bullert, jane r (bullert@cc.bellcore.com)
0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
0.42 rskt (rskt@cc.bellcore.com)
0.41 kkgg (kkgg@Athena.MIT.EDU)
0.41 bnn (bnn@cc.bellcore.com)
By category, their joint ratings recommend:
Action/Adventure: "Excalibur" 8.0 (4 viewers), "Apocalypse Now" 7.2 (4 viewers), "Platoon" 8.3 (3 viewers)
Science Fiction/Fantasy: "Total Recall" 7.2 (5 viewers)
Children/Family: "Wizard Of Oz, The" 8.5 (4 viewers), "Mary Poppins" 7.7 (3 viewers)
Mystery/Suspense: "Silence Of The Lambs, The" 9.3 (3 viewers)
Comedy: "National Lampoon's Animal House" 7.5 (4 viewers), "Driving Miss Daisy" 7.5 (4 viewers), "Hannah and Her Sisters" 8.0 (3 viewers)
Drama: "It's A Wonderful Life" 8.0 (5 viewers), "Dead Poets Society" 7.0 (5 viewers), "Rain Man" 7.5 (4 viewers)
Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. 0.15 means low ability; 0.50 means fair ability; 0.85 means very good ability.

14 Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al., UAI 1998)
$v_{i,j}$ = vote of user $i$ on item $j$; $I_i$ = items for which user $i$ has voted.
Mean vote for user $i$: $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
Predicted vote for the "active user" $a$ is a weighted sum of the other users' deviations from their means:
$\hat{v}_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$
where $w(a,i)$ is the weight of each of the $n$ most similar users and $\kappa$ is a normalizer (e.g., $\kappa = 1 / \sum_i |w(a,i)|$).
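Below is a small, hedged Python sketch of this prediction rule, using Pearson correlation for the weights w(a,i) (one of the similarity measures discussed by Breese et al.); the toy ratings dictionary and helper names are my own.

```python
import numpy as np

# votes[user][item] = rating; only observed votes are stored.
votes = {
    "a":  {"alien": 7, "blade_runner": 6},
    "u1": {"alien": 8, "blade_runner": 7, "platoon": 9},
    "u2": {"alien": 4, "blade_runner": 5, "platoon": 6},
}

def mean_vote(u):
    return np.mean(list(votes[u].values()))

def weight(a, i):
    """Pearson-style correlation over items both users have voted on."""
    common = set(votes[a]) & set(votes[i])
    if len(common) < 2:
        return 0.0
    va = np.array([votes[a][j] for j in common]) - mean_vote(a)
    vi = np.array([votes[i][j] for j in common]) - mean_vote(i)
    denom = np.sqrt(np.sum(va ** 2) * np.sum(vi ** 2))
    return float(va @ vi / denom) if denom > 0 else 0.0

def predict(a, item):
    """Active user's mean plus a normalized weighted sum of other users' deviations."""
    others = [i for i in votes if i != a and item in votes[i]]
    ws = [weight(a, i) for i in others]
    norm = sum(abs(w) for w in ws)
    if norm == 0:
        return mean_vote(a)
    return mean_vote(a) + sum(w * (votes[i][item] - mean_vote(i))
                              for w, i in zip(ws, others)) / norm

print(predict("a", "platoon"))
```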

15 Basic k-nearest neighbor classification
Training method:
– Save the training examples
At prediction time:
– Find the k training examples (x1, y1), …, (xk, yk) that are closest to the test example x
– Predict the most frequent class among those yi's
Example: http://cgm.cs.mcgill.ca/~soss/cs644/projects/simard/
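A minimal sketch of this procedure in Python, assuming Euclidean distance (the slide leaves the distance measure unspecified at this point):

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Find the k closest training examples and return the most frequent class."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array(["o", "o", "o", "+", "+", "+"])
print(knn_predict(np.array([4.5, 5.0]), X_train, y_train, k=3))  # -> "+"
```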

16 What is the decision boundary? Voronoi diagram.

17 Convergence of 1-NN. [Figure: test point x with label y, its nearest neighbor x1 with label y1, and another point x2 with label y2.] As the training set grows, the nearest neighbor x1 approaches x, so assume P(Y | x1) = P(Y | x); let y* = argmax_y Pr(y | x).
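For reference, here is the standard version of the argument the slide gestures at (the Cover and Hart bound; this reconstruction is mine, not the slide's wording):

```latex
% As n -> infinity, the nearest neighbor x_1 of the query x approaches x,
% so P(Y | x_1) -> P(Y | x). The query's true label and the neighbor's label
% are then independent draws from P(Y | x), and with y* = argmax_y Pr(y | x):
\begin{align*}
  \mathrm{err}_{1\text{-NN}}(x)
    &= \sum_y \Pr(y \mid x)\,\bigl(1 - \Pr(y \mid x)\bigr)
     = 1 - \sum_y \Pr(y \mid x)^2 \\
    &\le 1 - \Pr(y^* \mid x)^2
     \le 2\,\bigl(1 - \Pr(y^* \mid x)\bigr)
     = 2\,\mathrm{err}_{\mathrm{Bayes}}(x)
\end{align*}
% i.e. asymptotically 1-NN makes at most twice the Bayes-optimal error.
```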

18 Basic k-nearest neighbor classification
Training method:
– Save the training examples
At prediction time:
– Find the k training examples (x1, y1), …, (xk, yk) that are closest to the test example x
– Predict the most frequent class among those yi's
Improvements:
– Weighting examples from the neighborhood
– Measuring "closeness"
– Finding "close" examples in a large training set quickly

19 K-NN and irrelevant features. [Figure: + and o examples laid out along a single relevant feature, with a query point ?]

20 K-NN and irrelevant features. [Figure: the same + and o examples spread out over a second, irrelevant feature; query point ?]

21 K-NN and irrelevant features. [Figure: same 2-D layout; with the irrelevant feature, the query's nearest neighbors are no longer dominated by one class.]

22 Ways of rescaling for KNN
Normalized L1 distance: $d(x,x') = \sum_i \frac{|x_i - x'_i|}{\max_i - \min_i}$, where $\max_i$ and $\min_i$ are the largest and smallest values of feature $i$ in the training set.
Scale by IG: weight each feature by its information gain with the class, e.g. $d(x,x') = \sum_i IG(i)\,|x_i - x'_i|$.
Modified value difference metric (for discrete features): $d(x,x') = \sum_i \sum_c \bigl| P(c \mid x_i) - P(c \mid x'_i) \bigr|$.
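A rough sketch of the first two rescalings in code (my own choices for the range normalization and information-gain weighting; the slide's exact definitions may differ):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature_column, labels):
    """Information gain of a discrete feature about the class label."""
    gain = entropy(labels)
    for v in np.unique(feature_column):
        mask = feature_column == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def normalized_l1(x, z, feat_range):
    """L1 distance, each feature scaled by its (max - min) range in the training set."""
    return np.sum(np.abs(x - z) / feat_range)

def ig_weighted_l1(x, z, ig):
    """L1 distance, each feature scaled by its information gain."""
    return np.sum(ig * np.abs(x - z))

# Toy training set with two binary features: the first predicts the class, the second is noise.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array(["+", "+", "o", "o"])
rng = X.max(axis=0) - X.min(axis=0)
ig = np.array([info_gain(X[:, i], y) for i in range(X.shape[1])])   # -> [1.0, 0.0]

print(normalized_l1(X[0], X[2], rng), ig_weighted_l1(X[0], X[2], ig))
```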

23 Ways of rescaling for KNN
Dot product: $\langle x, x' \rangle = \sum_i x_i x'_i$
Cosine distance: $\cos(x, x') = \frac{\sum_i x_i x'_i}{\|x\|\,\|x'\|}$
TFIDF weights for text: for doc $j$ and term $i$, $x_i = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$, where $\mathrm{tf}_{i,j}$ is the number of occurrences of term $i$ in doc $j$ and $\mathrm{idf}_i$ is (number of docs in the corpus) / (number of docs in the corpus that contain term $i$), usually log-scaled.
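An illustrative computation of TFIDF weights and cosine similarity over toy documents (the log scaling of idf is my assumption; the slide only names the counts):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def idf(term):
    df = sum(term in d for d in docs)            # docs containing the term
    return math.log(len(docs) / df) if df else 0.0

def tfidf(doc):
    tf = Counter(doc)                            # occurrences of term i in doc j
    return {t: tf[t] * idf(t) for t in tf}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = [tfidf(d) for d in docs]
print(cosine(vecs[0], vecs[1]))
```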

24 Combining distances to neighbors
Standard KNN: $\hat{y} = \arg\max_y \bigl|\{\, i : x_i \in \mathrm{kNN}(x),\ y_i = y \,\}\bigr|$ (unweighted majority vote).
Distance-weighted KNN: $\hat{y} = \arg\max_y \sum_{i : x_i \in \mathrm{kNN}(x)} w_i \,[\![\, y_i = y \,]\!]$, with weights that decrease with distance, e.g. $w_i = 1 / d(x, x_i)$ or $w_i = e^{-d(x, x_i)^2 / \sigma^2}$.
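A sketch contrasting the two voting rules, using inverse-distance weights as one plausible choice of weight function (the slide does not pin it down):

```python
import numpy as np
from collections import defaultdict

def knn_vote(x, X_train, y_train, k=3, weighted=False, eps=1e-9):
    """Standard KNN: unweighted majority vote.
    Distance-weighted KNN: each neighbor votes with weight 1 / d(x, x_i)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (dists[i] + eps) if weighted else 1.0
    return max(scores, key=scores.get)

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
y_train = np.array(["o", "o", "+"])
q = np.array([4.0, 4.0])
# Unweighted majority (k=3) says "o"; distance weighting flips the vote to "+".
print(knn_vote(q, X_train, y_train, k=3), knn_vote(q, X_train, y_train, k=3, weighted=True))
```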

25

26

27 William W. Cohen & Haym Hirsh (1998): Joins that Generalize: Text Classification Using WHIRL. KDD 1998: 169-173.

28

29

30 Vitor Carvalho and William W. Cohen (2008): Ranking Users for Intelligent Message Addressing. ECIR 2008; and current work with Vitor, me, and Ramnath Balasubramanyan. [Figure: M1, M2]

31 Computing KNN: pros and cons
Storage: all training examples are saved in memory.
– A decision tree or linear classifier is much smaller.
Time: to classify x, you need to loop over all training examples (x', y') to compute the distance between x and x'.
– However, you get predictions for every class y, so KNN is nice when there are many, many classes.
– Actually, there are some tricks to speed this up, especially when the data is sparse (e.g., text).

32 Efficiently implementing KNN (for text). IDF is nice computationally.
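One plausible reading of the trick (a sketch, not the slide's pseudocode): keep an inverted index from terms to the documents that contain them, so scoring a query only touches documents sharing at least one term, and high-IDF (rare) terms touch very few.

```python
from collections import defaultdict

# Sparse training docs: doc id -> {term: tfidf weight}
train = {
    "d1": {"alien": 1.2, "scifi": 0.8},
    "d2": {"comedy": 1.0, "family": 0.7},
    "d3": {"scifi": 0.9, "space": 1.1},
}

# Inverted index: term -> list of (doc id, weight).
index = defaultdict(list)
for doc_id, vec in train.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def knn_scores(query_vec):
    """Accumulate dot-product scores only over docs sharing a term with the query."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, w in index.get(term, []):
            scores[doc_id] += qw * w
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(knn_scores({"scifi": 1.0, "space": 0.5}))
```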

33 Tricks with fast KNN: K-means using r-NN
1. Pick k points c1 = x1, …, ck = xk as centers
2. For each center ci, find Di = Neighborhood(ci)
3. For each center ci, let ci = mean(Di)
4. Go to step 2…
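Here is a plain k-means sketch matching that loop (I read "Neighborhood(ci)" as the set of points currently closest to center ci; a fast r-NN structure would replace the brute-force assignment step):

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]        # step 1
    for _ in range(n_iter):                                       # step 4: repeat
        # Step 2: assign each point to its nearest center (that center's "neighborhood").
        assign = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Step 3: move each center to the mean of its neighborhood.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
print(kmeans(X, k=2)[0])
```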

34 Efficiently implementing KNN. [Figure residue: d_j2, d_j3, d_j4.] Selective classification: given a training set and a test set, find the N test cases that you can most confidently classify.
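A hedged sketch of selective classification: score each test case by some confidence measure over its KNN class scores (the margin between the top two scores is my choice, not the slide's), then keep the N most confident.

```python
import numpy as np

def confidence_margin(scores):
    """Margin between the best and second-best class scores."""
    vals = sorted(scores.values(), reverse=True)
    return vals[0] - (vals[1] if len(vals) > 1 else 0.0)

def select_most_confident(test_cases, score_fn, n=100):
    """Return the n test cases whose class scores are most confident."""
    ranked = sorted(test_cases, key=lambda x: -confidence_margin(score_fn(x)))
    return ranked[:n]

# Toy score function standing in for distance-weighted KNN class scores.
def toy_scores(x):
    return {"+": np.exp(-abs(x - 5)), "o": np.exp(-abs(x + 5))}

print(select_most_confident([0.0, 4.0, -4.5, 1.0], toy_scores, n=2))
```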

35 Train once and select 100 test cases to classify

