Collaborative Filtering and Pagerank in a Network
Qiang Yang, HKUST
Thanks: Sonny Chee

Similar presentations
Recommender System A Brief Survey.
Recommender Systems & Collaborative Filtering
Item Based Collaborative Filtering Recommendation Algorithms
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.
RegionKNN: A Scalable Hybrid Collaborative Filtering Algorithm for Personalized Web Service Recommendation Xi Chen, Xudong Liu, Zicheng Huang, and Hailong.
@ Carnegie Mellon Databases User-Centric Web Crawling Sandeep Pandey & Christopher Olston Carnegie Mellon University.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Web Search – Summer Term 2006 VI. Web Search - Ranking (c) Wolfgang Hürst, Albert-Ludwigs-University.
Knowledge Management with Documents
Web Search - Summer Term 2006 III. Web Search - Introduction (Cont.) (c) Wolfgang Hürst, Albert-Ludwigs-University.
“The Anatomy of a Large-Scale Hypertextual Web Search Engine” Presented by Ahmed Khaled Al-Shantout ICS
Item-based Collaborative Filtering Idea: a user is likely to have the same opinion for similar items [if I like Canon cameras, I might also like Canon.
CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.
CS246: Page Selection. Junghoo "John" Cho (UCLA Computer Science). Page Selection: Infinite # of pages on the Web, e.g., infinite pages from a calendar.
TrustWalker: A Random Walk Model for Combining Trust-based and Item-based Recommendation Mohsen Jamali & Martin Ester Simon Fraser University, Vancouver,
Text Mining and Information Retrieval Qiang Yang HKUST Thanks: Professor Dik Lee, HKUST.
Web Search – Summer Term 2006 IV. Web Search - Crawling (c) Wolfgang Hürst, Albert-Ludwigs-University.
Cross-Selling with Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee.
Crawling the Web Discovery and Maintenance of Large-Scale Web Data Junghoo Cho Stanford University.
Presented By: Wang Hao March 8th, 2011 The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
ISP 433/633 Week 7 Web IR. Web is a unique collection Largest repository of data Unedited Can be anything –Information type –Sources Changing –Growing.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Collaborative Filtering CMSC498K Survey Paper Presented by Hyoungtae Cho.
How to Crawl the Web Junghoo Cho Hector Garcia-Molina Stanford University.
Intelligent Crawling Junghoo Cho Hector Garcia-Molina Stanford InfoLab.
Google and the Page Rank Algorithm Székely Endre
PRESENTED BY ASHISH CHAWLA AND VINIT ASHER The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page and Sergey Brin, Stanford University.
Item-based Collaborative Filtering Recommendation Algorithms
Presented By: - Chandrika B N
Announcements Research Paper due today Research Talks –Nov. 29 (Monday) Kayatana and Lance –Dec. 1 (Wednesday) Mark and Jeremy –Dec. 3 (Friday) Joe and.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
LOGO Recommendation Algorithms Lecturer: Dr. Bo Yuan
PageRank for Product Image Search Kevin Jing (Google Inc.; GVU, College of Computing, Georgia Institute of Technology) Shumeet Baluja (Google Inc.) WWW 2008.
Anatomy of a search engine Design criteria of a search engine Architecture Data structures.
University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Author(s): Rahul Sami and Paul Resnick, 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution.
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
Social Networks and Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee.
Overview of Web Ranking Algorithms: HITS and PageRank
Efficient Crawling Through URL Ordering by Junghoo Cho, Hector Garcia-Molina, and Lawrence Page appearing in Computer Networks and ISDN Systems, vol.
Web Search Algorithms By Matt Richard and Kyle Krueger.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강민석 May 21st, 2009 Nathan.
Collaborative Filtering: Introduction, Search or Content based Method, User-Based Collaborative Filtering, Item-to-Item Collaborative Filtering, Using.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
Ch 14. Link Analysis Padmini Srinivasan Computer Science Department
CS 347 Parallel and Distributed Data Processing Distributed Information Retrieval Hector Garcia-Molina Zoltan Gyongyi.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Collaborative Filtering & Content-Based Recommending CS 290N. T. Yang Slides based on R. Mooney at UT Austin.
COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
CS 430: Information Discovery Lecture 5 Ranking.
How to Crawl the Web Hector Garcia-Molina Stanford University Joint work with Junghoo Cho.
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
The Anatomy of a Large-Scale Hypertextual Web Search Engine S. Brin and L. Page, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pages , April.
CS 440 Database Management Systems Web Data Management
Efficient Crawling Through URL Ordering By: Junghoo Cho, Hector Garcia-Molina, and Lawrence Page Presenter: Omkar S. Kasinadhuni Simerjeet Kaur.
ItemBased Collaborative Filtering Recommendation Algorithms
Item-Based Collaborative Filtering Recommendation Algorithms
CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.
Jan 27, Digital Preservation Seminar: Effective Page Refresh Policies for Web Crawlers Written By: Junghoo Cho & Hector Garcia-Molina Presenter:
Efficient Crawling Through URL Ordering Junghoo Cho Hector Garcia-Molina Lawrence Page Stanford InfoLab.
Automated Information Retrieval
The Anatomy of a Large-Scale Hypertextual Web Search Engine
CS 440 Database Management Systems
Bring Order to The Web Ruey-Lung, Hsiao May 4, 2000.
Information retrieval and PageRank
ITEM BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHMS
Presentation transcript:

2 Motivation
Question: a user has already bought some products; what other products should we recommend to this user?
Collaborative Filtering (CF) automates the “circle of advisors”.

3 Collaborative Filtering
“..people collaborate to help one another perform filtering by recording their reactions...” (Tapestry)
CF finds users whose taste is similar to yours and uses them to make recommendations.
Complementary to IR/IF: IR/IF finds similar documents, while CF finds similar users.

4 Example
Which movie would Sammy watch next? Ratings range from 1 to 5.
If we just use the average of the other users who voted on these movies, we get Matrix = 3 and Titanic = 14/4 = 3.5.
Recommend Titanic!
But is this reasonable?
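The naive baseline on this slide averages every other user's vote for a movie. A minimal Python sketch, assuming ratings are stored as a dict mapping each user to {item: rating}; the function name and the ratings below are hypothetical, not the table from the slide.

```python
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]

def average_rating(ratings: Ratings, item: str,
                   exclude_user: Optional[str] = None) -> Optional[float]:
    """Unweighted mean of all ratings for `item`, skipping `exclude_user`.

    Returns None if nobody else rated the item.
    """
    votes = [r[item] for user, r in ratings.items()
             if user != exclude_user and item in r]
    return sum(votes) / len(votes) if votes else None

# Hypothetical 1-5 ratings, not the data from the slide.
ratings = {
    "active": {},
    "u1": {"Matrix": 5, "Titanic": 2},
    "u2": {"Matrix": 1, "Titanic": 5},
    "u3": {"Titanic": 5},
}
print(average_rating(ratings, "Matrix", exclude_user="active"))   # 3.0
print(average_rating(ratings, "Titanic", exclude_user="active"))  # 4.0
```

The average treats every voter alike, whether or not their taste resembles the active user's, which is why the slide questions whether the recommendation is reasonable.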

5 Types of Collaborative Filtering Algorithms
Collaborative filters
Open problems: sparsity, first rater, scalability

6 Statistical Collaborative Filters
Users annotate items with numeric ratings.
Users who rate items “similarly” become mutual advisors.
A recommendation is computed by taking a weighted aggregate of the advisors' ratings.

7 Basic Idea: Nearest-Neighbor Algorithm
Given a user a and an item i:
First, find the users most similar to a; call this set Y.
Second, find how the users in Y rated i.
Then, calculate a predicted rating of a on i as some average of the ratings given by the users in Y.
How do we calculate the similarity and the average?
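The three steps can be sketched in Python as below. This is an illustration under assumptions: ratings are a dict of dicts, only users who rated i are eligible for Y, the similarity function is passed in (the placeholder used here, minus the mean absolute rating difference on co-rated items, is not from the slides), and "some average" is taken to be a plain mean. `knn_predict` and `neg_mean_abs_diff` are hypothetical names.

```python
from typing import Callable, Dict, Optional

Ratings = Dict[str, Dict[str, float]]

def knn_predict(ratings: Ratings, a: str, i: str, k: int,
                sim: Callable[[Dict[str, float], Dict[str, float]], float]) -> Optional[float]:
    """Predict user a's rating of item i from the k users most similar to a."""
    # Step 1: candidate neighbours are the other users who actually rated i;
    # rank them by similarity to a and keep the top k (the set Y).
    candidates = [u for u in ratings if u != a and i in ratings[u]]
    Y = sorted(candidates, key=lambda u: sim(ratings[a], ratings[u]), reverse=True)[:k]
    if not Y:
        return None
    # Steps 2-3: look up how Y rated i and take a plain average.
    return sum(ratings[u][i] for u in Y) / len(Y)

def neg_mean_abs_diff(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Placeholder similarity: minus the mean absolute rating gap on co-rated items."""
    common = set(x) & set(y)
    if not common:
        return float("-inf")
    return -sum(abs(x[j] - y[j]) for j in common) / len(common)

ratings = {
    "a":  {"m1": 5, "m2": 1},
    "u1": {"m1": 5, "m2": 1, "m3": 5},   # agrees with a
    "u2": {"m1": 1, "m2": 5, "m3": 1},   # disagrees with a
    "u3": {"m1": 4, "m2": 2, "m3": 4},   # mostly agrees with a
}
print(knn_predict(ratings, "a", "m3", k=2, sim=neg_mean_abs_diff))  # 4.5
```

With k=2 the neighbours are u1 and u3, so u2's low rating of m3 is ignored. The following slides replace the placeholder similarity with the Pearson correlation and the plain mean with a weighted deviation from the mean.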

8 Statistical Filters
GroupLens [Resnick et al. 94, MIT] filters Usenet News postings.
Similarity: Pearson correlation.
Prediction: weighted deviation from the mean.

9 Pearson Correlation

10 Pearson Correlation
Weight between users a and u: compute a similarity matrix between users using the Pearson correlation, whose values range from -1 through 0 to 1.
Let items be all the items that the users rated.
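The Pearson formula on slide 9 was not preserved. A minimal Python sketch of the standard Pearson correlation between two users, assuming the sums run over the items both users rated and the means are taken over those same items (variants of GroupLens-style CF differ on this point); `pearson` and `similarity_matrix` are hypothetical names.

```python
import math
from typing import Dict, Tuple

Ratings = Dict[str, Dict[str, float]]

def pearson(ra: Dict[str, float], ru: Dict[str, float]) -> float:
    """Pearson correlation w(a, u) between two users' rating dicts.

    w(a,u) = sum_i (r_ai - mean_a)(r_ui - mean_u)
             / sqrt(sum_i (r_ai - mean_a)^2 * sum_i (r_ui - mean_u)^2)
    where i ranges over co-rated items (an assumption). Returns 0.0 when undefined.
    """
    common = sorted(set(ra) & set(ru))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_u = sum(ru[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common)
    den = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common)
                    * sum((ru[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0

def similarity_matrix(ratings: Ratings) -> Dict[Tuple[str, str], float]:
    """All-pairs user similarity: O(n^2) pairs for n users."""
    users = sorted(ratings)
    return {(a, u): pearson(ratings[a], ratings[u]) for a in users for u in users}

ratings = {
    "x": {"m1": 1, "m2": 2, "m3": 3},
    "y": {"m1": 2, "m2": 4, "m3": 6},   # same taste as x, on a larger scale
    "z": {"m1": 3, "m2": 2, "m3": 1},   # opposite taste
}
W = similarity_matrix(ratings)
print(W[("x", "y")], W[("x", "z")])  # 1.0 -1.0
```

Because each user's ratings are centred on their own mean, y counts as a perfect advisor for x even though y rates everything higher. Building the full matrix touches every pair of users, which is the quadratic cost raised under the scalability problem.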

11 Prediction Generation
Predict how much user a likes an item i (a stands for the active user).
Make predictions using the weighted deviation from the mean (equation 1), normalized by the sum of all weights.
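Equation (1) itself is missing from the slide. A minimal Python sketch, assuming the usual GroupLens form p(a,i) = mean_a + Σ_u (r_ui − mean_u) · w(a,u) / Σ_u |w(a,u)| over the users u who rated i, with each mean taken over all of that user's ratings. The slide normalizes by the "sum of all weights"; summing their absolute values is an assumption (a common choice that keeps positive and negative correlations from cancelling). `predict`, `mean` and the weight table are hypothetical.

```python
from typing import Callable, Dict, Optional

Ratings = Dict[str, Dict[str, float]]

def mean(r: Dict[str, float]) -> float:
    """Mean of one user's ratings."""
    return sum(r.values()) / len(r)

def predict(ratings: Ratings, a: str, i: str,
            w: Callable[[str, str], float]) -> Optional[float]:
    """Weighted deviation from the mean:
    p(a,i) = mean_a + sum_u (r_ui - mean_u) * w(a,u) / sum_u |w(a,u)|.
    """
    advisors = [u for u in ratings if u != a and i in ratings[u]]
    num = sum((ratings[u][i] - mean(ratings[u])) * w(a, u) for u in advisors)
    norm = sum(abs(w(a, u)) for u in advisors)
    if norm == 0:
        return None  # no informative advisors
    return mean(ratings[a]) + num / norm

ratings = {
    "a":  {"m1": 4, "m2": 2},            # mean 3
    "u1": {"m1": 5, "m2": 1, "m3": 5},   # mean 11/3
    "u2": {"m1": 2, "m2": 4, "m3": 1},   # mean 7/3
}
weights = {("a", "u1"): 1.0, ("a", "u2"): -1.0}   # hypothetical w(a,u) values
p = predict(ratings, "a", "m3", lambda a, u: weights[(a, u)])
print(round(p, 3))  # 4.333
```

Here u2 rates m3 below its own mean but is negatively correlated with a, so it pushes the prediction up just as u1 does.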

12 Error Estimation
Mean Absolute Error (MAE) for user a
Standard deviation of the errors
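A minimal sketch of both measures for one user, assuming MAE = (1/n) Σ |p_ai − r_ai| over the n items with a known rating, and taking the population standard deviation of those absolute errors (the slide says neither which errors nor which estimator; both are assumptions). `error_stats` is a hypothetical name.

```python
import statistics
from typing import Sequence, Tuple

def error_stats(predicted: Sequence[float], actual: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, standard deviation of the absolute errors) for one user."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [abs(p - r) for p, r in zip(predicted, actual)]
    mae = sum(errors) / len(errors)
    # Population standard deviation of the absolute errors (an assumption).
    sd = statistics.pstdev(errors)
    return mae, sd

mae, sd = error_stats([4.0, 3.0, 2.0, 5.0], [5, 3, 4, 5])
print(mae)  # 0.75
```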

13 Example
[Table: user-user correlation matrix for Sammy, Dylan and Mathew; correlation = 0.83]

14 Open Problems in CF
“Sparsity Problem”: CFs have poor accuracy and coverage compared with population averages at low rating density [GSK+99].
“First Rater Problem” (cold-start problem): the first person to rate an item receives no benefit, so CF depends upon altruism [AZ97].

15 Open Problems in CF
“Scalability Problem”: CF is computationally expensive. The fastest published algorithms (nearest-neighbor) are O(n²).
Is there an indexing method to speed this up? The question has received relatively little attention.

16 The PageRank Algorithm
Fundamental question to ask: what is the importance level of a page P?
Information retrieval: cosine + TF-IDF does not take related hyperlinks into account.
Link-based: important pages (nodes) have many other pages linking to them, and important pages also point to other important pages.

17 The Google Crawler Algorithm
“Efficient Crawling Through URL Ordering”, Junghoo Cho, Hector Garcia-Molina, Lawrence Page, Stanford.
“Modern Information Retrieval”, Baeza-Yates and Ribeiro-Neto (BY-RN), pages 380-382.
“The Anatomy of a Search Engine”, Lawrence Page, Sergey Brin, The Seventh International WWW Conference (WWW 98), Brisbane, Australia, April 14-18, 1998.

18 Page Rank Metric
[Figure: pages T1, T2, ..., TN each linking to web page P]
Let 1-d be the probability that the user jumps to page P at random, i.e. the likelihood of arriving at P by random jumping; d is the damping factor.
Let N be the in-degree of P.
Let C_i be the number of out-links (out-degree) of each T_i.
In the figure: C = 2, d = 0.9.
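The formula itself did not survive on this slide. Assuming the non-normalized form from Brin and Page's WWW 98 paper, and writing IR for the rank as the slides do, it reads as follows; with d = 0.9 every page gets a base value of 1 - d = 0.1, which is consistent with the worked example values on the following slides.

```latex
IR(P) = (1 - d) + d \sum_{i=1}^{N} \frac{IR(T_i)}{C_i}
```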

19 How to compute PageRank?
For a given network of web pages:
Initialize the page rank of all pages (to one).
Set the parameter d = 0.90.
Iterate through the network L times.
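A minimal Python sketch of this loop, assuming the graph is a dict mapping each page to the list of pages it links to, and assuming the update IR(P) = (1 - d) + d Σ IR(T_i)/C_i (the Brin and Page form). Pages are updated in place in a fixed order, as in the worked example that follows. Pages without out-links simply pass no rank on, which is a simplification. `page_rank` and its parameters are hypothetical names.

```python
from typing import Dict, List

def page_rank(out_links: Dict[str, List[str]], d: float = 0.9, L: int = 20,
              init: float = 1.0) -> Dict[str, float]:
    """Run L in-place sweeps of IR(P) = (1 - d) + d * sum_i IR(T_i) / C_i.

    out_links maps every page to the pages it links to; all link targets
    must also be keys of out_links.
    """
    # Invert the graph once: in_links[P] lists the pages T_i that link to P.
    in_links: Dict[str, List[str]] = {p: [] for p in out_links}
    for t, targets in out_links.items():
        for p in targets:
            in_links[p].append(t)
    ir = {p: init for p in out_links}        # initialize every page (to one by default)
    for _ in range(L):                       # iterate through the network L times
        for p in out_links:                  # fixed order: later pages see new values
            ir[p] = (1 - d) + d * sum(ir[t] / len(out_links[t]) for t in in_links[p])
    return ir

ranks = page_rank({"P": ["Q"], "Q": ["P", "R"], "R": ["P"]}, d=0.9, L=100)
print({p: round(v, 3) for p, v in ranks.items()})
```

Updating in place (each page immediately sees its predecessors' new values) follows the slides' convention; computing all new values from the previous round's values would be the other common choice and converges to the same fixed point.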

20 Example: iteration k=1
[Figure: graph of nodes A, B, C]
IR(P) = 1/3 for all nodes, d = 0.9
node   IR
A      1/3
B      1/3
C      1/3

21 Example: k=2
[Figure: graph of nodes A, B, C]
node   IR
A      0.4
B      0.1
C      0.55
l is the in-degree of P.
Note: the IR values of A, B and C are updated in the order A, then B, then C; the new value of A is used when calculating B, and so on.

22 Example: k=2 (normalized)
[Figure: graph of nodes A, B, C]
node   IR
A      0.38
B      0.095
C      0.52
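The figure of the example graph was not preserved. Assuming the graph A -> C, B -> C, C -> A (each page has one out-link), which is one graph consistent with the numbers in the tables, a single in-order pass from IR = 1/3 with d = 0.9 reproduces the k=2 table, and dividing by the total (1.05) gives the normalized values up to rounding. `one_round` and `normalize` are hypothetical names.

```python
from typing import Dict, List

def one_round(out_links: Dict[str, List[str]], ir: Dict[str, float],
              d: float) -> Dict[str, float]:
    """One in-order pass of IR(P) = (1 - d) + d * sum_i IR(T_i) / C_i.

    Pages are updated in dict order (A, then B, then C), and each update
    already uses the new values of pages updated before it.
    """
    ir = dict(ir)  # work on a copy; the caller's dict is left untouched
    for p in out_links:
        in_sum = sum(ir[t] / len(targets)
                     for t, targets in out_links.items() if p in targets)
        ir[p] = (1 - d) + d * in_sum
    return ir

def normalize(ir: Dict[str, float]) -> Dict[str, float]:
    """Scale the IR values so that they sum to 1."""
    total = sum(ir.values())
    return {p: v / total for p, v in ir.items()}

# Assumed graph (the slide's figure was lost): A -> C, B -> C, C -> A.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ir = one_round(graph, {p: 1 / 3 for p in graph}, d=0.9)
print({p: round(v, 3) for p, v in ir.items()})             # {'A': 0.4, 'B': 0.1, 'C': 0.55}
print({p: round(v, 3) for p, v in normalize(ir).items()})  # {'A': 0.381, 'B': 0.095, 'C': 0.524}
```

B has no in-links, so it keeps only the base value 0.1; C then sums the fresh values of A (0.4) and B (0.1), giving 0.1 + 0.9 × 0.5 = 0.55.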

23 Crawler Control
All crawlers maintain several queues of URLs to pursue next.
Google initially maintains 500 queues; each queue corresponds to a web site being crawled.
Important considerations: limited buffer space, limited time, avoiding overloading target sites, and avoiding overloading network traffic.
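A minimal sketch of a per-site frontier in Python, assuming one FIFO queue per host and round-robin selection across hosts so that no single site is hit repeatedly in a row. The class `SiteQueues`, the `max_sites` cap (defaulting to the slide's 500) and the policy of refusing new sites once the cap is reached are illustrative assumptions, not a description of Google's crawler.

```python
from collections import OrderedDict, deque
from typing import Optional
from urllib.parse import urlsplit

class SiteQueues:
    """Hypothetical URL frontier with one FIFO queue per web site (host)."""

    def __init__(self, max_sites: int = 500):
        self.max_sites = max_sites          # cap on the number of per-site queues
        self.queues = OrderedDict()         # host -> deque of URLs, in rotation order

    def add(self, url: str) -> bool:
        """Queue `url` under its host; refuse new hosts once the cap is reached."""
        host = urlsplit(url).netloc
        if host not in self.queues:
            if len(self.queues) >= self.max_sites:
                return False                # limited buffer space
            self.queues[host] = deque()
        self.queues[host].append(url)
        return True

    def next_url(self) -> Optional[str]:
        """Take one URL from the front site, then move that site to the back."""
        while self.queues:
            host = next(iter(self.queues))
            q = self.queues[host]
            if not q:
                del self.queues[host]       # drop exhausted sites
                continue
            self.queues.move_to_end(host)   # round-robin: spread load across sites
            return q.popleft()
        return None

frontier = SiteQueues(max_sites=2)
for u in ["http://a.example/1", "http://a.example/2", "http://b.example/1"]:
    frontier.add(u)
print(frontier.add("http://c.example/1"))  # False: both site slots are taken
print([frontier.next_url() for _ in range(4)])
# ['http://a.example/1', 'http://b.example/1', 'http://a.example/2', None]
```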

24 Crawler Control
Thus, it is important to visit important pages first.
Let G be a lower-bound threshold on IR(P).
Crawl and Stop: select only pages with IR > G to crawl, and stop after K pages have been crawled.
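The crawl-and-stop rule can be sketched with a max-priority queue on IR, assuming an IR estimate is available for every discovered URL (how a crawler estimates IR before it has seen the whole graph is not covered here). `crawl_and_stop`, the in-memory `links` dict standing in for fetching pages, and the scores in the example are hypothetical.

```python
import heapq
from typing import Callable, Dict, List

def crawl_and_stop(seeds: List[str], links: Dict[str, List[str]],
                   ir: Callable[[str], float], G: float, K: int) -> List[str]:
    """Crawl pages in descending estimated IR, only while IR > G, for at most K pages."""
    frontier = [(-ir(u), u) for u in seeds]   # max-heap via negated scores
    heapq.heapify(frontier)
    seen = set(seeds)
    crawled: List[str] = []
    while frontier and len(crawled) < K:
        neg_score, url = heapq.heappop(frontier)
        if -neg_score <= G:
            break                              # every remaining URL scores <= G too
        crawled.append(url)                    # "fetch" the page
        for v in links.get(url, []):           # enqueue newly discovered links
            if v not in seen:
                seen.add(v)
                heapq.heappush(frontier, (-ir(v), v))
    return crawled

links = {"s": ["x", "y", "z"], "x": ["w"]}
scores = {"s": 0.9, "x": 0.7, "y": 0.2, "z": 0.5, "w": 0.6}
print(crawl_and_stop(["s"], links, lambda u: scores[u], G=0.3, K=10))  # ['s', 'x', 'w', 'z']
```

Page y (IR 0.2) is never crawled because it falls below G, and w is visited before z because it was discovered with a higher score.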

25 Test Result: 179,000 pages
[Plot: percentage of the Stanford Web crawled vs. P_ST, the percentage of hot pages visited so far]
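P_ST can be computed from a crawl order and the set of hot pages. A minimal sketch, assuming the hot pages are known up front (for example, the pages whose IR exceeds G) and each page counts once even if revisited; `hot_page_coverage` and the sample data are hypothetical.

```python
from typing import Iterable, List, Set

def hot_page_coverage(crawl_order: Iterable[str], hot: Set[str]) -> List[float]:
    """P_ST after each crawled page: percentage of hot pages visited so far."""
    if not hot:
        raise ValueError("the set of hot pages is empty")
    seen_hot: Set[str] = set()
    curve = []
    for url in crawl_order:
        if url in hot:
            seen_hot.add(url)  # a set, so revisits are not double-counted
        curve.append(100.0 * len(seen_hot) / len(hot))
    return curve

print(hot_page_coverage(["a", "b", "c", "d"], {"a", "c"}))  # [50.0, 50.0, 100.0, 100.0]
```

Plotting this curve against the percentage of pages crawled gives the kind of comparison shown on the slide: a good ordering reaches high P_ST early.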

26 Google Algorithm (very simplified)
First, compute the page rank of each page on the WWW; this step is query-independent.
Then, in response to a query q, return the pages that contain q and have the highest page ranks.
A problem (or feature) of Google: it favors big commercial sites.
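A toy Python sketch of this two-stage scheme, assuming a whitespace-tokenized inverted index, conjunctive (all-terms) matching, and PageRank values computed beforehand. It only illustrates "query-independent rank, then filter by query", not how a production engine scores results. All names and data are hypothetical.

```python
from typing import Dict, List, Set

def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> set of URLs whose text contains the term."""
    index: Dict[str, Set[str]] = {}
    for url, text in pages.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index

def search(query: str, index: Dict[str, Set[str]],
           page_rank: Dict[str, float]) -> List[str]:
    """Return pages containing every query term, highest precomputed PageRank first."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits, key=lambda u: (-page_rank[u], u))  # ties broken by URL

pages = {"big.com": "cheap cameras online", "blog.org": "canon cameras review",
         "shop.net": "cameras"}
pr = {"big.com": 0.5, "blog.org": 0.2, "shop.net": 0.3}
index = build_index(pages)
print(search("cameras", index, pr))  # ['big.com', 'shop.net', 'blog.org']
```

Because the ranking ignores how well each page matches the query, the page with the highest PageRank wins whenever it contains the terms, which illustrates the slide's remark about favoring big sites.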