Recommendation Algorithms
Lecturer: Dr. Bo Yuan



Overview
- Tf-idf
- Vector Space Model
- Latent Semantic Analysis
- PageRank
- Collaborative Filtering
[Figure: results ordered from more relevant to less relevant]

Information Overload

Recommendation Systems
- A system that predicts a user’s rating of, or preference for, an item.
- Helps people discover interesting or informative items that they wouldn't have thought to search for.
- One of the most influential applications of data mining.
- Content-Based Filtering
  - Focuses on the characteristics of items.
  - Recommends items similar to those that a user liked in the past.
- Collaborative Filtering
  - Predicts what users will like based on their similarity to other users.
  - Similar to asking the opinions of friends.
  - Does not rely on machine-analysable content.

Junk Advertisement
Your Trash Can Be Someone's Treasure!

Targeted Advertisement
[Figure: an ads engine backed by a knowledge base. Inputs: Who are you? What are you browsing? Where are you? Previous record.]

Mobile Advertisement Platform

Music Recommendation
[Figure: keywords, preference, and popularity feed a ranking, which is refined by feedback (rating, like vs. dislike).]

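Tf-idf
- Given a collection of documents and a query word, how relevant is a document to the word?
- Some words appear more frequently than others.
- Term Frequency (TF)
  - Raw frequency: $\mathrm{tf}(t, d) = f_{t,d}$, the number of times term $t$ occurs in document $d$
- Inverse Document Frequency (IDF)
  - $\mathrm{idf}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|}$, where $N = |D|$ is the number of documents
- Tf-idf
  - $\text{tf-idf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$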

Tf-idf
- Multiple query words

Term-Document Matrix
            Doc 1   Doc 2   Doc 3   Doc 4
the best    0       1       0       2
car         3       5       0       0
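As a minimal sketch of the tf-idf idea, the Python below assumes the common textbook definitions: raw term counts for tf, and the logarithm of (number of documents / number of documents containing the term) for idf. Scoring a multi-word query by summing the per-word tf-idf values is also an assumption, as are the toy documents and all function names.

```python
import math
from collections import Counter


def tf(term, doc):
    """Raw term frequency: how often `term` occurs in the tokenised `doc`."""
    return Counter(doc)[term]


def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing term)."""
    containing = sum(1 for doc in docs if term in doc)
    if containing == 0:
        return 0.0  # assumption: a term absent from the collection weighs nothing
    return math.log(len(docs) / containing)


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


def score(query, doc, docs):
    """Assumed multi-word scoring: sum the tf-idf of every query word."""
    return sum(tf_idf(term, doc, docs) for term in query)


# Hypothetical toy collection, already split into words.
docs = [
    "the car is fast".split(),
    "the best car of the year".split(),
    "the weather is fine".split(),
]
query = ["the", "best", "car"]
scores = [score(query, doc, docs) for doc in docs]
```

Because "the" occurs in every document, its idf is log(1) = 0, so it contributes nothing to any score; only the rarer words "best" and "car" separate the documents.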

Vector Space Model
- An algebraic model for representing text documents as vectors.
- Cosine Similarity
- tf-idf weighting
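Cosine similarity between two document vectors is their dot product divided by the product of their lengths. The sketch below assumes documents are already plain lists of term weights (for example tf-idf values); the vectors and the zero-vector convention are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is treated as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical term-weight vectors over the same three terms.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same direction as doc_a, twice as long
doc_c = [0.0, 0.0, 3.0]  # shares no terms with doc_a

same_direction = cosine_similarity(doc_a, doc_b)
no_overlap = cosine_similarity(doc_a, doc_c)
```

The measure ignores vector length, so a long document and a short one with the same word proportions count as identical, while documents that share no terms score zero.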

Vector Space Model
- Synonymy
  - Different words, same meaning
  - Car, Vehicle, Automobile
  - Small cosine values, so related documents look unrelated
  - Poor recall
- Polysemy
  - One word, different meanings
  - Apple Computer vs. Apple Juice
  - Large cosine values, so unrelated documents look related
  - Poor precision
- Let’s work in a more informative space.
  - Merge dimensions with similar meanings.
  - Singular Value Decomposition

Latent Semantic Analysis

Original Matrix

Decomposition

Rank K Approximation (K=2)
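A rank-K approximation keeps only the K largest singular values of the term-document matrix and rebuilds the matrix from them. The sketch below is a pure-Python illustration for tiny matrices only: it finds the leading singular triplets by power iteration with deflation on the Gram matrix, which is a simple stand-in for a proper SVD routine. The toy matrix and all function names are assumptions.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_singular_triplets(a, k, iters=500):
    """Leading (sigma, u, v) triplets of `a`, found on the Gram matrix A^T A."""
    cols = transpose(a)
    n = len(cols)
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    triplets = []
    for idx in range(k):
        v = [1.0 / (i + idx + 1) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = matvec(gram, v)
            length = math.sqrt(sum(x * x for x in w))
            if length < 1e-12:
                return triplets  # remaining singular values are numerically zero
            v = [x / length for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, matvec(gram, v)))
        if eigenvalue < 1e-9:
            return triplets
        sigma = math.sqrt(eigenvalue)
        u = [x / sigma for x in matvec(a, v)]
        triplets.append((sigma, u, v))
        # Deflate so the next pass converges to the next singular direction.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return triplets


def rank_k_approximation(a, k):
    """Sum of the k leading terms sigma * u * v^T."""
    approx = [[0.0] * len(a[0]) for _ in a]
    for sigma, u, v in top_singular_triplets(a, k):
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                approx[i][j] += sigma * ui * vj
    return approx


# Hypothetical term-document counts (rows: terms, columns: documents).
terms = ["car", "automobile", "engine", "juice"]
matrix = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
approx = rank_k_approximation(matrix, 2)
```

In this toy matrix "car" and "automobile" rarely share a document, yet after the K=2 approximation both rows should come out close to [1, 1, 1, 0]: the synonyms have been merged into one latent dimension, while "juice" keeps its own.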

Items in 2D Space

Documents in 2D Space

Document Cosine Similarity (Original vs. Transformed)

Query: Cosine Similarity to Current Documents

Linked Documents

PageRank
- Given a set of hyperlinked documents, how can the relative importance of each document be evaluated?
- A hyperlink to a page counts as a vote of support.
- The weight of a vote from a page depends on that page's own PageRank and its number of outbound links.
- The PageRank of a page is determined by the number and PageRank of all pages that link to it.
- The outbound links of a page do not affect its own PageRank value.
  - Inbound links are difficult to manipulate.
- A key factor determining a page’s ranking in Google's search results.

PageRank
[Figure: four linked pages A, B, C, D]
d: damping factor (0.85)
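The voting scheme can be sketched as an iterative computation. The version below assumes the normalised formulation PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), summed over pages q linking to p, where L(q) is the number of outbound links of q. The link graph among pages A to D is hypothetical, and spreading the vote of a page without outbound links evenly is also an assumption.

```python
def pagerank(links, d=0.85, iters=100):
    """Iteratively compute PageRank for a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iters):
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page, outbound in links.items():
            if outbound:
                # Each outbound link carries an equal share of the page's rank.
                share = d * rank[page] / len(outbound)
                for target in outbound:
                    new_rank[target] += share
            else:
                # Assumption: a page without outbound links votes for everyone.
                for target in pages:
                    new_rank[target] += d * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link structure; every link target must also be a key.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

In this graph nothing links to D, so D keeps only the baseline (1 - d)/N, while C collects votes from three pages and ends up with the highest value.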


Monetary Success
- Stanford University received 1.8 million shares for allowing Google Inc. to use this technique.
  - Sergey Brin: US$ 24 billion (2013)
  - Larry Page: US$ 24 billion (2013)
- Made a total of US$ 336 million in return by
  - Two years after Google’s IPO
  - Around US$ 187 per share
  - What if the shares were sold today?
- Current Endowment: US$ 21.4 billion
- One of the largest single academic licensing transactions
  - Cloning Technology: US$ 225 million in royalties

Collaborative Filtering
- Core Idea:
  - People get the best recommendations from others with similar tastes.
- Workflow:
  - Create a rating or purchase matrix.
  - Find similar people by matching their ratings.
  - Recommend items that similar people rate highly.
- Memory-Based CF
  - User-Based vs. Item-Based
- Model-Based CF
- Things to know:
  - Gray Sheep
  - Shilling Attack
  - Cold Start

User-Based CF
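One common way to realise user-based CF, assumed here, is to measure user similarity with the Pearson correlation over co-rated items and to predict a rating as the user's mean plus the similarity-weighted deviations of other users from their own means. The rating data and function names below are hypothetical.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings, a, b):
    """Pearson correlation of users a and b over the items both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings[a][i] for i in common)
    mean_b = mean(ratings[b][i] for i in common)
    num = sum((ratings[a][i] - mean_a) * (ratings[b][i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings[a][i] - mean_a) ** 2 for i in common)) * \
        math.sqrt(sum((ratings[b][i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict user's rating of item from users who have rated that item."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        sim = pearson(ratings, user, other)
        other_mean = mean(ratings[other].values())
        num += sim * (ratings[other][item] - other_mean)
        den += abs(sim)
    if den == 0:
        return user_mean  # assumption: fall back to the user's own mean
    return user_mean + num / den


# Hypothetical ratings: {user: {item: rating}}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
prediction = predict(ratings, "alice", "m4")
```

Bob rates like Alice with a constant offset, so his enthusiasm for m4 pushes the prediction up; Carol's tastes run opposite to Alice's, so her low rating of m4 pushes it up as well.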

Item-Based CF
- U: Users that have rated both i and j.
- I: All items that have been rated by User a.
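With U and I defined as above, item-based CF can be sketched in two steps: compute the similarity of items i and j over the users in U, then predict user a's rating of item i as a similarity-weighted average of a's ratings over the items in I. Plain cosine similarity is an assumption made for brevity (adjusted cosine and correlation are common alternatives), and the ratings are hypothetical.

```python
import math


def item_similarity(ratings, i, j):
    """Cosine similarity of items i and j over U, the users who rated both."""
    users = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict(ratings, user, item):
    """Weighted average of the user's ratings over I, the items they rated."""
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = item_similarity(ratings, item, other_item)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None  # None: no comparable items


# Hypothetical ratings: {user: {item: rating}}.
ratings = {
    "ann": {"x": 5, "y": 4},
    "ben": {"x": 4, "y": 5, "z": 2},
    "cat": {"x": 1, "y": 2, "z": 5},
}
prediction = predict(ratings, "ann", "z")
```

Item similarities depend only on the rating matrix, not on the active user, so they can be precomputed offline.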

Item-Based CF
- U: Users that have rated both i and j.
- I: Items that the user has rated and have dev values.
- U: Users that have rated i.

Item-Based CF: Slope One

Customer   Item 1           Item 2   Item 3
John       5                3        2
Mark       3                4        Didn't rate it
Lucy       Didn't rate it   2        5
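The table can be worked through with a short Slope One sketch. It assumes the weighted variant: dev(i, j) is the average difference between the ratings of items i and j over users who rated both, and each deviation is weighted by the number of such users. Function names are illustrative.

```python
def deviation(ratings, i, j):
    """Average of (rating of i - rating of j) over users who rated both."""
    diffs = [r[i] - r[j] for r in ratings.values() if i in r and j in r]
    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)


def predict(ratings, user, item):
    """Weighted Slope One prediction of `user`'s rating for `item`."""
    num = 0.0
    den = 0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        dev, count = deviation(ratings, item, other_item)
        if count:
            num += (dev + rating) * count  # weight by number of co-raters
            den += count
    return num / den if den else None


# The ratings from the table above.
ratings = {
    "John": {"Item 1": 5, "Item 2": 3, "Item 3": 2},
    "Mark": {"Item 1": 3, "Item 2": 4},
    "Lucy": {"Item 2": 2, "Item 3": 5},
}
lucy_item1 = predict(ratings, "Lucy", "Item 1")  # 13/3, about 4.33
mark_item3 = predict(ratings, "Mark", "Item 3")  # 10/3, about 3.33
```

For Lucy and Item 1: dev(Item 1, Item 2) = ((5 - 3) + (3 - 4)) / 2 = 0.5 from two users, and dev(Item 1, Item 3) = 5 - 2 = 3 from one user, giving ((0.5 + 2) × 2 + (3 + 5) × 1) / 3 = 13/3.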

Model-Based CF
[Figure labels: Class Label, Training Samples, Attributes]


Netflix Prize
- A public company providing a DVD-rental service
- Target:
  - To predict whether someone will enjoy a movie based on how much they liked or disliked other movies.
  - To improve the score of its own Cinematch by 10%
  - RMSE (Root Mean Squared Error)
- Training Set:
  - 480,189 users, 17,770 movies, 100,480,507 ratings
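RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch, with made-up ratings:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions against held-out true ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5, 3, 4, 2]
error = rmse(predicted, actual)
```

Squaring penalises large misses more than small ones, so a predictor that is occasionally far off scores worse than one that is slightly off everywhere.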

KDD Cup

Reality Mining

Reading Materials
- P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “GroupLens: An Open Architecture for Collaborative Filtering of Netnews”, in Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 175–186.
- D. Billsus and M. Pazzani, “Learning Collaborative Information Filters”, in Proceedings of the 15th International Conference on Machine Learning.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-Based Collaborative Filtering Recommendation Algorithms”, in Proceedings of the 10th International Conference on World Wide Web.
- X. Su and T. Khoshgoftaar, “A Survey of Collaborative Filtering Techniques”, Advances in Artificial Intelligence.
- L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web”, Technical Report, Stanford InfoLab.
- S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by Latent Semantic Analysis”, JASIS, vol. 41(6).
- N. Eagle and A. Pentland, “Reality Mining: Sensing Complex Social Systems”, Personal and Ubiquitous Computing, vol. 10(4).

Review
- Why do we need recommendation algorithms?
- What does tf-idf stand for?
- What is the definition of cosine similarity?
- What are the practical issues of the vector space model?
- What is the main procedure of Latent Semantic Analysis?
- How is PageRank calculated?
- What are the two groups of recommendation algorithms?
- What is the core idea behind collaborative filtering?
- What are the limitations of collaborative filtering?