Using Game Reviews to Recommend Games
Michael Meidl, Steven Lytinen (DePaul University School of Computing, Chicago, IL)
Kevin Raison (Chatsubo Labs, Seattle, WA)


Recommender Systems are Everywhere

Our Task

Provide a game player with recommendations of games s/he has not played (and will like). Recommendations are based on two sources of information:
- Corpus of game reviews (free-form text)
- Knowledge about which games a user already likes (the user's numerical rankings)

Review of Assassin's Creed Unity

"its complex Abstergo storyline has long since jumped the shark… the story is much darker in tone than anything else in the series… hard to get bored… the attention to detail… is nothing short of astonishing"
6 out of 10 - Mark Walton

What else will Mark like?

Recommender System Techniques

1. Collaborative-based
- The system compares you to other users, and recommends what they've liked or bought
- May know nothing else about the products or other items that it recommends
- Examples: Amazon, Barnes & Noble, CDW, …

Collaborative Example (Candillier et al., 2007)

Recommender System Techniques

2. Content-based
- The system uses information about the items it recommends (e.g., recommend books by the same author, or same genre)
- Might not use information about other customers/users

Content-based Example

Features: Tom Hanks | Daisy Ridley | Drama | SciFi | Comedy | Did I like it?
Movie 1: x x x
Movie 2: x x
Movie 3: x x x x x
Movie 4: X X → ???

Recommender System Techniques

3. Hybrid: some combination of collaborative-based and content-based

Our game recommender

1. Content-based: a game "representation" is based on the (free-form text) reviews written by a community of users
2. The user profile is based on a small sample of items liked by the user

Corpus

- 400,000 reviews of 8,279 different games
- Mixture of professional reviews and user reviews

Representing games

- The representation of each game is constructed from a corpus of free-form text reviews of games
- Games are represented as vectors
- Vector features are based on co-occurrence of word pairs: adjectives and "context words"

Vector space model

- Originated in information retrieval
- Task: judge the "similarity" of documents (e.g., game reviews)
- Document representation: bag of words

Vector space model

1. Build a vocabulary: terms which are "important" in the collection of documents
2. Build the document representations: which terms from the vocabulary appear in the document, and how frequently relative to other documents?
3. Starting with a document, which others are similar?

Vocabulary: [story, plot, animation, interest, bore, astonish, series, complex, …]

"its complex Abstergo storyline has long since jumped the shark… the complex story is much darker in tone… hard to get bored… the attention to detail… is nothing short of astonishing"

Vector: [1, 0, 0, 0, 1, 1, 1, 2, …]
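The bag-of-words counting above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's pipeline: the short vocabulary and review snippet are toy examples, and prefix matching stands in for real stemming (so "bored" matches "bore" and "astonishing" matches "astonish"):

```python
def tf_vector(text, vocabulary):
    """Count occurrences of each vocabulary term in the text.

    Prefix matching is a crude stand-in for stemming; a real system
    would use a proper stemmer or lemmatizer.
    """
    tokens = [t.strip(".,;:!?\"'…").lower() for t in text.split()]
    return [sum(1 for t in tokens if t.startswith(term)) for term in vocabulary]

vocab = ["story", "plot", "bore", "astonish", "complex"]
review = "its complex storyline ... the complex story ... hard to get bored ... astonishing"
print(tf_vector(review, vocab))  # → [2, 0, 1, 1, 2]
```

Note that with prefix matching both "storyline" and "story" count toward the "story" feature; the exact counts therefore depend on the matching rule chosen.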

Vector space, cont.

- Vector values are typically "normalized" to account for a document's length, the frequency of each term across documents, …
- Documents are similar if their vectors are similar:
  [1, 0, 0, 0, 1, 1, 1, 2] and [1, 1, 0, 1, 2, 1, 2, 2] (similar)
  [0, 1, 2, 1, 0, 3, 0, 0] (dissimilar)
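A standard way to compare such vectors is cosine similarity, which normalizes away vector length. A minimal sketch, using the three vectors from the slide (the slides don't specify which similarity measure was used, so this is one common choice):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product of the two vectors divided by
    the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

a = [1, 0, 0, 0, 1, 1, 1, 2]
b = [1, 1, 0, 1, 2, 1, 2, 2]
c = [0, 1, 2, 1, 0, 3, 0, 0]
print(cosine(a, b))  # ≈ 0.88 (similar pair)
print(cosine(a, c))  # ≈ 0.27 (dissimilar pair)
```

The "similar" pair scores far higher than the "dissimilar" one, matching the intuition on the slide.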

Feature space

- 700 adjectives were chosen as most relevant to the description of games (Zagal and Tomuro 2010)
- Bootstrapping approach, beginning with adjectives modifying "gameplay"
- "Context words": words that appear in a window of ±2 words around an adjective
- Over 3,500,000 adjective-context word pairs: an unworkable feature space size
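Collecting adjective-context pairs from a ±2 window can be sketched as follows; the token list and adjective set here are toy examples, not the paper's 700-adjective lexicon:

```python
def context_pairs(tokens, adjectives, window=2):
    """Collect (adjective, context word) pairs for every token within
    +/- `window` positions of an adjective occurrence."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in adjectives:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # the adjective is not its own context
                    pairs.append((tok, tokens[j]))
    return pairs

toks = "the complex story is darker".split()
print(context_pairs(toks, {"complex", "darker"}))
```

For "complex" this yields the pairs (complex, the), (complex, story), (complex, is); applied to 400,000 reviews, such windows produce the millions of pairs cited on the slide.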

Reduction of Feature Space Using Co-clustering

- Simultaneously cluster two sets of related items while minimizing the loss of mutual information (Dhillon, Mallela, and Modha 2003)
- In our case, a set of adjectives X and a set of "context words" Y
- Input: X, Y
- Output: X' = {X_1, X_2, …, X_m}, a partition of X, and Y' = {Y_1, Y_2, …, Y_n}, a partition of Y

Representation of Games

- The collection of reviews for a game was treated as one "document"
- Games are represented as vectors
- Each vector feature is a pair: (adjective cluster, context word cluster)
- Frequencies of co-occurrence of clusters were counted, and weighted in various ways
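Once the co-clustering has produced the two partitions, each raw (adjective, context word) pair maps to a (adjective cluster, context cluster) feature. A small sketch, with hypothetical toy cluster assignments rather than the paper's actual partitions:

```python
from collections import Counter

def cluster_pair_counts(pairs, adj_cluster, ctx_cluster):
    """Map each (adjective, context word) pair to its
    (adjective cluster, context cluster) feature and count occurrences.
    The cluster dictionaries are assumed to come from a prior
    co-clustering step; pairs with unclustered words are skipped."""
    return Counter((adj_cluster[a], ctx_cluster[c]) for a, c in pairs
                   if a in adj_cluster and c in ctx_cluster)

# Hypothetical partitions: cluster 0 = "depth" adjectives / "narrative" words.
adj_cluster = {"complex": 0, "deep": 0, "boring": 1}
ctx_cluster = {"story": 0, "plot": 0, "combat": 1}
pairs = [("complex", "story"), ("deep", "plot"), ("boring", "combat")]
print(cluster_pair_counts(pairs, adj_cluster, ctx_cluster))
```

Two distinct raw pairs, (complex, story) and (deep, plot), collapse into the single cluster-pair feature (0, 0): this collapsing is what shrinks the feature space.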

Recommending games

- G = games already liked by a user
- G' = all games the user has already played (including disliked ones)
- S = "seeds": a small subset of G
- N = games that the user does not know
- R = games that our system recommends

Recommending games

R = the k games in N with minimum distance from any of the members of S (so |R| = k)
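The selection rule above can be sketched directly: rank every unknown game by its minimum distance to any seed, and keep the k closest. The one-dimensional "game vectors" and the distance function below are toy stand-ins for the cluster-pair vectors and whatever vector distance the system actually uses:

```python
def recommend(candidates, seeds, distance, k):
    """Return the k candidate games with the smallest minimum
    distance to any seed game (the recommendation set R, |R| = k)."""
    scored = [(min(distance(g, s) for s in seeds), g) for g in candidates]
    return [g for _, g in sorted(scored)[:k]]

# Toy 1-D "game vectors"; distance is absolute difference.
vecs = {"A": 0.1, "B": 0.9, "C": 0.2, "D": 0.5}
dist = lambda g, s: abs(vecs[g] - vecs[s])
print(recommend(["B", "C", "D"], ["A"], dist, 2))  # C and D are closest to seed A
```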

Evaluation

- "Live" testing was not available to us
- Instead, offline testing: recommend k games (|R| = k) from G' - S, then find the overlap between R and G

Evaluation

- We conducted an n-fold cross-validation of our system's performance
- Number of folds: n = |G'| / |S|
- Partition G' into n folds
- Measure performance n times, once for each choice of S
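The fold construction can be sketched as splitting the played games G' into consecutive chunks of the seed size, each chunk serving once as the seed set S. The game list below is hypothetical:

```python
def make_folds(games, seed_size):
    """Partition the played games G' into n = |G'| / seed_size folds;
    each fold is used once as the seed set S. If |G'| is not divisible
    by seed_size, the last fold is simply smaller."""
    return [games[i:i + seed_size] for i in range(0, len(games), seed_size)]

played = ["A", "B", "C", "D", "E", "F"]   # toy G'
print(make_folds(played, 2))              # → three folds of two games each
```

A real evaluation would typically shuffle G' before slicing so the folds are not order-dependent.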

Evaluation

- We measured performance in terms of precision
- precision = |R ∩ (G - S)| / |R|
- Precision tends to be highest for small k and to decrease as k increases
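The precision formula translates directly into set operations; the game IDs below are toy examples:

```python
def precision_at_k(recommended, liked, seeds):
    """precision = |R ∩ (G - S)| / |R|: the fraction of recommended
    games that the user actually liked, excluding the seed games."""
    hits = set(recommended) & (set(liked) - set(seeds))
    return len(hits) / len(recommended)

liked = ["A", "B", "C", "D"]   # G
seeds = ["A"]                  # S
recs = ["B", "C", "E", "F"]    # R, |R| = 4
print(precision_at_k(recs, liked, seeds))  # 2 of 4 recommendations hit → 0.5
```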

Evaluation

We also varied:
- Weighting techniques for features
- Dimensionality of the co-clustering

Feature Weighting

- Most common: tf-idf
- Document frequency = the number of documents in which a cluster pair appears
- Term frequency (of cluster pairs) is multiplied by the inverse of the document frequency
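A tf-idf weighting over cluster-pair counts can be sketched as below. The slides don't specify the exact idf formula, so this uses the common log-scaled variant log(N / df); the feature names and counts are toy examples:

```python
import math

def tf_idf(counts_per_doc):
    """counts_per_doc: one {feature: raw count} dict per game "document".
    Each count is weighted by log(N / df), where df is the number of
    documents in which the feature appears."""
    n = len(counts_per_doc)
    df = {}
    for counts in counts_per_doc:
        for f in counts:
            df[f] = df.get(f, 0) + 1
    return [{f: c * math.log(n / df[f]) for f, c in counts.items()}
            for counts in counts_per_doc]

docs = [{"(0,0)": 2, "(1,1)": 1}, {"(0,0)": 1}]
print(tf_idf(docs))
```

A feature appearing in every document (here "(0,0)") gets weight zero under this variant, while rarer features keep their counts scaled up, which matches tf-idf's goal of emphasizing discriminative features.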

Other Feature Weighting

- tf: "raw" co-occurrence counts
- tf-normc: normalize frequency across documents ("column-wise" normalization)
- boolean: feature value is 1 if the cluster pair appears, 0 if not

Results: Feature Weighting

Results: Co-cluster dimensions

Results: Co-clustering vs. “Bag of words”

Conclusions

- Representing games with an approach based on adjective-context word pairs produces high-quality recommendations
- Precision of the first recommendation is 85-90%

Conclusions

- Precision is approximately 80% even for 10 recommendations
- The co-clustering technique dramatically reduces the feature space while maintaining high precision
- Dimensionality was reduced from 3,500,000 to 1,000 in 10 × 100 co-clustering