Topics in AI: Applied Natural Language Processing
Information Extraction and Recommender Systems for Video Games
Supervised by Dr. Noriko Tomuro
Fall – 2009/2010

Project Goals
A. Videogame Search/Retrieval System
   Allows users to search videogames by multiple criteria:
   - Basic information of a game (e.g. developer, publisher, genre, platform), plus theme/concept -- but this is nothing new
   - “Qualitative features” of a game, based on the content (e.g. gameplay, visual style, sound & music) => uniqueness of our system
   - Incorporate rating scores
B. Videogame Recommendation System
   - Recommends games which are “similar” to a given game.
   - Similarity is measured (and ranked) by multiple criteria (above).
   - Personalize search/retrieval results to individual users.

Steps & Tasks
1. Construct a “Videogame Lexicon”
   - Lists of titles, characters, locations, concepts/themes <= Extract from GiantBomb and Gamespot
2. Associate those features with each game
   - Traverse links in GiantBomb => Create a relational database
At this point, we can have a preliminary system which allows users to search games by basic information and theme/concept features.
Start designing the interface of the system.

Relational DB structure
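
The schema diagram for this slide is not part of the transcript. As a rough illustration only, the sketch below shows one way a game table could be linked to theme/concept entries so that games are searchable by concept. Every table name, column name, and helper function here is hypothetical and is not taken from the project.

```python
import sqlite3

# Hypothetical schema: one table of games, one of concepts/themes,
# and a link table for the many-to-many association between them.
SCHEMA = """
CREATE TABLE game (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    developer TEXT,
    genre TEXT,
    platform TEXT
);
CREATE TABLE concept (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE game_concept (
    game_id INTEGER NOT NULL REFERENCES game(id),
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (game_id, concept_id)
);
"""


def build_db():
    """Create an empty in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def games_with_concept(conn, concept_name):
    """Return the titles of all games linked to the given concept, sorted."""
    rows = conn.execute(
        "SELECT g.title FROM game g "
        "JOIN game_concept gc ON gc.game_id = g.id "
        "JOIN concept c ON c.id = gc.concept_id "
        "WHERE c.name = ? ORDER BY g.title",
        (concept_name,),
    )
    return [title for (title,) in rows]
```

Characters, locations, and people could be linked to games through further link tables of the same shape.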

Steps & Tasks (cont.)
3. For qualitative features, use NLP techniques to obtain info from Gamespot review texts.
   - We start with the feature ‘gameplay’. Tackle other features if time permits.
   1. As pre-processing, annotate all Gamespot review texts with the words in our videogame lexicon (i.e., named entities of various types, such as title, characters)
      => Re-generate the data in which named entities are indicated (and multi-word named entities are concatenated), e.g. “Mario_Bros./TTL”
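
The annotation step can be sketched as a dictionary lookup over the review text. This is a minimal sketch, assuming the lexicon is available as a mapping from entity name to tag. The function name `annotate` and the tag `ABBR` in the usage note are made up for illustration; only `TTL` appears on the slide.

```python
import re


def annotate(text, lexicon):
    """Mark lexicon entries in text as Name_With_Underscores/TAG.

    lexicon maps an entity name (possibly multi-word) to its tag.
    """
    if not lexicon:
        return text
    # Try longer names first so a long title wins over a shorter one it contains.
    names = sorted(lexicon, key=len, reverse=True)
    # The lookarounds keep a name from matching inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(name) for name in names) + r")(?!\w)"
    )

    def tag(match):
        name = match.group(0)
        return name.replace(" ", "_") + "/" + lexicon[name]

    return pattern.sub(tag, text)
```

For example, `annotate("I played Mario Bros. yesterday", {"Mario Bros.": "TTL"})` yields `I played Mario_Bros./TTL yesterday`. Exact string matching like this misses spelling variants, which is one reason to also try a trained named entity tagger.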

Steps & Tasks (cont.)
Example:
   Short for Armed Assault (which would have made an infinitely better title), it's much easier to think of ArmA as the spiritual sequel to 2001's critically acclaimed Cold War Crisis, an innovative military themed game that's as much simulation as it is shooter. That's because ArmA is the product of Bohemia Interactive, the European developer responsible for Operation Flashpoint. …
Legend: Title, Title abbreviation, Developer or Publisher, Concept/Theme, Genre

Steps & Tasks (cont.)
   2. Extract sentences in Gamespot reviews which express gameplay
      - Generate a set of adjectives which modify “gameplay” (independently, from Google n-gram data), and cluster them.
      - Each cluster will represent a semantic category of the member adjectives, e.g. speed, difficulty
      - addictive gameplay, good gameplay, unique gameplay, excellent gameplay
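
Collecting the words that precede "gameplay" in bigram data might look like the sketch below. It assumes each input line has the form `word1 word2<TAB>count`; that file format and the function name `gameplay_modifiers` are assumptions. The sketch keeps every preceding word, so restricting the result to adjectives would still need a part-of-speech tagger or an adjective list.

```python
from collections import Counter


def gameplay_modifiers(bigram_lines, head="gameplay"):
    """Count the words that appear directly before `head` in bigram data.

    Each line is assumed to look like "word1 word2<TAB>count".
    """
    counts = Counter()
    for line in bigram_lines:
        ngram, _, count = line.rstrip("\n").partition("\t")
        words = ngram.split()
        # Keep only well-formed bigrams whose second word is the head noun.
        if len(words) == 2 and words[1].lower() == head and count.isdigit():
            counts[words[0].lower()] += int(count)
    return counts
```

Calling `gameplay_modifiers(lines).most_common(20)` then gives the most frequent candidate modifiers to feed into clustering.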

Steps & Tasks (cont.)
   3. Manually inspect Gamespot reviews and identify other words/phrases/patterns (besides the words “gameplay” or “play”).
      - Automatically extract all sentences which have those adjectives and/or match the patterns => Manually filter incorrect sentences.
   4. For each game, assign the adjectival semantic categories/clusters and/or the gameplay expression patterns
      => Those are the values of the ‘gameplay’ feature of the game. Store them in the database for the game.
   5. We may also want to do similar clustering for concepts/themes (which we extracted from GiantBomb).

Clustering ‘Gameplay’ Adjectives
[Figure: matrix of adjectives (addictive, good, unique, actual, excellent) against nouns (Noun 1 to Noun 5)]
Mutual Information:
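
The formula on this slide did not survive in the transcript. A common choice for this kind of adjective-by-noun matrix is pointwise mutual information, PMI(a, n) = log2( P(a, n) / (P(a) P(n)) ), and the sketch below assumes that definition; the project may have used a different variant. The function name `pmi_table` and the input layout are hypothetical.

```python
from collections import Counter
from math import log2


def pmi_table(pair_counts):
    """Compute pointwise mutual information for (adjective, noun) pairs.

    pair_counts maps (adjective, noun) to a co-occurrence count.
    Pairs with a zero count are skipped because log2(0) is undefined.
    """
    total = sum(pair_counts.values())
    adj_totals = Counter()
    noun_totals = Counter()
    for (adj, noun), count in pair_counts.items():
        adj_totals[adj] += count
        noun_totals[noun] += count
    # P(a, n) / (P(a) P(n)) simplifies to count * total / (adj_total * noun_total).
    return {
        (adj, noun): log2(count * total / (adj_totals[adj] * noun_totals[noun]))
        for (adj, noun), count in pair_counts.items()
        if count > 0
    }
```

Each adjective's row of PMI values over the nouns can then serve as its feature vector for clustering.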

‘Gameplay’ Feature
[Figure: a game review is matched against adjective clusters]
   - Addictive, Obsessive, Hooking, Enslaving, Habit-forming, Involving
   - Originative, Original, Ingenious, New, Leading-edge, Innovative
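
Matching a review against the adjective clusters can be sketched as a set intersection. The sketch assumes the twelve adjectives on this slide form two clusters labelled by their first word ("addictive" and "originative"); that grouping is a reading of the flattened slide, and the function name `gameplay_categories` is made up. For simplicity it scans the whole review text, whereas the process described earlier first narrows the text to gameplay sentences.

```python
import re

# Clusters as read from the slide; the labels are assumed to be the first word of each group.
CLUSTERS = {
    "addictive": {"addictive", "obsessive", "hooking", "enslaving",
                  "habit-forming", "involving"},
    "originative": {"originative", "original", "ingenious", "new",
                    "leading-edge", "innovative"},
}


def gameplay_categories(review, clusters=CLUSTERS):
    """Return the sorted labels of all clusters with a member word in the review."""
    # Tokenize into lowercase words, keeping hyphenated forms such as "habit-forming".
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", review.lower()))
    return sorted(label for label, words in clusters.items() if tokens & words)
```

For instance, a review containing "innovative and habit-forming" would be assigned both labels.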

Other Tasks
a) [Step 2] Cluster adjectives to derive the adjectival semantic categories
   - Use the Google n-gram data; extract adjectives from the bi-grams “XX gameplay”. Other patterns from 3-grams and 4-grams.
   - Try clustering into 5-10 categories/clusters. Also, soft clustering.
b) Apply a named entity tagger (e.g. Stanford NER) to all Gamespot reviews
   - In order to pick up more named entities
   - But we need to train the tool…
c) Apply partial parsing to the extracted sentences (which express gameplay)
   - In order to accurately identify the relevant adjective(s) in a sentence

Other Tasks (cont.)
d) Incorporate rating scores
e) Derive weights on the features
   - Which features are more important than others?
   - Feature weighting will help the ranking of the matched results (in search/retrieval) and the recommendations.

Game Recommendation Process
[Figure: Game Clusters, Recommendation List]

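Clustering Games using User Ratings
[Table: ratings of Game 1 to Game 5 by User 1 to User 5; the cell layout is not recoverable. Surviving row values: Game 1: 5, 3, 1; Game 2: 5, 3, 1; Game 4: 1, 4, 3; Game 5: 5, 4, 3]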
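
The slide does not say how games are compared on their ratings. One plausible ingredient, shown here as a sketch only, is cosine similarity between two games' rating vectors, restricted to the users who rated both. The function name `rating_similarity` and the input layout (user id mapped to score) are assumptions; the resulting pairwise similarities could then be passed to any standard clustering method.

```python
from math import sqrt


def rating_similarity(ratings_a, ratings_b):
    """Cosine similarity of two games' ratings over the users who rated both.

    Each argument maps a user id to that user's rating of the game.
    Returns 0.0 when the games share no raters.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[user] * ratings_b[user] for user in common)
    norm_a = sqrt(sum(ratings_a[user] ** 2 for user in common))
    norm_b = sqrt(sum(ratings_b[user] ** 2 for user in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two games rated identically by the same users score close to 1.0, so they would tend to land in the same cluster.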

Measuring Game Similarity
[Figure: Game 1 and Game 2 compared on Developer ID, Genre ID, ESRB ID, Character ID, Location ID, People ID; clustering by match]
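
Similarity by feature match can be sketched as a weighted average of per-feature set overlap. The use of Jaccard overlap, the function names, and the feature keys are all assumptions made for illustration; the slide only lists the ID fields being compared.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of IDs; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def game_similarity(game1, game2, weights):
    """Weighted average of per-feature Jaccard overlap between two games.

    Each game maps a feature name (e.g. "developer", "genre") to a set of IDs.
    weights maps a feature name to its non-negative importance.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        weight * jaccard(game1.get(feature, ()), game2.get(feature, ()))
        for feature, weight in weights.items()
    )
    return score / total
```

With equal weights, two games that share a developer and one of three genres overall score (1 + 1/3) / 2, about 0.67.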

Idea for Deriving Feature Weights
   - Clustering by specific quality feature match
   - Clustering by User Ratings
   - % of Overlap => Feature Importance
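
The slide does not define how the overlap is computed. One possible reading, sketched here under that assumption, is the fraction of game pairs grouped together by a feature-based clustering that are also grouped together by the rating-based clustering. The function names and the input layout (game id mapped to cluster id) are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(assignment):
    """Return the set of game pairs that share a cluster in the assignment."""
    return {
        frozenset(pair)
        for pair in combinations(sorted(assignment), 2)
        if assignment[pair[0]] == assignment[pair[1]]
    }


def overlap_fraction(feature_clusters, rating_clusters):
    """Fraction of same-cluster pairs under the feature clustering that are
    also same-cluster pairs under the rating clustering."""
    # Only games present in both clusterings can be compared.
    games = set(feature_clusters) & set(rating_clusters)
    feature_pairs = same_cluster_pairs({g: feature_clusters[g] for g in games})
    rating_pairs = same_cluster_pairs({g: rating_clusters[g] for g in games})
    if not feature_pairs:
        return 0.0
    return len(feature_pairs & rating_pairs) / len(feature_pairs)
```

Under this reading, a feature whose clustering agrees more with the rating-based clustering would receive a larger weight.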

Recommendation Generation Using Google Page-Rank Algorithm
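
The slide names the algorithm but not how the graph is built. The sketch below is a plain power-iteration PageRank over a directed graph and assumes, purely for illustration, that nodes are games and that an edge points from a game to a game considered similar to it. The function name and the parameter defaults are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to a list of nodes it links to. Every linked node
    must also appear as a key. Returns a dict of node -> rank; ranks sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Sorting the games by rank in descending order would then give a candidate recommendation list.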