Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Text Categorization.
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management Eytan Adar et al Presented by Xiao Hu CS491CXZ.
Chapter 5: Introduction to Information Retrieval
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Information Retrieval Ling573 NLP Systems and Applications April 26, 2011.
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
Time-dependent Similarity Measure of Queries Using Historical Click- through Data Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al. Presented by: Tie-Yan.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman ICCV 2003 Presented by: Indriyati Atmosukarto.
Personalised Search on the World Wide Web Originally by Micarelli, Gasparetti, Sciarrone & Gauch
Information Retrieval Ch Information retrieval Goal: Finding documents Search engines on the world wide web IR system characters Document collection.
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
University of Kansas Department of Electrical Engineering and Computer Science Dr. Susan Gauch April 2005 I T T C Dr. Susan Gauch Personalized Search Based.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
Information Retrieval
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Overview of Web Data Mining and Applications Part I
Chapter 5: Information Retrieval and Web Search
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics – Bag of concepts – Semantic distance between two words.
Query Expansion.
Text mining.
Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet.
Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox Community Grids Laboratory Indiana University.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Improving Web Spam Classification using Rank-time Features September 25, 2008 TaeSeob,Yun KAIST DATABASE & MULTIMEDIA LAB.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
WEB SEARCH PERSONALIZATION WITH ONTOLOGICAL USER PROFILES Data Mining Lab XUAN MAN.
Xiaoying Sharon Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Machine Learning in Ad-hoc IR. Machine Learning for ad hoc IR We’ve looked at methods for ranking documents in IR using factors like –Cosine similarity,
Chapter 6: Information Retrieval and Web Search
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Text mining. The Standard Data Mining process Text Mining Machine learning on text data Text Data mining Text analysis Part of Web mining Typical tasks.
Personalized Course Navigation Based on Grey Relational Analysis Han-Ming Lee, Chi-Chun Huang, Tzu- Ting Kao (Dept. of Computer Science and Information.
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
Personalization with user’s local data Personalizing Search via Automated Analysis of Interests and Activities 1 Sungjick Lee Department of Electrical.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Citation-Based Retrieval for Scholarly Publications 指導教授:郭建明 學生:蘇文正 M
Artificial Intelligence Techniques Internet Applications 4.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
1 Personalizing Search via Automated Analysis of Interests and Activities Jaime Teevan, MIT Susan T. Dumais, Microsoft Eric Horvitz, Microsoft SIGIR 2005.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics Semantic distance between two words.
Data Mining and Text Mining. The Standard Data Mining process.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Xiaoying Sharon Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Personalized Ontology for Web Search Personalization S. Sendhilkumar, T.V. Geetha Anna University, Chennai India 1st ACM Bangalore annual Compute conference,
Information Organization: Overview
User Characterization in Search Personalization
Lecture 12: Relevance Feedback & Query Expansion - II
Author: Kazunari Sugiyama, etc. (WWW2004)
Text Categorization Assigning documents to a fixed set of categories
CSE 635 Multimedia Information Retrieval
Information Retrieval
Web Mining Research: A Survey
Information Organization: Overview
Presentation transcript:

Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423

Next lecture Paper: Personalized Mobile search engine Some concepts Ontology Cyc, WordNet, ODP(Open directory project) Entropy disorder, uncertainty the smaller, the better Ranking SVM (Support Vector Machines) Supervised machine learning K-means clustering

Review on information retrieval Text representation Find all the unique words/terms in all documents, each term is a feature Each document is represented as a vector Weight for each term Term frequency: for one particular document Inversed document frequency : the same for all documents Query is also a vector Comparison with Query The similarity between query vector and each document vector Cosine similarity or other distance measure (Euclidean distance) Ranking The documents with higher similarity are more relevant to query PageRank

Evaluation Precision and Recall For a query, If a system finds 200 results, among them 50 are relevant. The human labels ( model solutions) have 120 relevant documents. Precision and recall curve AUC: Area under curve Top N precision ARR: the average rank of the documents rated as “relevant”

Personalised search Personalised information retrieval A related area is called Adaptive Hypermedia

Information gathering Information gathering approach Implicit, Explicit, Both Type of information Queries, clicked documents, snippets of documents Cashed web pages, dwell time on page, desktop documents , calendar items Tags and bookmarks on online social applications User supplied information User’s categorical interests Source of information Server side, Client –side, user intervention

Information representation User model Short-term interests, long-term interests Static, dynamic, periodic Terms or conceptual terms (use wordnet, ontology) Vector-based Models where user’s interests are maintained in a vector of weighted keywords (concepts). Semantic network based Models where user’s interests are maintained in a network structure of terms and related terms (concepts and related concepts)

Two directions Query adaptation or result adaptation Query expansion/adaptation implicit individualised User model Aggregate Usage information (search logs) Not user-focused Pseudo-relevance feedback Global analysis explicit individual, relevance feedback, interactive query expansion

Search results filtering/adaptation Different applications: individual, aggregate, web search or recommendation, databases search Typically use supervised machine learning Relevant, not relevant: binary classification Training data: Labeled data Assume clicked docs are relevant Machine learning methods KNN: K nearest neighbour Naïve Bayes SVM: Support vector machines …

Evaluation In lab setting users Quantitative & Qualitative System performance User evaluation, system usability Data sets open web corpora, in-lab generated logs, TREC collection, search engine query logs subset of annotated documents from specific sites