Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet.

Presentation transcript:

Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet

Outline
Background
Motivation
Collecting User Information
Building Conceptual Profiles
Using User Profiles in Search
–MiSearch
Using User Profiles in Recommender Systems
–MyCiteSeerX
Issues with User Profiles

Background
Information retrieval (IR) studies the indexing and retrieval of textual documents
Searching for pages on the World Wide Web is the most recent “killer app”
Concerned with retrieving documents relevant to a query
Concerned with retrieving efficiently from large sets of documents

Web Search System
[Figure: block diagram with components Web Spider, Document corpus, Query String, IR System, and Ranked Documents (1. Page1, 2. Page2, 3. Page3, ...)]

The Vector-Space Model
Assume $t$ distinct terms remain after preprocessing; call them index terms or the vocabulary.
These “orthogonal” terms form a vector space.
Dimension = $t$ = |vocabulary|
Each term $i$ in a document or query $j$ is given a real-valued weight $w_{ij}$.
Both documents and queries are expressed as $t$-dimensional vectors:
$d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$
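
The mapping from text to a t-dimensional vector can be sketched as follows. This is a minimal illustration, not the preprocessing or weighting used in the talk: whitespace tokenization, raw term counts as weights, and the helper names `build_vocabulary` and `to_vector` are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(texts):
    """Return the sorted list of distinct terms (the index terms)."""
    return sorted({term for text in texts for term in text.lower().split()})


def to_vector(text, vocabulary):
    """Map a text to a t-dimensional vector, one weight per index term.

    Assumption: the weight is the raw term count. Other schemes
    (e.g. tf-idf) would plug in here instead.
    """
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


docs = ["salsa dance lessons", "salsa recipe with tomato salsa"]
vocab = build_vocabulary(docs)               # t = len(vocab)
doc_vectors = [to_vector(d, vocab) for d in docs]
query_vector = to_vector("salsa recipe", vocab)
```

Documents and queries go through the same `to_vector` call, which is what makes them directly comparable in the same space.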

Graphic Representation
[Figure: $D_1$, $D_2$, and $Q$ plotted in the space spanned by axes $T_1$, $T_2$, $T_3$]
$D_1 = 2T_1 + 3T_2 + 5T_3$
$D_2 = 3T_1 + 7T_2 + T_3$
$Q = 0T_1 + 0T_2 + 2T_3$
Is $D_1$ or $D_2$ more similar to $Q$?
How to measure the degree of similarity? Distance? Angle?

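Cosine Similarity Measure
Cosine similarity measures the cosine of the angle $\theta$ between two vectors.
Inner product normalized by the vector lengths:
$\mathrm{CosSim}(d_j, q) = \frac{d_j \cdot q}{|d_j|\,|q|} = \frac{\sum_{i=1}^{t} w_{ij} w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{ij}^2}\,\sqrt{\sum_{i=1}^{t} w_{iq}^2}}$
[Figure: $D_1$, $D_2$, and $Q$ in the space of $t_1$, $t_2$, $t_3$, with the angle between $Q$ and each document marked]
$D_1$ is about 6 times better a match than $D_2$ using cosine similarity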
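
The "6 times better" figure can be checked with a few lines of code. The sketch below assumes the vectors from the Graphic Representation slide, reading the last component of each document as its weight on the third term, so D1 = (2, 3, 5), D2 = (3, 7, 1), and Q = (0, 0, 2). The function name `cos_sim` is chosen for the example.

```python
import math


def cos_sim(d, q):
    """Inner product of d and q, normalized by both vector lengths."""
    dot = sum(w_d * w_q for w_d, w_q in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_d * norm_q)


d1 = [2, 3, 5]
d2 = [3, 7, 1]
q = [0, 0, 2]

sim1 = cos_sim(d1, q)   # roughly 0.81
sim2 = cos_sim(d2, q)   # roughly 0.13
ratio = sim1 / sim2     # roughly 6.2
```

Because the lengths are divided out, the long vector D2 gets no advantage from its large weight on the second term, which the query does not mention.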

Motivation
Search engines contain very large collections
–Google reports over 1 trillion web pages
Receive very short queries
–68% are 3 words long or fewer
Users examine few results
–rarely go beyond first page
–rarely examine more than 1 result
–Exacerbated by small mobile screens

Ambiguity
How to return precise results for ambiguous queries?
Results are returned based on simple keyword matches
No consideration of differing meanings
If the query is “salsa”, is it…

Dealing with Ambiguity
Expand user queries using a thesaurus
–“An Expert System for Searching in Full-Text,” Susan Gauch, 1990
–Basically, make query vectors longer so they are more likely to match documents
Represent documents and queries using high-level concepts instead of keywords
–“Conceptual Search with KeyConcept,” Susan Gauch, 2010
–Basically, reduce the dimensions of the vectors to provide a conceptual match
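
The two strategies pull in opposite directions: one lengthens the query, the other shrinks the vector space. A toy sketch of both follows. The thesaurus entries, the term-to-concept table, and the function names are hypothetical and made up for illustration. In particular, the lookup table stands in for a real classifier and is not how KeyConcept assigns concepts.

```python
from collections import Counter

# Hypothetical thesaurus: term -> related terms.
THESAURUS = {
    "salsa": ["sauce", "dance"],
    "puppy": ["dog"],
}

# Hypothetical term -> concept table (a stand-in for a trained classifier).
TERM_TO_CONCEPT = {
    "salsa": "Cooking",
    "recipe": "Cooking",
    "tomato": "Cooking",
    "dance": "Entertainment",
}


def expand_query(terms, thesaurus):
    """Lengthen the query by appending related terms from the thesaurus."""
    expanded = list(terms)
    for term in terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def to_concept_vector(terms, term_to_concept):
    """Collapse many keyword dimensions into a few concept dimensions."""
    return dict(Counter(term_to_concept[t] for t in terms if t in term_to_concept))


expanded = expand_query(["salsa", "recipe"], THESAURUS)
concepts = to_concept_vector(["salsa", "recipe", "tomato", "dance"], TERM_TO_CONCEPT)
```

Note that thesaurus expansion of "salsa" adds both "sauce" and "dance", so on its own it widens the match without resolving which sense the user meant.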

Ontologies
A structured set of concepts
Where do ontologies come from?

Semantic Web
Manually build ontologies
Experts manually tag data items
Very “intelligent” but not scalable

IR Community
Use implicit ontologies
–Wikipedia
–Open Directory Project
Develop automated techniques to tag items
Not as “intelligent” but much more scalable

Need for Personalization
All users get identical results for identical queries
No distinction between veterinarian and child for query “beagle puppy”
Need for personalized results based on background and current context
How to pick the best 10 (or 1!) results for _you_?

How to Personalize
Build a user profile that represents user interests
–Collect information
–Construct user profile
–Use user profile for personalized interactions

Collecting User Information
Explicit user information
–Users fill in site-specific surveys
–Users too lazy or too busy
–Data may be deliberately or accidentally inaccurate
–Information becomes out of date

Implicit user information
–Software collects information about users as they perform their regular activities
–Information is indirect and noisy
–Various approaches used by well-known applications

Implicit Sources
Browsing histories
–User connects to Internet via a proxy
–User periodically shares history
–Pros: captures browsing activity at multiple sites
–Cons: captures history from only one computer

My Browsing History

Used to Autofill URLs

Implicit Sources
Desktop toolbar
–User must install desktop toolbar
–Communication between toolbar and site
–Pros: interactions tracked across multiple sites; access to desktop windows, file system
–Cons: user must install software; fine line between toolbar and spyware

Google’s Toolbar

Used to Personalize Search

Implicit Sources
User account
–User activity is tracked via cookies/session variables
–Best if user signs in to retain same profile across multiple machines
–Pros: users tracked across all interactions
–Cons: only works at one site; users must create an account

Amazon’s Login

Used for Recommendations

Our Approach
–Personalization based on implicit data
–Represent profile using weighted conceptual taxonomy
–Use profile for personalization in many different ways: OBIWAN (Web browsing), MiSearch (Web search), MyCiteSeerX (recommender system)

Building a Conceptual Profile
Need an ontology for the domain
Need a collection of text that represents the user’s interests
Need a classification technique
–train classifier with training data
–classify user texts w.r.t. ontology/taxonomy/concept hierarchy/thesaurus/knowledge base
–accumulate weights
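
The three steps (train, classify, accumulate) can be sketched end to end. This is a toy under stated assumptions, not the classifier used in the systems described here: each concept is represented by the summed term counts of its training texts, a text is scored by term overlap, and scores are normalized to sum to 1 before being added to the profile. The training texts, concept names, and function names are all made up for the example.

```python
from collections import Counter, defaultdict


def train(training_docs):
    """training_docs: concept -> list of texts. Returns concept -> term counts."""
    return {
        concept: Counter(t for text in texts for t in text.lower().split())
        for concept, texts in training_docs.items()
    }


def classify(text, model):
    """Return concept -> weight for one text (weights sum to 1, or {} if no match)."""
    terms = Counter(text.lower().split())
    scores = {
        concept: sum(n * concept_terms[t] for t, n in terms.items())
        for concept, concept_terms in model.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {}
    return {c: s / total for c, s in scores.items() if s > 0}


def build_profile(user_texts, model):
    """Accumulate per-text concept weights into one user profile."""
    profile = defaultdict(float)
    for text in user_texts:
        for concept, weight in classify(text, model).items():
            profile[concept] += weight
    return dict(profile)


model = train({
    "Cooking": ["salsa recipe tomato", "recipe oven"],
    "Dance": ["salsa dance lessons"],
})
profile = build_profile(["tomato salsa recipe", "salsa recipe video"], model)
```

Here the ambiguous term "salsa" contributes to both concepts, but the surrounding terms push most of the accumulated weight toward Cooking.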

Building the User Profile

User Profile Representation
[Figure: weighted concept tree under Root, with node weights Entertainment 0.01, Homemaking 0.04, Cooking 0.49, Lessons 0.3, Videos 0.1]

MiSearch
User search histories
–information available to search engine itself
–collect the user’s queries and clicked-on search results
–no software installed
Users create accounts
–login
–just track userid in a cookie during the session
–Similar to Amazon, eBay, etc.

Personalizing Search Results
Submit query to Internet search engine (e.g., Google)
Categorize each result into the same concept hierarchy to create result profiles
–top 3 levels of ODP, ~3,000 categories
Calculate similarity between result profile and user profile
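
A rough sketch of the final step: once each result has a concept profile, results can be reordered by how closely they match the user. The slide does not say which similarity measure is used or how it is combined with the engine's original rank, so cosine similarity and a pure similarity sort are assumptions here, as are the category names, result titles, and function names.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse profiles (concept -> weight)."""
    dot = sum(w * q.get(concept, 0.0) for concept, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rerank(results, user_profile):
    """results: list of (title, result_profile) in the engine's original order.

    Returns titles sorted by similarity to the user profile, best first.
    sorted() is stable, so ties keep the engine's original order.
    """
    scored = [(cosine(profile, user_profile), title) for title, profile in results]
    return [title for _, title in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Hypothetical results for an ambiguous query such as "canon book".
results = [
    ("Western canon reading list", {"Arts/Literature": 1.0}),
    ("Canon EOS camera guide book", {"Arts/Photography": 0.9, "Shopping": 0.1}),
]
photography_user = {"Arts/Photography": 0.7, "Shopping": 0.2}
classics_user = {"Arts/Literature": 0.8}

for_photographer = rerank(results, photography_user)
for_classicist = rerank(results, classics_user)
```

The same two results come back in opposite orders for the two users, which is the behavior the personalized search is after.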

Ambiguous: “canon book”

User Profile (Classics)

User Profile (Photography)

MyCiteSeerX
Categorize contents of CiteSeerX with respect to ACM CCS topic hierarchy
Users create an account
Capture their queries and clicked-on documents
Build a conceptual profile
Compare user concepts to document concepts to create recommendations

User interested in IR

Their recommendations

User interested in multimedia

Their recommendations

Recent Work
Bridge gap between Semantic Web and Information Retrieval
–Semi-automatically build domain-specific ontologies
–Do text mining from domain-specific literature collection

Conclusions
Information on which to base user profiles can be collected via interactions with a specific site
Conceptual profiles can be used to improve search (MiSearch)
Conceptual profiles can be used to provide conceptual recommendations for the CiteSeerX collection
Creates issues for profile sharing and user privacy
Leads to work on how to reuse/expand/build ontologies for narrow domains