“You May Also Like” Results in Relational Databases Kostas Stefanidis Department of Computer Science University of Ioannina, Greece Joint work with Marina.

Slides:



Advertisements
Similar presentations
A Domain Level Personalization Technique A. Campi, M. Mazuran, S. Ronchi.
Advertisements

AVATAR: Advanced Telematic Search of Audivisual Contents by Semantic Reasoning Yolanda Blanco Fernández Department of Telematic Engineering University.
Preference-Aware Publish/Subscribe Delivery with Diversity Marina Drosou Department of Computer Science University of Ioannina, Greece Joint work with.
On Novelty in Publish/Subscribe Delivery Kostas Stefanidis * Dept. of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong *Work.
Preferential Publish/Subscribe Marina Drosou Department of Computer Science University of Ioannina, Greece Joint work with Evaggelia Pitoura and Kostas.
Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)
Chapter 12 Information Systems Chapter Goals Define the role of general information systems Explain how spreadsheets are organized Create spreadsheets.
Tagging Systems Mustafa Kilavuz. Tags A tag is a keyword added to an internet resource (web page, image, video) by users without relying on a controlled.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
WebMiningResearch ASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 3 The Basic (Flat) Relational Model.
Perk: Personalized Keyword Search in Relational Databases through Preferences Marina Drosou Computer Science Department University of Ioannina, Greece.
Chapter 5: Query Operations Baeza-Yates, 1999 Modern Information Retrieval.
Locality Optimizations in OceanStore Patrick R. Eaton Dennis Geels An introduction to introspective techniques for exploiting locality in wide area storage.
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
WebMiningResearchASurvey Web Mining Research: A Survey Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Presented by Shan Huang, 4/24/2007 Revised.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
ReDrive: Result-Driven Database Exploration through Recommendations Marina Drosou, Evaggelia Pitoura Computer Science Department University of Ioannina.
Recommender systems Ram Akella November 26 th 2008.
Query Operations: Automatic Global Analysis. Motivation Methods of local analysis extract information from local set of documents retrieved to expand.
Marina Drosou Department of Computer Science University of Ioannina, Greece Thesis Advisor: Evaggelia Pitoura
Mining Association Rules of Simple Conjunctive Queries Bart Goethals Wim Le Page Heikki Mannila SIAM /8/261.
FALL 2012 DSCI5240 Graduate Presentation By Xxxxxxx.
Golder and Huberman, 2006 Journal of Information Science Usage Patterns of Collaborative Tagging System.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Survey Of Music Information Needs, Uses, and Seeking Behaviors Jin Ha Lee J. Stephen Downie Graduate School of Library and Information Science University.
Attribute Extraction and Scoring: A Probabilistic Approach Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang Microsoft Research Asia Speaker: Bo.
Focused Matrix Factorization for Audience Selection in Display Advertising BHARGAV KANAGAL, AMR AHMED, SANDEEP PANDEY, VANJA JOSIFOVSKI, LLUIS GARCIA-PUEYO,
Information Systems: Databases Define the role of general information systems Describe the elements of a database management system (DBMS) Describe the.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
Taxonomies of Visualization Techniques CMPT 455/826 - Week 12, Day 2 w12d2 Sept-Dec
Recommendation system MOPSI project KAROL WAGA
Michael Cafarella Alon HalevyNodira Khoussainova University of Washington Google, incUniversity of Washington Data Integration for Relational Web.
Presented By :Ayesha Khan. Content Introduction Everyday Examples of Collaborative Filtering Traditional Collaborative Filtering Socially Collaborative.
1 Business System Analysis & Decision Making – Data Mining and Web Mining Zhangxi Lin ISQS 5340 Summer II 2006.
Chapter 6: Information Retrieval and Web Search
Summarization of XML Documents K Sarath Kumar. Outline I.Motivation II.System for XML Summarization III.Ranking Model and Summary Generation IV.Example.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Understanding Groups & Teams Ch 15. Understanding Groups Group Two or more interacting and interdependent individuals who come together to achieve particular.
SINGULAR VALUE DECOMPOSITION (SVD)
Personalized Course Navigation Based on Grey Relational Analysis Han-Ming Lee, Chi-Chun Huang, Tzu- Ting Kao (Dept. of Computer Science and Information.
Marina Drosou, Evaggelia Pitoura Computer Science Department
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
Query Suggestion. n A variety of automatic or semi-automatic query suggestion techniques have been developed  Goal is to improve effectiveness by matching.
Date: 2012/07/02 Source: Marina Drosou, Evaggelia Pitoura (CIKM’11) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
Website: Answering Continuous Queries Using Views Over Data Streams Alasdair J G Gray Werner.
Week 2 The lecture for this week is designed to provide students with a general overview of 1) quantitative/qualitative research strategies and 2) 21st.
Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Shing Chen Author : Juan D.Velasquez Richard Weber Hiroshi Yasuda 國立雲林科技大學 National.
© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/ Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan,
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
Information Design Trends Unit Five: Delivery Channels Lecture 2: Portals and Personalization Part 2.
A Supervised Machine Learning Algorithm for Research Articles Leonidas Akritidis, Panayiotis Bozanis Dept. of Computer & Communication Engineering, University.
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems Latin American Web Conference IEEE Computer Society, 2008 Presenter:
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
Context-driven Access to Personalized Digital Multimedia Libraries Invited Talk at the 1st International Conference on Digital Libraries New Dehli, India.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Chapter 12 Information Systems.
SIS: A system for Personal Information Retrieval and Re-Use
Text & Web Mining 9/22/2018.
Week 11: Database Management System
Web Mining Department of Computer Science and Engg.
Collaborative Filtering Non-negative Matrix Factorization
Recommendation Systems
Data exploration and visualization
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

“You May Also Like” Results in Relational Databases Kostas Stefanidis Department of Computer Science University of Ioannina, Greece Joint work with Marina Drosou and Evaggelia Pitoura

Introduction Typically, users interact with a database system by formulating queries This interaction mode assumes that users: ▫are familiar with the content of the database ▫have a clear understanding of their information needs PersDB LyonDMOD Laboratory, University of Ioannina Since databases get larger and accessible to a more diverse and less technically-oriented audience, a new recommendation-oriented form of interaction seems attractive and useful

Introduction Motivated by the way recommenders work, we consider recommending to users tuples not in the results of their queries but of potential interest Examples: ▫When asking for a Woody Allen movie, we could recommend a Woody Allen biography ▫When looking for drama movies produced in England with Oscar nominations, we could recommend similar movies with BAFTA awards PersDB LyonDMOD Laboratory, University of Ioannina

“You May Also Like” Results Given a user query Q, a database system D reports a number of results R(Q) in the form of tuples Besides R(Q), we would like to locate and recommend to the user a set of tuples that may also be of interest to the user We call this set of tuples “You May Also Like” tuples or, for short, Ymal results PersDB LyonDMOD Laboratory, University of Ioannina

Directions for Ymal Computation How to compute Ymal results? ▫Use the particular results of an imposed query ▫Use the results of similar past queries or the results of queries imposed by similar users  Similar to traditional recommendation systems ▫Use information from resources external to the database, such as the web PersDB LyonDMOD Laboratory, University of Ioannina

A Taxonomy of Ymal Techniques PersDB LyonDMOD Laboratory, University of Ioannina Ymal computation techniques Current-state techniques History-based techniques External sources techniques

A Taxonomy of Ymal Techniques PersDB LyonDMOD Laboratory, University of Ioannina Ymal computation techniques Current-state techniques History-based techniques External sources techniques Current-state techniques Techniques that exploit the content and schema of a current query result and database instance

A Taxonomy of Ymal Techniques PersDB LyonDMOD Laboratory, University of Ioannina Ymal computation techniques Current-state techniques History-based techniques External sources techniques History-based techniques Techniques that exploit the history of previously submitted queries to the database system, e.g. by using query logs or logs of query results

A Taxonomy of Ymal Techniques PersDB LyonDMOD Laboratory, University of Ioannina Ymal computation techniques Current-state techniques History-based techniques External sources techniques Techniques that exploit resources external to the database, such as related published results and reports, relevant web pages, thesaurus or ontologies

Movies Example m-idtitleyearrank PersDB LyonDMOD Laboratory, University of Ioannina m-ida-idrole a-idf_namel_namegender m-idgenre m-idd-id f_namel_namegender Genre MovieCast DirectDirector Actor

Outline Current-state techniques History-based techniques External sources techniques Summary PersDB LyonDMOD Laboratory, University of Ioannina

Current-state Techniques Available information: ▫The results R(Q) computed for a query Q Ymal results for Q can be computed based on: ▫Local analysis of the intrinsic properties of R(Q)  Exploit the content and schema of R(Q) ▫Global analysis of the properties of the database D  Exploit the content and schema of D PersDB LyonDMOD Laboratory, University of Ioannina

Local Analysis Techniques Ymal current-state techniques can be distinguished between: ▫Local analysis techniques  Content-based approach  Locate common information patterns appearing in R(Q)  Employ such information to recommend tuples of D that do not belong in R(Q) but exhibit similar behavior  Example: Query about M. Freeman movie titles. Since M. Freeman acts usually in detective roles, report movies in which other actors play detective roles  Schema-based approach ▫Global analysis techniques ▫Hybrid analysis techniques PersDB LyonDMOD Laboratory, University of Ioannina m-idtitleyearrank m-ida-idrolea-idf_namel_namegender MovieCast Actor

Local Analysis Techniques Ymal current-state techniques can be distinguished between: ▫Local analysis techniques  Content-based approach  Schema-based approach  Expand the tuples of R(Q) through joins  Locate common information patterns in the expanded result tuples  Employ such information to recommend tuples  Example: Query about M. Freeman movies. Enhance the result with movies genres. Based on the most frequent genres, report other movies of those genres ▫Global analysis techniques ▫Hybrid analysis techniques PersDB LyonDMOD Laboratory, University of Ioannina m-idtitleyearrank m-ida-idrolea-idf_namel_namegender MovieCast Actor m-idgenre Genre

Global Analysis Techniques Ymal current-state techniques can be distinguished between: ▫Local analysis techniques ▫Global analysis techniques  Content-based approach  Rely on correlations of specific attribute values, selectivities  Example: When querying for Walter Matthau movies, report a number of Jack Lemmon movies, since they often star together  Schema-based approach  Rely on correlations among relations or attributes to direct the expansion of tuples in R(Q) ▫Hybrid analysis techniques PersDB LyonDMOD Laboratory, University of Ioannina

Hybrid Analysis Techniques Ymal current-state techniques can be distinguished between: ▫Local analysis techniques ▫Global analysis techniques ▫Hybrid analysis techniques  Combine local and global analysis  Combine content and schema information PersDB LyonDMOD Laboratory, University of Ioannina

Local Analysis Computation (Content-based) Discover interesting patterns in the results R(Q) of a query Q How to find them? Search for frequently appearing attribute values in R(Q) To quantify attribute value appearances, we use a value-frequency matrix M R(Q) : ▫M R(Q) has one row for each attribute A 1, …, A m of R(Q) ▫M R(Q) has one column for each distinct attribute value v 1, …, v m appearing in R(Q) tuples ▫M R(Q) (i, j) contains the number of occurrences of v j for A i in R(Q) PersDB LyonDMOD Laboratory, University of Ioannina

Local Analysis Computation (Content-based) Given a query Q and the corresponding value-frequency matrix M R(Q), present p Ymal results How? ▫Locate the k elements in M R(Q) with the highest values ▫For each element, construct a select-project-join query to retrieve interesting results  Each element contributes a number of Ymal results according to its value in M R(Q), For example, consider the function: where M’ R(Q) (i, j) = M R(Q) (i, j), if M R(Q) (i, j) is one of the k most frequent values and M’ R(Q) (i, j) = 0 otherwise PersDB LyonDMOD Laboratory, University of Ioannina

Content-based Example Query: Present movies staring Lee Phelps Results: PersDB LyonDMOD Laboratory, University of Ioannina m-idtitleyearranka-idm-idrolea-idf-namel-namegender 4619 Abbott and Costello in Hollywood Detective372839LeePhelpsM Money to Loan Policeman372839LeePhelpsM Bride Came C.O.D Policeman372839LeePhelpsM Thin Man Goes Home Policeman372839LeePhelpsM Beast From 20,000 Fathoms Cop372839LeePhelpsM … … ……………………

Content-based Example Ymal Results: p = 3, k = 2 Part of the value-frequency matrix PersDB LyonDMOD Laboratory, University of Ioannina PolicemanDetectiveCopBartenderGuard Role m-idtitleyearranka-idm-idrolea-idf-namel-namegender Talented Mr. Ripley Policeman411152ManuelRuffiniM Love Letter Policeman420874BsakuSatohM I, Robot Detective297823CraigMarchM

Local Analysis Computation (Schema-based) Discover interesting patterns in the expanded (through joins) results of Q General directions: ▫Expand R(Q) towards different directions through join operations ▫For each such expansion, construct a value-frequency matrix ▫Select the matrix with the most frequent value appearances ▫Favor matrices with fewer join operations PersDB LyonDMOD Laboratory, University of Ioannina

Schema-based Example Query: Present movies staring Lee Phelps Possible directions for expansion: Genre and Direct ▫Most common patterns are observed for Genre (Drama appears 216 times and Comedy appears 120 times) Expanded Results: PersDB LyonDMOD Laboratory, University of Ioannina titleyearrankrolef-namel-namegendergenre Abbott and Costello in Hollywood DetectiveLeePhelpsMComedy Money to Loan PolicemanLeePhelpsMDrama Bride Came C.O.D PolicemanLeePhelpsMComedy Thin Man Goes Home 19457PolicemanLeePhelpsMDrama Beast From 20,000 Fathoms CopLeePhelpsMSci-Fi … …………………

Schema-based Example Ymal Results: p = 3, k = 2 PersDB LyonDMOD Laboratory, University of Ioannina titleyearrankrolef-namel-namegendergenre Pinocchio20024PinocchioRobertoBenigniMComedy Berlin SammyAsadSchwarzMDrama Monster NewscasterJim R.ColemanMDrama

Global Analysis Computation Ymal computation is guided by statistics maintained for database D Such statistics include: ▫Correlations among attribute values of D ▫Correlations among relations in D PersDB LyonDMOD Laboratory, University of Ioannina

Global Analysis Computation (Content-based) Correlations between pairs of attribute values can be maintained in a (x × y × y) value-correlation matrix V, where: ▫x is the number of attributes in D ▫y is the number of distinct attribute values in D How V can be used for a query Q, for reporting p Ymal results? ▫Locate the k attribute values that are most correlated to the selection predicates of Q  The more correlated a value is, the more Ymal results it will contribute PersDB LyonDMOD Laboratory, University of Ioannina

Content-based Example Query: Present romance movies Romance appears along with other genres for the same movies as many times as shown below: Drama (2801)Comedy (2398)Musical (538) Action (351)Adventure (323)Thriller (267) Fantasy (263)Crime (263)Family (234) War (199)Short (162)Mystery (131) Ymal Results: p = 3, k = 2 Since the mostly correlated values to romance are drama and comedy, Ymal results include two drama movies and a comedy one. PersDB LyonDMOD Laboratory, University of Ioannina

Global Analysis Computation (Schema-based) Use correlations among the relations of D to direct joining operations ▫Correlations between relations can be maintained in a (z × z) relation- correlation matrix A, where z is the number of relations in D Example: ▫Let Cast be strongly correlated with Actor. When querying for specific actor names, we could present roles that these actors have portrayed ▫Let Movie be strongly correlated with Genre. When querying for movies that are directed by S. Spielberg, we could enhance the result with information about the genres of S. Spielberg’s movies PersDB LyonDMOD Laboratory, University of Ioannina

Hybrid Analysis Computation Hybrid methods exploit both local and global properties ▫A content-based approach is to use attribute values that are both frequent and strongly correlated ▫A schema-based approach is to join R(Q) with other relations with regards to correlations among relations, as well as, frequent appearances of attribute values in those relations PersDB LyonDMOD Laboratory, University of Ioannina

A Taxonomy of Current-state Techniques PersDB LyonDMOD Laboratory, University of Ioannina Local AnalysisGlobal AnalysisHybrid Analysis Content-based Most frequent values in R(Q) Most correlated values in D Combine frequent and correlated values Schema-based Direct joins through frequencies of values in expanded R(Q) Direct joins through correlations among relations in D Direct joins through frequencies in expanded R(Q) and correlations among relations in D

Outline Current-state techniques History-based techniques External sources techniques Summary PersDB LyonDMOD Laboratory, University of Ioannina

History-based Techniques Such techniques assume available information about the previous interactions of the users with the database What is the available information? Two alternatives: ▫Log query results or ▫Log queries themselves Are the two alternatives equivalent? ▫The result of each query depends on the database instance PersDB LyonDMOD Laboratory, University of Ioannina

History-based Techniques History-based techniques are similar to traditional recommendation systems In general, recommendation systems are distinguished between: ▫Content-based  Recommend data items similar to those the user preferred in the past ▫Collaborative  Recommend data items that similar users liked in the past ▫Hybrid  Combine content-based and collaborative techniques PersDB LyonDMOD Laboratory, University of Ioannina

History-based Techniques Motivated by the above classification, Ymal history-based techniques are distinguished between: ▫Query-based, similar to content-based recommendations ▫User-based, similar to collaborative recommendations ▫Hybrid PersDB LyonDMOD Laboratory, University of Ioannina

Query-based Technique Query-based techniques Ymal results for a query Q, include results of a set of logged queries that are the most similar to Q Similarity function example: R(Q): results of Q PersDB LyonDMOD Laboratory, University of Ioannina History-based techniques Query-based techniques Exploit similarities among queries User-based techniques Exploit similarities among users Hybrid techniques Exploit similarities among queries and users

User-based Technique User-based techniques Ymal results for a query Q imposed by a user U, include results of a set of logged queries imposed by those users that exhibit the most similar behavior to U Similarity function example: Q(U): set of queries imposed by U PersDB LyonDMOD Laboratory, University of Ioannina History-based techniques Query-based techniques Exploit similarities among queries User-based techniques Exploit similarities among users Hybrid techniques Exploit similarities among queries and users

Hybrid Technique Hybrid techniques Ymal results for a query Q imposed by a user U, include the results of the most similar queries to Q out of those that were imposed by similar users to U Recent queries reflect better the current trends and user interests? PersDB LyonDMOD Laboratory, University of Ioannina History-based techniques Query-based techniques Exploit similarities among queries User-based techniques Exploit similarities among users Hybrid techniques Exploit similarities among queries and users

Outline Current-state techniques History-based techniques External sources Summary PersDB LyonDMOD Laboratory, University of Ioannina

External Sources Current-state and history-based techniques exploit intrinsic information of the database ▫E.g. correlation among attribute values and relations themselves What about cases where relationships among data items are not captured in the database, even if present? ▫Information retrieved from external sources can be used for the computation of Ymal results External sources: well-organized information available over the Web in the form of articles, reports and reviews in collectively maintained knowledge repositories ▫E.g. Wikipedia, LibraryThing PersDB LyonDMOD Laboratory, University of Ioannina

Examples 1.Query about Sofia Coppola movies ▫Use external information, to recommend Francis Ford Coppola movies (their relationship is not reflected in the schema) 2.Query about movies of various directors ▫If most of these directors are Asian, recommend other movies by Asian directors (the origin of directors cannot be found in the schema) PersDB LyonDMOD Laboratory, University of Ioannina

Outline Current-state techniques History-based techniques External sources techniques Summary PersDB LyonDMOD Laboratory, University of Ioannina

Summary We present a first approach to compute Ymal results We organize various alternatives into categories Open issues: ▫Locate common pairs of keywords appearing in R(Q) ▫Exploit information about the importance of each relation attribute ▫Log query results ▫Maintain statistics about query results ▫Report novel, fresh or diverse information PersDB LyonDMOD Laboratory, University of Ioannina

Thank You PersDB LyonDMOD Laboratory, University of Ioannina