Making DB and IR (socially) meaningful Sihem Amer-Yahia, Human Social Dynamics Dagstuhl 03/10/2008.


2 Disclaimers
No XML
No Querying
No religion
Lots of Ranking
Millions of people with different opinions
A hint of db and ir

3 Abstract
Collaborative tagging and rating sites constitute a unique opportunity to leverage implicit and explicit social ties between users in search and recommendations. In the first part of the talk, we explore different ranking semantics which account for content popularity within a network, thereby going beyond traditional query relevance. We show that the accuracy of ranking is tied to users' behavior. In the second part of the talk, we describe a set of novel questions that arise under the new ranking semantics. The first question is to revisit data processing in the presence of power law distributions and tag sparsity, and indexing in light of different user behaviors. We then explore different ways of explaining recommendations followed by a discussion on diversifying results. Diversity is a well-known problem in recommender systems, referred to as over-specialization, and in Web search. We propose to leverage explanations to achieve diversity on the basis that the same users tend to endorse similar content. Finally, we note that different topics (e.g., sports, photography) are popular at different points in time and argue for time-aware recommendations. We conclude with a brief description of the infrastructure of Royal Jelly, a scalable social recommender system built on top of Hadoop.

4 Outline
Motivation
Ranking
Almost-new questions
Royal Jelly
Wilder ideas

5 Recommendations (Amazon) but who are these people?

6 Explaining recommendations in x.qui.site
Leveraging user-user similarities
Multiple recommendation methods
  – Friends network
  – Shared-bookmark-interest
  – Shared-tag-interest
  – Shared-bookmark-tag-interest
Multiple recommendation types
  – Bookmarks
  – Users
  – Tags

Yahoo! Movies now

Reviewers' biases in Yahoo! Movies
Leveraging item-item similarities
Socially Meaningful Attribute Collections (SMACs)
  – Sets of items which are easy to label and serve as a socially meaningful reference set:
      Adventure movies starring Johnny Depp
      Woody Allen Comedies
      Scary movies from the 80’s
      Moderate French restaurants in Southern CA
Similarities between movies are defined based on their SMACs

9 Social Context
Heuristic Recommenders
  – Content / Item-based (purple column): discover items similar to i2 (seed items) and see how u2 has rated them
  – Collaborative / User-based (green row): discover users similar to u2 (seed users) and see how they rate i2
  – Fusion / Filterbots: leveraging both similar items and similar users
[Figure: rating matrix with users u1, u2, ..., un as columns and items i1, i2, ..., im as rows; the rating of u2 for i2 is unknown ("?")]
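The two heuristic recommenders on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the toy `ratings` data, the cosine similarity restricted to co-rated entries, and the function names are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] = score on a 1-5 scale.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i3": 2},            # u2 has not rated i2 yet
    "u3": {"i1": 1, "i2": 2, "i3": 5},
}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (dicts), computed over shared keys only."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(b[k] ** 2 for k in shared))
    return dot / (norm_a * norm_b)

def user_based_predict(ratings, user, item):
    """Collaborative: similarity-weighted average of how similar users rated `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None

def item_based_predict(ratings, user, item):
    """Content / item-based: similarity-weighted average of `user`'s ratings on similar items."""
    by_item = {}
    for u, theirs in ratings.items():
        for i, r in theirs.items():
            by_item.setdefault(i, {})[u] = r
    num = den = 0.0
    for seen_item, r in ratings[user].items():
        sim = cosine(by_item.get(item, {}), by_item[seen_item])
        num += sim * r
        den += sim
    return num / den if den else None

print(user_based_predict(ratings, "u2", "i2"))
print(item_based_predict(ratings, "u2", "i2"))
```

A fusion approach would combine both predictions, for example by averaging them; how the combination is weighted is not stated on the slide.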

10 Outline
Motivation
Ranking
Almost-new questions
Royal Jelly
Wilder ideas

11 New ranking semantics
Collaborative tagging/reviewing sites contain a lot of high-quality user-generated content: Flickr, YouTube, del.icio.us, Yahoo! Movies
Users need help to sift through the large number of available items
Ranking is not only about relevance (in the traditional Web sense) but also about people whose opinion matters

12 Data model
Items: photos in Flickr, movies in Y!Movies, URLs in del.icio.us
Users: Seekers or Taggers
Tagging/rating/reviewing: endorsements from users
  – for u ∈ Taggers, Items(u) = {i ∈ Items | Tagged(u, i, t) for some tag t}
  – Taggers(i, t) = {v | Tagged(v, i, t)}
Network: implicit and explicit social links
  – for u ∈ Seekers, Network(u) = {v ∈ Taggers | Link(u, v, w)}
  – Flickr friends, people with similar movie tastes, del.icio.us network
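As a rough illustration, the data model can be written down with plain Python sets. The user names, URLs, and tags below are invented, and treating the third component `w` of `Link(u, v, w)` as a link weight is an assumption; the slide does not say what `w` stands for.

```python
# Hypothetical Tagged(user, item, tag) facts.
tagged = [
    ("alice", "url1", "python"),
    ("alice", "url2", "databases"),
    ("bob", "url1", "python"),
    ("bob", "url1", "tutorial"),
    ("carol", "url3", "photography"),
]

# Hypothetical Link(u, v, w) facts; w is assumed to be a link weight.
links = [
    ("dave", "alice", 1.0),
    ("dave", "bob", 0.5),
    ("erin", "carol", 1.0),
]

def items(u):
    """Items(u): every item that tagger u has tagged with at least one tag."""
    return {i for (v, i, _t) in tagged if v == u}

def taggers(i, t):
    """Taggers(i, t): every user who tagged item i with tag t."""
    return {v for (v, j, s) in tagged if j == i and s == t}

def network(u):
    """Network(u): every tagger that seeker u is linked to."""
    return {v for (x, v, _w) in links if x == u}

print(items("bob"))                 # {'url1'}
print(taggers("url1", "python"))    # alice and bob
print(network("dave"))              # alice and bob
```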

13 Search
Given a seeker s and a query Q (set of tags), return items which are most relevant to Q and are most popular in s’s network
f and g are monotone, assume f = count, g = sum
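The scoring formula itself did not survive in the transcript. One reading that is consistent with the data model is: for each query tag t, apply f to the set Network(s) ∩ Taggers(i, t), then combine the per-tag values with g. The sketch below implements that reading with f = count and g = sum; the formula, the function names, and the toy data are assumptions, not the slide's definition.

```python
def score(item, seeker, query, taggers, network, f=len, g=sum):
    """Assumed score: g over query tags of f(Network(seeker) ∩ Taggers(item, tag))."""
    friends = network.get(seeker, set())
    return g(f(friends & taggers.get((item, tag), set())) for tag in query)

def top_k(seeker, query, items, taggers, network, k=10):
    """Return the k best-scoring items; ties are broken by item name for determinism."""
    ranked = sorted(
        items,
        key=lambda i: (-score(i, seeker, query, taggers, network), i),
    )
    return ranked[:k]

# Hypothetical data: taggers[(item, tag)] is the set of users who applied tag to item.
taggers = {
    ("url1", "python"): {"alice", "bob"},
    ("url1", "tutorial"): {"bob"},
    ("url2", "python"): {"carol"},
    ("url2", "tutorial"): {"alice"},
}
network = {"dave": {"alice", "bob"}}

print(score("url1", "dave", ["python", "tutorial"], taggers, network))  # 3
print(score("url2", "dave", ["python", "tutorial"], taggers, network))  # 1
print(top_k("dave", ["python", "tutorial"], ["url1", "url2"], taggers, network))
```

Because both count and sum are monotone, an item endorsed by more network members for any query tag can never score lower, which is the property that top-k algorithms over sorted lists typically rely on.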

14 Hotlists
Evaluate different hotlist generation methods in del.icio.us to see how well they predict users’ tagging actions
116,177 users who tagged 175,691 distinct URLs using 903 tags, for a total of 2,322,458 tagging actions over 1 month
Each method is defined by its seed and scope and returns the 10 best-ranked items

15 People who matter
friends
url-interest
tag-url-interest
Coverage: overlap of hotlist with u’s tagging actions, averaged over users in scope
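The coverage measure can be sketched as follows. The slide only says "overlap"; normalizing by the hotlist size is an assumption made here, and the user and item names are invented.

```python
def coverage(hotlist, tagged_by_user):
    """Assumed definition: fraction of hotlist items that the user actually tagged."""
    hot = set(hotlist)
    if not hot:
        return 0.0
    return len(hot & set(tagged_by_user)) / len(hot)

def mean_coverage(hotlists, actions, scope):
    """Average coverage over the users in scope."""
    users = list(scope)
    if not users:
        return 0.0
    total = sum(coverage(hotlists.get(u, []), actions.get(u, set())) for u in users)
    return total / len(users)

# Hypothetical hotlists (per user) and observed tagging actions.
hotlists = {"u1": ["a", "b", "c", "d"], "u2": ["a", "e", "f", "g"]}
actions = {"u1": {"a", "b", "x"}, "u2": {"e"}}

print(coverage(hotlists["u1"], actions["u1"]))          # 0.5
print(mean_coverage(hotlists, actions, ["u1", "u2"]))   # 0.375
```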

16 Coverage: 42.9%, 81.7%, 8.6%, 61%

17 Outline
Motivation
Ranking
Almost-new questions
  – pre-processing & indexing
  – explanation: why a recommendation
  – diversity: be innovative, stay relevant
  – time-awareness: what matters when
Royal Jelly
Wilder ideas

18 Pre-processing
Tags are sparse and may mean different things
  – Co-occurrence analysis, association rules, ontologies, EM
Tails are long, very long
  – cut tails? average among very different users?

Social Meaningfulness in Y! Movies

20 Indexing
Hotlists
  – global (1 inverted list), global-tag (900 lists, 1 list per popular tag), friends, url-interest, tag-url-interest (1 list per user)
Search:
  – 1 list per (user, keyword) pair
  – 1 list per group of similar users
  – Cluster indices based on common user behavior
Behavior does change
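Two of the index layouts named above can be sketched as follows: one inverted list per tag (global-tag) and one inverted list per (seeker, keyword) pair. The function names, the count-based ordering, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict

def _rank(counts):
    """Sort (item, count) pairs by descending count, then by item name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def build_global_tag_lists(tagged):
    """One inverted list per tag: items ordered by how many users applied that tag."""
    lists = defaultdict(lambda: defaultdict(int))
    for _tagger, item, tag in set(tagged):
        lists[tag][item] += 1
    return {tag: _rank(counts) for tag, counts in lists.items()}

def build_seeker_tag_lists(tagged, network):
    """One inverted list per (seeker, tag): items ordered by endorsements within the seeker's network."""
    seekers_of = defaultdict(set)          # tagger -> seekers linked to that tagger
    for seeker, members in network.items():
        for member in members:
            seekers_of[member].add(seeker)
    lists = defaultdict(lambda: defaultdict(int))
    for tagger, item, tag in set(tagged):
        for seeker in seekers_of.get(tagger, ()):
            lists[(seeker, tag)][item] += 1
    return {key: _rank(counts) for key, counts in lists.items()}

# Hypothetical data.
tagged = [
    ("alice", "url1", "python"),
    ("bob", "url1", "python"),
    ("bob", "url2", "python"),
    ("carol", "url3", "python"),
]
network = {"dave": {"alice", "bob"}}

print(build_global_tag_lists(tagged)["python"])
print(build_seeker_tag_lists(tagged, network)[("dave", "python")])
```

The per-pair layout answers a seeker's query with a simple list lookup, but the number of lists grows with the number of (seeker, tag) pairs, which is one motivation for sharing a list across a group of similar users.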

21 Explanation
Users relate to social biases and influences
What to display?
  – all influencers: does not scale
  – top influencers
  – distribution of opinions among influencers
      “80% of your friends bookmarked this link”
      “this reviewer rates this movie better than 40% of all reviewers”
How to display it?
  – e.g., natural language pattern, visual pattern
Some relationship to DB annotations
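The two natural-language patterns quoted on this slide can be produced from simple aggregates. The sketch below is illustrative only; the function names and the sample data are hypothetical.

```python
def friend_share_explanation(friends, endorsers):
    """Share of the user's friends who endorsed the item, phrased as on the slide."""
    friends = set(friends)
    if not friends:
        return None
    share = len(friends & set(endorsers)) / len(friends)
    return f"{share:.0%} of your friends bookmarked this link"

def reviewer_percentile_explanation(reviewer_rating, all_ratings):
    """Share of reviewers who rated the movie strictly lower than this reviewer."""
    if not all_ratings:
        return None
    lower = sum(1 for r in all_ratings if r < reviewer_rating)
    return (
        "this reviewer rates this movie better than "
        f"{lower / len(all_ratings):.0%} of all reviewers"
    )

print(friend_share_explanation(
    friends={"ann", "ben", "cat", "dan", "eve"},
    endorsers={"ann", "ben", "cat", "dan", "zed"},
))
print(reviewer_percentile_explanation(3, [1, 2, 3, 4, 5]))
```

Both functions summarize a distribution of opinions rather than listing every influencer, which is the option that scales.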

22 Diversity
Well-known problem in recommender systems (over-specialization) and IR (Web search)
In recommendations:
  – Stay as close as possible to the user’s interests
  – But not too close
      Woody Allen Comedies
      Restaurants serving Chinese in the East Village in NYC
  – Post-processing based on items’ objective attributes
      Many possible top-k sets
      Pick the most diverse
Explanation-based diversity
  – The same people (items) recommend the same items
  – Does not require presence of objective attributes
  – Independent from recommendation method
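One way to make explanation-based diversity concrete is to treat each item's explanation as the set of users who endorse it and to prefer result sets whose explanations overlap little. The Jaccard distance and the greedy max-min selection below are choices made for this sketch; the slide does not specify either, and the item and user names are invented.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diversify(ranked_items, explanations, k):
    """Greedy selection: keep the top item, then repeatedly add the candidate
    whose explanation is farthest from its closest already-selected item."""
    if k <= 0 or not ranked_items:
        return []
    selected = [ranked_items[0]]
    remaining = list(ranked_items[1:])
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: min(
                jaccard_distance(explanations[c], explanations[s]) for s in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidates, already ranked by relevance, with their endorsers.
ranked = ["m1", "m2", "m3", "m4"]
explanations = {
    "m1": {"ann", "ben"},
    "m2": {"ann", "ben"},   # same explanation as m1, so it adds no diversity
    "m3": {"ann", "cat"},
    "m4": {"dan", "eve"},
}

print(diversify(ranked, explanations, 3))   # ['m1', 'm4', 'm3']
```

The selection never looks at item attributes, only at who endorses what, so it works with any recommendation method that can report its endorsers.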

23 Time-awareness
Recommender systems focus on most recent (hot) items
Recovering old URLs in del.icio.us
  – Some URLs are tagged heavily for a certain period, then activity slows down – how to find those worth recovering?
Anticipating new URLs
  – New URLs come into the system, often tagged by very few initial users – how to detect those with potential?
Topic grouping and time patterns are key:
  – Event-driven activity (election, photography)
  – Utilizing per-topic time patterns

[Chart] Posts with tag “photography”: consistent time pattern. Annotations: New Year, weekends. Average: 2948; STDEV: 533

[Chart] Posts with tag “election”: event-driven tagging. Annotations: Iowa, New Hampshire, Richardson out, Thompson out, Michigan, Florida. Average: 240; STDEV: 105
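The averages and standard deviations reported for the two tags suggest a simple way to tell the patterns apart: the ratio of standard deviation to mean (the coefficient of variation). The sketch below applies it to the reported figures. Using this ratio and the 0.3 threshold are assumptions of the example, not something the slides propose.

```python
from statistics import mean, pstdev

def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of a series of post counts."""
    avg = mean(counts)
    return pstdev(counts) / avg if avg else float("inf")

def pattern(cv, threshold=0.3):
    """Label a tag by its relative variability; the threshold is a hypothetical cut-off."""
    return "event-driven" if cv > threshold else "consistent"

# Figures reported on the two charts.
photography_cv = 533 / 2948
election_cv = 105 / 240

print(round(photography_cv, 3), pattern(photography_cv))
print(round(election_cv, 3), pattern(election_cv))

# The same test on raw (invented) series of post counts.
print(pattern(coefficient_of_variation([100, 100, 100, 100])))
print(pattern(coefficient_of_variation([10, 10, 10, 200])))
```

A relative measure is used because the two tags differ by an order of magnitude in volume, so their raw standard deviations are not directly comparable.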

26 Outline
Motivation
Ranking
Almost-new questions
Royal Jelly
Wilder ideas

Royal Jelly

[Architecture diagram; labels as they appear]
  – Hadoop-Pig Based Processing
  – del.icio.us backup database (MySQL)
  – Extract
  – research9 quicknever database (MySQL)
  – Load
  – distributed analysis and index / view generation
  – Explanation
  – Diversity
Daily analysis for a window of several months’ worth of data

Wilder ideas
Automatic user assessments
  – Users are willing to create new content
  – And rate it!
  – Let them rate recommendations
  – And help us define evaluation benchmarks
Make DB social!
  – Social-awareness in databases and query languages
      Different DB organizations
      Different query semantics
  – SQL: a Social Query Language?
      Who thinks like me? Who does not?