Information Search in Social Networks
Ralf Schenkel
Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

February 2, 2009, Perspektivenvorlesung

Social Tagging Networks
Definition: a social tagging network is a website where people
– publish and tag information
– review and rate information
– publish their interests
– maintain a network of friends
– interact with friends
Common examples: Flickr (images), YouTube (videos), del.icio.us (bookmarks), LibraryThing (books), Discogs (CDs), CiteULike (papers), Facebook, Myspace (media)

Some Statistics
– Flickr (as of Nov 2008): 3+ billion photos, 3 million new photos per day
– Facebook (as of Nov 2008): 10+ billion photos, 30+ million new photos per day; 120 million active users, 150,000 new users per day
– Myspace (as of Apr 2007): 135 million users (as a country, it would be the 6th largest on Earth); 2+ billion images (150,000 requests/s), millions added daily; 25 million songs; 60 TB of videos
– StudiVZ.net (as of Nov 2008): 11 million users; 300 million images, 1 million added daily
Huge volume of highly dynamic data

Showcase: librarything.com
[Screenshot with callouts: Ratings, Tags, Books, Others]

librarything.com: Social Interaction
[Screenshot with callouts: Explicit friends, Similar users, Comments]

librarything.com: Tag Clouds

librarything.com: Search
Search results are independent of the querying user (and of the social context).

librarything.com: Search
The search is automatically expanded with similar tags (synonyms).

LibraryThing.com: Recommendations
Recommendations depend on the user and the tags (but not on the social context).

LibraryThing.com: Recommendations
[Screenshot callout: Explanation for the recommendation]

LibraryThing.com: Explanations

Outline
Search in Social Tagging Networks
– Graph Model
– Different Information Needs
Effective Query Scoring
Efficient Query Evaluation
Summary & Further Challenges

Querying Social Tagging Networks
[Figure: users and the tags they assigned to items, e.g. "travel vldb", "travel norway", "harry potter", "travel trip", "travel icde", "travel mexico", "travel", "probability", "data mining foundations"]

Information Need 1: Globally Popular
[Figure: the tagging network from the previous slide; candidate results "harry potter" or ?]
– The most frequently tagged items are "best"
– Tags by all users are equally important

Information Need 2: Similar Users
[Figure: the tagging network; candidate results "travel" or ?]
Tags by users with similar tags/items ("brothers in spirit") are more important.

Information Need 3: Trusted Friends
[Figure: the tagging network, now including tags such as "probability selling"; candidate results "harry potter travel mexico" or ?]
Tags by closely related, well-known users are more important.

Towards Social-Aware Social Search
Search results may depend on
– the global popularity of items
– the spiritual context of the querying user (users with similar books and/or tags)
– the social context of the querying user (known and trusted friends)

Outline
Search in Social Tagging Networks
Effective Query Scoring
– Quantifying Friendship Strengths
– User-specific Scoring Functions
– Experimental Evaluation
Efficient Query Evaluation
Summary & Further Challenges

Notation
– U: set of users
– T: set of tags
– I: set of items
– tags(u): tags used by user u
– items(u): items tagged by user u
– items(t): items tagged with tag t by at least one user
– df(t): number of items tagged with tag t
– tf_u(i,t): number of times user u tagged item i with tag t
– tf(i,t): number of times item i was tagged with tag t
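
The notation can be made concrete with a small in-memory index. This is a minimal sketch, assuming each tagging action is stored as a hypothetical (user, item, tag) tuple; the class and method names are illustrative and not taken from the talk.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


class TaggingIndex:
    """Hypothetical index over (user, item, tag) tagging actions, mirroring the slide notation."""

    def __init__(self, actions: Iterable[Tuple[str, str, str]]):
        # (u, i, t) -> number of times user u tagged item i with tag t
        self.counts = Counter(actions)

    def tags(self, u: str) -> Set[str]:
        """tags(u): tags used by user u."""
        return {t for (v, _, t) in self.counts if v == u}

    def items(self, u: str) -> Set[str]:
        """items(u): items tagged by user u."""
        return {i for (v, i, _) in self.counts if v == u}

    def items_with_tag(self, t: str) -> Set[str]:
        """items(t): items tagged with t by at least one user."""
        return {i for (_, i, s) in self.counts if s == t}

    def df(self, t: str) -> int:
        """df(t): number of items tagged with t."""
        return len(self.items_with_tag(t))

    def tf_user(self, u: str, i: str, t: str) -> int:
        """tf_u(i,t): how often user u tagged item i with t (0 if never)."""
        return self.counts[(u, i, t)]

    def tf(self, i: str, t: str) -> int:
        """tf(i,t): how often item i was tagged with t, summed over all users."""
        return sum(c for (_, j, s), c in self.counts.items() if j == i and s == t)
```

For example, if alice tags b1 with "travel" twice and bob tags it once, tf("b1", "travel") is 3 while tf_user("alice", "b1", "travel") is 2; in general tf(i,t) is the sum of tf_u(i,t) over all users.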

Quantifying Friendship Strengths
– Global "friendship" strength
– Spiritual friendship strength
– Social friendship strength
– Integrated friendship strength

Spiritual Friendship Strength
Several alternatives:
– based on the overlap of tag usage (tags(u): tags used by user u)
– based on the overlap of tagged items (items(u): items tagged by user u)
For all alternatives: P_spirit(u,u) := …; normalization such that …
Intuition: overlap in the interests of u and u'; overlap of behavior (tagging, searching, rating, …)
[Figure: users u and u' sharing tags such as "harry potter", "wizard", "deathly hallows", "philosopher stone"]
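
As one possible instantiation of "overlap", the sketch below uses the Jaccard similarity of two users' tag sets (or item sets) and normalizes over all other users. Jaccard, the self-strength of 0, and the function name are assumptions for illustration, not necessarily the definition used in the talk.

```python
from typing import Dict, Set


def spiritual_strength(u: str, profiles: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical P_spirit(u, .) from the overlap of tag (or item) sets.

    profiles maps every user to tags(user) or to items(user).
    Assumptions: Jaccard overlap, P_spirit(u,u) = 0, values normalized to sum to 1.
    """
    mine = profiles[u]
    raw: Dict[str, float] = {}
    for v, theirs in profiles.items():
        if v == u:
            continue  # assumed self-strength of 0
        union = mine | theirs
        raw[v] = len(mine & theirs) / len(union) if union else 0.0
    total = sum(raw.values())
    # Normalize over all other users (all zeros if nobody overlaps with u).
    return {v: (x / total if total > 0 else 0.0) for v, x in raw.items()}
```

Passing tag sets yields the tag-usage variant; passing item sets yields the tagged-items variant.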

Graph-Based Friendship Strength
[Figure: user network u1, …, u7; P_social(u,u') derived from the distance of u and u' in the user network]
Set P_social(u,u) := 0; normalization such that …
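
A distance-based social strength can be sketched with a breadth-first search over the friendship graph. The 1/distance decay is an assumption (the slide only states that the strength depends on the distance), as is giving unreachable users strength 0; P_social(u,u) = 0 follows the slide.

```python
from collections import deque
from typing import Dict, Iterable


def social_strength(u: str, graph: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Hypothetical P_social(u, .) from shortest-path distances in the user network.

    Assumptions: strength decays as 1/distance, unreachable users get 0,
    P_social(u,u) = 0, and values are normalized to sum to 1.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:  # breadth-first search from u
        x = queue.popleft()
        for y in graph.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    raw = {v: 1.0 / d for v, d in dist.items() if d > 0}  # excludes u itself
    total = sum(raw.values())
    return {v: x / total for v, x in raw.items()}
```

In a chain u1 – u2 – u3, user u1 gets strength 2/3 for u2 and 1/3 for u3.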

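Integrated Friendship Strength
Query-dependent mixture of the spiritual friendship strength, the social friendship strength, and a background model (global):
$$P_{int}(u,u') = \alpha\, P_{social}(u,u') + \beta\, P_{spirit}(u,u') + (1-\alpha-\beta)\, P_{global}(u,u'), \qquad 0 \le \alpha, \beta \le 1,\ \alpha + \beta \le 1$$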
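
The mixture can be sketched directly. The uniform background model (1/|U| for every user) is an assumption about what "global" means here; the inputs are dictionaries of already normalized strengths.

```python
from typing import Dict, Iterable


def integrated_strength(users: Iterable[str], p_social: Dict[str, float],
                        p_spirit: Dict[str, float], alpha: float,
                        beta: float) -> Dict[str, float]:
    """P_int(u, .) as an alpha/beta mixture with an assumed uniform background model."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    users = list(users)
    background = 1.0 / len(users)  # assumed global model: uniform over all users
    return {v: alpha * p_social.get(v, 0.0)
               + beta * p_spirit.get(v, 0.0)
               + (1 - alpha - beta) * background
            for v in users}
```

If P_social and P_spirit each sum to 1, so does P_int; with alpha = beta = 0 every user gets the same weight, which corresponds to the "globally popular" information need.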

Excursion: Scoring in Text Retrieval
General scoring framework: score(i,t) combines
– the importance of t for item i (the more frequent t is for i, the better)
– the importance of t in the collection (the less frequent t is overall, the better)
Hand-tuned instance: Okapi BM25
Query scores: linear combination of the per-tag scores
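
For reference, a common textbook form of Okapi BM25 is sketched below; the talk may use a different variant. Here idf(t) = ln((N − df(t) + 0.5)/(df(t) + 0.5)) captures the collection importance, and (k1 + 1)·tf/(K + tf) with K = k1·((1 − b) + b·len/avglen) captures the item importance. The defaults k1 = 1.2 and b = 0.75 are conventional choices, not values from the slides, and the query score is taken as a plain sum of per-tag scores.

```python
import math
from typing import Iterable, Tuple


def bm25_term_score(tf: float, df: int, n_items: int,
                    k1: float = 1.2, b: float = 0.75,
                    length: float = 1.0, avg_length: float = 1.0) -> float:
    """Textbook Okapi BM25 score of one tag for one item."""
    # Collection importance: rarer tags (small df) get a larger idf.
    # (This idf form becomes negative once df exceeds n_items / 2.)
    idf = math.log((n_items - df + 0.5) / (df + 0.5))
    # Item importance: grows with tf but saturates below k1 + 1.
    K = k1 * ((1 - b) + b * length / avg_length)
    return idf * (k1 + 1) * tf / (K + tf)


def query_score(per_tag: Iterable[Tuple[float, int]], n_items: int) -> float:
    """Query score as the sum of per-tag scores; per_tag holds (tf, df) pairs."""
    return sum(bm25_term_score(tf, df, n_items) for tf, df in per_tag)
```

The saturation is the point of the "hand-tuned" form: a tag assigned a thousand times is not worth a thousand times more than a tag assigned once.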

Towards a User-specific Score
– Convert the term frequency into a user-specific social frequency sf_u(i,t) (formula annotation: global friendship strength)
– Compute the user-specific social score from sf_u(i,t) [SIGIR 2008]
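
A minimal sketch of the user-specific social frequency, assuming sf_u(i,t) = |U| · Σ_{u'} P_int(u,u') · tf_{u'}(i,t). The |U| factor is an assumption chosen so that a purely global (uniform) friendship strength reproduces the plain tf(i,t). The social score then reuses a BM25-style saturation with sf_u(i,t) in place of tf(i,t), simplified here by omitting length normalization. Function names are hypothetical.

```python
import math
from typing import Dict


def social_frequency(p_int: Dict[str, float], tf_by_user: Dict[str, int],
                     n_users: int) -> float:
    """Assumed sf_u(i,t) for one fixed item i and tag t.

    p_int:      u' -> P_int(u,u'), friendship strengths of the querying user u
    tf_by_user: u' -> tf_u'(i,t), how often u' tagged i with t
    """
    return n_users * sum(p_int.get(v, 0.0) * c for v, c in tf_by_user.items())


def social_score(sf: float, df: int, n_items: int, k1: float = 1.2) -> float:
    """BM25-style score with sf in place of tf (no length normalization)."""
    idf = math.log((n_items - df + 0.5) / (df + 0.5))
    return idf * (k1 + 1) * sf / (k1 + sf)
```

With uniform strengths the social frequency equals tf(i,t); shifting weight towards a friend who used the tag often raises the item's score for that particular user only.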

Including Tag Expansion
Problem: users use different tags for similar things → poor recall (relevant results are missed)
Solution:
1. Define a notion of similar tags
2. Expand queries with similar tags
3. Modify the scoring function for expanded queries
Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …

Heuristics for Finding Similar Tags
– Co-occurrence heuristics: tags t1 and t2 are similar if they (almost) always occur together
– Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs; example: t1 = Europe, t2 = Germany
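
Both heuristics can be expressed as thresholds on overlap ratios of items(t). This sketch assumes "occur together" means "assigned to the same item" and uses a hypothetical threshold theta = 0.9 to stand in for "almost always".

```python
from typing import Dict, Set


def cooccurrence_similar(t1: str, t2: str, items: Dict[str, Set[str]],
                         theta: float = 0.9) -> bool:
    """t1 and t2 are similar if each (almost) always occurs together with the other."""
    a, b = items.get(t1, set()), items.get(t2, set())
    if not a or not b:
        return False
    both = len(a & b)
    return both / len(a) >= theta and both / len(b) >= theta


def is_specialization(t2: str, t1: str, items: Dict[str, Set[str]],
                      theta: float = 0.9) -> bool:
    """t2 specializes t1 if t1 occurs (almost) whenever t2 occurs."""
    a, b = items.get(t1, set()), items.get(t2, set())
    return bool(b) and len(a & b) / len(b) >= theta
```

The co-occurrence test is symmetric, whereas the specialization test is deliberately one-sided: every "Germany" item is also a "Europe" item, but not the other way round.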

Scoring Expanded Queries
– Naive approach: for query tag t, add every similar tag t' with sim(t,t') > δ to the query. This can work ("international crime" expanded by "mafia camorra yakuza …"), but it can also fail ("transportation disaster" expanded by "train car bus plane …"): result quality drops due to topic drift.
– Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item.
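
The difference between the two approaches can be sketched on per-item scores. Assumptions: score(i,t') is a precomputed per-tag score, each expansion tag's score is discounted by sim(t,t'), the query tag itself is included with similarity 1.0, and "highest combined score per item" is read as a maximum over the query tag and its expansions. Data and names are hypothetical.

```python
from typing import Dict


def naive_expansion_score(item: str, scores: Dict[str, Dict[str, float]],
                          sims: Dict[str, float], delta: float) -> float:
    """Add every tag t' with sim(t,t') > delta and sum their weighted scores."""
    return sum(sim * scores.get(tag, {}).get(item, 0.0)
               for tag, sim in sims.items() if sim > delta)


def max_expansion_score(item: str, scores: Dict[str, Dict[str, float]],
                        sims: Dict[str, float], delta: float) -> float:
    """Per item, keep only the single best weighted expansion of the query tag."""
    return max((sim * scores.get(tag, {}).get(item, 0.0)
                for tag, sim in sims.items() if sim > delta), default=0.0)
```

With "transportation disaster" expanded by "train", "car" and "bus" (similarity 0.6 each), an item tagged with all three expansions but never with the query tag scores 3 · 0.6 · 0.8 = 1.44 under the naive sum and outranks a genuine match scoring 0.9, while the per-item maximum keeps it at 0.48.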

Experimental Evaluation: Effectiveness
A systematic evaluation of result quality is difficult. Three possible setups:
– manual queries + human assessments
– queries + assessments derived from external information (e.g., DMOZ categories)
– automated assessments from the user's context:
   – items tagged by friends
   – items tagged in the future

Prototype [VLDB/SIGIR 2008 demo]

Preliminary User Study
LibraryThing user study [Data Engineering Bulletin, June 2008]:
– 6 LibraryThing users with reasonably large libraries and friend sets
– 49 queries overall, such as "mystery magic", "wizard", "yakuza"
– Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
– Measured NDCG[10] for social weight α (rows) and spiritual weight β (columns β = 0.0, 0.2, 0.5, 0.8, 1.0); "-" marks settings with α + β > 1:
   α = 0.0: 0.546, 0.572, 0.568, 0.565
   α = 0.2: 0.564, 0.572, 0.579, 0.581, -
   α = 0.5: 0.539, 0.552, 0.559, -, -
   α = 0.8: 0.515, 0.546, -, -, -
   α = 1.0: 0.465, -, -, -, -
Result quality is generally very high; the combination of spiritual and social friends works best.
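
NDCG[10] compares the discounted gain of the returned ranking with that of an ideal ranking of the judged items. The sketch uses linear gains rel/log2(rank + 1); other gain variants, such as 2^rel − 1, exist, and which one the study used is not stated here.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k with linear gains rel / log2(rank + 1); unjudged items count as 0."""
    def dcg(rels: List[float]) -> float:
        # pos is 0-based, so rank = pos + 1 and the discount is log2(pos + 2).
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(rels[:k]))

    actual = dcg([relevance.get(i, 0.0) for i in ranking])
    ideal = dcg(sorted(relevance.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0
```

A ranking that lists the judged items in order of decreasing relevance scores 1.0; a value around 0.58, as in the table, means the top 10 recover a good part but not all of that ideal gain.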

Outline
Search in Social Tagging Networks
Effective Query Scoring
Efficient Query Evaluation
– Threshold Algorithms
– ContextMerge
– Experimental Evaluation
Summary & Further Challenges

Algorithmic Overview
Input: query q = {t1, …, tn} for user u, parameters α, β
Output: the k items with the highest scores
Goals:
– avoid computing all results
– minimize disk I/O and CPU load
– utilize precomputed information on disk
[Figure: user u issuing the query "harry potter"]

Excursion: Threshold Algorithms for Text IR
Input: query q = {t1, …, tn}; lists L(t_p) of ⟨i, score(i,t_p)⟩ pairs, sorted by score(i,t_p) descending
Output: the k items with the highest aggregated score
Family of Threshold Algorithms:
– scan the lists in parallel
– maintain partial candidate results with score bounds
– terminate as soon as the top-k results are stable

Example: Top-1 for a 2-term query (NRA)
L1: A: 0.9, G: 0.3, H: 0.3, I: 0.25, J: 0.2, K: 0.2, D: 0.15
L2: D: 1.0, E: 0.7, F: 0.7, B: 0.65, C: 0.6, A: 0.3, G: 0.2
[Figure: both lists with the current top-1 item, the min-k threshold, and the candidate set]

Step 1: read A (0.9) from L1. Candidate A: score in [0.9; 1.9]. Top-1: A, min-k = 0.9. Any unseen item: score in [0.0; 1.9].

Step 2: read D (1.0) from L2. Candidates A: [0.9; 1.9], D: [1.0; 1.9]. Top-1: D, min-k = 1.0. Unseen items: [0.0; 1.9].

Step 3: read G (0.3) from L1. Candidates A: [0.9; 1.9], D: [1.0; 1.3], G: [0.3; 1.3]. Top-1: D, min-k = 1.0. Unseen items: [0.0; 1.3].

Step 4: read E (0.7) from L2. Candidates A: [0.9; 1.6], D: [1.0; 1.3], G: [0.3; 1.0]. Unseen items: [0.0; 1.0]. Since no unseen item can exceed min-k = 1.0, no more new candidates are considered.

Further steps: the bounds tighten to D: [1.0; 1.25] and A: [0.9; 1.55], then to D: [1.0; 1.2] and A: [0.9; 1.5]. Finally A is read from L2 (0.9 + 0.4), giving A: [1.3; 1.3]. Top-1: A, min-k = 1.3. No other candidate can exceed min-k any more, so the algorithm safely terminates.
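
The example can be replayed with a compact NRA (No Random Access) sketch: summed scores, round-robin sorted access, and a stop test that compares min-k with the upper bounds of all other candidates and of unseen items. This is an illustrative implementation, not the one from the talk, and it recomputes all bounds after every access for clarity rather than speed.

```python
from typing import Dict, List, Tuple


def nra_top_k(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """NRA top-k with summed scores over lists sorted by score descending.

    Returns the top-k items with their lower score bounds at termination.
    """
    m = len(lists)
    pos = [0] * m
    # high[x] bounds the score of every entry not yet read from list x.
    high = [lst[0][1] if lst else 0.0 for lst in lists]
    partial: Dict[str, Dict[int, float]] = {}  # item -> {list index: score seen}

    def bounds(item: str) -> Tuple[float, float]:
        seen = partial[item]
        low = sum(seen.values())
        return low, low + sum(high[x] for x in range(m) if x not in seen)

    def ranked() -> List[str]:
        return sorted(partial, key=lambda it: bounds(it)[0], reverse=True)

    j = 0
    while any(p < len(lst) for p, lst in zip(pos, lists)):
        if pos[j] < len(lists[j]):
            item, score = lists[j][pos[j]]  # one sorted access
            pos[j] += 1
            high[j] = score if pos[j] < len(lists[j]) else 0.0
            partial.setdefault(item, {})[j] = score
            order = ranked()
            if len(order) >= k:
                min_k = bounds(order[k - 1])[0]
                rest = max((bounds(it)[1] for it in order[k:]), default=0.0)
                # Stop once neither a known candidate nor an unseen item can beat min-k.
                if min_k >= max(rest, sum(high)):
                    return [(it, bounds(it)[0]) for it in order[:k]]
        j = (j + 1) % m
    return [(it, bounds(it)[0]) for it in ranked()[:k]]
```

Running it on the two lists exactly as printed (with A: 0.3 in L2) returns A as the top-1 item, with score 0.9 + 0.3 = 1.2, after twelve sorted accesses and without ever reading D's entry at the end of L1.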

Can We Reuse This Here?
Precomputed score lists: harry: 0.95, 0.85, 0.51, …; travel: 0.87, 0.82, 0.69, …
No: the scores are specific to the querying user and the parameter setting! For every user, one would need lists such as:
– harry (α = 0.2, β = 0.5): 0.98, 0.84, 0.45
– harry (α = 0.0, β = 0.8): 0.90, 0.89, 0.56
– harry (α = 1.0, β = 0.0): 0.90, 0.89, 0.56
– harry (α = 0.5, β = 0.5): 0.90, 0.86, 0.64
– harry (α = 0.0, β = 1.0): 0.90, 0.89, 0.56
The number of lists to precompute would explode (#tags × #users × size of the parameter space)!

Revisiting the Social Frequency
[Formula: sf_u(i,t) split into a part independent of user u and a part dependent on user u]
Compute sf_u(i,t) on the fly from tf(i,t), the friends of u, and the items they tagged.
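
Under two assumptions that the slide does not state explicitly, namely sf_u(i,t) = |U| · Σ_{u'} P_int(u,u') · tf_{u'}(i,t) and a uniform background model P_global(u,u') = 1/|U|, the split follows from the mixture of the integrated friendship strength together with tf(i,t) = Σ_{u'} tf_{u'}(i,t):

```latex
\begin{aligned}
sf_u(i,t) &= |U| \sum_{u' \in U} P_{int}(u,u')\, tf_{u'}(i,t) \\
&= |U| \sum_{u' \in U} \Big[ \alpha P_{social}(u,u') + \beta P_{spirit}(u,u') + \tfrac{1-\alpha-\beta}{|U|} \Big]\, tf_{u'}(i,t) \\
&= \underbrace{(1-\alpha-\beta)\, tf(i,t)}_{\text{independent of } u}
 \;+\; \underbrace{|U| \sum_{u' \in U} \big[ \alpha P_{social}(u,u') + \beta P_{spirit}(u,u') \big]\, tf_{u'}(i,t)}_{\text{dependent on } u}
\end{aligned}
```

Only the second sum needs the friends of u and the items they tagged; the first term comes from a user-independent list ordered by tf(i,t).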

Top-k in Social Networks: ContextMerge
Precomputed lists:
– ITEMS(t): ⟨i, tf(i,t)⟩ pairs, sorted by tf(i,t) descending
– USERITEMS(u',t): ⟨i, tf_u'(i,t)⟩ pairs, unsorted
– FRIENDS(u): ⟨u', F(u,u')⟩ pairs, sorted by F(u,u') descending
Example: ITEMS(harry): 47, 32, 26, …; FRIENDS(u): 0.12, 0.10, 0.085, …; USERITEMS(u', harry): …
Lists of this kind already exist in such systems.

February 2, 2009Perspektivenvorlesung ContextMerge Adapted Threshold Algorithm for query u,t: Scan ITEMS(t) and FRIENDS(u) in parallel pick „best“ list –If ITEMS(t): read next entry –If FRIENDS(u): read USERITEMS(u‘,t) for next friend u‘ –Maintain candidates with bounds for min and max score and current results ITEMS(harry): 47 32 26 … FRIENDS( ): 0.12 0.10 0.085 … User-indep part of sf: User-spec part of sf: 47 ?  |U| compute min score bound compute max score bound

February 2, 2009Perspektivenvorlesung ContextMerge Adapted Threshold Algorithm for query u,t: Scan ITEMS(t) and FRIENDS(u) in parallel pick „best“ list –If ITEMS(t): read next entry –If FRIENDS(u): read USERITEMS(u‘,t) for next friend u‘ –Maintain candidates with bounds for min and max score and current results ITEMS(harry): 47 32 26 … FRIENDS( ): 0.12 0.10 0.085 … User-indep part of sf: User-spec part of sf: 47 ?  |U| User-indep part of sf: User-spec part of sf: ? 0.12·|U|  47  |U|  0.88·|U|

Experimental Evaluation: Efficiency
Testbed: 3 large crawls of real social networks
– Flickr: 10 million pictures, ~50,000 users
– Del.icio.us: ~175,000 bookmarks, ~12,000 users
– LibraryThing: ~6.5 million books, ~10,000 users
Queries:
– 150 frequent tag pairs
– for each query, pick a user with "enough" results and friends
Abstract cost measure approximating the disk load
Baseline: full merge + sort

February 2, 2009Perspektivenvorlesung Experimental Evaluation: Efficiency (  =0) α 2-8 times better than baseline

Outline
Search in Social Tagging Networks
Effective Query Scoring
Efficient Query Evaluation
Summary & Further Challenges

Summary
– Need for social-aware social search, supporting global, social, and spiritual information needs
– Social scoring
   – integrating global, collection, and social context
   – including dynamic tag expansion
– ContextMerge: scalable implementation

Further Challenges
– Meaningful and common benchmark
– Incremental maintenance for highly dynamic data
– Extension to ratings, user weights, item weights, …
– Extension to non-tags (like image features)
– Automatic query parameterization
– Meaningful explanations of results
– Exploiting dynamics (hot topics, evolving groups, …)
– Social-aware search and recommendations at planet scale

Thank you. Questions?
