Presentation is loading. Please wait.

Presentation is loading. Please wait.

Community-Centric Information Exploration on Social Content Sites Sihem Amer-Yahia and Cong Yu Yahoo! Research New York M3SN, Shanghai, China March 29.

Similar presentations


Presentation on theme: "Community-Centric Information Exploration on Social Content Sites Sihem Amer-Yahia and Cong Yu Yahoo! Research New York M3SN, Shanghai, China March 29."— Presentation transcript:

1 Community-Centric Information Exploration on Social Content Sites Sihem Amer-Yahia and Cong Yu Yahoo! Research New York M3SN, Shanghai, China March 29 th, 2009

2 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 2 | M3SN, Shanghai, China, 2009 Acknowledgement Michael Benedikt, Oxford Alban Galland, INRIA Jian Huang, Penn State Laks Lakshmanan, UBC Julia Stoyanovich, Columbia Acknowledgement (thanks to Facebook)

3 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 3 | M3SN, Shanghai, China, 2009 Social Networking Sites Are Popular!

4 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 4 | M3SN, Shanghai, China, 2009 An Emerging Trend: Social Content Integration

5 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 5 | M3SN, Shanghai, China, 2009 Social Content Sites (SCS) Web destinations that let users: –Consume content-oriented information: videos, photos, news articles, etc. –Engage in social activities with their friends and people of similar interests Two major driving factors: –Incorporating social activities improves the attractiveness of traditional content sites The “similar traveler” feature improves the user duration on Yahoo! Travel significantly –Incorporating content is critical to the value of social networking sites A significant amount of user time is spent on browsing other people’s photos, notes, etc. Privacy: important but not the focus here

6 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 6 | M3SN, Shanghai, China, 2009 Main Challenges in SCS Social Content Management –Data must be maintained and analyzed at web-scale: the underlying social content graph can be enormous. (Edward Chang’s keynote later) –Information has to be integrated from various places. Information Discovery –Current: pure content based search or simple popularity/rating based browsing –Future: effectively and efficiently leverage the underlying social graph in more sophisticated ways Information Presentation –Current: a single ranked list based on relevance –Future: metrics other than relevance / groups and clusters / explanations Information Exploration

7 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 7 | M3SN, Shanghai, China, 2009 Talk Outline Emerging Trend of Social Content Sites (SCS) Information Exploration on SCS –Community Centric Model –Information Discovery: Improving Search Performance through Clustering –Information Presentation: Diversification via Explanation SocialScope and Jelly Further Challenges Conclusion

8 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 8 | M3SN, Shanghai, China, 2009 Information Exploration on SCS Major Paradigms –User-based: browsing the content of your friends or other users you simply follow (Facebook, twitter) –Search: content based keywords matching (most traditional content sites) –Recommendations: hotlists, tag-based hotlists (Amazon, many collaborative tagging sites) –Query: database-style querying with complex conditions (not many so far) But which ones are more frequent?

9 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 9 | M3SN, Shanghai, China, 2009 Search vs. Recommendation Majority of those searches are in fact recommendations in disguise! –Users are usually NOT looking for specific items –Majority of the sessions are seeking recommendations with geographical and topical constraints Ranking the paradigms (for neurotic people who insist on orders): “Recommendation” ~ User-based > Search > Query

10 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 10 | M3SN, Shanghai, China, 2009 Recommendation 2.0 (or 1.5) Users demand a lot more from us –Be able to specify the topic: “what are the good destinations for a romantic trip in Asia?” –Be able to specify the community: “what’s hot among my tech-savvy friends?” –Be able to present the results in ways that can maximize the understanding of those results But we do know a lot more about the users and the contents (through the users) –Richer user profiles –Friendships and relationships –Content activities (rating movies, tagging travel destinations, etc.) –Communication activities (IM, email, twitter follows, Facebook wall messages, etc.)

11 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 11 | M3SN, Shanghai, China, 2009 Data Model: Social Content Graph John Joe NYC AA Amy friend visit fun, club Yankees Football id: 301 destination state: NY id: 302 destination state: MI college: umich id: 102 people id: 101 people id: 103 people A social content graph is a logical graph structure where the labeled nodes represent users and objects, and the labeled edges represent relations between users and items, as well as activities users perform on items or other users. Both nodes and edges can have structural attributes.

12 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 12 | M3SN, Shanghai, China, 2009 A Community-Centric Model Communities of users as the central concept –Community: a group of “similar” or “related” users –Recommendations are performed per community and results from each community are assembled together in the end –A user can belong to multiple communities: A vector-based representation: user: Challenge I: Community Generation –Topic based –Relationship based –Activity based Challenge II: Community Selection –Given a user session, determine the appropriate set of communities to work with

13 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 13 | M3SN, Shanghai, China, 2009 A Case Study on Communities in del.icio.us Target usersCommunity sizeCoverage tag-based 123526,85617% per tag shared URLs 123522782% shared URLs 20520385% How do different communities predict future user behavior? Hotlist generation in del.icio.us –simplest form of recommendation –global hotlist applicable to all users, but has poor quality –friendship communities are scarce –Shared tags, shared URLs and tags, shared URLs communities -- very high quality, but applicable to few users only

14 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 14 | M3SN, Shanghai, China, 2009 A Case Study in Yahoo! Travel on Topic-Based Communities

15 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 15 | M3SN, Shanghai, China, 2009 Data in Yahoo! Travel Users –Yahoo! users Destinations –cities, attractions, etc. Tagging activities: –Destination tags: editorial + custom –Self tags: mostly custom Goal: –Given a user session (user + tags + region), recommending travel destinations to the user Approach: –Discover topics for users, tags and destinations based on tagging actions –Construct communities based on topics

16 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 16 | M3SN, Shanghai, China, 2009 Joint work with Jian Huang Intuition –Each destination is modeled as a document consists of all tags being used on it –Tags are treated as words in the document Apply Latent Dirichlet Allocation [BNJ03] –A probabilistic generative model for document topic discovery –Produces the following conditional probabilities Prob(w|t): given a latent topic, the probability of the tag word belongs to that topic Prob(t|d): given a document, the probability of the document being about a certain latent topic Prob(t|u) = (Prob(t|d i )), where u tagged d i Topic Discovery through LDA

17 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 17 | M3SN, Shanghai, China, 2009 Examples of Topic Discovery Topic 1: Seaside/Water related Tokens: beach, scuba, summer, diving, fishing, snorkeling, island, lake … Top cities: San Diego, Honolulu, Chicago, Miami, Seattle … Topic 3: Nightlife, City-life Tokens: nightlife, gambling, wine, drinking, exciting, casino … Top cities: Las Vegas, New York City, Los Angeles, San Francisco … Topic 0: Romantic/Luxury Tokens: romantic, 4-star, shopping, spa, golf, luxury, honeymoon … Top cities: Cancun, San Juan, Grand Canyon, Bangkok … Topic 2: Arts/Historical/Cultural Tokens: architecture, sightseeing, art, history, culture, cathedral, castle … Top cities: Paris, London, Boston, Istanbul Amsterdam, Rome, Hong Kong … San Diego Las Vegas Paris Cancun

18 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 18 | M3SN, Shanghai, China, 2009 Community Generation Identify destinations belong to the same topic –For each topic t, identify destination d belongs to t: D(t) = { d | Prob(t|d) > threshold } Generate topical communities –D(u) = { d | user u visited destination d } –Interests-based topical community C interest (t) = { u | > threshold } –Expertise-based topical community C expert (t) = { u | > threshold } D(u) D(t)

19 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 19 | M3SN, Shanghai, China, 2009 Community Selection Each user session consists of three pieces of information –(user, tag, region) –Region is treated as a Boolean constraint –We combine and convert user and tag into a single topical vector Prob(t|(u,w)) = w 1 ·Prob(t|u) + w 2 ·(Prob(w|t)·Prob(t)/Prob(w)) A community C(t) is selected if Prob(t|(u,w)) is greater than a threshold The choice between Interests or Experts Communities is heuristically decided –User is an expert: leverage interests-based community –User is not an expert: leverage experts-based community

20 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 20 | M3SN, Shanghai, China, 2009 Example Results

21 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 21 | M3SN, Shanghai, China, 2009 Talk Outline Emerging Trend of Social Content Sites (SCS) Information Exploration on SCS –Community Centric Model –Information Discovery: Improving Search Performance through Clustering –Information Presentation: Diversification via Explanation SocialScope and Jelly Further Challenges Conclusion

22 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 22 | M3SN, Shanghai, China, 2009 Querying Items within Social Network Problem Definition: –Given a user and a query, retrieve items relevant to the query based only on the user’s network –The score of the item depends on keyword and user –E.g., score(u,q,i) = cnt(v), where tag(v,i,q) and friend(u,v) –A common scenario on social tagging sites like del.icio.us Naïve Approach –One inverted index for each (user,keyword) –Apply top-k threshold algorithms at query time –Optimal efficiency –But, high storage overhead

23 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 23 | M3SN, Shanghai, China, 2009 Space Overhead of Per-User Indexing item score i7 i1 i2 i3 i4 i5 i6 i8 16 73 65 62 40 39 18 16 user Jane i7 i5 i9 i2 i6 i5 i8 i3 user Ann 10 53 36 30 15 14 10 5 scoreitem tag = music item score i7 i1 i8 i4 i2 i3 i6 i9 15 30 29 27 25 23 20 13 user Jane i4 i5 i2 i8 i7 i1 i6 i3 user Ann 60 99 80 78 75 72 63 50 scoreitem tag = news Conservative Example: –100K users, 1M items, 1K tags –20 tags/item from 5% of the taggers –10 bytes per inverted list entry –1 Terabyte index!

24 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 24 | M3SN, Shanghai, China, 2009 Clustering to the Rescue! Joint work with Michael Benedikt, Laks Lakshamanan, Julia Stoyanovich Clustered Approach –Group users based on shared behavior into communities Shared items Shared items with same tags –One inverted index for each (group,keyword) –Replace item score with score upper bound (among all users in the group) –Sacrifice a bit of query performance for indexing space efficiency Community-based (seeker clustering): –Groups are keyword-independent –E.g., shared items, shared tags Behavior-based (tagger clustering): –Groups are keyword-specific –E.g., Jane may be similar to John on “travel”, but very different from him on “sports”.

25 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 25 | M3SN, Shanghai, China, 2009 User Clustering Jane Chris Sarah Ann Community-based Behavior-based one user, one cluster one inverted index per cluster upper-bound score per item one user, many clusters one inverted index per cluster upper-bound score per item

26 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 26 | M3SN, Shanghai, China, 2009 Community-based Clustering: Space

27 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 27 | M3SN, Shanghai, China, 2009 Community-based Clustering: Performance

28 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 28 | M3SN, Shanghai, China, 2009 Talk Outline Emerging Trend of Social Content Sites (SCS) Information Exploration on SCS –Community Centric Model –Information Discovery: Improving Search Performance through Clustering –Information Presentation: Diversification via Explanation SocialScope and Jelly Further Challenges Conclusion

29 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 29 | M3SN, Shanghai, China, 2009 Information Presentation A broad definition: Given the set of relevant items, identify the appropriate subset of items to be shown to the user(s) in the right organization and with the right amount of information. Appropriate Subset: –Timeliness –Geographical closeness –Diversity –Etc. Right Organization: –Ranking –Grouping –Faceted Navigation Right Amount of Information –Explanation

30 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 30 | M3SN, Shanghai, China, 2009 A Case Study in Recommendation Joint work with Laks Lakshmanan While relevance is important to recommendation, others are critical too: –Novelty: avoid returning results that users are likely to know already. –Serendipity: aim to return less relevant results that might give users a pleasant surprise. –Diversity: avoid returning results that are too similar to each other. Recommendation Diversification: From the pool of candidate items, identify a list of items that are dissimilar to each other while maintaining a high cumulative relevance, i.e., strike a good balance between relevance and diversity.

31 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 31 | M3SN, Shanghai, China, 2009 Existing Solutions for Diversification Attribute-Based Diversification –Diversity semantics: pair-wise distance functions based on item attributes (e.g., movie attributes). –Combining with relevance: Threshold either relevance or distance, maximize the other Optimize an overall score as a weighted combination of relevance and distance –Algorithm Perform traditional recommendation Obtain the attributes of each candidate item and compute the pair-wise distance Ad-hoc methods follow how diversity and relevance are combined A Major Problem: –Lack of attributes suitable for estimating distance between pairs of items: e.g., URLs in del.icio.us, photos on Flickr, videos on Vimeo.

32 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 32 | M3SN, Shanghai, China, 2009 Intuition –Explanation is the set of objects because of which a particular item is recommended to the user. –Two items share similar explanations are likely to be similar to each other. Explanation for Item-Based Strategies Explanation for Collaborative Filtering Strategies (social!) Explanation-Based Diversification

33 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 33 | M3SN, Shanghai, China, 2009 Explanation-Based Diversity Pair-wise diversity distance between two recommended items –Standard similarity measures like Jaccard similarity and cosine similarity –E.g. (Distance based on Jaccard similarity) Diversity for the set of recommended results (S)

34 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 34 | M3SN, Shanghai, China, 2009 Benefits and Practicality of Explanation-Based Diversification Applicable to items without attributes or whose attributes are difficult to analyze –Common on social content sites Explanations are by-products of many recommendation processes –They can be maintained with little overhead See our poster on Monday for details

35 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 35 | M3SN, Shanghai, China, 2009 Talk Outline Emerging Trend of Social Content Sites (SCS) Information Exploration on SCS –Community Centric Model –Information Discovery: Improving Search Performance through Clustering –Information Presentation: Diversification via Explanation SocialScope and Jelly Further Challenges Conclusion

36 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 36 | M3SN, Shanghai, China, 2009 SocialScope Platform Facebook Y! Sports Content Integrator Social Content Analyzer Social Query Evaluator Activity Manager Social Content Grouper Social Content Graph Data Manager Query / Result OpenSocial API Activities Y! IM OpenSocial API Content Management Information Discovery Important Information Flow (1) Raw/derived social nodes and links (2) Social content sub-graph (3) Relevant social content sub-graph (4) Relevant social nodes and links Information Presentation Social Content Ranker Social Content Admin User User Interface

37 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 37 | M3SN, Shanghai, China, 2009 A Graph Based Logical Algebra Framework Designed for information discovery on social content sites Aim to provide a declarative way of specifying analysis and query tasks –Uniformity and flexibility –Opportunities for performance optimization Basic operators: –Node Selection (σ N ), Link Selection (σ L ) –Composition, Semi-Join –Node Aggregation, Link Aggregation –Details in [CIDR09]

38 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 38 | M3SN, Shanghai, China, 2009 A Simple Search Task John’s friends People visited Denver John’s friends who visited Denver Their activities

39 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 39 | M3SN, Shanghai, China, 2009 Jelly: A Language Over Social Content Sites Designed with a focus on community-centric information exploration applications –Most useful applications A restricted implementation of the SocialScope algebra –Based on nested relation model, instead of full graph model Built-in primitives for topic and community generation –Topic Generation, Community Extraction –Recommendation Generation –Group Generation, Explanation Generation

40 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 40 | M3SN, Shanghai, China, 2009 A Simple (Incomplete) Example Topic Generation and Community Extraction Recommendation Generation Information Presentation generate topics for item into topics from tagging R using LDA (seed, th=0.8) seed R.item group R.tag weight-with count() generate communities into experts from topics T, tagging R where T.*.item = R.item using jaccard-similarity (seed, th=0.7) seed (R.user, T.topic) list R.item generate recommendations into candidates given user u, query q from experts T where Selected (T.topic, u, q) using count-users (seed) seed T.*.item list T.user generate explanations into results from candidates C, tagging T where C.item = T.item using identity (seed) seed C.item list T.user weight-with count()

41 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 41 | M3SN, Shanghai, China, 2009 Talk Outline Emerging Trend of Social Content Sites (SCS) Information Exploration on SCS –Community Centric Model –Information Discovery: Improving Search Performance through Clustering –Information Presentation: Diversification via Explanation SocialScope and Jelly Further Challenges Conclusion

42 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 42 | M3SN, Shanghai, China, 2009 Scalability Challenges Social content graphs are large –10’s of millions of users –millions (movies, destinations) or billions (web links) of items –Challenge 1: being able to analyze the graph efficiently We need to design Map-Reduce frameworks that are suitable for those analyses Activities are being generated at a very high rate –millions of status updates per day (Facebook, twitter) –Similar amount of content activities (browsing, tagging, etc.) –Challenge 2: being able to incrementally incorporate new activities into the existing model

43 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 43 | M3SN, Shanghai, China, 2009 Semantic Challenges and Questions Connections are multi-dimensional –Two main categories on the surface: explicit (friendship) vs. implicit (shared activities) –However: not all friendship links are the same and not all activities are the same –Challenge: for a given user on a given topic, identify the appropriate set of explicit and implicit links expert versus interest based communities is one step toward that goal Temporal, geographical, and other external influences –How does a community evolve at time goes on and as members moves around? Social graph interactions –How does one social graph (say Facebook) impact another (say MySpace)?

44 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 44 | M3SN, Shanghai, China, 2009 Conclusion The emerging trend of social content sites present many challenges –Social information integration –Information discovery –Information presentation The notion of communities and topics are crucial for effective and efficient information exploration on those sites –Community discovery and analysis –Community-based information exploration Lots more! At Yahoo!, we are focusing on building systems that can address those challenges

45 Sihem Amer-Yahia and Cong Yu Yahoo! Research New York 45 | M3SN, Shanghai, China, 2009 Questions?


Download ppt "Community-Centric Information Exploration on Social Content Sites Sihem Amer-Yahia and Cong Yu Yahoo! Research New York M3SN, Shanghai, China March 29."

Similar presentations


Ads by Google