Published by Chad Lester. Modified over 9 years ago.
1
Personalizing Java based Answers for Hundreds of Millions of Users
Anurag Gupta
Senior Architect, Yahoo Answers & Groups
anuragg@yahoo-inc.com
2
Agenda
– Industry Gaps
– Vision
– Strategy
– Use Cases
– Architecture
– Next Steps
3
2010: Resurgence of Q&A
[Timeline figure: January to December 2010, with events marked as Launch, Acquisition, Investment, or Mobile play]
2010: A year of highlights…
2011: The story continues… Quora, location-based Q&A apps (Crowd Beacon, Hipster), Facebook Questions and Mahalo pivoting, Answers.com acquisition…
Yahoo! Answers is still #1 (twice size of nearest competitor)
4
Why this activity? Companies entering market to address deficiencies of Social Media
Meeting unmet needs:
– Improving signal to noise ratio
– Beyond realtime: creating User Generated Content of lasting, evergreen value
– Organising people’s knowledge and opinion for mass consumption
– Allowing people to connect and share based on common interests, locations etc.
– Providing platforms for people to become regarded as experts
Identifying untapped monetisation opportunities:
– Mining intent and interest and information from participating users
5
Industry Gaps
Areas: Personal Relevance, User Reputation, Content Quality
– No understanding or filtering of content by interest
– Lack of understanding of quality contributors / content – poor signals
– Spam management
– No filtering of content by social circle or user reputation
– Persona vs. real identity
– No distinction between knowledge vs. conversational Q&A
– Almost no ability to post location-specific questions and filter content by location
– No topic specific reputation (PeopleRank)
– No ‘memory’ – hard to surface previous questions around topic
– Limited action, reaction, interaction loops – opportunity to improve engagement through notifications/follows
– No community tools for users to engage outside of Q&A
6
Yahoo Answers is the place to share opinions, experience & knowledge around personal interests
7
Y! Answers: Leading Site with over 2X next competitor

Unique Users - Comscore
Site                          Jun-11    M/M     Y/Y
Reference                     745 M     -2%     11%
Wikimedia Foundation Sites    399 M     -3%     5%
Yahoo! Answers                245 M     -2%     17%
Baidu Answers                 109 M     4%      10%
eHow                          82 M      -8%     13%
Answers.com Sites             72 M      -19%    5%

% Reach - Comscore
Site                          Jun-11    M/M     Y/Y
Wikimedia Foundation Sites    54%       -1%     -5%
Yahoo! Answers                33%       0%      5%
Baidu Answers                 15%       6%      -1%
eHow                          11%       -6%     1%
Answers.com Sites             10%       -17%    -6%

245MM Consumers
9M Registered Users
2M Contributors
8
Strengthen core and reach out
– Personalization, User Interest Graph
– User Reputation
– Distribution
– Ecosystem
– Monetization
9
Personalization & Relevance
[Diagram labels: Insights; Users; Content; Ads; APIs; Publisher Partners; Yahoo Partner Data APIs; User clicks; Social graph; Ranked content, video, ads; Connected Devices; User Generated Content, tagging]
10
Personalization & Relevance
[Diagram labels: Finance; Sports; News; 3rd party publisher; Ads; Content & Ad Server; In-memory user-content-relevance_score; Users; Collaborative Filtering, social, geo, time; User Segments; Advertisers; Social Graph ‘like’; User Interest Graph; Tag; User clicks; Search terms; Ranked content & ad; Interactions: UGC, tags, Q&A; Publishers; Search; Content-Tags; Ad & Content Feeds]
Gaps drive acquisition of new relevant long-tail content
11
Yahoo Answers Personalization Use Cases
– Learn about new users’ interests (cold-start)
– Show relevant questions to user that comes via search engine
– Show relevant questions to Answerer on Y! Answers or 3rd party site
– Use knowledge of user interests to increase user engagement, page views, reach, monetization
12
Answers: Relevance & Content Quality
[Diagram labels: # Best Answers Attributed To Answerer; Useful Vote; PeopleRank of Viewer who voted “useful”; Answerers with High PeopleRank; Viewer’s interest; Question Popularity; Quality of Answers; Like Vote; User Interest Graph; Answerability; High quality, High relevance Q&A page]
– Increase signal to noise ratio
– Reward content creators with relevant audience
– Help audience discover relevant high quality content
Legend: Green – Y! wide; Yellow – Answers specific
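One way to read the signals on this slide is as inputs to two numbers per answer: a quality score and a relevance score for the viewer. The sketch below is illustrative only. The function names, the additive weighting, the best-answer bonus of 1.0, and the use of Jaccard overlap for viewer interest are all assumptions made for the example, not the formula Yahoo Answers uses.

```python
from dataclasses import dataclass


@dataclass
class UsefulVote:
    # Assumption: PeopleRank of the voting viewer, normalised to [0, 1].
    voter_people_rank: float


def answer_quality(votes, answerer_people_rank, is_best_answer):
    """Hypothetical quality score for one answer.

    A "useful" vote counts for as much as the PeopleRank of the viewer
    who cast it, so votes from reputable viewers weigh more.
    """
    weighted_votes = sum(v.voter_people_rank for v in votes)
    score = weighted_votes + answerer_people_rank
    if is_best_answer:
        score += 1.0  # assumed fixed bonus for a Best Answer
    return score


def interest_overlap(viewer_interests, question_tags):
    """Jaccard overlap between viewer interest tags and question tags."""
    a, b = set(viewer_interests), set(question_tags)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def page_score(quality, relevance):
    """Assumed combination: a page must be both high quality and relevant."""
    return quality * relevance
```

Multiplying the two scores is a design choice for the sketch: an answer with zero relevance to the viewer scores zero no matter how good it is, which matches the slide's pairing of "high quality" with "high relevance".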
13
Architecture for Online & Offline Computation
[Diagram labels: Front-End; Middle-tier; NoSQL; Long Tail Cache; Oracle; User Profile Services; Tags; User interest; Content search terms, UGC; Answers serving; 3rd party feeds; Feed Acquisition; Notification; Fast path]
New Offline on Hadoop Grid: userId, contentId, relevance_score
Relevance computation: PeopleRank; Question Popularity; Answerability; Quality of Answers; Collaborative Filtering; Thumbs-up; Tags
New Online serving
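The offline side of this architecture emits (userId, contentId, relevance_score) triples that the online side serves. A minimal sketch of that serving lookup follows, with a plain in-memory dictionary standing in for the NoSQL store. The `RelevanceStore` class, its method names, and the popular-content fallback for users with no scores are hypothetical and only illustrate the data shape.

```python
import heapq
from collections import defaultdict


class RelevanceStore:
    """In-memory stand-in for a store of precomputed relevance scores."""

    def __init__(self, popular_fallback):
        # user_id -> {content_id: relevance_score}
        self._scores = defaultdict(dict)
        # Assumption: unknown (cold-start) users get a fixed popular list.
        self._fallback = list(popular_fallback)

    def load(self, triples):
        """Load (user_id, content_id, relevance_score) triples from a batch."""
        for user_id, content_id, score in triples:
            self._scores[user_id][content_id] = score

    def top_n(self, user_id, n):
        """Return up to n content ids for the user, highest score first."""
        per_user = self._scores.get(user_id)
        if not per_user:
            return self._fallback[:n]
        return heapq.nlargest(n, per_user, key=per_user.get)


store = RelevanceStore(popular_fallback=["q9", "q8"])
store.load([("u1", "q1", 0.2), ("u1", "q2", 0.9), ("u1", "q3", 0.5)])
```

Calling `store.top_n("u1", 2)` returns the two highest-scoring questions for that user, while a user with no loaded scores receives the fallback list.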
14
Offline Relevance Computation
Components: Answers Data on Grid; User Interest Graph; PeopleRank; Relevance Computation
Flow:
1. userID
2. viewer interests
3. viewer interests
4. top answerers
5. top answerers
6. Qs answered
3b. viewer interests
4b. popular Qs
7. userID-Q-relevance_score
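Read as a pipeline, the numbered flow looks up a viewer's interests, finds top answerers for those interests and the questions they answered, adds popular questions for the same interests, and emits userID-Q-relevance_score triples. The sketch below follows that reading in plain Python rather than as a Hadoop job. The dictionary inputs, the additive scoring, and the `popular_weight` parameter are assumptions made for illustration.

```python
def offline_relevance(user_interests, top_answerers_by_topic,
                      answered_by, popular_by_topic, popular_weight=0.5):
    """Return (user_id, question_id, relevance_score) triples.

    user_interests:          user_id -> {topic: interest weight}
    top_answerers_by_topic:  topic -> [answerer ids]   (from PeopleRank)
    answered_by:             answerer id -> [question ids they answered]
    popular_by_topic:        topic -> [popular question ids]
    """
    triples = []
    for user_id, interests in user_interests.items():
        scores = {}
        for topic, weight in interests.items():
            # Steps 3 to 6: interests -> top answerers -> questions answered.
            for answerer in top_answerers_by_topic.get(topic, []):
                for question in answered_by.get(answerer, []):
                    scores[question] = scores.get(question, 0.0) + weight
            # Steps 3b and 4b: interests -> popular questions.
            for question in popular_by_topic.get(topic, []):
                scores[question] = (scores.get(question, 0.0)
                                    + weight * popular_weight)
        # Step 7: emit one triple per scored question, ordered by question id.
        triples.extend((user_id, q, s) for q, s in sorted(scores.items()))
    return triples


result = offline_relevance(
    user_interests={"u1": {"java": 1.0}},
    top_answerers_by_topic={"java": ["a1"]},
    answered_by={"a1": ["q1", "q2"]},
    popular_by_topic={"java": ["q2", "q3"]},
)
```

A question reached through both paths (here `q2`) accumulates score from each, so it ranks above questions reached through only one.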
15
Incremental Online Relevance Computation
Components: Front End; Middle Tier; Answers Oracle Database; UPS; PeopleRank; Relevance Computation
Flow:
1. click, search, UGC
2. UPS
3. userID, tags
4. viewer interests
5. viewer interests
6. top answerers
7. top answerers
8. Qs answered
5b. viewer interests
6b. popular Qs
9. relevant Qs
10. relevant Qs
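The online flow starts from a single user event (click, search, or UGC), turns it into tags, updates the viewer's interests, and returns relevant questions. A minimal sketch of the incremental part follows. It uses an exponential-decay update of interest weights, which is a technique chosen for this example and not one named in the slides; `update_interests`, `relevant_questions`, and `learning_rate` are hypothetical names.

```python
def update_interests(interests, event_tags, learning_rate=0.5):
    """Return new interest weights after one event.

    Existing weights decay by (1 - learning_rate); each tag seen in the
    event gains learning_rate. The input dictionary is not modified.
    """
    updated = {tag: w * (1 - learning_rate) for tag, w in interests.items()}
    for tag in event_tags:
        updated[tag] = updated.get(tag, 0.0) + learning_rate
    return updated


def relevant_questions(interests, question_tags, n):
    """Rank questions by the summed interest weight of their tags.

    Questions with zero score are dropped; ties break on question id.
    """
    scored = {
        q: sum(interests.get(t, 0.0) for t in tags)
        for q, tags in question_tags.items()
    }
    ranked = sorted(scored, key=lambda q: (-scored[q], q))
    return [q for q in ranked if scored[q] > 0][:n]


interests = update_interests({"java": 1.0}, ["hadoop"])
questions = {"q1": ["java"], "q2": ["java", "hadoop"], "q3": ["oracle"]}
top = relevant_questions(interests, questions, n=2)
```

Because each event only touches one user's weights, the update is cheap enough to run on the request path, with the heavier offline job refreshing the full scores in batch.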
16
Next Steps
– Move Oracle batch processing to Hadoop grid
– Get Answers data on Hadoop grid
– Annotation of source property for user interest
– Detect useful vs. interesting feedback
– User Interest Graph
– PeopleRank
– Tag computation
– Bucketing infrastructure
– Notification services