© Copyright IBM Corporation 2010 IBM Research On the Quality of Inferring Interests From Social Neighbors Zhen Wen, Ching-Yung Lin IBM T. J. Watson Research.


1 On the Quality of Inferring Interests From Social Neighbors
Zhen Wen, Ching-Yung Lin
IBM T. J. Watson Research Center

2 Motivation
• Modeling user interests enables personalized services
  – More relevant search/recommendation results
  – More targeted advertising
• Data about users are sparse
  – Many user profiles are static, incomplete and/or outdated
  – Fewer than 10% of employees actively participate in social software [Brzozowski2009]
• Inferring user interests from neighbors can be a solution
  – It also raises the concern of exposing users' private information
How true are “You are who you know” and “Birds of a feather flock together”?

3 Challenges in Observing Users
• Diverse types of media
  – Public social media (friending, blogs, etc.)
    • Data are public but limited (esp. in enterprises)
  – Private communication media (email, instant messaging, face-to-face meetings, etc.)
    • Much more data
    • Privacy is a major issue

4 Example of Diverse Types of Media
Chart (not preserved): number of people who participated in the top 3 media in an enterprise with 400K employees
Number of entries:
• Social bookmarking: 400K
• Electronic communication: 20M
• File sharing: 140K

5 Our Goals
• How well can a user's interests be inferred from his/her social neighbors?
• Can the diverse types of media be combined to improve inferring user interests from social neighbors?
• Can the quality of the inference be predicted based on features of social neighbors?
  – Only sufficiently accurate inference may help personalized services

6 Our Approach
• Infer user interests from social neighbors
  – Model user interests based on the multiple types of information they accessed
  – Construct the employee social network from communication data
  – Infer using a social influence model
• Study the relationship between inference quality and network characteristics
  – Identify effective factors to ensure high-quality results for applications

7 SmallBlue: Unlock the Power of Business Networks & Protect Privacy
• Expertise: Search for people who know “xyz” in my networks
• Ego: Show my personal network evolution and social capital
• Net: See how experts or communities connect
• Reach: Helps me understand this person and my formal and informal paths to reach him
• Whisper: Social-network-enabled personalized live recommender
• Productivity: Social Network Analysis Service helps a company understand how to enhance productivity
• Synergy: Personalized search
Data sources (crawling; distributed streams; DBs & feeds; live data):
• 20,000,000 emails & SameTime messages
• 1,000,000 Learning click data
• 14,000,000 KnowledgeView, SalesOne, …, access data
• 1,000,000 Lotus Connections (blogs, file sharing, bookmark) data
• 200,000 people's consulting financial databases
• 400,000 IBMers' organization/demographic data
• 400,000 webpages and knowledge assets
Analysis: Social Network Analysis & Visualization, Expertise Mining, and Multi-Channel Human Network/Behavior Analysis

8 Privacy as Fundamental Human Rights and Global Privacy Laws
• (United Nations) Universal Declaration of Human Rights [1948], Article 12:
  No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
• EU Directive 95/46/EC, Article 2 (a):
  – Personal data shall mean any information relating to an identified or identifiable natural person
• EU Directive 95/46/EC, Article 7:
  – Personal data may be processed only if:
    • the data subject has unambiguously given his consent; or
    • for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; or
    • for compliance with a legal obligation to which the controller is subject; or…

9 Dataset
• Content contributed by 25,315 users
  – 20M emails/chats
  – 400K social bookmarks
  – 20K shared public files
  – Profile information
    • Job role, division, news categories of interest, etc.
• Infer the social network based on emails/chats
  Formula annotation (formula not preserved): X': number of emails

10 User Interests Model – Implicit Interests
• Model users' interests implicitly indicated by their contributed content
  – Extract latent topics from the multiple types of content using LDA
  – Select the top-N distinct topics as the implicit interests model of a user
  Formula annotations (formula not preserved): the degree to which the user is interested; the similarity of topics
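The top-N selection on this slide could be sketched as below. The slide's own selection formula is not preserved in the transcript, so this is only a hypothetical greedy stand-in: it walks topics in order of interest degree and skips any topic too similar to one already chosen. The cosine measure, the `max_similarity` threshold, and all names and toy values are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_n_distinct_topics(interest, topic_vectors, n, max_similarity=0.8):
    """Greedy selection (assumed, not the authors' formula): visit topics by
    descending interest degree and keep a topic only if it is not too
    similar to any topic already selected."""
    selected = []
    for topic in sorted(interest, key=interest.get, reverse=True):
        if all(cosine(topic_vectors[topic], topic_vectors[s]) <= max_similarity
               for s in selected):
            selected.append(topic)
        if len(selected) == n:
            break
    return selected

# Toy data: interest degree per topic and a word-distribution vector per topic.
interest = {"cloud": 0.40, "cloud_infra": 0.35, "sales": 0.15, "hr": 0.10}
topic_vectors = {
    "cloud": [0.9, 0.1, 0.0],
    "cloud_infra": [0.85, 0.15, 0.0],
    "sales": [0.0, 0.9, 0.1],
    "hr": [0.0, 0.1, 0.9],
}
chosen = top_n_distinct_topics(interest, topic_vectors, n=2)
```

In this toy run "cloud_infra" is skipped because it nearly duplicates "cloud", so the second slot goes to "sales".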

11 User Interests Model – Explicit Interests
• 29% of users manually specify interests in their profile
  – A list of selected terms
    • From a static 1120-term taxonomy related to work
• Compare implicit and explicit interests
  – Explicit interests models are more limited
    • Implicit interests cover 60.4% of explicit interests
    • Explicit interests cover 2.2% of implicit interests

12 Infer Interests Based on Social Influence
• Social influence model
  – Network autocorrelation model [Leenders02]
    • Social influence is represented as a weighted combination of neighbors' attributes
    • The weight is an exponential function of the social distance
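A minimal sketch of the weighted combination described on this slide, under stated assumptions: social distance is taken to be the hop count in an unweighted graph, the weight is exp(-decay * distance), only neighbors within two hops contribute, and the result is normalized by the total weight. The slide gives none of these specifics, so the function names, `decay`, and `max_hops` are hypothetical.

```python
import math
from collections import deque

def social_distances(graph, source, max_hops=2):
    """Breadth-first hop counts from source to every node within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[source]
    return dist

def infer_interests(graph, interests, user, decay=1.0, max_hops=2):
    """Weighted average of neighbors' interest vectors, with weights that
    decay exponentially in hop distance (assumed form of the weight)."""
    scores, total = {}, 0.0
    for neighbor, d in social_distances(graph, user, max_hops).items():
        weight = math.exp(-decay * d)
        total += weight
        for topic, value in interests.get(neighbor, {}).items():
            scores[topic] = scores.get(topic, 0.0) + weight * value
    return {t: s / total for t, s in scores.items()} if total else {}

# Toy network: u - a - b, so a is 1 hop and b is 2 hops from u.
graph = {"u": ["a"], "a": ["u", "b"], "b": ["a"]}
interests = {"a": {"cloud": 1.0}, "b": {"sales": 1.0}}
inferred = infer_interests(graph, interests, "u")
```

The closer neighbor dominates: "cloud" receives weight e^-1 and "sales" e^-2 before normalization.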

13 Inference Quality
• Implicit interests: how close the inferred top-20 topics are to the ground truth

Condition                        Max     Mean    St. Deviation
Using social bookmark data only  59.4%   19.2%   10.7%
Using file sharing data only     44.9%   12.7%   7.2%
Using email/IM data only         62.1%   29.6%   14.1%
Using all three data             100%    45.1%   21.7%

  – Significant advantage in combining multiple sources
  – Large variance can affect practical applications, so we need to predict when to infer interests
• Explicit interests: precision and recall of inferred terms

Measure    Mean    St. Deviation
Precision  30.1%   26.9%
Recall     61.5%   27.6%

  – Much better recall than precision
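For reference, precision and recall of a set of inferred terms against a user's self-declared terms can be computed as in this short sketch. The function name and the toy term sets are illustrative assumptions, not data from the study.

```python
def precision_recall(inferred, ground_truth):
    """Precision: share of inferred terms that are correct.
    Recall: share of ground-truth terms that were inferred."""
    inferred, ground_truth = set(inferred), set(ground_truth)
    hits = len(inferred & ground_truth)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Toy example: four inferred terms, three declared terms, two in common.
p, r = precision_recall(
    {"cloud", "analytics", "sales", "security"},
    {"cloud", "analytics", "storage"},
)
```

Inferring more terms than a user declared tends to push recall above precision, which matches the direction of the gap reported on the slide.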

14 Can Inference Quality be Predicted?
• Hypothesis: inference quality can be predicted from social network properties
  – User activeness: the amount of contribution
  – In-degree
  – Out-degree
  – Betweenness
  – User management role
• Use Support Vector Regression to perform the prediction
• Evaluate the prediction
  – Precision/recall of the prediction (10-fold cross validation)
  – Use the prediction to improve inference
    • Only infer when we predict it's high quality
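The evaluation protocol above can be sketched without committing to a particular regressor. The code below only shows the two mechanical pieces: splitting users into 10 folds and inferring only for users whose predicted quality clears a threshold. It does not implement Support Vector Regression; the predicted scores, the threshold value, and the function names are hypothetical placeholders.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.
    Items are assigned to folds round-robin by index."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def users_to_infer(predicted_quality, threshold):
    """Keep only users whose predicted inference quality is high enough."""
    return sorted(u for u, q in predicted_quality.items() if q >= threshold)

# 25 toy users split into 10 folds; each user is tested exactly once.
splits = list(k_fold_splits(25, k=10))

# Placeholder scores standing in for a regressor's output.
predicted = {"alice": 0.72, "bob": 0.31, "carol": 0.55}
selected = users_to_infer(predicted, threshold=0.5)
```

In practice the scores would come from a model trained on each fold's training users, with features such as activeness, degree, and betweenness.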

15 Quality Prediction Results
• Precision/recall of prediction
• Improve inference

Measure    Improved to  Improvement (%)
Precision  60.5%        101%
Recall     85.7%        39.3%

Chart panels (figures not preserved): Implicit Interests, Explicit Interests
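The improvement percentages appear to be relative gains over the explicit-interest means reported on slide 13 (precision 30.1%, recall 61.5%). That reading is an assumption, but the arithmetic is consistent with it, as this quick check shows.

```python
def relative_improvement(baseline, improved):
    """Relative gain of improved over baseline, in percent."""
    return (improved - baseline) / baseline * 100.0

# Baselines assumed to be the slide 13 means for explicit interests.
precision_gain = relative_improvement(30.1, 60.5)
recall_gain = relative_improvement(61.5, 85.7)
```

Rounded, these come to about 101% for precision and 39.3% for recall, matching the table.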

16 Feature Comparison
• “Leave-one-feature-out” comparisons of prediction results
  – Most social influence comes from 1- and 2-degree neighbors
  – Your neighbors decide how well you can be inferred
  – Your neighbors' network positions may be even more important than how active they are
  – Formal organizational properties
    • Manager neighbors are more important in inference, i.e., they carry more social influence (about 5% more)
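A leave-one-feature-out comparison can be sketched generically: score the predictor with all features, then re-score with each feature removed and record the drop. The `evaluate` callable below is a made-up additive stand-in with arbitrary placeholder numbers, not measurements from the study; in a real run it would train and cross-validate the regressor on the given feature subset.

```python
def leave_one_feature_out(features, evaluate):
    """Return, per feature, how much the score drops when it is removed."""
    full_score = evaluate(features)
    drops = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        drops[feature] = full_score - evaluate(reduced)
    return drops

# Arbitrary placeholder contributions, for illustration only.
TOY_CONTRIBUTION = {
    "activeness": 0.10,
    "in_degree": 0.20,
    "out_degree": 0.15,
    "betweenness": 0.30,
    "manager_role": 0.05,
}

def toy_evaluate(feature_subset):
    """Stand-in scorer: sums the placeholder contribution of each feature."""
    return sum(TOY_CONTRIBUTION[f] for f in feature_subset)

drops = leave_one_feature_out(list(TOY_CONTRIBUTION), toy_evaluate)
most_important = max(drops, key=drops.get)
```

A larger drop means the predictor relies more on that feature; with a real model the drops need not be additive as they are in this toy scorer.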

17 Related Work
• User modeling
  – Use behavioral data of the Ego
    • [Shepitsen08, Song05, Stoyanovic08, Teevan05]
  – Use data of 1-degree neighbors
    • Issued the same query ([Piwowarski07, White09])
    • Collaborative filtering ([Goldberg92])
• Social influence and correlation
  – Correlation and related factors in social networks
    • [Singla08, Blei03, Crandall08, Anagnostopoulos08, Tang09]
  – Infer user profiles in online communities
    • [Mislove2010]

18 Conclusion
• There is large variance in the quality of inferring user interests from social neighbors
• The “recall” of the inference is much better than the “precision”
• The inference quality can be predicted from social network properties

19 Questions?

