Suleyman Cetintas 1, Monica Rogati 2, Luo Si 1, Yi Fang 1. Identifying Similar People in Professional Social Networks with Discriminative Probabilistic Models

Presentation transcript:

Identifying Similar People in Professional Social Networks with Discriminative Probabilistic Models

Suleyman Cetintas 1, Monica Rogati 2, Luo Si 1, Yi Fang 1
1 Department of Computer Sciences, Purdue University, West Lafayette, IN, 47907, USA
2 LinkedIn Corp., Mountain View, CA, 95054, USA

Acknowledgements: This research was partially supported by the following grants: IIS [...], CNS [...], IIS [...]. Any opinions, findings, and conclusions expressed in this paper are the authors' and do not necessarily reflect those of the sponsors.

Introduction and Motivation

Professional Social Networks (PSNs):
▫ Business-oriented social networks with core services such as recruiting, job seeking, expert/profile search, item recommendation, ad targeting, etc.
▫ Information about users can be obtained from heterogeneous sources such as i) profile content, ii) social graph, and iii) user activities on the website.

Challenge: Identify similar professionals in PSNs
▫ All of the above core services rely on successful identification of similar people.
▫ No prior work. Related work on identifying similar users in social networks or matching people either i) uses information from a single source (user profile), or ii) does not differentiate the information from different sources in a principled way.

Key Fact: Different sources provide different insights on user similarity
▫ Propose a novel discriminative probabilistic model that i) identifies latent content and social graph classes for people with similar profile content and social graph similarity patterns, and ii) learns a specialized similarity model for each latent class.

Discriminative Probabilistic Model

Given the similarity features $f_v = \{c_v, g_v, u_v\}$, the proposed model can be constructed as follows: [equation not preserved in the transcript]
▫ where $s_v \in \{1, -1\}$ indicates whether the $v$-th pair is similar or not, and $P(z \mid c_v)$ and $P(t \mid g_v)$ denote the probabilities of choosing the latent classes $z$ and $t$ given $c_v$ and $g_v$. $N_z$ and $N_t$ are the numbers of latent content and graph classes, chosen to be 3 and 2 respectively by AIC.

$P(s_v \mid z, t, f_v)$ can be modeled by a logistic function [equation not preserved in the transcript], where $\lambda_{zti}$ is the weight for the $i$-th feature $f_i^v$ under the latent classes $z$ and $t$. $P(z \mid c_v, \alpha)$ can be modeled by a soft-max function [equation not preserved in the transcript], where $Z_{c_v}$ is the normalization factor. $P(t \mid g_v, \beta)$ can be modeled similarly.

Parameters of $P(s_v = 1 \mid f_v)$ can be learned by the EM algorithm:
▫ E-step: Compute $P(z, t \mid f_v)$.
▫ M-step: The following M-step update rules are derived: [equations not preserved in the transcript]. The update rule for $\beta$ can be obtained similarly to the update rule for $\alpha$.

Experiments and Results

The proposed model (referred to as Latent_CG_Mod) is compared to:
▫ i) a model that only models latent content classes (referred to as Latent_C_Mod)
▫ ii) a model that only models latent social graph classes (referred to as Latent_G_Mod)
▫ iii) a model that does not consider any latent classes, corresponding to logistic regression (referred to as LogReg_Mod)

Evaluation Metric (F1)
▫ The models are evaluated by the common F1 measure, as precision and recall are both important.
▫ The “*” symbol indicates statistical significance with p-value < 0.1 (paired t-tests).

Methods and F1 (the F1 values are not preserved in the transcript):
▫ LogReg_Mod
▫ Latent_G_Mod
▫ Latent_C_Mod
▫ Latent_CG_Mod *

Results
▫ Both Latent_G_Mod and Latent_C_Mod achieve improvements over LogReg_Mod, and are comparable to each other. This shows that introducing a latent class, and letting the combination weights vary with it, adds flexibility that improves performance.
▫ Latent_CG_Mod achieves the best performance by modeling both the latent content and graph classes, which provides more flexibility than Latent_C_Mod and Latent_G_Mod, and much more flexibility than LogReg_Mod.
▫ Differentiating pairs with different profile content and social graph similarity patterns, and specializing the similarity model for pairs of people that share similar similarity patterns, is shown to be important for achieving higher similarity accuracy.

Conclusion

▫ Identifying similar people is an important task for professional social networks.
▫ Different people pairs have different profile content and social graph similarity patterns, and it is important to learn specialized similarity models for people with different similarity patterns.
▫ A novel discriminative probabilistic model identifies latent content and social graph classes for people with similar content and social graph similarity patterns, and learns a specialized similarity model for each latent class.
▫ Experiments on real-world data from LinkedIn show the effectiveness of the proposed discriminative model.

Dataset

Experiments are conducted on a proprietary dataset from LinkedIn, constructed via the following steps:
▫ A set of 2200 “key profiles” is selected from the intersection of a) profiles popular among recruiters and b) profiles popular among general users (sets identified by mining the large-scale recruiter/user activity logs of 6 months).
▫ Using Lucene, public profiles in the PSN are indexed, and each key profile is used as a structured query to retrieve the top 100 “candidate profiles”. From those 100, 10 candidate profiles are selected under 3 strategies: i) top 10, ii) bottom 10, iii) sampled 10 (ranks 1-10, [...], and randomly).
▫ A total of [...] profile pairs are identified for annotation. Each profile pair is rated by 3 annotators from CrowdFlower on a scale from 1 to 4 (most similar). The final rating for each pair is the average rating weighted by annotator trust.
▫ Pairs with similarity rating > 2.5 are regarded as similar people (4633 pairs). Pairs with rating < 2 (2419 pairs), combined with another set of 2373 (less similar) pairs randomly selected from public member profiles, form the negative set. Out of the total 9425 pairs, two-thirds are used for training and one-third for testing.
▫ For each profile pair $v$, 5 content similarity features $c_v$ (comparing users' titles, industries, skills, specialties, associations), 3 social graph features $g_v$ (utilizing users' common connections, common groups, and whether their profiles are co-viewed or not), and 5 website usage features $u_v$ (utilizing the similarity in profile, search, inbox, news/sharing, and accounts/settings page usage patterns) are extracted and used as the set of features $f_v = \{c_v, g_v, u_v\}$.
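The labeling rule in the dataset description (trust-weighted average rating, then thresholds at 2.5 and 2) can be illustrated with a minimal Python sketch. The function names and the representation of annotator trust as plain non-negative weights are assumptions made for illustration; the transcript does not say how CrowdFlower trust scores were scaled or applied.

```python
def weighted_rating(ratings, trusts):
    """Trust-weighted mean of annotator ratings (ratings are on a 1..4 scale).

    `trusts` holds one non-negative weight per annotator; this weighting
    scheme is an assumption, not a detail taken from the poster.
    """
    if len(ratings) != len(trusts):
        raise ValueError("need exactly one trust weight per rating")
    total_trust = sum(trusts)
    if total_trust <= 0:
        raise ValueError("total trust must be positive")
    return sum(r * t for r, t in zip(ratings, trusts)) / total_trust


def label_pair(rating, pos_threshold=2.5, neg_threshold=2.0):
    """Map a final rating to +1 (similar), -1 (not similar), or None.

    Ratings between the two thresholds fall in neither set, matching the
    description that only pairs > 2.5 or < 2 are used.
    """
    if rating > pos_threshold:
        return 1
    if rating < neg_threshold:
        return -1
    return None


if __name__ == "__main__":
    # Three hypothetical annotators rating one profile pair.
    final = weighted_rating([4, 3, 2], [1.0, 1.0, 1.0])
    print(final, label_pair(final))
```

Pairs that land between the two thresholds are returned as `None` here so that a caller can drop them explicitly rather than silently assigning them to a class.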
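The model's equations are missing from this transcript. A plausible reconstruction, assuming a standard mixture-of-experts form that matches the stated definitions (latent classes $z$ and $t$, a logistic expert with weights $\lambda_{zti}$, soft-max gates with normalizer $Z_{c_v}$), is sketched below. The index $j$, the weight symbols $\alpha_{zj}$, and the conditioning of the E-step posterior on $s_v$ are assumptions; the M-step update rules cannot be recovered from the text and are not reconstructed.

```latex
% Hedged reconstruction from the surrounding definitions; not copied from the poster.
\begin{align}
P(s_v \mid f_v) &= \sum_{z=1}^{N_z} \sum_{t=1}^{N_t}
    P(z \mid c_v, \alpha)\, P(t \mid g_v, \beta)\, P(s_v \mid z, t, f_v) \\
P(s_v \mid z, t, f_v) &= \frac{1}{1 + \exp\!\left(-s_v \sum_i \lambda_{zti} f_i^v\right)} \\
P(z \mid c_v, \alpha) &= \frac{1}{Z_{c_v}} \exp\!\Big(\sum_j \alpha_{zj} c_j^v\Big),
    \qquad Z_{c_v} = \sum_{z'} \exp\!\Big(\sum_j \alpha_{z'j} c_j^v\Big) \\
% Assumed E-step posterior over the latent classes for a labeled pair:
P(z, t \mid f_v, s_v) &= \frac{P(z \mid c_v, \alpha)\, P(t \mid g_v, \beta)\, P(s_v \mid z, t, f_v)}
    {\sum_{z'} \sum_{t'} P(z' \mid c_v, \alpha)\, P(t' \mid g_v, \beta)\, P(s_v \mid z', t', f_v)}
\end{align}
```

With $N_z = 3$ and $N_t = 2$ as selected by AIC, the double sum runs over six specialized logistic models.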
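To make the structure of the latent-class model concrete, the following is a minimal pure-Python sketch of the prediction step and an E-step posterior. It assumes the mixture form suggested by the model description (soft-max gates over content and graph features, one logistic model per pair of latent classes); all function names and the parameter layout are hypothetical, and the M-step is left out because its update rules are not available in the transcript.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def class_probs(weights, features):
    """Gate: weights[k] is the (assumed) weight vector of latent class k."""
    return softmax([dot(w, features) for w in weights])


def joint_terms(alpha, beta, lam, c, g, u, s):
    """P(z|c) * P(t|g) * P(s|z,t,f) for every latent pair (z, t).

    c, g, u are lists of content, graph, and usage features; f = c + g + u.
    lam[z][t] is the logistic weight vector under latent classes z and t.
    """
    f = c + g + u
    pz = class_probs(alpha, c)
    pt = class_probs(beta, g)
    return [[pz[z] * pt[t] * sigmoid(s * dot(lam[z][t], f))
             for t in range(len(pt))]
            for z in range(len(pz))]


def prob_similar(alpha, beta, lam, c, g, u):
    """P(s = 1 | f): sum of the joint terms over all latent pairs."""
    return sum(sum(row) for row in joint_terms(alpha, beta, lam, c, g, u, 1))


def e_step(alpha, beta, lam, c, g, u, s):
    """Posterior over (z, t) for one labeled pair (assumed to condition on s)."""
    terms = joint_terms(alpha, beta, lam, c, g, u, s)
    total = sum(sum(row) for row in terms)
    return [[x / total for x in row] for row in terms]


if __name__ == "__main__":
    # 3 content classes, 2 graph classes, 5 + 3 + 5 = 13 features.
    alpha = [[0.1 * k] * 5 for k in range(3)]
    beta = [[0.2 * k] * 3 for k in range(2)]
    lam = [[[0.05 * (z + t)] * 13 for t in range(2)] for z in range(3)]
    c, g, u = [0.2] * 5, [1.0, 0.0, 1.0], [0.5] * 5
    print(prob_similar(alpha, beta, lam, c, g, u))
```

Setting the number of content classes or graph classes to one collapses the corresponding gate, which mirrors the reduced variants described in the comparison (Latent_G_Mod, Latent_C_Mod, and plain logistic regression when both are one).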
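The F1 measure used for evaluation is the harmonic mean of precision and recall. A small sketch computing it from confusion counts follows; the function name and the convention of returning 0.0 when there are no true positives are assumptions for illustration, not details from the poster.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.

    Uses the equivalent form 2*tp / (2*tp + fp + fn); returns 0.0 when
    the denominator is zero (an assumed convention).
    """
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0
    return 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical counts, not results from the experiments.
    print(f1_score(tp=8, fp=2, fn=2))
```

Because both precision and recall enter symmetrically, a model cannot score well by labeling every pair as similar or by labeling almost none.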