1 Heterogeneous Cross Domain Ranking in Latent Space
Bo Wang (1), Jie Tang (2), Wei Fan (3), Songcan Chen (1), Zi Yang (2), Yanzhu Liu (4)
(1) Nanjing University of Aeronautics and Astronautics
(2) Tsinghua University
(3) IBM T.J. Watson Research Center, USA
(4) Peking University
2 Introduction
The web is becoming more and more heterogeneous
Ranking is the fundamental problem on the web
–unsupervised vs. supervised
–homogeneous vs. heterogeneous
3 Motivation
Heterogeneous cross domain ranking
Main Challenges
1) How to capture the correlation between heterogeneous objects?
2) How to preserve the preference orders between objects across heterogeneous domains?
4 Outline Related Work Heterogeneous cross domain ranking Experiments Conclusion
5 Related Work
Learning to rank
–Supervised: [Burges, 05] [Herbrich, 00] [Xu and Li, 07] [Yue, 07]
–Semi-supervised: [Duh, 08] [Amini, 08] [Hoi and Jin, 08]
–Ranking adaptation: [Chen, 08]
Transfer learning
–Instance-based: [Dai, 07] [Gao, 08]
–Feature-based: [Jebara, 04] [Argyriou, 06] [Raina, 07] [Lee, 07] [Blitzer, 06] [Blitzer, 07]
–Model-based: [Bonilla, 08]
6 Outline Related Work Heterogeneous cross domain ranking –Basic idea –Proposed algorithm: HCDRank Experiments Conclusion
7 [Figure: query “data mining”; labels: Conference, Expert, Latent Space, Source Domain, Target Domain, mis-ranked pairs]
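To make the notion of "mis-ranked pairs" on this slide concrete, here is a minimal Python sketch that counts preference pairs violated by a scoring of objects. The object names, scores, and preference pairs are hypothetical illustrations, not data from the presentation, and the counting rule (a pair is mis-ranked when the preferred object does not score strictly higher) is an assumption.

```python
def count_misranked_pairs(scores, preferences):
    """Count preference pairs (a, b), meaning a should rank above b,
    for which the scores do not place a strictly above b."""
    return sum(1 for a, b in preferences if scores[a] <= scores[b])


# Hypothetical scores for three conferences under the query "data mining".
scores = {"conf_A": 0.9, "conf_B": 0.4, "conf_C": 0.6}

# Hypothetical ground-truth preferences: (preferred, less preferred).
preferences = [
    ("conf_A", "conf_B"),
    ("conf_B", "conf_C"),
    ("conf_A", "conf_C"),
]

misranked = count_misranked_pairs(scores, preferences)
print(misranked)
```

A learning-to-rank method typically tries to drive this count down, usually through a smooth or convex surrogate rather than the raw count.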
8 The Proposed Algorithm: HCDRank
How to define? How to optimize?
Non-convex
Dual problem
9 Alternately optimize matrices M and D: O(2T * sN log N)
Construct the transformation matrix: O(d^3)
Learning in the latent space: O(sN log N)
Total: O((2T+1) * sN log N + d^3)
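The last expression on this slide appears to be the sum of the three per-step costs. The worked sum below shows how they combine. The slide does not define the symbols; reading T as the number of alternating iterations, N as the number of instances, and d as the feature dimension is an assumption, and s is left uninterpreted.

```latex
\underbrace{O(2T \cdot sN \log N)}_{\text{alternately optimize } M \text{ and } D}
\;+\;
\underbrace{O(d^{3})}_{\text{construct transformation matrix}}
\;+\;
\underbrace{O(sN \log N)}_{\text{learning in latent space}}
\;=\;
O\bigl((2T+1)\, sN \log N + d^{3}\bigr)
```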
10 Outline Related Work Heterogeneous cross domain ranking Experiments –Ranking on Homogeneous data –Ranking on Heterogeneous data –Ranking on Heterogeneous tasks Conclusion
11 Experiments
Data sets
–Homogeneous data set: LETOR_TR
50/75/106 queries with 44/44/25 features for TREC2003_TR, TREC2004_TR and OHSUMED_TR, respectively
–Heterogeneous academic data set: ArnetMiner.org
14,134 authors, 10,716 papers, and 1,434 conferences
–Heterogeneous task data set
9 queries, 900 experts, 450 best supervisor candidates
Evaluation measures
–MAP
–NDCG
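The two evaluation measures can be sketched in Python as follows. This uses one common formulation (binary relevance for MAP; gain 2^rel - 1 with a log2 position discount for NDCG). The slides do not say which variant was used, so treat the exact gain and discount as assumptions. The average precision here also assumes every relevant item appears somewhere in the ranked list.

```python
import math


def average_precision(relevance):
    """Average precision for one ranked list of binary labels (1 = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: mean of average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top k graded relevance labels."""
    return sum(
        (2 ** g - 1) / math.log2(pos + 1)
        for pos, g in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([2, 1, 0], 3)` evaluates a list that is already ideally ordered, while `ndcg_at_k([0, 1, 2], 3)` evaluates the reversed list and yields a lower value.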
12 Ranking on Homogeneous data
LETOR_TR
–We made a slight revision of LETOR 2.0 to fit the cross-domain ranking scenario
–Three sub-datasets: TREC2003_TR, TREC2004_TR, and OHSUMED_TR
Baselines
13 [Figure: panels OHSUMED_TR, TREC2004_TR, TREC2003_TR; annotations: Cosine Similarity=0.01, Cosine Similarity=0.23, Cosine Similarity=0.18]
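The slide reports cosine similarity values but does not say which vectors are being compared. As a reminder of the measure itself, here is a minimal, generic Python sketch. The input vectors are hypothetical, and returning 0.0 for a zero vector is a convention chosen here, not something taken from the presentation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (an assumption)
    return dot / (norm_u * norm_v)
```

Values near 0, such as the 0.01 annotation, indicate nearly orthogonal vectors, while values near 1 indicate nearly parallel ones.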
14 Training Time
15 Ranking on Heterogeneous data
ArnetMiner data set (www.arnetminer.org): 14,134 authors, 10,716 papers, and 1,434 conferences
Training and test data set:
–44 most frequently queried keywords from the log file
Author collection: Libra, Rexa and ArnetMiner
Conference collection: Libra, ArnetMiner
Ground truth:
–Conference: online resources
–Expert: two faculty members and five graduate students from CS provided human judgments for expert ranking
16 Feature Definition
Features: Description
L1-L10: Low-level language model features
H1-H3: High-level language model features
S1: How many years the conference has been held
S2: The sum of citation number of the conference during recent 5 years
S3: The sum of citation number of the conference during recent 10 years
S4: How many years have passed since his/her first paper
S5: The sum of citation number of all the publications of one expert
S6: How many papers have been cited more than 5 times
S7: How many papers have been cited more than 10 times
17 Expert Finding Results
18 Feature Correlation Analysis
19 Ranking on Heterogeneous tasks
Expert finding task vs. best supervisor finding task
Training and test data set:
–Expert finding task: ranking lists from ArnetMiner or annotated lists
–Best supervisor finding task: 9 most frequent queries from the log file of ArnetMiner
For each query, we collected 50 best supervisor candidates and sent emails to 100 researchers for annotation
Ground truth:
–Collected feedback about the candidates (yes/no/not sure)
20 Feature Definition
Features: Description
L1-L10: Low-level language model features
H1-H3: High-level language model features
B1: The year he/she published his/her first paper
B2: The number of papers of an expert
B3: The number of papers in recent 2 years
B4: The number of papers in recent 5 years
B5: The number of citations of all his/her papers
B6: The number of papers cited more than 5 times
B7: The number of papers cited more than 10 times
B8: PageRank score
SumCo1-SumCo8: The sum of coauthors’ B1-B8 scores
AvgCo1-AvgCo8: The average of coauthors’ B1-B8 scores
SumStu1-SumStu8: The sum of his/her advisees’ B1-B8 scores
AvgStu1-AvgStu8: The average of his/her advisees’ B1-B8 scores
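The SumCo/AvgCo and SumStu/AvgStu features aggregate the B1-B8 scores of an expert's coauthors or advisees. A minimal Python sketch of that aggregation follows. The function name and the choice to return empty lists when there are no neighbors are assumptions made for illustration, not details from the presentation.

```python
def aggregate_scores(neighbor_features):
    """Given one feature vector per coauthor (or advisee), return the
    per-feature sums and per-feature averages as two lists."""
    if not neighbor_features:
        return [], []  # no coauthors/advisees: nothing to aggregate (assumption)
    count = len(neighbor_features)
    sums = [sum(column) for column in zip(*neighbor_features)]
    averages = [total / count for total in sums]
    return sums, averages
```

Calling it with the B1-B8 vectors of all coauthors of one expert would yield that expert's SumCo1-SumCo8 and AvgCo1-AvgCo8 values. The same call over advisees would yield the SumStu and AvgStu features.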
21 Best supervisor finding results
22 Experimental Results
23 Outline Related Work Heterogeneous cross domain ranking Experiments Conclusion
24 Conclusion
We formally define the problem of heterogeneous cross domain ranking and propose a general framework
We provide a preferred solution under the regularized framework by simultaneously minimizing two ranking loss functions in two domains
The experimental results on three different genres of data sets verify the effectiveness of the proposed algorithm
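The idea of simultaneously minimizing two ranking losses under a regularizer can be sketched as below. This is an illustration under stated assumptions, not the HCDRank objective. It assumes instances are already mapped into a shared latent space, uses a pairwise hinge loss as a stand-in ranking loss, and uses a squared L2 penalty as a stand-in regularizer. The names `joint_objective`, `trade_off`, and `reg_weight` are hypothetical.

```python
def pairwise_hinge_loss(w, pairs):
    """Sum of max(0, 1 - w.(x_pref - x_other)) over preference pairs,
    where each pair is (preferred feature vector, other feature vector)."""
    total = 0.0
    for x_pref, x_other in pairs:
        margin = sum(wi * (a - b) for wi, a, b in zip(w, x_pref, x_other))
        total += max(0.0, 1.0 - margin)
    return total


def joint_objective(w_source, w_target, source_pairs, target_pairs,
                    trade_off=1.0, reg_weight=0.1):
    """Source-domain loss plus weighted target-domain loss plus a squared
    L2 penalty on both weight vectors (stand-in regularizer)."""
    penalty = sum(v * v for v in w_source) + sum(v * v for v in w_target)
    return (pairwise_hinge_loss(w_source, source_pairs)
            + trade_off * pairwise_hinge_loss(w_target, target_pairs)
            + reg_weight * penalty)
```

A pair ranked correctly with margin at least 1 contributes nothing, so the objective is driven by mis-ranked or narrowly ranked pairs in either domain.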
25 Data Set
26 Ranking on Heterogeneous data
A subset of ArnetMiner (www.arnetminer.org): 14,134 authors, 10,716 papers, and 1,434 conferences
44 most frequently queried keywords from the log file
Author collection:
–For each query, we gathered the top 30 experts from Libra, Rexa and ArnetMiner
Conference collection:
–For each query, we gathered the top 30 conferences from Libra and ArnetMiner
Ground truth:
–Three online resources
http://www.cs.ualberta.ca/~zaiane/htmldocs/ConfRanking.html
http://www3.ntu.edu.sg/home/ASSourav/crank.htm
http://www.cs-conference-ranking.org/conferencerankings/alltopics.html
–Two faculty members and five graduate students from CS provided human judgments
27 Best supervisor finding
Training/test set and ground truth
–724 mails sent
–Fragment of mail
–Feedback in effect: > 82 (increasing)
–Rate each candidate by the definite feedback (yes/no)
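One way to rate a candidate from definite feedback only is to ignore "not sure" responses and take the share of "yes" among the remaining votes. The slide does not give the actual scoring rule, so the sketch below is a hypothetical illustration. The function name and the `None` result for candidates with no definite feedback are assumptions.

```python
def rate_candidate(feedback):
    """Fraction of 'yes' among definite responses ('yes' or 'no').
    Responses such as 'not sure' are ignored. Returns None if no
    definite response was received."""
    yes_votes = feedback.count("yes")
    no_votes = feedback.count("no")
    definite = yes_votes + no_votes
    return yes_votes / definite if definite else None
```

Candidates could then be sorted by this rating to form a ground-truth ranking, with unrated candidates handled separately.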
28 Ranking on Heterogeneous tasks
For the expert finding task, we can use results from ArnetMiner or annotated lists as training data
For the best supervisor task, the 9 most frequent queries from the log file of ArnetMiner are used
–For each query, we sent emails to 100 researchers
Top 50 researchers by ArnetMiner
Top 50 researchers who started publishing papers only in recent years (91.6% of them are currently graduates or postdoctoral researchers)
–Collection of feedback
50 best supervisor candidates (yes/no/not sure)
Also add other candidates
–Ground truth