1 SEMEF: A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks
    Delroy Cameron
    Master's Thesis, Computer Science, University of Georgia
    11/27/2007
    Advisor: I. Budak Arpinar
    Committee: Prashant Doshi, Robert J. Woods
2 OUTLINE
    Background
    Expertise Profiles
    Ranking Experts
    Collaboration Networks Expansion
    Results and Evaluation
    Conclusion
    Demo
3 BACKGROUND
    Semantic Web
        What?
            Extension of the Current Web
            Attach Meaning to Data
        Why?
            Underutilization of the Current Web
            HTML Limitations
        Goal
            Enhance Information Exchange
            Automatic Information Discovery
            Interoperability of Services
4 BACKGROUND
    Semantic Web Technologies
        XML
        RDF/RDFS/OWL
        URI
        Ontology
    Example statement: “David Billington is a Professor of Mathematics”
    [Encodings of the statement; markup not recoverable. Surviving values: David Billington, Mathematics]
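The example statement on this slide can be read as a single subject-predicate-object triple, which is the basic unit of RDF. The following is a minimal sketch in plain Python, without any RDF library. The `http://example.org/` namespace, the `isProfessorOf` predicate, and the `objects` helper are assumptions made for illustration; the slide does not give the actual URIs.

```python
# Hypothetical namespace: the slide does not state the real URIs.
EX = "http://example.org/"

# One statement corresponds to one (subject, predicate, object) triple.
triples = [
    (EX + "DavidBillington", EX + "isProfessorOf", EX + "Mathematics"),
]


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(triples, EX + "DavidBillington", EX + "isProfessorOf"))
```

Identifying each resource by a URI rather than a bare string is what allows statements from different sources to refer to the same entity.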
5 BACKGROUND
    Semantic Web
        Common Challenges
            Entity Disambiguation
            Ontology Mapping/Alignment
            Trust/Provenance
            Semantic Association Discovery
        Applications
            Social Networks
            Bio-Informatics
            National Security
            GPS
            Data Mining
6 BACKGROUND
    Social Networks
        What?
            Connected through Social Relationships
        Characteristics
            Clustering Coefficient (connectedness to neighbors)
            Centrality (average shortest path length)
            Geodesic (shortest path length)
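The three characteristics listed above can be computed directly on a small undirected graph. The sketch below is illustrative only: the toy `graph` and all function names are assumptions, and the average path length is computed from one source node, following the slide's gloss of centrality as average shortest path length.

```python
from collections import deque

# Hypothetical toy network stored as undirected adjacency sets.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def geodesics(graph, source):
    """Breadth-first search: shortest path length from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def average_path_length(graph, source):
    """Mean geodesic from `source` to every other reachable node."""
    dist = geodesics(graph, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else 0.0
```

In this toy graph, node `c` reaches every other node in one hop, so it has the lowest average path length, while its neighbours are only partly interconnected.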
7 BACKGROUND
    Peer-Review Process
        What?
            Review Scholarly Manuscripts
        Challenges
            Slow
            Conflict of Interest
            Finding Suitable Reviewers
            Arbitrary Knowledge Approach
            Research Diversification
            Emerging Fields
8 CONTRIBUTIONS
    Applicability of Semantics
        Finding Expertise
            Fine Levels of Granularity
        Finding Experts
            Taxonomy
        Collaboration Networks
            Discovery of Unknown Experts
9 SEMEF
    SEMantic Expert Finder
    Finding Expertise (Expertise Profiles)
        Collecting Expertise
        Quantifying Expertise
    Finding (Ranking) Experts
        With and without Taxonomy
    Collaboration Networks
        Geodesic
        C-Nets
10 EXPERTISE PROFILES
    Collecting Expertise
        Collect All Publications
        Map Papers to Topics
        Quantify All Papers
    Publications Dataset
        DBLP: 473,296 papers (conference/session names, Nov. 2007)
        ACM, IEEE, Science Direct: 29,454 papers (abstracts/index terms)
        Combined: 476,299 papers
11 EXPERTISE PROFILES
    Collecting Expertise
        Papers-to-Topics Dataset
            Combined (476,299)
            Topics (320)
            Relationships (676,569)
        Expertise Profiles (560,792)
12 EXPERTISE PROFILES
    Quantifying Expertise
        Map Each Paper to a Distinct Value
        Publication Impact
            Hector Garcia-Molina (248 papers)
            E. F. Codd (49 papers)
        Citeseer Impact Statistics (1221 venues)
        DBLP URIs
13 EXPERTISE PROFILES
    Figure 1: Expertise Profile
    [Diagram: author_A, papers, and weighted topics: topic 1 (4.50), topic 2 (1.86), topic 3 (3.08)]
14 RANKING EXPERTS
    Taxonomy of Topics
        Session Names
        Conference Names
        O’CoMMA
        Paper Abstracts
        Index Terms
    Figure 2: Taxonomy of Topics
15 RANKING EXPERTS
    Case 1: Single Topic without Taxonomy
        Traverse All Expertise Profiles
        Sum Impact (papers → topics)
    Case 2: Single Topic with Taxonomy
        Traverse All Expertise Profiles
        Sum Impact (papers → topics, subtopics)
    Prevent Expertise Overestimation
        Map Papers to Leaf Nodes Only
16 RANKING EXPERTS
    Case 3: Array of Topics without Taxonomy
        Same as Case 2
    Case 4: Array of Topics with Taxonomy
        Filter Input Topics
        Sum Impact (papers → topics, subtopics)
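The taxonomy-based cases rely on two operations: expanding a topic to its subtopics, and filtering an array of input topics. The sketch below shows one way these could work. The toy `taxonomy`, the function names, and in particular the reading of "filter input topics" as dropping topics that are already descendants of another input topic are all assumptions; the slides do not spell out the filtering rule.

```python
# Hypothetical taxonomy fragment: topic -> direct subtopics.
taxonomy = {
    "Search": ["Web Search", "Ranking"],
    "Web Search": ["Crawling"],
    "Ranking": [],
    "Crawling": [],
}


def with_subtopics(taxonomy, topic):
    """Return the topic together with all of its descendants."""
    seen = set()
    stack = [topic]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy.get(current, []))
    return seen


def filter_topics(taxonomy, topics):
    """Assumed rule: drop input topics already covered by another input topic."""
    kept = []
    for topic in topics:
        covered = any(
            other != topic and topic in with_subtopics(taxonomy, other)
            for other in topics
        )
        if not covered:
            kept.append(topic)
    return kept


def expand(taxonomy, topics):
    """Union of each filtered input topic and its subtopics."""
    expanded = set()
    for topic in filter_topics(taxonomy, topics):
        expanded |= with_subtopics(taxonomy, topic)
    return expanded
```

Under this reading, filtering means a paper mapped to a leaf topic is reached through a single input topic, which is in the spirit of the overestimation concern raised for the single-topic cases.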
17 COLLABORATION NETWORKS EXPANSION
    Geodesic
    Figure 3: Geodesic Relationships
    [Diagram: STRONG, MEDIUM, WEAK, and UNKNOWN relationships between author_A and author_B, drawn through opus:author and opus:isIncludedIn edges over opus:Article_in_Proceedings and opus:Proceedings nodes, with intermediate authors author_1 and author_2]
18 COLLABORATION NETWORKS EXPANSION
    C-Net
        Ordering
        Cluster of Experts
        Collaboration Strength*
    [Diagram: Super Node {14.80} linked to coauthor_1 {0.73, 0.5}, coauthor_2 {1.81, 1.0}, coauthor_3 {0.73, 0.5}, coauthor_4 {0.73, 0.5}, coauthor_5 {1.54, 1.0}, coauthor_n {1.1, 0.8}]
    Figure 3: Geodesic Relationships
    * Newman, M. E. J.: Coauthorship Networks and Patterns of Scientific Collaboration. National Academy of Sciences of the United States of America, 1(101): , (2004).
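The collaboration strength measure is attributed to Newman. In Newman's coauthorship weighting, each paper with n authors contributes 1/(n - 1) to the strength of every pair of its coauthors, so collaborations on papers with few authors count more. The sketch below implements that weighting; whether SEMEF applies it in exactly this form is an assumption, the function name and sample papers are hypothetical, and the bracketed value pairs in the diagram are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_strength(papers):
    """Sum 1 / (n - 1) over shared papers for each pair of coauthors.

    `papers` is a list of author lists; single-author papers add nothing.
    """
    strength = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue
        for a, b in combinations(unique, 2):
            strength[(a, b)] += 1.0 / (n - 1)
    return dict(strength)


# Hypothetical sample: A and B share a two-author and a three-author paper.
papers = [["A", "B"], ["A", "B", "C"], ["C"]]
weights = collaboration_strength(papers)
print(weights)
```

Pairs are keyed in sorted order, so `("A", "B")` and `("B", "A")` refer to the same undirected relationship.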
19 RESULTS AND EVALUATION
    Evaluation
        WWW Search Track (2005/6/7)
        Input Topics: Call for Papers
        SWETO-DBLP Subset (67,366 authors)
        DBLP (560,792)
    Validation
    Collaboration Networks Expansion
20 RESULTS AND EVALUATION
    Validation
    Table 1: Past PC Lists Comparison with SEMEF
        Column headings: Percentage in SEMEF List; Search Track (Number of PC Members in SEMEF List); Cumulative Percentage in PC List
        Row and column labels: 0-10%; Search 2005; Search; Search; Average; Total
        [Table layout not recoverable. Surviving values: 10(top), 35%, 83; 29/34, 21/25, 26/29, 40/48; 52%, 58%, 65%, 73%, 79%, 82%, 85%, 85%, 85%]
21 RESULTS AND EVALUATION
    Validation
    Figure 4: Average Number of PC Members in SEMEF List
22 RESULTS AND EVALUATION
    Validation
    Figure 5: Average PC Distribution in SEMEF List
23 RESULTS AND EVALUATION
    Collaboration Networks Expansion
    Table 3: PC Chair – PC Member Geodesic Relationships
        Search 2007 PC List (Number of Expert Relationships)
    Table 4: PC Chair – SEMEF List Geodesic Relationships
        Search 2007 SEMEF (Number of Expert Relationships)
    Row and column labels in both tables: Chair1, Chair2; Relationships: STRONG, MEDIUM, WEAK, EXTREMELY WEAK; Above Average Expertise (in PC)
    [Table bodies not recoverable; no cell values survive]
24 CONCLUSION
    Expertise Profiles
        Publication Data
        Publication Impact Statistics
        Papers-to-Topics Relationships
    Ranking Experts
        With and without Taxonomy
        Single Topic and Array of Topics
    Collaboration Networks Expansion
        Semantic Association Discovery
        Geodesic
        C-Nets
25 DEMO
    Web Application
        Apache Tomcat 6.0
        JavaServer Pages
        Ubuntu 7.10
26 RELATED WORK
    Particle Swarm Algorithm
    ExpertiseNets
    Expertise Browser
        Experience Atoms
    Expertise Recommender
        Change History
        Tech Support Heuristics
        Profiling, Identification, Supervisor
27 RELATED WORK
    Web-Based Communities
        Expert Rank
    Formal Probabilistic Models
        Candidate Models
        Document Models
    RDF-Matcher
28 EXPERTISE PROFILE ALGORITHM
    Algorithm findExpertiseProfile(researcherURI, list of publications)
        create empty ‘expertise profile’
        foreach paper of researcher do
            get ‘topics’ list of paper (using papers-to-topics dataset)
            get ‘publication impact’
            if ‘publication impact’ is null do
                ‘publication impact’ ← default weight
            foreach ‘topic’ in ‘topics’ list do
                ‘weight’ ← ‘publication impact’ + existing ‘weight’ from ‘expertise profile’
                if ‘expertise profile’ contains ‘topic’ do
                    update ‘expertise profile’ with <topic, weight>
                else
                    add <topic, weight> pair to ‘expertise profile’
            end
        end
        return ‘expertise profile’
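The profile-building algorithm above can be sketched in a few lines of Python. This is an illustration rather than the thesis implementation: the dictionary-based inputs, the function and parameter names, and the default weight of 1.0 are assumptions, since the slide does not state the default value or the data structures used.

```python
# Hypothetical default: the slide does not state the actual default weight.
DEFAULT_WEIGHT = 1.0


def find_expertise_profile(papers, papers_to_topics, publication_impact,
                           default_weight=DEFAULT_WEIGHT):
    """Accumulate publication impact per topic over a researcher's papers."""
    profile = {}
    for paper in papers:
        impact = publication_impact.get(paper)
        if impact is None:
            # No impact statistic for this paper's venue: fall back to the default.
            impact = default_weight
        for topic in papers_to_topics.get(paper, []):
            # Add to the existing weight, or start a new <topic, weight> pair.
            profile[topic] = profile.get(topic, 0.0) + impact
    return profile


# Hypothetical sample data for one researcher.
papers = ["p1", "p2", "p3"]
papers_to_topics = {"p1": ["t1"], "p2": ["t1", "t2"], "p3": ["t2"]}
publication_impact = {"p1": 1.5, "p2": 2.0}  # p3 has no impact statistic

profile = find_expertise_profile(papers, papers_to_topics, publication_impact)
print(profile)
```

A single dictionary update covers both branches of the pseudocode, because `profile.get(topic, 0.0)` returns zero when the topic is not yet in the profile.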
29 RANKING EXPERTS ALGORITHM
    Algorithm rankValue(researcherURI, list of topics)
        set ‘expertRankValue’ to zero
        create temporary ‘expertise profile’
        filter topics
        foreach topic in filtered topics list do
            get ‘papers’ for this topic (using papers-to-topics dataset)
            foreach paper in papers list do
                if researcher is author do
                    get ‘publication impact’ as ‘weight’
                    ‘expertRankValue’ ← ‘expertRankValue’ + ‘weight’
                    add <topic, weight> pair to temporary ‘expertise profile’
                end if
            end
        end
        return ‘expertRankValue’
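The ranking algorithm above can be sketched as follows. The sketch takes an already filtered topic list and follows the pseudocode literally, so a paper listed under two requested topics contributes twice. The function and parameter names, the dictionary inputs, and the fallback default weight for papers without an impact value are assumptions for illustration only.

```python
def rank_value(researcher, topics, topic_papers, paper_authors,
               publication_impact, default_weight=1.0):
    """Sum the impact of the researcher's papers under the given topics.

    Returns the rank value and the temporary per-topic profile.
    """
    rank = 0.0
    temp_profile = {}
    for topic in topics:
        for paper in topic_papers.get(topic, []):
            if researcher in paper_authors.get(paper, ()):
                weight = publication_impact.get(paper)
                if weight is None:
                    # Assumed fallback; the pseudocode does not cover this case.
                    weight = default_weight
                rank += weight
                temp_profile[topic] = temp_profile.get(topic, 0.0) + weight
    return rank, temp_profile


# Hypothetical sample data.
topic_papers = {"t1": ["p1", "p2"], "t2": ["p3"]}
paper_authors = {"p1": {"alice"}, "p2": {"bob"}, "p3": {"alice", "bob"}}
publication_impact = {"p1": 1.5, "p3": 2.0}

alice_rank, alice_profile = rank_value(
    "alice", ["t1", "t2"], topic_papers, paper_authors, publication_impact
)
print(alice_rank, alice_profile)
```

Ranking a set of candidates then amounts to computing this value for each researcher and sorting in descending order.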