Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection Boanerges Aleman-Meza, Meenakshi Nagarajan,

Slides:



Advertisements
Similar presentations
Chapter 14: Usability testing and field studies
Advertisements

 Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute Extracting and Utilizing.
SciVal Experts & SciVal Funding Information Sessions.
Funding Networks Abdullah Sevincer University of Nevada, Reno Department of Computer Science & Engineering.
A Taxonomy-based Model for Expertise Extrapolation Delroy Cameron, Amit P. Sheth Ohio Center for Excellence in Knowledge-enabled Computing (Kno.e.sis)
Software Testing and Quality Assurance
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
INTERPRET MARKETING INFORMATION TO TEST HYPOTHESES AND/OR TO RESOLVE ISSUES. INDICATOR 3.05.
Data Mining.
Integrating data sources on the World-Wide Web Ramon Lawrence and Ken Barker U. of Manitoba, U. of Calgary
Creating Architectural Descriptions. Outline Standardizing architectural descriptions: The IEEE has published, “Recommended Practice for Architectural.
Scientific method - 1 Scientific method is a body of techniques for investigating phenomena and acquiring new knowledge, as well as for correcting and.
Chapter 3 Preparing and Evaluating a Research Plan Gay and Airasian
Semantic Web Technology Evaluation Ontology (SWETO): A test bed for evaluating tools and benchmarking semantic applications WWW2004 (New York, May 22,
United Nations Economic Commission for Europe Statistical Division Applying the GSBPM to Business Register Management Steven Vale UNECE
Hollis Day, MD, MS Susan Meyer, PhD.  Four domains for effective practice outlined in the Interprofessional Education Collaborative’s “Core Competencies.
Predicting Missing Provenance Using Semantic Associations in Reservoir Engineering Jing Zhao University of Southern California Sep 19 th,
An Intelligent Broker Architecture for Context-Aware Systems A PhD. Dissertation Proposal in Computer Science at the University of Maryland Baltimore County.
Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection Boanerges Aleman-Meza, Meenakshi Nagarajan,
Using Friendship Ties and Family Circles for Link Prediction Elena Zheleva, Lise Getoor, Jennifer Golbeck, Ugur Kuter (SNAKDD 2008)
Chapter 1: Introduction to Statistics
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Search Engines and Information Retrieval Chapter 1.
IMSS005 Computer Science Seminar
CRITICAL APPRAISAL OF SCIENTIFIC LITERATURE
SWETO: Large-Scale Semantic Web Test-bed Ontology In Action Workshop (Banff Alberta, Canada June 21 st 2004) Boanerges Aleman-MezaBoanerges Aleman-Meza,
PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND PROCEEDINGS OF THE.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
Experimental Research Methods in Language Learning Chapter 16 Experimental Research Proposals.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,
University of Sunderland CIFM03Lecture 2 1 Quality Management of IT CIFM03 Lecture 2.
Information commitments, evaluative standards and information searching strategies in web-based learning evnironments Ying-Tien Wu & Chin-Chung Tsai Institute.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
NATIONAL INSTITUTES OF HEALTH CHALLENGE GRANT APPLICATIONS Dan Hoyt Survey, Statistics, and Psychometrics(SSP) Core Facility March 11, 2009.
Requirements Engineering Methods for Requirements Engineering Lecture-30.
BEHAVIORAL TARGETING IN ON-LINE ADVERTISING: AN EMPIRICAL STUDY AUTHORS: JOANNA JAWORSKA MARCIN SYDOW IN DEFENSE: XILING SUN & ARINDAM PAUL.
Author Name Disambiguation in Medline Vetle I. Torvik and Neil R. Smalheiser August 31, 2006.
Most of contents are provided by the website Introduction TJTSD66: Advanced Topics in Social Media Dr.
Semantic Clipboard User Interface is integrated in the Browser Architecture of the Semantic Clipboard Illustration of a license incompliant content reuse.
Meenakshi Nagarajan PhD. Student KNO.E.SIS Wright State University Data Integration.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.
Jed Hassell, Boanerges Aleman-Meza, Budak ArpinarBoanerges Aleman-MezaBudak Arpinar 5 th International Semantic Web Conference Athens, GA, Nov. 5 – 9,
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Ontology Quality by Detection of Conflicts in Metadata Budak I. Arpinar Karthikeyan Giriloganathan Boanerges Aleman-Meza LSDIS lab Computer Science University.
EBM --- Journal Reading Presenter :呂宥達 Date : 2005/10/27.
1 SEMEF : A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks Delroy Cameron Masters Thesis Computer Science, University of Georgia.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Extracting value from grey literature Processes and technologies for aggregating and analysing the hidden Big Data treasure of the organisations.
An Ontological Approach to Financial Analysis and Monitoring.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Semantic Web in Context Broker Architecture Presented by Harry Chen, Tim Finin, Anupan Joshi At PerCom ‘04 Summarized by Sungchan Park
Michigan Assessment Consortium Common Assessment Development Series Module 16 – Validity.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.
Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
CRITICALLY APPRAISING EVIDENCE Lisa Broughton, PhD, RN, CCRN.
Planning for Instruction and Assessments. Matching Levels Ensure that your level of teaching matches your students’ levels of knowledge and thinking.
Rigor and Transparency in Research
WIS/COLLNET’2016 Nancy, France
Knowledge Discovery in the Semantic Web
E-Commerce Theories & Practices
Using Friendship Ties and Family Circles for Link Prediction
iSRD Spam Review Detection with Imbalanced Data Distributions
[jws13] Evaluation of instance matching tools: The experience of OAEI
CVE.
Presentation transcript:

Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection Boanerges Aleman-Meza, Meenakshi Nagarajan, Cartic Ramakrishnan, Li Ding, Pranam Kolari, Amit P. Sheth, I. Budak Arpinar, Anupam Joshi, Tim Finin LSDIS Lab, Dept, of C.S., Univ. of Georgia Dept. of C.S.E.E., Univ. of Maryland WWW 2006 (Nominated for Best Paper Award)

Outline Introduction. Motivation and Background. Integration of Two Social Networks. COI Detection. Conclusions and Future Work.

Introduction In this paper, we describe a Semantic Web application that detects Conflict of Interest (COI) relationships among potential reviewers and authors of scientific papers. COI is typically known as a situation that may bias a decision. –It can be caused by a variety of factors such as family ties, business or friendship ties. Detection COI is necessary in many situations, such as peer-review of scientific research papers or proposals. –To ensure impartial decisions.

Introduction (cont.) In some cases, it can be difficult to detect COI because of the lack of available information. However, there exists implicit and/or explicit information in the form of social networks. –LinkedIn.com: comprising people from information technology areas, could be used to detect COI in situations such as IPO or company acquisitions. –MySpace.com, Friendster, and Hi5. –… The importance of social network applications is even not only considering the millions of users but also due to the millions of dollars they are worth.

Introduction (cont.) Although social networks can provide data to detect COI, one important problem lies in the lack of integration among sites hosting them. –Moreover, privacy concerns prevent such sites from openly sharing their data. We chose publicly available social network data to address the challenge of COI detection (in the context of peer-paper-review). –The DBLP bibliography provides collaboration network data by virtue of the explicit co-author relationship among authors. –We used a multitude of FOAF (Friend-Of-A-Friend) documents from the Web where the “knows” relationship is explicitly stated (to construct a social network).

Introduction (cont.) We create a populated ontology (network) by integrating entities and relationships from the above two social networks. –Challenges: Entity disambiguation: –DBLP has different entries that in the real world refer to the same person. »“R. Guha” and “Ramanathan V. Guha” –A fundamental challenge in developing Semantic Web applications involving heterogeneous, real-world data. The ontology is used to determine a degree of COI between the reviewers and authors.

Conflict of Interest Detection Problem COI situations should be identified to produce impartial decisions. Many organizations have strict definitions of what constitutes a COI. –The NIH (National Institutes of Health) defines COI in the context of the grant review process as: A Conflict Of Interest (COI) in scientific peer review exists when a reviewer has an interest in a grant or cooperative agreement application or an R&D contract proposal that is likely to bias his or her evaluation of it. A reviewer who has a real conflict of interest with an application or proposal may not participate in its review.

Conflict of Interest Detection Problem (cont.) One major cause for bias is professional or social relationships between potential reviewers and authors of the material to be reviewed. In this paper, we address the problem of COI detection in the context of peer-review processes. –We believe that the techniques presented here are applicable for COI detection in other scenarios as well.

The Peer-Review Process The process is commonly supported by semi-automated tools, such as conference management systems. Typically, one person designated as Program Committee (PC) chair, is in charge of the proper assignment of papers to be reviewed by PC members of a conference. –Assigning papers to reviewers is probably one of the most challenging tasks for the Chair. Conference management systems support this task by relying on reviewers specifying their expertise and/or bidding on papers. –Allow the Chair to modify the assignments. The key is to ensure that there are qualified reviewers for a paper and that they will not have a-priori bias for or against the paper.

The Peer-Review Process (cont.) The assignments rely on the knowledge of Chair about any particular strong social relationships that might cause possible COIs. However, due to the proliferation of interdisciplinary research, the Chair cannot be expected to keep up with the ever-changing landscape of collaborative relationships among researchers. Hence, conference management systems need to help the Chair with the detection of COIs.

The Peer-Review Process (cont.) Contemporary conference management systems support COI detection in different manners. –EDAS checks for COIs based on declarations of possible conflicts by the PC members. –Microsoft Research’s CMT tool allows authors to indicate COIs with reviewers. –Confious automatically detects theses COIs based mainly on “similar s” or “co-authorship” criteria. The “co-authorship” criterion identifies users that have co-authored at least one paper in the past. The straight forward criteria can miss out on COIs as exemplified by one recent case (??). –Co-author in question has a hyphened last name (??).

Online Social Networks A social network is a set of people connected by a set of social relationships. –Friendship, co-working, …, etc. The entity Person is the fundamental concept in online social networks. –With one or several of its properties. Name, ID, …, etc. –Different sources might use different set of properties. –Such heterogeneous contexts and entity identifiers necessitate entity disambiguation.

Online Social Networks (cont.) A link is another important concept in social networks. –Some sources directly provide links among person entities. foaf:knows –Other sources provide links via metadata. co-author meta information of DBLP

Social Networks Analysis Focus on the analysis of patterns of relationships among social entities (e.g., people). –Analysis of networks of criminals. –Finding influential individuals. Our work is fundamentally different than these previous approaches as: –It aims to develop and test an approach in integrating two social networks. –And using semantic association discovery techniques for identification of COI relationships.

Integration of Two Social Networks We bring together a semi-structured semantic social network (FOAF) with a structured social network extracted from the underlying co-authorship network in DBLP. –They were chosen based on: They are representative. Refer to real-world persons. Publicly available. We describe the challenges involved with respect to entity disambiguation that have to be addressed to merge entities across these sources that in real-world refer to the same person.

FOAF & DBLP FOAF: –The Friend of a Friend (FOAF) data source is created independently by many authors to publish information about themselves and their social relationships. foaf:name. foaf:knows. DBLP: –A fixed structure to identify persons by their names. –Persons are associated by co-author relationships (fixed structure).

FOAF & DBLP (cont.)

Data Cleaning We started with a set of authors of papers in the 2004 and earlier international Semantic Web Conference and the Program committee members of these conferences. The set of people and their friends are likely to publish their personal profiles in FOAF and their names usually also appear in DBLP.

Data Cleaning (cont.) DBLP-SW: –We collected 38,027 person entities that have up to three hops of social distance from those persons in Semantic Web (SW) conferences. FOAF-EDU: –We first used the value of foaf:name to perform data cleaning. For those containing special characters (i.e., ‘,’ ‘?’) –To identify persons whose FOAF documents residing on ‘edu’ websites. –21,308 persons

Entity Disambiguation The goal is to find entities that might have multiple references in DBLP and/or FOAF that refer to the same entity in real-life. We adapted a name-reconciliation algorithm [SIGMOD 2005]. –It employs a rigorous form of semantic similarity defined as a combination of the similarity between attributes. The similarity of their names and affiliations. The number of common co-author relationships. Weights are manually assigned to the attributes based on aspects such as the number of entities that have values for certain attributes.

Entity Disambiguation (cont.) We found that these weights and merge thresholds were quite effective through several experiments. The output of the disambiguation algorithm populates two result sets: –A “sameAs” set. Entity pairs identified as the same entity. 633 pairs. –An “ambiguous” set. Entity pairs having a good probability of being the same but without sufficient information to be reconciled with certainty. 6,347 pairs.

Entity Disambiguation (cont.) The lack of a gold standard prevented us from using precision and recall metrics. We measured statistics of false positives and false negatives by manually inspecting random samples of entity pairs from both the sameAs set and the ambiguous set. –We picked 6 random samples, each having 50 entity pairs. –False positive: in the sameAs set. –False negative: in the ambiguous set. The false negatives in any ambiguous set will be between 2.8% and 7.8%. The false positives will be between 0.3% and 0.9%.

Entity Disambiguation (cont.) Reasons for false negatives: –Entity pairs under comparison had a high similarity in atomic attribute values but had very few association attribute matches. –A pair of entities had very few attributes for comparison, but had a high match in the most semantically relevant attributes. Their threshold (similarity) was not high enough for them to be reconciled.

Entity Disambiguation (cont.) We concluded based on experiments, that altering the weights and thresholds alone did not improve the results. The task of entity disambiguation is very difficult.

COI Detection Levels of COI: –Definite COI: a reviewer is one of the listed authors in a paper to be reviewed. –High potential COI: the existence of close or strong relationships among an author of a submitted paper and a reviewer. –Medium potential COI: a reviewer and an author of a paper to be reviewed have close relationships with a third party. E.g., the same PhD advisor. –Low potential COI: Weak of distant relationships between a reviewer and an author. Can be ignored.

COI Detection (cont.) Weighting Relationships for COI Detection: –The relationship foaf:knows is used to explicitly list the person that are known to someone. The assertion (A knows B) is usually subjective and imperfect. We assigned a weight of 0.5 to all 34,824 foaf:knows relationships in the FOAF-EDU dataset. –The second type of relationship is the co-author relationship. A good indicator for collaboration among authors. We used the ratio of number of co-authored publications vs. total of his/her publications as the weight for the co-author relationship. Asymmetric.

COI Detection (cont.) Detection of COI: –Analyze how two persons are connected by direct relationships or through sequences of relationships. –Our previous work on discovery of “semantic associations” is directly applicable for COI detection. –A semantic association is basically a path (of relationships) between entities. We find semantic associations containing up to 3 relationships. –For two entities (one reviewer, one author), we find all semantic associations (paths) between them. ar

COI Detection (cont.) The following cases are considered: –Reviewer and author are directly related (through foaf:knows and/or co-author). High potential COI for at least one relationship having weight >= 0.3. Medium potential COI for at least one relationship having weight in [ ). Low potential COI for at least one relationship having weight < 0.1 –Reviewer and author are not directly related but they are directly related to (at least) one common person (intermediary). Medium potential COI: –There are many (i.e., 10) such intermediaries in common. –The relationships connecting to the intermediary have weight >= 0.3. Low potential COI: –Other cases ra i

COI Detection (cont.) –Reviewer and author are indirectly related through a semantic association containing three relationships. The collaborators (or friends) of the reviewer and author have some tie. The level is Low potential COI. ra ii

Experimental Results We selected a subset of papers and reviewers from 2004 International World Wide Web Conference. –The scenario included a subset of 15 PC members of the Semantic Web Track and 10 of the accepted papers having topics related to such track. –We compared our application with the COI detection of the Confious conference management system. Utilizes first and last names to identify at least one co-authored paper in the past (between reviewers and authors).

Experimental Results (cont.)

Confious misses COI situations that our application does not miss because ambiguous entities in DBLP are reconciled in our approach. Our approach provides detailed information such as the level of potential COI as well as the cause. The results of our approach are enhanced by the relationships coming from the FOAF social network. –However, in cases of Table 6 there was no situation of two persons having a foaf:knows relationship and not having co- author relationships between them.

Experimental Results (cont.) We manually verified the COI assessments in Table 6. While in most cases our approach validated very well, very few cases did not. –False-negatives caused by lack of information. FOAF documents are not updated (latest). –Co-editing rather than co-authorship. We believe that the assessment should still be that of “low” level of potential COI.

Conclusions and Future Work We described how our approach for COI detection is based on semantic technologies techniques and provided an evaluation of its applicability using an integrated social network from the FOAF social network and the DBLP co-authorship network. –We provided details on how these networks were integrated. A demo of the application is available (lsdis.cs.uga.edu/projects/semdis/coi)