Published by George Bryan Reeves. Modified over 9 years ago.
1
Retrieval of Highly Related Biomedical References by Key Passages of Citations
Rey-Long Liu
Dept. of Medical Informatics
Tzu Chi University, Taiwan
2
Outline
– Background
– Problem definition
– The proposed technique: KPC
– Empirical evaluation
– Conclusion
3
Background
4
Highly Related Biomedical References
Two references are highly related if they share similar core contents, including
– Goals, methods, and findings
5
[Figure: articles "Selected by biomedical experts for" ... are highly related to each other; articles "Recommended by PubMed" are not highly related to them.]
6
Retrieval of the Highly Related References
It is essential for biomedical researchers, database managers, and text mining systems
– Cross-validation of highly related evidence
But it is challenging as well
– Difficult to recognize and extract the core contents of the references
7
Problem Definition
8
Goal & Contribution
Goal
– Given a reference r, retrieve references that are highly related to r
Contribution
– Developing a technique, KPC (Key Passage of Citations), that estimates inter-article similarity based on the key passages of out-link citations in each article
9
Basic Idea
Out-link citations in an article x can indicate how the core content of x is related to other studies
However, two highly related articles may cite different references
The passages that discuss the out-link citations can be used to estimate inter-article similarity
[Figure: article x with its out-link citations and in-link citations]
10
Related Work
Inter-article similarity based on out-link citations
– Example: Bibliographic coupling (BC)
– Weakness: two highly related articles may cite different references
Inter-article similarity based on in-link citations
– Example: Co-citation (CC) and context passages around in-link citations
– Weakness: (1) many articles have very few (or even no) in-link citations; (2) it is difficult to collect in-link citations
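The two citation-based baselines can be illustrated with their textbook definitions: BC counts the references two articles both cite, and CC counts the articles that cite both. The sketch below uses raw overlap counts; the toy article IDs are hypothetical, and since the slide does not say whether the study normalized these counts, no normalization is applied here.

```python
from typing import Dict, Set


def bibliographic_coupling(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Raw BC strength: number of out-link references shared by two articles."""
    return len(refs_a & refs_b)


def co_citation(article_a: str, article_b: str, out_links: Dict[str, Set[str]]) -> int:
    """Raw CC strength: number of articles in the collection citing both a and b."""
    return sum(1 for refs in out_links.values() if article_a in refs and article_b in refs)


# Hypothetical toy collection: article ID -> set of cited article IDs
out_links = {
    "x": {"r1", "r2", "r3"},
    "y": {"r2", "r3", "r4"},
    "z": {"x", "y"},
    "w": {"x"},
}
print(bibliographic_coupling(out_links["x"], out_links["y"]))  # 2
print(co_citation("x", "y", out_links))                        # 1
```

The weakness noted above is visible here: if x and y cited disjoint reference sets, their BC score would be 0 however closely their contents matched, and CC depends entirely on how many in-link citations exist.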
11
The Proposed Technique: KPC
12
Extraction of Key Passages
The key passage (KP) of a citation c in an article x consists of two parts
– Title of x (for the general context of the citation)
– Text immediately before each position p where c is mentioned in x (for the specific context of the citation)
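A minimal sketch of KP extraction, assuming the article body is plain text in which every mention of citation c appears as a literal marker such as "[1]", and assuming "text immediately before" means a fixed number of preceding words (the hypothetical parameter `n_words`). The slide specifies neither the citation format nor the window size.

```python
import re


def key_passage(title: str, body: str, marker: str, n_words: int = 20) -> str:
    """Build the KP of one citation: the article title plus the n_words words
    immediately preceding every occurrence of the citation marker in the body."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    contexts = []
    for match in re.finditer(re.escape(marker), body):
        preceding = body[:match.start()].split()[-n_words:]
        if preceding:  # skip a mention that has no text before it
            contexts.append(" ".join(preceding))
    return " ".join([title] + contexts)


body = "G expression rises in D [1]. A knockout model confirmed this [1]."
print(key_passage("Gene G in disease D", body, "[1]", n_words=3))
# Gene G in disease D rises in D model confirmed this
```

Because both mentions of "[1]" contribute their preceding words, the KP combines the general context (title) with every specific context in which the citation is discussed.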
13
Estimation of Inter-Article Similarity
[Figure: the KPs of out-link citations in d1 are matched against the KPs of out-link citations in d2]
14
More specifically, the similarity between two articles d1 and d2 = [equation not recoverable]
Note: O_d1 and O_d2 are the sets of out-link citations in d1 and d2, respectively, and PM (Passage Matching) between two KPs = [equation not recoverable]
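The slide's formulas for the article similarity and for PM are not preserved, so the following is only a hypothetical illustration of the general shape of a KP-based similarity between d1 and d2, not KPC's actual definition. It assumes PM is the cosine similarity of bag-of-words vectors, and that the article similarity averages, in both directions, each KP's best PM score against the other article's KPs. Both choices are placeholders.

```python
import math
from collections import Counter
from typing import List


def passage_matching(kp1: str, kp2: str) -> float:
    """Placeholder PM (assumption): cosine similarity of lowercase bag-of-words vectors."""
    v1, v2 = Counter(kp1.lower().split()), Counter(kp2.lower().split())
    dot = sum(count * v2[term] for term, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def best_match_average(src: List[str], dst: List[str]) -> float:
    """Average over the KPs in src of the highest PM score against any KP in dst."""
    return sum(max(passage_matching(a, b) for b in dst) for a in src) / len(src)


def kp_similarity(kps_d1: List[str], kps_d2: List[str]) -> float:
    """Placeholder article similarity (assumption): symmetric best-match average.

    kps_d1 / kps_d2 hold one KP per out-link citation in d1 / d2 (O_d1, O_d2)."""
    if not kps_d1 or not kps_d2:
        return 0.0
    return (best_match_average(kps_d1, kps_d2) + best_match_average(kps_d2, kps_d1)) / 2
```

Unlike BC, this kind of score can be positive even when d1 and d2 cite entirely different references, as long as the passages discussing those citations use similar wording.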
15
Empirical Evaluation
16
The Data
Two sets of articles
– Highly related biomedical articles: for each gene-disease pair, collect the biomedical articles that biomedical experts selected to annotate the pair (as recorded in DisGeNET)
– Near-miss biomedical articles (articles that are not highly related): for each gene-disease pair, collect articles using two queries, “g NOT d” and “d NOT g”, where g is the gene and d is the disease
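Building the two near-miss queries for one gene-disease pair is straightforward. In this sketch, wrapping multi-word terms in double quotes (PubMed-style phrase search) is an assumption not stated on the slide, and the BRCA1 / breast cancer pair is purely illustrative, not necessarily one of the 53 pairs used.

```python
from typing import List


def _term(text: str) -> str:
    """Quote multi-word terms for phrase search (assumption, not from the slide)."""
    return f'"{text}"' if " " in text else text


def near_miss_queries(gene: str, disease: str) -> List[str]:
    """Return the queries "g NOT d" and "d NOT g" for one gene-disease pair."""
    g, d = _term(gene), _term(disease)
    return [f"{g} NOT {d}", f"{d} NOT {g}"]


print(near_miss_queries("BRCA1", "breast cancer"))
# ['BRCA1 NOT "breast cancer"', '"breast cancer" NOT BRCA1']
```

Articles returned by these queries mention only one member of the pair, which is what makes them plausible-looking but not highly related candidates.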
17
Data statistics
– 53 gene-disease pairs
– 10,119 articles, including 53 targets and 10,066 candidates
– Including their out-link citations, 313,571 articles were involved in the experiment
18
The Systems to Be Evaluated
(1) Linear fusion (BC+KPC): the similarity between two articles was the weighted sum of the similarity values produced by BC and KPC
(2) Sequence fusion (BC KPC): BC was used to rank articles first, and KPC was used only when two articles had the same BC similarity
(3) Machine-learning-based fusion (BC+KPC_SVM): a support vector machine (SVM) was employed to fuse BC and KPC
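The first two fusion strategies can be sketched as ranking functions over precomputed BC and KPC scores for the candidates of one target article. The slide says only "weighted sum", so giving weight b to BC and 1 - b to KPC is an assumption, as are the toy scores; in practice the two scores would likely need to be on comparable scales before linear fusion. The SVM-based fusion is not sketched.

```python
from typing import Dict, List


def linear_fusion(bc: Dict[str, float], kpc: Dict[str, float], b: float) -> List[str]:
    """Rank candidates by b * BC + (1 - b) * KPC (weight assignment is an assumption)."""
    return sorted(bc, key=lambda art: b * bc[art] + (1 - b) * kpc[art], reverse=True)


def sequence_fusion(bc: Dict[str, float], kpc: Dict[str, float]) -> List[str]:
    """Rank by BC; KPC matters only when two candidates have the same BC score."""
    return sorted(bc, key=lambda art: (bc[art], kpc[art]), reverse=True)


# Hypothetical similarity scores of three candidates to one target article
bc = {"a": 0.2, "b": 0.2, "c": 0.5}
kpc = {"a": 0.1, "b": 0.9, "c": 0.0}
print(linear_fusion(bc, kpc, b=0.5))  # ['b', 'c', 'a']
print(sequence_fusion(bc, kpc))       # ['c', 'b', 'a']
```

Sequence fusion is implemented as a lexicographic sort on the (BC, KPC) pair, so KPC only reorders the tie between "a" and "b"; linear fusion lets a strong KPC score lift "b" above "c".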
19
Evaluation Criterion
MAP (Mean Average Precision)
– If a ranker ranks the articles that are highly related to r higher, the average precision (AP) for the gene-disease pair will be higher
– MAP is simply the average of the AP values over all gene-disease pairs
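A minimal sketch of AP and MAP as described above. It assumes each gene-disease pair contributes one ranked list of candidates, each marked highly related (True) or not (False), and that every highly related article appears somewhere in the list, so dividing by the number of relevant hits equals dividing by the total number of highly related articles. The rankings are hypothetical.

```python
from typing import List


def average_precision(ranked_relevance: List[bool]) -> float:
    """AP of one ranking: mean of precision@k at each rank k holding a relevant article."""
    hits, precision_sum = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[bool]]) -> float:
    """MAP: the mean of AP over all rankings (one per gene-disease pair)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings for two gene-disease pairs (True = highly related)
pairs = [[True, False, True], [False, True]]
print(round(average_precision(pairs[0]), 4))   # 0.8333
print(round(mean_average_precision(pairs), 4))  # 0.6667
```

In the first ranking, precision is 1/1 at rank 1 and 2/3 at rank 3, giving AP = (1 + 2/3) / 2; moving the second relevant article up to rank 2 would raise AP to 1.0.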
20
Results
The performance difference between BC and KPC was not statistically significant
When the integration weight (b) was 0.5 or 0.8, BC+KPC performed significantly better than BC
21
All the integrated systems performed significantly better than BC
BC+KPC with the integration weight (b) set to 0.5 performed best; however, it did not perform significantly better than the other two integrated systems
22
Conclusion
23
It is essential but challenging to retrieve highly related biomedical references
Key passages that comment on out-link citations can provide helpful information for retrieving highly related articles
The idea of KPC should be
– fused with other similarity measures, and
– incorporated into search engines