Retrieval of Highly Related Biomedical References by Key Passages of Citations
Rey-Long Liu
Dept. of Medical Informatics, Tzu Chi University, Taiwan

Outline
– Background
– Problem definition
– The proposed technique: KPC
– Empirical evaluation
– Conclusion

Background

Highly Related Biomedical References
Two references are highly related if they share similar core contents, including:
– Goals, methods, and findings

Example (figure): references selected by biomedical experts are highly related to each other, whereas other references recommended by PubMed are not highly related to them.

Retrieval of the Highly Related References
It is essential for biomedical researchers, database managers, and text mining systems
– Cross-validation of highly related evidence
But it is also challenging
– The core contents of the references are difficult to recognize and extract

Problem Definition

Goal & Contribution
Goal
– Given a reference r, retrieve the references that are highly related to r
Contribution
– Develop a technique, KPC (Key Passage of Citations), that estimates inter-article similarity based on the key passages of the out-link citations in each article

Basic Idea
Out-link citations in an article x can indicate how the core content of x is related to other studies
However, two highly related articles may cite different references
→ The passages that discuss the out-link citations can be used to estimate inter-article similarity
(Figure: article x with its out-link citations and in-link citations)

Related Work
Inter-article similarity based on out-link citations
– Example: bibliographic coupling (BC)
– Weakness: two highly related articles may cite different references
Inter-article similarity based on in-link citations
– Examples: co-citation (CC) and context passages around in-link citations
– Weaknesses: (1) many articles have very few (or even no) in-link citations; (2) in-link citations are difficult to collect
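To make the two baseline measures concrete, here is a minimal Python sketch of their unnormalized forms, which simply count shared items. The function names, article IDs, and the `cited_by` index are hypothetical, and the slides do not say whether the BC used in the evaluation was normalized.

```python
from typing import Dict, Set

def bibliographic_coupling(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Unnormalized BC: number of out-link citations shared by two articles."""
    return len(refs_a & refs_b)

def co_citation(a: str, b: str, cited_by: Dict[str, Set[str]]) -> int:
    """Unnormalized CC: number of articles that cite both a and b.

    `cited_by` maps an article ID to the IDs of the articles citing it
    (its in-link citations), which must be collected beforehand.
    """
    return len(cited_by.get(a, set()) & cited_by.get(b, set()))

# Hypothetical toy data: x and y cite disjoint reference lists, so BC is 0
# no matter how closely related their contents are.
x_refs, y_refs, z_refs = {"p1", "p2", "p3"}, {"p4", "p5"}, {"p2", "p3", "p6"}
print(bibliographic_coupling(x_refs, z_refs))  # 2
print(bibliographic_coupling(x_refs, y_refs))  # 0

# An article with no in-link citations ("w") always gets CC = 0.
cited_by = {"x": {"c1", "c2"}, "y": {"c2"}, "w": set()}
print(co_citation("x", "y", cited_by))  # 1
print(co_citation("x", "w", cited_by))  # 0
```

The zero scores in the toy data correspond to the two weaknesses listed on the slide: disjoint reference lists for BC, and missing in-link citations for CC.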

The Proposed Technique: KPC

Extraction of Key Passages
The key passage (KP) of a citation c in an article x consists of two parts:
– The title of x (general context of the citation)
– The text immediately before each position p where c is mentioned in x (specific context of the citation)
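The two-part KP definition can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's extractor: citations are assumed to appear as bracketed numbers such as "[3]" or "[3, 7]", and "text immediately before" a mention is assumed to be the preceding `window` words. Both the marker format and the `window` parameter are hypothetical, since the slides specify neither.

```python
import re
from typing import Dict, List

CITE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")  # assumed marker format, e.g. "[3]" or "[3, 7]"

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_key_passages(title: str, body: str, window: int = 10) -> Dict[str, List[str]]:
    """Map each citation label in `body` to its KP as a list of tokens.

    Assumptions (not from the slides): citations are bracketed numbers, and the
    "text immediately before" a mention is the `window` words preceding it.
    """
    kps: Dict[str, List[str]] = {}
    for m in CITE.finditer(body):
        # Drop earlier citation markers so their numbers are not read as words.
        preceding = CITE.sub(" ", body[: m.start()])
        context = tokenize(preceding)[-window:] if window > 0 else []
        for label in re.split(r"\s*,\s*", m.group(1)):
            # Part 1: the title of the citing article (general context).
            kp = kps.setdefault(label, tokenize(title))
            # Part 2: the text before this mention (specific context).
            kp.extend(context)
    return kps

kps = extract_key_passages(
    "BRCA1 variants in breast cancer",
    "Mutations in BRCA1 increase breast cancer risk [1]. A later cohort confirmed this [1, 2].",
    window=5,
)
print(kps["2"])
# ['brca1', 'variants', 'in', 'breast', 'cancer', 'a', 'later', 'cohort', 'confirmed', 'this']
```

A citation mentioned several times (label "1" above) collects the context before every mention, matching the "each position p" wording of the definition.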

Estimation of Inter-Article Similarity
(Figure: the KPs of the out-link citations in d1 are matched against the KPs of the out-link citations in d2)

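More specifically, the similarity between two articles d1 and d2 is given by a formula (shown on the slide) over their out-link citations, where $O_{d_1}$ and $O_{d_2}$ are the sets of out-link citations in d1 and d2, respectively, and PM (Passage Matching) is the matching score between two KPs, given by a second formula (also shown on the slide).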
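The transcript does not include the actual similarity and PM formulas, so the following Python sketch is a hypothetical instantiation only, not the author's definition. It assumes PM is the cosine similarity between term-frequency vectors of two KPs, and that article similarity averages, over $O_{d_1}$ and $O_{d_2}$, each KP's best match in the other article. Inputs are dictionaries mapping citation labels to KP token lists.

```python
import math
from collections import Counter
from typing import Dict, List

def passage_match(kp1: List[str], kp2: List[str]) -> float:
    """Hypothetical PM: cosine similarity between term-frequency vectors."""
    v1, v2 = Counter(kp1), Counter(kp2)
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

def kpc_similarity(kps1: Dict[str, List[str]], kps2: Dict[str, List[str]]) -> float:
    """Hypothetical aggregation over O_d1 x O_d2 (not the slide's formula)."""
    if not kps1 or not kps2:
        return 0.0
    pm = [[passage_match(a, b) for b in kps2.values()] for a in kps1.values()]
    best_in_d2 = [max(row) for row in pm]        # each KP of d1 vs. its best match in d2
    best_in_d1 = [max(col) for col in zip(*pm)]  # each KP of d2 vs. its best match in d1
    return (sum(best_in_d2) / len(best_in_d2) + sum(best_in_d1) / len(best_in_d1)) / 2

# d1 and d2 cite different references (BC would be 0), but their KPs overlap.
d1 = {"r1": ["brca1", "breast", "cancer"]}
d2 = {"r9": ["brca1", "breast", "tumor"]}
print(round(kpc_similarity(d1, d2), 3))  # 0.667
```

Whatever the exact formulas are, the toy example shows the motivation stated on the Basic Idea slide: a passage-based score can remain nonzero when two articles share no out-link citations.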

Empirical Evaluation

The Data
Two sets of articles (g: the gene; d: the disease):
– Highly related biomedical articles: for each gene-disease pair, collect the biomedical articles that biomedical experts selected to annotate the pair (as recorded in DisGeNET)
– Near-miss biomedical articles (articles that are not highly related): for each gene-disease pair, collect articles using two queries: "g NOT d" and "d NOT g"
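The construction of the two article sets for one gene-disease pair can be sketched in Python as below. The function names are hypothetical; the parenthesized grouping of query terms is an assumption (the slides give only "g NOT d" and "d NOT g"); and how each target was chosen is not stated, so the sketch assumes it is one of the expert-selected articles and is excluded from its own candidate pool.

```python
from typing import Dict, List, Set

def near_miss_queries(gene: str, disease: str) -> List[str]:
    """The two Boolean queries "g NOT d" and "d NOT g" (grouping is an assumption)."""
    return [f"({gene}) NOT ({disease})", f"({disease}) NOT ({gene})"]

def build_candidate_pool(target: str, expert_ids: Set[str], near_miss_ids: Set[str]) -> Dict[str, int]:
    """Label candidates for one gene-disease pair: 1 = highly related, 0 = near miss.

    Assumptions: the target is one of the expert-selected articles and is
    excluded from its own candidates; an article found by both routes keeps label 1.
    """
    pool = {pid: 0 for pid in near_miss_ids}
    pool.update({pid: 1 for pid in expert_ids})
    pool.pop(target, None)
    return pool

# Hypothetical pair and article IDs.
print(near_miss_queries("APOE", "Alzheimer disease"))
# ['(APOE) NOT (Alzheimer disease)', '(Alzheimer disease) NOT (APOE)']
print(sorted(build_candidate_pool("t", {"t", "a", "b"}, {"b", "c"}).items()))
# [('a', 1), ('b', 1), ('c', 0)]
```

Near misses are useful negatives because they mention the gene or the disease, so they are topically close to the target without being highly related to it.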

Data Statistics
– 53 gene-disease pairs
– 10,119 articles, including 53 targets and 10,066 candidates
– When their out-link citations are also counted, 313,571 articles were involved in the experiment

The Systems to Be Evaluated
(1) Linear fusion (BC+KPC): the similarity between two articles was the weighted sum of the similarity values produced by BC and KPC
(2) Sequence fusion (BC → KPC): BC was used to rank the articles first, and KPC was used only to break ties when two articles had the same BC similarity
(3) Machine-learning based fusion (BC+KPC_SVM): a support vector machine (SVM) was employed to fuse BC and KPC
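The first two fusion strategies can be sketched in Python as follows. This is an assumption-laden illustration: the sketch assumes BC and KPC scores have already been normalized to comparable ranges such as [0, 1], and that the integration weight b multiplies BC (the slides do not say which measure b weights). The SVM-based fusion is not sketched here.

```python
from typing import Dict, List

def linear_fusion(bc: Dict[str, float], kpc: Dict[str, float], b: float = 0.5) -> List[str]:
    """BC+KPC: rank candidates by b * BC + (1 - b) * KPC (weight placement assumed)."""
    score = {a: b * bc.get(a, 0.0) + (1 - b) * kpc.get(a, 0.0) for a in bc.keys() | kpc.keys()}
    return sorted(score, key=lambda a: score[a], reverse=True)

def sequence_fusion(bc: Dict[str, float], kpc: Dict[str, float]) -> List[str]:
    """BC -> KPC: rank by BC; KPC only breaks ties among equal BC scores."""
    ids = bc.keys() | kpc.keys()
    return sorted(ids, key=lambda a: (bc.get(a, 0.0), kpc.get(a, 0.0)), reverse=True)

# Hypothetical, already-normalized scores for three candidates.
bc = {"a": 0.5, "b": 0.5, "c": 0.0}
kpc = {"a": 0.1, "b": 0.9, "c": 0.8}
print(sequence_fusion(bc, kpc))      # ['b', 'a', 'c']
print(linear_fusion(bc, kpc, 0.5))   # ['b', 'c', 'a']
```

The toy scores show how the strategies differ: sequence fusion never lets KPC lift "c" above an article with a higher BC score, whereas linear fusion does.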

Evaluation Criterion
MAP (Mean Average Precision)
– For each gene-disease pair, the higher a ranker places the articles that are highly related to the target r, the higher the average precision (AP) for that pair
– MAP is simply the average of the AP values over all gene-disease pairs
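For reference, here is a minimal Python sketch of the standard AP and MAP definitions. It assumes AP averages precision@k over the ranks k at which highly related articles appear, dividing by the total number of highly related articles so that any not retrieved count as zero; the evaluation's exact implementation is not given on the slides.

```python
from typing import List, Sequence, Set, Tuple

def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """AP for one target: mean of precision@k at each rank k holding a relevant article."""
    hits, total = 0, 0.0
    for k, article in enumerate(ranking, start=1):
        if article in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of the AP values over all (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0

# Two hypothetical gene-disease pairs: AP = (1/1 + 2/3) / 2 and AP = 1/2.
runs = [(["a", "x", "b", "y"], {"a", "b"}), (["x", "a"], {"a"})]
print(round(mean_average_precision(runs), 4))  # 0.6667
```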

Result
– The performance difference between BC and KPC was not statistically significant
– When the integration weight (b) was 0.5 or 0.8, BC+KPC performed significantly better than BC

– All the integrated systems performed significantly better than BC
– BC+KPC with the integration weight (b) set to 0.5 performed best; however, it was not statistically significantly better than the other two integrated systems

Conclusion

It is essential but challenging to retrieve highly related biomedical references
Key passages that comment on out-link citations can provide helpful information for retrieving highly related articles
The idea of KPC should be
– fused with other similarity measures, and
– incorporated into search engines