Proximity-based Ranking of Biomedical Texts
Rey-Long Liu* and Yi-Chih Huang*
*Dept. of Medical Informatics, Tzu Chi University, Taiwan

Outline
– Research background
– Problem definition
– The proposed approach: PRE (A Proximity-based Ranker Enhancer)
– Empirical evaluation
– Conclusion

Research Background

Biomedical Information Need
Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature. Retrieving that evidence requires a system that:
– Accepts a natural language query expressing a biomedical information need, and
– Ranks relevant texts higher for access or further processing

An Example Info Need
Query: urinary tract infection, criteria for treatment and admission (from OHSUMED)
– A disease as the target concept (i.e., urinary tract infection)
– Two concepts describing the scenario of the information need (i.e., treatment and admission); neither is specific to, nor related to, any particular disease

Problem Definition

Goals
– Explore how text rankers may be improved by considering the completeness of query concepts appearing in a nearby area of the text being ranked
– Develop a technique, PRE (Proximity-based Ranker Enhancer), that:
  – Measures the contextual completeness of query concepts appearing in a nearby area of the text, and
  – Serves as a supplement to improve existing rankers

Related Work
– Biomedical text ranking: uses synonyms and considers the diversity of passages, but does not consider term proximity
– Text ranking: individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), neither of which considers term proximity
– Improving ranking by term proximity: term proximity is employed, but contextual completeness is not considered

The Proposed Approach: PRE

System Overview
[Figure: system overview diagram. Recoverable labels: Text Ranker Development (Training), Text Ranking (Testing), Underlying Ranker, PRE, User Query (q), Text (d), TF (Term Frequency) Assessment, TF in d, Training Data, Ranked Texts]
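The overview suggests that PRE does not replace the underlying ranker. Instead, it reassesses the TF of terms in d, and the ranker then uses those values. The sketch below shows that hand-off under stated assumptions. It uses BM25, one of the individual text scoring techniques named under Related Work, as a stand-in underlying ranker; the slides do not say this is the ranker used. The names `bm25`, `raw_tf`, and the `tf_fn` hook are hypothetical, as are the defaults k1 = 1.2 and b = 0.75. The IDF uses the non-negative log((N − n + 0.5)/(n + 0.5) + 1) form. A learning-to-rank ranker, which would consume the Training Data shown in the figure, is not modeled here.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface shared by raw TF and any proximity-revised TF: (term, doc, query) -> TF.
TfFn = Callable[[str, Sequence[str], Sequence[str]], float]


def raw_tf(t: str, doc: Sequence[str], query: Sequence[str]) -> float:
    """Plain term frequency of t in doc; the query is ignored."""
    return float(sum(1 for tok in doc if tok == t))


def bm25(query: Sequence[str], doc: Sequence[str], corpus: Sequence[Sequence[str]],
         tf_fn: TfFn = raw_tf, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of doc for query; tf_fn decides which TF the ranker sees."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for t in dict.fromkeys(query):  # distinct query terms, order kept
        n_t = sum(1 for d in corpus if t in d)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        tf = tf_fn(t, doc, query)  # a revised TF (e.g., from PRE) would be plugged in here
        length_norm = k1 * (1.0 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1.0) / (tf + length_norm)
    return score


if __name__ == "__main__":
    corpus = [["urinary", "tract", "infection", "treatment"], ["heart", "failure", "admission"]]
    query = ["urinary", "tract", "infection", "treatment", "admission"]
    for doc in corpus:
        print(bm25(query, doc, corpus))
```

Because the BM25 term score increases with TF, any TF increment a proximity-based enhancer assigns raises a document's score, while the ranker itself stays unchanged.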

TF Assessment
Three types of term proximity:
– Overall proximity (QtermTF)
– Individual proximity (IndiP)
– Collective proximity (CollP)
A term t may get a large TF increment in d if:
– Many query terms appear frequently in d,
– Query terms are individually near to t at some places, and
– Query terms collectively appear at a place near to t

$$\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$$
$$\mathrm{TFincrement}(t,d,q) = \mathrm{QtermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$$
where QtermTF(d,q) is the total TF of the query terms in d.
$$\mathrm{IndiP}(t,d,q) = \frac{\sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m))}{\mathrm{MaxIndiP}}$$
where M is the set of query terms and Mindist(x,y) is the shortest distance between x and y in d.
$$\mathrm{SigmoidWeight}(dt) = \frac{1}{1 + e^{-((|q|-1) - dt)}}$$
$$\mathrm{CollP}(t,d,q) = \frac{\max_{k \in K} \sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m))}{\mathrm{MaxCollP}}$$
where K is the set of positions at which t appears in d, and dist(t,k,m) is the distance between t (at position k) and m.
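The following is a minimal Python sketch of the RTF formulas above. It rests on several assumptions the slides leave open:
– d is a list of tokens, and distance is the difference in token positions.
– M is the set of distinct query terms, and |q| is their count.
– dist(t,k,m) is the distance from position k to the nearest occurrence of m.
– A query term absent from d contributes nothing.
– MaxIndiP and MaxCollP, which the slides do not define, default to the number of query terms other than t. These are hypothetical normalizers.
The names `rtf`, `positions`, and `sigmoid_weight` are illustrative only. The sigmoid is written in a numerically stable form so that long documents do not overflow `math.exp`.

```python
import math
from collections import defaultdict


def positions(doc):
    """Map each token to the list of positions (indices) where it occurs in doc."""
    pos = defaultdict(list)
    for i, tok in enumerate(doc):
        pos[tok].append(i)
    return pos


def sigmoid_weight(dt, q_len):
    """SigmoidWeight(dt) = 1 / (1 + e^{-((|q| - 1) - dt)}), computed stably."""
    x = (q_len - 1) - dt
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so this cannot overflow
    return e / (1.0 + e)


def rtf(t, doc, query, max_indip=None, max_collp=None):
    """Proximity-revised TF of term t in doc for the given query (sketch)."""
    pos = positions(doc)
    q_terms = list(dict.fromkeys(query))  # assumed M: distinct query terms
    q_len = len(q_terms)                  # assumed |q|
    # Query terms other than t that occur in doc (absent ones are assumed to add nothing).
    others = [m for m in q_terms if m != t and m in pos]
    # Hypothetical normalizers: the slides do not define MaxIndiP / MaxCollP.
    default_norm = max(1, sum(1 for m in q_terms if m != t))
    max_indip = max_indip or default_norm
    max_collp = max_collp or default_norm

    tf = len(pos[t]) if t in pos else 0
    if tf == 0 or not others:
        return float(tf)  # no increment without t or a co-occurring query term

    qterm_tf = sum(len(pos[m]) for m in q_terms if m in pos)  # QtermTF(d, q)

    # IndiP: each other query term contributes the weight of its closest approach to t.
    indip = sum(
        sigmoid_weight(min(abs(i - j) for i in pos[t] for j in pos[m]), q_len)
        for m in others
    ) / max_indip

    # CollP: the best single position k of t, summing the weights of all other terms near k.
    collp = max(
        sum(sigmoid_weight(min(abs(k - j) for j in pos[m]), q_len) for m in others)
        for k in pos[t]
    ) / max_collp

    return tf + qterm_tf * indip * collp


if __name__ == "__main__":
    print(rtf("a", ["a", "b"], ["a", "b"]))  # 1.5
```

With two query terms, SigmoidWeight(1) = 1/(1 + e^0) = 0.5. Both proximity factors in the example are therefore 0.5, and RTF = 1 + 2 × 0.5 × 0.5 = 1.5. The weight drops below 0.5 once a distance exceeds |q| − 1, so the sigmoid favors query terms that fall within a window roughly the size of the query.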

Empirical Evaluation

Experimental Data
OHSUMED: a popular database of biomedical queries and references
– 106 queries
– 348,566 references
– 16,140 query-reference pairs, each judged definitely relevant, possibly relevant, or not relevant

Underlying Rankers

Baseline Ranker Enhancers
Three state-of-the-art techniques that enhance text rankers with term proximity:
– The t-function t() by [Tao & Zhai, 2007]
– The p-function p() by [Cummins & O’Riordan, 2009]
– The proximity language model PLM by [Zhao & Yun, 2009]

Evaluation Criteria
How well relevant references are ranked higher for users to access, measured by:
– Mean average precision (MAP)
– Normalized discounted cumulative gain at rank x (NDCG@x)
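The following is a minimal sketch of the two criteria, built on assumptions the slides do not state:
– OHSUMED's three judgments map to grades 2 (definitely relevant), 1 (possibly relevant), and 0 (not relevant).
– Both grades 1 and 2 count as relevant for MAP.
– The AP denominator is the number of relevant references in the ranked list.
– NDCG uses the common gain 2^g − 1 with a log2(rank + 1) discount, normalized by the ideal ordering of the same judged list.
Other thresholds, gains, and discounts are also in common use. The function names are illustrative.

```python
import math
from typing import Sequence


def average_precision(grades: Sequence[int], threshold: int = 1) -> float:
    """AP over one ranked list; grades[i] is the judgment of the reference at rank i + 1."""
    hits, precision_sum = 0, 0.0
    for rank, g in enumerate(grades, start=1):
        if g >= threshold:  # assumed: grade >= 1 counts as relevant
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]], threshold: int = 1) -> float:
    """MAP: mean of AP over queries (one ranked grade list per query)."""
    return sum(average_precision(r, threshold) for r in runs) / len(runs)


def dcg_at(grades: Sequence[int], x: int) -> float:
    """DCG@x with gain 2^g - 1 and discount log2(rank + 1) (one common variant)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:x], start=1))


def ndcg_at(grades: Sequence[int], x: int) -> float:
    """NDCG@x, normalized by the ideal (descending) ordering of the same judged list."""
    ideal = dcg_at(sorted(grades, reverse=True), x)
    return dcg_at(grades, x) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked = [2, 0, 1, 0]  # grades of the top four references for one query
    print(average_precision(ranked), ndcg_at(ranked, 3))
```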

Results


Conclusion

– Term proximity can be applied broadly to improve various kinds of text rankers.
– It is helpful to integrate three types of term proximity:
  – Overall proximity
  – Individual proximity
  – Collective proximity
– Term proximity information may be encoded to re-assess the TF of each term.