Proximity-based Ranking of Biomedical Texts Rey-Long Liu * and Yi-Chih Huang * Dept. of Medical Informatics Tzu Chi University Taiwan
Outline Research background Problem definition The proposed approach: PRE Empirical evaluation Conclusion A Proximity-based Ranker Enhancer2
Research Background A Proximity-based Ranker Enhancer3
Biomedical Information Need Biomedical research requires relevant evidences in the huge and ever-growing biomedical literature Retrieval of the evidences requires a system that –Accepts a natural language query for a biomedical information need, and –Ranks relevant texts higher for access or processing A Proximity-based Ranker Enhancer4
An Example Info Need Query: urinary tract infection, criteria for treatment and admission (from OHSUMED) –A disease as the target concept (i.e., urinary tract infection ) –Two concepts about the scenario of the information need (i.e., treatment and admission ) Neither special nor related to any disease A Proximity-based Ranker Enhancer5
Problem Definition A Proximity-based Ranker Enhancer6
Goals Explore how text rankers may be improved by considering the completeness of query concepts appearing in a nearby area of the text being ranked Develop a technique PRE (Proximity-based Ranker Enhancer) that –Measures contextual completeness of query concepts appearing in a nearby area in the text –Serves as a supplement to improve existing rankers A Proximity-based Ranker Enhancer7
Related Work Biomedical text ranking –Using synonyms and considering diversity of passages, without considering term proximity Text ranking –Individual text scoring techniques (e.g., BM25) and learning to rank techniques (e.g., Ranking SVM), without considering term proximity Improving ranking by term proximity –Term proximity is employed, but contextual completeness was not considered A Proximity-based Ranker Enhancer 8
The Proposed Approach: PRE A Proximity-based Ranker Enhancer9
System Overview A Proximity-based Ranker Enhancer10 Text Ranker Development Training Testing Underlying RankerPRE Text Ranking TF in d User Query (q) Text (d) TF (Term Frequency) Assessment Training Data Ranked Texts
TF Assessment A Proximity-based Ranker Enhancer11 Three types of term proximity –Overall proximity (QTermTF) –Individual proximity (IndiP) –Collective proximity (CollP) A term t may get a large TF increment in d, if –Many query terms appear frequently in d –Query terms are individually near to t at some places, and –Query terms collectively appear at a place near to t
A Proximity-based Ranker Enhancer12 RTF(t,d,q) = TF(t,d)+TFincrement(t,d,q) TFincrement(t,d,q) = QtermTF(d, q) IndiP(t, d, q) ×CollP(t, d, q) QtermTF(d,q) = Total TF of query terms in d IndiP(t,d,q) =Σ m M - { t } SigmoidWeight(Mindist(t,m))/ MaxIndiP Mindist(x,y) = shortest distance between x and y in d SigmoidWeight(dt) = 1/(1+e -( ( |q|-1)-dt) ) CollP(t,d,q) = Max k K { m M - { t } SigmoidWeight(dist(t,k,m))}/MaxCollP, where K is the set positions at which t appears in d dist(t,k,m) = Distance between t (at position k) and m
Empirical Evaluation A Proximity-based Ranker Enhancer13
Experimental Data OHSUMED –A popular database of biomedical queries and references – 106 queries –348,566 references –16,140 query-reference pairs Definitively relevant Possibly relevant Not relevant A Proximity-based Ranker Enhancer14
Underlying Rankers A Proximity-based Ranker Enhancer15
Baseline Ranker Enhancer Three state-of-the-art techniques that enhanced text rankers by term proximity –The t-function t() by [Tao & Zhai, 2007] –The p-function p() by [Cummins & O’Riordan, 2009] –The proximity language model PLM by [Zhao & Yun, 2009]. A Proximity-based Ranker Enhancer16
Evaluation Criteria Evaluating how relevant references are ranked higher for users to access –Mean average precision (MAP) –Normalized discount cumulative gain at x A Proximity-based Ranker Enhancer17
Results A Proximity-based Ranker Enhancer18
A Proximity-based Ranker Enhancer19
Conclusion A Proximity-based Ranker Enhancer20
Term proximity may be comprehensively applied to improving various kinds of text rankers It is helpful to integrate three types of term proximity –Overall proximity –Individual proximity –Collective proximity Term proximity information may be encoded to re-assess TF of each term A Proximity-based Ranker Enhancer21