Enhancing Biomedical Text Rankers by Term Proximity Information. 劉瑞瓏 (Rey-Long Liu), Department of Medical Informatics, Tzu Chi University (慈濟大學醫學資訊學系), 2012/06/13.

1 Enhancing Biomedical Text Rankers by Term Proximity Information
劉瑞瓏 (Rey-Long Liu), Department of Medical Informatics, Tzu Chi University (慈濟大學醫學資訊學系)
2012/06/13

2 Outline
Background
–Text ranking
–Biomedical information needs
An approach to enhancing text rankers in the biomedical domain
Evaluation
Conclusion

3 Research Background

4 Text Ranking
Goal
–Given a query q and a set T of texts retrieved for q, rank the texts in T according to their degrees of relevance to q
Motivation
–Reducing information overload, since T is often huge even when a smart search engine is used
–Text ranking is a key issue in information retrieval, and often a “secret” component of search engines
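The ranking goal above can be sketched in a few lines of Python. This is a minimal illustration, not the ranker used in the talk: it scores each text in T with a basic BM25 formula (BM25 is named later in the deck as one scoring technique) and sorts by score. The function names, the tokenized-list input format, and the parameter defaults k1=1.2 and b=0.75 are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, texts, k1=1.2, b=0.75):
    """Score each tokenized text against the query with a basic BM25 formula.

    query: list of query terms; texts: list of token lists (the set T).
    k1 and b are common default values, assumed here for illustration.
    """
    if not texts:
        return []
    n = len(texts)
    avg_len = sum(len(tokens) for tokens in texts) / n
    terms = set(query)
    # Document frequency: number of texts that contain each query term.
    df = {term: sum(1 for tokens in texts if term in tokens) for term in terms}
    scores = []
    for tokens in texts:
        tf = Counter(tokens)
        score = 0.0
        for term in terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def rank_texts(query, texts):
    """Return the indices of texts in T, most relevant first."""
    scores = bm25_scores(query, texts)
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)


T = [
    ["urinary", "tract", "infection", "treatment"],
    ["heart", "failure", "admission"],
    ["tract", "anatomy"],
]
ranking = rank_texts(["urinary", "tract", "infection"], T)
```

Note that a scorer of this kind looks only at term counts, so it cannot tell whether the query terms occur close together in a text.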

5 An Example Ranker

6 Biomedical Information Need
Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature
Retrieval of the evidence requires a system that
–Accepts a natural language query for a biomedical information need, and
–Ranks relevant texts higher for access or processing

7 An Example
Query: urinary tract infection, criteria for treatment and admission (from OHSUMED)
–A disease as the target concept (i.e., urinary tract infection)
–Two concepts about the scenario of the information need (i.e., treatment and admission)
 Neither concept is specific to, or related to, any particular disease

8 Contextual Completeness
Biomedical queries need to be well-formed, and so call for a retrieval system that considers the contextual completeness of each query concept t in the text d
–Contextual completeness of t in d is the extent to which the query concepts other than t appear in areas of d near t

9 An Example
In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever? [From Lin & Demner-Fushman, 2006]
[Figure labels: PICO, Task, Answer, Strength]

10 An Approach to Improving Rankers for Biomedical Info Needs

11 Goals
An approach, PRE (Proximity-based Ranker Enhancer), that
–Measures the contextual completeness of query concepts appearing in a nearby area in the text
–Serves as a supplement to improve existing rankers

12 Contrast with Related Work
Biomedical text ranking
–Using synonyms and considering diversity of passages, without considering term proximity
Text ranking
–Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), without considering term proximity
Improving ranking by term proximity
–Term proximity is employed, but contextual completeness is not considered

13 System Overview
[Diagram labels: Text Ranker Development; Training; Testing; Underlying Ranker; PRE; Text Ranking; TF in d; User Query (q); Text (d); TF (Term Frequency) Assessment; Training Data; Ranked Texts]

14 TF Assessment
Three types of term proximity
–Overall proximity (QTermTF)
–Individual proximity (IndiP)
–Collective proximity (CollP)
A term t may get a large TF increment in d if
–Many query terms appear frequently in d,
–Query terms are individually near to t at some places, and
–Query terms collectively appear at a place near to t

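15
\[ \mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q) \]
\[ \mathrm{TFincrement}(t,d,q) = \mathrm{QTermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q) \]
\[ \mathrm{QTermTF}(d,q) = \text{total TF of query terms in } d \]
\[ \mathrm{IndiP}(t,d,q) = \frac{\sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m))}{\mathrm{MaxIndiP}} \]
\[ \mathrm{Mindist}(x,y) = \text{shortest distance between } x \text{ and } y \text{ in } d \]
\[ \mathrm{SigmoidWeight}(dt) = \frac{1}{1 + e^{-((|q|-1) - dt)}} \]
\[ \mathrm{CollP}(t,d,q) = \frac{\max_{k \in K} \sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m))}{\mathrm{MaxCollP}} \]
where K is the set of positions at which t appears in d, and dist(t,k,m) is the distance between t (at position k) and m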
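The formulas on this slide can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions the slide does not settle: distance is the difference between token positions; M is the set of distinct query terms that occur in d; |q| is the number of distinct query terms; dist(t,k,m) uses the occurrence of m nearest to position k; the three factors of TFincrement are multiplied; and MaxIndiP and MaxCollP, which the slide leaves undefined, default to the hypothetical value (|q|-1) × SigmoidWeight(1).

```python
import math
from collections import defaultdict


def sigmoid_weight(dist, q_size):
    """SigmoidWeight(dt) = 1 / (1 + e^-((|q|-1) - dt)), computed stably."""
    x = (q_size - 1) - dist
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # underflows to 0.0 for very distant terms, no overflow
    return e / (1.0 + e)


def rtf(t, d, q, max_indip=None, max_collp=None):
    """RTF(t,d,q) = TF(t,d) + QTermTF * IndiP * CollP for a tokenized text d."""
    q_terms = set(q)
    n = len(q_terms)
    # Positions of every query term in d (the keys form the assumed set M).
    positions = defaultdict(list)
    for i, token in enumerate(d):
        if token in q_terms:
            positions[token].append(i)
    k_positions = [i for i, token in enumerate(d) if token == t]  # the set K
    tf = len(k_positions)
    others = [m for m in positions if m != t]  # M - {t}
    if tf == 0 or not others:
        return float(tf)  # no nearby query context, so no increment

    # Hypothetical normalizer: every other query term adjacent to t.
    default_max = (n - 1) * sigmoid_weight(1, n)
    if max_indip is None:
        max_indip = default_max
    if max_collp is None:
        max_collp = default_max

    def dist_from(k, m):
        """Distance from position k to the nearest occurrence of m."""
        return min(abs(k - j) for j in positions[m])

    qterm_tf = sum(len(p) for p in positions.values())
    # IndiP: each other term counted at its closest approach to any t.
    indip = sum(
        sigmoid_weight(min(dist_from(k, m) for k in k_positions), n)
        for m in others
    ) / max_indip
    # CollP: the single occurrence of t with the best surrounding context.
    collp = max(
        sum(sigmoid_weight(dist_from(k, m), n) for m in others)
        for k in k_positions
    ) / max_collp
    return tf + qterm_tf * indip * collp


q = ["infection", "treatment", "admission"]
near = ["infection", "treatment", "admission"] + ["x"] * 10
far = ["infection"] + ["x"] * 5 + ["treatment"] + ["x"] * 5 + ["admission"]
rtf_near = rtf("infection", near, q)
rtf_far = rtf("infection", far, q)
```

Both example texts contain each query term exactly once, so their raw TF values are identical; only the text in which the terms sit close together receives a noticeable increment.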

17 Empirical Evaluation

18 Experimental Data
OHSUMED
–A popular database of biomedical queries and references
–106 queries
–348,566 references
–16,140 query-reference pairs (definitely relevant, possibly relevant, or not relevant)

19
TREC Genomics 2006
–28 queries (topics) and 27,999 query-passage pairs (definitely relevant, possibly relevant, or not relevant)
–13,993 query-reference pairs
TREC Genomics 2007
–36 queries and 35,996 query-passage pairs (relevant or not relevant)
–22,913 query-reference pairs

20 Underlying Rankers

21 Baseline Ranker Enhancers
Three state-of-the-art techniques that enhance text rankers by term proximity
–The t-function: t() [Tao & Zhai, 2007]
–The p-function: p() [Cummins & O’Riordan, 2009]
–The proximity language model: PLM [Zhao & Yun, 2009]

22 Evaluation Criteria
Evaluating whether relevant references are ranked higher for users to access
–Mean average precision (MAP)
–Normalized discounted cumulative gain at X (NDCG@X)
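Both criteria can be computed from the relevance judgments of a ranked list. The sketch below is a minimal version under stated assumptions, since the slide gives no formulas: average precision assumes every relevant reference for a query appears somewhere in the ranked list (reasonable when a ranker reorders a fixed set of texts), and NDCG uses the common gain 2^rel - 1 with a log2 position discount, with graded judgments coded as 0 (not relevant), 1 (possibly relevant), and 2 (definitely relevant). The function names are chosen for this example.

```python
import math


def average_precision(relevances):
    """Average precision for one query.

    relevances: 0/1 judgments in ranked order, best-ranked first.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: the mean of average precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg_at(gains, x):
    """Discounted cumulative gain over the top x positions."""
    return sum(
        (2 ** g - 1) / math.log2(i + 1)
        for i, g in enumerate(gains[:x], start=1)
    )


def ndcg_at(gains, x):
    """NDCG@X: DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at(sorted(gains, reverse=True), x)
    return dcg_at(gains, x) / ideal if ideal > 0 else 0.0


map_score = mean_average_precision([[1, 0, 1, 0], [0, 1]])
perfect = ndcg_at([2, 1, 0], 3)
reversed_order = ndcg_at([0, 1, 2], 3)
```

MAP treats relevance as binary, whereas NDCG rewards placing definitely relevant references above possibly relevant ones, which is why the two are usually reported together on graded collections.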

23 Results

31 Conclusion

32
Contextual completeness of query concepts in the texts is essential in ranking biomedical texts
To measure contextual completeness, it is helpful to integrate three types of term proximity
–Overall proximity
–Individual proximity
–Collective proximity
Existing rankers may be comprehensively enhanced

33 Thank You!

