
1 Survey Jaehui Park 2008. 07. 17.

2 Introduction
• Members: Jung-Yeon Yang, Jaehui Park, Sungchan Park, Jongheum Yeon
• We are interested in
  – Issues in information retrieval: crawling, indexing, searching, and ranking methods
  – How to process multi-term queries in information retrieval environments
    Ex) "Today", "US Today", "Today Weather", "Paris Today Weather"
    -> Multi-term queries express more complex information needs than single-term queries.
Copyright © 2008 by CEBT

3 Main Topic
• Long Queries in Keyword Search
• Keywords
  – Compound query, Evidence Combination, Phrasal Query, Multi-term Query, Multiple Keyword Search, Multiword Unit, and so on.
• Issues
  – proximity or distance
  – syntactic structure (order)
  – semantic
  – NLP
  – remedies
  – …

4 Proximity
• An intuitive concept for processing multiple-term queries
• Readings
  – Term Proximity Scoring for Keyword-Based Retrieval Systems [ECIR 2003], Yves Rasolofo and Jacques Savoy
  – Efficiency vs. Effectiveness in Terabyte-Scale Information Retrieval [TREC 2005], Stefan Büttcher and Charles L. A. Clarke
  – Efficient Text Proximity Search [SPIRE 2007], Ralf Schenkel, et al.
  – Why Bigger Windows Are Better Than Smaller Ones [TR-UM 1997], Ron Papka and James Allan
  – …

5 Term Proximity Scoring for Keyword-Based Retrieval Systems
Yves Rasolofo and Jacques Savoy
European Conference on IR Research (ECIR) 2003, LNCS 2633
2008. 07. 17.
Presented by Jaehui Park

6 Introduction
• Phrase, term proximity, or term distance in IR
  – Focus on adding a word-pair scoring module
  – Okapi probabilistic model + proximity measurement
• Previous work
  – Salton & McGill [1983]: generating statistical phrases based on word co-occurrence
  – Fagan [1987]: considering syntactic relations or syntactic structures
  – Mitra et al. [1997]: “Once a good basic ranking scheme is used, the use of phrases do not have a major effect on precision at high ranks”
  – Arampatzis et al. [2000]: the lack of success when using NLP techniques in IR
  – Hawking & Thistlewaite [1996]: the use of proximity scoring within the PADRE system (Z-mode method)

7 Okapi
• Okapi [Robertson & Sparck Jones 1976]
  – A function that ranks documents by their relevance to a given search query, based on the probabilistic retrieval model
  – Considers
    – Term frequency
    – Document length
  – The weight for a given term t_i in document d

8 Okapi
• Okapi [Robertson & Sparck Jones 1976] (continued)
  – The weight for the term t_i within a query
  – The retrieval status value (for a document according to a query)
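The formulas for these three quantities did not survive in the transcript. As a rough illustration only, the sketch below uses a widely cited form of Okapi BM25; the exact variant and constants on the slides are not recoverable, so the function names and the defaults `k1=1.2`, `b=0.75`, `k3=1000` are assumptions, not the presenters' values.

```python
import math

def okapi_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Weight of a term in a document: grows with term frequency,
    normalised by document length (k1 and b are assumed defaults)."""
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return (k1 + 1) * tf / (K + tf)

def okapi_query_weight(qtf, n_docs, df, k3=1000.0):
    """Weight of a term within the query: query term frequency
    times an inverse-document-frequency factor (assumed BM25 form)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    return (k3 + 1) * qtf / (k3 + qtf) * idf

def okapi_rsv(query_tf, doc_tf, doc_len, avg_doc_len, n_docs, doc_freq):
    """Retrieval status value: sum over query terms of
    document term weight times query term weight."""
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in doc_freq:
            continue  # term absent from the document contributes nothing
        score += (okapi_term_weight(tf, doc_len, avg_doc_len)
                  * okapi_query_weight(qtf, n_docs, doc_freq[term]))
    return score
```

With these defaults, a term that occurs once in a document of average length gets a document weight of 1, so its contribution reduces to the query weight alone.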

9 Term Proximity Weighting
• Improving retrieval performance by using term proximity scoring
• Assumption
  – If a document contains sentences with at least two query terms in them, the probability that the document is relevant must be greater.
  – The closer the query terms are, the higher the relevance probability.
• Objective
  – Assign more importance to keywords with a short distance between their occurrences.

10 Term Proximity Weighting
• 1. Expand the request (query) using keyword pairs extracted from the query’s wording
• 2. Compute a term pair instance weight
  – “information retrieval”: 1.0
  – “the retrieval of medical information”: 0.11 (1/9)
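The two example weights above are consistent with an instance weight of 1/d², where d is the distance in words between the two terms (d = 1 gives 1.0, d = 3 gives 1/9). A minimal sketch of steps 1 and 2 under that reading follows; the function names, the whitespace tokenisation, and the `max_distance=5` cut-off are assumptions made for illustration.

```python
from itertools import combinations

def query_term_pairs(query_terms):
    """Step 1: build keyword pairs from the query's wording,
    keeping the order in which the terms appear in the query."""
    unique = list(dict.fromkeys(query_terms))  # drop repeats, keep order
    return list(combinations(unique, 2))

def pair_instance_weights(doc_tokens, term_a, term_b, max_distance=5):
    """Step 2: one weight 1/d^2 for every co-occurrence of the two
    terms at most max_distance words apart (cut-off is an assumption)."""
    pos_a = [i for i, tok in enumerate(doc_tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(doc_tokens) if tok == term_b]
    weights = []
    for pa in pos_a:
        for pb in pos_b:
            d = abs(pa - pb)
            if 0 < d <= max_distance:
                weights.append(1.0 / d ** 2)
    return weights

adjacent = pair_instance_weights(
    "information retrieval".split(), "information", "retrieval")
apart = pair_instance_weights(
    "the retrieval of medical information".split(), "information", "retrieval")
print(adjacent, apart)
```

Here `adjacent` holds a single weight of 1.0 and `apart` a single weight of 1/9, matching the two examples on the slide.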

11 Term Proximity Weighting
• 3. Sum all the corresponding term pairs
• 4. Compute the contribution of all occurring term pairs in the document
• 5. Compute the final retrieval status value
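The formulas for steps 3 to 5 are missing from the transcript. The sketch below is one plausible reading, not the slide's own equations: it assumes the summed instance weights of a pair are passed through the same saturation form as an Okapi term weight, that each pair is weighted by the smaller of its two query term weights, and that the proximity score is simply added to the Okapi score. All names and the default `k1=1.2` are hypothetical.

```python
def pair_weight(instance_weights, K, k1=1.2):
    """Steps 3-4: sum the instance weights of one term pair and
    saturate the sum (assumed Okapi-style form; K is the document
    length normalisation factor, assumed positive)."""
    total = sum(instance_weights)
    return (k1 + 1) * total / (K + total)

def proximity_rsv(pair_instances, query_weights, K, k1=1.2):
    """Contribution of all term pairs occurring in the document.
    Each pair is scaled by the smaller of its two query term
    weights (an assumption)."""
    score = 0.0
    for (term_a, term_b), instances in pair_instances.items():
        if not instances:
            continue  # pair never occurs within the distance limit
        score += (pair_weight(instances, K, k1)
                  * min(query_weights[term_a], query_weights[term_b]))
    return score

def final_rsv(okapi_score, pair_instances, query_weights, K, k1=1.2):
    """Step 5: add the proximity score to the Okapi score."""
    return okapi_score + proximity_rsv(pair_instances, query_weights, K, k1)
```

Under these assumptions a document with no nearby query term pairs keeps its Okapi score unchanged, so proximity can only move a document up the ranking.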

12 Experiments
• Test Collections
  – TREC-8 documents (528,155 docs): Financial Times, Federal Register, Foreign Broadcast Information Service, LA Times
  – TREC-9, TREC-10 (1,692,096 docs)

13 Experiments
• Evaluation

14 Experiments
• Evaluation

15 Experiments
• Evaluation

16 Conclusion
• The impact of a new term proximity algorithm on the retrieval effectiveness of keyword-based systems was examined.
  – It improves the ranking of documents in which query term pairs occur within a given distance constraint.
• The term proximity scoring approach
  – Improves precision after retrieving a few documents

