Rutgers’ HARD Track Experiences at TREC 2004 N.J. Belkin, I. Chaleva, M. Cole, Y.-L. Li, L. Liu, Y.-H. Liu, G. Muresan, C. L. Smith, Y. Sun, X.-J. Yuan, X.-M. Zhang


Hypotheses

H1: Assessors with low familiarity with a topic will find that on-topic documents with high readability are relevant, and those with low readability are not relevant; assessors with high familiarity, vice versa.
H2: Assessors with low familiarity with a topic will find that on-topic documents with a low average number of syllables per word are relevant, and those with a high number are not relevant; assessors with high familiarity, vice versa.
H3: Assessors with low familiarity with a topic will find that on-topic documents with a high ratio of concrete to abstract words are relevant, and those with a low ratio are not relevant; assessors with high familiarity, vice versa.
H4: The subjectivity or objectivity of a document will determine its membership in a genre; modifying the ranks of documents according to how well they fit the desired genre will increase performance.
H5: Classification of documents by subjectivity or objectivity using an SVM procedure will lead to better performance than classification using a linear regression model.
H6: Documents of a single genre will have language models characteristic of that genre; the similarity, as measured by the Kullback-Leibler distance between a document's language model and the relevant genre language model, indicates whether that document is a member of the genre; modifying the ranks of documents according to how well they fit the desired genre will increase performance.
H7: Documents of a single genre will include terms characteristic of that genre; re-ranking lists of documents retrieved for a topical query by running a query containing both the topical terms and the relevant genre terms will increase performance.
H8: The vocabulary of documents is specific to the geographic area they refer to; more specifically, US documents can be distinguished from non-US documents.
H9: Names of US states and important US cities are an indication of US documents; names of other countries are an indication of non-US documents.

Hypotheses and the improvement methods used, by metadata type:
Familiarity: readability (Flesch scores), average syllables per word, abstract / concrete words
Genre: regression model, language models, KL distance, query expansion
Geography: language models
(Minimal sketches of these feature computations appear after this section.)

Findings:
Abstract and concrete word frequencies are poor predictors of documents selected for familiarity.
Higher readability scores have some correlation with low-familiarity documents.
Language-model genre prediction performs poorly (e.g. news stories vs. opinion-editorial).
Language-model geography prediction may be promising.

Evaluation of Two Experimental Runs (standard deviations in parentheses; * p<.05, ** p<.01):

            HARD                                        SOFT
            Avg. Prec.    Rel10           R-Prec.       Avg. Prec.    Rel10           R-Prec.
Baseline    (0.1999)      (2.789)         (0.2133)      (0.1841)      (3.020)         (0.1809)
Run 1       ** (0.2065)   2.644* (3.053)  (0.2196)      ** (0.1861)   3.133* (3.209)  ** (0.1888)
Run 2       (0.1983)      (2.852)         (0.2080)      (0.1746)      (3.029)         (0.1666)

Combination of evidence: simple weighted average; Z-scores based combination; k = confidence_level (a sketch of the z-score combination also appears below).

Experimental approach: use an iterative experimental design: test the most important hypotheses first, then extend the experiment. Prefer approaches independent of the training set (readability scores, concreteness/abstractness, style features, etc.) rather than models of the exemplars in the training set.
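
The familiarity hypotheses (H1-H3) rely on simple surface features of a document: a Flesch readability score, the average number of syllables per word, and a concrete-to-abstract word ratio. The following is a minimal Python sketch of how such features could be computed; it is not the track code, and the concrete_words / abstract_words lists are assumed inputs, since the poster does not say which lexicon was used.

import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels and discount a
    # trailing silent 'e'; real syllabification is more involved.
    word = word.lower()
    n = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def familiarity_features(text, concrete_words, abstract_words):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    n_words, n_sents = len(words) or 1, len(sentences) or 1
    syllables = sum(count_syllables(w) for w in words)

    avg_syll_per_word = syllables / n_words                        # H2 feature
    flesch = (206.835 - 1.015 * (n_words / n_sents)
              - 84.6 * (syllables / n_words))                      # H1 feature (Flesch reading ease)
    concrete = sum(w in concrete_words for w in words)
    abstract = sum(w in abstract_words for w in words)
    ratio = concrete / abstract if abstract else float(concrete)   # H3 feature
    return flesch, avg_syll_per_word, ratio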
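
H6 scores a document by the Kullback-Leibler distance between its language model and a genre language model built from exemplar documents. A minimal sketch, assuming Dirichlet smoothing against a corpus background model (the poster does not state which smoothing the runs actually used):

import math
from collections import Counter

def unigram_lm(tokens, background, mu=2000):
    # Dirichlet-smoothed unigram language model over the background vocabulary.
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts.get(w, 0) + mu * background.get(w, 1e-9)) / (total + mu)
            for w in background}

def kl_distance(p, q):
    # KL(p || q); both models share the background vocabulary, so q[w] > 0.
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def genre_score(doc_tokens, genre_tokens, background):
    # Smaller distance => the document's language model is closer to the genre model (H6).
    return kl_distance(unigram_lm(doc_tokens, background),
                       unigram_lm(genre_tokens, background))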
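
H9 uses place names as geographic cues: US state and city names suggest a US document, other country names suggest a non-US document. A toy sketch; the cue sets below are illustrative placeholders, not the gazetteers used in the runs.

# Illustrative cue lists only; the actual lists are not given in the poster.
US_CUES = {"california", "texas", "ohio", "chicago", "boston", "seattle"}
NON_US_CUES = {"france", "germany", "japan", "china", "brazil", "russia"}

def geography_score(tokens):
    # Positive score => evidence for a US document, negative => non-US (H9).
    us_hits = sum(t in US_CUES for t in tokens)
    non_us_hits = sum(t in NON_US_CUES for t in tokens)
    return us_hits - non_us_hits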
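
For combining evidence, each source of scores is normalized before merging, either as a simple weighted average or via z-scores, with the weight k tied to a confidence level. One plausible reading of that scheme, sketched here; the exact weighting used in the runs is not stated on the poster.

from statistics import mean, pstdev

def zscores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def combine(retrieval_scores, metadata_scores, k=0.3):
    # Re-score documents by a weighted sum of z-normalized evidence; k is the
    # weight on the metadata evidence (the poster's "k = confidence_level").
    z_ret, z_meta = zscores(retrieval_scores), zscores(metadata_scores)
    return [(1 - k) * r + k * m for r, m in zip(z_ret, z_meta)]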
Goal
To test techniques for using knowledge about various aspects of the information seeker's context to improve IR system performance. We were particularly concerned with knowledge that could be gained through implicit sources of evidence rather than through explicit questioning of the information seeker; therefore we used topic metadata rather than clarification forms.

Conclusions
Successful: document concreteness and document readability for users with low familiarity with the topic; geography models of non-US documents.
Unsuccessful: genre models based on training topics.
Further exploration: include stopwords in language models, since they may capture document style; smooth language models on the entire corpus rather than on the training documents; use training relevance judgments for validating models rather than for building them.