User Errors in Formulating Queries and IR Techniques to Overcome Them Birger Larsen Information Interaction and Information Architecture Royal School of.

Presentation transcript:

User Errors in Formulating Queries and IR Techniques to Overcome Them
Birger Larsen
Information Interaction and Information Architecture
Royal School of Library and Information Science
Copenhagen, Denmark

2 Outline
- Searching patents
- Information transfer model
- Search problems
- Possible solutions
  - Query errors
  - Proactive and contextual feedback
  - Automatic result classification
  - Clustering and visualization
- Conclusions

3 Cognitive communication system at a given point in time
[Figure. Labels: Generator; Recipient; Current Cognitive State; World Model; Problem Space; State of Uncertainty; Perceived object; Signs (Patent text); Information; Context, Situation A; Context, Situation B; Transformation; Interaction; Interpretation; Cognitive free fall; Information processing stages; Cognitive-Emotional Level of System; Linguistic Level of System; Documents (patents).]
From Ingwersen & Järvelin (2005): The Turn, p. 33

4 Search problems
- It is hard to do good, comprehensive searches
  - Especially in documents as complex as patents
- Most operational systems are Boolean (exact match)
  - Great power but many pitfalls; training needed
- Best match (ranking) systems are available, but mostly for end users
  - May not be adapted very well to patents or take advantage of their special characteristics and potential
  - Often lack the power of Boolean searching

5 Patent searching problems
- Missing and erroneous data
  - Many useful fields, but not all are required or entered correctly
  - Differences across agencies
  - Patent authors may actively try to hide important facts
- Investigators may deal with quite different subject matter from task to task
  - Limited domain knowledge
  - Problems getting an overview of a given topic
- Inventiveness, creativity and care needed

6 Solutions: Handling query errors
- Low-level spell checks may reduce errors significantly
  - e.g., Google's “Did you mean …”
- More advanced error detection techniques may be implemented
  - Can draw on past searcher behaviour, query logs, document and database data, including field-specific information
  - Google Suggest
  - Amazon's patented approach
- Proactive search support
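The low-level spell check mentioned above can be sketched in a few lines. This is only an illustration of the general "Did you mean" idea, not how Google implements it: the vocabulary, the function name `did_you_mean` and the similarity cutoff are all assumptions, and a real system would take its vocabulary from the index or the query log.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; in practice it could be drawn from the index or query log.
VOCABULARY = ["patent", "retrieval", "polymer", "catalyst", "semiconductor"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query string, or None if no term was changed."""
    known = set(vocabulary)
    corrected = []
    changed = False
    for term in query.lower().split():
        if term in known:
            corrected.append(term)
            continue
        # Closest vocabulary term by string similarity, if any is close enough.
        matches = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(term)  # leave unknown terms untouched
    return " ".join(corrected) if changed else None
```

Returning `None` when nothing changes lets an interface show the suggestion only when there is one, rather than silently rewriting the query.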

7 Example: Amazon’s query correction
- Large proportion of erroneous queries
- Wants to give an answer anyway
- Uses contextual user data to correct typos etc.
  - Non-matching terms in multi-term queries are compared to any terms co-occurring with matching terms in the query log
  - Non-matching terms are replaced and used in the query
  - Draws on the power of millions of past queries
  - (Very probably plays a large role in major web search engines)
- Can be extended to include corpus data and temporal aspects
- Might be extended to identify typos/mis-entries in documents
  - At indexing time or interactively at search time
- Based on a US patent
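The co-occurrence idea described on this slide might look roughly as follows. This is a simplified reading of the slide, not the patented method: the function names, the use of `difflib.SequenceMatcher` as the similarity measure and the threshold are assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_cooccurrence(query_log):
    """Map each term to the set of terms it appeared with in past queries."""
    cooc = defaultdict(set)
    for past_query in query_log:
        terms = past_query.lower().split()
        for t in terms:
            cooc[t].update(u for u in terms if u != t)
    return cooc


def correct_query(query, index_terms, cooc, threshold=0.75):
    """Replace non-matching terms with similar terms that co-occurred
    with the matching terms of the same query in the log."""
    terms = query.lower().split()
    matching = [t for t in terms if t in index_terms]
    # Candidate replacements: terms seen together with any matching term.
    candidates = set()
    for m in matching:
        candidates |= cooc.get(m, set())
    candidates &= index_terms
    result = []
    for t in terms:
        if t in index_terms:
            result.append(t)
            continue
        best, best_score = t, threshold
        for c in sorted(candidates):
            score = SequenceMatcher(None, t, c).ratio()
            if score > best_score:
                best, best_score = c, score
        result.append(best)
    return " ".join(result)
```

Note that a single-term query with a typo is left unchanged in this sketch, because there is no matching term to supply context; that is the case where a plain spell check would still be needed.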

8 Solutions: Boolean and best match integration
- Best match and Boolean are already integrated internally in several IR models and systems
  - E.g., InQuery/Lemur, based on inference networks (Rajashekar & Croft, 1995)
- Challenge: designing user-friendly and flexible ways of formulating queries with both perspectives
- Other major IR techniques
  - Relevance feedback (Rocchio, 1971)
  - Latent Semantic Analysis (Dumais, 2004)
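One simple way to combine the two perspectives is to use Boolean terms as a hard filter and best match terms for ranking what survives. The sketch below assumes this filter-then-rank design and a basic TF-IDF style weight; it is not the inference network model used by InQuery, and the `must`/`should` parameter names are invented for the example.

```python
import math
from collections import Counter


def search(docs, must, should):
    """docs: dict of id -> text.
    must: terms that all have to occur (Boolean AND filter).
    should: terms used only to rank the filtered documents."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokenized)
    # Document frequency of every term in the collection.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    results = []
    for d, tokens in tokenized.items():
        present = set(tokens)
        if not all(t in present for t in must):
            continue  # exact match part: drop documents failing the filter
        tf = Counter(tokens)
        # Best match part: term frequency weighted by term rarity.
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in should if df[t])
        results.append((d, score))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```

The searcher keeps the precision of the Boolean filter but gets the remaining documents in ranked order instead of as an unordered set.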

9 Solutions: Proactive and contextual feedback
- Context-aware solutions that attempt to give situation-specific advice or present additional options
  - Indicate potential query errors (typos and syntax)
  - Suggest additional search terms
  - Suggest useful actions or moves, e.g., propose co-authors to already entered authors
- Draw on knowledge about typical tasks, semantic tools, corpus and log data
- The right support at the right time
[Figure] From Schaefer et al. (2005)
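The co-author suggestion mentioned above can be illustrated with corpus data alone. This is a minimal sketch under assumed inputs (a list of author lists, one per record), and `suggest_coauthors` is a hypothetical helper, not part of DAFFODIL.

```python
from collections import Counter


def suggest_coauthors(entered, records, limit=3):
    """Suggest authors who most often share a record with the entered authors.

    entered: author names already typed into the query.
    records: list of author lists, one per document in the corpus."""
    entered_set = set(entered)
    counts = Counter()
    for authors in records:
        if entered_set & set(authors):
            counts.update(a for a in set(authors) if a not in entered_set)
    # Most frequent co-authors first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:limit]]
```

The same counting pattern could be applied to other fields, for example suggesting terms that frequently accompany the entered search terms.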

10 Solutions: Automatic result classification
- Partition large result sets to give a better overview
  - Apply various text classification techniques to the full text
  - Use patent classification
- Indicate relevant parts of patents
  - Structured document retrieval (e.g., INEX)
  - Combine with semantic knowledge of patent composition
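Partitioning by existing patent classification is the simplest of these options. The sketch below assumes each result carries a list of IPC-style codes and groups on the first four characters (the subclass level); the data layout and the `unclassified` bucket are assumptions for the example.

```python
from collections import defaultdict


def partition_by_class(results, prefix_len=4):
    """Group patent ids by classification code prefix.

    results: list of (patent_id, [classification codes]).
    A patent with several codes appears in several groups;
    a patent without codes goes into 'unclassified'."""
    groups = defaultdict(list)
    for patent_id, codes in results:
        keys = {code[:prefix_len] for code in codes} or {"unclassified"}
        for key in sorted(keys):
            groups[key].append(patent_id)
    return dict(groups)
```

Keeping an explicit `unclassified` group matters here, since missing or erroneous field data is one of the patent search problems named earlier.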

11 Solutions: Clustering and visualization
- Cluster and visualize large numbers of patents on the fly
- Provide a better overview
- Challenges in implementation, e.g., labels
[Figure] From the ‘Aureka’ system © Thomson Scientific
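To make the clustering and labelling steps concrete, here is a deliberately small sketch. It assumes greedy single-pass clustering on Jaccard term overlap and labels each cluster with its most widely shared terms; this is a stand-in chosen for brevity, not the technique used by the Aureka system, and the threshold is arbitrary.

```python
from collections import Counter


def jaccard(a, b):
    """Share of terms two term sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed term set, member ids)
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        for seed, members in clusters:
            if jaccard(terms, seed) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((terms, [doc_id]))
    return [members for _, members in clusters]


def label(members, docs, size=2):
    """Label a cluster with the terms that occur in most of its documents."""
    counts = Counter()
    for m in members:
        counts.update(set(docs[m].lower().split()))
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:size]]
```

Even in this toy form the labelling challenge is visible: frequent terms are easy to compute but are not necessarily the terms a patent searcher would find meaningful.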

12 Conclusions
- Patent search problems
  - Complex documents, data and query errors, vocabulary mismatch, information overload
- Many existing IR techniques can be adapted and combined to alleviate these
  - Make use of patent characteristics, e.g., structure and fields
- Challenge to combine these into integrated systems and useful interfaces
- Input needed from industry partners
  - Tasks, search problems, data deficiencies, query logs, test persons and test cases

13 References
Dumais, S. (2004): Latent Semantic Analysis. In: Cronin, B., ed. Annual Review of Information Science and Technology, vol. 38, 2004.
Ingwersen, P. and Järvelin, K. (2005): The Turn - Integration of Information Seeking and Retrieval in Context. Springer. xiv, 448 p. (The Information Retrieval Series; 18)
Ortega, R. E. & Bowman, D. E. (2002): System and Method for Correcting Spelling Errors in Search Queries Using both Matching and Non-matching Terms. US patent.
Rajashekar, T. B. and Croft, W. B. (1995): Combining Automatic and Manual Index Representations in Probabilistic Retrieval. Journal of the American Society for Information Science, 46(4).
Rocchio, J. J. (1971): Relevance feedback in information retrieval. In: Salton, G., ed. The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs, NJ: Prentice Hall. (Prentice-Hall series in automatic computation)
Schaefer, A., Jordan, M., Klas, C.-P. & Fuhr, N. (2005): Active Support for Query Formulation in Virtual Digital Libraries: a case study with DAFFODIL. In: Rauber, A., Christodoulakis, S. & Tjoa, A. M., eds. Research and Advanced Technology for Digital Libraries, 9th European Conference, ECDL 2005, Vienna, Austria, September 18-23, 2005, Proceedings. Berlin: Springer. (LNCS 3652)