
1 User Errors in Formulating Queries and IR Techniques to Overcome Them
Birger Larsen
Information Interaction and Information Architecture
Royal School of Library and Information Science
Copenhagen, Denmark

2 Outline
- Searching patents
  - Information transfer model
  - Search problems
- Possible solutions
  - Query errors
  - Proactive and contextual feedback
  - Automatic result classification
  - Clustering and visualization
- Conclusions

3 Cognitive communication system at a given point in time
[Figure: diagram of information transfer between a Generator and a Recipient. Labels in the diagram: World Model; Problem Space; State of Uncertainty; Current Cognitive State; Perceived object; Signs (Patent text); Information; Context, Situation A; Context, Situation B; Transformation; Interaction; Cognitive free fall; Information processing stages; Interpretation; Cognitive-Emotional Level of System; Linguistic Level of System; Documents (patents).]
From Ingwersen & Järvelin (2005): The Turn, p. 33

4 Search problems
- Basically hard to do good, comprehensive searches
  - Especially in documents as complex as patents
- Most operational systems are Boolean (exact match)
  - Great power but many pitfalls; training needed
- Best match (ranking) systems are available, but mostly for end users
  - May not be adapted very well to patents or take advantage of their special characteristics and potential
  - Often lack the power of Boolean searching

5 Patent searching problems
- Missing and erroneous data
  - Many useful fields, but not all required or entered correctly
  - Differences across agencies
  - Patent authors may actively try to hide important facts…
- Investigators may deal with quite different subject matter from task to task
  - Limited domain knowledge
  - Problems getting an overview of a given topic
- Inventiveness, creativity and care needed

6 Solutions: Handling query errors
- Low-level spell checks may reduce errors significantly
  - e.g., Google's "Did you mean …"
- More advanced error detection techniques may be implemented
  - Can draw on past searcher behaviour, query logs, document and database data, including field specific information
  - Google Suggest (http://www.google.com/webhp?complete=1)
  - Amazon patented approach
- Proactive search support
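The low-level spell check mentioned on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `did_you_mean`, the toy vocabulary and the similarity cutoff of 0.8 are all hypothetical, and the sketch uses Python's standard `difflib` string similarity, not whatever method Google uses.

```python
from difflib import get_close_matches


def did_you_mean(query, vocabulary, cutoff=0.8):
    """Return a corrected query string, or None if no term was changed.

    `vocabulary` is an assumed list of terms known to the index
    (for example, terms extracted from the patent database).
    """
    known = set(vocabulary)
    corrected = []
    changed = False
    for term in query.lower().split():
        if term in known:
            corrected.append(term)
            continue
        # Pick the single most similar known term, if it is similar enough.
        candidates = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
        if candidates:
            corrected.append(candidates[0])
            changed = True
        else:
            corrected.append(term)  # leave unknown terms untouched
    return " ".join(corrected) if changed else None


# Hypothetical index vocabulary and a query containing a typo.
vocabulary = ["catalytic", "converter", "exhaust", "turbine"]
suggestion = did_you_mean("catalitic converter", vocabulary)
```

A real system would probably weight candidates by how often they occur in the collection or in query logs, which this sketch does not attempt.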

7 Example: Amazon's query correction
- Large proportion of erroneous queries
  - Wants to give an answer anyway
- Use contextual user data to correct typos etc.
  - Non-matching terms in multi-term queries are compared to any terms co-occurring with matching terms in the query log
  - Non-matching terms are replaced and used in the query
  - Draws on the power of millions of past queries
  - (Very probably plays a large role in major web search engines)
- Can be extended to include corpus data and temporal aspects
- Might be extended to identify typos/mis-entries in documents
  - At indexing time or interactively at search time
Based on US patent 6144958
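The replacement step described on this slide can be sketched roughly as follows. This is a simplified reading of the slide text, not the patented method itself: the function names, the toy query log, the use of `difflib` for spelling similarity and the 0.7 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_cooccurrence(query_log):
    """Map each term to the set of terms it has co-occurred with in past queries."""
    related = defaultdict(set)
    for past_query in query_log:
        terms = set(past_query.lower().split())
        for term in terms:
            related[term] |= terms - {term}
    return related


def correct_query(query, index_terms, related, min_similarity=0.7):
    """Replace non-matching terms with similar terms that co-occur with matching ones."""
    terms = query.lower().split()
    matching = [t for t in terms if t in index_terms]

    # Candidate replacements: terms seen together with the matching terms
    # in the log, restricted to terms that actually occur in the index.
    candidates = set()
    for t in matching:
        candidates |= related.get(t, set())
    candidates &= index_terms

    corrected = []
    for t in terms:
        if t in index_terms or not candidates:
            corrected.append(t)
            continue
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda c: SequenceMatcher(None, t, c).ratio())
        if SequenceMatcher(None, t, best).ratio() >= min_similarity:
            corrected.append(best)
        else:
            corrected.append(t)
    return " ".join(corrected)


# Hypothetical query log and index vocabulary.
query_log = ["wind turbine blade", "turbine blade coating", "wind turbine"]
index_terms = {"wind", "turbine", "blade", "coating"}
related = build_cooccurrence(query_log)
fixed = correct_query("wind turbin", index_terms, related)
```

Because candidates come only from terms that co-occur with the matching part of the query, a single-term query with a typo is left unchanged in this sketch; that mirrors the slide's restriction to multi-term queries.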

8 Solutions: Boolean and Best Match integration
- Best match and Boolean already integrated internally in several IR models and systems
  - E.g., InQuery/Lemur based on inference networks; Rajashekar & Croft (1995), http://www.lemurproject.org/
- Challenge to design user-friendly and flexible ways of formulating queries with both perspectives
- Other major IR techniques
  - Relevance Feedback; Rocchio (1971)
  - Latent Semantic Analysis; Dumais (2004)
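One very simple way to combine the two perspectives is to use Boolean conditions as a filter and best-match scoring to rank what survives. The sketch below illustrates only that idea; it is not how InQuery or Lemur work (they use inference networks), and the `search` function, its `must` / `must_not` / `should` parameters, the TF-IDF-style weighting and the toy documents are all assumptions.

```python
import math
from collections import Counter


def search(docs, must=(), must_not=(), should=()):
    """Boolean filter on `must` / `must_not`, then rank by a TF-IDF-style score on `should`."""
    term_counts = {doc_id: Counter(text.lower().split())
                   for doc_id, text in docs.items()}
    n_docs = len(docs)

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        df = sum(1 for counts in term_counts.values() if term in counts)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    results = []
    for doc_id, counts in term_counts.items():
        if any(t not in counts for t in must):
            continue  # exact-match requirement failed
        if any(t in counts for t in must_not):
            continue  # excluded term present
        score = sum(counts[t] * idf(t) for t in should)
        results.append((doc_id, score))

    # Highest score first; document id breaks ties deterministically.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical mini-collection of patent abstracts.
docs = {
    "p1": "rotor blade for wind turbine",
    "p2": "gas turbine blade cooling",
    "p3": "wind sensor housing",
}
ranked = search(docs, must=["turbine"], should=["wind"])
filtered = search(docs, must=["turbine"], must_not=["gas"], should=["wind", "blade"])
```

The design choice here is that the searcher keeps the precision of exact-match conditions while the ranked terms only reorder the result set, never shrink it.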

9 Solutions: Proactive and contextual feedback
- Context aware solutions that attempt to give situation specific advice or present additional options
  - Indicate potential query errors (typos and syntax)
  - Suggest additional search terms
  - Suggest useful actions or moves, e.g., propose co-authors to already entered authors
- Draw on knowledge about typical tasks, semantic tools, corpus and log data
- The right support at the right time
From Schaefer et al. (2005)

10 Solutions: Automatic result classification
- Partition large result sets → better overview
  - Apply various text classification techniques on the full text
  - Use patent classification
  - Example: http://www.clusty.com/
- Indicate relevant parts of patents
  - Structured document retrieval (e.g., INEX, http://inex.is.informatik.uni-duisburg.de/)
  - Combine with semantic knowledge of patent composition
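Partitioning a result set by patent classification could look roughly like the sketch below. It assumes each hit carries a list of IPC-style classification codes and groups hits by the first four characters of each code (the subclass level); the function name, the result format and the patent identifiers are hypothetical.

```python
from collections import defaultdict


def partition_by_class(results, prefix_length=4):
    """Group patent ids by classification-code prefix, largest group first.

    `results` is an assumed list of (patent_id, [classification codes]) pairs.
    A patent with codes in several classes appears in several groups.
    """
    groups = defaultdict(list)
    for patent_id, codes in results:
        # De-duplicate prefixes so a patent is listed once per group.
        for prefix in sorted({code[:prefix_length] for code in codes}):
            groups[prefix].append(patent_id)
    # Order groups by size (descending), then by code for stable output.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return dict(ordered)


# Hypothetical result set with IPC-style codes.
results = [
    ("PAT-001", ["F03D 1/06", "F03D 7/02"]),
    ("PAT-002", ["F03D 1/06", "B64C 11/16"]),
    ("PAT-003", ["B29C 70/30"]),
]
overview = partition_by_class(results)
```

Such groups could then be shown as facets next to the result list, so that the searcher sees at a glance which technical areas a large result set covers.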

11 Solutions: Clustering and visualization
- Cluster and visualize large amounts of patents on the fly
  - Provide better overview
- Challenges in implementation
  - e.g., labels
[Figure: screenshot from the 'Aureka' system © Thomson Scientific]

12 Conclusions
- Patent search problems
  - Complex documents, data and query errors, vocabulary mismatch, information overload
- Many existing IR techniques can be adapted and combined to alleviate these
  - Make use of patent characteristics, e.g., structure and fields
  - Challenge to combine these into integrated systems and useful interfaces
- Input needed from industry partners
  - Tasks, search problems, data deficiencies, query logs, test persons and test cases

13 References
- Dumais, S. (2004): Latent Semantic Analysis. In: Cronin, B. ed. Annual Review of Information Science and Technology, vol. 38, 189-230.
- Ingwersen, P. & Järvelin, K. (2005): The Turn - Integration of Information Seeking and Retrieval in Context. Springer. xiv, 448 p. (The Information Retrieval Series; 18)
- Ortega, R. E. & Bowman, D. E. (2002): System and Method for Correcting Spelling Errors in Search Queries Using both Matching and Non-matching Terms. US patent 6144958.
- Rajashekar, T. B. & Croft, W. B. (1995): Combining Automatic and Manual Index Representations in Probabilistic Retrieval. Journal of the American Society for Information Science, 46(4), 272-283.
- Rocchio, J. J. (1971): Relevance feedback in information retrieval. In: Salton, G. ed. The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs, NJ: Prentice Hall, p. 313-323. (Prentice-Hall series in automatic computation)
- Schaefer, A., Jordan, M., Klas, C.-P. & Fuhr, N. (2005): Active Support for Query Formulation in Virtual Digital Libraries: a case study with DAFFODIL. In: Rauber, A., Christodoulakis, S. & Tjoa, A. M. eds. Research and Advanced Technology for Digital Libraries, 9th European Conference, ECDL 2005, Vienna, Austria, September 18-23, 2005, Proceedings. Berlin: Springer, 414-425. (LNCS 3652)

