The Status of Retrieval Evaluation in the Patent Domain. Mihai Lupu, Vienna University of Technology / Information Retrieval Facility. PaIR - Oct 24th, 2011 – Glasgow, UK

Outline Introduction Practice – In the IP community – In the IR community Discussion

Evaluation for the patent domain – High recall: a single missed document can invalidate a patent – Session-based: a single search may involve days of cycles of results review and query reformulation – Defensible: process and results may need to be defended in court

Introduction – IR Evaluation: an "efficient and effective system". Time and space: efficiency – generally constrained by pre-development specification (e.g. real-time answers vs. batch jobs, index-size constraints) – easy to measure. Good results: effectiveness – harder to define → more research into it. And…
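To make "effectiveness" concrete, here is a minimal sketch (not from the talk) of the two most basic effectiveness measures referred to throughout the remaining slides; the document ids are invented.

def precision_recall(retrieved, relevant):
    """Standard set-based effectiveness measures.

    retrieved: list of document ids returned by the system
    relevant:  set of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 3 of the 5 returned documents are relevant,
# but 2 relevant documents were missed entirely.
p, r = precision_recall(["d1", "d2", "d3", "d4", "d5"],
                        {"d1", "d3", "d5", "d7", "d9"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.60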

Introduction – user studies – Does a 2% increase in some retrieval performance measure actually make a user happier? – Does displaying a text snippet improve usability even if the underlying method is 10% weaker than some other method? – Hard to do – Mostly anecdotal examples – IR people don’t like to do it (though it’s starting to change)

Introduction – user studies Miscommunication is a problem – e.g. "index", "high recall", "evaluation" itself

Practice in the IP world Commercial world – no extensive experimentation – based on practice and experience – highly competitive and yet often collaborative – no one tool is ever declared the best – source of articles: the World Patent Information journal

Example of evaluation [Emmerich:2009] – a case-study analysis in pharma – compares first-level patent data with value-added patent information (Chemical Abstracts and Thomson Scientific) – background: value-added patent information sources are the incumbents here

Case study 1 A prior-art search for pantoprazole focusing on worldwide literature – of particular interest: product protection, manufacturing processes, formulation/combination patents, methods of disease treatment

Case study 1 Search strategy: a structure-based strategy considering specific and Markush structures, as well as a keyword-based strategy. A broad collection of keywords was searched, comprising the generic name pantoprazole, various brand names and development codes. Chemical names were deliberately excluded from the search.

Case study 1 Results – Value-added search: 587 inventions; of these, each source had at least 3.6% unique results (one had 19.6%); overlap: 68.8% – Full-text search: 1097 inventions; not found: 117 inventions; new: 651 inventions

Case study 1 Analysis – comparison of precision: value-added data <1% false hits, full-text search >30% non-relevant – why the results differ: for value-added data – procedural differences in indexing (not everything is indexed: not all patent documents and not all chemical formulas), coverage; for full-text search – coverage of value-added data vs. full-text search (plus Asian patent publications with no English text), compositions which could be used to deliver this and other drugs, the decision to index only some structures

Case study 1 Analysis – failures of full-text search: key patents cannot be found due to representation as a chemical structure only (potentially part of a Markush structure), non-standardized chemical names, misspellings

Case study 1 Conclusions of this case study – multiple sources need to be used – a set of characteristics of each tool/source. Our conclusions based on this study – only one query – impossible to repeat (not enough details) – evaluation merges collection coverage and query capabilities

Case study 2 [Annies:2009] reviews – search and display capabilities of e-journal search sites – value for full-text prior-art analysis. Target data/systems – e-journal publishers' websites – note: many were discarded from the beginning: "many search sites are not useful for professional prior art searching, since they lack advanced search and display features critical for fast and effective retrieval and evaluation"

Case study 2 Minimum set of requirements: 1. the site should cover a large number of e-journals; 2. provide advanced search options (e.g. at least Boolean logic with wildcards); 3. provide advanced display features (e.g. at least search-keyword highlighting). Out of 200 sites available to the author, 4 fulfilled these 3 basic requirements

Case study 2 Search features analysis – how the query can be defined – search by fields? – other features: date filtering, phrase searching, stemming, wildcards, citation search, proximity operators. Display features analysis – keyword highlighting in different colors based on concepts. Other features – save/history options – RSS feeds and search alerts – open access – chemical structure search – none of the 4 – 2 of the

Case study 2 Conclusions of this case study – e-journal search options offered by publishers are insufficient for professional search. Why? – patent information professionals search for rather hidden information – they apply more complex search strategies for comprehensive retrieval – the following aspects were found problematic: limited search and display features; the spread of journals across non-cooperating publishers

Case study 2 Our conclusions on this case study – absolutely no mention of search effectiveness – the starting point is a predefined wish list – "evaluation" is all-encompassing (from coverage, to search, to display)

Other evaluations Community-based – e.g. Intellogist, PIUG wiki, LinkedIn groups – evaluation is done "in the wild" – experiences are shared

Other evaluations LinkedIn groups

Practice in the IR world It looks at: 1. effectiveness of the core engine; 2. repeatability of experiments; 3. statistical significance of experiments; […] 20. user interface

Practice in the IR world Organize large evaluation campaigns – TREC – CLEF – NTCIR – INEX – FIRE – …

Evaluation campaigns Two types of "interesting" campaigns – those which use patent data and simulate patent search – those which evaluate IR features identified as useful by patent professionals, e.g. session-based search, relevance feedback

CLEF-IP Topics – patent documents. 2009: topic = a mixture of all documents pertaining to a patent – the wrong form. 2010: topic = an application document – better. Selection process: a defined topic pool (recent documents); textual content must be present in the publication; the topic patent must have at least 3 citations in its search report

CLEF-IP Relevance assessments – different degrees of relevance: from the applicant – less important; from the search report (examiner) – important; from the opposition procedure (competitor) – most important – by definition incomplete
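As an illustration of how such graded assessments could be encoded, here is a hypothetical sketch in TREC-style qrels format; the numeric grades, topic ids and document ids are assumptions for illustration, not the official CLEF-IP values.

# Hypothetical encoding of CLEF-IP-style graded relevance as TREC-format
# qrels lines ("topic iteration docno grade").
GRADE_BY_SOURCE = {
    "applicant": 1,    # citation supplied by the applicant (least important)
    "examiner": 2,     # citation from the examiner's search report
    "opposition": 3,   # citation from an opposition procedure (most important)
}

citations = [  # (topic id, cited patent, where the citation came from)
    ("PAC-1", "EP-0123456-A1", "applicant"),
    ("PAC-1", "EP-0234567-B1", "examiner"),
    ("PAC-1", "WO-2001-004321", "opposition"),
]

for topic, doc, source in citations:
    print(topic, 0, doc, GRADE_BY_SOURCE[source])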

CLEF-IP Evaluation procedure and measures – pretty much the same as in all other IR evaluation campaigns – one new measure introduced in 2010: PRES [Magdy:2010] – recall-oriented: lenient on runs which return lots of relevant documents even if not highly ranked, hard on systems which return only a few relevant documents at the top
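A rough sketch of how PRES could be computed, following the usual reading of [Magdy:2010] in which relevant documents not retrieved within the cut-off N_max are assumed to sit just after it; treat the details as a paraphrase of the published formula rather than reference code.

def pres(ranks_of_relevant, n_rel, n_max):
    """Patent Retrieval Evaluation Score (PRES), after Magdy & Jones (2010).

    ranks_of_relevant: 1-based ranks at which relevant docs were retrieved
                       within the first n_max results
    n_rel:             total number of relevant documents for the topic
    n_max:             maximum number of results the searcher will examine

    Missing relevant documents are assumed to appear just after the n_max
    cut-off -- the "worst case" placement the metric is built on.
    """
    found = sorted(r for r in ranks_of_relevant if r <= n_max)
    missing = n_rel - len(found)
    # assumed ranks for the relevant documents that were not retrieved
    ranks = found + [n_max + i for i in range(1, missing + 1)]
    mean_rank = sum(ranks) / n_rel
    return 1.0 - (mean_rank - (n_rel + 1) / 2.0) / n_max

# All relevant docs at the very top -> 1.0; none retrieved -> 0.0
print(pres([1, 2, 3], n_rel=3, n_max=100))  # 1.0
print(pres([], n_rel=3, n_max=100))         # 0.0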

NTCIR The first evaluation campaign to have a patent-related task, in 2002 – test collection[s]: 2 years of full text (JP), 5 years of abstracts (JP) and 5 years of abstracts (EN) – topics created manually from news articles, all in 5 languages (JP, EN, KO, traditional/simplified CN) – 6 for training and 25 for testing – graded relevance (4 levels). 2003/2004 – first invalidity search campaign (similar to prior art) – results had to order passages of the document by importance to the query – human evaluations again – 7 training topics, 103 test topics (34 manually evaluated, 69 based on the search report) – topic = a JP patent application rejected by the JPO – topics translated into EN and simplified CN

NTCIR NTCIR-5 – document retrieval, passage retrieval, classification. NTCIR-6 – JP retrieval, EN retrieval, classification. NTCIR-7 – classification of research papers in the IPC. NTCIR-8 – same as 7 + trend map creation. NTCIR-9 – no longer retrieval, but a patent MT task

Evaluation campaigns & users Different levels of user involvement, based on subjectivity level: 1. relevant/non-relevant assessments – used largely in lab-like evaluation as described before; 2. user satisfaction evaluation. Some work on 1., very little on 2. – user satisfaction is very subjective, UIs play a major role, and search dissatisfaction can be a result of the non-existence of relevant documents

RF evaluation campaign TREC Relevance Feedback track, run from 2008 to 2010. 2008 concentrated on the algorithm itself – participants were given the same sets of judged docs and used their own algorithms to retrieve new docs. 2009 concentrated on finding good sets of docs to base the retrieval on – each participant submitted one or two sets of 5 documents for each topic, and 3-5 other participants ran with those docs → a system-independent score of how good those docs were. 2010 focused even more narrowly, on 1 document only (how good it is for RF)
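The track leaves the feedback algorithm to the participants; as one classic example of what such an algorithm can look like (not the track's method), here is a sketch of Rocchio-style query expansion over the judged documents, with invented term weights.

import numpy as np

def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query expansion: move the query vector towards the
    centroid of judged-relevant documents and away from non-relevant ones."""
    q = alpha * query_vec
    if len(rel_vecs):
        q = q + beta * np.mean(rel_vecs, axis=0)
    if len(nonrel_vecs):
        q = q - gamma * np.mean(nonrel_vecs, axis=0)
    return np.clip(q, 0.0, None)  # negative term weights are usually dropped

# Toy term space: [patent, retrieval, chemistry, football]
q0 = np.array([1.0, 1.0, 0.0, 0.0])
rel = np.array([[0.9, 0.8, 0.6, 0.0]])      # one judged relevant doc
nonrel = np.array([[0.1, 0.0, 0.0, 0.9]])   # one judged non-relevant doc
print(rocchio(q0, rel, nonrel))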

Session-based IR evaluation First organized in 2010. Objectives: 1. to see if a system can improve results based on knowledge of a previous query; 2. to evaluate system performance over an entire session rather than with the usual ad-hoc methodology. "A search engine may be able to better serve a user not by ranking the most relevant results to each query in the sequence, but by ranking results that help 'point the way' to what the user is really looking for, or by complementing results from previous queries in the sequence with new results, or in other currently-unanticipated ways."

Session-based IR evaluation 150 query pairs – original query : reformulated query – three types of reformulations: specifications, generalizations, drifting/parallel reformulations. For each query pair, participants submit 3 ranked lists: 1. over the original query; 2. over the reformulated query only; 3. over the reformulated query taking into account the original one
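A small sketch of how the three submitted lists could be compared for a single query pair, using nDCG as an illustrative measure; the official track measures and the judgements below are assumptions, not taken from the talk.

import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_docs, qrels, k=10):
    """nDCG@k with graded judgements (qrels maps doc id -> gain)."""
    gains = [qrels.get(d, 0) for d in ranked_docs[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if ideal else 0.0

# Hypothetical judgements and the three runs required per topic pair
qrels = {"d1": 2, "d2": 1, "d7": 2}
runs = {
    "RL1 original query only":             ["d9", "d1", "d4"],
    "RL2 reformulated query only":         ["d1", "d2", "d5"],
    "RL3 reformulation + session context": ["d1", "d7", "d2"],
}
for name, ranking in runs.items():
    print(f"{name}: nDCG@10 = {ndcg(ranking, qrels):.3f}")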

Outline Introduction Practice – In the IP community – In the IR community Discussion

Discussion We need to be able to convey the focus of an evaluation campaign

Aspects of investigation Low-level features are those features which form the back-end of a system:
– Type of index refers to comparisons between different indices, such as: full-text indices (e.g. vector-space models, simple inverted lists), manually created semantic indices (e.g. Chemical Abstracts Services), automatically created semantic indices (e.g. based on information extraction techniques), non-textual indices (e.g. chemical formulas, images). A cell on this row will be coloured if the evaluation compared at least two different types of indices (preferably on the same data). It shall also be coloured if the comparison involves the same index, but with different parameters or scoring functions.
– Query types refers to the choices offered to the user in inputting his or her request for information. This is related to the types of indices available, but several query types can be compared for the same index. A cell on this row shall be coloured if the comparison involves two or more query types for the same or for different indices.
– Cross-language search refers to comparisons between systems allowing the query and the index to be in different languages. A cell on this row shall be coloured if the evaluation involves at least two systems which translate either the query or the documents to be indexed. Note that in general this will imply different index types and/or queries for the different languages.

Aspects of investigation High-level features are those features which are important for the system as a whole, but do not relate directly to its retrieval functionality:
– Coverage refers to comparisons of two or more systems based on the size of the collection they are able to index and search. A cell in this row shall be coloured if the two systems have different sets of documents indexed, but also if an arbitrary decision has been taken to index only portions of the documents (e.g. an index which contains only the claims of the patents vs. another one which uses the same technology but also indexes the abstract and description).
– Interface refers to comparisons between the graphical user interfaces of two or more systems. A cell in this row shall be coloured if observations are made with respect to any aspect involving the user's interaction with the system (e.g. help in formulating the query, search-session-related options, enhanced document displays).
– Filtering refers to the existence or non-existence of possibilities to filter data based on aspects not necessarily present in the documents. We refer here mostly to legal status information. A cell in this row shall therefore be coloured if a distinction is made with regard to the possibility of filtering the results based on the jurisdiction where, or the date when, a patent is in force.
– Costs refers to comparisons where the monetary costs of using the systems are mentioned. A cell on this row shall be coloured if direct cost implications are observed.
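One way to make the proposed grid concrete is to encode it as a small aspect-by-evaluation matrix; the sketch below uses hypothetical assignments of aspects to the studies discussed earlier and prints an "X" where a cell would be coloured.

# Hypothetical encoding of the proposed comparison grid: rows are the
# aspects defined on the previous two slides, columns are evaluation
# studies/campaigns; which aspects each evaluation covers is my rough
# reading of the earlier slides, for illustration only.
ASPECTS = ["type of index", "query types", "cross-language",   # low-level
           "coverage", "interface", "filtering", "costs"]      # high-level

evaluations = {
    "Emmerich:2009": {"type of index", "coverage"},
    "Annies:2009":   {"query types", "interface", "coverage"},
    "CLEF-IP 2010":  {"type of index", "query types"},
}

print("aspect".ljust(16) + "".join(name.ljust(16) for name in evaluations))
for aspect in ASPECTS:
    row = aspect.ljust(16)
    row += "".join(("X" if aspect in covered else ".").ljust(16)
                   for covered in evaluations.values())
    print(row)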

Examples

Another dimension: patent searches and risk Risk ~ money (invested / to lose); amount of risk → resources committed → expected precision and recall. [Chart: risk axis over the search types State-of-the-Art, Patentability, Freedom-to-operate, Validity (Trippe:2011)]

Risk and Recall Higher risk does not require higher recall – validity requires only one document to be found – freedom-to-operate is the top recall requester: miss a document → face prosecution and lose the investment. [Chart: recall axis over the search types State-of-the-Art, Patentability, Freedom-to-operate, Validity (Trippe:2011)]

Risk and Precision The two match almost completely. [Chart: precision axis over the search types State-of-the-Art, Patentability, Freedom-to-operate, Validity (Trippe:2011)]

Conclusion IR evaluation for patents is two-fold: holistic – usability, commercial utility; not repeatable, case-study based. Component-focused – repeatable, statistically significant; unconvincing to end-users. The two are not in competition – there are initial steps towards each other. Differences MUST be explicitly expressed and understood

Thank you

Effectiveness evaluation Lab-like vs. user-focused. "Do User Preferences and Evaluation Measures Line Up?" – SIGIR 2010: Sanderson, Paramita, Clough, Kanoulas. Results are mixed: some experiments show correlations, some do not; this latest article shows the existence of correlations – user preference is inherently user-dependent – domain-specific IR will be different