TREC-CHEM The TREC Chemical IR Track Mihai Lupu 1, John Tait 1, Jimmy Huang 2, Jianhan Zhu 3 1 Information Retrieval Facility 2 York University 3 University.

TREC-CHEM: The TREC Chemical IR Track
Mihai Lupu (1), John Tait (1), Jimmy Huang (2), Jianhan Zhu (3)
(1) Information Retrieval Facility, (2) York University, (3) University College London
Network of excellence co-funded by the 7th Framework Programme of the European Commission, grant agreement number

Agenda
Introduction
"Prior Art" Task (PA)
"Technology Survey" Task (TS)
Conclusions

Motivation
Increased awareness on the part of industry and regulatory authorities
  – Particularly in human-related chemistry (pharma and cosmetics)
  – Particularly in IP-related contexts
Increased availability of data and metadata
Demands from professional users that differ from those in other evaluation campaigns

Partners
Collaboration
  – National Institute of Standards and Technology (US)
  – University College London (UK)
  – York University (Canada)
Support from
  – Royal Society of Chemistry
  – Open access publishers
  – Experts in the field
With the participation of
  – Research groups

Aims
Assess the available Chemical Retrieval tools
Generate interest among research groups for this domain
Stimulate participation from industry
Generate new Chemical Retrieval tools, at the intersection of chemoinformatics and text-mining

Data
2 collections
2009
  – 1.2 million patent documents
  – 50k scientific articles
  – text only
2010
  – 1.3 million patent documents
  – 172k scientific articles
  – text, images, structure information available

2010 Data
Patent data
  – Addition of WIPO patents
  – Addition of attachments (images, structure data)
Scientific articles
  – 3-fold increase, with attachments
  – Large mass from PubMed
  – Some directly from open access publishers: IUCr Journals, Oxford Publishers, Hindawi Publishers, MPCI

2010 Data
Patent data across IPC classes
  – Organic Chemistry
  – Medical or Veterinary science; Hygiene
  – Organic macromolecular compounds
  – BioChemistry
  – Physical or chemical processes or apparatus in general
  – Dyes; Paints; Polishes…
  – Petroleum; Gas…

Tasks
Technology Survey (TS)
  – Search for all potentially relevant documents, in both patents and scientific articles
  – 30 manually defined and evaluated topics
Prior Art (PA)
  – Search for patents that may invalidate a given patent
  – 1000 automatically created and evaluated topics (1000 patent files)

PA topics
Tagline: recreate the citation list created by the patent examiner
topic = patent application document
evaluation based on
  – applicant’s citations
  – examiner’s report
  – opposition citations (if any)
only patent corpus used
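The way PA relevance judgments follow from citations can be sketched roughly as below. This is a minimal illustration, assuming each citation source is available as a mapping from topic ID to cited patent IDs; the function name `build_pa_qrels` and all IDs are hypothetical, and the track's actual qrels generation may have differed.

```python
from typing import Dict, Iterable, Set


def build_pa_qrels(
    applicant: Dict[str, Iterable[str]],
    examiner: Dict[str, Iterable[str]],
    opposition: Dict[str, Iterable[str]],
    corpus_ids: Set[str],
) -> Dict[str, Set[str]]:
    """Union the three citation sources per topic, keeping only corpus patents."""
    qrels: Dict[str, Set[str]] = {}
    for source in (applicant, examiner, opposition):
        for topic_id, cited in source.items():
            # Only documents in the patent corpus can be judged relevant,
            # and a topic is never relevant to itself.
            kept = {doc for doc in cited if doc in corpus_ids and doc != topic_id}
            qrels.setdefault(topic_id, set()).update(kept)
    return qrels


# Hypothetical toy data: WO-300 is cited but is not part of the corpus.
applicant = {"PA-1": ["EP-100", "US-200"]}
examiner = {"PA-1": ["US-200", "WO-300"]}
opposition: Dict[str, Iterable[str]] = {}
corpus = {"EP-100", "US-200"}

qrels = build_pa_qrels(applicant, examiner, opposition, corpus)
print(sorted(qrels["PA-1"]))
```

Because the judgments come from existing citation records, no manual assessment is needed, which is what makes 1000 topics feasible.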

PA topics

TS topics
topic = natural language information request
evaluation done manually by
  – junior evaluators (students, others)
  – senior evaluators (topic creators)
both patent and scientific articles requested

TS topics – example
TS-23: Titanium tetrafluoride for improving dental health
Titanium tetrafluoride can be used to prevent dental caries or tooth decay along with other fluoride-containing compounds. We are specifically looking for the use of titanium tetrafluoride for improving dental health or preventing decay.
titanium tetrafluoride tooth decay
A document will be considered RELEVANT if it refers to the use of titanium tetrafluoride for improving dental health, including caries or tooth decay.
A document will be considered HIGHLY RELEVANT when it is RELEVANT and it refers to the use of titanium tetrafluoride within a product such as toothpaste or mouthwash.

TS topics – example
TS-47: Structure Search
We are looking for patents and papers on use of the chemical described in TS-47.mol and TS-47.png for treating dementia.
A document will be considered RELEVANT if it refers to the use of chemical X for treating dementia.
There are no HIGHLY RELEVANT documents.

Participants
13 participants registered to download the data
PA
  – 4 groups submitted 10 runs
  – BiTeM Geneva, York University, Fraunhofer SCAI, Iowa University
TS
  – 2 groups submitted 12 runs
  – BiTeM Geneva, York University

Methods
Basic Probabilistic Model, Language Model and Vector Space Model
  – Different sections, weights on each section
  – BM25
Additional filtering/weighting based on IPC codes
Linguistic processing
  – Emphasis on noun phrases (NP)
Concept-based search
  – Query expansion
  – Using Oscar3, MeSH
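As a rough illustration of two of the ideas listed on this slide, section-weighted BM25 and IPC-based filtering, the sketch below scores each patent section separately, combines the scores with per-section weights, and then keeps only patents in a given IPC subclass. The slides do not say how participants set weights or applied IPC codes, so the weights, the parameter defaults, the function names, and the toy patents are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

K1, B = 1.2, 0.75  # widely used BM25 defaults; an assumption, not from the slides


def bm25_scores(query: List[str], docs: List[List[str]]) -> List[float]:
    """Plain BM25 score of each tokenised document for the query terms."""
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1.0 - B + B * len(d) / avg_len)
            score += idf * tf[term] * (K1 + 1.0) / norm
        scores.append(score)
    return scores


def rank_patents(
    query: List[str],
    patents: List[Dict[str, str]],
    section_weights: Dict[str, float],
    ipc_subclass: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Weighted sum of per-section BM25 scores, optionally filtered by IPC subclass."""
    totals = [0.0] * len(patents)
    for section, weight in section_weights.items():
        tokenised = [p[section].lower().split() for p in patents]
        for i, s in enumerate(bm25_scores(query, tokenised)):
            totals[i] += weight * s
    ranked = sorted(zip(patents, totals), key=lambda pair: pair[1], reverse=True)
    if ipc_subclass is not None:
        # Hard filter: drop patents outside the topic's IPC subclass.
        ranked = [(p, s) for p, s in ranked if p["ipc"].startswith(ipc_subclass)]
    return [(p["id"], s) for p, s in ranked]


# Hypothetical toy collection.
patents = [
    {"id": "P1", "ipc": "A61K 8/21", "title": "fluoride toothpaste",
     "claims": "a toothpaste comprising titanium tetrafluoride"},
    {"id": "P2", "ipc": "C09D 5/00", "title": "titanium paint",
     "claims": "a paint comprising titanium dioxide"},
    {"id": "P3", "ipc": "A61K 6/00", "title": "dental cement",
     "claims": "a cement for tooth repair"},
]
query = "titanium tetrafluoride tooth".split()
ranked = rank_patents(query, patents, {"title": 2.0, "claims": 1.0}, ipc_subclass="A61K")
print([pid for pid, _ in ranked])
```

A hard IPC filter is only one option; the slide also mentions weighting, which would rescale scores for matching classes instead of discarding the rest.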

Methods
The addition of non-text data did not impact the methods
  – only 2 TS topics were purely structure based
TODO
  – define interesting structure-based topics
  – find ways to solve them

Evaluation – PA topics
[Figure: citation diagram linking the topic patent, the documents it cites (D), their family members, and the sibling patents F1, F2, F3 through "cites" relations]
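One plausible reading of this slide is that relevant documents include not only what the topic patent cites directly, but also what its family siblings cite, together with the family members of every cited document. The sketch below implements that reading only; the expansion rule, the function name `expand_citations`, and the IDs are assumptions and have not been checked against the track's actual procedure.

```python
from typing import Dict, Set


def expand_citations(
    topic: str,
    cites: Dict[str, Set[str]],
    family: Dict[str, Set[str]],
) -> Set[str]:
    """Collect citations of the topic and its siblings, plus cited documents' families."""
    # The topic patent together with its family siblings (F1, F2, ...).
    sources = {topic} | family.get(topic, set())
    relevant: Set[str] = set()
    for src in sources:
        for cited in cites.get(src, set()):
            relevant.add(cited)
            # Family members of a cited document count as relevant too.
            relevant |= family.get(cited, set())
    # Members of the topic's own family are not relevant to the topic.
    return relevant - sources


# Hypothetical toy data.
cites = {"T": {"D1", "D2"}, "F1": {"D3"}}
family = {"T": {"F1", "F2"}, "D1": {"D1b"}}

expanded = expand_citations("T", cites, family)
print(sorted(expanded))
```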

Evaluation PA topics – qrels

Evaluation TS topics
  – Because participation was low, the pooling method might have produced biased results
  – However, we still wanted to provide feedback to the 2 participating groups
  – Evaluated 6 topics: TS-21, TS-23, TS-30, TS-35, TS-36 and TS-43
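For readers unfamiliar with pooling, the sketch below shows the standard idea: the documents sent to assessors for a topic are the union of the top-ranked documents from every submitted run. The pool depth, the run names, and the document IDs are hypothetical; the slides do not state the depth or sampling scheme actually used.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, Dict[str, List[str]]], depth: int) -> Dict[str, Set[str]]:
    """Union the top `depth` documents of every run, per topic."""
    pool: Dict[str, Set[str]] = {}
    for rankings_by_topic in runs.values():
        for topic, ranking in rankings_by_topic.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


# Hypothetical runs from two groups for one topic.
runs = {
    "runA": {"TS-23": ["d1", "d2", "d3"]},
    "runB": {"TS-23": ["d2", "d4", "d5"]},
}

pool = build_pool(runs, depth=2)
print(sorted(pool["TS-23"]))
```

With only two groups contributing, a relevant document that neither group retrieves near the top never reaches an assessor, which is the source of the bias the slide mentions.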

Evaluation – TS interface
[Figure: TS topics interface]

Evaluation – TS interface
[Figure: TS topics interface]

Evaluation TS topics – qrels
[Table: one row per evaluated TS topic, with columns Topic, #pooled, #sampled, #relevant, #highly relevant, #non relevant; cell values missing]

Results – Prior Art Task

Results – TS Task

Results – TS Task

Conclusions & Outlook
This year, more than the last, was a dry run for the next campaign
Fixed test collection
24 TS topics still to use next year
Main objective for 2011
  – More collaboration between structure-based search and text-mining

Thank you
Questions?