Applying the KISS Principle with Prior-Art Patent Search Walid Magdy Gareth Jones Dublin City University CLEF-IP, 22 Sep 2010.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Information Retrieval (IR) on the Internet. Contents  Definition of IR  Performance Indicators of IR systems  Basics of an IR system  Some IR Techniques.
Chapter 5: Introduction to Information Retrieval
PRES A Score Metric for Evaluating Recall- Oriented IR Applications Walid Magdy Gareth Jones Dublin City University SIGIR, 22 July 2010.
WWW 2014 Seoul, April 8 th SNOW 2014 Data Challenge Two-level message clustering for topic detection in Twitter Georgios Petkos, Symeon Papadopoulos, Yiannis.
Section Based Relevance Feedback Student: Nat Young Supervisor: Prof. Mark Sanderson.
Overview of Collaborative Information Retrieval (CIR) at FIRE 2012 Debasis Ganguly, Johannes Leveling, Gareth Jones School of Computing, CNGL, Dublin City.
Search Engines and Information Retrieval
DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT MAYURI UMRANIKAR.
Modern Information Retrieval
Information Retrieval in Practice
Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Evaluating the Performance of IR Sytems
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
INEX 2003, Germany Searching in an XML Corpus Using Content and Structure INEX 2003, Germany Yiftah Ben-Aharon, Sara Cohen, Yael Grumbach, Yaron Kanza,
Important Task in Patents Retrieval Recall is an Important Factor Given Query Patent -> the Task is to Search all Related Patents Patents have Complex.
Chapter 5: Information Retrieval and Web Search
 Official Site: facility.org/research/evaluation/clef-ip-10http:// facility.org/research/evaluation/clef-ip-10.
International Atomic Energy Agency INIS Training Seminar Principles of Information Retrieval and Query Formulation 07 – 11 October 2013 Vienna, Austria.
Search is not only about the Web An Overview on Printed Documents Search and Patent Search Walid Magdy Centre for Next Generation Localisation School of.
Overview of the PATENTSCOPE® search service Jerusalem 21 July 2010 Alex Riechel Associate Officer, Innovation and Technology Support Section.
Search Engines and Information Retrieval Chapter 1.
TREC 2009 Review Lanbo Zhang. 7 tracks Web track Relevance Feedback track (RF) Entity track Blog track Legal track Million Query track (MQ) Chemical IR.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
JASS 2005 Next-Generation User-Centered Information Management Information visualization Alexander S. Babaev Faculty of Applied Mathematics.
Retrieving Food Science & Technology Information Elizabeth Lee & Mak Jie Ying 5 Aug 2015.
DCU meets MET: Bengali and Hindi Morpheme Extraction Debasis Ganguly, Johannes Leveling, Gareth J.F. Jones CNGL, School of Computing, Dublin City University,
Building a Domain-Specific Document Collection for Evaluating Metadata Effects on Information Retrieval Walid Magdy, Jinming Min, Johannes Leveling, Gareth.
Building a Domain-Specific Document Collection for Evaluating Metadata Effects on Information Retrieval Walid Magdy, Jinming Min, Johannes Leveling, Gareth.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
A Study on Query Expansion Methods for Patent Retrieval Walid MagdyGareth Jones Centre for Next Generation Localisation School of Computing Dublin City.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
Genetic Learning for Information Retrieval Andrew Trotman Computer Science 365 * 24 * 60 / 40 = 13,140.
MIRACLE Multilingual Information RetrievAl for the CLEF campaign DAEDALUS – Data, Decisions and Language, S.A. Universidad Carlos III de.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Chapter 6: Information Retrieval and Web Search
Multilingual Retrieval Experiments with MIMOR at the University of Hildesheim René Hackl, Ralph Kölle, Thomas Mandl, Alexandra Ploedt, Jan-Hendrik Scheufen,
Mining Topic-Specific Concepts and Definitions on the Web Bing Liu, etc KDD03 CS591CXZ CS591CXZ Web mining: Lexical relationship mining.
UA in ImageCLEF 2005 Maximiliano Saiz Noeda. Index System  Indexing  Retrieval Image category classification  Building  Use Experiments and results.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
1 01/10/09 1 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Overview of the INFILE track at CLEF 2009 multilingual INformation FILtering Evaluation.
Talk Schedule Question Answering from Bryan Klimt July 28, 2005.
Information Retrieval at NLC Jianfeng Gao NLC Group, Microsoft Research China.
A Repetition Based Measure for Verification of Text Collections and for Text Categorization Dmitry V.Khmelev Department of Mathematics, University of Toronto.
Basics of Information Retrieval and Query Formulation Bekele Negeri Duresa Nuclear Information Specialist.
Performance Measurement. 2 Testing Environment.
1 Evaluating High Accuracy Retrieval Techniques Chirag Shah,W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.
The Cross Language Image Retrieval Track: ImageCLEF Breakout session discussion.
INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents Esaú Villatoro-Tello Manuel Montes-y-Gómez Luis Villaseñor-Pineda Language Technologies.
1 13/05/07 1/20 LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit The INFILE project: a crosslingual filtering systems evaluation campaign Romaric.
Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst.
Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach Yih-Cheng Chang Department of Computer Science and Information.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
CS798: Information Retrieval Charlie Clarke Information retrieval is concerned with representing, searching, and manipulating.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Analysis of Experiments on Hybridization of different approaches in mono and cross-language information retrieval DAEDALUS – Data, Decisions and Language,
UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.
1 INFILE - INformation FILtering Evaluation Evaluation of adaptive filtering systems for business intelligence and technology watch Towards real use conditions.
Information Retrieval in Practice
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Information Retrieval (in Practice)
Walid Magdy Gareth Jones
Experiments for the CL-SR task at CLEF 2006
Applying Key Phrase Extraction to aid Invalidity Search
Text Categorization Document classification categorizes documents into one or more classes which is useful in Information Retrieval (IR). IR is the task.
Are You Smarter Than a 5th Grader?
Advanced search techniques in databases
Introduction to Information Retrieval
Information Retrieval and Web Design
Presentation transcript:

Applying the KISS Principle with Prior-Art Patent Search Walid Magdy Gareth Jones Dublin City University CLEF-IP, 22 Sep 2010

DCU participation in CLEF-IP 2009 The more text, the better the results Structured search does not help Filtering helps Combination of terms and phrases does better Word matching for search is not the best Blind relevance feedback is ineffective Part of the answer is within the question

KISS Keep It Simple and Straightforward Three submitted simple runs: 1. IR run (simple search) 2. Cit run (straightforward citation extraction) 3. IR+Cit run (combine IR and Cit runs) Evaluation results (25 submitted runs): 1. IR run (3 rd in recall) 2. Cit run (1 st in precision) 3. IR+Cit run (2 nd in MAP, recall, and PRES)

IR run Different document versions of a patent are merged Only English parts are indexed (title, abstract, description, and claims) Query is constructed from the same fields as follows: - unigrams with freq>2 from “description” field - bigrams with freq>3 from all fields French and German topics are translated using Google translation 1 st three levels of classification are used to filter results

Cit and IR+Cit runs All patents IDs are extracted from description section in patent topics IDs that do not exist in collection are filtered out Remaining IDs are considered as relevant documents Only 771 out of 2,005 topics could have citations extracted from its text (2,307 citations) IR run is appended to Cit run after removing duplicates to create IR+Cit run

Results Run IR Cit IR+Cit

Conclusion & Future Work When simpler approaches achieve better results than sophisticated ones: Much research is still needed in this area Extracted citations can be useful for relevance feedback Better translations can be used for FR/DE topics Faster translation techniques can be used to translate FR/DE documents

Simply, Thank you this was the K KK KISS principle with patent search