Applying the KISS Principle with Prior-Art Patent Search
Walid Magdy, Gareth Jones
Dublin City University
CLEF-IP, 22 Sep 2010
DCU participation in CLEF-IP 2009
- The more text, the better the results
- Structured search does not help
- Filtering helps
- Combination of terms and phrases does better
- Word matching for search is not the best
- Blind relevance feedback is ineffective
- Part of the answer is within the question
KISS: Keep It Simple and Straightforward
Three simple runs submitted:
1. IR run (simple search)
2. Cit run (straightforward citation extraction)
3. IR+Cit run (combination of the IR and Cit runs)
Evaluation results (25 submitted runs):
1. IR run (3rd in recall)
2. Cit run (1st in precision)
3. IR+Cit run (2nd in MAP, recall, and PRES)
IR run
- Different document versions of a patent are merged
- Only English parts are indexed (title, abstract, description, and claims)
- Query is constructed from the same fields as follows:
  - unigrams with freq>2 from the "description" field
  - bigrams with freq>3 from all fields
- French and German topics are translated using Google Translate
- The first three levels of classification are used to filter results
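The query construction and classification filter above can be sketched roughly as follows. This is only an illustrative sketch, not the system used for the runs: the tokenizer, the function names `build_query` and `filter_by_classification`, the choice to sum bigram counts across fields, and the assumption that the first four characters of an IPC-style code cover its first three levels are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def build_query(fields):
    """Return (unigrams, bigrams) selected by the slide's frequency thresholds.

    `fields` maps field names (title, abstract, description, claims) to text.
    """
    # Unigrams occurring more than twice in the description field.
    unigram_counts = Counter(tokenize(fields.get("description", "")))
    unigrams = [term for term, count in unigram_counts.items() if count > 2]

    # Bigrams occurring more than three times, counted within each field
    # and summed over all fields (an assumption about how counts combine).
    bigram_counts = Counter()
    for text in fields.values():
        tokens = tokenize(text)
        bigram_counts.update(zip(tokens, tokens[1:]))
    bigrams = [" ".join(pair) for pair, count in bigram_counts.items() if count > 3]
    return unigrams, bigrams


def filter_by_classification(results, topic_codes, prefix_len=4):
    """Keep results sharing a classification prefix with the topic.

    `results` is a ranked list of (doc_id, [classification codes]).
    Assumes IPC-style codes such as "H01M 8/10", where the first four
    characters give section, class, and subclass.
    """
    topic_prefixes = {code[:prefix_len] for code in topic_codes}
    return [
        doc_id
        for doc_id, codes in results
        if any(code[:prefix_len] in topic_prefixes for code in codes)
    ]


topic = {
    "title": "fuel cell",
    "abstract": "fuel cell stack",
    "description": "fuel cell fuel cell membrane fuel",
    "claims": "a fuel cell",
}
unigrams, bigrams = build_query(topic)
ranked = [("EP1", ["H01M 8/10"]), ("EP2", ["A61K 31/00"])]
kept = filter_by_classification(ranked, ["H01M 4/86"])
```

In this toy topic, only "fuel" passes the unigram threshold and only "fuel cell" passes the bigram threshold, and the filter keeps the one document that shares the topic's subclass.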
Cit and IR+Cit runs
- All patent IDs are extracted from the description section of the patent topics
- IDs that do not exist in the collection are filtered out
- Remaining IDs are considered relevant documents
- Only 771 out of 2,005 topics had citations that could be extracted from their text (2,307 citations)
- The IR run is appended to the Cit run after removing duplicates to create the IR+Cit run
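A minimal sketch of the citation extraction and run combination described above, under stated assumptions: the regular expression `ID_PATTERN`, the normalized ID form `EP-0123456`, and the function names are hypothetical, since the slides do not specify how patent IDs were recognized in the text.

```python
import re

# Hypothetical pattern: an office prefix followed by 5 to 10 digits,
# optionally separated by spaces or hyphens (e.g. "EP 0123456", "US-4567890").
ID_PATTERN = re.compile(r"\b(EP|US|WO)[\s-]*(\d{5,10})\b")


def extract_citations(description, collection_ids):
    """Extract patent IDs from a topic description, keeping only those
    that exist in the collection, in order of first mention."""
    cited = []
    for match in ID_PATTERN.finditer(description):
        doc_id = f"{match.group(1)}-{match.group(2)}"
        if doc_id in collection_ids and doc_id not in cited:
            cited.append(doc_id)
    return cited


def combine_runs(cit_run, ir_run):
    """Append the IR ranking to the Cit results, skipping duplicates."""
    seen = set(cit_run)
    combined = list(cit_run)
    for doc_id in ir_run:
        if doc_id not in seen:
            seen.add(doc_id)
            combined.append(doc_id)
    return combined


description = "As disclosed in EP 0123456 and US-4567890, and again in EP0123456."
collection = {"EP-0123456", "EP-0999999"}
cit_run = extract_citations(description, collection)
ir_run = ["EP-0999999", "EP-0123456", "EP-0555555"]
ir_cit_run = combine_runs(cit_run, ir_run)
```

Placing the extracted citations ahead of the IR ranking reflects the order given on the slide (the IR run is appended to the Cit run), so cited documents always occupy the top ranks of the combined run.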
Results
Runs compared: IR, Cit, IR+Cit (table values not preserved)
Conclusion & Future Work
- When simpler approaches achieve better results than sophisticated ones, much research is still needed in this area
- Extracted citations can be useful for relevance feedback
- Better translations can be used for FR/DE topics
- Faster translation techniques can be used to translate FR/DE documents
Simply, thank you. This was the KISS principle with patent search.