MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.

OUTLINE Motivation; Proposed Approach; Vector Generation; Vector Propagation; Evaluation; Dataset; Cosine Similarity (COS); Graded Cosine Similarity (GCS); Text Score + Similarity Score; Results; Future Work

SEARCH PATENTS, BUT HOW? Input: a patent application. Process: study the patent; find keywords related to the invention; formulate a query with the help of several operators; query the patent database; refine the query depending on the results. Problem: search results depend on the query, and the query depends on keyword selection. The quality of search results depends heavily on the choice of words and their weights in the query.

MOTIVATION Selecting keywords requires domain expertise, which restricts the number and the areas of an examiner's patent searches. Query creation is a manual and tedious process. With so many words and so many operators, finding the optimal combination needs expertise. It is hard to know which claims to focus on and which to leave out; some patent applications have over 100 claims.

WHAT MAY HELP Automatic query creation and refinement. Exploiting the meta-data present in the application to improve results.

WHAT'S IN PATENTS Inventor information; IPC (International Patent Classification) class code information; date of filing; citations; images; and of course the patent text: title, abstract, description and claims.

OUR APPROACH 1. Use category and citation information in patents to filter results. 2. Use text search to improve precision.

IPC VECTORS Category information at each level is represented as a vector. Combining the vectors of all levels gives the IPC vector of a patent. Level 1: Section + Class + Subclass. Level 2: Section + Class + Subclass + Main Group. Level 3: entire classification code.
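
The three levels can be illustrated with a small Python sketch. It is not the authors' implementation: the function names `ipc_levels` and `ipc_vector` are hypothetical, the code format `"H04L 29/06"` (subclass, space, main group, slash, subgroup) is assumed, and using raw counts as vector weights is an assumption, since the slides do not say how the vector entries are weighted.

```python
from collections import Counter

def ipc_levels(code):
    """Split an IPC code such as 'H04L 29/06' into its three level keys."""
    subclass, group = code.split()       # 'H04L' and '29/06'
    main_group = group.split("/")[0]     # '29'
    return (
        subclass,                        # level 1: section + class + subclass
        f"{subclass} {main_group}",      # level 2: ... + main group
        f"{subclass} {group}",           # level 3: entire classification code
    )

def ipc_vector(codes):
    """Return one sparse vector (Counter) per level; weights are counts (assumed)."""
    levels = [Counter(), Counter(), Counter()]
    for code in codes:
        for level, key in zip(levels, ipc_levels(code)):
            level[key] += 1
    return levels

# Example patent carrying three IPC codes (illustrative values only).
vec = ipc_vector(["H04L 29/06", "H04L 12/28", "G06F 17/30"])
```

Keeping one sparse vector per level, rather than a single flat one, makes it easy to compare patents level by level later on.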

VECTOR PROPAGATION The citation graph of patents is used to enrich the vector. It is a directed graph with a link from node A to node B if patent A cites patent B. The inlinks (incoming edges) of a node are used to add information to its vector. Propagation makes it likely that if node A is retrieved, its neighbors are also present in the solution set, which improves the recall of the system.

VECTOR PROPAGATION CONTD. For a given node $P_i$, let $In(P_i)$ be the set of nodes that point to it (its predecessors) and let $k$ be the current iteration. The vector of node $P_i$ for the $(k+1)$-th iteration is defined as follows: $V^{k+1}(P_i) = V^{k}(P_i) + \frac{1}{2^{k}} \cdot \frac{1}{|In(P_i)|} \sum_{P_j \in In(P_i)} V^{k}(P_j)$. The factor $1/2^{k}$ dampens the effect of adjacent vectors as the iterations increase. The formula simply adds the dampened average of the vectors of all nodes that point to $P_i$.
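
The update described on this slide (add the dampened average of the vectors of the citing patents) might look roughly like the following Python sketch. The function name `propagate`, the dict-based sparse vectors, and the choice to start counting iterations at k = 0 (so the first damping factor is 1) are all assumptions; the slides do not state the starting value of k or the number of iterations.

```python
def propagate(vectors, inlinks, iterations=2):
    """Enrich each node's vector with the dampened average of its citing nodes.

    vectors: {node: {dimension: weight}} sparse vectors
    inlinks: {node: [nodes that cite it]}
    """
    current = {node: dict(vec) for node, vec in vectors.items()}
    for k in range(iterations):              # k starts at 0 (assumption)
        damp = 1.0 / (2 ** k)                # 1/2^k damping factor
        updated = {}
        for node, vec in current.items():
            preds = [p for p in inlinks.get(node, []) if p in current]
            new_vec = dict(vec)
            for p in preds:
                for dim, weight in current[p].items():
                    # add this predecessor's share of the dampened average
                    new_vec[dim] = new_vec.get(dim, 0.0) + damp * weight / len(preds)
            updated[node] = new_vec
        current = updated                    # all nodes update from iteration k
    return current

# Patent A cites patent B, so A is an inlink of B (illustrative data).
vectors = {"A": {"H04L": 1.0}, "B": {"G06F": 1.0}}
inlinks = {"B": ["A"]}
one_step = propagate(vectors, inlinks, iterations=1)
two_steps = propagate(vectors, inlinks, iterations=2)
```

Each iteration reads only the vectors from the previous iteration, so the result does not depend on the order in which nodes are visited.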

EVALUATION We use the CLEF-IP 2011 collection of the Prior Art Candidate Search (PAC) task, which has 2.6 million patent documents pertaining to 1.3 million patents from the European Patent Office (EPO), with content in English, German and French, extended by documents from WIPO. The dataset comes with 300 sample patent applications as queries. Both English and original patents are used for making queries.

EVALUATION Base: simple text retrieval. The 20 words from the query patent with the highest tf-idf values are used to form a weighted query; the weight of each word is its tf-idf score. COS: cosine similarity. The IPC information present in the patent is used to build the vector, and the entire vector is used to calculate the cosine similarity between a patent and the query. GCS: graded cosine similarity. Similarity is calculated at each level and the level scores are linearly combined to give the final score.
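
As a rough illustration of the Base run, the sketch below picks the highest-scoring tf-idf terms of a query patent and keeps their scores as query weights. The function name `top_tfidf_terms`, the tokenised input, the document-frequency table and the particular idf variant (`log(N / (1 + df))`) are assumptions; the slides do not specify which tf-idf formula or retrieval engine was used.

```python
import math
from collections import Counter

def top_tfidf_terms(query_tokens, doc_freq, num_docs, n=20):
    """Return the n (term, weight) pairs with the highest tf-idf in the query patent."""
    tf = Counter(query_tokens)
    scores = {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Toy query patent and corpus statistics (illustrative values only).
tokens = ["antenna", "antenna", "the", "signal"]
doc_freq = {"the": 999, "antenna": 9, "signal": 99}
weighted_query = top_tfidf_terms(tokens, doc_freq, num_docs=1000, n=2)
```

The resulting list of weighted terms would then be handed to whatever text search engine serves the collection.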

GRADED COSINE SIMILARITY If $P_q$ is the query patent vector and $P_i$ is the vector of the $i$-th patent in the corpus, we use the following to calculate graded similarity: $GCS(P_q, P_i) = \sum_{j} a_j \cdot sim_{level_j}(P_q, P_i)$, where $a_j$ represents the importance of the similarity score at level $j$ and $sim_{level_j}$ is the cosine similarity between the vectors of level $j$.
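
A minimal Python sketch of the graded score, assuming sparse per-level vectors stored as dicts. The function names and the level weights `(0.2, 0.3, 0.5)` are hypothetical; the slides do not report the values of the importance weights.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def graded_cosine(query_levels, doc_levels, weights=(0.2, 0.3, 0.5)):
    """Linear combination of per-level cosine similarities (weights are assumed)."""
    return sum(
        a * cosine(q, d)
        for a, q, d in zip(weights, query_levels, doc_levels)
    )

# Query and candidate agree on subclass and main group but not on the full code.
query = [{"H04L": 1}, {"H04L 29": 1}, {"H04L 29/06": 1}]
candidate = [{"H04L": 1}, {"H04L 29": 1}, {"H04L 29/08": 1}]
score = graded_cosine(query, candidate)
```

With these assumed weights, a candidate that matches only at the coarser levels still receives partial credit instead of a similarity of zero.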

RE-RANKING TOP PATENTS We re-rank the top 1000 documents using the following methods. COS + Base: the top 1000 documents obtained from COS are re-ranked using λBase + (1 − λ)COS. GCS + Base: the top 1000 documents from GCS are re-ranked using λBase + (1 − λ)GCS.
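
The interpolation λBase + (1 − λ)Sim could be sketched as below. The function name `rerank`, the tuple layout and the value λ = 0.5 are assumptions, and the sketch also assumes that the text score and the similarity score have already been normalised to comparable ranges, which the slides do not discuss.

```python
def rerank(candidates, lam=0.5, top_k=1000):
    """Re-rank the top_k candidates by lam * base + (1 - lam) * sim.

    candidates: list of (doc_id, base_score, sim_score) tuples, where
    sim_score is the COS or GCS score and base_score the text score.
    """
    # Keep only the top_k documents according to the similarity run.
    top = sorted(candidates, key=lambda c: c[2], reverse=True)[:top_k]
    combined = [
        (doc_id, lam * base + (1 - lam) * sim)
        for doc_id, base, sim in top
    ]
    return sorted(combined, key=lambda c: c[1], reverse=True)

# Toy candidate list (illustrative scores only).
candidates = [("EP1", 0.2, 0.9), ("EP2", 0.9, 0.8), ("EP3", 0.1, 0.1)]
reranked = rerank(candidates, lam=0.5, top_k=2)
```

In this toy example EP2 overtakes EP1 after re-ranking because its text score is much higher, while EP3 is dropped before re-ranking since it falls outside the top 2.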

RESULTS

CONCLUSION AND FUTURE WORK On the sample queries of the CLEF-IP 2011 dataset, both the IPC-based representation and re-ranking perform better than the baseline (text-based retrieval) in terms of precision and recall. An extension to this work would be to use a learning-to-rank approach to re-rank the top documents. It would also be interesting to observe the effect of combining the vector representation with the patent text in a single step, to avoid re-ranking.

QUESTIONS ????? THANK YOU