Inventor Name Disambiguation

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Also By The Same Author: AKTiveAuthor, A Citation Graph Approach To Name Disambiguation AKT DTA Colloquium January 23, 2006 Duncan McRae-Spencer.
Introduction to Information Retrieval
Large-Scale Entity-Based Online Social Network Profile Linkage.
Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.
Presenter: Hsini Huang Co-authors: Li Tang and John P. Walsh Georgia institute of Technology ESF-APE-INV 2 nd “Name Game” workshop, Dec 9, 2010 Madrid,
Funding Networks Abdullah Sevincer University of Nevada, Reno Department of Computer Science & Engineering.
SUPPORT VECTOR MACHINES PRESENTED BY MUTHAPPA. Introduction Support Vector Machines(SVMs) are supervised learning models with associated learning algorithms.
Copyright © Siemens Medical Solutions, USA, Inc.; All rights reserved. Polyhedral Classifier for Target Detection A Case Study: Colorectal Cancer.
PatentViz Mike Wooldridge and Ken Langford Information Visualization and Presentation Fall 2005.
Evaluation of kernel function modification in text classification using SVMs Yangzhe Xiao.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Text Classification With Labeled and Unlabeled Data Presenter: Aleksandar Milisic Supervisor: Dr. David Albrecht.
Mining Binary Constraints in the Construction of Feature Models Li Yi Peking University March 30, 2012.
Combining Supervised and Unsupervised Learning for Zero-Day Malware Detection © 2013 Narus, Inc. Prakash Comar 1 Lei Liu 1 Sabyasachi (Saby) Saha 2 Pang-Ning.
A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12 th Annual Research.
Processing of large document collections Part 3 (Evaluation of text classifiers, applications of text categorization) Helena Ahonen-Myka Spring 2005.
Espacenet and Patent Searching Dr Karen Ryan Patent Examiner 22 September 2011.
Support Vector Machines Reading: Ben-Hur and Weston, “A User’s Guide to Support Vector Machines” (linked from class web page)
LINK PREDICTION IN CO-AUTHORSHIP NETWORK Le Nhat Minh ( A N) Supervisor: Dongyuan Lu 1.
Advisor : Prof. Sing Ling Lee Student : Chao Chih Wang Date :
Citation Provenance FYP/Research Update WING Meeting 28 Sept 2012 Heng Low Wee 1/5/
Citing Datasets. Research: search for knowledge or any systematic investigation to establish facts. And to establish facts, one needs Data.
Disambiguating inventor names using deep neural networks Steve Petrie T’Mir Julius.
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Emprical Methods in Natural.
Classification using Co-Training
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
Classification Cheng Lei Department of Electrical and Computer Engineering University of Victoria April 24, 2015.
Date of download: 7/8/2016 Copyright © 2016 SPIE. All rights reserved. A scalable platform for learning and evaluating a real-time vehicle detection system.
The simultaneous evolution of author and paper networks
Sparse Coding: A Deep Learning using Unlabeled Data for High - Level Representation Dr.G.M.Nasira R. Vidya R. P. Jaia Priyankka.
Recommendation in Scholarly Big Data
Name: Sushmita Laila Khan Affiliation: Georgia Southern University
How to forecast solar flares?
JRC – Territorial Development Unit Petros Gkotsis 08 March 2017
Machine Learning – Classification David Fenyő
Sofus A. Macskassy Fetch Technologies
Machine Learning: Methodology Chapter
Simone Paolo Ponzetto University of Heidelberg Massimo Poesio
Rule Induction for Classification Using
CRF &SVM in Medication Extraction
Moran Feldman The Open University of Israel
Goal: use counters to add integers
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Lecture 9 Measures and Metrics.
Collective Network Linkage across Heterogeneous Social Platforms
Support Vector Machines (SVM)
Asymmetric Gradient Boosting with Application to Spam Filtering
Lecture 10 Measures and Metrics.
Students: Meiling He Advisor: Prof. Brain Armstrong
PEBL: Web Page Classification without Negative Examples
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Applying Key Phrase Extraction to aid Invalidity Search
Learning with information of features
CellNetQL Image Segmentation without Feature Definition
Disambiguation Algorithm for People Search on the Web
iSRD Spam Review Detection with Imbalanced Data Distributions
Learning Literature Search Models from Citation Behavior
Model Evaluation and Selection
Evaluating Models Part 1
Xin Qi, Matthew Keally, Gang Zhou, Yantao Li, Zhen Ren
Improving Overlap Farrokh Alemi, Ph.D.
Basics of ML Rohan Suri.
Xiao-Yu Zhang, Shupeng Wang, Xiaochun Yun
Machine Learning: Methodology Chapter
Multiple DAGs Learning with Non-negative Matrix Factorization
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Presentation transcript:

Inventor Name Disambiguation 09-24-2015 Tao-yang Fu, Zhen Lei, Wang-chien Lee

A patent of the inventor is likely to cite his own prior patents: Main Ideas: #1 Patent citation network can be useful for inventor disambiguation An inventor’s research over time is likely to be related and/or builds upon the same prior research Patent citations reflect knowledge flows and technological linkage among patents A patent of the inventor is likely to cite his own prior patents: Citing relationship Two patents of the inventor are likely to cite the same patents Co-citing relationship (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

Main Ideas: #2 Missing Patent Citations However, Citations (in patent documents) are often incomplete Missing citations due to applicants and examiners Identifying missing citations to construct more complete patent citation networks might be helpful for inventor name disambiguation (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

Heterogeneous citation-bibliographic networks Main Ideas: #3 Inventor Name Disambiguation Can Be Useful for Identifying Missing Patent Citations Our prior work (ICDM 2015, DSAA 2014, CIKM 2013) in identifying missing citations Heterogeneous citation-bibliographic networks Meta-paths that involve inventor names are important in identifying missing citations and missing linkages among patents P1 - Inventor A - P2 - Cites - P3 P1 - Inventor A - P2 - Inventor B - P3 - Cites - P4 (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

So: Patent citations (both existing and missing), reflecting technological linkages and knowledge flows among patents, can be used for inventor name disambiguation. Name-disambiguated inventor information, can be used to improve heterogeneous citation-bibliographic networks, which can be used for identifying missing patent citations.

Our Approach An iterative process between inventor name disambiguation and missing citation identification Inventor Name Disambiguation Missing Citation Identification Heterogeneous Citation-Bibliographic Network

What We Have Done: Use machine learning Model the inventor disambiguation problem as a classification problem Binary classification for inventor pairs Class 1: two inventors are the same individual Class 0: two inventors are different individuals An inventor here actually means an inventor-patent record Adopt the Blocking approach by Fleming et al. to improve efficiency

What We Have Done: Verify that patent citation network is useful for inventor name disambiguation Actively learning to optimize the training set for the classifier

Classifier: Training Set Selection We use the disambiguated result in patents_DB provided from patentView as the ground truth Randomly select K inventors To generate pairs of each inventor to all other inventors in the database (total 12 millions inventors) The imbalanced issue Positive and negative pairs are highly imbalanced about 1:1 million Undersampling: randomly remove negative pairs to shrink the number of negative pairs

Classifier: Training Set Selection Active learning Add the most important/informative pairs to the training set Add some false-positive pairs (FP) Pairs of inventors who have exactly matched name but are not the same individual Add further some false-negative pairs (FN) Pairs of inventor who don’t have exactly matched name but are the same individual intuition of FP and FN

Classifier: Features Features Citing relationship has_citing Co-citing relationship has_intersection, intersection count, Jaccord coefficient Inventor name exactly matched, partially matched Inventor’s assignee Inventor’s location Published years of patents difference of published years of two patents Patent classifications

Experiments Classifiers Experiments We use SVM with linear kernel which has best performance and accepatable training time Experiments 1. Different training sets Basic training set (with undersampling) Basic training set (with undersampling) + FP Basic training set (with undersampling) + FP + FN 2. To check if citation based features are useful With / without citation based features

Experiments Different training sets Observation Adding FP improves the precision but hurts the recall. Adding FP + FN maintains the precision and improves the recall at the same time, and gets the best performance of F-measure

Experiments Citation based features Observation Citation based features maintain the precision and slightly improve the recall They may be more effective with complete citation networks There are many citation based features we do not use currently the preci

Some Conclusions Citation based features are useful They maintain the precision and slightly improve the recall Training set selection is an important issue

Future Work An iterative process between inventor name disambiguation and missing citation identification Inventor Name Disambiguation Missing Citation Identification Heterogeneous Citation-Bibliographic Network