Presentation is loading. Please wait.

Presentation is loading. Please wait.

Inventor Name Disambiguation

Similar presentations


Presentation on theme: "Inventor Name Disambiguation"— Presentation transcript:

1 Inventor Name Disambiguation
Tao-yang Fu, Zhen Lei, Wang-chien Lee

2 A patent of the inventor is likely to cite his own prior patents:
Main Ideas: #1 Patent citation network can be useful for inventor disambiguation An inventor’s research over time is likely to be related and/or builds upon the same prior research Patent citations reflect knowledge flows and technological linkage among patents A patent of the inventor is likely to cite his own prior patents: Citing relationship Two patents of the inventor are likely to cite the same patents Co-citing relationship (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

3 Main Ideas: #2 Missing Patent Citations
However, Citations (in patent documents) are often incomplete Missing citations due to applicants and examiners Identifying missing citations to construct more complete patent citation networks might be helpful for inventor name disambiguation (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

4 Heterogeneous citation-bibliographic networks
Main Ideas: #3 Inventor Name Disambiguation Can Be Useful for Identifying Missing Patent Citations Our prior work (ICDM 2015, DSAA 2014, CIKM 2013) in identifying missing citations Heterogeneous citation-bibliographic networks Meta-paths that involve inventor names are important in identifying missing citations and missing linkages among patents P1 - Inventor A - P2 - Cites - P3 P1 - Inventor A - P2 - Inventor B - P3 - Cites - P4 (1) technological knowledge flow, i.e., the cited patent B is part of the giants shoulder, that the citing patent A builds on; (2) legally and economically blocking, i.e., the cited patent B might block the citing patent A in the marketplace.

5 So: Patent citations (both existing and missing), reflecting technological linkages and knowledge flows among patents, can be used for inventor name disambiguation. Name-disambiguated inventor information, can be used to improve heterogeneous citation-bibliographic networks, which can be used for identifying missing patent citations.

6 Our Approach An iterative process between inventor name disambiguation and missing citation identification Inventor Name Disambiguation Missing Citation Identification Heterogeneous Citation-Bibliographic Network

7 What We Have Done: Use machine learning
Model the inventor disambiguation problem as a classification problem Binary classification for inventor pairs Class 1: two inventors are the same individual Class 0: two inventors are different individuals An inventor here actually means an inventor-patent record Adopt the Blocking approach by Fleming et al. to improve efficiency

8 What We Have Done: Verify that patent citation network is useful for inventor name disambiguation Actively learning to optimize the training set for the classifier

9 Classifier: Training Set Selection
We use the disambiguated result in patents_DB provided from patentView as the ground truth Randomly select K inventors To generate pairs of each inventor to all other inventors in the database (total 12 millions inventors) The imbalanced issue Positive and negative pairs are highly imbalanced about 1:1 million Undersampling: randomly remove negative pairs to shrink the number of negative pairs

10 Classifier: Training Set Selection
Active learning Add the most important/informative pairs to the training set Add some false-positive pairs (FP) Pairs of inventors who have exactly matched name but are not the same individual Add further some false-negative pairs (FN) Pairs of inventor who don’t have exactly matched name but are the same individual intuition of FP and FN

11 Classifier: Features Features Citing relationship
has_citing Co-citing relationship has_intersection, intersection count, Jaccord coefficient Inventor name exactly matched, partially matched Inventor’s assignee Inventor’s location Published years of patents difference of published years of two patents Patent classifications

12 Experiments Classifiers Experiments
We use SVM with linear kernel which has best performance and accepatable training time Experiments 1. Different training sets Basic training set (with undersampling) Basic training set (with undersampling) + FP Basic training set (with undersampling) + FP + FN 2. To check if citation based features are useful With / without citation based features

13 Experiments Different training sets Observation
Adding FP improves the precision but hurts the recall. Adding FP + FN maintains the precision and improves the recall at the same time, and gets the best performance of F-measure

14 Experiments Citation based features Observation
Citation based features maintain the precision and slightly improve the recall They may be more effective with complete citation networks There are many citation based features we do not use currently the preci

15 Some Conclusions Citation based features are useful
They maintain the precision and slightly improve the recall Training set selection is an important issue

16 Future Work An iterative process between inventor name disambiguation and missing citation identification Inventor Name Disambiguation Missing Citation Identification Heterogeneous Citation-Bibliographic Network


Download ppt "Inventor Name Disambiguation"

Similar presentations


Ads by Google