Inventor Mobility Index
Thorsten Doherr
Zentrum für Europäische Wirtschaftsforschung (Center of Economic Research), Mannheim, Germany


Mission
Mission: You have to decide whether two patents from inventors with the same name are actually from the same person or from different persons who share the same name.
Problem:
- Two inventors with the same name are not necessarily the same person.
- Defining an inventor only by name results in too much false mobility, especially for inventors with common names.
- Restricting the definition too much (e.g. name and home address) will cancel any mobility.
Tools: The complete patent data

Plausibility Rules
Two inventors with the same name are the same person…
- if they are inventing for the same applicant
- if they have the same home address
- if they are working with the same co-inventors
- if one is citing the other
- if they have patents in the same area of technology (ipc)
Inventor: A single inventor entry in a patent document
Person: All inventors with a specific name that are linked by at least one plausibility rule
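The five plausibility rules can be read as a pairwise test on two inventor entries. The following is a minimal sketch, not the author's implementation: the `InventorEntry` record, its field names and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InventorEntry:
    """One inventor entry in a patent document (hypothetical layout)."""
    patent: str
    name: str
    applicant: str = ""
    address: str = ""
    coinventors: frozenset = frozenset()
    cites: frozenset = frozenset()   # patents cited by this patent
    ipc: frozenset = frozenset()     # technology classes of this patent


def matching_rules(a, b):
    """Return the names of all plausibility rules that link entries a and b."""
    if a.name != b.name:
        return []  # the rules only apply to inventors with the same name
    rules = []
    if a.applicant and a.applicant == b.applicant:
        rules.append("same applicant")
    if a.address and a.address == b.address:
        rules.append("same home address")
    if a.coinventors & b.coinventors:
        rules.append("co-inventor")
    if a.patent in b.cites or b.patent in a.cites:
        rules.append("citation")
    if a.ipc & b.ipc:
        rules.append("ipc")
    return rules


# Invented sample entries: e2 cites e1 and shares a technology class with it.
e1 = InventorEntry("P1", "MARIO BERTONI", applicant="APPLICANT A",
                   ipc=frozenset({"B60N"}))
e2 = InventorEntry("P2", "MARIO BERTONI", applicant="APPLICANT B",
                   cites=frozenset({"P1"}), ipc=frozenset({"B60N"}))
e3 = InventorEntry("P3", "MARIO BERTONI", applicant="APPLICANT C",
                   ipc=frozenset({"A61K"}))
print(matching_rules(e1, e2))
print(matching_rules(e1, e3))
```

Two entries belong to the same person as soon as the returned list is non-empty; an empty list means that no rule links them directly.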

Harmonization of Applicants: SearchEngine
The SearchEngine is an in-house software package specialized in company address matching. It implements the following steps:
- Normalizing the search fields (company name, address fields): converting them to uppercase, replacing special letters with their common (phonetic) representation (e.g. Ü → UE, ß → SS), compressing abbreviations (e.g. S.P.A. → SPA) and replacing special characters with blanks.
- Creating a dictionary containing all the words of the search fields along with their occurrence counts. To preserve the context, every search field has its own chapter. The occurrence count is the basis for the heuristic search algorithm. Supporting tables link the dictionary entries back to the company table.
- Searching: the search algorithm splits a search term into words. Each word is associated with the occurrence counter of the corresponding dictionary entry. The occurrence reflects the identification potential of the word: a word with a low occurrence has a high identity, because the resulting list of potential hits is small.
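The normalization and dictionary steps can be sketched roughly as follows. This is an illustration under assumptions, not the SearchEngine itself: the function names, the exact abbreviation pattern and the choice to count each word once per record are all hypothetical.

```python
import re
from collections import Counter

# Special letters and their common representation (assumed mapping).
SPECIAL_LETTERS = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "ß": "SS"}

# Dotted abbreviations such as S.P.A. or S.p.A (assumed pattern).
ABBREVIATION = re.compile(r"\b(?:[A-Z]\.){2,}|\b(?:[A-Z]\.)+[A-Z]\b")


def normalize(text):
    """Uppercase, replace special letters, compress abbreviations, blank the rest."""
    text = text.upper()
    for letter, replacement in SPECIAL_LETTERS.items():
        text = text.replace(letter, replacement)
    text = ABBREVIATION.sub(lambda m: m.group(0).replace(".", ""), text)
    text = re.sub(r"[^\w]|_", " ", text)   # special characters become blanks
    return " ".join(text.split())


def build_dictionary(records, fields):
    """Build one 'chapter' (word -> occurrence count) per search field."""
    chapters = {name: Counter() for name in fields}
    for record in records:
        for name in fields:
            words = set(normalize(record.get(name, "")).split())
            chapters[name].update(words)   # assumption: one count per record
    return chapters


# Invented sample records.
records = [
    {"APPLICANT_NAME": "Lear Corporation Italia S.p.A."},
    {"APPLICANT_NAME": "Müller & Söhne S.p.A"},
]
chapters = build_dictionary(records, ["APPLICANT_NAME"])
print(normalize("Lear Corporation Italia S.p.A."))
print(chapters["APPLICANT_NAME"]["SPA"])
```

Keeping one counter per field mirrors the chapter idea: a word such as ITALIA can be common in a company name but rare in a street field, so the counts should not be mixed.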

Harmonization of Applicants: Example of the SearchEngine Algorithm

DICTIONARY - Chapter: APPLICANT_NAME
ENTRY         OCCURS   IDENTITY
…
CORPORATION       16   1/16
…
ITALIA           491   1/491
…
LEAR               4   1/4
…
SPA             6119   1/6119

Searching for…
Search term:   Lear   Corporation   ITALIA   S.p.A.
Normalized:    LEAR   CORPORATION   ITALIA   SPA      SUM
Identity:      …%     19.860%       0.647%   0.052%   100%

Result
NAME                              IDENTITY
LEAR CORPORATION ITALIA S.p.A     …%
Lear Corporation Italia S.r.l     …%
LEAR ITALIA SEATING S.p.A         …%
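The percentages in the example are consistent with weighting each search word by the inverse of its occurrence count and scaling the weights so that they sum to 100%. The sketch below reproduces that arithmetic; the function names are hypothetical, and the way a candidate is scored (adding up the shares of the search words it contains) is an assumption rather than something the slide states.

```python
def identity_shares(search_words, occurs):
    """Weight each word by 1/occurrence and normalize the weights to sum to 1."""
    weights = {word: 1.0 / occurs[word] for word in search_words if occurs.get(word)}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def identity_score(search_words, candidate_words, occurs):
    """Assumed scoring: sum the shares of the search words found in the candidate."""
    shares = identity_shares(search_words, occurs)
    return sum(share for word, share in shares.items() if word in candidate_words)


# Occurrence counts taken from the dictionary chapter APPLICANT_NAME above.
occurs = {"CORPORATION": 16, "ITALIA": 491, "LEAR": 4, "SPA": 6119}
search = ["LEAR", "CORPORATION", "ITALIA", "SPA"]

shares = identity_shares(search, occurs)
for word in search:
    print(f"{word:12s} {shares[word] * 100:7.3f}%")

candidate = {"LEAR", "ITALIA", "SEATING", "SPA"}
print(f"{identity_score(search, candidate, occurs) * 100:.3f}%")
```

Rare words dominate the score: LEAR occurs only four times and therefore carries most of the identity, while the very common SPA contributes almost nothing.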

Finalization
- A cutoff limit for the identity is applied to filter all results (e.g. 90%).
- The resulting list of matching pairs is not symmetric: A can be linked to B without B being linked to A → the linked pairs create a network.
- Network analysis: if A is linked to B and B is linked to C, the analysis identifies the group A, B, C.
- For groups that are too large, the network analysis is re-iterated with an increased cutoff limit for their members.
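The cutoff filter and the network analysis can be sketched as a connected-components computation. The version below uses a union-find structure, which is a swapped-in standard technique and not necessarily how the original software traverses the network; the function names and the sample pairs are hypothetical.

```python
def find_groups(pairs):
    """Group all nodes that are connected through any chain of links."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a   # the link direction is ignored here

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


def groups_above_cutoff(scored_pairs, cutoff):
    """Keep only pairs whose identity reaches the cutoff, then group them."""
    links = [(a, b) for a, b, identity in scored_pairs if identity >= cutoff]
    return find_groups(links)


# Invented matching pairs: (from, to, identity).
scored = [("A", "B", 0.95), ("B", "C", 0.92), ("C", "D", 0.50), ("D", "E", 0.99)]
print(groups_above_cutoff(scored, 0.90))
```

Re-running `groups_above_cutoff` on the members of an oversized group with a higher `cutoff` drops the weakest links first, which is one way to read the re-iteration step.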

Harmonization of Inventor Names
- The SearchEngine is of limited use because…
  - it is most efficient with search terms consisting of multiple words
  - the main problems are typing errors and misspellings
- Creating phonetic representations of the names using the Metaphone algorithm by Lawrence Philips, 1990
- Phonetic algorithms create the same representation for similar-sounding words (names), and this representation can be indexed → direct database access
- Originally, the results of phonetic searches were validated manually because of their strong tendency to produce false positives → automated matching requires an automated validation process
- Automated comparison of the retrieved names with the searched name
- The comparison function is based on the least relative character position deltas and requires two words as parameters → it cannot be used for index-based direct access
- It needs phonetic indexing to quickly generate a list of potential candidates
- The tolerance for typing errors increases with the length of the words → longer words are more prone to typing errors

Harmonization of Inventor Names: Example for the Metaphone Search
Search key: MR BRTN
FIRST NAME   LAST NAME
MAURO        BARATONI
MARIO        BERRETTONI
MARIO        BERTINI
MARIO        BERTON
MAURO        BERTONI
MAURO        BORDIN
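The idea of an indexed phonetic key can be illustrated with a deliberately simplified stand-in. The function below is not the Metaphone algorithm: it only drops vowels, collapses doubled letters and merges D into T, which happens to be enough to reproduce the key MR BRTN for the names listed above. All function names are hypothetical.

```python
from collections import defaultdict

VOWELS = set("AEIOU")
MERGED = {"D": "T"}   # assumed merge of similar-sounding consonants


def phonetic_key(name):
    """Simplified consonant skeleton (a stand-in for a real phonetic algorithm)."""
    key = []
    previous = ""
    for ch in name.upper():
        if not ch.isalpha():
            continue
        ch = MERGED.get(ch, ch)
        if ch != previous and ch not in VOWELS:
            key.append(ch)
        previous = ch
    return "".join(key)


def build_index(names):
    """Index (first name, last name) pairs by their phonetic keys."""
    index = defaultdict(list)
    for first, last in names:
        index[(phonetic_key(first), phonetic_key(last))].append((first, last))
    return index


names = [
    ("MAURO", "BARATONI"), ("MARIO", "BERRETTONI"), ("MARIO", "BERTINI"),
    ("MARIO", "BERTON"), ("MAURO", "BERTONI"), ("MAURO", "BORDIN"),
]
index = build_index(names)
print(index[("MR", "BRTN")])
```

Because the key is an ordinary string, it can serve as a database index, so the candidate list for a searched name is retrieved with a single lookup. The candidates still need validation, since quite different names share one key.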

Harmonization of Inventor Names: Example for the Least Relative Character Position Deltas
Compared names: CZARNITZKI and CHARNIZKI
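The details of the least relative character position deltas function are not given in the slides, so the sketch below swaps in a different, well-known measure: the similarity ratio of `difflib.SequenceMatcher` from the Python standard library. It only illustrates the validation step, in which each phonetic candidate is compared with the searched name and accepted above a threshold; the function names and the threshold value are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0.0 and 1.0 (stand-in for the original comparison)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def validate(searched, candidates, threshold=0.7):
    """Keep only the candidates that are similar enough to the searched name."""
    return [name for name in candidates if similarity(searched, name) >= threshold]


candidates = ["CHARNIZKI", "BERTONI"]
for name in candidates:
    print(name, round(similarity("CZARNITZKI", name), 3))
print(validate("CZARNITZKI", candidates))
```

A pairwise function like this needs both words as input, which is why it cannot drive an index lookup on its own and is applied only to the short candidate list produced by the phonetic index.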


All Patents of an Inventor Name

The Same Applicant Rule

The Same Home Address Rule

The Co-Inventor Rule

The Citation Rule

The IPC Rule

Italian Inventor Mobility Index
- patents from Italian applicants and inventors
- different harmonized inventor names
- nodes after applying the same applicant rule
- nodes after applying the co-inventor rule
- nodes after applying the citation rule
- nodes after applying the same home address rule
- nodes after applying the ipc rule
Main database and citations: Espace Bulletin (March 2010), EPO Patstat (September 2010), OECD
Development: Microsoft Visual FoxPro 9.0

Traversal of a Network Table
Link table columns: FROM, TO
Result table columns: GROUP, MEMBER