1
CIRCLE: Centre for Innovation, Research and Competence in the Learning Economy
Lund University, P.O. Box 117, SE-221 00 Lund, Sweden
Swedish inventors: matching to registers and descriptive data
Presentation at APE-INV, Brussels, September 5th, 2011
Lina Ahlin and Olof Ejermo
lina.ahlin@circle.lu.se, olof.ejermo@circle.lu.se
2
On the agenda
– What is so special about Swedish data?
– 1st matching
– 2nd matching
– Future: how to reach a 100% match rate?
– (Results)
3
Linking inventors to registers
– EPO patent applications 1978-2009 by inventors with addresses in Sweden
– Matching done on name-home address combinations
– Problem 1: different inventors may have the same name
– Problem 2: addresses may be outdated
How can we verify a person's identity and connect it to Swedish register data?
4
Swedish data
Q: What makes Swedish data so exciting (and why do we want a high match rate)?
A: Through Statistics Sweden it is possible to connect individuals to register data covering several levels of information relevant for innovation studies:
– Individual level: field/level of education, age, income, gender, workplace
– Regions: workplace, home municipality
– Sectoral level: sectors, firm size, level of R&D...
This can give a multifaceted view of innovation, but it requires the personal identifier ”personnummer”, e.g. 19500131-3422:
– 19500131: birth date, Jan 31st, 1950
– Second-to-last digit (2) is even = female
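Birth date and sex can be read directly off the personnummer. A minimal Python sketch, assuming the 12-digit form YYYYMMDD-NNNN used in the example above and an ordinary personnummer (not a coordination number). The function name `parse_personnummer` is hypothetical, and the final check digit is deliberately not validated:

```python
from datetime import date


def parse_personnummer(pnr: str) -> tuple[date, str]:
    """Split a 12-digit personnummer 'YYYYMMDD-NNNN' into birth date and sex.

    Assumes the long form from the slide; the last digit is a check digit
    and is not validated here.
    """
    digits = pnr.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError(f"expected YYYYMMDD-NNNN, got {pnr!r}")
    birth = date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
    # The second-to-last digit encodes sex: even = female, odd = male.
    sex = "female" if int(digits[10]) % 2 == 0 else "male"
    return birth, sex


print(parse_personnummer("19500131-3422"))
# → (datetime.date(1950, 1, 31), 'female')
```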
5
1st matching (Oct-Dec 2010)
– All Swedes (incl. personnummer) are listed in the address register ”SPAR”
– Addresses matched through InfoTorg, which stores addresses/address changes for the latest 3 years and adds the personnummer
– Individuals under 16 are not matched
– Old patents were added under the assumption that the person found at a name-address combination in the register (2008-2010) is the same as the inventor at that combination on an older patent, e.g. ”Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm” on a patent applied for in 1992
Match rate: 64% of inventor-patent pairs, rising from a low of 23% in 1978 to a high of 93% in 2008. The gap reflects inventor mobility: the older the patent, the more likely it is that the inventor has moved since.
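The first-round name-address matching can be sketched as an exact join on a normalized key. This is a minimal illustration, not the InfoTorg procedure: the normalization rules (lower-casing, dropping punctuation, collapsing spaces), the record fields and the example IDs are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_by_name_address(inventors, register):
    """Map each inventor record to a register personnummer via an exact
    normalized (name, address) key; unmatched or ambiguous inventors map to None."""
    index = {}
    for rec in register:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        index.setdefault(key, []).append(rec["pnr"])
    result = {}
    for inv in inventors:
        key = (normalize(inv["name"]), normalize(inv["address"]))
        hits = index.get(key, [])
        # Accept only unambiguous hits; same name at same address stays open.
        result[inv["id"]] = hits[0] if len(hits) == 1 else None
    return result


# Hypothetical example records.
register = [{"name": "Sven Ivar Johanson",
             "address": "Storgatan 1, 111 00 Stockholm", "pnr": "pnr-001"}]
inventors = [
    {"id": "EP-1992-001", "name": "SVEN IVAR JOHANSON",
     "address": "Storgatan 1, 111 00 STOCKHOLM"},
    {"id": "EP-1992-002", "name": "Anna Berg",
     "address": "Kungsgatan 5, 411 19 Göteborg"},
]
print(match_by_name_address(inventors, register))
# → {'EP-1992-001': 'pnr-001', 'EP-1992-002': None}
```

Requiring exactly one hit leaves same-name, same-address cases unresolved rather than guessing.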
6
InfoTorg returned a 56% match rate; a manual (visual, not automated) check added another 8 percentage points.
7
64% match rate
1985-2005: currently accessible in individual registers at Statistics Sweden
2006-2009: additions as of Sep. 30th, 2011
8
2nd matching (April-Sep 2011)
Use public access to registers (Swedish Genealogical Association):
– CDs of the Swedish population in 1990 (and 1980), listing old addresses and birth dates
– CD ”Death Book” 1901-2009: address at death + personnummer
Match birth date + name to personnummer using a service by InfoTorg or online sources.
9
Methodology
– Extract data from the Swedish Death Book and the Swedish genealogy records for 1990 (to some extent also 1980) on all individuals in the population, letter by letter
– Generate a variable containing name, address and postal address for all individuals in the population, as well as for inventors who are not fully matched
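Building the name-address variable and splitting the population letter by letter could look like this in Python. Assumptions: the record fields `name`, `street` and `postal` are hypothetical, and ”letter by letter” is read as the first letter of the surname, taken to be the last word of the name.

```python
from collections import defaultdict


def name_address_string(rec: dict) -> str:
    """Concatenate name, street address and postal address into one string."""
    return f"{rec['name']}, {rec['street']}, {rec['postal']}"


def block_by_surname_letter(records):
    """Group name-address strings by the first letter of the surname
    (assumed here to be the last token of the name)."""
    blocks = defaultdict(list)
    for rec in records:
        surname = rec["name"].split()[-1]
        blocks[surname[0].upper()].append(name_address_string(rec))
    return dict(blocks)


# Hypothetical population records.
records = [
    {"name": "Sven Ivar Johanson", "street": "Storgatan 1", "postal": "111 00 Stockholm"},
    {"name": "Karin Holm", "street": "Byvägen 3", "postal": "222 22 Lund"},
]
print(block_by_surname_letter(records))
```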
10
Normalized Levenshtein distance (”strgroup”) in Stata
An example of the ”name-address” string:
”Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm” (from EPO)
vs. ”Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm” (from the Swedish population 1990)
Turning ”Ivar” into ”Ifwar” takes 2 edits (replace v with f, insert w). Dividing by the length of the shorter string (49 characters as written here) gives 2/49 ≈ 0.04 (= a good hit).
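The measure can be reproduced outside Stata. This Python sketch implements the standard Levenshtein distance and divides it by the length of the shorter string; it is not strgroup itself, and it compares the strings exactly as written (including commas and spaces), which may differ from any cleaning strgroup applies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions turning a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the shorter string.
    Convention (assumed): two empty strings score 0.0, one empty string 1.0."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0 if a == b else 1.0
    return levenshtein(a, b) / shorter


epo = "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"
pop = "Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm"
print(levenshtein(epo, pop))                        # → 2
print(round(normalized_levenshtein(epo, pop), 4))   # → 0.0408
```

Blocking by letter keeps the number of pairwise comparisons manageable, since the dynamic program costs O(len(a) × len(b)) per pair.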
11
Adding date of birth
1. 1990: Levenshtein on names & addresses
2. 1990: Levenshtein on unique names
3. Levenshtein on the Death Book CD 1901-2009: names and addresses
4. strgroup: similarity on name-address hits from steps 1-3
5. Some manual additions and minor changes
6. 1980: Levenshtein on names and addresses (letters D & H)
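The steps above form a cascade: each pass only looks at inventors that earlier passes left unmatched. A minimal Python sketch of that control flow, with two hypothetical passes that use exact comparisons in place of Levenshtein for brevity (`exact_pass` on the full name-address string, `unique_name_pass` on names that occur only once in the population):

```python
def exact_pass(unmatched, population):
    """Hypothetical pass: exact match on the full name-address string."""
    by_string = {}
    for p in population:
        by_string.setdefault(p["name_address"], []).append(p["birth_date"])
    return {inv["id"]: by_string[inv["name_address"]][0]
            for inv in unmatched
            if len(by_string.get(inv["name_address"], [])) == 1}


def unique_name_pass(unmatched, population):
    """Hypothetical pass: match on name alone, only for names occurring
    exactly once in the population."""
    by_name = {}
    for p in population:
        by_name.setdefault(p["name"], []).append(p["birth_date"])
    return {inv["id"]: by_name[inv["name"]][0]
            for inv in unmatched
            if len(by_name.get(inv["name"], [])) == 1}


def run_cascade(inventors, population, passes):
    """Apply passes in order; each pass only sees inventors still unmatched."""
    found = {}
    for match_pass in passes:
        remaining = [inv for inv in inventors if inv["id"] not in found]
        found.update(match_pass(remaining, population))
    return found


# Hypothetical records.
population = [
    {"name": "Sven Ivar Johanson", "birth_date": "1950-01-31",
     "name_address": "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"},
    {"name": "Karin Holm", "birth_date": "1962-05-04",
     "name_address": "Karin Holm, Byvägen 3, 222 22 Lund"},
]
inventors = [
    {"id": 1, "name": "Sven Ivar Johanson",
     "name_address": "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"},
    {"id": 2, "name": "Karin Holm", "name_address": "Karin Holm, Ekvägen 9, 222 22 Lund"},
    {"id": 3, "name": "Per Lind", "name_address": "Per Lind, Ringvägen 2, 111 00 Stockholm"},
]
print(run_cascade(inventors, population, [exact_pass, unique_name_pass]))
# → {1: '1950-01-31', 2: '1962-05-04'}
```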
12
Methodology: continued
– Manually examine each match to check whether the Levenshtein command has matched correctly
– Some hits discarded, including ambiguous name matches
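The rule of discarding ambiguous hits can be made explicit before the manual check. A sketch, assuming each inventor already has a list of candidates with normalized Levenshtein distances; the 0.1 threshold and the function name `resolve` are hypothetical:

```python
def resolve(candidates, threshold=0.1):
    """Keep a hit only if exactly one candidate is within the threshold.

    candidates maps inventor id -> list of (person_id, normalized distance).
    Returns (accepted, ambiguous): accepted maps inventor -> person_id;
    ambiguous lists inventors with several close candidates (discarded).
    Inventors with no close candidate appear in neither.
    """
    accepted, ambiguous = {}, []
    for inv, cands in candidates.items():
        close = [pid for pid, dist in cands if dist <= threshold]
        if len(close) == 1:
            accepted[inv] = close[0]
        elif len(close) > 1:
            ambiguous.append(inv)
    return accepted, ambiguous


# Hypothetical distances.
candidates = {
    "A": [("p1", 0.04), ("p2", 0.50)],
    "B": [("p3", 0.02), ("p4", 0.05)],
    "C": [("p5", 0.40)],
}
print(resolve(candidates))  # → ({'A': 'p1'}, ['B'])
```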
13
New match rate 80%
14
Adding personnummer (ongoing)
The new match rate is 80%, but without full personnummer. What to do?
1. Use the date-of-birth part of the personal number for fully matched inventors
2. Join all possible combinations of birth dates between fully matched inventors and those with only birth dates
3. Run Levenshtein distance on inventor names
4. Small Levenshtein distance: accept that the inventors are the same, since name and birth date match
5. Large Levenshtein distance: reject
6. Manually check the remaining inventors; look at addresses for further confirmation if uncertain
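Steps 2-6 can be sketched as a birth-date join followed by a name-distance rule. The accept (0.1) and reject (0.3) thresholds, the record layout and the helper names are hypothetical choices, not values from the study; the sketch keeps only the closest candidate for each inventor and flags in-between cases for manual checking.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def name_distance(a: str, b: str) -> float:
    """Case-insensitive edit distance divided by the shorter name's length."""
    a, b = a.lower(), b.lower()
    return levenshtein(a, b) / max(1, min(len(a), len(b)))


def classify_by_birth_date(fully_matched, birth_date_only, accept=0.1, reject=0.3):
    """fully_matched: (name, birth_date, pnr); birth_date_only: (id, name, birth_date)."""
    by_date = defaultdict(list)
    for name, birth, pnr in fully_matched:
        by_date[birth].append((name, pnr))
    decisions = {}
    for inv_id, name, birth in birth_date_only:
        # Step 2-3: score every fully matched inventor with the same birth date.
        scored = sorted((name_distance(name, cand), pnr) for cand, pnr in by_date[birth])
        if not scored or scored[0][0] > reject:
            decisions[inv_id] = ("reject", None)           # step 5
        elif scored[0][0] <= accept:
            decisions[inv_id] = ("accept", scored[0][1])   # step 4
        else:
            decisions[inv_id] = ("manual", scored[0][1])   # step 6
    return decisions


# Hypothetical records.
fully = [("Sven Ivar Johanson", "1950-01-31", "pnr-A"),
         ("Karin Holm", "1962-05-04", "pnr-B")]
only = [(1, "Sven Ivar Johansson", "1950-01-31"),
        (2, "Karin Lindqvist", "1962-05-04"),
        (3, "Per Lind", "1971-12-24"),
        (4, "S. Ivar Johanson", "1950-01-31")]
print(classify_by_birth_date(fully, only))
```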
15
Adding personnummer, ctd.
– Use the Death Book, years 1975-2009: take the date-of-birth part of the personal numbers
– Re-run steps 2-6 from the previous slide
16
Adding personnummer, ctd.
Problem: not all inventors were previously identified, so the last 4 digits are missing.
Two options to get full personal numbers from birth dates:
1. Use InfoTorg again with name + the added parameter ”birthdate”
2. Manually add the last four digits using an internet service (www.upplysning.se)
17
Some matching problems
– It is difficult to match individuals who change last names (mainly women), or who have common names and move a lot
– Two people with the same name can live at the same address (e.g. a father who names his son after himself), so the wrong person may be matched. If this is detected, the oldest person is chosen.
– For inventors affiliated with some firms (e.g. AstraZeneca), the company address is given instead of a home address
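The tie-breaking rule for namesakes at the same address can be written as a grouping step. A Python sketch with hypothetical record fields (`name`, `address`, `birth_date`, `pnr`):

```python
from collections import defaultdict
from datetime import date


def resolve_namesakes(register):
    """Return one record per (name, address) pair; when several persons share
    both (e.g. a father and a son named after him), keep the oldest."""
    groups = defaultdict(list)
    for person in register:
        groups[(person["name"], person["address"])].append(person)
    # Earliest birth date = oldest person.
    return {key: min(people, key=lambda p: p["birth_date"])
            for key, people in groups.items()}


# Hypothetical father/son pair at the same address.
register = [
    {"name": "Sven Ivar Johanson", "address": "Storgatan 1, 111 00 Stockholm",
     "birth_date": date(1975, 8, 19), "pnr": "pnr-S"},
    {"name": "Sven Ivar Johanson", "address": "Storgatan 1, 111 00 Stockholm",
     "birth_date": date(1948, 3, 2), "pnr": "pnr-F"},
]
chosen = resolve_namesakes(register)
print(chosen[("Sven Ivar Johanson", "Storgatan 1, 111 00 Stockholm")]["pnr"])  # → pnr-F
```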
18
Towards 100%
– Idea: scoring methods based on already identified inventors, using:
  – Name
  – Identified co-inventors
  – Technology class
  – City
  – Postal code
– Which algorithm?
– Statistics Sweden for validating the parent/child name similarity problem?
– Use the 1980 population CD?
– Strategy of focusing on highly productive unmatched inventors?
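One possible reading of the scoring idea is a weighted sum of agreements between an unmatched inventor and each identified inventor. Everything here is an assumption for illustration: the weights, the Jaccard overlap for co-inventors, exact comparison for name, city and postal code, and overlap of technology class codes.

```python
# Hypothetical weights; choosing and validating them is the open question.
WEIGHTS = {"name": 3.0, "co_inventors": 2.0, "tech_class": 1.0,
           "city": 0.5, "postal_code": 1.0}


def score(unmatched: dict, identified: dict) -> float:
    """Weighted agreement between an unmatched and an identified inventor."""
    co_a, co_b = set(unmatched["co_inventors"]), set(identified["co_inventors"])
    overlap = len(co_a & co_b) / len(co_a | co_b) if co_a | co_b else 0.0
    features = {
        "name": float(unmatched["name"].lower() == identified["name"].lower()),
        "co_inventors": overlap,  # Jaccard overlap of co-inventor sets
        "tech_class": float(bool(set(unmatched["tech_classes"])
                                 & set(identified["tech_classes"]))),
        "city": float(unmatched["city"] == identified["city"]),
        "postal_code": float(unmatched["postal_code"] == identified["postal_code"]),
    }
    return sum(WEIGHTS[k] * v for k, v in features.items())


def best_candidate(unmatched, identified_list):
    """Return the identified inventor with the highest score."""
    return max(identified_list, key=lambda cand: score(unmatched, cand))


# Hypothetical records.
u = {"name": "Sven Ivar Johanson", "co_inventors": ["A. Berg", "K. Holm"],
     "tech_classes": ["A61K"], "city": "Lund", "postal_code": "222 22"}
c1 = {"id": "c1", "name": "Sven Ivar Johanson", "co_inventors": ["A. Berg"],
      "tech_classes": ["A61K"], "city": "Stockholm", "postal_code": "111 00"}
c2 = {"id": "c2", "name": "Sven Ivar Johanson", "co_inventors": ["P. Lind"],
      "tech_classes": ["H04L"], "city": "Lund", "postal_code": "222 22"}
print(score(u, c1), score(u, c2), best_candidate(u, [c1, c2])["id"])  # → 5.0 4.5 c1
```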
19
Suggestions/questions
20
Patent distribution by sector
21
Patent distribution in manufacturing (share of total patenting)
22
Patent distribution in services (share of total patenting).
23
Education level among inventors
24
Percentile distribution of inventors' patent productivity
Percentile | All patents | Contribution | Patents 2004-07 | Contribution 2004-07
1% | 1 | 0.12 | 1 | 0.11
5% | 1 | 0.20 | 1 | 0.17
10% | 1 | 0.25 | 1 | 0.20
25% | 1 | 0.33 | 1 |
50% | 1 | 0.83 | 1 | 0.50
75% | 3 | 1.50 | 2 | 1.00
90% | 6 | 3.00 | 4 | 2.00
95% | 9 | 5.00 | 6 | 3.00
99% | 21 | 11.50 | 12 | 5.83
Mean/inventor | 2.81 | 1.40 | 2.06 | 0.97
Number of inventors | 18 489 | | 8 526 |
25
Sectors, SNI92 codes, number of inventors and contribution, 2004-2005
Columns: Sector | SNI92 codes | Unique inventors, mean/year 2004-2005 | Contribution*, mean 2004-2005 | % cooperation cross sector 1994-1995 | % cooperation cross sector 2004-2005
Primary | 1000-14999 | 8.5 | 5.9 | 28%
Manufacturing | 15000-37999 | 1567 | 749.9 | 11%
Services | 38000-74999, 80410, 80423-80425, 80427-80429, 85200, 85325, 91111-91330, 92110-92130, 92310, 92330-92400, 92611-92614, 92621-99000 | 806.5 | 411.1 | 23%
Academia | 80301-80309 and ** | 190 | 72.6 | 54%
Public sector | 75000-80299, 80421-80422, 80426, 85000-85140, 85311-85324, 90000-90008, 92200, 92320, 92511/92530, 92615 | 62.5 | 28.4 | 67%
* ”Contribution” counts patent fractions, which adjusts for co-inventorship.
** ”Academia” can in a few cases also be found in the sectors R&D in technical and natural sciences (73101-73104) and technical testing and analysis (74300).
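The footnote does not say how the patent fractions are formed; a common convention, assumed here, is an equal split, so each of the n inventors on a patent receives 1/n. A Python sketch computing both the raw patent count and this fractional contribution per inventor (patent and inventor IDs are hypothetical):

```python
from collections import defaultdict


def per_inventor_totals(patents: dict[str, list[str]]):
    """patents maps patent id -> list of inventor ids.

    Returns (counts, contribution): counts is the number of patents per
    inventor; contribution splits each patent equally (1/n per inventor),
    an assumed convention for the fractional count.
    """
    counts: dict[str, int] = defaultdict(int)
    contribution: dict[str, float] = defaultdict(float)
    for inventors in patents.values():
        unique = set(inventors)
        if not unique:
            continue
        share = 1 / len(unique)
        for inv in unique:
            counts[inv] += 1
            contribution[inv] += share
    return dict(counts), dict(contribution)


# Hypothetical patents.
patents = {"EP1": ["a", "b"], "EP2": ["a"], "EP3": ["a", "b", "c", "d"]}
counts, contribution = per_inventor_totals(patents)
print(counts)        # → {'a': 3, 'b': 2, 'c': 1, 'd': 1} (key order may vary)
print(contribution)  # a: 1.75, b: 0.75, c: 0.25, d: 0.25
```

Under this convention the contributions sum to the number of patents, which is why the mean contribution per inventor is lower than the mean number of patents per inventor whenever patents have co-inventors.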
26
Cooperation by sector, 2004-05
Columns: Primary | Manufacturing | Services | Academia | Public sector | Sum
Primary | 43% | 57% | 0% | 100%
Manufacturing | 1% | 77% | 17% | 5% | 100%
Services | 1% | 66% | 24% | 9% | 100%
Academia | 0% | 29% | 48% | 22% | 100%
Public sector | 0% | 18% | 37% | 45% | 100%
27
The most important patenting academic institutions, 2004-2005
Univ/institute | Contributions/year | Share | Patents/billion SEK research revenue | Patents/thousand FTE, NTM
Lund | 20.3 | 23% | 6.3 | 15.0
Uppsala | 11.6 | 13% | 4.2 | 9.7
Karolinska | 11.6 | 13% | 3.9 | 9.3
KTH | 9.8 | 11% | 5.7 | 8.7
Göteborg | 9.0 | 10% | 3.7 | 10.9
Linköping | 7.9 | 9% | 6.4 | 10.3
Chalmers | 7.2 | 8% | 5.1 | 8.6
Stockholm | 2.9 | 3% | 1.7 | 4.1
Umeå | 2.3 | 3% | 1.5 | 2.8
Sum | 82.6 | 94% | 4.4 | 9.3
Others (13) | 5.0 | 6% | 1.3 | 1.8