Analyses of the Effect of Patent Category Diversity on Patent Quality
Wenping Wang 1; Alan Porter 2,3; Ismael Rafols 4; Nils Newman 5; Yun Liu 1
1. School of Management and Economics, Beijing Institute of Technology, Beijing, China
2. Technology Policy and Assessment Center, Georgia Institute of Technology, Atlanta, USA
3. Search Technology, Inc., Atlanta, USA
4. SPRU - Science and Technology Policy Research, University of Sussex, Brighton, UK
5. IISC, Inc., Atlanta, USA
Research Objectives
Aim: To gauge the effect of patent category diversity (PCD) on patent quality
We address three research questions:
▪ How to measure PCD?
▪ How to measure patent quality?
▪ Does high PCD lead to higher patent quality?
Two cases studied:
▪ "Measuring chemical, physical properties"
▪ "Optical measurement"
Patent Category Diversity (PCD)
▪ Try the counterpart of our "Integration indicator" (Porter et al., 2007) on patents (Rao-Stirling diversity)
▪ Measure how integrative particular patents are, based on the patents they cite
▪ A patent will have a higher PCD if there is greater heterogeneity among its cited patent categories.
Patent Category Diversity
Patent Category Diversity: Diversity of cited patents comprising different categories (e.g., NBER technology classes or International Patent Classes, IPCs)
Characteristics of Diversity:
▪ Variety: Number of distinctive categories
▪ Balance: Evenness of the distribution
▪ Disparity: Degree to which the categories are different
Selected Measures of PCD (based on Rao, 1982; Stirling, 1998, 2007; Rafols and Meyer, 2010)
Indices:
▪ Number of cited categories (Variety)
▪ Simpson diversity, measuring a combination of Variety and Balance
▪ Rao-Stirling diversity, incorporating Variety, Balance, and Disparity
A case to compute PCD
[Figure: a focal patent and the patents it cites (Ref1 to Ref4), arranged by patent category (A, B, C)]
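A case of this kind can be worked through in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the forms in which these indices are usually written in the cited literature (Variety = number of categories; Simpson = 1 − Σ p_i²; Rao-Stirling = Σ over i ≠ j of p_i · p_j · d_ij, with d_ij = 1 − s_ij), since the slide's own notation did not survive. The assignment of Ref1 to Ref4 to categories and the similarity values are hypothetical.

```python
from collections import Counter
from itertools import product


def category_shares(cited_categories):
    """Proportion p_i of cited patents falling in each category."""
    counts = Counter(cited_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def variety(cited_categories):
    """Number of distinct cited categories."""
    return len(set(cited_categories))


def simpson_diversity(cited_categories):
    """Assumed form: 1 - sum_i p_i^2 (Variety and Balance)."""
    shares = category_shares(cited_categories)
    return 1.0 - sum(p * p for p in shares.values())


def rao_stirling_diversity(cited_categories, similarity):
    """Assumed form: sum over ordered pairs i != j of p_i * p_j * (1 - s_ij)."""
    shares = category_shares(cited_categories)
    total = 0.0
    for i, j in product(shares, repeat=2):
        if i == j:
            continue  # a category has zero distance to itself
        s_ij = similarity[frozenset((i, j))]
        total += shares[i] * shares[j] * (1.0 - s_ij)
    return total


# Hypothetical case: Ref1 and Ref2 in category A, Ref3 in B, Ref4 in C.
cited = ["A", "A", "B", "C"]
# Hypothetical pairwise similarities s_ij between categories.
similarity = {
    frozenset(("A", "B")): 0.5,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.1,
}

print(variety(cited))
print(simpson_diversity(cited))
print(rao_stirling_diversity(cited, similarity))
```

With these made-up inputs the three indices come out at 3, 0.625, and 0.4375. The Rao-Stirling value is lower than the Simpson value because similar categories contribute less than dissimilar ones.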
Selected Measures of Patent Quality
▪ Times Cited: the most typical indicator of patent quality
▪ Patent H-index: a patent has index h if at least h of its forward-citing patents are each cited at least h times.
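The two quality indicators can be illustrated with a short Python sketch. It assumes the input is a list holding, for each forward-citing patent, the number of times that citing patent is itself cited; the function name and the sample counts are hypothetical.

```python
def patent_h_index(citing_patent_citation_counts):
    """Largest h such that at least h citing patents are each cited >= h times."""
    ranked = sorted(citing_patent_citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical focal patent with six forward-citing patents.
counts = [10, 4, 3, 3, 0, 1]
times_cited = len(counts)       # Times Cited of the focal patent
h_index = patent_h_index(counts)
print(times_cited, h_index)
```

Here Times Cited is 6 and the Patent H-index is 3: three citing patents are each cited at least three times, but there are not four cited at least four times.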
Data source
Database: 2006 edition of the NBER patent database
▪ Advantage: detailed patent classification and multiple generations of patent citations.
▪ Limitations:
− Only incorporates the citation relations among utility patents granted by the USPTO in […]
− Only has basic information on each patent
Sample from two categories:
▪ IPC4 = G01N: Measuring chemical, physical properties (MCPP)
▪ IPC4 = G02B: Optical measurement (OM)
Timespan: Grant Year from 1996 to 2006
Country: First Assignee's Country
Patent Categories
▪ Patent category: 4-digit International Patent Classification (IPC4)
▪ The main IPC4 is adopted as the unique IPC4 of each patent.
▪ A finer classification will lead to higher diversity measures.
▪ Threshold: Number of cited categories > 2
▪ Patent category similarity matrix: square root of the cosine similarity between IPC4s (constructed by Rafols, 2011)
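How a square-root cosine similarity matrix might be built can be sketched as follows. The slides do not say which vectors the cosine is taken over, so this example assumes each IPC4 is described by a profile of citation counts to a common set of categories; the profiles, values, and function names are hypothetical and are not the matrix constructed by Rafols (2011).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-negative count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_u * norm_v)


def sqrt_cosine_similarity_matrix(profiles):
    """Map each ordered pair of categories to sqrt(cosine similarity)."""
    categories = sorted(profiles)
    return {
        (a, b): math.sqrt(cosine_similarity(profiles[a], profiles[b]))
        for a in categories
        for b in categories
    }


# Hypothetical citation profiles: counts of citations from each IPC4
# to three reference categories.
profiles = {
    "G01N": [10, 2, 0],
    "G02B": [2, 8, 1],
    "H01M": [0, 1, 9],
}

sim = sqrt_cosine_similarity_matrix(profiles)
for pair, value in sim.items():
    print(pair, round(value, 3))
```

Taking the square root raises every similarity strictly between 0 and 1, which compresses the distances 1 − s_ij used in the Rao-Stirling index.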
Regression Variables
Unit of analysis: Individual patent
Timespan: […]
Variables:
▪ Dependent Variable: Times Cited
▪ Explanatory Variables: # of cited categories; Simpson diversity; Rao-Stirling diversity
▪ Control Variables: Grant Year; First Assignee's country; Number of Patent References
Temporal Change of PCD
The number of cited IPC4s increases modestly over time for both MCPP and OM.
Temporal Change of PCD
The Simpson diversity of MCPP seems to be increasing in small steps, from 0.62 to 0.66, whereas that of OM shows no significant change.
Temporal Change of PCD
▪ Annual Rao-Stirling diversity range: 0.55 to 0.59
▪ No clear trend in Rao-Stirling diversity can be concluded for MCPP or OM.
MCPP vs. OM
        Times Cited   N. Cited IPC4   Simpson   Rao-Stirling
MCPP    L             H               H         L*
OM      H             L               L         H*
Note: H: higher; L: lower. * Rao-Stirling diversity for OM is higher than that for MCPP in 6 of 9 years.
▪ Patent citations and diversity measures vary across technology fields.
▪ Even though the patents in MCPP receive fewer citations than those in OM, the patents cited by MCPP patents span more distinct categories and show higher Simpson diversity and slightly lower Rao-Stirling diversity.
Initial Estimation of the Effect of PCD on Patent Citations
Scatter diagram: Times Cited vs. Simpson diversity
▪ The scatter looks like a cloud: a bell shape with its top leaning to the right.
Times Cited
Times Cited of a given patent is count data (with many zeros); its frequency distribution follows a power law.
Data Source: USA-assigned patents in the field of Optical measurement granted in 1996, NBER Patent Database
Regression Model
Why do we choose the Zero-inflated Negative Binomial (ZINB) regression model?
▪ Ordinary Least Squares regression? Count data are highly non-normal.
▪ Zero-inflated Poisson regression? Times Cited is overdispersed, i.e., its variance is much larger than its mean.
▪ Ordinary count models? Too many zeros.
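The two symptoms named above, overdispersion and excess zeros, can be screened for with a few descriptive statistics before fitting any model. The sketch below is a rough illustration on hypothetical citation counts, not the authors' procedure and not a formal test: it compares the sample variance with the mean, and the observed share of zeros with the share a Poisson distribution with the same mean would predict, exp(−mean).

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Descriptive screen for overdispersion and excess zeros in count data."""
    m = mean(counts)
    var = variance(counts)  # sample variance
    return {
        "mean": m,
        "variance": var,
        "dispersion_ratio": var / m,          # about 1 under a Poisson model
        "observed_zero_share": counts.count(0) / len(counts),
        "poisson_zero_share": math.exp(-m),   # P(X = 0) for Poisson(mean)
    }


# Hypothetical Times Cited values: many zeros and a long right tail.
times_cited = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 20, 40]

diag = count_diagnostics(times_cited)
for name, value in diag.items():
    print(f"{name}: {value:.4f}")
```

A dispersion ratio well above 1 argues against Poisson-type models, and an observed zero share far above the Poisson prediction suggests considering a zero-inflated specification. Whether the zero-inflated variant is actually needed on top of a negative binomial model is a question for a formal model comparison.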
Results of ZINB
Table: Results of the ZINB models on Times Cited for OM
Dependent Variable: Times Cited
First Assignee's Country: USA
Columns: Gyear; N obs.; Correlation Coefficient (Intercept, N. PatRef, N. CitedIPC4, Simpson, Log(theta)), each with Coef and Sig.; -lnL; Chi-Squared Test (Chisq, P(>Chisq))
[Table body not recoverable from the extracted slide]
Note:
(1) *** sig. 0.01, ** sig. 0.05, * sig. 0.1
(2) N. PatRef: Number of patent references; N. CitedIPC4: Number of cited categories; lnL: Log likelihood
(3) The ZINB regression is run in R, a software environment for statistical computing and graphics (downloaded at […])
Results of ZINB
Table: Results of the ZINB models on Times Cited for MCPP
Dependent Variable: Times Cited
First Assignee's Country: USA
Columns: Gyear; N obs.; Correlation Coefficient (Intercept, N. PatRef, Stirling, Log(theta)), each with Coef and Sig.; -lnL; Chi-Squared Test (Chisq, P(>Chisq))
[Table body not recoverable from the extracted slide]
Note:
(1) *** sig. 0.01, ** sig. 0.05, * sig. 0.1
(2) N. PatRef: Number of patent references; Stirling: Rao-Stirling diversity; lnL: Log likelihood
(3) The ZINB regression is run in R, a software environment for statistical computing and graphics (downloaded at […])
Effect of PCD on TC
▪ The number of cited categories has a modest positive effect on Times Cited (TC).
▪ Simpson diversity has a slightly negative correlation with TC.
▪ The effect of Rao-Stirling diversity on TC depends upon the categories.
Patent indicators of PCD: correlation of PCD vs. Times Cited
Category   Indicator       Positive   Negative   Significant   Relation   Sig. Chisq
MCPP       N. Cited IPC    …          …          …             …          …
MCPP       Simpson         2          7          5             -          …
MCPP       Rao-Stirling    …          …          …             …          …
OM         N. Cited IPC    …          …          …             …          …
OM         Simpson         0          9          6             -          …
OM         Rao-Stirling    …          …          …             …          …
Discussion (1)
▪ Different diversity measures show different effects on citations.
▪ Diversity differs slightly between technology fields.
▪ In both MCPP and OM, the number of cited categories (Variety) slightly favors patent quality, while Simpson diversity (incorporating both Variety and Balance) has a modest negative effect.
▪ Rao-Stirling diversity (comprising Variety, Balance, and Disparity) shows opposite effects on TC for MCPP and OM.
Discussion (2)
The effect of PCD on patent quality depends upon the categories.
▪ The correlations for "Electric battery" (IPC4 = H01M), "Electrography" (IPC4 = G03G), and "Medical preparations, toiletries" (IPC4 = A61K) are less significant than those for MCPP and OM.
The results vary across patent category systems.
▪ A finer classification leads to higher diversity measures.
▪ PCD has no significant effect on citations if the NBER technology category system (Hall et al., 2001), a coarser system, is used as the patent category.
Limitations and further research
Limitations:
▪ Patent category diversity is measured on the basis of predefined categories (IPC4), which are problematic.
▪ Patent citations only include the citation relations among utility patents granted by the USPTO. Patents that are not yet granted, or that were granted by other patent offices, are not considered.
▪ Due to the limitations of the NBER patent database, we currently use only Times Cited and Patent H-index as indices of patent quality.
Further research:
▪ A more appropriate patent category system
▪ A case study in another patent database (e.g., EPO)
References
Chen, C., & Hicks, D. (2004). Tracing knowledge diffusion. Scientometrics, 59(2).
Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2001). The NBER patent citation data file: Lessons, insights and methodological tools. NBER Working Paper 8498.
Bessen, J. (2009). Matching Patent Data to Compustat Firms. NBER PDP Project User Documentation.
Porter, A. L., Cohen, A. S., Roessner, J. D., & Perreault, M. (2007). Measuring researcher interdisciplinarity. Scientometrics, 72(1), 117–147.
Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience. Scientometrics, 82.
Rao, C. R. (1982). Diversity and dissimilarity coefficients: A unified approach. Theoretical Population Biology, 21, 24–43.
Stirling, A. (1998). On the economics and analysis of diversity. SPRU Electronic Working Paper. sewp28/sewp28.pdf
Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707–719.
Yegros, A., Amat, C. B., D'Este, P., Porter, A. L., & Rafols, I. (2011). Does interdisciplinary research lead to higher scientific impact? Atlanta Conference on Science and Innovation Policy. Final.pdf
Thank you!