Patent Citation Networks
Bernard Gress
Fannie Mae Inc., Washington DC.
Forthcoming in The Mathematica Journal
The Patent Citation Dataset
Patent citations are part of the legal patent process, in which the patent applicant has a duty to disclose any knowledge of 'prior art' among previous patents. Some objectivity is provided by the government patent examiner, who is supposed to be an expert in the area and who approves the final citations. The network established by patent citations allows one to trace the flow of technology through time, from patent to patent, and across fields. Patent citations can therefore assist studies of technological spillover effects, the impact or influence of individual patents, rates of technological development, and similar issues.
The Patent Citation Dataset - continued
Source: Hall, Jaffe, and Trajtenberg, and the National Bureau of Economic Research (NBER) [1].
The primary database (cite75_99.zip) contains 22,309,440 pairwise patent citations on more than 3 million U.S. patents granted between January 1963 and December …
The secondary database (pat63_02f.txt) contains records for 3,414,910 patents with 25 fields each.
Structure of Primary Database (cite75_99.zip)
Structure of Secondary Database (pat63_02f.txt)
Patent Numbers Issued Serially
Two Types of Citation Networks
– A Citation Lineage: all of the progenitors and descendants of a patent by citation reference, with no siblings brought into the picture.
– A Citation Neighborhood: all patents within a specified network distance of the patent of interest, regardless of relationship, including all 'siblings' and 'cousins'.
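The difference between the two network types can be illustrated with a minimal Python sketch. It is not the PatentLineage or PatentNeighborhood code from the talk; the function names, the `(citing, cited)` pair layout, and the toy patents A to E are all assumptions made for illustration. A lineage follows citations only backward (progenitors) or only forward (descendants) from the patent of interest, while a neighborhood follows edges in either direction at every step, which is what brings siblings in.

```python
from collections import defaultdict, deque

def build_graph(citations):
    """Index (citing, cited) pairs in both directions."""
    cites = defaultdict(set)      # patent -> patents it cites (progenitors)
    cited_by = defaultdict(set)   # patent -> patents that cite it (descendants)
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by

def _reach(start, step, generations):
    """Breadth-first search: everything within `generations` hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == generations:
            continue
        for nxt in step(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

def lineage(citations, patent, generations):
    """Progenitors and descendants only; direction never reverses."""
    cites, cited_by = build_graph(citations)
    ancestors = _reach(patent, lambda p: cites.get(p, ()), generations)
    descendants = _reach(patent, lambda p: cited_by.get(p, ()), generations)
    return ancestors | descendants

def neighborhood(citations, patent, generations):
    """Everything within the given network distance, in any direction."""
    cites, cited_by = build_graph(citations)
    both = lambda p: cites.get(p, set()) | cited_by.get(p, set())
    return _reach(patent, both, generations)

# Hypothetical toy data: C cites A and B, D also cites A, E cites C.
EXAMPLE_CITATIONS = [("C", "A"), ("C", "B"), ("D", "A"), ("E", "C")]
lineage_c = lineage(EXAMPLE_CITATIONS, "C", 2)            # A, B, C, E
neighborhood_c = neighborhood(EXAMPLE_CITATIONS, "C", 2)  # A, B, C, D, E
```

In the toy data, D shares the progenitor A with C, so D is a sibling: it appears in the 2-generation neighborhood of C but not in its lineage. At one generation the two sets coincide.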
There are 14 nodes for the 1-generation lineage of patent # : PatentLineage[ ,1]
– PatentsOfInterest → { }
– PrintRules → {1 → , 2 → , 3 → , 4 → , 5 → , 6 → , 7 → , 8 → , 9 → , 10 → , 11 → , 12 → , 13 → , 14 → , 15 → }
– Relations → { → , → , → , → , → , → , → , → , → , → , → , → , → , → }
– Vertexes → { , , , , , , , , , , , , , , }
– IndexPairs → {{1,10}, {1,11}, {1,12}, {1,13}, {1,14}, {1,15}, {2,1}, {3,1}, {4,1}, {5,1}, {6,1}, {7,1}, {8,1}, {9,1}}
– IndexRules → {1 → 10, 1 → 11, 1 → 12, 1 → 13, 1 → 14, 1 → 15, 2 → 1, 3 → 1, 4 → 1, 5 → 1, 6 → 1, 7 → 1, 8 → 1, 9 → 1}
There are 15 nodes for the 1-generation neighborhood of patent # : PatentNeighborhood[ ,1]
– PatentsOfInterest → { }
– PrintRules → {1 → , 2 → , 3 → , 4 → , 5 → , 6 → , 7 → , 8 → , 9 → , 10 → , 11 → , 12 → , 13 → , 14 → , 15 → }
– Relations → { → , → , → , → , → , → , → , → , → , → , → , → , → , → }
– Vertexes → { , , , , , , , , , , , , , , }
– IndexPairs → {{1,10}, {1,11}, {1,12}, {1,13}, {1,14}, {1,15}, {2,1}, {3,1}, {4,1}, {5,1}, {6,1}, {7,1}, {8,1}, {9,1}}
– IndexRules → {1 → 10, 1 → 11, 1 → 12, 1 → 13, 1 → 14, 1 → 15, 2 → 1, 3 → 1, 4 → 1, 5 → 1, 6 → 1, 7 → 1, 8 → 1, 9 → 1}
Mathematica has nice built-in graph visualization functions for unstructured graphs:
– GraphPlot
– GraphPlot3D
– ShowGraph
To plot graphs over time, however, I have to use my own function: PatentPlot
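One way to think about plotting a citation graph over time is to fix each node's horizontal coordinate to its grant year and spread patents from the same year vertically. The Python sketch below only computes such coordinates. It is not the PatentPlot function from the talk, whose internals are not shown; `time_layout`, the grant-year dictionary, and the toy patents are hypothetical.

```python
from collections import defaultdict

def time_layout(grant_year):
    """Map each patent to (x, y): x is its grant year, and y stacks
    patents granted in the same year so that they do not overlap."""
    by_year = defaultdict(list)
    for patent in sorted(grant_year):
        by_year[grant_year[patent]].append(patent)
    positions = {}
    for year, patents in by_year.items():
        offset = (len(patents) - 1) / 2.0   # centre each column on y = 0
        for i, patent in enumerate(patents):
            positions[patent] = (year, i - offset)
    return positions

# Hypothetical grant years for five toy patents.
EXAMPLE_YEARS = {"A": 1975, "B": 1975, "C": 1980, "D": 1982, "E": 1985}
layout = time_layout(EXAMPLE_YEARS)
```

The resulting coordinates can be handed to any plotting routine that accepts explicit vertex positions, so that citation edges always run from later patents back to earlier ones along the time axis.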
Citation Networks Over Time - continued The 2-Generation Lineage of
Citation Networks Over Time - continued The 2-Generation Neighborhood of
GraphPlot[PatentNeighborhood[ { , }, 2]]
A nice illustration of the spread of technology over time.
Coloring nodes by criteria
I also add functions to color nodes and edges by different patent characteristics, e.g.
– Patent Technology Category (2- and 4-digit HJT)
– Patent Originality/Generality Index
– Total Number of Citations
GraphPlot3D[PatentNeighborhood[ , 7]]
GraphPlot[PatentNeighborhood[ ,12]] Colored by technology category
Time Constrained The 7-Generation Neighborhood of # , Colored by Technology Class
Network Statistics and Structure Analysis
– Citation Lags
– Network Curvature
– Citation Count Distributions
– HJT Technology Categories
– Originality and Generality
Distributions of Backward Lags
Network Curvature
The average number of patents reached at each successive network distance, illustrated with some simple graphs and their respective curvature plots.
Network Curvature the average number of patents reached at subsequent network distances
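The curvature measure can be sketched in Python as follows. This is one plausible reading of "the average number of patents reached at subsequent network distances", not the author's implementation: it treats citations as undirected edges and counts, for each distance, the nodes first reached at exactly that distance, averaged over all starting nodes. Both choices, and the function names, are assumptions.

```python
from collections import defaultdict, deque

def shell_sizes(adjacency, start):
    """Count the nodes first reached at distance 1, 2, ... from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    counts = defaultdict(int)
    for d in dist.values():
        if d > 0:
            counts[d] += 1
    return counts

def curvature_profile(edges):
    """Average shell size at each distance, over every starting node."""
    adjacency = defaultdict(set)
    for a, b in edges:            # assumption: edges treated as undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(adjacency)
    totals = defaultdict(int)
    for node in nodes:
        for d, count in shell_sizes(adjacency, node).items():
            totals[d] += count
    return {d: totals[d] / len(nodes) for d in sorted(totals)}

# Two simple graphs: a 4-node path and a 3-leaf star.
path_profile = curvature_profile([(1, 2), (2, 3), (3, 4)])
star_profile = curvature_profile([(0, 1), (0, 2), (0, 3)])
```

For the path the profile falls off with distance (1.5, 1.0, 0.5), while for the star it stays flat over its two distances (1.5, 1.5). Plotting these averages against distance gives a curve of the kind the slides call a curvature plot.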
A much larger network of 91,000 patents over 40 years Curvature graphs for each year
Curvature graphs for each year, all together
Curvature graphs for each year, all together, different view
Patent Technological Composition
HJT Technology Category Distribution
Cumulative distribution of patents by tech category
Citation Count Distributions
Citation Count Distributions - continued
Generality and Originality
The 'Generality' of patent i is

$$\text{Generality}_i = 1 - \sum_{j=1}^{J} \left( \frac{N_{i,j}}{N_i} \right)^2$$

where J is the number of patent classes, N_i is the total number of forward citations for patent i, and N_{i,j} is the number of forward citations in patent class j for patent i. The second term is a Herfindahl-type index. The 'Originality' of patent i is the same, except with backward citations (i.e. citations made). "Thus if a patent cites previous patents that belong to a narrow set of technologies, the originality score will be low, whereas citing patents in a wide range of fields would render a higher score." [1]
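A minimal Python sketch of both indices follows. It assumes the index has the form 1 minus the sum of squared class shares, which is what the symbol definitions and the Herfindahl remark suggest. The function names and the class labels are hypothetical, and each input is simply a list with one patent class per citation.

```python
from collections import Counter

def herfindahl_diversity(classes):
    """1 - sum_j (N_ij / N_i)^2, given one patent class per citation."""
    total = len(classes)              # N_i
    if total == 0:
        return None                   # undefined for a patent with no citations
    counts = Counter(classes)         # N_ij for each class j
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def generality(forward_citation_classes):
    """Classes of the patents that cite patent i (citations received)."""
    return herfindahl_diversity(forward_citation_classes)

def originality(backward_citation_classes):
    """Classes of the patents that patent i cites (citations made)."""
    return herfindahl_diversity(backward_citation_classes)

# Hypothetical class labels.
narrow = generality(["chem", "chem", "chem", "chem"])    # all in one class
mixed = generality(["chem", "chem", "elec", "elec"])     # two classes, evenly
broad = originality(["chem", "elec", "mech", "drugs"])   # four classes
```

A patent whose citations all fall in one class scores 0, an even split over two classes scores 0.5, and an even split over four classes scores 0.75, so the score rises as citations spread across more fields.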
Generality and Originality - Continued
The results are not very interesting: there are no trends over time, and seemingly no necessary relationship to the concepts the indices are intended to measure.
Conclusions
– Mathematica is a nice platform for network analysis
– There is a lot of opportunity for research in this area
– I don't know what the value of this research is to the IPI-ConfEx clientele
References
[1] B. Hall, A. Jaffe, and M. Trajtenberg, "The NBER Patent Citations Data File: Lessons, Insights and Methodological Tools," 2002, NBERpatdata.pdf
[2] S. Wolfram, A New Kind of Science, 2002