Published by Barnard McBride. Modified over 9 years ago.
1
Bio-REGNET
Developing an Ontology for the U.S. Patent System
Siddharth Taduri, Hang Yu, Gloria T. Lau, Kincho H. Law, Jay P. Kesan
Stanford University, University of Illinois Urbana-Champaign
06/13/2011
2
Problem Statement
Questions of patent validity and enforcement involve analysis of documents in various domains: world-wide patents, PTO file wrappers, scientific publications and court documents
The information is siloed in several diverse information sources
[Figure: Issued Patents and Applications; Court Cases; File Wrappers; Technical Publications; Regulations and Laws]
3
Problem Statement
The sources are diverse in structure, format, semantics and syntax
How can one develop comprehensive knowledge of the patents in a particular technological space?
[Figure: Issued Patents and Applications; Court Cases; File Wrappers; Technical Publications; Regulations and Laws; Specific Technical Domain]
4
Patent Documents
Over 7 million U.S. patents
In 2009, 485,312 patent applications were filed
Information is contained in various sections of the documents; a full-text search alone is not sufficient, and other metrics such as classification and citations need to be considered
Documents are available in HTML format and can be easily parsed
5
Court Cases
Court cases are not very well structured, so information is comparatively more difficult to parse
PACER, an electronic system for accessing U.S. court databases, requires one to know the party/assignee name, case number/type, etc., which may not be known
Example excerpt:
927 F.2d 1200 (1991)
AMGEN, INC., Plaintiff/Cross-Appellant, v. CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants.
Nos. 90-1273, 90-1275.
United States Court of Appeals, Federal Circuit.
March 5, 1991. Suggestion for Rehearing Declined May 20, 1991.
…
Before MARKEY, LOURIE and CLEVENGER, Circuit Judges.
…
THE PATENTS
On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and compositions thereof and a method for purifying human EPO using reverse phase high performance liquid chromatography. The method claims are not before us. The relevant claims of the '195 patent are:
1. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least 160,000 IU per absorbance unit at 280 nanometers.
* * * * * *
3. A pharmaceutical composition for the treatment of anemia comprising a therapeutically effective amount of the homogeneous erythropoietin of claim 1 in a pharmaceutically acceptable vehicle.
4. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least about 160,000 IU per absorbance unit at 280 nanometers.
6
Patent File Wrappers
File wrappers are folders that contain all documents exchanged between a patent applicant and the patent office
Every file wrapper is different: there is no standardized ordering of events
The relevant information is embedded within a large amount of irrelevant text
File wrappers are available as images, requiring additional processing in order to extract text
[Figure labels: Events; Text]
7
Cross-Referencing
There are many aspects of these documents that can be utilized, especially the cross-referencing between the documents
PATENT
United States Patent 5,955,422, September 21, 1999
Production of erythropoietin
Abstract: Disclosed are novel polypeptides possessing part or all of the primary structural conformation and one or more of the biological properties of mammalian erythropoietin ("EPO") …
Inventors: Lin; Fu-Kuen (Thousand Oaks, CA)
Assignee: Kirin-Amgen, Inc. (Thousand Oaks, CA)
Appl. No.: 08/100,197
Filed: August 2, 1993
COURT CASE
314 F.3d 1313 (2003)
AMGEN INC., Plaintiff-Cross Appellant v. HOECHST MARION ROUSSEL, INC. (now known as Aventis Pharmaceuticals, Inc.) and Transkaryotic Therapies, Inc., Defendants-Appellants.
…
Plaintiff-Cross Appellant Amgen Inc. is the owner of numerous patents directed to the production of erythropoietin ("EPO"), … alleging that TKT's Investigational New Drug Application ("INDA") infringed United States Patent Nos. 5,547,933; 5,618,698; and 5,621,080. The complaint was amended in October 1999 to include United States Patent Nos. 5,756,349 and 5,955,422, which issued after suit was filed.
FILE WRAPPER
U.S. Patent 5,955,422
… Claims 61-63 are rejected under 35 U.S.C. § 103 as being unpatentable over any one of Miyake et al., 1977 (R) …
In accordance with the provisions of 37 C.F.R. § 1.607, the present continuation is being filed for the purpose of …
Publication Database
REGULATIONS: U.S. Code Title 35, C.F.R. Title 37, M.P.E.P. …
BIOPORTAL: DOMAIN KNOWLEDGE
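The cross-references on this slide can be picked up mechanically. The sketch below is only an illustration, not the authors' script: it assumes patent numbers appear in the comma-grouped form used in the court-case excerpt, and the case identifier and the `P` prefix on patent names are hypothetical. Only the property name `patentsInvolved` comes from the queries later in the deck.

```python
import re

# Seven-digit U.S. patent numbers written as d,ddd,ddd (an assumption
# about the input format; older or reissued patents would need more cases).
PATENT_NUMBER = re.compile(r"\b\d,\d{3},\d{3}\b")


def extract_patent_numbers(text):
    """Return the distinct patent numbers in text, in order of appearance."""
    seen = []
    for match in PATENT_NUMBER.findall(text):
        number = match.replace(",", "")
        if number not in seen:
            seen.append(number)
    return seen


excerpt = (
    "infringed United States Patent Nos. 5,547,933; 5,618,698; and "
    "5,621,080. The complaint was amended in October 1999 to include "
    "United States Patent Nos. 5,756,349 and 5,955,422, which issued "
    "after suit was filed."
)

numbers = extract_patent_numbers(excerpt)

# Hypothetical case identifier; each link becomes one cross-domain triple.
case_id = "Case_314_F3d_1313"
links = [(case_id, "patentsInvolved", "P" + n) for n in numbers]
```

Each resulting tuple relates a court case to a patent, which is the kind of cross-domain link the ontology is meant to capture.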
8
Basis for Developing a Patent System Ontology
Established semantics allow us to reason over the classes, properties and instances to infer new facts
Documents can be connected to form a network similar to a citation network, but with more than citations: other metadata such as co-inventorship, technological classification and cross-domain relevancy metrics between documents (e.g., patents occurring in court cases)
This allows link analysis using algorithms such as PageRank to establish importance
Rules can be developed to perform additional inferences over the knowledge
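As a rough illustration of the link-analysis idea, here is a minimal PageRank sketch using power iteration. The damping factor, the iteration count and the toy network are assumptions made for this example; the slides do not say how the authors configure or run PageRank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy network: a court case and two patents all point to patent_A.
links = {
    "case_1": ["patent_A", "patent_B"],
    "patent_B": ["patent_A"],
    "patent_C": ["patent_A"],
    "patent_A": [],
}
ranks = pagerank(links)
most_important = max(ranks, key=ranks.get)
```

In a real network the edges could mix citations, co-inventorship and court-case references, possibly with different weights per edge type.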
9
Competency Questions
Single domain:
Return all patent documents which contain the keyword “erythropoietin” in the “claims”
Return all court cases which involve “Amgen_Inc” as the plaintiff, the defendant or both, and which are from the court “courtA”
Multi-domain:
Return all patents which contain the keyword “erythropoietin” in the “claims” and which have been challenged in the courts
The complexity of the queries depends on the user’s requirements
In general, the ontology should be able to answer:
1. Textual queries
2. Metadata queries, with numeric filters
3. Multi-source queries
10
Class Hierarchy - I
11
Class Hierarchy - II
12
Class Hierarchy - III
13
Parsing the Documents to Instantiate the Ontology
Documents are automatically parsed using a regular-expression-based script
Separate scripts are needed for each document domain
The ontology is automatically instantiated using the Protégé-OWL API
[Figure: instance “Case 1” linked to “Amgen..” by hasPlaintiff and to “Chugai..” by hasDefendant]
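The regular-expression step might look like the sketch below. It is not the authors' script: the pattern is an assumption that fits only captions shaped like the Amgen v. Chugai excerpt shown earlier, and the output is a list of plain tuples rather than calls to the Protégé-OWL API. The case identifier `Case_1` mirrors the figure label; the property names `hasPlaintiff` and `hasDefendant` come from the slide.

```python
import re

# Assumed caption shape: "<plaintiff>, Plaintiff..., v. <defendants>, Defendant..."
CAPTION = re.compile(
    r"(?P<plaintiff>.+?),\s*Plaintiff[^,]*,\s*v\.\s*"
    r"(?P<defendants>.+?),\s*Defendants?",
    re.DOTALL,
)


def parse_caption(case_id, caption):
    """Turn a case caption into (subject, property, value) tuples."""
    match = CAPTION.search(caption)
    if match is None:
        return []
    triples = [(case_id, "hasPlaintiff", match.group("plaintiff").strip())]
    # Multiple defendants are assumed to be joined by ", and".
    for name in re.split(r",\s+and\s+", match.group("defendants")):
        triples.append((case_id, "hasDefendant", name.strip()))
    return triples


caption = (
    "AMGEN, INC., Plaintiff/Cross-Appellant, v. CHUGAI PHARMACEUTICAL CO., "
    "LTD., and Genetics Institute, Inc., Defendants-Appellants."
)
triples = parse_caption("Case_1", caption)
```

Because company names themselves contain commas, a single pattern like this is fragile, which fits the slide's point that each document domain needs its own script.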
14
What can you ask the Patent Ontology?
Simple questions can be answered by currently existing systems:
  Return all patents by the inventor “Fu-Kuen Lin”
  Return all court cases prior to yyyy-mm-dd
  Return all the patent documents which contain the keyword “erythropoietin” in the claims and are assigned to “Amgen_Inc”
The Patent System Ontology is intended to answer simple queries as well as complex queries which span more than one information domain:
  Return a court case which involves 3 or more patents
  From a file wrapper, identify the patents involved in an interference and display information about the inventor, assignee and claims of each patent. Further, list the other patents the inventor owns, if any.
Note: The patent system ontology allows inferring details about one document type (patents) based on information from other document types (file wrappers)
15
Example Query
Return all the patent documents which contain the keyword “erythropoietin” in the Claims and Assigned to “Amgen_Inc”. What technology classes do these patent documents belong to?
SPARQL query (the graph IRI after FROM is missing in the source):
  SELECT DISTINCT ?patent ?inventor
  FROM <…>
  WHERE {
    ?patent a ont:Patent .
    ?patent ont:hasAbstract ?abs .
    ?abs ont:resourceVal ?val .
    ?val bif:contains "erythropoietin" .
    ?patent ont:hasAssignee ont:Amgen_Inc .
    ?patent ont:hasInventor ?inventor
  }
  LIMIT 10
Results:
  Patent    Inventor
  5856298   Strickland_Thomas_W
  5885574   Elliott_Steven_G
  7304150   Egrie_Joan_C
  7304150   Elliott_Steven_G
  7304150   Browne_Jeffrey_K
  7304150   Sitney_Karen_C
  7217689   Elliott_Steven_G
  7217689   Byrne_Thomas_E
  6319499   Elliott_Steven_G
  5756349   Lin_Fu-Kuen
16
So Far …
54 classes, 40 properties and over 15,000 individuals from 1150 patents, 30 court cases and one partially instantiated file wrapper
Used Protégé-OWL to edit the ontology and the Protégé-OWL API to programmatically instantiate physical documents
Can query any SPARQL endpoint such as Protégé or Virtuoso’s triple store
Can also use SWRL to query (we have not yet developed SWRL query rules)
17
Use-Case: Erythropoietin
5 core patents: U.S. Patents 5,621,080, 5,756,349, 5,955,422, 5,547,933, 5,618,698
135 directly related patents (through citations) form our gold standard for computing formal measures such as precision and recall
Total patent corpus of 1150 patents
Identified over 3000 related publications through citations. These are available on PubMed and can be accessed through Entrez, a tool that provides a search interface to the PubMed database
Around 30 court cases: patent litigation involving major companies including Amgen, Hoechst Marion Roussel, Inc. and Transkaryotic Therapies, Inc.
Current corpus: an experimental platform to test the overall effectiveness of the framework
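Precision and recall against a gold standard can be computed with plain set arithmetic, as in the sketch below. The sets are toy data chosen for this example: the five core patents stand in for the 135-patent gold standard, and the retrieved set is hypothetical rather than an actual result from the framework.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a gold standard."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Toy gold standard: the five core patents listed on the slide.
gold = {"5621080", "5756349", "5955422", "5547933", "5618698"}
# Hypothetical search result: two core patents plus two others.
retrieved = {"5621080", "5756349", "4703008", "4677195"}

precision, recall = precision_recall(retrieved, gold)
```

Here two of the four retrieved patents are in the gold set, and two of the five gold patents were found.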
18
Querying BioPortal to Extract Concepts and Terms
19
Expanded Query
Original Term: Erythropoietin
Synonyms: Erythropoietin, Recombinant Erythropoietin, erythropoietin receptor binding, Hematopoietin, Recombinant EPO, Erythrocyte Colony Stimulating Factor, Epoetin, EPO …
Children: Darbepoetin Alfa, Epoetin Alfa, Epoetin Beta …
Parents: Colony Stimulating Factors, cytokine receptor binding, recombinant hematopoietic growth factors …
Grand-Parents: hematopoietic growth factor, receptor binding, recombinant growth factor …
An appropriate ranking function is to be applied to balance the more general terms. Heuristically, we assign a higher weight to synonyms, and a lower weight as we traverse away from the concept node
Resulting Query: “original term” OR [synonyms]^weight OR [children]^weight OR …
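The query template above can be assembled as in the following sketch. The weight values are invented for illustration (the slide only says synonyms get a higher weight and more distant terms a lower one), and the `"term"^weight` output follows the slide's notation rather than any specific search engine's syntax.

```python
def expand_query(term, related, weights):
    """Build '"term" OR "synonym"^w OR "child"^w ...' from related terms.

    related maps a relation name to a list of terms;
    weights maps the same relation names to a boost value.
    """
    clauses = ['"{}"'.format(term)]
    for relation, terms in related.items():
        weight = weights[relation]
        for related_term in terms:
            clauses.append('"{}"^{}'.format(related_term, weight))
    return " OR ".join(clauses)


# A small subset of the terms listed on the slide.
related = {
    "synonyms": ["Epoetin", "EPO"],
    "children": ["Epoetin Alfa"],
    "parents": ["Colony Stimulating Factors"],
}
# Hypothetical weights that decrease with distance from the concept node.
weights = {"synonyms": 0.8, "children": 0.5, "parents": 0.3}

expanded = expand_query("Erythropoietin", related, weights)
```

The original term carries no explicit weight, so it keeps the default and outranks every expansion term.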
20
Current Prototype Framework
1. Use bio-ontologies to expand the user’s query, covering broader terms and concepts
2. Search the document domain using the expanded query
3. Use the patent system ontology’s properties to relate documents (from all document domains)
4. Support user feedback to ensure the search progresses in the right direction
[Figure label: Patent System Ontology]
21
Querying with SPARQL
  SELECT ?subject ?predicate ?object
  WHERE { ?subject ?predicate ?object }
[Diagram labels: Variables; Operation; Triples]
SPARQL is a query language for RDF
Syntactically very similar to SQL for relational databases
Any number of variables can be specified
Many triples can be used in conjunction to form more complex queries
We will use Virtuoso’s triple store to query the ontology
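To make "many triples used in conjunction" concrete, the sketch below evaluates a conjunction of triple patterns over an in-memory list of tuples. It is a teaching aid only, not how Virtuoso or any SPARQL engine is implemented, and the data are toy examples that reuse property names from the deck.

```python
def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with '?' are variables. Returns the extended
    bindings, or None if the pattern does not fit the triple.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant term does not match
    return result


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                candidate = match(pattern, triple, bindings)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Toy data; which case involves which patent is made up for the example.
triples = [
    ("Case_1", "patentsInvolved", "P5955422"),
    ("Case_1", "hasPlaintiff", "Amgen_Inc"),
    ("P5955422", "hasInventor", "Lin_Fu-Kuen"),
    ("P4677195", "hasInventor", "Hewick_Rodney_M"),
]
patterns = [
    ("?case", "patentsInvolved", "?patent"),
    ("?patent", "hasInventor", "?inventor"),
]
solutions = query(patterns, triples)
```

The shared variable `?patent` is what joins the two patterns, just as a shared variable joins triple patterns inside a SPARQL WHERE clause.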
22
Court Cases with “Erythropoietin”
  SELECT DISTINCT ?cases
  WHERE {
    ?cases a :CourtCase .
    ?cases :hasBody ?caseBody .
    ?caseBody :resourceVal ?comment .
    FILTER REGEX(?comment, "erythropoietin", "i")
  }
Results (30 cases retrieved):
  Case_4: Amgen v/s Chugai …
  Case_5: Amgen v/s Genetics …
  Case_2: Amgen v/s Chugai …
  Case_3: Amgen v/s F. Hoffma…
  …
23
Patents Involved in the Court Cases
  SELECT DISTINCT ?patents
  WHERE {
    ?cases a :CourtCase .
    ?cases :hasBody ?caseBody .
    ?caseBody :resourceVal ?comment .
    FILTER REGEX(?comment, "erythropoietin", "i")
    ?cases :patentsInvolved ?patents .
  }
Results (core patents, shown in bold on the slide, are the five listed with a title):
  5411868
  5621080: Production of Erythropoietin
  5547933: Production of Erythropoietin
  5618698: Production of Erythropoietin
  5756349: Production of Erythropoietin
  5955422: Production of Erythropoietin
  5441868
  4703008
  4677195
  5322837
24
List of Events in the File Wrapper
  SELECT DISTINCT ?doc
  WHERE {
    :FileWrapper_5955422 :contains ?doc .
    ?doc :hasDate ?date
  }
  ORDER BY ?date
Results:
  07_609741
  07_609741_Amendment_1
  07_609741_Interference_1
  07_609741_Rejection_1
  07_957073_Amendment_1
  …
  P5955422 (Issued Patent)
25
Initial Claims of File Wrapper
  SELECT DISTINCT ?claim
  WHERE {
    :07_609741 :hasClaim ?claim .
  }
  ORDER BY ?claim
Results:
  07_609741_claim_1
  07_609741_claim_2
  07_609741_claim_3
  …
  07_609741_claim_60
Claim text shown on the slide:
  A purified and isolated polypeptide having part or all of the primary structural conformation and one or more of the biological properties of naturally occurring erythropoietin and characterized by being the product of procaryotic or eucaryotic expression of an exogenous DNA sequence.
26
Summary of Interference Record
  SELECT DISTINCT ?claim
  WHERE {
    :07_609741_Interference_1 :InterferingClaims ?claimInt .
    :07_609741_Interference_1 :affectedClaims ?claim .
  }
  ORDER BY ?claim
Results:
  P4879272_claim_2
  P4879272_claim_3
    An erythropoietin-containing, pharmaceutically-acceptable composition wherein human serum albumin is mixed with erythropoietin either during the preparation of said composition or just before administration thereof.
  07_609741_claim_60
  07_609741_claim_61
  07_609741_claim_62
    An erythropoietin-containing, pharmaceutically-acceptable preparation wherein human serum albumin is mixed with erythropoietin.
27
Current Limitations
One needs to know SPARQL in order to query
One needs to know the semantics of the ontology, such as the relations and the domain and range restrictions
Manual querying can be very time consuming; automation is needed
Domain-specific semantics need to be integrated separately
Probabilistic weighting (ranking inventors, assignees, patents, etc.) is not possible using the SPARQL endpoint
We are developing a user-friendly automated tool to search the patent system
28
Future Work
Include other information sources: publications, regulations, laws
Develop an automated tool and search framework (currently under development)
Experiment with more use cases outside of the biomedical domain
29
Tool Snapshot
30
Acknowledgement
This research is partially supported by NSF Grant Number 0811975 awarded to the University of Illinois at Urbana-Champaign and NSF Grant Number 0811460 to Stanford University. Any opinions and findings are those of the authors, and do not necessarily reflect the views of the National Science Foundation.
31
Please Visit the System Demonstration
Thank You! Questions?
32
Extra Slides
33
Common US Classes, Inventors and Assignee
  SELECT DISTINCT ?inv ?class ?assignee
  WHERE {
    ?cases a :CourtCase .
    ?cases :hasBody ?caseBody .
    ?caseBody :resourceVal ?comment .
    FILTER REGEX(?comment, "erythropoietin", "i")
    ?cases :patentsInvolved ?patents .
    ?patents :hasInventor ?inv .
    ?patents :hasUSClass ?class .
    ?patents :hasAssignee ?assignee .
  }
Results:
  ?inv: Lin_Fu-Kuen, Hewick_Rodney_M, Seehra_Jasbir_S
  ?class: USPC 530/380, USPC 530/399, USPC 530/397, USPC 514/8, USPC 435/69_6, USPC 530/835, USPC 530/388_7, …
  ?assignee: Kirin-Amgen_Inc, Genetics_Institute_Inc, …
34
Extracting Citations
  SELECT DISTINCT ?forw ?backw
  WHERE {
    ?cases a :CourtCase .
    ?cases :hasBody ?caseBody .
    ?caseBody :resourceVal ?comment .
    FILTER REGEX(?comment, "erythropoietin", "i")
    ?cases :patentsInvolved ?patents .
    ?patents :hasCitation ?forw .
    ?backw :hasCitation ?patents .
  }
Results:
  6541033 4710473 4358535 4558005 4465624 4757006 4399216 4558006 3865801 3033753 …
35
Generated Results
Around 30 court cases
Several patents including core patents and forward/backward citations
Can search patents by the inventors, assignees and/or US classes identified
What’s more? Can search court cases again with the new keywords or information gathered
Gathered Results:
  Case_4: Amgen v/s Chugai …
  Case_5: Amgen v/s Genetics In.
  Case_2: Amgen v/s Chugai …
  ….
  5621080: Production of Erythropoietin
  5547933: Production of Erythropoietin
  5618698: Production of Erythropoietin
  5756349: Production of Erythropoietin
  5955422: Production of Erythropoietin
  …
  5441868 4703008 4677195 5322837 …
  Patents with Inventor: Lin_Fu-Kuen
  Patents owned by Genetics_Inc …