CLAIMS CLassification Automated InforMation System Computer-Assisted Categorisation of Patent Documents in the International Patent Classification Patrick Fiévet, CLAIMS Project Manager WIPO & Caspar J. Fall, CLAIMS Consultant ELCA ICIC’03, Nîmes, 22 October 2003
Agenda Introduction to CLAIMS project (PF) Computer-assisted categorization prototypes (CJF) CLAIMS Categorizer perspectives (PF)
1. Introduction to CLAIMS Project
1.1 CLAIMS Context: World Intellectual Property Organization (WIPO); International Patent Classification (IPC); Classification Automated Information System (CLAIMS)
1.2 CLAIMS Project Objectives: IT support for the promotion of the IPC; IPC reform and revision support; IPC tutorials; translation and natural language search in the IPC; IPC categorization assistance to patent offices
2. Computer-assisted Categorization
2.1 Objectives: Develop a solution for predicting International Patent Classification (IPC) codes; facilitate accurate classification in small and medium patent offices; support for documents in multiple languages; categorization assistance tool. Open questions: depth of computer-assisted categorization; what accuracy?
2.1 Key issues: Survey of automated categorization research. Patent categorization: the IPC is a hierarchical classification (120 classes, 628 subclasses, 69’000 groups); patents have secondary IPC codes; the categories are modified over time; vocabulary very diverse and technical.
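The hierarchy mentioned on this slide can be made concrete with a small sketch that splits an IPC symbol into its levels (section, class, subclass, main group, group). The function name `parse_ipc`, the `IPCSymbol` type and the simplified symbol pattern are assumptions made for illustration; they are not part of the CLAIMS tools.

```python
import re
from typing import NamedTuple


class IPCSymbol(NamedTuple):
    """Levels of one IPC symbol, from coarsest to finest (hypothetical type)."""
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    group: str


# Simplified pattern (assumption): section letter A-H, two-digit class,
# subclass letter, then "main group / group" digits, e.g. "A01B 1/02".
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IPCSymbol:
    """Split an IPC symbol into its hierarchical levels."""
    match = _IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main, rest = match.groups()
    ipc_class = section + class_digits
    subclass = ipc_class + subclass_letter
    return IPCSymbol(
        section=section,
        ipc_class=ipc_class,
        subclass=subclass,
        main_group=f"{subclass} {main}/00",  # main groups end in /00
        group=f"{subclass} {main}/{rest}",
    )
```

For example, `parse_ipc("A01B 1/02").subclass` gives `"A01B"`, and the `main_group` field gives `"A01B 1/00"`.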
2.1 Patent categorization approach: Machine-learning method to recognize categories, based on the statistical distribution of words. Establish training data: training documents with good IPC codes; 210’000 to 830’000 documents. Advantages: no need for keywords; easy to train the tools; can support many languages. Disadvantages: never absolute certainty in the results; difficult to have reliable full automation.
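To illustrate what learning categories from the statistical distribution of words can look like, here is a minimal sketch of a multinomial naive Bayes categorizer with add-one smoothing. Naive Bayes is chosen here only because it is short; the slides do not say which algorithm the prototype uses. The class name `WordDistributionCategorizer` and its methods are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class WordDistributionCategorizer:
    """Toy multinomial naive Bayes over words (illustrative assumption only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # IPC code -> word frequencies
        self.doc_counts = Counter()              # IPC code -> number of documents
        self.vocabulary = set()

    def train(self, documents):
        """documents: iterable of (text, ipc_code) pairs with trusted codes."""
        for text, code in documents:
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.doc_counts[code] += 1
            self.vocabulary.update(tokens)

    def rank(self, text):
        """Return the known IPC codes, most probable first."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for code, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[code] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[code] = score
        return sorted(scores, key=scores.get, reverse=True)
```

The output is a ranked list rather than a single answer, which fits the point above that results are never absolutely certain: a user can review the top few suggestions instead of trusting one prediction.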
2.2 Prototype: Custom development: state-of-the-art algorithm; language independent. Measure categorization success: compare the predictions with other manually classified documents.
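Comparing predictions with manually classified documents can be sketched as follows. The top-k measure used here (a prediction counts as correct if the manual code appears among the first k suggestions) is an assumption for illustration; the slides do not state which measure the prototype reports. The function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(ranked_predictions, manual_codes, k=3):
    """Fraction of documents whose manual IPC code is among the top k predictions.

    ranked_predictions: one ranked list of IPC codes per test document.
    manual_codes: the manually assigned IPC code of each test document.
    """
    if len(ranked_predictions) != len(manual_codes):
        raise ValueError("need exactly one manual code per prediction list")
    if not manual_codes:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, manual_codes)
        if truth in ranked[:k]
    )
    return hits / len(manual_codes)
```

The test documents should be held out from training, so that the score reflects how the categorizer behaves on documents it has not seen.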
2.2 Prototype results
2.2 Improving accuracy with category refining [Diagram comparing Scenario 1 and Scenario 2; step labels: direct, validate, refine]
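One possible reading of the refining step, offered as an assumption because only the diagram labels survive here: a user first validates a coarse prediction at subclass level, and the finer main-group candidates are then restricted to that subclass. The function name `refine_to_main_group` and the sample symbols are hypothetical.

```python
def refine_to_main_group(ranked_main_groups, validated_subclass):
    """Keep only main-group candidates that belong to the validated subclass.

    ranked_main_groups: main-group symbols, best first, e.g. "A01B 1/00".
    validated_subclass: subclass confirmed by the user, e.g. "A01B".
    The relative order of the remaining candidates is preserved.
    """
    prefix = validated_subclass.strip().upper()
    return [
        group
        for group in ranked_main_groups
        if group.strip().upper().split()[0] == prefix
    ]
```

For instance, with candidates `["G06F 17/00", "A01B 1/00", "A01B 3/00"]` and the validated subclass `"A01B"`, the two `A01B` main groups remain, in their original order.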
2.3 Conclusions: It works well! Useful user assistance; direct categorization at subclass level possible; IPC codes can be refined accurately to main group level. To get accurate results, one needs: large datasets; good category coverage; accurate IPC codes. Read the proceedings for more details. Demonstration available after the presentation.
3. IPCCAT
3.1 CLAIMS Categorizer Perspectives: 1. Implementation: IPCCAT; 2. Training sets for IPC categorization: English, French, Spanish and Russian, German, possibly Chinese; 3. IPC data sets improvement and categorizer retraining
3.2 CLAIMS Categorizer Perspectives: 4. Improve integration of the IPC Categorizer with other CLAIMS tools; 5. CLAIMS policy for distribution of data sets in various languages
3.2 Access to IPCCAT for PCT Login: IBGST01 Password: clobterib
Questions / Answers Patrick Fiévet: patrick.fievet@wipo.in
Thank you for your attention CLAIMS CLassification Automated InforMation System