0 © WIPO – 2003 PF & CJF CLAIMS Computer-Assisted Categorisation of Patent Documents in the International Patent Classification Patrick Fiévet, CLAIMS Project Manager WIPO & Caspar J. Fall, CLAIMS Consultant ELCA ICIC’03, Nîmes, 22 October 2003 CLassification Automated InforMation System
Agenda 1. Introduction to CLAIMS project (PF) 2. Computer-assisted categorization prototypes (CJF) 3. CLAIMS Categorizer perspectives (PF)
1. Introduction to CLAIMS Project
1.1 CLAIMS Context World Intellectual Property Organization (WIPO) International Patent Classification (IPC) Classification Automated Information System (CLAIMS)
1.2 CLAIMS Project Objectives IPC Reform and revision support IPC Categorization assistance to Patent Offices IPC Tutorials Translation and Natural Language Search in the IPC IT support for the promotion of the IPC
2. Computer-assisted Categorization
2.1 Objectives Develop a solution for predicting International Patent Classification (IPC) codes Facilitate accurate classification in small and medium patent offices Support for documents in multiple languages Categorization assistance tool Open questions Depth of computer-assisted categorization What accuracy?
2.1 Key issues Survey of automated categorization research Patent categorization The IPC is a hierarchical classification »120 classes, 628 subclasses, 69’000 groups »Patents have secondary IPC codes The categories are modified over time Vocabulary very diverse and technical
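The IPC hierarchy listed above (section, class, subclass, group) can be made concrete with a small parser. This is a minimal sketch, assuming symbols written in the usual "A01B 1/02" form: section letter, two-digit class, subclass letter, then group and subgroup numbers. `parse_ipc` and `IpcSymbol` are hypothetical names, not part of any CLAIMS tool. Collapsing each training document's codes to a single level is one way to choose the granularity at which a categorizer is trained.

```python
import re
from typing import NamedTuple


class IpcSymbol(NamedTuple):
    section: str     # e.g. "A"
    ipc_class: str   # e.g. "A01"
    subclass: str    # e.g. "A01B"
    main_group: str  # e.g. "A01B 1/00"
    group: str       # full symbol, e.g. "A01B 1/02"


# Section letter, two-digit class, subclass letter, group number "/" subgroup number.
_IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split an IPC group symbol into its hierarchical levels (illustrative only)."""
    m = _IPC_RE.match(symbol.strip())
    if m is None:
        raise ValueError(f"not an IPC group symbol: {symbol!r}")
    section, cls, sub, group, subgroup = m.groups()
    subclass = f"{section}{cls}{sub}"
    return IpcSymbol(
        section=section,
        ipc_class=f"{section}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {int(group)}/00",  # main groups end in "/00"
        group=f"{subclass} {int(group)}/{subgroup}",
    )


if __name__ == "__main__":
    print(parse_ipc("G06F 17/30").subclass)  # G06F
```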
2.1 Patent categorization approach Machine-learning method to recognize categories »Statistical distribution of words Establish training data »Training documents with good IPC codes »210’000 to 830’000 documents Advantages No need for keywords Easy to train the tools Can support many languages Disadvantages Never absolute certainty in the results Difficult to have reliable full automation
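To illustrate what learning categories from the statistical distribution of words means in practice, here is a minimal multinomial naive Bayes categorizer. The slides do not name the CLAIMS algorithm, so naive Bayes is used purely as a stand-in. The class name, the whitespace tokenizer and the toy subclass labels are hypothetical. Returning a ranked list of guesses rather than a single code fits the "assistance tool" framing, since full automation is listed as difficult.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, docs: list[tuple[str, str]]) -> None:
        # docs: (text, ipc_code) pairs; the codes are assumed to be correct.
        for text, code in docs:
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.doc_counts[code] += 1
            self.vocab.update(tokens)

    def scores(self, text: str) -> dict[str, float]:
        # Log prior plus smoothed log likelihood of each token, per category.
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        tokens = tokenize(text)
        result = {}
        for code, counts in self.word_counts.items():
            denom = sum(counts.values()) + v
            logp = math.log(self.doc_counts[code] / total_docs)
            for t in tokens:
                logp += math.log((counts[t] + 1) / denom)
            result[code] = logp
        return result

    def predict(self, text: str, k: int = 3) -> list[str]:
        # Best-scoring categories first.
        s = self.scores(text)
        return sorted(s, key=s.get, reverse=True)[:k]


if __name__ == "__main__":
    model = NaiveBayesCategorizer()
    model.train([
        ("engine piston cylinder fuel", "F02B"),
        ("fuel injection engine valve", "F02B"),
        ("protein antibody cell culture", "C12N"),
        ("cell gene vector culture", "C12N"),
    ])
    print(model.predict("fuel engine", k=1))  # ['F02B']
```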
2.2 Prototype Custom development State-of-the-art algorithm Language independent Measure categorization success Compare the predictions with other manually classified documents
2.2 Prototype results
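Comparing predictions with manually classified documents can be sketched as follows. The three measures are assumed for illustration and are not definitions or figures taken from the prototype: (1) the top prediction equals the main IPC code, (2) the main code appears among the first three guesses, and (3) the top prediction matches any of the document's main or secondary codes. `TestDoc` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TestDoc:
    main_code: str             # manually assigned main IPC code
    secondary_codes: set[str]  # manually assigned secondary IPC codes
    predicted: list[str]       # categorizer output, best guess first


def evaluate(docs: list[TestDoc]) -> dict[str, float]:
    """Share of test documents satisfying each (illustrative) success measure."""
    n = len(docs)
    if n == 0:
        raise ValueError("no test documents")
    top = sum(d.predicted[:1] == [d.main_code] for d in docs)
    three = sum(d.main_code in d.predicted[:3] for d in docs)
    any_code = sum(
        bool(d.predicted) and d.predicted[0] in ({d.main_code} | d.secondary_codes)
        for d in docs
    )
    return {
        "top_prediction": top / n,
        "three_guesses": three / n,
        "any_ipc_code": any_code / n,
    }
```

Crediting a match against secondary codes reflects the slide's point that patents carry secondary IPC codes, so a prediction outside the main code is not necessarily wrong.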
2.2 Improving accuracy with category refining [Diagram: Scenario 1 and Scenario 2; steps labelled "direct", "validate" and "refine"]
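The diagram's labels suggest a direct prediction, a validation step and a refinement step, and the conclusions state that codes can be refined to main group level. One possible reading of that workflow is sketched below: rank subclasses, let the user confirm one, then rank only the main groups inside it. This is an assumption about the workflow, not the prototype's code. The scorer and `validate` callbacks are hypothetical placeholders for a trained model and a user interface.

```python
from typing import Callable, Optional

Scorer = Callable[[str], dict[str, float]]  # text -> {ipc_code: score}


def categorize_with_refinement(
    text: str,
    subclass_scorer: Scorer,
    group_scorer: Scorer,
    validate: Callable[[list[str]], Optional[str]],
    k: int = 3,
) -> tuple[str, list[str]]:
    # "direct": rank subclasses for the document.
    sub_scores = subclass_scorer(text)
    candidates = sorted(sub_scores, key=sub_scores.get, reverse=True)[:k]
    if not candidates:
        raise ValueError("no subclass scores")
    # "validate": the user confirms or corrects a subclass (None keeps the top guess).
    chosen = validate(candidates) or candidates[0]
    # "refine": rank only main groups whose symbol starts with the chosen subclass.
    group_scores = {
        g: s for g, s in group_scorer(text).items() if g.split()[0] == chosen
    }
    groups = sorted(group_scores, key=group_scores.get, reverse=True)[:k]
    return chosen, groups
```

Restricting the second step to one subclass keeps the refinement problem small, because only groups inside the validated subclass need to be compared.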
2.3 Conclusions It works well! Useful user assistance Direct categorization at subclass level possible IPC codes can be refined accurately to main group level To get accurate results, one needs: Large datasets Good category coverage Accurate IPC codes Read the proceedings for more details Demonstration available after the presentation
3. IPCCAT
3.1 CLAIMS Categorizer Perspectives 1. Implementation: IPCCAT 2. Training sets for IPC categorization: English, French, Spanish, Russian, German and possibly Chinese 3. IPC data set improvement & Categorizer retraining
3.2 CLAIMS Categorizer Perspectives 4. Improve integration of the IPC Categorizer with other CLAIMS tools 5. CLAIMS policy for distribution of data sets in various Languages
3.2 Access to IPCCAT for PCT Login: IBGST01 Password: clobterib
Questions / Answers Patrick Fiévet:
Thank you for your attention CLAIMS CLassification Automated InforMation System