9.a Report on IPC-related IT systems IPC Committee of Experts 50

1 9.a Report on IPC-related IT systems IPC Committee of Experts 50
Patrick Fiévet Head of IT Systems Section International Classifications and Standards Division Geneva February 08, 2018

2 Agenda
Artificial Intelligence: IPC text categorization in the IPC, i.e. IPCCAT-neural
- What is it / what is new?
- Demonstration
- What it could be for
- What comes next in the short term
- What comes next in the longer term

3 IPCCAT-neural text categorization in the IPC
What is it about?
- Automatic prediction (guess) of the most appropriate IPC symbols on the basis of a text input (e.g. a patent abstract), i.e. three guesses among N categories, each with an associated level of confidence
- Implementation based on several neural networks
- The IP knowledge added value and the technology used in processing the training collection are as important as the technology used in the classifier
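The "three guesses with a confidence level" output can be sketched as follows. This is a minimal illustration, not the IPCCAT implementation: the function name `top_guesses`, the use of a softmax to turn raw scores into confidences, and the example symbols and scores are all assumptions.

```python
import math

def top_guesses(scores, k=3):
    """Return the k best symbols with a confidence each.

    `scores` maps a symbol to a raw classifier score. Converting scores
    to confidences with a softmax is an assumption made for this sketch.
    """
    highest = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {sym: math.exp(s - highest) for sym, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(sym, e / total) for sym, e in ranked[:k]]

# Made-up scores for four candidate symbols (illustrative input only).
raw_scores = {
    "A61K 31/00": 2.1,
    "C07D 401/04": 1.3,
    "G06F 17/30": -0.5,
    "H04L 29/06": 0.2,
}
guesses = top_guesses(raw_scores)
for symbol, confidence in guesses:
    print(f"{symbol}: {confidence:.2f}")
```

The confidences of the three guesses sum to less than 1 because the remaining categories keep their share of the probability mass.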

4 IPCCAT-neural text categorization in the IPC at subgroup level !
IPCCAT neural 2016 at IPC main group level:
- Number of categories: 7,374
- Precision (three guesses): 80%
- Number of neural networks: ~700
IPCCAT neural 2018 at subgroup level:
- Number of categories: 72,137
- Precision (three guesses), based on 1.5 million test cases: 82%
- Number of neural networks: ~8,000
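The slides do not define "precision (three guesses)". One common reading, assumed here, is that a test case counts as a hit when at least one of the three guessed symbols matches a symbol assigned to the document. The function name and the placeholder symbols below are hypothetical.

```python
def three_guess_precision(predictions, assigned, k=3):
    """Fraction of test cases where any of the first k guesses is correct.

    predictions: list of ranked symbol lists, one per test document.
    assigned:    list of sets of symbols actually assigned to each document.
    The hit criterion (any overlap) is an assumption for this sketch.
    """
    if len(predictions) != len(assigned):
        raise ValueError("predictions and assigned must have the same length")
    hits = sum(
        1 for guesses, truth in zip(predictions, assigned)
        if set(guesses[:k]) & set(truth)
    )
    return hits / len(assigned)

# Four toy test cases with placeholder symbols S1..S9.
predictions = [
    ["S1", "S2", "S3"],
    ["S4", "S5", "S6"],
    ["S7", "S8", "S9"],
    ["S1", "S4", "S7"],
]
assigned = [{"S2"}, {"S4", "S9"}, {"S1"}, {"S7"}]
precision = three_guess_precision(predictions, assigned)
print(f"precision (three guesses): {precision:.0%}")
```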

5 IPCCAT-neural text categorization in the IPC at subgroup level
Why was it actually doable?
- Recent evolution of the IPCCAT classifier (available on demand as open source from the Olanto foundation)
- Added value in data processing:
  - Training based on patent documents computed from DOCDB XML excerpts
  - Computation of both IPC and CPC classifications
- Progress in computing power opens new R&D horizons, e.g. GPU, text processing, …

6 Evolution of IPCCAT R&D over years
2018: IPC Group level (~73,000 categories)
: IPC Main Group level (~7,000 categories)
2017

7 IPCCAT-neural 2018: text categorization in the IPC at subgroup level
Training collection, IPC coverage and precision:
- Training collection: 27.7 million in EN and 4.4 million in FR
- Coverage of the IPC (using IPC and CPC through concordance):
  - 99% at subgroup level (EN)
  - 91% at subgroup level (FR)
- Precision (three guesses):
  - 82.5% at subgroup level (EN) !!
  - 72% at subgroup level (FR)
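The coverage figure combines IPC symbols found directly in the training documents with IPC symbols reached from CPC symbols through a concordance. A rough sketch of that calculation, assuming the concordance is a simple one-to-one CPC-to-IPC mapping and using placeholder symbols (all names below are hypothetical):

```python
def covered_ipc_symbols(ipc_labels, cpc_labels, cpc_to_ipc):
    """IPC symbols seen in training, directly or via the CPC concordance."""
    covered = set(ipc_labels)
    for cpc in cpc_labels:
        ipc = cpc_to_ipc.get(cpc)  # CPC symbols without a mapping are skipped
        if ipc is not None:
            covered.add(ipc)
    return covered

def coverage(scheme, covered):
    """Share of the scheme's symbols that have at least one training example."""
    return len(scheme & covered) / len(scheme)

# Placeholder scheme of four subgroups (not real IPC symbols).
scheme = {"X01A 1/01", "X01A 1/02", "X01A 1/03", "X01A 1/04"}
ipc_labels = ["X01A 1/01", "X01A 1/02", "X01A 1/01"]
cpc_labels = ["X01A 1/031", "X01A 9/99"]
cpc_to_ipc = {"X01A 1/031": "X01A 1/03"}  # hypothetical concordance entry

covered = covered_ipc_symbols(ipc_labels, cpc_labels, cpc_to_ipc)
print(f"coverage: {coverage(scheme, covered):.0%}")
```

In this toy input the CPC label adds one subgroup that no IPC label reached, which is the effect the slide attributes to using both classifications.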

8 IPCCAT-neural 2018: text categorization in the IPC at subgroup level
Training collection, IPC coverage and precision:
Side effects of n-gram improvements on precision at IPC main group level (three guesses):
- 89% at main group level (EN)
- 83% at main group level (FR)
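The slides do not say how n-grams are used in IPCCAT. As general background only, word n-gram features are typically built by sliding a window over the tokenized text, so that multi-word terms become features in their own right. The function below is a generic sketch, with whitespace tokenization as a simplifying assumption.

```python
def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max, shortest first."""
    tokens = text.lower().split()  # naive whitespace tokenization
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

features = word_ngrams("Fuel injection valve")
print(features)
# ['fuel', 'injection', 'valve', 'fuel injection', 'injection valve']
```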

9 IPCCAT-neural text categorization in the IPC at subgroup level
Demonstration

10 Artificial Intelligence / IPCCAT-neural: on the way to assist IPC reclassification
Chronology (still a long way to go):
- Evidence that text categorization works at IPC subgroup level with acceptable precision: done
- Integration of IPCCAT neural at subgroup level into IPCPUB v7.5 (February 2018)
- Confirmation that cross-lingual text categorization can assist in languages other than EN, even in the absence of large training collections: to be prototyped based on a commercial CAT tool and limited testing (for cost containment reasons)

11 Artificial Intelligence / IPCCAT-neural: on the way to assist IPC reclassification
Chronology (still a long way to go):
- Incentives for R&D in automated text categorization: WIPO DELTA training collection (bilateral EPO-WIPO discussion in progress), Q2 2018?
- Propose alternatives to Default Transfer, e.g. more than one symbol based on IPCCAT guesses and confidence levels
- CE decisions, WIPO resource planning, etc. (2019)
- Development of the production-scale solution integrating neural cross-lingual text categorization (based on IPCCAT neural and WIPO Translate?) (202x)
- Integration into IPCWLMS for Stage 3 reclassification (202x)
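The idea of proposing more than one symbol from the classifier's guesses, instead of a single default transfer, could look roughly like this. The thresholds, the fallback to the default symbol, and all names are assumptions for illustration; the slides do not specify a selection rule.

```python
def propose_symbols(guesses, default_symbol, min_confidence=0.2, max_symbols=3):
    """Pick target symbols for a document being reclassified.

    guesses: list of (symbol, confidence) pairs from the classifier.
    Falls back to the default transfer symbol when no guess is confident
    enough (an assumed policy, not one stated in the presentation).
    """
    confident = [(s, c) for s, c in guesses if c >= min_confidence]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    selected = [s for s, _ in confident[:max_symbols]]
    return selected if selected else [default_symbol]

# Placeholder symbols and confidences.
guesses = [("X01A 1/02", 0.30), ("X01A 1/01", 0.55), ("X01A 1/03", 0.05)]
proposal = propose_symbols(guesses, default_symbol="X01A 1/00")
print(proposal)

weak = [("X01A 1/02", 0.10), ("X01A 1/03", 0.05)]
fallback = propose_symbols(weak, default_symbol="X01A 1/00")
print(fallback)
```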

12 Incentive to R&D in text categorization: WIPO-Alpha training collection

13 Incentive to R&D in text categorization: WIPO-Delta training collection
Short-term perspective: further AI incentives for research and development institutes interested in automatic text categorization, e.g. in patent classification
- Fully specified XML format (DONE)
- Complement the public WIPO-ALPHA training collection with a WIPO-DELTA XML collection? (see ) from IPCWLMS (upload in database for R&D purposes and XML training collection export)

14 Text categorization in the IPC
Other 2018 perspectives:
- Cross-lingual text categorization in the IPC at subgroup level
- Confirmation of expectations through prototyping of ES, FR, EN, DE, RU support using automatic translation by a commercial product (bound by budget limitations), e.g. DE text translated into EN and submitted to IPCCAT neural trained with EN documents
- Available through the IPCPUB interface or web service (Q2 2018)
- IPCCAT retraining based on IPC (Q3 2018)
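The translate-then-classify approach described above can be sketched as a small pipeline. The translator and the classifier are passed in as callables because the slides name neither API; the stub functions below are hypothetical stand-ins used only to make the sketch runnable.

```python
from typing import Callable, List, Tuple

Guess = Tuple[str, float]  # (symbol, confidence)

def classify_cross_lingual(
    text: str,
    source_lang: str,
    translate: Callable[[str, str, str], str],
    classify_en: Callable[[str], List[Guess]],
) -> List[Guess]:
    """Translate non-English text into EN, then apply the EN-trained classifier."""
    english = text if source_lang == "EN" else translate(text, source_lang, "EN")
    return classify_en(english)

# Hypothetical stand-ins for a translation product and the EN classifier.
def stub_translate(text: str, src: str, dst: str) -> str:
    lookup = {("DE", "Einspritzventil"): "injection valve"}
    return lookup.get((src, text), text)

def stub_classify_en(text: str) -> List[Guess]:
    if "valve" in text:
        return [("X01A 1/01", 0.6)]  # placeholder symbol
    return [("X01A 1/00", 0.1)]

result_de = classify_cross_lingual("Einspritzventil", "DE", stub_translate, stub_classify_en)
result_en = classify_cross_lingual("injection valve", "EN", stub_translate, stub_classify_en)
print(result_de, result_en)
```

Keeping the classifier EN-only means translation quality bounds the precision achievable in other languages, which is presumably why the slides frame this as something to confirm through prototyping.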

15 Thank you for your attention!
QUESTIONS? contact WIPO at

