| 29 | Machine-based issuing of DNB Subject Categories and DDC Short Numbers for Medicine | 25 April 2016 | Machine-based issuing of DNB Subject Categories and DDC Short Numbers for Medicine in the German National Library | Frank Busse
Outline: 1. General Information; 2. Automatic Classification of DNB Subject Categories; 3. Automatic Classification of DDC Short Numbers for Medicine
General Information
Automated Cataloguing – why?
Timeline: 2009 start of PETRUS project; 2010 end of intellectual cataloguing of online publications; 2012 automatic classification / DNB Subject Categories; 2014 automatic indexing; 2015 automatic classification / DDC Short Numbers; 2015 PETRUS project completed
Subject Cataloguing at the DNB: DNB Subject Categories; subject headings; DDC numbers. Further information: Subject cataloguing
Automatic Classification of DNB Subject Categories
DNB Subject Categories: in use since 2004; based on the Dewey Decimal Classification (DDC); 102 categories
Examples of Subject Categories: 330 Economics; 560 Paleontology; 640 Home and family management
Automatic Classification: Start: 2012; Method: machine learning / SVM; Document type: all online publications except fiction; Formats: PDF (since 2012), EPUB (since 2015); Languages: German/English; Volume: online publications (03/2016)
Machine Learning: supervised learning (learning by example); pattern recognition; generalization of rules; classifying unknown objects
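The learning-by-example idea on this slide can be sketched in a few lines. The following is a minimal illustration only, assuming a bag-of-words representation and a one-vs-rest linear SVM trained by subgradient descent on the hinge loss; it is not the Averbis implementation, and the training texts, function names and hyperparameters are invented for the example.

```python
from collections import Counter

def features(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenised text."""
    return Counter(text.lower().split())

def train_one_vs_rest(examples, epochs=20, lr=0.1, reg=0.01):
    """Train one linear hinge-loss classifier per category (a simple linear SVM)."""
    categories = sorted({category for _, category in examples})
    weights = {category: {} for category in categories}
    for _ in range(epochs):
        for text, gold in examples:
            x = features(text)
            for category in categories:
                w = weights[category]
                y = 1.0 if category == gold else -1.0
                score = sum(w.get(term, 0.0) * count for term, count in x.items())
                # L2 regularisation: shrink all existing weights slightly.
                for term in w:
                    w[term] *= 1.0 - lr * reg
                # Hinge loss: update only if the example is inside the margin.
                if y * score < 1.0:
                    for term, count in x.items():
                        w[term] = w.get(term, 0.0) + lr * y * count
    return weights

def classify(weights, text):
    """Return the category whose classifier gives the highest score."""
    x = features(text)
    scores = {
        category: sum(w.get(term, 0.0) * count for term, count in x.items())
        for category, w in weights.items()
    }
    return max(scores, key=scores.get)

# Hypothetical training examples (text, DNB Subject Category), illustration only.
TRAINING = [
    ("market prices trade inflation", "330"),
    ("trade policy market growth", "330"),
    ("fossil dinosaur bones strata", "560"),
    ("fossil trilobite strata excavation", "560"),
    ("household cooking family cleaning", "640"),
    ("family household recipes cooking", "640"),
]

model = train_one_vs_rest(TRAINING)
print(classify(model, "inflation and market trade"))
```

The model never sees the test sentence during training; it generalises from the weights learned for individual terms, which is the "classifying unknown objects" step named above.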
Software: Averbis GmbH, Freiburg im Breisgau; Averbis Extraction Platform (AEP), version 2.2.2a; improvements and further development
Workflow: Training: training base, create a model; software: Averbis software. Routine: daily processing of new online publications, retro-processing; software: Averbis software, DNB interface, CBS
Routine
Training workflow: selection of training data → parameter setting → linguistic analysis → training
Training Data: online publications and digitised tables of contents (ToC); since 2004; languages: German/English; April 2016: online publications & ToC
Parameter Setting: language; text length; metadata weighting; exclusion conditions; etc.
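To make the parameter types on this slide concrete, here is a small sketch of how such settings might be represented and applied before training. All field names, default values and the document layout are hypothetical; the actual parameters of the Averbis software are not given in the slides.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingParameters:
    """Hypothetical settings mirroring the four parameter types on the slide."""
    languages: tuple = ("ger", "eng")      # language
    min_text_length: int = 200             # text length, in characters (assumed unit)
    metadata_weights: dict = field(        # metadata weighting
        default_factory=lambda: {"title": 3.0, "body": 1.0}
    )
    excluded_genres: tuple = ("fiction",)  # exclusion conditions

def is_eligible(doc, params):
    """Apply the language, text-length and exclusion rules to one document."""
    if doc.get("language") not in params.languages:
        return False
    if len(doc.get("body", "")) < params.min_text_length:
        return False
    if doc.get("genre") in params.excluded_genres:
        return False
    return True

def weighted_terms(doc, params):
    """Sum term weights so that terms from heavily weighted fields count more."""
    weights = {}
    for field_name, factor in params.metadata_weights.items():
        for token in doc.get(field_name, "").lower().split():
            weights[token] = weights.get(token, 0.0) + factor
    return weights

params = TrainingParameters(min_text_length=10)
doc = {"language": "ger", "genre": "thesis",
       "title": "Obesity study", "body": "study of children"}
print(is_eligible(doc, params), weighted_terms(doc, params))
```

In this sketch a term that appears in the title counts three times as much as the same term in the body text, which is one simple way to read "metadata weighting".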
Quality Management: sample check → data analysis → improvement. Two ways of generating sample data: intellectual supervision; comparison with printed edition
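A sample check of this kind boils down to comparing machine-assigned categories with reference categories, for example those of the corresponding printed edition. The sketch below assumes both are available as simple mappings from a record identifier to a category; the identifiers and category values are invented for illustration.

```python
from collections import Counter

def sample_check(machine, reference):
    """Compare machine-assigned and reference categories for shared record ids.

    Returns the share of matching records and a count of each
    (reference category, machine category) mismatch.
    """
    shared = sorted(set(machine) & set(reference))
    correct = sum(1 for rid in shared if machine[rid] == reference[rid])
    confusions = Counter(
        (reference[rid], machine[rid])
        for rid in shared
        if machine[rid] != reference[rid]
    )
    accuracy = correct / len(shared) if shared else 0.0
    return accuracy, confusions

# Hypothetical records: id -> DNB Subject Category.
machine_assigned = {"a": "610", "b": "330", "c": "560", "d": "610"}
printed_edition = {"a": "610", "b": "330", "c": "550", "d": "150", "e": "640"}

accuracy, confusions = sample_check(machine_assigned, printed_edition)
print(accuracy, confusions.most_common())
```

The mismatch counts feed the "data analysis" step: frequent confusions between two categories point to where training data or parameters may need improvement.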
Results: Classified objects; Sample check: 18%; Result: 75% correct
DDC Short Numbers for Medicine
DDC Short Numbers for Medicine: developed in 2006/2007; classification of printed medical theses; fast and time-saving
Example: Book content: study, overweight children, Kiel; DNB-SC: 610; DDC Short Number
DDC Short Numbers: Start: October 2015; Method: machine learning / SVM; Document type: Subject Category 610 "Medicine and health", online publications (PDF / EPUB); Languages: German/English; Volume: online publications (03/2016)
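Because DDC Short Numbers are issued only for documents in Subject Category 610, the two classifiers can be read as a two-stage cascade. The following sketch shows that routing under the assumption that each stage is a callable taking the document text; the toy models and the short number value "616" are placeholders, not output of the real system.

```python
def classify_document(text, category_model, short_number_model,
                      medicine_category="610"):
    """Stage 1 assigns a subject category; stage 2 runs only for medicine."""
    category = category_model(text)
    short_number = (
        short_number_model(text) if category == medicine_category else None
    )
    return {"subject_category": category, "ddc_short_number": short_number}

def toy_category_model(text):
    """Placeholder for the trained subject-category classifier."""
    return "610" if "patients" in text.lower() else "330"

def toy_short_number_model(text):
    """Placeholder for the trained short-number classifier (illustrative value)."""
    return "616"

print(classify_document("Study of patients in Kiel",
                        toy_category_model, toy_short_number_model))
print(classify_document("Market prices and trade",
                        toy_category_model, toy_short_number_model))
```

One consequence of such a cascade is that an error in the first stage prevents the second stage from running at all, so the quality of the short numbers depends on the quality of the subject categories.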
Results October – December 2015: Classified objects; Sample check: 574 (14%); Result: 74% correct
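Since the 74% figure comes from a sample of 574 objects, it carries some sampling uncertainty. As a rough sketch, and assuming the sample was drawn at random (the slides do not say how it was selected), a Wilson score interval gives a plausible range for the correct share among all classified objects. The function name is an assumption of this example.

```python
import math

def wilson_interval(share_correct, sample_size, z=1.96):
    """Wilson score interval for a proportion observed in a sample.

    z = 1.96 corresponds to a confidence level of about 95%.
    """
    n = sample_size
    p = share_correct
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - margin, centre + margin

low, high = wilson_interval(0.74, 574)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 0.70 to 0.77, so the sample result is compatible with a true rate a few percentage points above or below 74%.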
Future challenges: improve results; development of DDC Short Numbers for other DNB Subject Categories; no "automatic DDC" with this tool
Thank you for your attention! Questions? Frank Busse, German National Library, Section Automatic Indexing, Online Publications