Published by Cathleen Bryant
Machine-based issuing of DNB Subject Categories and DDC Short Numbers for Medicine in the German National Library
Frank Busse
25 April 2016

Outline
1. General Information
2. Automatic Classification of DNB Subject Categories
3. Automatic Classification of DDC Short Numbers for Medicine

General Information

Automated Cataloguing – why?

Timeline
2009: Start of PETRUS project
2010: Ceasing of intellectual cataloguing of online publications
2012: Automatic classification / DNB Subject Categories
2014: Automatic indexing
2015: Automatic classification / DDC Short Numbers
2015: PETRUS project completed

Subject Cataloguing at the DNB
Subject cataloguing:
DNB Subject Categories
Subject headings
DDC numbers
Further information: http://www.dnb.de/EN/Erwerbung/Inhaltserschliessung/inhaltserschliessung_node.html

Automatic Classification of DNB Subject Categories

DNB Subject Categories
Since 2004
Based on Dewey Decimal Classification (DDC)
102 categories

Examples of Subject Categories
330 Economics
560 Paleontology
640 Home and family management

Automatic Classification
Start: 2012
Method: machine learning / SVM
Document type: all online publications, excluding fiction
PDF (since 2012)
EPUB (since 2015)
Language: German / English
Volume: 444,586 online publications (03/2016)

Machine Learning
Supervised learning (learning by example)
Pattern recognition
Generalization of rules
Classifying unknown objects

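The supervised-learning idea above (learn from labelled examples, then classify unknown objects) can be illustrated with a small sketch. It is not the Averbis software used at the DNB: it is a hand-written one-vs-rest linear SVM without a bias term, trained by sub-gradient descent on the hinge loss, and the training texts, function names and parameter values are all hypothetical. Only the three category notations come from the slides.

```python
from collections import Counter

def bag_of_words(text):
    """Term counts for a lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())

def train_binary_svm(docs, labels, target, epochs=20, lr=0.1, reg=0.01):
    """Linear SVM (target vs. rest), no bias term, trained by
    sub-gradient descent on the L2-regularised hinge loss."""
    weights = {}
    for _ in range(epochs):
        for text, label in zip(docs, labels):
            y = 1.0 if label == target else -1.0
            x = bag_of_words(text)
            margin = y * sum(weights.get(t, 0.0) * c for t, c in x.items())
            # Regularisation step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * reg
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for term, count in x.items():
                    weights[term] = weights.get(term, 0.0) + lr * y * count
    return weights

def train(docs, labels):
    """One-vs-rest: one binary SVM per category."""
    return {cat: train_binary_svm(docs, labels, cat) for cat in sorted(set(labels))}

def classify(models, text):
    """Assign the category whose SVM gives the highest score."""
    x = bag_of_words(text)
    def score(cat):
        return sum(models[cat].get(t, 0.0) * c for t, c in x.items())
    return max(models, key=score)

# Hypothetical toy training data (learning by example).
training_docs = [
    "inflation market prices trade",
    "market trade banks interest",
    "fossil dinosaur bones excavation",
    "fossil trilobite sediment",
    "household cooking budget recipes",
    "cooking cleaning household",
]
training_labels = ["330", "330", "560", "560", "640", "640"]

models = train(training_docs, training_labels)
print(classify(models, "dinosaur fossil"))  # 560
```

A production system would add linguistic preprocessing and far more training data; the sketch only shows how a model generalizes from labelled texts to an unseen one.
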
Software
Averbis GmbH / Freiburg im Breisgau
Averbis Extraction Platform (AEP) Version 2.2.2a
Improvements and further development

Workflow
Training:
Base
Create a model
Software: Averbis Software
Routine:
Daily processing of new online publications
Retro-processing
Software: Averbis Software
DNB Interface CBS

Routine

Training Workflow
Selection
Training data
Parameter setting
Linguistic analysis
Training

Training Data
Online publications & digitised Tables of Contents (ToC)
Since 2004
Language: German / English
April 2016: 451,333 online publications & ToC

Parameter Setting
Language
Text length
Metadata weighting
Exclusion conditions
etc.

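How such parameters might act on the training data can be sketched as follows. The slides name the parameter groups but give no values or field names, so every field, default and record key below is a hypothetical illustration, not the Averbis configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingParameters:
    # All fields and default values are hypothetical.
    languages: tuple = ("ger", "eng")  # accepted document languages
    min_text_length: int = 500         # characters; shorter texts are excluded
    title_weight: int = 3              # metadata weighting: repeat the title

def is_eligible(doc, params):
    """Exclusion conditions: wrong language or too little text."""
    return (doc["language"] in params.languages
            and len(doc["text"]) >= params.min_text_length)

def weighted_text(doc, params):
    """Metadata weighting: repeat the title so its terms count more often."""
    return " ".join([doc["title"]] * params.title_weight + [doc["text"]])

params = TrainingParameters(min_text_length=10, title_weight=2)
doc = {
    "language": "eng",
    "title": "Overweight children",
    "text": "Study of overweight children in Kiel",
}
if is_eligible(doc, params):
    print(weighted_text(doc, params))
```

Repeating the title is only one simple way to weight metadata; a real system could instead scale the feature values directly.
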
Quality Management
Sample check
Data analysis
Improvement
Two ways of generating sample data:
Intellectual supervision
Comparison with printed edition

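The comparison with the printed edition can be read as an agreement check: where an online publication has a printed counterpart that was classified intellectually, the two categories are compared. The sketch below assumes that reading. The record identifiers, the dictionary layout and the function name are hypothetical.

```python
def agreement_rate(machine, reference):
    """Share of records present in both mappings whose categories match."""
    shared = machine.keys() & reference.keys()
    if not shared:
        raise ValueError("no overlapping records to compare")
    hits = sum(1 for rec in shared if machine[rec] == reference[rec])
    return hits / len(shared)

# Hypothetical records: identifier -> DNB Subject Category.
machine_assigned = {"rec-a": "610", "rec-b": "330", "rec-c": "560", "rec-d": "640"}
printed_edition = {"rec-a": "610", "rec-b": "330", "rec-c": "570", "rec-e": "640"}

rate = agreement_rate(machine_assigned, printed_edition)
print(f"{rate:.0%} agreement on overlapping records")
```

Records without a counterpart in the other set are ignored, so the rate describes only the overlap.
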
Results 2012-2015
Classified objects: 413,363
Sample check: 73,509 (18%)
Result: 75% correct

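The figures above can be checked with a short calculation. The sketch reproduces the sample share and adds a rough margin of error for the reported share of correct results. That margin assumes a simple random sample and the normal approximation, neither of which the slides state, so it is an illustration only.

```python
import math

def sample_check_summary(classified, checked, share_correct, z=1.96):
    """Return (sample share, margin of error) for a checked sample.

    Assumes a simple random sample and the normal approximation.
    """
    sample_share = checked / classified
    std_error = math.sqrt(share_correct * (1.0 - share_correct) / checked)
    return sample_share, z * std_error

share, margin = sample_check_summary(413_363, 73_509, 0.75)
print(f"sample share: {share:.1%}, correct: 75% +/- {margin:.1%}")
# sample share: 17.8%, correct: 75% +/- 0.3%
```

With a sample this large, the uncertainty from sampling alone is small. Any bias in how the sample was drawn is not captured by this estimate.
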
DDC Short Numbers for Medicine

DDC Short Numbers for Medicine
Developed in 2006/2007
Classification of printed medical theses
Fast and time-saving

Example
Book content: Study Overweight Children Kiel 2000-2009
DNB-SC: 610
DDC: 618.92398009435123090511
Short Number: 618.92398009435123090511

DDC Short Numbers
Start: Oct. 2015
Method: machine learning / SVM
Document type: Subject Category 610 "Medicine and health"
Online publications (PDF / EPUB)
Language: German / English
Volume: 8,121 online publications (03/2016)

Results October – December 2015
Classified objects: 4,072
Sample check: 574 (14%)
Result: 74% correct

Future challenges
Improve results
Development of DDC Short Numbers for other DNB Subject Categories
No "automatic DDC" with this tool

Thank you for your attention! Questions?
Frank Busse
German National Library
Section Automatic Indexing, Online Publications
f.busse@dnb.de