Published by Homer Floyd. Modified over 9 years ago.
1
| IFLA2010. Newspaper section | 2010-02-26
Changing preservation tasks for the German National Library: Some insights and preliminary remarks
IFLA International Newspaper Conference 2010 at IGNCA, New Delhi, India, 26 to 28 February 2010
"Digital Preservation and Access to News and Views"
Reinhard Altenhöner
2
ToC
1. Starting situation / setting
2. Digital Preservation in DNB
3. Practical Example: E-Papers
3
DNB: Our task: collecting and archiving, providing permanent access
- Publications issued in Germany since 1913
- Since June 22, 2006: online / net publications are covered by the new law
- Newspapers as well: ca. 450 newspapers (this means selection!) are microfilmed every day
- About 9,000 datasets in the central database
- Some years ago we started brainstorming on alternatives to this microfilm (MF) approach:
  - Collecting e-papers from the web
  - Archiving of print files
  - Cooperation with media / clipping agencies
4
Characteristics of online newspapers
- Frequent update processes
- Dedicated publication workflow: database, content management system, presentation on the fly
- Web 2.0 facilities for comments, blogging and tagging
- Multiple ways of embedding advertisements
- Complex navigation and search functions
- Harvesting is extremely difficult: some experiments (e.g. on newsletters), but no running workflow
5
„kopal“: Co-operative development of a long-term digital information archive
- Start in 2004
- Task: development of a standardized long-term preservation solution that also facilitates corresponding solutions for other libraries / industries
- Basis: DIAS (Digital Information and Archiving System) of the Royal Dutch Library, condensed and extended with peripheral open-source software
- Enhancement for cooperative usage
- Development of a universal object scheme
- Hosting outside the library (remote access)
6
kopal: cooperation
- GWDG: hosting
- IBM: archiving software
- DNB: ingest/access software
- SUB: ingest/access software
- Common task: preservation planning
7
kopal: Structure & concept
[Diagram: DIAS by IBM, hosted at GWDG (Göttingen), with Account 1 and Account 2; SUB Göttingen, DNB (Frankfurt) and further partners (nn), each with local software.]
8
Packaging
- Submission Information Package (object)
- METS 1.4
- UniversalObjectFormat
- LMER 1.2: Long-term preservation Metadata for Electronic Resources
- Mets.xml: Header, dmdSec, amdSec, File Section, Structural Map
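The Mets.xml layout named on this slide can be sketched roughly as follows. This is a minimal illustration in Python and not the koLibRI implementation: the object identifier and the IDs `dmd1`, `amd1` and `file1` are placeholder assumptions, the sections are left empty, and the result is not validated against the METS schema or the Universal Object Format profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_id):
    """Build an empty METS document with the five sections from the slide."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(root, q("metsHdr"))                 # Header
    ET.SubElement(root, q("dmdSec"), {"ID": "dmd1"})  # descriptive metadata
    ET.SubElement(root, q("amdSec"), {"ID": "amd1"})  # administrative metadata (e.g. LMER)
    file_sec = ET.SubElement(root, q("fileSec"))      # File Section
    file_grp = ET.SubElement(file_sec, q("fileGrp"))
    ET.SubElement(file_grp, q("file"), {"ID": "file1"})
    struct_map = ET.SubElement(root, q("structMap"))  # Structural Map
    div = ET.SubElement(struct_map, q("div"))
    ET.SubElement(div, q("fptr"), {"FILEID": "file1"})
    return root


# Hypothetical object identifier, for illustration only.
root = build_mets_skeleton("urn:example:object-1")
section_names = [child.tag.split("}")[1] for child in root]
print(section_names)
print(ET.tostring(root, encoding="unicode"))
```

In a real package the amdSec would carry the LMER technical metadata and each file entry would point to its content; both are omitted here to keep the skeleton short.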
9
[Diagram labels: Administration Interface, koLibRI, Online-Archivist, Machine Interface]
10
Kopal preservation strategy
- Migrate object with URN xxx into new format yyy
- Migrate all objects of format xxx and/or that have been ingested before a certain date and/or that are larger than zzz MB into new format xyz (e.g. from TIFF to PNG)
- Implementation of emulation view paths
- No restriction on file size or file format / type: all known and unknown file formats are accepted (text, pictures, video, audio, executables, etc.)
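The selection criteria for a migration run (format, ingest date, size) could be expressed along these lines. This is a hedged sketch, not kopal or DIAS code: the `ArchivedObject` record, the `select_for_migration` function and the sample URNs are assumptions made up for illustration. Criteria that are given are combined with "and"; an "or" combination can be obtained by merging the results of separate calls.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedObject:
    # Hypothetical record; the real archive metadata is richer.
    urn: str
    file_format: str
    ingest_date: date
    size_mb: float


def select_for_migration(objects, file_format=None, ingested_before=None,
                         larger_than_mb=None):
    """Return the objects that match every criterion that is not None."""
    selected = []
    for obj in objects:
        if file_format is not None and obj.file_format != file_format:
            continue
        if ingested_before is not None and obj.ingest_date >= ingested_before:
            continue
        if larger_than_mb is not None and obj.size_mb <= larger_than_mb:
            continue
        selected.append(obj)
    return selected


# Made-up sample holdings.
holdings = [
    ArchivedObject("urn:example:1", "TIFF", date(2005, 3, 1), 120.0),
    ArchivedObject("urn:example:2", "TIFF", date(2009, 7, 15), 8.0),
    ArchivedObject("urn:example:3", "PDF", date(2004, 11, 20), 300.0),
]

old_tiffs = select_for_migration(holdings, file_format="TIFF",
                                 ingested_before=date(2008, 1, 1))
large_objects = select_for_migration(holdings, larger_than_mb=100)
print([o.urn for o in old_tiffs])
print([o.urn for o in large_objects])
```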
11
Digital newspapers in DNB
- Some results (collections) from digitisation projects
  - Simple graphics data
  - Access in a dedicated system
  - Including full-text OCR & access
- Online newspapers: some pre-studies on objects like „Spiegel“, but no running workflow
- Concentration on e-papers
12
Digitisation results in DNB 1
13
Digitisation results in DNB 2
14
E-papers in DNB
Preliminary thoughts: requirements
- Structured, normalised metadata set: article/photo, issue, newspaper
- Persistent identification of each unique object, linkage between them, citable
- Added information on author / title at the article level is useful but not strictly needed
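The three-level structure (article, issue, newspaper) with a persistent identifier on each level might be modelled as below. This is only a sketch under assumptions: the class names, the `resolve` helper and the `urn:example:` identifiers are invented for illustration and do not reflect the DNB metadata set. Author and title are optional, in line with the requirement that they are useful but not strictly needed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    identifier: str                # persistent identifier of the article
    title: Optional[str] = None    # optional article-level information
    author: Optional[str] = None


@dataclass
class Issue:
    identifier: str
    issue_date: str
    articles: List[Article] = field(default_factory=list)


@dataclass
class Newspaper:
    identifier: str
    name: str
    issues: List[Issue] = field(default_factory=list)


def resolve(newspaper, article_id):
    """Return the identifier chain (newspaper, issue, article) or None."""
    for issue in newspaper.issues:
        for article in issue.articles:
            if article.identifier == article_id:
                return (newspaper.identifier, issue.identifier, article.identifier)
    return None


# Hypothetical identifiers, for illustration only.
paper = Newspaper("urn:example:paper-1", "Example Daily", [
    Issue("urn:example:paper-1:2010-02-26", "2010-02-26", [
        Article("urn:example:paper-1:2010-02-26:a1", title="Lead story"),
        Article("urn:example:paper-1:2010-02-26:a2"),
    ]),
])

chain = resolve(paper, "urn:example:paper-1:2010-02-26:a2")
print(chain)
```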
15
E-paper requirements
- Quantity:
  - One newspaper: ca. 150 articles per day / 900 a week / 47,000 per year
  - 21,150,000 per year (ca. 450 newspapers × 47,000)
- Start modestly
- Retrodigitisation (collection started with 1913) will extend this to more than 1 billion articles
- Challenge in terms of resources and technical capacities
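The quantity estimate can be retraced with a few lines of arithmetic. The per-newspaper figures come from this slide; that the yearly total is based on the ca. 450 newspapers mentioned earlier in the talk is an assumption, although the numbers match exactly.

```python
ARTICLES_PER_DAY = 150      # per newspaper, from the slide
ARTICLES_PER_WEEK = 900     # per newspaper, from the slide
ARTICLES_PER_YEAR = 47_000  # per newspaper, rounded figure from the slide
NEWSPAPERS = 450            # assumption: the ca. 450 titles named earlier

# 900 / 150 implies six publishing days per week.
publishing_days_per_week = ARTICLES_PER_WEEK // ARTICLES_PER_DAY

# 52 weeks of 900 articles, which the slide rounds to 47,000.
articles_per_year_unrounded = ARTICLES_PER_WEEK * 52

# Yearly total across all newspapers.
total_per_year = NEWSPAPERS * ARTICLES_PER_YEAR

print(publishing_days_per_week, articles_per_year_unrounded, total_per_year)
```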
16
E-paper project (recently started)
- In cooperation with a vendor after a tender procedure
- Ca. 20 important newspapers, starting with two
- Metadata should be delivered in ONIX
- Harvesting interface: OAI-PMH
- All data delivered in an XML file
- Integrated digital preservation in the kopal environment
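Harvesting over OAI-PMH could look roughly like this. The verb `ListRecords` and the arguments `metadataPrefix`, `from` and `resumptionToken` are part of the OAI-PMH protocol; the base URL `https://example.org/oai`, the prefix name `onix` and the sample response are assumptions, since the slides do not give the vendor's endpoint or configuration. The sketch only builds request URLs and reads the paging token; it performs no network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH, so when one is
    given no other selection arguments are sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hypothetical endpoint and prefix; made-up response fragment.
BASE_URL = "https://example.org/oai"
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL, metadata_prefix="onix",
                             from_date="2010-02-26")
token = next_token(SAMPLE_RESPONSE)
second_url = list_records_url(BASE_URL, resumption_token=token)
print(first_url)
print(second_url)
```

A harvester would repeat the second step until `next_token` returns None, which marks the last page of the result list.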
17
XML record for e-Papers
18
E-Paper & Access
- Principal question for access: integration in the portal environment or a dedicated (independent) search area
- Advanced requirements for segmentation of text
- Direct link between portal (metadata) and text
- Navigation / browsing within the object, direct access to single chapters / pages
- Zooming, scrolling
- Integrated full-text search
- Print and store facilities
- DRM, IDM
19
Reuse of results from the CONTENTUS project
[Diagram: processing chain in six steps]
1. Digitisation
2. Automated optimisation
3. Content analysis: image, text, title, face, logo, person, speaker 1, speaker 2
4. Semantic relation: knowledge base
5. Open knowledge networks: CORE, professionals (media archives…); MANTLE, automated (learning); SHELL, end user (Wikipedia)
6. Semantic multimedia search, example "Film": information about actors, director, producers, music, sequence, year of production; short description of the picture or video sequence; what is in the film; rights; any other relevant information as a short summary of content for fast access; related books, internet links, music scores, films, songs and news (each with year of printing, editions, authors, summary)
20
Data processing
- Automated page segmentation (headlines, images, tables)
- OCR + entity recognition
- Full-text search
- Semantic search interface
- Based on:
  - Intellectually approved authority files
  - Statistical data analysis
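Entity recognition backed by an authority file can be illustrated with a simple dictionary lookup over OCR text. This is a deliberately naive sketch and not the CONTENTUS pipeline, which the slide describes as also using statistical analysis: the function `find_entities`, the authority identifiers and the sample sentence are assumptions made up for this example.

```python
import re


def find_entities(text, authority):
    """Find exact, whole-word matches of authority names in the text.

    authority maps a name to an authority-file identifier. The result is a
    list of (name, identifier, offset) tuples sorted by offset.
    """
    hits = []
    for name, identifier in authority.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((name, identifier, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical authority entries and OCR output.
authority = {
    "Göttingen": "auth:place-2",
    "Frankfurt": "auth:place-1",
}
ocr_text = "The DNB in Frankfurt cooperates with the SUB in Göttingen."

entities = find_entities(ocr_text, authority)
print([(name, identifier) for name, identifier, _ in entities])
```

Exact matching misses OCR errors and spelling variants, which is one reason to combine curated authority files with statistical methods.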
21
Our solution currently
Integrated search and retrieval
22
Next step: Integrated E-papers
23
Integrated E-paper „ZEIT“ 1
24
Integrated E-paper „ZEIT“ 2
Provision of free texts
25
Integrated E-paper „ZEIT“ 3
26
Reinhard Altenhöner
mailto:r.altenhoener@d-nb.de
http://www.d-nb.de