| IFLA2010. Newspaper section | 2010-02-26 Changing preservation tasks for the German National Library: Some insights and preliminary remarks IFLA International.


1 | IFLA2010. Newspaper section | 2010-02-26
Changing preservation tasks for the German National Library: Some insights and preliminary remarks
IFLA International Newspaper Conference 2010 at IGNCA, New Delhi, India, 26 to 28 February 2010: "Digital Preservation and Access to news and views"
Reinhard Altenhöner

2 | Table of contents
1. Starting situation / setting
2. Digital preservation in DNB
3. Practical example: e-papers

3 | DNB: Our task: collecting and archiving, providing permanent access
- Publications issued in Germany since 1913
- Since June 22, 2006: online / net publications are covered by the new law
- Newspapers as well: ca. 450 newspapers (this means selection!) are microfilmed every day
- About 9,000 datasets in the central database
- Some years ago we started brainstorming on alternatives to this microfilm (MF) approach:
  - Collecting e-papers from the web
  - Archiving of print files
  - Cooperation with media / clipping agencies

4 | Characteristics of online newspapers
- Frequent update processes
- Dedicated publication workflow: database, content management system, presentation on the fly
- Web 2.0 facilities for comments, blogging and tagging
- Multiple ways of embedding advertisements
- Complex navigation and search functions
- Harvesting is extremely difficult
  - Some experiments (e.g. on newsletters), but no running workflow

5 | "kopal"
- Co-operative development of a long-term digital information archive
- Start in 2004
- Task: development of a standardized long-term preservation solution, to facilitate respective solutions for other libraries / industries
- Basis: DIAS (Digital Information and Archiving System) of the Royal Dutch Library, condensed and extended with peripheral open-source components
- Enhancement for cooperative usage
- Development of a universal object scheme
- Hosting outside the library (remote access)

6 | kopal: cooperation
- GWDG: hosting
- IBM: archiving software
- DNB: ingest/access software
- SUB: ingest/access software
- Common task: preservation planning

7 | kopal: Structure & concept
[Diagram: GWDG (Göttingen) hosts DIAS by IBM with Account 1 and Account 2; SUB Göttingen, DNB (Frankfurt) and further partners (nn) each run local software.]

8 | Packaging
- Submission Information Package (object)
- METS 1.4
- Universal Object Format
- LMER 1.2: Long-term preservation Metadata for Electronic Resources
- mets.xml sections: Header, dmdSec, amdSec, File Section, Structural Map
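
As a rough illustration of the package layout named on this slide, the following Python sketch builds an empty mets.xml skeleton with the five METS sections. It assumes the standard METS namespace; the object identifier, the section IDs and the div label are made-up placeholders, and the LMER metadata listed on the slide is left out. The result is not schema-valid as it stands, because the sections are empty.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; everything else below is a placeholder.
METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified name for a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_id: str) -> ET.Element:
    """Build an empty METS document with the five sections from the slide."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(root, q("metsHdr"))                  # Header
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})   # descriptive metadata
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})   # administrative metadata
    ET.SubElement(root, q("fileSec"))                  # File Section
    struct_map = ET.SubElement(root, q("structMap"))   # Structural Map
    ET.SubElement(struct_map, q("div"), {"LABEL": "placeholder"})
    return root


# Hypothetical object identifier, for illustration only.
skeleton = build_mets_skeleton("example-object-0001")
section_names = [child.tag.split("}")[1] for child in skeleton]
print(ET.tostring(skeleton, encoding="unicode"))
```

In a real Submission Information Package the descriptive and administrative sections would carry the actual metadata records and the file section would reference the content files.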

9 | [Diagram labels: Administration Interface, koLibRI, Online-Archivist, Machine Interface]

10 | kopal preservation strategy
- Migrate the object with URN xxx into new format yyy
- Migrate all objects
  - of format xxx and/or
  - that have been ingested before a certain date and/or
  - that are larger than zzz MB
  into new format xyz (e.g. from TIFF to PNG)
- Implementation of emulation view paths
- No restriction on file size or file format / type: all known and unknown file formats are accepted (text, pictures, video, audio, executables, etc.)
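
The selection rules for a migration run can be sketched as a simple filter. This is only an illustration under assumptions: the `ArchivedObject` record, its field names and the `match_all` switch (to cover the slide's "and/or") are hypothetical and do not reflect the actual kopal or DIAS interfaces.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ArchivedObject:
    """Hypothetical archive record; field names are illustrative."""
    urn: str
    file_format: str
    ingest_date: date
    size_mb: float


def select_for_migration(
    objects: Iterable[ArchivedObject],
    *,
    file_format: Optional[str] = None,
    ingested_before: Optional[date] = None,
    larger_than_mb: Optional[float] = None,
    match_all: bool = True,
) -> List[ArchivedObject]:
    """Select objects by the criteria that are set, combined with AND or OR."""
    selected = []
    for obj in objects:
        checks = []
        if file_format is not None:
            checks.append(obj.file_format == file_format)
        if ingested_before is not None:
            checks.append(obj.ingest_date < ingested_before)
        if larger_than_mb is not None:
            checks.append(obj.size_mb > larger_than_mb)
        if not checks:
            continue  # no criteria given: select nothing
        if all(checks) if match_all else any(checks):
            selected.append(obj)
    return selected


# Made-up holdings for illustration.
holdings = [
    ArchivedObject("urn:example:1", "TIFF", date(2005, 3, 1), 120.0),
    ArchivedObject("urn:example:2", "TIFF", date(2009, 6, 15), 8.0),
    ArchivedObject("urn:example:3", "PDF", date(2004, 11, 30), 300.0),
]
and_hits = select_for_migration(
    holdings, file_format="TIFF", ingested_before=date(2007, 1, 1)
)
or_hits = select_for_migration(
    holdings, file_format="TIFF", larger_than_mb=100.0, match_all=False
)
print([o.urn for o in and_hits], len(or_hits))
```

With the sample holdings, the AND query picks only the TIFF object ingested before 2007, while the OR query picks all three objects.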

11 | Digital newspapers in DNB
- Some results (collections) from digitisation projects
  - Simple graphics data
  - Access in a dedicated system
  - Including full-text OCR and access
- Online newspapers: some pre-studies on objects like "Spiegel", but no running workflow
- Concentration on e-papers

12 | Digitisation results in DNB (1)

13 | Digitisation results in DNB (2)

14 | E-papers in DNB. Preliminary thoughts: requirements
- Structured, normalised metadata set: article/photo – issue – newspaper
- Persistent identification of each unique object, linkage between the objects, citable
- Added information on author / title at the article level is useful but not strictly necessary
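
One way to picture these requirements is a three-level record structure in which every object carries its own identifier plus a link to its parent. The sketch below is an assumption for illustration: the class names, the identifier scheme (parent identifier plus a local part) and the sample values are hypothetical, not the DNB metadata set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    identifier: str               # persistent, citable identifier (placeholder scheme)
    issue_id: str                 # link to the parent issue
    title: Optional[str] = None   # useful but optional
    author: Optional[str] = None  # useful but optional


@dataclass
class Issue:
    identifier: str
    newspaper_id: str             # link to the parent newspaper
    issue_date: str
    articles: List[Article] = field(default_factory=list)

    def add_article(self, local_part: str, title: Optional[str] = None,
                    author: Optional[str] = None) -> Article:
        article = Article(f"{self.identifier}/{local_part}", self.identifier,
                          title, author)
        self.articles.append(article)
        return article


@dataclass
class Newspaper:
    identifier: str
    name: str
    issues: List[Issue] = field(default_factory=list)

    def add_issue(self, issue_date: str) -> Issue:
        issue = Issue(f"{self.identifier}/{issue_date}", self.identifier, issue_date)
        self.issues.append(issue)
        return issue


# Hypothetical identifiers, for illustration only.
paper = Newspaper("example:newspaper-a", "Example Daily")
issue = paper.add_issue("2010-02-26")
article = issue.add_article("a001")
print(article.identifier, "->", article.issue_id, "->", issue.newspaper_id)
```

The article is created without author or title here, matching the point that article-level descriptive data is welcome but not required.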

15 | E-paper requirements
- Quantity:
  - One newspaper: ca. 150 articles per day / 900 a week / 47,000 per year
  - 21,150,000 per year (ca. 450 newspapers × 47,000 articles)
- Start modestly
- Retrodigitisation (collection started in 1913) will extend this to more than 1 billion articles
- Challenge in terms of resources and technical capacities
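
The quantity figures can be checked with a few lines of arithmetic. The sketch assumes six publishing days per week and 52 weeks per year (implied by 900 / 150 and the rounded 47,000) and takes the ca. 450 newspapers from the earlier slide; the constant-volume extrapolation at the end is a deliberate simplification, not a claim about historical output.

```python
import math

ARTICLES_PER_DAY = 150
PUBLISHING_DAYS_PER_WEEK = 6      # assumption: 900 per week / 150 per day
WEEKS_PER_YEAR = 52               # assumption
NEWSPAPERS = 450                  # "ca. 450 newspapers" from the earlier slide
PER_YEAR_ROUNDED = 47_000         # figure given on the slide

per_week = ARTICLES_PER_DAY * PUBLISHING_DAYS_PER_WEEK
per_year_exact = per_week * WEEKS_PER_YEAR
all_per_year = PER_YEAR_ROUNDED * NEWSPAPERS

# Years of output needed to pass one billion articles at today's volume.
years_to_one_billion = math.ceil(1_000_000_000 / all_per_year)

print(per_week, per_year_exact, all_per_year, years_to_one_billion)
```

At a constant 21,150,000 articles per year, one billion articles corresponds to roughly 48 years of output, so a collection reaching back to 1913 plausibly exceeds that mark even if earlier volumes were smaller.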

16 | E-paper project (recently started)
- In cooperation with a vendor after a tender procedure
- Ca. 20 important newspapers, starting with two
- Metadata should be delivered in ONIX
- Harvesting interface: OAI-PMH
- All data delivered in an XML file
- Integrated digital preservation in the kopal environment
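
A harvesting client for such an OAI-PMH interface could look roughly like the sketch below. It only builds `ListRecords` request URLs and parses a response held in a string, so it runs without network access. The base URL, the `onix` metadata prefix and the sample response are assumptions for illustration; the real endpoint and prefix would be defined by the vendor.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response body standing in for a real harvest result.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:issue-1</identifier></header></record>
    <record><header><identifier>oai:example.org:issue-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE_URL = "https://example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE_URL, metadata_prefix="onix")  # prefix is assumed
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url(BASE_URL, resumption_token=token)
print(first_url, identifiers, next_url, sep="\n")
```

A full harvester would repeat the request with each returned resumption token until the response no longer contains one.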

17 | XML record for e-papers

18 | E-paper & access
- Principal question for access: integration in the portal environment or a dedicated (independent) search area
- Advanced requirements for segmentation of text
- Direct link between portal (metadata) and text
- Navigation / browsing within the object, direct access to single chapters / pages
- Zooming, scrolling
- Integrated full-text search
- Print and store facilities
- DRM, IDM

19 | Reuse of results from the CONTENTUS project
[Diagram: six-step processing chain]
1. Digitisation
2. Automated optimisation
3. Content analysis: face, logo, text, person, speaker 1, speaker 2, image, text, title
4. Knowledge base: semantic relation
5. Open knowledge networks: CORE, professionals (media archives…); MANTLE, automated (learning); SHELL, end user (Wikipedia)
6. Semantic multimedia search. Example film record: information about actors, director, producers, music, sequence, year of production; short description of the picture or video sequence; what is in the film; rights; any other relevant information as a short summary of content for fast access. Linked items: related books, internet links, music scores, films, songs, news (each with the placeholder text "Year of printing, editions, authors, summary of the book…").

20 | Data processing
- Automated page segmentation (headlines, images, tables)
- OCR + entity recognition
- Full-text search
- Semantic search interface
Based on:
- Intellectually approved authority files
- Statistical data analysis
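
To make the role of authority files in entity recognition more concrete, here is a deliberately simple sketch that tags OCR text by looking up known names. It is an assumption for illustration only: the authority entries, their identifiers and the exact-match strategy are hypothetical, and it leaves out the statistical analysis the slide mentions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical authority file: normalised name -> authority record identifier.
AUTHORITY_FILE: Dict[str, str] = {
    "german national library": "auth:0001",
    "new delhi": "auth:0002",
}


def tag_entities(text: str, authority: Dict[str, str]) -> List[Tuple[str, str]]:
    """Find authority-file names in the text, in order of appearance."""
    lowered = text.lower()
    hits = []
    for name, auth_id in authority.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), name, auth_id))
    hits.sort()
    return [(name, auth_id) for _, name, auth_id in hits]


ocr_text = "The German National Library presented its plans in New Delhi."
entities = tag_entities(ocr_text, AUTHORITY_FILE)
print(entities)
```

Linking each hit to an authority identifier rather than to the raw string is what lets a semantic search treat spelling variants of the same person, place or institution as one entity.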

21 | Our solution currently: integrated search and retrieval

22 | Next step: integrated e-papers

23 | Integrated e-paper "ZEIT" (1)

24 | Integrated e-paper "ZEIT" (2): provision of free texts

25 | Integrated e-paper "ZEIT" (3)

26 | Reinhard Altenhöner, mailto:r.altenhoener@d-nb.de, http://www.d-nb.de

