Published by Jody Warren. Modified over 9 years ago.
1
herbert van de sompel
CS 502 Computing Methods for Digital Libraries
Cornell University, Computer Science
Herbert Van de Sompel, herbertv@cs.cornell.edu
Lecture 6: Populating Digital Libraries
2
KWF (Kahn-Wilensky Framework): populating DLs
Diagram: an originator makes a digital object, which consists of data plus key-metadata (including a handle); the digital object goes into a repository, where a client can access it.
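The diagram's roles can be made concrete with a small data structure. This is a minimal sketch, assuming a digital object is just a handle, an opaque byte string, and a dictionary of key-metadata, and that a repository only needs "deposit" and "access" operations; the class names, method names and the handle string are hypothetical and do not come from the KWF itself.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Data plus key-metadata, loosely following the KWF description.

    Field names are illustrative assumptions, not taken from the framework.
    """
    handle: str                                        # persistent name of the object
    data: bytes                                        # content, opaque to the repository
    key_metadata: dict = field(default_factory=dict)   # e.g. originator, format


class Repository:
    """Hypothetical minimal repository: deposit objects, access them by handle."""

    def __init__(self) -> None:
        self._store: dict[str, DigitalObject] = {}

    def deposit(self, obj: DigitalObject) -> str:
        # A handle must name exactly one object, so refuse to overwrite.
        if obj.handle in self._store:
            raise ValueError(f"handle already registered: {obj.handle}")
        self._store[obj.handle] = obj
        return obj.handle

    def access(self, handle: str) -> DigitalObject:
        # Raises KeyError when no object with this handle was deposited.
        return self._store[handle]


repo = Repository()
repo.deposit(DigitalObject(
    handle="cs502/lecture6",  # hypothetical handle string
    data=b"Populating Digital Libraries",
    key_metadata={"originator": "Herbert Van de Sompel", "format": "text/plain"},
))
print(repo.access("cs502/lecture6").key_metadata["originator"])
```

The repository never looks inside the data; everything it needs (the handle and the key-metadata) travels alongside it, which is the separation the diagram draws.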
3
Populating DLs
originator makes a digital object:
- born digital / converted to digital
- digital media formats
- document model: structure of the digital object (later)
- naming digital objects: identifiers (later)
digital object goes into a repository:
- technological/organizational issues:
  - central/decentral submission
  - central/decentral storage
  - submission directly by the author / via an organization
  - quality control
  - terms and conditions (copyright, …)
4
Populating DLs
The way in which these issues are addressed has a fundamental impact on:
- the economics of the DL: there is no free lunch
- the success of a DL with its target group:
  - arXiv physics: TeX, central submission
  - arXiv CS: the same model does not fly
5
Populating DLs
The way in which these issues are addressed has a fundamental impact on:
- searchability/retrievability of digital objects (DOs):
  - decentral submission & storage => distributed searching?
  - DO identifiers: plain URLs break (404)
- archiving of DOs: choice of media formats, DO model, central/decentral organization
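The "URL 404" problem arises because a URL names a location, not an object. A common remedy is a level of indirection: citations carry a persistent identifier, and a resolver maps it to the object's current location. The sketch below assumes an in-memory table; the class, its methods, the identifier and the example.org URLs are all hypothetical, and real systems (such as the Handle System) are distributed rather than a single dictionary.

```python
class HandleResolver:
    """Hypothetical resolver mapping persistent identifiers to current locations.

    Clients cite the identifier; only this table changes when an object moves,
    so existing citations keep working.
    """

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        # Update the single authoritative mapping instead of every citation.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


resolver = HandleResolver()
resolver.register("example/12345", "http://old-host.example.org/paper.ps")
# The object migrates to a new server; a hard-coded URL would now return 404.
resolver.move("example/12345", "http://new-host.example.org/paper.pdf")
print(resolver.resolve("example/12345"))  # http://new-host.example.org/paper.pdf
```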
6
originator makes a digital object
7
Convergence of media
Evolution of the digital representation of media: text => images => audio => video
- processing software/hardware: initially high-end, later desktop
Evolution of the formats that represent the media:
- different formats can serve different purposes
- compression / destructive (lossy) compression
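The difference between compression and destructive compression can be shown on a toy byte stream. This sketch assumes 8-bit sample values; zlib (DEFLATE) stands in for lossless compression, and simply dropping the four least significant bits stands in for destructive compression. Real lossy codecs such as JPEG or MP3 are far more sophisticated, but share the key property that the discarded information cannot be recovered.

```python
import zlib

# A toy "recording": repetitive 8-bit sample values (an assumption for illustration).
samples = bytes([10, 11, 11, 12, 200, 201, 199, 10, 11, 12] * 100)

# Lossless: zlib round-trips to exactly the original bytes.
packed = zlib.compress(samples, 9)
restored = zlib.decompress(packed)

# Lossy stand-in: keep only the 4 most significant bits of each sample.
quantized = bytes((s >> 4) << 4 for s in samples)

print(len(samples), len(packed))   # the lossless version is much smaller here
print(restored == samples)         # True: nothing was lost
print(quantized == samples)        # False: the fine detail is gone for good
```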
8
Evolution of the representation of characters
- basic ASCII: 7 bit (ftp://dkuug.dk/i18n/WG15-collection/charmaps/ANSI_X3.4-1968)
- EBCDIC: 8 bit (http://www.natural-innovations.com/boo/asciiebcdic.html)
- language-specific ASCII extensions
- ASCII/ISO 8859-1: 8 bit (http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html)
- UNICODE: 16 bit, currently 49,194 characters (http://www.unicode.org/unicode/standard/standard.html)
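The same character gets different byte values (or none at all) under each of these schemes. The sketch below uses Python's built-in codecs; "cp037" is one EBCDIC code page (US/Canada) among several, and "utf-16-be" is used as the 16-bit Unicode encoding. The helper name encode_all is hypothetical.

```python
from typing import Optional

CODECS = ("ascii", "cp037", "latin-1", "utf-16-be")  # ASCII, EBCDIC, ISO 8859-1, Unicode


def encode_all(text: str) -> dict[str, Optional[str]]:
    """Hex bytes of `text` in each codec, or None if the codec cannot represent it."""
    results: dict[str, Optional[str]] = {}
    for codec in CODECS:
        try:
            results[codec] = text.encode(codec).hex()
        except UnicodeEncodeError:
            results[codec] = None
    return results


for ch in ("A", "é", "€"):
    print(ch, encode_all(ch))
```

"A" is representable everywhere but with a different code in EBCDIC; "é" needs an 8-bit extension such as ISO 8859-1; "€" is outside ISO 8859-1 and needs Unicode.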
9
Evolution of the representation of text
Two families: based on looks or based on content
- all kinds of word-processor formats (starting in the mid-80s)
- RTF (cross-word-processor format)
- doc: MS Word 6 will not read MS Word 1 (Lesk, p. 194)
- PS
- TeX
- SGML, XML, HTML
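A format "based on content" marks up what each part of a document is and leaves its appearance to a separate rendering step, so one source can yield several presentations. A minimal sketch using Python's standard xml.etree.ElementTree; the element names lecture, title and author are hypothetical and not taken from any real DTD or schema.

```python
import html
import xml.etree.ElementTree as ET

# Content-based markup: the tags say WHAT each part is, not how it should look.
SOURCE = """<lecture>
  <title>Populating Digital Libraries</title>
  <author>Herbert Van de Sompel</author>
</lecture>"""


def to_html(xml_text: str) -> str:
    """One presentation: HTML with a heading and an italic byline."""
    root = ET.fromstring(xml_text)
    title = html.escape(root.findtext("title", default=""))
    author = html.escape(root.findtext("author", default=""))
    return f"<h1>{title}</h1>\n<p><i>{author}</i></p>"


def to_text(xml_text: str) -> str:
    """Another presentation of the same content: a plain-text line."""
    root = ET.fromstring(xml_text)
    return f"{root.findtext('title', default='')} by {root.findtext('author', default='')}"


print(to_html(SOURCE))
print(to_text(SOURCE))
```

Presentation decisions (heading level, italics) live only in the renderers, which is what makes content-based formats attractive for archiving.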
10
Different formats / different purposes
Text:
- original: doc, TeX, wp
- archival: SGML, RTF
- presentation: PS, PDF, HTML
Images (see http://www.imagemontage.com/Docs/GIF.html):
- original: EPS
- archival: TIFF, PICT
- presentation: JPEG, PNG, GIF
Audio (see http://128.253.200.106/xplat/xplat.aud.html):
- original: AIFF, WAV
- archival: AIFF
- presentation: MP3, RealAudio, WAV
Video (see http://www.hut.fi/~iisakkil/videoformats.html):
- original: DV
- archival: Digital Betacam
- presentation: RealVideo, QuickTime
11
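Born digital / become digital
Diagram: analog domain vs. digital domain. Something born analog (an analog record) becomes digital through digitization; something born digital is a digital recording from the start. Either way, the result is a stream of bits (110110 000101 001101 …).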
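Digitization of an analog record comes down to two steps: sampling (measuring the signal at discrete times) and quantization (rounding each measurement to one of a fixed number of levels, written as a bit string). A minimal sketch, assuming 6-bit codes like those on the slide, a signal with amplitude in [-1, 1], and a 1 Hz sine wave standing in for the analog record; the function name and parameters are hypothetical.

```python
import math
from typing import Callable

BITS = 6              # bits per sample, matching the 6-bit codes on the slide
LEVELS = 2 ** BITS    # 64 quantization levels


def digitize(signal: Callable[[float], float], duration_s: float, rate_hz: int) -> list[str]:
    """Sample `signal` (time in seconds -> amplitude in [-1, 1]) and quantize to BITS-bit codes."""
    codes = []
    for n in range(int(duration_s * rate_hz)):
        t = n / rate_hz                                          # sampling: discrete times
        x = max(-1.0, min(1.0, signal(t)))                       # clip to the valid range
        level = min(LEVELS - 1, int((x + 1.0) / 2.0 * LEVELS))   # quantization
        codes.append(format(level, f"0{BITS}b"))
    return codes


print(digitize(lambda t: math.sin(2 * math.pi * t), duration_s=1.0, rate_hz=8))
```

More samples per second and more bits per sample both bring the bit stream closer to the original, at the cost of more data.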
12
Born digital
- Text: text typed into a PC
- Images: an image created from scratch in Photoshop
- Audio: computer-generated audio files (Csound, Max/MSP), software synths writing to disk, …
- Video: special effects in movies, Toy Story, …
13
Converting into digital
Text/images:
- keying
- speech-to-text
- scanning (see lecture by Anne Kenney): from paper to image; quality: dpi, …
- OCR: from image to text; quality depends on hardware/software, heuristics, learning
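Scanning resolution (dpi) and bit depth determine how large a page image is before any compression. The arithmetic is pixels = inches × dpi in each direction, and bytes = pixels × bits per pixel / 8. A sketch assuming a US Letter page; the dpi/bit-depth combinations are illustrative choices, not recommendations, and the function name is hypothetical.

```python
def scan_size(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> tuple[int, int, int]:
    """Pixel dimensions and uncompressed size in bytes of a scanned page."""
    w_px = round(width_in * dpi)
    h_px = round(height_in * dpi)
    size_bytes = (w_px * h_px * bits_per_pixel + 7) // 8   # round up to whole bytes
    return w_px, h_px, size_bytes


# A US Letter page (8.5 x 11 in) at a few illustrative settings.
for label, dpi, bpp in [("bitonal", 300, 1), ("grayscale", 300, 8), ("color", 600, 24)]:
    w, h, size = scan_size(8.5, 11, dpi, bpp)
    print(f"{label:9} {dpi} dpi: {w} x {h} px, {size / 1e6:.1f} MB uncompressed")
```

Doubling the dpi quadruples the pixel count, which is why the resolution choice dominates storage costs for large scanning projects.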
14
Converting into digital
Audio:
- sampling (DSP cards); quality: sample rate (frequency, e.g. 44.1 kHz), bits per sample (dynamic range, e.g. 16 bit), mono/stereo; software tools for noise reduction, removal of clicks, …
- text-to-speech: from text to phonemes, from phonemes to an audio file (MBROLA)
Video:
- capturing (video capture boards); quality: fps, window size, …
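The audio quality parameters translate directly into data volume: uncompressed PCM needs sample rate × bytes per sample × channels bytes every second, and each bit per sample adds about 6 dB of theoretical dynamic range (20·log10(2^bits)). A sketch assuming CD-style settings (44.1 kHz, 16 bit, stereo); the function names are hypothetical.

```python
import math


def pcm_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM size in bytes: samples/s x bytes/sample x channels x seconds.

    Assumes bits_per_sample is a multiple of 8 (8-, 16- or 24-bit audio).
    """
    return int(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


def dynamic_range_db(bits_per_sample: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits_per_sample)


print(pcm_bytes(1, 44_100, 16, 2))                    # 176400 bytes per second
print(round(pcm_bytes(60, 44_100, 16, 2) / 1e6, 1))   # 10.6 MB per minute
print(round(dynamic_range_db(16), 1))                 # 96.3 dB
```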
15
Converting to digital
Rules of thumb:
- create the digital master copy in the highest quality (although: see Kenney!)
- archive the master in a format that offers some guarantees regarding longevity
- definitely do not compress the master in a lossy manner
16
digital object goes into a repository
17
KWF: populating DLs
Diagram: originator → digital object → user, with four models along the way: submission model, storage model, publication model, retrieval model.
18
preprint archives (repositories)
19
Readings
- Lesk, M. 1997. Books into Bytes. Scientific American, March 1997. http://www.lesk.com/mlesk/sciam97/sciam97.html
- Van de Sompel, H., Krichel, T., Nelson, M. & others. 2000. The UPS Prototype: An Experimental End-User Service across E-Print Archives. D-Lib Magazine, February 2000. http://www.dlib.org/dlib/february00/vandesompel-ups/02vandesompel-ups.html