Data Formats in Mathematics: EMANI and DML. EMANI Meeting Göttingen, 22.-23.11.2002. Dr. Thomas Fischer, Metadaten und Datenbanken.


1 Data Formats in Mathematics: EMANI and DML
EMANI Meeting Göttingen, 22.-23.11.2002
Dr. Thomas Fischer, Metadaten und Datenbanken (Metadata and Databases), SUB Göttingen
fischer@mail.sub.uni-goettingen.de

2 Overview
• Basis Situation
• Purposes of formats
• Formats for purposes
  - Text formats for archiving
  - Text formats for retrieval
  - Image formats for archiving
  - Presentation formats: text and images
• Co-operation and compatibility
  - Import of data
  - Coordination

3 Basis Situation
Archiving for presentation:
• Preserve the original appearance of documents for clients of electronic journals over a (long) time.
• Archive documents in a fashion independent of software and hardware to minimize migration problems.
Is there a possibility to unify the procedures of electronic publishing and archiving/presentation of this material?

4 Purposes of Formats I
• In the EMANI Project, we propose to collect a vast corpus of data from different sources:
  - Retrodigitized material (different formats of images)
  - Digital material (different formats of text and images)
  - Multimedia-type material (interactive, video …)
  - Programs
• This material comes in different formats, because there is no single format that would serve the needs of producing all of it.
• To integrate this material into one collection, standardization will be extremely valuable.

5 Purposes of Formats II
A closer look:
• Retrodigitization: This process produces images. Requirements come from usage and administration. The participants can usually decide on the format.
• Digital material: The majority of mathematical text is written in TeX these days, but articles may include images (e.g. EPS). There is material which is not produced in TeX (e.g. editorials) or where the sources are no longer available. Some text-processing formats (Word, WordPerfect) and presentation formats (PS, PDF) are common.
• Programs should be archived as source code (essentially ASCII). Compiled programs cause migration problems.
• Multimedia: Not considered for now (e.g. videos from fractal images).

6 Formats for Purposes
In the context of archiving mathematics, different formats are needed for different purposes:
• Archiving: A stable representation of the content and form of the article. The format should be independent of proprietary software and insensitive to minor errors.
• Retrieval: Metadata and probably a full textual representation of the contents. Formulas are still an open question (much more complex than in Chemistry).
• Presentation: The presentation of the material should be as true as possible to the original “look and feel”. It should be rendered by standard agents and not require special programs (beyond simple plug-ins) on the client’s side. Special measures may be necessary for the visually impaired.

7 Text formats for archiving
• The best-suited formats for archiving are mark-up formats like TeX or MathML. There is some progress in the conversion from MathML to TeX and vice versa.
• For documents in TeX, a suitable environment is necessary for correct rendering. This requires documenting and archiving the respective additional files (stylesheets, fonts etc.).
• Included images will come in different formats, usually EPS, but PDF and TeX-defined graphics are possible. It is not clear to me how to handle these. The format of the images is not necessarily obvious.

8 Retrieval formats
• For metadata, a scheme (application profile) is needed (another work package).
• For retrieval using full text, an additional textual layer may have to be added to the text file (unless some TeX-based full-text search mechanism becomes available). This textual layer should be stored in a normalized format; the one provided by the Text Encoding Initiative (TEI) might be useful (mark-up of structural information).
• The alternative is an integrated search engine which provides access to the data by storing relevant information in its own database, separate from the original data (like Google does for internet files).
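The search-engine alternative in the last bullet can be illustrated with a small inverted index: the words of each document are stored in a separate structure, and queries are answered from that structure without touching the original files. This is only a minimal sketch; the function names and the two sample records are hypothetical and not part of any EMANI system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample records, not taken from a real collection.
documents = {
    "art-1": "On the dual of complex semigroups",
    "art-2": "Numerical methods for complex systems",
}
index = build_index(documents)
print(search(index, "complex semigroups"))
```

The index lives apart from the archived files, so the archival format (TeX, TIFF, PDF) can be chosen independently of the retrieval mechanism. Formulas are not handled here; they remain the open question noted earlier.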

9 Image Formats for Archiving
• High-quality images are necessary for further processing of the data: conversion to other formats, OCR, printing.
• For management of the files, the possible inclusion of metadata is extremely desirable.
• If the images are to be archived in compressed form, the compression algorithm should be lossless and free of copyright restrictions.
• This points to 600 dpi TIFF as the standard format, compressed using CCITT G4 compression for bitonal images.
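To give a sense of why compression matters at this resolution, the following sketch estimates the uncompressed size of one bitonal page at 600 dpi. The A4 page size and the padding of each row to whole bytes are assumptions made for the illustration; the helper name is hypothetical.

```python
def bitonal_page_bytes(width_in, height_in, dpi=600):
    """Uncompressed size in bytes of a 1-bit-per-pixel scan of one page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Assumption: each pixel row is padded to a whole number of bytes.
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


# Assumed page size: A4 (210 mm x 297 mm), converted to inches.
a4_bytes = bitonal_page_bytes(210 / 25.4, 297 / 25.4)
print(f"{a4_bytes / 1e6:.1f} MB uncompressed per A4 page at 600 dpi")
```

Under these assumptions a single page takes roughly 4.4 MB before compression, so a journal volume of several hundred pages would run to gigabytes if stored uncompressed.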

10 Presentation formats I
• TeX is not well suited for the presentation of mathematical articles on the net:
  - Requirements of additional files like stylesheets
  - Special fonts necessary
  - IBM techexplorer Hypermedia Browser shows possibilities and limitations
• TeX files have to be processed on the server side and delivered in a unified format. Possible options are DVI, PS, PDF and DjVu. Since DVI viewers usually exist only in a TeX environment, and almost the same holds for GhostView for reading PS on-screen, PDF or DjVu have to be considered.

11 Presentation formats II
• Image files from scanning are usually very large, so they would create a heavy load when sent via the net.
• Image files have to be processed on the server side and delivered in a unified format. This format may differ depending on the desired resolution on the client’s side, e.g. for viewing on-screen or for (high-)quality printing.
• Possible options are JPEG, PDF and DjVu.

12 Cooperation: Exchange of Metadata
• Metadata come in different formats. Springer uses the MAJOUR-header DTD (European Workgroup on SGML) in different versions (note: this is fairly complicated, the original documentation has 151 pages).
• This header is presented in SGML mark-up and/or RDF syntax.
• Both can technically be imported into an envisaged EMANI system.
• The compatibility of the metadata schemes has to be studied (richness and availability of data compared with the emerging EMANI scheme).

13 MAJOUR-header: SGML
Springer-Verlag Berlin Heidelberg 211 Numerische Mathematik Numer. Math. 0029-599X NUMMA7 85 3 0000134 Original article 343 366 Springer-Verlag Berlin Heidelberg 2000

14 MAJOUR-header: RDF
0100227 On the dual of complex Ol'shanski\uı semigroups EN <rdfs:label rdf:parseType="Literal">English

15 Cooperation: Import of Articles
• Import of articles in PDF format:
  - Quality check: appropriate resolution, scalability, printable?
  - Need standards for handling PDF
• Import of articles in TeX format:
  - Check necessary additional files
  - Create an appropriate container for all files referring to one article
  - Create a structure to manage general additional files
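The step "check necessary additional files" could be partly automated by scanning the TeX source for commands that name other files. The sketch below is one possible approach, not an established EMANI procedure: the command list is illustrative and not exhaustive, the default extensions are assumptions, and the sample source is made up.

```python
import re

# Commands whose argument names a file the article depends on, with the
# extension assumed when none is given. Illustrative, not exhaustive.
DEFAULT_EXTENSION = {
    "documentclass": ".cls",
    "usepackage": ".sty",
    "input": ".tex",
    "includegraphics": "",
    "bibliography": ".bib",
}

COMMAND_RE = re.compile(
    r"\\(documentclass|usepackage|input|includegraphics|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def required_files(tex_source):
    """Return the sorted file names referenced by a TeX source string."""
    # Drop comments: an unescaped % up to the end of the line.
    stripped = re.sub(r"(?<!\\)%.*", "", tex_source)
    files = set()
    for command, argument in COMMAND_RE.findall(stripped):
        for name in argument.split(","):
            name = name.strip()
            if not name:
                continue
            has_extension = "." in name.rsplit("/", 1)[-1]
            files.add(name if has_extension else name + DEFAULT_EXTENSION[command])
    return sorted(files)


# Made-up sample source for illustration only.
sample = r"""
\documentclass{svjour}
\usepackage{amsmath,graphicx} % standard packages
\begin{document}
\includegraphics[width=5cm]{fig1.eps}
\bibliography{refs}
\end{document}
"""
print(required_files(sample))
```

The resulting list can be compared with the files actually delivered for an article before they are placed in its container. Files loaded indirectly, such as class option files, are not detected by this scan and would still need a test compilation.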

16 TeX files from Springer
TeX needs a friendly environment:
Document Class: svjour 2001/10/17 LaTeX document class for Springer journals - version 1.9
Class Springer-SVJour Warning: Specified option or subpackage "leqno" not found - on input line 92.
! Class Springer-SVJour Error: No valid journal specified in option list.
See the Springer-SVJour class documentation for explanation.
Type H for immediate help.
 ...
l.93 ...ournal specified in option list}{}
?
) )
No pages of output.
Produced output after installation of:
  svjour.cls
  svnummat.clo
  TOTAL00.NUM
But: references missing!

17 Cooperation
• Internal:
  - NSF/DFG: Access to Mathematical Literature over Time
  - Jahrbuch Projekt
  - Mathematical Monographs
  - ...
• External (?)
  - DML
  - NUMDAM
  - Elsevier?
  - ...

18 Thank you for your attention!

