Data Formats in Mathematics: EMANI and DML
EMANI Meeting Göttingen, 22.-23.11.2002
Dr. Thomas Fischer, Metadata and Databases

Data Formats in Mathematics: EMANI and DML
EMANI Meeting Göttingen
Dr. Thomas Fischer, Metadata and Databases, SUB Göttingen

Overview
- Basic situation
- Purposes of formats
- Formats for purposes
  - Text formats for archiving
  - Text formats for retrieval
  - Image formats for archiving
  - Presentation formats: text and images
- Co-operation and compatibility
  - Import of data
  - Coordination

Basic Situation
Archiving for presentation:
- Preserve the original appearance of documents for clients of electronic journals over (long) time.
- Archive documents in a fashion independent of software and hardware to minimize problems of migration.
Is there a way to unify the procedures of electronic publishing and of archiving and presenting this material?

Purposes of Formats I
- In the EMANI project, we propose to collect a vast corpus of data from different sources:
  - Retrodigitized material (images in different formats)
  - Digital material (text and images in different formats)
  - Multimedia-type material (interactive, video …)
  - Programs
- This material comes in different formats, because no single format serves all the needs of producing it.
- To integrate this material into one collection, standardization will be extremely valuable.

Purposes of Formats II
A closer look:
- Retrodigitization: This process produces images. Requirements come from usage and administration. The participants can usually decide on the format.
- Digital material: The majority of mathematical texts are written in TeX these days, but articles may include images (e.g. EPS). Some material is not produced in TeX (e.g. editorials), or its sources are no longer available. Some word-processing formats (Word, WordPerfect) and presentation formats (PS, PDF) are common.
- Programs should be archived as source code (essentially ASCII). Compiled programs cause migration problems.
- Multimedia: Not considered for now (e.g. videos of fractal images).
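
Because the incoming material mixes TeX sources, EPS and PDF figures, PostScript and word-processor files, a collection will probably need to identify formats from the file contents rather than trust file names. A minimal Python sketch, assuming identification by leading "magic" bytes is good enough for a first triage; `sniff_format` and `sniff_file` are hypothetical names, and the signature list covers only the formats named on this slide.

```python
# Sketch: guess a file's format from its first bytes ("magic numbers").
# Only formats mentioned on the slide are covered; the exact signatures
# are checked before the weaker TeX heuristic at the end.


def sniff_format(head: bytes) -> str:
    """Return a coarse format label for the leading bytes of a file."""
    if head.startswith(b"%PDF-"):
        return "PDF"
    if head.startswith(b"%!PS-Adobe"):
        # Encapsulated PostScript announces "EPSF" on its first line.
        first_line = head.split(b"\n", 1)[0]
        return "EPS" if b"EPSF" in first_line else "PS"
    if head.startswith(b"\xc5\xd0\xd3\xc6"):
        return "EPS"  # DOS EPS binary header (PostScript plus a preview)
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"  # little- or big-endian TIFF header
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "OLE2 container (e.g. Word 97-2003 .doc)"
    if head.startswith(b"\xffWPC"):
        return "WordPerfect"
    if b"\\documentclass" in head or b"\\documentstyle" in head:
        return "LaTeX (heuristic)"
    return "unknown"


def sniff_file(path: str, n: int = 2048) -> str:
    """Read the first n bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(n))
```

Content-based identification matters most for the included images, whose format is "not necessarily obvious" from the TeX source alone.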

Formats for Purposes
In the context of archiving mathematics, different formats are needed for different purposes:
- Archiving: A stable representation of the content and form of the article. The format should be independent of proprietary software and insensitive to minor errors.
- Retrieval: Metadata and probably a full textual representation of the contents. Formulas are still an open question (much more complex than in chemistry).
- Presentation: The presentation of the material should be as true as possible to the original "look and feel". It should be rendered by standard agents and not require special programs (beyond simple plug-ins) on the client's side. Special measures may be necessary for the visually impaired.

Text Formats for Archiving
- The best-suited formats for archiving are mark-up formats like TeX or MathML. There is some progress in the conversion from MathML to TeX and vice versa.
- For documents in TeX, a suitable environment is necessary for correct rendering. This requires documenting and archiving the respective additional files (stylesheets, fonts, etc.).
- Included images will come in different formats, usually EPS, but PDF and TeX-defined graphics are possible. It is not clear to me how to handle these. The format of the images is not necessarily obvious.
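
Archiving the "suitable environment" starts with knowing which additional files a TeX document loads. A minimal Python sketch, assuming a regular-expression scan of the source is enough for a first inventory (it misses files loaded indirectly by macros or by the class itself); `latex_dependencies` is a hypothetical helper, and the sample input in the usage line is invented apart from the `svjour` class name.

```python
import re

# \command[optional args]{arg1,arg2}; the mandatory brace right after the
# name keeps \include from matching inside \includegraphics.
_DEP_RE = re.compile(
    r"\\(documentclass|documentstyle|usepackage|RequirePackage|"
    r"includegraphics|input|include|bibliographystyle|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def strip_comments(src: str) -> str:
    """Drop TeX comments (an unescaped % up to the end of the line)."""
    return re.sub(r"(?<!\\)%.*", "", src)


def latex_dependencies(src: str) -> dict:
    """Map each loading command to the classes, packages or files it names."""
    deps: dict = {}
    for cmd, arg in _DEP_RE.findall(strip_comments(src)):
        for name in arg.split(","):
            name = name.strip()
            if name:
                deps.setdefault(cmd, []).append(name)
    return deps
```

For example, `latex_dependencies(r"\documentclass[leqno]{svjour}\usepackage{amsmath,amssymb}")` yields `{'documentclass': ['svjour'], 'usepackage': ['amsmath', 'amssymb']}`; the resulting names can then be resolved to files (e.g. `svjour.cls`) and archived together with the article.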

Retrieval Formats
- For metadata, a scheme (application profile) is needed (another work package).
- For full-text retrieval, an additional textual layer may need to be added to the text file (unless some TeX-based full-text search mechanism becomes available). This textual layer should be stored in a normalized format; the one provided by the Text Encoding Initiative (TEI) might be useful (mark-up of structural information).
- The alternative is an integrated search engine, which provides access to the data by storing the relevant information in its own database, separate from the original data (as Google does for Internet files).
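
The "integrated search engine" alternative keeps its own index apart from the archived files. A minimal Python sketch of that idea as an inverted index, assuming a crude text layer obtained by stripping TeX commands and dropping inline formulas entirely (formula search stays the open question named earlier); `tex_to_text` and `InvertedIndex` are hypothetical names, and a real system would index a normalized layer such as TEI rather than regex-stripped TeX.

```python
import re
from collections import defaultdict


def tex_to_text(src: str) -> str:
    """Very crude text layer: drop comments, inline math and control words."""
    src = re.sub(r"(?<!\\)%.*", "", src)       # comments
    src = re.sub(r"\$[^$]*\$", " ", src)       # inline math $...$
    src = re.sub(r"\\[A-Za-z]+\*?", " ", src)  # control words such as \emph
    return re.sub(r"[{}\[\]]", " ", src)       # leftover grouping characters


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Term -> set of document ids, stored apart from the documents."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return ids of documents containing all query terms (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

The archived TeX files stay untouched; only the index has to be rebuilt if the text-extraction rules change.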

Image Formats for Archiving
- High-quality images are necessary for further processing of the data: conversion to other formats, OCR, printing.
- For management of the files, the possibility to include metadata is extremely desirable.
- If the images are archived in compressed form, the compression algorithm should be lossless and free of copyrights.
- This points to 600 dpi TIFF as the standard format, compressed using CCITT G4 compression for bitonal images.
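
A minimal Python sketch for checking that a scanned master meets this recommendation, assuming a classic (non-BigTIFF) file whose first image directory carries the TIFF 6.0 tags Compression (259, value 4 = CCITT Group 4), XResolution (282), YResolution (283) and ResolutionUnit (296). `read_first_ifd` and `check_archival_master` are hypothetical names, and only single-valued SHORT, LONG and RATIONAL entries are decoded.

```python
import struct

COMPRESSION, X_RESOLUTION, Y_RESOLUTION, RESOLUTION_UNIT = 259, 282, 283, 296
CCITT_G4 = 4  # TIFF Compression value for CCITT T.6 (Group 4)


def read_first_ifd(data: bytes) -> dict:
    """Decode single-valued SHORT/LONG/RATIONAL tags of the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, typ, n, raw = struct.unpack(order + "HHI4s", data[start:start + 12])
        if typ == 3 and n == 1:    # SHORT, stored in the first 2 value bytes
            tags[tag] = struct.unpack(order + "H", raw[:2])[0]
        elif typ == 4 and n == 1:  # LONG, stored inline
            tags[tag] = struct.unpack(order + "I", raw)[0]
        elif typ == 5 and n == 1:  # RATIONAL, stored at an offset
            (off,) = struct.unpack(order + "I", raw)
            num, den = struct.unpack(order + "II", data[off:off + 8])
            tags[tag] = num / den
    return tags


def check_archival_master(data: bytes, min_dpi: int = 600) -> list:
    """Return a list of deviations from 'min_dpi TIFF, CCITT G4'."""
    tags = read_first_ifd(data)
    problems = []
    if tags.get(COMPRESSION, 1) != CCITT_G4:  # 1 = uncompressed (default)
        problems.append("not CCITT Group 4 compressed")
    unit = tags.get(RESOLUTION_UNIT, 2)       # TIFF default unit: inch
    factor = {2: 1.0, 3: 2.54}.get(unit)      # pixels/cm -> pixels/inch
    for tag in (X_RESOLUTION, Y_RESOLUTION):
        if factor is None or tags.get(tag, 0) * factor < min_dpi:
            problems.append(f"resolution below {min_dpi} dpi (tag {tag})")
    return problems
```

Since G4 is defined only for bitonal data, a fuller check would also confirm BitsPerSample (tag 258) equals 1.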

Presentation Formats I
- TeX is not well suited for presenting mathematical articles on the net:
  - Additional files such as stylesheets are required
  - Special fonts are necessary
  - The IBM techexplorer Hypermedia Browser shows the possibilities and limitations
- TeX files have to be processed on the server side and delivered in a unified format. Possible options are DVI, PS, PDF and DjVu. Since DVI viewers usually exist only in a TeX environment, and almost the same holds for GhostView for reading PS on screen, PDF or DjVu have to be considered.
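
A minimal sketch of the server-side step in Python, assuming a TeX installation that provides `pdflatex` (the options used below exist in TeX Live's pdfTeX) and a document whose class and style files are already installed or stored next to it; `pdflatex_command` and `render_to_pdf` are hypothetical names. The sketch runs pdflatex twice so that cross-references resolve, but it does not run BibTeX.

```python
import subprocess
from pathlib import Path


def pdflatex_command(tex_name, out_dir):
    """Command line for one non-interactive pdflatex run."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # never wait for keyboard input
        "-halt-on-error",            # stop at the first error
        f"-output-directory={out_dir}",
        str(tex_name),
    ]


def render_to_pdf(tex_file, out_dir, runs=2):
    """Run pdflatex `runs` times (for cross-references); return the PDF path."""
    tex_file = Path(tex_file).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    for _ in range(runs):
        # Run inside the article's directory so local .cls/.sty files are found;
        # check=True raises CalledProcessError if pdflatex reports an error.
        subprocess.run(
            pdflatex_command(tex_file.name, out_dir),
            cwd=tex_file.parent,
            check=True,
            capture_output=True,
        )
    return out_dir / (tex_file.stem + ".pdf")
```

Producing DjVu instead would need a different converter after this step; the PDF route is shown only because it needs a single program.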

Presentation Formats II
- Image files from scanning are usually very large, so they would create a heavy load when sent via the net.
- Image files have to be processed on the server side and delivered in a unified format. This may differ depending on the resolution desired on the client's side, e.g. for viewing on screen or for (high-)quality printing.
- Possible options are JPEG, PDF and DjVu.
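
To put "very large" into numbers: the uncompressed size of a scanned page follows from the page size, the resolution and the bit depth. A small Python sketch, assuming an A4 page (210 × 297 mm), which the slide does not specify; rows are padded to whole bytes. A bitonal A4 page at 600 dpi has 4961 × 7016 pixels, about 4.4 MB before compression, and about 34.8 MB in 8-bit greyscale.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a page scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Raw image size, with every row padded to a whole number of bytes."""
    w, h = page_pixels(width_mm, height_mm, dpi)
    bytes_per_row = (w * bits_per_pixel + 7) // 8
    return bytes_per_row * h


for label, dpi, bits in [("bitonal master", 600, 1),
                         ("greyscale master", 600, 8),
                         ("screen derivative", 100, 8)]:
    size = uncompressed_bytes(210, 297, dpi, bits)
    print(f"{label}: {page_pixels(210, 297, dpi)} px, {size / 1e6:.1f} MB")
```

The loop prints 4.4 MB, 34.8 MB and 1.0 MB respectively. A 100 dpi screen version has roughly one thirty-sixth of the pixels of a 600 dpi master, which is why derived versions, rather than the masters, are the natural candidates for delivery.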

Cooperation: Exchange of Metadata
- Metadata come in different formats; Springer uses the MAJOUR header DTD (European Workgroup on SGML) in different versions (note: this is fairly complicated; the original documentation has 151 pages).
- This header is presented in SGML mark-up and/or RDF syntax.
- Both can technically be imported into an envisaged EMANI system.
- The compatibility of the metadata schemes has to be studied (richness and availability of data compared with the emerging EMANI scheme).

MAJOUR Header: SGML
Springer-Verlag Berlin Heidelberg 211 Numerische Mathematik Numer. Math X NUMMA Original article Springer-Verlag Berlin Heidelberg 2000
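
Studying compatibility between schemes amounts to a crosswalk: which source fields have a target, and which do not. A minimal Python sketch using the values visible in the SGML header excerpt above; the source field names (`publisher-name`, `journal-title`, ...) are hypothetical placeholders, not actual MAJOUR element names, and Dublin Core elements stand in for the still-emerging EMANI scheme. The meaning of "211" and "X" in the excerpt is not clear, so they are left out.

```python
# Hypothetical source field names: NOT real MAJOUR element names.
# Targets are Dublin Core elements, standing in for the EMANI scheme.
CROSSWALK = {
    "publisher-name": "dc:publisher",
    "journal-title": "dc:source",
    "article-type": "dc:type",
    "copyright": "dc:rights",
}

# Values taken from the SGML header excerpt; the keys are invented.
RECORD = {
    "publisher-name": "Springer-Verlag Berlin Heidelberg",
    "journal-title": "Numerische Mathematik",
    "journal-abbrev": "Numer. Math",
    "journal-code": "NUMMA",
    "article-type": "Original article",
    "copyright": "Springer-Verlag Berlin Heidelberg 2000",
}


def crosswalk(record, mapping):
    """Split a record into mapped target fields and unmapped source fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # candidate gap in the target scheme
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


mapped, unmapped = crosswalk(RECORD, CROSSWALK)
print(unmapped)
```

Here the journal abbreviation and journal code have no target, which is exactly the kind of "richness" difference the comparison with the EMANI scheme has to settle.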

MAJOUR Header: RDF
On the dual of complex Ol'shanski\uı semigroups EN <rdfs:label rdf:parseType="Literal">English
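
Technically, importing the RDF form is straightforward. A minimal Python sketch with the standard library's ElementTree, assuming a hypothetical record shaped like the fragment above (a title, the language code `EN` and an `rdfs:label` "English"); the `dc:` properties, the nesting and the `urn:example:` identifier are assumptions, not the actual MAJOUR RDF layout. ElementTree reads only this one serialization; a real import would use an RDF parser that understands all equivalent RDF/XML forms.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record modelled on the slide's fragment.
SAMPLE = r"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="urn:example:article-1">
    <dc:title>On the dual of complex Ol'shanski\u{\i} semigroups</dc:title>
    <dc:language>
      <rdf:Description>
        <rdf:value>EN</rdf:value>
        <rdfs:label rdf:parseType="Literal">English</rdfs:label>
      </rdf:Description>
    </dc:language>
  </rdf:Description>
</rdf:RDF>"""


def read_header(xml_text: str) -> dict:
    """Flatten the hypothetical header record into a dict."""
    root = ET.fromstring(xml_text)
    desc = root.find("rdf:Description", NS)
    lang = desc.find("dc:language/rdf:Description", NS)
    return {
        "about": desc.get(f"{{{NS['rdf']}}}about"),
        "title": desc.findtext("dc:title", namespaces=NS),
        "language_code": lang.findtext("rdf:value", namespaces=NS),
        "language_label": lang.findtext("rdfs:label", namespaces=NS),
    }
```

Note that the title keeps its TeX accent markup; normalizing such markup is part of the compatibility work, not of the parsing.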

Cooperation: Import of Articles
- Import of articles in PDF format:
  - Quality check: appropriate resolution, scalability, printable?
  - Standards for handling PDF are needed
- Import of articles in TeX format:
  - Check the necessary additional files
  - Create an appropriate container for all files referring to one article
  - Create a structure to manage general additional files
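
One possible container, sketched in Python under the assumption that a ZIP file with a JSON manifest of SHA-256 checksums is acceptable (the slide does not name a container format); `build_container` and the article identifier in any call are hypothetical. Files are stored under their base names, so two files with the same name from different directories would collide.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path) -> str:
    """Checksum a file in chunks so large scans need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_container(article_id, files, dest) -> dict:
    """Pack all files of one article into a ZIP with a JSON manifest."""
    manifest = {"article": article_id, "files": []}
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, files):
            zf.write(path, arcname=path.name)
            manifest["files"].append({
                "name": path.name,
                "sha256": sha256_of(path),  # for later fixity checks
                "bytes": path.stat().st_size,
            })
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

General additional files (shared class files, fonts) would be better kept in one separate, versioned store and referenced from the manifest than copied into every article container.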

TeX Files from Springer
TeX needs a friendly environment:
    Document Class: svjour 2001/10/17 LaTeX document class for Springer journals - version 1.9
    Class Springer-SVJour Warning: Specified option or subpackage "leqno" not found - on input line 92.
    ! Class Springer-SVJour Error: No valid journal specified in option list.
    See the Springer-SVJour class documentation for explanation.
    Type H for immediate help.
    ...
    l.93 ...ournal specified in option list}{}
    ?
    ) )
    No pages of output.
Output was produced after installing svjour.cls, svnummat.clo, TOTAL00.NUM. But: references were missing!

Cooperation
- Internal:
  - NSF/DFG: Access to Mathematical Literature over Time
  - Jahrbuch Project
  - Mathematical Monographs
  - ...
- External (?):
  - DML
  - NUMDAM
  - Elsevier?
  - ...

Thank you for your attention!