Use of XML in the Publications Office: Critical issues for publishing Dr. Holger Bagola Publications Office DIR/R 5 “IT Projects” section “Formats & Linguistic.


Use of XML in the Publications Office: Critical issues for publishing Dr. Holger Bagola Publications Office DIR/R 5 “IT Projects” section “Formats & Linguistic Informatics” October 2006

Use of XML in the Publications Office 2 History; From SGML to XML; Structure of publications in Formex; Streamlining of models; Current status of Formex; Particular needs for publishing; Conclusion

Use of XML in the Publications Office 4 History In the 1970s, more and more publication procedures were supported by computer applications. There was no common standard for applications in the publishing context, so publishing houses were confronted with a large variety of formats.

Use of XML in the Publications Office 5 History A considerable number of documents published in the Official Journal can be totally or partially reused for the publication of other documents. As the electronic formats of published documents were not standardized, it was impossible to establish convenient procedures for such reuse.

Use of XML in the Publications Office 6 History The first information on SGML as a future standard for the exchange of documents was published in the early 1980s. Main advantages of the approach: –Independence from any application or operating platform –Description of logical document structure instead of presentation

Use of XML in the Publications Office 7 History In 1982 the Publications Office decided to define a format for the exchange of published documents: Formex (Format for the exchange of electronic publications).

Use of XML in the Publications Office 8 History Publication of Formex specifications in 1984/1985 Formex part of the framework contract for OJ publications in : Adoption of the SGML standard by ISO (ISO 8879)

Use of XML in the Publications Office 9 History BUT... There was no real support for the format on the market (parsers, editors, etc.). The approach seemed rather exotic to printing houses, which were used to working with the presentation of documents. The quality of the delivered SGML documents was rather poor.

Use of XML in the Publications Office 10 History Revision and partial redesign of Formex. Addition of a basic table model. Formex 2 was easier for the framework contractors to understand. Better quality, but still insufficient for publication: it was impossible to derive the document presentation from the rough description of the document structure.

Use of XML in the Publications Office 11 History Total redesign of Formex specifications –Implementation of a more flexible table model –Integration of metadata into the SGML document structure –Finer granularity and distinct elements for the description of document structure (possibility of deriving presentation from structure)

Use of XML in the Publications Office 12 History The result was a rather complex specification, which required intensive validation of the deliveries.

Use of XML in the Publications Office 13 History Since 1998: XML was adopted by W3C as a new, but compatible, standard. XML was immediately accompanied by additional standards which supported the navigation and transformation of documents. A new standard for the specification of XML grammars was adopted in 2001: XML Schema

Use of XML in the Publications Office 14 History In 2001 the Publications Office organized a Formex user meeting to discuss the future development of the approach. The main results of this meeting were: –Migration to XML, for which various tools were on the market (partly as open source) –Replacement of the DTD methodology for specifying XML grammars by XML Schema

Use of XML in the Publications Office 16 From SGML to XML Revision of the approach in order to define a grammar which meets the needs of printing houses without abandoning the description of the logical document structure. Definition of a table model based on the HTML model (keeping logical relations and functions in attributes).

Use of XML in the Publications Office 17 From SGML to XML Abandonment of parallel models: the distinction is made by context analysis. Replacement of character encoding based on ISO 2022 by Unicode (UTF-8, the default for XML instances). All documents contain a reference to the Formex schema on the web:

Use of XML in the Publications Office 18 From SGML to XML Distinction of up to four levels of a publication. Definition of rules for automatic validation of Formex instances beyond parsing. Development of a tool for comparing the contents of Formex instances with the corresponding PDF instances. Automatic extraction of metadata for updating EUR-Lex.

Use of XML in the Publications Office 19 From SGML to XML The XML-based version of the Formex 4 specifications entered into force on 1 May 2004. The current release is 3.00.

Use of XML in the Publications Office 21 Structure of publications in Formex Formex instances concern OJ publications only (L and C series) Other publications are possible, but currently not realized

Use of XML in the Publications Office 22 Structure of publications in Formex Description of publication structure: –Description of structure and composition of the publication stricto sensu –Description of structure and composition of a document –Contents of document and sub-documents –Non-XML parts or fragments of documents

Use of XML in the Publications Office 23 Structure of publications in Formex [Diagram: Publication (description of logical structure and composition; references to documents) > Document (references to main and sub-documents) > Main document, Sub-document, Non-XML instance]

Use of XML in the Publications Office 24 Structure of publications in Formex In order to keep a minimum of metadata together with the contents of a document, some of the corresponding items are present on several levels. All sub-levels contain references to the next higher hierarchical level (except for non-XML instances).

Use of XML in the Publications Office 26 Streamlining of models Whenever a Formex 3 element could appear in various contexts, distinct elements were created. Thus there were parallel models such as TI.DOC, TI.ANNEX, TI.GRSEQ etc. In Formex 4 these elements were merged into one, with the context expressing the distinct functions.

Use of XML in the Publications Office 27 Streamlining of models
Old: ACT/TI.DOC, ANNEX/TI.ANNEX, GR.SEQ/TI.GRSEQ
New: ACT/TITLE, ANNEX/TITLE, GR.SEQ/TITLE
Selection by context: TITLE[parent::ACT], TITLE[parent::ANNEX], TITLE[parent::GR.SEQ]
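The context-based selection on this slide can be sketched with Python's standard ElementTree module. This is only an illustration: the sample fragment and the DOC wrapper are hypothetical, and only the element names ACT, ANNEX, GR.SEQ and TITLE come from the slide. ElementTree has no `parent::` axis, so the path `.//ACT/TITLE` is used here; it selects the same nodes as `TITLE[parent::ACT]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the element names are taken from the slide.
SAMPLE = """
<DOC>
  <ACT><TITLE>Title of the act</TITLE></ACT>
  <ANNEX><TITLE>Title of the annex</TITLE></ANNEX>
  <GR.SEQ><TITLE>Title of the group</TITLE></GR.SEQ>
</DOC>
"""


def titles_in_context(root, parent_name):
    """Return the text of every TITLE whose parent element is parent_name."""
    return [title.text for title in root.findall(f".//{parent_name}/TITLE")]


root = ET.fromstring(SAMPLE)
act_titles = titles_in_context(root, "ACT")
annex_titles = titles_in_context(root, "ANNEX")
group_titles = titles_in_context(root, "GR.SEQ")
```

One generic TITLE element is enough because the parent element already tells a stylesheet or conversion tool which kind of title it is handling.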

Use of XML in the Publications Office 28 Streamlining of models Old table model The table model in Formex 1-3 was a logical one, distinguishing between the column and line headings and the body. The body could easily be identified and copied to another linguistic version.

Use of XML in the Publications Office 29 Streamlining of models Old table model Empty cells were not present in old instances. Attributes expressed the relation between cells and columns.

Use of XML in the Publications Office 30 Streamlining of models New table model Top-down model for headings and body. Attributes express the distinct function of a specific cell. Empty cells are present and carry a special attribute which explicitly confirms the absence of any contents.
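The rule that empty cells must be explicitly marked lends itself to an automatic check. The sketch below is a hedged illustration, not the Formex table model: the names TBL, ROW, CELL and the EMPTY attribute are hypothetical placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: TBL, ROW, CELL and the EMPTY attribute are
# illustrative names, not taken from the Formex schema.
TABLE = """
<TBL>
  <ROW><CELL>Product</CELL><CELL>Quota</CELL></ROW>
  <ROW><CELL>Wheat</CELL><CELL EMPTY="YES"/></ROW>
  <ROW><CELL>Barley</CELL><CELL/></ROW>
</TBL>
"""


def unmarked_empty_cells(table):
    """Return (row, column) positions of empty cells lacking the marker."""
    problems = []
    for r, row in enumerate(table.findall("ROW"), start=1):
        for c, cell in enumerate(row.findall("CELL"), start=1):
            # A cell has content if it holds non-blank text or child elements.
            has_content = bool((cell.text or "").strip()) or len(cell) > 0
            if not has_content and cell.get("EMPTY") != "YES":
                problems.append((r, c))
    return problems


problems = unmarked_empty_cells(ET.fromstring(TABLE))
```

With the marker in place, a consumer can tell a deliberately empty cell from one whose content was lost in production.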

Use of XML in the Publications Office 32 Current status of Formex Formex 4 is totally W3C Schema based. It is in use since May Minor changes were integrated (release 3.0) All OJ (L and C) documents are covered. Further document types (not published in OJ) will be taken into account.

Use of XML in the Publications Office 33 Current status of Formex Specification, documentation of all elements, physical specification, examples (> 600) publicly available on web-site:

Use of XML in the Publications Office 34 Current status of Formex Availability of Formex via the LegisWrite interface XML instances are not (yet?) publicly accessible Different quality levels according to validation

Use of XML in the Publications Office 35 Current status of Formex [Diagram with the labels: Printing house; CERES; Quality 1; Quality 2; Quality 3; Automatic validation; Manual validation; EUDOR; LegisWrite Interface; Conversion to LW; Client]

Use of XML in the Publications Office 37 Particular needs for publishing Publishing mostly concerns the presentation of documents in a readable form. A “good” logical XML model allows for the derivation of the presentation of a given document. Printing houses are obliged to work with Formex instances along the production processes.

Use of XML in the Publications Office 38 Particular needs for publishing Some parts of a document (words, parts of a sentence) require a specific presentation which cannot always be derived from the logical structure. Specific elements for text highlighting and presentation had to be created. Example: foreign words are set in italics in some language versions.

Use of XML in the Publications Office 39 Particular needs for publishing Quotation marks differ from one language version to another. Exceptions in their use on nested levels mean that the specific symbols have to be present in the instance.

Use of XML in the Publications Office 40 Particular needs for publishing For special cases the printing houses are allowed to use temporary additional markup (processing instructions, elements from other namespaces). In most cases this kind of information depends on the publishing system.

Use of XML in the Publications Office 41 Particular needs for publishing All this additional information has to be deleted before the electronic version of the publication is sent. For the design of new elements, the relation to presentation has to be analyzed. In most cases a specific presentation has to be ensured so that the new element can be identified correctly.
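The cleanup step before delivery (removing processing instructions and elements from other namespaces) can be sketched as follows. This is a minimal illustration under stated assumptions: the sample instance, the `tmp` namespace and the processing instruction are hypothetical, and the sketch assumes that the Formex elements themselves carry no namespace, so any namespaced element is treated as temporary.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the "tmp" namespace and the processing instruction
# stand in for whatever a publishing system might add.
SOURCE = (
    '<ACT xmlns:tmp="urn:example:publishing-system">'
    "<TITLE>Regulation<?layout break?> text</TITLE>"
    "<tmp:hint>keep together</tmp:hint>"
    "<P>First paragraph.</P>"
    "</ACT>"
)


def strip_temporary_markup(xml_text):
    """Drop processing instructions and namespaced elements, keep the rest."""
    # The default ElementTree parser discards processing instructions.
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            # Namespaced tags are stored as "{namespace-uri}localname".
            if child.tag.startswith("{"):
                # Keep any text that followed the removed element.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


cleaned = strip_temporary_markup(SOURCE)
```

A real delivery pipeline would presumably run schema validation after such a step, since the cleaned instance is what has to conform to the Formex schema.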

Use of XML in the Publications Office 42 Particular needs for publishing Conversion into other electronic formats requires similar measures. Regular derivations are –Presentation in the Official Journal –Presentation in LegisWrite –Presentation in HTML

Use of XML in the Publications Office 43 Particular needs for publishing [Diagram: Formex (XML) instance; Format “Official Journal” (PDF); Format “LegisWrite” (RTF); Format “EUR-Lex” (HTML)]

Use of XML in the Publications Office 45 Conclusion Since its beginnings, Formex has been a common exchange format which is independent of any application or platform. Clear character encoding in all versions.

Use of XML in the Publications Office 46 Conclusion Availability of tools on the market for XML based instances: –RXP for validating DTD parsing –XSV for validating XML Schema parsing –XMLSpy for development (+ Saxon) –XMetal for content editing –renderX for generation of PDF

Use of XML in the Publications Office 47 Conclusion Stylesheets (based on XSL-FO) for presentation. Future enhancements: –Better integration of other source formats (RTF/LegisWrite) –Addition of other document types not necessarily related to the Official Journal