Unit no. 4: Mark-up
Adolf Knoll, National Library of the Czech Republic


Learning objectives
After the completion of this unit the learner will be able to:
- understand what to do with the digital output for further use
- understand the basics of mark-up languages, especially XML
- have a basic orientation in their application, so as to make correct decisions when building a digitization project

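Production of a digital document
[Diagram: the original document goes through digitization, which produces data, and through description, which produces metadata; data and metadata together form the digital document.]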

What do we produce?
Data
- the direct product of digitization: digital images, full text, video and audio files
- usually a set of files that represent the original document
Metadata
- added value in the form of textual information
- they express:
  - the identification of the original
  - the structure and the links to the data files
  - technical information about the data
  - accessibility
  - administrative matters
  - etc.

Mark-up
Mark-up was created because of the need to store additional (hidden) information in a text in order to:
- format it better when it is displayed and/or printed = prescriptive mark-up
- classify parts of it as objects relevant to various description rules, such as cataloguing rules, rules for providing technical parameters, various good practices, rules for associating them with their visual representation, etc. = descriptive mark-up
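To make the difference concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two fragments and the element names Book, Title and Author are illustrative assumptions: the prescriptive version only says how the text should look, while the descriptive version says what each part is, so a program can find the title by its meaning rather than by its appearance.

```python
import xml.etree.ElementTree as ET

# Prescriptive (procedural) mark-up: the tags say how the text should look.
prescriptive = "<p><b>The Name of the Rose</b><br/>Umberto Eco</p>"

# Descriptive mark-up: the tags say what each part of the text is.
# The element names are hypothetical, chosen only for illustration.
descriptive = "<Book><Title>The Name of the Rose</Title><Author>Umberto Eco</Author></Book>"

def find_title(xml_text: str):
    """Return the text of the first Title element, or None if there is none."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all its descendants in document order.
    for elem in root.iter("Title"):
        return elem.text
    return None

print(find_title(prescriptive))  # None: bold text is not identified as a title
print(find_title(descriptive))   # The Name of the Rose
```

The bold type in the first fragment may well be a title to a human reader, but nothing in the mark-up says so; that is the gap descriptive mark-up fills.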

Mark-up
- For example, in MS Word a paragraph is marked with the ¶ sign: Paragraph¶
- In HTML code, a paragraph is enclosed between the tags <p> and </p>: <p>paragraph</p>
- In HTML, bold text and line breaks are also marked with tags: bold text is enclosed in <b> … </b>, and a line break is marked with <br>. In the sample sentence "This is an HTML document, which consists of elements.", such tags make part of the text bold and move "of elements." to a new line.
- All this is procedural (prescriptive) mark-up. Note the angle brackets < > that mark where the marked-up element starts and where it ends.

Objects
- What does the mark-up mark? OBJECTS.
- Which objects? THOSE THAT WE DEFINE AS OBJECTS.
- On what basis do we define them? On the basis of CERTAIN RULES.
- How are the rules established? By agreement; they are usually a written (often published) document specifying which objects should be followed and described. Examples: AACR2 cataloguing rules in libraries, ISBD rules, CDWA or AMICO description rules for museum objects, the Data Dictionary for Digital Still Images, etc.
- The description rules do not define how the objects are marked up; this is done by a formal mark-up language.
- The most sophisticated mark-up approach is SGML.

General markup language SGML
- SGML, the Standard Generalized Markup Language (an ISO standard from 1986), is the basis for other, derived approaches that may be called second-generation mark-up languages:
  - HTML (prescriptive)
  - TEI
  - …
  - XML (descriptive)
- A mark-up language marks the object without assigning any kind of behaviour to it. Its behaviour is prescribed by an independent rule.

How does it work?
- The main construction unit of an SGML-based mark-up approach is called an ELEMENT.
- Each element must be defined by an external content-description rule; e.g. a cataloguing rule (AACR2 or another) defines the element Title, and it may also define sub-elements such as Main Title, Parallel Title or Sub-Title.
- As a result, there may be hierarchical relationships between elements (parents with children).

How to define the metadata standard?
- We need formal rules to express the content-description standards.
- In the SGML environment, this is done in the Document Type Definition (DTD).
- Among other things, a DTD can:
  - list all the elements and set up their properties (mandatory or not, repeatable, etc.)
  - define relations between elements
  - refine their attributes, e.g. through a list of permitted values
  - point from them to external entities, i.e. other definitions or binary data such as digital images

If, for example, we need a description element author, then:
- the content definition of the element author is given by description rules (e.g. AACR2);
- the formal definition of the element author is given by rules for formal definition (e.g. a DTD);
- the formal rule for displaying the element author is given by rules of transformation for display (e.g. XSLT for XML).
In this way, we work in XML.
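The division of labour can be sketched in Python. The slide names XSLT for the display step; since Python's standard library has no XSLT processor, a plain function stands in for that rule here. The record layout (author with surname and name) follows the postcard example later in this unit, and the "Surname, Name" display format is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Content: the marked-up author, in the form a formal definition (e.g. a DTD) allows.
record = "<author><surname>Lyer</surname><name>Antonín</name></author>"

def display_author(xml_text: str) -> str:
    """Display rule kept apart from the data (a stand-in for an XSLT template).

    Changing this function changes how the author looks on screen
    without touching the XML record itself.
    """
    author = ET.fromstring(xml_text)
    surname = author.findtext("surname", default="")
    name = author.findtext("name", default="")
    return f"{surname}, {name}" if name else surname

print(display_author(record))  # Lyer, Antonín
```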

XML: eXtensible Markup Language
- XML file (*.xml): it contains a reference to the DTD that controls it, and it can contain a reference to the transformation rule that formats it for display, e.g. an XSLT file (*.xslt).
- DTD (*.dtd): a DTD for XML is still written in SGML syntax, not in XML; therefore W3C XML Schema was introduced to replace it. A document can thus be controlled either by a DTD (*.dtd) or by a Schema (*.xsd).
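How an XML file points to its DTD and its stylesheet can be seen by reading its prolog. This Python sketch uses the standard xml.parsers.expat module, which reports the DOCTYPE declaration and processing instructions to callbacks; it is a non-validating parser and does not load the DTD. The file names Postcard.dtd and Postcard.xslt are assumptions modelled on the postcard example in this unit, and the href extraction is deliberately simple (it assumes no spaces inside the attribute value).

```python
from xml.parsers import expat

# A hypothetical prolog; the file names follow the postcard example in this unit.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="Postcard.xslt"?>
<!DOCTYPE PostcardDescription SYSTEM "Postcard.dtd">
<PostcardDescription/>"""

def read_references(xml_text: str) -> dict:
    """Collect the DTD and stylesheet references declared in an XML prolog."""
    refs = {"dtd": None, "stylesheet": None}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # SYSTEM "Postcard.dtd" names the DTD that controls the document.
        refs["dtd"] = system_id

    def on_pi(target, data):
        # <?xml-stylesheet ... href="..."?> names the display transformation.
        if target == "xml-stylesheet":
            for part in data.split():
                if part.startswith("href="):
                    refs["stylesheet"] = part[len("href="):].strip("\"'")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_pi
    # Expat does not fetch the external DTD by default, so no file is opened.
    parser.Parse(xml_text, True)
    return refs

print(read_references(SAMPLE))
# {'dtd': 'Postcard.dtd', 'stylesheet': 'Postcard.xslt'}
```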

DTD = Document Type Definition
- The basic construction piece is the ELEMENT.
- An ELEMENT can have content, or it can be EMPTY.
- ELEMENTS can consist of other elements.

Here the element Title consists of a group of three elements (MainTitle, SubTitle, and ParallelTitle); of these, only MainTitle is mandatory, SubTitle and ParallelTitle are not, and ParallelTitle is repeatable. In a DTD it is written like this:
<!ELEMENT Title (MainTitle, SubTitle?, ParallelTitle*)>
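The content model can be read as a pattern over the sequence of child elements. The Python sketch below, a simplified illustration rather than a real DTD validator, turns a flat sequence model such as (MainTitle, SubTitle?, ParallelTitle*) into a regular expression and checks lists of child names against it. The helper names model_to_regex and children_match are hypothetical; choice groups (a | b) and nested groups are out of scope.

```python
import re

# Content model of Title as described above: ',' means sequence,
# '?' optional (at most once), '*' optional and repeatable, '+' at least once.
TITLE_MODEL = "(MainTitle, SubTitle?, ParallelTitle*)"

def model_to_regex(model: str) -> re.Pattern:
    """Translate a flat sequence content model into a regex over child names."""
    parts = []
    for item in model.strip("()").split(","):
        item = item.strip()
        name, suffix = (item[:-1], item[-1]) if item[-1] in "?*+" else (item, "")
        # Each child name is matched as a whole token followed by a space.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(model: str, children: list[str]) -> bool:
    """Check whether a sequence of child element names satisfies the model."""
    text = "".join(f"{c} " for c in children)
    return model_to_regex(model).fullmatch(text) is not None

print(children_match(TITLE_MODEL, ["MainTitle"]))                                    # True
print(children_match(TITLE_MODEL, ["MainTitle", "ParallelTitle", "ParallelTitle"]))  # True
print(children_match(TITLE_MODEL, ["SubTitle"]))                                     # False: MainTitle missing
print(children_match(TITLE_MODEL, ["MainTitle", "SubTitle", "SubTitle"]))            # False: SubTitle repeated
```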

The element PageRepresentation makes it possible to link a concrete page with the image or full text that represents it.
<!ATTLIST MonographPage Type (Advertisement | BackCover | BackEndSheet | Blank | FlyLeaf | FrontCover | FrontEndSheet | Index | ListOfIllustrations | ListOfMaps | ListOfTables | NormalPage | Spine | Table | TableOfContents | TitlePage) "NormalPage" >
<!ATTLIST PageImage href CDATA #REQUIRED >
<!ATTLIST PageText href CDATA #REQUIRED >
Note: we can also set up a list of attributes; here these are the Type of the MonographPage and href, i.e. a reference to an external data entity.

The part of the DTD above means this: the element MonographPage consists of the elements PageNumber, Notes and PageRepresentation. We classify each MonographPage by its content into Types such as Advertisement, BackCover, …, TableOfContents and TitlePage. We have set the default value to NormalPage because we expect it to be the most frequent choice.
The meaning of the occurrence signs is as follows:
- Element (no sign) = the element is mandatory and occurs exactly once
- Element+ = the element is mandatory and occurs at least once
- Element? = the element is optional and occurs at most once
- Element* = the element is optional and may occur any number of times (zero or more)
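The effect of an enumerated attribute with a default can be sketched in Python with xml.etree.ElementTree. ElementTree does not read any DTD here, so the sketch applies the permitted values and the NormalPage default by hand; a parser that processes the DTD would supply the default itself. The function name page_type is hypothetical.

```python
import xml.etree.ElementTree as ET

# Permitted values and default of the Type attribute, copied from the ATTLIST above.
PAGE_TYPES = {
    "Advertisement", "BackCover", "BackEndSheet", "Blank", "FlyLeaf",
    "FrontCover", "FrontEndSheet", "Index", "ListOfIllustrations",
    "ListOfMaps", "ListOfTables", "NormalPage", "Spine", "Table",
    "TableOfContents", "TitlePage",
}
DEFAULT_TYPE = "NormalPage"

def page_type(page: ET.Element) -> str:
    """Return the effective Type of a MonographPage, applying the DTD default by hand."""
    value = page.get("Type", DEFAULT_TYPE)
    if value not in PAGE_TYPES:
        raise ValueError(f"Type={value!r} is not in the permitted list")
    return value

print(page_type(ET.fromstring('<MonographPage Type="FlyLeaf"/>')))  # FlyLeaf
print(page_type(ET.fromstring("<MonographPage/>")))                 # NormalPage
```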

Each element that does not consist of any further elements must be defined, too. The expression (#PCDATA), i.e. parsed character data, announces that XML files written according to this DTD are expected to contain a text string that the parser analyzes, here, for example, a page number such as 221.
The sign | in (PageImage | PageText) indicates that only one of the two elements is used in a given PageRepresentation. Consequently, in this DTD a page represented both by an image and by a text gets two PageRepresentation elements, one for each.
The ATTLIST (list of attributes) sets up the href attribute, of type CDATA (character data), as a reference/navigation link to external data that is not analyzed as XML. The elements PageImage and PageText are empty, as they serve only to link the page to the image or full-text files.
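The following Python sketch shows how a program might follow the links held by the empty PageImage and PageText elements. The MonographPage fragment is hypothetical: the page number 2, the Type FlyLeaf and the note echo the sample discussed in this unit, but the file names Data/page002.gif and Data/page002.txt are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical MonographPage represented both by an image and by a text,
# hence two PageRepresentation elements. The file names are made up.
PAGE = """<MonographPage Type="FlyLeaf">
  <PageNumber>2</PageNumber>
  <Notes>List of publications of U. Eco at Bompiani</Notes>
  <PageRepresentation><PageImage href="Data/page002.gif"/></PageRepresentation>
  <PageRepresentation><PageText href="Data/page002.txt"/></PageRepresentation>
</MonographPage>"""

def representations(xml_text: str) -> list[tuple[str, str]]:
    """List (kind, href) pairs for every PageRepresentation of a page."""
    page = ET.fromstring(xml_text)
    result = []
    for rep in page.findall("PageRepresentation"):
        # The choice (PageImage | PageText) allows exactly one child here.
        child = rep[0]
        # The empty element has no content, only the link in its href attribute.
        result.append((child.tag, child.get("href")))
    return result

page = ET.fromstring(PAGE)
print(page.findtext("PageNumber"))  # 2 (parsed character data, #PCDATA)
print(representations(PAGE))
# [('PageImage', 'Data/page002.gif'), ('PageText', 'Data/page002.txt')]
```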

[XML excerpt; surviving values: 2, List of publications of U. Eco at Bompiani]
This is a concrete section from an XML file, in which we can see that the reference is made to an image in GIF format located in the Data subdirectory. We can also see that it is page no. 2, of the Type FlyLeaf.
For a better understanding, we will now carry out a simple project whose aim is to write a DTD for the documents we may need in a project for digitizing old postcards. The steps are: analysis of the document, establishment of the needed elements and their relationships, setup of the elements linking to the digitized images, writing the DTD, writing an XML file based on the DTD, and displaying it. The aim is to show how it is done, not to teach everything, as that requires a more thorough XML training course.

How to write a simple DTD?
1. Analyze carefully the object you wish to describe and represent.
2. Establish the elements necessary for the description and their basic properties (mandatory yes/no, repeatable yes/no).
3. Define whether these elements will consist of other elements.
4. Establish which elements will reference the image files.
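Steps 2 and 3 translate almost mechanically into DTD syntax. The Python sketch below is a hypothetical helper, not part of any tool named in this unit: it turns a list of child elements with their mandatory/repeatable properties into an <!ELEMENT> declaration, using the standard meaning of the DTD occurrence signs. It assumes a plain sequence of children and treats an element without children as text (#PCDATA); choice groups and EMPTY elements are left out.

```python
# Map the two properties from step 2 to a DTD occurrence sign.
def indicator(mandatory: bool, repeatable: bool) -> str:
    if mandatory:
        return "+" if repeatable else ""
    return "*" if repeatable else "?"

def element_decl(name: str, children: list[tuple[str, bool, bool]]) -> str:
    """Write a DTD declaration; each child is (name, mandatory, repeatable)."""
    if not children:
        # Step 3 answered "no sub-elements": the element holds text only.
        return f"<!ELEMENT {name} (#PCDATA)>"
    model = ", ".join(c + indicator(m, r) for c, m, r in children)
    return f"<!ELEMENT {name} ({model})>"

print(element_decl("Title", [("MainTitle", True, False),
                             ("SubTitle", False, False),
                             ("ParallelTitle", False, True)]))
# <!ELEMENT Title (MainTitle, SubTitle?, ParallelTitle*)>
print(element_decl("MainTitle", []))
# <!ELEMENT MainTitle (#PCDATA)>
```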

Digitized postcard
The necessary elements and hierarchies for a DTD of a Digitized Postcard:
- Root element: PostcardDescription
- Elements of the 2nd level:
  - author (consists of surname and name elements)
  - title
  - theme
  - publisher (consists of PlaceOfPublication, NameOfPublisher, DateOfPublication)
  - PhysicalDescription (consists of Size and Technique elements)
  - TypeOfDocument
  - VisualRepresentation (consists of ImageOfRectoPart and ImageOfVersoPart elements)
  - language
  - annotation

They can be represented by this graph: [Figure: element hierarchy of the Digitized Postcard]
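Before writing the DTD, the hierarchy can be checked as a tree. This Python sketch builds an empty element tree from the element list of the Digitized Postcard with xml.etree.ElementTree and prints it as an indented outline, a text counterpart of the graph. The nested-tuple notation is only a convenience of the sketch, and no mandatory/repeatable properties are encoded because the list does not state them.

```python
import xml.etree.ElementTree as ET

# Element hierarchy of the digitized postcard:
# each entry is (element name, list of child entries).
POSTCARD = ("PostcardDescription", [
    ("author", [("surname", []), ("name", [])]),
    ("title", []),
    ("theme", []),
    ("publisher", [("PlaceOfPublication", []), ("NameOfPublisher", []),
                   ("DateOfPublication", [])]),
    ("PhysicalDescription", [("Size", []), ("Technique", [])]),
    ("TypeOfDocument", []),
    ("VisualRepresentation", [("ImageOfRectoPart", []), ("ImageOfVersoPart", [])]),
    ("language", []),
    ("annotation", []),
])

def build(spec, parent=None) -> ET.Element:
    """Build an empty element tree that mirrors the hierarchy."""
    name, children = spec
    elem = ET.Element(name) if parent is None else ET.SubElement(parent, name)
    for child in children:
        build(child, elem)
    return elem

def outline(elem: ET.Element, depth: int = 0) -> list[str]:
    """Return an indented outline of the tree, one element per line."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = build(POSTCARD)
print("\n".join(outline(root)))
```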

Postcard.dtd
<!ATTLIST ImageOfRectoPart (preview | normal | excellent) #REQUIRED CDATA #REQUIRED >
<!ATTLIST ImageOfVersoPart (preview | normal | excellent) #REQUIRED CDATA #REQUIRED >

Postcard.xml (the tags are not reproduced; the element values read: Lyer Antonín Hronov views of streets Nádražní ulice Dvorská ulice Jiráskova ulice Náměstí Hronov Karel Šefelín [1910] 9x13 cm colour printing postcard cz The postcard was sent by my great-grandmother to her husband, who was in military service in the first years of World War I.)
The listing contains a reference to a formatting stylesheet and references to the image files.

How does it work in a web browser?
When we click on the XML file:
- the browser looks for the formatting file (the stylesheet, i.e. the *.xslt file) and calls it;
- it displays the file following the prescribed rules;
- we can click on the links leading to the images that represent the postcard visually, and we will be navigated to them.
So, let's try it and click on the file Postcard.xml.
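The browser's job in this walkthrough can be imitated in Python. The slide describes an XSLT stylesheet doing the formatting; here a plain function stands in for it, turning a cut-down postcard record into HTML with clickable links to the images. The href attribute name, the image paths and the title value are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# A cut-down postcard record; the title, attribute name and paths are hypothetical.
RECORD = """<PostcardDescription>
  <title>Sample postcard</title>
  <VisualRepresentation>
    <ImageOfRectoPart href="images/recto.jpg"/>
    <ImageOfVersoPart href="images/verso.jpg"/>
  </VisualRepresentation>
</PostcardDescription>"""

def to_html(xml_text: str) -> str:
    """Render a record as HTML with image links (a stand-in for the XSLT step)."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default="(untitled)"))
    links = []
    for image in root.iterfind("VisualRepresentation/*"):
        # Each image element becomes a link the reader can follow to the picture.
        href = escape(image.get("href", ""), quote=True)
        links.append(f'<li><a href="{href}">{image.tag}</a></li>')
    return f"<h1>{title}</h1>\n<ul>\n" + "\n".join(links) + "\n</ul>"

print(to_html(RECORD))
```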

XML Conclusions
- The language makes it possible to define and control any type of description.
- It can relate descriptions to the external data.
- It makes the structure of the digitized documents clear and readable in the long term.
- It ensures that the output of our work (the XML files and the digitized documents) corresponds to what we defined in advance.
- This means, for example, that our digital library can be fed with correct, standardized documents, which among other things supports their long-term digital preservation.

Work with XML
- From the user's perspective, a good digitization project develops XML editors that:
  - make the work easy (filling in forms)
  - check validity against the applied DTD
  - output only correct XML structures
- If you wish to test your skills, download the free M-TOOL from the free tools of the Manuscriptorium Digital Library and try working with it.
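The form-filling idea can be sketched in Python: values typed into a form become an XML record, and an incomplete record is refused instead of being written out. Which postcard fields are mandatory is an assumption of this sketch, and it checks only required and unknown fields; a full DTD check needs a validating parser, which Python's standard library does not include.

```python
import xml.etree.ElementTree as ET

# Hypothetical form definition: field name -> mandatory?
FORM_FIELDS = {"title": True, "theme": False, "language": True, "annotation": False}

def form_to_xml(values: dict[str, str]) -> str:
    """Turn filled-in form values into XML, refusing incomplete records."""
    missing = [f for f, required in FORM_FIELDS.items()
               if required and not values.get(f, "").strip()]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    unknown = set(values) - set(FORM_FIELDS)
    if unknown:
        raise ValueError(f"fields not in the form: {', '.join(sorted(unknown))}")
    root = ET.Element("PostcardDescription")
    for field in FORM_FIELDS:  # keep the order defined by the form
        if values.get(field, "").strip():
            ET.SubElement(root, field).text = values[field].strip()
    return ET.tostring(root, encoding="unicode")

print(form_to_xml({"title": "Hronov", "language": "cz"}))
# <PostcardDescription><title>Hronov</title><language>cz</language></PostcardDescription>
```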

Where to find more?
General
- XML Home
- Technical Introduction to XML
- XMLSpy editor
Applied
- several DTDs implemented in functioning digital libraries
- METS format for containerization of XML-based digital documents
- TEI – Text Encoding Initiative